Dan Fu, a PhD candidate at Stanford University, joins us today. In our conversation with Dan, we discuss the limitations of state space models for language modelling and the search for alternative building blocks that can extend context length without becoming computationally impractical. Dan walks us through the H3 architecture and the FlashAttention technique, which can reduce a model's memory footprint and make fine-tuning feasible. We also discuss his work on improving language models with synthetic languages, how long sequence lengths affect both training and inference, and the search for a sub-quadratic alternative that can perform language processing more efficiently than the brute-force approach of attention.
State space models for language modelling
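To make the contrast concrete: a state space model processes a sequence through a linear recurrence, touching each token once, while full attention compares every pair of tokens and so scales quadratically with sequence length. The sketch below is a minimal, hypothetical NumPy illustration of that recurrence; it is not the H3 architecture or any code discussed in the episode, and the parameter values are arbitrary.

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Run the recurrence x_t = A x_{t-1} + B u_t, y_t = C x_t over input u."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_t in u:              # one pass over the sequence: O(n) steps
        x = A @ x + B * u_t    # linear state update
        ys.append(C @ x)       # readout at this step
    return np.array(ys)

# Toy parameters (arbitrary, chosen only for illustration).
d = 4
A = 0.9 * np.eye(d)            # state transition (stable: eigenvalues < 1)
B = np.ones(d)                 # input projection
C = np.ones(d) / d             # output projection

u = np.sin(np.linspace(0.0, 3.0, 100))  # a length-100 input sequence
y = ssm_scan(A, B, C, u)
print(y.shape)                 # (100,) -- one output per input token
```

Because the loop runs once per token, the work grows linearly with sequence length; full attention over the same sequence would materialize an n-by-n score matrix, which is the quadratic cost the episode's sub-quadratic alternatives aim to avoid.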