Will research in getting closer to unlimited-context windows in LLMs be released by Feb, 11, 2024?
Resolved NO on Feb 14

Important: The criterion is not whether it actually happens, but whether it can be proven true or false as possible by the end of Feb 3, 2024. To be clear, provable means a primary source followed by a rational argument for or against. By the end of Feb 3, 2024, all arguments and primary sources posted in the comments will be discussed, and the final decision will depend on how the arguments weigh against one another and how people respond to my final overview post of the arguments in total, if the answer is not already clear from the sources or arguments themselves. This market will resolve YES/NO based on the YES/NO of that final overview post. If nobody proves it YES or NO with a rational argument and a primary source in the comments, this resolves N/A. I'm looking for an in-depth betting pool; where my dorks at.


🏅 Top traders

#  Name  Total profit
1        Ṁ255
2        Ṁ73
3        Ṁ57
4        Ṁ42
5        Ṁ31
reposted

The closing time of the market has been extended because I haven't been active enough. I'm dealing with a family emergency, so I will get back as soon as possible. Let me know if that means I should resolve N/A for the time being. I've extended the close by a week, and I apologize for any inconvenience. DM me if you want to be reimbursed or have other concerns.

@ThePhilosopher You are too kind to find the time to explain. From my perspective family takes precedence. Best wishes.

predicted NO

@duck_master I checked out the paper and I have to say it looks like a nothingburger. The researchers did not evaluate their technique on any well-known long-text benchmark or against any proper long-context method (I mean, I'm sure they did, they just decided not to publish because the results weren't good). The only evaluations in the paper are

1) Using perplexity as a benchmark score, which is very wrong

2) Showing some very marginal improvements on regular context length tasks (why?)

3) Showing that the technique indeed allows for very long context length (no shit sherlock) and doesn't lose accuracy, but only in cases where actually maintaining long-range context is not needed. Also not comparing to other simple techniques that can achieve the same outcome there.

There are many advantages to the AI field abandoning its reliance on peer review, but this shit also happens if you don't have peer review.

I'll be replying to all comments in a few days, as activity is picking up and there is enough here to think about and research. Remember: it has to have happened in the current month, and I don't assume anything about what it means for unlimited context windows to exist, although that could become its own swarm of 1-month sub-markets based on the insights hinted at here. This market is mostly about whether progress toward unlimited context windows is being made this month: a 1-month tracker that will give rise to new markets based on what we find out here, and that anyone can reference back to.

You can already use a bunch of techniques to feed an LLM input of unbounded length. The only problem is that the model isn't very good at actually understanding all of that input, since you make all kinds of performance compromises to achieve the extended input. Here's the paper that, imo, went the furthest in achieving very long context length while retaining performance: https://arxiv.org/abs/2206.05852

It does have infinite context length but, once again, that's a meaningless thing to have.
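For concreteness, here is a rough sketch of one common workaround of this kind: retrieval-style chunking, where only the chunks judged relevant to the question are actually placed in the model's finite window. This is a generic illustration, not the method from the linked paper; the chunk size, overlap scoring, and word budget below are made-up placeholders.

```python
# Illustrative only: a naive retrieval workaround for a finite context window.
# Everything here (chunk size, scoring rule, budget) is a placeholder, not the
# linked paper's method.

def chunk(text: str, chunk_size: int = 200) -> list[str]:
    """Split a long document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

def score(chunk_text: str, query: str) -> int:
    """Crude relevance score: how many query words appear in the chunk."""
    chunk_words = set(chunk_text.lower().split())
    return sum(1 for w in query.lower().split() if w in chunk_words)

def build_prompt(document: str, query: str, budget_words: int = 1000) -> str:
    """Keep only the most relevant chunks that fit the model's finite window."""
    ranked = sorted(chunk(document), key=lambda c: score(c, query), reverse=True)
    picked, used = [], 0
    for c in ranked:
        n = len(c.split())
        if used + n > budget_words:
            break
        picked.append(c)
        used += n
    return "\n\n".join(picked) + "\n\nQuestion: " + query
```

The compromise is obvious: anything the crude relevance score misses never reaches the model at all, which is exactly the "not very good at understanding all that input" problem.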

What would be the benefits of large contexts? In my experience, the larger-context models are somewhat worse than the smaller-context but better-trained models.

What do you mean by unlimited context window? By definition, if it can implement a recurrent model, it has infinite context. Vanilla attention can’t do that, but with something as simple as Landmark attention or any kind of periodic pooling token, voila it’s infinite context.

Now, obviously people don't consider this infinite, since the hidden state you're propagating gradually loses information. It's trivial to prove which kinds of model can implement recurrence; what people really care about is performance.

And if you’re talking about unlimited context with zero performance loss, you’re just never gonna beat vanilla attention (which can’t propagate information for more than n_layers*window tokens)
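To make the pooling-token point above concrete, here is a toy sketch (not Landmark attention itself, and the mixing rule is an arbitrary stand-in for a learned update) of carrying a fixed-size summary across windows: the effective context is unbounded, but older information decays with every window, which is exactly the lossy-hidden-state trade-off being described.

```python
# Toy illustration of a "periodic pooling token": a fixed-size summary vector is
# carried across windows, so the stream can be arbitrarily long with O(window)
# memory, but early tokens' contribution decays geometrically over windows.
import numpy as np

def process_stream(token_embeddings: np.ndarray, window: int = 128, keep: float = 0.9):
    """token_embeddings: (seq_len, dim). Returns the final carried summary vector."""
    dim = token_embeddings.shape[1]
    summary = np.zeros(dim)
    for start in range(0, len(token_embeddings), window):
        block = token_embeddings[start:start + window]
        pooled = block.mean(axis=0)                       # "pooling token" for this window
        summary = keep * summary + (1 - keep) * pooled    # carry state forward, losing detail
    return summary

# Usage: a 100k-token stream is processed one window at a time.
stream = np.random.randn(100_000, 64).astype(np.float32)
final_state = process_stream(stream)
```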

IMO the resolution to this market is already “trivially yes” or “trivially no” depending on how rigorous your definition of infinite context and “LLM” is.

EDIT:

Trivially yes: “an rnn provably has infinite context”

Trivially no: “you can provably not propagate information infinitely without performance loss”

Flash attention is haaaaard to beat: O(N) memory, O(N^2) computation, but it's all fused multiply-adds from hot cache. Are we confident that this is the bottleneck for state-of-the-art long-context models?

@HastingsGreer I'm working in a non-language transformer context, but at least there, 100,000-token attention layers are not the bottleneck.
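For reference, here is a rough numpy sketch of the online-softmax idea behind FlashAttention, for a single query: iterating over key/value blocks with a running max and running normalizer means the full N x N score matrix is never materialized, which is where the O(N) memory / O(N^2) compute trade-off mentioned above comes from. Block size and shapes are arbitrary; the real kernel fuses all of this in on-chip SRAM.

```python
# Online-softmax attention for one query row: only a running max, a running
# normalizer, and an output accumulator are kept, so memory is O(N) while the
# total work remains O(N^2) across all queries.
import numpy as np

def online_softmax_attention(q, K, V, block=1024):
    """q: (d,), K: (N, d), V: (N, d). Returns the attention output for one query."""
    d = q.shape[0]
    running_max = -np.inf
    running_sum = 0.0
    acc = np.zeros(d)
    for start in range(0, K.shape[0], block):
        k_blk, v_blk = K[start:start + block], V[start:start + block]
        scores = k_blk @ q / np.sqrt(d)            # scores for this block only
        new_max = max(running_max, scores.max())
        # Rescale what has been accumulated so far to the new max
        # (exp(-inf) = 0 makes the very first block a no-op rescale).
        scale = np.exp(running_max - new_max)
        weights = np.exp(scores - new_max)
        acc = acc * scale + weights @ v_blk
        running_sum = running_sum * scale + weights.sum()
        running_max = new_max
    return acc / running_sum

# Usage: 50k keys without ever allocating a 50k x 50k matrix.
d, N = 64, 50_000
q, K, V = np.random.randn(d), np.random.randn(N, d), np.random.randn(N, d)
out = online_softmax_attention(q, K, V)
```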

  1. StreamingLLM and Parallel Context Windows (PCW): Researchers from MIT, Meta AI, and Carnegie Mellon have developed StreamingLLM, an efficient framework enabling infinite-length language modeling in LLMs. Similarly, the Parallel Context Windows (PCW) method, introduced in a recent paper, alleviates the context window restriction for any off-the-shelf LLM without additional training. Both methods represent significant steps towards handling longer text sequences.

  2. Other Developments: Other companies and research groups are also focusing on this area. For instance, Anthropic expanded their context window from 8k to 100k tokens, and OpenAI increased GPT-3.5-Turbo's context window from 4k to 16k tokens. Microsoft Research announced a transformer-based LLM with a context window of 1 billion tokens, indicating an industry-wide push towards longer context windows.

  3. Challenges and Performance Issues: Despite these advances, there are debates about whether simply extending the context window or improving AI memory is more beneficial. Furthermore, studies have shown that models with larger context windows do not necessarily perform better than those with smaller windows, suggesting that more research is needed to optimize performance with longer contexts.

bought Ṁ30 of NO

@LUCATheory The StreamingLLM paper does not give unlimited (or even large) context windows. Giving LLMs unlimited context windows in the traditional sense means that the models would have the inherent ability to consider and process all parts of an extremely long text (like a whole book) at once when making decisions or generating responses. What StreamingLLM does is more technical: it's to do with efficiently managing and moving the focus of attention through the text, rather than increasing the size of the attention window itself.

Practically speaking, I expect an LLM enhanced with StreamingLLM that has been fed a 3,000-page document would not be able to answer a question whose answer is found on page 1,500.
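Here is a schematic of the cache policy being described, as I understand it from the StreamingLLM paper: keep a few initial "attention sink" tokens plus a sliding window of recent tokens, and evict everything in between. The class name and the numbers below are placeholders, not the paper's actual API or hyperparameters.

```python
# StreamingLLM-style KV-cache policy: a handful of initial "sink" tokens are
# kept forever, recent tokens live in a rolling window, and everything in the
# middle is evicted. Attention only ever sees sinks + window.
from collections import deque

class SinkCache:
    def __init__(self, n_sink: int = 4, window: int = 2048):
        self.sink = []                      # first few tokens, kept forever
        self.recent = deque(maxlen=window)  # rolling window of recent tokens
        self.n_sink = n_sink

    def append(self, kv_entry):
        """Add one token's key/value entry; the oldest middle token is dropped."""
        if len(self.sink) < self.n_sink:
            self.sink.append(kv_entry)
        else:
            self.recent.append(kv_entry)    # deque evicts the oldest automatically

    def visible(self):
        """What attention can actually see: sinks plus the recent window only.
        Anything from "page 1,500" of a 3,000-page document was evicted long
        before a question about it arrives, which is the point above."""
        return self.sink + list(self.recent)
```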

I have not read the PCW paper but I imagine it's the same type of result.

@chatterchatty Love this!! I think this will count if 1) it came out between when this market started and when it ends, and 2) it goes beyond methods from before this year. I'll take a deep dive right now, both for its own sake (because this sounds awesome) and in terms of the criteria above.

predicted NO

@ThePhilosopher To be clear: my point is that it doesn't result in an unlimited, or even a large, context window. It will (probably) not be able to read, e.g., a very long novel and tell you on what page a certain thing happens.

Isn't Mamba already this? It has no inherent superlinear cost scaling with length and apparently generalizes pretty well.

bought Ṁ100 of NO

@osmarks I don't think there's empirical evidence about Mamba yet, but it's one thing to look out for.
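For anyone curious why Mamba avoids superlinear scaling: it generates through a fixed-size linear recurrence, so per-token cost and memory stay constant no matter how much context has been consumed. Below is a heavily simplified sketch of that recurrence with fixed diagonal parameters; real Mamba makes these learned and input-dependent (the "selective" part) and trains with a parallel scan.

```python
# Heavily simplified state-space recurrence: the entire "context" lives in a
# fixed-size state vector h, so each new token costs the same regardless of
# how long the sequence already is. Parameters here are random placeholders.
import numpy as np

def ssm_generate(x_seq: np.ndarray, state_dim: int = 16, seed: int = 0):
    """x_seq: (seq_len, d_in). Returns outputs (seq_len, d_in) with O(state_dim) memory."""
    rng = np.random.default_rng(seed)
    d_in = x_seq.shape[1]
    A = rng.uniform(0.9, 0.999, size=state_dim)        # per-channel decay (diagonal A)
    B = rng.standard_normal((state_dim, d_in)) * 0.1    # input projection
    C = rng.standard_normal((d_in, state_dim)) * 0.1    # output projection
    h = np.zeros(state_dim)                             # fixed-size carried state
    outputs = []
    for x_t in x_seq:
        h = A * h + B @ x_t        # linear recurrence: constant cost per token
        outputs.append(C @ h)
    return np.stack(outputs)

# Usage: sequence length only affects total time, never per-token cost or memory.
y = ssm_generate(np.random.randn(10_000, 32))
```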

