Will Google have the best LLM by EOY 2023?

1.2k

5.5kṀ410k

resolved Jan 1


As with my other related questions, by default I will judge based on the Elo ratings on the leaderboard here: https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard

If Google deploys a new model in 2023 that might or might not qualify, but it is not yet ranked on the leaderboard at year's end due to the time required for evaluation, I will hold off on resolving until it has been ranked, up to a maximum of February 1.

If Google releases a model that the public, or at least those who have signed up for its early testing programs, cannot access by the deadline, that does not count. I will use my own ability to access it, absent any special treatment, as a proxy here; if I get special treatment, I will ask others.

As with other questions, I reserve the right to correct what I see as an egregious error in either direction, either by Twitter poll or by outright fiat, including if the model is effectively available but does not appear on the leaderboard for logistical reasons.

(Same clarification as the related market: If Google does take the top spot or becomes clearly best, this resolves to YES on the spot, this is by EOY not 'at' EOY.)

predictedNO

For next year ☺️

predictedNO

Related question:

@ZviMowshowitz Regarding deployment, does a "trusted tester program" count as "those who have signed up for its early testing programs"? Or does the testing program have to be more public?

predictedNO

Oh snap. Per today's technical report, the Ultra version of the model is clearly an advance over GPT-4: it is natively multimodal and performs better on all but 2 of the benchmarks. (On HellaSwag, Google identified a way of cheating that may have happened with GPT-4.)

So the YES holders are right in spirit... but the Ultra variant is not out until next year. Damn, close one.

predictedYES

@GeraldMonroe "If Google deplolys a new model in 2023 that might or might not qualify, but it is not yet ranked on the leaderboard at year's end due to time required for evaluation, I will hold off on resolving until that has happened until a maximum of February 1."

I guess the word "deploy" is unclear, but it seems like Google has Ultra in 2023, and based on the title, as long as it ranks higher than other models this should resolve YES.

predictedNO

@Ap I read this to mean that if the model is available to non-Googlers for evaluation in 2023, but there isn't actually time to complete an evaluation by the end of 2023, then it could still count as the best model as long as the eval is done by Feb 1.

predictedYES

@zQ4Z82W I think both are reasonable interpretations. @ZviMowshowitz could you clarify?

These markets are so much fun, even when you don't know that much about the issue. You can profit by just setting reasonable limit orders in expectation of wild moves up and down.

@NicoDelon How do you profit with limit orders? I haven’t done my homework and don’t really get them.

predictedYES

@JaimeSantaCruz The FUH 100k market bounced between 42 and 52 for quite a while, yes? I guess if you bought no at 50 and yes at 45, you would profit?? 🤷 I'm also waiting to find out how these work; I've only done a couple of test limit orders.

predictedYES

@VAPOR Same here, I guess I’ll have to study a little.

@VAPOR

I guess if you bought no at 50 and yes at 45, you would profit??

Yes, you'd make a 10% profit on the mana you spent on the NO shares.

The price of a yes share is the probability * 1 mana. The price of a no share is (1 - the probability) * 1 mana. This means that the price of a yes share plus the price of a no share always equals 1 mana.

You can make a profit by buying yes shares when the probability is low and selling yes shares when the probability is high. Likewise, you can make a profit by buying no shares when the probability is high and selling no shares when the probability is low. If you currently hold yes shares then buying X no shares is effectively the same as selling X yes shares. Likewise, if you currently hold no shares then buying X yes shares is effectively the same as selling X no shares.
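To check the 10% figure from the earlier reply, here is a small Python sketch. It assumes each trade fills exactly at the quoted probability with no price impact (real trades move the price), and it uses the rule above that buying YES while holding NO is the same as selling the NO shares, here at 1 - 0.45 = 0.55 mana each.

```python
def no_price(prob_yes: float) -> float:
    """Price of one NO share in mana at the given YES probability."""
    return 1.0 - prob_yes


# Buy NO while the market is at 50%...
cost = no_price(0.50)        # 0.50 mana per NO share
# ...then "buy YES" at 45%: for a NO holder this sells NO at 1 - 0.45.
proceeds = no_price(0.45)    # 0.55 mana per NO share
profit_pct = (proceeds - cost) / cost * 100
print(f"{profit_pct:.1f}%")  # 10.0%
```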

A limit order is a way to tell the system to automatically buy shares when the probability reaches a certain point. For example, you could set it up to buy M mana's worth of yes shares whenever the probability is equal to or less than P. Likewise, you could set it up to buy M mana's worth of no shares whenever the probability is equal to or greater than P. If a market is fluctuating a lot, then you can set limit orders to buy yes shares at a lower probability and no shares at a higher probability, and profit from the difference.
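The band strategy in that last sentence can be sketched as a toy simulation. The `Book` class and `run_band` function are hypothetical names; the sketch assumes every order fills exactly at its limit price, one share at a time, with no slippage, and caps the position at one share either way. A YES share and a NO share held together are cashed out at 1 mana, per the pricing rule above.

```python
from dataclasses import dataclass


@dataclass
class Book:
    shares: int = 0     # > 0: YES shares held, < 0: NO shares held
    cash: float = 0.0   # net mana received (negative means spent)

    def buy_yes(self, price: float) -> None:
        self.cash -= price          # one YES share costs `price`
        if self.shares < 0:         # a YES + NO pair is worth exactly 1 mana,
            self.cash += 1.0        # so it is cashed out immediately
        self.shares += 1

    def buy_no(self, price: float) -> None:
        self.cash -= 1.0 - price    # one NO share costs 1 - `price`
        if self.shares > 0:
            self.cash += 1.0
        self.shares -= 1


def run_band(probs, low, high, cap=1):
    """Buy YES whenever the probability falls to `low`, buy NO whenever it
    rises to `high`, holding at most `cap` shares either way. Fills happen
    exactly at the limit price (no slippage assumed)."""
    book = Book()
    for p in probs:
        if p <= low and book.shares < cap:
            book.buy_yes(low)
        elif p >= high and book.shares > -cap:
            book.buy_no(high)
    return book


book = run_band([0.50, 0.45, 0.52, 0.44], low=0.45, high=0.50)
print(book.shares, round(book.cash, 2))  # 0 0.1
```

With this made-up price path, the band buys NO at 50% and YES at 45% twice and ends flat with about 0.10 mana of profit. If the price had instead broken out above 50% and stayed there, the sketch would be left holding a NO share, which is the risk of bounding limit orders.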

predictedYES

@HankyUSA one question, if I sell a bunch of shares normally, I bought low, it went up, to sell, I lose money. Was this entirely because it went up?

If I accidentally buy yes instead of no, then immediately sell to bet the other way(P is similar??), I lose money. Was that solely my price move from my bet, or is there some minimum charge for betting, if so, are limit orders equally affected.

predictedNO

@VAPOR Bots like acceleration immediately pounce on big probability swings, which is brutal in accidental cases like that. (And if you then panic-sell everything you bought, you can get screwed a second time when the price swings back past where it started.)

In the simple case where no one bets after you, you can sell what you just bought without losing mana.
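Why a round trip costs nothing when nobody trades in between can be seen with a toy constant-product market maker, where the YES pool times the NO pool is held constant. This is an assumed, simplified model in the spirit of an automated market maker, not Manifold's actual code; `ToyCPMM` and its methods are hypothetical names.

```python
import math


class ToyCPMM:
    """Toy constant-product market maker: yes_pool * no_pool stays constant.

    A simplified, assumed model for illustration, not Manifold's code.
    """

    def __init__(self, yes_pool: float, no_pool: float) -> None:
        self.y = yes_pool
        self.n = no_pool

    def prob(self) -> float:
        # Fewer YES shares left in the pool means YES is priced higher.
        return self.n / (self.y + self.n)

    def buy_yes(self, mana: float) -> float:
        """Spend `mana` on YES; return the number of YES shares received."""
        k = self.y * self.n
        # `mana` mints `mana` YES and `mana` NO shares into the pool...
        self.n += mana
        # ...then YES shares are paid out until the product is back to k.
        new_y = k / self.n
        shares = self.y + mana - new_y
        self.y = new_y
        return shares

    def sell_yes(self, shares: float) -> float:
        """Sell `shares` YES back to the pool; return the mana received."""
        k = self.y * self.n
        a, b = self.y + shares, self.n
        # Burn m YES+NO pairs for m mana so that (a - m) * (b - m) == k;
        # the smaller root of that quadratic is the valid one.
        m = ((a + b) - math.sqrt((a - b) ** 2 + 4 * k)) / 2
        self.y, self.n = a - m, b - m
        return m


market = ToyCPMM(100.0, 100.0)
shares = market.buy_yes(10.0)     # the probability rises above 0.5
back = market.sell_yes(shares)    # immediately sell the same shares
print(round(back, 6), round(market.prob(), 6))  # 10.0 0.5
```

If a bot bought NO between the two calls, the pools would not return to where they started, and selling the same shares would return less than 10 mana.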

predictedNO

@VAPOR

one question, if I sell a bunch of shares normally, I bought low, it went up, to sell, I lose money. Was this entirely because it went up?

I can't tell what you're describing. If you bought yes shares at a low probability and sold them at a higher probability then you should have gained a profit. If you bought no shares at a low probability and sold them at a higher probability, then you should have suffered a loss.

Perhaps Manifold's loan system caused you some confusion. Every day Manifold automatically loans you some mana with no interest. I don't know the formula it uses to calculate how much to loan you, but I think it is some percent of the difference between the value of your holdings and the magnitude of your outstanding loans ((HOLDINGS - DEBTS) * X%). If the formula produces a positive value, then that much mana is added to your balance as a loan. If the formula produces a negative value, then that much mana is subtracted from your balance as a loan repayment. Keep in mind that the value of your holdings can change without any action on your part because it's calculated based on the market value of the shares you hold.

When you sell shares, some of the revenue from the sale goes towards repaying your loans. I don't know how much. Perhaps you bought some shares and sold them after they increased in value, but some of the revenue from the sale went to repay your debts, so you didn't receive as much mana as you initially spent, and you interpreted that as a loss, when in reality you had effectively already received some of the mana for the sale in the form of loans that you repaid at the time of sale.
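The loan explanation can be made concrete under the commenter's own guessed rule, (HOLDINGS - DEBTS) * X% per day. The 10% rate below is hypothetical and deliberately large, and repaying the whole loan out of the sale is an assumption; the comment itself says the real formula and split are unknown.

```python
def sell_after_loans(cost, daily_values, sale_value, rate):
    """Return (mana received at sale, net change in balance).

    Hypothetical model following the guess above: each day the balance
    changes by (holdings value - outstanding debt) * rate, and the whole
    outstanding loan is repaid out of the sale revenue. `rate` and the
    full-repayment rule are assumptions, not Manifold's formula.
    """
    balance = -cost                    # mana spent buying the shares
    debt = 0.0
    for value in daily_values:
        loan = (value - debt) * rate   # positive: loan, negative: repayment
        balance += loan
        debt += loan
    received = sale_value - debt       # sale revenue left after repaying
    balance += received
    return received, balance


received, net = sell_after_loans(cost=100, daily_values=[110, 120],
                                 sale_value=120, rate=0.10)
print(round(received, 2), round(net, 2))  # 98.1 20.0
```

In this sketch the sale itself hands back only about 98.1 mana for shares bought at 100, which looks like a loss, yet the balance is up 20 overall because 21.9 mana had already arrived earlier as loans.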

If I accidentally buy yes instead of no, then immediately sell to bet the other way(P is similar??), I lose money. Was that solely my price move from my bet, or is there some minimum charge for betting, if so, are limit orders equally affected.

There are no fees for betting. By the way, there are fees to create questions, but question creators get bonuses based on the number of traders that participate in their questions. These bonuses come from Manifold effectively printing mana, not from the traders themselves.

I think if you buy shares and immediately sell them before anyone else buys or sells shares, then no loss or profit should result. This can be difficult to do because bots might react to your purchase before you can reverse it. Also, if you buy shares against someone's limit order, then they'll automatically buy opposing shares before you can sell yours, resulting in a loss for you. Everything I just said also applies to selling and then immediately rebuying shares.

Note that calculations happen at a level of precision not displayed in the user interface.

predictedYES

@HankyUSA thanks both, answered a lot, if I buy ten yes accidentally, don't sell, probably be bought by a bot, buy 20 no?

What happens if you buy yes and no, I don't think you have shares in both... So?

predictedNO

@VAPOR there's always some risk in bounding limit orders. If the price breaks through one side of the band while you are holding a large amount, and never crosses back, you can lose more than you intended. They only work when there is reasonable volatility on a market where you don't expect sudden movement (say, one describing an event a decent way into the future, which can't drop or soar much on a single piece of information due to the nature of the market).

predictedYES

I meant: as long as a bot doesn't buy anything in between, if I buy 10 yes in error and then buy 20 no instead, I now own 10 no shares, with no loss. I'm still getting that straight in my mind.

predictedNO

@VAPOR

if I buy ten yes accidentally, don't sell, probably be bought by a bot, buy 20 no?
What happens if you buy yes and no, I don't think you have shares in both... So?

That's correct; you cannot hold both YES shares and NO shares at the same time. The system doesn't support doing so because there's no rational reason for a trader to do that. The prices of both YES shares and NO shares are tied to the probability in a way that ensures they always sum to 1 mana. A trader would always prefer to have the 1 mana than to have the 2 opposing shares.

Proof that 1 YES + 1 NO = 1 mana:

1 YES = PROB * 1 mana
1 NO = (1 - PROB) * 1 mana
1 YES + 1 NO = PROB * 1 mana + (1 - PROB) * 1 mana = (PROB + 1 - PROB) * 1 mana = 1 mana

If you currently hold a YES share then buying a NO share is effectively the same as selling your YES share. Likewise, if you currently hold a NO share then buying a YES share is effectively the same as selling your NO share.

Let's consider your example. You buy 10 YES shares. While holding 10 YES shares, you attempt to buy 20 NO shares. What actually happens is you implicitly sell your 10 YES shares and buy only 10 NO shares. If nobody else affects the price between you buying 10 YES shares and you attempting to buy 20 NO shares, then it will be like you only ever bought 10 NO shares.
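The netting in this example can be sketched with a signed share count (positive for YES, negative for NO). The `buy` helper is hypothetical and tracks share counts only; prices are ignored, and the real interface takes a mana amount for purchases rather than a share count.

```python
def buy(position: int, side: str, qty: int) -> tuple[int, int]:
    """Apply a purchase of `qty` shares of `side` to a signed position
    (positive = YES held, negative = NO held). Return the new position
    and how many opposing shares were implicitly sold."""
    delta = qty if side == "YES" else -qty
    new_position = position + delta
    # Opposing shares are netted away first: that is the implicit sale.
    implicitly_sold = min(abs(position), qty) if position * delta < 0 else 0
    return new_position, implicitly_sold


pos, sold = buy(0, "YES", 10)      # 10 YES
pos, sold = buy(pos, "NO", 20)     # sells the 10 YES, ends with 10 NO
print(pos, sold)                   # -10 10
```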

Note that when you buy shares you have to enter the amount of mana to spend, not the number of shares to buy. When you explicitly sell shares you enter the number of shares to sell, not an amount of mana.

At the end of the day, the intended way for you to profit in a prediction market is to predict the future better than others. Obviously everyone else is trying to predict the future better than you, so it isn't easy. If the market probability is lower than what you think is the true likelihood, then buy YES shares. If the market probability is higher than what you think is the true likelihood, then buy NO shares. Put your mana where your mouth is, but don't put all your eggs in one basket. Even when you lose mana, you can still learn lessons. Just don't bet more than you're willing to lose.
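The buy-YES-or-buy-NO rule can be written as a small expected-value calculation. This sketch is an illustration, not a strategy: it assumes you can trade at the current market probability with no price impact, and the 25% market and 10% personal estimate are made-up numbers.

```python
def expected_profit_per_mana(market_prob: float,
                             your_prob: float) -> tuple[str, float]:
    """Pick the side your estimate favours and return the expected profit
    per mana staked, assuming you buy at the current market price."""
    if your_prob > market_prob:
        # YES costs market_prob and pays 1 with probability your_prob.
        return "YES", your_prob / market_prob - 1
    if your_prob < market_prob:
        # NO costs 1 - market_prob and pays 1 with probability 1 - your_prob.
        return "NO", (1 - your_prob) / (1 - market_prob) - 1
    return "NONE", 0.0


side, edge = expected_profit_per_mana(market_prob=0.25, your_prob=0.10)
print(side, round(edge, 3))  # NO 0.2
```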

predictedNO

Gemini Pro is described as "better than GPT-3.5", and 30 seconds with it confirms it is not going to beat GPT-4. Given that Gemini Ultra is described as coming out "early next year", this is a pretty safe "no" bet.

@LoganZoellner just a single prompt was enough for me. It wouldn't even accept a prompt that Claude handles easily.

Arb?

predictedYES

That's not a rhetorical question, actually asking. Is this arb?

Same question here, this one is 3 months after release. Is it arb?

predictedNO

@Joshua really depends on whether those markets will count the Pro version as the "release" or only the Ultra version. It sounds like Ultra could be on par with GPT-4 but isn't going to be released for a bit.

@Joshua Elo isn't exactly the same as other benchmarks. But I have no real idea how the Elo "battles" choose a winner.

@Joshua I think the case for YES and the case for NO are essentially the same in these two markets, so in theory they should be resolved the same way. That being said, I wouldn't count on the market creators to sync up like that.