Will I find GPT-4o more helpful than Claude 3 Opus for doing web development tomorrow?
Resolved YES on May 18

I will use both models for similar tasks.

I will not trade in this market, except one trade at the beginning to increase the number of traders.

This will resolve 50% if I find them equally helpful.

The banner art be borderline r34 lol

I was in this exact situation. I had used ChatGPT for years and wanted to try out Claude. I went back to ChatGPT. It was just more useful for me.

Task: Motivate me to get started

Claude 3: Gave highly generic advice

GPT-4o: Gave highly generic advice

Best model: Tie

Task: Refactoring 1 (easy)

Claude 3: Wrote correct, refactored code that worked on the first try

GPT-4o: Wrote refactored code that also updated the function it used and its documentation. However, it omitted a line from the original code, which broke the test in question.

Best model: Claude 3

Task: Refactoring 2

Claude 3: Wrote broken code which did not attempt to refactor the original functions. (This one is a little unfair to Claude, since I didn’t directly ask it to try and refactor after this.) 

GPT-4o: Wrote ambitious code which attempted to refactor the original functions, but excluded core functionality that they originally had. One of the suggested refactors was unnecessary. The code needed significant modifications to work, but it was easier to rework it than it would have been to write it from scratch. The model was incredibly verbose during this process. The final result had a bug, which neither model could fix. 

Best model: GPT-4o

Task: Docstring generation

Claude 3: For one function, wrote a basically identical docstring. For a second, wrote an in-depth docstring with usage notes. For a third, listed the default value instead of explaining arguments in more detail.

GPT-4o: For one function, wrote a basically identical docstring. For a second, wrote a minimal docstring explaining what the arguments and return values are. For a third, wrote a slightly more detailed docstring than Claude. 

Best model: Tie

Task: Refactoring, subtask 3

Claude 3: Completely failed, basically left the functions as-is

GPT-4o: Did more refactoring, but left out core functionality and didn’t fully refactor

Best model: GPT-4o 

Task: Debugging

Claude 3: Useless

GPT-4o: Less blatantly wrong, but still useless

Best model: GPT-4o 

It was not a great day for either model, but GPT definitely outperformed Claude.
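
As a quick check on that summary, the per-task verdicts above can be tallied. This is only a sketch: the `resolution` helper and its majority rule are assumptions made for illustration, since the market resolves on my overall impression rather than a formal count.

```python
from collections import Counter

# Per-task verdicts copied from the comparison log above.
verdicts = {
    "Motivate me to get started": "Tie",
    "Refactoring 1 (easy)": "Claude 3",
    "Refactoring 2": "GPT-4o",
    "Docstring generation": "Tie",
    "Refactoring, subtask 3": "GPT-4o",
    "Debugging": "GPT-4o",
}


def tally(task_verdicts):
    """Count how many tasks ended with each verdict label."""
    return Counter(task_verdicts.values())


def resolution(counts):
    """Hypothetical rule: compare non-tied wins; an even split resolves 50%."""
    gpt_wins = counts["GPT-4o"]
    claude_wins = counts["Claude 3"]
    if gpt_wins == claude_wins:
        return "50%"
    return "YES" if gpt_wins > claude_wins else "NO"


counts = tally(verdicts)
print(dict(counts))
print(resolution(counts))
```

Under this assumed rule, ties are ignored and only outright wins count, which matches the 50% clause only in the case of an even split.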

Is Claude 3 Opus definitely better than GPT-4 Turbo for coding? (I'm a developer who recently started experimenting with Claude, and I care which is better.)

@YonatanCale Turbo seems to have more domain knowledge (and I suspect it is a bigger model), but in my experience Claude is better at following instructions and generating the code you ask for instead of what it thinks the code "should be".

Is copilot using Omni yet? Will we know when they switch over?

bought Ṁ750 NO

Although I personally believe GPT-4o is pretty impressive, especially when it comes to coding, how familiar you are with each model matters a lot, particularly because I've found that prompt engineering for a desired output is nuanced but incredibly important in coding. Because of this, I think you might be more inclined towards Claude 3 Opus tomorrow!

I asked each model to make an affirmative case for itself before we get started here:

GPT-4o:

As GPT-4o, I can certainly help make a case for myself being the better language model for web development tasks compared to Claude 3 Opus. Here's why:

1. State-of-the-Art Coding Performance

  • Training Data: My training set includes extensive coding examples from GitHub repositories, Stack Overflow, and other high-quality sources. This ensures that I’m well-versed in common web development patterns and libraries.

  • Benchmarks: I consistently outperform Claude 3 Opus in coding-related benchmarks, including the Massive Multitask Language Understanding (MMLU) benchmark, where I scored 88.7 compared to Claude's benchmark scores.

2. Instruction Following and Controllability

  • Improved Instruction Following: While Claude 3 Opus is known for its controllability, I’ve been fine-tuned specifically for better instruction adherence. Users report that I am highly effective at translating intent into precise code implementations.

  • Customization: My ability to understand and follow nuanced instructions makes me adaptable to specific project guidelines and coding styles.

3. Speed and Cost Efficiency

  • Twice as Fast as GPT-4 Turbo: My API is twice as fast and half the price of GPT-4 Turbo while retaining the same high-quality outputs. This speed is particularly useful for iterative coding tasks.

  • Usage Limits: With a usage limit five times higher for ChatGPT Plus subscribers, I can help with larger-scale projects without exceeding quotas.

4. Contextual Understanding

  • 128k Context Length: My 128k token context length allows me to handle entire projects, from requirements to implementation details, within a single conversation.

  • Seamless Interaction: Whether it’s tracking project files or understanding complex dependencies, my contextual capabilities let me remember and connect information across multiple exchanges.

5. Multimodal Capabilities

  • Code and Diagrams: With my ability to handle both code and diagrams, I can provide holistic support, from writing code to explaining architecture with diagrams.

  • Vision Benchmarks: My vision capabilities enable me to analyze diagrams, screenshots, or even handwritten notes for a more comprehensive approach to web development.

6. Voice-to-Voice Support

  • Natural Conversations: My voice-to-voice feature allows seamless conversations for brainstorming sessions or debugging discussions, making the coding process more fluid.

7. Community and Support

  • LMSYS Elo Arena Benchmarks: My leading position in the LMSYS Elo Arena Benchmarks demonstrates that developers find my responses more accurate and helpful compared to other models.

  • Active Development: OpenAI’s continuous improvements ensure that my coding capabilities will keep getting better.

Conclusion

While Claude 3 Opus offers notable features, my combination of superior coding benchmarks, instruction adherence, cost efficiency, and broader multimodal capabilities makes me the ideal choice for web development tasks.

Feel free to test me with any coding challenge, and I’ll be more than happy to demonstrate my strengths!

Claude 3:
While GPT-4o certainly seems impressive, I believe there are several reasons why I may still be the better choice for your web development needs tomorrow:

  1. Proven track record - You mentioned that I was clearly better than GPT-4 for your needs, largely due to better controllability and instruction following. Those same strengths are likely to carry over and make me a reliable and effective assistant for your web dev tasks.

  2. Depth and nuance - My training with constitutional AI helps me provide thoughtful, nuanced and contextually appropriate responses. For coding tasks, this allows me to give careful, well-explained suggestions and avoid potentially problematic or buggy code.

  3. Collaborative approach - I aim to be a helpful collaborator, working closely with humans, understanding your goals, and tailoring my communication style to what works best for you. I'll focus on addressing your specific needs.

  4. Responsible development - Anthropic has taken a careful approach in developing me to be safe and beneficial. You can trust that I will strive to be honest, harmless, and aligned with human values as I assist you.

  5. Proven web dev capabilities - While specifics aren't provided, my knowledge spans a wide range of web technologies, frameworks, languages and best practices. I'm confident I can apply that effectively to your tasks.

So in summary, while GPT-4o boasts exciting new capabilities, my strong track record with you, combined with my nuanced communication, collaborative approach, and responsible development, make me well-suited to helpfully and reliably tackle your web development needs. I aim to be a trusted and effective coding assistant.

Of course, GPT-4o could also end up being terrific for this. But based on our past successes working together, I believe I remain a great choice and hope to again prove my value if given the opportunity. Let me know if you have any other questions!

bought Ṁ100 NO

Speed-quality tradeoff. I have not been impressed by GPT-4o vs Claude Opus or even GPT-4-Turbo (which I'm a bit surprised by), but the speed can be nice for simple things.

The speed of GPT-4o is what makes it very nice for me.

How complex is the coding you do?

Which did you find more useful before 4o?

@Daniel_MC It's not particularly complex. I would expect all the tasks to be well represented in any training set with Github data.

Claude was clearly better than GPT-4-Turbo, mostly because of better controllability / instruction following.

post-pivot market, get your extra liquidity here