Resolves according to when I get a single AI to do the majority (>5) of the tasks below. I will not put much effort into prompt engineering/elicitation; the spirit of this question is to get at when the tasks below will be easy to do.
All answers after the year in question will resolve to YES.
I may trade on this market, but commit to having <50 mana invested in this market at any time by cost-to-me (I would like to record my beliefs, and am not particularly concerned with profit other than that). If requested, I will post the AI interactions which I used to resolve the question insofar as possible.
1. PyTorch (or Future DL Framework) Code Implementation
- Reliably implement PyTorch code upon request without user intervention, though scaffolding in the user interface is acceptable. At least 75% of the code must be ready-to-go.
- Implement (straightforward, but uncommon) code when given a library's documentation in context.
2. Writing Improvement
- Anticipate and address critiques peers would have, with at least 50% of the anticipated critiques being useful.
- Propose clarifications that are usually accepted as changes by me.
3. Bullet Points to Text
- Write comprehensive content (blog post style) based on provided bullet points (50% success rate).
4. Learning Assistant
- Answer questions at the level a PhD-student TA would.
- Propose useful Anki cards, with the majority being accepted.
- Enhance Anki interactivity by rephrasing cards to cover the same topic slightly differently, aiming for a 90% acceptance rate.
5. Therapy Alternative
- Provide therapy consultations that are more useful than those from my current therapist.
6. Auto-Podcast Creation
- Distill economic and philosophy papers into podcast format for easy listening. Success is required 5 times, with the number of attempts depending on my willingness, which in turn depends on the LLM's quality.
7. Peer Review
- Offer feedback that surpasses the utility of critiques typically given by PhD students, especially in identifying missing experimental evidence or motivation, and possible graphic improvements.
8. Personal Activity Suggestions
- Suggest new activities (e.g., meditation techniques, exercise routines, movies) on a monthly basis that I actually engage in.
- Provide reliable book recommendations based on personal descriptions and additional information gathered, aiming for a higher success rate than other services.
9. Brainstorming/Debate Partner
- Serve as the go-to option for brainstorming research ideas, composing cover letters, iterating on ideas/arguments, etc.
10. Creative Writing Generator
- Generate complete short stories from an outline or the first page, with the quality being high enough for me to willingly read the whole thing.
The sub-points are somewhat flexible: I'll try to stick with them, but if it becomes clear that a model is very useful (for me) at the 'headline' task, I'll consider that sufficient.
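As a rough illustration of the resolution rule stated at the top, here is a minimal Python sketch. It is not an official procedure: the function names, the dictionary of per-task results, and the choice to treat the qualifying year itself as YES (and earlier years as NO) are all assumptions made for the example.

```python
from typing import Mapping, Optional

TOTAL_TASKS = 10  # the ten numbered headline tasks in the list
MAJORITY = 5      # "majority (>5)": strictly more than 5 tasks must pass


def qualifies(task_passed: Mapping[int, bool]) -> bool:
    """Return True if a single AI passed more than 5 of the 10 headline tasks.

    task_passed maps a task number (1..10) to my subjective pass/fail call.
    Tasks missing from the mapping count as not passed.
    """
    passed = sum(
        1 for n in range(1, TOTAL_TASKS + 1) if task_passed.get(n, False)
    )
    return passed > MAJORITY


def resolve_answer(answer_year: int, qualifying_year: Optional[int]) -> Optional[str]:
    """Resolve one year-answer given the first year an AI qualified.

    Assumption (not stated explicitly in the question): the qualifying year
    itself resolves YES along with every later year; earlier years resolve NO.
    Returns None while no AI has qualified yet.
    """
    if qualifying_year is None:
        return None
    return "YES" if answer_year >= qualifying_year else "NO"
```

For example, passing exactly five tasks would not be enough under this reading, since the threshold is strictly greater than five.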
@OlegEterevsky Hard to say; you can take a look at my bets for my opinion. Iterative in the sense that I could imagine a scaled-up GPT with similar training qualifying, sure.
Numbers 1 (code gen) and 5 (therapy) are the only two on this list where I currently find GPT useful. GPT-4 does not meet the bar for competency as described in this question even in those domains, though.