Is it possible to align scaffolded LLMs with human values?
70% chance
This question is managed and resolved by Manifold.
Related questions
Are LLMs easy to align because unsupervised learning imbues them with an ontology where human values are easy to express?
33% chance
By the end of 2025, will it be generally agreed that LLM-produced text/code is better than human text/code for training LLMs?
11% chance
Will "LLMs for Alignment Research: a safety priority?" make the top fifty posts in LessWrong's 2024 Annual Review?
13% chance
Will an LLM improve its own ability along some important metric well beyond the best trained LLMs before 2026?
50% chance
Will relaxed adversarial training be used in practice for LLM alignment or auditing before 2028?
79% chance
EOY 2025: Will open LLMs perform at least as well as 50 Elo below closed-source LLMs on coding?
30% chance
Will LLMs be the best reasoning models on these dates?
Will there be an LLM which scores above what a human can do in 2 hours on METR's eval suite before 2026?
67% chance
By 2027, will it be generally agreed that LLM-produced text is better than human text for training LLMs?
62% chance
Will LLMs be used for academic peer review by 2030?
71% chance