A Massive Public List of Large Language Model Outputs Will Be Created and Posted on Github or Equivalent by End of Q1 2023
Resolved YES (Apr 3)

For example, a list of 1000 or more essays or 1000 or more code samples generated by ChatGPT/GPT3.5. It does not have to be from ChatGPT; that is just the app leveraging an LLM with the most hype at the moment.


202303120501PMCST - Criteria updated based upon discussion below. https://manifold.markets/PatrickDelaney/a-massive-public-list-of-large-lang#qdFx0ZFpCDDQ9ie0SPO3

  • "each essay must have a minimum 1000 characters, ideally 1500 each to have a safety factor."

202303210433PMCST - Further criteria on this market, to make sure we're doing this correctly...the list must be posted here as a comment prior to 11:59PM CST on 31 March 2023. Buying more NO to subsidize the market further.


predicted NO

Created a new market on this, raising the bar to word count rather than character count, with the timeline set at the end of 2023, and GPT-4 or equivalent. https://manifold.markets/PatrickDelaney/a-massive-public-list-of-gpt4-equiv

predicted NO

All right, got it, sorry about the confusion earlier:

Total responses above 1000 chars: 135,262

So yeah, we're well past the 1000 qualifying responses needed.

The character threshold is the important one, not a word threshold, because for bot detection, at least with GPT-2, you get around an 80% true positive rate using the existing GPT-2 detector inference model. From anecdotal experience, it seems to work just as well on GPT-3.5; I'm not sure about GPT-4 yet, but this will be really helpful for trying to test things out.
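For reference, here's a minimal sketch of scoring one response with that detector through the Hugging Face transformers library (assuming the publicly released "roberta-base-openai-detector" checkpoint; the sample text and the score shown are placeholders):

#!/usr/bin/python

# Minimal sketch: score one response with the public GPT-2 output detector.
# Assumes the transformers library and the RoBERTa-based detector checkpoint
# "roberta-base-openai-detector"; the sample text below is a placeholder.
from transformers import pipeline

detector = pipeline("text-classification", model="roberta-base-openai-detector")

sample = "..."  # paste one of the 1000+ character responses here
result = detector(sample, truncation=True)
print(result)  # e.g. [{'label': 'Fake', 'score': 0.98}]; 'Fake' means model-generated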

predicted NO

@PatrickDelaney

#!/usr/bin/python

import json

charcount_dict = {}

INPUT_FILE = 'gpt4all_curated_data_full_2022_03_27.jsonl'

# Record each response's character count in a dict, keyed by line number.
with open(INPUT_FILE) as f:
    for count, line in enumerate(f, start=1):
        json_obj = json.loads(line)
        response = json_obj['response']
        charcount_dict[count] = len(response)

# Save the per-line response character counts to a text file.
with open("charcounts.txt", "w") as outfile:
    outfile.write(str(charcount_dict))

THRESHOLD = 1000

# Keep only the responses longer than THRESHOLD characters, sorted by length.
over_threshold_chars = {k: v for (k, v) in charcount_dict.items() if v > THRESHOLD}
over_threshold_chars_sorted = sorted(over_threshold_chars.items(), key=lambda x: x[1])
with open("charcounts_threshold.txt", "w") as outfile:
    outfile.write(str(over_threshold_chars_sorted))

# Record the total number of responses above the threshold.
with open("charcount_above_threshold.txt", "w") as outfile:
    outfile.write("Total responses above " + str(THRESHOLD) + ": "
                  + str(len(over_threshold_chars_sorted)))
predicted NO

@JimHays I looked at the Kaggle dataset with the following code and only found 28 responses over 1000 characters. Here's my Python code; I will be throwing this up on GitHub later today. I have not looked at the HuggingFace dataset yet, but I believe that will probably suffice.

Part of why I'm doing it this way is that I'm presuming there may be others who followed this market who may be interested in the full trail of the whole market for their own research purposes.

#!/usr/bin/python

import json

wordcount_dict = {}

INPUT_FILE = 'gpt4all_curated_data_full_2022_03_27'

# Record each response's word count in a dict, keyed by line number.
with open(INPUT_FILE) as f:
    for count, line in enumerate(f, start=1):
        json_obj = json.loads(line)
        response = json_obj['response']
        wordcount_dict[count] = len(response.split())

# Save the per-line response word counts to a text file.
with open("wordcounts.txt", "w") as outfile:
    outfile.write(str(wordcount_dict))

THRESHOLD = 1000

# Keep only the responses longer than THRESHOLD words, sorted by length.
over_threshold_words = {k: v for (k, v) in wordcount_dict.items() if v > THRESHOLD}
over_threshold_words_sorted = sorted(over_threshold_words.items(), key=lambda x: x[1])
with open("wordcounts_threshold.txt", "w") as outfile:
    outfile.write(str(over_threshold_words_sorted))

# Record the total number of responses above the threshold.
with open("count_above_threshold.txt", "w") as outfile:
    outfile.write("Total responses above " + str(THRESHOLD) + ": "
                  + str(len(over_threshold_words_sorted)))
predicted NO

@PatrickDelaney Here were the actual lines that were above the 1000-word threshold:

[(796729, 1006), (796495, 1006), (271929, 1032), (796168, 1037), (798529, 1037), (258566, 1047), (799946, 1048), (537775, 1052), (727814, 1053), (669473, 1060), (801638, 1081), (798940, 1086), (216746, 1091), (800552, 1109), (800298, 1118), (805636, 1119), (265116, 1154), (200516, 1160), (668132, 1160), (794725, 1176), (804119, 1184), (636749, 1227), (799324, 1238), (806052, 1271), (698488, 1306), (795497, 1350), (724168, 1449), (238466, 1656)]
predicted NO

@PatrickDelaney Crap, I ran the same test on https://huggingface.co/datasets/nomic-ai/gpt4all_prompt_generations_with_p3/blob/main/data.jsonl and I'm also only getting 28 responses above the 1000 threshold.

Am I missing any links?

predicted NO

@PatrickDelaney OK, I forgot to run this one: https://github.com/radi-cho/botbots/ (I need to write some new code for that).

predicted YES

@PatrickDelaney I think I identified what's going on. In your comment three above this one, you said you are using a 1000-word threshold, but the market description specifies a 1000-character threshold.

predicted YES

If that's lower than what you needed, you may want to make a new market with a higher threshold.

predicted NO

@JimHays I just realized I counted wrong: I need to count characters, not words. brb.

@PatrickDelaney Ah! Awesome. Also, if that doesn't satisfy the criteria, I'm all for extending the market and giving this problem a satisfactory end.

predicted NO

Sorry for the delay, life stuff this weekend... working on verifying things... seems like it's a YES, but I just want to make absolutely sure.

Hi

predicted NO

Here's my collection of 23 different essays that I had generated... I paid someone $5 on Fiverr for this.

https://docs.google.com/spreadsheets/d/1brzZjGP3JSTJbBN77nukXErNq6zeT3A3HMuDB_gJ1yc/edit?usp=sharing

If anyone can find or generate an additional 977 from an updated GPT-based LLM which is at least somewhat equivalent to GPT-3 or greater, I will accept it.

I will accept LLaMA, Bard, or Bing at this point, to keep things interesting.

I really do not want to have to resolve this as NO.

predicted YES

Better yet, the training data for GPT4All

Found here: https://www.kaggle.com/datasets/konradb/gpt4all-repo

Under: Data Card > Reproducibility > Raw Data > Full Dataset with P3

@JimHays is this training data or data produced as output?

predicted YES

@firstuserhere This is training data for GPT4All. My understanding is that they trained it on ChatGPT input/output pairs. I haven't downloaded the dataset to inspect it myself.

predicted NO

@JimHays taking a look.

predicted NO

@JimHays So what I'm seeing is that the output format is the following:

{
  "prompt": "Russia Finishes Building Iran Nuclear Plant MOSCOW (Reuters) - Russia and Iran said Thursday they had finished construction of an atomic power plant in the Islamic Republic -- a project the United States fears Tehran could use to make nuclear arms. \nIs this a piece of news regarding world politics, sports, business, or science and technology? ",
  "response": "This is a piece of news regarding world politics and science and technology.",
  "source": "bigscience/p3"
}

That's only a 76-character response. If you've ever run inference locally for a language model on your own machine with a CPU, for example GPT-2, you can see that the amount of time it takes to generate a response goes up significantly with the length of the response. In other words, generating a 76-character response is easy, whereas the 1000-character benchmark above is difficult and expensive.

Unless there's something else in that repo that I'm missing...?
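As a rough illustration of that cost curve, here's a sketch that times local CPU generation at a few output lengths (it assumes the transformers library and the small "gpt2" checkpoint; absolute timings are machine-dependent):

#!/usr/bin/python

# Rough illustration: on CPU, wall-clock generation time for GPT-2 grows
# with the requested output length. Assumes the transformers library and
# the small "gpt2" checkpoint; absolute timings vary by machine.
import time
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

for max_len in (50, 200, 800):
    start = time.time()
    generator("The quick brown fox", max_length=max_len, do_sample=True)
    print("max_length=" + str(max_len) + ": " + str(round(time.time() - start, 1)) + "s")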

predicted YES

@PatrickDelaney
I think the example you found may have just been particularly short? I'm not sure if the link below is exactly the same as the one I linked above, but I ended up downloading this, and I think it fits both the letter of the market as well as the spirit:

https://huggingface.co/datasets/nomic-ai/gpt4all_prompt_generations_with_p3/blob/main/data.jsonl

It has 437,606 rows. In the first ~1000 rows that I looked at, about half of them were over 1000 characters.
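A quick way to spot-check that, sketched under the assumption that each line of data.jsonl is a JSON object with a "response" field like the examples in this thread:

#!/usr/bin/python

# Spot-check: count responses over 1000 characters in the first 1000 rows.
# Assumes each line of data.jsonl is a JSON object with a "response" field.
import itertools
import json

over = 0
with open("data.jsonl") as f:
    for line in itertools.islice(f, 1000):
        if len(json.loads(line)["response"]) > 1000:
            over += 1
print("Responses over 1000 chars in the first 1000 rows:", over)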

predicted NO

@JimHays Nice, OK, I will take a look.

predicted NO

@JimHays seems super close.

{

"prompt": "<p>I installed Cassandra and Spark with SparkSQL on my machine.\nSpark SQL support JOIN Keyword</p>\n\n<p><a href=\"https://docs.datastax.com/en/datastax_enterprise/4.6/datastax_enterprise/spark/sparkSqlSupportedSyntax.html\" rel=\"nofollow\">https://docs.datastax.com/en/datastax_enterprise/4.6/datastax_enterprise/spark/sparkSqlSupportedSyntax.html</a></p>\n\n<blockquote>\n <p>Supported syntax of Spark SQL The following syntax defines a SELECT\n query.</p>\n \n <p>SELECT [DISTINCT] [column names]|[wildcard] FROM [kesypace\n name.]table name [JOIN clause table name ON join condition] [WHERE\n condition] [GROUP BY column name] [HAVING conditions] [ORDER BY column\n names [ASC | DSC]]</p>\n</blockquote>\n\n<p>I have the following code</p>\n\n<pre><code>SparkConf conf = new SparkConf().setAppName(\"My application\").setMaster(\"local\");\nconf.set(\"spark.cassandra.connection.host\", \"localhost\");\nJavaSparkContext sc = new JavaSparkContext(conf);\nCassandraConnector connector = CassandraConnector.apply(sc.getConf());\nSession session = connector.openSession();\n\nResultSet results;\nString sql =\"\";\n\n\nBufferedReader bufferedReader = new BufferedReader(new InputStreamReader(System.in));\nsql = \"SELECT from siem.report JOIN siem.netstat on siem.report.REPORTUUID = siem.netstat.NETSTATREPORTUUID ALLOW FILTERING;\";\nresults = session.execute(sql);\n</code></pre>\n\n<p>I get the following error</p>\n\n<blockquote>\n <p>Caused by: com.datastax.driver.core.exceptions.SyntaxError: line 1:25\n missing EOF at ',' (SELECT from siem.report[,] siem...) 11:14 AM \n com.datastax.driver.core.exceptions.SyntaxError: line 1:25 missing EOF\n at ',' (SELECT from siem.report[,] siem...) at\n com.datastax.driver.core.exceptions.SyntaxError.copy(SyntaxError.java:58)\n at\n com.datastax.driver.core.exceptions.SyntaxError.copy(SyntaxError.java:24)\n at\n com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)\n at\n com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245)\n at\n com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:63)\n at\n com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:39)\n at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at\n sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n at\n sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n at java.lang.reflect.Method.invoke(Method.java:483) at\n com.datastax.spark.connector.cql.SessionProxy.invoke(SessionProxy.scala:33)\n at com.sun.proxy.$Proxy59.execute(Unknown Source) at\n com.ge.predix.rmd.siem.boot.PersistenceTest.test_QuerySparkOnReport_GIACOMO_LogDao(PersistenceTest.java:178)\n at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at\n sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n at\n sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n at java.lang.reflect.Method.invoke(Method.java:483) at\n org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)\n at\n org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)\n at\n org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)\n at\n org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)\n at\n 
org.springframework.test.context.junit4.statements.RunBeforeTestMethodCallbacks.evaluate(RunBeforeTestMethodCallbacks.java:73)\n at org.springframework.test.context.junit4.statements</p>\n</blockquote>\n\n<p>also tried with</p>\n\n<pre><code>SELECT from siem.report JOIN siem.netstat on report.REPORTUUID = netstat.NETSTATREPORTUUID ALLOW FILTERING\n</code></pre>\n\n<p>also tried with</p>\n\n<pre><code>SELECT from siem.report R JOIN siem.netstat N on R.REPORTUUID = N.NETSTATREPORTUUID ALLOW FILTERING\n</code></pre>\n\n<p>can someone help me? I'm really using SparkSQL or CQL?</p>\n\n<h1>UPDATE</h1>\n\n<p>I tried </p>\n\n<pre><code>public void test_JOIN_on_Cassandra () {\n\n SparkConf conf = new SparkConf().setAppName(\"My application\").setMaster(\"local\");\n conf.set(\"spark.cassandra.connection.host\", \"localhost\");\n JavaSparkContext sc = new JavaSparkContext(conf);\n\n\n SQLContext sqlContext = new SQLContext(sc);\n try {\n //QueryExecution test1 = sqlContext.executeSql(\"SELECT from siem.report\");\n //QueryExecution test2 = sqlContext.executeSql(\"SELECT from siem.report JOIN siem.netstat on report.REPORTUUID = netstat.NETSTATREPORTUUID\");\n QueryExecution test3 = sqlContext.executeSql(\"SELECT from siem.report JOIN siem.netstat on siem.report.REPORTUUID = siem.netstat.NETSTATREPORTUUID\");\n\n } catch (Exception e) {\n e.printStackTrace();\n }\n\n // SchemaRDD results = sc.sql(\"SELECT from siem.report JOIN siem.netstat on siem.report.REPORTUUID = siem.netstat.NETSTATREPORTUUID\");\n\n}\n</code></pre>\n\n<p>and I get</p>\n\n<blockquote>\n <p>== Parsed Logical Plan == 'Project [unresolvedalias(<em>)]\n +- 'Join Inner, Some(('siem.report.REPORTUUID = 'siem.netstat.NETSTATREPORTUUID)) :- 'UnresolvedRelation\n <code>siem</code>.<code>report</code>, None +- 'UnresolvedRelation <code>siem</code>.<code>netstat</code>, None\n == Analyzed Logical Plan == org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid\n call to toAttribute on unresolved object, tree: unresolvedalias(</em>)\n 'Project [unresolvedalias()]\n +- 'Join Inner, Some(('siem.report.REPORTUUID = 'siem.netstat.NETSTATREPORTUUID)) :- 'UnresolvedRelation\n <code>siem</code>.<code>report</code>, None +- 'UnresolvedRelation <code>siem</code>.<code>netstat</code>, None\n == Optimized Logical Plan == org.apache.spark.sql.AnalysisException: Table not found: <code>siem</code>.<code>report</code>;\n == Physical Plan == org.apache.spark.sql.AnalysisException: Table not found: <code>siem</code>.<code>report</code>;</p>\n</blockquote>",

"response": "It seems that the error is caused by the syntax of the SQL query. Here are a few suggestions:\n\n- Try removing the \"ALLOW FILTERING\" clause, as it can have a negative impact on performance and is not needed in this case.\n- Make sure that the keyspace and table names are spelled correctly and that they exist in your Cassandra database.\n- If the table names contain uppercase letters, make sure to enclose them in double quotes like this: \"siem\".\"Report\".\n- If you're still having issues, try running the same query directly in CQLSH to see if it works there. This will help you determine if the issue is with the Spark SQL syntax or with the Cassandra database itself.\n\nAlso, to answer your question, you're using Spark SQL to execute CQL queries against Cassandra. So you're essentially using both technologies together.",

"source": "pacovaldez/stackoverflow-questions"

}

That response's character count is around 900; I think that would suffice.

I am going to review this later to make absolutely sure, but yeah, given the size it seems right and assuming everything is all here, this will qualify as YES. @firstuserhere

Cool! Great find!

bought Ṁ200 of YES

https://www.kaggle.com/datasets/abhishek/gpt2-output-data?resource=download
This dataset of 5000 outputs from GPT-2 has 1510 outputs which reached the maximum output length of 1024. So I believe this meets the threshold of 1000+ outputs of 1000+ characters.
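For anyone who wants to verify, here's a sketch (it assumes the Kaggle dump keeps the upstream openai/gpt-2-output-dataset JSONL layout, with per-line "text" and token-count "length" fields; the filename below is a guess):

#!/usr/bin/python

# Sketch: count outputs at the 1024-token cap and outputs over 1000 characters.
# Assumes the upstream openai/gpt-2-output-dataset JSONL layout (per-line "text"
# plus a token-count "length" field); the filename below is a guess.
import json

at_cap = 0
over_1000_chars = 0
with open("small-117M.test.jsonl") as f:
    for line in f:
        record = json.loads(line)
        if record["length"] == 1024:
            at_cap += 1
        if len(record["text"]) > 1000:
            over_1000_chars += 1
print("At 1024-token cap:", at_cap, "| Over 1000 characters:", over_1000_chars)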

@JimHays there was a qualitative gap post-GPT-2; that's why I interpreted this market as covering only GPT-3 onwards. But then again, that's just me finding usefulness in it. Maybe this counts.

predicted YES

@firstuserhere I had initially also thought it applied only to more recent models based on the examples given (and that was probably the original intention), but on re-reading it, the description specifies an LLM but doesn't give further qualifications beyond that. We'll see what @PatrickDelaney thinks, I guess.

predicted NO

@JimHays Yeah, I was hoping this would be GPT-3.5 or greater, given that the original specification stated:

For example, a list of 1000 or more essays or 1000 or more code samples generated by ChatGPT/GPT3.5.

predicted NO

@JimHays I sort of feel like I'm going to need to resolve the market to at least partial credit for this one, given that myself and a bot are the only NO bettors, and you did at least find something new and made an effort. That being said, I'm not sure what's fair... I would really prefer to encourage someone to take all of my Mana on this market by finding a repo like this that uses a more modern, larger-parameter LLM.