The goal of this market is to assess whether free dictation tools are ready for regular use in professional writing.
In June 2023, I will attempt dictation of a 1000-word English text, run a spell checker / grammar checker to correct errors, and then diff the result against a ground truth of the same text. If the diff reveals any remaining mistakes, I will proofread the text manually and repeat the diff until all errors are corrected. I will then repeat the same process, but typing by hand in both qwerty and dvorak. This market resolves YES if the time to reach a perfect match with the ground truth is lower by dictation than by hand typing (the faster of qwerty and dvorak), once time spent proofreading is included.
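For anyone who wants to try the diff step themselves, here is a minimal sketch of a word-level comparison against a ground truth, using Python's standard `difflib`. The function name `word_diff` and the sample sentences are illustrative assumptions, not the tooling actually used for resolution.

```python
import difflib


def word_diff(ground_truth: str, candidate: str) -> list[str]:
    """List the word-level differences between a candidate and the ground truth."""
    truth_words = ground_truth.split()
    cand_words = candidate.split()
    matcher = difflib.SequenceMatcher(a=truth_words, b=cand_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            expected = " ".join(truth_words[i1:i2])
            got = " ".join(cand_words[j1:j2])
            changes.append(f"{tag}: expected {expected!r}, got {got!r}")
    return changes


# Hypothetical example with the homophones mentioned in the description.
truth = "They're going to their office, too."
dictated = "There going to there office, to."
for change in word_diff(truth, dictated):
    print(change)
```

An empty list means a perfect match. Splitting on whitespace ignores line breaks and spacing differences, which may or may not be what a strict comparison should do.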
Please leave suggestions for realistic text and dictation engines to use in the comments. Text should include some language that dictation software usually has trouble with, such as their/there/they're and two/to/too, but be representative of professional writing. In the interest of fairness, I will attempt all dictation engines suggested (which I can use for free) and take the fastest. I will also consider dictation engines included in software I already use (i.e., zero marginal cost to me). I currently use Linux for personal use and MS Office / Win10 at work.
Because resolution requires trust, I will not be betting in this market.
Legal typing (Qwerty in Word) with initial corrections and formatting 21:10.85
Scientific typing (Qwerty in Word) with initial corrections and formatting 25:42.02
Corrections (Qwerty in Word), v1: 5:10.73
Corrections (Qwerty in Word), v2: 3:11.87
Total for typing: 55:14
STT in Word
Legal (STT in Word) 19:47
Scientific (STT in Word) 21:26
Corrections, v1 by hand: 21:43
Corrections, v1 via GPT4: 00:20 (time to write prompt)
Corrections, v2 by hand following v1 via GPT4: 16:30
Lower bound for STT in Word: 62:56
Lower bound for STT in Word with GPT: 58:03
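The totals above can be checked with a few lines of Python. This is only a sketch: the helper names `to_seconds` and `fmt` are made up for illustration, and the timings are copied from the list above.

```python
def to_seconds(stamp: str) -> float:
    """Convert 'MM:SS' or 'MM:SS.ff' into seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


def fmt(total_seconds: float) -> str:
    """Format seconds as M:SS, rounded to the nearest second."""
    minutes, seconds = divmod(round(total_seconds), 60)
    return f"{minutes}:{seconds:02d}"


# Timings as reported above.
typing = ["21:10.85", "25:42.02", "5:10.73", "3:11.87"]
stt_by_hand = ["19:47", "21:26", "21:43"]
stt_with_gpt = ["19:47", "21:26", "00:20", "16:30"]

for label, stamps in [
    ("Typing", typing),
    ("STT in Word", stt_by_hand),
    ("STT in Word with GPT", stt_with_gpt),
]:
    print(f"{label}: {fmt(sum(to_seconds(s) for s in stamps))}")
```

This reproduces 62:56 and 58:03 for the two STT variants. The four typing entries sum to about 55:15, roughly a second above the reported 55:14, which is presumably a rounding difference and does not change the ordering.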
I have decided on a test text: (1) the preliminary opinion from DUPREE v. YOUNGER
CERTIORARI TO THE UNITED STATES COURT OF APPEALS FOR THE FOURTH CIRCUIT
No. 22–210. Decided May 25, 2023, without the note (765 words). https://en.wikisource.org/wiki/Dupree_v._Younger
and the first page of the first interesting actual journal article in Nature this week:
(2) Calvin, A., Eierman, S., Peng, Z. et al. Single Molecule Infrared Spectroscopy in the Gas Phase. Nature (2023). https://doi.org/10.1038/s41586-023-06351-7 (786 words).
To get a well-written but obscure text in English with high probability, just visit https://en.wikisource.org/wiki/Special:Random
@duck_master Testing with this method, I came up with https://en.wikisource.org/wiki/Proclamation_1506B and recorded the first two paragraphs. I then ran whisper with the small.en model on that recording (the biggest model that will fit into my laptop's VRAM).
Recording + fixing took me 3 minutes; typing took me 2 minutes 58 seconds (that's slightly slower than @brp, if we're counting).
I then compared the two versions against the original. My typed version had no errors (I corrected as I went), while my corrected whisper transcript still had 7: things like extra commas, small wording changes, and capitalization differences. Correcting those without the benefit of a diff may not be easy. On the other hand, they may not matter so much for dictation?
This is far better than whisper did on the Jabberwocky poem, which I didn't even bother comparing because I had to correct every third word.
Whisper doesn't have any integration to actually use it in real time - much less as a keyboard substitute - so I'm not sure it really counts as dictation software?
@lukalot I will be doing my own typing and proofreading. A quick online typing speed test says I type at 62 wpm qwerty and 35 wpm dvorak (correcting for errors as I go). For this question I will use the faster of the two.
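As a rough back-of-the-envelope check, those speeds can be turned into a raw typing time for a 1000-word text. This sketch assumes the short online test speed carries over to a long professional text, which it may well not, and it leaves out proofreading time entirely; `minutes_to_type` is just an illustrative helper.

```python
def minutes_to_type(words: int, wpm: float) -> float:
    """Estimate raw typing time in minutes at a constant words-per-minute rate."""
    return words / wpm


# Speeds from the online typing test quoted above.
for layout, wpm in {"qwerty": 62, "dvorak": 35}.items():
    print(f"{layout}: {minutes_to_type(1000, wpm):.1f} min")
# qwerty: 16.1 min
# dvorak: 28.6 min
```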
Ok! I've tried comparing my own typing speed to several internet demos of voice dictation, and while I'm usually faster, they can be pretty close (especially with code examples).
If you've chosen the text you're going to use, that could be relevant (is it just random words or actual sentences with punctuation?), but I think I'm more optimistic than the current market anyway, so I'll wager YES. Thanks for the info.
candidates, quite possibly not there yet but close enough that they could plausibly compete:
openai whisper (currently just a component; inclusion depends on if you're filtering by full deployability or just raw accuracy without ui consideration)
dragon voice recognition (I would short this one)
google voice input (I would short this one)
very, very important question here: what microphone will you use? even for humans, microphones are really not all the same for understandability. current voice rec algos are a bit weak.
@L Thank you. This comment is gold. I hadn't thought about the microphone at all. Was going to use whatever is built into my USB headset (PTC Plantronics C3210), as it seems to elicit fewer complaints than the built-in mic of my computer or phone.
I found a website advertising it as "Frequency Response 20 - 20000 Hz". A search for "PTC 208" (written on the device) turns up some regulatory forms in New Zealand, so it might be subject to regulations regarding gain, wave distortion, etc.