Give me feedback on my latest paper

No bounty left

I'm currently working on a PhD. The main thing I've produced so far is this paper outlining my hypothesis that Transformer models (the basis of LLMs) represent the best extant bounded approximation of Solomonoff Induction and some brief arguments for why it matters.

Some things I'll award this bounty for:

Feedback, positive or negative
Ideas for followup papers
Ideas for experiments/practical work that would help confirm/disconfirm my hypothesis
Links to relevant work
Anything else relevant to my work that I find helpful!

I'll award every response I find helpful, and more helpful comments will get more money. I might increase the bounty if I value your contribution enough. Feel free to make offers like "I'll write up my thoughts in a public blog post for Ṁ10000".

Market context

Personal

Academia

Get

1,000

to start trading!

People are also trading

Recommend me papers!

Ṁ461 bounty

8 Comments

Sort by:

+Ṁ500

High-level comments:

Talking about unbounded computation in reference to NNs, seems like the neural tangent kernel should be relevant. What does the NTK for a transformer look like (sparse hits for google on "neural tangent kernel transformer". [potential ref](https://arxiv.org/abs/2006.14548))? Is the story that this would act even more like SolInd? Would it get around the limitations of section 5? Food for thought.

Medium Level comments:

Section 4.1-2: Feels like these two subsections are part of the same argument, and could be combined.

Section 5.1: Feels like this subsection is titled more broadly than its contents indicate. Are there other examples of tasks that transformers fail on that are reminiscent of things a SolInd should get right?

Low-level comments:

My sense is that "transformers" is not usually capitalized in the literature.

"identification of patterns identified" feels like the same root word is being used too close to itself.

"incomputable" you use this word a few times, it seems like you are referencing the uncomputability of Solomonoff's induction. Why not state this more formally and cite a proof of the relevant theorem? To be even more constructively pedantic, is it even really true to say it is "incomputable without infinite memory and the capability to execute an infinite number of programs in a finite time period". What does infinite memory even mean?

"Any input string s given to M will result in some output string x". Unless M never halts.

"every computable string" -> shouldn't this just be, more simply, "every string"?

+Ṁ400

And a more high-level thing. It seems implausible that regular is the upper limit of reliable computational tasks of transformers, given that they emulate human language not-that-awfully, and human language is... well, it's technically mildly-context-sensitive due to some rarer patterns, but at least the fact that regular mechanisms give bad performance and you need context-free ones to model basically anything worthwhile suggests their context-free performance can't be that bad.

(For computational linguistics aficionados: yes, you can circumvent this with derivation trees, but I dare you declare that LLMs build the neural network equivalent of derivation trees, I dare you.)

+Ṁ100

For simplicity, we will assume that M uses a binary alphabet, but these results generalise to any alphabet.

This is really penny-pinching, but "any" should probably be "any larger": unary alphabets have some weird properties that disappear in larger alphabets (for instance, it is trivial to prove that any context-free formal language in unary alphabet is regular, where obviously it is not so for non-unary alphabets). Alternatively, if you do think your finding applies to unary alphabet, an explanation why should be there, even if a short one (for the n-ary with n>2, the coding explanation of why "if binary works this works" is well-known). Of course, a Turing machine with only one symbol is, how do I put it, not exactly interesting, but still.

with the decimal expansion of the neuron’s value (in some base)

Another penny-pinching: isn't it a terminological contradiction to assume base<>10 and speak of "decimal" expansion? What's after the comma (point for you American weirdos) in, say, (101.1011)2 isn't its "decimal" part, because the number isn't in decimal system. Or is a different meaning of the word "base" meant?

People are also trading

Recommend me papers!

Ṁ461 bounty

People are also trading

People are also trading

Related questions