
It has to be smarter than GPT-4.
It has to be able to enrich its rules database when it encounters new facts, and to delete something from its own database if that entry is proven wrong (incompatible with axioms and mathematics). The decisions to add or delete should be made by the system itself.
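To make the add/delete requirement concrete, here is a minimal toy sketch in Python. Everything in it is an assumption for illustration only: the `KnowledgeBase` class, the `observe` method, the `proven` flag, and the convention that a statement is negated by prefixing `not `. A real system would need an actual proof procedure instead of a string comparison.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def negate(statement: str) -> str:
    """Toy negation: 'P' <-> 'not P' (an assumed convention, not real logic)."""
    return statement[4:] if statement.startswith("not ") else "not " + statement


@dataclass
class KnowledgeBase:
    axioms: frozenset[str]                           # never revised
    learned: set[str] = field(default_factory=set)   # revisable entries

    def observe(self, statement: str, proven: bool = False) -> str:
        """Decide what to do with a new statement, with no outside decision."""
        opposite = negate(statement)
        if opposite in self.axioms:
            return "rejected"              # incompatible with an axiom
        if opposite in self.learned:
            if not proven:
                return "rejected"          # unproven claim does not override
            self.learned.discard(opposite)  # old entry was proven wrong
            self.learned.add(statement)
            return "revised"
        self.learned.add(statement)
        return "added"


kb = KnowledgeBase(axioms=frozenset({"2 is prime"}))
first = kb.observe("not 2 is prime")                         # contradicts an axiom
second = kb.observe("all primes are odd")                    # new, no conflict
third = kb.observe("not all primes are odd", proven=True)    # refutes the old entry
```

In this sketch the caller only feeds statements in; whether an entry is added, rejected, or deleted is decided inside `observe`.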
To draw the line on what counts as replacement rules:
Binary logic does not count, because it only operates on bits, which are not universal statements. The system should be able to operate on complicated statements like "density of prime numbers decreases as n->inf". This statement might be represented as an object of relations between entities, but it is passed to the logic core as a whole.
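As a rough illustration of "an object of relations between entities, passed as a whole", the prime-density statement could be encoded like this. The `Entity` and `Relation` classes, the predicate names, and the `render` helper are all made up for this sketch and do not come from any existing system.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str


@dataclass(frozen=True)
class Relation:
    predicate: str
    args: tuple  # each item is an Entity or a nested Relation


def render(term: Entity | Relation) -> str:
    """Print a term in a readable functional notation."""
    if isinstance(term, Entity):
        return term.name
    return f"{term.predicate}({', '.join(render(a) for a in term.args)})"


# "density of prime numbers decreases as n->inf" as one structured object
statement = Relation(
    "decreases_as",
    (
        Relation("density", (Entity("primes"),)),
        Relation("tends_to", (Entity("n"), Entity("inf"))),
    ),
)

# The logic core receives the statement as a single unit, not as bits:
known_statements = {statement}
text = render(statement)
```

Because the dataclasses are frozen, the whole statement is hashable and can be stored, compared, or looked up as one value, while its internal structure stays available for inspection.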
At the abstract level, language predictors don't do that; they calculate the probability of a word, so any GPT/LLM from Wolfram does not qualify.