AI systems keep gaining emergent capabilities in non-technical domains. In the medical field, current AI systems show promising capabilities in very narrow tasks like radiology analysis, surpassing almost any human. But will that be the case in a more general domain like psychotherapy? Resolves YES if an AI system shows psychotherapy results better than 95% of human psychotherapists, in practice or in a clinical study.
@HansPeter @Guillaume The study appears quite weak, TBH. A big red flag is the methods description:
> The first dimension’s items (41 items) of SI scale were formulated to be answered with true or false (0–1 scores per item; range 0–41), while the answer options of the second dimension (23 items) include 4 points, three of which are false and one is correct (0–1 scores per item; range 0–23).
I don't think any question that has a "correct" yes/no/choose-one answer can actually measure social intelligence - what I understand by social intelligence requires nuance and context, not firm yes/no answers to short scenarios. Also, the authors don't share the scale (the reference is to an "unpublished dissertation") and don't provide even a single example of an item they used. They claim the scale is similar to the "George Washington University Brief Scale of SI", which appears to be a scale from the 1930s. I was able to find only parts of it online (https://ehive.com/collections/5889/objects/534738/social-intelligence-test-george-washington-university-series, https://americanhistory.si.edu/collections/nmah_692403) and it doesn't exactly give me confidence that the scale made sense even in the 30s. The validity checks reported in the paper just test that the individual questions correlate together, not that they measure anything real.
A second red flag is that there is no information about the prompts, temperature, or other settings used, and no uncertainty reported for the AI answers - unless you set a low temperature, it is quite likely you would see some variability across multiple runs of the same prompt.
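To illustrate the kind of uncertainty check I mean, here is a minimal sketch using the OpenAI Python client - the model name and the true/false item are placeholders I made up, since the paper shares neither its prompts nor its items:

```python
from collections import Counter

from openai import OpenAI  # pip install openai

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical true/false item, standing in for the unpublished SI scale items
ITEM = ("True or false: it is rude to ask a colleague about their salary. "
        "Answer with only 'true' or 'false'.")

def sample_answers(n_runs: int = 20, temperature: float = 1.0) -> Counter:
    """Ask the same item n_runs times and count how often each answer appears."""
    answers = Counter()
    for _ in range(n_runs):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder; the paper doesn't say what was used
            temperature=temperature,
            messages=[{"role": "user", "content": ITEM}],
        )
        answers[resp.choices[0].message.content.strip().lower()] += 1
    return answers

print(sample_answers())  # e.g. Counter({'true': 14, 'false': 6}) at temperature 1.0
```

If the answers split like that across runs, a single pass over the scale tells you very little about the model's "true" score - which is why reporting temperature, or variance across runs, matters.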
Persistent typos throughout the study (e.g. "Google Bird" in the abstract, "Google Brand" in Table 1) also suggest pretty weak quality control overall.
If Noom isn't already using at least some AI, something like it certainly could. There's a lot of dialectical behavior therapy and cognitive behavioral therapy in the coaching. It wouldn't completely replace "warm touch" talk therapy -- at least not until humans stop wanting their stories to be heard -- but it's great for day-to-day maintenance.
I worry about how to resolve this objectively. Is there a measuring method in common use that says how well psychotherapists are doing at their jobs, such that if an AI does better than 95% of psychotherapists who subject themselves to that method you can resolve this yes?
If an AI says "this radiological scan shows likely cancer" there either is or is not cancer, and humans can look at that scan and either catch it or not, and if they don't catch it the truth will still eventually become known so there's a good basis for comparison. Whereas with therapists, it's more "if you're not getting good results with the first one, don't give up, you've got to find one who is a good fit for you", and different therapists work well for different people.
If an AI can radically alter its "personality" and approach to custom-fit each patient, in a way most psychotherapists, who (like most humans) can be rather attached to their personalities, will not, that could mean the AI approach is far more effective... but still hard to measure.
@equinoxhq Yes, not as binary as radiology scans, for sure. But I think a good metric, aside from studies, could be what practicing professionals themselves say. If a very large majority report comparing themselves with AI (or using it) and getting at least slightly superior (or radically superior) insights from the AI than from their own judgment, I would count that as a hint toward a positive resolution. Also, if continuous AI follow-up of subjects radically contributes to better outcomes, that would be another hint. Maybe the word 'insight' is better than 'result' here.
Also, my guess is that the field of psychology could evolve between now and then, and maybe new metrics could appear that are more reliable than today's… We'll see!
@Guillaume I was kind of hoping you'd go "yes, actually there is an objective method of measuring the effectiveness of psychotherapy, if you were more familiar with the field you would know this". Oh well...
@equinoxhq I highly doubt there is a metric apart from subject feedback. But I should clarify that I'm not a psychologist or a mental health worker, so feel free to amaze me haha
No human has access to 1-on-1 counselling the way AI could offer it, but it's also possibly prone to error or to being too general. Still, consistently using an AI as a tool for cognitive behavioural therapy, such as for anxiety and other common problems - hasn't that already had some utility for folks?
At least Reddit users said it had, until ChatGPT started getting more cautious and saying "go to a doctor, I'm just an AI" more often. Sorry for the anecdotes.