MHM Magazine

Chatbots started overcoming the original Turing test as early as 2014. There have been a number of incidents where people were uncertain whether the content they were seeing, whether text or, more recently, images, was created by a human or by a machine. Moreover, the level of intelligence (in as much as we understand the concept of intelligence) displayed by these artificial intelligences is growing at an incredible rate. The very concept of an “expert” is being challenged, and we will need to explore what that means in each of our respective fields as artificial intelligence encroaches on them.

To see just how deeply this touches us, we need only look at the MMLU (Massive Multitask Language Understanding) benchmark, which is used to test Large Language Models on their general and specialised knowledge. The benchmark covers 57 specialised areas with just over 14,000 multiple-choice questions, each with four possible answers. One would therefore expect a score of at least 25% from random guessing alone; the short code sketch below makes this concrete. In tests with unspecialised humans, the average accuracy comes to 34.5%. For experts in their own field, however, this value is much higher.

In the medicine specialisation, the questions are taken from the US Medical Licensing Exam, on which experts who pass the exam score 87% accuracy at the 95th percentile. In the paper written on the benchmark, the authors estimate that “expert level knowledge” across all areas should come to about 89.8% accuracy on average. Currently, GPT-4 is able to attain an average accuracy of 85.5% across all areas measured. The benchmark was also run not only in English but in 26 different languages: accuracy on Afrikaans was almost as good as on English, at 84.1%, while the lowest-scoring language was Telugu, where GPT-4 managed only 62% accuracy. This means we cannot yet claim that GPT-4 is an expert in all fields, but it is clear that this is the goal.

What does this mean for us? Artificial intelligence does not seem likely to take over all jobs that require knowledge and skills expressed in natural language, but it is definitely going to impact these jobs. Practitioners who explore this new technology and understand how it can supplement their practice will be better placed to know where these tools can help them and their patients than those who simply ignore them or use them indiscriminately. What we need to do is explore the tools that are available, evaluate whether they are fit for purpose in the South African context, and use what is valuable to lighten the load on our existing ecosystem.
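As a brief aside for technically minded readers, the 25% guessing baseline mentioned above can be checked with a few lines of Python. This is a minimal sketch rather than the benchmark's actual test harness: the question and option counts are taken from the MMLU figures cited above, and everything else is illustrative.

```python
import random

# Illustrative MMLU-style setup: ~14,000 questions with four options each
# (counts taken from the figures cited in the article).
NUM_QUESTIONS = 14_000
NUM_OPTIONS = 4

def random_guess_accuracy(trials: int = 100) -> float:
    """Average accuracy over repeated simulated runs of pure guessing."""
    total = 0.0
    for _ in range(trials):
        # Exactly one of the four options is correct, so a uniform random
        # pick is right with probability 1/4.
        correct = sum(
            1 for _ in range(NUM_QUESTIONS)
            if random.randrange(NUM_OPTIONS) == 0
        )
        total += correct / NUM_QUESTIONS
    return total / trials

if __name__ == "__main__":
    # The expected value is 1/4 = 25%; the simulation lands very close to it.
    print(f"Simulated random-guess accuracy: {random_guess_accuracy():.1%}")
```

Running this prints a value very close to 25%, which puts the unspecialised human average of 34.5% in perspective: it is only modestly better than chance, while the estimated expert level of 89.8% is in another league entirely.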
With this in mind, I would like to put forward four technology interventions that have potential and could be explored in this way. These are:

• Daylio: a mood and activity tracker that allows one to keep a journal, as well as to capture custom activities and moods, so that the data most relevant to the patient in question can be tracked.

• Woebot: a chatbot that uses Cognitive Behavioural Therapy, Interpersonal Psychotherapy and Dialectical Behaviour Therapy techniques, along with natural language and artificial intelligence, to create a safe space to explore negative emotions and find coping strategies. It also allows one to journal and track moods.

• Replika: a more general-purpose chatbot that has some mental health activities integrated into it, but is more focussed on addressing loneliness, as it tries to build a relationship with the user by learning about them over time.

• Cass: a chatbot that is extensively trained on recorded interactions between patients and clinicians and attempts to replicate this experience. The approach taken by this tool is to integrate it into an existing support system, allowing the chatbot to act as a first-line interaction with the patient and so offer a triage service. This is not a tool that a patient simply downloads; it is based on an enterprise relationship and integration into an organisation’s wellness ecosystem.

Daylio and Woebot (recently only available in the US) are free apps that a patient can download and use at no cost; Replika has a free and a paid version ($7.99/month or $49.99/year); and Cass does not sell directly to patients but instead needs to be integrated into a practice. These are not the only tools available, but what is becoming clear is that technology is constantly progressing, and with it come new opportunities to supplement one’s practice. Technology is not going to replace clinicians just yet, but will hopefully take on the role of partner in care rather than competitor.

Links to the above-mentioned apps:
https://play.google.com/store/apps/details?id=net.daylio
https://play.google.com/store/apps/details?id=ai.replika.app
https://woebothealth.com/
https://www.cass.ai/

References available on request.
