
Extensive Stanford study finds legal AI is very bad. Surprise! (not)

RICHARD WEINER
Technology for Lawyers

Published: March 1, 2024

Stanford University, which has given us so many of the basement-dwelling techies who do such a poor job writing AI algorithms, has done a massive study on some of the damage that they have caused--this time particularly in the legal field.
Stanford led a research team that put 200,000 legal “prompts” — questions posed to the generative AI — to the three big boyz in the field: OpenAI’s ChatGPT 3.5, Google’s PaLM 2, and Meta’s Llama 2.
A generative AI in this context is called a “model.”
The models came up with hallucinated (wrong, just made up) answers to legal questions a layperson would ask a preponderance (75%) of the time.
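For the curious, the mechanics of that kind of test are simple enough to sketch. The snippet below is my own illustration, not the researchers’ pipeline: it assumes the OpenAI Python client, and the model name, the sample questions, and the final “check the answer” step are placeholders standing in for the study’s real benchmark and scoring.

# A minimal sketch, assuming the OpenAI Python client (openai >= 1.0).
# The model name, sample questions, and the verification step are
# illustrative placeholders, not the Stanford team's actual pipeline.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

legal_prompts = [
    "Is the holding in Miranda v. Arizona still good law?",
    "What is the statute of limitations for breach of contract in Ohio?",
]

def ask_model(prompt: str) -> str:
    """Send one legal question to a general-purpose chat model."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

for prompt in legal_prompts:
    answer = ask_model(prompt)
    # The hard part of the study is verifying each answer against real
    # legal authority; here the answer is just printed for a human to check.
    print(prompt, "->", answer)

Run at scale, with answers graded against actual case law and statutes, that is essentially how you end up with a hallucination rate.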
This requires a little explanation of the interior of these models.
The models in question are all general-purpose models that are not trained exclusively on the law, unlike the legal-specific models currently in use by Thomson (West) and LexisNexis.
Those legal platforms would be used by lawyers almost exclusively.
The general models, on the other hand, would be used by the general populace, particularly non-lawyers looking for answers to legal problems.
When Chief Justice John Roberts stated a few weeks ago that AI could help people who can’t afford lawyers with their legal questions, this is what he was talking about. He was, unfortunately, pretty wrong/naïve about the whole thing.
Now, the general models are indeed being trained in the law.
However, that training is being conducted inside models that are not created by/for lawyers, and so they have some resistance to it. These models are built to give easy answers to easy questions, particularly in areas, like the law, that are far outside the experience of the people who write the original algos and their model training materials.
So, when asked difficult questions, the models lose their shit and start hallucinating. Basically.
There are built-in weaknesses that cause the models to malfunction.
They can’t see court cases that only exist inside paywalls like Westlaw. They can’t think like lawyers: When asked to compare two cases, the models got all the analyses wrong.
Many legal matters are outside of their training time frames. For instance, they can neither find cases older than about 100 years nor laws that have been updated in the last year or two.
And on and on. This is a serious societal problem, and, while the companies are definitely working on solutions, they have yet to figure them out.
Here’s a PDF of the preprint study, if you’re interested: https://arxiv.org/pdf/2401.01301.pdf

