The Laboratorium (3d ser.)

A blog by James Grimmelmann

Be regular and orderly in your life so that you may be violent and original in your work.

Generative Misinterpretation

I have a new article draft, Generative Misinterpretation, with Ben Sobel and David Stein, forthcoming in the Harvard Journal on Legislation. We argue against recent proposals to use LLMs in the judicial process. We combine an empirical critique, showing that the experimental results from using LLMs to perform interpretation are brittle and arbitrary, with a jurisprudential critique, explaining why the justifications commonly offered for using LLMs don’t work in the context of judging. Here is the abstract:

In a series of provocative experiments, a loose group of scholars, lawyers, and judges has endorsed generative interpretation: asking large language models (LLMs) like ChatGPT and Claude to resolve interpretive issues from actual cases. With varying degrees of confidence, they argue that LLMs are (or will soon be) able to assist, or even replace, judges in performing interpretive tasks like determining the meaning of a term in a contract or statute. A few go even further and argue for using LLMs to decide entire cases and to generate opinions supporting those decisions.

We respectfully dissent. In this Article, we show that LLMs are not yet fit for use in judicial chambers. Generative interpretation, like all empirical methods, must bridge two gaps to be useful and legitimate. The first is a reliability gap: are its methods consistent and reproducible enough to be trusted in high-stakes, real-world settings? Unfortunately, as we show, LLM proponents’ experimental results are brittle and frequently arbitrary. The second is an epistemic gap: do these methods measure what they purport to measure? Here, LLM proponents have pointed to (1) LLMs’ training processes on large datasets, (2) empirical measures of LLM outputs, (3) the rhetorical persuasiveness of those outputs, and (4) the assumed predictability of algorithmic methods. We show, however, that all of these justifications rest on unstated and faulty premises about the nature of LLMs and the nature of judging.

The superficial fluency of LLM-generated text conceals fundamental gaps between what these models are currently capable of and what legal interpretation requires to be methodologically and socially legitimate. Put simply, any human or computer can put words on a page, but it takes something more to turn those words into a legitimate act of legal interpretation. LLM proponents do not yet have a plausible story of what that “something more” comprises.

Comments welcome!
