The Laboratorium (3d ser.)

A blog by James Grimmelmann

Soyez réglé dans votre vie et ordinaire afin
d'être violent et original dans vos oeuvres.

The Files are in the Computer

I have a new draft essay, The Files are in the Computer: On Copyright, Memorization, and Generative AI. It is a joint work with my regular co-author A. Feder Cooper, who just completed his Ph.D. in Computer Science at Cornell. We presented an earlier version of the paper at the online AI Disrupting Law Symposium hosted by the Chicago-Kent Law Review in April, and the final version will come out in the CKLR. Here is the abstract:

The New York Times’s copyright lawsuit against OpenAI and Microsoft alleges that OpenAI’s GPT models have “memorized” Times articles. Other lawsuits make similar claims. But parties, courts, and scholars disagree on what memorization is, whether it is taking place, and what its copyright implications are. Unfortunately, these debates are clouded by deep ambiguities over the nature of “memorization,” leading participants to talk past one another.

In this Essay, we attempt to bring clarity to the conversation over memorization and its relationship to copyright law. Memorization is a highly active area of research in machine learning, and we draw on that literature to provide a firm technical foundation for legal discussions. The core of the Essay is a precise definition of memorization for a legal audience. We say that a model has “memorized” a piece of training data when (1) it is possible to reconstruct from the model (2) a near-exact copy of (3) a substantial portion of (4) that specific piece of training data. We distinguish memorization from “extraction” (in which a user intentionally causes a model to generate a near-exact copy), from “regurgitation” (in which a model generates a near-exact copy, regardless of the user’s intentions), and from “reconstruction” (in which the near-exact copy can be obtained from the model by any means, not necessarily the ordinary generation process).

Several important consequences follow from these definitions. First, not all learning is memorization: much of what generative-AI models do involves generalizing from large amounts of training data, not just memorizing individual pieces of it. Second, memorization occurs when a model is trained; it is not something that happens when a model generates a regurgitated output. Regurgitation is a symptom of memorization in the model, not its cause. Third, when a model has memorized training data, the model is a “copy” of that training data in the sense used by copyright law. Fourth, a model is not like a VCR or other general-purpose copying technology; it is better at generating some types of outputs (possibly including regurgitated ones) than others. Fifth, memorization is not just a phenomenon that is caused by “adversarial” users bent on extraction; it is a capability that is latent in the model itself. Sixth, the amount of training data that a model memorizes is a consequence of choices made in the training process; different decisions about what data to train on and how to train on it can affect what the model memorizes. Seventh, system design choices also matter at generation time. Whether or not a model that has memorized training data actually regurgitates that data depends on the design of the overall system: developers can use other guardrails to prevent extraction and regurgitation. In a very real sense, memorized training data is in the model–to quote Zoolander, the files are in the computer.

A Statement on Signing

I am serving on Cornell’s Committee on Campus Expressive Activity. We have been charged with “making recommendations for the formulation of a Cornell policy that both protects free expression and the right to protest, while establishing content-neutral limits that ensure the ability of the university community to pursue its mission.” Our mission includes formulating a replacement for Cornell’s controversial Interim Expressive Activity Policy, making recommendations about how the university should respond to violations of the policy, and educating faculty, staff, and students about the policy and the values at stake.

I have resolved that while I am serving on the committee, I will not sign letters or other policy statements on these issues. This is a blanket abstention. It does not reflect any agreement or disagreement with the specifics of a statement within the scope of what the committee will consider.

This is not because I have no views on free speech, universities’ mission, protests, and student discipline. I do. Some of them are public because I have written about them at length; others are private because I have never shared them with anyone; most are somewhere in between. Some of these views are strongly held; others are so tentative they could shift in a light breeze.

Instead, I believe that a principled open-mindedness is one of the most important things I can bring to the committee. This has been a difficult year for Cornell, as for many other colleges and universities. Frustration is high, and trust is low. A good policy can help repair some of this damage. It should help students feel safe, respected, welcomed, and heard. It should help community members be able to trust the administration, and each other. Everyone should be able to feel that the policy was created and is being applied fairly, honestly, and justly. Whether or not we achieve that goal, we have to try.

I think that signing my name to something is a commitment. It means that I endorse what it says, and that I am prepared to defend those views in detail if challenged. If I sign a letter now, and then vote for a committee report that endorses something different, I think my co-signers would be entitled to ask me to explain why my thinking had changed. And if I sign a letter now, I think someone who disagrees with it would be entitled to ask whether I am as ready to listen to their views as I should be.

Other members of the committee may reach different conclusions about what and when to sign, and I respect their choices. My stance reflects my individual views on what signing a letter means, and about what I personally can bring to the committee. Others have different but entirely reasonable views.

I also have colleagues and students who have views on the issues the committee will discuss. They will share many of those views, in open letters, op-eds, and other fora. This is a good thing. They have things to say that the community, the administration, and the committee should hear. I don’t disapprove of their views by not signing; I don’t endorse those views, either. I’m just abstaining for now, because my most important job, while the committee’s work is ongoing, is to listen.

Postmodern Community Standards

This is a Jotwell-style review of Kendra Albert, Imagine a Community: Obscenity’s History and Moderating Speech Online, 25 Yale Journal of Law and Technology Special Issue 59 (2023). I’m a Jotwell reviewer, but I am conflicted out of writing about Albert’s essay there because I co-authored a short piece with them last year. Nonetheless, I enjoyed Imagine a Community so much that I decided to write a review anyway, and post it here.

One of the great non-barking dogs in Internet law is obscenity. The first truly major case in the field was an obscenity case. 1997’s Reno v. ACLU, 521 U.S. 844 (1997), held that the harmful-to-minors provisions of the federal Communications Decency Act were unconstitutional because they prevented adults from receiving non-obscene speech online. Several additional Supreme Court cases followed over the next few years, as well as numerous lower-court cases, mostly rejecting various attempts to redraft definitions and prohibitions in a way that would survive constitutional scrutiny.

But then … silence. From roughly the mid-2000s on, very few obscenity cases have generated new law. As a casebook editor, I even started deleting material – this never happens – simply because there was nothing new to teach. This absence was a nagging question in the back of my mind. But now, thanks to Kendra Albert’s Imagine a Community, I have the answer, perfectly obvious now that they have laid it out so clearly. The courts did not give up on obscenity, but they gave up on obscenity law.

Imagine a Community is a cogent exploration of the strange career of community standards in obscenity law. Albert shows that although the “contemporary community standards” test was invented to provide doctrinal clarity, it has instead been used for doctrinal evasion and obfuscation. Half history and half analysis, their essay is an outstanding example of a recent wave of cogent scholarship on sex, law, and the Internet, from scholars like Albert themself, Andrew Gilden, I. India Thusi, and others.

The historical story proceeds as a five-act tragedy, in which the Supreme Court is brought low by its hubris. In the first act, until the middle of the twentieth century, obscenity law varied widely from state to state and case to case. Then, in the second act, the Warren Court constitutionalized the law of obscenity, holding that whether a work is protected by the First Amendment depends on whether it “appeals to prurient interest” as measured by “contemporary community standards.” Roth v. United States, 354 U.S. 476, 489 (1957).

This test created two interrelated problems for the Supreme Court. First, it was profoundly ambiguous. Were community standards geographical or temporal, local or national? And second, it required the courts to decide a never-ending stream of obscenity cases. It proved immensely difficult to articulate how works did – or did not – comport with community standards, leading to embarrassments of reasoned explication like Potter Stewart’s “I know it when I see it” in Jacobellis v. Ohio, 378 U.S. 184, 197 (1964).

The Supreme Court was increasingly uncomfortable with these cases, but it was also unwilling to deconstitutionalize obscenity or to abandon the community-standards test. Instead, in Miller v. California, 413 U.S. 15 (1973), it threw up its hands and turned community standards into a factual question for the jury. As Albert explains, “The local community standard won because it was not possible to imagine what a national standard would be.”

The historian S.F.C. Milsom blamed “the miserable history of crime in England” on the “blankness of the general verdict” (Historical Foundations of the Common Law pp. 403, 413). There could be no substantive legal development unless judges engaged with the facts of individual cases, but the jury in effect hid all of the relevant facts behind a simple “guilty” or “not guilty.”

Albert shows that something similar happened in obscenity law’s third act. The jury’s verdict established that the defendant’s material did or did not appeal to the prurient interest according to contemporary standards. But it did so without ever saying out loud what those standards were. There were still obscenity prosecutions, and there were still obscenity convictions, but in a crucial sense there was much less obscenity law.

In the fourth act, the Internet unsettled a key assumption underpinning the theory that obscenity was a question of local community standards: that every communication had a unique location. The Internet created new kinds of online communities, but it also dissolved the informational boundaries of physical ones. Is a website published everywhere, and thus subject to every township, village, and borough’s standards? Or was a national rule now required? In the 2000s, courts wrestled inconclusively with the question of “Who gets to decide what is too risqué for the Internet?”

And then, Albert demonstrates, in the tragedy’s fifth and deeply ironic act, prosecutors gave up the fight. They have largely avoided bringing adult Internet obscenity cases, focusing instead on child sexual abuse material cases and on cases involving “local businesses where the question of what the appropriate community was much less fraught.” The community-standards timbers have rotted, but no one has paid them much attention because they are not bearing any weight.

This history is a springboard for two perceptive closing sections. First, Albert shows that the community-standards-based obscenity test is extremely hard to justify on its own terms, when measured against contemporary First Amendment standards. It has endured not because it is correct but because it is useful. “The ‘community’ allows courts to avoid the reality that obscenity is a First Amendment doctrine designed to do exactly what justices have decried in other contexts – have the state decide ‘good speech’ from ‘bad speech’ based on preference for certain speakers and messages.” Once you see the point put this way, it is obvious – and it is also obvious that this is the only way this story could have ever ended.

Second – and this is the part that makes this essay truly next-level – Albert describes the tragedy’s farcical coda. The void created by this judicial retreat has been filled by private actors. Social-media platforms, payment providers, and other online intermediaries have developed content-moderation rules on sexually explicit material. These rules sometimes mirror the vestigial conceptual architecture of obscenity law, but often they are simply made up. Doctrine abhors a vacuum:

Pornography producers and porn platforms received lists of allowed and disallowed words and content – from “twink” to “golden showers,” to how many fingers a performer might use in a penetration scene. Rules against bodily fluids other than semen, even the appearance of intoxication, or certain kinds of suggestions of non-consent (such as hypnosis) are common.

One irony of this shift from public to private is that it has done what the courts have been unwilling to: create a genuinely national (sometimes even international) set of rules. Another is that these new “community standards” – a term used by social-media platforms apparently without irony – are applied without any real sensitivity to the actual standards of actual community members. They are simply the diktats of powerful platforms.

Perhaps none of this will matter. Albert suggests that the Supreme Court should perhaps “reconsider[] whether obscenity should be outside the reach of the First Amendment altogether.” Maybe it will, and maybe the legal system will catch up to the Avenue Q slogan: “The Internet is for porn.”

But there is another and darker possibility. The law of public sexuality in the United States has taken a turn over the last few years. Conservative legislators and prosecutors have claimed with a straight face that drag shows, queer romances, and trans bodies are inherently obscene. A new wave of age-verification laws sharply restrict what children are allowed to read on the Internet, and force adults to undergo new levels of surveillance when they go online. It is unsettlingly possible that the Supreme Court may be about to speedrun its obscenity jurisprudence, only backwards and in heels.

But sufficient unto the day is the evil thereof. For now, Imagine a Community is a model for what a law-review essay should be: concise, elegant, and illuminating.

The Candle and the Sun

We are pilgrims bearing candles through a vast and dangerous land, on a journey none of us will ever complete.

In the dead of night, when all is darkness, you always have your candle. Though it is dim and flickers, if you attend to what it shows, you will never put a foot wrong.

But if you keep your eyes downcast upon your candle in broad daylight, though the way may lie open before you, you will not see it. The small circle it reaches is not the world, not even a tiny part.

GenLaw DC Workshop

I’m at the GenLaw workshop on Evaluating Generative AI Systems: the Good, the Bad, and the Hype today, and I will be liveblogging the presentations.


Hoda Heidari: Welcome! Today’s event is sponsored by the K&L Gates Endowment at CMU, and presented by a team from GenLaw, CDT, and the Georgetown ITLP.

Katherine Lee: It feels like we’re at an inflection point. There are lots of models, and they’re being evaluated against each other. There’s also a major policy push. There’s the Biden executive order, privacy legislation, the generative-AI disclosure bill, etc.

All of these require the ability to balance capabilities and risks. The buzzword today is evaluations. Today’s event is about what evaluations are: ways of measuring generative-AI systems. Evaluations are proxies for things we care about, like bias and fairness. These proxies are limited, and we need many of them. And there are things like justice that we can’t even hope to measure. Today’s four specific topics will explore the tools we have and their limits.

A. Feder Cooper: Here is a concrete example of the challenges. One popular benchmark is MMLU. It’s advertised as testing whether models “possess extensive world knowledge and problem solving ability.” It includes multiple-choice questions from tests found online, standardized tests of mathematics, history, computer science, and more.

But evaluations are surprisingly brittle; CS programs don’t always rely on the GRE. In addition, it’s not clear what the benchmark measures. In the last week, MMLU has come under scrutiny. It turns out that if you reorder the questions as you give them to a language model, you get wide variations in overall scores.

This gets at another set of questions about generative-AI systems. MMLU benchmarks a model, but systems are much more than just models. Most people interact with a deployed system that wraps the model in a program with a user interface, filters, etc. There are numerous levels of indirection between the user’s engagement and the model itself.

And even the system itself is embedded in a much larger supply chain, from data through training to alignment to generation. There are numerous stages, each of which may be carried out by different actors doing different portions. We have just started to reason about all of these different actors and how they interact with each other. These policy questions are not just about technology, they’re about the actors and interactions.

Alexandra Givens: CDT’s work involves making sure that policy interventions are grounded in a solid understanding of how technologies work. Today’s focus is on how we can evaluate systems, in four concrete areas:

  1. Training-data attribution
  2. Privacy
  3. Trust and safety
  4. Data provenance and watermarks

We also have a number of representatives from government giving presentations on their ongoing work.

Our goals for today are to provide insights on cross-disciplinary, cross-community engagement. In addition, we want to pose concrete questions for research and policy, and help people find future collaborators.

Paul Ohm: Some of you may remember the first GenLaw workshop; we want to bring that same energy today. Here at Georgetown Law, we take seriously the idea that we’re down the street from the Capitol and want to be engaged. We have a motto, “Justice is the end, law is but the means.” I encourage you to bring that spirit to today’s workshop. This is about evaluations in service of justice and the other values we care about.

Zack Lipton

Zack Lipton: “Why Evaluations are Hard”

One goal of evaluations is simply quality: is this system fit for purpose? One question we can ask is, what is different about evaluations in the generative-AI era? And an important distinction is whether a system does everything for everyone or it has a more pinned-down use case with a more laser-targeted notion of quality.

Classic discriminative learning involves a prediction or recognition problem (or a problem that you can twist into one). For example, I want to give doctors guidance on whether to discharge a patient, so I predict mortality.

Generically, I have some input and I want to classify it. I collect a large dataset of input-output pairs and generate a model–the learned pattern. Then I can test how well the model works by testing on some data we didn’t train on. The model of machine learning that came to dominate is that I have a test set, and I measure how well the model works on the test set.

So when we evaluate a discriminative model, there are only a few kinds of errors. For a yes-no classifier, those are false positives and false negatives. For a regression problem, that means over- and under-estimates. We might look into how well the model performs on different strata, either to explore how it works, or to check for disparity on salient demographic groups in the population. And then we are concerned whether the model is valid at all out of the distribution it was trained on–e.g., at a different hospital, or in the wild.
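As a minimal sketch of the test-set evaluation described here, the following counts false positives and false negatives for a yes-no classifier and then breaks the error rates out by stratum. The labels, predictions, and group names are made-up illustrative data, not anything presented in the talk.

```python
from collections import defaultdict

def confusion_counts(y_true, y_pred):
    """Count the four possible outcomes of a yes/no classifier on held-out data."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred and truth:
            counts["tp"] += 1
        elif pred and not truth:
            counts["fp"] += 1  # predicted yes, truth was no
        elif truth:
            counts["fn"] += 1  # predicted no, truth was yes
        else:
            counts["tn"] += 1
    return counts

def error_rates(counts):
    """Turn raw counts into false-positive and false-negative rates."""
    negatives = counts["fp"] + counts["tn"]
    positives = counts["tp"] + counts["fn"]
    return {
        "false_positive_rate": counts["fp"] / negatives if negatives else 0.0,
        "false_negative_rate": counts["fn"] / positives if positives else 0.0,
    }

def rates_by_stratum(y_true, y_pred, groups):
    """Compute the same error rates separately for each group in the test set."""
    by_group = defaultdict(lambda: ([], []))
    for truth, pred, group in zip(y_true, y_pred, groups):
        by_group[group][0].append(truth)
        by_group[group][1].append(pred)
    return {
        group: error_rates(confusion_counts(truths, preds))
        for group, (truths, preds) in by_group.items()
    }

# Hypothetical held-out test set (1 = yes, 0 = no) with two strata, "A" and "B".
y_true = [1, 0, 1, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 1, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

overall = error_rates(confusion_counts(y_true, y_pred))
per_group = rates_by_stratum(y_true, y_pred, groups)
```

In this toy data the false-positive rate differs between the two strata even though the false-negative rate does not, which is the kind of disparity a stratified check is meant to surface.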

[editor crash, lost some text]

Now we have general-purpose systems like ChatGPT which are provided without a specific task. They’re also being provided to others as a platform for building their own tools. Now language models are not just language models. Their job is not just to accurately predict the next word but to perform other tasks.

We have some ways of assessing quality, but there is no ground truth we can use to evaluate against. There is no single dataset that represents the domain we care about. Evaluation starts going to sweeping benchmarks; the function of what we did in NLP before is to supply a giant battery of tests we can administer to test “general” capabilities of ML models. And the discourse shifts towards trying to predict catastrophic outcomes.

At the same time, these generic capabilities provide a toolkit for building stronger domain-specific technologies. Now people in the marketplace are shipping products without any data. There is a floor level of performance they have with no data at all. Generative AI has opened up new domains, but with huge evaluation challenges.

Right now, for example, health-care systems are underwater. But the clerical burden of documenting all of these actions is immense: two hours of form-filling for every one hour of patient care. At Abridge, we’re building a generative-AI system for physicians to document clinical notes. So how do we evaluate it? There’s no gold standard, we can’t use simple tricks. The problem isn’t red-teaming, it’s more about consistently high-quality statistical documentation. The possible errors are completely open-ended, and we don’t have a complete account of our goals.

Finally, evaluation takes place at numerous times. Before deployment, we can look at automated metrics–but at the end of the day, no evaluation will capture everything we care about. A lot of the innovation happens when we have humans in the loop to give feedback on notes. We use human spot checks, we have relevant experts judging notes, across specialties and populations, and also tagging errors with particular categories. We do testing during rollout, using staged releases and A/B tests. There are also dynamic feedback channels from clinician feedback (star ratings, free-form text, edits to notes, etc.). There are lots of new challenges–the domain doesn’t stand still either. And finally, there are regulatory challenges.

Emily Lanza

Emily Lanza: “Update from the Copyright Office”

The Copyright Office is part of the legislative branch, providing advice to Congress and the courts. It also administers the copyright registration system.

As far back as 1965, the Copyright Office has weighed in on the copyrightability of computer-generated works. Recently, these issues have become far less theoretical. We have asked applicants to disclaim copyright in more-than-de-minimis AI-generated portions of their works. In August, we published a notice of inquiry and received more than 10,000 comments. And a human has read every single one of those comments.

Three main topics:

First, AI outputs that imitate human artists. These are issues like the Drake/Weeknd deepfake. Copyright law doesn’t cover these, but some state rights do. We have asked whether there should be a federal AI law.

Second, copyrightability of outputs. We have developed standards for examination. The generative-AI application was five years ago, entirely as a test case. We refused registration on the ground that human authorship is required; the D.C. District Court agreed and the case is on appeal. Other cases present less clear-cut facts. Our examiners have issued over 200 registrations with appropriate disclaimers, but we have also refused registration in three high-profile cases.

The central question in these more complex scenarios is when and how a human can exert control over the creativity developed by the AI system. We continue to draw these lines on a case-by-case basis, and at some point the courts will weigh in as well.

Third, the use of human works to train AIs. There are 20 lawsuits in U.S. courts. The fair use analysis is complex, including precedents such as Google Books and Warhol v. Goldsmith. We have asked follow-up questions about consent and compensation. Can it be done through licensing, or through collective licensing, or would a new form of compulsory licensing be desirable? Can copyright owners opt in or out? How would it work?

Relatedly, the study will consider how to allocate liability between developers, operators, and users. Our goal is balance. We want to promote the development of this exciting technology, while continuing to allow human creators to thrive.

We also need to be aware of developments elsewhere. Our study asks whether approaches in any other countries should be adopted or avoided in the United States.

We are not the only ones evaluating this. Congress has been busy, too, holding hearings as recently as last week. The Biden Administration issued an executive order in October. Other agencies are involved, including the FTC (prohibition on impersonation through AI-enabled deepfakes), and FEC (AI in political ads).

We plan to issue a report. The first section will focus on digital replicas and will be published this spring. The second section will be published this summer and will deal with the copyrightability of outputs. Later sections will deal with training and more. We aim to publish all of it by the end of the fiscal year, September 30. We will also revise the Compendium and publish a study by economists about copyright and generative AI.

Sejal Amin

Sejal Amin (CTO at Shutterstock): “Ensuring TRUST; Programs for Royalties in the Age of AI”

Shutterstock was founded in 2003 and has since become an immense marketplace for images, video, music, 3D, design tools, etc. It has been investing in AI capabilities as well. Showing images generated by Shutterstock’s AI tools. Not confined to any genre or style. Shutterstock’s framework is TRUST. I’m going to focus today on the R, Royalties.

Today’s AI economy is not really contributing to the creators who enable it. Unregulated crawling helps a single beneficiary. In 2023, Shutterstock launched a contributor fund that provides ongoing royalties tied to licensing for newly generated assets.

The current model provides an equal share per image based on their contributions, which are then used in training Shutterstock’s models. There is also compensation by similarity, or by popularity. These models have problems. Popularity is not a proxy for quality; it leads to a rich-get-richer phenomenon. And similarity is also flawed without a comprehensive understanding of the world.

For us, quality is a high priority. High-quality content is an essential input into the training process. How could we measure that? Of course, the word quality is nebulous. I’m going to focus on:

  • Aesthetic excellence
  • Safety
  • Diverse representation

A shared success model will need to understand customer demand.

Aesthetic excellence depends on technical proficiency (lighting, color balance) and visual appeal. Shutterstock screens materials for safety both manually and through human review. We have techniques to prevent generation of unsafe concepts. Diversity is important to all of us. We balance and improve representations of different genders, ethnicities, and orientation. Our fund attempts to support historically excluded creator groups. Our goal is shared success.

David Bau

David Bau: “Unlearning from Generative AI Models”

Unlearning asks: “Can I make my neural network forget something it learned?”

In training, a dataset with billions of inputs is run through the network, which can then generate potentially infinite outputs. The network’s job is to generalize, not memorize. If you prompt Stable Diffusion for “astronaut riding a horse on the moon,” there is no such image in the training set; it will generalize to create one.

SD is trained on about 100 TB of data, but the SD model is only about 4 GB of network weights. We intentionally make these nets too small to memorize everything. That’s why they must generalize.
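The arithmetic behind that claim can be sketched as follows, using the round numbers from the talk and assuming decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes). The variable names are illustrative only.

```python
TB = 10**12  # assumption: decimal terabytes
GB = 10**9   # assumption: decimal gigabytes
MB = 10**6

training_bytes = 100 * TB  # approximate size of the training data
weight_bytes = 4 * GB      # approximate size of the model weights

# How many bytes of training data there are for every byte of weights.
compression_ratio = training_bytes // weight_bytes

# How many bytes of weights are available per megabyte of training data.
weight_bytes_per_training_megabyte = weight_bytes // (training_bytes // MB)

print(compression_ratio)                   # 25000
print(weight_bytes_per_training_megabyte)  # 40
```

On these numbers the weights are about 25,000 times smaller than the data, so the network cannot store everything it saw.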

But still, sometimes a network does memorize. Carlini et al. showed that there is substantial memorization in some LLMs, and the New York Times found out that there is memorization in ChatGPT.

In a search engine, takedowns are easy to implement because you know “where” the information is. In a neural network, however, it’s very hard to localize where the information is.

There are two kinds of things you might want to unlearn: first, verbatim regurgitation; second, unwanted generalized knowledge (artist’s style, undesired concepts like nudity or hate, or dangerous knowledge like hacking techniques).

Three approaches to unlearning:

  1. Remove from the training data and retrain. But this is extremely expensive.
  2. Modify the model. Fine-tuning is hard because we don’t know where to erase. There are some “undo” ideas or targeting specific concepts.
  3. Filter outputs. Add a ContentID-like system to remove some outputs. This is a practical system for copyright compliance, but it’s hard to filter generalized knowledge and the filter is removable from open-source models.
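To make the third approach concrete, here is a minimal sketch of an output filter that blocks generations overlapping heavily with a list of protected texts. It uses word n-gram overlap as a stand-in for whatever matching a real ContentID-like system would use; the class name, the 5-gram window, and the 0.5 threshold are all assumptions chosen for illustration.

```python
def word_ngrams(text, n):
    """Return the set of lowercase word n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

class OutputFilter:
    """Hypothetical post-generation filter: blocks outputs that mostly repeat a protected text."""

    def __init__(self, protected_texts, n=5, threshold=0.5):
        self.n = n
        self.threshold = threshold
        self.protected = [word_ngrams(text, n) for text in protected_texts]

    def overlap(self, output):
        """Largest fraction of the output's n-grams found in any single protected text."""
        out = word_ngrams(output, self.n)
        if not out:
            return 0.0  # output too short to contain even one n-gram
        return max((len(out & ref) / len(out) for ref in self.protected), default=0.0)

    def allow(self, output):
        """True if the output may be shown to the user."""
        return self.overlap(output) < self.threshold

# Made-up example texts.
output_filter = OutputFilter(
    ["the quick brown fox jumps over the lazy dog near the river bank"]
)
regurgitated = "the quick brown fox jumps over the lazy dog"
novel = "an astronaut rides a horse across the surface of the moon"

print(output_filter.allow(regurgitated))  # False
print(output_filter.allow(novel))         # True
```

The sketch also shows both limits noted above: it only catches near-verbatim text, not generalized knowledge such as a style, and because it sits outside the model, anyone with the weights can simply not run it.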

Fundamentally, unlearning is tricky and will require combining approaches. The big challenge is how to improve the transparency of a system not directly designed by people.

Alicia Solow-Niederman

Alicia Solow-Niederman: “Privacy, Transformed? Lessons from GenAI”

GenAI exposes underlying weak spots. One kind of weak spot is weaknesses in a discipline’s understanding (e.g., U.S. privacy law’s individualistic focus). Another is weaknesses in cross-disciplinary conversations (technologists and lawyers talking about privacy).

Privacy in GenAI: cases when private data is used in the. If I prompt a system with my medical data, or a law-firm associate uses a chatbot to generate a contract with confidential client data. It can arise indirectly when a non-AI company licenses sensitive data for training. For example, 404 reported that Automattic was negotiating to license Tumblr data. Automattic offered an opt-out, a solution that embraces the individual-control model. This is truly a privacy question, not a copyright one. And we can’t think about it without thinking what privacy should be as a social value.

Privacy out of GenAI: When does private data leak out of a GenAI system? We’ve already talked about memorization followed by a prompt that exposes it. (E.g., the poem poem poem attack.) Another problem is out-of-context disclosures. E.g., ChatGPT 3.5 “leaked a random dude’s photo”–a working theory is that this photo was uploaded in 2016 and ChatGPT created a random URL as part of its response. Policy question: how much can technical intervention mitigate this kind of risk?

Privacy through GenAI: ways where the use of the technology itself violates privacy. E.g., GenAI tools used to infer personal information: chatbots can discern age and geography from datasets like Reddit. The very use of a GenAI tool might lead to violations of existing protections. The example of GenAI for a health-care system is a good example of this kind of setting.

Technical patches risk distracting us from more important policy questions.

Niloofar Mireshghallah

Niloofar Mireshghallah: “What is differential privacy? And what is it not?”

A big part of the success of generative AI is the role of training data. Most of the data is web-scraped, but this might not have been intended to be public.

But the privacy issues are not entirely new. The census collects data on name, age, sex, race, etc. This is used for purposes like redistricting. But this data could also be used to make inferences, e.g., where are there mixed-race couples? Obvious approach is to withhold some fields, such as name, but often the data can be reconstructed.

Differential privacy is a way of formalizing the idea that nothing can be learned about a participant in a database–is the database with the record distinguishable from the database without it? The key concept here is a privacy budget, which quantifies how much privacy can be lost through queries of a (partially obscured) database. Common patterns are still visible, but uncommon patterns are not.

But privacy under DP comes at the cost of data utility. The more privacy you want, the more noise you need to add, and hence the less useful the data. And it has a disproportionate impact on the tails of the distribution, e.g., more inaccuracy in the census measurements of the Hispanic population.
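
To make the tradeoff concrete, here is a minimal sketch assuming the Laplace mechanism applied to a single counting query. That is a standard DP mechanism, but it is my choice for illustration and not necessarily what the talk had in mind. The function names, the epsilon values, and the true count of 100 are all made up for the example.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two independent exponentials with mean `scale`
    # is Laplace-distributed with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(true_count, epsilon, rng):
    # A counting query has sensitivity 1: adding or removing one person's
    # record changes the count by at most 1, so the noise scale is 1/epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)

def mean_abs_error(true_count, epsilon, trials=2000, seed=0):
    # Average distance between the noisy answer and the true answer.
    rng = random.Random(seed)
    errors = [abs(noisy_count(true_count, epsilon, rng) - true_count)
              for _ in range(trials)]
    return sum(errors) / trials

if __name__ == "__main__":
    for eps in (0.1, 1.0, 10.0):
        print(f"epsilon={eps}: mean absolute error ~ {mean_abs_error(100, eps):.2f}")
```

A smaller epsilon (a tighter privacy budget) means larger noise. The noise does not depend on the size of the true count, so the same absolute error is a much larger relative error for a small group than for a large one, which is one way to see the disproportionate impact on the tails.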

Back to GenAI. Suppose I want to release a medical dataset with textual data about patients. Three patients have covid and a cough, one patient has a lumbar puncture. It’s very hard to apply differential privacy to text rather than tabular data. It’s not easy to apply clear boundaries between records to text. There are also ownership issues, e.g., “Bob, did you hear about Alice’s divorce?” applies to both Bob and Alice.

If we try applying DP with each patient’s data as a record, we get a many-to-many version. The three covid patients get converted into similar covid patients; we can still see the covid/cough relationship. But it does not detect and obfuscate “sensitive” information while keeping “necessary” information intact. We’ll still see “the CT machine at the hospital is broken.” This is repeated, but in context it could be identifying and shouldn’t be revealed. That is, repeated information might be sensitive! A fact that a lumbar puncture requires local anesthesia might appear only once, but that’s still a fact that ought to be learned; it’s not sensitive. DP is not good at capturing these nuances or these needle-in-haystack situations. There are similarly messy issues with images. Do we even care about celebrity photos? There are lots of contextual nuances.

Panel Discussion

[Panel omitted because I’m on it]

Andreas Terzis

Andreas Terzis: “Privacy Review”

Language models learn from their training data a probability distribution of a sequence given the previous tokens. Can they memorize rare or unique training-data sequences? Yes, yes, yes. So we ask: do actual LLMs memorize their training data?

Approach: use the LLM to generate a lot of data, and then predict membership of an example in the training data. If it has a high likelihood of being generated, then it’s memorized; if not, then no. In 2021, they showed that memorization happens in actual models, and since then, scale exacerbates the issue. Larger models memorize more.
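
The likelihood test can be sketched with a toy model. Here a character-bigram model with add-one smoothing stands in for an LLM, which is a big simplification; the training strings, the function names, and the threshold of -3.3 are all invented for the example.

```python
import math
from collections import Counter, defaultdict

def train_bigram(texts):
    # Count how often each character follows each other character.
    counts = defaultdict(Counter)
    vocab = set()
    for text in texts:
        vocab.update(text)
        for prev, nxt in zip(text, text[1:]):
            counts[prev][nxt] += 1
    return counts, vocab

def avg_log_likelihood(model, text):
    # Average log-probability per character transition, with add-one smoothing.
    counts, vocab = model
    v = len(vocab) + 1  # one extra slot for unseen characters
    total = 0.0
    for prev, nxt in zip(text, text[1:]):
        row = counts.get(prev, Counter())
        total += math.log((row[nxt] + 1) / (sum(row.values()) + v))
    return total / max(len(text) - 1, 1)

def looks_memorized(model, text, threshold):
    # Flag a candidate as a likely training member if the model finds it
    # unusually easy to generate.
    return avg_log_likelihood(model, text) > threshold

if __name__ == "__main__":
    model = train_bigram([
        "the quick brown fox jumps over the lazy dog",
        "my secret code is 8675309",
    ])
    for candidate in ("my secret code is 8675309", "zq xv 1234 wj kp 9090"):
        print(candidate, looks_memorized(model, candidate, threshold=-3.3))
```

With a real model the threshold has to be calibrated, and a high likelihood can also just mean the text is generic, so this is a signal rather than a proof of membership.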

Alignment seems to hide memorization, but not to prevent it. An aligned model might not return training data, but it can be prompted (e.g., “poem poem poem”) in ways that elicit it. And memorization happens with multimodal models too.

Privacy testing approaches: first, “secret sharer” involves controlling canaries inserted into the training data. This requires access to the model and can also pollute it. “Data extraction” only requires access to the interface but may underestimate the actual amount of memorization.

There are tools to remove what might be sensitive data from training datasets. But they may not find all sensitive data (“andreas at google dot com”), and on the flipside, LLMs might benefit from knowing what sensitive data looks like.
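
A minimal sketch of why pattern-based scrubbing misses things, assuming a simple regular-expression email filter (my stand-in for the tools mentioned, which are surely more sophisticated). The regex and the `[EMAIL]` placeholder are illustrative, and the `example.com` address is made up.

```python
import re

# A deliberately simple pattern for conventional-looking email addresses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def scrub_emails(text):
    # Replace anything that matches the pattern with a placeholder.
    return EMAIL_RE.sub("[EMAIL]", text)

if __name__ == "__main__":
    print(scrub_emails("write to andreas@example.com"))
    print(scrub_emails("write to andreas at google dot com"))
```

The first string is scrubbed; the second passes through untouched, because the obfuscated form never matches the pattern.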

There are also safety-filter tools. They stop LLMs from generating outputs that violate their policies. This is helpful in preventing verbatim memorization but can potentially be circumvented.

Differential privacy: use training-time noise to provide reduced sensitivity to specific rarer examples. This introduces a privacy-utility tradeoff. (And as we saw in the morning, it can be hard to adapt DP to some types of data and models.)
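
One common way to add training-time noise is the DP-SGD recipe: clip each example's gradient to a maximum norm, sum, add Gaussian noise scaled to that norm, and average. The talk did not name a specific algorithm, so treat this as a sketch under that assumption; the function names and parameter values are illustrative, and gradients are plain Python lists to keep it self-contained.

```python
import math
import random

def clip(grad, max_norm):
    # Scale the gradient down so its L2 norm is at most max_norm.
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    # Clipping bounds any single example's influence on the update;
    # the Gaussian noise then masks what influence remains.
    clipped = [clip(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [x / len(clipped) for x in noisy]

if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.3, 0.4]]  # the first example is an outlier
    print(private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.0, rng=rng))
```

The outlier gradient is cut down to norm 1 before it can dominate the update, which is the "reduced sensitivity to specific rarer examples"; the price is that legitimately rare but useful signal is dampened too.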

Deduplication can reduce memorization, because the more often an example is trained on, the more likely it is to be memorized. The model itself is also likely to be better (faster to train, and fewer resources spent on memorizing duplicates).
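
A minimal sketch of deduplication, assuming exact matching after light normalization (lowercasing and collapsing whitespace). Real training pipelines typically also look for near-duplicates and repeated substrings; the function names here are illustrative.

```python
import hashlib

def normalize(text):
    # Lowercase and collapse runs of whitespace so trivial variants match.
    return " ".join(text.lower().split())

def deduplicate(documents):
    # Keep the first occurrence of each normalized document, in order.
    seen = set()
    unique = []
    for doc in documents:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

if __name__ == "__main__":
    print(deduplicate(["Hello  world", "hello world", "Goodbye"]))
```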

Privacy-preserving LLMs train on data intended to be public, and then fine-tune locally on user-contributed data. This and techniques in the previous slides can be combined to provide layered defense.

Dave Willner

Dave Willner: “How to Build Trust & Safety for and with AI”

Top-down take from a risk management perspective. We are in a world where a closed system is a very different thing to build trust and safety for than an open system, and I will address them both.

Dealing with AI isn’t a new problem. Generative AI is a new way of producing content. But we have 15-20 years of experience in moderating content. There is good reason to think that generative-AI systems will make us better at moderating content; they may be able to substitute for human moderators. And the models offer us new sites of intervention, in the models themselves.

First, do product-specific risk assessment. (Standard T&S approach: actors, behaviors, and content.) Think about genre (text, image, multimodal, etc.) Ask how frequent some of this content is. And how is this system specifically useful to people who want to generate content you don’t want them to?

Next, it’s a defense in depth approach. You have your central model and a series of layers around it. So the only viable approach is to stack as many layers of mitigations as possible.

  • Control access to the system, you don’t need to let people trying to abuse the system have infinite chances.
  • Monitor your outputs to see what the model is producing.
  • You need to have people investigating anomalies and seeing what’s happening. That can drive recalibration and adjustment.

In addition, invest in learning to use AI to augment all of the things I just talked about. All of these techniques rely on human classification. This is error-prone work that humans are not good at and that takes a big toll on them. We should expect generative-AI systems to play a significant role here; early experiments are promising.

In an open-source world, that removes centralized gatekeepers … which means removing centralized gatekeepers. I do worry we’re facing a tragedy of the commons. Pollution from externalities from models is a thing to keep in mind, especially with the more severe risks. We are already seeing significant CSAM.

There may not be good solutions here with no downsides. Openness versus safety may involve hard tradeoffs.

Nicholas Carlini

Nicholas Carlini: “What watermarking can and can not do”

A watermark is a mark placed on top of a piece of media to identify it as machine generated.

For example: an image with a bunch of text put on top of it, or a disclaimer at the start of a text passage. Yes, we can watermark, but these are not good watermarks; they obscure the content.

Better question: can we usefully watermark? The image has a subtle watermark that is present in the pixels. And the text model was watermarked, too, based on some of the bigram probabilities.
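
To give a feel for a bigram-based text watermark, here is a toy sketch in the style of "green list" schemes: a keyed hash marks roughly half of all word pairs as green, the generator prefers green continuations, and the detector measures how many pairs are green. This is an assumption about the general family of techniques, not the specific scheme shown in the talk; the key, vocabulary, and function names are all invented.

```python
import hashlib
import random

def is_green(prev_word, word, key="demo-key"):
    # A keyed hash of the bigram decides membership; about half are green.
    digest = hashlib.sha256(f"{key}|{prev_word}|{word}".encode("utf-8")).digest()
    return digest[0] % 2 == 0

def green_fraction(words, key="demo-key"):
    # Detector: share of consecutive word pairs that are green.
    pairs = list(zip(words, words[1:]))
    return sum(is_green(p, w, key) for p, w in pairs) / len(pairs)

def generate(vocab, length, rng, watermark):
    # Toy "model": picks among 4 random candidates per step, and when
    # watermarking, prefers a green one if any is available.
    words = [rng.choice(vocab)]
    while len(words) < length:
        candidates = rng.sample(vocab, 4)
        if watermark:
            green = [w for w in candidates if is_green(words[-1], w)]
            candidates = green or candidates
        words.append(candidates[0])
    return words

if __name__ == "__main__":
    vocab = [f"w{i}" for i in range(50)]
    rng = random.Random(0)
    print("watermarked:", green_fraction(generate(vocab, 200, rng, True)))
    print("plain:      ", green_fraction(generate(vocab, 200, rng, False)))
```

Unwatermarked text should land near one half green, while watermarked text lands well above it. The mark is invisible to a reader, but anyone who rewords the text changes the bigrams and with them the signal.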

But even this isn’t enough. The question is what are your requirements? We want watermarks to be useful for some end task. For example, people want undetectable watermarks. But most undetectable watermarks are easy to remove–e.g., flip it left-to-right, or JPEG compress it. Other people want unremovable watermarks. By whom? An 8-year-old or a CS professional? Unremovable watermarks are also often detectable. Some people want unforgeable watermarks, so they can verify the authenticity of photos.

Some examples of watermarks designed to be unremovable.

Here’s a watermarked image of a tabby cat. An ML image-recognition model recognizes it as a tabby cat with 88% confidence. Adversarial perturbation can make the image look indistinguishable to us humans, but it is classified as “guacamole” with 99% confidence. Almost all ML classifiers are vulnerable to this. Synthetic fake images can be tweaked to look like real ones with trivial variations, such as texture in the pattern of hair.

Should we watermark? It comes down to whether we’re watermarking in a setting where we can achieve our goals. What are you using it for? How robustly? Who is the adversary? Is there even an adversary?

Here are five goals of watermarking:

  1. Don’t train on your own outputs to avoid model collapse. Hope that most people who copy the text leave the watermark in.
  2. Provide information to users so they know whether the pope actually wore a puffy coat. There are some malicious users, but mostly not many.
  3. Detect spam. Maybe the spammers will be able to remove the watermark, maybe not.
  4. Detect disinformation. Harder.
  5. Detect targeted abuse. Harder still. As soon as there is a human in the loop, it’s hard to make a watermark stick. Reword the text, take a picture of the image. Can we? Should we? Maybe.

Raquel Vazquez Llorente

Raquel Vazquez Llorente: “Provenance, Authenticity and Transparency in Synthetic Media”

Talking about indirect disclosure mechanisms, but I consider detection to be a close cousin. We just tested detection tools and broke them all.

Witness helps people use media and tech to protect their rights. We are moving fast to a world where human and AI don’t just coexist but intermingle. Think of inpainting and outpainting. Think of how phones include options to enhance image quality or allow in-camera editing.

It’s hard to address AI content in isolation from this. We’ve also seen that deception is as much about context as it is about content. Watermarks, fingerprints, and metadata all provide important information, but don’t provide the truth of data.

Finally, legal authentication. There is a lot of work in open-source investigations. The justice system plays an essential role in protecting rights and democracy. People in power have dismissed content as “fake” or “manipulated” when they want to avoid its impact.

Three challenges:

  1. Identity. Witness has insisted that identity doesn’t need to be a condition of authentication. A system should avoid collecting personal information by default, because it can open up the door to government overreach.
  2. Data storage, ownership, and access. Collecting PII that connects to activists means they could be targeted.
  3. Access to tools and mandatory usage. Who is included and excluded is a crucial issue. Provenance can verify content, but actual analysis is important.

Erie Meyer

Erie Meyer: “Algorithmic Disgorgement in the Age of Generative AI”

CFPB sues companies to protect consumers from unfair, deceptive, or abusive practices: medical debt, credit reports, repeat-offender firms, etc. We investigate, litigate, and supervise.

Every one of the top-ten commercial banks uses chatbots. CFPB found that people were being harmed by poorly deployed chatbots that sent users into doom loops. You get stuck with a robot that doesn’t make sense.

Existing federal financial laws say that impeding customers from solving problems can be a violation of law, for example if the technology fails to recognize that consumers are invoking their federal rights, or fails to protect their private information. Firms also have an obligation to respond to consumer disputes and competently interact with customers. It’s not radical to say that technology should make things better, not worse. CFPB knew it needed to do this report because it publishes its complaints online. In those complaints, they searched for the word “human” and it pulled up a huge number of complaints.

Last year, CFPB put out a statement that “AI is Not an Excuse for Breaking the Law.” Bright-line rules benefit small companies by giving them clear guidance without needing a giant team to study the law. They also make it clear when a company is compliant or not, and make it clear when an investigation is needed.

An example: black-box credit models. Existing credit laws require firms making credit decisions to tell consumers why they made a decision. FCRA has use limitations, accuracy and explainability requirements, and a private right of action. E.g., targeted advertising is not on the list of allowed uses. CFPB has a forthcoming FCRA rulemaking.

When things go wrong, I think about competition, repeat offenders, and relationships. A firm shouldn’t get an edge over its competitors from using ill-gotten data. Repeat offenders are an indication that enforcement hasn’t shifted the firm’s incentives. Relationships: does someone answer the phone, can you get straight answers, do you know that Erica isn’t a human?

The audiences for our work are individuals, corporations, and the industry as a whole. For people: What does a person do when their data is misused? What makes me whole? How do I get my data out? For corporations, some companies violate federal laws repeatedly. And for the industry, what do others in the industry learn from enforcement actions?

Finally, disgorgement: I’ll cite a case from the FTC in the case against Google. The reason not to let Google settle was that while the “data” was deleted, the data enhancements were used to target others.

What keeps me up at night is that it’s hard to get great legislation on the books.

Elham Tabassi

Elham Tabassi: “Update from US AI Safety Institute”

NIST is a non-regulatory agency under the Department of Commerce. We cultivate trust in technology and promote innovation. We promote measurement science and technologically valid standards. We work through a multi-stakeholder process. We try to identify what are the valuable effective measurement techniques.

Since 2023, we have:

  • We released an AI Risk Management Framework.
  • We built a Trustworthy AI Resource Center and launched a Generative AI Public Working Group
  • EO 14110 asks NIST to develop a long list of AI guidelines, and NIST has been busy working on those drafts, with a request for information and will be doing listening sessions.
  • NIST will start with a report on synthetic content authentication and then develop guidance
  • There are a lot of other tasks, most of which correspond with the end of July, but these will continue in motion after that.
  • We have a consortium with working groups to implement the different EO components
  • Released a roadmap along with the AI RMF.

Nitarshan Rajkumar

Nitarshan Rajkumar: “Update from UK AI Safety Institute”

Our focus is to equip governments with an empirical understanding of the safety of frontier AI systems. It’s built as a startup within government, with seed funding of £100 million, and extensive talent, partnerships, and access to models and compute.

UK government has tried to mobilize international coordination, starting with an AI safety summit at Bletchley Park. We’re doing consensus-building at a scientific level, trying to do for AI safety what IPCC has done for climate change.

We have four domains of testing work:

  • Misuse: do advanced AI systems meaningfully lower barriers for bad actors seeking to cause harm?
  • Societal impacts: how are AI systems actually used in the real world, with effects on individuals and society?
  • Autonomous systems: this includes reproduction, persuasion, and creating more capable AI models
  • Safeguards: evaluating effectiveness of advanced AI safety systems

We have four approaches to evaluations:

  • Automated benchmarking (e.g. Q&A sets). Currently broad but shallow baselines.
  • Red-teaming: deploying domain experts to manually interact with the model
  • Human-uplift studies: how does AI change the capabilities of novices?
  • Agents and tools: it’s possible that agents will become a more common way of interacting with AI systems

Panel Discussion

Katherine: What kinds of legislation and rules do we need?

Amba: The lobbying landscape complicates efforts. Industry has pushed for auditing mandates to undercut bright-line rules. E.g., facial-recognition auditing was used to undercut pushes to ban facial-recognition. Maybe we’re not talking enough about incentives.

Raquel: When we’re talking about generative-AI, we’re also talking about the broader information landscape. Content moderation is incredibly thorny. Dave knows so much, but the incentives are so bad. If companies are incentivized by optimizing advertising, data collection, and attention, then content moderation is connected to enforcing a misaligned system. We have a chance to shape these rules right now.

Dave: I think incentive problems affect product design rather than content moderation. The ugly reality of content moderation is that we’re not very good at it. There are huge technique gaps, humans don’t scale.

Katherine: What’s the difference between product design and content moderation?

Dave: ChatGPT is a single-player experience, so some forms of abuse are mechanically impossible. That kind of choice has much more of an impact on abuse as a whole.

Katherine: We’ve talked about standards. What about when standards fail? What are the remedies? Who’s responsible?

Amba: Regulatory proposals and regimes (e.g. DSA) that focus on auditing and evaluation have two weaknesses. First, they’re weakest on consequences: what if harm is discovered? Second, internal auditing is most effective (that’s where the expertise and resources are) but it’s not a substitute for external auditing. (“Companies shouldn’t be grading their own homework.”) Too many companies are on the AI-auditing gravy train, and they haven’t done enough to show that their auditing is at the level of effectiveness it needs to be. Scrutinize the business dynamics.

Nicholas: In computer security, there are two types of audits: compliance audits that check boxes to sell products, and actual audits where someone is telling you what you’re doing wrong. There are two different kinds of companies. I’m worried about the same thing happening here.

Elham: Another exacerbation is that we don’t know how to do this well. From our point of view, we’re trying to untangle these two, and come up with objective methods for passing and failing.

Question: Do folks have any reflection on approaches more aligned with transparency?

Nicholas: Happy to talk when I’m not on the panel.

Raquel: A few years ago, I was working on developing an authentication product. We got a lot of backlash from the human-rights community. We hired different sets of penetration testers to audit the technology, and then we’d spend resources on patching. We equate open-source with security, but the amount of times we offered people code–but there’s not a huge amount of technical expertise.

Hoda: Right now, we don’t even have the right incentives to create standards except for companies’ bottom line. How do your agencies try to balance industry expertise with impacted communities?

Elham: Technologies change fast, so expertise is very important. We don’t know enough, and the operative word is “we” and collaboration is important.

Nitarshan: Key word is “iterative.” Do the work, make mistakes, learn from them, improve software, platform, and tooling.

Elham: We talk about policies we can put in place afterwards to check for safety and security. But these should also be part of the discussion of design. We want technologies that make it easy to do the right thing, hard to do the wrong thing, and easy to recover. Think of three-plug power outlets. We are not a standard-development organization; industry can lead standard development. The government’s job is to support these efforts by being third-party neutral objectives.

Question: What are the differences in how various institutions understand AI safety? E.g., protect company versus threats to democracy and human rights?

Nitarshan: People had an incorrect perception that we were focused on existential risk and we prominently platformed societal and other risks. We think of the risks as quite broad.

Katherine: Today, we’ve been zooming in and out. Safety is really interesting because we have tools that are the same for all of these topics–same techniques for privacy and copyright don’t necessarily work. Alignment, filters, etc. are a toolkit that is not necessarily specified. It’s about models that don’t do what we want them to do.

Let’s talk about trust and safety. Some people think there’s a tradeoff between safe and private systems.

Dave: That is true especially early on in the development of a technology when we don’t understand it. But maybe not in the long run. For now, observation for learning purposes is important.

Andreas: Why would the system need to know more about individuals to protect them?

Dave: It depends on privacy. If privacy means “personal data” then no, but if privacy means “scrutiny of your usage” then yes.

Katherine: Maybe I’m generating a picture of a Mormon holding a cup of coffee. Depending on what we consider a violation, we’d need to know more about them, or to know what they care about. Know the age and context of a child.

Andreas: People have the control to disclose what they want to be known; that can also be used in responding.

Question: How do you think about whether models are fine to use only with certain controls, or should we avoid models that are brittle?

Dave: I’m very skeptical of brittle controls (terms of service, some refusals). Solving the brittleness of model-level mitigations is an important technical problem if you want to see open-source flourish. The right level to work at is the level you can make stick in the face of someone who is trying to be cooperative. Miscalibration is different than adversarial misuse. Right now, nothing is robust if someone can download the model and run it themselves.

Erie: What advice do you have for federal regulators who want to develop relationships with technical communities? How do you encourage whistleblowers?

Amba: Researchers are still telling us that problems with existing models are still unsolved. There are risks that are still unsolved; the genie is still out of the bottle. We’re not looking out to the horizon. Privacy, security, and bias harms are here right now.

Nicholas: I would be fine raising problems if I noticed them; I say things that get me in trouble in many circumstances. There are cases where it’s not worth getting in trouble–when I don’t have anything technically useful to add to the conversation.

Dave: People who work in these parts of companies are not doing it because they love glory and are feeling relaxed. They’re doing it because they genuinely care. That sentiment is fairly widespread.

Andreas: We’re here and we publish. There is a fairly vibrant community of open-source evaluations. In many ways they’re the most trustable. Maybe it’s starting to happen for security as well.

Katherine: Are proposed requirements for watermarking misguided?

Nicholas: As a technical problem, I want to know whether it works. In adversarial settings, not yet. In non-adversarial settings, it can work fine.

Katherine: People also mention homomorphic encryption–

Nicholas: That has nothing to do with watermarking.

Katherine: –blockchain–

Nicholas: That’s dumb.

Raquel: There’s been too much emphasis on watermarking from a regulatory perspective. If we don’t embed media literacy, I’m worried about people looking at a content credential and misunderstanding what it covers.

Question: Is there value in safeguards that are easy to remove but hard to remove by accident?

Dave: It depends on the problem you’re trying to solve.

Nicholas: This is the reason why depositions exist.

Raquel: This veers into UX, and the design of the interface the user engages with.

Question: What makes a good scientific underpinning for an evaluation? Compare the standards for cryptographic hashes versus the standards for penetration testing? Is it about math versus process?

Nitarshan: These two aren’t in tension. It’s just that right now ML evaluation is more alchemy than science. We can work on developing better methods.

And that’s it, wrapping up a nearly nine-hour day!