California enacted one of the first AI transparency laws on September 28, 2024, with compliance required by January 1, 2026. California A.B. 2013 adds Title 15.2 to Part 4 of Division 3 of the California Civil Code and requires certain disclosures about the data used to train generative AI models.
Coverage
The primary targets of the bill are AI developers and generative AI models. Generative AI is defined in the law as “artificial intelligence that can generate derived synthetic content, such as text, images, video, and audio, that emulates the structure and characteristics of the artificial intelligence’s training data.” The term “developer” is broadly defined to include not only those who create and train an AI but also those who make a substantial modification to one. What constitutes a “substantial modification” is likewise defined broadly and includes releasing a new version, release, or update that materially changes the functionality or performance of an AI, including retraining or even fine-tuning it.
The bill covers any generative AI made available to Californians that was created or substantially modified on or after January 1, 2022, regardless of whether any compensation was earned. Developers have until January 1, 2026 to ensure that their currently covered and any future covered AI systems meet the requirements. While the law applies only to developers who make their models available in California, it will likely affect all developers, as it would rarely be practical to restrict access to an AI model in a state as populous as California.
Requirements
California A.B. 2013 requires a developer to post documentation about its generative AI system on its website. The documentation need not detail everything included; rather, it is a high-level summary of the datasets that the developer used to create, train, or substantially modify a given covered AI system or service. As part of that high-level summary, the law specifies numerous required disclosures about the data used to train the AI and about when and how that data was used:
- The developer must disclose the sources or owners of the datasets they used.
- There must be a description of how the datasets are used to further the intended uses of the artificial intelligence system or service.
- The developer must reveal how many data points were in the datasets used; this can be expressed in ranges, and an estimated number may be used for dynamic datasets.
- There must be a description of what types of data points were used in the datasets.
  - For datasets that use labels, this means the types of labels used with the data points.
  - For datasets that do not use labeling, this means an explanation of the general characteristics of the data points.
- The developer must disclose if any copyrighted, trademarked, or patented materials were included in any of the datasets they used. They must also note if, instead, all the content in the datasets they used is entirely in the public domain.
- The developer must indicate if they purchased or licensed the datasets that they used.
- They must indicate if the datasets contain any personal information, as defined in subdivision (v) of Section 1798.140.
- They must indicate if the datasets contain any aggregate consumer information, as defined in subdivision (b) of Section 1798.140.
- The developer must indicate if there was any cleaning, processing, or other form of modification to the data they used for training. They must also include a statement about the intended purposes of those modifications in terms of the performance and use of the artificial intelligence system or service in question.
- The developer must reveal the time period during which the data contained in the datasets was collected. They must also indicate whether collection of data for those datasets is still ongoing.
- The developer must include the dates when a dataset was first used in the development of the artificial intelligence in question.
- The developer must note whether the AI used, or is continuously using, synthetic data generation in its development.
  - The developer may also optionally include a statement describing the functional need or desired purpose of using synthetic data in training the AI, relative to that AI's intended purpose.
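Purely as an illustrative sketch (not legal advice, and not a format the statute prescribes), the required disclosures above could be organized into a simple structured summary. Every field name below is a hypothetical choice made for illustration:

```python
# Hypothetical sketch of an A.B. 2013-style dataset disclosure summary.
# Field names are illustrative assumptions, not mandated by the statute.

REQUIRED_FIELDS = {
    "sources_or_owners",        # who the datasets came from
    "purpose_description",      # how the data furthers the AI's intended use
    "num_data_points",          # counts, ranges, or estimates for dynamic sets
    "data_point_types",         # label types used, or general characteristics
    "ip_status",                # copyrighted/trademarked/patented vs. public domain
    "purchased_or_licensed",    # how the datasets were acquired
    "contains_personal_info",   # per Cal. Civ. Code § 1798.140(v)
    "contains_aggregate_info",  # per Cal. Civ. Code § 1798.140(b)
    "modifications",            # cleaning/processing and its intended purpose
    "collection_period",        # time span, and whether collection is ongoing
    "first_used_date",          # when the dataset was first used in development
    "synthetic_data_used",      # whether synthetic data generation was used
}

def missing_fields(summary: dict) -> set:
    """Return the required disclosure fields absent from a summary."""
    return REQUIRED_FIELDS - summary.keys()

# A complete (placeholder) summary passes; a partial one reports its gaps.
example = {field: "..." for field in REQUIRED_FIELDS}
assert missing_fields(example) == set()
assert "ip_status" in missing_fields({"sources_or_owners": "Example Corp"})
```

A checklist like this could help a developer confirm that a posted summary touches each statutory item before the compliance date, though the statute itself imposes no particular format.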
Exceptions
There are three exceptions to the required documentation, covering highly sensitive uses. First, any generative AI whose sole purpose is to help ensure security and integrity is exempt. Second, a generative AI whose sole purpose is operating aircraft in the national airspace is exempt. Third, a generative AI developed for military, national security, or defense purposes and made available only to a federal entity is exempt.
Takeaways
This bill serves as a strong first step toward greater transparency in the rapidly evolving AI landscape. It requires disclosures about the inclusion of copyrighted materials, an area of heightened concern and litigation. The limited scope of the exceptions, the fact that many of the tech giants are based in California, and the practical difficulty of excluding Californians from using an AI mean that most generative AI models and their developers will have to comply with these requirements; as a result, the law will have national and international impact.
This bill does not resolve all the issues currently raised by generative AI. Significantly, it requires only disclosures, not details. For example, a developer might have to indicate whether there were any copyrighted materials in the datasets it used, but would not be required to reveal any details about those materials or what proportion of the datasets the copyrighted works represent. Further, the law contains no enforcement provisions or penalties, making it difficult to hold non-complying developers accountable. This issue may be resolved before the 2026 compliance date, as the bill appears to be just a first step toward greater transparency, giving individuals more information about the generative AI models with which they interact.
By Nancy Wolff with JT Fitzpatrick
Filed in: AI
November 22, 2024