Diving Deeper into Artificial Intelligence: Understanding the Risks in Incorporating AI Technology into the Workplace

Nancy E. Wolff

Partner, New York
nwolff@cdas.com

Elizabeth Safran

Staff Attorney, New York
ealtman@cdas.com
The advent of artificial intelligence, or AI, brings diverse opportunities to a wide range of industries and fields. Its applications as an academic resource and its capacity to generate new content have garnered much attention in recent months. However, there are a number of important legal risks to weigh in using AI technology.

To get up and running (i.e., to respond to the parameters of a user prompt and output a coherent answer or new content that appears authentic), AI systems have been developed using copious amounts of data. This data frequently consists of articles, images, and other content indiscriminately fed into an AI system as a dataset, often numbering in the millions of items, if not more. Some AI developers currently accumulate training data without legal permission and outside the bounds of copyright law; as a result, numerous lawsuits have been filed against AI companies accused of violating copyrights on a massive scale. Many AI companies have gathered data for their systems by lifting it wholesale from the Internet, including from notorious pirate sources, without obtaining permission or a license from the rightsholders whose work is being used, and typically without their knowledge. Although many content-using and content-creating companies have established, or are establishing, licensing procedures for the use of copyrighted material as training data for AI systems, a number of leading AI companies have not. Based on the number of pending lawsuits in the U.S. and abroad, there is uncertainty over the legality of these platforms and inherent risk in using these models. The systems’ developers, rather than their users, have thus far been the targets of these legal actions, but it is important for users, and particularly companies looking to implement AI technology as a basis for their businesses, to be aware of the current disputes over infringement liability arising from the unlicensed data underlying their AI systems. At a minimum, if users invest in AI systems that are subsequently held to be unlawful, they risk losing that investment. More critically, users may be found to be infringing in their own right based on their use of infringing systems, regardless of whether they “knew” the system infringed.

This awareness of infringement becomes even more important in the context in which an AI system’s users are most likely to incur liability: where the output generated by an AI system closely hews to, or outright copies, a copyrighted work within the pool of data on which it was trained, making the output an obvious infringing derivative work or simply an exact copy. This scenario is, unfortunately, more likely where an AI system is used in scientific research. Where a specific or niche area of scientific or academic study is concerned, there may be only a limited amount of scholarly material in existence that could have been used for development. It is conceivable that, out of millions of pieces of data, an AI may have ingested only thousands to hundreds of thousands of articles on a particular scientific subject. In such a case, there is a greater risk that the output it generates may infringe any given training article by copying or lifting portions, or the entirety, of the work verbatim into its response. Where it is known that copyrighted works were ingested without consent, which is the case with many documented AI systems, infringement may even arise from uses of the system that do not in themselves create “substantially similar” output.
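Because the verbatim-copying risk is concrete, a user with access to suspect source texts could run a rough automated screen for exact overlap before relying on an AI system’s output. The following is a minimal sketch of one such check, offered as an illustration only and not a method drawn from any case discussed here: it flags word sequences shared verbatim between an output and a reference document. The function names and the seven-word window are illustrative assumptions, and a clean result does not establish non-infringement (paraphrased copying, for instance, would pass it).

```python
# Minimal sketch (illustrative only): flag verbatim word n-grams shared
# between AI-generated text and a known reference text. A clean result
# does NOT establish non-infringement; this only catches exact overlap.
import re

def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Lowercase the text, strip punctuation, and collect all word n-grams."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_passages(output: str, reference: str, n: int = 7) -> list[str]:
    """Return n-word sequences appearing verbatim in both texts.
    The 7-word window is an arbitrary, illustrative threshold."""
    overlap = word_ngrams(output, n) & word_ngrams(reference, n)
    return [" ".join(gram) for gram in sorted(overlap)]

if __name__ == "__main__":
    ai_output = "The mitochondrion is the powerhouse of the cell, as studies show."
    source_article = "Biologists note that the mitochondrion is the powerhouse of the cell."
    for passage in shared_passages(ai_output, source_article):
        print("Verbatim overlap:", passage)
```

In practice, of course, a user will rarely know what is in an AI system’s training corpus at all, which is itself a symptom of the transparency problem discussed below.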


In fact, analogous legal precedent already bears this risk out. Numerous photocopying cases from the early to mid-1990s, for example, Basic Books, Inc. v. Kinko’s Graphics Corp., 758 F. Supp. 1522 (S.D.N.Y. 1991), and American Geophysical Union v. Texaco Inc., 60 F.3d 913 (2d Cir. 1994), involved the wholesale and systematic copying of, in the first instance, excerpts from copyrighted books for use in college course packets and, in the second, articles from scientific and technical journals for reference use in further scientific research; both courts held these to be unprotected, infringing practices. As in Kinko’s and Texaco, the use of generative AI in a systematic or institutional business setting portends a finding of infringement. As such, it is important to be aware that AI technology may be limited in its capacity to generate truly “new” content, with infringement liability attaching to scenarios where its output lifts directly from the copyrighted works on which it was trained, and even where it does not.

However, infringement is not the only risk users assume in integrating AI technology into the workplace. Hallucinations and bias raise additional technical and ethical concerns. AI “hallucinations” occur when an AI system generates false information, often confidently and seamlessly, in the guise of fluent, coherent text that sounds just as human as the correct answers the system yields. Hallucinations may arise from poor data quality, training deficiencies, unclear prompts, or even bias introduced by the system’s own previous outputs. AI bias, in turn, results from biases inherent in a system’s training data, for example, a facial-recognition algorithm that is unwittingly trained to recognize white faces more easily than black faces because it was given more examples of the former, which the AI system ultimately perpetuates. In either event, the AI system, far from acting as a conduit to valid and accurate results, integrates its own errors into its output. Users should treat these risks, particularly with the current generation of ChatGPT and other AI systems, as a very real possibility.
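To make the training-data point concrete, here is a minimal, self-contained sketch, using synthetic data and a simple classifier rather than any real facial-recognition system, of how an imbalanced training set can yield a model that performs well for the overrepresented group and poorly for the underrepresented one. The group definitions, sample sizes, and model choice are all illustrative assumptions.

```python
# Minimal sketch (synthetic data; not any real AI system): a training set
# dominated by one group produces a model that is accurate for that group
# and much less accurate for the underrepresented group.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_group(n: int, shift: float):
    """Hypothetical group whose features cluster around `shift`; the true
    label depends on the features relative to that group's own center."""
    X = rng.normal(loc=shift, scale=1.0, size=(n, 2))
    y = (X[:, 0] + X[:, 1] + rng.normal(0.0, 0.5, n) > 2 * shift).astype(int)
    return X, y

# Imbalanced training data: 950 examples from group A, only 50 from group B.
Xa, ya = make_group(950, shift=0.0)
Xb, yb = make_group(50, shift=3.0)
model = LogisticRegression().fit(np.vstack([Xa, Xb]), np.concatenate([ya, yb]))

# Evaluating on balanced, held-out samples exposes the disparity: the model
# has effectively learned group A's decision boundary, not group B's.
for name, shift in [("group A", 0.0), ("group B", 3.0)]:
    X_test, y_test = make_group(1000, shift)
    print(f"{name} accuracy: {model.score(X_test, y_test):.2f}")
```

The point of the sketch is simply that the skew is invisible if one never evaluates the groups separately, which is why the composition of training data matters.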

Ultimately, across the range of potential pitfalls attending the incorporation of AI technology into users’ companies and business practices, the issues of hallucinations, bias, infringing output, and unlicensed training data all derive from the current lack of transparency surrounding the training of some AI systems. Given the current difficulty of understanding what material an AI has been fed, how that material was obtained, and what safeguards, if any, have been put in place to verify the quality and breadth of the material itself, it is important to equip yourself with an understanding of the risks that may arise from using AI. Although this era-shifting technology presents a wealth of possibilities across industries and fields, like any invention, its potential also comes with real-world drawbacks and flaws. Equipped with knowledge of exactly where those flaws lie, users can proceed with a better sense of AI’s true capacities, weighing the risks involved so as to implement these systems in a more measured and secure way.


October 13, 2023