Generative AI and Machine Learning Contracts

Generative AI and machine learning (ML) systems present contracting challenges rarely seen in traditional IT contracts.

As with all successful tech contracts, the parties entering the contract (and their lawyers) need to understand the technology to ensure they are best protecting their position and accepting risk they can manage.

To do this we must ensure we understand the way the AI tool is offered to market (usually cloud based), the inputs and outputs for the AI, the training data and any intended use of the tool by the parties. We also need to keep on top of the multiple new models and tools entering the market.

This article considers some issues inherent in generative AI and ML contracts, and strategies for addressing them effectively.

Data Ownership

AI and ML models usually ‘learn from’ large data sets. This often includes prompts and training data provided by customers.

For customers, it is essential to know whether the provider will reuse the customer's prompts and data for its other clients. Customers should confirm in the contract any IP rights they hold in the data, and make sure they understand the scope of the provider's intended use of the data. The sensitivity of data sets, and the presence of any customer or employee personal information that could be accessed by the provider, should be considered and addressed in the contract.

Providers of the AI tech generally must obtain a licence to use the customer's prompts and training data. A specific licence should also be sought for any intended use of the customer data that goes beyond the provider's supply of the services and tech to the customer. Providers should avoid any assignment of the IP in their existing data sets and tech.

Remember there are various types of data that have value and must be considered: training data (used to train an algorithm or machine learning model), inputs/prompts (data fed into the AI), and outputs (data produced by the AI in response to inputs).

Data Use

Some contracts, however, focus too much on data ownership – overlooking that data itself cannot always be owned (although a compilation of it can be) and that IP ownership alone may not restrict the provider's or customer's use of the data.

A broader set of data rights must be considered. Data exploitation rights, confidentiality obligations, and periods of use will be significant factors in negotiations. A thorough understanding of the parties’ commercial requirements enhances the likelihood of achieving a mutually beneficial and balanced agreement.

A common example: the customer wants to protect its data, but the provider wants to offer the AI system – which has been trained on the customer's data – to its other customers. This could be addressed by the provider undertaking that the training data provided by the customer will not be disclosed to new customers. The provider may also give the customer comfort via confidentiality undertakings, data protection measures, protocols for data breaches, and damages if the provider breaches the contract.

Confidentiality

Customers will likely want to restrict which, and how many, of the provider's personnel can access the customer's prompts, training data and outputs. Treating outputs as confidential until their sensitivity is known is advisable. On the other hand, providers should aim to limit confidentiality obligations so they can make wide use of the outputs of their tech and, if required, the customer's training data and prompts.

Inaccuracies

Generative AI and ML outputs carry the risk of errors, potentially causing harm to customers if relied on. Providers can include broad disclaimers to mitigate this liability. Customers should seek assurances regarding the accuracy of outputs (to the extent required) and other errors that could limit the usefulness of the tech for their purposes – for example, discrimination issues embedded in the model. These assurances can be set out with the other requirements in the service level agreements (SLAs) covering AI functionality.

Third-Party Risks

There is a risk that generative AI outputs could expose customers to third-party actions related to IP infringement. Customers should seek indemnification clauses. At a minimum, IP indemnities covering the software and outputs should be included – although resistance from providers should be expected. Providers, for their part, will seek to include language acknowledging their limited control over AI-generated outputs.

An understanding of the technology and the data involved is critical to negotiating generative AI and machine learning contracts. Parties and their lawyers must understand the data involved and set clear boundaries on its use.
