As creators and IP holders argue with generative AI companies over the correct protocol for using data to train generative AI systems, a new non-profit, Fairly Trained, is offering certifications to companies that train their generative AI models on "consented" data.
"We believe consumers deserve to know which companies think creator consent is important and which don't. So, we certify AI companies that don't use any copyrighted work without a license," the company said on its homepage.
"Fairly Trained exists to make it clear which companies take a more consent-based approach to training, and are therefore treating creators more fairly," the company added, explaining that it came into being after identifying the emerging divide between two types of generative AI companies - those who get the consent of training data providers, and those who don't, claiming they have no legal obligation to do so.
Fairly Trained is led by CEO Ed Newton-Rex, who previously served as vice president for audio at Stability AI. Newton-Rex, according to a Bloomberg report, quit Stability AI after raising concerns over the use of copyrighted data for training generative AI systems.
Advisers to the firm include Tom Gruber, co-founder and CTO of Siri, and Maria Pallante, president and CEO of the Association of American Publishers.
Currently, the firm is offering a single certification, which it has named L Certification or the Licensed Model Certification. This certification can be obtained by any generative AI system provider who has used "consented" data to train its systems.
To get the certification, the company signing up for it must ensure that all of its training data meets certain prerequisites. First, the data must be provided to the model developer for use as training data under a contractual agreement with a party that has the rights required to enter into such an agreement.
Second, the data used for training must be available under an open license for appropriate usage, be in the public domain globally, or be fully owned by the model developer.
"Obtaining a license from an organization that itself licenses from creators (e.g. a record label or a stock image library) is considered consent for certification purposes," the company said on its portal.
Any models used to generate any synthetic data to train generative AI systems should also follow the same protocols, it added.
To complete the sign-up for the certification, companies need to have a robust data due diligence process in place and maintain records of the training data used to train each model.
Any company providing generative AI systems or large language models can start the application process by filling out a short online form, after which Fairly Trained gets in touch with the company to take it through the submission process.
"When you send us your written submission, you pay the submission fee; we then review your submission, potentially asking for further information," the company said on its portal.
If the submission is successful, the company is expected to pay an annual certification fee, ranging from $500 to $6,000 depending on its revenue, to Fairly Trained before it is issued the certificate.
Fairly Trained also warned that if a company that has already been issued the certificate changes its training data practices in a way that violates its rules or categories, the certification will be rescinded.
"We reserve the right to withdraw certification without reimbursement if new information comes to light regarding your AI practices that would change the outcome of your certification," the company said on its portal.
So far, eight startups have been certified by Fairly Trained.