San Diego-based Gretel is one of the first companies focused on the commercial use of synthetic data. The two-year-old startup this week announced the general availability of its privacy engineering toolkit, a set of APIs and services that enable users to classify, transform, and generate high-quality synthetic data.
If you don't know what synthetic data is, you have plenty of company. Synthetic data is information that is manufactured artificially rather than generated by real-world events. It is created algorithmically and stands in for production or operational data in test datasets, where it is used to validate mathematical models and, increasingly, to train machine-learning (ML) models. This substitute data helps preserve the privacy of personal information and can save IT departments a great deal of time, trouble, and money in the process.
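To make "created algorithmically" concrete, here is a minimal Python sketch of the idea: learn simple statistics from a real column, then sample new values from a fitted distribution. The function name `synthesize_ages` and the sample data are hypothetical, and the single-column Gaussian fit is an assumption chosen for brevity. It is not Gretel's method, and a toy like this offers no formal privacy guarantee.

```python
import random
import statistics


def synthesize_ages(real_ages, n, seed=0):
    """Sample n synthetic ages from a normal distribution fitted to real_ages."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    mu = statistics.mean(real_ages)
    sigma = statistics.stdev(real_ages)
    # Clamp to the observed range so no synthetic value is implausible.
    lo, hi = min(real_ages), max(real_ages)
    return [min(hi, max(lo, round(rng.gauss(mu, sigma)))) for _ in range(n)]


# Hypothetical "real" column; none of these values is copied on purpose,
# yet the synthetic column keeps roughly the same average.
real_ages = [23, 35, 41, 29, 52, 47, 38, 31, 60, 44]
synthetic_ages = synthesize_ages(real_ages, n=1000)

print("real mean:", statistics.mean(real_ages))
print("synthetic mean:", statistics.mean(synthetic_ages))
```

Production-grade generators model relationships across many columns and record types at once, which is where most of the difficulty lies.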
When ML models are being created, the training data has to be clean. If the real data used to build such models contains errors, duplications, or other hiccups, problems will inevitably surface, costing the company time and money. With more and more artificial intelligence and ML models being deployed across use cases, the need for synthetic data is growing rapidly. Analysts have projected that more synthetic than original data will be used to build ML models by the end of the decade.
Being able to classify, transform, and generate high-quality synthetic data removes the privacy bottlenecks that prevent data sharing and stifle innovation in numerous development and workflow processes, CEO Ali Golshan told ZDNet.
"We've built a privacy toolkit that's accessible to all developers and scalable to any enterprise-ready project," Golshan said. "With Gretel, anyone can classify, anonymize, and synthesize data that's privacy-proven and highly accurate in just a few clicks. Our advanced privacy guarantees also give users complete control to adjust data privacy levels, based on their project needs, and guard synthetic data against adversarial attacks."
Golshan said the company has tested its products in an open beta program for more than a year. It has incorporated improvements to its toolkit based on feedback from more than 60 enterprise engagements, a community of thousands of users, and open-source users who have downloaded the SDK more than 70,000 times, according to the company.
Gretel has been working with organizations across several vertical industries, Golshan said, including health care, life sciences, finance, and gaming. Some of its recent work includes creating synthetic genomic data and synthetic time-series banking data.
Interest in Gretel's privacy engineering tools is supported by analysts' forecasts that by 2030, synthetic data will completely overshadow real data in AI models, Golshan said.
"By building flexible, secure, and easy-to-deploy tools to support data-driven developers, Gretel will open a world of progress across industries," said Max Wessel, Executive Vice President & Chief Learning Officer at SAP.
Gretel's all-in-one privacy stack consists of engineering tools that:
Create highly accurate, privacy-proven synthetic data
Seed pre-production systems with safe, statistically accurate datasets
Identify and remove sensitive data to reduce PII-related risks
Augment and de-bias datasets to train ML/AI models fairly
Anonymize sensitive data in real time, for data at scale
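As an illustration of the "identify and remove sensitive data" item above, here is a minimal Python sketch that uses only the standard `re` module. The pattern table `PII_PATTERNS`, the function `redact`, and the sample record are all hypothetical. This is a generic regex approach, not Gretel's API or classifier, and it recognizes only a few rigid formats.

```python
import re

# Hypothetical patterns for a few common US-style identifiers.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def redact(text):
    """Replace every match of each pattern with a bracketed label."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


record = "Contact Jane at jane.doe@example.com or 619-555-0134; SSN 123-45-6789."
print(redact(record))
# Prints: Contact Jane at [EMAIL] or [PHONE]; SSN [SSN].
```

Note that the name "Jane" passes through untouched. Fixed patterns miss free-form identifiers such as names and addresses, which is why dedicated tools typically pair rules with learned entity recognition.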
Gretel is also previewing an AWS S3 storage connector for its toolkit. Gretel's services can be accessed through its SaaS cloud offering or CLI for local environments.