Elucidata’s MLOps platform boosts data quality for drug discovery

From young startups to major conglomerates, almost every organization in the life sciences industry is looking at AI to drive R&D and come up with novel drugs and therapies. The effort revolves around training predictive models on massive datasets pertaining to the problem at hand. But, for many organizations, gathering high-quality data continues to be a major problem.

Essentially, most scientists need to work with multi-omics, bio-assays, clinical, EHR and other forms of biomedical data that are usually stored in multiple systems within their organizations or sourced externally. This kind of information is not only siloed but also so diverse that it becomes difficult to base an accurate predictive model upon it.

“Deciphering insights from biomedical data is at the heart of addressing the world’s most important breakthroughs in biopharmaceuticals,” Ashish Venkataramani, partner at Eight Roads Ventures, notes. “There is an explosion in the generation of these complex, heterogeneous datasets, driven by innovations in sequencing technologies and the proliferation of connected devices.”

Elucidata solves data quality challenge

To help life sciences organizations make the most of this opportunity, without worrying about the heterogeneous or siloed nature of the data, Massachusetts-based Elucidata offers an MLOps platform called Polly. The solution gives R&D teams access to clean, curated biomolecular data that can be queried and analyzed through a graphical user interface (GUI) or programmatically. Elucidata today raised $16 million in a Series A round of funding.


According to the company, Polly gathers, transforms and harmonizes data into standardized machine-readable formats, enabling enterprises to use it for their machine learning applications. The solution has access to 70TB of ML-ready biomedical data, or more than 1.5 million datasets and 4.1 million samples, from more than 30 public and proprietary sources. This covers more than 26 biological data types.

The platform uses connectors to pull data from sources such as TCGA and Gene Expression Omnibus (GEO). Then, it converts the information into flat files and adds ontology-backed metadata and labels to fine-tune it into curated biomedical data, ready for machine learning. Finally, the cleaned information is made available in the customer’s data warehouse.
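The flow described above (pull from sources, attach ontology-backed labels, write flat files) can be illustrated with a minimal Python sketch. This is not Polly's actual code or API. The source names, the `harmonize` and `to_flat_file` helpers, the sample records and the `EX:` term IDs are all hypothetical placeholders chosen for illustration.

```python
import csv
import io

# Hypothetical ontology lookup: free-text label -> (term ID, canonical label).
# The "EX:" IDs are placeholders, not identifiers from a real ontology.
TISSUE_ONTOLOGY = {
    "liver": ("EX:0001", "liver"),
    "hepatic tissue": ("EX:0001", "liver"),
    "lung": ("EX:0002", "lung"),
    "pulmonary": ("EX:0002", "lung"),
}


def harmonize(record, source):
    """Map one raw sample record onto a standardized, labeled row."""
    raw = record.get("tissue", "").strip().lower()
    # Labels missing from the lookup are flagged rather than guessed.
    term_id, label = TISSUE_ONTOLOGY.get(raw, ("", "unmapped"))
    return {
        "sample_id": record["id"],
        "source": source,
        "tissue_label": label,
        "tissue_term_id": term_id,
    }


def to_flat_file(rows):
    """Serialize harmonized rows as CSV text with a fixed column order."""
    fields = ["sample_id", "source", "tissue_label", "tissue_term_id"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Stand-ins for what two connectors might return; the labels are inconsistent
# on purpose to show why harmonization is needed.
raw_batches = {
    "source_a": [
        {"id": "S1", "tissue": "Hepatic tissue"},
        {"id": "S2", "tissue": " lung "},
    ],
    "source_b": [
        {"id": "S3", "tissue": "pulmonary"},
        {"id": "S4", "tissue": "unknown"},
    ],
}

rows = [
    harmonize(rec, src)
    for src, recs in raw_batches.items()
    for rec in recs
]
flat = to_flat_file(rows)
print(flat)
```

In this sketch, samples labeled "Hepatic tissue" and "pulmonary" end up under the same canonical terms as "liver" and "lung", so a downstream model sees one vocabulary regardless of which source a sample came from. Unrecognized labels are marked `unmapped` so they can be reviewed instead of silently dropped.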

“Organizations often underestimate the importance of data quality, and as a result, a lot of AI/ML initiatives are compromised. We’re on a mission to derisk such initiatives in life sciences R&D by empowering them with high-quality biomedical data at every stage of the R&D process,” Abhishek Jha, CEO and co-founder of Elucidata, said. 


Elucidata says it can serve companies at various stages of the drug discovery process. The company says Polly has already been adopted by more than 30 life sciences industry players, including Genentech, Pfizer and Janssen, as well as research institutes like Stanford and the Bill & Melinda Gates Foundation. The platform can not only accelerate the pace of implementing AI initiatives but also reduce the time to downstream analysis by up to 70%, the company adds.

With this round of funding, which was led by Eight Roads Ventures, Elucidata will focus on deepening its product capabilities in translational drug research and allied markets, scaling go-to-market initiatives and accelerating the global expansion of operations. 

“When it comes to AI projects, the quality of labeled data can play a vital role in the difference between success and failure,” Nihal Sinha, MD and partner at F-Prime Capital, which participated in the round, said in a statement. 

“Elucidata is providing life science companies access to high-quality datasets and, as a result, effectively accelerating their R&D efforts towards developing innovative solutions that can improve human health,” Sinha added.

According to RBC Capital Markets, the compound annual growth rate of healthcare data will reach 36% by 2025. That is 6% faster than manufacturing, 10% faster than financial services and 11% faster than media and entertainment.

Originally appeared on: TheSpuzz