How Cornerstone AI is making data ‘right’ for healthcare industry

AI has the potential to transform healthcare. Be it predicting the risk of terminal diseases or developing novel drugs, companies are leveraging data-driven algorithms to improve the quality of patient care in every way possible. The use cases are only expected to grow from here, but there are also certain hurdles along the way. Case in point: The lack of high-quality datasets.

Health organizations cumulatively generate about 300 petabytes of data every single day. This information is stored across systems but is not used effectively because it is poorly prepared. Data teams, which tend to write manual rules for data cleanup, are struggling to keep up with the growing volumes of information. They spend most of their time, almost 80%, getting the data ready (making it accurate, connected and standardized) rather than exploring and analyzing it for potentially life-saving AI applications.

Cornerstone AI’s comprehensive solution

To solve this, San Francisco-based Cornerstone AI has launched a solution that automatically characterizes, harmonizes and cleans healthcare data in a fraction of the time taken by traditional methods. The company also announced it has raised $5 million in seed funding.

According to Cornerstone, its platform's algorithm uses a combination of custom Python and R code to scan each table and data point, inferring their structure and validity, and then organizes the tables for analysis while removing and correcting all notable errors.

“A data team doesn’t have to configure anything in the system other than telling it what the patient ID field is. The system learns the data structure automatically and then learns the patterns in the data automatically. Data teams can be up and running in the system on the first day, reviewing the AI findings in the UI,” said Michael Elashoff, the cofounder and CEO of the company.

After correction, the findings are shared as part of a data quality report.
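
To make the idea of inferring structure and validity concrete, here is a minimal sketch in plain Python. It is not Cornerstone's algorithm, which the company has not published; the type categories, the function names (`classify`, `profile_column`, `profile_table`) and the sample rows are all hypothetical. It simply guesses the dominant type of each column and flags cells that disagree with it, which is the kind of finding a data quality report might list.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample table; every value arrives as a raw string.
SAMPLE_ROWS = [
    {"patient_id": "P001", "visit_date": "2021-03-04", "heart_rate": "72"},
    {"patient_id": "P002", "visit_date": "2021-03-05", "heart_rate": "n/a"},
    {"patient_id": "P003", "visit_date": "03/06/2021", "heart_rate": "68"},
    {"patient_id": "P004", "visit_date": "2021-03-07", "heart_rate": ""},
]


def classify(value: str) -> str:
    """Guess a coarse type for one raw cell (illustrative categories only)."""
    text = value.strip()
    if text == "":
        return "missing"
    try:
        float(text)
        return "number"
    except ValueError:
        pass
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return "date"
    except ValueError:
        return "text"


def profile_column(values: list[str]) -> dict:
    """Infer the dominant type of a column and flag cells that disagree."""
    kinds = [classify(v) for v in values]
    counts = Counter(k for k in kinds if k != "missing")
    dominant = counts.most_common(1)[0][0] if counts else "missing"
    suspects = [i for i, k in enumerate(kinds) if k not in (dominant, "missing")]
    return {
        "inferred_type": dominant,
        "missing": kinds.count("missing"),
        "suspect_rows": suspects,
    }


def profile_table(rows: list[dict]) -> dict:
    """Profile every column, using the first row's keys as the schema."""
    columns = list(rows[0].keys()) if rows else []
    return {c: profile_column([r.get(c, "") for r in rows]) for c in columns}


report = profile_table(SAMPLE_ROWS)
```

In this toy table, the sketch would flag the `03/06/2021` date and the `n/a` heart rate as suspect and count the empty heart rate as missing. A production system would have to learn such patterns from the data rather than rely on fixed rules like these.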


While the company is still in its early stages, it has deployed its solution with quite a few healthcare companies. In one case, a medical device company that used to spend six months on data cleaning sped up the process 20-fold, to just nine days. The system already covers the full range of structured and semi-structured healthcare data, including medical records, clinical trials, registry data, claims, digital health and sensor data.

“In a recent validation study we did, the system identified 98% of data issues, with a specificity of approximately 99.9%,” the CEO added, claiming that the platform can run 750 million records in about two hours.
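
For readers unfamiliar with these metrics, the short sketch below shows how sensitivity (the share of real issues caught) and specificity (the share of clean values correctly left alone) are computed, and what the quoted throughput works out to per second. The confusion counts are hypothetical, chosen only to reproduce the quoted percentages; they are not figures from Cornerstone's study.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of real data issues that the system flagged."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of clean values that the system correctly left unflagged."""
    return true_neg / (true_neg + false_pos)


def records_per_second(records: int, hours: float) -> float:
    """Average throughput implied by a record count and a run time."""
    return records / (hours * 3600)


# Hypothetical validation set: 1,000 real issues among 1,000,000 clean values.
sens = sensitivity(true_pos=980, false_neg=20)          # about 0.98
spec = specificity(true_neg=999_000, false_pos=1_000)   # about 0.999

# Throughput claimed in the article: 750 million records in about two hours.
rate = records_per_second(750_000_000, 2)               # roughly 104,000 per second
```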

He also clarified that unstructured information, such as faxes or pathology reports, remains outside the scope of the platform, at least as of now.

Plan ahead

With this round of funding, which was led by Healthy Ventures, Cornerstone plans to continue developing its product and sign more customers, potentially on long-term contracts.

“Customers have told us that the machine learning (ML) models that our system builds for data cleaning have applications beyond getting a high-quality dataset,” Elashoff said. “For example, the patterns and relationships the system identifies can be used to identify patients whose treatment response or surgical recovery differs from what it should be. In those cases, the software is identifying potential clinical insights that may have been hidden in the data complexity. So, we are using the funding to build out this functionality to enable companies to get a lot more out of their data.”

Other notable players in the data cleaning and preparation space are Datadog and New Relic, but unlike Cornerstone AI, they are not specific to the healthcare industry.

“We developed the algorithm specifically to work with medical data, with its high complexity and high error rate. We had to develop brand-new ML techniques to have our models not be thrown off by the very errors we are trying to detect,” the CEO emphasized. 

Beyond this, unlike other systems, the company’s platform generates an explanation for every issue it finds and provides a built-in regulatory-grade audit trail tracking all the changes.

Originally appeared on: TheSpuzz