A blueprint for the perfect Gen AI data layer: Insights from Intuit



In VentureBeat’s reporting on generative AI, one company in particular has stood out among enterprise companies for its speed and adeptness at deploying the technology at scale.

That company is Intuit. In September, Intuit introduced an LLM-driven assistant, called Intuit Assist, across all of its products, including TurboTax, QuickBooks, Credit Karma and Mailchimp. In June, it announced its own Gen AI operating system, which orchestrates large language model (LLM) activity across the entire company. It is a complete vision that, as far as I'm aware, came well before that of any other major company.

I recently interviewed Alon Amit, Intuit’s VP of Product Management, about what is arguably the most important part of any company’s journey to realizing Gen AI success: building a best-practice data management layer.

Amit explains that Intuit took several years to work through this data layer, to make sure its data was well integrated, accurate, governed, and non-replicated. Only after doing this were LLMs able to call upon that data to allow personalized interactions with Intuit’s 100 million small business and consumer customers.


During the interview, Amit presented a single slide depicting Intuit’s data layer. The slide indicates the best practice of how a data layer should look, at least according to Intuit.

If you’re an enterprise data leader, I encourage you to click on the video link above, because Amit walks us through, step by step, the most important areas the company is working on, including the areas it needs to perfect in 2024. (The interview was part of our AI Unleashed event; the event’s full video is included above.)

Here are some quick notes, based on what stood out for me:

1. The Data Map Registry: Intuit built this universal repository for every single data asset, real-time and batch, that gets produced in the company. All data schemas are included. It ensures assets are well governed, including that the owners and purpose of the assets are known. Amit conceded that this process hasn’t been perfected yet, but said Intuit expects to “hit very close to a hundred percent” by the end of next year.

2. Culture of caring about “data as a product”: Aided by this data map, Intuit instilled a culture among its developers, product managers, engineers and others that even beyond the data within products shipped to customers, any data at all that gets generated is considered “product.”

3. Data schema changes are governed uniformly: Any data schemas, whether of click-stream data or of third-party data coming into Intuit’s data ecosystem, are governed the same way, to ensure that changes don’t break downstream data systems, such as those needed to support generative AI. This data inflow, seen on the left side of the chart, includes Intuit’s own “domain events,” which are produced, for example, when Intuit’s developers create an event bus for real-time data flowing from an application. This is all automatically populated within Intuit’s data lake.

4. Governed data derivation: Derivation is a generic term for essentially any transformation happening on data beyond source data. It includes, for example, computations for analytics, extraction of features for AI models, and attributes for marketing campaigns. So if a developer derives a feature that is already in the data registry, they’ll be informed the feature is already there, to avoid duplication. 

5. Real-time data derivation: This is on the roadmap for 2024. Amit was careful to say that the company isn’t done in its quest for perfection. The company is working to build “real time paved paths for data derivation,” or the ability of developers to make sure that when a customer asks a question, or when an expert is offering support, Intuit will know the actions the user takes in near real-time.
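To make points 1, 3 and 4 more concrete, here is a minimal Python sketch of how a registry of this kind could behave. It is not Intuit's implementation, which the interview does not describe at code level. Every name in it (`DataAsset`, `DataMapRegistry`, `find_duplicate_derivation`, `breaking_changes`, and the example assets) is hypothetical, and the rules it applies (an asset needs an owner and a purpose, a derivation is a duplicate if it has the same inputs and the same transformation, a schema change is breaking if it removes or retypes a field) are simplifying assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class DataAsset:
    """Hypothetical registry entry; field names are assumptions."""
    name: str
    owner: str
    purpose: str
    schema: dict                  # field name -> type name
    derived_from: tuple = ()      # names of upstream assets, empty for source data
    transformation: str = ""      # description of the derivation, if any


class DataMapRegistry:
    """Toy in-memory registry illustrating governance checks."""

    def __init__(self):
        self._assets = {}

    def find_duplicate_derivation(self, asset):
        """Return the name of an existing asset with the same inputs and
        transformation, or None. Source assets are never duplicates here."""
        if not asset.derived_from:
            return None
        for existing in self._assets.values():
            if (existing.name != asset.name
                    and set(existing.derived_from) == set(asset.derived_from)
                    and existing.transformation == asset.transformation):
                return existing.name
        return None

    def register(self, asset):
        # Assumed governance rule: owner and purpose must be known.
        if not asset.owner or not asset.purpose:
            raise ValueError("every asset needs a known owner and purpose")
        duplicate = self.find_duplicate_derivation(asset)
        if duplicate is not None:
            raise ValueError(f"derivation already registered as {duplicate!r}")
        self._assets[asset.name] = asset

    def breaking_changes(self, name, new_schema):
        """List removed or retyped fields; added fields are treated as safe."""
        old_schema = self._assets[name].schema
        problems = []
        for field_name, type_name in old_schema.items():
            if field_name not in new_schema:
                problems.append(f"removed field: {field_name}")
            elif new_schema[field_name] != type_name:
                problems.append(f"type changed: {field_name}")
        return problems


# Example usage with made-up assets.
registry = DataMapRegistry()
registry.register(DataAsset(
    name="invoice_events",
    owner="billing-team",
    purpose="raw invoice domain events",
    schema={"invoice_id": "string", "amount": "double"},
))
registry.register(DataAsset(
    name="avg_invoice_amount",
    owner="ml-team",
    purpose="model feature",
    schema={"customer_id": "string", "avg_amount": "double"},
    derived_from=("invoice_events",),
    transformation="mean(amount) by customer",
))
```

In this sketch, a second team trying to register the same derivation under a different name gets an error that points to the existing asset, and a proposed schema change can be checked with `breaking_changes` before it reaches downstream consumers. A production system would need far richer matching and compatibility rules than these.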

Originally appeared on: TheSpuzz
