Expert Insights on Leveraging Data Quality and Data Governance for AI Readiness

Author: Sue Pittacora


In a recent CDO Magazine interview, Sue Pittacora, Chief Strategy Officer at Wavicle Data Solutions, sat down with Sarang Bapat, Director of Data Governance at Mitsubishi Electric Trane US (METUS), for an in-depth discussion about the importance of maintaining data quality for AI applications.

  

Continuing from the first part of the interview, this conversation between Sue and Sarang dives into how data quality initiatives align with and support a company’s overall data governance strategy.  

Throughout the discussion, Sarang shares how to maintain the accuracy of the data used in AI systems to achieve effective and trustworthy AI outcomes.  

Watch the full interview here or scroll down for a detailed transcript of Sarang’s insights.

 

 

Sue: Hello and welcome to the CDO Magazine interview series. I am Sue Pittacora with Wavicle Data Solutions, and I’m delighted to be joined here today by Sarang Bapat, the Data Governance Director at Mitsubishi Electric Trane US, otherwise known as METUS. Welcome, Sarang.  

 

Sarang: Thank you, Sue.

 

What are the current challenges around data compliance in your industry, and how are you addressing them? 

Sarang: In the financial services and regulatory domain, I have sat through regulatory exams and provided a lot of evidence to the Fed and the OCC. I’ll give you an example from the anti-money-laundering space that I worked in previously. Some of the big things regulators want to know are how you came to assess your risk in the first place and what the drivers of your risk and risk appetite are.  

 

In anti-money-laundering domains, simple things like the geography of the customer (is the customer domestic or international?) matter a lot. Depending on your bank accounts abroad and the whole chain of where money is coming from, it’s considered risky right off the bat if your business is all over the world.  

 

You’re pretty familiar with that from McDonald’s, right? You’re all over the world, so that makes it very risky. Another driver is the type of products you offer. If it’s a basic checking or savings account, it’s not that bad. But if you’re talking about wire transfers, stored-value cards, and money orders, those are perfect vehicles for people to launder money. If you have a bigger portfolio of those, your risk assessment is going to be a lot more rigorous. Same thing with the type of customer: is it personal or commercial? If a lot of the entities that do business with you are shell companies, and there is no visibility into them, regulators will not look at that favorably. 

 

If you say these are the elements that drove our risk engine, they respond, “So now, tell me: is this data coming from your systems? If so, where is the data coming from, and where is it going? Who owns it when issues come up? What does that lineage look like, how good is the quality of that data, and how do you know?” 

 

All these questions are about data lineage: where data is coming from, where it is going, and who is responsible for it. That is data stewardship. If you have already set that up, there is a clear delineation to say, “here is our finance data model, and the data stewards are responsible for the credit, compliance, and other parts of the business.” It’s an easy explanation, and with readily available information, you can show regulators that you have thought about all of that. 

 

Same thing with quality, lineage, stewardship, and the controls you have put in place. That’s another question regulators are going to ask: “Where are those controls?” That, too, is part of data governance. With a well-thought-out data governance strategy, all of these things become manageable within the regulatory landscape. Even if there are holes, it is manageable because you’re doing things in a predictable way. Might you miss a thing or two based on their observations? Maybe, but you already have the framework; some tweaking of that existing framework might be required. That’s where a data governance program comes in handy.  

 

Sue: Absolutely.

 

What is the role of AI in data governance at METUS? And how are they coming together?  

Sarang: We are at the infancy of our journey. Everybody’s figuring this out. In the true sense, we don’t know how it would help or what the potential use cases might be. Consider an analogy: you ask someone, “Do you invest in mutual funds?” and they say, “No, I don’t do that.” Then you ask, “So, do you have a 401(k)?” “Yes, I do.” Then you’re already investing in mutual funds through it; you just don’t know it. 

 

If I’m using my data governance tools today, I already have an AI component built in. I have adaptive rules running that learn my data behind the scenes. I get observations such as: the tool found four different shapes, meaning formats, for your zip code. Sometimes it’s a five-digit zip, sometimes five digits plus four with a dash, or the five and four run together. The machine observed that there are four different types of zip codes in that data. Is that right? Or, behind the scenes, it finds that a field is full of duplicates. I don’t know what the field is, but the machine has learned that the field has tons of duplicates, and it does all of that based on what it learns from the data runs we do every week. After eight or nine of those runs, on the tenth run, it starts to show you those behavior patterns because it has learned enough about your data to expose them to you.  
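As an editorial illustration (not METUS’s actual tooling), the kind of “shape” profiling Sarang describes can be sketched in a few lines of Python. The sample values and shape labels here are invented for demonstration:

```python
import re
from collections import Counter

# Hypothetical zip code values with inconsistent formats,
# like the different "shapes" a profiling tool might surface.
zip_codes = ["60601", "60601-1234", "606011234", "60601 1234", "60601"]

def zip_shape(value):
    """Classify a zip code string into a format 'shape'."""
    if re.fullmatch(r"\d{5}", value):
        return "5-digit"
    if re.fullmatch(r"\d{5}-\d{4}", value):
        return "5+4 with dash"
    if re.fullmatch(r"\d{9}", value):
        return "9 digits run together"
    return "other"

# Count how many values fall into each shape, the way an
# adaptive rule would accumulate observations across runs.
shape_counts = Counter(zip_shape(z) for z in zip_codes)
print(shape_counts)
```

A real data quality tool learns these patterns over repeated runs rather than from hard-coded regexes, but the underlying observation it reports back to a steward is the same: several incompatible formats coexisting in one field.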

 

Apart from the custom rules that you write and know about, we always tell our data stewards, “Look at the rules you created, but at the same time, look at everything the machine found,” which is otherwise very difficult for a human to do.  

 

Data is large; machines are learning and making all kinds of connections behind the scenes. Not all of that is necessarily actionable, but it’s something to consider and worth looking into. I would say everybody has AI built into their ecosystem; they just don’t know it sometimes. 

 

Sue: That’s a great point.

 

How do you ensure data quality in AI, and how does this fit into your data governance strategy? 

Sarang: We hear this term a lot: what are our LLMs doing? A lot of vendors will tell you, “We’re doing RAG, retrieval-augmented generation.” But guess what? If the retrieval is not foundationally accurate and you’re generating based on that retrieval, you’re augmenting the model with inaccurate information. If the data wasn’t good, augmenting with that inaccurate data is not going to produce a good LLM application. So data quality is vital.  

 

In any AI initiative, when you’re talking about running something through your AI engine, say it’s going to help your advisors by showing them intelligent information about a customer, maybe the next best product based on their history. The most important thing I tell my leaders working on AI is: we need to know what underlying data elements we are using, and we need a quality metric on them at all times to make sure that quality is right. 

 

If you’re using third-party data and bringing that in, it even means working with legal to make sure that when we receive the information, I have the complete glossary for the data being landed and can tell whether it actually meets the right fill rates. All of this becomes part of our ecosystem, on top of which we are building digital products and more intelligent AI algorithms that assume the foundational data is accurate. That’s where you definitely need to open the hood and see what three or four key things your RAG algorithm is going to use. What is it going to augment the data with? For example, say these are the manuals our agents were going and looking at, and the answers were wrong. It turns out to be a data quality problem: the product was deactivated, and the flag didn’t get set, which caused all the issues. The algorithm is not going to notice that, because the assumption is that the base data is accurate.   
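The deactivated-product scenario can be illustrated with a toy “retrieval” step over invented product records (the SKUs, field names, and data here are hypothetical; real RAG pipelines use vector search and an LLM, but the failure mode is the same):

```python
# Hypothetical product records feeding a retrieval step.
products = [
    {"sku": "HP-100", "name": "Heat Pump 100", "active": True},
    # This product was deactivated, but the flag was never updated:
    # a data quality defect the downstream generation step cannot see.
    {"sku": "HP-200", "name": "Heat Pump 200", "active": True},
]

def retrieve_recommendable(records):
    """Retrieval trusts the base data: it simply filters on 'active'."""
    return [r["sku"] for r in records if r["active"]]

# The pipeline happily augments its answer with HP-200,
# even though that product no longer exists.
print(retrieve_recommendable(products))
```

No amount of sophistication in the generation step can detect that HP-200’s flag is stale, which is Sarang’s point: the quality metric has to sit on the underlying data elements, upstream of the AI.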

 

Sue: It sounds like you do a fair bit of prioritizing of your data.  

 

Sarang: Absolutely.  

 

Sue: Sarang, thank you so much. It was such a pleasure to meet with you today; great insights. To the rest of everyone, please visit cdomagazine.tech for additional interviews. Thank you.  

 

Sarang: Thank you so much, Sue. 

 

This is the second part of a three-part interview with Sarang Bapat covering data quality and data governance strategy topics. Stay tuned for the release of the third part, coming soon, or learn more about our data governance thought leadership here. 

 

Ready to get started on your data governance journey? Get in touch with Wavicle’s experts.