We all want our businesses to adopt the latest technology, but most of the time we forget about the foundation. In my experience leading a strategic transformation for a local financial institution, the initial executive push was for immediate AI integration to enhance credit decisioning and fraud detection. After our initial engagement, however, I recommended a comprehensive data governance program as the non-negotiable first step. My recommendation was informed by the reality that data governance is the process of turning raw information into an institutional asset, something the firm lacked. Without data governance, an organization is merely maintaining a "digital filing cabinet" rather than running a data-driven business.
What follows is a summary of the methodology I employed to ensure the institution's AI ambitions were built on a stable, compliant, and high-performing foundation.
Phase 1: Assessing Maturity through a Regulatory Lens
We initiated the project by performing a systematic gap analysis of the institution’s IT and data architectures. We chose the BCBS 239 principles as our primary benchmark because they were specifically designed to address inadequacies in risk data aggregation.
Our assessment uncovered a highly fragmented IT landscape. Critical data resided in functional "silos"; for instance, the finance team and the credit risk team used separate systems that did not communicate, leading to inconsistent "versions of the truth". This fragmentation meant that an AI model trained on one silo’s data would be inherently biased or incomplete, creating a systemic risk for the institution.
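To make the silo problem concrete, a minimal sketch of the kind of reconciliation check we ran between two systems holding the "same" data. The system names, field layout, and tolerance are illustrative assumptions, not the firm's actual schema:

```python
# Hypothetical reconciliation between two silos (finance vs. credit risk)
# holding the same customer balances, to surface conflicting "versions of
# the truth". Field names and tolerance are illustrative assumptions.

def reconcile(finance, credit_risk, tolerance=0.01):
    """Return (customer_id, issue) pairs where the two systems disagree."""
    discrepancies = []
    for cid in sorted(set(finance) | set(credit_risk)):
        if cid not in finance or cid not in credit_risk:
            discrepancies.append((cid, "missing in one system"))
        elif abs(finance[cid] - credit_risk[cid]) > tolerance:
            discrepancies.append((cid, "balance mismatch"))
    return discrepancies

finance_balances = {"C001": 1500.00, "C002": 320.50, "C004": 90.00}
risk_balances    = {"C001": 1500.00, "C002": 310.50, "C003": 75.25}

issues = reconcile(finance_balances, risk_balances)
# C002 disagrees on value; C003 and C004 each exist in only one system.
```

Even a toy check like this makes the risk visible: a model trained only on the finance silo would never see C003 at all.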
Phase 2: Centralizing Authority and Accountability
Historically, data ownership in the institution was a byproduct of operations rather than a priority. By establishing a CDO and a Data Governance Council, we created a centralized authority to resolve ownership disputes and drive a data-driven culture. We implemented a data stewardship model in which specific individuals were accountable for data quality throughout its lifecycle. This was critical because real-time AI applications require data that is not only accurate but also has a clear "owner" who can remediate errors before they propagate through automated models.
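The stewardship model can be sketched as a simple registry that always resolves to an accountable owner. The names, domains, and escalation path below are illustrative assumptions, not the institution's actual roster:

```python
from dataclasses import dataclass

# Hypothetical data stewardship registry: each data domain has one named
# steward accountable for quality, with a defined escalation path. All
# names and domains here are illustrative.

@dataclass(frozen=True)
class Steward:
    name: str
    domain: str
    escalation: str  # who resolves disputes the steward cannot

REGISTRY = {
    "customer": Steward("A. Mensah", "customer", "Data Governance Council"),
    "transactions": Steward("B. Osei", "transactions", "Data Governance Council"),
}

def owner_for(domain):
    """Resolve the accountable steward; unowned data is itself a governance gap."""
    steward = REGISTRY.get(domain)
    if steward is None:
        raise LookupError(f"No steward assigned for domain '{domain}'")
    return steward
```

The deliberate `LookupError` reflects the design choice: data with no owner should fail loudly rather than flow silently into a model.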
Phase 3: Developing a Unified Metadata Strategy
A core activity I led was creating a Business Glossary and an integrated taxonomy. We found that business users were often confused by misaligned definitions (e.g., what constitutes an "active account"). We prioritized metadata management by documenting the meaning, purpose, and usage of data as a prerequisite for AI. We chose this approach because high-fidelity training data is only possible if the data is labeled consistently. Without standardized metadata, the data the institution processed would be unsearchable and useless for high-stakes AI decision-making.
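The "active account" example above can be made executable: a Business Glossary entry that carries both a human definition and a single machine-checkable rule every team applies. The 90-day window is an illustrative assumption, not the institution's actual definition:

```python
from datetime import date, timedelta

# Hypothetical Business Glossary entry: one agreed definition of
# "active account" with an executable rule, so every dataset labels the
# field the same way. The 90-day window is illustrative.

GLOSSARY = {
    "active_account": {
        "definition": "Account with at least one transaction in the last 90 days",
        "rule": lambda last_txn, today: (today - last_txn) <= timedelta(days=90),
    }
}

def is_active(last_txn, today):
    """Apply the glossary's single agreed rule instead of per-team variants."""
    return GLOSSARY["active_account"]["rule"](last_txn, today)

is_active(date(2024, 1, 1), date(2024, 2, 1))   # within 90 days -> active
is_active(date(2023, 1, 1), date(2024, 2, 1))   # over a year stale -> inactive
```

Pairing the prose definition with the rule keeps business users and pipelines anchored to the same meaning, which is exactly the consistency labeled training data depends on.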
Phase 4: Establishing Automated Data Quality Gates
Rather than relying on the institution's existing manual remediation, in which staff spent significant time cleaning spreadsheets, we recommended implementing automated data quality gates. We defined measurable dimensions: accuracy, completeness, consistency, validity, and timeliness. This choice was informed by the AI principle of "garbage in, garbage out." Our automated systems were designed to flag or quarantine incoming feeds, such as a sudden, impossible 500% jump in a transaction value, before that data could hit production models and trigger a catastrophic failure.
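A minimal sketch of such a gate, splitting an incoming feed into clean and quarantined records. The field name, baseline, and 500% threshold are illustrative assumptions drawn from the example above, not our production rules:

```python
# Hypothetical quality gate: quarantine records that fail simple
# completeness, validity, or plausibility checks before they reach a model.
# The "amount" field, baseline, and 500% jump threshold are illustrative.

def quality_gate(records, baseline, max_jump=5.0):
    """Split a feed into (clean, quarantined) using basic quality checks."""
    clean, quarantined = [], []
    for rec in records:
        value = rec.get("amount")
        if value is None:                                   # completeness
            quarantined.append(rec)
        elif value < 0:                                     # validity
            quarantined.append(rec)
        elif baseline > 0 and value > baseline * (1 + max_jump):
            quarantined.append(rec)                         # e.g. a 500%+ jump
        else:
            clean.append(rec)
    return clean, quarantined

feed = [{"amount": 120.0}, {"amount": 900.0}, {"amount": None}, {"amount": -5.0}]
clean, bad = quality_gate(feed, baseline=100.0)
# 120.0 passes; 900.0 exceeds the jump threshold; None and -5.0 fail checks.
```

Quarantining, rather than silently dropping, preserves the bad records for the accountable steward to investigate and remediate.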
Phase 5: Implementing End-to-End Data Lineage
The final foundational step was documenting data lineage, which provides a visual trail of data from its origin through every transformation to its final report. We used visualization tools to "stitch" these journeys together, providing the transparency and explainability that regulators now demand. This ensures that the institution can look inside the "black box" of an AI model and understand how it arrived at a particular outcome based on the underlying datasets, which is essential for auditability and building customer trust.
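Under the hood, lineage is a directed graph: each dataset records its upstream sources, and a trace walks back to the origin so an auditor can see every hop behind a model output. A minimal sketch, with node names that are illustrative assumptions rather than the institution's actual pipeline:

```python
# Hypothetical lineage graph: each dataset lists its direct upstream
# sources. Node names are illustrative, not the institution's pipeline.

LINEAGE = {
    "core_banking_feed": [],
    "cleansed_transactions": ["core_banking_feed"],
    "credit_features": ["cleansed_transactions"],
    "credit_risk_model_output": ["credit_features"],
}

def trace(node, graph):
    """Return every upstream dataset contributing to `node`, origin first."""
    upstream = []
    for parent in graph.get(node, []):
        upstream.extend(trace(parent, graph))
        upstream.append(parent)
    return upstream

model_inputs = trace("credit_risk_model_output", LINEAGE)
# -> ["core_banking_feed", "cleansed_transactions", "credit_features"]
```

A dedicated lineage tool does far more (column-level mapping, transformation logic, impact analysis), but the auditable question it answers is exactly this traversal: "which datasets fed this outcome?"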
Strategic Choices
During the project, we made two critical strategic choices:
- GDPR: We prioritized GDPR compliance as our primary privacy framework. This was informed by the fact that GDPR has become the de facto global benchmark for privacy, with nearly 120 countries adopting similar rules. Its focus on purpose limitation and data minimization provided a stricter, more comprehensive foundation for managing the sensitive PII required for AI training.
- The "Cloud-Smart" Approach: Instead of a blanket migration to the public cloud, we recommended a hybrid strategy. This was a reaction to the "sticker shock" of soaring cloud bills and the new Digital Operational Resilience Act (DORA), which requires financial entities to mitigate risks from third-party technology providers. By keeping latency-sensitive trading systems or data subject to strict residency laws on-premises or in private clouds, we ensured both cost-efficiency and regulatory resilience.
This project served as a definitive masterclass in the reality that data governance is not a "plug-and-play" solution but a deeply bespoke strategic engine. We learned that while the temptation to sprint toward AI is high, sound data quality is the non-negotiable foundation for any digitalization project; without it, the "garbage in, garbage out" cycle becomes a systemic risk rather than a minor technical hurdle.
Our most significant takeaway was that data governance programs are unique to the particular business and not directly transferable. There is no one-size-fits-all approach, because the sophistication of governance structures must be strictly tailored to an institution's specific needs, data complexity, and unique regulatory environment.
Ultimately, we discovered that an institution's path to maturity must reflect its specific business model, risk profile, and legacy landscape rather than being forced into a single standard.