How AI is changing the data industry 

Take a single office building on any given morning.

On the third floor, a data engineer is building a transformation model in dbt. She has an AI tool open alongside her code editor, and it is helping her write faster and cleaner. She has used it on every project for the past year. Her manager knows. The client whose data she is working with does not.

One floor up, a consultant is preparing for a board presentation. He has been using ChatGPT to help him think through the framing: testing different angles, sharpening the argument, and working out which version of the story lands best. It has saved him two hours, and it has not occurred to him to mention it to anyone.

And in an office at the top of the building, the head of data is in an early conversation with a vendor about connecting an LLM directly to the company’s data warehouse. 

Three people in the same building, on the same morning, with three fundamentally different relationships with AI. And in all likelihood, no shared conversation about any of it.

This is where most companies find themselves right now. AI has moved into the data stack quickly, and the understanding of what that actually means, in terms of governance, risk, and infrastructure, has not kept pace. 

Not all AI use is created equal

The three people in that building are not doing the same thing with AI, even if it might look that way from the outside. 

The differences between their use cases are in maturity, infrastructure, risk, and governance. Treating them as variations of the same thing produces exactly the kind of vague, one-size-fits-all thinking that leads to poor decisions. 

At i-spark, we think about these differences across three distinct categories, because understanding them clearly is a far more useful starting point for any business making deliberate choices about where and how AI fits into their work.

Category one: ideation and text

This is where most companies begin, and where many still are. 

Using an LLM as a thinking partner: a way to test ideas, stress-test arguments, draft something faster, or get to a better version of a document. The interaction is conversational and unstructured, and the value is immediate enough that most people who try it once keep using it. It has spread largely on its own momentum, with people adopting it individually long before any formal policy caught up with them.

The governance implications at this level are relatively light, provided sensitive data does not find its way into prompts. Getting started requires awareness and a shared understanding within a team of what is appropriate to put into a prompt and what is not. 

That is a conversation worth having early, but it is not a complicated one. The threshold to benefit is low, and for most, the value is already being realised, whether or not anyone has officially sanctioned it.
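That shared understanding of what belongs in a prompt can even be made partly mechanical. As a minimal sketch, assuming a team's own (hypothetical) pattern list and a made-up `CUST-` identifier format, a small redaction step could mask sensitive values before a prompt leaves the building:

```python
import re

# Illustrative only: these patterns and the redact() helper are our own
# sketch, not part of any specific tool or policy. The CUST-###### format
# is an assumed internal customer ID.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{10,30}\b"),
    "CUSTOMER_ID": re.compile(r"\bCUST-\d{6}\b"),
}

def redact(prompt: str) -> str:
    """Replace sensitive values with labelled placeholders."""
    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(f"<{label}>", prompt)
    return prompt

print(redact("Summarise the complaint from jan@example.com about CUST-123456"))
# → Summarise the complaint from <EMAIL> about <CUSTOMER_ID>
```

A filter like this never replaces the team conversation; it just turns one outcome of that conversation into a habit that survives a busy morning.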

Category two: code generation and AI-assisted development

The nature of what AI does changes considerably at this level, and the governance conversation becomes both more complex and more consequential. 

AI tools are embedded directly into the development environment itself, working alongside engineers as they write dbt models, generate SQL, review Power BI code, and navigate complex data pipelines with a level of speed and accuracy that would have been difficult to achieve otherwise. 

Tools like Claude Code have pushed this further still, beyond generating isolated snippets of code toward actively working within an entire codebase, understanding context across multiple files, and suggesting changes to accelerate delivery.

Here, the AI tool needs access to far more in order to deliver its output.

When an AI tool is working inside your codebase, it is reading your architecture, your business logic, your naming conventions, and in many cases, values that blur the boundary between code and data in ways that are easy to overlook. A Power BI file in import mode can embed actual data values directly into the file. A comment in a dbt model might reference a client name, an internal business rule, or a configuration detail that was never intended to leave your environment. These are common features of real-world repositories, present in almost every data project.

The infrastructure decision that precedes any work in this category therefore deserves serious thought. Before an AI tool is connected to a codebase, you need to know whether it operates inside your data ecosystem or outside it, because that determines whether your code, and everything embedded within it, remains within your control.

Category three: direct data access

At this level, the relationship between AI and your data changes entirely. 

LLMs are connected directly to live data, querying databases, interpreting results, and generating insights in real time without a human analyst translating between a business question and a technical query. The applications are genuinely compelling: a natural language interface to your data warehouse, an AI agent that monitors pipelines autonomously and surfaces anomalies before a human would have noticed them, a system that can answer operational questions in seconds. Companies are beginning to explore what becomes possible when the barrier between asking a question and getting a data-driven answer is reduced to a conversation.
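One way to see what "access control" means in practice here is a gate between the model's output and the warehouse. The sketch below is a deliberately minimal, assumed guardrail (the `run_generated_query` name and the keyword list are ours); in a real deployment it would sit alongside, never instead of, a dedicated read-only database role:

```python
import re

# Naive but illustrative: block anything that is not a single,
# read-only SELECT (or CTE) before it reaches the warehouse.
FORBIDDEN = re.compile(
    r"(?i)\b(insert|update|delete|drop|alter|create|grant|truncate|merge|copy)\b"
)

def is_read_only(sql: str) -> bool:
    statements = [s for s in sql.split(";") if s.strip()]
    if len(statements) != 1:
        return False  # no stacked statements
    stmt = statements[0].strip().lower()
    return stmt.startswith(("select", "with")) and not FORBIDDEN.search(stmt)

def run_generated_query(sql: str, execute):
    """Gate an LLM-generated query behind the read-only check."""
    if not is_read_only(sql):
        raise PermissionError(f"Blocked non-read-only query: {sql!r}")
    return execute(sql)
```

The point is not that fifteen lines of Python solve the problem; it is that someone has to decide, explicitly and in advance, what the model is and is not allowed to do.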

The governance stakes at this level are higher than in either of the previous categories, and the consequences of getting it wrong are more visible. 

Data is being actively processed, interpreted, and, in many architectures, transmitted across services and infrastructure that may sit outside your direct control. Where that processing happens, under which legal jurisdiction it falls, and what level of access control governs it are foundational questions that need answers before any deployment begins.

If your business is moving in this direction without a governance framework already in place, you are taking on risks that are rarely visible until something has already gone wrong.

The thread that connects all three: control

Across all three categories, the technology itself is almost secondary to the question of control: specifically, how much of it a company retains as AI becomes more deeply integrated into its data stack.

A data stack is a chain of code repositories, transformation tools, visualisation layers, orchestration platforms, and, increasingly, AI tools, and each link in that chain carries its own legal and jurisdictional profile.

Most major data platforms like Databricks or Snowflake are products of American companies. Some offer EU data residency options; others do not. And even infrastructure physically deployed in the EU by a US-headquartered company can remain subject to US legal demands under legislation like the CLOUD Act, a detail that companies with genuine data residency requirements cannot afford to treat as a formality.

Engaging thoughtfully with these questions means knowing whether your Databricks workspace processes LLM calls within EU boundaries or routes them through US servers. It means understanding the practical difference between a Power BI file in DirectQuery mode, where data remains in Snowflake, and one in import mode, where it can potentially be committed to a git repository.

It means building a data stack with enough visibility into each of its components that you can answer the governance questions your clients and regulators will eventually ask, before they ask them.

A note on how we, as i-spark, approach this

i-spark uses AI tools actively and with conviction. Claude Code is part of how we work, and we believe it makes us meaningfully better at what we do. We are not writing this from a position of caution about AI; quite the opposite.

What we are cautious about is the gap between adoption and understanding that characterises so much of how AI is currently being used in the industry. Silence is not the same as safety. Governance conversations often happen too late, with less context and less control for everyone involved.

Our view is that clients deserve to understand what tools are being used in their projects, what those tools have access to, and what the governance implications are. 

Because informed decisions are better decisions, and because a partner who thinks carefully about these questions before they become problems is a fundamentally different kind of partner than one who discovers them after the fact.

The three categories above are a map; knowing where you are and where you want to go is the starting point for using AI in your data stack with clarity and confidence.
