AI-assisted code development: What you should know

This article is part of a series on how i-spark uses AI in our work. For an overview of the three main categories of AI use (ideation, code assistance, and data analysis), see our companion article.

AI-assisted coding is now standard practice across data initiatives, for both writing and reviewing code. Most development teams use it. Chances are, any vendor you work with is using it too, whether or not they’ve told you.

That last part is worth sitting with for a moment.

What AI-assisted coding actually means in a data project

In a data project environment (think dbt, GitHub, Power BI, Snowflake), AI coding tools connect directly to your code repositories. They read your code, analyse it, and generate new code based on it. The practical benefits are faster delivery, fewer bugs, and lower project costs.

Used well, it genuinely improves both the speed and quality of development work. The key word is “well”. AI assistance works when the person using it actually understands what they’re building: the business logic, the architecture, the requirements. It helps experienced developers get better results, faster.

It doesn’t replace the need to know what you’re doing; it amplifies it. A developer who doesn’t understand the code they’re generating is a risk regardless of which tools they use.

“Connecting an AI tool to your repository” is not the same as adding a new developer to a Slack channel. There are some implications worth looking into before you sign off on it.

Three things that deserve your attention

1. Where your data actually ends up

Code and data are supposed to be separate. In practice, the line blurs more than most people realise.

Your repository contains architecture and business logic: how your data models are structured, how your reports are built. That’s your intellectual property. It’s specific to your business, and it tells a story about how your company works.

It may also contain quasi-data: comments referencing specific customers, hardcoded values, and test scenarios pulled from real records. This isn’t best practice, but it happens in nearly every real-world repository, and those fragments are sensitive even though they look like code.

Depending on your configuration, the actual data may be in there, too. 

Power BI is a clear example. 

With the older .pbix format, files could end up committed to a repository, and because a .pbix bundles the report definition together with any imported data, that can put actual data under version control. With the newer .pbir format, the report definition is stored as text files and the data itself isn’t typically pushed to GitHub at all.
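One common mitigation, assuming a Git-based workflow, is to keep data-bearing files out of the repository entirely. The entries below are a sketch, not a complete policy, and exact filenames vary by Power BI tooling version:

```gitignore
# Binary Power BI files can bundle imported data; keep them out of version control
*.pbix

# Local data caches created by Power BI Desktop project workflows
# (filename conventions vary by tooling version)
cache.abf
```

A committed .gitignore also doubles as documentation: it shows reviewers, and auditors, what the team has decided must never reach the repository.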

However, in import mode, actual data is cached locally on the developer’s machine. If an AI coding tool is running in that same environment, that locally cached data could be accessible to it, depending on how the tool is configured and what the developer shares with it, even if the data never touches your repository. The exposure point shifts; it doesn’t disappear.

The questions to ask your vendor: How is our Power BI configured, DirectQuery or import mode? And do you have a policy on what gets committed to the repository?

2. The scope of access isn’t always transparent, including human access 

Section 1 outlined what your repository may contain in the worst case. The reality is that it’s often unclear how much of that an AI tool actually accesses, when, and how; not knowing is itself a governance problem. Unlike a file sitting in version control, AI tools actively process your code: ingesting it, analysing it, and generating from it. The scope of that processing isn’t always documented upfront, and it’s worth asking about before you start.

Beyond what the AI model processes, there’s a separate question about who at the vendor company can see your code. AI tools are software products, and like any software product, they are built, maintained, and monitored by people. Depending on the vendor’s internal access controls, support staff, engineers, or security teams may have the ability to view repository contents.

The questions to ask your vendor: Who within your organisation has access to our code or repository contents? What access controls and audit logging are in place?

3. Geopolitical and vendor risk, including the supply chain 

Most AI coding tools are US-based. If your organisation has made a deliberate choice to use EU infrastructure (on-premise GitLab, European cloud providers, or similar), connecting to a US AI vendor is not a neutral extension of those choices. It introduces a new data residency consideration that may conflict with your existing governance decisions.

What makes this harder to assess is that AI tools often don’t operate in isolation. The tool your vendor uses may itself call other AI services under the hood: sub-processors that are not always disclosed upfront. A product that looks EU-friendly on the surface can still route your code through a US-based model provider. 

Under GDPR, transferring personal data to a third country without adequate safeguards is not permitted, and “it looked European” is not a defence.

There’s also a stability question. AI vendors are newer, and the regulatory and political environment around them is shifting quickly. Terms of service can change. Vendor relationships can change. That’s a different risk profile than established infrastructure providers.

The questions to ask your vendor: Which AI tools do you use, and who are their sub-processors? Where is our code processed and stored? How does your tooling comply with GDPR when data is transferred outside the EU?

Keeping the risks in proportion

These are real considerations, but they need context.

If your code is already hosted on GitHub, it’s already on US infrastructure. Adding an AI layer on top of that is incremental risk, not an entirely new category. The same governance questions (who can access your code, where it’s stored, what it contains) apply whether or not AI is involved. AI makes those questions more urgent, not entirely new.

The risks described above are also low-probability when good repository hygiene is in place: clear policies about what gets committed, how files are configured, and what goes into code comments. 

That said, “low probability” is not the same as “not worth asking about.” The value of these questions is in establishing that your vendor has thought about this, has policies in place, and can answer clearly.

The goal isn’t to avoid AI tooling. Used well, it delivers real benefits. The goal is to use it with the same intentionality you’d apply to any tool that touches your code and your data.

Why this conversation exists

Transparency here isn’t just good ethics. It’s good governance. You deserve to know what tools are being used on your project, what they touch, and what safeguards are in place, so you can make informed decisions, raise concerns early, and set boundaries where you need to.

If you have questions about how AI tooling is used in your project, ask them. Any vendor worth partnering with should be able to answer clearly.

Questions about i-spark’s approach to AI and governance? Get in touch.
