How Our Engineering Team Uses AI
AI tools are everywhere right now, and our engineering team uses them daily. In this post, we're sharing how we actually use AI coding tools and agents while building mirrord, what's been useful and what hasn't. If you're looking for ways to leverage AI in real-world software development, hopefully some of this will be useful for you.
What do we do?
But first, some context about our team and the product we're building. At MetalBear, we're a completely remote company building mirrord, a Kubernetes development tool written in Rust. mirrord is not a typical SaaS CRUD microservice app. It's a local tool that communicates with your remote Kubernetes environment, made up of components like a layer that injects itself into your process, an ephemeral agent that runs in Kubernetes, a Kubernetes operator, and lots of glue.
We raised our seed round a couple of months ago and are a team of 34 at the time of writing, 15 of whom make up the engineering team. We don't mandate any AI tooling, but we do strongly encourage it, and engineers are free to pick whatever works for them and experiment. We have a Slack channel where people publicly share how they use AI, in the hope that their use case is relevant to others on the team. Here are a few of the examples that were shared recently, including use cases for Claude Code, ChatGPT, and Gemini.
Where AI helps the most
Getting oriented in unfamiliar code
One of the most consistent and least controversial ways we have been using AI is as an entry point into unfamiliar code. This is especially useful when understanding a new area of the codebase, coming back to something that hasn't been touched in a while, or trying to understand code in external libraries. Instead of starting by opening files and reading through the code manually, people often use tools like Claude Code or Cursor for a high-level explanation of how a certain part of the system is structured and how the pieces relate to each other. A prompt might look something like:
I'm looking at the TCP outgoing interception code in the mirrord layer.
Can you explain which modules are involved and how a connection flows from the
local process to the agent?
It's important to note here that engineers aren't asking the AI to explain mirrord as a whole, or to be an authority on the architecture. That simply doesn't work for a codebase as large as ours. They're using it to form an initial mental model for a specific part of the system they'll be working on. So even if that model is incomplete or slightly wrong, it still provides a useful starting point and makes the next step, reading the actual code, much easier.
Exploring ideas and alternatives
Another area where AI has been useful for us is early in the development process, during the planning stage before any approach has been chosen for solving a problem. Engineers often use it to explore ideas by describing the feature they want to implement or a bug they're trying to fix, and seeing what kinds of approaches the model suggests. Having AI lay out a few different options can surface trade-offs earlier, or help rule out directions they don't want to pursue, without paying the full cost of writing and rewriting code. A prompt in this case might look something like:
I'm looking into adding support for filtering incoming requests by HTTP method in mirrord.
I don't want implementation code yet. Can you help me think through where filtering could live,
what kinds of constraints or edge cases we should consider, and what
trade-offs different approaches might have?
That said, objections have also been raised internally about using AI this way. Once a model proposes a concrete solution, it can unintentionally narrow your thinking. Even a mediocre solution can anchor your brain and make it harder to explore better alternatives on your own.
Scripts
If there's one area where everyone on the team agrees AI consistently delivers value, it's scripts. For debugging scenarios or local workflows, being able to describe what you need and have a working script generated for you can save a huge amount of time. One of the engineers used these prompts to create a reusable PowerShell function they needed:
# Prompt 1
I want to create a basic PowerShell function to add to my
PowerShell profile.
The function should create a Kubernetes pod using `kubectl`,
based on the `busybox` image.
- It should accept at least one argument for the pod name.
- The pod must run forever using `-- sleep infinity` (this is an
absolute requirement).
The command it generates should look like this:
`kubectl run fun --image=busybox --restart=Never -- sleep infinity`
Additional requirements:
- Make the restart policy configurable via a function argument,
so I don't have to keep deleting the pod every time I restart the cluster.
- Run basic sanity checks to ensure there is a running Kubernetes cluster.
- After writing the script, suggest a few additional improvements.
Name the function `New-KubectlBusyBoxPod`.
# Prompt 2
Extend the function with the following behavior:
- Add an option to automatically attach to `/bin/sh` in the newly
created pod.
- Before creating the pod, run `kubectl get pod $Name` to check if it
already exists.
If the pod exists, the output will look like:
```
NAME READY STATUS RESTARTS AGE
fun 1/1 Running 0 5m39s
```
Parse the second line of stdout, split it by whitespace, and extract:
- the pod name
- the pod status
If the pod already exists, stop early and print a message like:
`pod of $name exists, status: {pod-status}`
Final adjustments:
1. Change the default restart policy to `"Always"`.
2. Wrap the existing pod status in single quotes in the output.
3. Display the "pod already exists" message in red, since this is an
error case.
4. If the pod exists, prompt with a y/n question asking whether to delete
it and proceed with creation.
- If deleting, use `--force` and verify that the deletion succeeded.
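For reference, here's a rough sketch of the kind of function these prompts tend to produce. It isn't the exact script the engineer ended up with, and details like the `-Attach` switch name are illustrative, but it covers the behavior the prompts ask for:
```
# Illustrative sketch, not the exact generated script.
function New-KubectlBusyBoxPod {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory = $true)]
        [string]$Name,

        [string]$RestartPolicy = "Always",

        # Optional: attach to /bin/sh in the new pod once it is ready.
        [switch]$Attach
    )

    # Sanity check: is there a reachable Kubernetes cluster at all?
    kubectl cluster-info *> $null
    if ($LASTEXITCODE -ne 0) {
        Write-Host "No reachable Kubernetes cluster found." -ForegroundColor Red
        return
    }

    # Check whether a pod with this name already exists.
    $existing = kubectl get pod $Name 2>$null
    if ($LASTEXITCODE -eq 0 -and $existing) {
        # Second line of the output is the pod row; split it on whitespace.
        $fields    = ($existing | Select-Object -Index 1) -split '\s+'
        $podStatus = $fields[2]

        Write-Host "pod '$Name' exists, status: '$podStatus'" -ForegroundColor Red
        $answer = Read-Host "Delete it and recreate? (y/n)"
        if ($answer -ne 'y') { return }

        kubectl delete pod $Name --force
        if ($LASTEXITCODE -ne 0) {
            Write-Host "Failed to delete pod '$Name'." -ForegroundColor Red
            return
        }
    }

    # Create the pod; `sleep infinity` keeps it running until deleted.
    kubectl run $Name --image=busybox --restart=$RestartPolicy -- sleep infinity
    if ($LASTEXITCODE -ne 0) { return }

    if ($Attach) {
        kubectl wait --for=condition=Ready pod/$Name --timeout=60s
        kubectl exec -it $Name -- /bin/sh
    }
}
```
Dropped into a PowerShell profile, it can then be called with something like `New-KubectlBusyBoxPod -Name fun -Attach`.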
But there's another benefit besides saving time. These AI-generated scripts tend to be more structured and readable than what an engineer would write by hand, because spending extra time polishing a throwaway script usually isn't worth it. That makes them much easier to tweak, extend, and reuse later when similar needs come up. Over time, many of these scripts have stopped being one-offs and have instead become part of a small personal toolkit that gets reused again and again across debugging sessions.
Where AI struggles
Complex architectures
mirrord has a complex and fairly unusual architecture, which general-purpose LLMs struggle with. If you ask an AI tool to do anything that requires full context of how mirrord works, it will most likely fail. We've had very few instances of someone on the team using AI to fix a bug successfully without much manual intervention. And we're still very far from letting fully autonomous agents loose on our codebase, even with a human reviewing the output, because fixing the generated code often takes more work than writing it yourself.
That said, some engineers have had better results by explicitly giving the model persistent architectural context. In practice, this means maintaining internal CLAUDE.md or AGENTS.md files that describe mirrord's structure and its major components. These files aren't static, and engineers use the models themselves to keep them updated for future use. In case you're curious, this is what one of those files looks like currently:
# CLAUDE.md
Context for Claude Code when working with the mirrord repository.
```
# Check packages (agent is Linux-only)
cargo check -p mirrord-layer --keep-going
cargo check -p mirrord-protocol --keep-going
cargo check -p mirrord-agent --target x86_64-unknown-linux-gnu --keep-going
# Integration tests
cargo test -p mirrord-layer
```
**Key paths:**
- Protocol messages: `mirrord/protocol/src/codec.rs`
- Agent main loop: `mirrord/agent/src/entrypoint.rs`
- Layer hooks: `mirrord/layer/src/file/hooks.rs`, `mirrord/layer/src/socket/hooks.rs`
- Intproxy routing: `mirrord/intproxy/src/lib.rs`
- Configuration: `mirrord/config/src/lib.rs`
## Architecture
mirrord lets developers run local processes in the context of a Kubernetes cluster. It intercepts syscalls locally and executes them remotely in a target pod's environment.
### Component Overview
```
┌──────────────────────────────────────────────────────────────────────┐
│ LOCAL MACHINE                                                        │
│ ┌──────────────────────┐      ┌──────────────────────┐               │
│ │ User Application     │      │ CLI (mirrord)        │               │
│ │  ┌────────────────┐  │      │ - Starts intproxy    │               │
│ │  │ Layer          │  │      │ - Resolves target    │               │
│ │  │ (LD_PRELOAD)   ├──┼──────┤ - Sets up env vars   │               │
│ │  └───────┬────────┘  │      └──────────────────────┘               │
│ └──────────┼───────────┘                                             │
│            │ Unix socket / TCP                                       │
│ ┌──────────┴───────────┐                                             │
│ │ Intproxy             │ Routes messages, manages connections        │
│ │ (or Operator proxy)  │ Background tasks: files, incoming, outgoing │
│ └──────────┬───────────┘                                             │
└────────────┼─────────────────────────────────────────────────────────┘
             │ TCP (k8s port-forward or operator connection)
┌────────────┼─────────────────────────────────────────────────────────┐
│ KUBERNETES │ CLUSTER                                                 │
│ ┌──────────┴───────────┐                                             │
│ │ Agent                │ Ephemeral pod, runs in target's namespace   │
│ │ - File operations    │ Has access to target's fs, network, env     │
│ │ - DNS resolution     │ Uses iptables for traffic stealing          │
│ │ - Traffic steal      │                                             │
│ │ - Outgoing conns     │                                             │
│ └──────────────────────┘                                             │
└──────────────────────────────────────────────────────────────────────┘
```
### The Three Tiers
**Layer** (`mirrord-layer`)
- Injected into the user's process via `LD_PRELOAD` (Linux) or
`DYLD_INSERT_LIBRARIES` (macOS). Uses Frida GUM to hook libc functions.
When a hooked function is called, the layer either:
- Handles it locally (bypass)
- Sends a `ClientMessage` to the proxy and waits for a `DaemonMessage` response
**Proxy** - Routes messages between layer and agent. Two variants:
- **Intproxy** (`mirrord-intproxy`): Runs locally in open-source mode
- **Operator**: Runs in-cluster for paid version (separate repo at `../operator/`)
**Agent** (`mirrord-agent`)
- Ephemeral pod that performs operations in the target's context. Network tasks
run in the **target pod's network namespace** for correct DNS resolution and routing.
...
If you check out the full file here, you'll see it's a lot of context that we need to provide the model to get it to a state where it's decent, but still not enough to be trusted entirely.
Long-running reasoning
Another place where AI tools consistently struggle is when the scope becomes large or the context stretches over time. Models will often forget why they made an earlier decision in the same session. It's common to see them fix a bug in one place and accidentally break something unrelated elsewhere, simply because they lost track of an earlier constraint. This makes them unreliable for iterative changes unless the engineer is carefully tracking the logic themselves. The output often looks plausible at first glance, which makes these failures easy to miss if you're not paying close attention.
The performance of different models has also varied in this area for our engineering team.
- ChatGPT tends to be the best all-rounder: it usually understands prompts well, gives the most consistently reasonable answers, and is relatively good at iterating on its own mistakes, including identifying and fixing bugs it introduced.
- Gemini stands out for deep research, especially in its "Thinking" modes, and can go impressively far when you give it time, but it frequently loses track of earlier decisions and constraints, leading to fixes in one place that break unrelated parts elsewhere.
- Claude Code is somewhere in between, with some engineers finding it useful and others struggling to get value from it.
So is AI changing how we build software at MetalBear?
AI hasn't replaced engineers on our team, and it hasn't removed the need to deeply understand the systems we're building. It hasn't magically solved complex architectural problems, and it certainly hasn't made it safe to hand over large parts of the codebase to fully autonomous agents. What it has done is reduce friction and save time.
It helps engineers get oriented faster, explore ideas earlier, write handy scripts, and offload a lot of mechanical or repetitive work. The biggest difference we've seen isn't which model people use, but how intentionally they use it. The engineers getting the most value are the ones who scope problems tightly and control the context they give the model.
So yes, AI is kind of changing how we build software, but not in the way most marketing would have you believe, at least for a low-level, deeply technical product like ours. Right now, for us, AI is best thought of as a powerful tool around the edges of software development. It's very good at accelerating parts of the process that are tedious or exploratory, but poor in areas that require deeper understanding.

