A Guide to Vibe-debugging | Nikhil Vashistha

Introduction

"Engineers think in systems, and those systems include real-world components. Coders only care that they finished writing their code, and now the finished code is someone else’s problem. Because coders only contribute code, they can be easily replaced by Cursor or Claude, which can contribute much more code, much faster."

A very obvious noob move, in hindsight, was not having any clarity on structuring the work that the agent had to perform. What made it a little better was to simply assume that the agent is a really good junior developer who had to be taken through the entire task-list and roadmap to get any relevant work done, not a know-all, do-all entity.

I was giving the agent only enough context to work as a coder, not as an engineer. Soon enough, the finished code became someone else's (my) problem.

Low Clarity = Basic Bugs

Once the agent executed its first set of tasks, I asked it to create the UI (because the OG task list didn't have it as a task) so that I could see what the code amounted to.

See, I'm not averse to bugs. They're a sign that things can only get better, and they can point to other unseen cracks. What I found hilarious was how basic some of the early bugs were - build errors (the app wouldn't start) and not displaying the score.
Not. Displaying. The. Score. On a scoring app.

Squashing Basic Bugs

Pretty simple - take a screenshot and paste it to the agent.
Don't explain too much, or say nothing at all.

The agent did surprisingly well at reading the image (it shouldn't surprise me anymore, but years of bad OCR products had me questioning image recognition).
It didn't simply read the screenshot and jump to conclusions, but made changes based on some train of thought.

Incredible!

The updated PRD and task-list brought slightly better bugs upfront, but they were piling up faster than I knew what to do with them. A few questions on my mind:

1. When exactly should testing be done? After every parent task? After every subtask? After every round of code?
2. How should I record these bug fixes? Do I need to?

Test-Driven-Development

Low clarity → Less structure → Never-ending bugs

In my renewed search for best practices and how to manage testing better, I came across a thread on Reddit where people using agents to code were very blasé about the perils of relentless bug-bashing.

They swore by something called test-driven-development, or TDD.

Understanding TDD

As per the current task list, the agent would work on a task and then just move on to the next one. The TDD methodology (aka Red-Green-Refactor) instead cycles through three phases:

🔴 Red Phase

Write a test for the expected behaviour/output. This will fail because a test is being written before the code for it exists.

🟢 Green Phase

Write the least amount of code so that the test passes.

🔁 Refactor Phase

Clean up and restructure the code without changing its behaviour. The tests written earlier should keep passing throughout.
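To make the cycle concrete, here is a minimal sketch of one Red-Green-Refactor pass for a score display. It is written in Python only to keep it short; the app in this post lives in Xcode, where the same idea would be expressed as a test case in the project's test target. The `Match` class and its methods are hypothetical names made up for illustration, not code from the actual app.

```python
import unittest


# Green phase: the least code that makes the test below pass.
# (During the red phase this class did not exist yet, so the test failed.)
class Match:
    """Hypothetical scoring model, invented for this sketch."""

    def __init__(self):
        self.points = {"home": 0, "away": 0}

    def add_point(self, side):
        self.points[side] += 1

    def score_text(self):
        # Refactor phase would tidy this up (naming, duplication)
        # while the test keeps passing unchanged.
        return f"{self.points['home']} - {self.points['away']}"


# Red phase: written first, describing the behaviour we expect.
class MatchScoreTests(unittest.TestCase):
    def test_score_is_displayed_after_points(self):
        match = Match()
        match.add_point("home")
        match.add_point("home")
        match.add_point("away")
        self.assertEqual(match.score_text(), "2 - 1")
```

The point of the ordering is that the test states the expected behaviour (the score shows up) before any code exists, so a bug like "not displaying the score" surfaces as a failing test rather than as a surprise in the UI.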

An Analogy

Imagine yourself planning out your week.

Non-TDD: Add meetings and tasks into your calendar as they come. The gaps are where you fill in work and other tasks.

TDD: Figure out what you want your week to look like and what's important to you. Let's say you don't want to take more than 6 hours of meetings and calls and you want at least 2 hours of deep focus time per day. Arrange your calendar accordingly.

Integrating TDD

The first few tasks had already been done, so I thought it best to ask the agent to convert the remaining task breakdowns into something that followed a TDD format.

That created another set of 5 child tasks per sub-task that included Red, Green, Refactor tasks. The tasks themselves didn't change.

Would it have been better to restart the entire project from scratch using TDD instead of simply converting existing tasks? Maybe. Tasks could have been worked down into well-defined modular bits, and not "When the app starts up, load any match that was in progress..."

Is this too big a test to set up in a TDD format? Possibly.

Managing Tests

Getting the agent to execute these new TDD-oriented tasks opened a new window into the workings of Xcode - a module called Test Navigator.
A whole module to run and manage tests? Hallelujah!

In my previous attempts to fix bugs, I had never paid attention to this module, even though it sits right there in Xcode's primary navigation.
Tasks were now TDD'd, so the agent would first make changes to the code and then ask to run just that particular test!

I'm sure the agent ran tests earlier too, but task-specific tests were now front-and-center for not just resolving errors but the execution itself, signalling a structural shift in how development was being approached.

Going Forward

The TDD tasks were structured such that the last child task would always run the entire test suite to test if any changes made in the current task exposed cracks elsewhere.
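The "task-specific test first, full suite last" structure can be sketched as follows. This is again an illustrative Python sketch rather than the Xcode setup used in the project; the functions, test classes, and the target score of 11 are all hypothetical.

```python
import unittest


# Hypothetical functions standing in for two separate tasks.
def format_score(home, away):
    return f"{home} - {away}"


def is_match_over(home, away, target=11):
    # target=11 is an arbitrary value chosen for this sketch.
    return max(home, away) >= target


class ScoreDisplayTests(unittest.TestCase):
    def test_format_score(self):
        self.assertEqual(format_score(3, 2), "3 - 2")


class MatchEndTests(unittest.TestCase):
    def test_match_over_at_target(self):
        self.assertTrue(is_match_over(11, 7))


def run(test_cases):
    """Run only the given test case classes and return the result."""
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite(
        loader.loadTestsFromTestCase(case) for case in test_cases
    )
    return unittest.TextTestRunner(verbosity=0).run(suite)


# Child task: run just the test for the task at hand.
task_result = run([ScoreDisplayTests])

# Last child task: run everything to catch cracks elsewhere.
full_result = run([ScoreDisplayTests, MatchEndTests])
```

Running the narrow test keeps feedback fast while a task is in progress. The full run at the end is what catches a change in one task quietly breaking another.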

As of now, I like to verify the build manually and see the progress being made. It's fun to click buttons on the simulator and see the app function. Seeing how the agent executes now, given the new rules, PRD, and task-list, I think I might be bold enough to let the agent run large chunks of tasks autonomously.