I'm a contributor to the Surge Synthesizer open source project. For the uninitiated, it's an extremely comprehensive software synthesizer. If you don't know what that means: it's a complicated musical instrument. My most recent project there has been to implement a convolution reverb feature. That means, in extremely simple terms, that you can take a balloon and a microphone to a space like, say, a cathedral, pop the balloon, and record the resulting sound. You can then do some math with this recording and any audio recording of your choosing, and it will sound like the audio was heard in that cathedral. Pretty cool!
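The "some math" here is convolution: every output sample is a weighted sum of recent input samples, with the balloon-pop recording (the impulse response) supplying the weights. As a sketch only (real convolution reverbs, Surge's included, use much faster FFT-based partitioned convolution), the textbook definition looks like this:

```cpp
#include <cstddef>
#include <vector>

// Naive direct convolution: out[n + k] accumulates in[n] * ir[k], i.e. each
// input sample triggers a scaled copy of the whole impulse response.
// Illustrative only -- this is O(N*M) and far too slow for long reverbs.
std::vector<float> convolve(const std::vector<float> &input,
                            const std::vector<float> &impulseResponse)
{
    std::vector<float> out(input.size() + impulseResponse.size() - 1, 0.0f);
    for (std::size_t n = 0; n < input.size(); ++n)
        for (std::size_t k = 0; k < impulseResponse.size(); ++k)
            out[n + k] += input[n] * impulseResponse[k];
    return out;
}
```

Feeding a single unit impulse through this returns the impulse response itself, which is exactly why the balloon pop captures "the sound of the room".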
Anyway, I recently finished up that project except for one nagging bug. We have a feature that lets you add some delay before the reverb kicks in, ranging from zero to one second. One of our developers discovered a bug where, if you moved the delay from zero to nonzero and then back to zero, you could get a nasty distortion. This particular delay can be programmatically modified (known in synthesizer land as modulation), so this was something that could easily happen in practice.
Said developer kindly tracked down the bug and figured out that if you moved from a non-zero delay back to zero, for some internal implementation reasons you suddenly needed a minimum delay time of six samples rather than zero samples. For CPU efficiency reasons, though, it would be really nice to have that minimum be zero when possible. I decided on a simple algorithm: if the delay is ever under modulation, the minimum should be six samples. Otherwise, we can let it be zero samples.
Now, I'm at the end of this project. I'm pretty tired of it. I decided to test if AI as of 3/27/2026 was up to the task. I opened up gemini-cli, loaded the file with the bug, and gave it the following prompt:
This code currently has a problem. When the pre-delay time goes from 0 to a non-zero value and then back to zero, it starts producing artifacts. The way to fix this is that, if the delay time is ever moved past zero, and that move is done as part of a modulation, it should treat the minimum possible delay value as 6 samples instead of 0 samples. It should only switch back to treating 0 as 0 if the delay value goes to 0 when it's not being modulated. Please make this change.
After it churned and finished, I added the following:
Ensure it builds, the command to compile the ConvolutionEffect.cpp file can be found in build/compile_commands.json.
I cleaned up the result a bit to make it more compact, and it was this added function:
bool ConvolutionEffect::is_delay_modulated()
{
    const int id = fxdata->p[convolution_delay].id;
    for (const auto &m : storage->getPatch().modulation_global)
    {
        if (m.destination_id == id && !m.muted)
            return true;
    }
    return false;
}
And the code that uses it:
float delay_samples = storage->samplerate * *pd_float[convolution_delay];
if (is_delay_modulated() || delay_samples > 0)
{
    // Thanks to how sinc interpolation works, there's a minimum delay of 6 samples. However,
    // we really want to use 0 samples if we can because it's a lot more efficient. So we set
    // a min of 6 if modulation of the delay parameter is turned on, or if it's set to a
    // non-zero value.
    delay_samples = std::max(
        delay_samples,
        static_cast<float>(sst::basic_blocks::tables::SurgeSincTableProvider::FIRipol_N) / 2);
}
delayTime_.newValue(delay_samples);
At first glance, this result is actually quite impressive. Surge is an extremely complicated project, the internals are not particularly documented, and the interactions can be spaghetti. The LLM managed to figure out what I meant by "modulation", discover where modulation was done, find the correct connecting data structure, connect it to the individual live parameter (by ID), and check for it. All good, right?
This code is in fact extremely problematic, but if you didn't have any experience with Surge internals you wouldn't even question it. It would appear to work fine under testing, you'd commit it, and be done. But then somewhere along the line, Surge would randomly crash on somebody using the convolution reverb in the middle of a live performance, and Surge's reputation as a synthesizer would be ruined.
Synthesizers need to produce sound extremely quickly on a very tight schedule. Audio runs at a minimum of 44,100 samples per second, and when the sound card needs a sample, that sample must be there. Otherwise you get an audible stutter. It is extremely unforgiving. To satisfy this requirement, synthesizers generally have a special "audio thread" that is responsible for doing this tight-deadline audio processing. It is kept separate from things like the GUI, which has much more relaxed deadlines.
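To make that budget concrete: hosts typically ask the synthesizer for audio in small blocks, and a 64-sample block (a common host setting, nothing Surge-specific) at 44.1 kHz leaves you roughly a millisecond and a half to render everything, every block, without exception. A throwaway helper makes the arithmetic explicit:

```cpp
// Back-of-envelope deadline: how long the audio thread has to render one
// block before the sound card runs dry. Hypothetical helper for illustration.
double blockDeadlineMs(int blockSize, double sampleRate)
{
    return 1000.0 * blockSize / sampleRate;
}
// blockDeadlineMs(64, 44100.0) is about 1.45 ms
```

Anything that can block for an unbounded time inside that window, like waiting on a lock held by the GUI, risks missing the deadline.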
The creation of modulation is done by the user in that GUI, separately from the audio thread. Most synthesizers communicate between the GUI and the audio thread by means of message passing through something like a ring buffer. Surge does not do this, so reads and writes at these thread intersections need to be carefully audited. The modulation list is simply not safe to be iterated like this from the audio thread! The access would need to be protected by a lock, which the audio thread cannot take, because blocking on a lock might make it miss the realtime deadline.
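For readers unfamiliar with the ring-buffer pattern mentioned above, here is a minimal single-producer/single-consumer queue of the kind GUI threads typically use to hand messages to an audio thread without locks. This is a generic sketch of the technique, not Surge's code:

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Lock-free single-producer/single-consumer ring buffer. The GUI thread only
// writes head_, the audio thread only writes tail_, so no lock is needed and
// the audio thread's pop() never blocks.
template <typename T, std::size_t N> class SpscRing
{
    std::array<T, N> buf_{};
    std::atomic<std::size_t> head_{0}; // advanced by the producer (GUI)
    std::atomic<std::size_t> tail_{0}; // advanced by the consumer (audio)

  public:
    bool push(const T &v) // call from the GUI thread only
    {
        auto h = head_.load(std::memory_order_relaxed);
        auto next = (h + 1) % N;
        if (next == tail_.load(std::memory_order_acquire))
            return false; // full; the GUI side can retry later
        buf_[h] = v;
        head_.store(next, std::memory_order_release);
        return true;
    }

    std::optional<T> pop() // call from the audio thread only; wait-free
    {
        auto t = tail_.load(std::memory_order_relaxed);
        if (t == head_.load(std::memory_order_acquire))
            return std::nullopt; // empty
        T v = buf_[t];
        tail_.store((t + 1) % N, std::memory_order_release);
        return v;
    }
};
```

The acquire/release pairing guarantees the consumer sees a fully written element before it sees the advanced head index, which is the whole trick.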
The crash would go as follows: the synthesizer is running and spitting out sound. The performer removes an active modulation from the patch, likely through one of their on-stage controllers that pushes the change into Surge. Meanwhile, the convolution effect is in the middle of iterating over the list of all active modulations. Now there's one fewer entry, and the audio thread reads past the end of a list that just got shorter. Worst case scenario, it's an out-of-bounds access, and Surge crashes. Best case scenario, it reads garbage data and the internal state no longer reflects reality, causing sound issues down the line. Either way, very bad!
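For contrast, one data-race-free shape for this particular fix, and I stress this is a hypothetical sketch rather than the actual Surge patch, follows from noticing that the audio thread only needs a yes/no answer. The GUI thread can maintain a single atomic flag whenever modulation routing changes, and the audio thread can read it wait-free instead of walking a list that another thread mutates:

```cpp
#include <atomic>

// Hypothetical: the GUI thread updates one flag whenever modulation routing
// for the delay parameter changes; the audio thread reads it wait-free.
struct DelayModState
{
    std::atomic<bool> delayIsModulated{false};

    // GUI thread: call after any modulation is added or removed.
    void setModulated(bool m)
    {
        delayIsModulated.store(m, std::memory_order_release);
    }

    // Audio thread: a single atomic load, safe under the realtime deadline.
    bool isModulated() const
    {
        return delayIsModulated.load(std::memory_order_acquire);
    }
};
```

A single atomic bool can never be torn or leave a container mid-resize, which is exactly the property the LLM's list walk lacks.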
The problem with the LLM here is that it doesn't understand this architecture. It doesn't know the difference between the GUI thread and the audio thread in Surge. It doesn't know that a given data structure is written on the GUI thread and is only allowed to be read from the audio thread under very specific and limited circumstances. Threading interactions are hard enough for humans to reason about; it's no surprise that an LLM whiffed them. Where it becomes a problem is when someone with less experience comes along, does this exercise, gets some code that looks reasonable, and commits it.
That problem is where the industry is rushing full-speed ahead, which is insane to me.
It's worth interrogating why it came up with this result, because that reveals a larger problem with LLM-based coding. Let's go back to the prompt. When I wrote it, I did so with an inherent assumption: that this was possible to do using the existing code. That assumption is wrong. It is possible to implement the feature, by creating a new message-passing path for this stream of information. But the assumption built into that prompt was that it was possible in the current Surge.
LLMs do not challenge your assumptions! If you provide instructions, an LLM will try to fulfill those instructions (more accurately, it will produce a result that looks like it fulfilled them, known as an "answer-shaped response"). Part of that fulfillment means that any assumption you make will be treated as truth. That includes unspoken assumptions. Unspoken assumptions that are extremely important about the state of the world. More succinctly: bias.
In my experience, we're all pretty bad at explicitly communicating our biases. LLMs don't actually know anything about the world and they've been reinforcement-learned to be exceedingly helpful, so you'll get a result that confirms any bias you put into it. Of course the LLM could output code that does what I asked it to. After all, the prompt assumed it was possible!
That makes these tools super dangerous in the hands of junior developers, who don't have a good mental model about the state of the system they're asking the LLM to work on. That's a problem because all indications are that juniors absolutely love these tools. Now it's been a very long time since I was a junior developer, and I still got it wrong. Where does that leave us?