Prototype of a Vocoder in Sonic Pi

Someone requested this on GitHub, so I'm dusting this off and posting it as more of a progress report in case anyone is interested (the issue is here: https://github.com/samaaron/sonic-pi/issues/1930).

I’ve also written this up as a gist here with code samples, including the code for the implementation in SuperCollider: https://gist.github.com/xavriley/0907002649d6b6b2ac2bcbe739d96761

Prototype of a vocoder in Sonic Pi

Demo here: https://www.dropbox.com/s/qiktze3ml7bz5iq/autotune_the_shipping_forecast.wav?dl=0
Original voice input here: https://soundcloud.com/jb_uk/neil-nunes-bbc-radio-4-and

This is a demo of a simple FX synth using the Vocoder.ar UGen from SuperCollider. It's a fairly primitive vocoder implementation built from a bank of bandpass filters which are "tuned" to various frequencies.
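
To make the "bank of bandpass filters" idea concrete, here's a minimal channel-vocoder sketch in SuperCollider. This is not the Vocoder.ar source, and the band centres and Q are arbitrary; it just shows the principle: follow the amplitude of each band of the voice and use those envelopes to shape the same bands of a carrier.

// Minimal channel-vocoder sketch (illustration only, not Vocoder.ar):
// band-limit the voice, follow each band's amplitude, and apply those
// envelopes to the same bands of a synthetic carrier.
(
SynthDef(\vocoder_sketch, { |out = 0|
	var voice   = SoundIn.ar(0);        // modulator: live voice input
	var carrier = Saw.ar(110);          // carrier that ends up "speaking"
	var bands   = (200, 400 .. 4000);   // arbitrary band centre frequencies
	var sig = bands.collect({ |f|
		var env = Amplitude.kr(BPF.ar(voice, f, 0.1));
		BPF.ar(carrier, f, 0.1) * env
	}).sum;
	Out.ar(out, sig ! 2);
}).add;
)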

The results are mixed: it's difficult to get a decent reproduction of the words with this approach.

To get T-Pain/Imogen Heap style auto-tune, it would be necessary to write a different algorithm, probably using a technique called PSOLA (pitch-synchronous overlap-add), which is described at the following links:

Part of the problem is that the algorithm is probably under copyright, which makes distributing an open-source version more difficult.


Update on this: I've found a PSOLA-based pitch shifter implemented as a SuperCollider quark (plugin) here: https://github.com/dyfer/PitchShiftPA. It sounds way better than the Vocoder.ar synth. Just need time to implement it properly…
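
For anyone who wants to try the quark directly in SuperCollider before it's wired into Sonic Pi, something along these lines should work. The argument order (in, freq, pitchRatio, ...) is taken from the quark's README as I remember it, so check the PitchShiftPA help file if it complains:

// Rough sketch of driving PitchShiftPA from its own pitch tracker.
(
SynthDef(\psola_test, { |out = 0, pitch_ratio = 1.5|
	var in = SoundIn.ar(0);
	var freq = Pitch.kr(in)[0];                    // track the input's fundamental
	var shifted = PitchShiftPA.ar(in, freq, pitch_ratio);
	Out.ar(out, shifted ! 2);
}).add;
)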


I don’t suppose you’ve had time to look into this?

Status update:

//PitchShiftPA is based on formant preserving pitch-synchronous overlap-add re-synthesis, as developed by Keith Lent
//based on real-time implementation by Juan Pampin, combined with non-real-time implementation by Joseph Anderson
//This synthdef is based on the pseudo-UGen by Marcin Pączkowski, using GrainBuf and a circular buffer at https://github.com/dyfer/PitchShiftPA

(
SynthDef('sonic-pi-fx_vocoder', {|
		pitch_ratio = 1, formant_ratio = 1,
		min_freq = 10, max_formant_ratio = 10, grains_period = 2,
		out_bus=0, in_bus=0, time_dispersion|

		var in, localbuf, grainDur, wavePeriod, trigger, freqPhase, maxdelaytime, grainFreq, bufSize, delayWritePhase, grainPos, snd, freq;
		var absolutelyMinValue = 0.01; // used to ensure positive values before reciprocating
		var numChannels = 1;

		// multichannel expansion
		[pitch_ratio, formant_ratio].do({ arg item;
			item.isKindOf(Collection).if({ numChannels = max(numChannels, item.size) });
		});

		in = In.ar(in_bus,1).asArray.wrapExtend(numChannels);
		freq = Pitch.kr(in)[0]; // Pitch.kr returns [freq, hasFreq]; take the tracked frequency
		//freq = freq.asArray.wrapExtend(numChannels);
		pitch_ratio = pitch_ratio.asArray.wrapExtend(numChannels);

		min_freq = min_freq.max(absolutelyMinValue);
		maxdelaytime = min_freq.reciprocal;

		freq = freq.max(min_freq);

		wavePeriod = freq.reciprocal;
		grainDur = grains_period * wavePeriod;
		grainFreq = freq * pitch_ratio;

		if(formant_ratio.notNil, { //regular version

			formant_ratio = formant_ratio.asArray.wrapExtend(numChannels);

			max_formant_ratio = max_formant_ratio.max(absolutelyMinValue);
			formant_ratio = formant_ratio.clip(max_formant_ratio.reciprocal, max_formant_ratio);

			bufSize = ((SampleRate.ir * maxdelaytime * max_formant_ratio) + (SampleRate.ir * ControlDur.ir)).roundUp; //extra padding for maximum delay time
			freqPhase = LFSaw.ar(freq, 1).range(0, wavePeriod) + ((formant_ratio.max(1) - 1) * grainDur);//phasor offset for formant shift up - in seconds; positive here since phasor is subtracted from the delayWritePhase

		}, { //slightly lighter version, without formant manipulation

			formant_ratio = 1 ! numChannels;

			bufSize = ((SampleRate.ir * maxdelaytime) + (SampleRate.ir * ControlDur.ir)).roundUp; //extra padding for maximum delay time
			freqPhase = LFSaw.ar(freq, 1).range(0, wavePeriod);
		});

		localbuf = numChannels.collect({LocalBuf(bufSize, 1).clear});
		delayWritePhase = numChannels.collect({|ch| BufWr.ar(in[ch], localbuf[ch], Phasor.ar(0, 1, 0, BufFrames.kr(localbuf[ch])))});
		grainPos = (delayWritePhase / BufFrames.kr(localbuf)) - (freqPhase / BufDur.kr(localbuf)); //scaled to 0-1 for use in GrainBuf
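		// grainPos trails the write head by the phasor offset (up to one wave period,
		// plus the formant offset), so each grain starts pitch-synchronously just
		// behind the most recently written audio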
		if(time_dispersion.isNil, {
			trigger = Impulse.ar(grainFreq);
		}, {
			trigger = Impulse.ar(grainFreq + (LFNoise0.kr(grainFreq) * time_dispersion));
		});
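		// grains of grains_period wave periods are retriggered at freq * pitch_ratio,
		// which is what shifts the perceived pitch; formant_ratio is applied below as
		// the GrainBuf playback rate, so formants move independently of the pitch shift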
		snd = numChannels.collect({|ch| GrainBuf.ar(1, trigger[ch], grainDur[ch], localbuf[ch], formant_ratio[ch], grainPos[ch])});

		Out.ar(out_bus, snd.dup)
	}
).writeDefFile("/Users/xavierriley/Downloads/Sonic Pi.app/Contents/Resources/etc/synthdefs/compiled/")
)
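
To audition the def in plain sclang before pointing writeDefFile at Sonic Pi's synthdef folder, you can swap writeDefFile for .add and route some audio through a private bus (the bus and source here are just placeholders):

// Quick test in sclang: feed the mic into a private bus and put the FX after it.
(
~bus = Bus.audio(s, 1);
{ Out.ar(~bus, SoundIn.ar(0)) }.play;
Synth.tail(s, 'sonic-pi-fx_vocoder',
	[\in_bus, ~bus, \out_bus, 0, \pitch_ratio, 1.5, \formant_ratio, 0.5]);
)
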
# in synthinfo.rb
    class FXVocoder < FXInfo
      def name
        "Vocoder"
      end

      def introduced
        Version.new(3,2,0)
      end

      def synth_name
        "fx_vocoder"
      end

      def doc
        ""
      end

      def arg_defaults
        super.merge({
          :pitch => 440,
          :pitch_ratio => 1.0,
          :formant_ratio => 1.0,
          :min_freq => 10,
          :max_formant_ratio => 10,
          :grains_period => 2.0,
        })
      end
    end
...
        :fx_vocoder => FXVocoder.new,
# Sonic Pi code

# harmonises a vocal sample as a major chord

load_synthdefs
sn = "~/Downloads/acappella.wav"

sample sn

in_thread do
  with_fx :vocoder, pitch_ratio: 1.5, formant_ratio: 0.5 do
    sample sn
    sleep sample_duration(sn)
  end
end

in_thread do
  with_fx :vocoder, pitch_ratio: 1.25, formant_ratio: 0.5 do
    sample sn
    sleep sample_duration(sn)
  end
end
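
For reference, the ratios above are just-intonation intervals: 1.25 is a major third (5/4) and 1.5 a perfect fifth (3/2), so together with the unshifted sample you get a major chord. The equal-tempered equivalents are easy to check in sclang:

4.midiratio;   // -> 1.2599..., equal-tempered major third (4 semitones)
7.midiratio;   // -> 1.4983..., equal-tempered perfect fifth (7 semitones)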

The API could do with tweaking to make it more intuitive, but the noises are there.

Internally it pitch-tracks the input sound and then pitch-shifts relative to the tracked pitch. That's why this example file has those moments of "distortion": it isn't able to track a clear pitch at those points. With a good input source the effect should be super smooth.
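
Pitch.kr actually returns a [freq, hasFreq] pair and the synthdef above only uses the frequency, so one possible way to tame those glitches (just a sketch, not something that's in the FX yet) would be to hold the last confident estimate whenever tracking fails:

// Sketch: hold the last confident pitch estimate while tracking fails.
(
SynthDef(\pitch_hold_sketch, { |out = 0|
	var in = SoundIn.ar(0);
	var freq, hasFreq;
	# freq, hasFreq = Pitch.kr(in);            // Pitch.kr -> [freq, hasFreq]
	freq = Gate.kr(freq, hasFreq);             // hold previous value when hasFreq == 0
	Out.ar(out, SinOsc.ar(freq, 0, 0.1) ! 2);  // audition the tracked pitch
}).add;
)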

This also opens the door to a T-Pain style autotune, but I need to do a bit more work to get there.


Nice work @xavierriley! Maybe I should get back to working on those Synths and FX that I started a while ago too :thinking:

Just tried this out. Very cool. I added it on my Mac and tried it with direct voice input:

with_fx :autotuner, mix: 0.7 do |c|
  set :ct, c
  live_audio :min, stereo: true, amp: 4

  # now start setting target pitch to get robot voice behaviour
  live_loop :robot do
    control get(:ct), target_pitch: scale(:a2, :minor_pentatonic, num_octaves: 2).choose
    sleep 0.4
  end
end

This has landed in 3.2 beta! :tada:
