Using text to speech on Sonic Pi (Mac version only)

I came across a neat article about the Apple “say” command, which is used by the text-to-speech components of macOS. I thought it would be a nice feature to try with Sonic Pi, but it was difficult to isolate the audio output and use it as an input to Sonic Pi. After playing around a bit, I decided to use the built-in option of the command to save the spoken text as a sample, which could then easily be loaded into Sonic Pi just like any other sample.
In the event, it proved to be fairly easy to do, and the program below is the result, containing the function sayIt and some demonstrations of its use.

#sayIt demo to utilise text to speech on Sonic Pi on a Mac
#a function sayIt is defined, which utilises the Apple say command to create a sample speech.aiff
#this is stored in a user specified path in the line s1="/Users/rbn/saytext/speech.aiff"
#adjust as you wish. The function will work nicely if defined in the init.rb file or it can be used as here.
#a very useful article is at https://maithegeek.medium.com/having-fun-in-macos-with-say-command-d4a0d3319668
#function sayIt created by Robin Newman, August 2023. NB this will only work on Sonic Pi on a Mac.
#If you use a language other than English, there are voice synths available for different languages.

define :sayIt do |message,voice="kate",*args|
  adef={amp: 1,pan: 0,rpitch: 0} #default arguments
  ag=args[0] #supplied optional arguments
  ag=adef if ag==nil #no args supplied. Use all defaults
  adef.length.times do |i| #update default args with those supplied
    if !ag.keys.include? adef.keys[i]
      ag[adef.keys[i]]=adef[adef.keys[i]]
    end
  end
  s1="/Users/rbn/saytext/speech.aiff"
  #save say output as a temporary sample file speech.aiff
  #arguments go to say directly (no shell), so an apostrophe in the message cannot break the command
  system("say","-v",voice,message,"-o",s1)
  sleep 0.1
  sample_free_all #forget any previous versions of sample file
  sleep 0.1
  sample s1,ag #play sample with supplied arguments
  sleep sample_duration s1,ag #wait until finished
  sleep 0.3## add a little extra to avoid abrupt ending
  #optional delete straight away
  #system("rm '"+s1+"'") #delete the temporary sample file
  #sleep 0.2#allow short gap before next sound
end


sayIt "[[volm 0.7]]Hi, my name is Daniel.[[slnc 400]]This program demonstrates the use of the Apple say command.[[slnc 500]]
here I use it to create an A I F F sample with one of the Apple supplied voice synths. [[slnc 400]]
This sample can then be utilised on a Sonic Pi program just the same as any other.","daniel"
sayIt "[[volm 0.8]]The sample is produced by the function say It which requires two basic parameters.[[slnc 500]]
First a text string containing the text you wish to hear, and second the voice synth name;
for example[[slnc 50]] daniel, kate, serena, jamie, or sandy to mention some of them.[[slnc 300]]Other standard parameters like a pan setting,
    or an R pitch value can also be supplied.[[slnc 600]] A new sample is produced for each sayIt command. It is named
  speech, and situated in a user specified folder. The same sample is updated each time the sayIt command is used","daniel"

sayIt "hello my fine friend.[[slnc 200]] How are you today?","kate",pan: -1
sayIt"[[volm 0.8]]I am really fine Kate[[slnc 200]]Thanks for asking.","daniel",pan: 1
sayIt"Hi there. I am Serena","serena",pan: 1

sayIt"[[volm 0.8]]and I am Jamie","jamie",pan: -1
sayIt"Did you know? You can do cool things with speech, and Sonic Pi?"
sleep 1
sayIt "Here is an echo"
with_fx :echo,phase: 0.5,mix: 0.7 do
  sayIt "Hello![[slnc 2000]]  Hello!","kate",amp: 3
end
sleep 0.3
with_fx :reverb,room: 0.8,mix: 0.7 do
  sayIt"[[rate -10]]Hi Jamie here. This has a bit of reverb added to it","jamie",pan: -1
end
with_fx :gverb,room: 15 do
  sayIt "[[rate -2]]This is Kate. Now try with some G verb. It is a bit more potent!","kate",pan: 1
end
with_fx :whammy,transpose: -12 do
  sayIt "[[rate -2]]This has a whammy effect applied to it, with transpose minus 12!","kate"
end
sayIt "The next demo will use the ring-mod fx","kate",pan: -1
with_fx :ring_mod,freq: 20 do |k|
  sayIt "[[volm 0.6]]Now I  have a bit of ring modulation applied to what I am saying[[slnc 400]] Quite like a Darlek[[slnc 500]] [[volm 1]][[rate -100]]EXTERMINATE ","daniel"
  sayIt "[[volm 0.7]][[rate 120]]exterr-mmenn-eight,","jamie",rpitch: 2 ,amp: 1,pan: 1
end
sayIt"You can use Say It to explain what is happening. For instance, here is a C major scale","serena",pan: -1
play_pattern_timed scale(:c4,:major),[0.2],release: 0.2,pan: -1

sayIt"[[volm 0.8]]And here are some chords","serena",pan: 1
ch = [chord(:c4,:major),chord(:c4,:minor),chord(:g4,:major)]
use_synth :tb303;ch.each do |x|;play x,release: 0.5;sleep 1;end

sayIt "[[volm 1]]So, that is the end of this quick demo.
  [[slnc 1000]]It shows that the Apple say command can be a cool addition to Sonic Pi on the Mac.[[slnc 400]]
This is Serena saying goodbye for now. Have fun!","serena"

#There are various embedded commands you can incorporate in the message string
#here are some
#[[slnc 500]] ->silence for 500ms
#[[volm 0..1]] can also use +0.1 to raise existing level
#[[rate 100]] where 100 is the number of words per minute. Can also use + or - to change current setting
#[[pbas  50]] changes the pitch of the voice. can use + or - for relative changes
#[[rset]] resets parameters to default.
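
If you find the double square brackets fiddly to type, the embedded commands listed above can be built with a small helper. This is only a sketch in plain Ruby: tag is a hypothetical name of my own, not part of Sonic Pi or of the say command, and it does nothing more than format the string.

```ruby
# Hypothetical helper (an assumption, not part of Sonic Pi or say):
# formats one embedded speech command, e.g. tag(:slnc, 400) gives "[[slnc 400]]".
def tag(name, value = nil)
  value.nil? ? "[[#{name}]]" : "[[#{name} #{value}]]"
end

# Build a message string of the kind passed to sayIt in the demo.
message = tag(:volm, 0.8) + "Hello." + tag(:slnc, 400) + "Goodbye." + tag(:rset)
puts message
```

The resulting string can be passed as the first argument to sayIt. Relative settings can be given as strings, for example tag(:rate, "+20").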

You can see the program in operation below

For others like me, who are just discovering this: Robin is using Apple “embedded speech commands” in the message string sent to the say utility (as he documented at the end of the script).

You can find out more options to give to say from its online manual. In a Terminal window, type man say.

A full list of speech commands is in Apple’s Speech Synthesis Programming Guide.

Also, if you update the line that sets s1 to this, then you can just copy/paste into Sonic Pi without needing to change it:

  s1="#{ENV['HOME']}/Music/sayIt-speech.aiff"

Neat! Had tried with a ~ (after doing a mkdir saytext in my user root) and it complained. Replacing rbn with alex worked. This is more robust.
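
On the ~ problem: the tilde is normally expanded by the shell, so a path string used directly inside the script keeps it as a literal character, which is probably why it complained. Ruby's File.expand_path does the expansion itself. A minimal sketch in plain Ruby, where speech_sample_path is a hypothetical helper name and the saytext folder is just the one mentioned above:

```ruby
require "fileutils"

# Hypothetical helper (an assumption, not part of the original script):
# returns an absolute path for the speech sample and creates the
# containing folder if it does not exist yet.
def speech_sample_path(folder = "~/saytext", filename = "speech.aiff")
  dir = File.expand_path(folder)  # a leading "~" becomes the home folder
  FileUtils.mkdir_p(dir)          # does nothing if the folder already exists
  File.join(dir, filename)
end
```

The s1 line in sayIt could then read s1=speech_sample_path, which also saves doing the mkdir by hand.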

Was experimenting with the macOS TTS a bit, some months ago, for a “project”. Honestly, I hadn’t even thought about scripting it.

I feel there’s an appetite for this type of thing, these days. We have more sophisticated singing voices (like Vocaloid and Emvoice). Yet there’s a bit of nostalgia involved with some of these. And they could be combined.
Something I might like to do is to tweak the pitch contour… possibly in Melodyne. Might even try to harmonize voices, though speaking is so different from singing.

(It also brings back some memories from my first job after graduating with a degree in anthropology: speech lab assistant in Switzerland working on a speech synthesis project linked to British Telecom. Basically, I was segmenting speech signals in Signalyze, a piece of commercial research software developed by the lab director. Since then, Praat has been the favourite option for a lot of people. It does do speech synthesis and the type of analysis it does can work really well for music, I found during my own research in ethnomusicology.)

(Our lab was also doing formant synthesis, which everybody else had tried and abandoned. AFAICT, all current TTS systems in actual use rely on some form of “diphonic synthesis”, based on samples.)

Thanks a lot for that!
And… wait! It does speech contour??

Noice! Will definitely need to experiment.