TTS speech using webquotes, and FreeSound!

jonny · April 26, 2024, 2:41pm

It’s been a while since I’ve tried any TTS stuff, but today’s aspirations again included “integrating” web things like FreeSound.

I think gem management is still on the SP wishlist. I tried the workaround mentioned elsewhere of copying the installed freesound gem to the vendor dir, but that didn’t work. Then I tried some web requests directly in SP, sanitised the response and synthesised it… and amazingly, it worked!

I’m hoping to take this to the next level: get these into buffers for reuse and manipulation, and explore fx and onset.

Requirements

The data is fetched from an API, so you’ll need an internet connection and an API key.

I think this first how-to used the NASA APIs*:
5 ways to make HTTP requests in Ruby | Twilio

and my first Google result for a quotes API found this:
Quotes API - API Ninjas (api-ninjas.com) - register and verify for a free API key

*There are many, and I didn’t even realise the APOD one I was using was returning pictures!
NASA Open APIs

On Windows I’m using NirCmd from NirSoft: winget install nircmd
For ease, I placed nircmd.exe in my Windows directory, as the copy in a folder on my PATH wasn’t being found by the system call.
Alternatively, you could use SAPI via PowerShell.
- I looked for APIs that return speech as WAV files; nothing yet, but I’ll look into this further.

v0: test

system("nircmd speak text hello")
If anyone knows any way to pass an argument into the system call, please let me know!
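One possible answer, sketched in plain Ruby and not tested inside Sonic Pi: `system` also accepts the program and each argument as separate strings, and in that form Ruby runs the program directly without a shell, so a phrase with spaces arrives as a single argument. To keep the sketch runnable anywhere, the current Ruby interpreter (`RbConfig.ruby`) stands in for nircmd and simply checks what it received; the phrase is a made-up example.

```ruby
require 'rbconfig'

# Hypothetical phrase to speak; it contains spaces on purpose.
phrase = 'hello from Sonic Pi'

# With nircmd this would be:
#   system('nircmd', 'speak', 'text', phrase)
# Here a tiny Ruby script stands in for nircmd: it exits 0 only if it got
# exactly one argument equal to the phrase (passed via an env variable).
check = "exit(ARGV == [ENV.fetch('EXPECTED')] ? 0 : 1)"
received_whole = system({ 'EXPECTED' => phrase }, RbConfig.ruby, '-e', check, phrase)

puts received_whole
```

The single-string form (`system("nircmd speak text " + phrase)`) hands the whole line to a command interpreter instead, which is where the quoting trouble comes from.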

As a workaround, we 1) write what we want to say to a file, then 2) speak from the file.

v1: inspirational quotes

require 'uri'
require 'net/http'

txtfile = 'C:/temp/sp_nircmd_speak.txt'

category = %w(art famous god history inspirational life love).choose
uri = URI('https://api.api-ninjas.com/v1/quotes')
uri.query = URI.encode_www_form(category: category) # keep the category in the query string
ninjakey = 'your-api-key-here'

# API Ninjas expects the key in an X-Api-Key request header
req = Net::HTTP::Get.new(uri)
req['X-Api-Key'] = ninjakey
res = Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) { |http| http.request(req) }
#puts res.body if res.is_a?(Net::HTTPSuccess)

data = res.body # raw JSON text, needs sanitising

# quote indices
start_phrase = 'quote": "'
start_index = data.index(start_phrase) + start_phrase.length
end_index = data.index('"', start_index) - 1 # quote ends with "
#puts start_index, end_index #debug

substring = data[start_index..end_index]
File.open(txtfile, "w") { |f| f.write(substring) } # 1) write the quote to a file
wait 0.5
`nircmd speak file #{txtfile}` # 2) speak!
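The index slicing stops at the first `"` after the start phrase, so a quote that itself contains escaped double quotes gets cut short. A sturdier alternative is to parse the JSON. This is a sketch assuming Ruby’s bundled `json` library can be required from Sonic Pi (I haven’t checked) and that the body is a JSON array of objects with a `quote` key, which is what the slicing above relies on; `extract_quote` and the sample body are hypothetical.

```ruby
require 'json'

# Hypothetical helper: returns the first quote's text from a /v1/quotes
# response body, or nil if the array is empty.
# Assumes a JSON array of objects that each have a "quote" key.
def extract_quote(body)
  first = JSON.parse(body).first
  first && first['quote']
end

# Made-up response body with escaped quotes inside the text,
# which would cut the index-slicing version short.
sample_body = '[{"quote": "He said \"play it again\" and left.", "author": "Anon", "category": "art"}]'
puts extract_quote(sample_body) # prints: He said "play it again" and left.
```

With that in place, `substring = extract_quote(data)` would replace the index lines.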

After wrapping the above in a speak method (optionally passing the category explicitly), I tried looping it with a sufficiently long delay between calls, but eventually a timing error ended the fun:

use_sched_ahead_time 2 # I always need this for short sleeps like 1.0 / 16 @ 60bpm

live_loop :l1, delay: 2 do
  tick
  sample :bass_dnb_f if (spread 3, 16).look
  sample :bd_808, amp: 1.8 if (spread 7, 16).look
  cue :speak if (factor? look, 16*16)
  sleep 1.0 / 16
end

live_loop :l2, sync: :speak do
  # stop
  # wait
  speak
  sync :speak
end
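My guess, unconfirmed, is that the timing error comes from blocking work inside `speak`: the HTTP request (and possibly the nircmd call) holds the loop up in real time while Sonic Pi’s clock keeps moving. One way to keep the network request off the musical timeline is to fetch in the background and only use results that have already arrived. A sketch in plain Ruby with a hypothetical `Prefetcher` class; I haven’t checked how raw Ruby threads behave inside Sonic Pi.

```ruby
# Hypothetical helper (plain Ruby, untested inside Sonic Pi): runs a slow
# fetch on a background thread and hands finished results over through a
# Queue, so the caller never waits on the network.
class Prefetcher
  def initialize(&fetch)
    @fetch = fetch
    @results = Queue.new
  end

  # Starts one fetch in the background and returns the Thread immediately.
  def request
    Thread.new { @results << @fetch.call }
  end

  # Returns a finished result, or nil straight away if none is ready yet.
  def take
    @results.pop(true)
  rescue ThreadError
    nil
  end
end

# Usage with a stand-in fetch; for real use the block would do the
# API Ninjas request and return the quote text.
quotes = Prefetcher.new { sleep 0.1; 'a prefetched quote' }
p quotes.take          # nothing ready yet
quotes.request.join    # join only so this demo waits; a loop would not
puts quotes.take
```

In the live loop you might call `request` a few bars before the cue and `take` on the cue, skipping the speech whenever it returns nil.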

I recorded the performance but forgot that SP isn’t actually playing the speech (nircmd speaks it directly), so it isn’t in the recording!

For that, the speech needs writing to a file and then playing as a sample… v2 coming up!

robin.newman · April 26, 2024, 4:10pm

You might like to have a look at this post I did concerning speech on a Mac: Using text to speech on Sonic Pi (Mac version only)

jonny · April 26, 2024, 6:16pm

system("say -v '" + voice + "' '" + message + "' -o '" + s1 + "'")

Thanks Robin! Of course! Not sure if I have voice control with my Windows/nircmd method (jealous), but this, as always, is super helpful 👌🏻

Free Sounds!

The second way in which you can download sounds is by accessing their previews.

If anyone can break down how we can do OAuth2 via Ruby, please holler back! In the interim I tried this and, again, amazingly it worked! Most unusual!
(It took a few stabs to fix some bugs, including timing errors, but this could be useful…)

v1 testing, using just the soundid

require 'uri'
require 'net/http'

soundid = 1234
token = '&token=yourapikey'
fs_api = 'https://freesound.org/apiv2/sounds/'
uri = URI(fs_api + soundid.to_s + '/?fields=previews' + token)
res = Net::HTTP.get_response(uri)
data = res.body # JSON containing the preview download URLs

start_phrase = 'preview-lq-ogg":"'
start_index = data.index(start_phrase) + start_phrase.length
end_index = data.index('"', start_index) - 1
#puts start_index, end_index

dlstring = data[start_index..end_index]
##| puts dlstring
oggfile = 'c:/temp/fsogg.ogg' # used for both the download and the sample calls

dl = URI(dlstring)
response = Net::HTTP.get_response(dl)

if response.is_a?(Net::HTTPSuccess)
  # If the response is successful, write the body to the local file
  File.open(oggfile, "wb") do |file|
    file.write(response.body)
  end
  puts "File downloaded successfully."
else
  puts "Failed to download file: #{response.code} - #{response.message}"
end

load_sample oggfile
sample oggfile
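The request URL above is assembled by string concatenation and the preview URL is found by index slicing. A sketch of an alternative that lets `URI.encode_www_form` build the query and `JSON.parse` plus `Hash#dig` pull out the preview; it reuses the endpoint and the `fields`/`token` parameters from the code above, assumes the body looks like `{"previews": {"preview-lq-ogg": "..."}}`, and the helper names are hypothetical.

```ruby
require 'json'
require 'uri'

# Hypothetical helper: request URI for one sound's preview URLs, using the
# same endpoint and fields/token parameters as the code above.
def freesound_previews_uri(sound_id, token)
  uri = URI("https://freesound.org/apiv2/sounds/#{sound_id}/")
  uri.query = URI.encode_www_form(fields: 'previews', token: token)
  uri
end

# Hypothetical helper: picks one preview URL out of the JSON body.
# Returns nil if that preview kind is missing.
def preview_url(body, kind = 'preview-lq-ogg')
  JSON.parse(body).dig('previews', kind)
end

puts freesound_previews_uri(1234, 'yourapikey')
# prints https://freesound.org/apiv2/sounds/1234/?fields=previews&token=yourapikey
```

Swapping `kind` (for example to one of the other preview keys Freesound returns) would pick a different preview without touching any index arithmetic.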

jonny · April 28, 2024, 3:23pm

Windows method 2: still writing to a file, but now using SAPI via PowerShell to get voice control.

@robin.newman I don’t know if you or any Windows whizzkids can decipher this particular puzzle piece…

I’ve opted for SAPI to allow voice selection. Almost everything works…

PS script (conjured with GPT) to write the speech to a WAV file (I plan to add easier voice selection with partial matching once problem a is resolved):

param (
    [string]$Text = 'Welcome to Sonic Pi',
    [string]$WavFile = 'c:\temp\test.wav',
    # $env:USERPROFILE + '\Music\file.wav',
    [string]$Voice = 'Microsoft David Desktop',
    [int]$Rate = 1  # speaking rate, -10 (slowest) to 10 (fastest); 0 is normal speed
)

# Load System.Speech and create a SpeechSynthesizer object
Add-Type -AssemblyName System.Speech
$speechSynthesizer = New-Object -TypeName System.Speech.Synthesis.SpeechSynthesizer

# Select the voice
if ($Voice) {
    $speechSynthesizer.SelectVoice($Voice)
}

# Set additional voice options
if ($Rate -ne 0) {
    $speechSynthesizer.Rate = $Rate
}

# Synthesize speech into the WAV file
$speechSynthesizer.SetOutputToWaveFile($WavFile)
$speechSynthesizer.Speak($Text)

# Detach from the WAV file (output now goes nowhere) and release the synthesizer
$speechSynthesizer.SetOutputToNull()
$speechSynthesizer.Dispose()

#Write-Host "Speech saved to file: $WavFile"

With the param block and its defaults, we can optionally pass arguments.

Tested in PowerShell ISE: .\speak2.ps1 -Text 'Hello'

SP Sample load test

sample_free_all
s = 'c:\temp\test.wav'
sample s

Also works from an elevated command prompt:

powershell.exe -ExecutionPolicy Bypass -File c:\temp\test\speak2.ps1 -Text "i am jonny"

Note that the double quotes are required here for a multi-word string.

problem a

In Sonic Pi the script call works, but when I try to pass the Text argument it doesn’t get through, even though the same call works elsewhere (cmd and PS).

Weirdly, even attempts to pass a single word fail, suggesting the issue may not just be with the double quotes…

This works (no args):

system("powershell.exe -ExecutionPolicy Bypass -File c:\\temp\\test\\speak2.ps1")

but when I try to add arguments, the script doesn’t receive them.

Example:

#system("powershell.exe -ExecutionPolicy Bypass -File c:\\temp\\test\\speak2.ps1")

define :speak do |phrase = 'ok'|
  puts phrase
  system("powershell.exe -ExecutionPolicy Bypass -File c:\\temp\\test\\speak2.ps1 -Text" + phrase)
end

speak
wait 1

sample_free_all
s = 'c:\temp\test.wav'
sample s

I’ve tried escaping the quotes and other things…
As I think I’ve run out of ideas, I thought I’d throw this out there…

Retested in a normal cmd and no issue there either, perhaps unsurprisingly…

Update: sorted! Compared with the failing version, -Text now has a space after it and the phrase is wrapped in escaped double quotes:

#system("powershell.exe -ExecutionPolicy Bypass -File c:\\temp\\test\\speak2.ps1")

define :speak do |phrase = 'here we go'|
  puts phrase
  system("powershell.exe -ExecutionPolicy Bypass -File c:\\temp\\test\\speak2.ps1 -Text \"" + phrase + "\"")
end

speak "never mind, sorted - woohoo 💥"
wait 1

sample_free_all
s = 'c:\temp\test.wav'
sample s
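Passing the command and its arguments to `system` as separate strings should also avoid the escaped-quote juggling, since Ruby then hands each string to the program as one argument. A sketch (untested on Windows) with a hypothetical `speak2_argv` helper, which also passes the script’s optional `-Voice` and `-Rate` parameters only when given, so the defaults in the param block apply otherwise.

```ruby
# Hypothetical helper: builds the argument list for speak2.ps1 so each value
# reaches PowerShell as a single argument, with no hand-written \" escaping.
# -Voice and -Rate are added only when given; otherwise the script's defaults apply.
def speak2_argv(text, script: 'c:\temp\test\speak2.ps1', voice: nil, rate: nil)
  argv = ['powershell.exe', '-ExecutionPolicy', 'Bypass', '-File', script, '-Text', text]
  argv += ['-Voice', voice] if voice
  argv += ['-Rate', rate.to_s] if rate
  argv
end

argv = speak2_argv('never mind, sorted', voice: 'Microsoft David Desktop', rate: 2)
p argv
# On Windows, the call would then be: system(*argv)
```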

robin.newman · April 28, 2024, 4:33pm

Windows is not really my scene. I very rarely use a Windows PC and only have a very ancient one lurking around. I work mainly with Mac, Raspberry Pi and one or two virtual Linux machines.

emlyn · April 29, 2024, 10:08am

Sonic Pi used to have built-in support for Freesound (passing an integer to sample would download and cache the sample with that ID), but if I remember correctly, it was removed after Freesound changed their authentication and it stopped working.
A while back I made a PR adding it back in, with some fixes to make it work again: Resuscitate Freesound? by emlyn · Pull Request #2438 · sonic-pi-net/sonic-pi · GitHub, but I’m not sure if Sam would want it back in.
