Build your TextToSpeech service with FreeTTS

May 6, 2016 JAVA, PHP

A simple HTTP-enabled text-to-speech server

Introduction
One of the technologies I’m most fond of is domotics. I don’t have the budget for a Java-enabled oven, but with some X10 plugs I’ve recently had a lot of fun building a system where a MIDlet on my mobile periodically polls my home server, reporting its current GSM cell, so that my oven can be switched on when I’m on my way home. I know, it’s kitsch (and expensive, depending on your GPRS provider), but I love this kind of useless thing.
Last month an old dream came back to mind: my home system talking to me. That is easy to achieve today. I’m quite obsessed with web services, system integration and so on, and I wanted to build some kind of talking appliance directly exposed to all of my systems (desktop, laptop, mobile etc.), so this time I opted for a simple HTTP approach: no SOAP, XML-RPC or raw sockets.
Then I had to choose my TextToSpeech engine. I had already messed around with FreeTTS, a speech synthesizer written entirely in the Java™ programming language, and had been quite satisfied with it.
It was also a good chance to use the JDK’s com.sun.net.httpserver.HttpServer class to implement the simple HTTP server I needed. So I had everything to build my “talking appliance”. Note that FreeTTS includes in its sample codebase a client/server solution which is different from what I’m explaining in this article. In our case we will have an appliance with server code, an audio board and speakers, whereas in the FreeTTS client/server sample the server produces the audio but it is played by the client.

Overview
I’ve used the JDK and FreeTTS 1.2. (The com.sun.net.httpserver package ships with JDK 6 and later, so you need at least that version.) You obviously need to put the FreeTTS jars in your classpath. Nothing else is required and, since the TTS engine is 100% Java, it is also platform independent.
With just two simple classes, our talking engine will be ready. I’m not going to go into the details of com.sun.net.httpserver.HttpServer; the only thing we need to know about it for our purposes is that we can create a context and associate a handler with it. The code will make this clear.
Imagine we want an HTTP service listening on port 8000, with a context such as “/ttsserver/say”, which will “speak” the URI query (the text after the “?”).
A typical call looks like this:
“http://localhost:8000/ttsserver/say?How’s the weather like today”
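
Spaces and some other characters are not legal in a raw URI, so a programmatic caller should percent-encode the text before sending it (browsers usually do this for you). Here is a minimal client sketch, not part of the original code: the class name TtsClient and its methods are hypothetical, and the say method assumes the server described in this article is running on the given host and port.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical client for the /ttsserver/say context described in this article.
public class TtsClient {

    // Builds the request URI. The multi-argument URI constructor
    // percent-encodes characters that are illegal in a query, such as spaces.
    public static URI buildUri(String host, int port, String text)
            throws URISyntaxException {
        return new URI("http", null, host, port, "/ttsserver/say", text, null);
    }

    // Sends the text to the server and returns the HTTP status code.
    public static int say(String host, int port, String text)
            throws IOException, URISyntaxException {
        HttpURLConnection con =
                (HttpURLConnection) buildUri(host, port, text).toURL().openConnection();
        try {
            return con.getResponseCode();
        } finally {
            con.disconnect();
        }
    }

    public static void main(String[] args) throws Exception {
        // Only builds and prints the URI; no network access happens here.
        System.out.println(buildUri("localhost", 8000, "hello world"));
        // With the server running, this would make the appliance speak:
        // say("localhost", 8000, "hello world");
    }
}
```

On the server side, getRequestURI().getQuery() returns the decoded text, so the handler receives the original sentence.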

The code
Let’s first code the main class. It will just initialize and start the http server. Here is the code:

package org.beanizer.ttsserver;

import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;

public class Server {
    private HttpServer server = null;

    public void start() {
        try {
            // Listen on TCP port 8000; a backlog of 0 means "use the system default".
            server = HttpServer.create(new InetSocketAddress(8000), 0);
            // Route requests for /ttsserver/say to our handler.
            server.createContext("/ttsserver/say", new VoiceHandler());
            // null selects the server's default executor.
            server.setExecutor(null);
            server.start();
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }

    public static void main(String[] args) {
        Server s = new Server();
        s.start();
    }
}

The main method creates an instance of the class and calls its start method.
Let’s analyze the code within the try/catch block in start:
1) an http server is created listening on tcp port 8000.
2) The context “/ttsserver/say” will be handled by VoiceHandler, which is the second class we’ll code.
3) Passing null to setExecutor makes the server use its default executor (an executor is an object that executes submitted Runnable tasks).
4) Finally we start the http server.
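
To illustrate step 3, here is a sketch of the same setup with an explicit executor instead of the default one. A single-thread executor is one possible choice for a talking appliance, since it handles requests strictly one at a time; that choice is my assumption, not something the original code does. The class name SerialServerSketch and the placeholder handler are hypothetical, and port 0 asks the operating system for any free port.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical variant of the Server class that uses an explicit executor.
public class SerialServerSketch {
    private final HttpServer server;
    private final ExecutorService executor;

    public SerialServerSketch(int port, HttpHandler handler) throws IOException {
        server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/ttsserver/say", handler);
        // One worker thread: requests are handled one after another.
        executor = Executors.newSingleThreadExecutor();
        server.setExecutor(executor);
    }

    public void start() {
        server.start();
    }

    // The port actually bound (useful when the constructor was given port 0).
    public int getPort() {
        return server.getAddress().getPort();
    }

    public void stop() {
        server.stop(0);       // stop the server without waiting
        executor.shutdown();  // let the worker thread terminate
    }

    public static void main(String[] args) throws IOException {
        // Placeholder standing in for VoiceHandler: replies "Ok" without speaking.
        HttpHandler placeholder = new HttpHandler() {
            public void handle(HttpExchange t) throws IOException {
                byte[] body = "Ok".getBytes();
                t.sendResponseHeaders(200, body.length);
                t.getResponseBody().write(body);
                t.close();
            }
        };
        SerialServerSketch s = new SerialServerSketch(0, placeholder);
        s.start();
        System.out.println("Listening on port " + s.getPort());
        s.stop();
    }
}
```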

We need just one more simple class to handle http requests received on the created context.

package org.beanizer.ttsserver;

import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import java.io.IOException;
import java.io.OutputStream;

import com.sun.speech.freetts.Voice;
import com.sun.speech.freetts.VoiceManager;
import com.sun.speech.freetts.util.Utilities;

public class VoiceHandler implements HttpHandler{
    private Voice voice;
    
    public VoiceHandler() {
        try {
            voice = VoiceManager.getInstance().getVoice(
                    Utilities.getProperty("voice16kName", "kevin16"));
            voice.allocate();
        } catch (Exception e) {
            e.printStackTrace();
            System.exit(1);
        }
    }
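    public void handle(HttpExchange t) throws IOException {
        // getQuery() returns the decoded text after "?", or null if there is none.
        String text = t.getRequestURI().getQuery();
        if (text != null && text.length() > 0) {
            voice.speak(text);
        }
        byte[] response = "Ok".getBytes();
        // The declared length must be the byte count of the body.
        t.sendResponseHeaders(200, response.length);
        OutputStream os = t.getResponseBody();
        os.write(response);
        os.close();
    }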
}

Our VoiceHandler class implements the com.sun.net.httpserver.HttpHandler interface, providing its handle method to manage incoming requests.
In the class constructor we grab the FreeTTS VoiceManager instance, look up a Voice by name (the Utilities class reads the voice16kName system property, falling back to “kevin16”), and then allocate it. At this point the voice is ready to be used with a simple speak(String) call.
One note about setting the voice. FreeTTS comes with a few voices included: an 8 kHz one, a 16 kHz one (the one we’re using here) and a limited-domain one, which has better quality but is restricted to speaking dates and times. Other voices, including ones for different languages, can be found on the net, and it’s even possible to build your own (though not easy). See the FreeTTS documentation for details.

Back to our class. The handle method will:
1) “speak” what’s been passed with the http request (everything after the question mark in the URI)
2) send an “Ok” response to the caller.

To start our service, make sure you have the FreeTTS jars in your classpath and launch the org.beanizer.ttsserver.Server class with java.

Conclusions
That is basically all. It is possible to extend this structure with an access control list, parameters for choosing the voice and so on, but with just these two trivial classes every device capable of making an HTTP call (browsers included) can make our appliance speak.
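
As one possible starting point for the voice-selection idea, the query could carry named parameters instead of bare text. The sketch below parses a raw query string such as voice=kevin16&text=Hello%20world into a map. The class name QueryParams and the parameter names voice and text are hypothetical and not part of the original service. A handler would pass it the result of getRequestURI().getRawQuery(), so that splitting on “&” and “=” happens before decoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: splits a raw (still encoded) query into named parameters.
public class QueryParams {

    public static Map<String, String> parse(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        if (rawQuery == null || rawQuery.length() == 0) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.length() == 0) {
                continue; // skip empty segments such as "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String key = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            // Decode only after splitting, so encoded "&" or "=" stay in the value.
            params.put(decode(key), decode(value));
        }
        return params;
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always supported by the Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> p = parse("voice=kevin16&text=Hello%20world");
        System.out.println(p.get("voice") + " says: " + p.get("text"));
        // prints "kevin16 says: Hello world"
    }
}
```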
At home I’m using it for all sorts of things, and it’s really fun and addictive. Some of the uses:
1) Reading out important log messages (suitably filtered to avoid “spam”).
2) I don’t like phone ringtones, so my Asterisk server simply tells the TTS server to say who’s calling, using the name instead of the number for known callers.

Currently I’m working on a similar service for voice recognition, but that is a completely different beast…
