Sinsy - Singing Voice Synthesizer

What is Sinsy

Sinsy is an HMM-based singing voice synthesis system that generates audio files of singing from MusicXML notation, given a voicebank and dictionaries. It is released under a modified BSD license.

You can try the demo on the official website, but use of the website is subject to its terms, so we will focus on the program itself.

Voicebanks

A voicebank is needed to generate the audio. There are several voicebanks on the Sinsy website that are not available anywhere else. The only voicebank that comes with Sinsy is HTS Voice "NIT SONG070 F001" version 0.90.

HTS Engine

Sinsy depends on the HTS Engine API, which is software that synthesizes speech waveforms from HMMs trained by the HMM-based speech synthesis system (HTS). It is released under a modified BSD license.

Compiling and Installing

HTS Engine API

First, we must install the HTS Engine API. Download the HTS Engine API tarball and run the standard configure/make commands.

./configure --prefix=/usr
make
sudo make install

Sinsy

Then, install Sinsy. First, download the Sinsy tarball. We need to pass the paths of the HTS Engine API to Sinsy when installing. The prefix above is /usr so we will use that. Use the following commands (also see the INSTALL file).

./configure --prefix=/usr \
  --with-hts-engine-header-path=/usr/include \
  --with-hts-engine-library-path=/usr/lib
make
sudo make install
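
If the Sinsy configure step cannot find the HTS Engine API, it may help to confirm that the header and library ended up under the prefix used earlier. The following is a minimal sketch, not part of either project: check_hts_engine is a hypothetical helper, and the file names HTS_engine.h and libHTSEngine.a are assumed from a default HTS Engine API install.

```shell
# Hypothetical helper: report whether the HTS Engine API files that the
# Sinsy configure flags point at exist under a given prefix (default: /usr).
check_hts_engine() {
  prefix="${1:-/usr}"
  status=0
  # File names are assumptions based on a default hts_engine_API install.
  for f in "$prefix/include/HTS_engine.h" "$prefix/lib/libHTSEngine.a"; do
    if [ -f "$f" ]; then
      echo "found: $f"
    else
      echo "missing: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

Running check_hts_engine with no arguments checks /usr; pass a different prefix if you installed the HTS Engine API elsewhere.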

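I was getting errors when compiling. Judging by the fix, they come from std::make_pair being called with explicit template arguments, which under C++11 and later makes it expect rvalues where the code passes named variables. I prepared this patch to fix them.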

diff -ruN --text sinsy-0.92.orig/lib/util/Configurations.cpp sinsy-0.92/lib/util/Configurations.cpp
--- sinsy-0.92.orig/lib/util/Configurations.cpp       2015-12-25 03:46:56.000000000 +0000
+++ sinsy-0.92/lib/util/Configurations.cpp    2019-08-11 11:28:36.248505819 +0100
@@ -129,7 +129,7 @@
             }
          }
       }
-      configs.insert(std::make_pair<std::string, std::string>(key, value));
+      configs.insert(std::make_pair<std::string, std::string>(std::move(key), std::move(value)));
    }
    return true;
 }
diff -ruN --text sinsy-0.92.orig/lib/util/MacronTable.cpp sinsy-0.92/lib/util/MacronTable.cpp
--- sinsy-0.92.orig/lib/util/MacronTable.cpp  2015-12-25 03:46:56.000000000 +0000
+++ sinsy-0.92/lib/util/MacronTable.cpp       2019-08-11 11:28:36.248505819 +0100
@@ -136,7 +136,7 @@
       extractPhonemeList(st.at(1), result->forward);
       extractPhonemeList(st.at(2), result->backward);

-      if (false == convertTable.insert(std::make_pair<std::vector<std::string>, Result*>(pl, result)).second) {
+      if (false == convertTable.insert(std::make_pair<std::vector<std::string>, Result*>(std::move(pl), std::move(result))).second) {
          ERR_MSG("Wrong macron table (There is a duplication : " << st.at(0) << ") : " << fname);
          delete result;
          return false;
diff -ruN --text sinsy-0.92.orig/lib/util/PhonemeTable.cpp sinsy-0.92/lib/util/PhonemeTable.cpp
--- sinsy-0.92.orig/lib/util/PhonemeTable.cpp 2015-12-25 03:46:56.000000000 +0000
+++ sinsy-0.92/lib/util/PhonemeTable.cpp      2019-08-11 11:28:36.248505819 +0100
@@ -180,7 +180,7 @@
       for (size_t i(1); i < sz; ++i) {
          pl->push_back(st.at(i));
       }
-      if (false == convertTable.insert(std::make_pair<std::string, PhonemeList*>(st.at(0), pl)).second) {
+      if (false == convertTable.insert(std::make_pair<std::string, PhonemeList*>(st.at(0), std::move(pl))).second) {
          ERR_MSG("Wrong phoneme table (some syllables have same name : " << st.at(0) << ") : " << fname);
          delete pl;
          return false;

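To apply the patch, save it to a file called a.patch in the top-level Sinsy source directory (sinsy-0.92) and, before running make, run the following from that directory: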

patch --forward --strip=1 --input=a.patch

NIT SONG070 F001 Voicebank

Finally, we install the voicebank. Just download the voicebank tarball and copy the voicebank somewhere. We will copy it to /usr/share/hts-nit-song070-f001/nitech_jp_song070_f001.htsvoice. We will also copy the sample MusicXML file that comes with it to /usr/share/hts-nit-song070-f001/SAMPLE.xml.
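
The copy step can be sketched as a small shell function. This is only an illustration: install_voicebank is a hypothetical helper, and it assumes that the extracted tarball has nitech_jp_song070_f001.htsvoice and SAMPLE.xml at its top level, which may not match the actual archive layout.

```shell
# Hypothetical helper: copy the voicebank and the sample score into place.
# $1 = directory the voicebank tarball was extracted to (layout is assumed)
# $2 = destination directory (defaults to the path used in this article)
install_voicebank() {
  src="$1"
  dest="${2:-/usr/share/hts-nit-song070-f001}"
  # Refuse to copy anything unless both expected files are present.
  for f in nitech_jp_song070_f001.htsvoice SAMPLE.xml; do
    if [ ! -f "$src/$f" ]; then
      echo "missing: $src/$f" >&2
      return 1
    fi
  done
  mkdir -p "$dest" || return 1
  for f in nitech_jp_song070_f001.htsvoice SAMPLE.xml; do
    cp "$src/$f" "$dest/$f" || return 1
  done
}
```

Writing to /usr/share needs root, so either call the function from a root shell or pass a destination directory you can write to as the second argument.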

How to Use

Sinsy needs 5 pieces of information to run:

  • The language to use
  • The file to save to
  • The dictionary path
  • The voicebank to use
  • A MusicXML file (can be created using MuseScore)

The only supported language at the moment is Japanese, so we use -w j. We will use -o test.wav for the output wav file, -x /usr/dic (where Sinsy installed its dictionaries) for the dictionary directory and -m /usr/share/hts-nit-song070-f001/nitech_jp_song070_f001.htsvoice for the voicebank. Finally, we can just use the sample MusicXML file that came along with the voicebank: /usr/share/hts-nit-song070-f001/SAMPLE.xml.

sinsy -w j -o test.wav -x /usr/dic \
  -m /usr/share/hts-nit-song070-f001/nitech_jp_song070_f001.htsvoice \
  /usr/share/hts-nit-song070-f001/SAMPLE.xml

This should create a file called test.wav with the singing.
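
If you render several scores, the command above can be wrapped in a function that checks its inputs first. The sketch below is an assumption-laden convenience, not something Sinsy ships: sinsy_render and the SINSY_BIN override are hypothetical names, and the defaults simply repeat the paths used in this article.

```shell
# Hypothetical wrapper around the sinsy invocation used in this article.
# Usage: sinsy_render SCORE.xml OUT.wav [VOICEBANK] [DICT_DIR]
# SINSY_BIN is an assumed override for the command name (useful for dry runs).
sinsy_render() {
  score="$1"
  out="$2"
  voice="${3:-/usr/share/hts-nit-song070-f001/nitech_jp_song070_f001.htsvoice}"
  dict="${4:-/usr/dic}"
  # Fail early with a clear message instead of relying on sinsy's own errors.
  [ -f "$score" ] || { echo "score not found: $score" >&2; return 1; }
  [ -f "$voice" ] || { echo "voicebank not found: $voice" >&2; return 1; }
  [ -d "$dict" ] || { echo "dictionary directory not found: $dict" >&2; return 1; }
  # -w j selects Japanese, the only language mentioned as supported here.
  "${SINSY_BIN:-sinsy}" -w j -o "$out" -x "$dict" -m "$voice" "$score"
}
```

For example, sinsy_render /usr/share/hts-nit-song070-f001/SAMPLE.xml test.wav runs the same command as above with the default voicebank and dictionary directory.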

Arch Users

I have created AUR packages for all of the above. Arch users can just install the sinsy package and it will install everything mentioned above. You can just run the sinsy command as above and it will work out of the box.

Phoneme Reference

See the Sinsy Reference manual for phoneme information and how to input Japanese and English lyrics.

By Alexandros Theodotou
Tags: #audio, #music, #gnu, #linux, #aur