How to Use CV UTAU Voicebanks

Introduction to CV UTAU

Those who are new to UTAU may find the idea of Voicebanks a little confusing. VOCALOID standardizes its voices, which makes them simple to use: VOCALOIDs recorded in the same language function essentially the same way, so assets can be shared between them with little effort. UTAU, by contrast, is a community-based tool, and many different styles and techniques have been invented for it over the years. As a result, standardizing UTAU is pretty much impossible, and the learning curve is a bit steeper.

The first Voicebank style invented for UTAU, back in 2008, was “CV” (known as 単独音, tandokuon, by Japanese users), short for “Consonant-Vowel” recordings. Syllables in the Japanese language are rather simple, usually consisting of one consonant and one vowel, each of which is a phoneme (the smallest unit of speech distinguishing one word from another). You may hear the term “diphones” used when referring to CV Voicebanks, because CV uses two phonemes for each sound in its library.

CV’s simplicity comes with drawbacks, however. CV tends to sound “choppy” and more robotic than more complex styles like CVVC, VCV, and VCCV. Still, most users suggest starting with it when learning to record and use the software, and many senior members of the community still use this recording method.

Romaji vs Hiragana

Before we begin, we need to establish a very important difference between Voicebanks! Depending on where the Voicebank was developed, its sounds may be written in Romaji or Hiragana. Hiragana, which is taught to children first, is the most basic of the Japanese writing systems. It is a phonetic script, meaning each symbol represents a sound rather than a word, and most symbols stand for a single consonant-vowel pair. The word hiragana roughly means “ordinary” or “simple” kana. Romaji is the term for the Romanization of Japanese words and sounds, that is, writing them in the Latin alphabet.

So, how does that pertain to UTAU? Simply put, a Japanese UTAU Voicebank can be written in either style, and so can a UST. See where this is going? If the styles don’t match, the lyrics in the UST won’t match any sound in the Voicebank, and your UTAU won’t make any sound. How do we fix this without manually editing every note?

There are a few different ways, but let’s go with the easiest option. Plugins can handle a lot of these issues, and we have a few personal favorites on our OGIEN UTAU Suite page. Head on over to the page and download iroiro, a plugin that can actually convert the UST to Hiragana or Romaji. Follow the instructions for iroiro’s installation and you should be ready to go!
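To make the idea of converting lyrics concrete, here is a minimal Python sketch of a table-based conversion. It is not how iroiro works internally; the table is deliberately partial, and the names `ROMAJI_TO_HIRAGANA`, `convert_lyric`, and `convert_lyrics` are hypothetical, chosen only for this illustration.

```python
# Hypothetical, partial lookup table; a real converter covers far more syllables.
ROMAJI_TO_HIRAGANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "sa": "さ", "shi": "し", "su": "す", "se": "せ", "so": "そ",
    "ta": "た", "chi": "ち", "tsu": "つ", "te": "て", "to": "と",
    "na": "な", "ni": "に", "nu": "ぬ", "ne": "ね", "no": "の",
    "n": "ん",
}
# Reverse table for converting Hiragana lyrics back to Romaji.
HIRAGANA_TO_ROMAJI = {kana: roma for roma, kana in ROMAJI_TO_HIRAGANA.items()}


def convert_lyric(lyric, to="hiragana"):
    """Convert one note's lyric; anything not in the table is returned unchanged."""
    table = ROMAJI_TO_HIRAGANA if to == "hiragana" else HIRAGANA_TO_ROMAJI
    return table.get(lyric, lyric)


def convert_lyrics(lyrics, to="hiragana"):
    """Convert every lyric in a list of note lyrics."""
    return [convert_lyric(lyric, to) for lyric in lyrics]


print(convert_lyrics(["sa", "ku", "R", "ka"]))  # ['さ', 'く', 'R', 'か']
```

Lyrics the table does not know, such as the rest marker in this example, pass through untouched, so a conversion like this never silently deletes a note.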

How to Use CV

CV Voicebanks are perhaps the easiest to use and record, and they are a great starting point for any beginner to UTAU. For the most part, you can import a UST file into UTAU, select a CV Voicebank, and hit play. However, for the sake of a better sound, many with experience in the software may tell you to “fit” the UST.

Fitting a UST file to a UTAU Voicebank will definitely improve the sound and make your covers sound more professional. By fitting the UST, you are telling the software to conform to that particular UTAU’s configurations and setup (OTO). This is an important step if the UST was not explicitly made for the UTAU you are using, and it helps to improve the smoothness and clarity of the voice. So, let’s do it!

How to Fit a UST to a CV Voicebank

  • Open a UST file
  • Select all (Ctrl+A)
  • Right-click on a note
  • On the pop-up, select “Property”. A new window will appear.
  • You may notice sections on this window labeled “Preutterance” and “Overlap”. To their right, there is a “Clear” button. Click it.
  • Next, at the bottom of the window, there is a box labeled STP. 
    1. If it has a value, delete it. 
    2. If the box is grayed out, double-click the box to clear it.
  • Press “OK”
  • In the top right of the main window of UTAU, you will see a group of four buttons (ACPT, P2P3, P1P4, RESET).
    1. Click RESET
    2. Click P2P3
  • That’s it! You’ve fit the UST to your CV Voicebank.
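For readers curious about what the clearing steps amount to inside the file, the following Python sketch blanks the same three values in the text of a UST. It rests on assumptions: that the UST stores these fields under the keys `PreUtterance`, `VoiceOverlap`, and `StartPoint`, that note sections are headed by lines like `[#0000]`, and that a blank value behaves like pressing “Clear”. The function name `clear_fit_fields` is hypothetical, and the sketch does not reproduce the RESET or P2P3 buttons.

```python
import re

# Assumed UST key names for the Preutterance, Overlap, and STP fields.
FIT_KEYS = ("PreUtterance", "VoiceOverlap", "StartPoint")
# Assumed shape of a note section header, e.g. [#0000].
NOTE_HEADER = re.compile(r"^\[#\d+\]$")


def clear_fit_fields(ust_text):
    """Blank the per-note timing overrides, leaving every other line as-is."""
    cleaned = []
    in_note = False
    for line in ust_text.splitlines():
        if line.startswith("[#"):
            # Only numbered sections are notes; [#SETTING] and similar are skipped.
            in_note = bool(NOTE_HEADER.match(line))
        key = line.split("=", 1)[0]
        if in_note and "=" in line and key in FIT_KEYS:
            line = key + "="
        cleaned.append(line)
    return "\n".join(cleaned)


sample = "\n".join([
    "[#SETTING]",
    "Tempo=120.00",
    "[#0000]",
    "Length=480",
    "Lyric=か",
    "PreUtterance=45.5",
    "VoiceOverlap=12",
    "StartPoint=3",
    "[#TRACKEND]",
])
print(clear_fit_fields(sample))
```

Work on a copy of the UST if you try something like this, since the original file may use an encoding and line endings that a quick script does not preserve.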

One final tip for optimal smoothness: Crossfades.

The Crossfade function overlaps the envelope of each note with the envelope of the note before it, so that one sound fades out while the next fades in. To use crossfading in your UST:

  • Select the notes
  • Go to Tools, then Built in Tools
  • Select Crossfade
  • Finally, press OK
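As a rough illustration of what a crossfade means, the Python sketch below computes the volume of the outgoing and incoming notes across an overlap region using a simple linear fade. This is the general concept only, not UTAU’s actual envelope calculation; the function name `crossfade_gains` and its parameters are hypothetical.

```python
def crossfade_gains(overlap_ms, points=5):
    """Return (time_ms, gain_out, gain_in) tuples for a linear crossfade."""
    if overlap_ms <= 0 or points < 2:
        return []
    result = []
    for i in range(points):
        fade_in = i / (points - 1)  # goes from 0.0 to 1.0 across the overlap
        # The previous note fades out exactly as fast as the next one fades in.
        result.append((overlap_ms * fade_in, 1.0 - fade_in, fade_in))
    return result


for time_ms, gain_out, gain_in in crossfade_gains(40):
    print(time_ms, gain_out, gain_in)
```

At every point the two gains add up to 1.0, which is why a crossfaded transition tends to sound smoother than one note stopping abruptly before the next begins.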


Some users may experience odd glitches. If you play the track back and notice slurring, you may want to change which notes you select. In our experience, selecting only the notes tends to help, since hitting Ctrl+A sometimes also selects rests and other unique settings that cause the fit to go wrong. Click the first note of the vocal track, scroll to the end, then hold down Shift and left-click the lyric of the last note to select everything in between. Fit the UST again, and it should work!
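The same selection can be described in a few lines of Python. The sketch assumes that rests are notes whose lyric is `R`; the name `vocal_span` is hypothetical. It finds the first and last sung notes, which are the two notes you would click in the steps above.

```python
REST_LYRIC = "R"  # assumed marker for a rest


def vocal_span(lyrics):
    """Return (first, last) indices of the sung notes, or None if all are rests."""
    sung = [i for i, lyric in enumerate(lyrics) if lyric != REST_LYRIC]
    if not sung:
        return None
    return sung[0], sung[-1]


print(vocal_span(["R", "さ", "く", "R", "ら", "R"]))  # (1, 4)
```

Rests that fall between the first and last sung notes are still inside the span, just as they are when you Shift-click in UTAU; only the leading and trailing rests are left out.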

Seran, the founder of STUDIO OGIEN, established the platform in 2014 as a medium to showcase her creative works and stories. With a strong professional background in web development and a lifelong interest in technology, she holds a particular fascination for vocal synthesis. Dreaming of becoming an author, she channels her commitment into crafting captivating narratives through STUDIO OGIEN. She hopes to highlight her genuine dedication to her craft and unwavering pursuit of art through the studio, where she integrates her love for technology and vocal synthesis into her works.