What Is UTAU? A Brief Overview

UTAU is a Japanese singing synthesizer application created by Ameya/Ayame. It is similar to Vocaloid, the professional vocal synthesis software that inspired UTAU’s creation. Unlike Vocaloid, UTAU is an independent application that is free for anyone to download and use (there is also a shareware version with special features available, but the free version on its own is already perfectly solid). The program builds a singing voice from .wav files the user provides, then synthesizes singing from the lyrics and melody the user enters.

UTAU, like Vocaloid, presents the user with a piano roll. Users enter notes (or import MIDI files), add lyrics, and tune the vocal as it is sung by a Voicebank: a semi-realistic singer built from recorded voice samples. Each Voicebank is unique, with its own strengths, weaknesses, voice type, range, and terms of use.

A look at the UTAU GUI

UTAUloids Rise To Fame

UTAUloids have been on the rise since the shareware’s release in 2008. The most notable faces of the program are Defoko and Kasane Teto: Defoko is the built-in voice for the UTAU shareware, while Teto is the ‘face’ of UTAU. Both are immensely popular. A small fun fact: new Vocaloid fans often mistake Teto for an official Vocaloid voicebank! Check out their voicebanks in action below.

How Does It Work?

We’re glad that you’re excited to jump into creating your first Voicebank, but there are a few things you need to know first!

Programs to Create Voicebanks

FL Studio: A DAW (Digital Audio Workstation) that the UTAU and Vocaloid communities have used for many years, mostly to mix songs and covers.

Audacity: A free-to-use DAW. It is typically used to record and splice voicebank samples.

Reaper: A popular DAW that offers a free trial period. It is often used to mix songs and is fairly user-friendly.

OREMO: A free program created specifically for recording Voicebanks. It is a fan favorite for recording because it automatically names the user’s voice samples and adds aliases to the file names. It also features a metronome and BGM function to keep recordings on time and on pitch.

setParam: Another program made specifically for UTAU, in this case for OTO configuration. While UTAU has its own interface for configuring OTOs, setParam offers a more intuitive one and gives the user a much more detailed look at their recordings.

MIDI: (Musical Instrument Digital Interface) A standard, widely used in music production, for exchanging musical information between instruments and software; it also defines a common file format. MIDI files can be used to create UST files.

Resampler: A vital component of UTAU. This engine reads a Voicebank’s WAV samples and pitch-shifts and stretches them to fit each note whenever the user plays a track in UTAU or renders it for external use.

FRQ: Frequency files generated by the resampler that store pitch analysis of the Voicebank’s WAV samples, which the resampler relies on to process those samples correctly. If you download a Voicebank that includes FRQ files, don’t delete them! The creator may have edited them manually to fix errors and glitches in the Voicebank.
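Turning a MIDI file into a UST mostly means copying each note's pitch and rescaling its length. The sketch below assumes the MIDI notes have already been read by some other tool, that UTAU counts 480 ticks per quarter note, and that a UST note section uses the `[#0000]` header with `Length`, `Lyric` and `NoteNum` keys. `MidiNote`, `midi_to_ust_notes` and the placeholder lyric `a` are hypothetical names for illustration only.

```python
from dataclasses import dataclass

UST_TICKS_PER_QUARTER = 480  # assumption: UTAU's resolution of 480 ticks per quarter note


@dataclass
class MidiNote:
    note_number: int     # MIDI note number, e.g. 60
    duration_ticks: int  # duration in the MIDI file's own ticks


def midi_to_ust_notes(notes, midi_ppq, default_lyric="a"):
    """Convert already-parsed MIDI notes into UST note sections (as text).

    midi_ppq is the MIDI file's ticks per quarter note; lengths are rescaled
    to UTAU's 480-tick resolution. NoteNum uses the same numbering as MIDI,
    so the note number is copied unchanged.
    """
    sections = []
    for index, note in enumerate(notes):
        length = round(note.duration_ticks * UST_TICKS_PER_QUARTER / midi_ppq)
        sections.append(
            f"[#{index:04d}]\n"
            f"Length={length}\n"
            f"Lyric={default_lyric}\n"
            f"NoteNum={note.note_number}"
        )
    return "\n".join(sections)


if __name__ == "__main__":
    # Two notes from a hypothetical MIDI file saved at 96 ticks per quarter note:
    # a quarter note on C4 and an eighth note on D4.
    melody = [MidiNote(60, 96), MidiNote(62, 48)]
    print(midi_to_ust_notes(melody, midi_ppq=96))
```

A real converter would also write the `[#SETTING]` section (tempo, Voicebank path) and turn gaps between notes into rests, which this sketch leaves out.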

Voicebank Styles

CV: “Consonant Vowel” recording format. The smallest and simplest style of Voicebank, this is a great choice for beginners to get acquainted with the recording process.

VCV: “Vowel Consonant Vowel” recording format. Each sample blends the ending vowel of the previous sample into the consonant-vowel pair of the next. While it is more labor-intensive to create than a CV bank, it is the most popular form of Voicebank thanks to its smooth result.

Lite VCV: A simplified/compact version of a VCV Voicebank with more smoothness than CV. A good option for those wanting to branch into VCV.

CVVC: A CV Voicebank with “VC” samples to improve clarity and smoothness. Easier to record than VCV, but trickier to properly configure. This recording style can offer more flexibility than VCV, depending on the Voicebank.

VCCV: Developed by Cz, VCCV is often considered the new standard for English UTAU Voicebanks. It is widely supported by the community and praised for its clarity, though it gives many Voicebanks an Americanized accent.

Rentan: A recording style specific to CV Voicebanks. Several samples are recorded one after another within the same file, rather than one sample per file. Once configured in UTAU, it works just the same as a standard CV Voicebank.

Multipitch: A style of Voicebank that combines multiple single Voicebanks, each recorded at a different pitch, into one larger Voicebank. This allows for a much greater vocal range with a more natural sound.

Kire/Powerscale: A Voicebank type whose recordings become more powerful as the voice reaches higher pitches. Popular in the community and useful for rock songs.

Appends: A term originally derived from Vocaloid (specifically, Crypton Future Media Vocaloids). These Voicebanks are recorded to fit a specific theme or timbre. Examples might be “Soft”, “Whisper”, “Power”, “Dark”, etc. Commonly recorded as stand-alone Voicebanks, but may also be included in Multipitch/Multi Expression voices.
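To make the difference between CV, VCV and CVVC concrete, here is a sketch that turns a plain lyric sequence into the sample aliases each style would call up. It assumes simple romaji syllables that end in a vowel, and the common alias conventions `- ka` for a note at the start of a phrase, `a ka` for a VCV transition, and `a k` for a CVVC vowel-to-consonant sample. Real banks differ in their naming, so treat these conventions and the function names as illustrative assumptions.

```python
def split_cv(syllable):
    """Split a romaji syllable like 'ka' into ('k', 'a'); 'a' gives ('', 'a').

    Assumption: every syllable ends in a single vowel letter.
    """
    return syllable[:-1], syllable[-1]


def cv_aliases(lyrics):
    # CV: each note simply plays its own consonant-vowel sample.
    return list(lyrics)


def vcv_aliases(lyrics):
    # VCV: each sample starts on the previous note's vowel ('-' at phrase start).
    aliases = []
    previous_vowel = "-"
    for syllable in lyrics:
        aliases.append(f"{previous_vowel} {syllable}")
        previous_vowel = split_cv(syllable)[1]
    return aliases


def cvvc_aliases(lyrics):
    # CVVC: CV samples, plus a VC sample bridging each vowel into the next consonant.
    aliases = []
    for i, syllable in enumerate(lyrics):
        aliases.append(syllable)
        if i + 1 < len(lyrics):
            next_consonant = split_cv(lyrics[i + 1])[0]
            if next_consonant:  # a vowel-initial next syllable gets no VC sample here
                aliases.append(f"{split_cv(syllable)[1]} {next_consonant}")
    return aliases


if __name__ == "__main__":
    phrase = ["ka", "ki", "a", "su"]
    print(cv_aliases(phrase))
    print(vcv_aliases(phrase))
    print(cvvc_aliases(phrase))
```

For the phrase above, VCV needs one specific transition sample per note pair (`a ki`, `i a`, ...), which is why VCV banks take more recording, while CVVC reuses plain CV samples and adds short VC links.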

Terms To Know For Post Production

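Mora: A unit of sound in Japanese, roughly equivalent to a syllable. Recording styles are often described by how many morae each voice sample contains; for example, “a-a-i-a-u-e-a” is a 7-mora style recording.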

Prefix map: An important file for Multipitch Voicebanks that tells UTAU which set of voice samples to play on which notes.

Tuning: The process in which the user warps, bends, and/or changes the pitch of a voice track in vocal synthesis software. This changes the way the voice sings a song, making it more unique or more human-like.

Consonant Velocity: A setting in UTAU that can be changed for a whole track or a single note. It affects how quickly the consonant part of a voice sample plays, and is usually used to avoid a “slurring” sound in playback for faster songs.

Flags: Codes used by the Resampler to alter the voice properties of a Voicebank. They can be used to add breathiness to a voice, reduce nasal tones, make the voice sound more masculine or feminine, and much, much more. Flags are defined by each Resampler, so the same flag may have different effects in different Resamplers.

Alias: A name given to a voice sample in the OTO. Aliases let you assign multiple names to the same recording, which is common in VCV banks.

Mixing: The process of taking vocal tracks and combining them with an instrumental in a pleasing way.
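Two of the per-note settings above can be expressed numerically. The sketch below assumes the commonly cited rule that a Consonant Velocity of v (0 to 200, default 100) scales the consonant's length by 2^(1 - v/100), so 100 leaves it unchanged and 200 halves it. It also parses a flag string under the assumption that flags are written as a letter followed by an optional signed number, as in `g-5B50`; what each letter does depends on the Resampler, and some Resamplers use multi-letter flags this simple parser would not handle.

```python
import re

# Assumption: each flag is one letter followed by an optional signed integer,
# e.g. "g-5B50". The meaning of each letter depends on the resampler.
FLAG_PATTERN = re.compile(r"([A-Za-z])([+-]?\d*)")


def consonant_stretch(velocity):
    """Factor applied to the consonant's length for a Consonant Velocity value.

    Assumes the commonly cited rule 2 ** (1 - velocity / 100):
    100 leaves the consonant unchanged, 200 halves it, 0 doubles it.
    """
    if not 0 <= velocity <= 200:
        raise ValueError("Consonant Velocity is expected to be between 0 and 200")
    return 2 ** (1 - velocity / 100)


def parse_flags(flags):
    """Parse a flag string into {letter: value}; a letter without a number maps to None."""
    parsed = {}
    for letter, value in FLAG_PATTERN.findall(flags):
        parsed[letter] = int(value) if value not in ("", "+", "-") else None
    return parsed


if __name__ == "__main__":
    # A 120 ms consonant at velocity 150 would play for about 85 ms.
    print(120 * consonant_stretch(150))
    print(parse_flags("g-5B50H"))
```

Raising the velocity for a fast song shortens every consonant, which is how it helps avoid the “slurring” sound described above.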

File Types

.WAV: The file format UTAU voice samples are recorded in. WAV is the only file type UTAU will use for a Voicebank on Windows computers.

.UST: UTAU Sequence Text Files. Similar to sheet music or a MIDI file (which can be turned into a UST), this is the main file type used in UTAU to store information about a voice track.

OTO: Also known as oto.ini, this is the file that tells UTAU where each sample starts, where the consonant begins, where the vowel is, where the sample is cut off, and how much of the sample is okay to stretch on longer notes.
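Each line of an oto.ini pairs a WAV file with an alias and five timing values in milliseconds. A minimal parsing sketch, assuming the widely used layout `file.wav=alias,offset,consonant,cutoff,preutterance,overlap`: offset marks where the sample starts, consonant is the fixed region that is not stretched, and a negative cutoff is measured from the offset rather than from the end of the file. `OtoEntry`, `parse_oto_line` and `sample_end` are illustrative names, not part of UTAU, and real oto.ini files are often Shift-JIS encoded, so read them with the right encoding.

```python
from dataclasses import dataclass


@dataclass
class OtoEntry:
    wav: str
    alias: str
    offset: float        # ms from the start of the file where the sample begins
    consonant: float     # ms after offset that are not stretched (fixed region)
    cutoff: float        # >0: ms trimmed from the file's end; <0: length measured from offset
    preutterance: float  # ms after offset that lines up with the start of the note
    overlap: float       # ms after offset during which it crossfades with the previous note


def parse_oto_line(line):
    """Parse one oto.ini line such as 'a_ka.wav=a ka,350,120,-300,80,30'."""
    wav, _, params = line.strip().partition("=")
    fields = params.split(",")
    if len(fields) != 6:
        raise ValueError(f"expected an alias plus 5 values, got {line!r}")
    alias = fields[0]
    # Empty numeric fields are treated as 0 (an assumption for this sketch).
    values = [float(v) if v.strip() else 0.0 for v in fields[1:]]
    return OtoEntry(wav, alias, *values)


def sample_end(entry, file_length_ms):
    """Where the usable sample ends, in ms from the start of the WAV file."""
    if entry.cutoff < 0:
        return entry.offset - entry.cutoff  # negative cutoff: length from offset
    return file_length_ms - entry.cutoff    # positive cutoff: blank at the end


if __name__ == "__main__":
    entry = parse_oto_line("a_ka.wav=a ka,350,120,-300,80,30")
    print(entry)
    print(sample_end(entry, file_length_ms=2000))
```

Keeping the raw values and resolving the cutoff only when the file length is known mirrors how the two cutoff conventions can coexist in one oto.ini.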

Additional Terms to Know When Working With UTAU Software

UTAU: The name of the software; it is also the Japanese word for “sing”.

Vocaloid: Perhaps the most well-known Vocal Synthesis program, developed originally for use by professionals. It is a commercial program that requires the user to purchase the base software as well as each additional voice they may want to use.

UTAUloid: An older community term for a specific UTAU character. The term is modeled on “Vocaloid” and is used alongside it.

Pitch: How high or low a tone is.

Timbre: The character or quality of a voice, different from pitch or intensity. Youthful, gruff, feminine, etc. could all be descriptors of timbre.

Vocal synthesis: The artificial production of human singing voices/voice-like instruments, much like speech synthesis. Common term when referring to UTAU, Vocaloid, and other similar applications. 

Voicebank: A collection of voice samples and OTO(s) compiled for use within UTAU as a functioning singing voice. A Voicebank usually comes with a character or mascot that represents the voice. Often shortened to “Bank”.

Vipperloid(s): A popular series of Japanese UTAU Voicebanks. You may recognize members such as Yokune Ruko and Sukone Tei. They originated on vip@2ch.

Nico Nico Douga/ニコニコ動画: Similar to a Japanese version of YouTube, this video-sharing website is incredibly popular with the Japanese Vocal Synth community. 

Nikokara/ニコカラ: A Nico Nico Douga supported service that displays Hiragana song lyrics across a video.
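For the Pitch entry above: UTAU stores each note as a MIDI-style note number, so pitch can be turned into a frequency with the standard equal-temperament formula f = 440 * 2^((n - 69) / 12), where note 69 is A4 at 440 Hz. The sketch below also converts note numbers to names, assuming the convention that note 60 is C4; the function names are illustrative.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_frequency(note_number, a4_hz=440.0):
    """Equal-temperament frequency in Hz; note 69 is A4 (440 Hz by default)."""
    return a4_hz * 2 ** ((note_number - 69) / 12)


def note_name(note_number):
    """Name such as 'C4', assuming note 60 is C4."""
    octave = note_number // 12 - 1
    return f"{NOTE_NAMES[note_number % 12]}{octave}"


if __name__ == "__main__":
    for n in (57, 60, 69, 72):
        print(n, note_name(n), round(note_frequency(n), 2))
```

Each octave doubles the frequency, which is one reason a Multipitch Voicebank records separate sets at different pitches: stretching one recording across several octaves tends to sound unnatural.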

Let’s Put Those Terms To Use!

Check back soon for a more in-depth look at the UTAU software!

Need more assistance with UTAU and creating your very own voicebank? STUDIO OGIEN has compiled resources to use with the UTAU software. Check it out here! If you can’t find what you’re looking for, please let us know through our contact form or leave a comment on this article. We can’t wait to see what you create!

Terminology and information referenced from utau.us, PRISMOID, and Wikipedia.
