Phonetics is the study and classification of speech sounds in a language. It is fitting, therefore, that the Japanese "alphabet" is more a table of sounds than a listing of characters. Understanding the phonetics gives the student a stronger grasp of the peculiarities of Japanese speech, and a strong foundation for learning the kana and why they take the form they do today.
The Japanese language is a "sound-poor" language in the sense that, instead of having an alphabet whose consonants and vowels can be combined in many different ways to produce different sounds, the Japanese "alphabet" contains fixed sounds, which can be summed up as 45 basic sounds and 77 derived sounds. These sounds are represented by two sets of kana, known as the hiragana and the katakana.
Although it's beyond the scope of this article to explain the origins of these two writing systems, suffice it to say that their purpose is to represent certain sounds in the form of a syllabary, meaning each character represents a syllable for use in the written language.
A single character, or syllable, in the Japanese Kana can be either a vowel, or a combination of a consonant and a vowel.
Vowels > a, i, u, e, o (are syllables in themselves)
Consonants > k, s, t, n, h, m, y, r, w
Combined Consonant and Vowel > ka, ki, ku, ke, ko (syllables with representative kana)
These combinations make up the basic sounds of the Japanese alphabet, which adds up to 45 different sounds (note that the sounds "yi", "ye", "wi", "wu" and "we" are no longer standard sounds). Remember that because of these combinations, vowel sounds are more or less rigid throughout the basic sounds. The "a" sound in "ka", for example, would be the same as the "a" sound in "sa".
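The count of 45 can be checked with a short sketch. This is only an illustration of the arithmetic in the paragraph above: the function name `basic_sounds` is made up for this example, and the spellings are schematic consonant + vowel pairs (so it produces "si", "ti", "tu" and "hu" where romanized Japanese normally writes "shi", "chi", "tsu" and "fu").

```python
# Vowels and consonants exactly as listed in the text above.
VOWELS = ["a", "i", "u", "e", "o"]
CONSONANTS = ["k", "s", "t", "n", "h", "m", "y", "r", "w"]
# Combinations the text says are no longer standard sounds.
OBSOLETE = {"yi", "ye", "wi", "wu", "we"}

def basic_sounds():
    """Return the basic sounds: bare vowels plus consonant + vowel pairs."""
    sounds = list(VOWELS)
    for c in CONSONANTS:
        for v in VOWELS:
            if c + v not in OBSOLETE:
                sounds.append(c + v)
    return sounds

sounds = basic_sounds()
print(len(sounds))  # 45
```

The five vowels plus 9 x 5 = 45 pairs give 50 candidates, and removing the five obsolete sounds leaves 45.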
Derived sounds are the result of variations in consonant sounds (i.e. voiced and plosive sounds) and diphthongs (vowel sound combinations in a single syllable).
Consonants are otherwise known as "shapes", in that they literally "shape" the sound of a vowel. In this sense, the sound "ka" is distinct from the sound "da", despite both having the same "a" vowel sound. This is because consonants shape vowel sounds by shifting the various voice forming elements in your head, which include (but are not limited to) the tongue, soft palate, glottis, teeth, and lips. Consonants take specific formations in the mouth, but notable are the voiced consonants.
Consider the consonant "k". It is formed by raising the rear of the tongue until it touches the soft palate. To make the "k" sound, we simply blow air out, and the tongue quickly retracts out of the way. Now consider the consonant "g". It uses the same formation as "k", so what is the difference? The difference is that you have to make a sound with your vocal cords to produce a "g" sound, while you don't have to do this for "k". The consonant "g", in this sense, is a voiced consonant, because it requires your voice in order to produce it.
Because of this, "k" and "g" are commonly referred to as a consonant pair, where "k" is the silent (unvoiced) consonant and "g" is the voiced consonant. Voiced consonants are represented in the kana by a diacritical mark, which looks like a quotation mark ( " ) beside the kana:
ka > か
ga > が
The consonants "s", "t", and "h" also have their own voiced variants, as shown here:
(silent) "s" > "z" (voiced)
"t" > "d"
"h" > "b"
It's interesting to know, therefore, that if you've memorized the basic sounds, you've already memorized 20 of the derived sounds using voiced consonants based on these consonant pairs. The diacritical mark on the kana simply indicates the use of a voiced consonant over the silent consonant.
Only one consonant is set apart as "plosive" in the kana, and this is the consonant "p". It's called "plosive" because it requires you to close your lips (to stop airflow), after which the air is suddenly released, like a little "exPLOSION". This is the characteristic sound you hear in the "p" consonant. Strangely, however, this consonant is grouped together with the consonant pair "h/b", and is marked with a plosive mark ( ˚ ) beside the kana (commonly referred to as the "little circle").
The reason for the grouping with the "h/b" pair isn't so much due to similarities in the formation of the consonant, but more so due to the mechanical similarities in forming the "b" and "p" sound. They are similar in the sense that both require the mouth to be closed completely in order to be sounded. This has implications in the transformation of other consonants, which will be covered in future lessons.
h/b/p > ha/ba/pa > は／ば／ぱ
This adds another 5 sounds to the already 20 derived sounds, giving us 25 different derived sounds from variations in consonant sounds.
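For readers who work with kana on a computer, the two marks can be attached programmatically. The sketch below is not part of the lesson's method. It relies on Unicode's combining voiced sound mark (U+3099) and combining semi-voiced sound mark (U+309A), and on Python's standard `unicodedata.normalize` to merge a kana and a mark into one character. The helper name `add_mark` is made up for this example.

```python
import unicodedata

DAKUTEN = "\u3099"      # combining voiced sound mark (the "quotation mark")
HANDAKUTEN = "\u309A"   # combining semi-voiced sound mark (the "little circle")

def add_mark(kana, mark):
    """Attach a combining mark to a kana and compose the pair into one character."""
    return unicodedata.normalize("NFC", kana + mark)

print(add_mark("か", DAKUTEN))     # が
print(add_mark("は", DAKUTEN))     # ば
print(add_mark("は", HANDAKUTEN))  # ぱ

# Counting the derived sounds from consonant variations, as in the text:
voiced = 4 * 5   # the k/g, s/z, t/d and h/b pairs, five vowels each
plosive = 5      # pa, pi, pu, pe, po
print(voiced + plosive)  # 25
```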
A diphthong is a combination of two different vowels voiced as a single syllable. A simple example in the English language would be "owl" (made with the sounds "ah" and "oo"). Diphthongs are made by combining the semivowel "y" with the sounds "ki", "shi", "chi", "ni", "hi", "mi", "ri", "gi", "ji", "bi" and "pi". Other diphthong combinations exist, but they are found only in katakana, and are meant exclusively for loan words, not native words.
The formation of a diphthong is quite literally a combination of sounds, as can be seen in this simple example:
き＋よ = きょ > ki + yo = kyo
A diphthong is indicated with a small "y" kana, be it "ゃ", "ゅ" or "ょ" (recall, again, that "yi" and "ye" are no longer standard sounds in the kana). Note that a diphthong is recognized as a single syllable, and this has implications on pronunciation and timing, as will be explained later.
Given 3 different combinations for each of the 11 sounds, that makes 33 additional derived sounds. Without delving into the specifics, 19 additional diphthongs for loan words bring the total derived sounds to 77 (25 + 33 + 19).
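The tally can be reproduced with a small sketch. The names `I_SYLLABLES`, `SMALL_Y` and `combine` are made up for this example, the romanized spellings (where "sh", "ch" and "j" drop the written "y") follow common practice rather than anything stated in this lesson, and the figure of 19 loan-word diphthongs is simply taken from the text.

```python
# The eleven "i" syllables listed in the text, in their usual romanized spelling.
I_SYLLABLES = ["ki", "shi", "chi", "ni", "hi", "mi", "ri", "gi", "ji", "bi", "pi"]
SMALL_Y = ["ya", "yu", "yo"]

def combine(syllable, y_sound):
    """Merge an "i" syllable with ya/yu/yo into one romanized syllable."""
    stem = syllable[:-1]               # drop the trailing "i"
    if stem in ("sh", "ch", "j"):
        return stem + y_sound[1:]      # sha, chu, jo: the "y" is not written
    return stem + y_sound              # kya, nyu, ryo

diphthongs = [combine(s, y) for s in I_SYLLABLES for y in SMALL_Y]
print(combine("ki", "yo"))         # kyo
print(len(diphthongs))             # 33
print(25 + len(diphthongs) + 19)   # 77
```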
Applications of syllabic construction
Perhaps the most important point of this section is that consonant pairs explain how a silent consonant may transform into a voiced consonant in different kanji combinations. The kunyomi reading for "人" ("person"), for example, is "hito", but it becomes "bito" in the compound word "恋人", which is read as "koibito" ("lover"). The "h" here has been transformed into its voiced equivalent. Remember that kanji, though written in a standard form, may change their sounds depending on the combinations they appear in. It should not come as a shock, therefore, when a silent consonant becomes a voiced consonant in a compound.
Diphthongs are also important to consider when we run into words that depend heavily on their single-syllable nature. A common example is 病院 ("hospital") and 美容院 ("beauty parlor"). The former contains a diphthong, and is romanized as "byôin", while the latter has no diphthong, and is romanized as "biyôin". To understand the significance of this distinction, we will now discuss pronunciation and timing and see how syllabic construction has much to do with how these words are pronounced.
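The "hito" to "bito" change can be sketched with the consonant pairs from earlier. This is a deliberate simplification, not a rule for when the change happens: the function `voice_initial` is made up for this example, it only swaps a single leading letter, and it does not handle spellings such as "sh", "ch" or "ts". Whether a given compound actually voices its second element has to be learned word by word.

```python
# Consonant pairs from the text: silent consonant -> voiced consonant.
VOICED = {"k": "g", "s": "z", "t": "d", "h": "b"}

def voice_initial(reading):
    """Voice the first consonant of a romanized reading, if it has a voiced partner."""
    first = reading[0]
    return VOICED.get(first, first) + reading[1:]

print(voice_initial("hito"))          # bito
print("koi" + voice_initial("hito"))  # koibito
```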
My thoughts on Anime @ www.animananime.com
As a preamble to any guide on pronunciation in any language, keep in mind that the best method of learning is simply to listen to and observe native speakers. Any attempt to find English equivalents for the sounds of the Japanese language is limited, if not impossible, and the only purpose of guides such as this one is just that: to guide students of the language in their learning.
As mentioned earlier, the Japanese language is "sound-poor" in that it is made up of fixed vowel sounds that don't change much in their constructions with consonants. It is, thus, reassuring to know that only 5 basic vowel sounds exist (otherwise known as "short vowels"). Many guide books give useful, although not faithful, English representations of Japanese language vowels in their introductions; however, the most faithful rendition I have seen would probably have to be that found in the Random House Japanese-English: English-Japanese Dictionary by Seigo Nakao, which appears as follows:
"a" as in father
"i" as in feet
"u" as in mood
"e" as in met
"o" as in fort
There are other vowel sounds besides these five basic vowel sounds, known as long vowels and combined vowels. The former will be tackled in a different lesson, but the latter tend to have a gliding quality from one vowel to the next. The combined vowel sound "ai", for example, has a sound quite like that found in the English word "buy". At this stage, it is perhaps easier (and less stressful) to treat these combined vowels as individual sounds, but keep in mind that they are not as rigid in pronunciation as individual consonant-vowel pairs.
Japanese consonants are, more or less, similar to those found in the English language. Certain peculiarities, however, need to be noted:
The consonant sound "ch" is ALWAYS pronounced as in "chair" or "church", and never as in "choir" or "character".
The consonant sound "g" is ALWAYS pronounced as in "gift" or "get", and never as in "ginger" or "gesture". The sound, however, may transform into a nasalized "ng" sound when preceded by certain vowels, as well as the syllabic "n".
shigoto > shingoto
Note, however, that this change is still recognized as "g" by native speakers, and is only noted for the sake of peculiarities in pronunciation.
There is no English equivalent for the consonant "ts", but Seigo Nakao mentions its likeness to the sound in the word "footsore". It should be noted, however, that the sound in this example should be taken as a single "glide", and not as a stop between the two words in the compound word.
The consonant "f" found in the syllable "fu" sounds more like an "h" sound to the foreigner. This is because the Japanese "f" requires the speaker to narrow his/her lips (as in kissing), as opposed to a drawing of the teeth to the lower lip in the English "ef". This explains why "fu" takes the place of the character "hu" in the Kana table.
The Japanese "r" is unlike the "retroflex" English "r" of American speakers, who pull the tongue backwards (or more accurately, "upwards" to face the hard palate). Instead, the Japanese "r" requires the speaker to place the tongue just behind the upper front teeth (called a "dental r"), after which it is pulled down. This gives it a sound closer to the English "l" to foreigners, and explains why "r" is used for foreign loan words containing the consonant sound "l".
The consonant "n" has its own peculiarities, and will be considered in a different lesson.
Faint Vowel Sounds
I separated this section so as not to preclude any assumptions regarding vowel pronunciations. More than anything, this section explains a peculiarity rather than a rule regarding how the Japanese pronounce certain vowels. In other words, you won't be told by a native speaker that so-and-so a vowel is pronounced faintly, simply because a Japanese listener would hear the vowel that would otherwise be misconstrued as "faint" by the foreigner. A common example of this is the "i" sound in "hito" ("person"), as it is oftentimes heard as "hto" or "shto" by the foreigner. A Japanese speaker would still recognize the "i" sound, despite the muting, but a foreigner might not.
The vowels "u" and "i" are of primary concern in this camp. The "u" sound becomes mute in some instances preceding a consonant, such as in the word "sukoshi" ("a little bit"). It also becomes mute in some cases when it comes at the end of a word, such as in the sentence ender "desu". These two words, therefore, are pronounced as "skoshi" and "des", respectively.
The vowel "i" becomes a whispered sound (sometimes no more than a wisp of air) in the syllables shi, chi, hi, ki and pi, but most especially so in the syllable shi. This occurs when the syllables precede the consonants ch, f, h, k, p, s, sh, t, or ts. This list might look formidable, but the pattern occurs rather frequently, as in the example "shiken" ("examination"), where the pronunciation becomes "shken". The previous example, "hito", also follows this pattern.
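The condition for the faint "i" can be written out as a small sketch. It assumes the word has already been split into romanized syllables by hand, it drops the "i" entirely where a real speaker would whisper it, and the names `WHISPERED`, `TRIGGERS` and `faint_i` are made up for this example. It covers only the "i" pattern described above, not the "u" cases.

```python
# Syllables whose "i" may be whispered, and the consonants that trigger it,
# both taken from the text above.
WHISPERED = {"shi", "chi", "hi", "ki", "pi"}
TRIGGERS = ("ch", "f", "h", "k", "p", "s", "sh", "t", "ts")

def faint_i(syllables):
    """Drop the "i" of a listed syllable when the next syllable starts with a trigger."""
    out = []
    for idx, syl in enumerate(syllables):
        nxt = syllables[idx + 1] if idx + 1 < len(syllables) else ""
        if syl in WHISPERED and nxt.startswith(TRIGGERS):
            out.append(syl[:-1])   # keep the consonant, drop the vowel
        else:
            out.append(syl)
    return "".join(out)

print(faint_i(["shi", "ke", "n"]))   # shken
print(faint_i(["hi", "to"]))         # hto
print(faint_i(["shi", "go", "to"]))  # shigoto ("g" is not a trigger)
```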
Long Vowels and Double Consonants
These two features of the Japanese language are better understood when one understands "timing" in the speaking of characters. That said, the next section will discuss timing, while the full topic on long vowels and double consonants will be tackled in another lesson.
Speech and Timing
If we were to describe how words are spoken in Japanese, the word "choppy" probably wouldn't be all that unwarranted. The Japanese language does have an element of choppiness to it, and you may even feel the choppiness when practicing the timing of certain words. As you advance, however, you'll learn to see the nuance of this choppiness and how it actually adds to the level of differentiation between words and their respective meanings. In fact, you may begin to lose the choppiness as the words become less alien and begin to flow from your mouth. The earlier pair "byôin" and "biyôin", for example, relies HEAVILY on timing in order to be sounded correctly.
Rules on Timing
The rule is rather simple. All syllables receive 1 count with more or less equal stress. The first part of this statement means that all the 45 + 77 sounds you learned from the kana are considered "single beat" sounds with 1 count:
1) 父 > ちち > chichi = 2 counts
2) 相手 > あいて > aite = 3 counts
3) 林檎 > りんご > ringo = 3 counts
4) 定食 > ていしょく > teishoku = 4 counts
5) 教科書 > きょうかしょ > kyôkasho = 4 counts
Examples 1) and 2) show a simple combination of sounds, where the latter has a combined vowel sound (i.e. "ai"). Example 3) contains the syllable "n", which also takes one count (despite it not having any vowel). Example 4) contains a diphthong (i.e. "sho"). Taking these examples, counts simply mean that each individual part of a word receives an equal amount of time to be sounded out. This is perhaps trickiest to perform on the syllabic "n", as a pause will occur in giving this syllable one count (this will be discussed in full in a later lesson).
The last example, example 5), is a special case because it contains a long vowel "kyô". Long vowels differ from diphthongs in that they receive 2 counts. This is where the second part of the rule comes in. Though individual syllables are pronounced with more or less equal stress, long vowels have the tendency to sound accented (or "stressed") because of the fact that they take 2 counts instead of just a single count. So as not to preempt a future lesson, all this means is that "kyôkasho" has the tendency to be sounded as "KYO-kasho" with an extra emphasis on the "kyo" syllable (although this is not totally necessary). This also brings back the difference between 病院 ("byôin") and 美容院 ("biyôin"). The former has only 4 counts (roughly sounded as "BYO-i-n"), while the latter has 5 counts (in this case, "bi-YO-i-n"), where the capital letters in the pronunciations represent long vowels with 2 counts, each.
The difference may seem subtle, but to a native speaker, there is a noticeable difference between the two. Take the time to practice timing by clapping at each cited syllable. Keep in mind that the syllabic "n" receives one count, as well. An age-old trick question asks how many syllables there are in the word 新幹線 ("shinkansen"), and it fools many a student into the incorrect answer of 3 syllables. The truth is that this word actually has 6 syllables, and thus gets 6 counts! Try reciting it while clapping 6 times and see if you can do it!
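The counts given above can be checked with a short sketch that works on the hiragana spelling of a word. It rests on one assumption drawn from the rules above: every kana gets one count, except the small "ゃ", "ゅ" and "ょ", which merge with the kana before them. Long vowels need no special handling here because their second count is written out as its own kana (the "う" in "きょう"). The function name `count_beats` is made up for this example.

```python
# Small kana that merge with the preceding kana into a single count.
SMALL_KANA = set("ゃゅょ")

def count_beats(hiragana):
    """Count one beat per kana, skipping the small ya/yu/yo."""
    return sum(1 for ch in hiragana if ch not in SMALL_KANA)

words = ["ちち", "あいて", "りんご", "ていしょく", "きょうかしょ",
         "びょういん", "びよういん", "しんかんせん"]
for word in words:
    print(word, count_beats(word))
```

Run on these words, the sketch gives 2, 3, 3, 4 and 4 counts for the five numbered examples, 4 for "byôin", 5 for "biyôin", and 6 for "shinkansen".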
Some people have called accents the "highs and lows" of Japanese speech, when in reality an accent is simply the emphasis of one sound over the others in a word. A typical example is the difference in pronunciation between 箸 ("chopsticks") and 橋 ("bridge"). Though both are said as "hashi", the former has an accent on the first syllable, while the latter has an accent on the second. Hence, "chopsticks" is said as "HAshi", while "bridge" is said as "haSHI".
It is too cumbersome to note the differences in accent between all words with otherwise identical pronunciations, but note that native speakers tend to speak with these accents intact. It's reassuring, however, to know that most native speakers can work out the meaning of words without the accents based on context alone. It's safe, therefore, to do away with the accents altogether by keeping to the second part of the rule regarding timing (leaving syllables with more or less equal stress). Keep in mind, though, that you shouldn't overdo it and remove every pitch change from your speech, as this tends to make the speaker sound even more alien than attempting to add the accents on the spot.
Long story short, just be aware that there are accents, but you don't need to know them if you don't want to.
Final notes on pronunciation and timing
We've come to the end of this section, but a few things should be remembered by the student. Long vowels, double consonants, and the syllabic "n" were separated from this section to reduce the complexity of the subject. Take note that learning these three additional peculiarities in phonetics adds additional notes to the rules we discussed just now. Take everything you read here, therefore, as a stepping stone and not as a period on the text. To finish your studies on Japanese phonetics, continue reading on the sections for each of these additional subjects.