Speech segmentation models and their importance
– Whole-word access model
– Decomposition model
– Combined whole-word and decomposition model
– Limited experimental evidence for discriminating between models
– Contextual clues and probabilistic nature of lexical recognition
– Decompositional analysis necessary for other processes
– Contextual clues provided by lexical recognition
– Example of probabilistic word completion (see the sketch after this list)
– Different meanings depending on word segmentation
– Potential for advanced pattern recognition and AI technologies
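To make the probabilistic word-completion bullet above concrete, here is a minimal sketch: candidate words matching the sounds heard so far (approximated by an orthographic prefix) are ranked by relative corpus frequency. The word list and counts are invented for illustration.

```python
# A minimal sketch of probabilistic word completion. Given the sounds heard
# so far (approximated here by an orthographic prefix), rank the candidate
# words by their relative corpus frequency. The counts below are invented
# for illustration.

word_counts = {
    "candle": 40,
    "candy": 25,
    "candidate": 15,
    "can": 300,
    "cannot": 60,
}

def completions(prefix: str) -> list[tuple[str, float]]:
    """Return candidate words starting with `prefix`, with P(word | prefix)."""
    matches = {w: c for w, c in word_counts.items() if w.startswith(prefix)}
    total = sum(matches.values())
    return sorted(
        ((w, c / total) for w, c in matches.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )

print(completions("can"))   # all five words compete for the interpretation
print(completions("cand"))  # "can"/"cannot" are ruled out; probabilities shift
```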
Applications of lexical recognition
– Enhancing computer speech recognition
– Building and searching a network of semantically connected ideas
– Statistical models for speech segmentation and alignment (see the alignment sketch after this list)
– Applications in animation, video subtitling, and linguistic research
– Availability of commercial segmentation and alignment software
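The outline names no specific statistical model, so the following is a hedged sketch of dynamic time warping (DTW), a classic technique behind many audio alignment tools: it finds the lowest-cost monotonic correspondence between two feature sequences. The 1-D "feature" values are toy data, not real acoustic vectors.

```python
# A minimal dynamic time warping (DTW) sketch: find the lowest-cost monotonic
# alignment between two feature sequences (e.g., audio frames vs. a reference
# rendition). Features here are illustrative 1-D values.

def dtw(a: list[float], b: list[float]) -> float:
    """Cumulative cost of the best monotonic alignment of a with b."""
    INF = float("inf")
    n, m = len(a), len(b)
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])          # local distance
            cost[i][j] = d + min(cost[i - 1][j],      # a advances
                                 cost[i][j - 1],      # b advances
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

# A time-stretched version of the template aligns cheaply; an unrelated
# sequence does not. Tracing the optimal path (omitted here) yields the
# time correspondence used to place unit boundaries.
template  = [1.0, 3.0, 4.0, 3.0, 1.0]
stretched = [1.0, 1.0, 3.0, 4.0, 4.0, 3.0, 1.0]
unrelated = [5.0, 5.0, 5.0, 5.0, 5.0]
print(dtw(template, stretched))  # 0.0: every frame finds an exact match
print(dtw(template, unrelated))  # 13.0: no good correspondence exists
```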
Phonotactic cues in speech segmentation
– Difficulty in identifying boundaries between lexical units
– Lack of pauses in normal speech
– Coarticulation and its effect on vowel and consonant production
– Language-specific changes in casual speech
– Phonotactics as a guide for word boundary placement
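As an illustration of phonotactics guiding boundary placement, here is a minimal sketch using an orthography-based stand-in for English phonology. The onset inventory is a small illustrative sample, not a complete description of English.

```python
# A minimal sketch of phonotactic boundary filtering, assuming a toy,
# orthography-based stand-in for English phonology. Given a medial consonant
# cluster, keep only the split points whose right-hand side is a legal word
# onset. The set below is a small illustrative sample, not a full inventory.

LEGAL_ONSETS = {"", "b", "d", "g", "k", "l", "m", "n", "p", "r", "s", "t",
                "bl", "br", "dr", "gl", "gr", "kl", "kr", "pl", "pr",
                "sk", "sl", "sm", "sn", "sp", "st", "str", "tr"}

def possible_boundaries(cluster: str) -> list[tuple[str, str]]:
    """Split a medial cluster into (coda, onset) pairs with a legal onset."""
    return [(cluster[:i], cluster[i:])
            for i in range(len(cluster) + 1)
            if cluster[i:] in LEGAL_ONSETS]

# "dl" cannot begin an English word, so the boundary cannot fall before the
# whole cluster: ('', 'dl') is excluded, unlike ('', 'str') for "str".
print(possible_boundaries("dl"))   # [('d', 'l'), ('dl', '')]
print(possible_boundaries("str"))  # includes ('', 'str'): a legal onset
```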
Phonotactic cues in different languages
– English phonotactics inhibiting certain interpretations
– Examples of phonotactic cues in English words
– Vowel harmony in Finnish providing cues (see the sketch after this list)
– Coexistence of vowel harmony and morphemes in compounds
– Importance of phonotactic cues in distinguishing word boundaries
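To illustrate the vowel-harmony cue referenced in the list above, here is a minimal sketch that flags positions where Finnish front and back vowel classes switch, treating i and e as neutral. The heuristic is illustrative, not a full morphological analysis.

```python
# A minimal sketch of Finnish-style vowel harmony as a boundary cue. Within a
# single (non-compound) Finnish word, back vowels (a, o, u) and front vowels
# (ä, ö, y) do not mix, while i and e are neutral. A switch of harmony class
# therefore suggests a word or compound-member boundary.

BACK, FRONT = set("aou"), set("äöy")

def harmony_switches(word: str) -> list[int]:
    """Indices where the vowel harmony class flips (candidate boundaries)."""
    switches, current = [], None
    for i, ch in enumerate(word.lower()):
        cls = "back" if ch in BACK else "front" if ch in FRONT else None
        if cls is not None:
            if current is not None and cls != current:
                switches.append(i)
            current = cls
    return switches

# "syntymäpäivä" ('birthday') is all front/neutral: no switch, no cue.
print(harmony_switches("syntymäpäivä"))  # []
# In the compound "isänmaa" ('fatherland'), front ä gives way to back a,
# pointing at the seam between the two members.
print(harmony_switches("isänmaa"))       # [5]
```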
Speech segmentation in infants and non-natives
– Infants rely on phonotactic and rhythmic cues for speech segmentation, with prosody as the dominant cue
– Between 6 and 9 months, infants become sensitive to the sound structure of their native language
– English-learning infants treat stressed syllables as the beginnings of words
– Infants can segment bisyllabic words with strong-weak stress patterns, but often misparse weak-strong patterns
– Infants also track the frequency and transitional probability of syllables and words (see the sketch after this list)
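The statistical tracking in the last bullet can be illustrated with transitional probabilities, the quantity used in classic infant statistical-learning experiments (e.g., by Saffran and colleagues): TP(x → y) = count(xy) / count(x), which tends to be high inside words and to drop at word boundaries. The syllable stream below is an invented miniature corpus in the style of those stimuli.

```python
from collections import Counter

# An invented miniature syllable stream built from three nonsense "words"
# ("bidaku", "padoti", "golabu") repeated in varying orders.
stream = ("bi da ku pa do ti bi da ku go la bu bi da ku pa do ti "
          "go la bu pa do ti bi da ku go la bu").split()

unigrams = Counter(stream)
bigrams = Counter(zip(stream, stream[1:]))

def tp(x: str, y: str) -> float:
    """Transitional probability TP(x -> y) = count(xy) / count(x)."""
    return bigrams[(x, y)] / unigrams[x]

print(tp("bi", "da"))  # word-internal: 1.0 (every "bi" is followed by "da")
print(tp("ku", "pa"))  # across a word boundary: 0.5, a dip that cues the edge
```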
Speech segmentation is the process of identifying the boundaries between words, syllables, or phonemes in spoken natural languages. The term applies both to the mental processes used by humans and to artificial processes of natural language processing.
Speech segmentation is a subfield of general speech perception and an important subproblem of the technologically focused field of speech recognition; it cannot be adequately solved in isolation. As in most natural language processing problems, one must take context, grammar, and semantics into account, and even then the result is often a probabilistic division (statistically based on likelihood) rather than a categorical one. Although coarticulation, a phenomenon which may happen between adjacent words just as easily as within a single word, appears to present the main challenge to speech segmentation across languages, the following sections survey some other problems and the strategies employed to solve them.
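As a concrete illustration of a likelihood-based rather than categorical division, here is a minimal Viterbi-style sketch that splits an unsegmented string into the word sequence with the highest product of unigram probabilities. The lexicon and probabilities are invented, and real systems would also bring in the context, grammar, and semantics mentioned above.

```python
import math

# Invented unigram probabilities for a toy lexicon; real systems estimate
# these from corpora and add richer context models.
lexicon = {"a": 0.07, "an": 0.03, "ice": 0.01, "nice": 0.02, "man": 0.02}

def segment(text: str) -> tuple[float, list[str]]:
    """Return the maximum-likelihood (log-probability, words) split of `text`."""
    # best[i] = best-scoring segmentation of text[:i]
    best: list[tuple[float, list[str]]] = [(0.0, [])]
    for i in range(1, len(text) + 1):
        candidates = [
            (best[j][0] + math.log(lexicon[text[j:i]]), best[j][1] + [text[j:i]])
            for j in range(i)
            if text[j:i] in lexicon and best[j][0] > -math.inf
        ]
        best.append(max(candidates, default=(-math.inf, [])))
    return best[-1]

# "aniceman" parses as both "an ice man" and "a nice man"; the model prefers
# the more probable reading rather than making a categorical choice.
print(segment("aniceman"))  # (log p, ['a', 'nice', 'man'])
```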
This problem overlaps to some extent with the problem of text segmentation that arises in languages traditionally written without inter-word spaces, such as Chinese and Japanese, in contrast to writing systems that mark word boundaries with a divider such as the space. Even for those languages, however, text segmentation is often much easier than speech segmentation, because the written language usually has little interference between adjacent words and often contains additional clues not present in speech (such as the use of Chinese characters for word stems in Japanese).
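For the text-segmentation side of the problem, here is a minimal sketch of dictionary-based maximal matching, a standard baseline for scripts written without inter-word spaces. The toy lexicon is invented, and real segmenters add statistics and handle unknown words.

```python
# A minimal sketch of dictionary-based maximal matching, a standard baseline
# for text segmentation in scripts without inter-word spaces. The toy lexicon
# is invented for illustration.

def max_match(text: str, lexicon: set[str]) -> list[str]:
    """Greedy left-to-right longest-match segmentation."""
    words, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest candidate first
            if text[i:j] in lexicon or j == i + 1:  # unknown single char: keep
                words.append(text[i:j])
                i = j
                break
    return words

lexicon = {"日本", "日本語", "語", "を", "話す"}
print(max_match("日本語を話す", lexicon))  # ['日本語', 'を', '話す']
```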