Synthesis Lectures on Speech and Audio Processing - Books
Showing all books in the series Synthesis Lectures on Speech and Audio Processing. Shop with free shipping and fast delivery.
14 products
303 kr
Ships within 10-15 business days
In 1950 Fletcher and Galt published their theory of the articulation index, a theory that Fletcher had worked on for 30 years, which integrated his classic works on loudness and speech perception with models of speech intelligibility.
356 kr
Ships within 10-15 business days
Speech dynamics refer to the temporal characteristics in all stages of the human speech communication process. This speech “chain” starts with the formation of a linguistic message in a speaker's brain and ends with the arrival of the message in a listener's brain. Given the intricacy of the dynamic speech process and its fundamental importance in human communication, this monograph is intended to provide comprehensive material on mathematical models of speech dynamics and to address the following issues: How do we make sense of the complex speech process in terms of its functional role in speech communication? How do we quantify the special role of speech timing? How do the dynamics relate to the variability of speech that has often been said to seriously hamper automatic speech recognition? How do we put the dynamic process of speech into a quantitative form to enable detailed analyses? And finally, how can we incorporate the knowledge of speech dynamics into computerized speech analysis and recognition algorithms? The answers to all these questions require building and applying computational models for the dynamic speech process.
What are the compelling reasons for carrying out dynamic speech modeling? We provide the answer in two related aspects. First, scientific inquiry into the human speech code has been relentlessly pursued for several decades. As an essential carrier of human intelligence and knowledge, speech is the most natural form of human communication. Embedded in the speech code are linguistic (as well as para-linguistic) messages, which are conveyed through four levels of the speech chain. Underlying the robust encoding and transmission of the linguistic messages are the speech dynamics at all four levels. Mathematical modeling of speech dynamics provides an effective tool in the scientific methods of studying the speech chain. Such scientific studies help us understand why humans speak as they do and how humans exploit redundancy and variability by way of multitiered dynamic processes to enhance the efficiency and effectiveness of human speech communication. Second, advancement of human language technology, especially in automatic recognition of natural-style human speech, is also expected to benefit from comprehensive computational modeling of speech dynamics. The limitations of current speech recognition technology are serious and well known. A commonly acknowledged and frequently discussed weakness of the statistical model underlying current speech recognition technology is the lack of adequate dynamic modeling schemes to provide correlation structure across the temporal speech observation sequence. Unfortunately, for a variety of reasons, the majority of current research activities in this area favor only incremental modifications and improvements to the existing HMM-based state of the art. For example, while dynamic and correlation modeling is known to be an important topic, most systems nevertheless employ only an ultra-weak form of speech dynamics, e.g., differential or delta parameters. Strong-form dynamic speech modeling, which is the focus of this monograph, may serve as an ultimate solution to this problem.
After the introduction chapter, the main body of this monograph consists of four chapters. They cover various aspects of theory, algorithms, and applications of dynamic speech models, and provide a comprehensive survey of the research work in this area spanning the past 20 years. This monograph is intended as advanced material on speech and signal processing for graduate-level teaching, for professionals and engineering practitioners, as well as for seasoned researchers and engineers specialized in speech processing.
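To make concrete what the blurb calls the "ultra-weak" form of speech dynamics, the following is a minimal sketch of differential (delta) parameters computed with the regression formula commonly used in speech front ends. It is not taken from the monograph: the function name `delta_features`, the `window` parameter, and the choice to replicate the first and last frames at the edges are all assumptions made for illustration.

```python
def delta_features(frames, window=2):
    """Regression-based delta coefficients for a list of feature frames.

    frames: list of equal-length lists of floats, one list per time frame.
    window: number of frames on each side used in the regression
            (hypothetical default; systems differ).
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    num_frames = len(frames)
    # Normalizer 2 * sum(n^2) of the standard regression formula.
    denom = 2 * sum(n * n for n in range(1, window + 1))
    deltas = []
    for t in range(num_frames):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                # Assumed edge handling: clamp indices, which replicates
                # the first and last frames beyond the sequence ends.
                ahead = frames[min(t + n, num_frames - 1)][d]
                behind = frames[max(t - n, 0)][d]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        deltas.append(row)
    return deltas
```

Each delta is only a local slope estimate over a few neighboring frames, which is why such features capture very little of the long-span temporal structure that strong-form dynamic models aim to represent.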
356 kr
Ships within 10-15 business days
I. Principles / Introduction / Latent Semantic Mapping / LSM Feature Space / Computational Effort / Probabilistic Extensions / II. Applications / Junk E-mail Filtering / Semantic Classification / Language Modeling / Pronunciation Modeling / Speaker Verification / TTS Unit Selection / III. Perspectives / Discussion / Conclusion / Bibliography
356 kr
Ships within 10-15 business days
In this book, we introduce the background and mainstream methods of probabilistic modeling and discriminative parameter optimization for speech recognition. The specific models treated in depth include the widely used exponential-family distributions and the hidden Markov model. A detailed study is presented on unifying the common objective functions for discriminative learning in speech recognition, namely maximum mutual information (MMI), minimum classification error, and minimum phone/word error. The unification is presented, with rigorous mathematical analysis, in a common rational-function form. This common form enables the use of the growth transformation (or extended Baum–Welch) optimization framework in discriminative learning of model parameters. In addition to the necessary background and tutorial material on the subject, we also include technical details on the derivation of the parameter optimization formulas for exponential-family distributions, discrete hidden Markov models (HMMs), and continuous-density HMMs in discriminative learning. Selected experimental results obtained firsthand by the authors are presented to show that discriminative learning can lead to superior speech recognition performance over conventional parameter learning. Details on major algorithmic implementation issues with practical significance are provided to enable practitioners to directly translate the theory in the earlier part of the book into engineering practice. Table of Contents: Introduction and Background / Statistical Speech Recognition: A Tutorial / Discriminative Learning: A Unified Objective Function / Discriminative Learning Algorithm for Exponential-Family Distributions / Discriminative Learning Algorithm for Hidden Markov Model / Practical Implementation of Discriminative Learning / Selected Experimental Results / Epilogue / Major Symbols Used in the Book and Their Descriptions / Mathematical Notation / Bibliography
356 kr
Ships within 10-15 business days
The presented methods include both single- and multi-pitch estimators based on statistical approaches, like maximum likelihood and maximum a posteriori methods, filtering methods based on both static and optimal adaptive designs, and subspace methods based on the principles of subspace orthogonality and shift-invariance.
303 kr
Ships within 10-15 business days
Adaptive filters with a large number of coefficients are usually involved in both network and acoustic echo cancellation. Consequently, it is important to improve the convergence rate and tracking of the conventional algorithms used for these applications. This can be achieved by exploiting the sparse nature of the echo paths. Identification of sparse impulse responses was addressed mainly in the last decade with the development of the so-called "proportionate"-type algorithms. The goal of this book is to present the most important sparse adaptive filters developed for echo cancellation. Besides a comprehensive review of the basic proportionate-type algorithms, we also present some of the latest developments in the field and propose some new solutions for further performance improvement, e.g., variable step-size versions and novel proportionate-type affine projection algorithms. An experimental study is also provided in order to compare many sparse adaptive filters in different echo cancellation scenarios. Table of Contents: Introduction / Sparseness Measures / Performance Measures / Wiener and Basic Adaptive Filters / Basic Proportionate-Type NLMS Adaptive Filters / The Exponentiated Gradient Algorithms / The Mu-Law PNLMS and Other PNLMS-Type Algorithms / Variable Step-Size PNLMS Algorithms / Proportionate Affine Projection Algorithms / Experimental Study
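As a rough illustration of how a proportionate-type algorithm exploits sparseness, here is a minimal sketch of a basic proportionate NLMS (PNLMS) update in plain Python. It follows the commonly cited form in which each coefficient gets a step-size gain proportional to its current magnitude; it is not the book's own code. The function name `pnlms_identify` and the default values of `mu`, `rho`, `delta_p`, and `eps` are assumptions chosen for the example.

```python
def pnlms_identify(x, d, num_taps, mu=0.5, rho=0.01, delta_p=0.01, eps=1e-6):
    """Estimate a (sparse) impulse response from input x and desired signal d.

    mu, rho, delta_p and eps are illustrative defaults, not recommended values.
    """
    w = [0.0] * num_taps      # current filter estimate
    buf = [0.0] * num_taps    # input history, most recent sample first
    for x_n, d_n in zip(x, d):
        buf = [x_n] + buf[:-1]
        # A priori error between the desired signal and the filter output.
        e = d_n - sum(wi * xi for wi, xi in zip(w, buf))
        # Proportionate gains: large coefficients adapt faster, while the
        # floor keeps small or zero coefficients from freezing.
        floor = rho * max(delta_p, max(abs(wi) for wi in w))
        gamma = [max(floor, abs(wi)) for wi in w]
        mean_gamma = sum(gamma) / num_taps
        g = [gi / mean_gamma for gi in gamma]
        # Normalization by the gain-weighted input energy, plus regularization.
        norm = sum(gi * xi * xi for gi, xi in zip(g, buf)) + eps
        w = [wi + mu * gi * xi * e / norm for wi, gi, xi in zip(w, g, buf)]
    return w
```

With all gains set to 1 this reduces to ordinary NLMS, so the only change is how the total step size is distributed across coefficients.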
356 kr
Ships within 10-15 business days
This book is devoted to the study of the problem of speech enhancement whose objective is the recovery of a signal of interest (i.e., speech) from noisy observations. Typically, the recovery process is accomplished by passing the noisy observations through a linear filter (or a linear transformation). Since both the desired speech and undesired noise are filtered at the same time, the most critical issue of speech enhancement resides in how to design a proper optimal filter that can fully take advantage of the difference between the speech and noise statistics to mitigate the noise effect as much as possible while maintaining the speech perception identical to its original form. The optimal filters can be designed either in the time domain or in a transform space. As the title indicates, this book will focus on developing and analyzing optimal filters in the Karhunen-Loève expansion (KLE) domain. We begin by describing the basic problem of speech enhancement and the fundamental principles to solve it in the time domain. We then explain how the problem can be equivalently formulated in the KLE domain. Next, we divide the general problem in the KLE domain into four groups, depending on whether interframe and interband information is accounted for, leading to four linear models for speech enhancement in the KLE domain. For each model, we introduce signal processing measures to quantify the performance of speech enhancement, discuss the formation of different cost functions, and address the optimization of these cost functions for the derivation of different optimal filters. Both theoretical analysis and experiments will be provided to study the performance of these filters and the links between the KLE-domain and time-domain optimal filters will be examined. 
Table of Contents: Introduction / Problem Formulation / Optimal Filters in the Time Domain / Linear Models for Signal Enhancement in the KLE Domain / Optimal Filters in the KLE Domain with Model 1 / Optimal Filters in the KLE Domain with Model 2 / Optimal Filters in the KLE Domain with Model 3 / Optimal Filters in the KLE Domain with Model 4 / Experimental Study
356 kr
Ships within 10-15 business days
Table of Contents: Introduction / Problem Formulation / Performance Measures / Linear and Widely Linear Models / Optimal Filters with Model 1 / Optimal Filters with Model 2 / Optimal Filters with Model 3 / Optimal Filters with Model 4 / Experimental Study
377 kr
Ships within 10-15 business days
Table of Contents: Introduction / Brief Overview of Speech Recognition / Introduction to Weighted Finite-State Transducers / Speech Recognition by Weighted Finite-State Transducers / Dynamic Decoders with On-the-fly WFST Operations / Summary and Perspective
356 kr
Ships within 10-15 business days
Table of Contents: Introduction / Literature Review / Estimation of Dynamic Articulatory Parameters / Construction of Articulatory Model Based on MRI Data / Vocal Fold Excitation Models / Experimental Results of Articulatory Synthesis / Conclusion
303 kr
Ships within 10-15 business days
As speech processing devices like mobile phones, voice controlled devices, and hearing aids have increased in popularity, people expect them to work anywhere and at any time without user intervention. However, the presence of acoustical disturbances limits the use of these applications, degrades their performance, or causes the user difficulties in understanding the conversation or appreciating the device. A common way to reduce the effects of such disturbances is through the use of single-microphone noise reduction algorithms for speech enhancement. The field of single-microphone noise reduction for speech enhancement comprises a history of more than 30 years of research. In this survey, we wish to demonstrate the significant advances that have been made during the last decade in the field of discrete Fourier transform domain-based single-channel noise reduction for speech enhancement. Furthermore, our goal is to provide a concise description of a state-of-the-art speech enhancement system, and demonstrate the relative importance of the various building blocks of such a system. This allows the non-expert DSP practitioner to judge the relevance of each building block and to implement a close-to-optimal enhancement system for the particular application at hand. Table of Contents: Introduction / Single Channel Speech Enhancement: General Principles / DFT-Based Speech Enhancement Methods: Signal Model and Notation / Speech DFT Estimators / Speech Presence Probability Estimation / Noise PSD Estimation / Speech PSD Estimation / Performance Evaluation Methods / Simulation Experiments with Single-Channel Enhancement Systems / Future Directions
482 kr
Ships within 10-15 business days
Digital measurement of the analog acoustical parameters of a music performance hall is difficult. The present study describes the exponential sine sweep (ESS) measurement process in the derivation of an acoustical impulse response function (AIRF) of three music performance halls in Canada.
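For readers unfamiliar with the exponential sine sweep, the excitation signal can be sketched as below, assuming the widely used form in which the instantaneous frequency rises exponentially from a start to an end frequency. Whether the book uses exactly this parameterization is not stated in the description; the function names and the example values (20 Hz to 20 kHz, 48 kHz sampling) are hypothetical. Recovering the impulse response additionally requires recording the hall's response and deconvolving it with an inverse filter, which is not shown here.

```python
import math

def ess_sweep(f_start, f_end, duration, sample_rate):
    """Samples of an exponential sine sweep from f_start to f_end (Hz)."""
    if not 0 < f_start < f_end:
        raise ValueError("need 0 < f_start < f_end")
    rate = math.log(f_end / f_start)                   # ln(f2 / f1)
    scale = 2.0 * math.pi * f_start * duration / rate  # phase scaling factor
    n_samples = int(round(duration * sample_rate))
    # Phase: scale * (exp(rate * t / duration) - 1), with t = n / sample_rate.
    return [
        math.sin(scale * (math.exp(rate * n / (sample_rate * duration)) - 1.0))
        for n in range(n_samples)
    ]

def ess_instantaneous_frequency(f_start, f_end, duration, t):
    """Instantaneous frequency (Hz) of the sweep at time t seconds."""
    return f_start * (f_end / f_start) ** (t / duration)
```

For example, `ess_sweep(20.0, 20000.0, 1.0, 48000)` yields one second of sweep in which every octave occupies the same amount of time.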
535 kr
Ships within 10-15 business days
This book provides an analysis of acoustic features of polysemous strings and an implementation of a speech disambiguation program based on the phonetic information. Throughout the book, the term ‘polysemous string’ refers to idioms with plausible literal interpretations, restrictive and non-restrictive relative clauses, and the same expressions used as quotations and appearing in a non-quotational context. The author explains how, typically, context is sufficient to determine the intended meaning. But there is enough evidence in psycholinguistic and phonetic literature to suspect that these superficially identical strings exhibit different acoustic features. In the experiment presented in the book, the participants were asked to read short excerpts containing corresponding elements of polysemous strings placed in the same intonational position. The acoustic analyses of ditropic pairs and subsequent statistical tests revealed that there is almost no difference in the duration, pitch, or intensity in literal and figurative interpretations. However, the analysis of relative clauses and quotations demonstrated that speakers are more likely to use acoustic cues to differentiate between the two possible readings. The book argues that the acoustic analysis of polysemous phrases could be successfully implemented in designing automatic speech recognition systems in order to improve their performance in disambiguating polysemous phrases.
Analyzes acoustic features of polysemous strings and an implementation of a speech disambiguation program. Includes evidence that superficially identical strings exhibit different acoustic features. Argues that acoustic analysis of polysemous phrases can be successfully implemented in automatic speech recognition.
535 kr
Ships within 10-15 business days
This book provides an analysis of acoustic features of polysemous strings and an implementation of a speech disambiguation program based on the phonetic information. Throughout the book, the term ‘polysemous string’ refers to idioms with plausible literal interpretations, restrictive and non-restrictive relative clauses, and the same expressions used as quotations and appearing in a non-quotational context. The author explains how, typically, context is sufficient to determine the intended meaning. But there is enough evidence in psycholinguistic and phonetic literature to suspect that these superficially identical strings exhibit different acoustic features. In the experiment presented in the book, the participants were asked to read short excerpts containing corresponding elements of polysemous strings placed in the same intonational position. The acoustic analyses of ditropic pairs and subsequent statistical tests revealed that there is almost no difference in the duration, pitch, or intensity in literal and figurative interpretations. However, the analysis of relative clauses and quotations demonstrated that speakers are more likely to use acoustic cues to differentiate between the two possible readings. The book argues that the acoustic analysis of polysemous phrases could be successfully implemented in designing automatic speech recognition systems in order to improve their performance in disambiguating polysemous phrases.
Analyzes acoustic features of polysemous strings and an implementation of a speech disambiguation program. Includes evidence that superficially identical strings exhibit different acoustic features. Argues that acoustic analysis of polysemous phrases can be successfully implemented in automatic speech recognition.