Description
This book constitutes the refereed proceedings of the 19th National Conference on Man-Machine Speech Communication, NCMMSC 2024, held in Urumqi, China, during August 15–18, 2024. The 33 papers included in these proceedings were carefully reviewed and selected from 205 submissions.
- The Attention-Based Fusion of Master-Auxiliary Network for Speech Enhancement
- M-CMGAN: Attempting to Use Mamba on Speech Enhancement
- A Backend-friendly On-device Multi-channel Speech Enhancement System with IPD and PHM
- SESNet: A Speech Enhancement and Separation Network in Noisy Reverberant Environments
- ASD-Diff: Unsupervised Anomalous Sound Detection With Masked Diffusion Model
- Emergence of Hemispheric Asymmetries and Predictive Coding in the Neural Mechanism of Speech Perception
- Phoneme Semantic Backdoor Attacks with Multiple Task Learning for Speech Classification Task
- AESR: Speech Recognition With Speech Emotion Recogniting Learning
- A Comparative Analysis of Diphthong Acquisition in Standard Chinese by Learners from ‘the Belt and Road’
- ESTVocoder: An Excitation-Spectral-Transformed Neural Vocoder Conditioned on Mel Spectrogram
- Transformer-based Model for Auditory EEG Decoding
- A Neural Denoising Vocoder for Clean Waveform Generation from Noisy Mel-Spectrogram based on Amplitude and Phase Predictions
- Sound Zone Control Based on a Kronecker Second-Order Tensor Decomposition
- MCDubber: Multimodal Context-Aware Expressive Video Dubbing
- TeleSpeechPT: Large-Scale Chinese Multi-Dialect And Multi-Accent Speech Pre-Training
- Investigation into the Impact of Speaker Adversarial Perturbation on Speech Recognition
- Pruning and Quantization Enhanced Densely Connected Neural Network for Efficient Acoustic Echo Cancellation
- Improved DOA Estimation of Sound Source of Small Amplitudes using a Single Acoustic Vector Sensor
- Investigation on Training Strategy for Cross-Modal Large Language Models with Speech and Text
- ExARN: Target Speaker Extraction with Attentive Recurrent Networks
- Tone Perception by Putonghua-Learning Preschool Children in South Xinjiang Uyghur Autonomous Region
- Study on Prosodic Disambiguation of VP/NP Syntactic Structure by Chinese EFL Learners
- An electroencephalogram-based study of neural responses to imagined speech in Mandarin
- A Speech Corpus of Putonghua-Learning Preschoolers From the Uygur Ethnic Group in South Xinjiang Uygur Autonomous Region of China
- Evaluation of Data Inconsistency for Multi-modal Sentiment Analysis
- LDMME: Latent Diffusion Model for Music Editing
- Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech
- Speech emotion recognition based on multi acoustic feature fusion
- DA-KWFormer: A Domain Adaptation Network with K-Weight Transformer for Speech Emotion Recognition
- An Unsupervised Domain Adaptation Method based on Distribution Alignment for Speaker Verification
- Cross-Model Knowledge Distillation and Metadata Fusion for Respiratory Sound Classification
- Effect of Focus on Vowel Duration and Formant in Cantonese
- A Quantitative Parameter of Pronunciation, TVVF