Dan Jurafsky is an associate professor in the Department of Linguistics, and by courtesy in the Department of Computer Science, at Stanford University. Previously, he was on the faculty of the University of Colorado, Boulder, in the Linguistics and Computer Science departments and the Institute of Cognitive Science. He was born in Yonkers, New York, and received a B.A. in Linguistics in 1983 and a Ph.D. in Computer Science in 1992, both from the University of California at Berkeley. He received the National Science Foundation CAREER award in 1998 and the MacArthur Fellowship in 2002. He has published over 90 papers on a wide range of topics in speech and language processing.

James H. Martin is a professor in the Department of Computer Science and in the Department of Linguistics, and a fellow in the Institute of Cognitive Science at the University of Colorado at Boulder. He was born in New York City, and received a B.S. in Computer Science from Columbia University in 1981 and a Ph.D. in Computer Science from the University of California at Berkeley in 1988. He has authored over 70 publications in computer science, including the book A Computational Model of Metaphor Interpretation.
Foreword
Preface
About the Authors
1 Introduction
1.1 Knowledge in Speech and Language Processing
1.2 Ambiguity
1.3 Models and Algorithms
1.4 Language, Thought, and Understanding
1.5 The State of the Art
1.6 Some Brief History
1.6.1 Foundational Insights: 1940s and 1950s
1.6.2 The Two Camps: 1957–1970
1.6.3 Four Paradigms: 1970–1983
1.6.4 Empiricism and Finite State Models Redux: 1983–1993
1.6.5 The Field Comes Together: 1994–1999
1.6.6 The Rise of Machine Learning: 2000–2008
1.6.7 On Multiple Discoveries
1.6.8 A Final Brief Note on Psychology
1.7 Summary
Bibliographical and Historical Notes
Part I Words
2 Regular Expressions and Automata
2.1 Regular Expressions
2.1.1 Basic Regular Expression Patterns
2.1.2 Disjunction, Grouping, and Precedence
2.1.3 A Simple Example
2.1.4 A More Complex Example
2.1.5 Advanced Operators
2.1.6 Regular Expression Substitution, Memory, and ELIZA
2.2 Finite-State Automata
2.2.1 Using an FSA to Recognize Sheeptalk
2.2.2 Formal Languages
2.2.3 Another Example
2.2.4 Non-Deterministic FSAs
2.2.5 Using an NFSA to Accept Strings
2.2.6 Recognition as Search
2.2.7 Relating Deterministic and Non-Deterministic Automata
2.3 Regular Languages and FSAs
2.4 Summary
Bibliographical and Historical Notes
Exercises
3 Words and Transducers
3.1 Survey of (Mostly) English Morphology
3.1.1 Inflectional Morphology
3.1.2 Derivational Morphology
3.1.3 Cliticization
3.1.4 Non-Concatenative Morphology
3.1.5 Agreement
3.2 Finite-State Morphological Parsing
3.3 Construction of a Finite-State Lexicon
3.4 Finite-State Transducers
3.4.1 Sequential Transducers and Determinism
3.5 FSTs for Morphological Parsing
3.6 Transducers and Orthographic Rules
3.7 The Combination of an FST Lexicon and Rules
3.8 Lexicon-Free FSTs: The Porter Stemmer
3.9 Word and Sentence Tokenization
3.9.1 Segmentation in Chinese
3.10 Detection and Correction of Spelling Errors
3.11 Minimum Edit Distance
3.12 Human Morphological Processing
3.13 Summary
Bibliographical and Historical Notes
Exercises
4 N-grams
4.1 Word Counting in Corpora
4.2 Simple (Unsmoothed) N-grams
4.3 Training and Test Sets
4.3.1 N-gram Sensitivity to the Training Corpus
4.3.2 Unknown Words: Open Versus Closed Vocabulary Tasks
4.4 Evaluating N-grams: Perplexity
4.5 Smoothing
4.5.1 Laplace Smoothing
4.5.2 Good-Turing Discounting
4.5.3 Som...