Gregory Grefenstette - Books
Showing all books by the author Gregory Grefenstette. Shop with free shipping and fast delivery.
6 products
2 486 kr
Ships within 10-15 business days
The universal adoption of the Internet and the WWW has created an enormous, multilingual virtual textual database. Rather than treating foreign-language documents as distracting noise, one can consider them untapped sources of information. This book addresses the problem of accessing multilingual information through a single-language query, a research problem that is receiving growing attention from US and foreign governments. It describes the problem, highlighting the differences between this field and the related areas of machine translation and information retrieval. Researchers from Europe, Japan and America present a wide variety of techniques and experimental results. The full-scale experiments are run on modern large-scale retrieval testbeds containing up to hundreds of megabytes of text. The techniques involve bilingual dictionaries, machine translation systems, parallel text corpora, comparable but non-parallel text corpora, latent semantic indexing, and weighted Boolean interrogation. This text should be suitable as a secondary text for a graduate-level course on cross-language information retrieval, and as a reference for researchers and practitioners in industry.
1 624 kr
Ships within 10-15 business days
This text presents an automated method for creating a first-draft thesaurus from raw text. It describes the natural language processing steps of tokenization, surface syntactic analysis and syntactic attribute extraction. From these attributes, word and term similarity is calculated and a thesaurus is created showing important common terms and their relation to each other, common verb-noun pairings, common expressions, and word family members. The techniques are tested on 20 different corpora ranging from baseball newsgroups, assassination archives, medical X-ray reports, and abstracts on AIDS to encyclopedia articles on animals, and even the text of the book itself. The corpora range from 40,000 to 6 million characters of text, and results for each are presented in appendices. The methods described in the book have undergone extensive evaluation. Their time and space complexity is shown to be modest. The results are shown to converge to a stable state as the corpus grows. The similarities calculated are compared to those produced by psychological testing. A method of evaluation using Artificial Synonyms is tested. Gold Standard evaluations show that the techniques significantly outperform non-linguistic techniques for the most important words in the corpora. This text includes applications to information retrieval using established testbeds, to the enrichment of existing thesauri, and to semantic analysis. Also included are applications showing how to create, implement, and test a first-draft thesaurus.
Volume 278 - Springer International Series in Engineering and Computer Science
Explorations in Automatic Thesaurus Discovery
Paperback, English, 2012
1 624 kr
Ships within 10-15 business days
Explorations in Automatic Thesaurus Discovery presents an automated method for creating a first-draft thesaurus from raw text. It describes the natural language processing steps of tokenization, surface syntactic analysis, and syntactic attribute extraction. From these attributes, word and term similarity is calculated and a thesaurus is created showing important common terms and their relation to each other, common verb-noun pairings, common expressions, and word family members. The techniques are tested on twenty different corpora ranging from baseball newsgroups, assassination archives, medical X-ray reports, and abstracts on AIDS to encyclopedia articles on animals, and even the text of the book itself. The corpora range from 40,000 to 6 million characters of text, and results for each are presented in the Appendix. The methods described in the book have undergone extensive evaluation. Their time and space complexity is shown to be modest. The results are shown to converge to a stable state as the corpus grows. The similarities calculated are compared to those produced by psychological testing. A method of evaluation using Artificial Synonyms is tested. Gold Standard evaluations show that the techniques significantly outperform non-linguistic techniques for the most important words in the corpora. Explorations in Automatic Thesaurus Discovery includes applications to information retrieval using established testbeds, to the enrichment of existing thesauri, and to semantic analysis. Also included are applications showing how to create, implement, and test a first-draft thesaurus.
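The similarity step described above (comparing words by the syntactic attributes they share) can be illustrated with a small sketch. It assumes the syntactic analysis has already reduced the corpus to (word, attribute) pairs; the pairs and attribute labels below are invented for illustration, and raw counts stand in for whatever attribute weighting the book actually uses. Weighted Jaccard is used here as one common choice of measure, not necessarily the book's exact formula.

```python
from collections import Counter

# Hypothetical (word, attribute) pairs, as might come out of a shallow
# syntactic analysis: "dobj-of:hit" means the word was the direct object
# of "hit"; "adj:fast" means it was modified by the adjective "fast".
PAIRS = [
    ("ball", "dobj-of:hit"), ("ball", "dobj-of:throw"), ("ball", "adj:fast"),
    ("pitch", "dobj-of:hit"), ("pitch", "dobj-of:throw"), ("pitch", "adj:fast"),
    ("pitch", "adj:wild"),
    ("game", "dobj-of:win"), ("game", "adj:long"),
]


def attribute_profiles(pairs):
    """Map each word to a Counter of how often it occurs with each attribute."""
    profiles = {}
    for word, attr in pairs:
        profiles.setdefault(word, Counter())[attr] += 1
    return profiles


def weighted_jaccard(a, b):
    """Sum of minimum weights over sum of maximum weights, from 0.0 to 1.0."""
    attrs = set(a) | set(b)
    num = sum(min(a[x], b[x]) for x in attrs)  # Counter returns 0 for missing keys
    den = sum(max(a[x], b[x]) for x in attrs)
    return num / den if den else 0.0


def nearest_neighbors(word, profiles, k=2):
    """Rank the other words by similarity, giving one first-draft thesaurus entry."""
    scores = [(other, weighted_jaccard(profiles[word], profiles[other]))
              for other in profiles if other != word]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


if __name__ == "__main__":
    profiles = attribute_profiles(PAIRS)
    print(nearest_neighbors("ball", profiles))  # [('pitch', 0.75), ('game', 0.0)]
```

Here "ball" and "pitch" share three of four attributes and score 0.75, while "game" shares none. On a real corpus the attribute weights, frequency cutoffs and choice of measure all affect which neighbors surface.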
2 486 kr
Ships within 10-15 business days
Most of the papers in this volume were first presented at the Workshop on Cross-Linguistic Information Retrieval, held on August 22, 1996, during the SIGIR'96 Conference. Alan Smeaton of Dublin University and Paraic Sheridan of the ETH, Zurich, were the two other members of the Scientific Committee for this workshop. SIGIR is the Association for Computing Machinery (ACM) Special Interest Group on Information Retrieval, which has held conferences yearly since 1977. Three additional papers have been added. Chapter 4, "Distributed Cross-Lingual Information Retrieval," describes the EMIR retrieval system, one of the first general cross-language systems to be implemented and evaluated. Chapter 6, "Mapping Vocabularies Using Latent Semantic Indexing," which originally appeared as a technical report in the Laboratory for Computational Linguistics at Carnegie Mellon University in 1991, is included here because it was one of the earliest, though hard-to-find, publications showing the application of Latent Semantic Indexing to the problem of cross-language retrieval. Chapter 10, "A Weighted Boolean Model for Cross-Language Text Retrieval," describes a recent approach to solving the translation term weighting problem specific to Cross-Language Information Retrieval.
Gregory Grefenstette
Contributors: Lisa Ballesteros and W. Bruce Croft, Center for Intelligent Information Retrieval, Computer Science Department, University of Massachusetts; David Hull and Gregory Grefenstette, Xerox Research Centre Europe, Grenoble Laboratory; Mark W. Davis, Computing Research Lab, New Mexico State University; Thomas K. Landauer, Department of Psychology and Institute of Cognitive Science, University of Colorado, Boulder; Michael L. Littman; Bonnie J.
Search-Based Applications
At the Confluence of Search and Database Technologies
Paperback, English, 2010
312 kr
Ships within 10-15 business days
We are poised at a major turning point in the history of information management via computers. Recent evolutions in computing, communications, and commerce are fundamentally reshaping the ways in which we humans interact with information, and generating enormous volumes of electronic data along the way. As a result of these forces, what will data management technologies, and their supporting software and system architectures, look like in ten years? It is difficult to say, but we can see the future taking shape now in a new generation of information access platforms that combine strategies and structures of two familiar, and previously quite distinct, technologies: search engines and databases. We can also see it in a new model for software applications, the Search-Based Application (SBA), which offers a pragmatic way to solve both well-known and emerging information management challenges today. Search engines are the world's most familiar and widely deployed information access tool, used by hundreds of millions of people every day to locate information on the Web, but few are aware that they can now also be used to provide precise, multidimensional information access and analysis that is hard to distinguish from current database applications, yet endowed with the usability and massive scalability of Web search. In this book, we hope to introduce Search Based Applications to a wider audience, using real case studies to show how this flexible technology can be used to intelligently aggregate large volumes of unstructured data (like Web pages) and structured data (like database content), and to make that data available in a highly contextual, quasi real-time manner to a wide base of users for a varied range of purposes. We also hope to shed light on the general convergences underway in the search and database disciplines, convergences that make SBAs possible and that serve as harbingers of information management paradigms and technologies to come.
Table of Contents: Search Based Applications / Evolving Business Information Access Needs / Origins and Histories / Data Models and Storage / Data Collection/Population / Data Processing / Data Retrieval / Data Security, Usability, Performance, Cost / Summary Evolutions and Convergences / SBA Platforms / SBA Uses and Preconditions / Anatomy of a Search Based Application / Case Study: GEFCO / Case Study: Urbanizer / Case Study: National Postal Agency / Future Directions
Text- and Speech-Triggered Information Access
8th ELSNET Summer School, Chios Island, Greece, July 15-30, 2000, Revised Lectures
Paperback, English, 2003
520 kr
Ships within 10-15 business days
This book presents revised versions of the lectures given at the 8th ELSNET European Summer School on Language and Speech Communication, held on the Island of Chios, Greece, in summer 2000. Besides an introductory survey, the book presents lectures on data analysis for multimedia libraries, pronunciation modeling for large-vocabulary speech recognition, statistical language modeling, very large scale information retrieval, and reduction of information variation in text, along with a concluding chapter on open research questions for linguistics in information access. The book gives newcomers to language and speech communication a clear overview of the main technologies and problems in the area. Researchers and professionals active in the area will appreciate the book as a concise review of the technologies used in text- and speech-triggered information access.