Title: Comparison of Phrase Indexing for Biomedical and Newswire Documents
Authors: Jose, C. Clemente
Torisawa, Kentaro
Satou, Kenji
Keywords: text mining
information retrieval
phrase indexing
Issue Date: Nov-2005
Publisher: JAIST Press
Abstract: In this paper we compare the performance of a simple but widely used approach to multi-word indexing on two large document collections belonging to different genres: newswire articles and biomedical abstracts. While indexing results are reasonably accurate in the first collection, performance drops noticeably in the second. The special characteristics of the second corpus can explain the difference in results and call into question the validity of a naive approach to multi-word indexing, opening an interesting line of research. By comparing the characteristics of both document sets, we can shed some light on which aspects are relevant to developing a domain-independent multi-word indexing strategy.
Description: The original publication is available at JAIST Press http://www.jaist.ac.jp/library/jaist-press/index.html
IFSR 2005 : Proceedings of the First World Congress of the International Federation for Systems Research : The New Roles of Systems Sciences For a Knowledge-based Society : Nov. 14-17, 2005, Kobe, Japan
Symposium 5, Session 4 : Data/Text Mining from Large Databases Text Mining
Language: ENG
URI: http://hdl.handle.net/10119/3915
ISBN: 4-903092-02-X
Appears in Collections: IFSR 2005


