Title: A High Precision Algorithm for Automatic Extraction of High-frequency Words Based on Statistics
Authors: XUAN, Zhaoguo
DANG, Yanzhong
JIANG, Shaohua
ZHAO, Mingwei
Keywords: Chinese-word segmentation
statistics algorithm
high-frequency words
Chinese information processing
Issue Date: Nov-2005
Publisher: JAIST Press
Abstract: Automatic Chinese word segmentation is one of the basic research issues in text categorization, automatic summarization, information retrieval, and other Chinese information processing tasks. In this paper we put forward a high-precision algorithm for extracting high-frequency words without a thesaurus. It first counts the frequencies of co-occurrence patterns of Chinese characters in documents, then eliminates the “bridge-connection” frequencies to obtain the support frequencies of the patterns. Words are then identified and acquired according to the support frequencies instead of the raw occurrence frequencies. The proposed algorithm is tested on the task of extracting words from several sets of scientific document abstracts, and the results show that it can improve both the precision and recall of the extracted lexical set to some extent. The algorithm can also be applied to text categorization and automatic summarization.
Description: The original publication is available at JAIST Press http://www.jaist.ac.jp/library/jaist-press/index.html
IFSR 2005 : Proceedings of the First World Congress of the International Federation for Systems Research : The New Roles of Systems Sciences For a Knowledge-based Society : Nov. 14-17, 2005, Kobe, Japan
Symposium 6, Session 4 : Vision of Knowledge Civilization Future Technology
Language: ENG
URI: http://hdl.handle.net/10119/3923
ISBN: 4-903092-02-X
