Please use this identifier to cite or link to this item:
http://hdl.handle.net/10119/3923
Title: | A High Precision Algorithm for Automatic Extraction of High-frequency Words Based on Statistics |
Authors: | XUAN, Zhaoguo; DANG, Yanzhong; JIANG, Shaohua; ZHAO, Mingwei |
Keywords: | Chinese-word segmentation statistics algorithm high-frequency words Chinese information processing |
Issue Date: | Nov-2005 |
Publisher: | JAIST Press |
Abstract: | Automatic Chinese word segmentation is one of the basic research issues in text categorization, automatic summarization, information retrieval, and other Chinese information processing tasks. In this paper we put forward a high-precision algorithm for extracting high-frequency words without a thesaurus. It first counts the frequencies of co-occurrence patterns of Chinese characters in the documents, then eliminates the “bridge-connection” frequencies to obtain the support frequencies of the patterns. Words are then identified and acquired according to their support frequencies rather than their raw occurrence frequencies. The proposed algorithm is tested on the task of extracting words from several sets of scientific document abstracts, and the results show that it can improve both the precision and the recall of the extracted lexical set to some extent. The algorithm can also be applied to text categorization and automatic summarization. |
Description: | The original publication is available at JAIST Press: http://www.jaist.ac.jp/library/jaist-press/index.html. IFSR 2005: Proceedings of the First World Congress of the International Federation for Systems Research: The New Roles of Systems Sciences For a Knowledge-based Society: Nov. 14-17, 2005, Kobe, Japan. Symposium 6, Session 4: Vision of Knowledge Civilization Future Technology |
Language: | ENG |
URI: | http://hdl.handle.net/10119/3923 |
ISBN: | 4-903092-02-X |
Appears in Collections: | IFSR 2005 |
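The pipeline summarized in the abstract (count co-occurrence patterns, discount "bridge-connection" frequencies, then select words by support frequency) can be illustrated with a rough sketch. The abstract does not define how bridge-connection frequencies are computed, so the code below swaps in a simple nested-pattern subtraction as an assumed stand-in: occurrences of a shorter pattern that are already explained by a longer frequent pattern are discounted. The function names and the `max_len` and `min_freq` parameters are hypothetical and do not come from the paper.

```python
from collections import Counter


def count_patterns(docs, max_len=4):
    """Count every character n-gram (2 <= n <= max_len) in the documents."""
    counts = Counter()
    for doc in docs:
        for n in range(2, max_len + 1):
            for i in range(len(doc) - n + 1):
                counts[doc[i:i + n]] += 1
    return counts


def support_frequencies(counts, min_freq=2):
    """Discount short patterns that occur inside longer frequent patterns.

    ASSUMPTION: this nested-pattern subtraction is only a stand-in for the
    paper's "bridge-connection" elimination, which the abstract does not define.
    """
    support = dict(counts)
    # Visit longer patterns first so they claim their occurrences first.
    for pat in sorted(counts, key=len, reverse=True):
        freq = support[pat]
        if freq < min_freq:
            continue
        for n in range(2, len(pat)):
            for i in range(len(pat) - n + 1):
                sub = pat[i:i + n]
                # Clamp at zero: overlapping long patterns may over-subtract.
                support[sub] = max(0, support[sub] - freq)
    return support


def extract_words(docs, max_len=4, min_freq=2):
    """Return patterns whose support frequency reaches min_freq, sorted."""
    support = support_frequencies(count_patterns(docs, max_len), min_freq)
    return sorted(p for p, f in support.items() if f >= min_freq)
```

Python strings are sequences of Unicode code points, so the same functions accept Chinese text directly. Selecting by the discounted count rather than the raw count keeps fragments of a longer word from being reported as words in their own right, which is the general idea the abstract attributes to support frequencies.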
Files in This Item:
File | Description | Size | Format |
20033.pdf | | 71Kb | Adobe PDF | View/Open |
All items stored in this system are protected by copyright.