Research Report - School of Knowledge Science: ISSN 1347-1570
Please use this identifier to cite or link to this item:
http://hdl.handle.net/10119/8449
Title: | Generalized kernel canonical correlation analysis : criteria and low rank kernel learning |
Authors: | Nguyen, Canh Hao; Ho, Tu Bao; Renders, Jean-Michel; Cancedda, Nicola |
Issue Date: | 2009-02-20 |
Publisher: | School of Knowledge Science, Japan Advanced Institute of Science and Technology |
Magazine name: | Research report (School of Knowledge Science, Japan Advanced Institute of Science and Technology) |
Volume: | KS-RR-2009-002 |
Start page: | 1 |
End page: | 21 |
Abstract: | Canonical Correlation Analysis (CCA) is a classical data analysis technique for computing common correlated subspaces for two datasets. Recent advances in machine learning enable the technique to operate solely on kernel matrices, making it a kernel method with the advantages of modularity, efficiency and nonlinearity. Its performance is further improved by appropriate regularization and low-rank approximation methods, making it suitable for many practical applications. However, the classical technique can find correlations between only two datasets, whereas in practice we often wish to consider correlations among more than two datasets at the same time. Such problems occur in many domains, such as multilingual text processing, where we wish to find a common representation of parallel document corpora from more than two languages together (we call this situation multiple view, or multiview for short). Generalizing CCA to more than two views faces two problems: finding criteria for multiview CCA, and finding computational solutions for these criteria. In this report, we analyze the criteria that have been proposed as objective functions for multiview CCA. We find that only some of them are suitable for our purpose, and that among these only one, namely MAXVAR, has an efficient solution. We describe our algorithm for this criterion. We conduct experiments on a multilingual corpus. Experimental results show that multiview CCA has an advantage over two-view CCA when little training data is available. We then show that low-rank approximation of kernels is done independently for each view. This could be a disadvantage, as different views may be projected onto subspaces that do not result in correlation. We therefore propose a new incomplete Cholesky decomposition (ICD) procedure that simultaneously decomposes all views. Experimental results show that the new ICD, by ensuring the alignment of subspaces from different views, gives higher performance for multiview CCA when there are many views and few dimensions for approximation. |
URI: | http://hdl.handle.net/10119/8449 |
Material Type: | publisher |
Appears in Collections: | KS-RR-2009 |
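As an illustration of the low-rank kernel approximation mentioned in the abstract, the following is a minimal sketch of the standard pivoted incomplete Cholesky decomposition of a single kernel matrix. It is not the simultaneous multiview procedure proposed in the report, whose details are not given in this record. The function name `incomplete_cholesky` and its parameters `max_rank` and `tol` are assumptions made for this example only.

```python
import math


def incomplete_cholesky(K, max_rank, tol=1e-10):
    """Pivoted incomplete Cholesky of a symmetric PSD matrix K (list of lists).

    Returns (G, pivots), where G is a list of columns (each of length n)
    such that K is approximated by sum_k G[k][i] * G[k][j].
    Illustrative single-matrix version; names and defaults are assumptions.
    """
    n = len(K)
    d = [K[i][i] for i in range(n)]  # diagonal of the current residual
    G = []                           # columns of the low-rank factor
    pivots = []                      # indices chosen as pivots, in order

    while len(G) < max_rank:
        # Greedy pivot: the point with the largest residual diagonal entry.
        p = max(range(n), key=lambda i: d[i])
        if d[p] <= tol:
            break  # residual is negligible; the factorisation is good enough

        root = math.sqrt(d[p])
        col = []
        for i in range(n):
            # Residual entry (i, p) after removing earlier columns.
            s = K[i][p] - sum(g[i] * g[p] for g in G)
            col.append(s / root)

        G.append(col)
        pivots.append(p)

        # Update the residual diagonal, clamping round-off below zero.
        for i in range(n):
            d[i] = max(d[i] - col[i] ** 2, 0.0)

    return G, pivots
```

Called on one kernel matrix per view, a routine like this chooses pivots for each view separately, which is the independent per-view approximation the abstract describes as a potential disadvantage.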
Files in This Item:
File | Description | Size | Format |
KS-RR-2009-002.pdf | | 30205Kb | Adobe PDF |
All items in DSpace are protected by copyright, with all rights reserved.