Volume 44, Issue 3, 2008, Pages 1032-1048

Towards a unified approach to document similarity search using manifold-ranking of blocks

Author keywords

Document segmentation; Document similarity search; Manifold ranking; Web page segmentation; Web similarity search

Indexed keywords

DATA PROCESSING; GRAPH THEORY; INFORMATION RETRIEVAL; LEARNING ALGORITHMS; QUERY PROCESSING;

EID: 40649129226     PISSN: 03064573     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.ipm.2007.07.012     Document Type: Article
Times cited: 31

References (48)
  • 1
    • 40649098073
    • Allan, J., Carbonell, J., Doddington, G., Yamron, J. P. & Yang, Y. (1998). Topic detection and tracking pilot study: Final report. In Proceedings of DARPA Broadcast News Transcription and Understanding Workshop (pp. 194-218).
  • 3
    • 40649087454
    • Cai, D., Yu, S., Wen, J. -R., & Ma, W. -Y. (2003). VIPS: A vision based page segmentation algorithm. Microsoft technical report, MSR-TR-2003-79.
  • 4
    • 8644241107
    • Cai, D., He, X., Wen, J. -R., & Ma, W. -Y. (2004). Block-level link analysis. In Proceedings of the 27th annual international ACM SIGIR conference (SIGIR'2004).
  • 5
    • 85029933081
    • Callan, J. (1994). Passage-level evidence in document retrieval. In Proceedings of the 17th annual international ACM SIGIR conference on research and development in information retrieval (pp. 302-310).
  • 6
    • 84893227548
    • Chen, J., Zhou, B., Shi, J., Zhang, H. -J., & Qiu, F. (2001). Function-based object model towards website adaptation. In Proceedings of the 10th world wide web conference (WWW10).
  • 7
    • 40649095065
    • Choi, F. (1999). JTextTile: A free platform independent text segmentation algorithm. http://www.cs.man.ac.uk/~choif.
  • 9
    • 84957632308
    • Cruz, I. F., Borisov, S., Marks, M. A., Webb, T. R. (1998). Measuring structural similarity among web documents: Preliminary results. In Proceedings of the 7th international conference on electronic publishing (pp. 513-524).
  • 10
    • 0033293618
    • Dean, J., & Henzinger, M. R. (1999). Finding related pages in the World Wide Web. In Proceedings of the eighth international conference on world wide web (pp. 1467-1479).
  • 11
    • 33745774624
    • Diaz, F. (2005). Regularizing ad hoc retrieval scores. In Proceedings of the 14th ACM international conference on information and knowledge management (CIKM'2005).
  • 12
    • 27344433526
    • Erkan, G., & Radev, D. (2004). LexRank: Graph-based lexical centrality as salience in text summarization. Journal of Artificial Intelligence Research (JAIR), 22, 457-479.
  • 13
    • 40649084898
    • Fogaras, D., & Rácz, B. (2004). Scaling link-based similarity search. Technical report.
  • 14
    • 77953112255
    • Haveliwala, T.H., Gionis, A., Klein, D., Indyk, P. (2002). Evaluating strategies for similarity search on the Web. In Proceedings of WWW2002 (pp. 432-442).
  • 15
    • 40649125861
    • …nd meeting of the association for computational linguistics, Las Cruces, NM.
  • 16
    • 0001819680
    • Hearst, M. A. (1997). TextTiling: Segmenting text into multi-paragraph subtopic passages. Computational Linguistics, 23(1), 33-64.
  • 17
    • 0030381274
    • Hearst, M. A., & Pedersen, O. (1996). Reexamining the cluster hypothesis: Scatter/Gather on retrieval results. In Proceedings of SIGIR1996 (pp. 76-84).
  • 18
    • 0027709747
    • …th annual international ACM/SIGIR conference, Pittsburgh, PA.
  • 19
    • 1542377530
    • Iwayama, M., Fujii, A., Kando, N., Marukawa, Y. (2003). An empirical study on retrieval models for different document genres: Patents and newspaper articles. In Proceedings of SIGIR2003.
  • 20
    • 0242625250
    • Jeh, G., & Widom, J. (2002). SimRank: A measure of structural-context similarity. In Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining.
  • 21
    • 2442561063
    • Joshi, S., Agrawal, N., Krishnapuram, R., Negi, S. (2003). A bag of paths model for measuring structural similarity in web documents. In Proceedings of the 9th ACM SIGKDD conference (pp. 577-582).
  • 22
    • 0030649812
    • Kaszkiel, M., & Zobel, J. (1997). Passage retrieval revisited. ACM SIGIR Forum, 31(SI), 178-185.
  • 23
    • 40649087910
    • Kaufmann, S. (1999). Cohesion and collocation: Using context vectors in text segmentation. In Proceedings of the 37th conference on association for computational linguistics (pp. 591-595).
  • 24
    • 4243148480
    • Kleinberg, J. M. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5), 604-632.
  • 25
    • 78149348400
    • Kovacevic, M., Diligenti, M., Gori, M., & Milutinovic, V. (2002). Recognition of common areas in a web page using visual information: A possible application in a page classification. In Proceedings of 2002 IEEE international conference on data mining (ICDM'02), Maebashi City, Japan.
  • 26
    • 84885651292
    • Kurland, O., Lee, L. (2005). PageRank without hyperlinks: Structural re-ranking using links induced by language models. In Proceedings of SIGIR2005 (pp. 306-313).
  • 27
    • 33750301114
    • Kurland, O., Lee, L. (2006). Respect my authority! HITS without hyperlinks: utilizing cluster-based language models. In Proceedings of SIGIR2006.
  • 28
    • 84885608860
    • Kurland, O., Lee, L., Domshlak, C. (2005). Better than the real thing? Iterative pseudo-query processing using cluster-based language models. In Proceedings of SIGIR2005.
  • 29
    • 34250656180
    • Lin, Z., Lyu, M.R., King, I. (2006). PageSim: A novel link-based measure of web page similarity. In Proceeding of the 15th international world wide web conference.
  • 30
    • 40649112941
    • Mihalcea, R., Tarau, P. (2004). Textrank: Bringing order into texts. In Proceedings of EMNLP 2004 (pp. 404-411).
  • 31
    • 80053239269
    • Otterbacher, J., Erkan, G., Radev, D. (2005). Using random walks for question-focused sentence retrieval. In Proc. HLT/EMNLP 2005.
  • 32
    • 40649089144
    • Page, L., Brin, S., Motwani, R., Winograd, T. (1998). The pagerank citation ranking: Bringing order to the web, Technical report. Stanford, CA: Stanford University.
  • 33
    • 84948481845
    • Porter, M. F. (1980). An algorithm for suffix stripping. Program, 14(3), 130-137.
  • 34
    • 84885575844
    • Qin, T., Liu, T. -Y., Zhang, X. -D., Chen, Z., Ma, W. -Y. (2005). A study of relevance propagation for web search. In Proceedings of SIGIR2005.
  • 35
    • 84966534942
    • Robertson, S., Walker, S. (1994). Some simple effective approximations to the 2-poisson model for probabilistic weighted retrieval. In Proc. of the 17th international ACM/SIGIR conference on research and development in information retrieval (pp. 232-241).
  • 36
    • 40649091449
    • Robertson, S., Walker, S., Beaulieu, M. (1999) Okapi at TREC-7: Automatic ad hoc, filtering, VLC and filtering tracks. In Proceedings of TREC'99.
  • 38
    • 0016572913
    • Salton, G., Wong, A., & Yang, C. S. (1975). A vector space model for automatic indexing. Communications of the ACM, 18(11), 613-620.
  • 39
    • 0030402534
    • Singhal, A., Buckley, C., & Mitra, M. (1996). Pivoted document length normalization. In Proceedings of SIGIR'96.
  • 40
    • 33750343579
    • Smucker, M.D., & Allan, J. (2006). Find-similar: Similarity browsing as a search tool. In Proceedings of the 29th annual international ACM SIGIR conference on research and development in information retrieval (SIGIR'2006) (pp. 461-468).
  • 41
    • 18744381159
    • Song, R., Liu, H., Wen, J. -R., & Ma, W. -Y. (2004). Learning block importance models for web pages. In Proceeding of the thirteenth world wide web conference (WWW 2004) (pp. 203-211).
  • 42
    • 40649083366
    • Tombros, A., & Ali, Z. (2005). Factors affecting web page similarity. In Proceedings of ECIR2005.
  • 44
    • 18744412141
    • Xue, G. -R., Zeng, H. -J., Chen, Z., & Yu, Y. (2004). MRSSA: An iterative algorithm for similarity spreading over interrelated objects. In Proceedings of CIKM2004.
  • 45
    • 84880475213
    • Yu, S., Cai, D., Wen, J. -R., Ma, W. -Y. (2003). Improving pseudo-relevance feedback in web information retrieval using web page segmentation. In Proceedings of the twelfth international world wide web conference (WWW2003).
  • 46
    • 84885587114
    • Zhang, B., Li, H., Liu, Y., Ji, L., Xi, W., Fan, W., Chen, Z., Ma, W. -Y. (2005). Improving web search results using affinity graph. In Proceedings of SIGIR2005.
  • 47
    • 40649121362
    • Zhou, D., Weston, J., Gretton, A., Bousquet, O., Schölkopf, B. (2003). Ranking on data manifolds. In Proceedings of NIPS-2003.
  • 48
    • 40649114574
    • Zhou, D., Bousquet, O., Lal, T. N., Weston, J., Schölkopf, B. (2003). Learning with local and global consistency. In Proceedings of NIPS-2003.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.