2008, Pages 39-46

A comparison of techniques for estimating IDF values to generate lexical signatures for the web

Author keywords

Inverse document frequency; Lexical signature

Indexed keywords

COMPARISON STUDY; DATA SETS; DOCUMENT FREQUENCY; ESTIMATION METHODS; ESTIMATION TECHNIQUES; FUTURE APPLICATIONS; INTERNET ARCHIVE; INVERSE DOCUMENT FREQUENCY; N-GRAMS; RANK CORRELATION; REAL TIME; TERM FREQUENCY; TEXTUAL CONTENT; WEB INTERFACE; WEB PAGE

EID: 77951106747     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/1458502.1458510     Document Type: Conference Paper
Times cited: 6

References (24)
  • 1
    • L. A. Adamic and B. A. Huberman. Zipf's Law and the Internet. Glottometrics, 3:143-150, 2002.
  • 2
    • J. Bar-Ilan, M. Mat-Hassan, and M. Levene. Methods for Comparing Rankings of Search Engine Results. Computer Networks, 50(10):1448-1463, 2006.
  • 5
    • A. Franz and T. Brants. All Our N-gram are Belong to You. http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html.
  • 7
    • D. Hawking. Overview of the TREC-9 Web Track. In NIST Special Publication 500-249: TREC-9, pages 87-102, 2001.
  • 8
    • P. G. Ipeirotis and L. Gravano. Distributed search over the hidden web: Hierarchical database sampling and selection. In Proceedings of VLDB '02, pages 394-405. VLDB Endowment, 2002.
  • 9
    • F. Keller and M. Lapata. Using the Web to Obtain Frequencies for Unseen Bigrams. Computational Linguistics, 29(3):459-484, 2003.
  • 10
    • M. Klein and M. L. Nelson. Approximating Document Frequency with Term Count Values. Technical Report arXiv:0807.3755, Old Dominion University, 2008.
  • 11
    • M. Klein and M. L. Nelson. Revisiting Lexical Signatures to (Re-)Discover Web Pages. In Proceedings of ECDL '08, 2008.
  • 12
    • A. Kolcz, A. Chowdhury, and J. Alspector. Improved Robustness of Signature-Based Near-Replica Detection via Lexicon Randomization. In Proceedings of KDD '04, pages 605-610, 2004.
  • 14
    • Y. Ling, X. Meng, and W. Meng. Automated extraction of hit numbers from search result pages. In Proceedings of WAIM '06, pages 73-84, 2006.
  • 15
    • F. McCown and M. L. Nelson. Agreeing to Disagree: Search Engines and their Public Interfaces. In Proceedings of JCDL '07, pages 309-318, 2007.
  • 16
    • F. McCown, J. A. Smith, and M. L. Nelson. Lazy Preservation: Reconstructing Websites by Crawling the Crawlers. In Proceedings of WIDM '06, pages 67-74, 2006.
  • 17
    • P. Nakov and M. Hearst. A Study of Using Search Engine Page Hits as a Proxy for n-gram Frequencies. In Proceedings of RANLP '05, 2005.
  • 18
    • S.-T. Park, D. M. Pennock, C. L. Giles, and R. Krovetz. Analysis of Lexical Signatures for Improving Information Persistence on the World Wide Web. ACM Transactions on Information Systems, 22(4):540-572, 2004.
  • 21
    • K. Sugiyama, K. Hatano, M. Yoshikawa, and S. Uemura. Refinement of TF-IDF Schemes for Web Pages using their Hyperlinked Neighboring Pages. In Proceedings of HYPERTEXT '03, pages 198-207, 2003.
  • 23
    • X. Wan and J. Yang. Wordrank-based Lexical Signatures for Finding Lost or Related Web Pages. In APWeb, pages 843-849, 2006.
  • 24
    • X. Zhu and R. Rosenfeld. Improving Trigram Language Modeling with the World Wide Web. In Proceedings of ICASSP '01, pages 533-536, 2001.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.