Volume 45, Issue 3, 2008, Pages 255-276

Lexicon randomization for near-duplicate detection with I-Match

Author keywords

Information retrieval efficiency; Spam detection

Indexed keywords

DECISION SUPPORT SYSTEMS; INFORMATION MANAGEMENT; LEARNING SYSTEMS; SEARCH ENGINES

EID: 49149127990     PISSN: 09208542     EISSN: 15730484     Source Type: Journal    
DOI: 10.1007/s11227-007-0171-z     Document Type: Article
Times cited: 15

References (38)
  • 2
    • 0042766369
    • Engineering a multi-purpose test collection for web retrieval experiments
    • Bailey P, Craswell N, Hawking D (2003) Engineering a multi-purpose test collection for web retrieval experiments. Inf Process Manag 39:853-871
    • (2003) Inf Process Manag, vol. 39, pp. 853-871
    • Bailey, P.; Craswell, N.; Hawking, D.
  • 4
    • 0030211964
    • Bagging predictors
    • Breiman L (1996) Bagging predictors. Mach Learn 24:123-140
    • (1996) Mach Learn, vol. 24, pp. 123-140
    • Breiman, L.
  • 8
    • 49149102595
    • The SMART/Empire TIPSTER IR system
    • Morgan Kaufmann
    • Buckley C, Cardie C, Mardis S, Mitra M, Pierce D, Wagstaff K, Walz J (2000) The SMART/Empire TIPSTER IR system. In: TIPSTER phase III proceedings. Morgan Kaufmann
    • (2000) TIPSTER Phase III Proceedings
  • 9
    • 0013206133
    • Collection statistics for fast duplicate document detection
    • Chowdhury A, Frieder O, Grossman DA, McCabe MC (2002) Collection statistics for fast duplicate document detection. ACM Trans Inf Syst 20(2):171-191
    • (2002) ACM Trans Inf Syst, vol. 20, no. 2, pp. 171-191
    • Chowdhury, A.; Frieder, O.; Grossman, D.A.
  • 10
    • 12244271239
    • Online duplicate document detection: Signature reliability in a dynamic retrieval environment
    • Conrad J, Guo X, Schriber C (2003) Online duplicate document detection: signature reliability in a dynamic retrieval environment. In: CIKM, pp 443-452
    • (2003) CIKM, pp. 443-452
    • Conrad, J.; Guo, X.; Schriber, C.
  • 12
    • 0032594950
    • Support vector machines for spam categorization
    • Drucker H, Wu D, Vapnik V (1999) Support vector machines for spam categorization. IEEE Trans Neur Netw 10(5):1048-1054
    • (1999) IEEE Trans Neur Netw, vol. 10, no. 5, pp. 1048-1054
    • Drucker, H.; Wu, D.; Vapnik, V.
  • 13
    • 12244261390
    • "In vivo" spam filtering: A challenge problem for data mining
    • Fawcett T (2003) "In vivo" spam filtering: a challenge problem for data mining. KDD Explor 5(2):203-231
    • (2003) KDD Explor, vol. 5, no. 2, pp. 203-231
    • Fawcett, T.
  • 19
    • 0003258556
    • Overview of the TREC-9 web track
    • Hawking D (2000) Overview of the TREC-9 web track. In: TREC-9 NIST
    • (2000) TREC-9 NIST
    • Hawking, D.
  • 23
    • 12244282235
    • Methods for identifying versioned and plagiarised documents
    • Hoad TC, Zobel J (2002) Methods for identifying versioned and plagiarised documents. J Am Soc Inf Sci Technol
    • (2002) J Am Soc Inf Sci Technol
    • Hoad, T.C.; Zobel, J.
  • 33
    • 0016572913
    • A vector-space model for information retrieval
    • Salton G, Yang C, Wong A (1975) A vector-space model for information retrieval. Commun ACM 18
    • (1975) Commun ACM, vol. 18
    • Salton, G.; Yang, C.; Wong, A.
  • 34
    • 0013254819
    • Technical report TR-1997-5, Department of Computing Science, University of Glasgow
    • Sanderson M (1997) Duplicate detection in the Reuters collection. Technical report TR-1997-5, Department of Computing Science, University of Glasgow
    • (1997) Duplicate Detection in the Reuters Collection
    • Sanderson, M.
  • 37
    • 0036989581
    • Does WT10g look like the web?
    • Soboroff I (2002) Does WT10g look like the web? In: SIGIR 2002, pp 423-424
    • (2002) SIGIR 2002, pp. 423-424
    • Soboroff, I.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.