2009, Pages 916-927

Top-k set similarity joins

Author keywords

[No Author keywords available]

Indexed keywords

DATA INTEGRATION; NEAR-DUPLICATES; PRIMITIVE OPERATIONS; REAL DATA SETS; SET SIMILARITY; SIMILARITY JOIN; SIMILARITY THRESHOLD; WEB PAGE;

EID: 67649653766     PISSN: 10844627     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/ICDE.2009.111     Document Type: Conference Paper
Times cited: 177

References (36)
  • 1
    • 33750296887
    • M. R. Henzinger, "Finding near-duplicate web pages: a large-scale evaluation of algorithms," in SIGIR, 2006.
  • 2
    • 0032091575
    • W. W. Cohen, "Integration of heterogeneous databases without common domains using queries based on textual similarity," in SIGMOD Conference, 1998, pp. 201-212.
  • 3
    • 67649644273
    • W. E. Winkler, "The state of record linkage and current research problems," U.S. Bureau of the Census, Tech. Rep., 1999.
  • 4
    • 35348849154
    • R. J. Bayardo, Y. Ma, and R. Srikant, "Scaling up all pairs similarity search," in WWW, 2007.
  • 5
    • 33749597967
    • S. Chaudhuri, V. Ganti, and R. Kaushik, "A primitive operator for similarity joins in data cleaning," in ICDE, 2006.
  • 6
    • 67649653665
    • C. Xiao, W. Wang, X. Lin, and J. X. Yu, "Efficient similarity joins for near duplicate detection," in WWW, 2008.
  • 7
    • 85104914015
    • A. Arasu, V. Ganti, and R. Kaushik, "Efficient exact set-similarity joins," in VLDB, 2006.
  • 9
    • 84948989773
    • A. Corral, M. Vassilakopoulos, and Y. Manolopoulos, "The impact of buffering on closest pairs queries using R-trees," in ADBIS, 2001, pp. 41-54.
  • 10
    • 1642398164
    • A. Corral, Y. Manolopoulos, Y. Theodoridis, and M. Vassilakopoulos, "Algorithms for processing k-closest-pair queries in spatial databases," Data Knowl. Eng., vol. 49, no. 1, pp. 67-104, 2004.
  • 12
    • 85011032600
    • C. Li, B. Wang, and X. Yang, "VGRAM: Improving performance of approximate queries on string collections using variable-length grams," in VLDB, 2007.
  • 13
    • 85011072445
    • H. Lee, R. T. Ng, and K. Shim, "Extending q-grams to estimate selectivity of string matching with low edit distance," in VLDB, 2007, pp. 195-206.
  • 14
    • 0013331361
    • M. A. Hernández and S. J. Stolfo, "Real-world data is dirty: Data cleansing and the merge/purge problem," Data Mining and Knowledge Discovery, vol. 2, no. 1, pp. 9-37, 1998.
  • 15
    • 0242456811
    • S. Sarawagi and A. Bhamidipaty, "Interactive deduplication using active learning," in KDD, 2002.
  • 17
    • 3142777876
    • S. Sarawagi and A. Kirpal, "Efficient set joins on similarity predicates," in SIGMOD, 2004.
  • 18
    • 52649137537
    • A. Arasu, S. Chaudhuri, and R. Kaushik, "Transformation-based framework for record matching," in ICDE, 2008, pp. 40-49.
  • 19
    • 52649161208
    • M. D. Lieberman, J. Sankaranarayanan, and H. Samet, "A fast similarity join algorithm using graphics processing units," in ICDE, 2008, pp. 1111-1120.
  • 21
    • 0013206133
    • A. Chowdhury, O. Frieder, D. A. Grossman, and M. C. McCabe, "Collection statistics for fast duplicate document detection," ACM Trans. Inf. Syst., vol. 20, no. 2, pp. 171-191, 2002.
  • 22
    • 0036040277
    • M. Charikar, "Similarity estimation techniques from rounding algorithms," in STOC, 2002.
  • 23
    • 15044355327
    • A. Gionis, P. Indyk, and R. Motwani, "Similarity search in high dimensions via hashing," in VLDB, 1999.
  • 24
    • 1142279457
    • S. Chaudhuri, K. Ganjam, V. Ganti, and R. Motwani, "Robust and efficient fuzzy match for online data cleaning," in SIGMOD Conference, 2003, pp. 313-324.
  • 26
    • 52649086729
    • C. Li, J. Lu, and Y. Lu, "Efficient merging and filtering algorithms for approximate string searches," in ICDE, 2008, pp. 257-266.
  • 27
    • 52649145249
    • M. Hadjieleftheriou, A. Chandel, N. Koudas, and D. Srivastava, "Fast indexes and algorithms for set similarity selection queries," in ICDE, 2008, pp. 267-276.
  • 28
    • 67649639190
    • E. Ukkonen, "On approximate string matching," in FCT, 1983.
  • 29
    • 34547618938
    • A. Z. Broder, "On the resemblance and containment of documents," in SEQS, 1997.
  • 30
    • 67649666566
    • R. C. Russell, "Index," U.S. patent 1,261,167, April 1918.
  • 31
    • 0033075316
    • R. Fagin, "Combining fuzzy information from multiple systems," J. Comput. Syst. Sci., vol. 58, no. 1, pp. 83-99, 1999.
  • 32
    • 0038504811
    • R. Fagin, A. Lotem, and M. Naor, "Optimal aggregation algorithms for middleware," J. Comput. Syst. Sci., vol. 66, no. 4, pp. 614-656, 2003.
  • 34
    • 0036372482
    • K. C.-C. Chang and S.-w. Hwang, "Minimal probing: supporting expensive predicates for top-k queries," in SIGMOD Conference, 2002, pp. 346-357.
  • 35
    • 35448984017
    • Y. Luo, X. Lin, W. Wang, and X. Zhou, "SPARK: top-k keyword query in relational databases," in SIGMOD Conference, 2007, pp. 115-126.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.