Volume 21, Issue 4, 2012, Pages 437-461

Trie-join: A trie-based method for efficient string similarity joins

Author keywords

Data integration and cleaning; Edit distance; String similarity joins; Subtrie pruning; Trie index

Indexed keywords

DATA INTEGRATION; EDIT DISTANCE; STRING SIMILARITY; SUBTRIE PRUNING; TRIE INDEX

EID: 84864286297     PISSN: 10668888     EISSN: 0949877X     Source Type: Journal    
DOI: 10.1007/s00778-011-0252-8     Document Type: Article
Times cited: 60

References (59)
  • 1
    • 84864280321
    • http://secondstring.sourceforge.net/
  • 2
    • 84864278686
    • http://www.dcs.shef.ac.uk/~sam/simmetrics.html
  • 3
    • 77950901996
    • Scalable ad-hoc entity extraction from text collections
    • Agrawal S., Chakrabarti K., Chaudhuri S., Ganti V.: Scalable ad-hoc entity extraction from text collections. PVLDB 1(1), 945-957 (2008).
    • (2008) PVLDB , vol.1 , Issue.1 , pp. 945-957
    • Agrawal, S.1    Chakrabarti, K.2    Chaudhuri, S.3    Ganti, V.4
  • 4
    • 52649137537
    • Transformation-based framework for record matching
    • Arasu, A., Chaudhuri, S., Kaushik, R.: Transformation-based framework for record matching. In: ICDE, pp. 40-49 (2008).
    • (2008) ICDE , pp. 40-49
    • Arasu, A.1    Chaudhuri, S.2    Kaushik, R.3
  • 5
    • 85104914015
    • Efficient exact set-similarity joins
    • Arasu, A., Ganti, V., Kaushik, R.: Efficient exact set-similarity joins. In: VLDB, pp. 918-929 (2006).
    • (2006) VLDB , pp. 918-929
    • Arasu, A.1    Ganti, V.2    Kaushik, R.3
  • 7
    • 35348849154
    • Scaling up all pairs similarity search
    • Bayardo, R. J., Ma, Y., Srikant, R.: Scaling up all pairs similarity search. In: WWW, pp. 131-140 (2007).
    • (2007) WWW , pp. 131-140
    • Bayardo, R.J.1    Ma, Y.2    Srikant, R.3
  • 8
  • 9
    • 72949105984
    • Fast error-tolerant search on very large texts
    • Celikik, M., Bast, H.: Fast error-tolerant search on very large texts. In: SAC, pp. 1724-1731 (2009).
    • (2009) SAC , pp. 1724-1731
    • Celikik, M.1    Bast, H.2
  • 10
  • 11
    • 1142279457
    • Robust and efficient fuzzy match for online data cleaning
    • Chaudhuri, S., Ganjam, K., Ganti, V., Motwani, R.: Robust and efficient fuzzy match for online data cleaning. In: SIGMOD Conference, pp. 313-324 (2003).
    • (2003) SIGMOD Conference , pp. 313-324
    • Chaudhuri, S.1    Ganjam, K.2    Ganti, V.3    Motwani, R.4
  • 12
    • 84859202692
    • Data debugger: An operator-centric approach for data quality solutions
    • Chaudhuri S., Ganti V., Kaushik R.: Data debugger: An operator-centric approach for data quality solutions. IEEE Data Eng. Bull. 29(2), 60-66 (2006).
    • (2006) IEEE Data Eng. Bull. , vol.29 , Issue.2 , pp. 60-66
    • Chaudhuri, S.1    Ganti, V.2    Kaushik, R.3
  • 13
    • 33749597967
    • A primitive operator for similarity joins in data cleaning
    • Chaudhuri, S., Ganti, V., Kaushik, R.: A primitive operator for similarity joins in data cleaning. In: ICDE, pp. 5-16 (2006).
    • (2006) ICDE , pp. 5-16
    • Chaudhuri, S.1    Ganti, V.2    Kaushik, R.3
  • 14
    • 70849103081
    • Extending autocompletion to tolerate errors
    • Chaudhuri, S., Kaushik, R.: Extending autocompletion to tolerate errors. In: SIGMOD Conference, pp. 707-718 (2009).
    • (2009) SIGMOD Conference , pp. 707-718
    • Chaudhuri, S.1    Kaushik, R.2
  • 15
    • 84993661659
    • M-tree: An efficient access method for similarity search in metric spaces
    • Ciaccia, P., Patella, M., Zezula, P.: M-tree: An efficient access method for similarity search in metric spaces. In: VLDB, pp. 426-435 (1997).
    • (1997) VLDB , pp. 426-435
    • Ciaccia, P.1    Patella, M.2    Zezula, P.3
  • 16
    • 4544388794
    • Dictionary matching and indexing with errors and don't cares
    • Cole, R., Gottlieb, L.-A., Lewenstein, M.: Dictionary matching and indexing with errors and don't cares. In: STOC, pp. 91-100 (2004).
    • (2004) STOC , pp. 91-100
    • Cole, R.1    Gottlieb, L.-A.2    Lewenstein, M.3
  • 17
    • 84945709825
    • Trie memory
    • Fredkin E.: Trie memory. Commun. ACM 3(9), 490-499 (1960).
    • (1960) Commun. ACM , vol.3 , Issue.9 , pp. 490-499
    • Fredkin, E.1
  • 20
    • 0344496626
    • Index-based approximate xml joins
    • Guha, S., Koudas, N., Srivastava, D., Yu, T.: Index-based approximate xml joins. In: ICDE, pp. 708-710 (2003).
    • (2003) ICDE , pp. 708-710
    • Guha, S.1    Koudas, N.2    Srivastava, D.3    Yu, T.4
  • 21
    • 52649145249
    • Fast indexes and algorithms for set similarity selection queries
    • Hadjieleftheriou, M., Chandel, A., Koudas, N., Srivastava, D.: Fast indexes and algorithms for set similarity selection queries. In: ICDE, pp. 267-276 (2008).
    • (2008) ICDE , pp. 267-276
    • Hadjieleftheriou, M.1    Chandel, A.2    Koudas, N.3    Srivastava, D.4
  • 22
    • 70849096574
    • Incremental maintenance of length normalized indexes for approximate string matching
    • Hadjieleftheriou, M., Koudas, N., Srivastava, D.: Incremental maintenance of length normalized indexes for approximate string matching. In: SIGMOD Conference, pp. 429-440 (2009).
    • (2009) SIGMOD Conference , pp. 429-440
    • Hadjieleftheriou, M.1    Koudas, N.2    Srivastava, D.3
  • 24
    • 70349659026
    • Hashed samples: selectivity estimators for set similarity selection queries
    • Hadjieleftheriou M., Yu X., Koudas N., Srivastava D.: Hashed samples: selectivity estimators for set similarity selection queries. PVLDB 1(1), 201-212 (2008).
    • (2008) PVLDB , vol.1 , Issue.1 , pp. 201-212
    • Hadjieleftheriou, M.1    Yu, X.2    Koudas, N.3    Srivastava, D.4
  • 25
    • 0038564328
    • Burst tries: a fast, efficient data structure for string keys
    • Heinz S., Zobel J., Williams H. E.: Burst tries: a fast, efficient data structure for string keys. ACM Trans. Inf. Syst. 20(2), 192-223 (2002).
    • (2002) ACM Trans. Inf. Syst. , vol.20 , Issue.2 , pp. 192-223
    • Heinz, S.1    Zobel, J.2    Williams, H.E.3
  • 27
    • 77954747849
    • Probabilistic string similarity joins
    • Jestes, J., Li, F., Yan, Z., Yi, K.: Probabilistic string similarity joins. In: SIGMOD Conference, pp. 327-338 (2010).
    • (2010) SIGMOD Conference , pp. 327-338
    • Jestes, J.1    Li, F.2    Yan, Z.3    Yi, K.4
  • 28
    • 84865633750
    • Efficient interactive fuzzy keyword search
    • Ji, S., Li, G., Li, C., Feng, J.: Efficient interactive fuzzy keyword search. In: WWW, pp. 433-439 (2009).
    • (2009) WWW , pp. 433-439
    • Ji, S.1    Li, G.2    Li, C.3    Feng, J.4
  • 29
    • 84944324113
    • Efficient index structures for string databases
    • Kahveci, T., Singh, A. K.: Efficient index structures for string databases. In: VLDB, pp. 351-360 (2001).
    • (2001) VLDB , pp. 351-360
    • Kahveci, T.1    Singh, A.K.2
  • 30
    • 33745621089
    • n-Gram/2L: A space and time efficient two-level n-gram inverted index structure
    • Kim, M.-S., Whang, K.-Y., Lee, J.-G., Lee, M.-J.: n-Gram/2L: A space and time efficient two-level n-gram inverted index structure. In: VLDB, pp. 325-336 (2005).
    • (2005) VLDB , pp. 325-336
    • Kim, M.-S.1    Whang, K.-Y.2    Lee, J.-G.3    Lee, M.-J.4
  • 32
    • 85011072445
    • Extending q-grams to estimate selectivity of string matching with low edit distance
    • Lee, H., Ng, R. T., Shim, K.: Extending q-grams to estimate selectivity of string matching with low edit distance. In: VLDB, pp. 195-206 (2007).
    • (2007) VLDB , pp. 195-206
    • Lee, H.1    Ng, R.T.2    Shim, K.3
  • 33
    • 77957718350
    • Power-law based estimation of set similarity join size
    • Lee H., Ng R. T., Shim K.: Power-law based estimation of set similarity join size. PVLDB 2(1), 658-669 (2009).
    • (2009) PVLDB , vol.2 , Issue.1 , pp. 658-669
    • Lee, H.1    Ng, R.T.2    Shim, K.3
  • 34
    • 52649086729
    • Efficient merging and filtering algorithms for approximate string searches
    • Li, C., Lu, J., Lu, Y.: Efficient merging and filtering algorithms for approximate string searches. In: ICDE, pp. 257-266 (2008).
    • (2008) ICDE , pp. 257-266
    • Li, C.1    Lu, J.2    Lu, Y.3
  • 35
    • 85011032600
    • VGRAM: Improving performance of approximate queries on string collections using variable-length grams
    • Li, C., Wang, B., Yang, X.: VGRAM: Improving performance of approximate queries on string collections using variable-length grams. In: VLDB, pp. 303-314 (2007).
    • (2007) VLDB , pp. 303-314
    • Li, C.1    Wang, B.2    Yang, X.3
  • 36
    • 79959922359
    • Faerie: Efficient filtering algorithms for approximate dictionary-based entity extraction
    • Li, G., Deng, D., Feng, J.: Faerie: efficient filtering algorithms for approximate dictionary-based entity extraction. In: SIGMOD Conference, pp. 529-540 (2011).
    • (2011) SIGMOD Conference , pp. 529-540
    • Li, G.1    Deng, D.2    Feng, J.3
  • 37
    • 79960467518
    • Efficient fuzzy full-text type-ahead search
    • Li G., Ji S., Li C., Feng J.: Efficient fuzzy full-text type-ahead search. VLDB J. 20(4), 617-640 (2011).
    • (2011) VLDB J. , vol.20 , Issue.4 , pp. 617-640
    • Li, G.1    Ji, S.2    Li, C.3    Feng, J.4
  • 38
    • 84859260100
    • Set similarity join on probabilistic data
    • Lian X., Chen L.: Set similarity join on probabilistic data. PVLDB 3(1), 650-659 (2010).
    • (2010) PVLDB , vol.3 , Issue.1 , pp. 650-659
    • Lian, X.1    Chen, L.2
  • 39
    • 74549168398
    • Efficient algorithms for approximate member extraction using signature-based inverted lists
    • Lu, J., Han, J., Meng, X.: Efficient algorithms for approximate member extraction using signature-based inverted lists. In: CIKM, pp. 315-324 (2009).
    • (2009) CIKM , pp. 315-324
    • Lu, J.1    Han, J.2    Meng, X.3
  • 40
    • 38149018071
    • Patricia: practical algorithm to retrieve information coded in alphanumeric
    • Morrison D. R.: Patricia: practical algorithm to retrieve information coded in alphanumeric. J. ACM 15, 514-534 (1968).
    • (1968) J. ACM , vol.15 , pp. 514-534
    • Morrison, D.R.1
  • 41
    • 0345566149
    • A guided tour to approximate string matching
    • Navarro G.: A guided tour to approximate string matching. ACM Comput. Surv. 33(1), 31-88 (2001).
    • (2001) ACM Comput. Surv. , vol.33 , Issue.1 , pp. 31-88
    • Navarro, G.1
  • 43
    • 84976659272
    • Computer programs for detecting and correcting spelling errors
    • Peterson J. L.: Computer programs for detecting and correcting spelling errors. Commun. ACM 23(12), 676-687 (1980).
    • (1980) Commun. ACM , vol.23 , Issue.12 , pp. 676-687
    • Peterson, J.L.1
  • 44
    • 84864279560
    • Available at
    • Russell, R. C.: Available at http://patft.uspto.gov/netacgi/nph-Parser?patentnumber=1261167 (1918).
    • (1918)
    • Russell, R.C.1
  • 45
    • 0344065611
    • Distance based indexing for string proximity search
    • Sahinalp, S. C., Tasan, M., Macker, J., Özsoyoglu, Z. M.: Distance based indexing for string proximity search. In: ICDE, pp. 125-136 (2003).
    • (2003) ICDE , pp. 125-136
    • Sahinalp, S.C.1    Tasan, M.2    Macker, J.3    Özsoyoglu, Z.M.4
  • 46
    • 0017930815
    • Dynamic programming algorithm optimization for spoken word recognition
    • Sakoe H., Chiba S.: Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans. Acoust Speech Signal Process 26, 43-49 (1978).
    • (1978) IEEE Trans. Acoust Speech Signal Process , vol.26 , pp. 43-49
    • Sakoe, H.1    Chiba, S.2
  • 48
    • 3142777876
    • Efficient set joins on similarity predicates
    • Sarawagi, S., Kirpal, A.: Efficient set joins on similarity predicates. In: SIGMOD Conference, pp. 743-754 (2004).
    • (2004) SIGMOD Conference , pp. 743-754
    • Sarawagi, S.1    Kirpal, A.2
  • 49
    • 84893371228
    • Fast string correction with levenshtein automata
    • Schulz K. U., Mihov S.: Fast string correction with levenshtein automata. Intl J Doc Anal Recognit 5(1), 67-85 (2002).
    • (2002) Intl J Doc Anal Recognit , vol.5 , Issue.1 , pp. 67-85
    • Schulz, K.U.1    Mihov, S.2
  • 50
    • 0005079936
    • Use of tree structures for processing files
    • Sussenguth E. H.: Use of tree structures for processing files. Commun. ACM 6, 272-279 (1963).
    • (1963) Commun. ACM , vol.6 , pp. 272-279
    • Sussenguth, E.H.1
  • 51
    • 77954744650
    • Efficient parallel set-similarity joins using mapreduce
    • Vernica, R., Carey, M. J., Li, C.: Efficient parallel set-similarity joins using mapreduce. In: SIGMOD Conference, pp. 495-506 (2010).
    • (2010) SIGMOD Conference , pp. 495-506
    • Vernica, R.1    Carey, M.J.2    Li, C.3
  • 52
    • 79957822983
    • Trie-join: Efficient trie-based string similarity joins with edit-distance constraints
    • Wang J., Li G., Feng J.: Trie-join: Efficient trie-based string similarity joins with edit-distance constraints. PVLDB 3(1), 1219-1230 (2010).
    • (2010) PVLDB , vol.3 , Issue.1 , pp. 1219-1230
    • Wang, J.1    Li, G.2    Feng, J.3
  • 53
    • 79957824788
    • Fast-join: An efficient method for fuzzy token matching based string similarity join
    • Wang, J., Li, G., Feng, J.: Fast-join: An efficient method for fuzzy token matching based string similarity join. In: ICDE, pp. 458-469 (2011).
    • (2011) ICDE , pp. 458-469
    • Wang, J.1    Li, G.2    Feng, J.3
  • 54
    • 84863541462
    • Entity matching: how similar is similar
    • Wang J., Li G., Yu J. X., Feng J.: Entity matching: how similar is similar. PVLDB 4(10), 622-633 (2011).
    • (2011) PVLDB , vol.4 , Issue.10 , pp. 622-633
    • Wang, J.1    Li, G.2    Yu, J.X.3    Feng, J.4
  • 55
    • 70849115286
    • Efficient approximate entity extraction with edit distance constraints
    • Wang, W., Xiao, C., Lin, X., Zhang, C.: Efficient approximate entity extraction with edit distance constraints. In: SIGMOD Conference, pp. 759-770 (2009).
    • (2009) SIGMOD Conference , pp. 759-770
    • Wang, W.1    Xiao, C.2    Lin, X.3    Zhang, C.4
  • 56
    • 70849105253
    • Ed-join: an efficient algorithm for similarity joins with edit distance constraints
    • Xiao C., Wang W., Lin X.: Ed-join: an efficient algorithm for similarity joins with edit distance constraints. PVLDB 1(1), 933-944 (2008).
    • (2008) PVLDB , vol.1 , Issue.1 , pp. 933-944
    • Xiao, C.1    Wang, W.2    Lin, X.3
  • 57
    • 67649653766
    • Top-k set similarity joins
    • Xiao, C., Wang, W., Lin, X., Shang, H.: Top-k set similarity joins. In: ICDE, pp. 916-927 (2009).
    • (2009) ICDE , pp. 916-927
    • Xiao, C.1    Wang, W.2    Lin, X.3    Shang, H.4
  • 58
    • 57349141410
    • Efficient similarity joins for near duplicate detection
    • Xiao, C., Wang, W., Lin, X., Yu, J. X.: Efficient similarity joins for near duplicate detection. In: WWW, pp. 131-140 (2008).
    • (2008) WWW , pp. 131-140
    • Xiao, C.1    Wang, W.2    Lin, X.3    Yu, J.X.4
  • 59
    • 57149130672
    • Cost-based variable-length-gram selection for string collections to support approximate queries efficiently
    • Yang, X., Wang, B., Li, C.: Cost-based variable-length-gram selection for string collections to support approximate queries efficiently. In: SIGMOD Conference, pp. 353-364 (2008).
    • (2008) SIGMOD Conference , pp. 353-364
    • Yang, X.1    Wang, B.2    Li, C.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.