SCOPUS Information Search Platform - Article View


Volume 21, Issue 4, 2012, Pages 437-461

Trie-join: A trie-based method for efficient string similarity joins

Authors (3): Feng, Jianhua (a); Wang, Jiannan (a); Li, Guoliang (a)

(a) Tsinghua University (China)

Author keywords

Data integration and cleaning; Edit distance; String similarity joins; Subtrie pruning; Trie index

Indexed keywords

DATA INTEGRATION; EDIT DISTANCE; STRING SIMILARITY; SUBTRIE PRUNING; TRIE INDEX; HARDWARE; INFORMATION SYSTEMS; ALGORITHMS

EID: 84864286297; Print ISSN: 1066-8888; Electronic ISSN: 0949-877X; Source type: Journal
DOI: 10.1007/s00778-011-0252-8; Document type: Article

Times cited: 60

References (59)

1. http://secondstring.sourceforge.net/ [EID: 84864280321]

2. http://www.dcs.shef.ac.uk/~sam/simmetrics.html [EID: 84864278686]

3. Agrawal, S., Chakrabarti, K., Chaudhuri, S., Ganti, V.: Scalable ad-hoc entity extraction from text collections. PVLDB 1(1), 945-957 (2008). [EID: 77950901996]

4. Arasu, A., Chaudhuri, S., Kaushik, R.: Transformation-based framework for record matching. In: ICDE, pp. 40-49 (2008). [EID: 52649137537]

5. Arasu, A., Ganti, V., Kaushik, R.: Efficient exact set-similarity joins. In: VLDB, pp. 918-929 (2006). [EID: 85104914015]

6. Augsten, N., Böhlen, M.H., Dyreson, C.E., Gamper, J.: Approximate joins for data-centric XML. In: ICDE, pp. 814-823 (2008). [EID: 52649127789]

7. Bayardo, R.J., Ma, Y., Srikant, R.: Scaling up all pairs similarity search. In: WWW, pp. 131-140 (2007). [EID: 35348849154]

8. Bryan, B., Eberhardt, F., Faloutsos, C.: Compact similarity joins. In: ICDE, pp. 346-355 (2008). [EID: 52649109639]

9. Celikik, M., Bast, H.: Fast error-tolerant search on very large texts. In: SAC, pp. 1724-1731 (2009). [EID: 72949105984]

10. Chakrabarti, K., Chaudhuri, S., Ganti, V., Xin, D.: An efficient filter for approximate membership checking. In: SIGMOD Conference, pp. 805-818 (2008). [EID: 57149127665]

11. Chaudhuri, S., Ganjam, K., Ganti, V., Motwani, R.: Robust and efficient fuzzy match for online data cleaning. In: SIGMOD Conference, pp. 313-324 (2003). [EID: 1142279457]

12. Chaudhuri, S., Ganti, V., Kaushik, R.: Data debugger: An operator-centric approach for data quality solutions. IEEE Data Eng. Bull. 29(2), 60-66 (2006). [EID: 84859202692]

13. Chaudhuri, S., Ganti, V., Kaushik, R.: A primitive operator for similarity joins in data cleaning. In: ICDE, pp. 5-16 (2006). [EID: 33749597967]

14. Chaudhuri, S., Kaushik, R.: Extending autocompletion to tolerate errors. In: SIGMOD Conference, pp. 707-718 (2009). [EID: 70849103081]

15. Ciaccia, P., Patella, M., Zezula, P.: M-tree: An efficient access method for similarity search in metric spaces. In: VLDB, pp. 426-435 (1997). [EID: 84993661659]

16. Cole, R., Gottlieb, L.-A., Lewenstein, M.: Dictionary matching and indexing with errors and don't cares. In: STOC, pp. 91-100 (2004). [EID: 4544388794]

17. Fredkin, E.: Trie memory. Commun. ACM 3(9), 490-499 (1960). [EID: 84945709825]

18. Gonnet, G.H.: Handbook of Algorithms and Data Structures. Addison-Wesley, Reading (1984). [EID: 0003915740]

19. Gravano, L., Ipeirotis, P.G., Jagadish, H.V., Koudas, N., Muthukrishnan, S., Srivastava, D.: Approximate string joins in a database (almost) for free. In: VLDB, pp. 491-500 (2001). [EID: 84944318804]

20. Guha, S., Koudas, N., Srivastava, D., Yu, T.: Index-based approximate XML joins. In: ICDE, pp. 708-710 (2003). [EID: 0344496626]

21. Hadjieleftheriou, M., Chandel, A., Koudas, N., Srivastava, D.: Fast indexes and algorithms for set similarity selection queries. In: ICDE, pp. 267-276 (2008). [EID: 52649145249]

22. Hadjieleftheriou, M., Koudas, N., Srivastava, D.: Incremental maintenance of length normalized indexes for approximate string matching. In: SIGMOD Conference, pp. 429-440 (2009). [EID: 70849096574]

23. Hadjieleftheriou, M., Srivastava, D.: Weighted set-based string similarity. IEEE Data Eng. Bull. 33(1), 25-36 (2010). [EID: 80053953777]

24. Hadjieleftheriou, M., Yu, X., Koudas, N., Srivastava, D.: Hashed samples: selectivity estimators for set similarity selection queries. PVLDB 1(1), 201-212 (2008). [EID: 70349659026]

25. Heinz, S., Zobel, J., Williams, H.E.: Burst tries: a fast, efficient data structure for string keys. ACM Trans. Inf. Syst. 20(2), 192-223 (2002). [EID: 0038564328]

26. Jaro, M.A.: Unimatch: A record linkage system: User's manual. Technical report, U.S. Bureau of the Census, Washington, D.C. (1976). [EID: 0004037050]

27. Jestes, J., Li, F., Yan, Z., Yi, K.: Probabilistic string similarity joins. In: SIGMOD Conference, pp. 327-338 (2010). [EID: 77954747849]

28. Ji, S., Li, G., Li, C., Feng, J.: Efficient interactive fuzzy keyword search. In: WWW, pp. 433-439 (2009). [EID: 84865633750]

29. Kahveci, T., Singh, A.K.: Efficient index structures for string databases. In: VLDB, pp. 351-360 (2001). [EID: 84944324113]

30. Kim, M.-S., Whang, K.-Y., Lee, J.-G., Lee, M.-J.: n-Gram/2L: A space and time efficient two-level n-gram inverted index structure. In: VLDB, pp. 325-336 (2005). [EID: 33745621089]

31. Knuth, D.E.: The Art of Computer Programming, Volume 1: Fundamental Algorithms. Addison-Wesley, Reading (1968). [EID: 0003657590]

32. Lee, H., Ng, R.T., Shim, K.: Extending q-grams to estimate selectivity of string matching with low edit distance. In: VLDB, pp. 195-206 (2007). [EID: 85011072445]

33. Lee, H., Ng, R.T., Shim, K.: Power-law based estimation of set similarity join size. PVLDB 2(1), 658-669 (2009). [EID: 77957718350]

34. Li, C., Lu, J., Lu, Y.: Efficient merging and filtering algorithms for approximate string searches. In: ICDE, pp. 257-266 (2008). [EID: 52649086729]

35. Li, C., Wang, B., Yang, X.: VGRAM: Improving performance of approximate queries on string collections using variable-length grams. In: VLDB, pp. 303-314 (2007). [EID: 85011032600]

36. Li, G., Deng, D., Feng, J.: Faerie: Efficient filtering algorithms for approximate dictionary-based entity extraction. In: SIGMOD Conference, pp. 529-540 (2011). [EID: 79959922359]

37. Li, G., Ji, S., Li, C., Feng, J.: Efficient fuzzy full-text type-ahead search. VLDB J. 20(4), 617-640 (2011). [EID: 79960467518]

38. Lian, X., Chen, L.: Set similarity join on probabilistic data. PVLDB 3(1), 650-659 (2010). [EID: 84859260100]

39. Lu, J., Han, J., Meng, X.: Efficient algorithms for approximate member extraction using signature-based inverted lists. In: CIKM, pp. 315-324 (2009). [EID: 74549168398]

40. Morrison, D.R.: PATRICIA: Practical algorithm to retrieve information coded in alphanumeric. J. ACM 15, 514-534 (1968). [EID: 38149018071]

41. Navarro, G.: A guided tour to approximate string matching. ACM Comput. Surv. 33(1), 31-88 (2001). [EID: 0345566149]

42. Nilsson, S., Karlsson, G.: IP-address lookup using LC-tries. IEEE J. Selected Areas Commun. 17, 1083-1092 (1999). [EID: 0032656493]

43. Peterson, J.L.: Computer programs for detecting and correcting spelling errors. Commun. ACM 23(12), 676-687 (1980). [EID: 84976659272]

44. Russell, R.C.: Available at http://patft.uspto.gov/netacgi/nph-Parser?patentnumber=1261167 (1918). [EID: 84864279560]

45. Sahinalp, S.C., Tasan, M., Macker, J., Özsoyoglu, Z.M.: Distance based indexing for string proximity search. In: ICDE, pp. 125-136 (2003). [EID: 0344065611]

46. Sakoe, H., Chiba, S.: Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans. Acoust. Speech Signal Process. 26, 43-49 (1978). [EID: 0017930815]

47. Salton, G.: Introduction to Modern Information Retrieval. McGraw Hill, NY (1987). [EID: 0003653039]

48. Sarawagi, S., Kirpal, A.: Efficient set joins on similarity predicates. In: SIGMOD Conference, pp. 743-754 (2004). [EID: 3142777876]

49. Schulz, K.U., Mihov, S.: Fast string correction with Levenshtein automata. Int. J. Doc. Anal. Recognit. 5(1), 67-85 (2002). [EID: 84893371228]

50. Sussenguth, E.H.: Use of tree structures for processing files. Commun. ACM 6, 272-279 (1963). [EID: 0005079936]

51. Vernica, R., Carey, M.J., Li, C.: Efficient parallel set-similarity joins using MapReduce. In: SIGMOD Conference, pp. 495-506 (2010). [EID: 77954744650]

52. Wang, J., Li, G., Feng, J.: Trie-join: Efficient trie-based string similarity joins with edit-distance constraints. PVLDB 3(1), 1219-1230 (2010). [EID: 79957822983]

53. Wang, J., Li, G., Feng, J.: Fast-join: An efficient method for fuzzy token matching based string similarity join. In: ICDE, pp. 458-469 (2011). [EID: 79957824788]

54. Wang, J., Li, G., Yu, J.X., Feng, J.: Entity matching: how similar is similar. PVLDB 4(10), 622-633 (2011). [EID: 84863541462]

55. Wang, W., Xiao, C., Lin, X., Zhang, C.: Efficient approximate entity extraction with edit distance constraints. In: SIGMOD Conference, pp. 759-770 (2009). [EID: 70849115286]

56. Xiao, C., Wang, W., Lin, X.: Ed-join: an efficient algorithm for similarity joins with edit distance constraints. PVLDB 1(1), 933-944 (2008). [EID: 70849105253]

57. Xiao, C., Wang, W., Lin, X., Shang, H.: Top-k set similarity joins. In: ICDE, pp. 916-927 (2009). [EID: 67649653766]

58. Xiao, C., Wang, W., Lin, X., Yu, J.X.: Efficient similarity joins for near duplicate detection. In: WWW, pp. 131-140 (2008). [EID: 57349141410]

59. Yang, X., Wang, B., Li, C.: Cost-based variable-length-gram selection for string collections to support approximate queries efficiently. In: SIGMOD Conference, pp. 353-364 (2008). [EID: 57149130672]

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.