SCOPUS Information Retrieval Platform

Proceedings of the VLDB Endowment

Volume 1, Issue 1, 2008, Pages 933-944

Edjoin: An efficient algorithm for similarity joins with edit distance constraints

(3) Xiao, Chuan (a); Wang, Wei (a); Lin, Xuemin (a)

(a) University of New South Wales (Australia)

Author keywords

[No Author keywords available]

Indexed keywords

PATTERN RECOGNITION;

APPLICATION AREA; COMPUTATION TIME; FILTERING METHOD; FUNDAMENTAL OPERATIONS; PARAMETER SETTING; RESEARCH COMMUNITIES; SIMILARITY JOIN; SUBSTANTIAL REDUCTION;

DATA INTEGRATION;

EID: 70849105253
PISSN: None
EISSN: 21508097
Source Type: Conference Proceeding
DOI: 10.14778/1453856.1453957
Document Type: Article

Times cited: 272

References (35)

1
- 0038754128
- Lower bounds for embedding edit distance into normed spaces
- A. Andoni, M. Deza, A. Gupta, P. Indyk, and S. Raskhodnikova. Lower bounds for embedding edit distance into normed spaces. In SODA, pages 523-526, 2003.
- (2003) SODA , pp. 523-526
- Andoni, A.¹ Deza, M.² Gupta, A.³ Indyk, P.⁴ Raskhodnikova, S.⁵

2
- 85104914015
- Efficient exact set-similarity joins
- A. Arasu, V. Ganti, and R. Kaushik. Efficient exact set-similarity joins. In VLDB, 2006.
- (2006) VLDB
- Arasu, A.¹ Ganti, V.² Kaushik, R.³

3
- 35348849154
- Scaling up all pairs similarity search
- R. J. Bayardo, Y. Ma, and R. Srikant. Scaling up all pairs similarity search. In WWW, 2007.
- (2007) WWW
- Bayardo, R.J.¹ Ma, Y.² Srikant, R.³

4
- 2342447399
- Adaptive name matching in information integration
- M. Bilenko, R. J. Mooney, W. W. Cohen, P. Ravikumar, and S. E. Fienberg. Adaptive name matching in information integration. IEEE Intelligent Sys., 18(5):16-23, 2003.
- (2003) IEEE Intelligent Sys , vol.18 , Issue.5 , pp. 16-23
- Bilenko, M.¹ Mooney, R.J.² Cohen, W.W.³ Ravikumar, P.⁴ Fienberg, S.E.⁵

5
- 0034831593
- Epsilon grid order: An algorithm for the similarity join on massive high-dimensional data
- C. Böhm, B. Braunmüller, F. Krebs, and H.-P. Kriegel. Epsilon grid order: An algorithm for the similarity join on massive high-dimensional data. In SIGMOD, pages 379-388, 2001.
- (2001) SIGMOD , pp. 379-388
- Böhm, C.¹ Braunmüller, B.² Krebs, F.³ Kriegel, H.-P.⁴

6
- 0010362121
- Syntactic clustering of the web
- A. Z. Broder, S. C. Glassman, M. S. Manasse, and G. Zweig. Syntactic clustering of the web. Computer Networks, 29(8-13):1157-1166, 1997.
- (1997) Computer Networks , vol.29 , Issue.8-13 , pp. 1157-1166
- Broder, A.Z.¹ Glassman, S.C.² Manasse, M.S.³ Zweig, G.⁴

7
- 45249103790
- One-gapped q-gram filters for Levenshtein distance
- S. Burkhardt and J. Kärkkäinen. One-gapped q-gram filters for Levenshtein distance. In CPM, pages 225-234, 2002.
- (2002) CPM , pp. 225-234
- Burkhardt, S.¹ Kärkkäinen, J.²

8
- 35448984015
- Benchmarking declarative approximate selection predicates
- A. Chandel, O. Hassanzadeh, N. Koudas, M. Sadoghi, and D. Srivastava. Benchmarking declarative approximate selection predicates. In SIGMOD, pages 353-364, 2007.
- (2007) SIGMOD , pp. 353-364
- Chandel, A.¹ Hassanzadeh, O.² Koudas, N.³ Sadoghi, M.⁴ Srivastava, D.⁵

9
- 0036040277
- Similarity estimation techniques from rounding algorithms
- M. Charikar. Similarity estimation techniques from rounding algorithms. In STOC, 2002.
- (2002) STOC
- Charikar, M.¹

10
- 85011029434
- Example-driven design of efficient record matching queries
- S. Chaudhuri, B.-C. Chen, V. Ganti, and R. Kaushik. Example-driven design of efficient record matching queries. In VLDB, pages 327-338, 2007.
- (2007) VLDB , pp. 327-338
- Chaudhuri, S.¹ Chen, B.-C.² Ganti, V.³ Kaushik, R.⁴

11
- 84859202692
- Data debugger: An operator-centric approach for data quality solutions
- S. Chaudhuri, V. Ganti, and R. Kaushik. Data debugger: An operator-centric approach for data quality solutions. IEEE Data Eng. Bull., 29(2):60-66, 2006.
- (2006) IEEE Data Eng. Bull. , vol.29 , Issue.2 , pp. 60-66
- Chaudhuri, S.¹ Ganti, V.² Kaushik, R.³

12
- 33749597967
- A primitive operator for similarity joins in data cleaning
- S. Chaudhuri, V. Ganti, and R. Kaushik. A primitive operator for similarity joins in data cleaning. In ICDE, 2006.
- (2006) ICDE
- Chaudhuri, S.¹ Ganti, V.² Kaushik, R.³

13
- 15044355327
- Similarity search in high dimensions via hashing
- A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In VLDB, 1999.
- (1999) VLDB
- Gionis, A.¹ Indyk, P.² Motwani, R.³

14
- 84944318804
- Approximate string joins in a database (almost) for free
- L. Gravano, P. G. Ipeirotis, H. V. Jagadish, N. Koudas, S. Muthukrishnan, and D. Srivastava. Approximate string joins in a database (almost) for free. In VLDB, 2001.
- (2001) VLDB
- Gravano, L.¹ Ipeirotis, P.G.² Jagadish, H.V.³ Koudas, N.⁴ Muthukrishnan, S.⁵ Srivastava, D.⁶

15
- 84859169313
- Approximate string joins in a database (almost) for free (erratum)
- L. Gravano, P. G. Ipeirotis, H. V. Jagadish, N. Koudas, S. Muthukrishnan, and D. Srivastava. Approximate string joins in a database (almost) for free (erratum). Technical Report CUCS-011-03, Columbia University, 2003.
- (2003) Approximate string joins in a database (almost) for free (erratum)
- Gravano, L.¹ Ipeirotis, P.G.² Jagadish, H.V.³ Koudas, N.⁴ Muthukrishnan, S.⁵ Srivastava, D.⁶

16
- 0004137004
- Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology
- D. Gusfield. Algorithms on Strings, Trees, and Sequences. Computer Science and Computational Biology. Cambridge University Press, 1997.
- (1997) Algorithms on Strings, Trees, and Sequences
- Gusfield, D.¹

17
- 84994164621
- Evaluation of main memory join algorithms for joins with set comparison join predicates
- S. Helmer and G. Moerkotte. Evaluation of main memory join algorithms for joins with set comparison join predicates. In VLDB, pages 386-395, 1997.
- (1997) VLDB , pp. 386-395
- Helmer, S.¹ Moerkotte, G.²

18
- 33750296887
- Finding near-duplicate web pages: a large-scale evaluation of algorithms
- M. R. Henzinger. Finding near-duplicate web pages: a large-scale evaluation of algorithms. In SIGIR, 2006.
- (2006) SIGIR
- Henzinger, M.R.¹

19
- 0013331361
- Real-world data is dirty: Data cleansing and the merge/purge problem
- M. A. Hernández and S. J. Stolfo. Real-world data is dirty: Data cleansing and the merge/purge problem. Data Mining and Knowledge Discovery, 2(1):9-37, 1998.
- (1998) Data Mining and Knowledge Discovery , vol.2 , Issue.1 , pp. 9-37
- Hernández, M.A.¹ Stolfo, S.J.²

20
- 85011072445
- Extending q-grams to estimate selectivity of string matching with low edit distance
- H. Lee, R. T. Ng, and K. Shim. Extending q-grams to estimate selectivity of string matching with low edit distance. In VLDB, pages 195-206, 2007.
- (2007) VLDB , pp. 195-206
- Lee, H.¹ Ng, R.T.² Shim, K.³

21
- 85011032600
- VGRAM: Improving performance of approximate queries on string collections using variable-length grams
- C. Li, B. Wang, and X. Yang. VGRAM: Improving performance of approximate queries on string collections using variable-length grams. In VLDB, 2007.
- (2007) VLDB
- Li, C.¹ Wang, B.² Yang, X.³

22
- 1142267356
- Efficient processing of joins on set-valued attributes
- N. Mamoulis. Efficient processing of joins on set-valued attributes. In SIGMOD, pages 157-168, 2003.
- (2003) SIGMOD , pp. 157-168
- Mamoulis, N.¹

23
- 0018985316
- A faster algorithm computing string edit distances
- W. J. Masek and M. Paterson. A faster algorithm computing string edit distances. J. Comput. Syst. Sci., 20(1):18-31, 1980.
- (1980) J. Comput. Syst. Sci. , vol.20 , Issue.1 , pp. 18-31
- Masek, W.J.¹ Paterson, M.²

24
- 0348220548
- Adaptive algorithms for set containment joins
- S. Melnik and H. Garcia-Molina. Adaptive algorithms for set containment joins. ACM Trans. Database Syst., 28:56-99, 2003.
- (2003) ACM Trans. Database Syst. , vol.28 , pp. 56-99
- Melnik, S.¹ Garcia-Molina, H.²

25
- 0000541351
- A fast bit-vector algorithm for approximate string matching based on dynamic programming
- G. Myers. A fast bit-vector algorithm for approximate string matching based on dynamic programming. J. ACM, 46(3):395-415, 1999.
- (1999) J. ACM , vol.46 , Issue.3 , pp. 395-415
- Myers, G.¹

26
- 0345566149
- A guided tour to approximate string matching
- G. Navarro. A guided tour to approximate string matching. ACM Comput. Surv., 33(1):31-88, 2001.
- (2001) ACM Comput. Surv. , vol.33 , Issue.1 , pp. 31-88
- Navarro, G.¹

27
- 0001670844
- Performance in practice of string hashing functions
- M. V. Ramakrishna and J. Zobel. Performance in practice of string hashing functions. In DASFAA, pages 215-224, 1997.
- (1997) DASFAA , pp. 215-224
- Ramakrishna, M.V.¹ Zobel, J.²

28
- 10644238464
- Set containment joins: The good, the bad and the ugly
- K. Ramasamy, J. M. Patel, J. F. Naughton, and R. Kaushik. Set containment joins: The good, the bad and the ugly. In VLDB, pages 351-362, 2000.
- (2000) VLDB , pp. 351-362
- Ramasamy, K.¹ Patel, J.M.² Naughton, J.F.³ Kaushik, R.⁴

29
- 0242456811
- Interactive deduplication using active learning
- S. Sarawagi and A. Bhamidipaty. Interactive deduplication using active learning. In KDD, 2002.
- (2002) KDD
- Sarawagi, S.¹ Bhamidipaty, A.²

30
- 3142777876
- Efficient set joins on similarity predicates
- S. Sarawagi and A. Kirpal. Efficient set joins on similarity predicates. In SIGMOD, 2004.
- (2004) SIGMOD
- Sarawagi, S.¹ Kirpal, A.²

31
- 0004498253
- On approximate string matching
- E. Ukkonen. On approximate string matching. In FCT, 1983.
- (1983) FCT
- Ukkonen, E.¹

32
- 0015960104
- The string-to-string correction problem
- R. A. Wagner and M. J. Fischer. The string-to-string correction problem. J. ACM, 21(1):168-173, 1974.
- (1974) J. ACM , vol.21 , Issue.1 , pp. 168-173
- Wagner, R.A.¹ Fischer, M.J.²

33
- 0012866045
- The state of record linkage and current research problems
- W. E. Winkler. The state of record linkage and current research problems. Technical report, U.S. Bureau of the Census, 1999.
- (1999) The state of record linkage and current research problems
- Winkler, W.E.¹

34
- 66249113620
- Efficient similarity joins for near duplicate detection
- C. Xiao, W. Wang, X. Lin, and J. X. Yu. Efficient similarity joins for near duplicate detection. In WWW, 2008.
- (2008) WWW
- Xiao, C.¹ Wang, W.² Lin, X.³ Yu, J.X.⁴

35
- 33747729581
- Inverted files for text search engines
- J. Zobel and A. Moffat. Inverted files for text search engines. ACM Comput. Surv., 38(2), 2006.
- (2006) ACM Comput. Surv. , vol.38 , Issue.2
- Zobel, J.¹ Moffat, A.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.