SCOPUS Information Retrieval Platform

Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, EDBT'09

Volume , Issue , 2009, Pages 450-461

Efficient Top-K count queries over imprecise duplicates

(3) Sarawagi, Sunita (a); Deshpande, Vinay S. (a); Kasliwal, Sourabh (a)

(a) IIT Bombay (India)

Author keywords

[No Author keywords available]

Indexed keywords

DATA SETS; DEDUPLICATION; DUPLICATE ELIMINATION; EXPONENTIAL TIME ALGORITHM; LINEAR EMBEDDING; NONLOCAL; NOVEL METHODS; NP-HARD; ON THE FLIES; ORDER OF MAGNITUDE; POLYNOMIAL-TIME ALGORITHMS; RUNNING TIME;

COMPUTATIONAL COMPLEXITY; DATABASE SYSTEMS; POLYNOMIAL APPROXIMATION;

CLUSTERING ALGORITHMS;

EID: 70349162329 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/1516360.1516413 Document Type: Conference Paper

Times cited : (4)

References (37)

1
- 2342576574
- Eliminating fuzzy duplicates in data warehouses
- R. Ananthakrishna, S. Chaudhuri, and V. Ganti. Eliminating fuzzy duplicates in data warehouses. In VLDB, 2002.
- (2002) VLDB
- Ananthakrishna, R.¹ Chaudhuri, S.² Ganti, V.³

2
- 33749588820
- Clean answers over dirty databases: A probabilistic approach
- P. Andritsos, A. Fuxman, and R. J. Miller. Clean answers over dirty databases: A probabilistic approach. In ICDE, 2006.
- (2006) ICDE
- Andritsos, P.¹ Fuxman, A.² Miller, R.J.³

3
- 85104914015
- Efficient exact set-similarity joins
- A. Arasu, V. Ganti, and R. Kaushik. Efficient exact set-similarity joins. In VLDB, 2006.
- (2006) VLDB
- Arasu, A.¹ Ganti, V.² Kaushik, R.³

4
- 0036949730
- Correlation clustering
- Washington, DC, USA, IEEE Computer Society
- N. Bansal, A. Blum, and S. Chawla. Correlation clustering. In FOCS '02: Proceedings of the 43rd Symposium on Foundations of Computer Science, page 238, Washington, DC, USA, 2002. IEEE Computer Society.
- (2002) FOCS '02: Proceedings of the 43rd Symposium on Foundations of Computer Science , pp. 238
- Bansal, N.¹ Blum, A.² Chawla, S.³

5
- 34248229658
- Collective entity resolution in relational data
- I. Bhattacharya and L. Getoor. Collective entity resolution in relational data. TKDD, 1(1), 2007.
- (2007) TKDD , vol.1 , Issue.1
- Bhattacharya, I.¹ Getoor, L.²

6
- 9444281954
- Learnable similarity functions and their applications to clustering and record linkage
- San Jose, California, USA, AAAI Press/The MIT Press
- M. Bilenko. Learnable similarity functions and their applications to clustering and record linkage. In Proceedings of the Nineteenth National Conference on Artificial Intelligence, Sixteenth Conference on Innovative Applications of Artificial Intelligence, July 25-29, 2004, San Jose, California, USA, pages 981-982. AAAI Press/The MIT Press, 2004.
- (2004) Proceedings of the Nineteenth National Conference on Artificial Intelligence, Sixteenth Conference on Innovative Applications of Artificial Intelligence, July 25-29, 2004 , pp. 981-982
- Bilenko, M.¹

7
- 33746054079
- Adaptive product normalization: Using online learning for record linkage in comparison shopping
- M. Bilenko, S. Basu, and M. Sahami. Adaptive product normalization: Using online learning for record linkage in comparison shopping. In ICDM, 2005.
- (2005) ICDM
- Bilenko, M.¹ Basu, S.² Sahami, M.³

8
- 84878049861
- Adaptive blocking: Learning to scale up record linkage
- M. Bilenko, B. Kamath, and R. J. Mooney. Adaptive blocking: Learning to scale up record linkage. In ICDM, 2006.
- (2006) ICDM
- Bilenko, M.¹ Kamath, B.² Mooney, R.J.³

9
- 2342447399
- Adaptive name-matching in information integration
- M. Bilenko, R. Mooney, W. Cohen, P. Ravikumar, and S. Fienberg. Adaptive name-matching in information integration. IEEE Intelligent Systems, 2003.
- (2003) IEEE Intelligent Systems
- Bilenko, M.¹ Mooney, R.² Cohen, W.³ Ravikumar, P.⁴ Fienberg, S.⁵

10
- 24644456480
- Clustering with qualitative information
- M. Charikar, V. Guruswami, and A. Wirth. Clustering with qualitative information. J. Comput. Syst. Sci., 71(3):360-383, 2005.
- (2005) J. Comput. Syst. Sci , vol.71 , Issue.3 , pp. 360-383
- Charikar, M.¹ Guruswami, V.² Wirth, A.³

11
- 85011029434
- Example-driven design of efficient record matching queries
- S. Chaudhuri, B.-C. Chen, V. Ganti, and R. Kaushik. Example-driven design of efficient record matching queries. In VLDB, pages 327-338, 2007.
- (2007) VLDB , pp. 327-338
- Chaudhuri, S.¹ Chen, B.-C.² Ganti, V.³ Kaushik, R.⁴

12
- 1142279457
- Robust and efficient fuzzy match for online data cleaning
- S. Chaudhuri, K. Ganjam, V. Ganti, and R. Motwani. Robust and efficient fuzzy match for online data cleaning. In SIGMOD, 2003.
- (2003) SIGMOD
- Chaudhuri, S.¹ Ganjam, K.² Ganti, V.³ Motwani, R.⁴

13
- 26444550791
- Robust identification of fuzzy duplicates
- S. Chaudhuri, V. Ganti, and R. Motwani. Robust identification of fuzzy duplicates. In ICDE, 2005.
- (2005) ICDE
- Chaudhuri, S.¹ Ganti, V.² Motwani, R.³

14
- 33846213661
- A divide-and-merge methodology for clustering
- D. Cheng, R. Kannan, S. Vempala, and G. Wang. A divide-and-merge methodology for clustering. ACM Trans. Database Syst., 31(4):1499-1525, 2006.
- (2006) ACM Trans. Database Syst , vol.31 , Issue.4 , pp. 1499-1525
- Cheng, D.¹ Kannan, R.² Vempala, S.³ Wang, G.⁴

15
- 3142781285
- Learning to match and cluster entity names
- W. Cohen and J. Richman. Learning to match and cluster entity names. In ACM SIGIR '01 Workshop on Mathematical/Formal Methods in Information Retrieval, 2001.
- (2001) ACM SIGIR '01 Workshop on Mathematical/Formal Methods in Information Retrieval
- Cohen, W.¹ Richman, J.²

16
- 0000666461
- Data integration using similarity joins and a word-based information representation language
- July
- W. W. Cohen. Data integration using similarity joins and a word-based information representation language. ACM Transactions on Information Systems, 18(3):288-321, July 2000.
- (2000) ACM Transactions on Information Systems , vol.18 , Issue.3 , pp. 288-321
- Cohen, W.W.¹

17
- 85011051649
- Data integration with uncertainty
- X. Dong, A. Y. Halevy, and C. Yu. Data integration with uncertainty. In VLDB '07: Proceedings of the 33rd international conference on Very large data bases, pages 687-698, 2007.
- (2007) VLDB '07: Proceedings of the 33rd international conference on Very large data bases , pp. 687-698
- Dong, X.¹ Halevy, A.Y.² Yu, C.³

18
- 84947399464
- A theory for record linkage
- I. P. Fellegi and A. B. Sunter. A theory for record linkage. Journal of the American Statistical Association, 64:1183-1210, 1969.
- (1969) Journal of the American Statistical Association , vol.64 , pp. 1183-1210
- Fellegi, I.P.¹ Sunter, A.B.²

19
- 84944318804
- Approximate string joins in a database (almost) for free
- Rome, Italy
- L. Gravano, P. Ipeirotis, H. V. Jagadish, N. Koudas, S. Muthukrishnan, and D. Srivastava. Approximate string joins in a database (almost) for free. In Proc. of the 27th Int'l Conference on Very Large Databases (VLDB), Rome, Italy, 2001.
- (2001) Proc. of the 27th Int'l Conference on Very Large Databases (VLDB)
- Gravano, L.¹ Ipeirotis, P.² Jagadish, H.V.³ Koudas, N.⁴ Muthukrishnan, S.⁵ Srivastava, D.⁶

20
- 55349093583
- Survey of top-k query processing techniques in relational database systems
- I. F. Ilyas, G. Beskales, and M. A. Soliman. Survey of top-k query processing techniques in relational database systems. To appear in ACM Computing Surveys, 2008.
- (2008) To appear in ACM Computing Surveys
- Ilyas, I.F.¹ Beskales, G.² Soliman, M.A.³

21
- 0004161991
- Prentice Hall
- A. K. Jain and R. C. Dubes. Algorithms for Clustering Data. Prentice Hall, 1988.
- (1988) Algorithms for Clustering Data
- Jain, A.K.¹ Dubes, R.C.²

22
- 63449083945
- J. Ko, T. Mitamura, and E. Nyberg. Language-independent probabilistic answer ranking for question answering. In ACL, 2007.

23
- 70349131273
- Structured probabilistic models
- D. Koller and N. Friedman. Structured probabilistic models. Under preparation, 2007.
- (2007) Under preparation
- Koller, D.¹ Friedman, N.²

24
- 84888516789
- Y. Koren and D. Harel. A multi-scale algorithm for the linear arrangement problem. In WG, 2002.

25
- 34250767109
- Supporting ad-hoc ranking aggregates
- C. Li, K. C.-C. Chang, and I. F. Ilyas. Supporting ad-hoc ranking aggregates. In SIGMOD Conference, 2006.
- (2006) SIGMOD Conference
- Li, C.¹ Chang, K.C.-C.² Ilyas, I.F.³

26
- 0034592784
- Efficient clustering of high-dimensional data sets with application to reference matching
- A. McCallum, K. Nigam, and L. H. Ungar. Efficient clustering of high-dimensional data sets with application to reference matching. In Knowledge Discovery and Data Mining, pages 169-178, 2000.
- (2000) Knowledge Discovery and Data Mining , pp. 169-178
- McCallum, A.¹ Nigam, K.² Ungar, L.H.³

27
- 22944471695
- Toward conditional models of identity uncertainty with application to proper noun coreference
- Acapulco, Mexico, Aug
- A. McCallum and B. Wellner. Toward conditional models of identity uncertainty with application to proper noun coreference. In Proceedings of the IJCAI-2003 Workshop on Information Integration on the Web, pages 79-86, Acapulco, Mexico, Aug. 2003.
- (2003) Proceedings of the IJCAI-2003 Workshop on Information Integration on the Web , pp. 79-86
- McCallum, A.¹ Wellner, B.²

28
- 85018108837
- The field matching problem: Algorithms and applications
- A. E. Monge and C. P. Elkan. The field matching problem: Algorithms and applications. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), 1996.
- (1996) Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96)
- Monge, A.E.¹ Elkan, C.P.²

29
- 84898987614
- Identity uncertainty and citation matching
- Vancouver, British Columbia, MIT Press
- H. Pasula, B. Marthi, B. Milch, S. Russell, and I. Shpitser. Identity uncertainty and citation matching. In Advances in Neural Information Processing Systems 15, Vancouver, British Columbia, 2002. MIT Press.
- (2002) Advances in Neural Information Processing Systems 15
- Pasula, H.¹ Marthi, B.² Milch, B.³ Russell, S.⁴ Shpitser, I.⁵

30
- 38049124282
- S. Sarawagi. The crf project: a java implementation. http://crf.sourceforge.net, 2004.
- (2004) The crf project: A java implementation
- Sarawagi, S.¹

31
- 0242456811
- Interactive deduplication using active learning
- Edmonton, Canada, July
- S. Sarawagi and A. Bhamidipaty. Interactive deduplication using active learning. In Proc. of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2002), Edmonton, Canada, July 2002.
- (2002) Proc. of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2002)
- Sarawagi, S.¹ Bhamidipaty, A.²

32
- 0344496278
- Scaling up the alias duplicate elimination system: A demonstration
- Bangalore, March
- S. Sarawagi and A. Kirpal. Scaling up the alias duplicate elimination system: A demonstration. In Proc. of the 19th IEEE Int'l Conference on Data Engineering (ICDE), Bangalore, March 2003.
- (2003) Proc. of the 19th IEEE Int'l Conference on Data Engineering (ICDE)
- Sarawagi, S.¹ Kirpal, A.²

33
- 3142777876
- Efficient set joins on similarity predicates
- S. Sarawagi and A. Kirpal. Efficient set joins on similarity predicates. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 2004.
- (2004) Proceedings of the ACM SIGMOD International Conference on Management of Data
- Sarawagi, S.¹ Kirpal, A.²

34
- 57149128190
- Bootstrapping pay-as-you-go data integration systems
- A. D. Sarma, X. Dong, and A. Halevy. Bootstrapping pay-as-you-go data integration systems. In SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data, 2008.
- (2008) SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data
- Sarma, A.D.¹ Dong, X.² Halevy, A.³

35
- 51149112283
- Probabilistic top-k and ranking-aggregate queries
- M. A. Soliman, I. F. Ilyas, and K. C.-C. Chang. Probabilistic top-k and ranking-aggregate queries. ACM Trans. Database Syst., 33(3), 2008.
- (2008) ACM Trans. Database Syst , vol.33 , Issue.3
- Soliman, M.A.¹ Ilyas, I.F.² Chang, K.C.-C.³

36
- 65449139953
- M. L. Wick, K. Rohanimanesh, K. Schultz, and A. McCallum. A unified approach for schema matching, coreference and canonicalization. In KDD, 2008.

37
- 57149130672
- Cost-based variable-length-gram selection for string collections to support approximate queries efficiently
- X. Yang, B. Wang, and C. Li. Cost-based variable-length-gram selection for string collections to support approximate queries efficiently. In SIGMOD Conference, pages 353-364, 2008.
- (2008) SIGMOD Conference , pp. 353-364
- Yang, X.¹ Wang, B.² Li, C.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.