-
1
-
-
34848818026
-
Aggregating inconsistent information: Ranking and clustering
-
Ailon, N., Charikar, M., Newman, A.: Aggregating inconsistent information: ranking and clustering. In: ACM Symp. on Theory of Computing (STOC), pp. 684-693 (2005)
-
(2005)
ACM Symp. on Theory of Computing (STOC)
, pp. 684-693
-
-
Ailon, N.1
Charikar, M.2
Newman, A.3
-
2
-
-
33749588820
-
Clean answers over dirty databases: A probabilistic approach
-
Andritsos, P., Fuxman, A., Miller, R.J.: Clean answers over dirty databases: a probabilistic approach. In: IEEE Proc. of the Int'l Conf. on Data Eng., p. 30 (2006)
-
(2006)
IEEE Proc. of the Int'l Conf. on Data Eng.
, p. 30
-
-
Andritsos, P.1
Fuxman, A.2
Miller, R.J.3
-
3
-
-
52649102939
-
Fast and simple relational processing of uncertain data
-
Antova, L., Koch, C., Olteanu, D.: Fast and simple relational processing of uncertain data. In: IEEE Proc. of the Int'l Conf. on Data Eng., pp. 983-992 (2008)
-
(2008)
IEEE Proc. of the Int'l Conf. on Data Eng.
, pp. 983-992
-
-
Antova, L.1
Koch, C.2
Olteanu, D.3
-
4
-
-
85104914015
-
Efficient exact set-similarity joins
-
Arasu, A., Ganti, V., Kaushik, R.: Efficient exact set-similarity joins. In: Proc. of the Int'l Conf. on Very Large Data Bases (VLDB), pp. 918-929 (2006)
-
(2006)
Proc. of the Int'l Conf. on Very Large Data Bases (VLDB)
, pp. 918-929
-
-
Arasu, A.1
Ganti, V.2
Kaushik, R.3
-
5
-
-
67649649597
-
Large-scale deduplication with constraints using dedupalog
-
Arasu, A., Ré, C., Suciu, D.: Large-scale deduplication with constraints using dedupalog. In: IEEE Proc. of the Int'l Conf. on Data Eng., pp. 952-963 (2009)
-
(2009)
IEEE Proc. of the Int'l Conf. on Data Eng.
, pp. 952-963
-
-
Arasu, A.1
Ré, C.2
Suciu, D.3
-
6
-
-
4644233828
-
The star clustering algorithm for static and dynamic information organization
-
1068.68098 2112265
-
J.A. Aslam E. Pelekhov D. Rus 2004 The star clustering algorithm for static and dynamic information organization J. Graph Algorithm. Appl. 8 1 95 129 1068.68098 2112265
-
(2004)
J. Graph Algorithm. Appl.
, vol.8
, Issue.1
, pp. 95-129
-
-
Aslam, J.A.1
Pelekhov, E.2
Rus, D.3
-
7
-
-
3142665421
-
Correlation clustering
-
1089.68085 10.1023/B:MACH.0000033116.57574.95
-
N. Bansal A. Blum S. Chawla 2004 Correlation clustering Mach. Learn. 56 1-3 89 113 1089.68085 10.1023/B:MACH.0000033116.57574.95
-
(2004)
Mach. Learn.
, vol.56
, Issue.1-3
, pp. 89-113
-
-
Bansal, N.1
Blum, A.2
Chawla, S.3
-
8
-
-
35348849154
-
Scaling up all pairs similarity search
-
Banff
-
Bayardo, R.J., Ma, Y., Srikant, R.: Scaling up all pairs similarity search. In: Int'l World Wide Web Conference (WWW), pp. 131-140, Banff (2007)
-
(2007)
Int'l World Wide Web Conference (WWW)
, pp. 131-140
-
-
Bayardo, R.J.1
Ma, Y.2
Srikant, R.3
-
10
-
-
77954695997
-
Modeling and querying possible repairs in duplicate detection
-
Available as University of Waterloo, Tech. Report CS-2009-15
-
Beskales, G., Soliman, M.A., Ilyas, I.F., Ben-David, S.: Modeling and querying possible repairs in duplicate detection. In: Proc. of the Int'l Conf. on Very Large Data Bases (VLDB), 2009 (To Appear). Available as University of Waterloo, Tech. Report CS-2009-15, 2009
-
(2009)
Proc. of the Int'l Conf. on Very Large Data Bases (VLDB), 2009 (To Appear)
-
-
Beskales, G.1
Soliman, M.A.2
Ilyas, I.F.3
Ben-David, S.4
-
13
-
-
36348996876
-
Collective entity resolution in relational data
-
I. Bhattacharya L. Getoor 2006 Collective entity resolution in relational data IEEE Data Eng. Bull. 29 2 4 12
-
(2006)
IEEE Data Eng. Bull.
, vol.29
, Issue.2
, pp. 4-12
-
-
Bhattacharya, I.1
Getoor, L.2
-
14
-
-
0141607824
-
Latent dirichlet allocation
-
1112.68379 10.1162/jmlr.2003.3.4-5.993
-
D.M. Blei A.Y. Ng M.I. Jordan 2003 Latent dirichlet allocation J. Mach. Learn. Res. 3 993 1022 1112.68379 10.1162/jmlr.2003.3.4-5.993
-
(2003)
J. Mach. Learn. Res.
, vol.3
, pp. 993-1022
-
-
Blei, D.M.1
Ng, A.Y.2
Jordan, M.I.3
-
15
-
-
29844455478
-
MYSTIQ: A system for finding more answers by using probabilities
-
Boulos, J., Dalvi, N., Mandhani, B., Mathur, S., Re, C., Suciu, D.: MYSTIQ: a system for finding more answers by using probabilities. In: ACM SIGMOD Int'l Conf. on the Mgmt. of Data, pp. 891-893 (2005)
-
(2005)
ACM SIGMOD Int'l Conf. on the Mgmt. of Data
, pp. 891-893
-
-
Boulos, J.1
Dalvi, N.2
Mandhani, B.3
Mathur, S.4
Re, C.5
Suciu, D.6
-
16
-
-
1142279457
-
Robust and efficient fuzzy match for online data cleaning
-
Chaudhuri, S., Ganjam, K., Ganti, V., Motwani, R.: Robust and efficient fuzzy match for online data cleaning. In: ACM SIGMOD Int'l Conf. on the Mgmt. of Data, pp. 313-324 (2003)
-
(2003)
ACM SIGMOD Int'l Conf. on the Mgmt. of Data
, pp. 313-324
-
-
Chaudhuri, S.1
Ganjam, K.2
Ganti, V.3
Motwani, R.4
-
17
-
-
26444550791
-
Robust identification of fuzzy duplicates
-
Washington
-
Chaudhuri, S., Ganti, V., Motwani, R.: Robust identification of fuzzy duplicates. In: IEEE Proc. of the Int'l Conf. on Data Eng., pp. 865-876, Washington (2005)
-
(2005)
IEEE Proc. of the Int'l Conf. on Data Eng.
, pp. 865-876
-
-
Chaudhuri, S.1
Ganti, V.2
Motwani, R.3
-
18
-
-
35448937301
-
Leveraging aggregate constraints for deduplication
-
Chaudhuri, S., Das Sarma, A., Ganti, V., Kaushik, R.: Leveraging aggregate constraints for deduplication. In: ACM SIGMOD Int'l Conf. on the Mgmt. of Data, pp. 437-448 (2007)
-
(2007)
ACM SIGMOD Int'l Conf. on the Mgmt. of Data
, pp. 437-448
-
-
Chaudhuri, S.1
Das Sarma, A.2
Ganti, V.3
Kaushik, R.4
-
19
-
-
61349087255
-
Cleaning uncertain data with quality guarantees
-
R. Cheng J. Chen X. Xie 2008 Cleaning uncertain data with quality guarantees Proc. VLDB Endow. (PVLDB) 1 1 722 735
-
(2008)
Proc. VLDB Endow. (PVLDB)
, vol.1
, Issue.1
, pp. 722-735
-
-
Cheng, R.1
Chen, J.2
Xie, X.3
-
20
-
-
11144240583
-
A comparison of string distance metrics for name-matching tasks
-
Acapulco
-
Cohen, W.W., Ravikumar, P., Fienberg, S.E.: A comparison of string distance metrics for name-matching tasks. In: Proc. of IJCAI-03 Workshop on Information Integration on the Web (IIWeb-03), pp. 73-78, Acapulco (2003)
-
(2003)
Proc. of IJCAI-03 Workshop on Information Integration on the Web (IIWeb-03)
, pp. 73-78
-
-
Cohen, W.W.1
Ravikumar, P.2
Fienberg, S.E.3
-
21
-
-
36148994784
-
Efficient query evaluation on probabilistic databases
-
10.1007/s00778-006-0004-3
-
N. Dalvi D. Suciu 2007 Efficient query evaluation on probabilistic databases Int. J. Very Large Data Bases 16 4 523 544 10.1007/s00778-006-0004-3
-
(2007)
Int. J. Very Large Data Bases
, vol.16
, Issue.4
, pp. 523-544
-
-
Dalvi, N.1
Suciu, D.2
-
23
-
-
33746868385
-
Correlation clustering in general weighted graphs
-
DOI 10.1016/j.tcs.2006.05.008, PII S0304397506003227
-
E.D. Demaine D. Emanuel A. Fiat N. Immorlica 2006 Correlation clustering in general weighted graphs Theor. Comput. Sci. 361 2 172 187 1099.68074 10.1016/j.tcs.2006.05.008 2252576 (Pubitemid 44189103)
-
(2006)
Theoretical Computer Science
, vol.361
, Issue.2-3
, pp. 172-187
-
-
Demaine, E.D.1
Emanuel, D.2
Fiat, A.3
Immorlica, N.4
-
24
-
-
85011051649
-
Data integration with uncertainty
-
Dong, X.L., Halevy, A.Y., Yu, C.: Data integration with uncertainty. In: Proc. of the Int'l Conf. on Very Large Data Bases (VLDB), pp. 687-698 (2007)
-
(2007)
Proc. of the Int'l Conf. on Very Large Data Bases (VLDB)
, pp. 687-698
-
-
Dong, X.L.1
Halevy, A.Y.2
Yu, C.3
-
26
-
-
84947399464
-
A theory for record linkage
-
10.2307/2286061
-
I.P. Fellegi A.B. Sunter 1969 A theory for record linkage J. Am. Stat. Assoc. 64 328 1183 1210 10.2307/2286061
-
(1969)
J. Am. Stat. Assoc.
, vol.64
, Issue.328
, pp. 1183-1210
-
-
Fellegi, I.P.1
Sunter, A.B.2
-
27
-
-
84906283185
-
Graph clustering and minimum cut trees
-
1098.68095 2119992
-
G.W. Flake R.E. Tarjan K. Tsioutsiouliklis 2004 Graph clustering and minimum cut trees Internet Math. 1 4 385 408 1098.68095 2119992
-
(2004)
Internet Math.
, vol.1
, Issue.4
, pp. 385-408
-
-
Flake, G.W.1
Tarjan, R.E.2
Tsioutsiouliklis, K.3
-
28
-
-
84944318804
-
Approximate string joins in a database (Almost) for free
-
Gravano, L., Ipeirotis, P.G., Jagadish, H.V., Koudas, N., Muthukrishnan, S., Srivastava, D.: Approximate string joins in a database (Almost) for free. In: Proc. of the Int'l Conf. on Very Large Data Bases (VLDB), pp. 491-500 (2001)
-
(2001)
Proc. of the Int'l Conf. on Very Large Data Bases (VLDB)
, pp. 491-500
-
-
Gravano, L.1
Ipeirotis, P.G.2
Jagadish, H.V.3
Koudas, N.4
Muthukrishnan, S.5
Srivastava, D.6
-
30
-
-
0007508819
-
Algorithms on strings, trees, and sequences
-
Cambridge University Press, Cambridge
-
Gusfield, D.: Algorithms on strings, trees, and sequences: computer science and computational biology. Cambridge University Press, Cambridge (1997)
-
(1997)
Computer Science and Computational Biology
-
-
Gusfield, D.1
-
32
-
-
72649086387
-
Framework for evaluating clustering algorithms in duplicate detection
-
Hassanzadeh, O., Chiang, F., Lee, H.C., Miller, R.J.: Framework for evaluating clustering algorithms in duplicate detection. In: Proc. of the Int'l Conf. on Very Large Data Bases (VLDB), 2009
-
(2009)
Proc. of the Int'l Conf. on Very Large Data Bases (VLDB)
-
-
Hassanzadeh, O.1
Chiang, F.2
Lee, H.C.3
Miller, R.J.4
-
33
-
-
63149183776
-
Accuracy of approximate string joins using grams
-
Vienna
-
Hassanzadeh, O., Sadoghi, M., Miller, R.J.: Accuracy of approximate string joins using grams. In: Proc. of the International Workshop on Quality in Databases (QDB), pp. 11-18, Vienna (2007)
-
(2007)
Proc. of the International Workshop on Quality in Databases (QDB)
, pp. 11-18
-
-
Hassanzadeh, O.1
Sadoghi, M.2
Miller, R.J.3
-
34
-
-
0003067623
-
Scalable techniques for clustering the web
-
Dallas
-
Haveliwala, T.H., Gionis, A., Indyk, P.: Scalable techniques for clustering the web. In: Proc. of the Int'l Workshop on the Web and Databases (WebDB), pp. 129-134, Dallas (2000)
-
(2000)
Proc. of the Int'l Workshop on the Web and Databases (WebDB)
, pp. 129-134
-
-
Haveliwala, T.H.1
Gionis, A.2
Indyk, P.3
-
35
-
-
0013331361
-
Real-World Data is Dirty: Data Cleansing and the Merge/Purge Problem
-
DOI 10.1023/A:1009761603038
-
M.A. Hernández S.J. Stolfo 1998 Real-world data is dirty: data cleansing and the merge/purge problem Data Min. Know. Discov. 2 1 9 37 10.1023/A:1009761603038 (Pubitemid 128063205)
-
(1998)
Data Mining and Knowledge Discovery
, vol.2
, Issue.1
, pp. 9-37
-
-
Hernandez, M.A.1
Stolfo, S.J.2
-
36
-
-
0030646261
-
Locality-preserving hashing in multidimensional spaces
-
Indyk, P., Motwani, R., Raghavan, P., Vempala, S.: Locality-preserving hashing in multidimensional spaces. In: ACM Symp. on Theory of Computing (STOC), pp. 618-625 (1997)
-
(1997)
ACM Symp. on Theory of Computing (STOC)
, pp. 618-625
-
-
Indyk, P.1
Motwani, R.2
Raghavan, P.3
Vempala, S.4
-
37
-
-
0026979939
-
Techniques for automatically correcting words in text
-
DOI 10.1145/146370.146380
-
K. Kukich 1992 Techniques for automatically correcting words in text ACM Comput. Surv. 24 4 377 439 10.1145/146370.146380 (Pubitemid 23687641)
-
(1992)
ACM Computing Surveys
, vol.24
, Issue.4
, pp. 377-439
-
-
Kukich, K.1
-
38
-
-
85011032600
-
VGRAM: Improving performance of approximate queries on string collections using variable-length grams
-
Vienna
-
Li, C., Wang, B., Yang, X.: VGRAM: Improving performance of approximate queries on string collections using variable-length grams. In: Proc. of the Int'l Conf. on Very Large Data Bases (VLDB), pp. 303-314, Vienna (2007)
-
(2007)
Proc. of the Int'l Conf. on Very Large Data Bases (VLDB)
, pp. 303-314
-
-
Li, C.1
Wang, B.2
Yang, X.3
-
39
-
-
0034592784
-
Efficient clustering of high-dimensional data sets with application to reference matching
-
McCallum, A., Nigam, K., Ungar, L.H.: Efficient clustering of high-dimensional data sets with application to reference matching. In: Proc. of the Int'l Conf. on Knowledge Discovery & Data Mining, pp. 169-178 (2000)
-
(2000)
Proc. of the Int'l Conf. on Knowledge Discovery & Data Mining
, pp. 169-178
-
-
McCallum, A.1
Nigam, K.2
Ungar, L.H.3
-
40
-
-
85009259903
-
A hidden Markov model information retrieval system
-
Miller, D.R.H., Leek, T., Schwartz, R.M.: A hidden Markov model information retrieval system. In: ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 214-221 (1999)
-
(1999)
ACM SIGIR Conference on Research and Development in Information Retrieval
, pp. 214-221
-
-
Miller, D.R.H.1
Leek, T.2
Schwartz, R.M.3
-
42
-
-
0002490026
-
Data cleaning: Problems and current approaches
-
E. Rahm H. Hai Do 2000 Data cleaning: problems and current approaches IEEE Data Eng. Bull. 23 4 3 13
-
(2000)
IEEE Data Eng. Bull.
, vol.23
, Issue.4
, pp. 3-13
-
-
Rahm, E.1
Hai Do, H.2
-
43
-
-
34548714632
-
Efficient Top-k query evaluation on probabilistic data
-
Re, C., Dalvi, N., Suciu, D.: Efficient Top-k query evaluation on probabilistic data. In: IEEE Proc. of the Int'l Conf. on Data Eng., pp. 886-895 (2007)
-
(2007)
IEEE Proc. of the Int'l Conf. on Data Eng.
, pp. 886-895
-
-
Re, C.1
Dalvi, N.2
Suciu, D.3
-
44
-
-
8844253324
-
Understanding inverse document frequency: On theoretical arguments for IDF
-
DOI 10.1108/00220410410560582
-
S. Robertson 2004 Understanding inverse document frequency: on theoretical arguments for IDF J. Doc. 60 5 503 520 10.1108/00220410410560582 (Pubitemid 39538229)
-
(2004)
Journal of Documentation
, vol.60
, Issue.5
, pp. 503-520
-
-
Robertson, S.1
-
46
-
-
49549112222
-
Representing tuple and attribute uncertainty in probabilistic databases
-
Sen, P., Deshpande, A., Getoor, L.: Representing tuple and attribute uncertainty in probabilistic databases. In: ICDM Workshops, pp. 507-512 (2007)
-
(2007)
ICDM Workshops
, pp. 507-512
-
-
Sen, P.1
Deshpande, A.2
Getoor, L.3
-
48
-
-
34548724406
-
Top-k query processing in uncertain databases
-
Soliman, M.A., Ilyas, I.F., Chang, K.C.: Top-k query processing in uncertain databases. In: IEEE Proc. of the Int'l Conf. on Data Eng., pp. 896-905 (2007)
-
(2007)
IEEE Proc. of the Int'l Conf. on Data Eng.
, pp. 896-905
-
-
Soliman, M.A.1
Ilyas, I.F.2
Chang, K.C.3
-
49
-
-
51149112283
-
Probabilistic top-k and ranking-aggregate queries
-
10.1145/1386118.1386119
-
M.A. Soliman I.F. Ilyas K.C. Chang 2008 Probabilistic top-k and ranking-aggregate queries ACM Trans. Database Syst. (TODS) 33 3 1 54 10.1145/1386118.1386119
-
(2008)
ACM Trans. Database Syst. (TODS)
, vol.33
, Issue.3
, pp. 1-54
-
-
Soliman, M.A.1
Ilyas, I.F.2
Chang, K.C.3