[1] A.Z. Broder, On the resemblance and containment of documents, in: Compression and Complexity of Sequences (SEQUENCES'97), 1997, pp. 21-29.
[2] M. Sanderson, Duplicate detection in the Reuters collection, Technical Report TR-1997-5, University of Glasgow, 1997.
[3] N. Shivakumar, H. García-Molina, SCAM: a copy detection mechanism for digital documents, in: Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries, 1995.
[5] U. Manber, Finding similar files in a large file system, in: Proceedings of the USENIX Winter 1994 Technical Conference, 1994, pp. 1-10.
[6] S. Brin, J. Davis, H. García-Molina, Copy detection mechanisms for digital documents, in: Proceedings of the ACM SIGMOD Annual Conference, 1995, pp. 398-409.
[7] N. Heintze, Scalable document fingerprinting, in: 1996 USENIX Workshop on Electronic Commerce, 1996.
[8] A.Z. Broder, S.C. Glassman, M.S. Manasse, G. Zweig, Syntactic clustering of the Web, Computer Networks and ISDN Systems 29 (8-13) (1997) 1157-1166.
[9] I.H. Witten, A. Moffat, T.C. Bell, Managing Gigabytes: Compressing and Indexing Documents and Images, Morgan Kaufmann, Los Altos, CA, 1999.
[11] M. Fang, N. Shivakumar, H. García-Molina, R. Motwani, J.D. Ullman, Computing iceberg queries efficiently, in: Proceedings of the 24th International Conference on Very Large Data Bases, Morgan Kaufmann, Los Altos, CA, 1998, pp. 299-310.
[12] N. Shivakumar, H. García-Molina, Finding near-replicas of documents on the web, in: International Workshop on the World Wide Web and Databases (WebDB), Springer, Berlin, 1999.
[13] C.G. Nevill-Manning, I.H. Witten, Compression and explanation using hierarchical grammars, The Computer Journal 40 (2/3) (1997) 103-116.
[14] N.J. Larsson, A. Moffat, Offline dictionary-based compression, Proc. IEEE 88 (11) (2000) 1722-1732.
[20] R. Rivest, The MD5 Message-Digest Algorithm, RFC 1321, 1992.
[21] D. Harman, Overview of the second text retrieval conference (TREC-2), Information Processing and Management 31 (3) (1995) 271-289.
[22] R. Baeza-Yates, B. Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley, Longman, Reading, MA, New York, 1999.
[23] A. Chowdhury, O. Frieder, D. Grossman, M.C. McCabe, Collection statistics for fast duplicate document detection, ACM Transactions on Information Systems (TOIS) 20 (2) (2002) 171-191.