-
1
-
-
84880240041
-
Searching the web
-
A. Arasu, J. Cho, H. Garcia-Molina, A. Paepcke, and S. Raghavan. Searching the web. ACM Transactions on Internet Technology, 1(1):2-43, 2001.
-
(2001)
ACM Transactions on Internet Technology
, vol.1
, Issue.1
, pp. 2-43
-
-
Arasu, A.1
Cho, J.2
Garcia-Molina, H.3
Paepcke, A.4
Raghavan, S.5
-
5
-
-
0029222025
-
On finding duplication and near-duplication in large software systems
-
B. S. Baker. On finding duplication and near-duplication in large software systems. In Proc. 2nd Working Conference on Reverse Engineering, page 86, 1995.
-
(1995)
Proc. 2nd Working Conference on Reverse Engineering
, pp. 86
-
-
Baker, B.S.1
-
6
-
-
0033297070
-
Mirror, mirror on the Web: A study of host pairs with replicated content
-
K. Bharat and A. Broder. Mirror, mirror on the Web: A study of host pairs with replicated content. In Proc. 8th International Conference on World Wide Web (WWW 1999), pages 1579-1590, 1999.
-
(1999)
Proc. 8th International Conference on World Wide Web
, pp. 1579-1590
-
-
Bharat, K.1
Broder, A.2
-
7
-
-
0034288398
-
A comparison of techniques to find mirrored hosts on the WWW
-
Aug
-
K. Bharat, A. Broder, J. Dean, and M. R. Henzinger. A comparison of techniques to find mirrored hosts on the WWW. J American Society for Information Science, 51(12):1114-1122, Aug. 2000.
-
(2000)
J American Society for Information Science
, vol.51
, Issue.12
, pp. 1114-1122
-
-
Bharat, K.1
Broder, A.2
Dean, J.3
Henzinger, M.R.4
-
9
-
-
0038589165
-
The anatomy of a large-scale hypertextual Web search engine
-
S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1-7):107-117, 1998.
-
(1998)
Computer Networks and ISDN Systems
, vol.30
, Issue.1-7
, pp. 107-117
-
-
Brin, S.1
Page, L.2
-
11
-
-
0034227695
-
Improved bounds for dictionary look-up with one error
-
G. S. Brodal and S. Venkatesh. Improved bounds for dictionary look-up with one error. Information Processing Letters, 75(1-2):57-59, 2000.
-
(2000)
Information Processing Letters
, vol.75
, Issue.1-2
, pp. 57-59
-
-
Brodal, G.S.1
Venkatesh, S.2
-
12
-
-
35348864078
-
-
On the resemblance and containment of documents
-
A. Broder. On the resemblance and containment of documents. In Compression and Complexity of Sequences, 1998.
-
-
-
-
13
-
-
0031620041
-
Min-wise independent permutations
-
A. Broder, M. Charikar, A. Frieze, and M. Mitzenmacher. Min-wise independent permutations. In Proc. 30th Annual Symposium on Theory of Computing (STOC 1998), pages 327-336, 1998.
-
(1998)
Proc. 30th Annual Symposium on Theory of Computing (STOC 1998)
, pp. 327-336
-
-
Broder, A.1
Charikar, M.2
Frieze, A.3
Mitzenmacher, M.4
-
14
-
-
0010362121
-
Syntactic clustering of the web
-
A. Broder, S. C. Glassman, M. Manasse, and G. Zweig. Syntactic clustering of the web. Computer Networks, 29(8-13):1157-1166, 1997.
-
(1997)
Computer Networks
, vol.29
, Issue.8-13
, pp. 1157-1166
-
-
Broder, A.1
Glassman, S.C.2
Manasse, M.3
Zweig, G.4
-
18
-
-
0033688075
-
Selectivity estimation for boolean queries
-
Z. Chen, F. Korn, N. Koudas, and S. Muthukrishnan. Selectivity estimation for boolean queries. In Proc. PODS 2000, pages 216-225, 2000.
-
(2000)
Proc. PODS 2000
, pp. 216-225
-
-
Chen, Z.1
Korn, F.2
Koudas, N.3
Muthukrishnan, S.4
-
19
-
-
20444396637
-
Efficient crawling through URL ordering
-
J. Cho, H. García-Molina, and L. Page. Efficient crawling through URL ordering. Computer Networks and ISDN Systems, 30(1-7):161-172, 1998.
-
(1998)
Computer Networks and ISDN Systems
, vol.30
, Issue.1-7
, pp. 161-172
-
-
Cho, J.1
García-Molina, H.2
Page, L.3
-
20
-
-
0013206133
-
Collection statistics for fast duplicate document detection
-
A. Chowdhury, O. Frieder, D. Grossman, and M. C. McCabe. Collection statistics for fast duplicate document detection. ACM Transactions on Information Systems, 20(2):171-191, 2002.
-
(2002)
ACM Transactions on Information Systems
, vol.20
, Issue.2
, pp. 171-191
-
-
Chowdhury, A.1
Frieder, O.2
Grossman, D.3
McCabe, M.C.4
-
21
-
-
0033877655
-
Finding interesting associations without support pruning
-
E. Cohen, M. Datar, S. Fujiwara, A. Gionis, P. Indyk, R. Motwani, J. D. Ullman, and C. Yang. Finding interesting associations without support pruning. In Proc. 16th Intl. Conf. on Data Engineering (ICDE 2000), pages 489-499, 2000.
-
(2000)
Proc. 16th Intl. Conf. on Data Engineering (ICDE 2000)
, pp. 489-499
-
-
Cohen, E.1
Datar, M.2
Fujiwara, S.3
Gionis, A.4
Indyk, P.5
Motwani, R.6
Ullman, J.D.7
Yang, C.8
-
22
-
-
8644227073
-
Constructing a text corpus for inexact duplicate detection
-
July
-
J. G. Conrad and C. P. Schriber. Constructing a text corpus for inexact duplicate detection. In SIGIR 2004, pages 582-583, July 2004.
-
(2004)
SIGIR 2004
, pp. 582-583
-
-
Conrad, J.G.1
Schriber, C.P.2
-
25
-
-
0033293618
-
Finding related pages in the World Wide Web
-
J. Dean and M. Henzinger. Finding related pages in the World Wide Web. Computer Networks, 31(11-16):1467-1479, 1999.
-
(1999)
Computer Networks
, vol.31
, Issue.11-16
, pp. 1467-1479
-
-
Dean, J.1
Henzinger, M.2
-
26
-
-
84989525001
-
Indexing by latent semantic analysis
-
S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman. Indexing by latent semantic analysis. J American Society for Information Science, 41(6):391-407, 1990.
-
(1990)
J American Society for Information Science
, vol.41
, Issue.6
, pp. 391-407
-
-
Deerwester, S.1
Dumais, S.T.2
Furnas, G.W.3
Landauer, T.K.4
Harshman, R.5
-
27
-
-
70350672544
-
Focused crawling using context graphs
-
Sept
-
M. Diligenti, F. Coetzee, S. Lawrence, C. L. Giles, and M. Gori. Focused crawling using context graphs. In 26th International Conference on Very Large Databases (VLDB 2000), pages 527-534, Sept. 2000.
-
(2000)
26th International Conference on Very Large Databases (VLDB 2000)
, pp. 527-534
-
-
Diligenti, M.1
Coetzee, F.2
Lawrence, S.3
Giles, C.L.4
Gori, M.5
-
28
-
-
0028257898
-
Neighborhood preserving hashing and approximate queries
-
D. Dolev, Y. Harari, M. Linial, N. Nisan, and M. Parnas. Neighborhood preserving hashing and approximate queries. In Proc. 5th ACM Symposium on Discrete Algorithms (SODA 1994), 1994.
-
(1994)
Proc. 5th ACM Symposium on Discrete Algorithms (SODA 1994)
-
-
Dolev, D.1
Harari, Y.2
Linial, M.3
Nisan, N.4
Parnas, M.5
-
32
-
-
13844267502
-
Efficient phrase-based document indexing for web document clustering
-
Aug
-
K. M. Hammouda and M. S. Kamel. Efficient phrase-based document indexing for web document clustering. IEEE Transactions on Knowledge and Data Engineering, 16(10):1279-1296, Aug. 2004.
-
(2004)
IEEE Transactions on Knowledge and Data Engineering
, vol.16
, Issue.10
, pp. 1279-1296
-
-
Hammouda, K.M.1
Kamel, M.S.2
-
34
-
-
77953112255
-
Evaluating strategies for similarity search on the Web
-
May
-
T. H. Haveliwala, A. Gionis, D. Klein, and P. Indyk. Evaluating strategies for similarity search on the Web. In Proc. 11th International World Wide Web Conference, pages 432-442, May 2002.
-
(2002)
Proc. 11th International World Wide Web Conference
, pp. 432-442
-
-
Haveliwala, T.H.1
Gionis, A.2
Klein, D.3
Indyk, P.4
-
35
-
-
33750296887
-
Finding near-duplicate web pages: A large-scale evaluation of algorithms
-
M. R. Henzinger. Finding near-duplicate web pages: A large-scale evaluation of algorithms. In SIGIR 2006, pages 284-291, 2006.
-
(2006)
SIGIR 2006
, pp. 284-291
-
-
Henzinger, M.R.1
-
37
-
-
84938015047
-
A method for the construction of minimum-redundancy codes
-
Sept
-
D. A. Huffman. A method for the construction of minimum-redundancy codes. In Proc. Institute of Radio Engineers, volume 40, pages 1098-1102, Sept. 1952.
-
(1952)
Proc. Institute of Radio Engineers
, vol.40
, pp. 1098-1102
-
-
Huffman, D.A.1
-
38
-
-
2442561063
-
-
A bag of paths model for measuring structural similarity in Web documents
-
S. Joshi, N. Agrawal, R. Krishnapuram, and S. Negi. A bag of paths model for measuring structural similarity in Web documents. In Proc. 9th ACM Intl. Conf. on Knowledge Discovery and Data Mining (SIGKDD 2003), pages 577-582, 2003.
-
-
-
-
39
-
-
4243148480
-
Authoritative sources in a hyperlinked environment
-
Sept
-
J. M. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604-632, Sept. 1999.
-
(1999)
Journal of the ACM
, vol.46
, Issue.5
, pp. 604-632
-
-
Kleinberg, J.M.1
-
40
-
-
12244261882
-
Improved robustness of signature-based near-replica detection via lexicon randomization
-
Aug
-
A. Kolcz, A. Chowdhury, and J. Alspector. Improved robustness of signature-based near-replica detection via lexicon randomization. In SIGKDD 2004, pages 605-610, Aug. 2004.
-
(2004)
SIGKDD 2004
, pp. 605-610
-
-
Kolcz, A.1
Chowdhury, A.2
Alspector, J.3
-
41
-
-
0033297068
-
Trawling the Web for emerging cyber-communities
-
R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Trawling the Web for emerging cyber-communities. Computer Networks: The Intl. J of Computer and Telecommunications Networks, 31:1481-1493, 1999.
-
(1999)
Computer Networks: The Intl. J of Computer and Telecommunications Networks
, vol.31
, pp. 1481-1493
-
-
Kumar, R.1
Raghavan, P.2
Rajagopalan, S.3
Tomkins, A.4
-
42
-
-
85043988965
-
Finding similar files in a large file system
-
Jan
-
U. Manber. Finding similar files in a large file system. In Proc. 1994 USENIX Conference, pages 1-10, Jan. 1994.
-
(1994)
Proc. 1994 USENIX Conference
, pp. 1-10
-
-
Manber, U.1
-
43
-
-
0034794539
-
-
Evaluating topic-driven web crawlers
-
F. Menczer, G. Pant, P. Srinivasan, and M. E. Ruiz. Evaluating topic-driven web crawlers. In Proc. 24th Annual International ACM SIGIR Conference On Research and Development in Information Retrieval, pages 241-249, 2001.
-
-
-
-
46
-
-
33745753308
-
User-centric web crawling
-
S. Pandey and C. Olston. User-centric web crawling. In Proc. WWW 2005, pages 401-411, 2005.
-
(2005)
, pp. 401-411
-
-
Pandey, S.1
Olston, C.2
-
47
-
-
35348920411
-
-
Detecting duplicate and near-duplicate files
-
W. Pugh and M. R. Henzinger. Detecting duplicate and near-duplicate files. United States Patent 6,658,423, granted on Dec 2, 2003, 2003.
-
-
-
-
49
-
-
0003676885
-
Fingerprinting by random polynomials
-
TR-15-81, Center for Research in Computing Technology, Harvard University
-
M. O. Rabin. Fingerprinting by random polynomials. Technical Report TR-15-81, Center for Research in Computing Technology, Harvard University, 1981.
-
(1981)
Technical Report
-
-
Rabin, M.O.1
-
50
-
-
1142267351
-
Winnowing: Local algorithms for document fingerprinting
-
June
-
S. Schleimer, D. S. Wilkerson, and A. Aiken. Winnowing: Local algorithms for document fingerprinting. In Proc. SIGMOD 2003, pages 76-85, June 2003.
-
(2003)
Proc. SIGMOD 2003
, pp. 76-85
-
-
Schleimer, S.1
Wilkerson, D.S.2
Aiken, A.3
-
53
-
-
0012726646
-
Dictionary look-up with one error
-
A. C. Yao and F. F. Yao. Dictionary look-up with one error. J of Algorithms, 25(1):194-202, 1997.
-
(1997)
J of Algorithms
, vol.25
, Issue.1
, pp. 194-202
-
-
Yao, A.C.1
Yao, F.F.2