[1] R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. 20th VLDB, pages 487-499, 1994.
[2] Z. Bar-Yossef, I. Keidar, and U. Schonfeld. Do not crawl in the DUST: different URLs with similar text. Technical Report CCIT Report #601, Dept. Electrical Engineering, Technion, 2006.
[3] K. Bharat and A. Z. Broder. Mirror, Mirror on the Web: A Study of Host Pairs with Replicated Content. Computer Networks, 31(11-16):1579-1590, 1999.
[4] K. Bharat, A. Z. Broder, J. Dean, and M. R. Henzinger. A comparison of techniques to find mirrored hosts on the WWW. IEEE Data Engin. Bull., 23(4):21-26, 2000.
[6] S. Brin, J. Davis, and H. Garcia-Molina. Copy Detection Mechanisms for Digital Documents. In Proc. 14th SIGMOD, pages 398-409, 1995.
[9] E. Di Iorio, M. Diligenti, M. Gori, M. Maggini, and A. Pucci. Detecting Near-replicas on the Web by Content and Hyperlink Analysis. In Proc. 11th WWW, 2003.
[11] H. Garcia-Molina, L. Gravano, and N. Shivakumar. dSCAM: Finding document copies across multiple databases. In Proc. 4th PDIS, pages 68-79, 1996.
[13] Google Inc. Google sitemaps. http://sitemaps.google.com.
[15] T. C. Hoad and J. Zobel. Methods for identifying versioned and plagiarized documents. J. Amer. Soc. Infor. Sci. Tech., 54(3):203-215, 2003.
[16] N. Jain, M. Dahlin, and R. Tewari. Using bloom filters to refine web search results. In Proc. 7th WebDB, pages 25-30, 2005.
[17] T. Kelly and J. C. Mogul. Aliasing on the world wide web: prevalence and performance implications. In Proc. 11th WWW, pages 281-292, 2002.
[18] S. J. Kim, H. S. Jeong, and S. H. Lee. Reliable evaluations of URL normalization. In Proc. 4th ICCSA, pages 609-617, 2006.
[20] F. McCown and M. L. Nelson. Evaluation of crawling policies for a web-repository crawler. In Proc. 17th HYPERTEXT, pages 157-168, 2006.
[21] U. Schonfeld, Z. Bar-Yossef, and I. Keidar. Do not crawl in the DUST: different URLs with similar text. In Proc. 15th WWW, pages 1015-1016, 2006.
[22] N. Shivakumar and H. Garcia-Molina. Finding Near-Replicas of Documents and Servers on the Web. In Proc. 1st WebDB, pages 204-212, 1998.