-
3
-
-
0034288398
-
A comparison of techniques to find mirrored hosts on the WWW
-
Bharat K, Broder AZ, Dean J, Henzinger MR (2000) A comparison of techniques to find mirrored hosts on the WWW. J Am Soc Inf Sci (JASIS) 51(12):1114-1122
-
(2000)
J Am Soc Inf Sci (JASIS)
, vol.51
, Issue.12
, pp. 1114-1122
-
-
Bharat, K.1
Broder, A.Z.2
Dean, J.3
Henzinger, M.R.4
-
7
-
-
0013206133
-
Collection statistics for fast duplicate document detection
-
Chowdhury A, Frieder O, Grossman D, McCabe MC (2002) Collection statistics for fast duplicate document detection. ACM Trans Inf Syst 20(2):171-191
-
(2002)
ACM Trans Inf Syst
, vol.20
, Issue.2
, pp. 171-191
-
-
Chowdhury, A.1
Frieder, O.2
Grossman, D.3
McCabe, M.C.4
-
10
-
-
0001939946
-
Heavy-tailed probability distributions in the World Wide Web
-
In: Adler R, Feldman R, Taqqu M (eds.) Birkhauser, Boston
-
Crovella ME, Taqqu MS, Bestavros A (1998) Heavy-tailed probability distributions in the World Wide Web. In: Adler R, Feldman R, Taqqu M (eds.) A practical guide to heavy tails: Statistical techniques and applications. Birkhauser, Boston, pp 3-25
-
(1998)
A Practical Guide to Heavy Tails: Statistical Techniques and Applications
, pp. 3-25
-
-
Crovella, M.E.1
Taqqu, M.S.2
Bestavros, A.3
-
13
-
-
77954428596
-
Spam, damn spam, and statistics: Using statistical analysis to locate spam Web pages
-
Fetterly D, Manasse M, Najork M (2004) Spam, damn spam, and statistics: using statistical analysis to locate spam Web pages. In: Proceedings of the 7th international workshop on the Web and databases (WebDB), pp 1-6
-
(2004)
Proceedings of the 7th International Workshop on the Web and Databases (WebDB)
, pp. 1-6
-
-
Fetterly, D.1
Manasse, M.2
Najork, M.3
-
17
-
-
28044431810
-
Web data extraction based on structural similarity
-
Li Z, Ng WK, Sun A (2005) Web data extraction based on structural similarity. Knowl Inf Syst 8(4):438-461
-
(2005)
Knowl Inf Syst
, vol.8
, Issue.4
, pp. 438-461
-
-
Li, Z.1
Ng, W.K.2
Sun, A.3
-
18
-
-
38649093187
-
Discovering and analyzing World Wide Web collections
-
Mukherjea S (2004) Discovering and analyzing World Wide Web collections. Knowl Inf Syst 6(2):230-241
-
(2004)
Knowl Inf Syst
, vol.6
, Issue.2
, pp. 230-241
-
-
Mukherjea, S.1
-
19
-
-
0003676885
-
Fingerprinting by random polynomials
-
Technical Report tr-15-81, Center for Research in Computing Technology, Harvard University
-
Rabin M (1981) Fingerprinting by random polynomials, Technical report tr-15-81, Center for Research in Computing Technology, Harvard University
-
(1981)
Fingerprinting By Random Polynomials
-
-
Rabin, M.1
-
21
-
-
0344892160
-
Do TREC Web collections look like the Web?
-
Soboroff I (2002) Do TREC Web collections look like the Web? SIGIR Forum 36(2):23-31
-
(2002)
SIGIR Forum
, vol.36
, Issue.2
, pp. 23-31
-
-
Soboroff, I.1
-
26
-
-
24944501423
-
Generative model-based document clustering: A comparative study
-
Zhong S, Ghosh J (2005) Generative model-based document clustering: A comparative study. Knowl Inf Syst 8(3):374-384
-
(2005)
Knowl Inf Syst
, vol.8
, Issue.3
, pp. 374-384
-
-
Zhong, S.1
Ghosh, J.2