-
1
-
-
77953036485
-
-
Robotcop. www.robotcop.org, 2002.
-
(2002)
-
-
-
2
-
-
15844418414
-
-
HT://Dig. http://www.htdig.org/, 2004. GPL software.
-
(2004)
GPL Software
-
-
-
4
-
-
15844418414
-
-
S. Ailleret. Larbin. http://larbin.sourceforge.net/index-eng.html, 2004. GPL software.
-
(2004)
GPL Software
-
-
Ailleret, S.1
-
5
-
-
84880240041
-
Searching the Web
-
August
-
A. Arasu, J. Cho, H. Garcia-Molina, A. Paepcke, and S. Raghavan. Searching the Web. ACM Transactions on Internet Technology (TOIT), 1(1):2-43, August 2001.
-
(2001)
ACM Transactions on Internet Technology (TOIT)
, vol.1
, Issue.1
, pp. 2-43
-
-
Arasu, A.1
Cho, J.2
Garcia-Molina, H.3
Paepcke, A.4
Raghavan, S.5
-
6
-
-
15844420628
-
Balancing volume, quality and freshness in web crawling
-
Santiago, Chile, IOS Press Amsterdam
-
R. Baeza-Yates and C. Castillo. Balancing volume, quality and freshness in web crawling. In Soft Computing Systems - Design, Management and Applications, pages 565-572, Santiago, Chile, 2002. IOS Press Amsterdam.
-
(2002)
Soft Computing Systems - Design, Management and Applications
, pp. 565-572
-
-
Baeza-Yates, R.1
Castillo, C.2
-
7
-
-
35048834240
-
Crawling the infinite Web: Five levels are enough
-
Proceedings of the third Workshop on Web Graphs (WAW), Rome, Italy, October Springer
-
R. Baeza-Yates and C. Castillo. Crawling the infinite Web: five levels are enough. In Proceedings of the third Workshop on Web Graphs (WAW), volume 3243 of Lecture Notes in Computer Science, pages 156-167, Rome, Italy, October 2004. Springer.
-
(2004)
Lecture Notes in Computer Science
, vol.3243
, pp. 156-167
-
-
Baeza-Yates, R.1
Castillo, C.2
-
9
-
-
84958778546
-
Web structure, dynamics and page quality
-
Proceedings of String Processing and Information Retrieval (SPIRE), Lisbon, Portugal, Springer
-
R. Baeza-Yates, F. Saint-Jean, and C. Castillo. Web structure, dynamics and page quality. In Proceedings of String Processing and Information Retrieval (SPIRE), volume 2476 of Lecture Notes in Computer Science, pages 117-132, Lisbon, Portugal, 2002. Springer.
-
(2002)
Lecture Notes in Computer Science
, vol.2476
, pp. 117-132
-
-
Baeza-Yates, R.1
Saint-Jean, F.2
Castillo, C.3
-
10
-
-
3042680184
-
UbiCrawler: A scalable fully distributed Web crawler
-
P. Boldi, B. Codenotti, M. Santini, and S. Vigna. UbiCrawler: a scalable fully distributed Web crawler. Software, Practice and Experience, 34(8):711-726, 2004.
-
(2004)
Software, Practice and Experience
, vol.34
, Issue.8
, pp. 711-726
-
-
Boldi, P.1
Codenotti, B.2
Santini, M.3
Vigna, S.4
-
11
-
-
35048887031
-
Do your worst to make the best: Paradoxical effects in pagerank incremental computations
-
Proceedings of the third Workshop on Web Graphs (WAW), Rome, Italy, October Springer
-
P. Boldi, M. Santini, and S. Vigna. Do your worst to make the best: Paradoxical effects in pagerank incremental computations. In Proceedings of the third Workshop on Web Graphs (WAW), volume 3243 of Lecture Notes in Computer Science, pages 168-180, Rome, Italy, October 2004. Springer.
-
(2004)
Lecture Notes in Computer Science
, vol.3243
, pp. 168-180
-
-
Boldi, P.1
Santini, M.2
Vigna, S.3
-
12
-
-
2442529470
-
Crawler-friendly web servers
-
O. Brandman, J. Cho, H. Garcia-Molina, and N. Shivakumar. Crawler-friendly web servers. In Proceedings of the Workshop on Performance and Architecture of Web Servers (PAWS), Santa Clara, California, USA, June 2000.
-
Proceedings of the Workshop on Performance and Architecture of Web Servers (PAWS), Santa Clara, California, USA, June 2000
-
-
Brandman, O.1
Cho, J.2
Garcia-Molina, H.3
Shivakumar, N.4
-
13
-
-
0038589165
-
The anatomy of a large-scale hypertextual Web search engine
-
April
-
S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1-7):107-117, April 1998.
-
(1998)
Computer Networks and ISDN Systems
, vol.30
, Issue.1-7
, pp. 107-117
-
-
Brin, S.1
Page, L.2
-
15
-
-
0342652248
-
Crawling towards eternity - Building an archive of the world wide web
-
May
-
M. Burner. Crawling towards eternity - building an archive of the world wide web. Web Techniques, 2(5), May 1997.
-
(1997)
Web Techniques
, vol.2
, Issue.5
-
-
Burner, M.1
-
18
-
-
0033294474
-
Focused crawling: A new approach to topic-specific web resource discovery
-
S. Chakrabarti, M. van den Berg, and B. Dom. Focused crawling: a new approach to topic-specific web resource discovery. Computer Networks, 31(11-16):1623-1640, 1999.
-
(1999)
Computer Networks
, vol.31
, Issue.11-16
, pp. 1623-1640
-
-
Chakrabarti, S.1
Van Den Berg, M.2
Dom, B.3
-
20
-
-
0041032411
-
Synchronizing a database to improve freshness
-
Dallas, Texas, USA, May
-
J. Cho and H. Garcia-Molina. Synchronizing a database to improve freshness. In Proceedings of ACM International Conference on Management of Data (SIGMOD), pages 117-128, Dallas, Texas, USA, May 2000.
-
(2000)
Proceedings of ACM International Conference on Management of Data (SIGMOD)
, pp. 117-128
-
-
Cho, J.1
Garcia-Molina, H.2
-
21
-
-
67649866504
-
Parallel crawlers
-
Honolulu, Hawaii, USA, May ACM Press
-
J. Cho and H. Garcia-Molina. Parallel crawlers. In Proceedings of the eleventh international conference on World Wide Web, pages 124-135, Honolulu, Hawaii, USA, May 2002. ACM Press.
-
(2002)
Proceedings of the Eleventh International Conference on World Wide Web
, pp. 124-135
-
-
Cho, J.1
Garcia-Molina, H.2
-
24
-
-
33745749111
-
Performance and cost tradeoffs in web search
-
Dunedin, New Zealand, January
-
N. Craswell, F. Crimmins, D. Hawking, and A. Moffat. Performance and cost tradeoffs in web search. In Proceedings of the 15th Australasian Database Conference, pages 161-169, Dunedin, New Zealand, January 2004.
-
(2004)
Proceedings of the 15th Australasian Database Conference
, pp. 161-169
-
-
Craswell, N.1
Crimmins, F.2
Hawking, D.3
Moffat, A.4
-
25
-
-
0034925218
-
Efficient Web searching using temporal factors
-
A. Czumaj, I. Finch, L. Gasieniec, A. Gibbons, P. Leng, W. Rytter, and M. Zito. Efficient Web searching using temporal factors. Theoretical Computer Science, 262(1-2):569-582, 2001.
-
(2001)
Theoretical Computer Science
, vol.262
, Issue.1-2
, pp. 569-582
-
-
Czumaj, A.1
Finch, I.2
Gasieniec, L.3
Gibbons, A.4
Leng, P.5
Rytter, W.6
Zito, M.7
-
26
-
-
34547500958
-
Cobweb - A crawler for the brazilian web
-
Cancun, Mexico, September IEEE CS Press
-
A. S. da Silva, E. A. Veloso, P. B. Golgher, B. A. Ribeiro-Neto, A. H. F. Laender, and N. Ziviani. Cobweb - a crawler for the brazilian web. In Proceedings of String Processing and Information Retrieval (SPIRE), pages 184-191, Cancun, Mexico, September 1999. IEEE CS Press.
-
(1999)
Proceedings of String Processing and Information Retrieval (SPIRE)
, pp. 184-191
-
-
Da Silva, A.S.1
Veloso, E.A.2
Golgher, P.B.3
Ribeiro-Neto, B.A.4
Laender, A.H.F.5
Ziviani, N.6
-
27
-
-
77953035448
-
-
WebBase
-
L. Dacharay. WebBase. http://freesoftware.fsf.org/webbase/, 2002. GPL Software.
-
(2002)
GPL Software
-
-
Dacharay, L.1
-
28
-
-
70350672544
-
Focused crawling using context graphs
-
Cairo, Egypt, September
-
M. Diligenti, F. Coetzee, S. Lawrence, C. L. Giles, and M. Gori. Focused crawling using context graphs. In Proceedings of 26th International Conference on Very Large Databases (VLDB), pages 527-534, Cairo, Egypt, September 2000.
-
(2000)
Proceedings of 26th International Conference on Very Large Databases (VLDB)
, pp. 527-534
-
-
Diligenti, M.1
Coetzee, F.2
Lawrence, S.3
Giles, C.L.4
Gori, M.5
-
29
-
-
1242321053
-
Self-similarity in the web
-
S. Dill, R. Kumar, K. S. McCurley, S. Rajagopalan, D. Sivakumar, and A. Tomkins. Self-similarity in the web. ACM Trans. Inter. Tech., 2(3):205-223, 2002.
-
(2002)
ACM Trans. Inter. Tech.
, vol.2
, Issue.3
, pp. 205-223
-
-
Dill, S.1
Kumar, R.2
McCurley, K.S.3
Rajagopalan, S.4
Sivakumar, D.5
Tomkins, A.6
-
30
-
-
0002371171
-
Optimal robot scheduling for web search engines
-
E. G. Coffman Jr., Z. Liu, and R. R. Weber. Optimal robot scheduling for web search engines. Journal of Scheduling, 1(1):15-29, 1998.
-
(1998)
Journal of Scheduling
, vol.1
, Issue.1
, pp. 15-29
-
-
Coffman Jr., E.G.1
Liu, Z.2
Weber, R.R.3
-
31
-
-
84874252492
-
An adaptive model for optimizing performance of an incremental web crawler
-
Hong Kong, May Elsevier Science
-
J. Edwards, K. S. McCurley, and J. A. Tomlin. An adaptive model for optimizing performance of an incremental web crawler. In Proceedings of the Tenth Conference on World Wide Web, pages 106-113, Hong Kong, May 2001. Elsevier Science.
-
(2001)
Proceedings of the Tenth Conference on World Wide Web
, pp. 106-113
-
-
Edwards, J.1
McCurley, K.S.2
Tomlin, J.A.3
-
34
-
-
27344433890
-
Spam, damn spam, and statistics: Using statistical analysis to locate spam Web pages
-
D. Fetterly, M. Manasse, and M. Najork. Spam, damn spam, and statistics: Using statistical analysis to locate spam Web pages. In Proceedings of the seventh workshop on the Web and databases (WebDB), June 2004.
-
Proceedings of the Seventh Workshop on the Web and Databases (WebDB), June 2004
-
-
Fetterly, D.1
Manasse, M.2
Najork, M.3
-
36
-
-
0033705620
-
On near-uniform url sampling
-
Amsterdam, Netherlands, May Elsevier Science
-
M. Henzinger, A. Heydon, M. Mitzenmacher, and M. Najork. On near-uniform url sampling. In Proceedings of the Ninth Conference on World Wide Web, pages 295-308, Amsterdam, Netherlands, May 2000. Elsevier Science.
-
(2000)
Proceedings of the Ninth Conference on World Wide Web
, pp. 295-308
-
-
Henzinger, M.1
Heydon, A.2
Mitzenmacher, M.3
Najork, M.4
-
37
-
-
20444391066
-
The shark-search algorithm. An application: Tailored Web site mapping
-
Elsevier Science, April
-
M. Hersovici, M. Jacovi, Y. S. Maarek, D. Pelleg, M. Shtalhaim, and S. Ur. The shark-search algorithm. An application: tailored Web site mapping. In Proceedings of the seventh conference on World Wide Web, pages 317-326. Elsevier Science, April 1998.
-
(1998)
Proceedings of the Seventh Conference on World Wide Web
, pp. 317-326
-
-
Hersovici, M.1
Jacovi, M.2
Maarek, Y.S.3
Pelleg, D.4
Shtalhaim, M.5
Ur, S.6
-
38
-
-
79951675059
-
Mercator: A scalable, extensible web crawler
-
April
-
A. Heydon and M. Najork. Mercator: A scalable, extensible web crawler. World Wide Web Conference, 2(4):219-229, April 1999.
-
(1999)
World Wide Web Conference
, vol.2
, Issue.4
, pp. 219-229
-
-
Heydon, A.1
Najork, M.2
-
40
-
-
0040511952
-
Robots in the web: Threat or treat?
-
April
-
M. Koster. Robots in the web: threat or treat? ConneXions, 9(4), April 1995.
-
(1995)
ConneXions
, vol.9
, Issue.4
-
-
Koster, M.1
-
42
-
-
0000806922
-
Automating the construction of internet portals with machine learning
-
A. K. McCallum, K. Nigam, J. Rennie, and K. Seymore. Automating the construction of internet portals with machine learning. Information Retrieval, 3(2):127-163, 2000.
-
(2000)
Information Retrieval
, vol.3
, Issue.2
, pp. 127-163
-
-
McCallum, A.K.1
Nigam, K.2
Rennie, J.3
Seymore, K.4
-
43
-
-
0034794539
-
Evaluating topic-driven web crawlers
-
ACM Press
-
F. Menczer, G. Pant, P. Srinivasan, and M. E. Ruiz. Evaluating topic-driven web crawlers. In Proceedings of the 24th conference on research and development in information retrieval (SIGIR), pages 241-249. ACM Press, 2001.
-
(2001)
Proceedings of the 24th Conference on Research and Development in Information Retrieval (SIGIR)
, pp. 241-249
-
-
Menczer, F.1
Pant, G.2
Srinivasan, P.3
Ruiz, M.E.4
-
44
-
-
77953067393
-
Sphinx: A framework for creating personal, site-specific web crawlers
-
Elsevier Science
-
R. Miller and K. Bharat. Sphinx: A framework for creating personal, site-specific web crawlers. In Proceedings of the seventh conference on World Wide Web, Brisbane, Australia, April 1998. Elsevier Science.
-
Proceedings of the Seventh Conference on World Wide Web, Brisbane, Australia, April 1998
-
-
Miller, R.1
Bharat, K.2
-
46
-
-
15844394231
-
What's new on the web?: The evolution of the web from a search engine perspective
-
New York, NY, USA, May ACM Press
-
A. Ntoulas, J. Cho, and C. Olston. What's new on the web?: the evolution of the web from a search engine perspective. In Proceedings of the 13th conference on World Wide Web, pages 1-12, New York, NY, USA, May 2004. ACM Press.
-
(2004)
Proceedings of the 13th Conference on World Wide Web
, pp. 1-12
-
-
Ntoulas, A.1
Cho, J.2
Olston, C.3
-
47
-
-
0003780986
-
-
Technical report, Stanford Digital Library Technologies Project
-
L. Page, S. Brin, R. Motwani, and T. Winograd. The Pagerank citation algorithm: bringing order to the web. Technical report, Stanford Digital Library Technologies Project, 1998.
-
(1998)
The Pagerank Citation Algorithm: Bringing Order to the Web
-
-
Page, L.1
Brin, S.2
Motwani, R.3
Winograd, T.4
-
48
-
-
35048813582
-
Search engine-crawler symbiosis
-
Proceedings of the European Conference on Digital Libraries (ECDL), Springer, August
-
G. Pant, S. Bradshaw, and F. Menczer. Search engine-crawler symbiosis. In Proceedings of the European Conference on Digital Libraries (ECDL), volume 2769 of Lecture Notes in Computer Science, pages 221-232. Springer, August 2003.
-
(2003)
Lecture Notes in Computer Science
, vol.2769
, pp. 221-232
-
-
Pant, G.1
Bradshaw, S.2
Menczer, F.3
-
51
-
-
0037150740
-
Search engines and web dynamics
-
June
-
K. M. Risvik and R. Michelsen. Search engines and web dynamics. Computer Networks, 39(3), June 2002.
-
(2002)
Computer Networks
, vol.39
, Issue.3
-
-
Risvik, K.M.1
Michelsen, R.2
-
52
-
-
0036204395
-
Design and implementation of a high-performance distributed web crawler
-
San Jose, California, February IEEE CS Press
-
V. Shkapenyuk and T. Suel. Design and implementation of a high-performance distributed web crawler. In Proceedings of the 18th International Conference on Data Engineering (ICDE), pages 357-368, San Jose, California, February 2002. IEEE CS Press.
-
(2002)
Proceedings of the 18th International Conference on Data Engineering (ICDE)
, pp. 357-368
-
-
Shkapenyuk, V.1
Suel, T.2
-
53
-
-
0034826587
-
Controlling the robots of web search engines
-
Cambridge, Massachusetts, USA, June
-
J. Talim, Z. Liu, P. Nain, and E. G. Coffman Jr. Controlling the robots of web search engines. In Proceedings of ACM Joint International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS/Performance), pages 236-244, Cambridge, Massachusetts, USA, June 2001.
-
(2001)
Proceedings of ACM Joint International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS/Performance)
, pp. 236-244
-
-
Talim, J.1
Liu, Z.2
Nain, P.3
Coffman Jr., E.G.4
-
54
-
-
0036109905
-
Discovery of web robot sessions based on their navigational patterns
-
DOI 10.1023/A:1013228602957
-
P.-N. Tan and V. Kumar. Discovery of web robot sessions based on their navigational patterns. Data Mining and Knowledge Discovery, 6(1):9-35, 2002.
-
(2002)
Data Mining and Knowledge Discovery
, vol.6
, Issue.1
, pp. 9-35
-
-
Tan, P.-N.1
Kumar, V.2
-
55
-
-
0003962632
-
-
Country Profiles
-
The Economist. Country Profiles, 2002.
-
(2002)
The Economist
-
-
-
56
-
-
77953073641
-
-
United Nations. Population Division, 2002
-
United Nations. Population Division, 2002.
-
-
-
-
58
-
-
84937389622
-
Design and implementation of a distributed crawler and filtering processor
-
Proceedings of the fifth Next Generation Information Technologies and Systems (NGITS), Caesarea, Israel, June Springer
-
D. Zeinalipour-Yazti and M. D. Dikaiakos. Design and implementation of a distributed crawler and filtering processor. In Proceedings of the fifth Next Generation Information Technologies and Systems (NGITS), volume 2382 of Lecture Notes in Computer Science, pages 58-74, Caesarea, Israel, June 2002. Springer.
-
(2002)
Lecture Notes in Computer Science
, vol.2382
, pp. 58-74
-
-
Zeinalipour-Yazti, D.1
Dikaiakos, M.D.2
-
59
-
-
35048868826
-
Making eigenvector-based reputation systems robust to collusion
-
Proceedings of the third Workshop on Web Graphs (WAW), Rome, Italy, October Springer
-
H. Zhang, A. Goel, R. Govindan, K. Mason, and B. V. Roy. Making eigenvector-based reputation systems robust to collusion. In Proceedings of the third Workshop on Web Graphs (WAW), volume 3243 of Lecture Notes in Computer Science, pages 92-104, Rome, Italy, October 2004. Springer.
-
(2004)
Lecture Notes in Computer Science
, vol.3243
, pp. 92-104
-
-
Zhang, H.1
Goel, A.2
Govindan, R.3
Mason, K.4
Roy, B.V.5