[1] Internet Forum Software. http://en.wikipedia.org/wiki/category:internet-forum-software.
[5] R. A. Baeza-Yates, C. Castillo, M. Marin, and A. Rodriguez. Crawling a country: better strategies than breadth-first for Web page ordering. In WWW, pages 864-872, Chiba, Japan, May 2005.
[6] Z. Bar-Yossef, I. Keidar, and U. Schonfeld. Do not crawl in the DUST: different URLs with similar text. In WWW, pages 111-120, Banff, Alberta, Canada, May 2007.
[7] O. Brandman, J. Cho, H. Garcia-Molina, and N. Shivakumar. Crawler-friendly Web servers. ACM SIGMETRICS Performance Evaluation Review, 28(2):9-14, Sept. 2000.
[8] S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks, 30(1-7):107-117, 1998.
[9] A. Z. Broder, S. C. Glassman, M. S. Manasse, and G. Zweig. Syntactic clustering of the Web. In WWW, pages 1157-1166, Santa Clara, California, USA, Apr. 1997.
[10] D. Cai, S. Yu, J.-R. Wen, and W.-Y. Ma. VIPS: a vision based page segmentation algorithm. Microsoft Technical Report, MSR-TR-2003-79, 2003.
[11] S. Chakrabarti, M. van den Berg, and B. Dom. Focused crawling: a new approach to topic-specific Web resource discovery. Computer Networks, 31(11-16):1623-1640, 1999.
[13] V. Crescenzi, P. Merialdo, and P. Missier. Clustering Web pages based on their structure. Data & Knowledge Engineering, 54(3):279-299, Sept. 2005.
[14] M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In SCG, pages 253-262, NY, USA, Jun. 2004.
[16] Y. Guo, K. Li, K. Zhang, and G. Zhang. Board forum crawling: a Web crawling method for Web forum. In Proc. 2006 IEEE/WIC/ACM Int. Conf. Web Intelligence, pages 745-748, Hong Kong, Dec. 2006.
[17] M. Henzinger. Finding near-duplicate Web pages: a large-scale evaluation of algorithms. In SIGIR, pages 284-291, Seattle, Washington, USA, Aug. 2006.
[18] J. P. Lage, A. S. da Silva, P. B. Golgher, and A. H. F. Laender. Automatic generation of agents for collecting hidden Web pages for data extraction. Data & Knowledge Engineering, 49(2):177-196, May 2004.
[19] G. S. Manku, A. Jain, and A. D. Sarma. Detecting near-duplicates for Web crawling. In WWW, pages 141-150, Banff, Alberta, Canada, May 2007.
[20] S. Pandey and C. Olston. User-centric Web crawling. In WWW, pages 401-411, Chiba, Japan, May 2005.
[22] D. C. Reis, P. B. Golgher, A. S. Silva, and A. F. Laender. Automatic Web news extraction using tree edit distance. In WWW, pages 502-511, NY, USA, May 2004.
[24] R. Song, H. Liu, J.-R. Wen, and W.-Y. Ma. Learning important models for Web page blocks based on layout and content analysis. ACM SIGKDD Explorations Newsletter, 6(2):14-23, Dec. 2004.
[25] In SIGIR, pages 292-299, Seattle, USA, Aug. 2006.
[26] Y. Zhai and B. Liu. Structured data extraction from the Web based on partial tree alignment. IEEE Trans. Knowl. Data Eng., 18(12):1614-1628, Dec. 2006.
[27] J. Zhang, M. S. Ackerman, and L. Adamic. Expertise networks in online communities: structure and algorithms. In WWW, pages 221-230, Banff, Canada, May 2007.
[28] S. Zheng, R. Song, J.-R. Wen, and D. Wu. Joint optimization of wrapper generation and template detection. In KDD, pages 894-902, San Jose, CA, USA, Aug. 2007.