Volume 13, Issue 1, 2010, Pages 70-95

Estimating deep web data source size by capture-recapture method

Author keywords

Capture recapture; Deep web; Estimators

EID: 76349085418     PISSN: 13864564     EISSN: 15737659     Source Type: Journal    
DOI: 10.1007/s10791-009-9107-y     Document Type: Article
Times cited: 25

References (37)
  • 2
    • 34547475354
    • Siphoning hidden-web data through keyword-based interfaces
    • Barbosa, L., & Freire, J. (2004). Siphoning hidden-web data through keyword-based interfaces. In Proceedings of SBBD, 2004.
    • (2004) Proceedings of SBBD, 2004
    • Barbosa, L.1    Freire, J.2
  • 3
    • 34250665891
    • Random sampling from a search engine's index
    • Bar-Yossef, Z., & Gurevich, M. (2006). Random sampling from a search engine's index. In Proceedings of WWW, 2006, pp. 367-376.
    • (2006) Proceedings of WWW, 2006 , pp. 367-376
    • Bar-Yossef, Z.1    Gurevich, M.2
  • 6
    • 20444387298
    • A technique for measuring the relative size and overlap of public Web search engines
    • Bharat, K., & Broder, A. (1998). A technique for measuring the relative size and overlap of public Web search engines. In Proceedings of WWW, 1998, pp. 379-388.
    • (1998) Proceedings of WWW, 1998 , pp. 379-388
    • Bharat, K.1    Broder, A.2
  • 7
    • 84949490837
    • Can we correctly estimate the total number of pages in Google for a specific language?
    • Bolshakov, I. A., & Galicia-Haro, S. N. (2003). Can we correctly estimate the total number of pages in Google for a specific language? CICLing 2003, pp. 415-419.
    • (2003) CICLing 2003 , pp. 415-419
    • Bolshakov, I.A.1    Galicia-Haro, S.N.2
  • 10
    • 2442546444
    • Probe, cluster, and discover: Focused extraction of QA-pagelets from the deep web
    • Caverlee, J., Liu, L., & Buttler, D. (2004). Probe, cluster, and discover: Focused extraction of QA-pagelets from the deep web. In Proceedings of ICDE 2004, pp. 103-114.
    • (2004) Proceedings of ICDE 2004 , pp. 103-114
    • Caverlee, J.1    Liu, L.2    Buttler, D.3
  • 12
    • 84944327150
    • RoadRunner: Towards automatic data extraction from large web sites
    • Crescenzi, V., Mecca, G., & Merialdo, P. (2001). RoadRunner: Towards automatic data extraction from large web sites. In Proceedings of VLDB 2001, pp. 109-118.
    • (2001) Proceedings of VLDB 2001 , pp. 109-118
    • Crescenzi, V.1    Mecca, G.2    Merialdo, P.3
  • 13
    • 0000774305
    • The multiple-recapture census: I. Estimation of a closed population
    • Darroch, J. N. (1958). The multiple-recapture census: I. Estimation of a closed population. Biometrika, 45(3/4), 343-359.
    • (1958) Biometrika , vol.45 , Issue.3-4 , pp. 343-359
    • Darroch, J.N.1
  • 14
    • 12244289051
    • How large is the World Wide Web?
    • Springer
    • Dobra, A., & Fienberg, S. (2004). How large is the World Wide Web? Web Dynamics, Springer, pp. 23-44.
    • (2004) Web Dynamics , pp. 23-44
    • Dobra, A.1    Fienberg, S.2
  • 15
    • 77953071782
    • The indexable web is more than 11.5 billion pages
    • Gulli, A., & Signorini, A. (2005). The indexable web is more than 11.5 billion pages. In Proceedings of WWW 2005, pp. 902-903.
    • (2005) Proceedings of WWW, 2005 , pp. 902-903
    • Gulli, A.1    Signorini, A.2
  • 16
    • 0002812751
    • Sampling-based estimation of the number of distinct values of an attribute
    • Haas, P. J., Naughton, J. F., Seshadri, S., & Stokes, L. (1995). Sampling-based estimation of the number of distinct values of an attribute. In Proceedings of VLDB 1995, pp. 311-322.
    • (1995) Proceedings of VLDB 1995 , pp. 311-322
    • Haas, P.J.1    Naughton, J.F.2    Seshadri, S.3    Stokes, L.4
  • 18
    • 0018442360
    • A unified approach to limit theorems for urn models
    • Holst, L. (1979). A unified approach to limit theorems for urn models. Journal of Applied Probability, 16(1), 154-162.
    • (1979) Journal of Applied Probability , vol.16 , Issue.1 , pp. 154-162
    • Holst, L.1
  • 20
    • 0002781191
    • Accurately and reliably extracting data from the web: A machine learning approach
    • Knoblock, C. A., Lerman, K., Minton, S., & Muslea, I. (2000). Accurately and reliably extracting data from the web: A machine learning approach. IEEE Data Engineering Bulletin, 23(4), 33-41.
    • (2000) IEEE Data Engineering Bulletin , vol.23 , Issue.4 , pp. 33-41
    • Knoblock, C.A.1    Lerman, K.2    Minton, S.3    Muslea, I.4
  • 23
    • 0037818401
    • Discovering the representative of a search engine
    • Liu, K., Yu, C., & Meng, W. (2002). Discovering the representative of a search engine. In Proceedings of CIKM'02, pp. 652-654.
    • (2002) Proceedings of CIKM'02 , pp. 652-654
    • Liu, K.1    Yu, C.2    Meng, W.3
  • 24
    • 70349243672
    • Efficient estimation of the size of text deep web data source
    • Lu, J. (2008). Efficient estimation of the size of text deep web data source. In Proceedings of CIKM 2008, pp. 1485-1486.
    • (2008) Proceedings of CIKM 2008 , pp. 1485-1486
    • Lu, J.1
  • 27
    • 27544458897
    • Downloading textual hidden web content through keyword queries
    • Ntoulas, A., Zerfos, P., & Cho, J. (2005). Downloading textual hidden web content through keyword queries. In Proceedings of JCDL, 2005, pp. 100-109.
    • (2005) Proceedings of JCDL, 2005 , pp. 100-109
    • Ntoulas, A.1    Zerfos, P.2    Cho, J.3
  • 28
    • 0025659654
    • Statistical inference for capture-recapture experiments
    • Pollock, K. H., Nichols, J. D., Brownie, C., & Hines, J. E. (1990). Statistical inference for capture-recapture experiments. The Wildlife Society. Wildlife Monographs, 107, 3-97.
    • (1990) Wildlife Monographs , vol.107 , pp. 3-97
    • Pollock, K.H.1    Nichols, J.D.2    Brownie, C.3    Hines, J.E.4
  • 32
    • 33750285514
    • Capturing collection size for distributed non-cooperative retrieval
    • Shokouhi, M., Zobel, J., Scholer, F., & Tahaghoghi, S. M. M. (2006). Capturing collection size for distributed non-cooperative retrieval. In Proceedings of SIGIR'06, pp. 316-323.
    • (2006) Proceedings of SIGIR'06 , pp. 316-323
    • Shokouhi, M.1    Zobel, J.2    Scholer, F.3    Tahaghoghi, S.M.M.4
  • 33
    • 1542347745
    • Relevant document distribution estimation method for resource selection
    • Si, L., & Callan, J. (2003). Relevant document distribution estimation method for resource selection. In Proceedings of SIGIR'03.
    • (2003) Proceedings of SIGIR'03
    • Si, L.1    Callan, J.2
  • 34
    • 36448993278
    • Evaluating sampling methods for uncooperative collections
    • Thomas, P., & Hawking, D. (2007). Evaluating sampling methods for uncooperative collections. In Proceedings of SIGIR, 2007.
    • (2007) Proceedings of SIGIR, 2007
    • Thomas, P.1    Hawking, D.2
  • 36
    • 33749617417
    • Query selection techniques for efficient crawling of structured web sources
    • Wu, P., Wen, J.-R., Liu, H., & Ma, W.-Y. (2006). Query selection techniques for efficient crawling of structured web sources. In Proceedings of ICDE, 2006, pp. 47-56.
    • (2006) Proceedings of ICDE, 2006 , pp. 47-56
    • Wu, P.1    Wen, J.-R.2    Liu, H.3    Ma, W.-Y.4
  • 37
    • 36448957566
    • Estimating collection size with logistic regression
    • Xu, J., Wu, S., & Li, X. (2007). Estimating collection size with logistic regression. In Proceedings of SIGIR'07, pp. 789-790.
    • (2007) Proceedings of SIGIR'07 , pp. 789-790
    • Xu, J.1    Wu, S.2    Li, X.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.