



Volume 22, Issue 1, 2013, Pages 47-72

OXPath: A language for scalable data extraction, automation, and crawling on the deep web

Author keywords

AJAX; Automation; Crawling; Data extraction; DOM; Web applications; Web extraction; XPath

Indexed keywords

AUTOMATION; DATA MINING; EXTRACTION

EID: 84872659606     PISSN: 1066-8888     EISSN: 0949-877X     Source Type: Journal
DOI: 10.1007/s00778-012-0286-6     Document Type: Article
Times cited: 76

References (48)
  • 1
    • http://www.iopus.com/iMacros
  • 2
    • http://www.newprosoft.com/web-content-extractor.htm
  • 3
    • http://www.visualwebripper.com
  • 4
    • http://www.web-harvest.sourceforge.net
  • 5
    • http://www.w3.org/TR/CSS2/selector.html
  • 6
    • Alba, A., Bhagwan, V., Grandison, T.: Accessing the deep web: when good ideas go bad. In: OOPSLA (2008)
  • 7
    • Anton, T.: XPath-wrapper induction by generalizing tree traversal patterns. In: LWA (2005)
  • 9
    • Arocena, G.O., Mendelzon, A.O.: WebOQL: restructuring documents, databases, and webs. In: ICDE (1998)
  • 10
    • Badica, C., Badica, A., Popescu, E., Abraham, A.: L-wrappers: concepts, properties and construction: A declarative approach to data extraction from web sources. Soft Comput. 11(8), 753-772 (2007)
  • 12
    • Baumgartner, R., Flesca, S., Gottlob, G.: Visual web information extraction with Lixto. In: VLDB (2001)
  • 13
    • Benedikt, M., Koch, C.: XPath leashed. CSUR 41, 3:1-3:54 (2009)
  • 14
    • Bergman, M.K.: The deep web: surfacing hidden value. J. Electron. Publ. 7(1), 1-17 (2001)
  • 18
    • Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst. 30(1-7), 107-117 (1998)
  • 19
    • Cafarella, M.J., Halevy, A.Y., Wang, D.Z., Wu, E., Zhang, Y.: WebTables: exploring the power of tables on the web. PVLDB 1(1), 538-549 (2008)
  • 21
    • Chang, C.-H., Kayed, M., Girgis, M.R., Shaalan, K.F.: A survey of web information extraction systems. TKDE 18(10), 1411-1428 (2006)
  • 22
    • Crescenzi, V., Mecca, G., Merialdo, P.: RoadRunner: automatic data extraction from data-intensive web sites. In: SIGMOD (2002)
  • 25
    • Furche, T., Gottlob, G., Grasso, G., Schallhart, C., Sellers, A.: OXPath: a language for scalable, memory-efficient data extraction from web applications. PVLDB 4(11), 1016-1027 (2011)
  • 26
    • Gottlob, G., Koch, C., Pichler, R.: Efficient algorithms for processing XPath queries. In: TODS (2005)
  • 29
    • Heydon, A., Najork, M.: Mercator: a scalable, extensible web crawler. World Wide Web 2(4), 219-229 (1999)
  • 31
    • Leshed, G., Haber, E.M., Matthews, T., Lau, T.: CoScripter: automating & sharing how-to knowledge in the enterprise. In: CHI (2008)
  • 33
    • Liu, L., Pu, C., Han, W.: XWRAP: an XML-enabled wrapper construction system for web information sources. In: ICDE (2000)
  • 34
    • Liu, M., Ling, T.W.: A rule-based query language for HTML. In: DASFAA (2001)
  • 35
    • Marx, M.: Conditional XPath. ACM Trans. Database Syst. 30(4), 929-959 (2005)
  • 36
    • Marx, M., de Rijke, M.: Semantic characterizations of navigational XPath. ACM SIGMOD Rec. 34(2), 41-46 (2005)
  • 38
    • Mir, S., Staab, S., Rojas, I.: Web-Prospector: an automatic, sitewide wrapper induction approach for scientific deep-web databases. In: BTW (2009)
  • 40
    • Myllymaki, J.: Effective web data extraction with standard XML technologies. Comput. Netw. 39(5), 635-644 (2002)
  • 42
    • Raposo, J., Pan, A., Álvarez, M., Hidalgo, J., Viña, A.: The Wargo system: semi-automatic wrapper generation in presence of complex data access modes. In: DEXA (2002)
  • 43
    • Safonov, A.: Web macros by example: users managing the WWW of applications. In: CHI, pp. 71-72. ACM (1999)
  • 44
    • Sahuguet, A., Azavant, F.: Building light-weight wrappers for legacy web data-sources using W4F. In: VLDB, pp. 738-741 (1999)
  • 45
    • Sawa, N., Morishima, A., Sugimoto, S., Kitagawa, H.: Wraplet: wrapping your web contents with a lightweight language. In: SITIS, pp. 387-394 (2007)
  • 46
    • Shen, W., Doan, A., Naughton, J.F., Ramakrishnan, R.: Declarative information extraction using Datalog with embedded extraction predicates. In: VLDB (2007)
  • 47
    • Su, J.-Y., Sun, D.-J., Wu, I.-C., Chen, L.-P.: On design of browser-oriented data extraction system and plug-ins. J. Mar. Sci. Technol. 18(2), 189-200 (2010)


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.