Volume , Issue , 2008, Pages 421-434

From dirt to shovels: Fully automatic tool generation from ad hoc data

Author keywords

ad hoc data; data description languages; grammar induction; tool generation

Indexed keywords

AD HOC DATUM; AUTOMATIC TOOLS; DATA ANALYSIS TOOL; DATA DESCRIPTION LANGUAGES; FINANCIAL ANALYSTS; FORMAT-INDEPENDENT; GRAMMAR INDUCTION; HUMAN INTERVENTION; INFERENCE ALGORITHM; PROCESSING TOOLS; SEMI STRUCTURED DATA; SEMI-STRUCTURED QUERIES; SOFTWARE INFRASTRUCTURE; SYSTEMS ADMINISTRATOR; TECHNICAL CONTRIBUTION; TOOL GENERATION; TRAINING DATA; TRANSFORMATION TOOLS;

EID: 84865636979     PISSN: 07308566     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/1328438.1328488     Document Type: Conference Paper
Times cited : (45)

References (37)
  • 1
    • 84976832596
    • Inference of reversible languages
    • Dana Angluin. Inference of reversible languages. Journal of the ACM, 29 (3):741-765, 1982.
    • (1982) Journal of the ACM , vol.29 , Issue.3 , pp. 741-765
    • Angluin, D.1
  • 2
    • 1142303684
    • Extracting structured data from web pages
    • Arvind Arasu and Hector Garcia-Molina. Extracting structured data from web pages. In SIGMOD, pages 337-348, 2003.
    • (2003) SIGMOD , pp. 337-348
    • Arasu, A.1    Garcia-Molina, H.2
  • 3
    • 84893833429
    • Inference of concise DTDs from XML data
    • Geert Jan Bex, Frank Neven, Thomas Schwentick, and Karl Tuyls. Inference of concise DTDs from XML data. In VLDB, pages 115-126, 2006.
    • (2006) VLDB , pp. 115-126
    • Bex, G.J.1    Neven, F.2    Schwentick, T.3    Tuyls, K.4
  • 4
    • 84882718707
    • Inferring XML schema definitions from XML data
    • Geert Jan Bex, Frank Neven, and Stijn Vansummeren. Inferring XML schema definitions from XML data. In VLDB, pages 998-1009, 2007.
    • (2007) VLDB , pp. 998-1009
    • Bex, G.J.1    Neven, F.2    Vansummeren, S.3
  • 5
    • 0034832365
    • Automatic segmentation of text into structured records
    • New York, NY, USA
    • Vinayak Borkar, Kaustubh Deshmukh, and Sunita Sarawagi. Automatic segmentation of text into structured records. In SIGMOD, pages 175-186, New York, NY, USA, 2001.
    • (2001) SIGMOD , pp. 175-186
    • Borkar, V.1    Deshmukh, K.2    Sarawagi, S.3
  • 8
    • 84944327150
    • Roadrunner: Towards automatic data extraction from large web sites
    • San Francisco, CA, USA
    • Valter Crescenzi, Giansalvatore Mecca, and Paolo Merialdo. Roadrunner: Towards automatic data extraction from large web sites. In VLDB, pages 109-118, San Francisco, CA, USA, 2001.
    • (2001) VLDB , pp. 109-118
    • Crescenzi, V.1    Mecca, G.2    Merialdo, P.3
  • 9
    • 0742267047
    • Learning regular languages using RFSAs
    • François Denis, Aurélien Lemay, and Alain Terlutte. Learning regular languages using RFSAs. Theoretical Computer Science, 313(2):267-294, 2004.
    • (2004) Theoretical Computer Science , vol.313 , Issue.2 , pp. 267-294
    • Denis, F.1    Lemay, A.2    Terlutte, A.3
  • 10
    • 34250684633
    • PADX: Querying large-scale ad hoc data with XQuery
    • January
    • Mary F. Fernández, Kathleen Fisher, Robert Gruber, and Yitzhak Mandelbaum. PADX: Querying large-scale ad hoc data with XQuery. In PLANX, January 2006.
    • (2006) PLANX
    • Fernández, M.F.1    Fisher, K.2    Gruber, R.3    Mandelbaum, Y.4
  • 11
    • 84899424034
    • Learning XML grammars
    • Henning Fernau. Learning XML grammars. In MLDM, pages 73-87, 2001.
    • (2001) MLDM , pp. 73-87
    • Fernau, H.1
  • 12
    • 31844436571
    • PADS: A domain specific language for processing ad hoc data
    • June
    • Kathleen Fisher and Robert Gruber. PADS: A domain specific language for processing ad hoc data. In PLDI, pages 295-304, June 2005.
    • (2005) PLDI , pp. 295-304
    • Fisher, K.1    Gruber, R.2
  • 13
    • 33745830891
    • The next 700 data description languages
    • January
    • Kathleen Fisher, Yitzhak Mandelbaum, and David Walker. The next 700 data description languages. In POPL, January 2006.
    • (2006) POPL
    • Fisher, K.1    Mandelbaum, Y.2    Walker, D.3
  • 14
    • 0000216094
    • XTRACT: A system for extracting document type descriptors from XML documents
    • Minos N. Garofalakis, Aristides Gionis, Rajeev Rastogi, S. Seshadri, and Kyuseok Shim. XTRACT: A system for extracting document type descriptors from XML documents. In SIGMOD, pages 165-176, 2000.
    • (2000) SIGMOD , pp. 165-176
    • Garofalakis, M.N.1    Gionis, A.2    Rastogi, R.3    Seshadri, S.4    Shim, K.5
  • 15
    • 49949150022
    • Language identification in the limit
    • E. M. Gold. Language identification in the limit. Information and Control, 10(5):447-474, 1967.
    • (1967) Information and Control , vol.10 , Issue.5 , pp. 447-474
    • Gold, E.M.1
  • 18
    • 0345201769
    • TANE: An efficient algorithm for discovering functional and approximate dependencies
    • Ykä Huhtala, Juha Kärkkäinen, Pasi Porkka, and Hannu Toivonen. TANE: An efficient algorithm for discovering functional and approximate dependencies. The Computer Journal, 42(2):100-111, 1999.
    • (1999) The Computer Journal , vol.42 , Issue.2 , pp. 100-111
    • Huhtala, Y.1    Kärkkäinen, J.2    Porkka, P.3    Toivonen, H.4
  • 20
    • 0003614560
    • PhD thesis, University of Washington, Department of Computer Science and Engineering
    • N. Kushmerick. Wrapper induction for information extraction. PhD thesis, University of Washington, 1997. Department of Computer Science and Engineering.
    • (1997) Wrapper Induction for Information Extraction
    • Kushmerick, N.1
  • 21
    • 0001776223
    • Wrapper induction for information extraction
    • Nicholas Kushmerick, Daniel S. Weld, and Robert B. Doorenbos. Wrapper induction for information extraction. In IJCAI, pages 729-737, 1997.
    • (1997) IJCAI , pp. 729-737
    • Kushmerick, N.1    Weld, D.S.2    Doorenbos, R.B.3
  • 22
    • 3142742483
    • Using the structure of web sites for automatic segmentation of tables
    • New York, NY, USA
    • Kristina Lerman, Lise Getoor, Steven Minton, and Craig Knoblock. Using the structure of web sites for automatic segmentation of tables. In SIGMOD, pages 119-130, New York, NY, USA, 2004.
    • (2004) SIGMOD , pp. 119-130
    • Lerman, K.1    Getoor, L.2    Minton, S.3    Knoblock, C.4
  • 23
    • 0025952277
    • Divergence measures based on the Shannon entropy
    • J. Lin. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1):145-151, 1991.
    • (1991) IEEE Transactions on Information Theory , vol.37 , Issue.1 , pp. 145-151
    • Lin, J.1
  • 26
    • 84880839103
    • Active learning with strong and weak views: A case study on wrapper induction
    • Ion Muslea, Steve Minton, and Craig Knoblock. Active learning with strong and weak views: A case study on wrapper induction. In IJCAI, pages 415-420, 2003.
    • (2003) IJCAI , pp. 415-420
    • Muslea, I.1    Minton, S.2    Knoblock, C.3
  • 27
    • 85094395318
    • Learning to recognize tables in free text
    • Morristown, NJ, USA
    • Hwee Tou Ng, Chung Yong Lim, and Jessica Li Teng Koo. Learning to recognize tables in free text. In ACL, pages 443-450, Morristown, NJ, USA, 1999.
    • (1999) ACL , pp. 443-450
    • Ng, H.T.1    Lim, C.Y.2    Koo, J.L.T.3
  • 29
    • 67650153199
    • PADS Project. PADS project. http://www.padsproj.org/, 2007.
    • (2007) PADS Project
  • 30
    • 1542287488
    • Table extraction using conditional random fields
    • New York, NY, USA
    • David Pinto, Andrew McCallum, Xing Wei, and W. Bruce Croft. Table extraction using conditional random fields. In SIGIR, pages 235-242, New York, NY, USA, 2003.
    • (2003) SIGIR , pp. 235-242
    • Pinto, D.1    McCallum, A.2    Wei, X.3    Croft, W.B.4
  • 31
    • 33646425033
    • Learning (k, l)-contextual tree languages for information extraction
    • Stefan Raeymaekers, Maurice Bruynooghe, and Jan Van den Bussche. Learning (k, l)-contextual tree languages for information extraction. In ECML, pages 305-316, 2005.
    • (2005) ECML , pp. 305-316
    • Raeymaekers, S.1    Bruynooghe, M.2    Van Den Bussche, J.3
  • 32
    • 84944315993
    • Potter's wheel: An interactive data cleaning system
    • Vijayshankar Raman and Joseph M. Hellerstein. Potter's wheel: An interactive data cleaning system. In VLDB, pages 381-390, 2001.
    • (2001) VLDB , pp. 381-390
    • Raman, V.1    Hellerstein, J.M.2
  • 33
    • 0010645626
    • The Rufus system: Information organization for semi-structured data
    • San Francisco, CA, USA
    • Kurt A. Shoens, Allen Luniewski, Peter M. Schwarz, James W. Stamos, and Joachim Thomas II. The Rufus system: Information organization for semi-structured data. In VLDB, pages 97-107, San Francisco, CA, USA, 1993.
    • (1993) VLDB , pp. 97-107
    • Shoens, K.A.1    Luniewski, A.2    Schwarz, P.M.3    Stamos, J.W.4    Thomas II, J.5
  • 34
    • 0032624184
    • Learning information extraction rules for semistructured and free text
    • Stephen Soderland. Learning information extraction rules for semistructured and free text. Machine Learning, 34(1-3):233-272, 1999.
    • (1999) Machine Learning , vol.34 , Issue.1-3 , pp. 233-272
    • Soderland, S.1
  • 35
    • 84946083373
    • Inducing probabilistic grammars by Bayesian model merging
    • Andreas Stolcke and Stephen Omohundro. Inducing probabilistic grammars by Bayesian model merging. In ICGI, pages 106-118, 1994.
    • (1994) ICGI , pp. 106-118
    • Stolcke, A.1    Omohundro, S.2
  • 36
    • 84946757877
    • Grammatical inference: An introductory survey
    • Enrique Vidal. Grammatical inference: An introductory survey. In ICGI, pages 1-4, 1994.
    • (1994) ICGI , pp. 1-4
    • Vidal, E.1
  • 37
    • 0016129937
    • Approximate language identification
    • R. M. Wharton. Approximate language identification. Information and Control, 26(3):236-255, 1974.
    • (1974) Information and Control , vol.26 , Issue.3 , pp. 236-255
    • Wharton, R.M.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.