Volume 15, Issue 1, 2015, Pages 59-91

Loose, falling characters and sentences: The persistence of the OCR problem in digital repository E-Books

Author keywords

[No Author keywords available]


EID: 84961291743     PISSN: 15312542     EISSN: 15307131     Source Type: Journal
DOI: 10.1353/pla.2015.0005     Document Type: Article
Times cited: 7

References (85)
  • 2
    • Scopus ID: 34548291336
    • Metamorphosis: Remediation in Early English Books Online (EEBO)
    • Note: Diana Kichuk applies remediation to electronic resources in her article "Metamorphosis: Remediation in Early English Books Online (EEBO)," Literary and Linguistic Computing 22, 3 (2007): 297-300.
    • (2007) Literary and Linguistic Computing, vol. 22, issue 3, pp. 297-300
    • Kichuk, D.
  • 3
    • Scopus ID: 33747957440
    • The Places of Books in the Age of Electronic Reproduction
    • Note: Geoffrey Nunberg, "The Places of Books in the Age of Electronic Reproduction," Representations 42 (Spring 1993): 2.
    • (1993) Representations, vol. 42, p. 2
    • Nunberg, G.
  • 5
    • Scopus ID: 79959817869
    • A Library Without Walls
    • Note: See Robert Darnton, "A Library Without Walls," New York Review of Books, NYR Blog (October 4, 2010), accessed July 17, 2014, http://www.nybooks.com/blogs/nyrblog/2010/oct/04/library-without-walls/. Darnton proposed a National Digital Library in this seminal piece. He was instrumental in the founding of the Digital Public Library of America (DPLA), launched in 2013 with the overwhelming support and cooperation of library and other cultural communities in the United States.
    • (2010) New York Review of Books, NYR Blog
    • Darnton, R.
  • 7
    • Scopus ID: 84961333795
    • Note: Project Gutenberg, The Project Gutenberg License (2012), accessed April 26, 2013, http://www.gutenberg.org/wiki/Gutenberg:The_Project_Gutenberg_License.
    • (2012) The Project Gutenberg License
  • 9
    • Scopus ID: 84961328614
    • Digital Transformations and the Archival Nature of Surrogates
    • Note: Paul Conway, "Digital Transformations and the Archival Nature of Surrogates," Archival Science (online version: April 20, 2014).
    • (2014) Archival Science
    • Conway, P.
  • 10
    • Scopus ID: 84961381421
    • Note: (SI) Failed, The Problem with Being Born Again Is (n.d.), Internet Archive (IA) edition, accessed May 31, 2013, http://archive.org/details/TheProblemWithBeingBornAgainIs.
    • The Problem with Being Born Again Is
  • 11
    • Scopus ID: 84961329969
    • Note: Cantorion Sheet Music Collection (n.d.), Internet Archive edition, accessed May 31, 2013, http://archive.org/details/Cantorion_sheet_music_collection.
    • Cantorion Sheet Music Collection
  • 13
    • Scopus ID: 61449509039
    • Note: The Google Books preview version of Peter Shillingsburg's From Gutenberg to Google, cited later in this paper, is a good example. Chapter 1 begins immediately after the image of the cover page. The front matter and the Introduction, pp. 1-10, are missing. The preview does not flag this gap, although it notes page gaps elsewhere. Accessed May 31, 2013, http://books.google.ca/books?id=rd57F8IjyF0C&printsec=frontcover#v=onepage&q&f=false.
    • From Gutenberg to Google, pp. 1-10
    • Shillingsburg, P.
  • 14
    • Scopus ID: 79960081837
    • The Electronic Book
    • Note: Eileen Gardiner and Ronald G. Musto, "The Electronic Book," in The Oxford Companion to the Book, ed. Michael F. Suarez and H. R. Woudhuysen (Oxford: Oxford University Press, 2010), 271-84, reprinted with permission in the Wall Street Journal, March 4, 2010, accessed August 3, 2013, http://online.wsj.com/article/SB10001424052748704187204575102110426333220.html?mod=googlenews_wsj.
    • (2010) The Oxford Companion to the Book, pp. 271-284
    • Gardiner, E.; Musto, R.G.
  • 15
    • Scopus ID: 84961347714
    • Note: E. G. Lutz, What to Draw and How to Draw It (New York: Dodd, Mead, 1913), IA edition, accessed August 5, 2014, https://archive.org/details/whattodrawhowtod00lutz.
    • (1913) What to Draw and How to Draw It
    • Lutz, E.G.
  • 16
    • Scopus ID: 84961347714
    • Note: E. G. Lutz, What to Draw and How to Draw It (1913), Kindle edition, accessed August 5, 2014, http://www.amazon.com/What-draw-Edwin-George-Lutz/dp/1179652533 [follow Look Inside link].
    • (1913) What to Draw and How to Draw It
    • Lutz, E.G.
  • 17
    • Scopus ID: 0011472087
    • Note: Jane Austen, Pride and Prejudice (n.d.), Project Gutenberg edition, accessed August 3, 2013, http://www.gutenberg.org/ebooks/1342.
    • Pride and Prejudice
    • Austen, J.
  • 18
    • Scopus ID: 0011472087
    • Note: Austen, Pride and Prejudice (n.d.), IA edition, accessed August 1, 2014, https://archive.org/details/prideandprejudic01342gut.
    • Pride and Prejudice
    • Austen, J.
  • 21
    • Scopus ID: 34247373323
    • Note: Michael Hart, "The History and Philosophy of Project Gutenberg," 1992, Project Gutenberg, accessed April 26, 2013, http://www.gutenberg.org/wiki/Gutenberg:The_History_and_Philosophy_of_Project_Gutenberg_by_Michael_Hart.
    • (1992) The History and Philosophy of Project Gutenberg
    • Hart, M.
  • 23
    • Scopus ID: 84922178126
    • Note: DPLA, The Digital Public Library of America (n.d.), accessed April 26, 2013, http://dp.la/info/wp-content/uploads/2011/08/DPLA_PressKit_About-the-Digital-Public-Library-of-America_version-w-fact-box1.pdf.
    • (2013) The Digital Public Library of America
  • 24
    • Scopus ID: 84961348696
    • Search strategy: http://dp.la/search?q=alice%27s+adventures+in+wonderland.
    • Search strategy
  • 26
    • Scopus ID: 84961326777
    • CRL, Global Resources Network
    • Note: CRL, Global Resources Network, TRAC and TDR Checklists (2007), accessed August 12, 2014, http://www.crl.edu/archiving-preservation/digital-archives/metrics-assessing-and-certifying-0.
    • (2007) TRAC and TDR Checklists
  • 27
    • Scopus ID: 3042511217
    • Note: Research Libraries Group (RLG), Trusted Digital Repositories: Attributes and Responsibilities, An RLG-OCLC [Online Computer Library Center] Report (2002), accessed August 11, 2014, http://www.oclc.org/content/dam/research/activities/trustedrep/repositories.pdf?urlm=161690.
    • (2002) Trusted Digital Repositories: Attributes and Responsibilities, An RLG-OCLC Report
  • 30
    • Scopus ID: 84961298888
    • CRL, Global Resources Network
    • CRL, Global Resources Network, TRAC and TDR Checklists.
    • TRAC and TDR Checklists
  • 31
    • Scopus ID: 84926505180
    • Measuring Content Quality in a Preservation Repository: HathiTrust and Large-Scale Book Digitization
    • Note: Paul Conway, "Measuring Content Quality in a Preservation Repository: HathiTrust and Large-Scale Book Digitization," Proceedings of the 7th International Conference on Preservation of Digital Objects, iPres 2010, September 19-24, 2010, Vienna, Austria, 95-102, accessed August 11, 2014, http://deepblue.lib.umich.edu/bitstream/handle/2027.42/85227/C06%20Conway%20Measuring%20Content%20Quality%20iPres%202010.pdf?sequence=1.
    • (2010) Proceedings of the 7th International Conference on Preservation of Digital Objects
    • Conway, P.
  • 32
    • Scopus ID: 84961303190
    • CRL, Global Resources Network
    • Note: CRL, Global Resources Network, Certification and Assessment, accessed August 13, 2014, http://www.crl.edu/archiving-preservation/digital-archives/certification-and-assessment-digital-repositories.
    • Certification and Assessment
  • 37
    • Scopus ID: 0003631629
    • Note: I consulted a broad range of resources to derive this generalized schematic of e-book production, including digitization project guides, blogs, and FAQs. The following is a selection of prominent sources: a tutorial created by Anne R. Kenney and Oya Rieger at Cornell University, "Moving Theory into Practice: Digital Imaging Tutorial," http://www.library.cornell.edu/preservation/tutorial/index.html.
    • Moving Theory into Practice: Digital Imaging Tutorial
    • Kenney, A.R.; Rieger, O.
  • 38
    • Scopus ID: 84961295681
    • Federal Agencies Digitization Guidelines Initiative
    • Federal Agencies Digitization Guidelines Initiative (2009), Digitization Activities: Project Planning and Management Outline, http://www.digitizationguidelines.gov/guidelines/DigActivities-FADGI-v1-20091104.pdf.
    • (2009) Digitization Activities: Project Planning and Management Outline
  • 41
    • Scopus ID: 84961362946
    • Note: Digital repository FAQ pages, for example, PG's Volunteer FAQ, http://www.gutenberg.org/wiki/Gutenberg:Volunteers%27_FAQ.
    • Volunteer FAQ
  • 42
    • Scopus ID: 84961306751
    • Note: An e-book production manual, James Simmons, E-Book Enlightenment (Amsterdam: Floss Manuals, 2010), http://en.flossmanuals.net/_booki/e-book-enlightenment/e-book-enlightenment.pdf.
    • (2010) E-Book Enlightenment
    • Simmons, J.
  • 43
    • Scopus ID: 84961302632
    • Note: Deutsche Forschungsgemeinschaft (DFG), DFG Practical Guidelines on Digitisation (Bonn, Ger.: DFG, 2013), accessed July 24, 2014, http://www.dfg.de/formulare/12_151/12_151_en.pdf.
    • (2013) DFG Practical Guidelines on Digitisation
  • 44
    • Scopus ID: 84930483454
    • Measuring the Correctness of Double-Keying: Error Classification and Quality Control in a Large Corpus of TEI-Annotated Historical Text
    • Note: Susanne Haaf, Frank Wiegand, and Alexander Geyken, "Measuring the Correctness of Double-Keying: Error Classification and Quality Control in a Large Corpus of TEI-Annotated Historical Text," in Fotis Jannidis, Malte Rehbein, and Laurent Romary, eds., Journal of the Text Encoding Initiative [TEI], Selected Papers from the 2011 TEI Conference 4 (March 2013), accessed July 28, 2014, http://jtei.revues.org/739.
    • (2013) Journal of the Text Encoding Initiative [TEI], Selected Papers from the 2011 TEI Conference, issue 4
    • Haaf, S.; Wiegand, F.; Geyken, A.
  • 45
    • Scopus ID: 70450287533
    • How Good Can It Get? Analysing and Improving OCR Accuracy in Large Scale Historic Newspaper Digitisation Programs
    • Note: Rose Holley, "How Good Can It Get? Analysing and Improving OCR Accuracy in Large Scale Historic Newspaper Digitisation Programs," D-Lib Magazine 15, 3/4 (March/April 2009), accessed May 31, 2013, http://www.dlib.org/dlib/march09/holley/03holley.html.
    • (2009) D-Lib Magazine, vol. 15, issue 3-4
    • Holley, R.
  • 50
    • Scopus ID: 84961353078
    • Note: Padraic Colum, The Forge in the Forest (New York: Macmillan, 1925), HathiTrust edition, accessed April 26, 2013, http://catalog.hathitrust.org/Record/000743736.
    • (1925) The Forge in the Forest
    • Colum, P.
  • 53
    • Scopus ID: 28244475190
    • Note: The Progress Report is dated November 2007 (http://www.ulib.org/ULIBProgressReport.htm).
    • (2007) Progress Report
  • 54
    • Scopus ID: 84894294519
    • Note: Universal Library, IA, https://archive.org/details/universallibrary.
    • Universal Library
  • 56
    • Scopus ID: 84961313952
    • Note: Esmé Wingfield-Stratford, Churchill: The Making of a Hero (London: Victor Gollancz, 1942), accessed April 29, 2013, http://books.google.ca/books/about/Churchill.html?id=kFhnAAAAMAAJ&redir_esc=y.
    • (1942) Churchill: The Making of a Hero
    • Wingfield-Stratford, E.
  • 57
    • Scopus ID: 84961363117
    • Note: The image file was viewable as a compiled e-book when the book was first accessed in April 2013. IA has since disassembled it into a file stack. A metadata note indicates the title is under "review" by the Million Book Project.
  • 59
    • Scopus ID: 84961386444
    • Crowd Sourcing
    • Note: Christoph Albers, "Crowd Sourcing," News from IFLA [International Federation of Library Associations and Institutions], July 16, 2012, accessed August 29, 2013, http://www.ifla.org/news/crowd-sourcing.
    • (2012) News from IFLA
    • Albers, C.
  • 60
    • Scopus ID: 79957867143
    • An Analysis of Problems in Metadata Records
    • Chuttur M. Yasser, "An Analysis of Problems in Metadata Records," Journal of Library Metadata 11, 2 (2011): 52.
    • (2011) Journal of Library Metadata, vol. 11, issue 2, p. 52
    • Yasser, C.M.
  • 61
    • Scopus ID: 11044233805
    • Quality Assurance for Digital Learning Object Repositories: Issues for the Metadata Creation Process
    • Sarah Currier, Jane Barton, Rónán O'Beirne, and Ben Ryan, "Quality Assurance for Digital Learning Object Repositories: Issues for the Metadata Creation Process," Research in Learning Technology 12, 1 (2004): 1-20.
    • (2004) Research in Learning Technology, vol. 12, issue 1, pp. 1-20
    • Currier, S.; Barton, J.; O'Beirne, R.; Ryan, B.
  • 62
    • Scopus ID: 84961372482
    • Dublin Core Metadata Initiative (DCMI)
    • Note: Dublin Core Metadata Initiative (DCMI), Dublin Core Metadata Element Set, Version 1.1, accessed July 30, 2014, http://dublincore.org/documents/dces/.
    • Dublin Core Metadata Element Set, Version 1.1
  • 63
    • Scopus ID: 84961294714
    • Note: Mitchell Smith, The Art of Caricaturing (Chicago: Frederick J. Drake, 1941), IA edition, accessed April 26, 2013, https://archive.org/details/artofcaricaturin006061mbp.
    • (1941) The Art of Caricaturing
    • Smith, M.
  • 64
    • Scopus ID: 84961380926
    • HathiTrust Digital Library
    • Note: HathiTrust Digital Library, Bibliographic Metadata Correction Policy, accessed August 11, 2014, http://www.hathitrust.org/bib_metadata_correction.
    • Bibliographic Metadata Correction Policy
  • 65
    • Scopus ID: 84961322648
    • Rethinking HathiTrust Metadata to Support Workset Creation for Scholarly Analysis
    • Note: Katrina Fenlon, Timothy Cole, Myung-Ja Han, Craig Willis, and Colleen Fallaw, Rethinking HathiTrust Metadata to Support Workset Creation for Scholarly Analysis, poster presented at Digital Humanities 2014, Lausanne, Switz., July 7-12, 2014, accessed August 11, 2014, http://dharchive.org/paper/DH2014/Poster-438.xml.
    • Digital Humanities 2014
    • Fenlon, K.; Cole, T.; Han, M.-J.; Willis, C.; Fallaw, C.
  • 66
    • Scopus ID: 84859261220
    • Google Books: The Metadata Mess
    • Note: Geoffrey Nunberg, "Google Books: The Metadata Mess," in Google Book Settlement Conference, University of California, Berkeley, August 28, 2009, accessed April 5, 2013, http://people.ischool.berkeley.edu/~nunberg/GBook/GoogBookMetadataSh.pdf.
    • (2009) Google Book Settlement Conference
    • Nunberg, G.
  • 67
    • Scopus ID: 77956894398
    • Google Books: A Metadata Train Wreck
    • Note: Geoffrey Nunberg, "Google Books: A Metadata Train Wreck," Language Log, August 29, 2009, accessed April 5, 2013, http://languagelog.ldc.upenn.edu/nll/?p=1701.
    • (2009) Language Log
    • Nunberg, G.
  • 68
    • Scopus ID: 84961325324
    • Catalog of Errors
    • Note: Matthew Reisz, "Catalog of Errors," Inside Higher Ed, December 8, 2011, accessed April 5, 2013, http://www.insidehighered.com/news/2011/12/08/scholar-continues-find-flawed-metadata-google-books.
    • (2011) Inside Higher Ed
    • Reisz, M.
  • 69
    • Scopus ID: 84900123647
    • Note: James Terry White, ed., The National Cyclopaedia of American Biography (New York: J. T. White, 1898), Google Books edition, accessed April 29, 2013, http://books.google.ca/books?id=jPApAQAAMAAJ&q=greta+garbo&dq=greta+garbo&hl=en&sa=X&ei=9b9-UfvhCNOLqQHt1oCgBw&ved=0CDAQ6AEwADgK.
    • (1898) The National Cyclopaedia of American Biography
    • White, J.T.
  • 71
    • Scopus ID: 84961299650
    • Note: While the reward of association with a contributed e-book or "favored volunteer" designation may provide some quality control, IA Community Text volunteers remain anonymous, making accountability difficult. Although volunteers must register for an IA Virtual Library Card using a valid e-mail address, the public knows them only through aliases or "screen" names that preserve their anonymity.
  • 72
    • Scopus ID: 33746126444
    • Languages, Books, and Reading from the Printed Word to the Digital Text
    • Roger Chartier, "Languages, Books, and Reading from the Printed Word to the Digital Text," Critical Inquiry 31, 1 (2004): 133-41.
    • (2004) Critical Inquiry, vol. 31, issue 1, pp. 133-141
    • Chartier, R.
  • 75
    • Scopus ID: 84955141635
    • Distributed System Principles
    • Note: Wolfgang Emmerich, "Distributed System Principles," in Distributed Systems 98/99 (lecture notes, Department of Computer Science, University College London, 1997), accessed April 26, 2013, http://www.cs.ucl.ac.uk/staff/ucacwxe/lectures/ds98-99/dsee3.pdf.
    • (1997) Distributed Systems 98/99
    • Emmerich, W.
  • 77
    • Scopus ID: 84961330348
    • Why Are E-Books Riddled with Typos?
    • Note: Tim Worstall, "Why Are E-Books Riddled with Typos?" Technology, October 27, 2012, accessed April 5, 2013, http://www.forbes.com/sites/timworstall/2012/10/27/why-are-e-books-riddled-with-typos/.
    • (2012) Technology
    • Worstall, T.
  • 80
    • Scopus ID: 84866871914
    • Note: IA hosts several versions of Alice in Wonderland. If download statistics are any indication, readers have voted overwhelmingly for quality by downloading IA's ingested version from PG, proofread by the Distributed Proofreaders project, much more frequently than other versions, including IA's own raw OCR text file. https://archive.org/search.php?query=alice%27s%20adventures%20in%20wonderland%20AND%20mediatype%3Atexts, accessed July 23, 2014.
    • Alice in Wonderland
  • 82
    • Scopus ID: 84961309204
    • eMOP Project Receives Funding from Andrew W. Mellon Foundation
    • Note: "eMOP Project Receives Funding from Andrew W. Mellon Foundation," Initiative for Digital Humanities, Media, and Culture (IDHMC), September 28, 2012, accessed July 19, 2013, http://idhmc.tamu.edu/blog/2012/09/28/emop-project-to-be-funded-by-andrew-w-mellon-foundation/.
    • (2012) Initiative for Digital Humanities, Media, and Culture (IDHMC)
  • 84
    • Scopus ID: 84961292272
    • Potential Content
    • Note: "Potential Content," [Wiki] DPLA, April 5, 2012, accessed April 26, 2013, http://cyber.law.harvard.edu/dpla/Potential_Content.
    • (2012) [Wiki] DPLA


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.