Continuing Education to Advance Web Archiving (CEDWARC)


  • Edward Fox, Professor, Department of Computer Science, Virginia Tech

  • Martin Klein, Scientist, Los Alamos National Lab Research Library

  • Michael Nelson, Professor, Department of Computer Science, Old Dominion University

  • Daniel Kerchner, Senior Software Developer, George Washington University Library

  • Laura Wrubel, Software Development Librarian, George Washington University Library (now at Stanford University Library)

  • Helge Holzmann, Internet Archive

  • Ian Milligan, Professor of History, Associate Vice-President for Research Oversight and Analysis, University of Waterloo

External Funding

  • Zhiwu Xie (PI), Edward Fox (Co-PI), Martin Klein (Co-PI), Michael Nelson (Co-PI), Daniel Kerchner (Co-PI), Helge Holzmann (Co-PI), Ian Milligan (Co-PI). Continuing Education to Advance Web Archiving. $248,451, Institute of Museum and Library Services. RE-70-18-0005-18


Web archiving is a promising growth area for library and archive services. Over the past few decades, memory institutions have collected and archived tens of petabytes of web content. Because the web is by nature high in volume, velocity, variety, and veracity, web archives are increasingly used in ways that go beyond traditional searching, browsing, and close reading. Suites of open-source tools, many supported by IMLS in the NDP project category, have been developed to help researchers conduct analyses and extract knowledge. These tools usually assume a high level of data literacy, and sometimes even proficiency in big data processing and analysis. Yet it is unreasonable to require patrons to be highly tech-savvy in order to use web archives, and it is equally unrealistic to expect tool builders to devote their time and effort to customer support indefinitely. At the same time, very few librarians or archivists have been trained to understand, use, maintain, and manage these tools. This at least partially explains why so few libraries and archives provide web archiving and analytics services that satisfactorily address the needs of their patrons.

This project aims to bridge this skill gap by training library and archive professionals to work on real-life web archive research questions using the cutting-edge tools developed for that purpose. Participants will be exposed to the perspectives of researchers interested in archived content, archive patrons, and tool builders. The training will equip them with a deeper understanding of patrons’ needs, the web archives used as data sources, the tools developed to process the data, and the library services that could be built on these.



  • Oct 28, 2019, Gelman Library, George Washington University. CEDWARC in-person workshop, attended by 39 people from around the world.

  • Oct 1–29, 2021. CEDWARC online workshop. 111 people registered.

  • For workshop videos and slides, please visit the project website.

Related Publications

  • Jones, S. (2021). Improving Collection Understanding for Web Archives with Storytelling: Shining Light Into Dark and Stormy Archives [PhD Dissertation, Old Dominion University].

  • Li, L. (2020). Event-related Collections Understanding and Services [PhD Dissertation, Virginia Tech].

  • Li, L., Geissinger, J., Ingram, W. A., & Fox, E. A. (2020). Teaching Natural Language Processing through Big Data Text Summarization with Problem-Based Learning. Data and Information Management, 4(1), 18–43.