Library of Congress Releases Growing Coronavirus Web Archive Collection

Collection Includes 450 Web Archives Documenting COVID-19 Pandemic

February 2, 2022

After collecting a wide variety of web content documenting the COVID-19 pandemic over the past two years, the Library of Congress is now making its growing Coronavirus Web Archive available to the public.

The collection, which now includes 450 web archives, aims to balance government, science, business and policy content with human stories that will give future historians a sense of how the COVID-19 pandemic impacted the daily lives of individuals, families and communities.

The Library has been capturing coronavirus web content in many of its existing web collections since the start of the COVID-19 pandemic, well before establishing a formal collection plan in June 2020. Because the Library is a member of the International Internet Preservation Consortium, Library staff also nominated sites for the consortium's collecting effort.

For the Coronavirus Web Archive, a core team of 10 recommending officers representing a variety of skills, perspectives and subject matter expertise from across the Library has worked together to build a well-rounded collection. Additionally, international collections librarians and overseas offices made contributions to ensure that the COVID-19 pandemic is represented in a truly global collection.

“We didn’t know anything about COVID-19 when the pandemic began, but at the Library of Congress, we did know how historical pandemics are researched,” said Jennifer Harbster, head of the Library’s Science Reference Section. “We may not know exactly what future historians will be looking for when they tell the story of these remarkable years, but by looking at our materials from the Influenza of 1918 and broadening our scope to include areas beyond science, like policy, the arts, and social content, we hope to present a collection that will serve future researchers.”

The Library began building web archive collections in 2000 to gather web-based information that focused on specific themes or events as they unfolded. Over the past two decades, the Library’s web archive collections have grown to hold over 2.8 petabytes of data in over 21 billion files. With so much content published on the web, curators still cannot capture everything, so the Library has refined its collections process with a multidisciplinary, team-driven approach.

The Coronavirus Web Archive team continues to seek good examples of items that represent how Americans and people from across the globe are responding to the pandemic. The collection includes topics such as containment efforts, legal responses, human resource approaches, virtual education methods, unemployment trends, and artistic responses to the global challenge.

Library subject specialists are currently collecting content on vaccine rollouts, testing, virus variants, face mask guidance and developing subjects, such as guidance for students and teachers returning to the classroom. New content will continue to be released monthly, following a one-year embargo, as a part of this ongoing collection.

The Library of Congress is the world’s largest library, offering access to the creative record of the United States — and extensive materials from around the world — both on-site and online. It is the main research arm of the U.S. Congress and the home of the U.S. Copyright Office. Explore collections, reference services and other programs and plan a visit at; access the official site for U.S. federal legislative information at; and register creative works of authorship at