Channel: Mashable

5 Things the Library of Congress is Archiving Online

In 2000, the Library of Congress started a pilot web archiving project focused on the presidential election. After the Sept. 11, 2001, terrorist attacks, the pilot project expanded and eventually became a permanent fixture of our national archives. Five full-time staff members orchestrate an open-source web crawler called Heritrix to capture the Internet’s content for future generations.

“Part of it is the election question: What do we want to archive?” says Abbie Grotke, a digital media project specialist on the Library of Congress’s web archiving team. “We can’t easily identify what is the ‘U.S. web.’ We can’t just say we want to get everything that’s ‘.com’ or ‘.gov.’ So we do have to do this selective process.”

So what does the Library of Congress think is worth saving? Here are the portions of today’s web your grandchildren will be able to access through the Library of Congress:


1. Twitter feeds—all of them


The Library of Congress announced in April that it would begin archiving Twitter feeds. Some Twitter feeds had already been archived as part of special projects. For instance, some tweets regarding the nomination of Supreme Court Justice Sonia Sotomayor were included in the collection about Supreme Court changes. But now Twitter plans to donate its entire archive of public content.

Which means your tweets, my tweets, and Britney Spears’s tweets will all become a part of the archives. What is not yet clear is exactly how all of these tweets will be used.

“The point is not to provide a Twitter interface at the library that you can go in and use like they do on the current website,” Grotke says. “There’s talk of more of a researcher, data-mining-type access to it. We’re still trying to figure out what that is exactly, but people probably won’t be able to go in and look for you specifically.”


2. National Election Candidates’ Internet Presences


The Library’s web archive started with a project that documented an election, and much of its work continues to revolve around this topic. The archive collects about 2,500 snapshots of websites during every election cycle.

“A lot of what we do, particularly with the elections, goes away rather quickly,” Grotke says. “If the candidate loses the election, their website disappears.”

The archives include presidential, congressional, and even overseas elections. The Library’s foreign operations offices document elections in those regions. Researchers of the future, for instance, will be able to see the web that surrounded the 2009 general elections in India and Indonesia.


3. Facebook Pages—A Selective Few


The web crawler often follows the websites of candidates or members of Congress to their public Facebook Pages. While Facebook has made no Twitter-like deal to donate archives to the Library, pages on the social media platform inevitably come up while documenting major events.

Thus far, the Library has left it up to the author of the Page—not Facebook—to give permission to archive relevant pages.

“The position that we’ve taken so far is that the content we’re archiving is actually owned by the site owner who put it up there,” Grotke says. “We’ve been asking permission of the original site owner.”

So unless you’re a national election candidate who has given permission, you probably don’t have to worry about your grandchild stumbling across an embarrassing Facebook photo while doing archival research for his or her college thesis.


4. Notable Historical Events


The Library has also been archiving Congressional websites since 2002. The web archive team has collected websites regarding Supreme Court changes, the Sept. 11 attacks, the 2005 papal transition, Hurricane Katrina, the Iraq war and the crisis in Darfur. A full list of current projects is available here.


5. News Sites That Give Permission


Unlike libraries in some other countries, the Library of Congress has no legal mandate to preserve the web. Therefore, the web archive team can’t collect everything they would like to without asking permission. Because news sites and blogs earn money on their content, the Library needs to get consent before it includes their pages in the archives.

Grotke says that few news organizations that the web archive team contacts for permission ever respond, which means that not much of the content in the web archives comes from news sites.


More social media resources from Mashable:


- 5 Ways Government Works Better With Social Media
- How the U.S. Engages the World with Social Media
- How Social Media Can Effect Real Social and Governmental Change
- 6 Ways Law Enforcement Uses Social Media to Fight Crime
- Why Open Source is the New Software Policy in San Francisco

Image courtesy of iStockphoto, LawrenceSawyer


Tags: archive, facebook, government, history, Library of Congress, politics, research, twitter


