Libraries Against Filters

Why Internet Filters Don’t Work and Why Libraries That Filter Are Wrong

Librarian in Black | Sarah Houghton | May 7, 2010

http://librarianinblack.net/librarianinblack/2010/05/filtering.html

The Washington State Supreme Court ruled yesterday, in a 6-3 decision, that public library internet filtering is not censorship because filtering is “collection development.” You can read more in Library Journal, on ReadWriteWeb, or read the actual court decision and the majority and dissenting opinions.

My reaction is simple, as someone who has fought, and won, an internet filtering challenge in my own library. Our communities’ intellectual freedom is at risk.

This is a huge step backward for intellectual freedom. And if we follow the logic of this case, libraries are leaving their internet collection development up to an automated software system and some untrained minimum-wage lackeys at the filtering company. Filters are not collection development, and filters don’t work. My frustration at the decision-makers’ lack of education about these issues is immeasurable.

I posted comments on the LJ & RWW sites.  Those comments are duplicated below.  If you want to know more about filters, read on.

This is a gigantic issue for public libraries and I have serious fear about what this means for our communities’ future of information access.

ReadWriteWeb’s coverage brought up the ethical argument against filtering. Just because someone is using a library computer, does that mean he or she automatically has less access to information? It shouldn’t, and libraries are fighting for information access rights every day.

Besides the ethical argument against filtering, there are plenty of practical arguments. Namely: filters don’t work, they cost a lot of money, and they take a lot of staff time to operate.

I’m the Digital Futures Manager for the San Jose Public Library. A couple of years ago, a filtering challenge was brought to the library by one of our city council members. We were told to filter, we said no, and we embarked upon an extensive study of the effectiveness of filters, which you can find at: http://www.sjpl.org/sites/all/files/userfiles/agen0208_report.pdf. The overall results? Internet filtering software **does not work**.

Looking at our own library’s study as well as all of the published studies done in the last decade [**see the end of this post for a complete table**], it’s consistently found that 15-20% of the time content is over-blocked (i.e. benign sites are blocked incorrectly), and 15-20% of the time content is under-blocked (i.e. sites deemed “bad” get through anyway). We found that overall, filters have only about a 60-70% accuracy rate for traditional text content.

Looking at all surveys of filtering accuracy from 2001-2008 (no studies have been done in 2009 or 2010 that I’m aware of), the average accuracy across all of those tests was 78.347%, and that measures only text content, with only one study looking at images. If we think “well, filters get better over time, right?” and look only at studies from 2007-2008, we see a marginally higher accuracy percentage: 83.316%. So while filters may be getting a little better, they’re still wrong about 17% of the time for text content, and over half the time for image, video, and other non-text content. Think about what that means for your everyday browsing, and you may ask: “Why are we spending money and time on these systems again?”
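
To make the arithmetic concrete, here is a minimal sketch, assuming a hypothetical test set split evenly between benign and trigger sites (my illustration, not from any of the studies). With an even split, 15-20% error rates in each direction work out to roughly 80% accuracy; the lower overall figures the studies report reflect harder test sets and non-text content.

```python
# Illustrative only: hypothetical counts showing how over-blocking and
# under-blocking rates combine into a single accuracy figure.
benign_sites = 1000     # sites the filter should allow
trigger_sites = 1000    # sites the filter should block

overblock_rate = 0.18   # ~15-20% of benign sites blocked incorrectly
underblock_rate = 0.18  # ~15-20% of trigger sites let through anyway

correct_allows = benign_sites * (1 - overblock_rate)    # benign, allowed
correct_blocks = trigger_sites * (1 - underblock_rate)  # trigger, blocked

accuracy = (correct_allows + correct_blocks) / (benign_sites + trigger_sites)
print(f"Overall accuracy: {accuracy:.1%}")  # 82.0% with this 50/50 mix
```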

Filters simply do not work on multimedia content, which is usually what people think the filters are for (naughty videos and photos). The accuracy in filtering images, audio, video, RSS feeds, and social networking content is embarrassingly low: about 40%. That means that *over half the time*, the filter makes the wrong decision about blocking a photo or video. Again, why would we foist these failed systems willingly upon our communities?

And how do filters work? Automated spiders crawl the web looking for naughty content. There’s usually a formula (which the companies will never disclose) that looks for some combination of trigger keywords, trigger URLs, too many images on a site, a suspicious jumble of letters and numbers in the URL, and so on. If the spider decides a site fits the “naughty” category, onto the block list it goes. If the company is particularly vigilant (often not the case), it has some minimum-wage, untrained lackey spot-checking the spider’s results. So if a filter constitutes collection development, we have left our online collection development in the hands of an automated software system and untrained non-library staff. Worse yet, the company won’t even tell us why or how it categorizes items. You usually do have the ability to add things to the white list (OK stuff) or the black list (naughty stuff). But judgments about content are subjective, even among library staff, so who gets to decide what is bad and what isn’t?
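
Here is a toy sketch of that kind of trigger-keyword scoring. The term list, threshold, and domains are all invented for illustration; real vendors keep their formulas secret, so this only shows the general approach and why it over-blocks.

```python
# A toy keyword-based filter, assuming a crude scoring formula.
# All terms, domains, and the threshold here are hypothetical.
TRIGGER_TERMS = {"xxx", "porn", "nude"}
ALLOW_LIST = {"medlineplus.gov"}          # "white list": always allowed
BLOCK_LIST = {"example-adult-site.test"}  # "black list": always blocked

def is_blocked(domain: str, page_text: str, threshold: int = 2) -> bool:
    """Return True if this toy filter would block the page."""
    if domain in ALLOW_LIST:
        return False
    if domain in BLOCK_LIST:
        return True
    # Crude trigger-word scoring: count occurrences and compare to a
    # threshold. This step is exactly where over-blocking comes from.
    words = page_text.lower().split()
    score = sum(words.count(term) for term in TRIGGER_TERMS)
    return score >= threshold

# A sex-education page mentions trigger words in an educational context
# and gets over-blocked, a false positive:
text = "frank discussion of nude anatomy and why porn misleads teens"
print(is_blocked("teen-health.test", text))  # True
```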

Also concerning: library customers report that they are usually unwilling to ask for something to be unblocked, because the filter has automatically put them in the embarrassing position of appearing to look at something “naughty” even when it isn’t. So how many of our library customers walk away without the information they need? And whose fault is that? Ours!

Beyond that, the time it takes staff to unblock sites and handle the accompanying administrative paperwork is substantial; many libraries estimate 60 minutes of staff time per request. The return on that investment of money and time is negative. You lose when you install a filter.
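
As a rough illustration of that cost, here is a back-of-the-envelope sketch. The 60-minutes-per-request figure is from the estimate above, while the request volume and hourly staff cost are hypothetical and will vary by library.

```python
# Hypothetical back-of-the-envelope cost of handling unblock requests.
minutes_per_request = 60     # estimate cited by many libraries
requests_per_month = 20      # assumption: varies widely by library
staff_cost_per_hour = 30.00  # assumption: fully loaded $/hour

hours_per_month = minutes_per_request * requests_per_month / 60
monthly_cost = hours_per_month * staff_cost_per_hour
print(f"{hours_per_month:.0f} staff hours/month, about ${monthly_cost:,.0f}")
# 20 staff hours/month, about $600, on top of filter licensing fees.
```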

And that’s the bottom line. Filters make the library lose money and time. Filters make the customers lose access, time, and confidence in the library’s use and relevance.

People who want to install filters in libraries have the best intentions (usually). They think that it will “protect the children” or “filter out pictures of penises.” Sadly, the technology has not caught up with our expectations for how it should work. People truly believe that filters work, but only because they haven’t looked at the research or tried one out themselves. If there were filters that didn’t overblock or underblock, I’d be the first in line to take a look at them. But the software is fallible. And turning over an entire community’s freedom of information access to a known-failed software system is just about the most foolish thing any library could choose to do.

Filtering Studies and Their Findings, 2001-2008 (no studies found in 2009 or 2010)

Average accuracy 2001-2008: 78.347%
Average accuracy 2007-2008: 83.316%

(Someone argued that counting only the recent survey results would show significantly higher accuracy. It is higher, but by less than 5%, within the margin of error cited in all of these surveys.)

Date | Title | Source (summarized conclusions follow each entry)
2008 | Protecting Children on the Net with Content Filtering | EU Safer Internet
  • Average score from tests of 26 different filtering tools showed the following levels of accuracy in both blocking trigger websites and allowing non-trigger sites:
  • Testing content appropriate for kids 10 years or younger – 2.2/4.0 (55%)
  • Testing content appropriate for kids 10 years or younger (testing pornographic content only) – 2.8/4.0 (70%)
  • Testing content appropriate for kids 11-17 – 2.0/4.0 (50%)
  • Testing content appropriate for kids 11-17 (testing pornographic content only) – 2.8/4.0 (70%)
  • Study noted that “filtering solutions are not yet capable of accurately filtering typical Web 2.0 user generated content, such as video clips on YouTube or MySpace, and harmful scenes in Second Life” and “filtering on chat and IM is often inconsistent.”
2008 | Closed Environment Testing of ISP-level Internet Content Filtering | Australian Communications and Media Authority
  • 84%-95% accuracy blocking trigger websites
  • 92%-97% accuracy allowing non-trigger sites
  • The success of all products at blocking trigger websites was inversely proportional to their success at allowing non-trigger sites.
2008 | Deep Throat Fight Club Open Testing of Porn Filters | Untangle
  • Fortinet 97.7% accuracy blocking trigger websites
  • Watchguard 97.3% accuracy blocking trigger websites
  • Websense 97.0% accuracy blocking trigger websites
  • SonicWall 96.1% accuracy blocking trigger websites
  • Barracuda 94.0% accuracy blocking trigger websites
  • Average of 99% accuracy allowing non-trigger sites
2008 | Expert Report | Dr. Paul Resnick (for North Central Regional Library District)
  • 93.1% accuracy blocking trigger websites
  • 48% accuracy blocking trigger images
2007 | Report on the Accuracy Rate of FortiGuard | Bennett Haselton (for the ACLU)
  • 88.1% overall accuracy on .com sites
  • 76.4% overall accuracy on .org sites
2006 | Expert Report | Philip B. Stark (for the DOJ)
  • 87.2%-98.6% accuracy blocking “sexually explicit materials”
  • 67.2%-87.1% accuracy allowing “non-sexually explicit materials”
2006 | Websense: Web Filtering Effectiveness Study | Veritest (for Websense)
  • WebSense: 85% overall accuracy
  • SmartFilter: 68% overall accuracy
  • SurfControl: 74% overall accuracy
2004 | Report on the evaluation of the final version of the NetProtect Product | Net-Protect.org
  • Surf-mate: 85% accuracy blocking trigger content and 89% accuracy allowing non-trigger content
  • CyberPatrol: 44% accuracy blocking trigger content and 95% accuracy allowing non-trigger content
  • Net Nanny: 18% accuracy blocking trigger content and 97% accuracy allowing non-trigger content
  • CYBERsitter: 24% accuracy blocking trigger content and 97% accuracy allowing non-trigger content
  • Cyber Snoop: 3% accuracy blocking trigger content and 99% accuracy allowing non-trigger content
  • NetProtect 2: 96% accuracy blocking trigger content and 83% accuracy allowing non-trigger content
2003 | Internet Blocking in Public Schools | Online Policy Group
  • School curriculum materials accessed with filters set to least restrictive settings: 95-99.5% accuracy
  • School curriculum materials accessed with filters set to most restrictive settings: 30% accuracy
2002 | Corporate Content Filtering Performance and Effectiveness Testing, Websense Enterprise v4.3 | eTesting Labs (for Websense)
  • SuperScout: 90% accuracy blocking “adult” materials
  • SmartFilter: 90% accuracy blocking “adult” materials
  • WebSense: 95% correct accuracy blocking “adult” materials
2002 | See No Evil: How Internet Filters Affect the Search for Health Information | Kaiser Family Foundation
  • 98.6% accuracy in accessing health information on least restrictive settings
  • 95% accuracy in accessing health information on intermediate restrictive settings
  • 76% accuracy in accessing health information on most restrictive settings
2001 | Expert Report of Dr. Joseph Janes | Dr. Joseph Janes (for the ACLU)
  • 34.3% accuracy in allowing non-trigger content
2001 | Internet Filtering Accuracy Review | Cory Finnell, Certus Consulting Group (for the DOJ)
  • CyberPatrol: 92.01%-95.31% overall accuracy
  • Websense: 89.97%-94.75% overall accuracy
  • Bess: 91.64%-93.08% overall accuracy
2001 | Updated Web Content Software Filtering Comparison Study | eTesting Labs (for the DOJ)
  • 92% average accuracy of four filters in blocking “objectionable” content
  • 96% average accuracy of four filters in allowing non-trigger content
2001 | Digital Chaperones for Kids | Consumer Reports
  • Cybersitter 2000: 78% accuracy blocking “objectionable” content
  • Internet Guard Dog: 70% accuracy blocking “objectionable” content
  • AOL’s Young Teen Control: 63% accuracy blocking “objectionable” content
  • CyberPatrol: 77% accuracy blocking “objectionable” content
  • NetNanny: 48% accuracy blocking “objectionable” content
  • NIS Family Edition: 80% accuracy blocking “objectionable” content
2001 | Effectiveness of Internet Filtering Software Products | Paul Greenfield, Peter Rickwood, and Huu Cuong Tran (for the Australian Broadcasting Authority)
  • N2H2 (now Bess), set to “maximum filtering,” was reported as the most effective filter tested in this study
  • 95% accuracy blocking the “pornography/erotica” category
  • 75% accuracy blocking the “bomb-making/terrorism” category
  • 65% accuracy blocking the “racist/supremacist/Nazi/hate” category
  • 40% accuracy allowing non-trigger content in the “art/photography” category
  • 60% accuracy allowing non-trigger content in the “sex education” category
  • 70% accuracy allowing non-trigger content in the “atheism/anti-church” category
  • 80% accuracy allowing non-trigger content in the “gay rights/politics” category
  • 85% accuracy allowing non-trigger content in the “drug education” category
2001 | Report for the European Commission: Review of Currently Available COTS Filtering Tools | Sylvie Brunessaux et al.
  • Average of the 10 filters tested
  • 67% accuracy blocking trigger sites in English
  • 52% accuracy blocking trigger sites in five languages
  • 91% accuracy allowing non-trigger content

Sarah Houghton

My name is Sarah Houghton and I am the Director of the San Rafael Public Library (California), a two-library system serving our town of 60,000. And, of course, I write this Librarian in Black blog, which has been around since 2003.
