Cyber Patrol error rate for first 1,000 .com domains

10/23/2000
Bennett Haselton, bennett@peacefire.org

Introduction

Using "zone files" from Network Solutions (which list all .com domains in existence), we obtained a list of the first 1,000 active ".com" domains on the Internet as of June 14, 2000. From this sample, we determined how many of these sites were blocked as "pornography" by Cyber Patrol, and of those blocked sites, how many were actually pornographic.

Results

Of the first 1,000 working .com domains, 121 were blocked by Cyber Patrol as "Sexually Explicit". Of these 121 blocked sites, we eliminated 100 that were "Under construction" pages (the list of 100 non-functioning sites is here), leaving 21 blocked sites with actual content.

Error rate for domains: 81%. Of the 21 remaining blocked sites, 17 were non-pornographic: about four non-pornographic domains blocked by Cyber Patrol as "Sexually Explicit" for every pornographic domain blocked.
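The arithmetic behind the 81% figure can be checked with a short sketch. The counts come from this report (121 blocked, 100 "Under construction", and the 17 "errors" and 4 "non-errors" listed further down); the variable and function names are illustrative only.

```python
# Counts taken from the figures and site lists in this report.
blocked_total = 121        # domains blocked as "Sexually Explicit"
under_construction = 100   # blocked domains that were "Under construction" pages
errors = 17                # blocked sites judged non-pornographic
non_errors = 4             # blocked sites judged pornographic


def error_rate(errors: int, non_errors: int) -> float:
    """Fraction of content-bearing blocked sites that were blocked wrongly."""
    return errors / (errors + non_errors)


real_sites = blocked_total - under_construction
rate = error_rate(errors, non_errors)

print(f"{real_sites} blocked sites with content")
print(f"error rate: {rate:.0%}")
print(f"errors per correct block: {errors / non_errors:.2f}")
```

The ratio of 17 to 4 is where "about four ... for every one" comes from.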

We tested Cyber Patrol with the "Full Nudity", "Partial Nudity" and "Sexual Acts/Text" categories turned on. Cyber Patrol's published category definitions define these categories as follows:

Partial Nudity
Pictures exposing the female breast or full exposure of either male or female buttocks except when exposing genitalia. The Partial Nudity category does not include swimsuits (including thongs).

Full Nudity
Pictures exposing any or all portions of the human genitalia.
Please note: The Partial Nudity and Full Nudity categories do not include sites containing nudity or partial nudity of a non-prurient nature. For example: web sites for publications such as National Geographic or Smithsonian Magazine or sites hosted by museums such as the Guggenheim, the Louvre, or the Museum of Modern Art.

Sexual Acts/Text
Pictures, descriptive text or audio of anyone or anything involved in explicit sexual acts and/or lewd and lascivious behavior, including masturbation, copulation, pedophilia, intimacy involving nude or partially nude people in heterosexual, bisexual, lesbian or homosexual encounters. Also includes phone sex ads, dating services, adult personal ads, CD-ROM’s and videos.

(from http://www.surfcontrol.com/products/cyberpatrol_for_home/product_overview/cybernot_cats.html)

However, there were no blocked sites that we considered to be "borderline" cases, e.g. non-pornographic black-and-white nude photography sites. The sites in this sample were either pornographic sites (e.g. http://a-1sex.com) or completely innocuous sites (e.g. http://a-attorney-virginia.com/).

We considered the following blocked sites to be "errors":

http://a-19.com/   [(screen shot of Cyber Patrol blocking this site)]
http://a-1american.com/   [(screen shot of Cyber Patrol blocking this site)]
http://a-1bonded.com/   [(screen shot of Cyber Patrol blocking this site)]
http://a-1casino.com/   [(screen shot of Cyber Patrol blocking this site)]
http://a-1janitorial.com/   [(screen shot of Cyber Patrol blocking this site)]
http://a-1lawyers.com/   [(screen shot of Cyber Patrol blocking this site)]
http://a-1radiatorservice.com/   [(screen shot of Cyber Patrol blocking this site)]
http://a-1recoveryservices.com/   [(screen shot of Cyber Patrol blocking this site)]
http://a-2-s.com/   [(screen shot of Cyber Patrol blocking this site)]
http://a-20.com/   [(screen shot of Cyber Patrol blocking this site)]
http://a-aaa-payday.com/   [(screen shot of Cyber Patrol blocking this site)]
http://a-actionhomeinspection.com/   [(screen shot of Cyber Patrol blocking this site)]
http://a-archer.com/   [(screen shot of Cyber Patrol blocking this site)]
http://a-attorney-virginia.com/   [(screen shot of Cyber Patrol blocking this site)]
http://a-bbreviate.com/   [(screen shot of Cyber Patrol blocking this site)]
http://a-bis-z.com/   [(screen shot of Cyber Patrol blocking this site)]
http://a-breviate.com/   [(screen shot of Cyber Patrol blocking this site)]

We considered the following blocked sites to be "non-errors":
http://a-1sex.com/
http://a-a-asexpics.com/
http://a-amateurs.com/
http://a-anal-sex.com/

We obtained these results with Cyber Patrol on Windows 98, using a blocked-site list that was current as of October 21, 2000. Only the "Full Nudity", "Partial Nudity" and "Sexual Acts/Text" categories were enabled.

How the list of 1,000 domains was constructed

We started with zone files from Network Solutions listing all .com domains in alphabetical order. Michael Sims supplied the first 10,000 domains in alphabetical order from that list, after eliminating sites at the top whose names consisted mostly of "-" dashes. (A disproportionate number of these were pornographic sites that chose their domain name solely in order to show up at the top of an alphabetical listing, so a sample that included these sites would not be a representative cross-section.)

Jamie McCarthy supplied the Perl script, which isolated the first 1,000 domains that were actually "up":

# Keep names starting with "a", drop dash-heavy names, print the first 1,000 whose www host answers one ping.
gunzip -c com.20000614.entire.sorted.gz | grep '^a' |
  grep -v '.*--\|-.*-.*-' |
  perl -ne 'chomp; $a = system("ping -c 1 -q www.$_ >/dev/null 2>&1"); print "$_\n" if !$a;' |
  head -1000

We used this script to narrow down the list to the first 1,000 pingable domains sorted alphabetically by domain name.
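To make the filtering logic easier to follow, here is a rough Python sketch of the same steps. It is not the script that was actually run. The function names, the `is_up` callback (standing in for the ping test), and the sample names "a--------.com", "a-1-2-3.com" and "b-1.com" are all hypothetical and chosen only for illustration.

```python
import re
from itertools import islice
from typing import Callable, Iterable, Iterator, List

# Mirrors the grep -v pattern: a double dash, or three or more dashes anywhere.
DASH_HEAVY = re.compile(r"--|-.*-.*-")


def candidate_domains(names: Iterable[str]) -> Iterator[str]:
    """Yield names that start with 'a' and are not dash-heavy."""
    for name in names:
        name = name.strip()
        if name.startswith("a") and not DASH_HEAVY.search(name):
            yield name


def first_working(names: Iterable[str],
                  is_up: Callable[[str], bool],
                  limit: int = 1000) -> List[str]:
    """Return the first `limit` candidates for which is_up(name) is true."""
    working = (n for n in candidate_domains(names) if is_up(n))
    return list(islice(working, limit))


# Hypothetical input; is_up always answers True here, so no network is needed.
sample = ["a--------.com", "a-1-2-3.com", "a-19.com", "a-1sex.com", "b-1.com"]
print(first_working(sample, is_up=lambda name: True, limit=2))
```

Passing the reachability check in as a function keeps the sketch testable offline; a real run would replace it with something that pings `www.` plus the domain name, as the pipeline above does.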

We used the first 1,000 working domain names in our sample in order to make our sample "provably random". A truly random sample chosen from the entire list of domain names would have been better, but it would be impossible to prove that such a sample had really been chosen randomly; a third party could easily claim that we had "stacked the deck" by choosing a disproportionate number of sites blocked incorrectly by Cyber Patrol.

Potential sources of error

A sample of 21 "real" sites blocked by Cyber Patrol is too small to support precise conclusions. The problem with the sample size is that we had to start with 1,000 Web domains just to get a sample of 21 blocked domains with real content. The 81% figure should not be taken as being accurate to even two significant figures; across all .com domains in existence, the error rate for Cyber Patrol could be as high as 95% or as low as 65%. However, the test does establish that the likelihood of Cyber Patrol having an error rate of, say, less than 60% across all domains, is virtually zero.
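One common way to put numbers on this uncertainty is a Wilson score interval for a proportion. The sketch below is not the method used to derive the range quoted above. It assumes the 21 sites behave like independent draws from all blocked .com domains, which an alphabetical sample only approximates, and the function name is illustrative.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# 17 wrongly blocked sites out of 21 blocked sites with real content.
low, high = wilson_interval(17, 21)
print(f"approximate 95% interval: {low:.0%} to {high:.0%}")
```

Under those assumptions the interval runs from roughly 60% to 92%, which is broadly in line with the range given above.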

A note on interpreting these results: the results are not weighted by Web site traffic, so some of the sites in this experiment may cause more "Access Denied" messages than others. The 81% error rate should also not be interpreted to apply across all domains, since we only used .com domains in our experiment, which are more likely to contain commercial pornography than, say, .org domains. (In other words, we should expect the error rate to be even higher for .org sites that are blocked.)

Conclusion

Cyber Patrol's Vice President of marketing, Susan Getgood, stated publicly in a response to an inquiry:

A site that is added to the CyberNOT list is viewed by a person before being added. --http://www.peacefire.org/censorware/Cyber_Patrol/sg.on.cyberspider.4-22-97.txt

Given the high error rate for sites blocked by Cyber Patrol in their sex-related categories, we believe that Cyber Patrol's claim of "100% human review" is false.