Positive Energy

Friday, August 12, 2005

Searching For Good News

I spend a lot of effort crawling through the web, looking for items that can extend my understanding of nuclear fission. Usually this means either following links that others find interesting from one web site to the next, or running searches with various tools and scanning the results for new sites. This got me thinking about the differences between search engines, and whether one is better than another. I am now trying to measure these differences to answer that question.

I started by running a measured search based on these keywords:

nuclear reactor positive benefit

I selected five search engines for comparison:

S1 - Google - http://www.google.ca
S2 - A9 - http://a9.com
S3 - Scirus - http://www.scirus.com
S4 - Teoma - http://www.teoma.com/
S5 - MSN - http://beta.search.msn.com/

For each search I ranked the returned URLs 1 through 5 by their place in the results; any URL not in an engine's top five got a rank of 6. With five search engines, if every top five were completely different I would have 25 URLs to consider. In fact I got 20, so there is some overlap, but not as much as I expected. The ranks are listed in the following table:

Ranks   S1  S2  S3  S4  S5
URL1     1   1   6   6   6
URL2     2   2   6   6   6
URL3     3   3   6   6   6
URL4     4   5   6   6   6
URL5     5   6   6   6   6
URL6     6   4   6   6   6
URL7     6   6   1   6   6
URL8     6   6   2   6   6
URL9     6   6   3   6   6
URL10    6   6   4   6   6
URL11    6   6   5   6   6
URL12    6   6   6   1   3
URL13    6   6   6   2   6
URL14    6   6   6   3   6
URL15    6   6   6   4   6
URL16    6   6   6   5   6
URL17    6   6   6   6   1
URL18    6   6   6   6   2
URL19    6   6   6   6   4
URL20    6   6   6   6   5
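The table can be rebuilt mechanically from each engine's top-five list. The sketch below is one way to do it. It assumes the URL identifiers are written as the integers 1 to 20, and TOP5, CUTOFF and rank_table are hypothetical names used only for illustration. Any URL missing from an engine's top five gets rank 6, as in the table.

```python
# Top-five results per engine in rank order (assumption: URL1..URL20
# are written as the integers 1..20).
TOP5 = {
    "S1": [1, 2, 3, 4, 5],       # Google
    "S2": [1, 2, 3, 6, 4],       # A9
    "S3": [7, 8, 9, 10, 11],     # Scirus
    "S4": [12, 13, 14, 15, 16],  # Teoma
    "S5": [17, 18, 12, 19, 20],  # MSN
}
CUTOFF = 5  # only the first five places get their own rank


def rank_table(top):
    """Return {url: [rank in S1, ..., rank in S5]}.

    A URL outside an engine's top five gets rank CUTOFF + 1, i.e. 6.
    """
    urls = sorted({u for results in top.values() for u in results})
    return {
        # list.index is 0-based, so add 1 to get the place number
        u: [results.index(u) + 1 if u in results else CUTOFF + 1
            for results in top.values()]
        for u in urls
    }


table = rank_table(TOP5)
print("Ranks ", *(f"{e:>3}" for e in TOP5))
for u, ranks in table.items():
    print(f"URL{u:<3}", *(f"{r:>3}" for r in ranks))
```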

This data lets me compare any two search engines. For each URL I take the squared difference between the ranks the two engines gave it, and then sum these over all 20 URLs. The score is 0 if both engines rank every URL identically. It reaches its maximum of 110 when the two top-five lists share no URLs at all. In that case each engine contributes (1-6)² + (2-6)² + (3-6)² + (4-6)² + (5-6)² = 25 + 16 + 9 + 4 + 1 = 55, and the two together give 110. So similar search results produce low scores and differing results produce high ones. The calculations gave:

Engines    Score
S1, S2         6
S1, S3       110
S1, S4       110
S1, S5       110
S2, S3       110
S2, S4       110
S2, S5       110
S3, S4       110
S3, S5       110
S4, S5        80
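The comparison score can be checked with a short script that reproduces every value in the table. This is a sketch under the same assumptions as before: URLs are written as the integers 1 to 20, and TOP5, rank, score and MAX_SCORE are hypothetical names. URLs that neither engine returned are ranked 6 by both and add nothing to the sum. For that reason it is enough to sum over the URLs that at least one of the two engines returned.

```python
from itertools import combinations

# Top-five results per engine in rank order (assumption: URL1..URL20 as 1..20).
TOP5 = {
    "S1": [1, 2, 3, 4, 5],       # Google
    "S2": [1, 2, 3, 6, 4],       # A9
    "S3": [7, 8, 9, 10, 11],     # Scirus
    "S4": [12, 13, 14, 15, 16],  # Teoma
    "S5": [17, 18, 12, 19, 20],  # MSN
}
CUTOFF = 5


def rank(results, url):
    """Place of url in an engine's results (1-5), or 6 if not in the top five."""
    return results.index(url) + 1 if url in results else CUTOFF + 1


def score(a, b, top):
    """Sum of squared rank differences between engines a and b."""
    # URLs returned by neither engine contribute (6 - 6)^2 = 0, so skip them.
    urls = set(top[a]) | set(top[b])
    return sum((rank(top[a], u) - rank(top[b], u)) ** 2 for u in urls)


# Largest possible score: no shared URLs, so each engine contributes
# (1-6)^2 + (2-6)^2 + ... + (5-6)^2 = 55.
MAX_SCORE = 2 * sum((CUTOFF + 1 - k) ** 2 for k in range(1, CUTOFF + 1))

for a, b in combinations(TOP5, 2):
    print(a, b, score(a, b, TOP5))
print("maximum possible:", MAX_SCORE)
```

Any pair that shares no URLs prints the maximum, 110, so only the Google/A9 and Teoma/MSN pairs come out lower.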

I concluded from this that S1 and S2 (Google and A9) are very similar (a score of only 6), so there is no need to use A9 if Google has been run.

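S4 and S5 (Teoma and MSN) also showed some overlap (a score of 80 rather than the maximum of 110), but not enough to make me think they are similar. Every other pair scored 110, meaning their top fives had no URLs in common.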

So I think I get the best search coverage by using a combination of Google, Scirus, Teoma, and MSN. I have a list of about thirty search engines that I use often, and I am going to keep collecting this data to determine the best subset.
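One possible way to pick such a subset, assumed here rather than tested on the full list of thirty engines, is a greedy coverage count. At each step it picks the engine that adds the most URLs not yet seen. This is a standard greedy set-cover heuristic, not necessarily the method I will end up using. The names TOP5 and greedy_order are hypothetical, and URLs are again written as the integers 1 to 20.

```python
# Top-five results per engine (assumption: URL1..URL20 written as 1..20).
TOP5 = {
    "S1": [1, 2, 3, 4, 5],       # Google
    "S2": [1, 2, 3, 6, 4],       # A9
    "S3": [7, 8, 9, 10, 11],     # Scirus
    "S4": [12, 13, 14, 15, 16],  # Teoma
    "S5": [17, 18, 12, 19, 20],  # MSN
}


def greedy_order(top):
    """Greedy set cover: repeatedly take the engine adding the most new URLs.

    Ties go to the engine listed first (max() keeps the first maximum).
    Returns a list of (engine, number_of_new_urls) pairs.
    """
    seen, order = set(), []
    remaining = list(top)
    while remaining:
        best = max(remaining, key=lambda e: len(set(top[e]) - seen))
        gained = len(set(top[best]) - seen)
        order.append((best, gained))
        seen |= set(top[best])
        remaining.remove(best)
    return order


print(greedy_order(TOP5))
```

With this data the first four picks are Google, Scirus, Teoma and MSN, which together cover 19 of the 20 URLs. A9 comes last and adds a single URL (URL6).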

The URLs found for this study were:

Identifier  Positive?  URL
URL1        positive   http://www.umich.edu/~gs265/society/nuclear.htm
URL2        positive   http://www.info.gov.za/speeches/2001/0106281145a1003.htm
URL3        neutral    http://www.inthenationalinterest.com/Articles/Vol3Issue35/Vol3Issue35Realist.html
URL4        negative   http://www.reactnow.org/about_reactor.html
URL5        neutral    http://www.american.edu/TED/irannuke.htm
URL6        positive   http://www.uic.com.au/nip29.htm
URL7        positive   http://www.nrc.gov/reading-rm/doc-collections/commission/tr/2001/20010117b.html
URL8        error?     http://www.lib.ncsu.edu/archives/etext/engineering/reactor/NEfurther010052.html
URL9        neutral    http://www.vanderbilt.edu/radsafe/9709/msg00075.html
URL10       negative   http://www.volpe.dot.gov/opsad/risk/risk.pdf
URL11       neutral    http://www.engr.wisc.edu/alumni/perspective/27.3/Gift01.html
URL12       negative   http://www.neis.org/literature/Reports%26Testimonies/full_terrorist_report_10-22-01.htm
URL13       negative   http://www.akaction.net/FTGreely.pdf
URL14       negative   http://www.sea-us.org.au/no2reactor/anstomisinfo.html
URL15       negative   http://www.msnbc.msn.com/id/5591511/
URL16       positive   http://www.world-nuclear.org/education/ral.htm
URL17       positive   http://www.nuclearfaq.ca
URL18       positive   http://www-formal.stanford.edu/jmc/progress/nuclear-faq.html
URL19       positive   http://neinuclearnotes.blogspot.com
URL20       positive   http://positiveenergy.blogspot.com

These results point to another problem. I wanted to find articles that discussed the positive benefits of nuclear reactors, but many of the pages found were decidedly negative. The search engines do not distinguish between "no benefit" and "benefit", so a simple keyword list is not enough, and I am looking for a better way to search for relevant articles. In this regard MSN seems to have done a better job than the others.
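That last observation can be checked by tallying the labels from the URL table against each engine's top five. The sketch below again writes URL1 to URL20 as the integers 1 to 20, and TOP5, LABEL and tally are hypothetical names. On this data MSN (S5) has four positive pages in its top five, more than any other engine, while Teoma (S4) has four negative ones.

```python
from collections import Counter

# Top-five results per engine (assumption: URL1..URL20 written as 1..20).
TOP5 = {
    "S1": [1, 2, 3, 4, 5],       # Google
    "S2": [1, 2, 3, 6, 4],       # A9
    "S3": [7, 8, 9, 10, 11],     # Scirus
    "S4": [12, 13, 14, 15, 16],  # Teoma
    "S5": [17, 18, 12, 19, 20],  # MSN
}
# Label given to each URL in the table above ("error?" for URL8).
LABEL = {
    1: "positive", 2: "positive", 3: "neutral", 4: "negative", 5: "neutral",
    6: "positive", 7: "positive", 8: "error?", 9: "neutral", 10: "negative",
    11: "neutral", 12: "negative", 13: "negative", 14: "negative",
    15: "negative", 16: "positive", 17: "positive", 18: "positive",
    19: "positive", 20: "positive",
}


def tally(engine):
    """Count how many of an engine's top five carry each label."""
    return Counter(LABEL[u] for u in TOP5[engine])


for engine in TOP5:
    counts = tally(engine)
    print(engine, "positive:", counts["positive"], "negative:", counts["negative"])
```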