Searching For Good News
I spend a lot of effort crawling through the web, looking for items that can extend my understanding of nuclear fission. Usually this means jumping from web site to web site by following links that others think are interesting, or running searches with various tools and scanning the results for new sites. This got me thinking about the differences between search engines and whether one is better than another. I am now trying to measure these differences to answer that question.
I started by running a measured search on these keywords:
nuclear reactor positive benefit
I selected five search engines for comparison:
S1 - Google - http://www.google.ca
S2 - A9 - http://a9.com
S3 - Scirus - http://www.scirus.com
S4 - Teoma - http://www.teoma.com/
S5 - MSN - http://beta.search.msn.com/
For each search engine I gave every returned URL a rank of 1, 2, 3, 4, or 5 according to its place in the results. Any URL that an engine did not return in its first five places got a rank of 6 for that engine. With five search engines, if each one returned something different in the first five places I would have 25 URLs to consider. In fact I got 20, so there is some overlap, but not as much as I expected. The ranks are listed in the following table:
Ranks | S1 | S2 | S3 | S4 | S5 |
URL1 | 1 | 1 | 6 | 6 | 6 |
URL2 | 2 | 2 | 6 | 6 | 6 |
URL3 | 3 | 3 | 6 | 6 | 6 |
URL4 | 4 | 5 | 6 | 6 | 6 |
URL5 | 5 | 6 | 6 | 6 | 6 |
URL6 | 6 | 4 | 6 | 6 | 6 |
URL7 | 6 | 6 | 1 | 6 | 6 |
URL8 | 6 | 6 | 2 | 6 | 6 |
URL9 | 6 | 6 | 3 | 6 | 6 |
URL10 | 6 | 6 | 4 | 6 | 6 |
URL11 | 6 | 6 | 5 | 6 | 6 |
URL12 | 6 | 6 | 6 | 1 | 3 |
URL13 | 6 | 6 | 6 | 2 | 6 |
URL14 | 6 | 6 | 6 | 3 | 6 |
URL15 | 6 | 6 | 6 | 4 | 6 |
URL16 | 6 | 6 | 6 | 5 | 6 |
URL17 | 6 | 6 | 6 | 6 | 1 |
URL18 | 6 | 6 | 6 | 6 | 2 |
URL19 | 6 | 6 | 6 | 6 | 4 |
URL20 | 6 | 6 | 6 | 6 | 5 |
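The ranking rule can be sketched in a few lines of Python. This is only an illustration: the names TOP_FIVE, NOT_RANKED, and build_rank_table are my own assumptions, and the top-five lists are simply read back off the table above rather than fetched from the search engines.

```python
# Top-five result lists per engine, in rank order, read off the table above.
TOP_FIVE = {
    "S1": ["URL1", "URL2", "URL3", "URL4", "URL5"],
    "S2": ["URL1", "URL2", "URL3", "URL6", "URL4"],
    "S3": ["URL7", "URL8", "URL9", "URL10", "URL11"],
    "S4": ["URL12", "URL13", "URL14", "URL15", "URL16"],
    "S5": ["URL17", "URL18", "URL12", "URL19", "URL20"],
}

NOT_RANKED = 6  # rank given to anything beyond fifth place


def build_rank_table(top_five):
    """Return {url: [rank under each engine]}, engines in dict order."""
    # Collect every distinct URL in first-seen order.
    urls = []
    for results in top_five.values():
        for url in results:
            if url not in urls:
                urls.append(url)

    # A URL's rank is its 1-based position in an engine's list, else 6.
    table = {}
    for url in urls:
        table[url] = [
            results.index(url) + 1 if url in results else NOT_RANKED
            for results in top_five.values()
        ]
    return table


if __name__ == "__main__":
    table = build_rank_table(TOP_FIVE)
    print(len(table), "distinct URLs")
    for url, ranks in table.items():
        print(url, ranks)
```

Counting the keys of the resulting table gives the number of distinct URLs, which is how the overlap between engines shows up.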
This data allows me to compare one search engine with another by summing, over all 20 URLs, the squared difference between the ranks the two engines gave each URL. The calculation produces a value of 0 if both search engines give every URL the same rank, and a value of 110 if their rankings are as different as possible, which happens when their first five places have no URL in common (each engine's list then contributes 25 + 16 + 9 + 4 + 1 = 55). So similar search results produce low scores and differing results yield high values. The calculations gave:
First Engine | Second Engine | Comparison Score |
S1 | S2 | 6 |
S1 | S3 | 110 |
S1 | S4 | 110 |
S1 | S5 | 110 |
S2 | S3 | 110 |
S2 | S4 | 110 |
S2 | S5 | 110 |
S3 | S4 | 110 |
S3 | S5 | 110 |
S4 | S5 | 80 |
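For anyone who wants to check these scores, here is a minimal Python sketch of the sum-of-squared-differences calculation. The ranks are copied from the rank table; the names RANKS, ENGINES, comparison_score, and all_scores are assumptions made for this example.

```python
from itertools import combinations

ENGINES = ["S1", "S2", "S3", "S4", "S5"]

# One row per URL (URL1..URL20), one column per engine (S1..S5).
# A rank of 6 means the engine did not return the URL in its first five places.
RANKS = [
    [1, 1, 6, 6, 6],
    [2, 2, 6, 6, 6],
    [3, 3, 6, 6, 6],
    [4, 5, 6, 6, 6],
    [5, 6, 6, 6, 6],
    [6, 4, 6, 6, 6],
    [6, 6, 1, 6, 6],
    [6, 6, 2, 6, 6],
    [6, 6, 3, 6, 6],
    [6, 6, 4, 6, 6],
    [6, 6, 5, 6, 6],
    [6, 6, 6, 1, 3],
    [6, 6, 6, 2, 6],
    [6, 6, 6, 3, 6],
    [6, 6, 6, 4, 6],
    [6, 6, 6, 5, 6],
    [6, 6, 6, 6, 1],
    [6, 6, 6, 6, 2],
    [6, 6, 6, 6, 4],
    [6, 6, 6, 6, 5],
]


def comparison_score(ranks, a, b):
    """Sum of squared rank differences between engine columns a and b."""
    return sum((row[a] - row[b]) ** 2 for row in ranks)


def all_scores(ranks, engines):
    """Score every unordered pair of engines, keyed by (name, name)."""
    return {
        (engines[a], engines[b]): comparison_score(ranks, a, b)
        for a, b in combinations(range(len(engines)), 2)
    }


if __name__ == "__main__":
    for (first, second), score in all_scores(RANKS, ENGINES).items():
        print(first, second, score)
```

Adding another search engine only means adding a column to RANKS and a name to ENGINES; the pairwise loop stays the same.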
I concluded from this that S1 and S2 (Google and A9) are very similar, so there is no need to run A9 if Google has already been run.
S4 and S5 (Teoma and MSN) also showed some overlap (they share URL12, which gives a score of 80), but not enough to make me think they are similar.
So I think I get my best search power by using a combination of Google, Scirus, Teoma, and MSN. I have a list of about thirty search engines that I use often, and I am going to continue collecting this data to determine the best subset.
The URLs found for this study, with my rating of each page as positive, negative, or neutral, were:
Identifier | URL | Positive? |
URL1 | http://www.umich.edu/~gs265/society/nuclear.htm | positive |
URL2 | http://www.info.gov.za/speeches/2001/0106281145a1003.htm | positive |
URL3 | http://www.inthenationalinterest.com/Articles/Vol3Issue35/Vol3Issue35Realist.html | neutral |
URL4 | http://www.reactnow.org/about_reactor.html | negative |
URL5 | http://www.american.edu/TED/irannuke.htm | neutral |
URL6 | http://www.uic.com.au/nip29.htm | positive |
URL7 | http://www.nrc.gov/reading-rm/doc-collections/commission/tr/2001/20010117b.html | positive |
URL8 | http://www.lib.ncsu.edu/archives/etext/engineering/reactor/NEfurther010052.html | error ? |
URL9 | http://www.vanderbilt.edu/radsafe/9709/msg00075.html | neutral |
URL10 | http://www.volpe.dot.gov/opsad/risk/risk.pdf | negative |
URL11 | http://www.engr.wisc.edu/alumni/perspective/27.3/Gift01.html | neutral |
URL12 | http://www.neis.org/literature/Reports%26Testimonies/full_terrorist_report_10-22-01.htm | negative |
URL13 | http://www.akaction.net/FTGreely.pdf | negative |
URL14 | http://www.sea-us.org.au/no2reactor/anstomisinfo.html | negative |
URL15 | http://www.msnbc.msn.com/id/5591511/ | negative |
URL16 | http://www.world-nuclear.org/education/ral.htm | positive |
URL17 | http://www.nuclearfaq.ca | positive |
URL18 | http://www-formal.stanford.edu/jmc/progress/nuclear-faq.html | positive |
URL19 | http://neinuclearnotes.blogspot.com | positive |
URL20 | http://positiveenergy.blogspot.com | positive |
These results indicate another problem. I wanted to find articles that discussed the positive benefits of nuclear reactors, but many of the pages found were decidedly negative. A keyword search does not make a distinction between "no benefit" and "benefit", so the simple keyword list does not do the job and I am looking for a better way to search for relevant articles. In this regard it seems that MSN did a better job than the others: four of its first five results were positive, against three for A9, two for Google, and one each for Scirus and Teoma.
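The comparison between engines on this point can be tallied with a short Python sketch. It assumes the first-five lists and the ratings exactly as they appear in the two tables; the names FIRST_FIVE, RATING, and tally_ratings are made up for this example, and the "error ?" entry for URL8 is carried over as-is.

```python
from collections import Counter

# URL numbers in each engine's first five places, read off the rank table.
FIRST_FIVE = {
    "Google": [1, 2, 3, 4, 5],
    "A9": [1, 2, 3, 6, 4],
    "Scirus": [7, 8, 9, 10, 11],
    "Teoma": [12, 13, 14, 15, 16],
    "MSN": [17, 18, 12, 19, 20],
}

# Ratings from the URL table, keyed by URL number.
RATING = {
    1: "positive", 2: "positive", 3: "neutral", 4: "negative",
    5: "neutral", 6: "positive", 7: "positive", 8: "error ?",
    9: "neutral", 10: "negative", 11: "neutral", 12: "negative",
    13: "negative", 14: "negative", 15: "negative", 16: "positive",
    17: "positive", 18: "positive", 19: "positive", 20: "positive",
}


def tally_ratings(first_five, rating):
    """Count how many results of each rating every engine returned."""
    return {
        engine: Counter(rating[url] for url in urls)
        for engine, urls in first_five.items()
    }


if __name__ == "__main__":
    for engine, counts in tally_ratings(FIRST_FIVE, RATING).items():
        print(engine, dict(counts))
```

A count like this only measures how many of the first five results matched what I was looking for; it says nothing about the quality of the pages themselves.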