Wednesday, July 1, 2009

Yahoo: a real alternative to Google

The performance evaluation of search engines is one of the most sought-after topics in the history of search engines. A recent study, part of my research on the performance of search engines in retrieving web resources, found that the relative popularity of the five search engines examined has remained almost the same year over year. The study covered the five most popular search engines: Google, Yahoo, Live, Ask, and AOL. The basic findings of the study were as follows:

In terms of coverage and user statistics, GOOGLE remains the most popular search engine on the web, followed by YAHOO.

The basic concepts behind the search strategies and searching processes of all five search engines remain the same.

All five search engines follow different indexing methods, and each bases its relevance ranking on its own unique algorithm.

There is a significant difference in performance among the evaluated search engines. The overall analysis of the metrics found that GOOGLE has a statistically higher rate of performance in retrieving web resources than the other four search engines. GOOGLE is followed by YAHOO in terms of retrieval performance. The other three search engines, i.e. AOL, LIVE, and ASK, did not perform satisfactorily compared with GOOGLE and YAHOO.

Another conclusion of this evaluation is that users could, and should, consider YAHOO a real alternative to GOOGLE, especially because both engines were able to answer all the queries with the highest mean number of relevant hits. Beyond that, YAHOO ranked second only to GOOGLE in retrieving the greatest number of hits, the fewest bad links, the most stable share of relevant hits, and the greatest number of unique hits.

Given these findings, this study should provide important insight into the effectiveness of five major search engines and their support for retrieving relevant Internet resources.

The present research tried to achieve results as valid and reliable as possible within the available resources. The available knowledge and evaluation experience were used to take sufficient account of web-specific retrieval peculiarities and evaluation standards. This study has produced key findings that are important for all Web search engine users and researchers, and for the Web industry.

Tuesday, June 2, 2009

Search Engines: The Invader to Privacy

HOW SEARCH ENGINES MAY TRACK THE USERS AND THEIR PRIVACY

People often view search engines as benign blank boxes to which they can pose any question they want without suffering any consequences. In fact, search engines large and small typically keep logs of users' search terms, and some go further and match those terms to the user's computer address, name, and other items, depending on how much information the user has shared with the search engine. When a web surfer first visits a search engine, it will most likely log the IP address of the computer being used, the date and time of the access, and probably the browser configuration. If available, referrer information may also be logged: if the user arrived at the search engine page by clicking on a link in some other web page, the referrer data will contain the URL of that previously visited page. Analysing these logs can yield important information for site operators, such as which sites feed them the most traffic and the approximate regional location of their visitors.
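To make the logged fields concrete, here is a minimal sketch in Python of how a server might pull the IP address, timestamp, referrer, and browser string out of one log line. The layout assumed here is the common Apache-style "combined" format, and the sample line is invented for illustration; actual search engine logs may be structured differently.

```python
import re
from typing import Optional

# Assumed layout: Apache-style "combined" access log (an illustration only,
# not the format of any particular search engine).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_log_line(line: str) -> Optional[dict]:
    """Return the logged fields as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# Invented sample entry: the IP, URLs, and browser string are placeholders.
sample = (
    '192.0.2.10 - - [02/Jun/2009:10:15:32 +0000] '
    '"GET /search?q=privacy HTTP/1.1" 200 5120 '
    '"http://example.org/links.html" "Mozilla/5.0 (Windows NT 5.1)"'
)

entry = parse_log_line(sample)
print(entry["ip"], entry["time"], entry["referrer"], entry["user_agent"])
```

Even this single line reveals who asked (the IP address), when, what was searched for (the query in the request), and which page the visitor came from.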

However, the major difference between regular websites and search engines is that the latter can keep track of all the searches that the user formulates during a visit to the site. All of the entered keywords can be traced, and even the links in the results page that the user ultimately followed can be logged. This type of information collection seems natural, since search engines are merely keeping track of the service they offer to their users, but it is one of the first practices of search engine sites that raises privacy concerns.

Cookies appear simple enough on the surface. They are nothing more than small text files used to keep some information on the client computer. The main purpose of search engines in placing persistent cookies (which remain stored on the user's hard drive until erased or expired) is not to trace the search habits of their users per se, but rather to distinguish the preferences of those accessing the site, in order to provide a more personalized experience to returning users. However, cookies can be used in a way that might eventually compromise the privacy of the users. Since all information stored in cookies is handled under the hood by the sites that originally created them, it can be hard to tell exactly what they are being used for. Cookies therefore give search engine sites great power to record any type of information about their users, and even to keep those tracking activities secret through careful encoding and encryption.
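The persistent cookies described above can be sketched with Python's standard `http.cookies` module. This is only an illustration under stated assumptions: the cookie name `lang`, its value, and the one-year lifetime are made up, and a real search engine would choose its own names and contents.

```python
from http.cookies import SimpleCookie

# Site side: build a persistent cookie that remembers a preference.
# The name "lang" and the value "en" are hypothetical.
outgoing = SimpleCookie()
outgoing["lang"] = "en"
outgoing["lang"]["max-age"] = 60 * 60 * 24 * 365  # keep for about one year
outgoing["lang"]["path"] = "/"

# Text the site would send in a Set-Cookie response header.
header_value = outgoing["lang"].OutputString()
print("Set-Cookie:", header_value)

# Returning visit: the browser sends the cookie back and the site reads it.
incoming = SimpleCookie()
incoming.load("lang=en")
returned = incoming["lang"].value
print("Preference restored:", returned)
```

Because the browser returns the same value on every visit until the cookie expires, the site can recognise a returning user; the same mechanism works just as well for an opaque identifier as for a harmless preference.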

The first serious privacy breach by a search engine occurred in August 2006, when AOL accidentally published on its website the search queries that more than half a million users had made over a three-month period. By the time AOL realised its mistake, the information had been copied to a number of other Internet sites, where it is still available today. In 2005, the U.S. Department of Justice subpoenaed Google, Yahoo, MSN, and AOL for tens of millions of users' search queries. Google successfully fought the request and was able to limit its disclosure, but it is unknown how much data the other companies may have turned over.

Like AOL, Google also stores users' search terms. This is done in the form of a server log. Each search creates a new entry in the log.

Structure of Google's search logs



The server log information has greater potential to identify the individual than the information disclosed by AOL. It contains not only search terms but also traffic data, such as the IP address of the user and the user's Google cookie. This information could, in theory, be used to actually identify the user, rather than just guess his or her identity. Google collects additional information about users if they use other services such as Google Mail, Talk and Calendar. Of these, Google Mail has attracted the most controversy because of its use of “content extraction”, which analyses incoming and outgoing e-mail in order to target advertising at users while they are using the Google Mail service. Google continues to use this technique on Google Mail accounts despite numerous privacy complaints from organizations such as the Electronic Privacy Information Center.

DoubleClick
The privacy concerns surrounding Google have been heightened recently by its plans to acquire DoubleClick, Inc., one of the leading providers of Internet advertising. To help its clients target their advertising, DoubleClick also allows users to be tracked as they visit different websites across the Internet. This is done by placing a DoubleClick cookie on the user's computer the first time they visit a site with a DoubleClick advert. This cookie is then read each time the user visits another website containing a DoubleClick advert, thus building up a picture of the user's surfing habits.
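The cross-site tracking just described can be illustrated with a toy simulation in Python. The `AdNetwork` class, the site names, and the random identifier are all hypothetical stand-ins; this is a sketch of the general third-party cookie idea, not a description of DoubleClick's actual system.

```python
import uuid
from collections import defaultdict


class AdNetwork:
    """Toy third-party ad server that keys browsing history on a cookie ID."""

    def __init__(self):
        # cookie ID -> list of sites on which this browser loaded an advert
        self.profiles = defaultdict(list)

    def serve_ad(self, cookie_id, site):
        """Record the visit; issue a new cookie ID on the first encounter."""
        if cookie_id is None:
            cookie_id = uuid.uuid4().hex  # first advert seen: set the cookie
        self.profiles[cookie_id].append(site)
        return cookie_id


network = AdNetwork()
browser_cookie = None  # the browser starts without the network's cookie

# The same browser visits three unrelated sites that all carry the
# network's adverts (site names are placeholders).
for site in ["news.example", "shop.example", "travel.example"]:
    browser_cookie = network.serve_ad(browser_cookie, site)

history = network.profiles[browser_cookie]
print(history)
```

None of the three sites shares data with the others, yet the ad network ends up with one profile listing all of them, because the browser presents the same cookie each time an advert is loaded.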

Google’s My Search History
A new feature launched by Google allows users to see all of their past searches. The service, called My Search History, is similar to, but more comprehensive than, the feature Amazon.com, Ask, and America Online have offered for some time. It is intended to help people who use Google locate the information they sought during earlier searches so they can avoid repeating past queries. Once a user has set up an account, he or she will be able to see the search words previously entered as well as the previously visited sites that contain information on that search term. This may sound interesting and useful, but computer experts say the technology creates new risks to privacy. By signing up, a user allows Google to store his or her search history on its computers, and as long as Google holds up its end of the privacy policy, that information should remain safely on its servers. It is well known that all search engines, including Yahoo, Google and MSN, retain search data about their users, which can easily give a clue to a person's identity and a glimpse into his or her mind and online activity.

SOME REMEDIAL MEASURES TO SEARCH ENGINE PRIVACY
Individuals who are concerned about privacy but wish to use the valuable services of search engines can adopt a number of measures to safeguard their personal confidentiality. Before typing search terms into a search engine box or registering for extra services at a search engine, it is necessary to be aware of the potential consequences. Searches can come back to haunt users, especially if they are problematic and can be tied directly to the user in some way. Here are some tips to help enhance Web searching privacy, ranging from high-protection steps to simple steps we can start to take right away.

Watch what you search for: Avoid using search terms that attach your full legal name to any other information. For example, searching for your full name and ID card number in a single query is unwise. If this type of search is conducted, the name and ID number will appear together in the search string, and may be stored for a long time by the search engine.

Special note about passwords and user names: Avoid typing passwords or user names into search engines. If there is a security breach that allows users' data to be released to others, these passwords and user names can potentially be used to identify the surfer, or even to cause some mischief. Therefore, it is a good idea to change any password or user name that has been entered into a search engine.

Consider using an anonymizing tool or a proxy: The simplest way to disassociate yourself from your search terms is to use an anonymizing tool. There are free services that allow you to use the Web without revealing your computer's address, and there are also pay services. We may not realize it, but a computer discloses a lot of information as we traverse the Web. For example, by visiting IP ID at http://ipid.shat.net, we can check our computer's IP address and the kind of information the computer is disclosing.

TOR Onion Router and Privoxy: TOR (http://tor.eff.org) is a free tool that can be installed in combination with a tool called Privoxy (http://www.privoxy.org/), which helps to mask your computer's address, among other things. TOR and Privoxy are a good tool set and are well worth considering. These two tools should be used together.

Use Scandoo - Scandoo is a wrapper written around search engines that warns you of malicious websites in search results. Scandoo can help you search Google, Yahoo or MSN without disclosing your actual geographic location or IP address to the search engine. The Scandoo interface remains invisible to the end user.

Download HideMyIP software - Your IP address is one big link between your search queries. Even if you are using a static IP, you can still hide it with the HideMyIP software. HideMyIP conceals your real IP address and shows a fake IP with a hostname to the sites that you visit. You can set HideMyIP to change your IP address every minute.

Download CustomizeGoogle for Firefox - If you Google using Firefox, this is a highly recommended extension that completely enhances your Googling experience. It can help remove Google Ads, anonymize your Google user ID, remove click tracking or filter Google search results.

Disallow Google to Store Cookies - The important thing is that it is not enough to block cookies from the google.com domain alone; you must also block cookies from the Google site for your country. For example, in India, one would block google.com and google.co.in. This is because Google redirects you to your local country page when you type google.com in the browser address bar. To block cookies, open the cookie blocking dialog in your browser, type the site URL and click disallow or block.


SOME GENERAL TIPS FOR USING SEARCH ENGINES

The following tips are small steps that will not completely protect us from all search engine privacy issues, but they can help to make incremental improvements.

• Do not accept search engine cookies. If you already have some on your computer, delete them. Cookies can be used to correlate a variety of information.

• Do not sign up for email at the same search engine where you regularly search. If you do so, then your email address can potentially be tied to your search terms. Whether or not a search engine does this is usually disclosed in the search engine's privacy policy.

• If you surf using a cable modem, or a static Internet connection, ask your service provider to give you a new IP address. Changing IP addresses every once in a while can be helpful for people who primarily surf the Web from one computer in one location over a long period of time.

E-Books: boon or bane

Information is a commodity that is of vital importance to everyone. In the past, a wide range of paper-based approaches have been used to provide access to information. Undoubtedly, books and related artefacts have been one of the most common. Nowadays, there is an increasing interest in the use of electronic books and other forms of online documentation in order to disseminate information and provide global access to it.


In recent years there has been much debate about the future of the book in a digital world. E-books have already become a reality. This is not surprising, as the technology of the book has seen a number of transitions. Over the centuries there has been a series of changes in the way that words are presented. Clay, wax, papyrus, vellum, cloth and paper have all been used, and stored as tablets, scrolls, folios or books. The digitization of the world’s enormous store of library books, an effort dating to the early 1990s in the United Kingdom and the United States, became a reality when the search engine giant Google announced the ‘Google Book Search’ service in December 2004. Google has already signed agreements with Oxford, Harvard, Stanford, the University of Michigan, and the New York Public Library to convert the full text of millions of library books into searchable web pages.

There are at present three types of electronic books available. The first are web books, which can be viewed via the World Wide Web. An example is Stephen King's web book “Riding the Bullet”, which was downloaded 500,000 times within the first 48 hours of its release in March 2000 and is the most popular web book to date. The second type could be called Palm books, which are read on either a PDA or a dedicated e-book reader. They are portable and do not necessarily require an Internet connection. The third type, which is still under development, would use electronic ink to display content on ultra-thin, high-resolution flexible sheets. Books using this technology are likely to have several hundred blank pages bound in a conventional cover, and could be loaded with many volumes of selected books. They will be as easy to read as traditional books are today but will have an instant memory to select what the user requires.


Advantages of electronic books

One of the biggest advantages of electronic books is their ability to store data. Even today's relatively primitive appliances hold 10 or 20 books, and a software book reader installed on a high-end laptop can easily store hundreds of books. The portability of e-book readers is another outstanding advantage of electronic books. The ability to take a whole library and put it in a pocket is absolutely amazing. Portable personal digital libraries, rather than portable electronic books, are probably the future role of these appliances.


A further advantage of electronic books, in particular e-book readers, is their versatility and increasing capabilities. They offer many features not possible with print on paper. Content can be expanded, customized or updated. Notes can be written in the electronic equivalent of margins. Text can be searched much more efficiently than with the index of a traditional book. The font can be changed to suit one's taste and the print size adjusted for aging eyes. E-books will also be able to read aloud, so people who are visually impaired can have access to books in audio format at the same time that the books are released in print format. For people reading in a second language, an e-book reader will be very beneficial: an interactive dictionary, glossaries and vocabulary lists will add value and meaning to reading.


Another important advantage of the e-book for a user is in doing research. E-books make it possible to search through the text for specific words or phrases. Reference books, factual books, encyclopedias, dictionaries, bibliographies, abstracts and indexing guides, where the content is the most important feature, are already widely accepted in electronic form. A tourist can quickly locate a map of the area being visited by keying in a street name, look up nearby tourist attractions and read about their features. A PDA is smaller and lighter to carry than a traditional guidebook, no large map needs to be unfolded to find directions, and the keyword search is much more efficient than attempting to use a book's index. Electronic books also allow hyperlinking: instant jumps from one idea to another in another part of the book or in another work.

A fundamental advantage of e-books is their method of publication. An electronic text, which could be created by anyone, has the potential for instant worldwide distribution over the Internet. Project Gutenberg has published out-of-print texts of the classics and other copyright-free information, all of which can be downloaded without cost by anyone in the world. Updates to scientific, technical and medical information can be published instantly at any time, to the great benefit of the education community. Electronic publication also allows the publishing of novels and other books that might never be published in print format. The economic advantages of e-books are readily apparent. Although the overhead, publication and distribution costs of electronic books are already lower than those of using a printing press, the savings have not yet been passed on to the consumer. Once an efficient and comfortable method of on-line distribution is available, it will take much less effort to purchase books, and at a considerable reduction in price. Production, transportation and warehousing costs could be reduced or eliminated, intermediaries could be bypassed, trees could be saved, and with less book shelving at home, more wall space could be devoted to pictures of your children, artwork, degrees, and so on.


Disadvantages of e-books

Although e-books may attract the eyes of netizens, they are still far from overtaking the smooth, flexible conventional paper-based book because of some common pitfalls. Screen glare and eyestrain are a serious concern for many potential users of e-book technology. The display resolution of computer screens and electronic devices is considerably lower than the print quality produced by a printing press. This problem has been recognized by the groups developing e-book readers.


Reading from a computer does not appear to work very well for many people, in particular when reading long texts. Various surveys have shown that most people would prefer to print out material and read it rather than read it from a computer. It appears that many people prefer to go online to browse and scan articles and see whether they want to read them. If so, they either print the article, if that is convenient, or buy the book. One of the disadvantages of e-books is that reading from a computer lacks the familiarity and comfort of reading from a book. The computer is fine for e-mail and browsing, but there is something about curling up in a comfortable chair with a book that laptops and hand-helds simply fail to match.


The usability of e-books does not yet approach that of paper books. A paper book can be opened and flipped through, while an electronic text is more difficult to navigate. This disadvantage is recognized by the manufacturers of e-books. In an effort to overcome these problems, the e-book reader copies many of the characteristics of books, just as Gutenberg did when he began printing with movable metal type: he made print look just like the manuscript books that had been created by scribes.


A disadvantage of e-books for reading fiction is that the World Wide Web has made the use of electronic information very different in character from the use of a paper book. People are accustomed to using electronic forms to look up information on computers, rather than to take the journey into other worlds that a good work of fiction offers. When reading a printed work of fiction, the reader chooses to leave reality for a while, but when using an e-book there is still the element of having to control the electronic equipment. Another disadvantage of electronic books is that they all require hardware for viewing. Unless the hardware, Internet connection or battery power required by an e-book reader is readily available, its electronic documents are useless. A further drawback to the acceptance of e-books is their unreliable life span. Paper has a much longer life span than most digital forms of storage. Because of the rapid development of new computer systems, it is difficult to judge when the software or hardware will become outdated. A distinct handicap to the use of e-books is the fact that there is no set standard for e-book readers at the moment.


The Internet has provided huge, instant, worldwide distribution of works without the restrictions that have traditionally been part of the publishing industry. Copyright issues could therefore be considered a serious drawback to digital forms of publication. Methods of distributing e-books have not yet been developed to match those of the printed book. There are still questions to be asked about whether the purchaser of an e-book owns it and can lend it to friends, and whether the purchaser is able to make copies of its pages. Reliance on e-books for research also has disadvantages for students' learning. The ease of cutting and pasting information makes plagiarism common. Cheat sites that offer essays and reports are available, and it is possible to copy or purchase finished essays. With the advent of the Internet, students do not think in the sequential, logical way that is fostered by reading; instead they click away after reading approximately 700 words. The books that help shape culture will be neglected because of this inability to read long prose.


Despite the tremendous utility and popularity of printed books, there are many reasons why we should now consider a partial shift away from conventional paper-based books towards greater use of publications based on electronic media. Because of the rapidly emerging era of digital information and communications technologies, five basic issues need to be taken into consideration: first, the problems associated with the ever-increasing publication rate of both paper-based and electronic documents; second, the limitations imposed by the rates at which humans can read text-based material; third, the implications of 'media competition'; fourth, the speed and accuracy with which electronic material can be accessed; and fifth, the ways in which electronic material can be re-organized dynamically in order to achieve more flexible presentation and access. Bearing these five points in mind, it is becoming increasingly apparent that new ways of publishing books are needed. These new techniques should allow the flexible flow of information across different media forms in a dynamic way - to suit an individual's needs of the moment.


Information, thoughts, and ideas are useful whether they are presented on paper or by some other means. Methods of printing, of distribution, and of sales are being transformed. New technologies do not erase the past, but build on it. Sometimes new technologies supplant old ones, but in the case of books this will not happen. Because of its unique characteristics, the printed book will not disappear. However, it would appear that once the e-book has the readability, usability and other features of a printed book, it will be successful, particularly with new generations of readers who have grown up familiar with computers. In the future we will therefore need a comprehensive 'media strategy' which allows information to be moved from one medium to another as the needs of its users change.