A YaCy search engine node at Tangle Ball

As of today, Tangle Ball has attached a node to the YaCy search engine network. YaCy is decentralised crawler and search engine software. It runs on several hundred nodes across the planet, which share their crawl indices with each other. The node hosted by Tangle Ball can be reached here, and web searches can be carried out from it.

What is the significance of this?

The majority of people connected to the internet use a commercial search engine, such as Google, Bing or Yahoo. This massive centralisation of searching causes several problems. Researchers have found that Google promotes its own results over those of competitors, and builds up profiles of the people who search. These profiles are used for two purposes. The first is to return results which the person searching is more likely to want to see. This sounds great; of course we want search engines to return what we're looking for. But it rapidly descends into a situation where we are "protected" from seeing anything we might disagree with. Eli Pariser has studied this effect, known as "bubbling", and presents his findings in his book The Filter Bubble. It is one of the reasons behind Duck Duck Go, a search engine which neither profiles nor bubbles the people who use it.

However, the results from Duck Duck Go are still produced by commercial entities, mainly Microsoft and Yahoo, and thus conform to their values. Microsoft's values in particular have long been recognised as not serving the best interests of the people who use its services and products. Numerous court cases, fines and criminal convictions stand testament to this.

The second use of the profiles built up by these commercial search engines is aggregating the data to target adverts at users. Because Google offers a wide variety of services, many people are almost constantly logged in to Google Mail, Google Documents or some other Google product, so any searches they make are saved and stored alongside other data from their email, the places they visit and the RSS feeds they subscribe to. Together these allow a sophisticated model of each person to be built up, so that advertising can be aimed at them precisely. The popular retort to this, as to any advertising, is "you don't have to buy what they advertise", but this is only partly true, as it ignores the methods advertisers use. The very success of marketing, promotion and advertising demonstrates that these methods are persuasive and can induce people to buy items they otherwise would not. For an insight into how marketing works, take a look at "The Century of the Self" by Adam Curtis at the BBC; it's very informative and free to watch, available here.

YaCy suffers from none of these problems. Owing to its decentralised nature, no single entity controls the search results, and no entity can profile users, because none has access to all the search data. Further, it has no single point of failure, so the network keeps working if any individual node fails.

If you would like to take part in the YaCy network, you can do so by carrying out searches here. To contribute data to the YaCy network, you can use any internet-capable computer. One running a recent Linux-based distribution is best, although the software also works on Windows and Mac OS X. It can be downloaded from here. The computer you install it on must be reachable from the internet, which means you must configure your router to allow connections to it from other computers.

YaCy is free software, released under the GNU General Public License, and thus can be freely examined, used, modified and redistributed.

I can set the software to crawl any site. If there are particular sites you think are worth indexing, let me know in the comments below, and include a few words on why each is worthwhile.
