Wikiwix crawling robot
The Wikiwix robot works in 3 different modes:
- Crawling the Wikimedia projects: for all projects of the Wikimedia Foundation, our robot listens to the recent changes (RC) IRC bot at irc.wikimedia.org and crawls pages as soon as they are modified or created. This keeps our search engine for Wikipedia and its sister projects up to date.
- Crawling web pages for our Twitter-based search engine: this complements the encyclopedic Wikipedia search and gives you access to what is buzzing on the internet right now.
- Crawling entire websites on demand: anyone can build a custom search engine over their favorite websites with Wikimarks.
How to control our robot
Prevent our robot from crawling your website
Prevent our robot from crawling some pages
We respect robots meta tags:
- Putting the line <meta name="robots" content="noindex,nofollow"/> in a page will prevent us from indexing the page and from following the links in it.
- Any combination of noindex or index with nofollow or follow tells us whether or not to index the page and whether or not to follow the links in it.
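To illustrate how these directives can be read, here is a minimal Python sketch that extracts the robots meta tag from a page and derives an (index, follow) decision. It is not the Wikiwix robot's actual code: the names RobotsMetaParser and robots_policy are hypothetical, and defaulting to index and follow when no directive is present is an assumption based on common crawler convention.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser also routes self-closing tags (<meta ... />) here.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_policy(html):
    """Return (may_index, may_follow) for a page.

    Assumption: a missing directive means index and follow are allowed.
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    may_index = "noindex" not in parser.directives
    may_follow = "nofollow" not in parser.directives
    return may_index, may_follow


page = '<html><head><meta name="robots" content="noindex,nofollow"/></head></html>'
print(robots_policy(page))
```

With the tag from the first bullet above, both values come out false, so a crawler following this sketch would neither index the page nor follow its links.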