Wikipedia annotations

In our experience, the best annotation for a Wikipedia article is the beginning of the article itself, after skipping several interface-like paragraphs and cleaning up the markup. There is also a freshness requirement: we had to dump changed articles hourly, so that we always have the current state of every Wikipedia article. Cleaning up wiki markup with the original algorithm requires access to other Wikipedia pages in order to expand templates. When a template page uses other templates in turn, the algorithm slows down. That is why we decided to write our own expansion rules for several popular templates.
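The approach described above can be sketched roughly as follows. This is only an illustration, not our production code: the rule table `TEMPLATE_RULES`, the two example templates in it, the list of prefixes treated as "interface-like", and the `max_chars` limit are all assumptions made for the example.

```python
import re

# Hypothetical local expansion rules for a few common templates.
# Templates without a rule are dropped, so no other page is ever fetched.
TEMPLATE_RULES = {
    "lang": lambda args: args[1] if len(args) > 1 else "",  # {{lang|fr|text}} -> text
    "nbsp": lambda args: " ",
}

TEMPLATE_RE = re.compile(r"\{\{([^{}]*)\}\}")  # matches innermost templates only
LINK_RE = re.compile(r"\[\[(?:[^\[\]|]*\|)?([^\[\]|]*)\]\]")  # [[target|label]] or [[target]]


def expand_templates(text):
    """Expand templates from the inside out using the local rules."""
    def repl(match):
        parts = [p.strip() for p in match.group(1).split("|")]
        rule = TEMPLATE_RULES.get(parts[0].lower())
        return rule(parts[1:]) if rule else ""

    prev = None
    while prev != text:  # repeat until no template is left to expand
        prev = text
        text = TEMPLATE_RE.sub(repl, text)
    return text


def clean_markup(text):
    """Expand templates, unwrap simple links, drop bold/italic quotes."""
    text = expand_templates(text)
    text = LINK_RE.sub(r"\1", text)
    text = re.sub(r"'{2,}", "", text)
    return re.sub(r"[ \t]+", " ", text).strip()


def annotation(wikitext, max_chars=300):
    """Return the first paragraph that looks like article text."""
    skip_prefixes = ("[[", "=", "{|", "*", "#REDIRECT")  # assumed heuristic
    for paragraph in re.split(r"\n\s*\n", wikitext):
        cleaned = clean_markup(paragraph)
        if not cleaned or cleaned.startswith(skip_prefixes):
            continue
        return cleaned[:max_chars]
    return ""
```

Because unknown templates are simply dropped, an infobox at the top of an article collapses to an empty paragraph and is skipped, which is one way to get the "skip interface-like paragraphs" behaviour without expanding anything remotely.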

Wikipedia also has redirect pages. In most cases, the URL stored in the database of our content system was not that of the original article but that of a redirect page. So, for a successful match, we had to serve the original page's annotation for each of its redirect pages. A separate database of redirect pages is maintained for this purpose.
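A minimal sketch of the redirect lookup, assuming the redirect database can be modelled as a plain mapping from redirect title to target title. The function names, the dictionary representation, and the `max_hops` limit are hypothetical and chosen only for illustration.

```python
def resolve(title, redirects, max_hops=5):
    """Follow redirects to the original page, guarding against loops."""
    seen = set()
    while title in redirects and title not in seen and len(seen) < max_hops:
        seen.add(title)
        title = redirects[title]
    return title


def lookup_annotation(title, redirects, annotations):
    """Return the original page's annotation for a page or any of its redirects."""
    return annotations.get(resolve(title, redirects))
```

With `redirects = {"NYC": "New York City"}`, a lookup for "NYC" returns whatever annotation is stored under "New York City". Keeping the redirects in their own table means the annotation text is stored once per article, not once per redirect.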
