| | 8 | |
| | 9 | <td class="app" style="background-color:#000080 ; background-image:url('/chrome/site/justext_nb.png')"> |
| | 10 | <p><a href="/wiki/Justext"> |
| | 11 | JusText is an HTML boilerplate removal tool. It strips navigation links, headers, footers, and similar boilerplate from HTML pages, leaving only regular text made up of full sentences.</a></p> |
| | 12 | <p> |
| | 13 | <a class="lnk" href="http://is.muni.cz/th/45523/fi_d/phdthesis.pdf">Paper</a> |
| | 14 | | |
| | 15 | <a class="lnk" href="/wiki/Justext/Cite">Cite</a> |
| | 16 | | |
| | 17 | <a class="lnk" href="http://opensource.org/licenses/BSD-3-Clause">Licence</a> |
| | 18 | </p> |
| | 19 | </td> |
| | 20 | |
| | 21 | <td class="app" style="background-color:#800000 ; background-image:url('/chrome/site/_nb.png')"> |
| | 22 | <p><a href="/wiki/Chared"> |
| | 23 | Chared is a tool for detecting the character encoding of a text in a known language. It contains models for a wide range of languages.</a></p> |
| | 24 | <p> |
| | 25 | <a class="lnk" href="#">Paper</a> |
| | 26 | | |
| | 27 | <a class="lnk" href="/wiki/Chared/Cite">Cite</a> |
| | 28 | | |
| | 29 | <a class="lnk" href="http://opensource.org/licenses/BSD-3-Clause">Licence</a> |
| | 30 | </p> |
| | 31 | </td> |
| | 32 | |
| | 33 | </tr><tr> |
| | 34 | |
| | 35 | <td class="app" style="background-color:#800080 ; background-image:url('/chrome/site/_nb.png')"> |
| | 36 | <p><a href="/wiki/SpiderLing">SpiderLing is a web spider for linguistics. It crawls text-rich parts of the web and collects large amounts of data suitable for text corpora. |
| | 37 | </a></p> |
| | 38 | <p> |
| | 39 | <a class="lnk" href="http://nlp.fi.muni.cz/~xsuchom2/papers/PomikalekSuchomel_SpiderlingEfficiency.pdf">Paper</a> |
| | 40 | | |
| | 41 | <a class="lnk" href="/wiki/SpiderLing/Cite">Cite</a> |
| | 42 | | |
| | 43 | <a class="lnk" href="http://www.gnu.org/licenses/gpl.txt">Licence</a> |
| | 44 | </p> |
| | 45 | </td> |
| 33 | | </tr><tr> |
| 34 | | |
| 35 | | <td class="app" style="background-color:#000080 ; background-image:url('/chrome/site/justext_nb.png')"> |
| 36 | | <p><a href="/wiki/Justext"> |
| 37 | | JusText is an HTML boilerplate removal tool. It strips navigation links, headers, footers, and similar boilerplate from HTML pages, leaving only regular text made up of full sentences.</a></p> |
| 38 | | <p> |
| 39 | | <a class="lnk" href="http://is.muni.cz/th/45523/fi_d/phdthesis.pdf">Paper</a> |
| 40 | | | |
| 41 | | <a class="lnk" href="/wiki/Justext/Cite">Cite</a> |
| 42 | | | |
| 43 | | <a class="lnk" href="http://opensource.org/licenses/BSD-3-Clause">Licence</a> |
| 44 | | </p> |
| 45 | | </td> |
| 46 | | |
| 47 | | <td class="app" style="background-color:#800080 ; background-image:url('/chrome/site/_nb.png')"> |
| 48 | | <p><a href="/wiki/SpiderLing">SpiderLing is a web spider for linguistics. It crawls text-rich parts of the web and collects large amounts of data suitable for text corpora. |
| 49 | | </a></p> |
| 50 | | <p> |
| 51 | | <a class="lnk" href="http://nlp.fi.muni.cz/~xsuchom2/papers/PomikalekSuchomel_SpiderlingEfficiency.pdf">Paper</a> |
| 52 | | | |
| 53 | | <a class="lnk" href="/wiki/SpiderLing/Cite">Cite</a> |
| 54 | | | |
| 55 | | <a class="lnk" href="http://www.gnu.org/licenses/gpl.txt">Licence</a> |
| 56 | | </p> |
| 57 | | </td> |
| 58 | | |
| 59 | | </tr><tr> |
| 60 | | |
| 61 | | <td class="app" style="background-color:#808000 ; background-image:url('/chrome/site/_nb.png')"> |
| 62 | | <p><a href="/wiki/Chared"> |
| 63 | | Chared is a tool for detecting the character encoding of a text in a known language. It contains models for a wide range of languages.</a></p> |
| 64 | | <p> |
| 65 | | <a class="lnk" href="#">Paper</a> |
| 66 | | | |
| 67 | | <a class="lnk" href="/wiki/Chared/Cite">Cite</a> |
| 68 | | | |
| 69 | | <a class="lnk" href="http://opensource.org/licenses/BSD-3-Clause">Licence</a> |
| 70 | | </p> |
| 71 | | </td> |
| 72 | | |
| 73 | | <td class="app" style="background-color:#000000 ; background-image:url('/chrome/site/_nb.png')"> |
| | 73 | <td class="app" style="background-color:#008080 ; background-image:url('/chrome/site/_nb.png')"> |