| | 1 | Apr 11 2018 |
| | 2 | Norwegian joined wordlist |
| | 3 | Apr 11 2018 |
| | 4 | More wordlists |
| | 5 | Sep 11 2017 |
| | 6 | Lowercased stoplist |
| | 7 | Aug 24 2017 |
| | 8 | New and updated wordlists |
| | 9 | Aug 24 2017 |
| | 10 | Justext 1.4 |
| | 11 | Aug 24 2017 |
| | 12 | Web demo |
| | 13 | Aug 24 2017 |
| | 14 | max_good_distance, a new context classification parameter |
| | 15 | Maximum distance (in paragraphs) of a short paragraph from a good |
| | 16 | paragraph to re-classify the short paragraph as good. |
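
The distance rule in this entry can be sketched roughly as follows. This is a simplified illustration, not jusText's actual context classification: the function name `reclassify_short` and the label strings are assumptions, and the sketch applies only the distance criterion stated above, leaving every other paragraph untouched.

```python
def reclassify_short(labels, max_good_distance):
    """Relabel 'short' paragraphs that lie close enough to a 'good' one.

    labels: list of 'good', 'bad' or 'short', one per paragraph, in document order.
    Distance is measured in paragraphs (difference between list positions).
    Hypothetical helper, not part of the jusText API.
    """
    # Only paragraphs that were good to begin with count as anchors.
    good_positions = [i for i, label in enumerate(labels) if label == "good"]
    result = []
    for i, label in enumerate(labels):
        if label == "short" and any(
            abs(i - g) <= max_good_distance for g in good_positions
        ):
            label = "good"
        result.append(label)
    return result


example = ["good", "short", "bad", "short", "short"]
relabeled = reclassify_short(example, max_good_distance=1)
```

With `max_good_distance=1`, only the short paragraph directly next to the good one is promoted; the two at the end stay short because the nearest good paragraph is three or more positions away.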
| | 17 | Jun 30 2017 |
| | 18 | Minor package updates |
| | 19 | Jun 30 2017 |
| | 20 | Justext 1.3 |
| | 21 | Jun 29 2017 |
| | 22 | Preprocess split to get_html_root and preprocess_html_root |
| | 23 | Allows using the DOM root before the head (and other possibly useful |
| | 24 | elements) are removed. Needed to get the page title from the head. |
| | 25 | Apr 12 2017 |
| | 26 | new README |
| | 27 | Apr 12 2017 |
| | 28 | filter out HTML(5) elements |
| | 29 | Feb 24 2017 |
| | 30 | remove words containing Latin characters from Korean stoplist |
| | 31 | Jan 12 2015 |
| | 32 | Move * out of trunk/ |
| | 33 | Nov 11 2012 |
| | 34 | Temporary workaround for issue #2: Remove any text nodes that cannot be decoded. |
| | 35 | Jan 26 2012 |
| | 36 | Added stoplists for Kazakh, Kyrgyz, Turkmen and Uzbek. |
| | 37 | Dec 6 2011 |
| | 38 | Fixed inserting spaces between text nodes. Before, content such as "abc<b>efg</b>" became "abc efg" after processing. Now it correctly becomes "abcefg". |
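
A minimal sketch of the behaviour described in this entry, using only the standard library. It is an illustration rather than jusText's own code: the class `TextJoiner`, the function `extract_text` and the `BLOCK_TAGS` set are assumptions, as is the choice to insert a space only at block-level boundaries.

```python
from html.parser import HTMLParser

# Illustrative subset of tags treated as paragraph boundaries (an assumption).
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "li", "br"}


class TextJoiner(HTMLParser):
    """Collects text nodes, adding whitespace only around block-level tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        # Inline tags such as <b> contribute no separator, so adjacent
        # text nodes are concatenated directly.
        self.parts.append(data)


def extract_text(html):
    parser = TextJoiner()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    # Collapse runs of whitespace and trim the ends.
    return " ".join("".join(parser.parts).split())
```

Here `extract_text("abc<b>efg</b>")` yields `"abcefg"`, while text in separate block elements is still kept apart by a single space.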
| | 39 | Aug 8 2011 |
| | 40 | jusText 1.2 |
| | 41 | Aug 8 2011 |
| | 42 | Edited wiki page Algorithm through web user interface. |
| | 43 | Aug 4 2011 |
| | 44 | Use character counts instead of word counts where possible (length-low, length-high, max-heading-distance and for computing link density). This makes the algorithm work well in the language-independent mode (without a stoplist) for languages where counting words is not easy (Japanese, Chinese, Thai, etc.). The default thresholds have been adjusted accordingly. |
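
The motivation for this change can be shown with a short sketch. The helper names `word_count`, `char_count` and `link_density` are hypothetical and the code is not taken from jusText; it only illustrates why whitespace-based word counts fail for scripts written without spaces, and how link density can be computed from characters instead.

```python
def word_count(text):
    """Number of whitespace-separated tokens."""
    return len(text.split())


def char_count(text):
    """Number of characters, ignoring surrounding whitespace."""
    return len(text.strip())


def link_density(paragraph_text, link_text):
    """Fraction of the paragraph's characters that belong to link text."""
    total = char_count(paragraph_text)
    if total == 0:
        return 0.0
    return char_count(link_text) / total


english = "This is an English sentence"
japanese = "これは日本語の文章です"  # a full sentence written without spaces

# Whitespace splitting sees five words in the English sentence
# but only a single "word" in the Japanese one.
english_words = word_count(english)
japanese_words = word_count(japanese)

# The character count still reflects the real length of the Japanese text.
japanese_chars = char_count(japanese)
```

A threshold expressed in characters therefore behaves comparably across both sentences, whereas a word-based threshold would treat the Japanese sentence as a one-word paragraph.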
| | 45 | Aug 4 2011 |
| | 46 | More robust parsing of meta tags that declare the charset used. |
| | 47 | Jun 6 2011 |
| | 48 | Bug fix: Corrected decoding of HTML entities with codes 128 to 159 (€ to Ÿ) |
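
The characters € and Ÿ sit at positions 128 (0x80) and 159 (0x9F) in Windows-1252, so this fix presumably concerns numeric character references in that range, which browsers interpret as Windows-1252 rather than as Unicode control characters. The sketch below shows one way to decode them; `decode_charref` is a hypothetical helper, not jusText's implementation. Python's own `html.unescape` applies the same mapping.

```python
import html


def decode_charref(code):
    """Decode a numeric character reference the way browsers do.

    Codes 0x80-0x9F are read as Windows-1252 bytes; everything else is
    taken as a Unicode code point. Hypothetical helper for illustration.
    """
    if 0x80 <= code <= 0x9F:
        try:
            return bytes([code]).decode("cp1252")
        except UnicodeDecodeError:
            # A few positions (e.g. 0x81) are undefined in Windows-1252;
            # fall back to the raw code point.
            return chr(code)
    return chr(code)


euro = decode_charref(128)         # "€"
y_diaeresis = decode_charref(159)  # "Ÿ"

# The standard library handles the same range when unescaping entities.
stdlib_euro = html.unescape("&#128;")
```

A naive `chr(128)` would instead produce an invisible C1 control character, which is the kind of mis-decoding this entry describes.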
| | 49 | Mar 28 2011 |
| | 50 | Edited wiki page Algorithm through web user interface. |
| | 51 | Mar 28 2011 |
| | 52 | Edited wiki page Algorithm through web user interface. |
| | 53 | Mar 23 2011 |
| | 54 | Edited wiki page Algorithm through web user interface. |
| | 55 | Mar 17 2011 |
| | 56 | Edited wiki page Algorithm through web user interface. |
| | 57 | Mar 9 2011 |
| | 58 | Edited wiki page Algorithm through web user interface. |
| | 59 | Mar 9 2011 |
| | 60 | Edited wiki page Algorithm through web user interface. |
| | 61 | Mar 9 2011 |
| | 62 | Edited wiki page Algorithm through web user interface. |
| | 63 | Mar 9 2011 |
| | 64 | Edited wiki page Algorithm through web user interface. |
| | 65 | Mar 9 2011 |
| | 66 | Created wiki page through web user interface. |
| | 67 | Mar 9 2011 |
| | 68 | jusText 1.1 |
| | 69 | Mar 9 2011 |
| | 70 | Initial import. |