Changes between Version 22 and Version 23 of Justext
Timestamp: 02/10/26 15:16:34 (5 days ago)
Justext
--- Justext (Version 22)
+++ Justext (Version 23)
@@ -1,3 +1,3 @@
-= jusText 4 =
+= jusText 5 =
 
 jusText is a tool for removing boilerplate content, such as navigation links, headers, and footers from HTML pages. It is designed to preserve mainly text containing full sentences and it is therefore well suited for creating linguistic resources such as Web corpora.
@@ -20,11 +20,8 @@
 
 == Installation ==
-1. Make sure you have Python 3.11 or newer. Required packages are installed via pip automatically (or system-wide, e.g. python3-lxml and python3-html5-parser in Fedora).
+1. Make sure you have Python 3.12 or newer. Required packages are installed via pip automatically (or system-wide, e.g. python3-lxml and python3-html5-parser in Fedora).
 2. Download, extract, install:
 {{{
-wget https://corpus.tools/raw-attachment/wiki/Downloads/justext-4.3.tar.gz
-tar xzvf justext-4.3.tar.gz
-cd justext-4.3/
-pip install --user . #omit --user to install for all users
+pip install --user https://corpus.tools/raw-attachment/wiki/Downloads/justext-5.0.tar.gz
 }}}
 
@@ -32,7 +29,7 @@
 Python 3.6 & Python 2.7 compatible
 {{{
-wget https://corpus.tools/raw-attachment/wiki/Downloads/justext-4.2.5.tar.gz
-tar xzvf justext-4.2.5.tar.gz
-cd justext-4.2.5/
+wget https://corpus.tools/raw-attachment/wiki/Downloads/justext-5.0.tar.gz
+tar xzvf justext-5.0.tar.gz
+cd justext-5.0/
 python3 setup.py install --user #omit --user to install for all users
 }}}
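
Step 1 of the Version 23 installation text requires Python 3.12 or newer. The check can be scripted before running pip; the following is a minimal sketch using only the standard library. The names `MINIMUM` and `meets_minimum` are illustrative assumptions and are not part of jusText.

```python
import sys

# Minimum interpreter version stated in Version 23 of the page.
# Assumption: only the (major, minor) pair matters for the requirement.
MINIMUM = (3, 12)


def meets_minimum(version=None, minimum=MINIMUM):
    """Return True if `version` is at least `minimum`.

    `version` defaults to the running interpreter's sys.version_info.
    Both arguments are compared on their (major, minor) components only.
    """
    if version is None:
        version = sys.version_info
    # Tuples compare lexicographically, so (3, 12) >= (3, 12) and (3, 11) < (3, 12).
    return tuple(version[:2]) >= tuple(minimum[:2])


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum():
        print(f"Python {current} satisfies the 3.12+ requirement.")
    else:
        print(f"Python {current} is older than 3.12; upgrade before installing.")
```

Run with the same interpreter that pip will use (for example `python3 check_version.py`, where the file name is hypothetical), since a system may have several Python versions installed.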

