| | 266 | === Changelog 1.3 → 2.0 === |
| | 267 | Major bugfixes |
| | 268 | * fixed: redirections to path + "/" were ignored |
| | 269 | * fixed: binary files were discarded (text extraction from pdf, doc, ... works now) |
| | 270 | Major updates |
| | 271 | * multilingual website support (see util/config.py) |
| | 272 | * keeping near-good or even bad paragraphs is now allowed |
| | 273 | Minor updates |
| | 274 | * machine translation filter (based on some known MT identifiers in HTML) |
| | 275 | * extract text from ODF format (.odt files) |
| | 276 | * get file type from the Content-Type HTTP header |
| | 277 | * add HTTP Last-Modified date to prevertical |
| | 278 | * jusText classification added to paragraph attributes |
| | 279 | |
| | 280 | === Changelog 1.1 → 1.3 === |
| | 281 | * decode IDNA hostnames in prevertical |
| | 282 | * adding URLs to download on the fly is now enabled |
| | 283 | * bugfixes |
| | 284 | |