Bleu+pdf+work
| Pitfall | Effect on BLEU | Solution |
|--------|----------------|------------|
| PDF text is extracted out of reading order | BLEU near 0 | Use an extractor that preserves reading order (e.g., Adobe Extract) |
| References include OCR typos | BLEU artificially low | Post-OCR correction or manual proofing |
| Different tokenization (MT vs. eval) | Inconsistent scores | Use sacreBLEU with a standardized tokenizer |
| Paragraph merging changes sentence boundaries | N-gram mismatch | Enforce consistent segmentation across all pipelines |
| Using BLEU for creative/literary translation | Misleading scores | Supplement with human evaluation or learned metrics (COMET, BERTScore) |
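
The tokenization and segmentation rows can be illustrated with a minimal, standard-library-only sketch. It computes only the clipped n-gram precision that BLEU is built on, not a full BLEU score, and it is not sacreBLEU's implementation. The names `punct_tokenize`, `whitespace_tokenize`, `check_segments`, and the sample sentence are hypothetical and exist only for this illustration.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(hyp_tokens, ref_tokens, n):
    """Modified n-gram precision: hypothesis counts clipped by reference counts."""
    hyp, ref = ngrams(hyp_tokens, n), ngrams(ref_tokens, n)
    total = sum(hyp.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return matched / total


def whitespace_tokenize(text):
    """Hypothetical tokenizer A: split on whitespace only."""
    return text.split()


def punct_tokenize(text):
    """Hypothetical tokenizer B: also split punctuation off words."""
    return re.findall(r"\w+|[^\w\s]", text)


def check_segments(hyps, refs):
    """Fail early if the two pipelines disagree on the number of segments."""
    if len(hyps) != len(refs):
        raise ValueError(
            f"segment count mismatch: {len(hyps)} hypotheses vs {len(refs)} references"
        )


# Identical text on both sides; only the tokenizer differs.
sentence = "The cat sat on the mat."

# Same tokenizer on both sides: a perfect match.
same = clipped_precision(punct_tokenize(sentence), punct_tokenize(sentence), 1)

# Mismatched tokenizers: "mat." no longer matches "mat" followed by ".".
mixed_1 = clipped_precision(whitespace_tokenize(sentence), punct_tokenize(sentence), 1)
mixed_2 = clipped_precision(whitespace_tokenize(sentence), punct_tokenize(sentence), 2)

print(same, mixed_1, mixed_2)  # 1.0, 5/6 (about 0.833), 0.8
```

Even with identical text, the mismatched tokenizers lose one unigram and one bigram match, which is why the table recommends a single standardized tokenizer on both sides. For real evaluations, a maintained tool such as sacreBLEU is the safer choice than a hand-rolled score.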
Elias opened the split screen. On the left, the PDF. On the right, the machine’s output.