For training and evaluating custom risk prediction, LSPs and enterprises provide their post-editing data: the original, the machine translation and the final human translation.
It's surprisingly difficult for users to extract these triplets from a TMS. They end up writing custom scripts to export many XLIFF files and combine them, or even reconstructing the data by re-invoking machine translation and joining the output with TMs.
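To illustrate the kind of script users end up writing, here is a minimal sketch that pulls triplets out of an XLIFF 1.2 file. It assumes the machine translation is stored in an `alt-trans` element with `origin="mt"`. That convention is an assumption for illustration only: it varies between tools and is not how translate5 is known to export. The function name `extract_triplets` and the sample data are made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace of XLIFF 1.2 documents.
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Made-up sample; real exports differ from tool to tool.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="demo.txt" source-language="en" target-language="de" datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>The cat sleeps.</source>
        <target>Die Katze schläft.</target>
        <alt-trans origin="mt">
          <target>Die Katze schlafen.</target>
        </alt-trans>
      </trans-unit>
      <trans-unit id="2">
        <source>Hello</source>
        <target>Hallo</target>
      </trans-unit>
    </body>
  </file>
</xliff>
"""


def _text(element):
    """Return the text of an element, flattening any inline tags."""
    return "".join(element.itertext()).strip()


def extract_triplets(xliff_text, mt_origin="mt"):
    """Return (source, machine translation, final translation) tuples.

    Assumption: the MT suggestion sits in <alt-trans origin="mt">.
    Units without such a suggestion are skipped.
    """
    root = ET.fromstring(xliff_text)
    triplets = []
    for unit in root.iterfind(".//x:trans-unit", NS):
        source = unit.find("x:source", NS)
        final = unit.find("x:target", NS)
        mt = None
        for alt in unit.findall("x:alt-trans", NS):
            if alt.get("origin") == mt_origin:
                mt = alt.find("x:target", NS)
                break
        if source is None or final is None or mt is None:
            continue
        triplets.append((_text(source), _text(mt), _text(final)))
    return triplets


triplets = extract_triplets(SAMPLE)
```

Even in this simple form, the script depends on how one particular tool labels its suggestions, which is why a direct export from the TMS would be more reliable.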
Is it possible to do this more directly with translate5? Are these types of data stored in an actual database? Are MT suggestions distinguished from other machine suggestions like TM fuzzy matches?
There are other great reasons to have this sort of log-based storage, for example dashboards for metrics like post-editing distance at an organisational level, or queries for specific machine translation output.
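As a rough sketch of what such a metric could look like once the triplets are available: the code below computes a character-level Levenshtein distance between the machine translation and the final translation, normalised by the longer of the two strings. This is only one possible definition of post-editing distance (word-level or TER-style variants are also in use), and the function names and the triplet layout are assumptions made for this example.

```python
def levenshtein(a, b):
    """Character-level edit distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def post_edit_distance(mt, final):
    """Edit distance normalised to [0, 1] by the longer string's length."""
    longest = max(len(mt), len(final))
    if longest == 0:
        return 0.0
    return levenshtein(mt, final) / longest


def mean_post_edit_distance(triplets):
    """Average over (source, mt, final) triplets; 0.0 for an empty list."""
    if not triplets:
        return 0.0
    total = sum(post_edit_distance(mt, final) for _, mt, final in triplets)
    return total / len(triplets)
```

For example, `post_edit_distance("abcd", "abcf")` gives 0.25, since one of four characters was changed. Aggregating `mean_post_edit_distance` per project, language pair or MT engine is the kind of figure an organisation-level dashboard could show.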
translate5 does know all this information on the DB level for all tasks.
Right now, though, it is not possible to export it in a structured way through the GUI or the API.
Yet it could be implemented with reasonable effort in a reasonable time frame. It would make much more sense to do it this way than to fiddle around with scripts and XLIFF outside of a TMS.