How to extract data for risk prediction?

To train and evaluate custom risk prediction models, LSPs and enterprises provide their post-editing data: the source text, the machine translation, and the final human translation.

It's surprisingly difficult for users to extract these triplets from a TMS. They end up writing custom scripts to export many XLIFF files and combine them, or even reconstructing the data by re-invoking machine translation and joining the output with TMs.
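For illustration, such a script often looks roughly like the minimal sketch below. It assumes two XLIFF 1.2 exports of the same file, one taken right after machine pre-translation and one after post-editing, with matching trans-unit ids. The function names `read_units` and `join_triplets` are hypothetical, and real TMS exports may use a different XLIFF version or carry inline tags that need extra handling.

```python
import xml.etree.ElementTree as ET

# Namespace of XLIFF 1.2; other XLIFF versions use a different namespace.
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}


def read_units(xliff_text):
    """Map each trans-unit id to its (source, target) text."""
    root = ET.fromstring(xliff_text)
    units = {}
    for tu in root.iterfind(".//x:trans-unit", NS):
        source = tu.find("x:source", NS)
        target = tu.find("x:target", NS)
        # itertext() flattens inline markup to plain text (a simplification).
        units[tu.get("id")] = (
            "".join(source.itertext()) if source is not None else "",
            "".join(target.itertext()) if target is not None else "",
        )
    return units


def join_triplets(mt_xliff, final_xliff):
    """Join two exports on trans-unit id into (source, MT, final) triplets.

    Assumption: mt_xliff holds the raw MT output as targets and
    final_xliff holds the post-edited targets for the same units.
    """
    mt_units = read_units(mt_xliff)
    final_units = read_units(final_xliff)
    triplets = []
    for unit_id, (source, mt) in mt_units.items():
        if unit_id in final_units:  # skip units missing from the final export
            triplets.append((source, mt, final_units[unit_id][1]))
    return triplets
```

Even this simple join depends on having kept an export from before post-editing, which is exactly what most users do not have.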

Is it possible to do this more directly with translate5? Are these types of data stored in an actual database? Are MT suggestions distinguished from other machine suggestions like TM fuzzy matches?

There are other good reasons to have this sort of log-based storage, for example to build dashboards for metrics like post-editing distance at an organisational level, or to query for specific machine translation output.
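To make the metric concrete: post-editing distance is usually some form of edit distance between the machine translation and the final translation. Here is a minimal sketch, assuming character-level Levenshtein distance normalised by the length of the longer string. Other definitions (word-level, TER-style) are also in use, and the function names are hypothetical.

```python
def levenshtein(a, b):
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def post_editing_distance(mt, final):
    """Edit distance from MT to final, scaled to the range 0..1.

    Assumption: normalise by the longer of the two strings.
    """
    longest = max(len(mt), len(final))
    return levenshtein(mt, final) / longest if longest else 0.0
```

With segment-level logs in a database, a dashboard could aggregate this value per project, language pair, or MT engine.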
