This refers to how tags are passed to the various services such as DeepL, Google, OpenAI, etc. Some services do not cope well with the tags commonly used in XLIFF, so these tags are converted into a different form before transmission. Unfortunately, there is no one-size-fits-all solution that works equally well for all services.
In addition, there are downstream repair routines for all types of tag handler. These routines attempt to correct formal errors in the response of a service, for example incorrect tag order, incorrect nesting, or missing tags.
Here is a small example of how the tag handlers work in Translate5:
Original text:
<strong>Hallo Welt</strong>
In XLIFF:
<bpt id="1" ctype="x-strong"><strong></bpt>Hallo Welt,<ept id="1"></strong></ept>
In Translate5, it looks like this:
<1>Hallo Welt,</1>
And this is what is sent to the services for each of the tag handler settings:
Simply removes all tags. This often yields good translation results, but no tags are left afterwards.
Sent text:
Hallo Welt,
Replaces all tag structures with <img ...> tags, which are well known from HTML. This is a good choice for services that handle HTML better than XML or other formats.
Sent text:
<img id="t5tag-start-1" src="example.jpg" />Hallo Welt,<img id="t5tag-end-1" src="example.jpg" />
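The img replacement can be sketched in a few lines of Python. This is only an illustration, not Translate5's implementation: it assumes the simplified tag notation from the example (<1> and </1>) as input, and the function names to_img_tags and from_img_tags are hypothetical.

```python
import re

# Assumption: tags use the simplified notation shown above (<1>, </1>),
# not the real internal Translate5 format.
PAIRED = re.compile(r"<(/?)(\d+)>")
IMG = re.compile(r'<img id="t5tag-(start|end)-(\d+)"[^>]*/>')


def to_img_tags(segment: str, src: str = "example.jpg") -> str:
    """Replace each opening/closing tag with an <img> placeholder."""
    def repl(m: re.Match) -> str:
        kind = "end" if m.group(1) else "start"
        return f'<img id="t5tag-{kind}-{m.group(2)}" src="{src}" />'
    return PAIRED.sub(repl, segment)


def from_img_tags(text: str) -> str:
    """Map the <img> placeholders in a service response back to tags."""
    def repl(m: re.Match) -> str:
        slash = "/" if m.group(1) == "end" else ""
        return f"<{slash}{m.group(2)}>"
    return IMG.sub(repl, text)


sent = to_img_tags("<1>Hallo Welt,</1>")
print(sent)
print(from_img_tags(sent))
```

The id attribute carries both the tag number and whether the placeholder opens or closes the pair, so the tags can be restored from the response as long as the service keeps the id intact.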
This is a more XML-like approach. Tag structures are replaced with simpler tags. Services that deliver better results for XML structures should use this setting.
Sent text:
<bx id="1" rid="1" />Hallo Welt,<ex id="2" rid="1" />
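A minimal sketch of this conversion, again starting from the simplified <1>...</1> notation. It assumes, based only on the example above, that id is a running counter over all placeholders and rid is the number of the tag pair; the function name to_bx_ex is hypothetical.

```python
import re
from itertools import count

# Assumption: tags use the simplified notation shown above (<1>, </1>).
PAIRED = re.compile(r"<(/?)(\d+)>")


def to_bx_ex(segment: str) -> str:
    """Replace opening tags with <bx/> and closing tags with <ex/>."""
    ids = count(1)  # running id, assumed from the example output

    def repl(m: re.Match) -> str:
        name = "ex" if m.group(1) else "bx"
        return f'<{name} id="{next(ids)}" rid="{m.group(2)}" />'

    return PAIRED.sub(repl, segment)


print(to_bx_ex("<1>Hallo Welt,</1>"))
```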
We found that some services cannot handle tag attributes correctly. Therefore we added this replacement, which uses tags that do not need any attributes.
Currently this is our preferred option for most services.
Sent text:
<t5x_1_1>Hallo Welt,</t5x_2_1>
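The attribute-free form moves the information into the tag name itself. The sketch below assumes, by analogy with the bx/ex example, that the first number is a running id and the second is the number of the tag pair; this reading and the function names to_t5x and from_t5x are assumptions, not confirmed Translate5 behaviour.

```python
import re
from itertools import count

# Assumption: tags use the simplified notation shown above (<1>, </1>).
PAIRED = re.compile(r"<(/?)(\d+)>")
T5X = re.compile(r"<(/?)t5x_(\d+)_(\d+)>")


def to_t5x(segment: str) -> str:
    """Encode running id and pair number in the tag name, no attributes."""
    ids = count(1)
    return PAIRED.sub(
        lambda m: f"<{m.group(1)}t5x_{next(ids)}_{m.group(2)}>", segment
    )


def from_t5x(text: str) -> str:
    """Restore the simplified tags from the pair number in the tag name."""
    return T5X.sub(lambda m: f"<{m.group(1)}{m.group(3)}>", text)


sent = to_t5x("<1>Hallo Welt,</1>")
print(sent)
print(from_t5x(sent))
```

Because the opening and closing names differ, the result is not well-formed XML in the strict sense; the pairing is recovered from the second number rather than from matching tag names.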
When dealing with tags, Translate5 stores them in an internal format.
When Translate5 communicates with a third-party system such as t5memory or OpenAI, it converts the internal tag format into a format that system understands. The third-party system may in turn return a result with malformed tags (wrong order etc.) or missing tags.
Translate5 offers several ways of converting tags for communication with third-party systems.
Each resource that supports tag processing has its own config. The config names start with runtimeOptions.LanguageResources, followed by the system name and then tagHandler. Currently the following configs exist:
runtimeOptions.LanguageResources.translate24.tagHandler
runtimeOptions.LanguageResources.deepl.tagHandler
runtimeOptions.LanguageResources.google.tagHandler
runtimeOptions.LanguageResources.groupshare.tagHandler
runtimeOptions.LanguageResources.microsoft.tagHandler
runtimeOptions.LanguageResources.openai.tagHandler
runtimeOptions.LanguageResources.pangeamt.tagHandler
runtimeOptions.LanguageResources.t5memory.tagHandler (system level, cannot be edited in the UI)
runtimeOptions.LanguageResources.textshuttle.tagHandler
runtimeOptions.LanguageResources.tildemt.tagHandler
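The naming scheme of these configs can be expressed as a small helper. This is a sketch only: the system names are copied from the list above, and the function tag_handler_config is a hypothetical name, not part of Translate5.

```python
# System names taken from the config list above.
SYSTEMS = (
    "translate24", "deepl", "google", "groupshare", "microsoft",
    "openai", "pangeamt", "t5memory", "textshuttle", "tildemt",
)


def tag_handler_config(system: str) -> str:
    """Build the config name: prefix, then system name, then tagHandler."""
    if system not in SYSTEMS:
        raise ValueError(f"no tagHandler config listed for {system!r}")
    return f"runtimeOptions.LanguageResources.{system}.tagHandler"


print(tag_handler_config("deepl"))
```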
Remover is a tag handler whose purpose is to remove tags from the text completely, so that plain text is sent to the corresponding third-party system.
Html image is a tag handler that converts internal tags to HTML img tags.
XLF repair is a tag handler that converts internal tags to XML tags (bx, ex, x etc.).
A tag handler that acts similarly to xlf_repair, but is specific to t5memory.
All tag handlers (except remover) repair tags if the corresponding system returns a result with malformed tags (wrong order, wrong nesting) or missing tags, by trying to place each tag at its expected position where that is possible.
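To make the idea of tag repair concrete, here is a deliberately simple sketch. It is not Translate5's repair algorithm: it assumes the simplified <1>...</1> notation, swaps a closing tag that appears before its opening tag, and puts missing opening tags at the start and missing closing tags at the end of the segment. The function name repair_tags is hypothetical.

```python
import re

# Assumption: tags use the simplified notation <1>, </1>.
TAG = re.compile(r"(</?\d+>)")


def repair_tags(source: str, translated: str) -> str:
    """Restore the source's tags in a translation with broken tags."""
    expected = TAG.findall(source)
    # Splitting on a capturing group keeps the tags as separate items.
    parts = TAG.split(translated)

    # Wrong order: swap a closing tag that precedes its opening tag.
    for tag in expected:
        if tag.startswith("</"):
            continue
        closing = "</" + tag[1:]
        if tag in parts and closing in parts:
            i, j = parts.index(tag), parts.index(closing)
            if j < i:
                parts[i], parts[j] = parts[j], parts[i]

    # Missing tags: openers go to the start, closers to the end,
    # each group in source order so the nesting stays valid.
    missing_open = [t for t in expected
                    if not t.startswith("</") and t not in parts]
    missing_close = [t for t in expected
                     if t.startswith("</") and t not in parts]
    return "".join(missing_open + parts + missing_close)


print(repair_tags("<1>Hallo Welt,</1>", "</1>Hello world,<1>"))
print(repair_tags("<1>Hallo Welt,</1>", "<1>Hello world,"))
```

A real repair routine would try to place a missing tag next to the words it enclosed in the source; falling back to the segment boundaries, as done here, only guarantees that no tag is lost.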