Tag-Handler
The Tag-Handler setting controls how tags are passed to the various services such as DeepL, Google, OpenAI, etc. Some services do not deal well with the tags commonly used in XLIFF, so these tags are converted into a different form before transmission. Unfortunately, there is no one-size-fits-all solution that works equally well for all services.
In addition, downstream repair routines exist for all types of Tag-Handler. These routines attempt to correct formal errors in the text returned by a service, such as incorrect tag order, incorrect nesting, or missing tags.
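The detection step that precedes such a repair can be sketched as follows. This is a minimal illustration, not the Translate5 implementation: the function name check_tags is hypothetical, and the sketch assumes the simplified internal tag notation (<1>, </1>, <1/>) used in the example on this page.

```python
import re
from collections import Counter

# Assumption: simplified internal tags such as <1>, </1> and <1/>.
TAG_RE = re.compile(r"<(/?)(\d+)(/?)>")


def check_tags(source: str, target: str) -> list:
    """Return the formal tag problems found in `target` relative to `source`."""
    problems = []

    # Missing tags: every tag of the source must reappear in the target.
    src_tags = Counter(m.group(0) for m in TAG_RE.finditer(source))
    tgt_tags = Counter(m.group(0) for m in TAG_RE.finditer(target))
    for tag in src_tags - tgt_tags:
        problems.append(f"missing tag {tag}")

    # Order and nesting: a closing tag must match the most recently opened tag.
    stack = []
    for m in TAG_RE.finditer(target):
        closing, number, self_closing = m.groups()
        if self_closing:
            continue  # single tags do not affect nesting
        if not closing:
            stack.append(number)
        elif stack and stack[-1] == number:
            stack.pop()
        else:
            problems.append(f"unexpected closing tag </{number}>")
    for number in stack:
        problems.append(f"unclosed tag <{number}>")
    return problems


print(check_tags("<1>Hallo Welt,</1>", "<1>Hello world,</1>"))  # []
print(check_tags("<1>Hallo Welt,</1>", "Hello world,</1>"))
```

A real repair routine would go one step further and re-insert or reorder the offending tags. The sketch only reports them.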
Example
Here is a small example of how the Tag-Handler works in Translate5.
Original text:
<strong>Hallo Welt</strong>
In XLIFF:
<bpt id="1" ctype="x-strong"><strong></bpt>Hallo Welt,<ept id="1"></strong></ept>
In Translate5, this looks like:
<1>Hallo Welt,</1>
This is what is sent to the services for each Tag-Handler setting:
remover:
Removes all tags. This often yields good translation results, but the translated text no longer contains any tags.
sent text:
Hallo Welt,
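As a rough illustration of the remover behaviour, the following sketch strips the internal tags from a segment. The function name remove_tags and the regular expression are assumptions made for this example. They cover only the simplified tag notation shown above.

```python
import re

# Assumption: simplified internal tags such as <1>, </1> and <1/>.
INTERNAL_TAG = re.compile(r"</?\d+/?>")


def remove_tags(segment: str) -> str:
    """Drop every internal tag and keep only the translatable text."""
    return INTERNAL_TAG.sub("", segment)


print(remove_tags("<1>Hallo Welt,</1>"))  # Hallo Welt,
```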
html_image:
Replaces all tag structures with <img ...> tags, which are well known from HTML. This is a good choice for services that handle HTML better than XML or other formats.
sent text:
<img id="t5tag-start-1" src="example.jpg" />Hallo Welt,<img id="t5tag-end-1" src="example.jpg" />
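The replacement and its reversal after translation might look like the sketch below. The id scheme and the src value are taken from the sent text above. The function names to_img and from_img are hypothetical, and the sketch assumes paired internal tags only (<1> ... </1>).

```python
import re

# Assumption: paired internal tags only, written as <1> ... </1>.
PAIRED_TAG = re.compile(r"<(/?)(\d+)>")
IMG_TAG = re.compile(r'<img id="t5tag-(start|end)-(\d+)" src="example\.jpg" ?/>')


def to_img(segment: str) -> str:
    """Replace internal tags with <img> placeholders before sending."""
    def repl(m):
        kind = "end" if m.group(1) else "start"
        return f'<img id="t5tag-{kind}-{m.group(2)}" src="example.jpg" />'
    return PAIRED_TAG.sub(repl, segment)


def from_img(segment: str) -> str:
    """Turn <img> placeholders in the returned text back into internal tags."""
    def repl(m):
        slash = "/" if m.group(1) == "end" else ""
        return f"<{slash}{m.group(2)}>"
    return IMG_TAG.sub(repl, segment)


sent = to_img("<1>Hallo Welt,</1>")
print(sent)
print(from_img(sent))  # <1>Hallo Welt,</1>
```

Because the tag number is carried in the id attribute, the original tag can be restored even if the service moves the placeholder within the sentence.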
xlf_repair:
A more XML-like approach. Tag structures are replaced with simpler tags. Services that deliver better results for XML structures should use this setting.
sent text:
<bx id="1" rid="1" />Hallo Welt,<ex id="2" rid="1" />
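A possible way to produce this form is sketched below. The numbering rule is inferred from the single example above and is an assumption: id is taken to be a running counter over all tags in the segment, and rid the number of the internal tag pair. The function name to_bx_ex is hypothetical.

```python
import re
from itertools import count

# Assumption: paired internal tags only, written as <1> ... </1>.
PAIRED_TAG = re.compile(r"<(/?)(\d+)>")


def to_bx_ex(segment: str) -> str:
    """Replace internal tags with <bx/> (opening) and <ex/> (closing) placeholders."""
    ids = count(1)  # assumed: id is a running number per tag in the segment

    def repl(m):
        name = "ex" if m.group(1) else "bx"
        return f'<{name} id="{next(ids)}" rid="{m.group(2)}" />'

    return PAIRED_TAG.sub(repl, segment)


print(to_bx_ex("<1>Hallo Welt,</1>"))
# <bx id="1" rid="1" />Hallo Welt,<ex id="2" rid="1" />
```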
xliff_paired_tags:
Some services do not handle tag attributes correctly. This setting therefore replaces the tags with ones that need no attributes.
Currently this is our recommended setting for most services.
sent text:
<t5x_1_1>Hallo Welt,</t5x_2_1>
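The attribute-free form can be sketched in the same spirit. The tag name pattern t5x_<id>_<rid> is read off the sent text above. Treating the first number as a running counter and the second as the pair number is an assumption, and the function names to_paired and from_paired are hypothetical.

```python
import re
from itertools import count

# Assumption: paired internal tags only, written as <1> ... </1>.
PAIRED_TAG = re.compile(r"<(/?)(\d+)>")
T5X_TAG = re.compile(r"<(/?)t5x_(\d+)_(\d+)>")


def to_paired(segment: str) -> str:
    """Encode the numbering in the tag name so that no attributes are needed."""
    ids = count(1)  # assumed: first number is a running counter per tag

    def repl(m):
        return f"<{m.group(1)}t5x_{next(ids)}_{m.group(2)}>"

    return PAIRED_TAG.sub(repl, segment)


def from_paired(segment: str) -> str:
    """Restore internal tags from the pair number carried in the tag name."""
    return T5X_TAG.sub(lambda m: f"<{m.group(1)}{m.group(3)}>", segment)


sent = to_paired("<1>Hallo Welt,</1>")
print(sent)               # <t5x_1_1>Hallo Welt,</t5x_2_1>
print(from_paired(sent))  # <1>Hallo Welt,</1>
```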
Whitespace Handling
As with tags, you can control how whitespace is sent, using the 'sendWhitespaceAsTag' parameter. If it is set to 'Disabled', all whitespace is transmitted as it appears in the original text.
However, because some services cannot distinguish between different kinds of whitespace, special whitespace characters are often returned as plain spaces. To avoid this, they can be converted to tags before transmission by setting 'sendWhitespaceAsTag' to 'Enabled'. This is the recommended default setting.
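The idea behind the 'Enabled' setting can be illustrated with the following sketch. Everything in it except the parameter concept is an assumption: the placeholder format <ws cp="..."/>, the selection of protected characters, and the function names protect_whitespace and restore_whitespace are hypothetical and chosen only for this example.

```python
import re

# Assumption: which characters count as "special" whitespace in this sketch.
SPECIAL_WHITESPACE = {
    "\u00a0",  # non-breaking space
    "\u202f",  # narrow no-break space
    "\u2009",  # thin space
    "\u2003",  # em space
    "\t",      # tab
}

# Hypothetical placeholder carrying the code point, e.g. <ws cp="00A0"/>
WS_TAG = re.compile(r'<ws cp="([0-9A-F]{4})"/>')


def protect_whitespace(text: str, enabled: bool = True) -> str:
    """Convert special whitespace to placeholder tags if `enabled` is set."""
    if not enabled:
        return text  # 'Disabled': whitespace is sent as it appears
    return "".join(
        f'<ws cp="{ord(ch):04X}"/>' if ch in SPECIAL_WHITESPACE else ch
        for ch in text
    )


def restore_whitespace(text: str) -> str:
    """Turn placeholder tags in the returned text back into characters."""
    return WS_TAG.sub(lambda m: chr(int(m.group(1), 16)), text)


sent = protect_whitespace("10\u00a0€")
print(sent)  # 10<ws cp="00A0"/>€
assert restore_whitespace(sent) == "10\u00a0€"
```

Normal spaces are left untouched, so the service still sees ordinary word boundaries. Only characters that would otherwise be flattened to a plain space are protected.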
What is whitespace?
Besides the normal space, there are many other whitespace characters. One of the best known is the non-breaking space, a space at which no line break may occur. It is often used in currency or price information, for example to keep an amount and its currency symbol on the same line.