Tag-Handler
Whitespace Handling
General
...
Config
Every language resource that supports tag processing has its own config. The config name starts with runtimeOptions.LanguageResources, followed by the system name and then tagHandler. Currently the following configs exist:
...
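The naming scheme described above can be sketched as a small helper. This is only an illustration: the dot as separator between the parts and the system name `exampleservice` are assumptions, not actual Translate5 config names.

```python
CONFIG_PREFIX = "runtimeOptions.LanguageResources"


def tag_handler_config_key(system_name: str) -> str:
    """Build the config name: prefix, then system name, then tagHandler.

    Joining the parts with dots is an assumption made for this sketch.
    """
    return f"{CONFIG_PREFIX}.{system_name}.tagHandler"


# "exampleservice" is a hypothetical system name used only for illustration.
print(tag_handler_config_key("exampleservice"))
```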
The tag handler setting controls how tags are passed to the various services such as DeepL, Google, OpenAI, etc. Some services do not deal well with the tags commonly used in XLIFF, so these tags are converted into a different form before transmission. Unfortunately, there is no one-size-fits-all solution that works equally well for all services.
In addition, there are downstream repair routines for all types of Tag-Handler. They attempt to correct formal errors in what a service returns, for example incorrect tag order, incorrect nesting or missing tags.
Example
Here is a small example of how the Tag-Handler works in Translate5.
Original text:
| Code Block |
|---|
<strong>Hallo Welt,</strong> |
In XLIFF:
| Code Block |
|---|
<bpt id="1" ctype="x-strong"><strong></bpt>Hallo Welt,<ept id="1"></strong></ept> |
In Translate5 it looks like this:
| Code Block |
|---|
<1>Hallo Welt,</1> |
This is what is sent to the services for each of the Tag-Handler settings:
remover:
Simply removes all tags. This often yields good translation results, but the translated text no longer contains any tags.
sent text:
| Code Block |
|---|
Hallo Welt, |
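A minimal sketch of what tag removal amounts to, assuming internal tags appear in the short display form used in this example (`<1>`, `</1>`) and that standalone tags look like `<2/>`. The latter is an assumption, and the function name is hypothetical, not Translate5 code.

```python
import re

# Assumed display form of internal tags: <1> opens, </1> closes, <2/> stands alone.
INTERNAL_TAG = re.compile(r"</?\d+/?>")


def remove_tags(segment: str) -> str:
    """Return the segment with every internal tag stripped."""
    return INTERNAL_TAG.sub("", segment)


print(remove_tags("<1>Hallo Welt,</1>"))  # prints: Hallo Welt,
```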
html_image:
Replaces all tag structures with <img ...> tags, which are well known from HTML. This is a good choice for services that handle HTML better than XML or other formats.
sent text:
| Code Block |
|---|
<img id="t5tag-start-1" src="example.jpg" />Hallo Welt,<img id="t5tag-end-1" src="example.jpg" /> |
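The conversion shown above, and the way back after translation, could be sketched roughly as follows. The sketch only covers paired tags in the short form `<1>...</1>`. The function names are hypothetical, and the `id` and `src` values are simply copied from the sample.

```python
import re

PLACEHOLDER_SRC = "example.jpg"  # value taken from the sample above


def to_html_image(segment: str) -> str:
    """Replace <n> and </n> with <img> placeholders before sending."""
    def replace(match: re.Match) -> str:
        kind = "end" if match.group(1) else "start"
        return f'<img id="t5tag-{kind}-{match.group(2)}" src="{PLACEHOLDER_SRC}" />'
    return re.sub(r"<(/?)(\d+)>", replace, segment)


def from_html_image(text: str) -> str:
    """Turn the <img> placeholders in a service response back into <n> and </n>."""
    def restore(match: re.Match) -> str:
        slash = "/" if match.group(1) == "end" else ""
        return f"<{slash}{match.group(2)}>"
    return re.sub(r'<img id="t5tag-(start|end)-(\d+)"[^>]*>', restore, text)


sent = to_html_image("<1>Hallo Welt,</1>")
print(sent)
print(from_html_image(sent))
```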
xlf_repair (deprecated):
This is a more XML-like approach: tag structures are replaced with simpler tags. Services that deliver better results for XML structures should use this setting.
sent text:
| Code Block |
|---|
<bx id="1" rid="1" />Hallo Welt,<ex id="2" rid="1" /> |
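A rough sketch of this replacement, inferred only from the single sample above: `id` is assumed to be a running counter over all tags in the segment and `rid` the number of the internal tag pair. Both readings are assumptions, and the function name is hypothetical.

```python
import re
from itertools import count


def to_bx_ex(segment: str) -> str:
    """Replace <n> with <bx .../> and </n> with <ex .../>.

    Assumption: id counts tags in order of appearance, rid links the pair.
    """
    ids = count(1)

    def replace(match: re.Match) -> str:
        name = "ex" if match.group(1) else "bx"
        return f'<{name} id="{next(ids)}" rid="{match.group(2)}" />'

    return re.sub(r"<(/?)(\d+)>", replace, segment)


print(to_bx_ex("<1>Hallo Welt,</1>"))
```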
xliff_paired_tags:
Some services turned out not to handle tag attributes correctly. This setting therefore replaces the tags with ones that do not need any attributes.
It is currently our preferred choice for most services.
sent text:
| Code Block |
|---|
<t5x_1_1>Hallo Welt,</t5x_2_1> |
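The attribute-free form can be sketched like this. The tag name pattern `t5x_<id>_<rid>` is read off the sample above, with `id` assumed to be a running counter and `rid` the pair number. The function names are hypothetical.

```python
import re
from itertools import count


def to_paired_tags(segment: str) -> str:
    """Encode id and rid in the tag name instead of in attributes."""
    ids = count(1)

    def replace(match: re.Match) -> str:
        return f"<{match.group(1)}t5x_{next(ids)}_{match.group(2)}>"

    return re.sub(r"<(/?)(\d+)>", replace, segment)


def from_paired_tags(text: str) -> str:
    """Map <t5x_i_r> and </t5x_i_r> back to <r> and </r>."""
    return re.sub(r"<(/?)t5x_\d+_(\d+)>", r"<\1\2>", text)


sent = to_paired_tags("<1>Hallo Welt,</1>")
print(sent)
print(from_paired_tags(sent))
```

Note that the opening and closing tag names differ in this form, so the restore step relies on the pair number at the end of the name rather than on matching names.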
Whitespace Handling
As with tags, you can control how whitespace is sent, using the 'sendWhitespaceAsTag' parameter. If it is set to 'Disabled', all whitespace is transmitted as it appears in the original text.
However, because some services cannot distinguish between different whitespace characters, special whitespace is often returned as plain spaces. To avoid this, set 'sendWhitespaceAsTag' to 'Enabled', which converts these whitespace characters to tags before transmission. This is the recommended default setting.
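To illustrate the idea behind 'Enabled', here is a minimal sketch that protects special whitespace before sending and restores it afterwards. The placeholder format `<ws cp="..."/>` and the function names are hypothetical; the tags Translate5 actually sends may look different.

```python
import re

# Any whitespace character except the plain space (tabs, non-breaking spaces, ...).
SPECIAL_WHITESPACE = re.compile(r"[^\S ]")


def protect_whitespace(text: str) -> str:
    """Replace each special whitespace character with a placeholder tag."""
    return SPECIAL_WHITESPACE.sub(
        lambda m: f'<ws cp="{ord(m.group()):04X}"/>', text
    )


def restore_whitespace(text: str) -> str:
    """Turn placeholder tags back into the original characters."""
    return re.sub(
        r'<ws cp="([0-9A-F]{4,6})"/>',
        lambda m: chr(int(m.group(1), 16)),
        text,
    )


original = "10\u00a0EUR"  # amount and currency joined by a non-breaking space
protected = protect_whitespace(original)
print(protected)
print(restore_whitespace(protected) == original)
```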
What are whitespaces?
Besides the normal space, there are many other special whitespace characters. One of the best known is the non-breaking space, a space at which no line break may occur. It is often used, for example, in currency or price information.
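For orientation, the following snippet lists a few space characters together with their Unicode names. The selection is only illustrative and not a list of what Translate5 treats specially.

```python
import unicodedata


def describe(chars: str) -> list[str]:
    """Return 'U+XXXX NAME' for each character."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in chars]


# Plain space, non-breaking space, thin space, narrow no-break space, ideographic space.
for line in describe("\u0020\u00a0\u2009\u202f\u3000"):
    print(line)
```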
Possible values
remover
Remover is a tag handler whose purpose is to remove tags from the text completely, so that plain text is sent to the corresponding third-party system.
html_image
HTML image is a tag handler that converts internal tags to HTML img tags.
xlf_repair
XLF repair is a tag handler that converts internal tags to XML tags (bx, ex, x etc.).
t5memoryxliff
Tag handler that acts similarly to xlf_repair, but is specific to t5memory.
All tag handlers (except remover) perform tag repair if the corresponding system returns a result with malformed tags (wrong order, wrong nesting) or missing tags, by trying to place each affected tag at its expected position where possible.
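The repair of missing tags can be pictured with a deliberately simple sketch. It is not the Translate5 repair routine: it only handles missing tags in the short form `<1>`, `</1>`, and as a stand-in for the "expected position" it puts missing opening tags at the start and missing closing tags at the end of the segment. The function name is hypothetical.

```python
import re

INTERNAL_TAG = re.compile(r"</?\d+>")


def repair_missing_tags(source: str, translated: str) -> str:
    """Re-add tags that are in the source but missing from the translation.

    Simplification: missing openers go to the start, missing closers to the
    end, each group in source order so nesting stays intact.
    """
    present = set(INTERNAL_TAG.findall(translated))
    missing = [tag for tag in INTERNAL_TAG.findall(source) if tag not in present]
    openers = "".join(tag for tag in missing if not tag.startswith("</"))
    closers = "".join(tag for tag in missing if tag.startswith("</"))
    return openers + translated + closers


# The service dropped the closing tag:
print(repair_missing_tags("<1>Hallo Welt,</1>", "<1>Hello world,"))
```

A real repair routine would also have to deal with wrong order and wrong nesting, which this sketch leaves out.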