Versioning

Current translate5 version: 7.21.0
Changelogs documented up to version: 7.20.4

Version: CURRENT (v. 1), published Apr 02, 2025 16:27

This functionality can be used to protect various character/number sequences such as dates, combinations of numbers and units of measurement, times, or article numbers following a specific character pattern.

What is to be protected is defined using a regular expression. Occurrences of a protectable sequence that the regular expression recognizes in the source and target segments are protected as a tag. It is also possible to configure rules that automatically transfer certain protected source-language sequences to the target language according to a predefined pattern. translate5 offers an extensive selection of such regular expressions, which can be used out of the box or customized to your needs, and you can also add rules of your own. The format is recorded separately for each source language and each target language, not for language combinations.
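The idea of protecting matches as tags can be illustrated with a minimal Python sketch. It is not translate5's implementation: the date regex, the `protect` helper, and the `<protected id="…"/>` tag syntax are all assumptions made up for this illustration.

```python
import re

# Hypothetical rule: ISO-style dates such as 2025-04-02 (illustrative regex only).
DATE_RULE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def protect(segment: str, rule: re.Pattern) -> tuple[str, list[str]]:
    """Replace every match of `rule` with a numbered placeholder tag.

    Returns the tagged segment and the list of original sequences,
    so that the protected content can be restored later.
    """
    protected: list[str] = []

    def to_tag(match: re.Match) -> str:
        protected.append(match.group(0))
        # The tag syntax is invented for this sketch.
        return f'<protected id="{len(protected)}"/>'

    return rule.sub(to_tag, segment), protected


tagged, originals = protect("Delivery on 2025-04-02 or 2025-04-09.", DATE_RULE)
print(tagged)     # Delivery on <protected id="1"/> or <protected id="2"/>.
print(originals)  # ['2025-04-02', '2025-04-09']
```

Because the original sequences are stored alongside the tags, a translator cannot accidentally alter them while editing the surrounding text.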

The rules are created in the preferences.


Overview

The settings for the content protection functionality can be found in the preferences under “Content protection”.

There are three tabs here in which different settings can be made:

  1. Content Recognition Rules:
    Here, you will find the rules that translate5 uses to recognize and protect certain character/number sequences by converting them into tags using a matching regular expression. Some rules are predefined and can be used out of the box. You can also add further rules or customize existing rules to suit your needs.
  2. Active Input Rules:
    Here, you can specify which rules apply to which source language by activating rules from the “Content Recognition Rules” tab for specific source languages.

    An input rule only applies if it is combined with an output rule. Exception: “keep content” rules.
  3. Active Output Rules:
    Here, the rules defined for a source language are linked with rules for a specific target language.


Content Recognition Rules: Overview of all rules

Add new rule

If you need a specific rule for content protection that is not available by default in translate5, you can add it yourself under “Preferences” > “Content protection” by clicking on the button. The “Create content recognition rule” window appears, in which you need to provide the following information.

  • Type: Select the corresponding type for the rule that you want to create:
    • date: Select this type if you are entering a rule for a date.
    • float: Select this type if you want to create a rule for a floating-point number.
    • integer: Select this type if you are entering a rule for whole numbers.
    • ip-address: This type is intended for IP addresses.
    • mac-address: This type is intended for MAC addresses.
    • keep-content: This type is intended for rules where the content is to be retained.
    • replace-content: This type is intended for rules where the content is to be replaced.
  • Name (mandatory field): Enter a meaningful name for the rule here.
  • Description: You can enter a brief description of the rule in this field.
  • Regex (mandatory field): Enter the regular expression that shall be used to find and protect the desired number/character sequence here.
  • Protected regex group: If only part of the match is to be protected, enclose the relevant section of the regex in parentheses (). Then enter the number of the group to be protected in this field.
  • Format: The format specifies the pattern for the output, e.g. DD.MM.YYYY for a date or #,###,###.## for a 7-digit number with two decimals and comma as thousands separator.
  • Format render example: An example preview of the number/character sequence that is protected by the rule is displayed here based on the regex entered.
  • Keep as is: If this checkbox is ticked, the entire number/character sequence is always transferred one-to-one from the source language to the target language and protected with tags in both places.

If “Keep as is” is ticked, the rule only needs to be activated for the source language, not for the target language, because the rule is then applied one-to-one from source to target in projects with the corresponding source language.
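How a protected regex group narrows the protection to part of a match can be sketched in Python as follows. The article-number pattern, the `protect_group` helper, and the `<protected/>` tag are hypothetical; the sketch only illustrates the idea of protecting the nth group instead of the whole match.

```python
import re

# Hypothetical rule: an article number is only recognized after "Art. no. ",
# but only the number itself (group 1) should be protected.
RULE = re.compile(r"Art\. no\. ([A-Z]{2}-\d{4})")
PROTECTED_GROUP = 1  # the first parenthesized group


def protect_group(segment: str, rule: re.Pattern, group: int) -> str:
    """Replace only the given group of each match with a placeholder tag."""

    def repl(match: re.Match) -> str:
        whole = match.group(0)
        start, end = match.span(group)
        offset = match.start()  # span() is relative to the whole segment
        return whole[: start - offset] + "<protected/>" + whole[end - offset :]

    return rule.sub(repl, segment)


print(protect_group("See Art. no. AB-1234 for details.", RULE, PROTECTED_GROUP))
# See Art. no. <protected/> for details.
```

With group 0 the same helper protects the entire match, which corresponds to leaving the field empty in this sketch.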

The following list shows placeholders that use predefined output formats:

  • #: single-digit number, without leading/trailing zeros.
  • m: month as a two-digit number, e.g. 02 for February.
  • y: year as a two-digit number, e.g. 25 for 2025.
  • d: day as a two-digit number, e.g. 01 for the first day of a month.
  • Y: year as a four-digit number.
  • D: abbreviated day as a word, e.g. Mon for Monday.
  • h: hour as a two-digit number.
  • i: minute as a two-digit number.
  • s: second as a two-digit number.
  • sec: Complete time zone information with year, month, day, time, etc.
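To make the date and time placeholders above more concrete, the following Python sketch renders a timestamp from a format string. It assumes that each placeholder is a single character and that “h” means a 24-hour clock; it does not cover “#” or “sec”, and it is not how translate5 itself renders formats.

```python
from datetime import datetime

# Assumed mapping from the placeholders listed above to strftime codes.
PLACEHOLDERS = {
    "d": "%d",  # day, two digits
    "m": "%m",  # month, two digits
    "y": "%y",  # year, two digits
    "Y": "%Y",  # year, four digits
    "D": "%a",  # abbreviated weekday name (depends on the locale)
    "h": "%H",  # hour, two digits (assumption: 24-hour clock)
    "i": "%M",  # minute, two digits
    "s": "%S",  # second, two digits
}


def render(fmt: str, value: datetime) -> str:
    """Replace every known placeholder character; copy all other characters."""
    return "".join(
        value.strftime(PLACEHOLDERS[ch]) if ch in PLACEHOLDERS else ch
        for ch in fmt
    )


moment = datetime(2025, 2, 1, 9, 5, 7)
print(render("d.m.Y", moment))        # 01.02.2025
print(render("m/d/y h:i:s", moment))  # 02/01/25 09:05:07
```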


Active Input Rules: Activating rules for source language(s)

After clicking on the button, the “Create mapping” window appears. The following information is entered here:

  • Type: This drop-down contains the types that are used in the available active rules. It helps to narrow down the values shown in the next drop-down “Name”.
  • Name: Choose the appropriate rule created in the “Content Recognition Rules” tab from this drop-down.
  • Language: Select the source language for which the rule should be applied.
  • Priority: Enter the priority with which the rule is to be applied. For each source language, each rule must have a different priority. In other words, no two rules for the same source language can have the same priority.

By clicking the button, the rule is applied to the source language along with the information entered here.

It may help to leave gaps between the priority values (for example 10, 20, 30). A new rule can then be inserted between two existing rules without having to change the priorities of the other rules.

The higher the number, the higher the priority.
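Why the order matters can be seen in a small Python sketch. It assumes that rules are applied one after another in descending priority; the `InputRule` class, the regexes, and the tag syntax are hypothetical and only serve the illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class InputRule:
    name: str
    regex: str
    priority: int


def order_rules(rules: list[InputRule]) -> list[InputRule]:
    """Sort by priority (highest first) and reject duplicate priorities."""
    priorities = [rule.priority for rule in rules]
    if len(priorities) != len(set(priorities)):
        raise ValueError("two rules for the same source language share a priority")
    return sorted(rules, key=lambda rule: rule.priority, reverse=True)


def protect(segment: str, rules: list[InputRule]) -> str:
    """Apply the rules one after another, highest priority first."""
    for rule in order_rules(rules):
        segment = re.sub(rule.regex, f"<{rule.name}/>", segment)
    return segment


# Priorities are spaced out so that a rule can be inserted in between later.
rules = [
    InputRule("integer", r"\b\d+\b", priority=10),
    InputRule("float", r"\b\d+\.\d+\b", priority=20),
]
print(protect("Weight: 12.5 kg, 3 pieces", rules))
# Weight: <float/> kg, <integer/> pieces
```

If the integer rule had the higher priority in this sketch, “12.5” would be split into two separately protected integers before the float rule could match it.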


Active Output Rules: Activating rules for target language(s)

After clicking on the button, the “Create mapping” window appears. The following information is entered here:

  • Type: This drop-down contains the types that are used in the available active rules. It helps to narrow down the values shown in the next drop-down “Name”.
  • Input Rule name: Choose the input rule that you have activated for a source language and want to map to an output rule for a target language.
  • Language: Select the target language for which the input rule selected above is to be converted into the output rule.
  • Output Rule name: Choose the output rule that you want to apply to the target language in connection with the input rule selected above.

By clicking the button, the rule is applied to the target language along with the information entered here.

If a rule is missing in the “Output Rule name” drop-down, the most likely reason is that the rule is set to “Keep as is” in the “Content Recognition Rules” tab or that it does not exist yet.
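The link between an input rule, a target language, and an output rule can be pictured as a lookup table. The following Python sketch is only an illustration under assumptions: the rule name `date-us`, the language codes, and the use of strftime codes instead of translate5 format placeholders are all made up for this example.

```python
import re
from datetime import datetime

# Hypothetical input rule activated for an English source language.
INPUT_RULES = {
    "date-us": {"regex": r"\b\d{2}/\d{2}/\d{4}\b", "parse": "%m/%d/%Y"},
}

# Hypothetical output mapping: (input rule name, target language) -> output format.
OUTPUT_RULES = {
    ("date-us", "de"): "%d.%m.%Y",
    ("date-us", "sv"): "%Y-%m-%d",
}


def convert(segment: str, rule_name: str, target_lang: str) -> str:
    """Rewrite every match of the input rule in the mapped output format."""
    rule = INPUT_RULES[rule_name]
    out_format = OUTPUT_RULES.get((rule_name, target_lang))
    if out_format is None:
        # No output rule mapped for this target language: nothing is converted.
        return segment

    def repl(match: re.Match) -> str:
        parsed = datetime.strptime(match.group(0), rule["parse"])
        return parsed.strftime(out_format)

    return re.sub(rule["regex"], repl, segment)


print(convert("Due on 04/02/2025.", "date-us", "de"))  # Due on 02.04.2025.
print(convert("Due on 04/02/2025.", "date-us", "sv"))  # Due on 2025-04-02.
```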


Some examples of Content protection applications

If content protection rules have been created and activated for certain source and target languages, the corresponding number/character sequences in the projects created using these rules are automatically protected by tags and – if defined accordingly – converted into a different format for the target language.

For example:

Jira Issue ID is protected:

translate5 as brand name is protected:

ISBN-10 number is protected:
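Regular expressions for the three cases above could look roughly like the ones in this Python sketch. The patterns are assumptions written for this illustration, not the rules shipped with translate5, and the ISBN-10 pattern does not validate the check digit.

```python
import re

# Illustrative patterns only; the predefined translate5 rules may differ.
RULES = {
    "jira-issue-id": r"\b[A-Z][A-Z0-9]+-\d+\b",
    "brand-name": r"\btranslate5\b",
    "isbn-10": r"\b(?:\d[- ]?){9}[\dX]\b",
}

text = "See TRANSLATE-1234: translate5 manual, ISBN 3-16-148410-X."

for name, pattern in RULES.items():
    print(name, re.findall(pattern, text))
# jira-issue-id ['TRANSLATE-1234']
# brand-name ['translate5']
# isbn-10 ['3-16-148410-X']
```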


Converting Translation memories

translate5 automatically recognizes when a translation memory no longer matches the Content protection settings and signals this in the language resources overview with a symbol.

The affected translation memories can be converted by clicking on this button so that they once again correspond to the Content protection settings.

The translation memory is not accessible during the conversion. Converting a translation memory can take a long time.

The conversion of the translation memory takes effect immediately. This means that the conversion can also influence the matches in ongoing tasks to which the TM is linked.

Recommendations for TM conversions with ongoing projects

The following procedure is recommended for very large and/or important projects that were started before the conversion but are not yet completed at the time of the conversion:

  1. Once the TM conversion is complete, navigate to the language resources overview.
  2. Make sure that the “Task TM” column is visible.
  3. Select the task TM that belongs to the current project by clicking on the button.
  4. The “Associated tasks: Task TM id 'TM-ID'” window appears, showing the task(s) for which the TM is used.
  5. Tick the box next to the relevant task.
  6. Click on the “Save all segments to TM” button.
  7. Close the window.
  8. Export the task TM in .tmx format by clicking on the button and then on “export as TMX file”.
  9. Now import the exported .tmx file into the converted main TM by clicking on the button.
  10. The “TMX file imported: main TM name” window appears.
  11. Upload the .tmx file.
  12. Select how framing tags should be handled during import.
  13. Click on “Save” to start the import.

During the import of the .tmx file into the main TM, all active rules of the Content Protection settings are applied to the imported segments. For the new matches, this procedure is therefore equivalent to converting the entire main TM again, but depending on the size of the TM (and thus the time required for the conversion), it is the faster way.

In the language resources view, you can use a filter to display only those translation memories that have not (yet) been converted: