...

curl -X POST "http://<host>:9001/termTagger/termTag/" \
  -H "Content-Type: application/json" \
  -H "x-tbxid: 2b412073d5185fa4b8d7831e0ee6472d" \
  -d '{
    "tbxFile": "2b412073d5185fa4b8d7831e0ee6472d",
    "sourceLang": "en",
    "targetLang": "en",
    "segments": [
      {
        "id": "387278",
        "field": "targetEdit",
        "source": "Outstanding features",
        "target": "Outstanding features"
      }
    ],
    "debug": 0,
    "fuzzy": 0,
    "stemmed": 1,
    "fuzzyPercent": 70,
    "maxWordLengthSearch": 2,
    "minFuzzyStartLength": 2,
    "minFuzzyStringLength": 5,
    "targetStringMatch": 0,
    "task": "{a4393eb5-46a7-4f5e-ba1a-70873c74a7a6}"
  }'

Parameter notes:

  • fuzzy: enables or disables the fuzzy search for terms. Not used in translate5 since the test phase of TermTagger in 2015/16, because of bad results.
  • stemmed: the Apache Lucene stemmer is used to find non-exact matches of terminology. Active in translate5 by default since 2015/16.
  • fuzzyPercent: if fuzzy is used, the fuzzy rate, i.e. how much of the found word and the word in the terminology must match for it to be the term in question.
  • maxWordLengthSearch: max. word count for fuzzy search.
  • minFuzzyStartLength: min. number of chars at the beginning of a compared word in the text that have to be identical to be matched in a fuzzy search.
  • minFuzzyStringLength: min. char count for words in the text compared in fuzzy search.
  • targetStringMatch: defines whether the stemmer should be active in the target or not (translate5 deactivates it for zh, ja, ko).
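
The same request can also be assembled programmatically. The following is a minimal sketch in Python, not part of the TermTagger itself: the function name build_term_tag_request and the host "localhost" are assumptions made for illustration, while the endpoint path, header, and field names are taken from the curl call above. The request is only built here, not sent.

```python
import json
import urllib.request


def build_term_tag_request(host, tbx_id, segments, task,
                           source_lang="en", target_lang="en"):
    """Build (but do not send) a POST request for the termTag endpoint.

    Hypothetical helper; the defaults mirror the values of the curl example.
    """
    payload = {
        "tbxFile": tbx_id,
        "sourceLang": source_lang,
        "targetLang": target_lang,
        "segments": segments,
        "debug": 0,
        "fuzzy": 0,            # fuzzy search disabled
        "stemmed": 1,          # stemmer enabled
        "fuzzyPercent": 70,
        "maxWordLengthSearch": 2,
        "minFuzzyStartLength": 2,
        "minFuzzyStringLength": 5,
        "targetStringMatch": 0,
        "task": task,
    }
    # json.dumps produces strict JSON, so no comments end up in the body.
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:9001/termTagger/termTag/",
        data=body,
        headers={"Content-Type": "application/json", "x-tbxid": tbx_id},
        method="POST",
    )


request = build_term_tag_request(
    host="localhost",  # assumption: replace with the real TermTagger host
    tbx_id="2b412073d5185fa4b8d7831e0ee6472d",
    segments=[{
        "id": "387278",
        "field": "targetEdit",
        "source": "Outstanding features",
        "target": "Outstanding features",
    }],
    task="{a4393eb5-46a7-4f5e-ba1a-70873c74a7a6}",
)
# Sending it would be: urllib.request.urlopen(request)
```

Building the body with json.dumps avoids hand-written JSON, which must not contain inline comments.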

...

<div class="term preferredTerm transFound exact" data-tbxid="term_01_1_en_1_00003">VisualTranslation</div>

...

Internal Tags

The TermTagger only works properly with HTML syntax, so internal tags in the segment content have to be replaced with img tags:

<img class="content-tag" src="1" alt="TaggingError"  />

Whether the class "content-tag" and the alt text "TaggingError" are actually required is currently unclear. It is, however, the form in which translate5 has been sending the img tag placeholders to the TermTagger for years.

The number in the src attribute is an ID used to identify the real tag on the client side; it has no semantic meaning on the TermTagger side.

The TermTagger returns the same img tags in its response.
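
The replacement and the later restoration of internal tags can be sketched as follows. This is an illustrative Python sketch only, not the translate5 implementation: the function names protect_tags and restore_tags are hypothetical, and the internal tag syntax (any non-img markup tag) is an assumption, since the real internal tag format is not described here.

```python
import re

# Assumption for illustration: every markup tag except <img> is an internal tag.
INTERNAL_TAG = re.compile(r"<(?!img\b)[^>]+>")
# Matches the placeholder form shown above and captures the numeric ID in src.
IMG_PLACEHOLDER = re.compile(
    r'<img class="content-tag" src="(\d+)" alt="TaggingError"\s*/>'
)


def protect_tags(segment):
    """Replace internal tags with img placeholders.

    Returns the protected segment and a map from placeholder ID to real tag.
    """
    tags = {}

    def to_img(match):
        tag_id = len(tags) + 1
        tags[tag_id] = match.group(0)
        return f'<img class="content-tag" src="{tag_id}" alt="TaggingError" />'

    return INTERNAL_TAG.sub(to_img, segment), tags


def restore_tags(tagged_segment, tags):
    """Put the real tags back in place of the returned img placeholders."""
    return IMG_PLACEHOLDER.sub(lambda m: tags[int(m.group(1))], tagged_segment)


protected, tag_map = protect_tags('Press <g id="7">OK</g> now')
restored = restore_tags(protected, tag_map)
```

Because the ID in src carries no meaning for the TermTagger, the client is free to number the placeholders however it likes, as long as it keeps the map until the response arrives.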

Terminology Tags

The TermTagger uses div tags to mark the text containing terminology.

<div class="term preferredTerm transFound exact" data-tbxid="term_01_1_en_1_00003">VisualTranslation</div>

Each div carries several CSS classes that flag properties of the term. In addition, the ID of the term from the TBX file is added in the data-tbxid attribute.

CSS Classes: 

  • term: always set
  • preferredTerm: the normativeAuthorization value of the term
  • transFound|transNotFound: marks whether the term was found in the target or not. This flag is buggy on the TermTagger side and is corrected by translate5 itself. Effort to fix this in the TermTagger itself is welcome. Enhancements in the TermTagger itself were started at the beginning of this year, whereas the corrections of TermTagger problems in translate5 were implemented before that.
  • exact: flag indicating whether the term was found exactly or by stemming / fuzzy match.
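
On the client side, the classes and the term ID can be read back from the returned markup. The following Python sketch uses only the standard library; the class TermDivParser and the function parse_terms are hypothetical names, and the sketch assumes that term divs are not nested inside each other.

```python
from html.parser import HTMLParser


class TermDivParser(HTMLParser):
    """Collect term divs (class list, data-tbxid, text) from tagged markup.

    Assumption: term divs are not nested.
    """

    def __init__(self):
        super().__init__()
        self.terms = []
        self._open = None  # the term div currently being read, if any

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if tag == "div" and "term" in classes:
            self._open = {
                "tbxid": attr.get("data-tbxid"),
                "classes": classes,
                "text": "",
            }

    def handle_data(self, data):
        if self._open is not None:
            self._open["text"] += data

    def handle_endtag(self, tag):
        if tag == "div" and self._open is not None:
            self.terms.append(self._open)
            self._open = None


def parse_terms(markup):
    """Return a list of dicts, one per term div found in the markup."""
    parser = TermDivParser()
    parser.feed(markup)
    parser.close()
    return parser.terms


terms = parse_terms(
    'See <div class="term preferredTerm transFound exact" '
    'data-tbxid="term_01_1_en_1_00003">VisualTranslation</div> here'
)
for term in terms:
    print(term["tbxid"], term["classes"], term["text"])
```

A caller could then check, for example, whether "transNotFound" is among the classes of a term before applying its own correction.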


5. Remove TBX File

Description
Removes a TBX file from the TermTagger memory.

...