This documentation is incomplete, especially regarding the possible HTTP error codes and error messages returned by the TermTagger.
Base URL:
http://<host>:9001/termTagger
1. Version & Health Check
Description
Checks if the service is running and returns version information.
Request
curl -X GET "http://<host>:9001/termTagger" -H "Accept: text/html"
Response (200 OK, shortened)
<html>
<h1>TermTagger Version Information</h1>
<h2>TermTagger REST Server</h2>
<b>Version:</b> 0.16
<h2>TermTagger:</h2>
<b>Version:</b> 9.01
<h2>OpenTMS Version: </h2>0.2.1
</html>
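A health check usually wants the version numbers rather than the raw HTML. The following is a minimal sketch that pulls them out of the page shown above. It assumes the full response keeps the same heading-then-version layout as the shortened example; the function name parse_versions is illustrative and not part of the TermTagger.

```python
import re


def parse_versions(html: str) -> dict:
    """Return {component heading: version string} found in the version page."""
    # Each component is announced by an <h2> heading. The version follows it,
    # optionally preceded by a "<b>Version:</b>" label.
    pattern = re.compile(
        r"<h2>\s*(?P<name>[^<]*?)\s*:?\s*</h2>\s*"
        r"(?:<b>Version:</b>\s*)?(?P<version>[0-9][0-9.]*)",
        re.IGNORECASE,
    )
    return {m.group("name"): m.group("version") for m in pattern.finditer(html)}


sample = """<html>
<h1>TermTagger Version Information</h1>
<h2>TermTagger REST Server</h2>
<b>Version:</b> 0.16
<h2>TermTagger:</h2>
<b>Version:</b> 9.01
<h2>OpenTMS Version: </h2>0.2.1
</html>"""

versions = parse_versions(sample)
print(versions)
```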
2. Upload TBX File
Description
Uploads a TBX file to the TermTagger.
The TBX file is referenced using its MD5 hash as its ID.
The x-tbxid header is optional. It is intended for an optional nginx proxy in front of the TermTagger instances, which can use it to route requests for the same TBX file to the same instance.
Request
curl -X POST "http://<host>:9001/termTagger/tbxFile/" -H "Content-Type: application/json" -H "x-tbxid: 2b412073d5185fa4b8d7831e0ee6472d" -d '{
"tbxFile": "2b412073d5185fa4b8d7831e0ee6472d",
"tbxdata": "<?xml version=\"1.0\"?><martif>...</martif>"
}'
Response (200 OK)
{
"uuid": "2b412073d5185fa4b8d7831e0ee6472d",
"added": true
}
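The upload body and the x-tbxid header both carry the same ID, so it is convenient to derive them in one place. The sketch below assumes the ID is the MD5 hex digest of the UTF-8 encoded TBX content; the text above only says the file is referenced by "its MD5 hash", so verify this against the client actually in use. The function names are illustrative.

```python
import hashlib
import json


def tbx_id(tbx_xml: str) -> str:
    # Assumption: the ID is the MD5 hex digest of the UTF-8 encoded TBX content.
    return hashlib.md5(tbx_xml.encode("utf-8")).hexdigest()


def build_upload_request(tbx_xml: str):
    """Return (headers, body) for POST /termTagger/tbxFile/."""
    file_id = tbx_id(tbx_xml)
    headers = {"Content-Type": "application/json", "x-tbxid": file_id}
    # json.dumps takes care of escaping quotes and newlines inside the XML.
    body = json.dumps({"tbxFile": file_id, "tbxdata": tbx_xml}).encode("utf-8")
    return headers, body


headers, body = build_upload_request('<?xml version="1.0"?><martif></martif>')
print(headers["x-tbxid"])
```

Letting json.dumps serialize the TBX data avoids the manual backslash escaping needed in the curl example.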
3. Check TBX File
Description
Checks whether a TBX file with the given ID exists in memory.
Request
curl -I "http://<host>:9001/termTagger/tbxFile/2b412073d5185fa4b8d7831e0ee6472d" -H "x-tbxid: 2b412073d5185fa4b8d7831e0ee6472d"
Response
- 200 OK – File exists
- 404 Not Found – File not found
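A client typically uses this check to decide whether the TBX file has to be uploaded again. The sketch below maps the two documented status codes to a boolean and treats everything else as an error, since other codes are not documented here. The helper names and the base_url parameter are assumptions made for illustration.

```python
import urllib.error
import urllib.request


def tbx_is_loaded(status_code: int) -> bool:
    """Interpret the status code of the TBX check."""
    if status_code == 200:
        return True
    if status_code == 404:
        return False
    # Other codes are undocumented, so fail loudly instead of guessing.
    raise RuntimeError(f"Unexpected status from TermTagger: {status_code}")


def check_tbx(base_url: str, file_id: str, timeout: float = 10.0) -> bool:
    """Send the HEAD request; base_url is e.g. http://<host>:9001/termTagger."""
    request = urllib.request.Request(
        f"{base_url}/tbxFile/{file_id}",
        headers={"x-tbxid": file_id},
        method="HEAD",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return tbx_is_loaded(response.status)
    except urllib.error.HTTPError as error:
        # urllib raises HTTPError for 4xx/5xx responses, so a 404 arrives here.
        return tbx_is_loaded(error.code)
```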
4. Tag Text with Terms
Description
Sends segments to the TermTagger to find and mark terms from the loaded TBX file.
Batch size: Multiple segments can be tagged in one call. The bigger the batch, the longer the response takes. A good batch size is somewhere between 5 and 25 segments, depending also on the segment length.
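Splitting a task into batches can be sketched as follows. The default of 10 is an arbitrary value inside the 5 to 25 range mentioned above, not a measured optimum, and the function name is illustrative.

```python
from typing import Iterator, List, Sequence


def batches(segments: Sequence[dict], size: int = 10) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` segments."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(segments), size):
        yield list(segments[start:start + size])


segments = [
    {"id": str(n), "field": "targetEdit", "source": "text", "target": "text"}
    for n in range(23)
]
sizes = [len(batch) for batch in batches(segments)]
print(sizes)
```

Each yielded batch would go into the "segments" field of one termTag request.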
Request
curl -X POST "http://<host>:9001/termTagger/termTag/" -H "Content-Type: application/json" -H "x-tbxid: 2b412073d5185fa4b8d7831e0ee6472d" -d '{
"tbxFile": "2b412073d5185fa4b8d7831e0ee6472d",
"sourceLang": "en",
"targetLang": "en",
"segments": [
{
"id": "387278",
"field": "targetEdit",
"source": "Outstanding features",
"target": "Outstanding features"
}
],
"debug": 0,
"fuzzy": 0,
"stemmed": 1,
"fuzzyPercent": 70,
"maxWordLengthSearch": 2,
"minFuzzyStartLength": 2,
"minFuzzyStringLength": 5,
"targetStringMatch": 0,
"task": "{a4393eb5-46a7-4f5e-ba1a-70873c74a7a6}"
}'
Request parameters
- fuzzy: enables or disables the fuzzy search for terms. Not used in translate5 since the TermTagger test phase in 2015/16 because of bad results.
- stemmed: the Apache Lucene stemmer is used to find non-exact matches of terminology. Active in translate5 by default since 2015/16.
- fuzzyPercent: only relevant if fuzzy is used. The fuzzy rate, that is, how closely the word found in the text and the word in the terminology must match for the word to count as the term in question.
- maxWordLengthSearch: maximum word count for the fuzzy search.
- minFuzzyStartLength: minimum number of characters at the beginning of a compared word in the text that have to be identical for a fuzzy match.
- minFuzzyStringLength: minimum character count for words in the text that are compared in the fuzzy search.
- targetStringMatch: defines whether the stemmer is active in the target or not (translate5 deactivates it for zh, ja, ko).
Response (200 OK, example)
{
"bCorrectRequest": true,
"segments": [
{
"field": "targetEdit",
"id": "387278",
"source": "Outstanding features",
"target": "Outstanding features"
}
],
"tbxFile": "2b412073d5185fa4b8d7831e0ee6472d"
}
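The request and response of this endpoint can be handled with two small helpers, sketched below. The option defaults are copied from the example request above. Treating a false bCorrectRequest as a rejected request is an assumption, since the meaning of that field is not documented here. All function names are illustrative.

```python
import json

# Values taken from the example request in this section.
DEFAULT_OPTIONS = {
    "debug": 0,
    "fuzzy": 0,
    "stemmed": 1,
    "fuzzyPercent": 70,
    "maxWordLengthSearch": 2,
    "minFuzzyStartLength": 2,
    "minFuzzyStringLength": 5,
    "targetStringMatch": 0,
}


def build_tag_request(file_id, source_lang, target_lang, segments, task, **overrides):
    """Return the JSON body for POST /termTagger/termTag/ as bytes."""
    unknown = set(overrides) - set(DEFAULT_OPTIONS)
    if unknown:
        raise ValueError(f"Unknown options: {sorted(unknown)}")
    payload = {
        "tbxFile": file_id,
        "sourceLang": source_lang,
        "targetLang": target_lang,
        "segments": list(segments),
        **DEFAULT_OPTIONS,
        **overrides,
        "task": task,
    }
    return json.dumps(payload).encode("utf-8")


def read_tag_response(raw):
    """Return the tagged segments keyed by their id."""
    data = json.loads(raw)
    # Assumption: a false bCorrectRequest means the request was rejected.
    if not data.get("bCorrectRequest", False):
        raise RuntimeError("TermTagger reported an incorrect request")
    return {segment["id"]: segment for segment in data["segments"]}
```

Rejecting unknown option names catches typos in the option keys before a request is sent.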
Note:
Matches are marked in the HTML of the segments with <div class="term ...">, for example:
<div class="term preferredTerm transFound exact" data-tbxid="term_01_1_en_1_00003">VisualTranslation</div>
Internal Tags
The TermTagger only works properly with HTML syntax, so internal tags in the segment content have to be replaced with img tags:
<img class="content-tag" src="1" alt="TaggingError" />
Whether the class "content-tag" and the alt text "TaggingError" are actually required is currently unclear, but this is the form in which translate5 has been sending the img placeholders to the TermTagger for years.
The number in the src attribute is an ID that identifies the real tag on the client side; it has no meaning on the TermTagger side.
The TermTagger returns the same img tags in its response.
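The placeholder round trip can be sketched as follows. The internal tag syntax matched by INTERNAL_TAG (<1>, </1>, <2/>) is purely hypothetical, because the client's tag format is not described here. The sketch also assumes the img tags come back character for character as they were sent.

```python
import re

# Hypothetical internal tag syntax used only for this illustration.
INTERNAL_TAG = re.compile(r"</?\d+/?>")
PLACEHOLDER = re.compile(r'<img class="content-tag" src="(\d+)" alt="TaggingError" />')


def protect_tags(text: str):
    """Replace internal tags with img placeholders; return (text, original tags)."""
    tags = []

    def to_placeholder(match):
        tags.append(match.group(0))
        # The src value is the 1-based position of the tag in the list.
        return f'<img class="content-tag" src="{len(tags)}" alt="TaggingError" />'

    return INTERNAL_TAG.sub(to_placeholder, text), tags


def restore_tags(text: str, tags) -> str:
    """Put the original internal tags back in place of the img placeholders."""
    return PLACEHOLDER.sub(lambda match: tags[int(match.group(1)) - 1], text)


original = "Press <1>Start</1> to begin<2/>"
protected, tags = protect_tags(original)
print(protected)
print(restore_tags(protected, tags))
```

Because the src value carries no meaning for the TermTagger, any client-side numbering scheme works as long as it can be reversed afterwards.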
Terminology Tags
The TermTagger uses div tags to mark the text containing terminology.
<div class="term preferredTerm transFound exact" data-tbxid="term_01_1_en_1_00003">VisualTranslation</div>
Each div carries several CSS classes that flag properties of the term. The ID of the term from the TBX file is added in the data-tbxid attribute.
CSS Classes:
- term: always set.
- preferredTerm: the normativeAuthorization value of the term.
- transFound|transNotFound: flag marking whether the term was found in the target or not. This is buggy on the TermTagger side and is corrected by translate5 itself. Efforts to fix this in the TermTagger itself are welcome. Work on enhancing the TermTagger itself started at the beginning of this year; the corrections for TermTagger problems in translate5 were made before that.
- exact: flag indicating whether the term was found exactly or by stemming / fuzzy match.
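Reading the term markup back on the client side can be sketched with the standard library HTML parser. The class TermCollector and the function find_terms are illustrative names. The sketch only relies on the div structure described above and does not correct the transFound/transNotFound flags.

```python
from html.parser import HTMLParser


class TermCollector(HTMLParser):
    """Collect every <div class="term ..."> with its classes, TBX ID and text."""

    def __init__(self):
        super().__init__()
        self.terms = []
        self._open = []  # stack of open divs; None marks a div that is not a term

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if "term" in classes:
            term = {"tbxid": attributes.get("data-tbxid"), "classes": classes, "text": ""}
            self.terms.append(term)
            self._open.append(term)
        else:
            self._open.append(None)

    def handle_endtag(self, tag):
        if tag == "div" and self._open:
            self._open.pop()

    def handle_data(self, data):
        # Text belongs to every term div that is currently open.
        for term in self._open:
            if term is not None:
                term["text"] += data


def find_terms(segment_html: str):
    collector = TermCollector()
    collector.feed(segment_html)
    collector.close()
    return collector.terms


html = ('Use <div class="term preferredTerm transFound exact" '
        'data-tbxid="term_01_1_en_1_00003">VisualTranslation</div> here')
terms = find_terms(html)
print(terms)
```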
5. Remove TBX File
Description
Removes a TBX file from the TermTagger memory.
Request
curl -X DELETE "http://<host>:9001/termTagger/tbxFile/2b412073d5185fa4b8d7831e0ee6472d" -H "x-tbxid: 2b412073d5185fa4b8d7831e0ee6472d"
Response
- 200 OK
- Empty body
Example Workflow
# 1. Check service
curl -X GET "http://<host>:9001/termTagger"
# 2. Upload TBX file
curl -X POST "http://<host>:9001/termTagger/tbxFile/" ...
# 3. Process segments
curl -X POST "http://<host>:9001/termTagger/termTag/" ...
# 4. Delete TBX file
curl -X DELETE "http://<host>:9001/termTagger/tbxFile/<id>" ...
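The same four steps can be prepared in Python with the standard library. The sketch below only builds the requests, so it can be inspected without a running TermTagger. The function name workflow_requests and its parameters are assumptions for illustration, and base_url stands for http://<host>:9001/termTagger.

```python
import json
import urllib.request


def workflow_requests(base_url, file_id, tbx_xml, tag_payload):
    """Build the four requests of the example workflow, in order."""
    id_header = {"x-tbxid": file_id}
    json_headers = {**id_header, "Content-Type": "application/json"}
    upload_body = json.dumps({"tbxFile": file_id, "tbxdata": tbx_xml}).encode("utf-8")
    tag_body = json.dumps(tag_payload).encode("utf-8")
    return [
        # 1. Check service
        urllib.request.Request(base_url, headers={"Accept": "text/html"}, method="GET"),
        # 2. Upload TBX file
        urllib.request.Request(f"{base_url}/tbxFile/", data=upload_body,
                               headers=json_headers, method="POST"),
        # 3. Process segments
        urllib.request.Request(f"{base_url}/termTag/", data=tag_body,
                               headers=json_headers, method="POST"),
        # 4. Delete TBX file
        urllib.request.Request(f"{base_url}/tbxFile/{file_id}",
                               headers=id_header, method="DELETE"),
    ]


requests = workflow_requests(
    "http://localhost:9001/termTagger",
    "2b412073d5185fa4b8d7831e0ee6472d",
    "<martif></martif>",
    {"tbxFile": "2b412073d5185fa4b8d7831e0ee6472d", "segments": []},
)
for request in requests:
    print(request.get_method(), request.full_url)
```

Each request would be sent with urllib.request.urlopen(request). Note that urlopen raises urllib.error.HTTPError for 4xx and 5xx responses, which matters because the error codes of the TermTagger are not fully documented.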