Table of Contents
Overview and API introduction
...
- import new openTM2-TMs
- delete openTM2-TMs
- create new empty openTM2-TM
- import TMX
- open TM and close TM: not possible, see the extra section in this document. A trigger to flush a TM to disk may be needed, but this could also be done in some specific cases...
- query TM for matches: one query per TM, not querying multiple TMs at once.
- query TM for concordance search
- save new entry to TM
- delete entry from TM
- locally clone TM
- reorganize TM
- get some statistics about the service
- use the tagreplacement endpoint to test the tag replacement mechanism
This can be achieved by the following specification of a RESTful HTTP service. The specification is given in the following form:
- URL of the HTTP resource, where the server name and an optional path prefix are configurable.
- HTTP Method with affected functionality
- Brief Description
- Sent and returned Body.
Request Data Format:
The data transferred in requests is JSON and is sent directly in the request body. It should be pretty-printed JSON ending with '\n}', because a bug in proxygen caused garbage to appear after the valid data.
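A minimal Python sketch of serializing a request body so that it meets this requirement. It relies only on the standard `json` module: with `indent` set, `json.dumps` puts the closing brace of a top-level object on its own line. The function name `make_request_body` is illustrative and not part of t5memory.

```python
import json


def make_request_body(payload: dict) -> str:
    """Serialize a request payload as pretty-printed JSON ending with "\\n}"."""
    # indent=4 places the closing brace of a non-empty object on its own line
    body = json.dumps(payload, indent=4, ensure_ascii=False)
    # An empty dict serializes as "{}", which would not end with "\n}"
    if not body.endswith("\n}"):
        raise ValueError("payload must be a non-empty JSON object")
    return body


if __name__ == "__main__":
    print(make_request_body({"name": "examle_tm", "sourceLang": "bg-BG"}))
```

Because `ensure_ascii=False` keeps characters such as "ü" unescaped, the string should be encoded as UTF-8 before it is sent.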
URL Format:
In this document, OpenTM2 is always assumed to be located under http://opentm2/.
...
Values | |
---|---|
%service% | Name of the service (default: t5memory; can be changed in the t5memory.conf file) |
%tm_name% | Name of Translation Memory |
Example | http://localhost:4040/t5memory/examle_tm/fuzzysearch/? |
Endpoints overview
# | Name | Description | Method | Endpoint | Default endpoint/example | Is async? |
---|---|---|---|---|---|---|
1 | Get the list of TMs | Returns JSON list of TMs | GET | /%service%/ | /t5memory/ | |
2 | Create TM | Creates TM with the provided name | POST | /%service%/ | /t5memory/ | |
3 | Create/Import TM in internal format | Imports and unpacks a base64 encoded archive of .TMD, .TMI, .MEM files and renames it to the provided name | POST | /%service%/ | /t5memory/ | |
4 | Clone TM locally | Makes a clone of an existing TM | POST | /%service%/%tm_name%/clone | /t5memory/my+TM/clone ('+' is a placeholder for a space in the TM name, so the files 'my TM.TMD' and 'my TM.TMI' (and in pre-0.5.x versions also 'my TM.MEM') should exist on disk). The TM name IS case sensitive in the URL | |
5 | Reorganize TM | Reorganizes the TM (replaces the TM with a new one and reimports segments from the TMD) | GET | /%service%/%tm_name%/reorganize | /t5memory/my+other_tm/reorganize | + in 0.5.x and up |
6 | Delete TM | Deletes .TMD, .TMI files | DELETE | /%service%/%tm_name%/ | /t5memory/%tm_name%/ | |
7 | Import TMX into TM | Imports the provided base64 encoded TMX file into the TM | POST | /%service%/%tm_name%/import | /t5memory/%tm_name%/import | + |
8 | Export TMX from TM | Creates TMX from the TM, encoded in base64 (Accept: application/xml) | GET | /%service%/%tm_name%/ | /t5memory/%tm_name%/ | |
9 | Export in internal format | Creates and exports an archive with the .TMD, .TMI files of the TM (Accept: application/zip) | GET | /%service%/%tm_name%/ | /t5memory/%tm_name%/ | |
10 | Status of TM | Returns the status/import status of the TM | GET | /%service%/%tm_name%/status | /t5memory/%tm_name%/status | |
11 | Fuzzy search | Returns entries/translations with small differences from the requested segment | POST | /%service%/%tm_name%/fuzzysearch | /t5memory/%tm_name%/fuzzysearch | |
12 | Concordance search | Returns entries/translations that contain the requested segment | POST | /%service%/%tm_name%/concordancesearch | /t5memory/%tm_name%/concordancesearch | |
13 | Entry update | Updates an entry/translation | POST | /%service%/%tm_name%/entry | /t5memory/%tm_name%/entry | |
14 | Entry delete | Deletes an entry/translation | POST | /%service%/%tm_name%/entrydelete | /t5memory/%tm_name%/entrydelete | |
15 | Save all TMs | Flushes all file buffers (TMD, TMI files) into the filesystem | GET | /%service%_service/savetms | /t5memory_service/savetms | |
16 | Shutdown service | Flushes all file buffers into the filesystem and shuts down the service | GET | /%service%_service/shutdown | /t5memory_service/shutdown | |
17 | Test tag replacement call | For testing tag replacement | POST | /%service%_service/tagreplacement | /t5memory_service/tagreplacement | |
18 | Resources | Returns resources and service data | GET | /%service%_service/resources | /t5memory_service/resources | |
19 | Import TMX from local file (in the "removing lookuptable" git branch) | Similar to import TMX, but uses a local path to the file instead of a base64 encoded file | POST | /%service%/%tm_name%/importlocal | /t5memory/%tm_name%/importlocal | + |
20 | Mass deletion of entries (from v0.6.0) | Like reorganize, but skips importing segments that match the provided filters combined with logical AND | POST | /%service%/%tm_name%/entriesdelete | /t5memory/tm1/entriesdelete | + |
21 | New concordance search (from v0.6.0) | Extended concordance search, where you can search in different fields of the segment | POST | /%service%/%tm_name%/search | /t5memory/tm1/search | |
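The URL patterns above can be assembled with a small helper. This is a sketch that assumes the service runs at `http://localhost:4040` with the default service name `t5memory` (both are configurable); `tm_url` and `service_url` are hypothetical helpers, not part of t5memory. `quote_plus` turns spaces into `+`, which matches the `my+TM` example; how the service handles other percent-encoded characters is an assumption worth verifying.

```python
from urllib.parse import quote_plus

BASE_URL = "http://localhost:4040"  # assumption: host and port of the service
SERVICE = "t5memory"                # default %service% name


def tm_url(tm_name: str, action: str = "") -> str:
    """Build /%service%/%tm_name%/<action>; the TM name keeps its case."""
    # quote_plus encodes spaces as '+', the placeholder used by the service
    return f"{BASE_URL}/{SERVICE}/{quote_plus(tm_name)}/{action}"


def service_url(action: str) -> str:
    """Build /%service%_service/<action> for service-level endpoints."""
    return f"{BASE_URL}/{SERVICE}_service/{action}"
```

For example, `tm_url("my TM", "clone")` yields `http://localhost:4040/t5memory/my+TM/clone`, and `service_url("savetms")` yields `http://localhost:4040/t5memory_service/savetms`.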
Available end points
List of TMs | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
Purpose | Returns JSON list of TMs | |||||||||
Request | GET /%service%/ | |||||||||
Params | - | |||||||||
Returns the list of open TMs and then the list of available (not open) TMs in the app.
|
Create TM | |
---|---|
Purpose | Creates TM with the provided name (.TMD and .TMI files in the MEM/ folder) |
Request | POST /%service%/%tm_name%/ |
Params | Required: name, sourceLang |
List of TMs
Params: -
Response example:
{
  [
    {name:examle_tm},
    {name:mem_gt_issue}
  ]
}
Create TM
Required: name, sourceLang
Request example (parameters in square brackets are optional):
{
  "name": "examle_tm",
  "sourceLang": "bg-BG",
  ["data": "base64_encoded_archive_see_import_in_internal_format",]
  ["loggingThreshold": 0]
}
Response example:
Success:
{
  "name": "examle_tm"
}
TM already exists:
{
  "ReturnValue": 1,
  "ErrorMsg": ""
}
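A Python sketch of building the create-TM body and recognizing the error response shown above. `create_tm_body` and `is_error_response` are hypothetical names; treating any non-zero `ReturnValue` as an error is an assumption based on the examples in this document (1 for "TM already exists" here, 65535 for the internal-format import).

```python
from typing import Optional


def create_tm_body(name: str, source_lang: str, data: Optional[str] = None,
                   logging_threshold: Optional[int] = None) -> dict:
    """Build the JSON body for creating a TM; name and sourceLang are required."""
    if not name or not source_lang:
        raise ValueError("name and sourceLang are required")
    body = {"name": name, "sourceLang": source_lang}
    if data is not None:
        body["data"] = data  # base64 archive, see import in internal format
    if logging_threshold is not None:
        body["loggingThreshold"] = logging_threshold
    return body


def is_error_response(response: dict) -> bool:
    # Assumption: a present, non-zero ReturnValue signals a failure
    return response.get("ReturnValue", 0) != 0
```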
Create/Import TM in internal format
{ "name": "examle_tm", "sourceLang": "bg-BG" , "data":"base64EncodedArchive" }
Request example:{ "name": "mem_internal_format", "sourceLang": "bg-BG", "data":"UEsDBBQACAgIAPmrhVQAAAAAAAAAAAAAAAAWAAQAT1RNXy1JRDE3NS0wXzJfNV9iLk1FTQEAAADtzqEKgDAQgOFTEHwNWZ5swrAO0SBys6wfWxFBDILv6uOI2WZQw33lr38GbvRIsm91baSiigzFEjuEb6XHEK\/myX0PXtXsyxS2OazwhLDWeVTaWgEFMMYYY\/9wAlBLBwhEWTaSXAAAAAAAAAAACAAAAAAAAFBLAwQUAAgICAD5q4VUAAAAAAAAAAAAAAAAFgAEAE9UTV8tSUQxNzUtMF8yXzVfYi5UTUQBAAAA7d3Pa5JxHMDxz+Ns09phDAYdPfaDyQqWRcYjS9nGpoYZhBeZMCISW2v2g5o6VkqQONk\/0KVzh4IoKAovnboUo1PHbuuwU8dSn8c9Pk2yTbc53y+R5\/P9fL7P1wf5Ps9zep5vIOy3iMiSiPLn0yPrQ7In+rStTQARi\/bV9chEyHcxGPIKAGDnPonl21SsHNmUYNgfHZ70nnKNDo9ET0dHozFn2L+Ll9uxZPzazPz1mYQAAAAAAAAAAAAAAAAAAAAAAAAAANDtBkXRoj5Zk7OqSFZ9q35Vn6khNa6W2wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAdBKbKHK4Em1omT5DxV6J7FrmkKFypBKt9FczvYaKtr+2DLpiqPTWVayGiq2uYjFUpC7VI6aElN8F8JPn\/QEAAAAAAAAAAAAAAAAAAAAAAAAAAAAA2ANW7U0Ag9Iv60MnT4j8uLBZ\/X5+7dxn1ztX6Uy5AgAAAAAAAAAAAAAAAAAAgA6nL1qFjmc1rAO2IwNN9bL9u4ulVUeEfcQqQAfxSNtltshZaytB7jalZZ2a5KhFGT3Qr\/ztv1pkzAnP1v06+F7UxL22tRzSNf6aFq08MdoiY078\/znmkTZo5Qm2YdoOSLSyDdbaVUop\/Cj3cDm14I6\/uqf++nDUN1u4lS+k9MbKXL4QK72+775U+phOpp8sucdK728X5nK5hVT+weJqbTiHjMiNzWG1yNxWvI8rvxZ9cTfycj71NH1nsZgbf54uJlKryWy6GFlueBT6xHrzJRupDqkPXc9eyyduJmbLkf6\/mlYRDgQDPtO++3\/uYvsazANfYHx68vLEsSvOKedxqa\/hAGowD4Jh\/1X\/dH1X5sEBZpoH6E6\/AVBLBwj3gRyzjAIAAAAAAAAAAAEAAAAAAFBLAwQUAAgICAD5q4VUAAAAAAAAAAAAAAAAFgAEAE9UTV8tSUQxNzUtMF8yXzVfYi5UTUkBAAAA7d3PS9NhHMDxz\/Y1nbp0zfw2Vw6CEjooJkkFPs9DZZaFCiIRHRxKoJUIFXk06iB0kS5Fvw6dhDp28FDgOSqiIKQ\/ICQMhIIuYVnJt2f7eK2M2Ps1xp49b8Y+fP6ArXegJy4iV0RiPx6BNAXyT6ysrKhXlLZ49PwlkKP9hw\/19XcKAOD3PZX42+PDP0+JWN9AT765u3P33vbm1nxbvj0\/3DLQ0y3r5uClsZGhC2eGxgUAAAAAAAAAAAAAAAAAAAAAAAAAgFKXllh0ahQbLHeInDb3Xc6NWrF77Jibcr22zC2YY6bVLNoX5qp97Pa5SbPc8ci8sqHpd1k7a2+ZN+6eFQAAAAAAAAAAAAAAAAAAAAAAAAAAAAD4YxISk8bVUyq6eVa905dtqtxO3fBlqyqnkrW+ZFVZCGp8aVDl9ZeELxlVjhRNsEWVa+UffAlVuf78rC\/1eoK20JfNqnzt3OhLnSp1DZW+bFJl\/467vqRUuVxV5UutKts\/JX2pUWUyXvie9OopE5U7QWEHSfWZXdmPvlSr8i75xJcqVT7fPOdLpSqj5+t9Sahy8UBhOxWqLEph6nJVHhZNvUFPXbS3MlXyYWFvgSon3xf2FldlpGiC
mCoPiiYQVbLR3or\/ZT0tS04AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMC6K4t+ZSAtOWkKQpOSeTfnZty0m3CDrsu1uNB9swv2pZ21IlN23J6w1uZsuV0y82bOzJhpM2EGTZdpMaERAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPjrUmteK0RypXifid5n1tyX6j7+9\/vvUEsHCGo104BhAgAAAAAAAAAAAQAAAAAAUEsBAgAAFAAICAgA912FVERZNpJcAAAAAAgAABYABAAAAAAAAAAAALSBAAAAAE9UTV8tSUQxNzUtMF8yXzVfYi5NRU0BAAAAUEsBAgAAFAAICAgA\/F2FVPeBHLOMAgAAAAABABYABAAAAAAAAAAAALSBrAAAAE9UTV8tSUQxNzUtMF8yXzVfYi5UTUQBAAAAUEsBAgAAFAAICAgA\/F2FVGo104BhAgAAAAABABYABAAAAAAAAAAAALSBiAMAAE9UTV8tSUQxNzUtMF8yXzVfYi5UTUkBAAAAUEsGBiwAAAAAAAAAHgMtAAAAAAAAAAAAAwAAAAAAAAADAAAAAAAAANgAAAAAAAAAOQYAAAAAAABQSwYHAAAAABEHAAAAAAAAAQAAAFBLBQYAAAAAAwADANgAAAA5BgAAAAA=" }
Response example:
{
  "name": "examle_tm"
}
TM already exists:
{
  "ReturnValue": 65535,
  "ErrorMsg": ""
}
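The `data` field in the example above starts with `UEsDB`, which is the base64 form of the ZIP signature `PK\x03\x04`, so the archive appears to be a ZIP file. Under that assumption, the following Python sketch packs and unpacks such an archive with the standard `zipfile` and `base64` modules. The helper names are hypothetical, and the exact file layout the service expects inside the archive is not confirmed here.

```python
import base64
import io
import zipfile


def pack_internal_format(files: dict) -> str:
    """Zip {file name: bytes} (e.g. .TMD/.TMI contents) and return base64 text."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for file_name, content in files.items():
            archive.writestr(file_name, content)
    return base64.b64encode(buffer.getvalue()).decode("ascii")


def unpack_internal_format(data: str) -> dict:
    """Reverse of pack_internal_format: base64 text -> {file name: bytes}."""
    with zipfile.ZipFile(io.BytesIO(base64.b64decode(data))) as archive:
        return {name: archive.read(name) for name in archive.namelist()}
```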
Delete TM
Params: -
Response example: %empty_anyway%
Import provided base64 encoded TMX file into TM
{"tmxData": "base64EncodedTmxFile" }
Request example:
{
"tmxData": "PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0idXRmLTgiPz4KPHRteCB2ZXJzaW9uPSIxLjQiPgogIDxoZWFkZXIgY3JlYXRpb250b29sPSJTREwgTGFuZ3VhZ2UgUGxhdGZvcm0iIGNyZWF0aW9udG9vbHZlcnNpb249IjguMCIgby10bWY9IlNETCBUTTggRm9ybWF0IiBkYXRhdHlwZT0ieG1sIiBzZWd0eXBlPSJzZW50ZW5jZSIgYWRtaW5sYW5nPSJlbi1HQiIgc3JjbGFuZz0iYmctQkciIGNyZWF0aW9uZGF0ZT0iMjAxNTA4MjFUMDkyNjE0WiIgY3JlYXRpb25pZD0idGVzdCIvPgogIDxib2R5PgoJPHR1IGNyZWF0aW9uZGF0ZT0iMjAxODAyMTZUMTU1MTA1WiIgY3JlYXRpb25pZD0iREVTS1RPUC1SNTlCT0tCXFBDMiIgY2hhbmdlZGF0ZT0iMjAxODAyMTZUMTU1MTA4WiIgY2hhbmdlaWQ9IkRFU0tUT1AtUjU5Qk9LQlxQQzIiIGxhc3R1c2FnZWRhdGU9IjIwMTgwMjE2VDE2MTMwNVoiIHVzYWdlY291bnQ9IjEiPgogICAgICA8dHV2IHhtbDpsYW5nPSJiZy1CRyI+CiAgICAgICAgPHNlZz5UaGUgPHBoIC8+IGVuZDwvc2VnPgogICAgICA8L3R1dj4KICAgICAgPHR1diB4bWw6bGFuZz0iZW4tR0IiPgogICAgICAgIDxzZWc+RXRoIDxwaCAvPiBkbmU8L3NlZz4KICAgICAgPC90dXY+CiAgICA8L3R1PgogIDwvYm9keT4KPC90bXg+Cg=="
}
Response example:
In case of error: an error.
From v0_2_15, in case of success:
{ "%tm_name%":"deleted"}
Check the import status using the status call.
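A Python sketch of producing the `tmxData` value: the TMX text is encoded as UTF-8 and then base64. The sample above decodes to a TMX file starting with `<?xml version="1.0" encoding="utf-8"?>`. The helper name `import_tmx_body` is hypothetical.

```python
import base64


def import_tmx_body(tmx_text: str) -> dict:
    """Build the body for POST /%service%/%tm_name%/import."""
    # TMX text -> UTF-8 bytes -> base64 -> ASCII string for the JSON body
    encoded = base64.b64encode(tmx_text.encode("utf-8")).decode("ascii")
    return {"tmxData": encoded}
```

Since the import runs asynchronously, a client would send this body and then poll the status endpoint.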
Export TMX from TM
Header: Accept: application/xml
Response example:
<?xml version="1.0" encoding="UTF-8" ?>
<tmx version="1.4">
<header creationtoolversion="0.2.14" gitCommit="60784cf * refactoring and cleanup" segtype="sentence" adminlang="en-us" srclang="en-GB" o-tmf="t5memory" creationtool="t5memory" datatype="xml" />
<body>
<tu tuid="1" datatype="xml" creationdate="20190401T084052Z">
<prop type="tmgr:segNum">10906825</prop>
<prop type="tmgr:markup">OTMXML</prop>
<prop type="tmgr:docname">none</prop>
<tuv xml:lang="en-GB">
<prop type="tmgr:language">English(U.K.)</prop>
<seg>For > 100 setups.</seg>
</tuv>
<tuv xml:lang="de-DE">
<prop type="tmgr:language">GERMAN(REFORM)</prop>
<seg>Für > 100 Aufstellungen.</seg>
</tuv>
</tu>
</body>
</tmx>
Export in internal format
Header: Accept: application/zip
Response example: %binary_data%
Get the status of TM
Create/Import TM in internal format | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
Purpose | Import and unpack base64 encoded archive of .TMD, .TMI, .MEM(in pre 0.5.x versions) files. Rename it to provided name | |||||||||
Request | POST /%service%/ | |||||||||
Params | { "name": "examle_tm", "sourceLang": "bg-BG" , "data":"base64EncodedArchive" } | |||||||||
Do not import TMs created in another version of t5memory. Starting from 0.5.x, TMD and TMI files store the t5memory version in which they were created in the file header, and a different middle version (0.5.x) or global version (0.5.x) would be represented as … This would create example_tm.TMD (data file) and example_tm.TMI (index file) in the MEM folder.
|
Clone TM localy | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
Purpose | Makes a clone of an existing TM | |||||||||
Request | POST /%service%/%tm_name%/clone | |||||||||
Params | Required: name, sourceLang | |||||||||
The endpoint is synchronous (blocking).
|
Delete TM | |
---|---|
Purpose | Deletes .TMD, .TMI, .MEM files |
Request | DELETE /%service%/%tm_name%/ |
Params | - |

Fuzzy search | |
---|---|
Purpose | Returns entries/translations with small differences from the requested segment |
Request | POST /%service%/%tm_name%/fuzzysearch |
Params | Required: source, sourceLang, targetLang |
Request example (parameters in square brackets are optional):
{ "sourceLang":"en-GB", "targetLang":"de", "source":"For > 100 setups.", ["documentName":"OBJ_DCL-0000000845-004_pt-br.xml"], ["segmentNumber":""], ["markupTable":"OTMXUXLF"], ["context":"395_408"], ["numOfProposals":20], ["loggingThreshold": 0]}
Response example:
Success:
{
"ReturnValue": 0,
"ErrorMsg": "",
"NumOfFoundProposals": 1,
"results": [
{
"source": "For > 100 setups.",
"target": "Für > 100 Aufstellungen.",
"segmentNumber": 10906825,
"id": "",
"documentName": "none",
"documentShortName": "NONE",
"sourceLang": "en-GB",
"targetLang": "de-DE",
"type": "Manual",
"matchType": "Exact",
"author": "",
"timestamp": "20190401T084052Z",
"matchRate": 100,
"markupTable": "OTMXML",
"context": "",
"additionalInfo": ""
}
]
}
Not found:
{
"ReturnValue": 133,
"ErrorMsg": "OtmMemoryServiceWorker::concordanceSearch::"
}
|
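A Python sketch of reading a fuzzy search response such as the ones above: it raises on a non-zero `ReturnValue` (for example 133 in the "Not found" case) and otherwise returns `(matchRate, target)` pairs, best first. `matches_from_response` and the `min_rate` cut-off are illustrative choices, not part of the service.

```python
import json


def matches_from_response(response_text: str, min_rate: int = 0) -> list:
    """Return (matchRate, target) pairs from a fuzzy search response, best first."""
    response = json.loads(response_text)
    # A non-zero ReturnValue signals an error (e.g. 133 in the "Not found" example)
    if response.get("ReturnValue", 0) != 0:
        raise RuntimeError(
            f"fuzzy search failed ({response.get('ReturnValue')}): {response.get('ErrorMsg', '')}"
        )
    hits = [
        (result["matchRate"], result["target"])
        for result in response.get("results", [])
        if result["matchRate"] >= min_rate
    ]
    return sorted(hits, key=lambda hit: hit[0], reverse=True)
```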
Concordance search
Required: searchString - what we are looking for; searchType ["Source"|"Target"|"SourceAndTarget"] - where to look
iNumOfProposal - limit of found proposals; the maximum is 20, if 0 the default value '5' is used
Request example (parameters in square brackets are optional):
{
  "searchString": "The",
  "searchType": "source",
  ["searchPosition": "",]
  ["numResults": 20,]
  ["msSearchAfterNumResults": 250,]
  ["loggingThreshold": 0]
}
Response example:
Success:
{
  "ReturnValue": 0,
  "NewSearchPosition": null,
  "results": [
    {
      "source": "For > 100 setups.",
      "target": "Für > 100 Aufstellungen.",
      "segmentNumber": 10906825,
      "id": "",
      "documentName": "none",
      "documentShortName": "NONE",
      "sourceLang": "en-GB",   ← RFC 5646
      "targetLang": "de-DE",   ← RFC 5646
      "type": "Manual",
      "matchType": "undefined",
      "author": "",
      "timestamp": "20190401T084052Z",
      "matchRate": 0,
      "markupTable": "OTMXML",
      "context": "",
      "additionalInfo": ""
    }
  ],
  "ErrorMsg": ""
}
Success, but with NewSearchPosition (not the whole TM was checked; use this position to repeat the search):
{
  "ReturnValue": 0,
  "NewSearchPosition": "8:1",
  "results": [
    { "source": "For > 100 setups.", "target": "Für > 100 Aufstellungen.", "segmentNumber": 10906825, "id": "", "documentName": "none", "documentShortName": "NONE", "sourceLang": "en-GB", "targetLang": "de-DE", "type": "Manual", "matchType": "undefined", "author": "", "timestamp": "20190401T084052Z", "matchRate": 0, "markupTable": "OTMXML", "context": "", "additionalInfo": "" }
  ],
  "ErrorMsg": ""
}
SearchPosition / NewSearchPosition format: "7:1". The first number is the segment/record number, the second is the target number.
NewSearchPosition is an internal key of the memory for the next position on sequential access. Since it is an internal key, maintained and understood by the underlying memory plug-in (for EqfMemoryPlugin it is the record number and the position within one record), no assumptions should be made regarding its content. It is just a string that should be sent back to OpenTM2 on the next request, so that the search starts from there. This is how it is implemented in Translate5: the first request to OpenTM2 contains SearchPosition with an empty string, OpenTM2 then returns a string in NewSearchPosition, which is simply resent to OpenTM2 in the next request.
Not found:
{ "ReturnValue": 0, "NewSearchPosition": null, "ErrorMsg": "" }
TM not found:
{ "ReturnValue": 133, "ErrorMsg": "OtmMemoryServiceWorker::concordanceSearch::" }
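The resend-the-position protocol described above can be sketched as a Python generator. `send` is a hypothetical callback that posts one request body to the concordance search endpoint and returns the parsed JSON; the position string is passed back untouched and never parsed, as the text recommends.

```python
from typing import Callable, Dict, Iterator


def concordance_all(send: Callable[[Dict], Dict], search_string: str,
                    search_type: str = "source") -> Iterator[Dict]:
    """Yield all concordance results, following NewSearchPosition until it is null."""
    position = ""  # the first request starts with an empty search position
    while True:
        response = send({
            "searchString": search_string,
            "searchType": search_type,
            "searchPosition": position,  # opaque key, resent exactly as received
        })
        if response.get("ReturnValue", 0) != 0:
            raise RuntimeError(response.get("ErrorMsg", ""))
        # "Not found" responses have no "results" key, so default to an empty list
        yield from response.get("results", [])
        position = response.get("NewSearchPosition")
        if position is None:  # null: the whole TM has been checked
            return
```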
Update entry
Only sourceLang, targetLang, source and target are required (parameters in square brackets are optional).
Request example:
{
"source": "The end",
"target": "The target",
"sourceLang": "en",
"targetLang": "de",
["documentName": "Translate5 Demo Text-en-de.xlf"],
["segmentNumber": 8,]
["author": "Thomas Lauria"],
["timeStamp": "20210621T071042Z"],
["context": "2_2"],
["addInfo": "2_2"],
["type": "Manual"],
["markupTable": "OTMXUXLF"],
["loggingThreshold": 0]
}
Response example:
|
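A Python sketch of building an update-entry body with the four required fields plus optional ones. It formats `timeStamp` in the same shape as the example (`20210621T071042Z`, i.e. UTC in basic ISO 8601 form). Always filling `timeStamp` is a choice of this sketch, and `update_entry_body` is a hypothetical helper.

```python
from datetime import datetime, timezone
from typing import Optional


def update_entry_body(source: str, target: str, source_lang: str, target_lang: str,
                      when: Optional[datetime] = None, **optional) -> dict:
    """Build a body for POST /%service%/%tm_name%/entry."""
    body = {
        "source": source,
        "target": target,
        "sourceLang": source_lang,
        "targetLang": target_lang,
    }
    when = when or datetime.now(timezone.utc)
    # Convert to UTC and format like "20210621T071042Z"
    body["timeStamp"] = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # Optional fields such as documentName, segmentNumber, author, context, addInfo
    body.update(optional)
    return body
```

For example, `update_entry_body("The end", "The target", "en", "de", author="Thomas Lauria", segmentNumber=8)` yields a body with all four required fields, the two optional ones, and the current UTC timestamp.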
Delete entry
Only sourceLang, targetLang, source, and target are required (parameters in square brackets are optional).
Deletion is based on a strict match (including tags and whitespace) of source and target.
Request example:
{
"sourceLang": "bg",
"targetLang": "en",
"source": "The end",
"target": "Eth dne",
["documentName": "my file.sdlxliff",]
["segmentNumber": 1,]
["markupTable": "translate5",]
["author": "Thomas Lauria",]
["type": "",]
["timeStamp": ""],
["context": "",]
["addInfo": ""] , ["loggingThreshold": 0]
}
|
Save all TMs
Flushes all filebuffers(TMD, TMI files) into the filesystem. Reset 'Modified' flags for file buffers.
Filebuffer is a file instance of .TMD or .TMI loaded into RAM. It provides better speed and safety when working with files.
Params: -
Response example:
{
'saved files': '/home/or/.t5memory/MEM/bg_internal_format.TMD; /home/or/.t5memory/MEM/bg_internal_format.TMI; /home/or/.t5memory/MEM/mem_gt_issue.TMD; /home/or/.t5memory/MEM/mem_gt_issue.TMI; EQFSYSW.PRP; '
}
The response lists the saved files.
Shutdown service
dontsave=1 (optional, in the address) - skips saving TMs; for now the value doesn't matter, only its presence.
If TMs are saved before closing, the service first checks whether an import process is still running.
If there is one, it waits 1 second and checks again.
This step is repeated for up to 10 minutes; after that the service shuts down anyway.
Response example: %Empty%
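The waiting behavior described above (check, wait 1 second, give up after 10 minutes) can be sketched as a small Python function. This is an illustration of the described logic, not the t5memory implementation; `import_running` is a hypothetical callback, and `sleep` is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable


def wait_for_imports(import_running: Callable[[], bool], poll_seconds: float = 1.0,
                     max_wait_seconds: float = 600.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True once no import is running, False if the wait limit was reached."""
    waited = 0.0
    while import_running():
        if waited >= max_wait_seconds:
            return False  # give up; the service shuts down anyway
        sleep(poll_seconds)
        waited += poll_seconds
    return True
```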
Test tag replacement call
Required: src, trg,
Optional: req
Fuzzy search tag replacement test:
Request example:
{
"src": "Tap <ph x='1'/>View <ph x='2' />o<bpt i='1' x='3'/> get <ph x='4'>strong</ph>displayed<ph x='5'>View</ph> two strong<ept i='1' x='6'/>US patents.",
"trg": "View <ph x='1'/> tap <ph x='2' />to<bpt i='1' x='3'/> got <ph x='4'>strong</ph>dosplayd<ph x='5'>Veiw</ph> two strong<ept i='1' x='6'/>US patents.",
"req": "Tap <x id='123'/>View <x id='222' />o<g> get <x id='44'>strong</x>displayed<x id='51'>View</x> two strong</g>US patents."
}
Response example:
//'1' - request result
//'2' - src result
//'3' - trg result
{
'1' :'Tap <x id="123"/>View <x id="222"/>o<bx/> get <x id="44"/>displayed<x id="51"/> two strong<ex/>US patents.',
'2' :'Tap <x id="123"/>View <x id="222"/>o<g> get <x id="44"/>displayed<x id="51"/> two strong</g>US patents.',
'3' :'View <x id="123"/> tap <x id="222"/>to<g> got <x id="44"/>dosplayd<x id="51"/> two strong</g>US patents.',
};
Import tag replacement test:
Request example:
{
"src": "Tap <ph/>View <ph/>o<bpt/> get <ph>strong</ph>displayed<ph>View</ph> two strong<ept/>US patents.",
"trg": "View <ph/> tap <ph/>to<bpt/> got <ph>strong</ph>dosplayd<ph>Veiw</ph> two strong<ept/>US patents."
Response example:
{
'1' :'Tap <ph x="1"/>View <ph x="2"/>o<bpt x="3" i="1"/> get <ph x="4"/>displayed<ph x="5"/> two strong<ept x="6" i="1"/>US patents.',
'2' :'View <ph x="1"/> tap <ph x="2"/>to<bpt x="3" i="1"/> got <ph x="4"/>dosplayd<ph x="5"/> two strong<ept x="6" i="1"/>US patents.',
};
|
...
Logging
...
logging values of variables. Temporary files (in the MEM and TMP subdirectories), like base64 encoded/decoded TMX files and archives for import/export, are not deleted
...
you shouldn't reach this code; something is really wrong. Other values are ignored. The set level stays the same until you change it in a new request or close the app. Logs are supposed to be written into a file with a date/time name under ~/.OtmMemoryService/Logs, and errors/fatal messages are supposed to be duplicated in another log file with the FATAL suffix
...
- Logs only things like begin/end of a request etc. There is no purpose in setting the level this high
Logging can impact application speed significantly, especially during import or export.
You can set the logging level in the config file or in any POST JSON request by attaching a parameter to the JSON object:
["loggingThreshold": "2"]
For example:
POST http://localhost:4040/t5memory/example_tm/
{
  "sourceLang": "en", // the source language is required for a new TM
  "name": "TM Name",
  "loggingThreshold": "2"
}
This sets the logging level to INFO just before the main work of the create-memory endpoint starts.
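A Python sketch of attaching `loggingThreshold` to any request body, using the level numbers from the logging table in this document. The example above sends the value as the string `"2"` while other examples use a number such as `0`; this sketch sends an integer, which is an assumption. `with_logging` and `LOG_LEVELS` are hypothetical names.

```python
LOG_LEVELS = {
    "DEVELOP": 0,
    "DEBUG": 1,
    "INFO": 2,
    "WARNING": 3,
    "ERROR": 4,
    "FATAL": 5,
    "TRANSACTION": 6,
}


def with_logging(body: dict, level: str) -> dict:
    """Return a copy of a request body with loggingThreshold set to the named level."""
    return {**body, "loggingThreshold": LOG_LEVELS[level.upper()]}
```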
...
Working directory
...
Includes log files. They should be cleaned up manually. One session (launch of the service) creates two files: Log_Thu May 12 10:15:48 2022 .log and Log_Thu May 12 10:15:48 2022 .log_IMPORTANT
The latter contains only logs of level WARNING and higher.
...
Config file
...
service port
...
RAM limit for opening/closing TMs (see Opening and closing TM)
Doesn't include the service's own RAM
In megabytes
...
Level of pre-fuzzy-search filtering based on combinations of token triples (excluding tags). Can impact fuzzy search performance: with higher values the service is faster, but it could skip some segments in the result. Not always correlated with the resulting fuzzy rate.
The config file should be located at ~/.t5memory/t5memory.conf
All fields have default values, so the service can start without the config file.
Configs are read and applied only once, at service start.
Once the service has started, you should be able to see the configured values in the logs.
Config file example:
name=t5memory
port=4040
timeout=3600
threads=1
logLevel=0
AllowedRAM_MB=200
TriplesThreshold=5
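A Python sketch of reading a `key=value` file like the example above, for instance to check settings from a script. Skipping blank lines and lines starting with `#` is an assumption of this sketch (the document does not say whether comments are allowed), and converting all-digit values to `int` is a convenience choice. `parse_config` is a hypothetical helper.

```python
def parse_config(text: str) -> dict:
    """Parse key=value lines into a dict, turning integer values into int."""
    config = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):  # assumption: '#' marks a comment
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed line: {raw_line!r}")
        value = value.strip()
        config[key.strip()] = int(value) if value.lstrip("-").isdigit() else value
    return config
```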
Conceptual information
...
Opening and closing TM
In the first concept it was planned to implement routines to open and close a TM. While designing the concept we found some problems with this approach:
- The first one is the implementation: opening and closing a TM via REST would mean updating the TM resource and setting its state to open or closed. This is very awkward.
- Since in translate5 multiple tasks can be used at the same time, multiple tasks try to access one TM. Closing TMs becomes complicated because race conditions in TM usage must be prevented.
- Since OpenTM2 loads the whole TM in memory, OpenTM2 must control itself which TMs are loaded or not.
This leads to the following conclusion in implementation of opening and closing of TMs:
OpenTM2 has to load requested TMs automatically. OpenTM2 also has to close TMs that have not been used for some time, which means that OpenTM2 has to track the timestamp of when each TM was last requested.
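The load-on-demand and close-when-idle policy described above can be sketched as a small Python registry that records the last request time per TM and reports which TMs have been idle too long. This is an illustration of the concept, not OpenTM2's code; `TmRegistry`, `idle_seconds` and the injectable `clock` are hypothetical, and a real service would flush and unload the TM where this sketch only forgets it.

```python
import time
from typing import Callable, Dict, List


class TmRegistry:
    """Track when each TM was last requested and pick idle ones to close."""

    def __init__(self, idle_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.idle_seconds = idle_seconds
        self.clock = clock
        self.last_used: Dict[str, float] = {}

    def touch(self, tm_name: str) -> None:
        # Called on every request; an unknown TM is treated as newly loaded
        self.last_used[tm_name] = self.clock()

    def close_idle(self) -> List[str]:
        now = self.clock()
        idle = [name for name, t in self.last_used.items() if now - t >= self.idle_seconds]
        for name in idle:
            del self.last_used[name]  # a real service would flush and unload here
        return idle
```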
...
http://opentm2/translationmemory/[TM_Name]/openHandle
GET – Opens a memory for queries by OpenTM2
Note: This method is not required as memories are automatically opened when they are accessed for the first time.
http://opentm2/translationmemory/[TM_Name]/openHandle
DELETE – Closes a memory for queries by OpenTM2
Note: This method is not required as memories are automatically closed after they have not been used for some time.
|
Import provided base64 encoded TMX file into TM | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
Purpose | Imports the provided base64 encoded TMX file into the TM. Starts another thread for the import. To check the import status, use the status call | |||||||||
Request | POST /%service%/%tm_name%/import | |||||||||
Params | {"tmxData": "base64EncodedTmxFile" } | |||||||||
The TM must exist.
Handling if the framing tag situation differs between source and target (for skipAll or skipPaired): if the framing tag situation is the same in source and target, both sides should be treated as described above. If framing tags exist only in the source, they should still be treated as described above. If they exist only in the target, nothing should be removed.
|
Reorganize TM | |
---|---|
Purpose | Reorganizes the TM and fixes issues. |
Request | GET /%service%/%tm_name%/reorganize |
Headers | Accept: application/xml |
Up to v0.4.x reorganize is synchronous, so t5memory reorganize would check this condition
, and if this condition is true, it passes the segment to the putProposal function, which is also used by the UpdateRequest and ImportTmx requests, so other
{ |
Export TMX from TM | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
Purpose | Creates TMX from tm. | |||||||||
Request | GET /%service%/%tm_name%/ | |||||||||
Headers | Accept: application/xml | |||||||||
|
Export in internal format | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
Purpose | Creates and exports archive with .TMD, .TMI, .MEM files of TM | |||||||||
Request | GET /%service%/%tm_name%/ | |||||||||
Headers | Accept: application/zip | |||||||||
Returns an archive (.tm file) consisting of the .tmd and .tmi files.
|
Get the status of TM | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
Request | GET /%service%/%tm_name%/status | |||||||||
Params | - | |||||||||
Returns the status of the TM. It can be 'not found', 'available' if the TM is on disk but not loaded into RAM yet, or 'open' with additional info. If there was at least one attempt to import TMX or reorganize the TM since it was loaded into RAM, additional fields appear and stay in the statistics until the memory is unloaded.
|
Fuzzy search | |||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Purpose | Returns entries/translations with small differences from the requested segment | ||||||||||||||||||
Request | POST /%service%/%tm_name%/fuzzysearch | ||||||||||||||||||
Params | Required: source, sourceLang, targetLang iNumOfProposal - limit of found proposals - max is 20, if 0 → use default value '5' | ||||||||||||||||||
|
New Concordance search | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Purpose | Returns entries/translations that fit the selected filters. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Request | POST /%service%/%tm_name%/search | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Params | Required: NONE iNumOfProposal - limit of found proposals - max is 200, if 0 → use default value '5' | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
The search is done segment by segment, checking whether each segment fits the selected filters. You can search for EXACT or CONCORDANCE matches in these fields: "Filters":"
It's possible to apply a filter with just the SearchMode: for example, if you send "authorSearchMode": "exact" but no "author" field, the search looks for segments where the author field is empty.
"timestampSpanStart":"20000121T115234Z": you should set both timestamp parameters (timestampSpanStart and timestampSpanEnd) to apply this filter; otherwise an error is returned. Check the output to see how the filter was parsed and applied.
"logicalOr": 1
Instead of returning segments, just count them and return the counter in "NumOfFoundSegments":22741
"sourceLang":"en-GB": language filters can be applied with the major language feature. In this case the source language is applied as an exact filter, but for the target language the service checks whether the languages are in the same language group. That check is done in the languages.xml file with the isPreferred flag.
"GlobalSearchOptions":"SEARCH_FILTERS_LOGICAL_OR|SEARCH_EXACT_MATCH_OF_SRC_LANG_OPT, lang = en-GB|SEARCH_GROUP_MATCH_OF_TRG_LANG_OPT, lang = de" (from the response)
Another parameter you can send is "searchPosition":"8:1", the position in the B-tree where the search starts internally. The search is limited by the number of found segments (set by numResults) or by a timeout (set by msSearchAfterNumResults), but the timeout is ignored if there are no segments in the TM that fit the parameters. The maximum numResults is 200.
Here is a search request with all possible parameters:
{
"source":"the",
"sourceSearchMode":"CONTAINS, CASEINSENSETIVE, WHITESPACETOLERANT, INVERTED",
"target":"",
"targetSearchMode":"EXACT, CASEINSENSETIVE",
"document":"evo3_p1137_reports_translation_properties_de_fr_20220720_094902",
"documentSearchMode":"CONTAINS, INVERTED",
"author":"some author",
"timestampSpanStart": "20000121T115234Z",
"timestampSpanEnd": "20240121T115234Z",
"addInfo":"some add info",
"addInfoSearchMode":"CONCORDANCE, WHITESPACETOLERANT",
"context":"context context",
"contextSearchMode":"EXACT",
"sourceLang":"en-GB",
"targetLang":"SV",
"searchPosition": "8:1",
"numResults": 2,
"msSearchAfterNumResults": 25
}
So a request with this body would also work:
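A Python sketch of assembling a body for the new search endpoint while enforcing two rules stated above: the timestamp span needs both ends, and numResults may not exceed 200 (0 meaning "use the default"). `search_body` is a hypothetical helper; the filter keys are passed through as given, and any further validation is left to the service.

```python
def search_body(filters: dict, num_results: int = 5, search_position: str = "") -> dict:
    """Build a body for POST /%service%/%tm_name%/search."""
    has_start = "timestampSpanStart" in filters
    has_end = "timestampSpanEnd" in filters
    if has_start != has_end:
        raise ValueError("timestampSpanStart and timestampSpanEnd must be set together")
    if not 0 <= num_results <= 200:
        raise ValueError("numResults must be between 0 and 200")
    body = dict(filters)
    body["numResults"] = num_results
    if search_position:
        # Opaque position returned by the previous response
        body["searchPosition"] = search_position
    return body
```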
Concordance search | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
Purpose | Returns entries\translations that contain requested segment | |||||||||
Request | POST /%service%/%tm_name%/concordancesearch | |||||||||
Params | Required: searchString - what we are looking for , searchType ["Source"|"Target"|"SourceAndTarget"] - where to look iNumOfProposal - limit of found proposals - max is 20, if 0 → use default value '5' | |||||||||
|
Update entry | |||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Purpose | Updates entry\translation | ||||||||||||||||||
Request | POST /%service%/%tm_name%/entry | ||||||||||||||||||
Params | Only sourceLang, targetLang, source and target are required | ||||||||||||||||||
This request makes changes only in the file buffer (so the files on disk are not changed).
|
Delete entry | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
Purpose | Deletes entry\translation | |||||||||
Request | POST /%service%/%tm_name%/entrydelete | |||||||||
Params | Only sourceLang, targetLang, source, and target are required. Deletion is based on a strict match (including tags and whitespace) of source and target | |||||||||
This request makes changes only in the file buffer (so the files on disk are not changed).
|
Delete entries / mass deletion | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
Purpose | Deletes entries/translations | |||||||||
Request | POST /%service%/%tm_name%/entriesdelete | |||||||||
Params | This starts a reorganize process which, like reorganize, removes bad segments, and also removes segments that match all provided filters (filters combined with logical AND). So if you provide timestamps and addInfo, only segments within the provided timestamp span and with that addInfo are not imported into the new TM (see the reorganize process). | |||||||||
|
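The logical-AND rule above can be illustrated with a Python predicate that decides whether a segment would be dropped. This is a simplified sketch, not the service's filter code: it supports only the timestamp span and exact equality on a few fields, and treating an empty filter set as "delete nothing" is an assumption. The fixed-width `YYYYMMDDTHHMMSSZ` timestamps compare correctly as strings.

```python
def should_delete(segment: dict, filters: dict) -> bool:
    """True if the segment matches every provided filter (logical AND)."""
    checks = []
    if "timestampSpanStart" in filters:
        # Fixed-width timestamps sort lexicographically in chronological order
        timestamp = segment.get("timestamp", "")
        checks.append(filters["timestampSpanStart"] <= timestamp <= filters["timestampSpanEnd"])
    for field in ("addInfo", "author", "document", "context"):
        if field in filters:
            checks.append(segment.get(field, "") == filters[field])
    # Assumption: with no filters at all, nothing is deleted
    return bool(checks) and all(checks)
```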
Save all TMs | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
Purpose | Flushes all filebuffers(TMD, TMI files) into the filesystem. Reset 'Modified' flags for file buffers. Filebuffer is a file instance of .TMD or .TMI loaded into RAM. It provides better speed and safety when working with files. | |||||||||
Request | GET /%service%_service/savetms | |||||||||
Params | - | |||||||||
|
Shutdown service | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
Purpose | Safely shuts down the service with/without saving all loaded TM files to disk | |||||||||
Request | GET /%service%_service/shutdown?dontsave=1 | |||||||||
Params | dontsave=1 (optional, in the address) - skips saving TMs; for now the value doesn't matter, only its presence | |||||||||
If TMs are saved before closing, the service first checks whether an import process is still running.
|
Test tag replacement call | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
Purpose | For testing tag replacement | |||||||||
Request | POST /%service%_service/tagreplacement | |||||||||
Params | Required: src, trg, Optional: req | |||||||||
|
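A minimal sketch of a tagreplacement request body. Only the field names src, trg and req come from the table; the interpretation of req (a request segment like the one sent in a fuzzy search) is an assumption, and the URL uses this document's example host and default service name.

```python
import json
from typing import Optional

# Assumptions: host/port as in this document's examples, default %service% name
TAGREPLACEMENT_URL = "http://localhost:4040/t5memory_service/tagreplacement"


def tagreplacement_body(src: str, trg: str, req: Optional[str] = None) -> str:
    payload = {"src": src, "trg": trg}  # both required
    if req is not None:
        # Assumption: req is a request segment (as in a fuzzy search) whose
        # tags the src/trg tags are matched against; it is optional
        payload["req"] = req
    # Pretty JSON ending with "\n}", as the request-format note requires
    return json.dumps(payload, indent=2, ensure_ascii=False)
```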
Configuration of the service
You can configure the service in ~/.t5memory/t5memory.conf (the config file is now obsolete; use command-line flags instead).
Logging | ||
---|---|---|
Level | Mnemonic | Description |
0 | DEVELOP | Can make the code run really slowly. Should be used only when debugging specific places in the code, like binary search in files, etc. |
1 | DEBUG | Logs values of variables. Temporary files (in the MEM and TEMP subdirectories), like base64 encoded/decoded TMX files and archives for import/export, are not deleted. |
2 | INFO | Logs top-level function entries, return codes, etc. Default value. |
3 | WARNING | Logs when we reach some commented-out or hardcoded code. Usually the commented-out code has been replaced with new code; if not, it's marked with the ERROR level. |
4 | ERROR | Errors: why and where something fails during parsing, search, etc. |
5 | FATAL | You shouldn't reach this code; something is really wrong. |
6 | TRANSACTION | Logs only things like the beginning/end of a request. There is usually no reason to set the level this high. |
Other values are ignored. The set level stays the same until you change it in a new request or close the app. Logs are supposed to be written to a file named with the date/time under ~/.OtmMemoryService/Logs, and errors/fatal messages are supposed to be duplicated in another log file with the FATAL suffix.
Logging can impact application speed a lot, especially during import or export. t5memory has 2 logging systems: one comes from the glog library and can be set at launch as a command-line parameter; the other is internal and filters out logs based on their level. The internal one can be set with every request that has a JSON body, by adding the "loggingThreshold": 0 parameter, or at startup with a flag. POST http://localhost:4040/t5memory/example_tm/ { It could also be set by a line in the t5memory.conf file (the config file is obsolete now). |
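A small sketch of the internal level filter and of adding loggingThreshold to an existing JSON body. The level numbers come from the table; the rule "a message passes when its level is at or above the threshold" is an assumption about how the filter compares levels, and the helper names are hypothetical.

```python
import json
from enum import IntEnum


class LogLevel(IntEnum):
    DEVELOP = 0
    DEBUG = 1
    INFO = 2  # default
    WARNING = 3
    ERROR = 4
    FATAL = 5
    TRANSACTION = 6


def is_logged(level: LogLevel, threshold: int) -> bool:
    # Assumption: a message passes when its level is at or above the threshold
    return level >= threshold


def with_logging_threshold(body: dict, threshold: int) -> str:
    # Levels outside 0..6 are ignored by the service, so reject them early here
    if not LogLevel.DEVELOP <= threshold <= LogLevel.TRANSACTION:
        raise ValueError(f"loggingThreshold must be 0..6, got {threshold}")
    # Build a new dict so the caller's body is not modified
    return json.dumps({**body, "loggingThreshold": threshold}, indent=2)
```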
Working directory | |
---|---|
Path | Description |
~/.t5memory | The main directory of the service. Should always be under the home directory. Contains nested folders and the t5memory.conf file (see Config file). All directories/files below are nested in it. |
LOG | Includes log files. It should be cleaned up manually. One session (launch of the service) creates two files, e.g. Log_Thu May 12 10:15:48 2022 .log and Log_Thu May 12 10:15:48 2022 .log_IMPORTANT |
MEM | Main data directory. All TM files are stored here. One TM should include .TMD (data file), .TMI (index file) and .MEM (properties file) files, each named after the TM. |
TABLE | The service's reserved read-only folder with tag tables, languages, etc. |
TEMP | For temporary files, created mainly for import/export. At low log levels (DEVELOP, DEBUG) it should be cleaned manually. |
t5memory.conf | Main config file (see Config file) |
The config directory must be located at this specific place. |
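The directory layout above can be expressed as a few path helpers. This is a hypothetical sketch, not part of t5memory. The MEM row lists a .MEM file, while this document's clone endpoint description says .MEM exists only before 0.5.x, so the sketch makes it optional.

```python
from pathlib import Path
from typing import List, Optional

SUBDIRS = ("LOG", "MEM", "TABLE", "TEMP")


def t5memory_home(home: Optional[Path] = None) -> Path:
    # The main directory always lives directly under the user's home directory
    return (home if home is not None else Path.home()) / ".t5memory"


def tm_files(tm_name: str, home: Optional[Path] = None, with_mem: bool = False) -> List[Path]:
    # Data (.TMD) and index (.TMI) files are named after the TM; the .MEM
    # properties file is included on request (pre-0.5.x layout)
    mem_dir = t5memory_home(home) / "MEM"
    exts = [".TMD", ".TMI"] + ([".MEM"] if with_mem else [])
    return [mem_dir / f"{tm_name}{ext}" for ext in exts]
```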
Config file - obsolete - use commandline flags instead | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
field | default | Description | |||||||||
name | t5memory | name of service that we use under %service% in address | |||||||||
port | 8080 | service port | |||||||||
timeout | 3600 | service timeout | |||||||||
threads | 1 | ||||||||||
logLevel | 2 | log level, see Logging | |||||||||
AllowedRAM_MB | 1500 | RAM limit for opening/closing TMs (see Opening and closing TM). Doesn't include the service's own RAM. | |||||||||
TriplesThreshold | 33 | Level of pre-fuzzy-search filtering based on combinations of triples of tokens (excluding tags). Can impact fuzzy search performance: for higher values the service is faster, but it can skip some segments in the result. Not always correlated with the resulting fuzzyRate. | |||||||||
The config file should be located at ~/.t5memory/t5memory.conf. All fields have default values, so the service can start without the config file. Configs are read and applied only once, at service start. Once the service has started, you should be able to see the applied values in the logs.
|
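The "every field has a default, values are resolved once at startup" rule can be modeled as a merge of overrides onto the defaults from the table. How overrides are actually supplied (the obsolete config file or command-line flags) is not modeled, no flag names are assumed, and rejecting unknown keys is a choice of this sketch.

```python
from typing import Any, Dict

# Defaults from the table above
DEFAULTS: Dict[str, Any] = {
    "name": "t5memory",      # used as %service% in URLs
    "port": 8080,
    "timeout": 3600,
    "threads": 1,
    "logLevel": 2,           # INFO
    "AllowedRAM_MB": 1500,   # excludes the service's own RAM
    "TriplesThreshold": 33,  # higher: faster fuzzy search, may skip segments
}


def effective_config(overrides: Dict[str, Any]) -> Dict[str, Any]:
    # Design choice of this sketch: unknown keys are treated as mistakes
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown config fields: {sorted(unknown)}")
    # Start from a copy of the defaults so DEFAULTS itself is never mutated
    config = dict(DEFAULTS)
    config.update(overrides)
    return config
```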
Conceptual information
Opening and closing TM | |
---|---|
In the first concept it was planned to implement routines to open and close a TM. While working on the concept we found some problems with this approach:
This leads to the following conclusion for the implementation of opening and closing TMs: OpenTM2 has to automatically load the requested TMs when they are requested. OpenTM2 also has to close TMs that have not been used for some time. That means OpenTM2 has to track the timestamp when each TM was last requested.
http://opentm2/translationmemory/[TM_Name]/openHandle GET - Opens a memory for queries by OpenTM2. Note: This method is not required, as memories are automatically opened when they are accessed for the first time.
http://opentm2/translationmemory/[TM_Name]/openHandle DELETE - Closes a memory for queries by OpenTM2. Note: This method is not required, as memories are automatically closed after they have not been used for some time.
|
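The open-on-first-access / close-when-idle rule can be sketched as a small registry that tracks when each TM was last requested. `TMRegistry`, the `open_tm` callback and the idle timeout are hypothetical; the RAM limit (AllowedRAM_MB) that also influences closing is not modeled. The clock is injectable so the behavior can be tested without waiting.

```python
import time
from typing import Callable, Dict, List


class TMRegistry:
    """Hypothetical registry that opens TMs on first use and closes idle ones."""

    def __init__(self, open_tm: Callable[[str], object], idle_timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._open_tm = open_tm
        self._idle_timeout_s = idle_timeout_s
        self._clock = clock
        self._tms: Dict[str, object] = {}
        self._last_used: Dict[str, float] = {}

    def get(self, name: str) -> object:
        # No explicit open call: the TM is loaded the first time it is requested
        if name not in self._tms:
            self._tms[name] = self._open_tm(name)
        self._last_used[name] = self._clock()  # track when it was last requested
        return self._tms[name]

    def is_open(self, name: str) -> bool:
        return name in self._tms

    def close_idle(self) -> List[str]:
        # Close every TM not requested within the idle timeout
        now = self._clock()
        idle = [n for n, t in self._last_used.items() if now - t >= self._idle_timeout_s]
        for name in idle:
            # A real implementation would flush the TM's filebuffers here
            del self._tms[name]
            del self._last_used[name]
        return idle
```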
TM files structure and other related info | ||
---|---|---|
The info below applies to version 0_5_x. A TM file is just an archive containing the .TMI and .TMD files. |
NUMBER PROTECTION TAGS (NP TAG, t5:n) | ||
---|---|---|
The NP feature is also implemented in tagReplacer, but it takes a different branch in the code: for import it just saves the original id, r and n attributes without generating new ones; for fuzzy requests it just outputs the original data without searching for a matching tag in src and trg. So NP tags influence ID generation for other tags (or matching, for a trg segment). Example: "Press the encodedRegex, power button to turn on <bpt id="501" rid="1"/>text<ept rid="1"/>" |
Tag replacement
Pseudocode for tag replacement in import call:
TAG_REPLACEMENT PSEUDO CODE
This is the pseudocode that was used as a basis for discussion when finding the right algorithm for the implementation. It was not implemented exactly like this, but its logic should be valid and can be used to understand what should be going on.
...
...
!!! CONSIDER THAT THE SOURCE SEGMENT SHOULD CONTAIN ONLY 3 TYPES OF TAGS (PH_ELEMENT, BPT_ELEMENT and EPT_ELEMENT), because all of them were regenerated with their attributes at the import stage
At this point we read the source and target segments "as is", without any tag replacement in the lists, so original_id is the id that was generated_id at the import stage.
SOURCE_SEGMENT{
<ph x="1" />{
search for matching tag in saved tags:
looking in REQUEST_TAGS in REVERSE for matchingTag which have
[ matchingTag.generated_tagType == PH_ELEMENT //or our_tag.original_tagType
AND matchingTag.generated_id == our_tag.original_id
]
if found
set our_tag.generated_tagType = matchingTag.original_tagType
set our_tag.generated_id = matchingTag.original_id
use that data to generate a tag like <our_tag.generated_tagType id="{our_tag.generated_id}" />
else
maybe just return <x/> tag?
save tag in SOURCE_TAGS
}
<bpt i="1" x="2"/> {
search for matching tag in saved tags:
looking in REQUEST_TAGS in REVERSE for matchingTag which have
[ matchingTag.generated_tagType == BPT_ELEMENT //or our_tag.original_tagType
AND matchingTag.generated_id == our_tag.original_id
]
if found
set our_tag.generated_tagType = matchingTag.original_tagType
set our_tag.generated_id = matchingTag.original_id
set our_tag.generated_i = matchingTag.original_i
if matchingTag.original_tagType == BX_ELEMENT // do BX_ELEMENT always have id and rid attributes provided?
use that data to generate a tag like <our_tag.generated_tagType id="{our_tag.generated_id}" rid="{our_tag.generated_i}" />
else:
[rid="{our_tag.generated_i}"] - means optional, so for example if it's bigger than 0, then we should add this attribute
use that data to generate a tag like <our_tag.generated_tagType [id="{our_tag.generated_id}"] [rid="{our_tag.generated_i}"] >
else
maybe just return <bx/> tag?
save tag in SOURCE_TAGS
}
<ept i="1" /> {
search for matching tag in saved tags:
looking in REQUEST_TAGS in REVERSE for matchingTag which have
[ matchingTag.generated_tagType == EPT_ELEMENT //or our_tag.original_tagType
AND matchingTag.generated_id == our_tag.original_id // id should hold information about paired BPT_ELEMENT, or it's absence
]
if found
set our_tag.generated_tagType = matchingTag.original_tagType
set our_tag.generated_id = matchingTag.original_id
set our_tag.generated_i = matchingTag.original_i
if matchingTag.original_tagType == EX_ELEMENT // do EX_ELEMENT always have id and rid attributes provided?
use that data to generate a tag like <our_tag.generated_tagType id="{our_tag.generated_id}" rid="{our_tag.generated_i}" />
else:
[rid="{our_tag.generated_i}"] - means optional, so for example if it's bigger than 0, then we should add this attribute
use that data to generate a tag like </our_tag.generated_tagType>
else
maybe just return <ex/> tag? or add some specific attributes?
save tag in SOURCE_TAGS
}
}
}
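The PH branch of the pseudocode above can be sketched in Python as follows. The data model is hypothetical: tag types are plain element names (e.g. "ph" for PH_ELEMENT, "x" for an original xliff x tag), only the PH case is covered, and the `<x/>` fallback mirrors the open question in the pseudocode rather than a confirmed behavior.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TagInfo:
    # Tag types are plain element names here, e.g. "ph" or "x" (an assumption)
    original_tagType: str
    original_id: int
    generated_tagType: str = ""
    generated_id: int = 0


def find_matching(request_tags: List[TagInfo], tag_type: str, original_id: int) -> Optional[TagInfo]:
    # Look through REQUEST_TAGS in REVERSE, as in the pseudocode
    for candidate in reversed(request_tags):
        if candidate.generated_tagType == tag_type and candidate.generated_id == original_id:
            return candidate
    return None


def replace_ph(our_tag: TagInfo, request_tags: List[TagInfo], source_tags: List[TagInfo]) -> str:
    match = find_matching(request_tags, "ph", our_tag.original_id)
    if match is not None:
        our_tag.generated_tagType = match.original_tagType
        our_tag.generated_id = match.original_id
        text = f'<{our_tag.generated_tagType} id="{our_tag.generated_id}"/>'
    else:
        # The pseudocode leaves this open ("maybe just return <x/> tag?")
        text = "<x/>"
    source_tags.append(our_tag)  # save tag in SOURCE_TAGS
    return text
```

Because the search runs in reverse, when several request tags normalize to the same ph id, the last one in the request segment wins.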
NEW PSEUDO CODE
This is the code that was actually implemented.
The tag replacement feature implementation is split into 2 functions:
GenerateReplacingTag - input - tagType, attributeList
output - tagInfo
This function generates the tagInfo data structure, which saves the original data (tagType and attributes; only i/rid and x/id) and generates new data that suits the context/segment.
PrintTag - input - tagInfo
output - text representation of the tag with attributes depending on context
This function prints the tag with its attributes (if they exist, i.e. are bigger than 0). If it's a fuzzy call, it replaces the tags in the source and target segments with the matching tags from the fuzzy search request.
If a matching tag is not found, it generates a new tag in xliff format with id or rid attributes that increase, starting from the biggest id and rid values present in the requested segment plus 1.
For the fuzzy search request segment, this function prints the tag with the generated data. This is never used in production, but it can be used to find out how the mechanism normalized the input fuzzy search request segment
(we base tag matching on this normalization).
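The fallback for unmatched tags and the "print attributes only if bigger than 0" rule can be sketched like this. `NewTagIds` and `print_tag` are hypothetical stand-ins, not the real GenerateReplacingTag/PrintTag, and the self-closing output style follows the NP example earlier in this document.

```python
from typing import Iterable


class NewTagIds:
    """Hypothetical allocator for tags that have no match in the request segment."""

    def __init__(self, request_ids: Iterable[int], request_rids: Iterable[int]) -> None:
        # Start from the biggest id/rid present in the requested segment, plus 1
        self.next_id = max(request_ids, default=0) + 1
        self.next_rid = max(request_rids, default=0) + 1

    def take_id(self) -> int:
        value, self.next_id = self.next_id, self.next_id + 1
        return value

    def take_rid(self) -> int:
        value, self.next_rid = self.next_rid, self.next_rid + 1
        return value


def print_tag(name: str, tag_id: int = 0, rid: int = 0) -> str:
    # Attributes are printed only if they exist, i.e. are bigger than 0
    attrs = ""
    if tag_id > 0:
        attrs += f' id="{tag_id}"'
    if rid > 0:
        attrs += f' rid="{rid}"'
    return f"<{name}{attrs}/>"


if __name__ == "__main__":
    ids = NewTagIds(request_ids=[1, 3], request_rids=[2])
    pair_rid = ids.take_rid()                        # one rid shared by a bpt/ept pair
    print(print_tag("bpt", ids.take_id(), pair_rid))  # <bpt id="4" rid="3"/>
    print(print_tag("ept", rid=pair_rid))             # <ept rid="3"/>
```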
...
Previous documentation:
|
...