Overview and API introduction
This document describes the REST interface of the translate5 TM service.
The translate5 TM service is built on the OpenTM2 Translation Memory Engine.
It provides the following functionality:
- import new OpenTM2 TMs
- delete OpenTM2 TMs
- create a new empty OpenTM2 TM
- import TMX
- open TM and close TM: not possible, see the extra section in this document. A trigger to flush a TM to disk may be needed, but it could also be done in some specific cases...
- query a TM for matches: one query per TM; querying multiple TMs at once is not supported
- query a TM for a concordance search
- extract a segment by its location
- save a new entry to a TM
- delete an entry from a TM
- locally clone a TM
- reorganize a TM
- get some statistics about the service
- test the tag replacement mechanism via the tagreplacement endpoint
This is achieved by the following specification of a RESTful HTTP service; the specification is given in the following form:
- URL of the HTTP resource, where the server name and an optional path prefix are configurable
- HTTP method with the affected functionality
- brief description
- sent and returned body
Request Data Format:
The data transferred in requests is JSON and is placed directly in the request body. It should be pretty-printed JSON ending with the characters '\n}', because of a bug in proxygen that caused garbage to appear after valid data.
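A client can satisfy the pretty-printing requirement with any JSON serializer that indents output; a minimal sketch (the payload fields are just illustrative):

```python
import json

# Build a request body as pretty-printed JSON, as t5memory expects.
payload = {"name": "examle_tm", "sourceLang": "bg-BG"}
body = json.dumps(payload, indent=4)

# A pretty-printed top-level object always closes with a newline + '}'.
print(body.endswith("\n}"))  # True
```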
URL Format:
In this document, the OpenTM2 instance is always assumed to be reachable under http://opentm2/.
To support full networking features (proxying etc.), the URL is configurable in Translate5, so the OpenTM2 instance can also reside under http://xyz/foo/bar/.
Errors
The possible errors are listed below for each resource. In case of an error, the body should contain at least the following JSON; where it makes sense, attributes of the original representation can be added.
{
errors: [{errorMsg: 'Given tmxData is no TMX.'}]
}
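Since every error body carries an "errors" array of this shape, a client can pull the messages out with a small helper (a sketch; the function name is my own):

```python
def extract_error_messages(body):
    """Return the error messages from a t5memory-style error body.

    Expects the documented shape: {"errors": [{"errorMsg": "..."}]}.
    Returns an empty list for bodies without an "errors" array.
    """
    return [e.get("errorMsg", "") for e in body.get("errors", [])]

msgs = extract_error_messages(
    {"errors": [{"errorMsg": "Given tmxData is no TMX."}]}
)
```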
Updated for v0.6.75
To configure t5memory, use its command-line flags. To list all flags, start t5memory with the help flag: ./t5memory --help. All flags that belong to t5memory itself (and not to its libraries) are listed under the otmd.cpp section. You can also send a GET request to t5memory_service/flags, which prints all flags with their descriptions, current values and default values. Here are those flags:
Flags from /home/or/workspace/translate5/translate5-tm-service-source/source/otmd.cpp:
-add_premade_socket (if set to true, socket instance would be created
outside of proxygen and then binded, that made possible to add tcp backog
event handler and use socket_backog option) type: bool default: false
currently: true
-allowLoadingMultipleTmsSimultaneously (If set to true, multiple tms could
be loaded from the disk at the same time. ) type: bool default: false
-allowedram (Sets amought RAM(in MB) allowed for service to use)
type: int64 default: 5000
-allowedtmdsize (Sets max size of tmd file(in MB) after which t5m would not
allow to add new data to the tm) type: int64 default: 500
-debug_sleep_in_request_run (If set, provide artificial delay in every
request handling execution equal to provided num of microseconds)
type: int64 default: 0 currently: 10000000
-disable_aslr (If set to true, process personality would be set to
ADDR_NO_RANDOMIZE) type: bool default: false currently: true
-enable_newlines_in_logs ((not working)if set to true, would keep newline
symbols in the logs, otherwise(by default) newlines would be removed and
logs would be oneliners) type: bool default: false
-flush_tm_at_shutdown (If set to true, flushes tm when shutting down the
app not using shutdown request) type: bool default: false
-flush_tm_to_disk_with_every_update (If set to true, flushes tm to disk
with every successfull update request) type: bool default: false
-forbiddeletefiles (Set to true to keep all files(including temporary and
tm)) type: bool default: false
-http_listen_backlog (Sets http options listen backog) type: int64
default: 128 currently: 32
-ignore_newer_target_exists_check (if set to true, check for newer already
saved target would be skipped for saving segments) type: bool
default: true
-keep_tm_backups (if set to true, when saving tmd and tmi files, old copies
would be saved with .old suffix) type: bool default: true
-limit_num_of_active_requests (If set to true, it would be possible to
handle only up to servicethreads-1 requests at the same time, the last
thread would respond with 503 to eliminate creating queue of requests
waiting to be handled.) type: bool default: false
-logMutexes (if set to true you would see mutex logs) type: bool
default: false
-log_every_request_end (Sets log for every request end with it's url,
method etc...) type: bool default: false
-log_every_request_start (Sets log for every request call with it's url,
method etc...) type: bool default: false
-log_memmove_in_compareputdata (if set to true, when saving segment and
causing memmove in compareputdata functions, just before memmove, data
would be logged - use this to debug btree crashes.) type: bool
default: false
-log_tcp_backog_events (if set to true, tcp backlog events would be
logged(to enable, add_premade_socket flag should be set to true))
type: bool default: false currently: true
-port (What port to listen on) type: int32 default: 4080
-servicename (Sets service name to use in url) type: string
default: "t5memory"
-servicethreads (Sets amought of worker threads for service) type: int32
default: 5
-socket_backlog (Sets proxygen socket listen backog(disabled, to enable set
add_premade_socket=true)) type: int64 default: 1024 currently: 32
-t5_ip (Which ip to use in t5memory(default is any). Should be in format
'1.1.1.1', default is to listen to all available ip) type: string
default: ""
-t5loglevel (Sets t5memory log level threshold from DEVELOP(0) to
TRANSACTION(6)) type: int32 default: 2 currently: 3
-timeout (Sets timeout for service request handling) type: int32
default: 180000
-tmListLockDefaultTimeout (Sets tm mutex lock timeout(in ms) for TM
list(which is used to open and close tms, and hold list of opened tms),
after which operation would be canceled and mutex would return an error,
if set to 0, mutex lock would be waited without timeout) type: int64
default: 3000
-tmLockDefaultTimeout (Sets tm mutex lock timeout(in ms) for TM after which
operation would be canceled and mutex would return an error, if set to 0,
mutex lock would be waited without timeout) type: int64 default: 3000
-tmRequestLockDefaultTimeout (Sets tm mutex lock timeout(in ms) for part
where request is requesting tm(which is used to open and close tms, and
hold list of opened tms), after which operation would be canceled and
mutex would return an error, if set to 0, mutex lock would be waited
without timeout) type: int64 default: 3000
-triplesthreshold (Sets threshold to pre fuzzy filtering based on hashes of
neibour tokens) type: int32 default: 5
-useTimedMutexesForReorganizeAndImport (If set to true, in reorganize or
import thread would be used mutexes with timeouts, and reorganizee or
import could be canceled, false(by default) - would be used non timed
mutexes) type: bool default: false
-wait_for_import_and_reorganize_requests (If set to true, waiting for all
import and reorganize processes to be done at shutdown when not using
shutdown request) type: bool default: true
Hints:
- -debug_sleep_in_request_run adds a delay to every request
- In theory you can restore a TM from only the tmd file using reorganize, but if an issue occurs during the reorganize, the TM in RAM would be left in an unstable state, so keep the original tmd backed up anyway
- You can filter request logging with --t5loglevel from 0 to 6. --v can be set to 0 or 2; if set to 0, only errors are logged and the transaction log level is mapped to info
- keep_tm_backups: when flushing to disk, the older version is kept with an .old suffix; enabled by default
- triplesthreshold has a big impact on fuzzy search speed, but if you set it too high, some good matches could be filtered out. In old OpenTM2 the value was, I think, 33
Endpoints overview

| # | Endpoint | Purpose | Method | URL | Example | Async? |
|---|---|---|---|---|---|---|
| 1 | Get the list of TMs | Returns a JSON list of TMs | GET | /%service%/ | /t5memory/ | |
| 2 | Create TM | Creates a TM with the provided name | POST | /%service%/ | /t5memory/ | |
| 3 | Create/Import TM in internal format | Imports and unpacks a base64-encoded archive of .TMD, .TMI, .MEM files and renames it to the provided name | POST | /%service%/ | /t5memory/ | |
| 4 | Clone TM locally | Makes a clone of an existing TM | POST | /%service%/%tm_name%/clone | /t5memory/my+TM/clone ('+' is a placeholder for whitespace in the TM name, so there should be 'my TM.TMD' and 'my TM.TMI' (and, pre-0.5.x, 'my TM.MEM') files on disk); the TM name in the URL IS case-sensitive | |
| 5 | Reorganize TM | Reorganizes the TM (replaces it with a new one and reimports the segments from the tmd file) | GET | /%service%/%tm_name%/reorganize | /t5memory/my+other_tm/reorganize | + (in 0.5.x and up) |
| 6 | Delete TM | Deletes the .TMD and .TMI files | DELETE | /%service%/%tm_name%/ | /t5memory/%tm_name%/ | |
| 7 | Import TMX into TM | Imports the provided base64-encoded TMX file into the TM | POST | /%service%/%tm_name%/import | /t5memory/%tm_name%/import | + |
| 8 | Export TMX from TM | Creates a TMX from the TM, base64-encoded | GET | /%service%/%tm_name%/ | /t5memory/%tm_name%/ | |
| 9 | Export in internal format | Creates and exports an archive with the TM's .TMD and .TMI files | GET | /%service%/%tm_name%/ | /t5memory/%tm_name%/ | |
| 10 | Status of TM | Returns the status / import status of the TM | GET | /%service%/%tm_name%/status | /t5memory/%tm_name%/status | |
| 11 | Fuzzy search | Returns entries/translations with small differences from the requested one | POST | /%service%/%tm_name%/fuzzysearch | /t5memory/%tm_name%/fuzzysearch | |
| 12 | Concordance search | Returns entries/translations that contain the requested segment | POST | /%service%/%tm_name%/concordancesearch | /t5memory/%tm_name%/concordancesearch | |
| 13 | Entry update | Updates an entry/translation | POST | /%service%/%tm_name%/entry | /t5memory/%tm_name%/entry | |
| 14 | Entry delete | Deletes an entry/translation | POST | /%service%/%tm_name%/entrydelete | /t5memory/%tm_name%/entrydelete | |
| 15 | Save all TMs | Flushes all file buffers (TMD, TMI files) to the filesystem | GET | /%service%_service/savetms | /t5memory_service/savetms | |
| 16 | Shutdown service | Flushes all file buffers to the filesystem and shuts down the service | GET | /%service%_service/shutdown | /t5memory_service/shutdown | |
| 17 | Test tag replacement call | For testing tag replacement | POST | /%service%_service/tagreplacement | /t5memory_service/tagreplacement | |
| 18 | Resources | Returns resources and service data | GET | /%service%_service/resources | /t5memory_service/resources | |
| 19 | Import TMX from a local file (in the removing-lookuptable git branch) | Like import TMX, but uses a local file path instead of a base64-encoded file | POST | /%service%/%tm_name%/importlocal | /t5memory/%tm_name%/importlocal | + |
| 20 | Mass deletion of entries (from v0.6.0) | Like reorganize, but skips the import of segments for which the provided filters, combined with logical AND, return true | POST | /%service%/%tm_name%/entriesdelete | /t5memory/tm1/entriesdelete | + |
| 21 | New concordance search (from v0.6.0) | Extended concordance search that can search in different fields of the segment | POST | /%service%/%tm_name%/search | /t5memory/tm1/search | |
| 22 | Get segment by internal key | Extracts a segment by its location in the tmd file | POST | /%service%/%tm_name%/getentry | /t5memory/tm1/getentry | |
| 23 | NEW import TMX | Imports TMX in non-base64 format | POST | /%service%/%tm_name%/importtmx | /t5memory/tm1/importtmx | + |
| 24 | NEW import in internal format (tm) | Extracts the TM zip attached to the request (it should contain the tmd and tmi files) into the MEM folder | POST | /%service%/%tm_name%/ ("multipart/form-data") | /t5memory/tm1/ ("multipart/form-data") | |
| 25 | NEW export TMX | Exports the TMX as a file; can export a selected number of segments starting from a selected position | GET (may have a body) | /%service%/%tm_name%/download.tmx | /t5memory/tm1/download.tmx | |
| 26 | NEW export TM (internal format) | Exports the TM archive | GET | /%service%/%tm_name%/download.tm | /t5memory/tm1/download.tm | |
| 27 | Flush TM | If the TM is open, flushes it to disk (implemented in 0.6.33) | GET | /%service%/%tm_name%/flush | /t5memory/tm1/flush | |
| 28 | Flags | Returns all available command-line flags (implemented in 0.6.47). Do not call it too often: the gflags documentation notes that listing flags is slow. Useful for collecting t5memory configuration data when debugging | GET | /%service%_service/flags | /t5memory_service/flags | |
Available end points
List of TMs
Purpose | Returns a JSON list of TMs
Request | GET /%service%/
Params | -
Returns the list of open TMs, then the list of TMs available on disk (excluding the open ones).
Response example:
{
"Open": [
{
"name": "mem2"
}
],
"Available on disk": [
{
"name": "mem_internal_format"
},
{
"name": "mem1"
},
{
"name": "newBtree3"
},
{
"name": "newBtree3_cloned"
}
]
}
"Open" - the TM is loaded in RAM; "Available on disk" - the TM is on disk but not yet loaded.
Create TM
Purpose | Creates a TM with the provided name (tmd and tmi files in the /MEM/ folder)
Request | POST /%service%/
Params | Required: name, sourceLang
Request example:
{
  "name": "examle_tm",          // used as the file name for the .TMD and .TMI files
  "sourceLang": "bg-BG",        // must match a language in languages.xml
  "data": "base64_encoded_archive_see_import_in_internal_format",  // optional
  "loggingThreshold": 0         // optional
}
This endpoint works in two ways: creating a new TM (then sourceLang is required and data can be omitted) or importing an archived .tm file (then sourceLang can be omitted, but data is required). It would be possible to add a memDescription at this stage, but this should be explored further if needed.
Response example:
Success:
{
  "name": "examle_tm"
}
TM already exists:
{
  "ReturnValue": 7272,
  "ErrorMsg": "::ERROR_MEM_NAME_EXISTS:: TM with this name already exists: examle_tm1; res = 0"
}
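The two modes above (create a new TM vs. import an archived one) can be captured in a small request-body builder; this is a sketch, and the helper name is my own, but the field names follow the examples above:

```python
import json

def build_create_tm_body(name, source_lang=None, data=None, logging_threshold=None):
    """Build the JSON body for POST /%service%/ (create or import a TM).

    Either source_lang (create a new, empty TM) or data (import a
    base64-encoded .tm archive) must be given, mirroring the two modes
    this endpoint supports.
    """
    if source_lang is None and data is None:
        raise ValueError("either sourceLang or data is required")
    body = {"name": name}
    if source_lang is not None:
        body["sourceLang"] = source_lang  # must match a language in languages.xml
    if data is not None:
        body["data"] = data  # base64-encoded archive of the .tmd/.tmi files
    if logging_threshold is not None:
        body["loggingThreshold"] = logging_threshold
    # Pretty-printed, so the body ends with '\n}' as the service expects.
    return json.dumps(body, indent=4)
```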
Create/Import TM in internal format
Purpose | Imports and unpacks a base64-encoded archive of .TMD, .TMI, .MEM (pre-0.5.x) files and renames it to the provided name
Request | POST /%service%/
Params | { "name": "examle_tm", "sourceLang": "bg-BG", "data": "base64EncodedArchive" }
Do not import TMs created by another version of t5memory. Starting from 0.5.x, the tmd and tmi files carry the t5memory version they were created with in the file header, and a different middle or global version is treated as a version mismatch. Instead, export a TMX in the corresponding version, create a new empty TM, and import the TMX in the new version.
This creates example_tm.TMD (data file) and example_tm.TMI (index file) in the MEM folder. If "data" is provided, "sourceLang" is not required, and vice versa: the base64 data should be a base64-encoded .tm file, which is just an archive containing the .tmd and .tmi files. If no "data" is provided, a new TM is created; "sourceLang" must then be provided and must match a language in languages.xml. Starting from 0.6.52, import in internal format supports multipart/form-data, so you can send both the file and a json_body part. In json_body only the "name" attribute is required (sourceLang is ignored anyway).
Send it the same way as a streaming TMX import. The JSON body should be pretty-printed and placed in a part called json_body to be parsed correctly.
Request example:
Requestname"mem_internal_format", "data":"UEsDBBQACAgIAPmrhVQAAAAAAAAAAAAAAAAWAAQAT1RNXy1JRDE3NS0wXzJfNV9iLk1FTQEAAADtzqEKgDAQgOFTEHwNWZ5swrAO0SBys6wfWxFBDILv6uOI2WZQw33lr38GbvRIsm91baSiigzFEjuEb6XHEK\/myX0PXtXsyxS2OazwhLDWeVTaWgEFMMYYY\/9wAlBLBwhEWTaSXAAAAAAAAAAACAAAAAAAAFBLAwQUAAgICAD5q4VUAAAAAAAAAAAAAAAAFgAEAE9UTV8tSUQxNzUtMF8yXzVfYi5UTUQBAAAA7d3Pa5JxHMDxz+Ns09phDAYdPfaDyQqWRcYjS9nGpoYZhBeZMCISW2v2g5o6VkqQONk\/0KVzh4IoKAovnboUo1PHbuuwU8dSn8c9Pk2yTbc53y+R5\/P9fL7P1wf5Ps9zep5vIOy3iMiSiPLn0yPrQ7In+rStTQARi\/bV9chEyHcxGPIKAGDnPonl21SsHNmUYNgfHZ70nnKNDo9ET0dHozFn2L+Ll9uxZPzazPz1mYQAAAAAAAAAAAAAAAAAAAAAAAAAANDtBkXRoj5Zk7OqSFZ9q35Vn6khNa6W2wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAdBKbKHK4Em1omT5DxV6J7FrmkKFypBKt9FczvYaKtr+2DLpiqPTWVayGiq2uYjFUpC7VI6aElN8F8JPn\/QEAAAAAAAAAAAAAAAAAAAAAAAAAAAAA2ANW7U0Ag9Iv60MnT4j8uLBZ\/X5+7dxn1ztX6Uy5AgAAAAAAAAAAAAAAAAAAgA6nL1qFjmc1rAO2IwNN9bL9u4ulVUeEfcQqQAfxSNtltshZaytB7jalZZ2a5KhFGT3Qr\/ztv1pkzAnP1v06+F7UxL22tRzSNf6aFq08MdoiY078\/znmkTZo5Qm2YdoOSLSyDdbaVUop\/Cj3cDm14I6\/uqf++nDUN1u4lS+k9MbKXL4QK72+775U+phOpp8sucdK728X5nK5hVT+weJqbTiHjMiNzWG1yNxWvI8rvxZ9cTfycj71NH1nsZgbf54uJlKryWy6GFlueBT6xHrzJRupDqkPXc9eyyduJmbLkf6\/mlYRDgQDPtO++3\/uYvsazANfYHx68vLEsSvOKedxqa\/hAGowD4Jh\/1X\/dH1X5sEBZpoH6E6\/AVBLBwj3gRyzjAIAAAAAAAAAAAEAAAAAAFBLAwQUAAgICAD5q4VUAAAAAAAAAAAAAAAAFgAEAE9UTV8tSUQxNzUtMF8yXzVfYi5UTUkBAAAA7d3PS9NhHMDxz\/Y1nbp0zfw2Vw6CEjooJkkFPs9DZZaFCiIRHRxKoJUIFXk06iB0kS5Fvw6dhDp28FDgOSqiIKQ\/ICQMhIIuYVnJt2f7eK2M2Ps1xp49b8Y+fP6ArXegJy4iV0RiPx6BNAXyT6ysrKhXlLZ49PwlkKP9hw\/19XcKAOD3PZX42+PDP0+JWN9AT765u3P33vbm1nxbvj0\/3DLQ0y3r5uClsZGhC2eGxgUAAAAAAAAAAAAAAAAAAAAAAAAAgFKXllh0ahQbLHeInDb3Xc6NWrF77Jibcr22zC2YY6bVLNoX5qp97Pa5SbPc8ci8sqHpd1k7a2+ZN+6eFQAAAAAAAAAAAAAAAAAAAAAAAAAAAAD4YxISk8bVUyq6eVa905dtqtxO3fBlqyqnkrW+ZFVZCGp8aVDl9ZeELxlVjhRNsEWVa+UffAlVuf78rC\/1eoK20JfNqnzt3OhLnSp1DZW+bFJl\/467vqRUuVxV5UutKts\/JX2pUWUyXvie9OopE5U7QWEHSfWZXdmPvlSr8i75xJcqVT7fPOdLpSqj5+t9Sahy8UBhOxWqLEph6nJVHhZNvUFPXbS3MlXyYWFvgSon3xf2FldlpGiCmCoPiiYQVbLR3or\/ZT0tS04AAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAMC6K4t+ZSAtOWkKQpOSeTfnZty0m3CDrsu1uNB9swv2pZ21IlN23J6w1uZsuV0y82bOzJhpM2EGTZdpMaERAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPjrUmteK0RypXifid5n1tyX6j7+9\/vvUEsHCGo104BhAgAAAAAAAAAAAQAAAAAAUEsBAgAAFAAICAgA912FVERZNpJcAAAAAAgAABYABAAAAAAAAAAAALSBAAAAAE9UTV8tSUQxNzUtMF8yXzVfYi5NRU0BAAAAUEsBAgAAFAAICAgA\/F2FVPeBHLOMAgAAAAABABYABAAAAAAAAAAAALSBrAAAAE9UTV8tSUQxNzUtMF8yXzVfYi5UTUQBAAAAUEsBAgAAFAAICAgA\/F2FVGo104BhAgAAAAABABYABAAAAAAAAAAAALSBiAMAAE9UTV8tSUQxNzUtMF8yXzVfYi5UTUkBAAAAUEsGBiwAAAAAAAAAHgMtAAAAAAAAAAAAAwAAAAAAAAADAAAAAAAAANgAAAAAAAAAOQYAAAAAAABQSwYHAAAAABEHAAAAAAAAAQAAAFBLBQYAAAAAAwADANgAAAA5BgAAAAA=" }
Response example:
Success:
{
  "name": "mem_internal_format"
}
TM already exists:
{
  "ReturnValue": 65535,
  "ErrorMsg": ""
}
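The "data" value is simply a base64-encoded zip archive containing the .TMD and .TMI files. Building one could look like this (a sketch; the helper name and file contents are illustrative, real .TMD/.TMI files come from an export):

```python
import base64
import io
import zipfile

def encode_tm_archive(files):
    """Zip the given {filename: bytes} mapping in memory and base64-encode it,
    producing a value suitable for the "data" field."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    return base64.b64encode(buf.getvalue()).decode("ascii")

# Illustrative placeholder contents only.
data = encode_tm_archive({"examle_tm.TMD": b"tmd bytes", "examle_tm.TMI": b"tmi bytes"})
```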
Clone TM locally
Purpose | Makes a clone of an existing TM under a new name
Request | POST /%service%/%tm_name%/clone
Params | Required: newName
Endpoint is sync (blocking)
Request example:
{
  "newName": "examle_tm"  // the cloned TM is renamed to this name (the source TM is given in the URL)
}
Response example:
Success:
{
"msg": "newBtree3_cloned2 was cloned successfully",
"time": "5 ms"
}
Failure:
{
"ReturnValue": -1,
"ErrorMsg": "'dstTmdPath' = /home/or/.t5memory/MEM/newBtree3_cloned.TMD already exists; for request for mem newBtree3; with body = {\n \"newName\": \"newBtree3_cloned\"\n}"
}
Flush TM
Purpose | If the TM is open, flushes it to disk
Request | GET /%service%/%tm_name%/flush
Params | -
Endpoint is sync (blocking).
If the TM is not found on disk, returns 404.
If the TM is not open, returns 400 with a message.
Otherwise t5memory requests the write pointer to the TM (so it waits until other requests working with the TM finish) and then flushes it to disk.
It can also return an error if the flush itself fails.
It does not open the TM if it is not already open; instead it returns an error.
Response example:
Success: {
"msg": "Mem test1 was flushed to the disk successfully"
}
Failure:
{
"ReturnValue": -1,
"ErrorMsg": "FlushMemRequestData::checkData -> tm is not found"
}
// or
{
"ReturnValue": -1,
"ErrorMsg": "FlushMemRequestData::checkData -> tm is not open"
} |
Delete TM
Purpose | Deletes the .TMD and .TMI files (and the .MEM file in pre-0.5.x versions)
Request | DELETE /%service%/%tm_name%/
Params | -
Response example:
Success:
{
  "newBtree3_cloned2": "deleted"
}
Failure:
{
  "newBtree3_cloned2": "not found"
}
Import provided base64-encoded TMX file into TM
Purpose | Imports the provided base64-encoded TMX file into the TM. Starts another thread for the import; use the status call to check the import status
Request | POST /%service%/%tm_name%/import
Params | { "tmxData": "base64EncodedTmxFile" }
additional:
"framingTags":
"saveAll" - default behaviour, do nothing
"skipAll" - skip all enclosing tags, including standalone tags
"skipPaired" - skip only paired enclosing tags
The TM must exist. The call is async, so check progress using the status endpoint, as with reorganize (in 0.5.x and up). If the framing tag situation is the same in source and target, both sides are treated as described above. If framing tags exist only in the source, they are still treated as described above. If they exist only in the target, nothing is removed.
Request example:
{
  "framingTags": "skipAll",  // optional; one of "saveAll", "skipAll", "skipPaired"
"tmxData": "PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0idXRmLTgiPz4KPHRteCB2ZXJzaW9uPSIxLjQiPgogIDxoZWFkZXIgY3JlYXRpb250b29sPSJTREwgTGFuZ3VhZ2UgUGxhdGZvcm0iIGNyZWF0aW9udG9vbHZlcnNpb249IjguMCIgby10bWY9IlNETCBUTTggRm9ybWF0IiBkYXRhdHlwZT0ieG1sIiBzZWd0eXBlPSJzZW50ZW5jZSIgYWRtaW5sYW5nPSJlbi1HQiIgc3JjbGFuZz0iYmctQkciIGNyZWF0aW9uZGF0ZT0iMjAxNTA4MjFUMDkyNjE0WiIgY3JlYXRpb25pZD0idGVzdCIvPgogIDxib2R5PgoJPHR1IGNyZWF0aW9uZGF0ZT0iMjAxODAyMTZUMTU1MTA1WiIgY3JlYXRpb25pZD0iREVTS1RPUC1SNTlCT0tCXFBDMiIgY2hhbmdlZGF0ZT0iMjAxODAyMTZUMTU1MTA4WiIgY2hhbmdlaWQ9IkRFU0tUT1AtUjU5Qk9LQlxQQzIiIGxhc3R1c2FnZWRhdGU9IjIwMTgwMjE2VDE2MTMwNVoiIHVzYWdlY291bnQ9IjEiPgogICAgICA8dHV2IHhtbDpsYW5nPSJiZy1CRyI+CiAgICAgICAgPHNlZz5UaGUgPHBoIC8+IGVuZDwvc2VnPgogICAgICA8L3R1dj4KICAgICAgPHR1diB4bWw6bGFuZz0iZW4tR0IiPgogICAgICAgIDxzZWc+RXRoIDxwaCAvPiBkbmU8L3NlZz4KICAgICAgPC90dXY+CiAgICA8L3R1PgogIDwvYm9keT4KPC90bXg+Cg=="
}
Response example:
An error is returned in case of failure. From v0_2_15, { "%tm_name%": "" } is returned in case of success.
Check the status of the import using the status call.
A TMX import can be interrupted if the XML is invalid or the TM reaches its size limit. In both cases, check the status request for information about the position in the TMX file where the import was interrupted.
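Preparing the import body then amounts to base64-encoding the TMX file; a minimal sketch (the helper name is my own, and framingTags is optional):

```python
import base64
import json

def build_tmx_import_body(tmx_bytes, framing_tags=None):
    """Build the JSON body for POST /%service%/%tm_name%/import."""
    body = {"tmxData": base64.b64encode(tmx_bytes).decode("ascii")}
    if framing_tags is not None:
        # one of "saveAll" (default), "skipAll", "skipPaired"
        body["framingTags"] = framing_tags
    return json.dumps(body, indent=4)

tmx = b'<?xml version="1.0" encoding="utf-8"?>\n<tmx version="1.4"><body></body></tmx>'
body = build_tmx_import_body(tmx, framing_tags="skipAll")
```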
examle_tm" // when cloning, cloned tm would be renamed to this name(source tm is in url)
}
Response example:
Success:
{
"msg": "newBtree3_cloned2 was cloned successfully",
"time": "5 ms"
}
Failure:
{
"ReturnValue": -1,
"ErrorMsg": "'dstTmdPath' = /home/or/.t5memory/MEM/newBtree3_cloned.TMD already exists; for request for mem newBtree3; with body = {\n \"newName\": \"newBtree3_cloned\"\n}"
} |
|
Flush TM |
---|
Purpose | If TM is open, flushes it to the disk |
Request | Get /%service%/%tm_name%/flush |
Params |
|
Endpoint is sync (blocking). If the TM is not found on disk, it returns 404. If the TM is not open, it returns 400 with a message. Otherwise t5memory requests a write pointer to the TM (so it waits until other requests working with the TM have finished) and then flushes it to disk. Flushing itself can also fail with an error. The endpoint does not open the TM if it is not already open; it returns an error instead. |
| Response example:
Success:
{
    "msg": "Mem test1 was flushed to the disk successfully"
}
Failure:
{
    "ReturnValue": -1,
    "ErrorMsg": "FlushMemRequestData::checkData -> tm is not found"
}
// or
{
    "ReturnValue": -1,
    "ErrorMsg": "FlushMemRequestData::checkData -> tm is not open"
} |
|
Delete TM |
---|
Purpose | Deletes .TMD, .TMI, .MEM files |
Request | Delete /%service%/%tm_name%/ |
Params | - |
| Response example:
Success:
{
    "newBtree3_cloned2": "deleted"
}
Failure:
{
    "newBtree3_cloned2": "not found"
} |
|
Import provided base64 encoded TMX file into TM |
---|
Purpose | Import provided base64 encoded TMX file into TM. Starts another thread for the import. For checking the import status use the status call |
Request | POST /%service%/%tm_name%/import |
Params | {"tmxData": "base64EncodedTmxFile" } - additional:
"framingTags": "saveAll" - default behaviour, do nothing "skipAll" - skip all enclosing tags, including standalone tags "skipPaired" - skip only paired enclosing tags
|
TM must exist. The call is async, so check the progress using the status endpoint, like with reorganize in 0.5.x and up. If the framing-tag situation is the same in source and target, both sides are treated as described above. If framing tags exist only in the source, they are still treated as described above. If they exist only in the target, nothing is removed. |
| Request example:
{
    ["framingTags": "skipAll"], // optional: "saveAll" (default) | "skipAll" | "skipPaired"
"tmxData": "PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0idXRmLTgiPz4KPHRteCB2ZXJzaW9uPSIxLjQiPgogIDxoZWFkZXIgY3JlYXRpb250b29sPSJTREwgTGFuZ3VhZ2UgUGxhdGZvcm0iIGNyZWF0aW9udG9vbHZlcnNpb249IjguMCIgby10bWY9IlNETCBUTTggRm9ybWF0IiBkYXRhdHlwZT0ieG1sIiBzZWd0eXBlPSJzZW50ZW5jZSIgYWRtaW5sYW5nPSJlbi1HQiIgc3JjbGFuZz0iYmctQkciIGNyZWF0aW9uZGF0ZT0iMjAxNTA4MjFUMDkyNjE0WiIgY3JlYXRpb25pZD0idGVzdCIvPgogIDxib2R5PgoJPHR1IGNyZWF0aW9uZGF0ZT0iMjAxODAyMTZUMTU1MTA1WiIgY3JlYXRpb25pZD0iREVTS1RPUC1SNTlCT0tCXFBDMiIgY2hhbmdlZGF0ZT0iMjAxODAyMTZUMTU1MTA4WiIgY2hhbmdlaWQ9IkRFU0tUT1AtUjU5Qk9LQlxQQzIiIGxhc3R1c2FnZWRhdGU9IjIwMTgwMjE2VDE2MTMwNVoiIHVzYWdlY291bnQ9IjEiPgogICAgICA8dHV2IHhtbDpsYW5nPSJiZy1CRyI+CiAgICAgICAgPHNlZz5UaGUgPHBoIC8+IGVuZDwvc2VnPgogICAgICA8L3R1dj4KICAgICAgPHR1diB4bWw6bGFuZz0iZW4tR0IiPgogICAgICAgIDxzZWc+RXRoIDxwaCAvPiBkbmU8L3NlZz4KICAgICAgPC90dXY+CiAgICA8L3R1PgogIDwvYm9keT4KPC90bXg+Cg=="
}
Response example:
An error object in case of error. From v0_2_15, in case of success:
{ "%tm_name%": "" }
Check the status of the import using the status call.
TMX import can be interrupted by invalid XML or by the TM reaching its limit. In both cases, use the status request to get the position in the TMX file where the import was interrupted. |
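As a sketch of a client for this call (the helper name and the use of Python are illustrative, not part of the service), the base64 request body can be built like this:

```python
import base64
import json

def build_import_body(tmx_text: str, framing_tags: str = "saveAll") -> str:
    """Build the JSON body for POST /%service%/%tm_name%/import.

    The TMX content goes base64-encoded into "tmxData"; "framingTags" is
    optional ("saveAll" | "skipAll" | "skipPaired").
    """
    body = {
        "framingTags": framing_tags,
        "tmxData": base64.b64encode(tmx_text.encode("utf-8")).decode("ascii"),
    }
    # indent=1 yields pretty-printed JSON that ends with a newline before the
    # closing brace, matching the request-format note in the introduction.
    return json.dumps(body, indent=1)

payload = build_import_body('<?xml version="1.0"?><tmx version="1.4"><body/></tmx>')
```

The resulting string is posted as-is; the pretty-printed form satisfies the "ends with '\n}'" requirement mentioned under Request Data Format.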
|
Values |
---|
%service% | Name of the service (default - t5memory; can be changed in the t5memory.conf file) |
%tm_name% | Name of Translation Memory |
Example | http://localhost:4040/t5memory/examle_tm/fuzzysearch/? |

| # | Endpoint | Description | Method | URL | default endpoint/example | Is async? |
|---|----------|-------------|--------|-----|--------------------------|-----------|
| 1 | Get the list of TMs | Returns JSON list of TMs | GET | /%service%/ | /t5memory/ | |
| 2 | Create TM | Creates TM with the provided name | POST | /%service%/ | /t5memory/ | |
| 3 | Create/Import TM in internal format | Imports and unpacks a base64 encoded archive of the .TMD, .TMI, .MEM files and renames it to the provided name | POST | /%service%/ | /t5memory/ | |
| 4 | Clone TM locally | Makes a clone of an existing TM | POST | /%service%/%tm_name%/clone | /t5memory/my+TM/clone ("+" is a placeholder for whitespace in the TM name, so 'my TM.TMD' and 'my TM.TMI' (and, pre-0.5.x, 'my TM.MEM') files should exist on disk); the TM name IS case-sensitive in the URL | |
| 5 | Reorganize TM | Reorganizes the TM (replaces it with a new one and reimports the segments from the .TMD) | GET | /%service%/%tm_name%/reorganize | /t5memory/my+other_tm/reorganize | + (in 0.5.x and up) |
| 6 | Delete TM | Deletes the .TMD, .TMI files | DELETE | /%service%/%tm_name%/ | /t5memory/%tm_name%/ | |
| 7 | Import TMX into TM | Imports the provided base64 encoded TMX file into the TM | POST | /%service%/%tm_name%/import | /t5memory/%tm_name%/import | + |
| 8 | Export TMX from TM | Creates a TMX from the TM, encoded in base64 | GET | /%service%/%tm_name%/ | /t5memory/%tm_name%/ | |
| 9 | Export in internal format | Creates and exports an archive with the .TMD, .TMI files of the TM | GET | /%service%/%tm_name%/ | /t5memory/%tm_name%/ | |
| 10 | Status of TM | Returns the status / import status of the TM | GET | /%service%/%tm_name%/status | /t5memory/%tm_name%/status | |
| 11 | Fuzzy search | Returns entries/translations with small differences from the requested one | POST | /%service%/%tm_name%/fuzzysearch | /t5memory/%tm_name%/fuzzysearch | |
| 12 | Concordance search | Returns entries/translations that contain the requested segment | POST | /%service%/%tm_name%/concordancesearch | /t5memory/%tm_name%/concordancesearch | |
| 13 | Entry update | Updates an entry/translation | POST | /%service%/%tm_name%/entry | /t5memory/%tm_name%/entry | |
| 14 | Entry delete | Deletes an entry/translation | POST | /%service%/%tm_name%/entrydelete | /t5memory/%tm_name%/entrydelete | |
| 15 | Save all TMs | Flushes all file buffers (.TMD, .TMI files) to the filesystem | GET | /%service%_service/savetms | /t5memory_service/savetms | |
| 16 | Shutdown service | Flushes all file buffers to the filesystem and shuts the service down | GET | /%service%_service/shutdown | /t5memory_service/shutdown | |
| 17 | Test tag replacement call | For testing tag replacement | POST | /%service%_service/tagreplacement | /t5memory_service/tagreplacement | |
| 18 | Resources | Returns resources and service data | GET | /%service%_service/resources | /t5memory_service/resources | |
| 19 | Import TMX from a local file (in the "removing lookuptable" git branch) | Like import TMX, but uses a local path to the file instead of a base64 encoded file | POST | /%service%/%tm_name%/importlocal | /t5memory/%tm_name%/importlocal | + |
| 20 | Mass deletion of entries (from v0.6.0) | Like reorganize, but skips the reimport of segments for which the provided filters, combined with logical AND, return true | POST | /%service%/%tm_name%/entriesdelete | /t5memory/tm1/entriesdelete | + |
| 21 | New concordance search (from v0.6.0) | Extended concordance search that can search in different fields of the segment | POST | /%service%/%tm_name%/search | /t5memory/tm1/search | |
| 22 | Get segment by internal key | Extracts a segment by its location in the .TMD file | POST | /%service%/%tm_name%/getentry | /t5memory/tm1/getentry | |
| 23 | NEW Import TMX | Imports a TMX in non-base64 format | POST | /%service%/%tm_name%/importtmx | /t5memory/tm1/importtmx | + |
| 24 | NEW import in internal format (.tm) | Extracts the TM zip attached to the request (it should contain the .TMD and .TMI files) into the MEM folder | POST | /%service%/%tm_name%/ ("multipart/form-data") | /t5memory/tm1/ ("multipart/form-data") | |
| 25 | NEW export TMX | Exports the TMX as a file; can export a selected number of segments starting from a selected position | GET (may have a body) | /%service%/%tm_name%/download.tmx | /t5memory/tm1/download.tmx | |
| 26 | NEW export TM (internal format) | Exports the TM archive | GET | /%service%/%tm_name%/download.tm | /t5memory/tm1/download.tm | |

Available end points
List of TMs |
---|
Purpose | Returns JSON list of TMs |
Request | GET /%service%/ |
Params | - |
Returns the list of open TMs, followed by the list of available (excluding open) TMs in the app. |
| Response example:
{
"Open": [
{
"name": "mem2"
}
],
"Available on disk": [
{
"name": "mem_internal_format"
},
{
"name": "mem1"
},
{
"name": "newBtree3"
},
{
"name": "newBtree3_cloned"
}
]
}
"Open" - the TM is loaded into RAM; "Available on disk" - the TM is not yet loaded from disk. |
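A minimal Python sketch for consuming this response (the key names are taken from the example above; the helper itself is illustrative):

```python
import json

def tm_names(response_text: str):
    """Split the GET /%service%/ response into (open, on-disk) TM name lists."""
    data = json.loads(response_text)
    opened = [tm["name"] for tm in data.get("Open", [])]
    on_disk = [tm["name"] for tm in data.get("Available on disk", [])]
    return opened, on_disk

sample = '{"Open":[{"name":"mem2"}],"Available on disk":[{"name":"mem1"}]}'
```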
|
Create TM |
---|
Purpose | Creates a TM with the provided name (.TMD and .TMI files in the /MEM/ folder) |
Request | Post /%service%/%tm_name%/ |
Params | Required: name, sourceLang |
| Request example:
{
    "name": "examle_tm", // this name is used as the filename for the .TMD and .TMI files
    "sourceLang": "bg-BG", // should match a language in languages.xml
    ["data": "base64_encoded_archive_see_import_in_internal_format"],
    ["loggingThreshold": 0]
}
This endpoint works in two ways: creating a new TM (then sourceLang is required and data can be skipped) or importing an archived .tm file (then sourceLang can be skipped, but data is required). It would be possible to add a memDescription at this stage, but this should be explored further if needed.
Response example:
Success:
{
    "name": "examle_tm"
}
TM already exists:
{
"ReturnValue": 7272,
"ErrorMsg": "::ERROR_MEM_NAME_EXISTS:: TM with this name already exists: examle_tm1; res = 0"
} |
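The either/or rule between sourceLang and data described above can be sketched as a small body builder (the field names mirror the request example; the validation is an assumption based on the description, not observed server behaviour):

```python
import json
from typing import Optional

def build_create_body(name: str, source_lang: Optional[str] = None,
                      data_b64: Optional[str] = None) -> str:
    """Body for POST /%service%/: new empty TM (sourceLang) or .tm import (data)."""
    if source_lang is None and data_b64 is None:
        raise ValueError("either sourceLang (new TM) or data (.tm import) is required")
    body = {"name": name}
    if source_lang is not None:
        body["sourceLang"] = source_lang  # should match a language in languages.xml
    if data_b64 is not None:
        body["data"] = data_b64           # base64-encoded .tm archive (.TMD + .TMI)
    return json.dumps(body, indent=1)
```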
|
|
Create/Import TM in internal format |
---|
Purpose | Imports and unpacks a base64 encoded archive of the .TMD, .TMI, .MEM (in pre-0.5.x versions) files and renames it to the provided name |
Request | POST /%service%/ |
Params | { "name": "examle_tm", "sourceLang": "bg-BG" , "data":"base64EncodedArchive" } or alternatively data could be provided in non-base64 binary format as a file attached to the request |
curl -X POST \
-H "Content-Type: application/json" \
-F "file=@/path/to/12434615271d732fvd7te3.gz;filename=myfile.tg" \
-F "json_data={\"name\": \"TM name\", \"sourceLang\": \"en-GB\"}" \
http://t5memory:4045/t5memory
Do not import TMs created in another version of t5memory. Starting from 0.5.x, the .TMD and .TMI files carry in their header the t5memory version they were created with, and a different middle (0.5.x) or global version would be reported as a version mismatch. Instead, export a TMX in the corresponding version, then create a new empty TM and import the TMX in the new version.
This creates example_tm.TMD (data file) and example_tm.TMI (index file) in the MEM folder.
If "data" is provided, "sourceLang" is not required, and vice versa - the base64 data should be a base64 encoded .tm file (which is just an archive containing the .TMD and .TMI files).
If there is no "data", a new TM is created; "sourceLang" must then be provided and must match a language in languages.xml.
In 0.6.20 and up, data can be sent as an attachment instead of base64 encoded. Content-Type should then be set to "multipart/form-data" and the JSON (with the name of the new TM) should be provided under the json_data key (the lookup is done this way:
part.headers.at("Content-Disposition").find("name=\"json_data\"")
).
curl command example:
curl -X POST \
-H "Content-Type: application/json" \
-F "file=@/path/to/12434615271d732fvd7te3.tm;filename=myfile.tm" \
-F "json_data={\"name\": \"TM name\", \"sourceLang\": \"en-GB\"}" \
http://t5memory:4045/t5memory
Response example: { "name": "examle_tm" } |
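For the 0.6.20+ attachment variant, the multipart body can be assembled with the standard library alone; the part names mirror the Content-Disposition lookup quoted above, while the boundary handling and filename are a generic sketch:

```python
import json
import uuid

def build_multipart(tm_bytes: bytes, name: str, source_lang: str):
    """Build a multipart/form-data body with 'file' and 'json_data' parts."""
    boundary = uuid.uuid4().hex
    json_part = json.dumps({"name": name, "sourceLang": source_lang})
    parts = []
    for disposition, payload in (
        ('form-data; name="file"; filename="myfile.tm"', tm_bytes),
        ('form-data; name="json_data"', json_part.encode("utf-8")),
    ):
        parts.append(
            f"--{boundary}\r\nContent-Disposition: {disposition}\r\n\r\n".encode()
            + payload + b"\r\n"
        )
    parts.append(f"--{boundary}--\r\n".encode())
    # Returns (body, Content-Type header value) for the POST request.
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```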
Request example:{ "name": "mem_internal_format", "data":"UEsDBBQACAgIAPmrhVQAAAAAAAAAAAAAAAAWAAQAT1RNXy1JRDE3NS0wXzJfNV9iLk1FTQEAAADtzqEKgDAQgOFTEHwNWZ5swrAO0SBys6wfWxFBDILv6uOI2WZQw33lr38GbvRIsm91baSiigzFEjuEb6XHEK\/myX0PXtXsyxS2OazwhLDWeVTaWgEFMMYYY\/9wAlBLBwhEWTaSXAAAAAAAAAAACAAAAAAAAFBLAwQUAAgICAD5q4VUAAAAAAAAAAAAAAAAFgAEAE9UTV8tSUQxNzUtMF8yXzVfYi5UTUQBAAAA7d3Pa5JxHMDxz+Ns09phDAYdPfaDyQqWRcYjS9nGpoYZhBeZMCISW2v2g5o6VkqQONk\/0KVzh4IoKAovnboUo1PHbuuwU8dSn8c9Pk2yTbc53y+R5\/P9fL7P1wf5Ps9zep5vIOy3iMiSiPLn0yPrQ7In+rStTQARi\/bV9chEyHcxGPIKAGDnPonl21SsHNmUYNgfHZ70nnKNDo9ET0dHozFn2L+Ll9uxZPzazPz1mYQAAAAAAAAAAAAAAAAAAAAAAAAAANDtBkXRoj5Zk7OqSFZ9q35Vn6khNa6W2wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAdBKbKHK4Em1omT5DxV6J7FrmkKFypBKt9FczvYaKtr+2DLpiqPTWVayGiq2uYjFUpC7VI6aElN8F8JPn\/QEAAAAAAAAAAAAAAAAAAAAAAAAAAAAA2ANW7U0Ag9Iv60MnT4j8uLBZ\/X5+7dxn1ztX6Uy5AgAAAAAAAAAAAAAAAAAAgA6nL1qFjmc1rAO2IwNN9bL9u4ulVUeEfcQqQAfxSNtltshZaytB7jalZZ2a5KhFGT3Qr\/ztv1pkzAnP1v06+F7UxL22tRzSNf6aFq08MdoiY078\/znmkTZo5Qm2YdoOSLSyDdbaVUop\/Cj3cDm14I6\/uqf++nDUN1u4lS+k9MbKXL4QK72+775U+phOpp8sucdK728X5nK5hVT+weJqbTiHjMiNzWG1yNxWvI8rvxZ9cTfycj71NH1nsZgbf54uJlKryWy6GFlueBT6xHrzJRupDqkPXc9eyyduJmbLkf6\/mlYRDgQDPtO++3\/uYvsazANfYHx68vLEsSvOKedxqa\/hAGowD4Jh\/1X\/dH1X5sEBZpoH6E6\/AVBLBwj3gRyzjAIAAAAAAAAAAAEAAAAAAFBLAwQUAAgICAD5q4VUAAAAAAAAAAAAAAAAFgAEAE9UTV8tSUQxNzUtMF8yXzVfYi5UTUkBAAAA7d3PS9NhHMDxz\/Y1nbp0zfw2Vw6CEjooJkkFPs9DZZaFCiIRHRxKoJUIFXk06iB0kS5Fvw6dhDp28FDgOSqiIKQ\/ICQMhIIuYVnJt2f7eK2M2Ps1xp49b8Y+fP6ArXegJy4iV0RiPx6BNAXyT6ysrKhXlLZ49PwlkKP9hw\/19XcKAOD3PZX42+PDP0+JWN9AT765u3P33vbm1nxbvj0\/3DLQ0y3r5uClsZGhC2eGxgUAAAAAAAAAAAAAAAAAAAAAAAAAgFKXllh0ahQbLHeInDb3Xc6NWrF77Jibcr22zC2YY6bVLNoX5qp97Pa5SbPc8ci8sqHpd1k7a2+ZN+6eFQAAAAAAAAAAAAAAAAAAAAAAAAAAAAD4YxISk8bVUyq6eVa905dtqtxO3fBlqyqnkrW+ZFVZCGp8aVDl9ZeELxlVjhRNsEWVa+UffAlVuf78rC\/1eoK20JfNqnzt3OhLnSp1DZW+bFJl\/467vqRUuVxV5UutKts\/JX2pUWUyXvie9OopE5U7QWEHSfWZXdmPvlSr8i75xJcqVT7fPOdLpSqj5+t9Sahy8UBhOxWqLEph6nJVHhZNvUFPXbS3MlXyYWFvgSon3xf2FldlpGiCmCoPiiYQVbLR3or\/ZT0tS0
4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMC6K4t+ZSAtOWkKQpOSeTfnZty0m3CDrsu1uNB9swv2pZ21IlN23J6w1uZsuV0y82bOzJhpM2EGTZdpMaERAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPjrUmteK0RypXifid5n1tyX6j7+9\/vvUEsHCGo104BhAgAAAAAAAAAAAQAAAAAAUEsBAgAAFAAICAgA912FVERZNpJcAAAAAAgAABYABAAAAAAAAAAAALSBAAAAAE9UTV8tSUQxNzUtMF8yXzVfYi5NRU0BAAAAUEsBAgAAFAAICAgA\/F2FVPeBHLOMAgAAAAABABYABAAAAAAAAAAAALSBrAAAAE9UTV8tSUQxNzUtMF8yXzVfYi5UTUQBAAAAUEsBAgAAFAAICAgA\/F2FVGo104BhAgAAAAABABYABAAAAAAAAAAAALSBiAMAAE9UTV8tSUQxNzUtMF8yXzVfYi5UTUkBAAAAUEsGBiwAAAAAAAAAHgMtAAAAAAAAAAAAAwAAAAAAAAADAAAAAAAAANgAAAAAAAAAOQYAAAAAAABQSwYHAAAAABEHAAAAAAAAAQAAAFBLBQYAAAAAAwADANgAAAA5BgAAAAA=" }
// you can skip "data" if you send the file as an attachment, but then set Content-Type to multipart/form-data and send the JSON under the json_data key
TM already exists:
{
"ReturnValue": 65535,
"ErrorMsg": ""
} |
|
Clone TM locally |
---|
Purpose | Creates a local clone of the source TM under the provided name |
Request | Post /%service%/%tm_name%/clone |
Params | Required: newName |
Endpoint is sync (blocking). |
| Request example
{ "newName": "examle_tm" // when cloning, cloned tm would be renamed to this name(source tm is in url)
}
Response example:
Success:
{
"msg": "newBtree3_cloned2 was cloned successfully",
"time": "5 ms"
}
Failure:
{
"ReturnValue": -1,
"ErrorMsg": "'dstTmdPath' = /home/or/.t5memory/MEM/newBtree3_cloned.TMD already exists; for request for mem newBtree3; with body = {\n \"newName\": \"newBtree3_cloned\"\n}"
} |
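Several endpoints in this document (flush, clone, create) report failures through a ReturnValue/ErrorMsg pair in the body; a small illustrative helper to normalize that into an exception:

```python
import json

class T5MemoryError(Exception):
    """Raised when a t5memory response body carries a non-zero ReturnValue."""

def check_response(body_text: str) -> dict:
    """Return the parsed body, raising T5MemoryError on ReturnValue != 0."""
    data = json.loads(body_text)
    if data.get("ReturnValue", 0) != 0:
        raise T5MemoryError(data.get("ErrorMsg", "unknown error"))
    return data
```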
|
Import binary TMX file into TM |
---|
Purpose | Imports the attached TMX file (non-base64) into the TM. Starts another thread for the import. For checking the import status use the status call |
Request | POST /%service%/%tm_name%/importtmx |
Params | Request has a file attached and, optionally, a body. Implemented in 0.6.19. Curl command to test:
curl -X POST \
-H "Content-Type: application/json" \
-F "file=@/path/to/12434615271d732fvd7te3.tmx;filename=myfile.tmx" \
-F "json_data={\"framingTags\": \"value\", \"timeout\": 1500}" \
http://t5memory:4045/t5memory/{memory_name}/importtmx
The body should be provided in the multipart form under the json_data key.
Body (optional):
{
    ["framingTags": "saveAll"], // framing-tags behaviour
    ["timeout": 100] // timeout in seconds after which the import stops, even if it has not reached the end of the TMX yet
}
"framingTags": "saveAll" - default behaviour, do nothing; "skipAll" - skip all enclosing tags, including standalone tags; "skipPaired" - skip only paired enclosing tags
|
TM must exist. The call is async, so check the progress using the status endpoint. TMX import can be interrupted by invalid XML, by the TM reaching its limit, or by the timeout. In all cases, use the status request to get the position in the TMX file where the import was interrupted. If the framing-tag situation is the same in source and target, both sides are treated as described above. If framing tags exist only in the source, they are still treated as described above. If they exist only in the target, nothing is removed.
|
Reorganize TM |
---|
Purpose | Reorganizes the TM and fixes issues. |
Request | GET /%service%/%tm_name%/reorganize |
Headers | Accept - application/xml |
Up to v0.4.x reorganize is synchronous; starting from 0.5.x it is asynchronous, so you can check the status of a reorganize the same way you check the status of importTMX.
Under the hood it creates a new TM with the $Org- prefix, reimports all segments one by one, then deletes the original TM and renames the reorganized TM to replace the original. This request flushes the TM (from RAM to disk) before reorganizing.
Reorganize checks this condition:
if (fValidXmlInSrc && fValidXmlInTrg && (pProposal->getSourceLen() != 0) && (pProposal->getTargetLen() != 0) &&
    (szTargetLanguage[0] != EOS) && (szTagTable[0] != EOS) )
and, if the condition is true, passes the segment to the putProposal function, which is also used by the update and importTMX requests, so other issues could be connected to updating the new TM. In 0.4.48 the reorganize response looks like this:
{ "too_long_reorg_0_4": "reorganized", "time": "37 sec", "reorganizedSegmentCount": "9424", "invalidSegmentCount": "0" }
If there are invalid segments, inspect the t5memory log.
|
...
Export TMX from TM - old |
---|
Purpose | Creates TMX from tm. |
Request | GET /%service%/%tm_name%/ |
Headers | Accept - application/xml |
This endpoint flushes the TM before execution.
| Response example:
<?xml version="1.0" encoding="UTF-8" ?>
<tmx version="1.4">
<header creationtoolversion="0.2.14" gitCommit="60784cf * refactoring and cleanup" segtype="sentence" adminlang="en-us" srclang="en-GB" o-tmf="t5memory" creationtool="t5memory" datatype="xml" />
<body>
<tu tuid="1" datatype="xml" creationdate="20190401T084052Z">
<prop type="tmgr:segNum">10906825</prop>
<prop type="tmgr:markup">OTMXML</prop>
<prop type="tmgr:docname">none</prop>
<tuv xml:lang="en-GB">
<prop type="tmgr:language">English(U.K.)</prop>
<seg>For > 100 setups.</seg>
</tuv>
<tuv xml:lang="de-DE">
<prop type="tmgr:language">GERMAN(REFORM)</prop>
<seg>Für > 100 Aufstellungen.</seg>
</tuv>
</tu>
</body>
</tmx> |
|
Export TMX from TM |
---|
Purpose | Exports TMX from tm. |
Request | GET /%service%/%tm_name%/download.tmx |
Headers | Accept - application/xml |
curl | curl --location --request GET 'http://localhost:4040/t5memory/{MEMORY_NAME}/download.tmx' \ --header 'Accept: application/xml' \ --header 'Content-Type: application/json' \ --data '{"startFromInternalKey": "7:1", "limit": 20}' |
The request may have a body with these fields:
- startFromInternalKey - sets the starting point for the export, in "recordKey:targetKey" format
- limit - sets the maximum number of segments to be exported
- loggingThreshold - as in other requests
In the response headers you get NextInternalKey: 19:1 - the key of the next item in the memory if one exists, otherwise the same key you sent, so you can repeat the call with the new starting position. If no body is provided, the export runs from the beginning (key 7:1) to the end.
This endpoint flushes the TM before execution.
| Response example:
<?xml version="1.0" encoding="UTF-8" ?>
<tmx version="1.4">
<header creationtoolversion="0.2.14" gitCommit="60784cf * refactoring and cleanup" segtype="sentence" adminlang="en-us" srclang="en-GB" o-tmf="t5memory" creationtool="t5memory" datatype="xml" />
<body>
<tu tuid="1" datatype="xml" creationdate="20190401T084052Z">
<prop type="tmgr:segNum">10906825</prop>
<prop type="tmgr:markup">OTMXML</prop>
<prop type="tmgr:docname">none</prop>
<tuv xml:lang="en-GB">
<prop type="tmgr:language">English(U.K.)</prop>
<seg>For > 100 setups.</seg>
</tuv>
<tuv xml:lang="de-DE">
<prop type="tmgr:language">GERMAN(REFORM)</prop>
<seg>Für > 100 Aufstellungen.</seg>
</tuv>
</tu>
</body>
</tmx> |
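The NextInternalKey paging described above lends itself to a simple export loop; this sketch leaves the HTTP call to a caller-supplied function that returns the chunk text plus the NextInternalKey header, since paging stops when the server echoes back the key that was sent:

```python
def export_all(fetch_chunk, start_key="7:1", limit=20):
    """Collect all pages from GET /%service%/%tm_name%/download.tmx.

    fetch_chunk(start_key, limit) must return (tmx_text, next_key), where
    next_key comes from the NextInternalKey response header.
    """
    chunks, key = [], start_key
    while True:
        tmx, next_key = fetch_chunk(key, limit)
        chunks.append(tmx)
        if next_key == key:  # server returned the key we sent: nothing left
            return chunks
        key = next_key
```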
|
|
Export TM in internal format |
---|
Purpose | Creates and exports an archive with the .TMD, .TMI files of the TM |
Request | GET /%service%/%tm_name%/download.tm |
Headers | Accept - application/zip |
Returns an archive (.tm file) consisting of the .TMD and .TMI files. This flushes the TM before execution.
| Response example: %binary_data% |
|
|
---|
Purpose | Creates and exports an archive with the .TMD, .TMI and .MEM files of the TM |
Request | GET /%service%/%tm_name%/ |
Headers | application/zip |
Returns an archive (.tm file) consisting of the .tmd and .tmi files. This should flush the TM before execution. Code Block |
---|
language | js |
---|
title | Response |
---|
collapse | true |
---|
| Response example:%binary_data% |
|
...
Fuzzy search |
---|
Purpose | Returns entries/translations with small differences from the requested segment |
Request | POST /%service%/%tm_name%/fuzzysearch |
Params | Required: source, sourceLang, targetLang. Optional: iNumOfProposal - limit of returned proposals; max is 20, if 0 the default value 5 is used
|
Code Block |
---|
language | js |
---|
title | Response |
---|
collapse | true |
---|
| Request example:
{ // required fields
"sourceLang":"en-GB", // langs would be checked with languages.xml
"targetLang":"de",
"source":"For > 100 setups.",
// optional fields
["documentName":"OBJ_DCL-0000000845-004_pt-br.xml"],
["segmentNumber":15],
["markupTable":"OTMXUXLF"], //if there is no markup, default OTMXUXLF would be used.
//Markup tables should be located inside ~/.t5memory/TABLE/%markup$.TBL
["context":"395_408"],
["numOfProposals":20], // num of expected segments in output. By default it's 5
["loggingThreshold": 0]
}
Response example:
Success:
{
"ReturnValue": 0,
"ErrorMsg": "",
"NumOfFoundProposals": 1,
"results": [
{
"source": "The end",
"target": "The target",
"segmentNumber": 0,
"id": "",
"documentName": "Te2.xlf",
"sourceLang": "de-DE",
"targetLang": "EN-GB",
"type": "Manual",
"author": "THOMAS LAURIA",
"timestamp": "20231228T171821Z",
"markupTable": "OTMXUXLF",
"context": "2_3",
"additionalInfo": "",
"internalKey": "7:1",
"matchType": "Fuzzy",
"matchRate": 50,
"fuzzyWords": 0,
"fuzzyDiffs": 0
}
]
}
example 2
{
"ReturnValue": 0,
"ErrorMsg": "",
"NumOfFoundProposals": 1,
"results": [
{
"source": "For > 100 setups.",
"target": "Für > 100 Aufstellungen.",
"segmentNumber": 10906825,
"id": "",
"documentName": "none",
"documentShortName": "NONE",
"sourceLang": "en-GB",
"targetLang": "de-DE",
"type": "Manual",
"matchType": "Exact", // could be exact or fuzzy
"author": "",
"timestamp": "20190401T084052Z",
"matchRate": 100,
"fuzzyWords": -1, // for exact match it would be -1 here and in diffs
"fuzzyDiffs": -1, // otherwise here would be amount of parsed words and diffs that was
// used in fuzzy matchrate calculation
"markupTable": "OTMXML",
"context": "",
"additionalInfo": ""
}
]
}
Not found:
{
"ReturnValue": 133,
"ErrorMsg": "OtmMemoryServiceWorker::concordanceSearch::"
}
For exact matches, a function is used that compares strings while ignoring whitespace. First the normalized strings (without tags) are compared.
If they are the same, t5memory then checks the strings with tags and returns a match rate of 100 or 97 depending on the result.
Then it checks the context match rate and whether the document name is the same (case insensitive).
Then it checks and modifies exactMatchRate according to the code in the code block below.
After that it stores only exact matches with usMatchLevel >= 100. If there are no exact matches, the fuzzy match calculation begins.
If there is at least one exact match, all fuzzy matches are skipped.
If there is only one exact-exact match, its rate is set to 102.
For equal matches with a 100% word match but different whitespace/newlines, each whitespace/newline diff counts as -1%. For punctuation, at least as of 0.4.50, each punctuation mark counts as a word token. This will be changed in the future to count punctuation as whitespace.
For the fuzzy calculation, tags are removed from the text, except t5:np tags, which are replaced with their "r" attribute so that each counts as one word per tag.
For the fuzzy rate calculation, words and then diffs are counted in the normalized string (without tags), using this formula:
if (usDiff < usWords )
{
*pusFuzzy = (usWords != 0) ? ((usWords - usDiff)*100 / usWords) : 100;
}
else
{
*pusFuzzy = 0;
} /* endif */
Regarding the Number Protection feature, tags from number protection are replaced with the regex hashes from their attributes, so each counts as one word. NP tags with the same regex are counted as equal.
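The C formula above can be expressed as a small sketch, where `words` and `diffs` are counted on the normalized, tag-free strings:

```python
def fuzzy_rate(words: int, diffs: int) -> int:
    # Mirrors the C formula above: 0 when diffs >= words,
    # otherwise the percentage of non-differing words.
    if diffs < words:
        return (words - diffs) * 100 // words if words != 0 else 100
    return 0
```

For example, a 10-word segment with 2 differing words yields a match rate of 80.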
To count diffs, t5memory goes through both segments to find matching tokens, looking for a so-called snake, a line of matching tokens.
Then it marks unmatched tokens as INSERTED or DELETED, and based on that it calculates the diffs.
If the rate is 100%, the tags are added back and the strings are compared again.
If they are then not equal, the match rate is changed as follows. This probably never happens, because the exact match test runs before the fuzzy one,
and the exact test runs even if the triple hashes differ (a pre-fuzzy calculation; when they are equal, that can act as a flag that triggers the exact test):
if ( !fStringEqual )
{
if ( usFuzzy > 3 )
{
usFuzzy -= 3;
}
else
{
usFuzzy = 0;
} /* endif */
usFuzzy = std::min( (USHORT)99, usFuzzy );
} /* endif */
Then, depending on the type of translation, the rate may be tweaked:
if ( (usModifiedTranslationFlag == TRANSLFLAG_MACHINE) && (usFuzzy < 100) )
{
// ignore machine fuzzy matches
}
else if ( usFuzzy > TM_FUZZINESS_THRESHOLD )
{
/********************************************************/
/* give MT flag a little less fuzziness */
/********************************************************/
if ( usModifiedTranslationFlag == TRANSLFLAG_MACHINE )
{
if ( usFuzzy > 1 )
{
usFuzzy -= 1;
}
else
{
usFuzzy = 0;
} /* endif */
} /* endif */
if (usFuzzy == 100 && (pGetIn->ulParm & GET_RESPECTCRLF) && !fRespectCRLFStringEqual )
{ // P018279!
usFuzzy -= 1;
}
add to resulting set
} /* endif */
} /* endif */
At the end, the fuzzy request replaces the tags in the proposal from the TM with the tags from the request, and if matchRate >= 100, it calculates the whitespace diffs and applies matchRate -= wsDiffs |
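The adjustments from the C snippets above can be condensed into a loose sketch, not a line-for-line port. The value of TM_FUZZINESS_THRESHOLD is not shown in this document, so it is an assumption here:

```python
TM_FUZZINESS_THRESHOLD = 33  # assumed value; the real constant is not shown in this document

def keep_and_rate(rate, strings_equal_with_tags, is_machine, crlf_differs):
    """Return the adjusted match rate, or None when the proposal is dropped."""
    # Penalty when the strings with tags are not equal (fStringEqual block above)
    if not strings_equal_with_tags:
        rate = min(99, max(0, rate - 3))
    # Machine fuzzy matches below 100 are ignored entirely
    if is_machine and rate < 100:
        return None
    if rate > TM_FUZZINESS_THRESHOLD:
        # Give the MT flag a little less fuzziness
        if is_machine:
            rate = max(0, rate - 1)
        # CRLF-respecting comparison failed on an otherwise 100% match
        if rate == 100 and crlf_differs:
            rate -= 1
        return rate
    return None  # below the fuzziness threshold: not added to the result set
```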
Code Block |
---|
language | js |
---|
title | Response |
---|
collapse | true |
---|
| ExactMatchRate calculation: before this code runs, usExact is equal to 97 or 100, depending on whether the strings with tags are equal ignoring whitespace; then the code applies some tweaks.
pClb is the struct holding proposals from the TM; pGetIn is the fuzzy request's data.
// loop over CLBs and look for best matching entry
{
LONG lLeftClbLen; // left CLB entries in CLB list
PTMX_TARGET_CLB pClb; // pointer for CLB list processing
#define SEG_DOC_AND_CONTEXT_MATCH 8
#define DOC_AND_CONTEXT_MATCH 7
#define CONTEXT_MATCH 6
#define SAME_SEG_AND_DOC_MATCH 5
#define SAME_DOC_MATCH 4
#define MULT_DOC_MATCH 3
#define NORMAL_MATCH 2
#define IGNORE_MATCH 1
SHORT sCurMatch = 0;
// loop over all target CLBs
pClb = pTMXTargetClb;
lLeftClbLen = RECLEN(pTMXTargetRecord) -
pTMXTargetRecord->usClb;
while ( ( lLeftClbLen > 0 ) && (sCurMatch < SAME_SEG_AND_DOC_MATCH) )
{
USHORT usTranslationFlag = pClb->bTranslationFlag;
USHORT usCurContextRanking = 0; // context ranking of this match
BOOL fIgnoreProposal = FALSE;
// apply global memory option file on global memory proposals
if ( pClb->bTranslationFlag == TRANSLFLAG_GLOBMEM ) // pClb it's segment in TM
{
if ( (pGetIn->pvGMOptList != NULL) && pClb->usAddDataLen ) // pGetIn it's fuzzy requests segment
{
USHORT usAddDataLen = NtmGetAddData( pClb, ADDDATA_ADDINFO_ID, pContextBuffer, MAX_SEGMENT_SIZE );
if ( usAddDataLen )
{
GMMEMOPT GobMemOpt = GlobMemGetFlagForProposal( pGetIn->pvGMOptList, pContextBuffer );
switch ( GobMemOpt )
{
case GM_SUBSTITUTE_OPT: usTranslationFlag = TRANSLFLAG_NORMAL; break;
case GM_HFLAG_OPT : usTranslationFlag = TRANSLFLAG_GLOBMEM; break;
case GM_HFLAGSTAR_OPT : usTranslationFlag = TRANSLFLAG_GLOBMEMSTAR; break;
case GM_EXCLUDE_OPT : fIgnoreProposal = TRUE; break;
} /* endswitch */
} /* endif */
} /* endif */
if ( pClb == pTMXTargetClb )
{
usTargetTranslationFlag = usTranslationFlag;
} /* endif */
} /* endif */
// check context strings (if any)
if ((!fIgnoreProposal)
&& pGetIn->szContext[0]
&& pClb->usAddDataLen )
{
USHORT usContextLen = NtmGetAddData( pClb, ADDDATA_CONTEXT_ID, pContextBuffer, MAX_SEGMENT_SIZE );
if ( usContextLen != 0 )
{
usCurContextRanking = NTMCompareContext( pTmClb, pGetIn->szTagTable, pGetIn->szContext, pContextBuffer );
} /* endif */
} /* endif */
// check for matching document names
if ( pGetIn->ulParm & GET_IGNORE_PATH )
{
// we have to compare the real document names rather than comparing the document name IDs
PSZ pszCLBDocName = NTMFindNameForID( pTmClb, &(pClb->usFileId), (USHORT)FILE_KEY );
if ( pszCLBDocName != NULL )
{
PSZ pszName = UtlGetFnameFromPath( pszCLBDocName );
if ( pszName == NULL )
{
pszName = pszCLBDocName;
} /* endif */
fMatchingDocName = stricmp( pszName, pszDocName ) == 0;
}
else
{
// could not access the document name, we have to compare the document name IDs
fMatchingDocName = ((pClb->usFileId == usGetFile) || (pClb->usFileId == usAlternateGetFile));
} /* endif */
}
else
{
// we can compare the document name IDs
fMatchingDocName = ((pClb->usFileId == usGetFile) || (pClb->usFileId == usAlternateGetFile));
} /* endif */
if ( fIgnoreProposal )
{
if ( sCurMatch == 0 )
{
sCurMatch = IGNORE_MATCH;
} /* endif */
}
else if ( usCurContextRanking == 100 )
{
if ( fMatchingDocName && (pClb->ulSegmId >= (pGetIn->ulSegmentId - 1)) && (pClb->ulSegmId <= (pGetIn->ulSegmentId + 1)) )
{
if ( sCurMatch < SEG_DOC_AND_CONTEXT_MATCH )
{
sCurMatch = SEG_DOC_AND_CONTEXT_MATCH;
pTMXTargetClb = pClb; // use this target CLB for match
usTargetTranslationFlag = usTranslationFlag;
usContextRanking = usCurContextRanking;
}
}
else if ( fMatchingDocName )
{
if ( sCurMatch < DOC_AND_CONTEXT_MATCH )
{
sCurMatch = DOC_AND_CONTEXT_MATCH;
pTMXTargetClb = pClb; // use this target CLB for match
usTargetTranslationFlag = usTranslationFlag;
usContextRanking = usCurContextRanking;
}
else if ( sCurMatch == DOC_AND_CONTEXT_MATCH )
{
// we have already a match of this type so check if context ranking
if ( usCurContextRanking > usContextRanking )
{
pTMXTargetClb = pClb; // use newer target CLB for match
usTargetTranslationFlag = usTranslationFlag;
usContextRanking = usCurContextRanking;
}
// use time info to ensure that latest match is used
else if ( usCurContextRanking == usContextRanking )
{
// GQ 2015-04-10 New approach: If we have an exact-exact match use this one, otherwise use timestamp for the comparism
BOOL fExactExactNewCLB = fMatchingDocName && (pClb->ulSegmId >= (pGetIn->ulSegmentId - 1)) && (pClb->ulSegmId <= (pGetIn->ulSegmentId + 1));
BOOL fExactExactExistingCLB = ((pTMXTargetClb->usFileId == usGetFile) || (pTMXTargetClb->usFileId == usAlternateGetFile)) &&
(pTMXTargetClb->ulSegmId >= (pGetIn->ulSegmentId - 1)) && (pTMXTargetClb->ulSegmId <= (pGetIn->ulSegmentId + 1));
if ( fExactExactNewCLB && !fExactExactExistingCLB )
{
// use exact-exact CLB for match
pTMXTargetClb = pClb;
usTargetTranslationFlag = usTranslationFlag;
usContextRanking = usCurContextRanking;
}
else if ( (fExactExactNewCLB == fExactExactExistingCLB) && (pClb->lTime > pTMXTargetClb->lTime) )
{
// use newer target CLB for match
pTMXTargetClb = pClb;
usTargetTranslationFlag = usTranslationFlag;
usContextRanking = usCurContextRanking;
}
} /* endif */
} /* endif */
}
else
{
if ( sCurMatch < CONTEXT_MATCH )
{
sCurMatch = CONTEXT_MATCH;
pTMXTargetClb = pClb; // use this target CLB for match
usTargetTranslationFlag = usTranslationFlag;
usContextRanking = usCurContextRanking;
}
else if ( sCurMatch == CONTEXT_MATCH )
{
// we have already a match of this type so check if context ranking
if ( usCurContextRanking > usContextRanking )
{
pTMXTargetClb = pClb; // use newer target CLB for match
usTargetTranslationFlag = usTranslationFlag;
usContextRanking = usCurContextRanking;
}
// use time info to ensure that latest match is used
else if ( (usCurContextRanking == usContextRanking) && (pClb->lTime > pTMXTargetClb->lTime) )
{
pTMXTargetClb = pClb; // use newer target CLB for match
usTargetTranslationFlag = usTranslationFlag;
usContextRanking = usCurContextRanking;
} /* endif */
} /* endif */
} /* endif */
}
else if ( fMatchingDocName && (pClb->ulSegmId >= (pGetIn->ulSegmentId - 1)) && (pClb->ulSegmId <= (pGetIn->ulSegmentId + 1)) )
{
// same segment from same document available
sCurMatch = SAME_SEG_AND_DOC_MATCH;
pTMXTargetClb = pClb; // use this target CLB for match
usContextRanking = usCurContextRanking;
usTargetTranslationFlag = usTranslationFlag;
}
else if ( fMatchingDocName )
{
// segment from same document available
if ( sCurMatch < SAME_DOC_MATCH )
{
sCurMatch = SAME_DOC_MATCH;
pTMXTargetClb = pClb; // use this target CLB for match
usTargetTranslationFlag = usTranslationFlag;
usContextRanking = usCurContextRanking;
}
else if ( sCurMatch == SAME_DOC_MATCH )
{
// we have already a match of this type so
// use time info to ensure that latest match is used
if ( pClb->lTime > pTMXTargetClb->lTime )
{
pTMXTargetClb = pClb; // use newer target CLB for match
usTargetTranslationFlag = usTranslationFlag;
usContextRanking = usCurContextRanking;
} /* endif */
} /* endif */
}
else if ( pClb->bMultiple )
{
// multiple target segment available
if ( sCurMatch < MULT_DOC_MATCH )
{
// no better match yet
sCurMatch = MULT_DOC_MATCH;
pTMXTargetClb = pClb; // use this target CLB for match
usTargetTranslationFlag = usTranslationFlag;
usContextRanking = usCurContextRanking;
} /* endif */
}
else if ( usTranslationFlag == TRANSLFLAG_NORMAL )
{
// a 'normal' memory match is available
if ( sCurMatch < NORMAL_MATCH )
{
// no better match yet
sCurMatch = NORMAL_MATCH;
pTMXTargetClb = pClb; // use this target CLB for match
usTargetTranslationFlag = usTranslationFlag;
usContextRanking = usCurContextRanking;
} /* endif */
} /* endif */
// continue with next target CLB
if ( sCurMatch < SAME_SEG_AND_DOC_MATCH )
{
lLeftClbLen -= TARGETCLBLEN(pClb);
if (lLeftClbLen > 0)
{
usTgtNum++;
pClb = NEXTTARGETCLB(pClb);
}
} /* endif */
} /* endwhile */
{
BOOL fNormalMatch = (usTargetTranslationFlag == TRANSLFLAG_NORMAL) ||
(usTargetTranslationFlag == TRANSLFLAG_GLOBMEM) ||
(usTargetTranslationFlag == TRANSLFLAG_GLOBMEMSTAR);
switch ( sCurMatch )
{
case IGNORE_MATCH :
usMatchLevel = 0;
break;
case SAME_SEG_AND_DOC_MATCH :
usMatchLevel = fNormalMatch ? usEqual+2 : usEqual-1;
break;
case SEG_DOC_AND_CONTEXT_MATCH :
usMatchLevel = fNormalMatch ? usEqual+2 : usEqual-1; // exact-exact match with matching context
break;
case DOC_AND_CONTEXT_MATCH :
if ( usContextRanking == 100 )
{
// GQ 2015/05/09: treat 100% context matches as normal exact matches
// usMatchLevel = fNormalMatch ? usEqual+2 : usEqual-1;
usMatchLevel = fNormalMatch ? usEqual+1 : usEqual-1;
}
else
{
usMatchLevel = fNormalMatch ? usEqual+1 : usEqual-1;
} /* endif */
break;
case CONTEXT_MATCH :
if ( usContextRanking == 100 )
{
// GQ 2015/05/09: treat 100% context matches as normal exact context matches
// usMatchLevel = fNormalMatch ? usEqual+2 : usEqual-1;
// GQ 2016/10/24: treat 100% context matches as normal exact matches
usMatchLevel = fNormalMatch ? usEqual : usEqual-1;
}
else
{
usMatchLevel = fNormalMatch ? usEqual : usEqual-1;
} /* endif */
break;
case SAME_DOC_MATCH :
usMatchLevel = fNormalMatch ? usEqual+1 : usEqual-1;
break;
case MULT_DOC_MATCH :
usMatchLevel = fNormalMatch ? usEqual+1 : usEqual-1;
break;
default :
usMatchLevel = fNormalMatch ? usEqual : usEqual-1;
break;
} /* endswitch */
}
} |
Here is the structure of a segment from the responses:
{
"source":"in Verbindung 2 fds fdsa amit Befestigungswinkel fdsaf MS-...-WPE-B zur Wandmontage eines sfg Einzelgeräts", // source as it was saved
"sourceNPRepl":"in Verbindung 2 fds fdsa amit Befestigungswinkel fdsaf MS-...-WPE-B zur Wandmontage eines sfg Einzelgeräts", // NP-replaced source - used for fuzzy and triples thresholds - contains no NP tags, only their hashes
"sourceNorm":"in Verbindung 2 fds fdsa amit Befestigungswinkel fdsaf MS-...-WPE-B zur Wandmontage eines sfg Einzelgeräts", // normalized source - used for the fuzzy calculation - contains no tags at all
"target":"In combinahgfd tion with mounting bracket MS-...-WPE-B for wall mounting an individual component ", // saved target
"segmentNumber":1, // internal id generated in the TM, or provided with the update call; can be used together with internalKey as a primary number in the TM
"id":"", // dummy field
"documentName":"Audioscript_Hybrides_Arbeiten.xlsx.sdlxliff",
"sourceLang":"DE-DE", // languages in requests are looked up in languages.xml and the best (or preferred) match is used
"targetLang":"EN-GB",
"type":"Manual",
"author":"PROJECT MANAGER",
"timestamp":"", // if empty, the current time is used
"markupTable":"OTMXUXLF", // always the same; could be refactored and deleted in the future
"context":"390", // context and additionalInfo are saved as additional fields in the CLB (internal data struct, which saves variants of other variables for the same translation, located at the position below, "11:1")
"additionalInfo":"",
"internalKey":"11:1" // internal position of the segment inside the tmd file; can shift when other segments are deleted; both numbers should not be zero
} |
New Concordance search |
---|
Purpose | Returns entries/translations that fit the selected filters. |
Request | POST /%service%/%tm_name%/search |
Params | Required: NONE. Optional: iNumOfProposal - limit of returned proposals; max is 200, if 0 the default value 5 is used |
Search is done segment by segment, checking whether each segment fits the selected filters. You can search for EXACT or CONCORDANCE matches in these fields: source, target, document, author, addInfo, context. To enable a filter, set its SearchMode field, otherwise the filter is disabled. So you have sourceSearchMode, targetSearchMode, documentSearchMode, authorSearchMode, addInfoSearchMode, contextSearchMode
The search mode must be set explicitly to CONTAINS/CONCORDANCE or EXACT, otherwise the filter is ignored. Each searchMode can also carry additional search parameters like "CASEINSENSITIVE, WHITESPACETOLERANT, INVERTED"; their order is not important, and neither is the delimiter. By default the search is case sensitive. If you add the INVERTED option, the check for that filter is inverted. To see how the filters were parsed, check the JSON in the response. The field with that info can look like this: "Filters":" Search filter, field: SOURCE FilterType::CONTAINS SearchStr: 'THE'; Options: SEARCH_FILTERS_NOT|SEARCH_CASEINSENSITIVE_OPT|SEARCH_WHITESPACETOLERANT_OPT|;\n Search filter, field: TARGET FilterType::EXACT SearchStr: ''; Options: SEARCH_CASEINSENSITIVE_OPT|;\n Search filter, field: ADDINFO FilterType::CONTAINS SearchStr: 'some add info'; Options: SEARCH_WHITESPACETOLERANT_OPT|;\n Search filter, field: CONTEXT FilterType::EXACT SearchStr: 'context context'; Options: ;\nSearch filter, field: AUTHOR FilterType::CONTAINS SearchStr: ''; Options: ;\n Search filter, field: DOCUMENT FilterType::CONTAINS SearchStr: 'evo3_p1137_reports_translation_properties_de_fr_20220720_094902'; Options: SEARCH_FILTERS_NOT|;\n Search filter, field: TIMESTAMP FilterType::RANGE Range: 20000121T115234Z - 20240121T115234Z Options: ;\n" It is possible to apply a filter with just the SearchMode: if you send "authorSearchMode": "exact" without an "author" field, the search looks for segments whose author field is empty.
There is also a timespan parameter; to set it, use these fields and this format: "timestampSpanStart":"20000121T115234Z", "timestampSpanEnd":"20240121T115234Z". You must set both parameters to apply the filter, otherwise an error is returned. Check the output to see how it was parsed and applied. By default all the filters mentioned above are combined with logical AND, but you can change that globally by adding "logicalOr": 1; then all the filters mentioned above are combined with logical OR (please use 1 to set this to true; the boolean type is not supported by the JSON parser in t5memory). Supported since 0.6.5
"onlyCountSegments":1 - instead of returning segments, just counts them and returns the counter in "NumOfFoundSegments":22741
There are also language filters; they are always applied to the selection of segments that passed the previous filters, so "logicalOr": 1 does not affect them. To set language filters, use these fields: "sourceLang":"en-GB", "targetLang":"de". Language filters can use the major-language feature: the source language in this example is applied as an exact filter, while the target language checks whether the languages are in the same language group. That check is done against the languages.xml file using the isPreferred flag. Which language filters are active, and whether the filters are combined with logical OR or AND, can be checked in the GlobalSearchOptions field of the response. It can look like this: "GlobalSearchOptions":"SEARCH_FILTERS_LOGICAL_OR|SEARCH_EXACT_MATCH_OF_SRC_LANG_OPT, lang = en-GB|SEARCH_GROUP_MATCH_OF_TRG_LANG_OPT, lang = de". Other fields you can send are: "searchPosition":"8:1", "numResults":2, "msSearchAfterNumResults":250, "loggingThreshold": 4 - see the other requests
The search position is where the search starts internally in the btree. The search is limited by the number of found segments (set by numResults) or by a timeout (set by msSearchAfterNumResults); the timeout is ignored if no segment in the TM fits the parameters. The maximum numResults is 200.
You can send an empty JSON body and the search works fine, it just returns the first 5 segments in the TM. You can iterate over all segments using the two fields "searchPosition":"8:1", "numResults":200 and updating searchPosition with the NewSearchPosition from the response. Code Block |
---|
language | js |
---|
title | Response |
---|
collapse | true |
---|
| {
"logicalOr": 1,
"source":"the",
"sourceSearchMode":"CONTAINS, CASEINSENSETIVE, WHITESPACETOLERANT, INVERTED",
"target":"",
"targetSearchMode":"EXACT, CASEINSENSETIVE",
"document":"evo3_p1137_reports_translation_properties_de_fr_20220720_094902",
"documentSearchMode":"CONTAINS, INVERTED",
"author":"some author",
"authorSearchMode":"CONTAINS",
"timestampSpanStart": "20000121T115234Z",
"timestampSpanEnd": "20240121T115234Z",
"addInfo":"some add info",
"addInfoSearchMode":"CONCORDANCE, WHITESPACETOLERANT",
"context":"context context",
"contextSearchMode":"EXACT",
"sourceLang":"en-GB",
"targetLang":"SV",
"searchPosition": "8:1",
"numResults": 2,
"msSearchAfterNumResults": 25,
"loggingThreshold": 3
}
So here the search works in logical OR mode: if any of the source, target, document, context, author or timestamp filters returns true, the result is added to the set, which is then filtered by sourceLang (exact match check) and targetLang (language group check).
The search starts from position "8:1" (TM data starts at "7:1"; if you want to start from the beginning, just omit that parameter).
numResults: 2 - as soon as 2 segments are found, the search ends.
"msSearchAfterNumResults": 25 - 25 ms after the first found segment the search ends, even if more segments could be found; the response then contains "NewSearchPosition": "10:1", which can be sent as searchPosition to continue the search.
Response example:Success:
example{
"Filters": "Search filter, field: SOURCE FilterType::CONTAINS SearchStr: 'THE'; Options: SEARCH_FILTERS_NOT|SEARCH_CASEINSENSITIVE_OPT|SEARCH_WHITESPACETOLERANT_OPT|;\n
Search filter, field: TARGET FilterType::EXACT SearchStr: ''; Options: SEARCH_CASEINSENSITIVE_OPT|;\n
Search filter, field: ADDINFO FilterType::CONTAINS SearchStr: 'some add info'; Options: SEARCH_WHITESPACETOLERANT_OPT|;\n
Search filter, field: CONTEXT FilterType::EXACT SearchStr: 'context context'; Options: ;\n
Search filter, field: AUTHOR FilterType::CONTAINS SearchStr: ''; Options: ;\n
Search filter, field: DOCUMENT FilterType::CONTAINS SearchStr: 'evo3_p1137_reports_translation_properties_de_fr_20220720_094902'; Options: SEARCH_FILTERS_NOT|;\n
Search filter, field: TIMESTAMP FilterType::RANGE Range: 20000121T115234Z - 20240121T115234Z Options: ;\n",
"GlobalSearchOptions": "SEARCH_FILTERS_LOGICAL_OR|SEARCH_EXACT_MATCH_OF_SRC_LANG_OPT, lang = en-GB|SEARCH_GROUP_MATCH_OF_TRG_LANG_OPT, lang = sv",
"ReturnValue": 0,
"ReturnMessage": "FOUND",
"NewSearchPosition": "10:1",
"results": [
{
"source": "Congratulations on the purchase of a <ph x=\"101\"/> machine control system.",
"target": "Gratulerar till köpet av maskinstyrningsystemet <ph x=\"101\"/>.",
"segmentNumber": 5740419,
"id": "",
"documentName": "none",
"sourceLang": "en-GB",
"targetLang": "SV-SE",
"type": "Manual",
"author": "",
"timestamp": "20170327T091814Z",
"markupTable": "OTMXUXLF",
"context": "",
"additionalInfo": "",
"internalKey": "8:1"
},
{
"source": "The <ph x=\"101\"/> System is an ideal tool for increasing productivity in all aspects of the construction earthmoving industry.",
"target": "Systemet <ph x=\"101\"/> är ett verktyg som lämpar sig perfekt för att öka produktiviteten inom alla delar av bygg- och anläggningsområdet.",
"segmentNumber": 5740420,
"id": "",
"documentName": "none",
"sourceLang": "en-GB",
"targetLang": "SV-SE",
"type": "Manual",
"author": "",
"timestamp": "20170327T091814Z",
"markupTable": "OTMXUXLF",
"context": "",
"additionalInfo": "",
"internalKey": "9:1"
}
]
}
SearchPosition / NewSearchPosition format: "7:1"
The first number is the segment/record number, the second is the target number.
The NewSearchPosition is an internal key of the memory for the next position on sequential access. Since it is an internal key, maintained and understood by the underlying memory plug-in (for EqfMemoryPlugin it is the record number and the position in one record),
no assumptions should be made about its content. It is just a string that should be sent back to OpenTM2 on the next request, so that the search continues from there.
This is how Translate5 implements it: the first request to OpenTM2 contains SearchPosition with an empty string; OpenTM2 then returns a string in NewSearchPosition, which is simply resent to OpenTM2 in the next request.
Not found:{
"ReturnValue": 0,
"NewSearchPosition": null,
"ErrorMsg": ""
}TM not found:{
"ReturnValue": 133,
"ErrorMsg": "OtmMemoryServiceWorker::concordanceSearch::"
} |
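The searchPosition / NewSearchPosition round-trip described above can be sketched like this. The `post` callable is a hypothetical stand-in for a real POST to /%service%/%tm_name%/search returning the parsed JSON response:

```python
def search_all(post, num_results=200):
    """Collect all results by resending NewSearchPosition as searchPosition
    until the server stops returning one (end of TM reached)."""
    results, position = [], ""
    while True:
        body = {"numResults": num_results}
        if position:  # the first request omits searchPosition
            body["searchPosition"] = position
        resp = post(body)
        results.extend(resp.get("results", []))
        position = resp.get("NewSearchPosition")
        if not position:
            return results

# Stub responses: two pages, then the end (NewSearchPosition is null).
_PAGES = {"": (["seg1", "seg2"], "9:1"), "9:1": (["seg3"], None)}
def fake_post(body):
    segs, nxt = _PAGES[body.get("searchPosition", "")]
    return {"results": segs, "NewSearchPosition": nxt}
```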
Here is a search request with all possible parameters:
{
"logicalOr": 1,
"source":"the",
"sourceSearchMode":"CONTAINS, CASEINSENSETIVE, WHITESPACETOLERANT, INVERTED",
"target":"",
"targetSearchMode":"EXACT, CASEINSENSETIVE",
"document":"evo3_p1137_reports_translation_properties_de_fr_20220720_094902",
"documentSearchMode":"CONTAINS, INVERTED",
"author":"some author",
"authorSearchMode":"CONTAINS",
"timestampSpanStart": "20000121T115234Z",
"timestampSpanEnd": "20240121T115234Z",
"addInfo":"some add info",
"addInfoSearchMode":"CONCORDANCE, WHITESPACETOLERANT",
"context":"context context",
"contextSearchMode":"EXACT",
"sourceLang":"en-GB",
"targetLang":"SV",
"searchPosition": "8:1",
"numResults": 2,
"msSearchAfterNumResults": 25,
"loggingThreshold": 3
}
All fields are optional, but some depend on others, so an error is returned if a required companion field is missing. A request with this body would also work: { } Parameter | valueType | default value | possible values | requireField | description |
---|
sourceLang, targetLang | string | "" | languages that can be matched to languages in languages.xml | - | Filter segments on the src/trg lang attribute. If the specified language is preferred, matching is done on the language family, otherwise on exact match |
searchPosition | string | "" (search then starts from "7:1") | "8:1" etc. | - | Point where the search starts in the tmd file |
numResults | int | 5 | (0...200] | - | How many matches to return in the current request |
msSearchAfterNumResults | int | 0 | no check | - | How many ms may pass between the first found segment and the search stop, if the end was not reached yet |
loggingThreshold | int | -1 | [0...6] | - | Additional field to set the log level on the run |
logicalOr | int | 0 | 0 for false, any other number for true, example: "logicalOr": 1 | - | By default the source, target, document, author, context, addInfo and timestamp filters are combined with logical AND; sending a non-zero value here switches that to logical OR, any other value leaves the default AND state. Does not apply to the sourceLang and targetLang filters, which always stay in AND state |
onlyCountSegments | int | 0 | "onlyCountSegments": 1 | - | Instead of returning segments, the search runs to the end of the TM and returns the total number of segments that match the selected filters |
source, target, document, author, context, addInfo | string | "" | any string, example "source": "data in the segment" | corresponding SearchMode field | Sets what to look for in the corresponding field of the segments, based on the type of search specified in the SearchMode field (EXACT, CONCORDANCE). If the SearchMode is not specified, an error is returned |
timestampSpanStart, timestampSpanEnd | string | "" | string with date in format "20240121T115234Z" | the other timestampSpan field | Sets the time filter. You need to provide both timestamps, or none, otherwise the request returns an error. Can be used in OR combination with "logicalOr": 1, but it may be better to change that behaviour to be like the language filters (always AND) |
sourceSearchMode, targetSearchMode, documentSearchMode, authorSearchMode, contextSearchMode, addInfoSearchMode | string | "" | a string with the required EXACT or CONCORDANCE word (or CONTAINS, which equals CONCORDANCE) and optional ones: CASEINSENSITIVE for case-insensitive comparison, WHITESPACETOLERANT for normalizing whitespace (the result of this is visible in the Filters field of the response), INVERTED for applying the filter inverted, i.e. returning false on a match and true on no match (logical NOT). Attributes are not case sensitive and the separator does not matter | - | Sets the type of search for the corresponding field. If you set, for example, "authorSearchMode": "EXACT" but don't provide an "author" field, the author field defaults to "", so the request looks for segments where the author is empty. The same holds for the other fields. Examples: 1) "source": "the text inside", "sourceSearchMode":"CONTAINS, CASEINSENSETIVE, WHITESPACETOLERANT, INVERTED" - searches for all segments which do NOT contain "the text inside", case insensitively and with normalized whitespace. 2) "author": "Ed Sheeran", "authorSearchMode": "EXACT" - exact case-sensitive match on "Ed Sheeran" in the author field. 3) "author": "Ed Sheeran", "authorSearchMode": "CASEINSENSETIVE" - ERROR, the search mode (EXACT/CONTAINS) is not selected. 4) "author": "Ed Sheeran" - ERROR, the search mode is not selected. 5) "authorSearchMode": "CONTAINS" - OK, the filter checks whether the segment contains "", so every segment returns true |
|
|
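The combination rules above (field filters ANDed by default, ORed with "logicalOr": 1, language filters always ANDed on top) can be illustrated with a hypothetical sketch; the per-field predicates and the segment dict are illustrative stand-ins for the real checks:

```python
def passes(segment, field_filters, lang_filters, logical_or=False):
    # field_filters: predicates over the segment (source, target, document,
    # author, context, addInfo, timestamp checks)
    if field_filters:
        combine = any if logical_or else all
        if not combine(f(segment) for f in field_filters):
            return False
    # language filters are always applied in AND, regardless of logicalOr
    return all(f(segment) for f in lang_filters)

# Illustrative segment and predicates (hypothetical data).
seg = {"source": "The end", "author": "", "sourceLang": "en-GB"}
contains_the = lambda s: "the" in s["source"].lower()
author_is_bob = lambda s: s["author"] == "Bob"
src_is_en = lambda s: s["sourceLang"] == "en-GB"
```

With logical_or=True the segment passes (the source filter matches even though the author filter does not); with the default AND it is rejected.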
Concordance search |
---|
Purpose | Returns entries/translations that contain the requested segment |
Request | POST /%service%/%tm_name%/concordancesearch |
Params | Required: searchString - what we are looking for; searchType ["Source"|"Target"|"SourceAndTarget"] - where to look. Optional: iNumOfProposal - limit of returned proposals; max is 20, if 0 the default value 5 is used |
Code Block |
---|
language | js |
---|
title | Response |
---|
collapse | true |
---|
| Request example:
{
"searchString": "The",
"searchType": "source", // could be Source, Target, SourceAndTarget - says where to do search
["searchPosition": "",]
["numResults": 20,]
["msSearchAfterNumResults": 250,]
["loggingThreshold": 0]
}
Response example:Success:
example_new{
"ReturnValue": "ENDREACHED_RC",
"NewSearchPosition": null,
"results": [
{
"source": "The end",
"target": "The target",
"segmentNumber": 0,
"id": "",
"documentName": "Te2.xlf",
"sourceLang": "de-DE",
"targetLang": "EN-GB",
"type": "Manual",
"author": "THOMAS LAURIA",
"timestamp": "20231228T171821Z",
"markupTable": "OTMXUXLF",
"context": "2_3",
"additionalInfo": "",
"internalKey": "7:1"
}
]
}
example_old
{
"ReturnValue": 0,
"NewSearchPosition": null,
"results": [
{
"source": "For > 100 setups.",
"target": "Für > 100 Aufstellungen.",
"segmentNumber": 10906825,
"id": "",
"documentName": "none",
"documentShortName": "NONE",
"sourceLang": "en-GB", // RFC 5646
"targetLang": "de-DE", // RFC 5646
"type": "Manual",
"matchType": "undefined",
"author": "",
"timestamp": "20190401T084052Z",
"matchRate": 0,
"markupTable": "OTMXML",
"context": "",
"additionalInfo": ""
}
],
"ErrorMsg": ""
}
Success, but with NewSearchPosition set - not the whole TM was checked; use this position to continue the search:
{
"ReturnValue": 0,
"NewSearchPosition": "8:1",
"results": [
{
"source": "For > 100 setups.",
"target": "Für > 100 Aufstellungen.",
"segmentNumber": 10906825,
"id": "",
"documentName": "none",
"documentShortName": "NONE",
"sourceLang": "en-GB",
"targetLang": "de-DE",
"type": "Manual",
"matchType": "undefined",
"author": "",
"timestamp": "20190401T084052Z",
"matchRate": 0,
"markupTable": "OTMXML",
"context": "",
"additionalInfo": ""
}
],
"ErrorMsg": ""
}
SearchPosition / NewSearchPosition format: "7:1"
The first number is the segment/record number, the second is the target number.
NewSearchPosition is an internal key of the memory for the next position in sequential access. Since it is an internal key, maintained and understood by the underlying memory plug-in (for EqfMemoryPlugin it is the record number and the position within one record),
no assumptions should be made about its content. It is just a string that should be sent back to OpenTM2 with the next request, so that the search continues from there.
This is how Translate5 implements it: the first request to OpenTM2 contains SearchPosition as an empty string; OpenTM2 then returns a string in NewSearchPosition, which is simply resent to OpenTM2 in the next request.
Not found:{
"ReturnValue": 0,
"NewSearchPosition": null,
"ErrorMsg": ""
}
TM not found:
{
"ReturnValue": 133,
"ErrorMsg": "OtmMemoryServiceWorker::concordanceSearch::"
} |
|
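The SearchPosition paging loop described above can be sketched as follows. This is a sketch, not official client code; `post` stands in for an HTTP POST to the concordancesearch endpoint and is stubbed here, and the opaque NewSearchPosition string is passed back verbatim, as the docs require.

```python
# Sketch of the SearchPosition paging loop: keep resending the opaque
# NewSearchPosition until the service reports the end of the TM (null).
def concordance_all(post, search_string, search_type="Source", page_size=20):
    results, position = [], ""  # first request: empty SearchPosition
    while True:
        resp = post({
            "searchString": search_string,
            "searchType": search_type,
            "searchPosition": position,
            "numResults": page_size,
        })
        results.extend(resp.get("results", []))
        position = resp.get("NewSearchPosition")
        if not position:        # null -> end of TM reached
            return results

# Fake server returning two pages, for illustration only:
def fake_post(body):
    if body["searchPosition"] == "":
        return {"NewSearchPosition": "8:1", "results": [{"source": "a"}]}
    return {"NewSearchPosition": None, "results": [{"source": "b"}]}
```

With the fake server, `concordance_all(fake_post, "The")` collects the results of both pages in order.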
Get entry |
---|
Purpose | Returns the entry located at [recordKey:targetKey], or an error if that location is empty |
Request | POST /%service%/%tm_name%/getentry |
Params | Required: recordKey - the position in the tmd file, starting from 7 (the first 6 are service records); targetKey - the position within the record, starting from 1. Implemented in 0.6.24 |
Code Block |
---|
language | js |
---|
title | Response |
---|
collapse | true |
---|
| Request example:
{
"recordKey": "8",
"targetKey": "2"
["loggingThreshold": 0]
}
Response example:
Success: {
"source": "%Project.Progress%",
"target": "%Project.Progress%",
"segmentId": 7,
"documentName": "DSGVO_v27_HIDDEN.docx.sdlxliff",
"sourceLang": "de-DE",
"targetLang": "EN-US",
"type": "Manual",
"author": "",
"timestamp": "20220706T112459Z",
"markupTable": "OTMXUXLF",
"context": "",
"additionalInfo": "",
"internalKey": "10:1"
}
Not found:
{
"ReturnValue": 939,
"ErrorMsg": "Requested entry not found! Next internalKey after requested is : 11;1"
} |
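The "record:target" key convention and the 939 error (which names the next occupied key) suggest a way to walk a TM sequentially via getentry. The helpers below are hypothetical, illustrative only; `get_entry` stands in for the HTTP call and is assumed to return the parsed JSON response, or None past the end of the TM.

```python
# Sketch (assumed helpers, not part of the service): walking a TM sequentially
# with /getentry. Data records start at 7; targets within a record start at 1.
# The "not found" error reports the next occupied key (e.g. "11;1"), which we
# can jump to directly instead of probing every slot.
import re

def parse_key(key: str):
    """Accepts '10:1' as well as the '11;1' form seen in error messages."""
    rec, tgt = re.split(r"[:;]", key.strip())
    return int(rec), int(tgt)

def walk_entries(get_entry, first_record=7):
    """get_entry(rec, tgt) -> entry dict, or a dict with ReturnValue 939 and
    an ErrorMsg naming the next occupied key, or None at the end of the TM."""
    rec, tgt = first_record, 1
    while True:
        resp = get_entry(rec, tgt)
        if resp is None:
            return
        if resp.get("ReturnValue") == 939:          # empty slot: jump ahead
            m = re.search(r"(\d+[;:]\d+)", resp.get("ErrorMsg", ""))
            if not m:
                return
            rec, tgt = parse_key(m.group(1))
            continue
        yield resp
        tgt += 1                                    # next target, same record
```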
|
Update entry |
---|
Purpose | Updates an entry/translation |
Request | POST /%service%/%tm_name%/entry |
Params | Only sourceLang, targetLang, source and target are required
|
This request makes changes only in the file buffer (so the files on disk are not changed). To write them to disk, either call a request that flushes the TM to disk as part of its execution (exportTMX, exportTM, cloneTM) or use the SaveAllTms request. Code Block |
---|
language | js |
---|
title | Response |
---|
collapse | true |
---|
| Request example:
{
"source": "The end",
"target": "The target",
"sourceLang": "en", // langs would be checked with languages.xml
"targetLang": "de",
//additional field
["documentName": "Translate5 Demo Text-en-de.xlf"],
["segmentNumber": 8,]
["author": "Thomas Lauria"],
["timeStamp": "20210621T071042Z"], // if there is no timestamp, current time would be used
["context": "2_2"], // context and addInfo would be saved in TM in the same field
["addInfo": "2_2"],
["type": "Manual"], // could be GlobalMemory, GlobalMemoryStar, MachineTranslation, Manual, by default Undefined
["markupTable": "OTMXUXLF"], //if there is no markup, default OTMXUXLF would be used.
//Markup tables should be located inside ~/.t5memory/TABLE/%markup$.TBL
["loggingThreshold": 0],
["save2disk": 0] // flag if we need to flush tm to disk after update. by default is true
}
Here is the data structure used for these requests, so you can see the maximum number of characters per field:
typedef struct _LOOKUPINMEMORYDATA
{
char szMemory[260];
wchar_t szSource[2050];
wchar_t szTarget[2050];
char szIsoSourceLang[40];
char szIsoTargetLang[40];
int lSegmentNum;
char szDocName[260];
char szMarkup[128];
wchar_t szContext[2050];
wchar_t szAddInfo[2050];
wchar_t szError[512];
char szType[256];
char szAuthor[80];
char szDateTime[40];
char szSearchMode[40]; // only for concordance search
char szSearchPos[80]; // only for concordance search
int iNumOfProposals;
int iSearchTime;
wchar_t szSearchString[2050];
} LOOKUPINMEMORYDATA, *PLOOKUPINMEMORYDATA;
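The buffer sizes above imply hard limits on the request fields. The sketch below (illustrative only; the field-to-buffer mapping is an assumption based on the struct member names) pre-validates an entry client-side, counting the array size minus 1 for the terminating NUL.

```python
# Sketch (illustrative only): pre-validating update-entry fields against the
# LOOKUPINMEMORYDATA buffer sizes above, so over-long values are caught on the
# client instead of being truncated or rejected by the service.
# Limits are the array sizes minus 1 for the terminating NUL character.
LIMITS = {
    "source": 2050 - 1, "target": 2050 - 1,
    "sourceLang": 40 - 1, "targetLang": 40 - 1,
    "documentName": 260 - 1, "markupTable": 128 - 1,
    "context": 2050 - 1, "addInfo": 2050 - 1,
    "type": 256 - 1, "author": 80 - 1, "timeStamp": 40 - 1,
}

def check_entry(entry: dict):
    """Return a list of (field, actual_length, limit) violations."""
    return [(k, len(v), LIMITS[k])
            for k, v in entry.items()
            if k in LIMITS and len(v) > LIMITS[k]]
```

For example, an author value of 100 characters exceeds the 80-byte `szAuthor` buffer and is reported as a violation.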
Response example:
Success (new format):
{
"source": "The end",
"sourceNPRepl": "The end",
"sourceNorm": "The end",
"target": "The target",
"segmentNumber": 0,
"id": "",
"documentName": "Te2.xlf",
"sourceLang": "DE-DE",
"targetLang": "EN-GB",
"type": "Manual",
"author": "THOMAS LAURIA",
"timestamp": "",
"markupTable": "OTMXUXLF",
"context": "2_3",
"additionalInfo": "addInfo2",
"internalKey": "8:1"
}
Old format:
{
"sourceLang": "de-DE",
"targetLang": "en-GB",
"source": "The end",
"target": "The target",
"documentName": "Translate5 Demo Text-en-de.xlf",
"segmentNumber": 222,
"markupTable": "OTMXUXLF",
"timeStamp": "20210621T071042Z",
"author": "Thomas Lauria"
}
If a similar record exists, t5memory compares the source text;
if it is the same, t5memory compares the docName;
if that is also the same, t5memory compares the timestamps and keeps only the newer entry.
If the TM has already reached its limit, you would get
{
"ReturnValue": 5034,
"ErrorMsg": ""
}
or
{
"ReturnValue": 5035,
"ErrorMsg": ""
} |
Code Block |
---|
language | js |
---|
title | UpdateEntry Pseudo code |
---|
collapse | true |
---|
| Update entry pseudo code: update segment/import
{
if we have triples equal match (candidate for exact match)
{
UpdateTmRecord
if(updateFailed)
AddToTMAsNewKey
if(added) UpdateTmIndex
}else{
AddToTMAsNewKey
if(added) UpdateTmIndex
}
}
UpdateTmRecord{
getListOfDataKeysFromIndexRecord
sortThemByTriplesMatchesWithProposal(first have biggest match)
foreach key until fStop==true{
readTmRecord // tm record is 16kB block in file, first number in "7:1"
//compare tm record data with data passed in the get in structure
CompareAndModifyPutData
if(NO_ERROR) set fStop = true;
}
}
CompareAndModifyPutData{
if source strings are equal
Delete old entry - with TMLoopAndDelTargetClb
if fNewerTargetExists -> fStop = TRUE
Loop thru target records
loop over all target CLBs or until fStop
if segment+file id found (exact-exact-found!)
update time field in control block
set fUpdate= fStop=TRUE
update context info
if not fStop
goto next CLB
endloop
if no matching CLB has been found (if not fStop)
add new CLB (ids, context, timestamp etc. )
endloop
endloop
if fupdated, update TM record
if !fStop (all target record have been tried & none matches )
add new target record to end of tm record
else
return source_string_error // errcode for UpdateTmRecord to go to the next TM record in prepared list
}
TMLoopAndDelTargetClb{
loop through all target records in tm record checking
loop over all target CLBs or until fStop
if lang + segment+file id found (exact-exact-found!)
if entry is older
delete it, fDel = TRUE
else set fNewerTargetExists=TRUE(would be used in CompareAndModifyPutData)
goon with search in next tgt CLB (control block)
else
goon with search in next tgt CLB (control block)
endloop
endif
if not fDel
position at next target record
endloop
} |
|
...
Logging |
---|
Level | Mnemonic | Description |
0 | DEVELOP | can make the code really slow; should be used only when debugging specific places in the code, like binary search in files, etc. |
1 | DEBUG | logs values of variables. Doesn't delete temporary files (in the MEM and TMP subdirectories), like base64 encoded/decoded TMX files and archives for import/export |
2 | INFO | logs top-level function entries, return codes, etc. Default value. |
3 | WARNING | logs when we reach some commented-out or hardcoded code. Usually commented-out code here has been replaced with new code; where it hasn't, it is marked at ERROR level |
4 | ERROR | errors; why and where something fails during parsing, search, etc. |
5 | FATAL | you shouldn't reach this code; something is really wrong. Other values are ignored. The set level stays the same until you change it in a new request or close the app. Logs are written to a file with a date/time name under ~/.OtmMemoryService/Logs, and error/fatal messages are duplicated in another log file with a FATAL suffix |
6 | TRANSACTION | logs only things like the begin/end of a request, etc. There is no purpose in setting the level this high |
--v is a glog flag; for t5memory it makes sense to set it to 0 (the default) for production or to 2 for debugging. glog has its own log levels and flags, but we do not touch them; the defaults are fine, but glog only has INFO, WARNING and ERROR. t5memory has its own system, implemented before proxygen, with 6 log levels (0=develop, 1=debug, 2=info, 3=warning, 4=error, 5=fatal, 6=transaction), which are streamed to the glog streams this way: 1. develop, debug, info and transaction go to glog's INFO stream; 2. warning goes to WARNING; 3. error and fatal go to the ERROR stream. In addition, when the first error log of a request happens, the cached info about the TM name and the body of the request that caused the error is flushed once per request; for subsequent errors in the same request you will not see "...with body..." etc. in the log.
--v and t5loglevel are two separate filters for logs. When you set --v=0, glog lets only the ERROR stream through, so setting t5loglevel to [0,1,2,3,4] makes no difference; but you can set it to 5 to skip regular errors and keep only fatal errors. In that mode the transaction log level is also downgraded to an ordinary info log level. When you set --v=2 you disable glog's filter, so you get a lot of logs, and you can then control the logging intensity with --t5loglevel; t5transaction becomes the highest log level, so you can skip all info logs (with --t5loglevel=4) and still get transaction logs (which are not warnings or errors; they usually just mark the begin or end of request handling). In short: for production just leave the defaults (--v=0, --t5loglevel=2 (info)); for debugging set --v=2 and --t5loglevel to 0, 1 or 2. Sometimes 1 or 2 makes more sense, because t5loglevel 0 prints a ton of logs and t5memory becomes slow.
Logging can impact application speed very much, especially during import or export. t5memory has two systems of logs: one from the glog library, set at launch via a command-line parameter, and an internal one that filters logs by level, which can be set in any request that has a JSON body by adding the parameter ["loggingThreshold": 0], or at startup with a flag. Example: POST http://localhost:4040/t5memory/example_tm/ { "sourceLang": "en", // the source language is required for a new TM "name": "TM Name", "loggingThreshold": "2" } This sets the logging level to INFO just before the main work of the create-memory endpoint starts. DEVELOP can be used for really low-level debugging, but most of the time DEBUG is more useful, since DEVELOP produces a huge amount of logs. Transaction logs have the highest severity, but their severity also changes with the --v parameter: with --v=2 transaction is the highest log level (this log is not used often; it only tracks things like the start or end of a request), but with the default --v=0 its severity is below WARNING.
glog part: it has its own configuration via command-line flags; you can see all possible flags for t5memory with the ./t5memory --help command. The main parameter here is --v, which can be set to 2 or 0 (the default). With the default of 0, everything that is not an error is suppressed in the logs, except startup messages. The idea of --v=1 was to keep logs in a buffer stream and, in case of an error, show the preceding logs for that request; but it turned out not to be very useful, so it was never finished and does not work properly. --v=2 basically disables that buffering. In case of an error or fatal error, the log is written together with info about the request that caused it (truncated to 3000 characters, which is important for importTMX); a second error for the same request will not repeat that request info.
Some parameter combinations:
- Default: --t5loglevel=2 (T5INFO), --v=0 - you see only init messages and errors, with info about the requests that caused the errors.
- Only --v=2 changed: t5loglevel stays at its default of 2 (T5INFO), so you see T5INFO, T5WARNING, T5ERROR, T5FATAL and T5TRANSACTION messages.
- Debug production: --t5loglevel=1 (T5DEBUG), --v=2 - should be enough to get some info about issues; a lot of logs, but not as many as with Develop.
- Develop: --t5loglevel=0 (T5DEVELOP), --v=2 - all possible logs, including entry into some functions and step-by-step mechanism logs (like how t5memory parses and hashes strings). Useful only when you can reproduce an issue, so you don't get lost in logs from normal behaviour, or when it crashes.
Alternatively, the line logLevel=0 in the t5memory.conf file (the config file is obsolete now) would set the log level to DEVELOP; this is applied only after restarting the service.
It is possible to change t5loglevel with individual requests, so, for example, for one specific update request you can set it to a lower log level and then set it back. It affects other threads too, but since the logs include the thread info, this can be a useful tool.
The --v parameter itself is not very useful and should maybe be refactored, since with --v=0 you get no messages with severity lower than T5ERROR, except during the init process. But the glog library may be connected to some other libraries in the proxygen package.
Here are all glog flags (from src/logging.cc):
- alsologtoemail (string, default: "") - log messages go to these email addresses in addition to logfiles
- alsologtostderr (bool, default: false) - log messages go to stderr in addition to logfiles
- colorlogtostderr (bool, default: false) - color messages logged to stderr (if supported by terminal)
- drop_log_memory (bool, default: true) - drop in-memory buffers of log contents; logs can grow very quickly and are rarely read before they need to be evicted from memory, so drop them as soon as they are flushed to disk
- log_backtrace_at (string, default: "") - emit a backtrace when logging at file:linenum
- log_dir (string, default: "", currently: "/root/.t5memory/LOG/") - if specified, logfiles are written into this directory instead of the default logging directory
- log_link (string, default: "") - put additional links to the log files in this directory
- log_prefix (bool, default: true) - prepend the log prefix to the start of each log line
- logbuflevel (int32, default: 0) - buffer log messages logged at this level or lower (-1 means don't buffer; 0 means buffer INFO only; ...)
- logbufsecs (int32, default: 30) - buffer log messages for at most this many seconds
- logemaillevel (int32, default: 999) - email log messages logged at this level or higher (0 means email all; 3 means email FATAL only; ...)
- logfile_mode (int32, default: 436) - log file mode/permissions
- logmailer (string, default: "/bin/mail") - mailer used to send logging email
- logtostderr (bool, default: false) - log messages go to stderr instead of logfiles
- max_log_size (int32, default: 1800) - approx. maximum log file size (in MB); a value of 0 will be silently overridden to 1
- minloglevel (int32, default: 0) - messages logged at a lower level than this don't actually get logged anywhere
- stderrthreshold (int32, default: 2) - log messages at or above this level are copied to stderr in addition to logfiles; this flag obsoletes --alsologtostderr
- stop_logging_if_full_disk (bool, default: false) - stop attempting to log to disk if the disk is full
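The interaction of the two filters (glog's --v and the internal --t5loglevel) can be modeled with a small sketch. This is an assumed model of the behaviour described in this section, not the actual implementation: a message passes only if its mapped glog stream survives the --v filter and its t5 level reaches the --t5loglevel threshold.

```python
# Sketch (assumed model) of the two-filter logging decision described above.
T5_LEVELS = ["DEVELOP", "DEBUG", "INFO", "WARNING", "ERROR", "FATAL", "TRANSACTION"]

def glog_stream(t5_level: str) -> str:
    # develop/debug/info/transaction -> INFO stream; warning -> WARNING;
    # error/fatal -> ERROR stream (per the mapping in the text above)
    return {"WARNING": "WARNING", "ERROR": "ERROR", "FATAL": "ERROR"}.get(t5_level, "INFO")

def is_logged(t5_level: str, v: int, t5loglevel: int) -> bool:
    if v == 0 and glog_stream(t5_level) != "ERROR":
        return False            # --v=0: only the ERROR stream passes glog
    return T5_LEVELS.index(t5_level) >= t5loglevel
```

With the defaults (--v=0, --t5loglevel=2) only errors and fatals get through; with --v=0 and --t5loglevel=5 regular errors are skipped and only fatals remain, matching the description above.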
|
...
Openning and closing TM |
---|
In the first concept it was planned to implement routines to open and close a TM. While working on the concept we found some problems with this approach: - First, the realization: opening and closing a TM via REST would mean updating the TM resource and setting a state to open or closed. This is very awkward.
- Since in translate5 multiple tasks can be in use at the same time, multiple tasks may try to access one TM. Closing TMs gets complicated if race conditions in TM usage are to be prevented.
- Since OpenTM2 loads the whole TM into memory, OpenTM2 must control itself which TMs are loaded or not.
This leads to the following conclusion for the implementation of opening and closing of TMs: OpenTM2 has to load the requested TMs automatically when they are requested. Also, OpenTM2 has to close a TM after it has not been used for some time. That means that OpenTM2 has to track the timestamps of when a TM was last requested. Concept endpoints, not implemented:
http://opentm2/translationmemory/[TM_Name]/openHandle GET - Opens a memory for queries by OpenTM2. Note: this method is not required, as memories are automatically opened when they are accessed for the first time. http://opentm2/translationmemory/[TM_Name]/openHandle DELETE - Closes a memory for queries by OpenTM2. Note: this method is not required either, for the same reason. For now we open a TM on the first call that works with it. The TM stays open until shutdown. We would not open more TMs than fit into the RAM limit set up in the config file; in that case we close TMs, starting with the one unused for the longest time, until we fit into the limit including the TM we are trying to open. TM size is calculated basically as the sum of the .TMD and .TMI file sizes. The RAM limit does not include service RAM and temporary files.
|
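The eviction policy described above (close least-recently-used TMs until the new TM fits under the RAM limit, with TM size approximated as size(.TMD) + size(.TMI)) can be sketched like this. Names are assumed for illustration; the real service tracks last-request timestamps per TM.

```python
# Sketch of the LRU eviction policy described above (assumed class/method
# names). TM size is approximated as size(.TMD) + size(.TMI).
from collections import OrderedDict

class TmCache:
    def __init__(self, ram_limit: int):
        self.ram_limit = ram_limit
        self.open_tms = OrderedDict()          # name -> size, oldest first

    def touch(self, name: str):
        self.open_tms.move_to_end(name)        # mark as most recently used

    def open_tm(self, name: str, tmd_size: int, tmi_size: int):
        size = tmd_size + tmi_size
        # Close least-recently-used TMs until the new TM fits under the limit.
        while self.open_tms and sum(self.open_tms.values()) + size > self.ram_limit:
            self.open_tms.popitem(last=False)  # close the LRU TM
        self.open_tms[name] = size
```

For example, with ram_limit=100, opening TMs of sizes 70 and 20 and then a third of size 50 evicts the first (least recently used) TM to make room.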
Multithreading |
---|
In 0.6.44 multithreading is implemented this way: you can set the number of service threads with the --servicethreads 8 command-line argument. That is the number of threads that handle requests; in addition there are 2-3 proxygen service threads running, and for every import and reorganize a new thread is created.
- there are mutexes for shared resources, like the filesystem and some shared files in RAM, which are not configurable
- there are 3 configurable recursive timed mutexes, which can also be used as non-timed mutexes (meaning they have no timeout and wait until the mutex becomes free). To use them that way, set their timeout to 0.
These mutexes are: - requestTMMutex - mutex for the whole requestTM function, which can either find a TM in the TM list or reserve a place for a TM in that list (the TM would be opened later). Could probably be removed to optimize the code; it was implemented as the first high-level mutex.
- tmListMutex - every operation on the TM list, like searching, adding or deleting elements, is guarded by this mutex.
- tmMutex - most requests that have tm_name in the URL, except the status request, are blocking - they occupy the TM for the whole execution time (after the request data has been parsed and checked). The reason for this is the opentm2 code, which still has too many low-level chunks that make multithreading impossible.
- by default, import and reorganize threads (not the request handlers, which use regular mutexes, but the threads created by those handlers, which run after you receive the response to your reorganize or import TMX request) use non-timed mutexes. So these threads wait until the TM becomes free. You can change that with the command-line argument
UseTimedMutexesForReorganizeAndImport 1 - You can set default values for tmRequestLockDefaultTimeout and tmLockDefaultTimeout using command-line arguments of those names. Values are in ms; the default is 0, which means timeouts are disabled. The change applies to all requests without a body, and to requests with a body if no other value is provided. Import and reorganize threads use non-timed mutexes by default, but if that is changed with the command-line argument, the value from the corresponding request is used (if provided, otherwise the default).
- the saveAllTms request can be used in 2 ways: with non-timed mutexes for the TMs, or with timed mutexes; in the timed case, if a timeout occurs, the TM is not saved and the request skips to the next TM. The response contains a message about saved and timed-out TMs.
- the shutdown request internally uses saveAllTms with hardcoded non-timed mutexes. But it could fail on tmListMutexTimeout when checking how many TMs are in import or reorganize status.
- the resources request uses tmListMutex with a timeout when listing TMs. In case of a timeout, instead of the list of TMs, "Failed to lock tm list: (timeout_error_msg)" is returned. But that is not treated as a request error.
- in case of a timeout failure, you get back (and into the logs) error code 506 and one of the following messages (where (msg) is a generated service message with the location of the failed lock and a backtrace):
(msg); Failed to lock tm list:(msg) / (msg); Failed to lock requestTm:(msg) / (msg); Failed to lock tm:(msg), like this: { "ReturnValue":506, "ErrorMsg":"Failed to acquire the lock(tmMutex:18) within the timeout(1ms)(location: requestTM:339); Failed to lock tm:(location: requestTM:342); " } - you can see the default timeout values in the initMsg. Every timeout value is in ms.
- for requests with a body, you can provide a value for each type of mutex as an integer, using these names:
{ "tmMutexTimeout": 5000, "tmListMutexTimeout": 4000, "requestTMMutexTimeout": 15000, ... }
|
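The TimedMutexGuard / MutexTimeout behaviour described in this section can be sketched with Python's threading primitives. This is an assumed model, not the C++ implementation: a recursive lock acquired with a timeout, where timeout 0 means "wait forever" (non-timed mode), and a failed acquire poisons the shared timeout object so all later locks in the same request fail fast and record their location.

```python
# Sketch (assumed semantics) of the timed recursive RAII mutex guard.
import threading

class MutexTimeout:
    def __init__(self, ms: int):
        self.ms, self.failed_at = ms, []       # locations of failed locks

    def failed(self):
        return bool(self.failed_at)

class TimedMutexGuard:
    def __init__(self, lock: threading.RLock, timeout: MutexTimeout, where: str):
        self.lock, self.timeout, self.where = lock, timeout, where
        self.held = False

    def __enter__(self):
        if self.timeout.failed():              # poisoned: fail without waiting
            self.timeout.failed_at.append(self.where)
            return self
        # timeout 0 -> non-timed mode: block until the mutex is free
        t = -1 if self.timeout.ms == 0 else self.timeout.ms / 1000.0
        self.held = self.lock.acquire(timeout=t)
        if not self.held:
            self.timeout.failed_at.append(self.where)
        return self

    def __exit__(self, *exc):
        if self.held:
            self.lock.release()                # RAII: released when scope ends
```

Usage mirrors the pseudo-code later in this section: acquire the guard in a `with` block, then check `timeout.failed()` and build the 506 error from the recorded locations if the lock could not be taken.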
Mutexes and request handling details: Can you explain more why the tm list mutex is needed? Should it not be enough to block a new request from accessing a TM that is already in use?
It is the mutex that guards access to the TM list. It's not quite a list; internally it's a map (hash table / dictionary), so it has a key, which is the TM name, and a value, which is the TM data, and it is auto-sorting as well. That means, as with any non-fixed-size container, that I can't be sure about its memory location: during, for example, a search in one thread, it could be completely reallocated elsewhere in memory if another thread adds a new TM object to the list. So even read operations have to be blocking. Let me explain the simpler mutexes first. In 0.5 I made every request handler a class with the same abstract ancestor, which implements the same strategy pattern to execute any request, while each request type implements its own parseJSON, checkData and execute methods as needed. Below is the code that runs every request. TimedMutexGuard is a class I implemented to have timed recursive mutexes that are flexible and usable as RAII, and tmLockTimeout is a timeout class with timeout fields and a flag that some timeout has expired, so that in that case execution is rolled back with an error code. Also, in case of an error, that class records a string with the names of all functions and the line numbers that were waiting for the mutex, even when nested, to trace the location of the failed lock. If the timeout is set to 0, it is used as a non-timed mutex. // pseudo-code, but close to the real thing
int RequestData::run(){
  if(!res) res = parseJSON();
  if(!res) res = checkData();
  if(!res) res = requestTM();
  if(!res) {
    if(isLockingRequest()) {
      TimedMutexGuard l(mem->tmMutex, tmLockTimeout, "tmMutex"); // mutex used as RAII, unlocks automatically
      if(tmLockTimeout.failed()){
        return buildErrorReturn(506, tmLockTimeout.getErrMsg(), _rc_); // would return "Failed to lock tm:..."
      }
      res = execute();
    } // here the mutex is destroyed
    else { // request that doesn't require a lock, i.e. generally without tmName in the URL, plus the status request
      res = execute();
    }
  }
  buildRet(res);
  // reset pointer
  if(mem != nullptr) mem.reset();
  return res;
}
So here you can see how the tmLock is used: it blocks the selected TM for the execution time only. Then there is the requestTM function, which, depending on the request type, requests a write pointer, read pointer or service pointer (some requests, like the list of TMs, don't have a TM in the URL; the status request is an exception). Requesting a write pointer or read pointer checks the TM list and returns the TM if it is found; otherwise it inits a TM with the given name and adds it to the tmList (it used to open the TM as well, but now that is separated). A service pointer would neither add nor open a TM if it's not in the list, but can also return a TM if it is in the list, without blocking it (used for the status request). And that requestTM function has its own mutex. It was implemented first, so only one TM could be requested at a time. Maybe with the tmList mutexes that I explain below we don't need that requestTM function, but anyway... requestTM could also try to free some space by closing TMs in the list, so at least that should probably be managed and synced. So we will see whether it is needed.
And the TmList mutex: every access to the TM list locks that mutex. A few versions ago I had simple mutexes, but then they became recursive mutexes to simplify the code and make it safer. Back then I had 2 versions of each function: a safe one, with the mutex, and an unsafe one, used inside functions that already held the mutex at a higher level. For example, I had this most primitive function:
bool TMManager::IsMemoryInList(strMemName, tmListTimeout)
{
  TimedMutexGuard l {mutex_access_tms, tmListTimeout, "tmListMutex"}; // lock the TM list
  if(tmListTimeout.failed()){
    returnError(".Failed to lock tm list:");
    return false;
  }
  // If the mutex fails in some nested function, tmListTimeout is marked as spoiled,
  // so every other mutex using that timeout fails after the first failure. A boolean
  // function then returns false, but a check whether the timeout was spoiled is
  // needed to find out whether the function returned false because it didn't find
  // the TM in the list, or because its timeout is spoiled. All such checks are in
  // place in the code for now.
  return TMManager::find(strMemName); // if the lock was successful - try to find the mem
}
So it's boolean, but making it return some custom data type would make it harder to use. It's used everywhere, for example to check whether a memory is loaded. But for use without a pointer, we have this function:
bool TMManager::IsMemoryFailedToLoad(strMemName, tmListTimeout){
  TimedMutexGuard l{mutex_access_tms, tmListTimeout, "tmListMutex"};
  bool res = false;
  if(tmListTimeout.failed()) {
    tmListTimeout.addToErrMsg(".Failed to lock tm list:");
    return false;
  }
  if(IsMemoryInList(strMemName, tmListTimeout) && tms[strMemName]->isFailedToLoad()) {
    res = true;
  }
  if(tmListTimeout.failed()) {
    // if the timeout was spoiled, the error message is extended, so the output
    // contains a backtrace with function names and line numbers
    tmListTimeout.addToErrMsg(".Failed to lock tm list:");
    return false;
  }
  return res;
}
This calls the blocking IsMemoryInList but also locks the same mutex itself, because it also works with the TM list directly, and theoretically some other thread could change the TM list between these lines:
IsMemoryInList(strMemName, tmListTimeout) // memory was checked to be present in the list
&& tms[strMemName]->isFailedToLoad() // slight chance the list was changed by another thread after the line above
So boolean functions of that type return false in case of a timeout, and the call sites should then check whether they returned false while the timeout has not expired. Once it has expired, every next mutex lock fails as well, so even if such a check is missing, the situation is still handled correctly. Every addToErrMsg marks the timeout as expired and adds a comment with the function name and line number, so the failure can be tracked. And the TM list is used not only when requesting a TM, but also, for example, for the resources request, to free some space, or to flush TMs during shutdown; so access to the TM list has to be managed and synced. The two classes TimedMutexGuard and MutexTimeout (tmListTimeout is an object of that class) make it more time-consuming to implement these mutexes because of these requirements: they have to provide RAII, be recursive and timed but also support a non-timed mode, collect info when a timeout fails, ensure that once one timeout fails the following mutexes fail automatically, and collect, provide and log data about where the failure happened. Regarding "and that requestTM function has its own mutex. It was implemented first, so only one TM could be requested at a time": do I understand it right that, as long as one TM is requested for opening (not for reading), another one cannot be opened at the same time? That does not make sense to me. Why? That would mean a request for a small TM has to wait until a large TM has been loaded into RAM.
The load call is outside of the mutex_requestTM mutex, so it would not be blocked in the current version. Opening (loading) of the TM files happens outside the mutex_requestTM area.
So an active mutex on the TM list would still block every request then, including requests to other TMs, right? How long would it be blocked, for example, if I import a new TMX into one TM - the whole time the TMX is imported? And the same question for update. And what about read requests? I understood they also block the TM list, but why, I do not understand. No, it would not be blocked the whole time for other TMs. It would be blocked for the time needed to check whether the TM is in the list and, if not, to add it to the list. The TMList lock is used only to prevent rearranging the list while accessing its data. During that it also calculates how much space is needed and, if not enough space is free, it deletes TMs from the list starting with the one that was unused for the longest time. If such a TM is blocked, that does not prevent its deletion from the list, because smart pointers are used there: the TM is actually closed when the last pointer is freed. But it can take longer if there are processes using the TM list. Blocking a TM itself is necessary because of the big chunk of low-level code that exists in opentm2, which has operated with raw pointers into RAM for a long time. I implemented a read pointer and a write pointer to be able, in the future, to have multiple read pointers and only a single write pointer at a time; but with the old opentm2 code we need to treat every request as a write request, because it can lead to memory reallocations, which lead to crashes. For example, if you have 2 projects translating with the same TM assigned, they both send requests to the same TM, and that led to crashes - that was actually one of the things I fixed in recent versions. So the TM has to stay blocked at least until some legacy code, like the lookup tables, is removed. |
The information below applies to version 0_5_x.
Starting from version 0_5_0 the .mem file is excluded from the TM files: a TM now consists only of the .tmd and .tmi files. These files have 2 KB headers containing some useful information, such as the creation date and the version in which the file was created. In general, a change of the mid version number means binary-incompatible files. During reorganize, a new empty TM is created and the segments are reimported from the previous one; then the old files are deleted and the new ones are renamed to replace them. This means that reorganize also updates the files' t5memory creation version to the newest one.
A TM file is just an archive containing the .tmd and .tmi files.
The .tmd and .tmi files should be flushed in a safe way: saved to disk under a temporary filename which then replaces the old files. (Still to be implemented.)
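A minimal sketch of that safe-flush scheme (write under a temporary name in the same directory, then atomically replace the old file), assuming POSIX rename semantics; `safe_write` is a hypothetical helper name, not t5memory code:

```python
import os
import tempfile

def safe_write(path, data):
    """Write to a temp file in the same directory, then atomically
    replace the target, so a crash never leaves a half-written file."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())       # ensure bytes hit the disk first
        os.replace(tmp_path, path)     # atomic rename over the old file
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)        # clean up the temp file on failure
        raise
```

With this scheme a reader always sees either the complete old file or the complete new one, never a partial write.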
There is a TMManager (a singleton) which holds the list of TMs. Each TM instance has two binary trees, one for the data (.tmd) file and one for the index (.tmi) file, each with its own FileBuffer instance. (Previously there was a pool of FileBuffers whose file operation functions, such as write, read, close and open, handled the requests.)
A request handler is an instance of a class in the request-handler class hierarchy; for each type of request there is a class to handle it. In general a handler has the private functions "parseJSON" (parses the JSON if provided and returns an error if it is invalid), "checkData" (checks that all required fields were provided), "requestTM" (requests a read-only, write or service TM handler, loading the TM if it is not yet in RAM) and "execute" (the actual request code). It also has a public function "run", a strategy template that drives the listed private functions.
The TMs are stored in the TMManager using smart pointers (pointers that track references to themselves and call the destructor automatically). This means that on request a TM can be removed from the list while it is still active in another thread (for example, in a fuzzy search); the RAM is then freed at the end of the last request handling that TM. If a delete-TM call arrives in the middle of another request (such as a fuzzy search), the TM is first removed from the TM list, but the fuzzy-search thread keeps its smart pointer, so the destructor only runs after the fuzzy request is done. The destructor tries to flush the FileBuffer to the filesystem, but because the files no longer exist on disk, the FileBuffer does not recreate them and just frees the RAM (in that case a log entry is written that the FileBuffer flush did not find the file in the folder).
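The smart-pointer lifetime rule above can be illustrated with Python's reference counting as a stand-in for std::shared_ptr; the names TranslationMemory and tm_list are illustrative only:

```python
import weakref

class TranslationMemory:
    """Illustrative stand-in for an in-RAM TM object."""
    def __init__(self, name):
        self.name = name

destroyed = []                               # records "destructor" calls
tm_list = {"mem1": TranslationMemory("mem1")}
weakref.finalize(tm_list["mem1"], destroyed.append, "mem1")

handle = tm_list["mem1"]   # e.g. a fuzzy-search thread still holds the TM
del tm_list["mem1"]        # delete request removes it from the manager list
assert destroyed == []     # ...but the TM stays alive for the running request
del handle                 # last reference released at the end of the request
assert destroyed == ["mem1"]   # only now is the RAM freed
```

The same pattern explains why a delete request never crashes a concurrently running fuzzy search: destruction is deferred to the last holder.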
From the TMManager a request can ask for one of three types of TM handlers: read-only, write or service. Read-only and write requests here are named from the inside-TM perspective (operations on TM files in the filesystem are service requests). A read-only handler (concordance search, fuzzy search, exportTmx) is granted only if there are no write handlers; for a write handler (deleteEntry, updateEntry, importTmx) there must be no other write handlers and no read-only handlers. Service handlers can mean different things for different requests. For example, the status request needs something like a read-only handler, but it must not be blocked by write requests, since it is used to check import/reorganize status and progress. Some filesystem requests (deleteTM, createTM, cloneTM, importTM, exportTM (internal format)) need a different blocking mechanism, since most of them do not even require loading the TM into RAM.
If a TM is not in RAM, requesting a handler from the TMManager will try to load the TM into RAM, subject to the RAM limit explained in this document.
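The space-freeing behaviour described above (evict TMs from the open list in least-recently-used order until the new TM fits the RAM limit, where eviction only drops the manager's reference) can be sketched as follows; the function and parameter names are illustrative, not taken from the source:

```python
from collections import OrderedDict

def make_room(open_tms, sizes_mb, needed_mb, limit_mb, used_mb):
    """Evict TMs in least-recently-used order until needed_mb fits.
    Popping only drops the manager's reference; the RAM is actually
    freed when the last request holding the TM finishes."""
    for name in list(open_tms):            # iteration order = LRU first
        if used_mb + needed_mb <= limit_mb:
            break                          # enough room, stop evicting
        open_tms.pop(name)
        used_mb -= sizes_mb[name]
    return used_mb

# Oldest-used TM first, most recently used last:
open_tms = OrderedDict([("old_tm", object()), ("mid_tm", object()),
                        ("recent_tm", object())])
used = make_room(open_tms, {"old_tm": 100, "mid_tm": 200, "recent_tm": 300},
                 needed_mb=300, limit_mb=700, used_mb=600)
assert list(open_tms) == ["recent_tm"] and used == 300
```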
...
Previous documentation:
http://opentm2/translationmemory/
POST - creating a new or importing an existing filebased binary OpenTM2 TM
The Parameter „name“ contains the TM Name as a string. The string has a maxlength of 256 chars. It can contain any characters except the characters backslash (\), slash(/), colon (:), question mark (?), asterisk (*), vertical line (|), less than sign (<), and greater than sign (>).
Uploading a file is optional; omitting the file means creating an empty TM only.
If an empty TM is created, the POST request contains only the JSON structure with the TM Name.
If an existing binary OpenTM2 file should be additionally imported to the new TM, the POST must be encoded as multipart/form-data.
The JSON structure with the meta data will then be in the first chunk of the multiparted request, the chunk must be named “meta”.
The second chunk contains the plain binary file content and must be named “data”. This binary data contains the TM content.
The resulting body contains the name of the TM, as given in the POST request.
To OpenTM2 – without data / creating an empty TM:
{
sourceLang: “en”, // the source language is required for a new TM
name: „TM Name“,
[loggingThreshold:"2"]
}
Raw POST to OpenTM2 – with provided import file:
POST http://opentm2/translationmemory HTTP/1.1
Content-Type: multipart/form-data; boundary="autogenerated"
-- autogenerated
Content-Type: application/json; charset=utf-8
Content-Disposition: form-data; name=meta
{"name":"TM Name", sourceLang:"en"}
--autogenerated
Content-Type: image/jpeg
Content-Disposition: form-data; name=data; filename=Original Filename.jpg
...TM content ...
--autogenerated--
In both cases from OpenTM2 - HTTP 200 OK:
{
name: „TM Name“
}
Errors:
400 Bad Request – if parameters are missing or are not well formed.
409 Conflict – if a memory with the given name already exists.
500 Server Error – for other technical problems.
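As a sketch, the multipart create-with-data request above can be assembled like this (stdlib only; the boundary string and the "meta"/"data" part names mirror the example, while the helper name and the octet-stream content type are assumptions):

```python
import json

def build_create_tm_body(meta, tm_bytes, boundary="autogenerated"):
    """Build the two-chunk multipart/form-data body: a 'meta' JSON part
    followed by a binary 'data' part, as the create endpoint expects."""
    head = (
        f"--{boundary}\r\n"
        "Content-Type: application/json; charset=utf-8\r\n"
        "Content-Disposition: form-data; name=meta\r\n\r\n"
        f"{json.dumps(meta)}\r\n"
        f"--{boundary}\r\n"
        "Content-Type: application/octet-stream\r\n"
        "Content-Disposition: form-data; name=data; filename=tm.bin\r\n\r\n"
    )
    tail = f"\r\n--{boundary}--\r\n"
    return head.encode("utf-8") + tm_bytes + tail.encode("utf-8")

body = build_create_tm_body({"name": "TM Name", "sourceLang": "en"},
                            b"...TM content...")
assert b"name=meta" in body and b"name=data" in body
```

The request would then be sent with `Content-Type: multipart/form-data; boundary="autogenerated"`, matching the raw example above.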
http://opentm2/translationmemory/[TM_Name]/import
POST import a TMX file into an existing OpenTM2 TM
To OpenTM2:
multipart/form-data as in the POST above, except that no separate JSON section is needed here.
The call answers directly after the upload is done, but before the import starts, with HTTP 201. This means: the import is created and will be started now.
From OpenTM2 - HTTP 201 OK:
{ // empty JSON object, since no data expected as result here!
}
Errors:
400 Bad Request – if parameters are missing or are not well formed.
404 Not Found – if the memory of the given name does not exist
500 Server Error – for other technical problems.
http://opentm2/translationmemory/[TM_Name]/status
GET status of a TM
To OpenTM2:
multipart/form-data as in the POST above, except that no separate JSON section is needed here.
From OpenTM2 - HTTP 200 OK:
{
‘status’:’import’ //allowed status values: import, available, error
}
Errors:
400 Bad Request – if parameters are missing or are not well formed.
404 Not Found – if the memory of the given name does not exist
500 Server Error – for other technical problems.
http://opentm2/translationmemory/
GET – retrieving a list of available TM Files
To OpenTM2: -
From OpenTM2 - HTTP 200 OK:
[{
name: 'my nice TM'
}]
Errors:
500 Server Error – for other technical problems.
http://opentm2/translationmemory/[TM_Name]/
TM_Name is URL-encoded
GET – retrieving a single TM File
To OpenTM2: -
From OpenTM2 - HTTP 200 OK:
Same as POST from OpenTM2 result.
Errors:
404 Not Found – if TM file to given [TMID] in URL was not found
500 Server Error – for other technical problems.
DELETE – deletes an existing TM File
Addressed by the given URL, no body needed.
Errors:
404 Not Found – if TM file to given [TMID] in URL was not found
500 Server Error – for other technical problems.
PUT – updating an existing TM File in one request
Currently not needed, would be only to change the TM name
GET – list of all segments from TM
Currently not needed.
http://opentm2/translationmemory/[TM_Name]/entry/
POST – creates a new entry or updates target entry if match pair already exists
This method updates an existing proposal when a proposal with the same key information (source text, language, segment number, and document name) exists.
Parameters sourceLang and targetLang are containing the languages as RFC5646.
Parameters source and target contain the entry contents to be stored. (Format? Plain string?)
Attribute Parameters:
documentName: contains the filename where the segment resides in Translate5.
context: evaluates to Translate5 segment mid.
markupTable: OpenTM2 gets a new markup table named „translate5“, so this is the value which is delivered by Translate5.
timestamp: this parameter is not set by translate5, but calculated automatically and delivered from OpenTM2 to translate5.
author: contains the named user which provides the update / new entry
In addition there are the following OpenTM2 Attributes currently not used by translate5:
segmentNumber
additional info
type
To OpenTM2:
{
sourceLang: 'de',
targetLang: 'en',
source: „Das ist das Haus des Nikolaus“,
target: „This is the house of St. Nicholas“,
documentName: 'my file.sdlxliff',
segmentNumber: ,
markupTable: 'translate5',
author: „Thomas Lauria“,
type: '',
timeStamp: '',
context: '123',
addInfo: '',
[loggingThreshold:"2"]
}
The result from the server contains the same data as posted to the server. No additional ID is added, since the entries are identified by the whole source string instead of by an ID; only the timestamp is added.
From OpenTM2 – HTTP 200 OK:
{
sourceLang: 'de',
targetLang: 'en',
source: „Das ist das Haus des Nikolaus“,
target: „This is the house of St. Nicholas“,
documentName: 'my file.sdlxliff',
segmentNumber: 123,
markupTable: 'translate5',
timestamp: '2015-05-12 13:46:12',
author: „Thomas Lauria“
}
Errors:
404 Not Found – if TM file to given [TM_Name] in URL was not found
500 Server Error – for other technical problems.
400 Bad Request – if JSON parameters are missing or are not well formed.
http://opentm2/translationmemory/[TM_Name]/fuzzysearch/
POST – Serves a memory lookup based on the provided search criteria
To OpenTM2:
{
sourceLang: 'de',
targetLang: 'en-US',
source: „Das ist das Haus des Nikolaus“,
documentName: 'my file.sdlxliff', // can be empty
segmentNumber: 123, // can be empty
markupTable: 'translate5', // can be empty
context: „xyz“, // can be empty
[loggingThreshold:"2"]
}
From OpenTM2 HTTP 200 OK:
{
'NumOfFoundProposals': 2,
'results':
[{
source: „Das ist das Haus des Nikolaus“,
target: „This is the house of St. Nicholas“,
sourceLang: 'de', ← rfc5646
targetLang: 'en', ← rfc5646
matchRate: '100',
documentName: 'my file.sdlxliff',
DocumentShortName: 'shortnam.txt',
id: 'identifier',
type: 'Manual',
matchType: 'Exact',
segmentNumber: 123,
markupTable: 'XYZ',
timestamp: '2015-05-12 13:46:12',
author: „Thomas Lauria“,
context: '',
addInfo: ''
},{
source: „Das ist das Haus des Nikolaus“,
target: „This is the house of St. Nicholas“,
sourceLang: 'de', ← rfc5646
targetLang: 'en', ← rfc5646
matchRate: '100',
documentName: 'my file.sdlxliff',
DocumentShortName: 'shortnam.txt',
id: 'identifier',
type: 'Manual',
matchType: 'Exact',
segmentNumber: 123,
markupTable: 'XYZ',
timestamp: '2015-05-12 13:46:12',
author: „Thomas Lauria“,
context: '',
addInfo: ''
}]}
Errors:
400 Bad Request – if search, query or language parameters are missing or are not well formed.
404 Not Found – if TM file to given [TM_Name] in URL was not found
500 Server Error – for other technical problems.
http://opentm2/translationmemory/[TM_Name]/concordancesearch/
POST – Performs a concordance search of the given search string in the proposals contained in a memory. Returns up to numResults proposals per request.
To OpenTM2:
{
searchString: 'Haus des Nikolaus',
searchType: 'source', // values can be source or target
searchPosition: 123, // can be empty; position where the search should start in the memory, see below
numResults: 1,
msSearchAfterNumResults: 100 // number of milliseconds the search continues after the first result is found. All additional results found within this time are also returned, up to numResults. The search aborts as soon as either numResults or msSearchAfterNumResults is reached; in both cases all results found so far are delivered.
[loggingThreshold:"2"]
}
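The numResults / msSearchAfterNumResults abort rule described above can be expressed as a small loop; `search_hits` is a hypothetical iterator over matches, not a t5memory API:

```python
import time

def collect_hits(search_hits, num_results, ms_after_first):
    """Stop when num_results is reached, or when ms_after_first
    milliseconds have passed since the first hit; return what was found."""
    found = []
    deadline = None
    for hit in search_hits:
        found.append(hit)
        if deadline is None:                 # first hit starts the clock
            deadline = time.monotonic() + ms_after_first / 1000.0
        if len(found) >= num_results:        # numResults reached first
            break
        if time.monotonic() >= deadline:     # time budget exhausted first
            break
    return found
```

Either condition terminates the search, and whatever was collected so far is returned.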
From OpenTM2 HTTP 200 OK:
{
NewSearchPosition: '123:54', // returns NULL if the end of the TM is reached, see below
results:[{
source: „Das ist das Haus des Nikolaus“,
target: „This is the house of St. Nicholas“,
sourceLang: 'de', ← rfc5646
targetLang: 'en', ← rfc5646
matchRate: '100',
documentName: 'my file.sdlxliff',
DocumentShortName: 'shortnam.txt',
id: 'identifier',
type: 'Manual',
matchType: 'Exact',
segmentNumber: 123,
markupTable: 'XYZ',
timestamp: '2015-05-12 13:46:12',
author: „Thomas Lauria“,
context: '',
addInfo: ''
},{
source: „Das ist das Haus des Nikolaus“,
target: „This is the house of St. Nicholas“,
sourceLang: 'de', ← rfc5646
targetLang: 'en', ← rfc5646
matchRate: '100',
documentName: 'my file.sdlxliff',
DocumentShortName: 'shortnam.txt',
id: 'identifier',
type: 'Manual',
matchType: 'Exact',
segmentNumber: 123,
markupTable: 'XYZ',
timestamp: '2015-05-12 13:46:12',
author: „Thomas Lauria“,
context: '',
addInfo: ''
}]}
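Paging through concordance results with searchPosition / NewSearchPosition can be sketched like this; `post_concordance` stands in for an HTTP POST to the endpoint and is supplied by the caller:

```python
def search_all(post_concordance, payload):
    """Repeat the search, feeding NewSearchPosition back as
    searchPosition, until the server returns NULL (end of TM)."""
    results = []
    position = None
    while True:
        body = dict(payload)
        if position is not None:
            body["searchPosition"] = position
        response = post_concordance(body)
        results.extend(response.get("results", []))
        position = response.get("NewSearchPosition")
        if position is None:               # NULL means end of TM reached
            break
    return results

# Stub transport returning two pages, to show the loop shape:
pages = [{"results": ["hit1"], "NewSearchPosition": "123:54"},
         {"results": ["hit2"], "NewSearchPosition": None}]
assert search_all(lambda body: pages.pop(0),
                  {"searchString": "Haus des Nikolaus"}) == ["hit1", "hit2"]
```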
Errors:
400 Bad Request – if search, query or language parameters are missing or are not well formed.
404 Not Found – if TM file to given [TM_Name] in URL was not found
500 Server Error – for other technical problems.
Updated for v0.6.75
To configure t5memory, use its command-line flags. To list all flags, start t5memory with the help flag: ./t5memory --help. All flags related to t5memory itself (rather than to its libraries) are in the otmd.cpp section. You can also send a GET request to t5memory_service/flags, which prints all flags with their descriptions and their current and default values. Here are those flags:
Flags from /home/or/workspace/translate5/translate5-tm-service-source/source/otmd.cpp:
-add_premade_socket (if set to true, socket instance would be created
outside of proxygen and then binded, that made possible to add tcp backog
event handler and use socket_backog option) type: bool default: false
currently: true
-allowLoadingMultipleTmsSimultaneously (If set to true, multiple tms could
be loaded from the disk at the same time. ) type: bool default: false
-allowedram (Sets amought RAM(in MB) allowed for service to use)
type: int64 default: 5000
-allowedtmdsize (Sets max size of tmd file(in MB) after which t5m would not
allow to add new data to the tm) type: int64 default: 500
-debug_sleep_in_request_run (If set, provide artificial delay in every
request handling execution equal to provided num of microseconds)
type: int64 default: 0 currently: 10000000
-disable_aslr (If set to true, process personality would be set to
ADDR_NO_RANDOMIZE) type: bool default: false currently: true
-enable_newlines_in_logs ((not working)if set to true, would keep newline
symbols in the logs, otherwise(by default) newlines would be removed and
logs would be oneliners) type: bool default: false
-flush_tm_at_shutdown (If set to true, flushes tm when shutting down the
app not using shutdown request) type: bool default: false
-flush_tm_to_disk_with_every_update (If set to true, flushes tm to disk
with every successfull update request) type: bool default: false
-forbiddeletefiles (Set to true to keep all files(including temporary and
tm)) type: bool default: false
-http_listen_backlog (Sets http options listen backog) type: int64
default: 128 currently: 32
-ignore_newer_target_exists_check (if set to true, check for newer already
saved target would be skipped for saving segments) type: bool
default: true
-keep_tm_backups (if set to true, when saving tmd and tmi files, old copies
would be saved with .old suffix) type: bool default: true
-limit_num_of_active_requests (If set to true, it would be possible to
handle only up to servicethreads-1 requests at the same time, the last
thread would respond with 503 to eliminate creating queue of requests
waiting to be handled.) type: bool default: false
-logMutexes (if set to true you would see mutex logs) type: bool
default: false
-log_every_request_end (Sets log for every request end with it's url,
method etc...) type: bool default: false
-log_every_request_start (Sets log for every request call with it's url,
method etc...) type: bool default: false
-log_memmove_in_compareputdata (if set to true, when saving segment and
causing memmove in compareputdata functions, just before memmove, data
would be logged - use this to debug btree crashes.) type: bool
default: false
-log_tcp_backog_events (if set to true, tcp backlog events would be
logged(to enable, add_premade_socket flag should be set to true))
type: bool default: false currently: true
-port (What port to listen on) type: int32 default: 4080
-servicename (Sets service name to use in url) type: string
default: "t5memory"
-servicethreads (Sets amought of worker threads for service) type: int32
default: 5
-socket_backlog (Sets proxygen socket listen backog(disabled, to enable set
add_premade_socket=true)) type: int64 default: 1024 currently: 32
-t5_ip (Which ip to use in t5memory(default is any). Should be in format
'1.1.1.1', default is to listen to all available ip) type: string
default: ""
-t5loglevel (Sets t5memory log level threshold from DEVELOP(0) to
TRANSACTION(6)) type: int32 default: 2 currently: 3
-timeout (Sets timeout for service request handling) type: int32
default: 180000
-tmListLockDefaultTimeout (Sets tm mutex lock timeout(in ms) for TM
list(which is used to open and close tms, and hold list of opened tms),
after which operation would be canceled and mutex would return an error,
if set to 0, mutex lock would be waited without timeout) type: int64
default: 3000
-tmLockDefaultTimeout (Sets tm mutex lock timeout(in ms) for TM after which
operation would be canceled and mutex would return an error, if set to 0,
mutex lock would be waited without timeout) type: int64 default: 3000
-tmRequestLockDefaultTimeout (Sets tm mutex lock timeout(in ms) for part
where request is requesting tm(which is used to open and close tms, and
hold list of opened tms), after which operation would be canceled and
mutex would return an error, if set to 0, mutex lock would be waited
without timeout) type: int64 default: 3000
-triplesthreshold (Sets threshold to pre fuzzy filtering based on hashes of
neibour tokens) type: int32 default: 5
-useTimedMutexesForReorganizeAndImport (If set to true, in reorganize or
import thread would be used mutexes with timeouts, and reorganizee or
import could be canceled, false(by default) - would be used non timed
mutexes) type: bool default: false
-wait_for_import_and_reorganize_requests (If set to true, waiting for all
import and reorganize processes to be done at shutdown when not using
shutdown request) type: bool default: true
Hints:
- --debug_sleep_in_request_run adds a delay to every request.
- In theory you can restore a TM from only the .tmd file using reorganize, but if an issue occurs during the reorganize, the TM in RAM would be left in an unstable state, so keep the original .tmd backed up anyway.
- You can filter logging with --t5loglevel from 0 to 6. --v can be set to 0 or 2; if set to 0, only errors are logged and the transaction log level is mapped to info.
- keep_tm_backups: when flushing to disk, the older version is kept with a .old suffix; enabled by default.
- triplesthreshold has a big impact on fuzzy search speed, but if you set it too high, some good matches could be filtered out. In old OpenTM2 the value was, I think, 33.
- To test the TCP backlog you can set --add_premade_socket=1 --t5loglevel=4 --v=2 --debug_sleep_in_request_run=10000000 --log_tcp_backog_events=true --log_every_request_end=1 --log_every_request_start=1 --http_listen_backlog=4 --socket_backlog=2
Endpoints overview | default endpoint/example | Is async? |
---|
1 | Get the list of TMs | Returns JSON list of TMs | GET | /%service%/ | /t5memory/ |
|
2 | Create TM | Creates TM with the provided name | POST | /%service%/ | /t5memory/ |
|
3 | Create/Import TM in internal format | Import and unpack base64 encoded archive of .TMD, .TMI, .MEM files. Rename it to provided name | POST | /%service%/ | /t5memory/ |
|
4 | Clone TM Locally | Makes a clone of an existing TM | POST | /%service%/%tm_name%/clone | /t5memory/my+TM/clone (+ is the placeholder for whitespace in the TM name, so there should be 'my TM.TMD' and 'my TM.TMI' (and, in pre-0.5.x, 'my TM.MEM') files on disk). The TM name IS case sensitive in the URL |
|
5 | Reorganize TM | Reorganizing tm(replacing tm with new one and reimporting segments from tmd) - async | GET | /%service%/%tm_name%/reorganize | /t5memory/my+other_tm/reorganize | + in 0.5.x and up |
5 | Delete TM | Deletes .TMD, .TMI files | DELETE | /%service%/%tm_name%/ | /t5memory/%tm_name%/ |
|
6 | Import TMX into TM | Import provided base64 encoded TMX file into TM - async | POST | /%service%/%tm_name%/import | /t5memory/%tm_name%/import | + |
7 | Export TMX from TM | Creates TMX from tm. Encoded in base64 | GET | /%service%/%tm_name%/ | /t5memory/%tm_name%/ |
|
8 | Export in Internal format | Creates and exports archive with .TMD, .TMI files of TM | GET | /%service%/%tm_name%/ | /t5memory/%tm_name%/status |
|
9 | Status of TM | Returns status\import status of TM | GET | /%service%/%tm_name%/status | /t5memory/%tm_name%/status |
|
10 | Fuzzy search | Returns entries\translations with small differences from requested | POST | /%service%/%tm_name%/fuzzysearch | /t5memory/%tm_name%/fuzzysearch |
|
11 | Concordance search | Returns entries\translations that contain requested segment | POST | /%service%/%tm_name%/concordancesearch | /t5memory/%tm_name%/concordancesearch |
|
12 | Entry update | Updates entry\translation | POST | /%service%/%tm_name%/entry | /t5memory/%tm_name%/entry |
|
13 | Entry delete | Deletes entry\translation | POST | /%service%/%tm_name%/entrydelete | /t5memory/%tm_name%/entrydelete |
|
14 | Save all TMs | Flushes all file buffers (TMD, TMI files) into the filesystem | GET | /%service%_service/savetms | /t5memory_service/savetms |
|
15 | Shutdown service | Flushes all filebuffers into the filesystem and shutting down the service | GET | /%service%_service/shutdown | /t5memory_service/shutdown |
|
16 | Test tag replacement call | For testing tag replacement | POST | /%service%_service/tagreplacement | /t5memory_service/tagreplacement |
|
17 | Resources | Returns resources and service data | GET | /%service%_service/resources | /t5memory_service/resources |
|
18 | Import tmx from local file(in removing lookuptable git branch) | Similar to import tmx, but instead of base64 encoded file, use local path to file | POST | /%service%/%tm_name%/importlocal | /t5memory/%tm_name%/importlocal | + |
19 | Mass deletion of entries (from v0.6.0) | Like reorganize, but skipping the import of segments for which the provided filters (combined with logical AND) return true. | POST | /%service%/%tm_name%/entriesdelete | /t5memory/tm1/entriesdelete | + |
20 | New concordance search (from v0.6.0) | An extended concordance search where you can search in different fields of the segment | POST | /%service%/%tm_name%/search | /t5memory/tm1/search |
|
21 | Get segment by internal key | Extracts a segment by its location in the tmd file. | POST | /%service%/%tm_name%/getentry | /t5memory/tm1/getentry |
|
22 | NEW Import tmx | Imports tmx in non-base64 format | POST | /%service%/%tm_name%/importtmx | /t5memory/tm1/importtmx | + |
23 | NEW import in internal format (tm) | Extracts the tm zip attached to the request (it should contain tmd and tmi files) into the MEM folder | POST | /%service%/%tm_name%/ ("multipart/form-data") | /t5memory/tm1/ ("multipart/form-data") |
|
24 | NEW export tmx | Exports the tmx as a file. Can be used to export a selected number of segments starting from a selected position | GET (can have a body) | /%service%/%tm_name%/download.tmx | /t5memory/tm1/download.tmx |
|
25 | NEW export tm (internal format) | Exports tm archive | GET | /%service%/%tm_name%/download.tm | /t5memory/tm1/download.tm |
|
26 | Flush tm | If the TM is open, flushes it to disk (implemented in 0.6.33) | GET | /%service%/%tm_name%/flush | /t5memory/tm1/flush |
|
27 | Flags | Returns all available command-line flags (implemented in 0.6.47). Do not call it too often, because the gflags documentation says it is slow. Useful to collect configuration data about t5memory for debugging. | GET | /%service%_service/flags | /t5memory_service/flags |
|
Available end points
List of TMs |
---|
Purpose | Returns JSON list of TMs |
Request | GET /%service%/ |
Params | - |
Returns the list of open TMs, followed by the list of TMs available on disk (excluding the open ones).
Response example:
{
"Open": [
{
"name": "mem2"
}
],
"Available on disk": [
{
"name": "mem_internal_format"
},
{
"name": "mem1"
},
{
"name": "newBtree3"
},
{
"name": "newBtree3_cloned"
}
]
}
"Open" - the TM is loaded in RAM; "Available on disk" - the TM is not yet loaded from disk |
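As a minimal sketch of consuming this endpoint, the response above can be split into the open and on-disk TM names like this (only the JSON shape is taken from the example above; the helper name is hypothetical):

```python
import json

def parse_tm_list(body: str):
    """Split a GET /%service%/ response into open and on-disk TM names."""
    data = json.loads(body)
    opened = [tm["name"] for tm in data.get("Open", [])]
    on_disk = [tm["name"] for tm in data.get("Available on disk", [])]
    return opened, on_disk

# Using the documented response shape:
body = '{"Open": [{"name": "mem2"}], "Available on disk": [{"name": "mem1"}]}'
print(parse_tm_list(body))  # (['mem2'], ['mem1'])
```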
|
Create TM |
---|
Purpose | Creates a TM with the provided name (.TMD and .TMI files in the /MEM/ folder) |
Request | POST /%service%/%tm_name%/ |
Params | Required: name; plus either sourceLang (new empty TM) or data (import of an archived .tm) |
Code Block |
---|
language | js |
---|
title | Response |
---|
collapse | true |
---|
| Request example
{
  "name": "examle_tm",   // this name is used as the filename for the .TMD and .TMI files
  "sourceLang": "bg-BG", // should match a language in languages.xml
  "data": "base64_encoded_archive_see_import_in_internal_format", // optional
  "loggingThreshold": 0  // optional
}
This endpoint works in two ways: creating a new TM (then sourceLang is required and data can be skipped) or importing an archived .tm (then sourceLang can be skipped, but data is required). It is possible to add memDescription at this stage, but this should be explored further if needed.
Response example:
Success:
{
  "name": "examle_tm"
}
TM already exists:
{
"ReturnValue": 7272,
"ErrorMsg": "::ERROR_MEM_NAME_EXISTS:: TM with this name already exists: examle_tm1; res = 0"
} |
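The rules above (either sourceLang or data, pretty-printed JSON ending with a newline per the "Request Data Format" note in the introduction) can be sketched as a small request-body builder; the helper name is hypothetical:

```python
import json

def build_create_tm_body(name, source_lang=None, data_b64=None):
    """Build the create-TM request body: exactly one of source_lang
    (new empty TM) or data_b64 (base64 .tm archive) must be supplied."""
    if (source_lang is None) == (data_b64 is None):
        raise ValueError("provide exactly one of source_lang or data_b64")
    body = {"name": name}
    if data_b64 is not None:
        body["data"] = data_b64
    else:
        body["sourceLang"] = source_lang
    # The service expects pretty-printed JSON ending with a newline.
    return json.dumps(body, indent=2) + "\n"

print(build_create_tm_body("examle_tm", source_lang="bg-BG"))
```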
|
|
---|
Purpose | Imports and unpacks a base64 encoded archive of .TMD, .TMI, and .MEM (in pre 0.5.x versions) files, and renames it to the provided name |
Request | POST /%service%/ |
Params | { "name": "examle_tm", "sourceLang": "bg-BG" , "data":"base64EncodedArchive" } |
Do not import TMs created in a different version of t5memory. Starting from 0.5.x, .TMD and .TMI files carry the t5memory version they were created with in the file header, and a different minor version (0.5.x) or major version is reported as a version mismatch. Instead, export a TMX in the corresponding version, create a new empty TM, and import the TMX in the new version. This request creates example_tm.TMD (data file) and example_tm.TMI (index file) in the MEM folder. If "data" is provided, no "sourceLang" is required, and vice versa; the base64 data should be a base64 encoded .tm file (which is just an archive containing the .tmd and .tmi files). If no "data" is provided, a new TM is created; "sourceLang" must then be provided and must match a language in languages.xml.
Starting from 0.6.52, import in internal format supports multipart/form-data, so you can send both a file and a json_body. In json_body only the "name" attribute is required (sourceLang is ignored anyway). Send it the same way as a streaming TMX import. The JSON body must be pretty-printed and placed in a part called json_body to be parsed correctly. Code Block |
---|
language | js |
---|
title | Response |
---|
collapse | true |
---|
| Request example:{ "name": "mem_internal_format", "data":"UEsDBBQACAgIAPmrhVQAAAAAAAAAAAAAAAAWAAQAT1RNXy1JRDE3NS0wXzJfNV9iLk1FTQEAAADtzqEKgDAQgOFTEHwNWZ5swrAO0SBys6wfWxFBDILv6uOI2WZQw33lr38GbvRIsm91baSiigzFEjuEb6XHEK\/myX0PXtXsyxS2OazwhLDWeVTaWgEFMMYYY\/9wAlBLBwhEWTaSXAAAAAAAAAAACAAAAAAAAFBLAwQUAAgICAD5q4VUAAAAAAAAAAAAAAAAFgAEAE9UTV8tSUQxNzUtMF8yXzVfYi5UTUQBAAAA7d3Pa5JxHMDxz+Ns09phDAYdPfaDyQqWRcYjS9nGpoYZhBeZMCISW2v2g5o6VkqQONk\/0KVzh4IoKAovnboUo1PHbuuwU8dSn8c9Pk2yTbc53y+R5\/P9fL7P1wf5Ps9zep5vIOy3iMiSiPLn0yPrQ7In+rStTQARi\/bV9chEyHcxGPIKAGDnPonl21SsHNmUYNgfHZ70nnKNDo9ET0dHozFn2L+Ll9uxZPzazPz1mYQAAAAAAAAAAAAAAAAAAAAAAAAAANDtBkXRoj5Zk7OqSFZ9q35Vn6khNa6W2wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAdBKbKHK4Em1omT5DxV6J7FrmkKFypBKt9FczvYaKtr+2DLpiqPTWVayGiq2uYjFUpC7VI6aElN8F8JPn\/QEAAAAAAAAAAAAAAAAAAAAAAAAAAAAA2ANW7U0Ag9Iv60MnT4j8uLBZ\/X5+7dxn1ztX6Uy5AgAAAAAAAAAAAAAAAAAAgA6nL1qFjmc1rAO2IwNN9bL9u4ulVUeEfcQqQAfxSNtltshZaytB7jalZZ2a5KhFGT3Qr\/ztv1pkzAnP1v06+F7UxL22tRzSNf6aFq08MdoiY078\/znmkTZo5Qm2YdoOSLSyDdbaVUop\/Cj3cDm14I6\/uqf++nDUN1u4lS+k9MbKXL4QK72+775U+phOpp8sucdK728X5nK5hVT+weJqbTiHjMiNzWG1yNxWvI8rvxZ9cTfycj71NH1nsZgbf54uJlKryWy6GFlueBT6xHrzJRupDqkPXc9eyyduJmbLkf6\/mlYRDgQDPtO++3\/uYvsazANfYHx68vLEsSvOKedxqa\/hAGowD4Jh\/1X\/dH1X5sEBZpoH6E6\/AVBLBwj3gRyzjAIAAAAAAAAAAAEAAAAAAFBLAwQUAAgICAD5q4VUAAAAAAAAAAAAAAAAFgAEAE9UTV8tSUQxNzUtMF8yXzVfYi5UTUkBAAAA7d3PS9NhHMDxz\/Y1nbp0zfw2Vw6CEjooJkkFPs9DZZaFCiIRHRxKoJUIFXk06iB0kS5Fvw6dhDp28FDgOSqiIKQ\/ICQMhIIuYVnJt2f7eK2M2Ps1xp49b8Y+fP6ArXegJy4iV0RiPx6BNAXyT6ysrKhXlLZ49PwlkKP9hw\/19XcKAOD3PZX42+PDP0+JWN9AT765u3P33vbm1nxbvj0\/3DLQ0y3r5uClsZGhC2eGxgUAAAAAAAAAAAAAAAAAAAAAAAAAgFKXllh0ahQbLHeInDb3Xc6NWrF77Jibcr22zC2YY6bVLNoX5qp97Pa5SbPc8ci8sqHpd1k7a2+ZN+6eFQAAAAAAAAAAAAAAAAAAAAAAAAAAAAD4YxISk8bVUyq6eVa905dtqtxO3fBlqyqnkrW+ZFVZCGp8aVDl9ZeELxlVjhRNsEWVa+UffAlVuf78rC\/1eoK20JfNqnzt3OhLnSp1DZW+bFJl\/467vqRUuVxV5UutKts\/JX2pUWUyXvie9OopE5U7QWEHSfWZXdmPvlSr8i75xJcqVT7fPOdLpSqj5+t9Sahy8UBhOxWqLEph6nJVHhZNvUFPXbS3MlXyYWFvgSon3xf2FldlpGiCmCoPiiYQVbLR3or\/ZT0t
S04AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMC6K4t+ZSAtOWkKQpOSeTfnZty0m3CDrsu1uNB9swv2pZ21IlN23J6w1uZsuV0y82bOzJhpM2EGTZdpMaERAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPjrUmteK0RypXifid5n1tyX6j7+9\/vvUEsHCGo104BhAgAAAAAAAAAAAQAAAAAAUEsBAgAAFAAICAgA912FVERZNpJcAAAAAAgAABYABAAAAAAAAAAAALSBAAAAAE9UTV8tSUQxNzUtMF8yXzVfYi5NRU0BAAAAUEsBAgAAFAAICAgA\/F2FVPeBHLOMAgAAAAABABYABAAAAAAAAAAAALSBrAAAAE9UTV8tSUQxNzUtMF8yXzVfYi5UTUQBAAAAUEsBAgAAFAAICAgA\/F2FVGo104BhAgAAAAABABYABAAAAAAAAAAAALSBiAMAAE9UTV8tSUQxNzUtMF8yXzVfYi5UTUkBAAAAUEsGBiwAAAAAAAAAHgMtAAAAAAAAAAAAAwAAAAAAAAADAAAAAAAAANgAAAAAAAAAOQYAAAAAAABQSwYHAAAAABEHAAAAAAAAAQAAAFBLBQYAAAAAAwADANgAAAA5BgAAAAA=" }
Response example:{
"name": "examle_tm"
}
TM already exists:
{
"ReturnValue": 65535,
"ErrorMsg": ""
} |
|
Clone TM localy |
---|
Purpose | Creates a local clone of the TM under the provided name |
Request | POST /%service%/%tm_name%/clone |
Params | Required: newName |
The endpoint is synchronous (blocking). Code Block |
---|
language | js |
---|
title | Response |
---|
collapse | true |
---|
| Request example
{
  "newName": "examle_tm" // the cloned TM is renamed to this name (the source TM is in the URL)
}
Response example:
Success:
{
"msg": "newBtree3_cloned2 was cloned successfully",
"time": "5 ms"
}
Failure:
{
"ReturnValue": -1,
"ErrorMsg": "'dstTmdPath' = /home/or/.t5memory/MEM/newBtree3_cloned.TMD already exists; for request for mem newBtree3; with body = {\n \"newName\": \"newBtree3_cloned\"\n}"
} |
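A minimal sketch of driving this endpoint (helper names are hypothetical; the payload and response shapes come from the examples above):

```python
import json

def build_clone_body(new_name: str) -> str:
    # The source TM is addressed in the URL (POST /%service%/%tm_name%/clone);
    # the body only carries the new name for the clone.
    return json.dumps({"newName": new_name}, indent=2) + "\n"

def clone_succeeded(resp: dict) -> bool:
    # Success responses carry "msg"/"time"; failures carry "ReturnValue"/"ErrorMsg".
    return "ErrorMsg" not in resp

print(clone_succeeded({"msg": "newBtree3_cloned2 was cloned successfully", "time": "5 ms"}))  # True
```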
|
Flush TM |
---|
Purpose | If the TM is open, flushes it to disk |
Request | GET /%service%/%tm_name%/flush |
Params |
|
The endpoint is synchronous (blocking). If the TM is not found on disk, it returns 404. If the TM is not open, it returns 400 with a message. Otherwise t5memory requests the write pointer to the TM (so it waits until other requests working with the TM have finished) and then flushes it to disk. It can also return an error if flushing runs into an issue. It does not open the TM if it is not open yet; instead it returns an error. Code Block |
---|
language | js |
---|
title | Response |
---|
collapse | true |
---|
| Response example:
Success: {
"msg": "Mem test1 was flushed to the disk successfully"
}
Failure:
{
"ReturnValue": -1,
"ErrorMsg": "FlushMemRequestData::checkData -> tm is not found"
}// or
{
"ReturnValue": -1,
"ErrorMsg": "FlushMemRequestData::checkData -> tm is not open"
} |
|
Delete TM |
---|
Purpose | Deletes the .TMD, .TMI, and .MEM files |
Request | DELETE /%service%/%tm_name%/ |
Params | - |
Code Block |
---|
language | js |
---|
title | Response |
---|
collapse | true |
---|
| Response example:
success:
{
"newBtree3_cloned2": "deleted"
}
|
Code Block |
---|
language | js |
---|
title | Response |
---|
collapse | true |
---|
| Response example:
failed:
{
"newBtree3_cloned2": "not found"
} |
|
Import provided base64 encoded TMX file into TM |
---|
Purpose | Imports a provided base64 encoded TMX file into the TM. Starts another thread for the import. For checking the import status use the status call |
Request | POST /%service%/%tm_name%/import |
Params | {"tmxData": "base64EncodedTmxFile" } - additional:
"framingTags": "saveAll" - default behaviour, do nothing; "skipAll" - skip all enclosing tags, including standalone tags; "skipPaired" - skip only paired enclosing tags
|
The TM must exist. The call is async, so check progress via the status endpoint, as with reorganize in 0.5.x and up. If the framing tag situation is the same in source and target, both sides are treated as described above. If framing tags exist only in the source, they are still treated as described above. If they exist only in the target, nothing is removed. Code Block |
---|
language | js |
---|
title | Response |
---|
collapse | true |
---|
| Request example:
{
  "framingTags": "skipAll", // optional; one of "skipAll", "skipPaired", "saveAll"
"tmxData": "PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0idXRmLTgiPz4KPHRteCB2ZXJzaW9uPSIxLjQiPgogIDxoZWFkZXIgY3JlYXRpb250b29sPSJTREwgTGFuZ3VhZ2UgUGxhdGZvcm0iIGNyZWF0aW9udG9vbHZlcnNpb249IjguMCIgby10bWY9IlNETCBUTTggRm9ybWF0IiBkYXRhdHlwZT0ieG1sIiBzZWd0eXBlPSJzZW50ZW5jZSIgYWRtaW5sYW5nPSJlbi1HQiIgc3JjbGFuZz0iYmctQkciIGNyZWF0aW9uZGF0ZT0iMjAxNTA4MjFUMDkyNjE0WiIgY3JlYXRpb25pZD0idGVzdCIvPgogIDxib2R5PgoJPHR1IGNyZWF0aW9uZGF0ZT0iMjAxODAyMTZUMTU1MTA1WiIgY3JlYXRpb25pZD0iREVTS1RPUC1SNTlCT0tCXFBDMiIgY2hhbmdlZGF0ZT0iMjAxODAyMTZUMTU1MTA4WiIgY2hhbmdlaWQ9IkRFU0tUT1AtUjU5Qk9LQlxQQzIiIGxhc3R1c2FnZWRhdGU9IjIwMTgwMjE2VDE2MTMwNVoiIHVzYWdlY291bnQ9IjEiPgogICAgICA8dHV2IHhtbDpsYW5nPSJiZy1CRyI+CiAgICAgICAgPHNlZz5UaGUgPHBoIC8+IGVuZDwvc2VnPgogICAgICA8L3R1dj4KICAgICAgPHR1diB4bWw6bGFuZz0iZW4tR0IiPgogICAgICAgIDxzZWc+RXRoIDxwaCAvPiBkbmU8L3NlZz4KICAgICAgPC90dXY+CiAgICA8L3R1PgogIDwvYm9keT4KPC90bXg+Cg=="
}
Response example:
Error in case of error. From v0_2_15:
{ "%tm_name%": "" } in case of success
Check the status of the import using the status call.
TMX import can be interrupted by invalid XML or by the TM reaching its limit. In both cases check the status request for info about the position in the TMX file where the import was interrupted. |
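Building the import body boils down to base64-encoding the TMX text, as sketched below (the helper name is hypothetical; the field names and framingTags values come from the parameters above):

```python
import base64
import json

def build_tmx_import_body(tmx_text: str, framing_tags: str = "saveAll") -> str:
    """Build the POST /%service%/%tm_name%/import body.
    framing_tags is one of "saveAll" (default), "skipAll", "skipPaired"."""
    encoded = base64.b64encode(tmx_text.encode("utf-8")).decode("ascii")
    return json.dumps({"framingTags": framing_tags, "tmxData": encoded}, indent=2) + "\n"

tmx = '<?xml version="1.0" encoding="utf-8"?><tmx version="1.4"><body/></tmx>'
body = build_tmx_import_body(tmx, "skipAll")
# Round-trip check: decoding tmxData yields the original TMX text.
assert base64.b64decode(json.loads(body)["tmxData"]).decode("utf-8") == tmx
```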
|
|
---|
Purpose | Imports and unpacks a base64 encoded archive of .TMD, .TMI, and .MEM (in pre 0.5.x versions) files, and renames it to the provided name |
Request | POST /%service%/ |
Params | { "name": "examle_tm", "sourceLang": "bg-BG" , "data":"base64EncodedArchive" } or, alternatively, the data can be provided in non-base64 binary format as a file attached to the request |
| curl -X POST \
  -H "Content-Type: application/json" \
  -F "file=@/path/to/12434615271d732fvd7te3.gz;filename=myfile.tg" \
  -F "json_data={\"name\": \"TM name\", \"sourceLang\": \"en-GB\"}" \
  http://t5memory:4045/t5memory |
Do not import TMs created in a different version of t5memory. Starting from 0.5.x, .TMD and .TMI files carry the t5memory version they were created with in the file header, and a different minor version (0.5.x) or major version is reported as a version mismatch. Instead, export a TMX in the corresponding version, create a new empty TM, and import the TMX in the new version. This request creates example_tm.TMD (data file) and example_tm.TMI (index file) in the MEM folder. If "data" is provided, no "sourceLang" is required, and vice versa; the base64 data should be a base64 encoded .tm file (which is just an archive containing the .tmd and .tmi files). If no "data" is provided, a new TM is created; "sourceLang" must then be provided and must match a language in languages.xml. In 0.6.20 and up the data can be sent as an attachment instead of base64 encoded. The Content-Type must then be set to "multipart/form-data" and the JSON (with the name of the new TM) must be provided under the json_data key (the part is found this way: part.headers.at("Content-Disposition").find("name=\"json_data\"")). curl command example:
curl -X POST \
  -H "Content-Type: application/json" \
  -F "file=@/path/to/12434615271d732fvd7te3.tm;filename=myfile.tm" \
  -F "json_data={\"name\": \"TM name\", \"sourceLang\": \"en-GB\"}" \
  http://t5memory:4045/t5memory
Response example:
{
  "name": "examle_tm"
}
Code Block |
---|
language | js |
---|
title | Response |
---|
collapse | true |
---|
| Request example:{ "name": "mem_internal_format", "data":"UEsDBBQACAgIAPmrhVQAAAAAAAAAAAAAAAAWAAQAT1RNXy1JRDE3NS0wXzJfNV9iLk1FTQEAAADtzqEKgDAQgOFTEHwNWZ5swrAO0SBys6wfWxFBDILv6uOI2WZQw33lr38GbvRIsm91baSiigzFEjuEb6XHEK\/myX0PXtXsyxS2OazwhLDWeVTaWgEFMMYYY\/9wAlBLBwhEWTaSXAAAAAAAAAAACAAAAAAAAFBLAwQUAAgICAD5q4VUAAAAAAAAAAAAAAAAFgAEAE9UTV8tSUQxNzUtMF8yXzVfYi5UTUQBAAAA7d3Pa5JxHMDxz+Ns09phDAYdPfaDyQqWRcYjS9nGpoYZhBeZMCISW2v2g5o6VkqQONk\/0KVzh4IoKAovnboUo1PHbuuwU8dSn8c9Pk2yTbc53y+R5\/P9fL7P1wf5Ps9zep5vIOy3iMiSiPLn0yPrQ7In+rStTQARi\/bV9chEyHcxGPIKAGDnPonl21SsHNmUYNgfHZ70nnKNDo9ET0dHozFn2L+Ll9uxZPzazPz1mYQAAAAAAAAAAAAAAAAAAAAAAAAAANDtBkXRoj5Zk7OqSFZ9q35Vn6khNa6W2wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAdBKbKHK4Em1omT5DxV6J7FrmkKFypBKt9FczvYaKtr+2DLpiqPTWVayGiq2uYjFUpC7VI6aElN8F8JPn\/QEAAAAAAAAAAAAAAAAAAAAAAAAAAAAA2ANW7U0Ag9Iv60MnT4j8uLBZ\/X5+7dxn1ztX6Uy5AgAAAAAAAAAAAAAAAAAAgA6nL1qFjmc1rAO2IwNN9bL9u4ulVUeEfcQqQAfxSNtltshZaytB7jalZZ2a5KhFGT3Qr\/ztv1pkzAnP1v06+F7UxL22tRzSNf6aFq08MdoiY078\/znmkTZo5Qm2YdoOSLSyDdbaVUop\/Cj3cDm14I6\/uqf++nDUN1u4lS+k9MbKXL4QK72+775U+phOpp8sucdK728X5nK5hVT+weJqbTiHjMiNzWG1yNxWvI8rvxZ9cTfycj71NH1nsZgbf54uJlKryWy6GFlueBT6xHrzJRupDqkPXc9eyyduJmbLkf6\/mlYRDgQDPtO++3\/uYvsazANfYHx68vLEsSvOKedxqa\/hAGowD4Jh\/1X\/dH1X5sEBZpoH6E6\/AVBLBwj3gRyzjAIAAAAAAAAAAAEAAAAAAFBLAwQUAAgICAD5q4VUAAAAAAAAAAAAAAAAFgAEAE9UTV8tSUQxNzUtMF8yXzVfYi5UTUkBAAAA7d3PS9NhHMDxz\/Y1nbp0zfw2Vw6CEjooJkkFPs9DZZaFCiIRHRxKoJUIFXk06iB0kS5Fvw6dhDp28FDgOSqiIKQ\/ICQMhIIuYVnJt2f7eK2M2Ps1xp49b8Y+fP6ArXegJy4iV0RiPx6BNAXyT6ysrKhXlLZ49PwlkKP9hw\/19XcKAOD3PZX42+PDP0+JWN9AT765u3P33vbm1nxbvj0\/3DLQ0y3r5uClsZGhC2eGxgUAAAAAAAAAAAAAAAAAAAAAAAAAgFKXllh0ahQbLHeInDb3Xc6NWrF77Jibcr22zC2YY6bVLNoX5qp97Pa5SbPc8ci8sqHpd1k7a2+ZN+6eFQAAAAAAAAAAAAAAAAAAAAAAAAAAAAD4YxISk8bVUyq6eVa905dtqtxO3fBlqyqnkrW+ZFVZCGp8aVDl9ZeELxlVjhRNsEWVa+UffAlVuf78rC\/1eoK20JfNqnzt3OhLnSp1DZW+bFJl\/467vqRUuVxV5UutKts\/JX2pUWUyXvie9OopE5U7QWEHSfWZXdmPvlSr8i75xJcqVT7fPOdLpSqj5+t9Sahy8UBhOxWqLEph6nJVHhZNvUFPXbS3MlXyYWFvgSon3xf2FldlpGiCmCoPiiYQVbLR3or\/ZT0t
S04AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMC6K4t+ZSAtOWkKQpOSeTfnZty0m3CDrsu1uNB9swv2pZ21IlN23J6w1uZsuV0y82bOzJhpM2EGTZdpMaERAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPjrUmteK0RypXifid5n1tyX6j7+9\/vvUEsHCGo104BhAgAAAAAAAAAAAQAAAAAAUEsBAgAAFAAICAgA912FVERZNpJcAAAAAAgAABYABAAAAAAAAAAAALSBAAAAAE9UTV8tSUQxNzUtMF8yXzVfYi5NRU0BAAAAUEsBAgAAFAAICAgA\/F2FVPeBHLOMAgAAAAABABYABAAAAAAAAAAAALSBrAAAAE9UTV8tSUQxNzUtMF8yXzVfYi5UTUQBAAAAUEsBAgAAFAAICAgA\/F2FVGo104BhAgAAAAABABYABAAAAAAAAAAAALSBiAMAAE9UTV8tSUQxNzUtMF8yXzVfYi5UTUkBAAAAUEsGBiwAAAAAAAAAHgMtAAAAAAAAAAAAAwAAAAAAAAADAAAAAAAAANgAAAAAAAAAOQYAAAAAAABQSwYHAAAAABEHAAAAAAAAAQAAAFBLBQYAAAAAAwADANgAAAA5BgAAAAA=" }
// you can skip "data" if you send the file as an attachment, but then set the Content-Type to multipart/form-data and send the JSON under the json_body key
//
TM already exists:
{
"ReturnValue": 65535,
"ErrorMsg": ""
} |
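Since no specific client is prescribed here, the following is a hedged sketch of assembling the multipart/form-data body by hand with the standard library. The part names "file" and "json_data" come from the text above; the boundary handling is standard multipart, and the helper name is hypothetical:

```python
import io
import json
import uuid

def build_internal_import_multipart(tm_bytes: bytes, tm_name: str):
    """Assemble a multipart/form-data body with a 'file' part (.tm archive)
    and a 'json_data' part, as described above. Returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    json_part = json.dumps({"name": tm_name}, indent=2) + "\n"  # pretty JSON, as required
    buf = io.BytesIO()

    def write(chunk):
        buf.write(chunk if isinstance(chunk, bytes) else chunk.encode("utf-8"))

    write(f"--{boundary}\r\n")
    write('Content-Disposition: form-data; name="file"; filename="myfile.tm"\r\n\r\n')
    write(tm_bytes)
    write(f"\r\n--{boundary}\r\n")
    write('Content-Disposition: form-data; name="json_data"\r\n\r\n')
    write(json_part)
    write(f"\r\n--{boundary}--\r\n")
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"

body, content_type = build_internal_import_multipart(b"PK\x03\x04...", "examle_tm")
assert b'name="json_data"' in body and b"examle_tm" in body
```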
|
|
Testing TCP backlog options |
---|
related to | issue T5TMS-281 |
The most up-to-date version for this ticket is 0.6.75, which adds new flags and functionality to manipulate the TCP stack:
--http_listen_backlog - default was 1024, in 0.6.75 it is 128. Supposed to set the TCP backlog for the proxygen server, but in practice it seems to be just a hint, because requests over that limit are not dropped, except by timeout.
--add_premade_socket - creates a socket and binds it to the proxygen server, instead of just providing an IP address to the server and letting it open the socket internally. Must be set to true to enable the log_tcp_backog_events and socket_backlog flags.
--log_tcp_backog_events - if set to true, allows testing the TCP backlog; for that it is also recommended to set --v=2 --t5loglevel=4. Requires add_premade_socket to be set to true. You then see the TCP backlog behaviour in the logs.
--socket_backlog - similar to http_listen_backlog, but for the socket. Also requires add_premade_socket to be set to true.
--limit_num_of_active_requests - limits the number of requests that can be handled at the same time: only n-1 of the n created worker threads may execute simultaneously; the last one responds with a 503 error and a message that the service is busy. It makes sense to experiment with the number of worker threads and measure performance, for example running the service with 32 threads on 8 cores; the service would then handle 31 requests properly and respond to the 32nd with an error.
--debug_sleep_in_request_run - sleeps n microseconds (1/1000000 s) in every request to artificially slow requests down.
To test the behaviour of the TCP backlog you can use the attached Python script via the command: python3 sendNrequests4.py -n 40. This sends 40 requests to the default local t5memory address; feel free to edit the script if needed. To test the TCP backlog you can set --add_premade_socket=1 --t5loglevel=4 --v=2 --debug_sleep_in_request_run=10000000 --log_tcp_backog_events=true --log_every_request_end=1 --log_every_request_start=1 --http_listen_backlog=4 --socket_backlog=2 and other flags as you wish. This makes every request at least 10 seconds longer; every TCP backlog action is logged, as well as the start and end of each request handler execution; proxygen's HTTP TCP backlog is set to 4 (or set it to some other value) and the socket's backlog to 2. add_premade_socket is required both to set the socket's backlog and to get the TCP backlog event logs.
Another approach is to set the Docker container's environment, but this also seems to be just a hint and can be ignored by the OS. In docker-compose.yaml:
myt5m:
  image: translate5/t5memory:0.6.75
  sysctls:
    net.core.somaxconn: 1
    net.ipv4.tcp_max_syn_backlog: 1
    net.ipv4.tcp_abort_on_overflow: 1
  ports:
    - '4086:4086'
Code Block |
---|
language | py |
---|
title | sendNRequests.py |
---|
collapse | true |
---|
| import asyncio
import aiohttp
import argparse
import time
import traceback

async def fetch(session, url, request_id):
    try:
        async with session.get(url, timeout=60) as response:
            text = await response.text()
            if response.status != 200:
                print(f"Request {request_id}: Error with status {response.status}. Response:")
                print(text)
            else:
                print(f"Request {request_id}: Success with status {response.status}")
            return response.status, text
    except Exception as e:
        print(f"Request {request_id}: Exception occurred: {e}")
        traceback.print_exc()  # Print the full traceback for the exception
        return e  # Return the exception for further handling

async def main(num_requests, url, delay):
    async with aiohttp.ClientSession() as session:
        tasks = []
        for i in range(num_requests):
            tasks.append(asyncio.create_task(fetch(session, url, i)))
            if delay > 0:
                await asyncio.sleep(delay)
        results = await asyncio.gather(*tasks, return_exceptions=True)
    success_count = 0
    failure_count = 0
    for idx, result in enumerate(results):
        if isinstance(result, Exception):
            failure_count += 1
            print(f"Request {idx} raised an exception: {result}")
        else:
            status, text = result
            if status is None or status != 200:
                failure_count += 1
                print(f"Request {idx}: Failed. Status: {status}. Response: {text}")
            else:
                success_count += 1
    print(f"\nTotal successes: {success_count}")
    print(f"Total failures: {failure_count}")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Send multiple HTTP GET requests concurrently with an optional delay between requests"
    )
    parser.add_argument("-n", "--num_requests", type=int, default=200,
                        help="Number of parallel requests to send (default: 200)")
    parser.add_argument("-u", "--url", type=str, default="http://127.0.0.1:4080/t5memory",
                        help="URL to send requests to (default: http://127.0.0.1:4080/t5memory)")
    parser.add_argument("-d", "--delay", type=float, default=0.1,
                        help="Delay in seconds between starting each request (default: 0.1)")
    args = parser.parse_args()
    asyncio.run(main(args.num_requests, args.url, args.delay))
|
|