Overview and API introduction
This document describes the REST interface of the translate5 TM service.
The translate5 TM service is built on the OpenTM2 Translation Memory Engine.
It provides the following functionality:
- import new OpenTM2 TMs
- delete OpenTM2 TMs
- create a new empty OpenTM2 TM
- import TMX
- open TM and close TM: not possible, see the extra section in this document. A trigger to flush the TM to disk may be needed, but it could also be done in some specific cases.
- query a TM for matches: one query per TM, no querying of multiple TMs at once
- query a TM for concordance search
- extract a segment by its location
- save a new entry to the TM
- delete an entry from the TM
- locally clone a TM
- reorganize a TM
- get some statistics about the service
- test the tag replacement mechanism via the tagreplacement endpoint
This is achieved by the following specification of a RESTful HTTP service. The specification is given in the following form:
- URL of the HTTP resource, where the server name and an optional path prefix are configurable
- HTTP method with the affected functionality
- Brief description
- Sent and returned body
Request Data Format:
The data transferred in requests is JSON, sent directly in the request body. It should be pretty-printed JSON ending with a '\n}' sequence, because of a bug in proxygen that caused garbage to follow otherwise valid data.
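The pretty-printing requirement above can be wrapped in a tiny helper. This is an illustrative sketch only; the helper name is ours, not part of the service:

```python
import json

def make_request_body(payload):
    """Serialize a payload the way t5memory expects it: pretty-printed
    JSON whose last characters are a newline followed by '}' (see the
    proxygen note above). json.dumps(..., indent=2) already terminates
    a non-empty object that way."""
    return json.dumps(payload, indent=2)

body = make_request_body({"name": "example_tm", "sourceLang": "bg-BG"})
```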
URL Format:
In this document, the OpenTM2 is always assumed under http://opentm2/.
To rely on full networking features (proxying etc.) the URL is configurable in Translate5 so that the OpenTM2 instance can also reside under http://xyz/foo/bar/.
Errors
For each resource, the possible errors are listed below. In case of an error, the body should contain at least the following JSON; where it makes sense, the attributes of the original representation can be added.
{
  "errors": [{"errorMsg": "Given tmxData is no TMX."}]
}
Values | |
---|---|
%service% | Name of the service (default: t5memory; can be changed in the t5memory.conf file) |
%tm_name% | Name of the Translation Memory |
Example | http://localhost:4040/t5memory/example_tm/fuzzysearch/? |
# | Endpoints overview | Description | Method | Default endpoint | Example | Is async?
---|---|---|---|---|---|---|
1 | Get the list of TMs | Returns a JSON list of TMs | GET | /%service%/ | /t5memory/ | |
2 | Create TM | Creates a TM with the provided name | POST | /%service%/ | /t5memory/ | |
3 | Create/Import TM in internal format | Imports and unpacks a base64 encoded archive of .TMD, .TMI, .MEM files and renames it to the provided name | POST | /%service%/ | /t5memory/ | |
4 | Clone TM locally | Makes a clone of an existing TM | POST | /%service%/%tm_name%/clone | /t5memory/my+TM/clone ('+' is a placeholder for whitespace in the TM name, so 'my TM.TMD' and 'my TM.TMI' (and, in pre-0.5.x, 'my TM.MEM') files must exist on disk). The TM name IS case sensitive in the URL | |
5 | Reorganize TM | Reorganizes the TM (replaces it with a new one and reimports the segments from the .TMD file) - async | GET | /%service%/%tm_name%/reorganize | /t5memory/my+other_tm/reorganize | + in 0.5.x and up |
6 | Delete TM | Deletes the .TMD, .TMI files | DELETE | /%service%/%tm_name%/ | /t5memory/%tm_name%/ | |
7 | Import TMX into TM | Imports a provided base64 encoded TMX file into the TM - async | POST | /%service%/%tm_name%/import | /t5memory/%tm_name%/import | + |
8 | Export TMX from TM | Creates a TMX from the TM, encoded in base64 | GET | /%service%/%tm_name%/ | /t5memory/%tm_name%/ | |
9 | Export in internal format | Creates and exports an archive with the .TMD, .TMI files of the TM | GET | /%service%/%tm_name%/ | /t5memory/%tm_name%/ | |
10 | Status of TM | Returns the status/import status of the TM | GET | /%service%/%tm_name%/status | /t5memory/%tm_name%/status | |
11 | Fuzzy search | Returns entries\translations with small differences from the requested one | POST | /%service%/%tm_name%/fuzzysearch | /t5memory/%tm_name%/fuzzysearch | |
12 | Concordance search | Returns entries\translations that contain the requested segment | POST | /%service%/%tm_name%/concordancesearch | /t5memory/%tm_name%/concordancesearch | |
13 | Entry update | Updates an entry\translation | POST | /%service%/%tm_name%/entry | /t5memory/%tm_name%/entry | |
14 | Entry delete | Deletes an entry\translation | POST | /%service%/%tm_name%/entrydelete | /t5memory/%tm_name%/entrydelete | |
15 | Save all TMs | Flushes all file buffers (TMD, TMI files) to the filesystem | GET | /%service%_service/savetms | /t5memory_service/savetms | |
16 | Shutdown service | Flushes all file buffers to the filesystem and shuts the service down | GET | /%service%_service/shutdown | /t5memory_service/shutdown | |
17 | Test tag replacement call | For testing tag replacement | POST | /%service%_service/tagreplacement | /t5memory_service/tagreplacement | |
18 | Resources | Returns resources and service data | GET | /%service%_service/resources | /t5memory_service/resources | |
19 | Import TMX from local file (in the "removing lookuptable" git branch) | Similar to import TMX, but uses a local file path instead of a base64 encoded file | POST | /%service%/%tm_name%/importlocal | /t5memory/%tm_name%/importlocal | + |
20 | Mass deletion of entries (from v0.6.0) | Like reorganize, but skips the import of segments that match the provided filters (combined with logical AND) | POST | /%service%/%tm_name%/entriesdelete | /t5memory/tm1/entriesdelete | + |
21 | New concordance search (from v0.6.0) | Extended concordance search that can search in different fields of the segment | POST | /%service%/%tm_name%/search | /t5memory/tm1/search | |
22 | Get segment by internal key | Extracts a segment by its location in the .TMD file | POST | /%service%/%tm_name%/getentry | /t5memory/tm1/getentry | |
23 | NEW import TMX | Imports a TMX in non-base64 format | POST | /%service%/%tm_name%/importtmx | /t5memory/tm1/importtmx | + |
24 | NEW import in internal format (tm) | Extracts the TM zip attached to the request (it should contain the .TMD and .TMI files) into the MEM folder | POST | /%service%/%tm_name%/ | /t5memory/tm1/ ("multipart/form-data") | |
25 | NEW export TMX | Exports the TMX as a file. Can be used to export a selected number of segments starting from a selected position | GET (can have a body) | /%service%/%tm_name%/download.tmx | /t5memory/tm1/download.tmx | |
26 | NEW export TM (internal format) | Exports the TM archive | GET | /%service%/%tm_name%/download.tm | /t5memory/tm1/download.tm | |
27 | Flush TM | If the TM is open, flushes it to disk (implemented in 0.6.33) | GET | /%service%/%tm_name%/flush | /t5memory/tm1/flush | |
28 | Flags | Returns all available command line flags (implemented in 0.6.47). Do not call it too often: the gflags documentation notes that it is slow. Useful to collect t5memory configuration data for debugging. | GET | /%service%_service/flags | /t5memory_service/flags |
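As an illustration of the URL scheme above (including the '+' placeholder for whitespace in TM names), a hypothetical URL builder might look like this:

```python
def tm_url(host, service, tm_name=None, action=""):
    """Build a t5memory endpoint URL.

    Whitespace in the TM name is written as '+' (see the clone example
    in the table above); the name is otherwise passed through unchanged,
    since it is case sensitive in the URL."""
    url = "%s/%s/" % (host, service)
    if tm_name is not None:
        url += tm_name.replace(" ", "+") + "/"
        if action:
            url += action
    return url
```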
Available endpoints
List of TMs | |
---|---|
Purpose | Returns JSON list of TMs |
Request | GET /%service%/ |
Params | - |
Returns the list of open TMs followed by the list of available (not open) TMs in the app. |
Create TM | |
---|---|
Purpose | Creates a TM with the provided name (.TMD and .TMI files in the /MEM/ folder) |
Request | POST /%service%/%tm_name%/ |
Params | Required: name, sourceLang |
Create/Import TM in internal format | |
---|---|
Purpose | Imports and unpacks a base64 encoded archive of .TMD, .TMI, .MEM (in pre-0.5.x versions) files and renames it to the provided name |
Request | POST /%service%/ |
Params | { "name": "examle_tm", "sourceLang": "bg-BG", "data": "base64EncodedArchive" } Alternatively, the data can be provided in non-base64 binary format, as a file attached to the request (see the curl example below) |
Do not import TMs created in another version of t5memory. Starting from 0.5.x, the .TMD and .TMI files carry in the file header the t5memory version they were created with, and a TM from a different minor (0.5.x) or major version is treated as incompatible. This request would create example_tm.TMD (data file) and example_tm.TMI (index file) in the MEM folder. Starting from 0.6.52, import in internal format supports multipart/form-data, so you can send both the file and a JSON body; in the JSON body only the "name" attribute is required (sourceLang is ignored anyway). Send it the same way as the streaming TMX import: the JSON body must be pretty-printed and placed in a part called json_data to be parsed correctly, for example: curl -X POST -F "file=@/path/to/12434615271d732fvd7te3.gz;filename=myfile.tg" -F "json_data={\"name\": \"TM name\", \"sourceLang\": \"en-GB\"}" http://t5memory:4045/t5memory |
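For the base64 variant of this call, a payload builder can be sketched as follows. The function name is ours; only the name, sourceLang and data fields come from the spec above:

```python
import base64
import json

def internal_import_body(tm_name, source_lang, archive_bytes):
    """Body for POST /%service%/ (create/import TM in internal format):
    the .TMD/.TMI archive goes base64 encoded into the 'data' field."""
    payload = {
        "name": tm_name,
        "sourceLang": source_lang,
        "data": base64.b64encode(archive_bytes).decode("ascii"),
    }
    # pretty-printed, so the body ends with '\n}' as the service expects
    return json.dumps(payload, indent=2)
```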
Clone TM locally | |
---|---|
Purpose | Creates a copy of the TM under the provided name |
Request | POST /%service%/%tm_name%/clone |
Params | Required: name, sourceLang |
The endpoint is sync (blocking) |
Flush TM | |
---|---|
Purpose | If the TM is open, flushes it to disk |
Request | GET /%service%/%tm_name%/flush |
Params | - |
The endpoint is sync (blocking). If the TM is not found on disk, 404 is returned. If the TM is not open, 400 with a message is returned; the call would not open a TM that is not already open, but returns an error instead. Otherwise t5memory requests a write pointer to the TM (so it waits until other requests working with the TM have finished) and then flushes it to disk. An error can also be returned if the flushing itself fails. |
Delete TM | |
---|---|
Purpose | Deletes .TMD, .TMI, .MEM files |
Request | DELETE /%service%/%tm_name%/ |
Params | - |
Import provided base64 encoded TMX file into TM | |
---|---|
Purpose | Imports the provided base64 encoded TMX file into the TM. Starts another thread for the import; use the status call to check the import status |
Request | POST /%service%/%tm_name%/import |
Params | {"tmxData": "base64EncodedTmxFile" }
|
The TM must exist. Handling when the framing tag situation differs between source and target (for skipAll or skipPairedIf): if the framing tag situation is the same in source and target, both sides are treated as described above. If framing tags exist only in the source, they are still treated as described above. If they exist only in the target, nothing is removed. |
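A minimal sketch of building the /import body (the tmxData field name is from the spec above; the helper itself is hypothetical):

```python
import base64
import json

def tmx_import_body(tmx_bytes):
    """Body for POST /%service%/%tm_name%/import: the TMX file is sent
    base64 encoded in the 'tmxData' field. The call is asynchronous, so
    the result has to be checked afterwards via the status call."""
    return json.dumps(
        {"tmxData": base64.b64encode(tmx_bytes).decode("ascii")},
        indent=2,
    )
```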
Import binary TMX file into TM | |
---|---|
Purpose | Imports the provided TMX file into the TM. Starts another thread for the import; use the status call to check the import status |
Request | POST /%service%/%tm_name%/importtmx |
Params | The request has a file attached and, optionally, a body. Implemented in 0.6.19. curl -X POST -F "file=@/path/to/12434615271d732fvd7te3.tmx;filename=myfile.tmx" -F "json_data={\"framingTags\": \"value\", \"timeout\": 1500}" http://t5memory:4045/t5memory/{memory_name}/importtmx The body should be provided in the multipart form under the json_data key: { ["framingTags": "saveAll"], // framing tags behaviour ["timeout": 100] // timeout in seconds after which the import stops, even if it has not reached the end of the TMX yet }
|
The TM must exist. The TMX import can be interrupted by invalid XML, by the TM reaching its limit, or by the timeout. In all of these cases, check the status request for information about the position in the TMX file where the import was interrupted. Handling when the framing tag situation differs between source and target (for skipAll or skipPairedIf): if the framing tag situation is the same in source and target, both sides are treated as described above. If framing tags exist only in the source, they are still treated as described above. If they exist only in the target, nothing is removed. |
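The multipart layout described above (a file part plus a pretty-printed json_data part) can be sketched with the standard library only. The part name json_data and the option fields follow the curl example; everything else, including the helper name and boundary handling, is an assumption:

```python
import json
import uuid

def build_importtmx_multipart(tmx_bytes, framing_tags="saveAll", timeout_s=1500):
    """Build headers and body for POST /%service%/%tm_name%/importtmx.

    The TMX goes in a 'file' part; the options JSON goes, pretty-printed,
    in a part named 'json_data' (the server locates the part by that name
    in its Content-Disposition header)."""
    boundary = uuid.uuid4().hex
    json_part = json.dumps({"framingTags": framing_tags, "timeout": timeout_s}, indent=2)
    body = (
        (
            "--%s\r\n" % boundary
            + 'Content-Disposition: form-data; name="file"; filename="import.tmx"\r\n'
            + "Content-Type: application/xml\r\n\r\n"
        ).encode()
        + tmx_bytes
        + (
            "\r\n--%s\r\n" % boundary
            + 'Content-Disposition: form-data; name="json_data"\r\n\r\n'
            + json_part
            + "\r\n--%s--\r\n" % boundary
        ).encode()
    )
    headers = {"Content-Type": "multipart/form-data; boundary=%s" % boundary}
    return headers, body
```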
Reorganize TM | |
---|---|
Purpose | Reorganizes the TM and fixes issues. |
Request | GET /%service%/%tm_name%/reorganize |
Headers | Accept - application/xml |
Up to v0.4.x reorganize is synchronous. During reorganize, t5memory checks the required condition for each segment and, if it holds, passes the segment to the putProposal function, which is also used by the UpdateRequest and ImportTmx requests.
Export TMX from TM - old | |
---|---|
Purpose | Creates TMX from tm. |
Request | GET /%service%/%tm_name%/ |
Headers | Accept - application/xml |
|
Export TMX from TM | |
---|---|
Purpose | Exports a TMX from the TM. |
Request | GET /%service%/%tm_name%/download.tmx |
Headers | Accept - application/xml |
curl | curl --location --request GET 'http://localhost:4040/t5memory/{MEMORY_NAME}/download.tmx' \ --header 'Accept: application/xml' \ --header 'Content-Type: application/json' \ --data '{"startFromInternalKey": "7:1", "limit": 20}' |
The request can have a body with these fields: startFromInternalKey - in "recordKey:targetKey" format, sets the starting point for the export; loggingThreshold - as in other requests. In the response headers you get NextInternalKey: 19:1 if a next item exists in the memory, otherwise the same key you sent; this lets you repeat the call with the new starting position. If no body is provided, the export runs from the beginning (key 7:1) to the end. This endpoint flushes the TM before execution. |
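The NextInternalKey paging contract above can be sketched as a loop. Here `fetch` is a hypothetical stand-in for the actual HTTP call:

```python
def export_tmx_pages(fetch, start_key="7:1", limit=20):
    """Page through GET .../download.tmx using the NextInternalKey header.

    `fetch` stands in for the HTTP call: it takes (start_key, limit) and
    returns (chunk, next_key). Per the docs, the service echoes back the
    key we sent when there is no further item, which ends the loop."""
    key, chunks = start_key, []
    while True:
        chunk, next_key = fetch(key, limit)
        chunks.append(chunk)
        if next_key == key:  # no next item in the memory
            break
        key = next_key
    return chunks
```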
Export in internal format | |
---|---|
Purpose | Creates and exports archive with .TMD, .TMI files of TM |
Request | GET /%service%/%tm_name%/download.tm |
Headers | Accept - application/zip |
Returns an archive (.tm file) consisting of the .TMD and .TMI files |
Export in internal format - OLD | |
---|---|
Purpose | Creates and exports archive with .TMD, .TMI, .MEM files of TM |
Request | GET /%service%/%tm_name%/ |
Headers | Accept - application/zip |
Returns an archive (.tm file) consisting of the .TMD and .TMI files |
Get the status of TM | |
---|---|
Request | GET /%service%/%tm_name%/status |
Params | - |
Returns the status of the TM. It can be 'not found'; 'available' if the TM is on disk but not yet loaded into RAM; or 'open', with additional info. If there has been at least one attempt to import a TMX or reorganize the TM since it was loaded into RAM, additional fields appear and stay in the statistics until the memory is unloaded. |
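Since import and reorganize are asynchronous, clients typically poll this status call. A sketch, where get_status is a stand-in for the HTTP call and the 'import' marker is an assumption (the exact field values vary between versions):

```python
import time

def wait_for_import(get_status, poll_s=1.0, max_tries=60):
    """Poll GET .../status until an async import/reorganize finishes.

    `get_status` stands in for the HTTP call and returns the parsed
    status JSON. The "status" == "import" check below is an assumed
    marker for a still-running import, not a documented constant."""
    for _ in range(max_tries):
        status = get_status()
        if status.get("status") != "import":  # assumed "still importing" marker
            return status
        time.sleep(poll_s)
    raise TimeoutError("import did not finish in time")
```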
Fuzzy search | |
---|---|
Purpose | Returns entries\translations with small differences from the requested one |
Request | POST /%service%/%tm_name%/fuzzysearch |
Params | Required: source, sourceLang, targetLang iNumOfProposal - limit of found proposals - max is 20, if 0 → use default value '5' |
New Concordance search | |
---|---|
Purpose | Returns entries\translations that fit the selected filters. |
Request | POST /%service%/%tm_name%/search |
Params | Required: NONE. iNumOfProposal - limit of found proposals - max is 200, if 0 → use default value '5' |
The search goes segment by segment, checking whether each segment fits the selected filters. You can search for EXACT or CONCORDANCE matches in the filter fields. It is possible to apply a filter with just a SearchMode: if you send "authorSearchMode": "exact" without an "author" field, it looks for segments where the author field is empty. For "timestampSpanStart": "20000121T115234Z" and "timestampSpanEnd", both parameters must be set to apply the filter, otherwise an error is returned; check the output to see how the span was parsed and applied. With "logicalOr": 1 the filters are combined with OR instead of AND. It is also possible to not return the segments but only count them, returning the counter as "NumOfFoundSegments": 22741. Language filters ("sourceLang": "en-GB", "targetLang") can be applied with the major-language feature: the source language in this case is applied as an exact filter, while the target language only checks whether the languages are in the same language group. That check is done via the languages.xml file using the isPreferred flag. The applied options are reported back as "GlobalSearchOptions": "SEARCH_FILTERS_LOGICAL_OR|SEARCH_EXACT_MATCH_OF_SRC_LANG_OPT, lang = en-GB|SEARCH_GROUP_MATCH_OF_TRG_LANG_OPT, lang = de". You can also send "searchPosition": "8:1" - the position where the search starts internally in the btree. The search is limited by the number of found segments (numResults, max 200) or by a timeout (msSearchAfterNumResults); the timeout is ignored if no segments in the TM fit the parameters. |
Here is a search request with all possible parameters: { "source": "the", "sourceSearchMode": "CONTAINS, CASEINSENSETIVE, WHITESPACETOLERANT, INVERTED", "target": "", "targetSearchMode": "EXACT, CASEINSENSETIVE", "document": "evo3_p1137_reports_translation_properties_de_fr_20220720_094902", "documentSearchMode": "CONTAINS, INVERTED", "author": "some author", "timestampSpanStart": "20000121T115234Z", "timestampSpanEnd": "20240121T115234Z", "addInfo": "some add info", "addInfoSearchMode": "CONCORDANCE, WHITESPACETOLERANT", "context": "context context", "contextSearchMode": "EXACT", "sourceLang": "en-GB", "targetLang": "SV", "searchPosition": "8:1", "numResults": 2, "msSearchAfterNumResults": 25 } |
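A small helper can enforce the timestamp-span rule above when assembling a /search body. The validation rule comes from the spec; the helper itself is ours:

```python
import json

def build_search_body(**filters):
    """Assemble a body for POST /%service%/%tm_name%/search.

    Text filters come in pairs: the field itself plus '<field>SearchMode'
    with comma-separated options. The spec requires timestampSpanStart
    and timestampSpanEnd to be set together, which is validated here."""
    if ("timestampSpanStart" in filters) != ("timestampSpanEnd" in filters):
        raise ValueError("timestampSpanStart and timestampSpanEnd must both be set")
    return json.dumps(filters, indent=2)
```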
Concordance search | |
---|---|
Purpose | Returns entries\translations that contain requested segment |
Request | POST /%service%/%tm_name%/concordancesearch |
Params | Required: searchString - what we are looking for , searchType ["Source"|"Target"|"SourceAndTarget"] - where to look iNumOfProposal - limit of found proposals - max is 20, if 0 → use default value '5' |
Get entry | |
---|---|
Purpose | Returns the entry located at [recordKey:targetKey], or an error if that location is empty |
Request | POST /%service%/%tm_name%/getentry |
Params | Required: recordKey - position in the .TMD file, starting from 7 (the first 6 are service records); targetKey - position within the record, starting from 1. Implemented in 0.6.24 |
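The internal key format used here (and by the export calls) can be parsed with a small helper. The helper is hypothetical, but the lower bounds of 7 and 1 come from the spec above:

```python
def parse_internal_key(key):
    """Split an internal key like '7:1' into (recordKey, targetKey).

    Record keys start at 7 (records 1-6 are service records) and target
    keys start at 1, so anything below that is not an entry position."""
    record, target = key.split(":")
    record, target = int(record), int(target)
    if record < 7 or target < 1:
        raise ValueError("not a valid entry position: %s" % key)
    return record, target
```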
Update entry | |
---|---|
Purpose | Updates entry\translation |
Request | POST /%service%/%tm_name%/entry |
Params | Only sourceLang, targetLang, source and target are required |
This request makes changes only in the file buffer (files on disk are not changed) |
Delete entry | |
---|---|
Purpose | Deletes entry\translation |
Request | POST /%service%/%tm_name%/entrydelete |
Params | There are 2 ways - by id or regular. 1) By id - 3 integers must be provided: recordKey, targetKey and segmentId. After a deletion the .TMD file is rearranged, which is why segmentId, a pseudo-unique key, is needed. It is generated during TMX import, or when inserting a segment without providing an id; but if an id is provided in an update call, or during reorganize when the segment's id is not 0, that id is used instead of generating a new one. If the id does not match, t5memory does not delete the segment. If the keys are provided, other provided fields are ignored; all 3 keys are required to delete a segment. If the segment is deleted, its fields are returned in the response. (recordKey and targetKey together form the internal key, in [recordKey:targetKey] format - e.g. 7:1 for the first segment.) 2) Regular - old. Only sourceLang, targetLang, source and target are required. Deletion is based on a strict match (including tags and whitespace) of target and source |
This request makes changes only in the file buffer (files on disk are not changed) |
Delete entries / mass deletion | |
---|---|
Purpose | Deletes entries\translation |
Request | POST /%service%/%tm_name%/entriesdelete |
Params | This starts a reorganize process that, like reorganize, removes bad segments, and additionally skips segments that match the provided filters combined with logical AND. So if you provide timestamps and addInfo, only segments within the provided timestamp span and with that addInfo are not imported into the new TM (see the reorganize process). |
Save all TMs | |
---|---|
Purpose | Flushes all file buffers (TMD, TMI files) to the filesystem and resets their 'Modified' flags. A file buffer is an instance of a .TMD or .TMI file loaded into RAM; it provides better speed and safety when working with the files. |
Request | GET /%service%_service/savetms |
Params | - |
Shutdown service | |
---|---|
Purpose | Safely shuts down the service, with or without saving all loaded TM files to disk |
Request | GET /%service%_service/shutdown?dontsave=1 |
Params | dontsave=1 (optional, in the address) - skips saving the TMs; for now only its presence matters, not its value |
When saving TMs before closing, the service checks whether an import process is still running |
Test tag replacement call | |
---|---|
Purpose | Tests the tag replacement mechanism |
Request | POST /%service%_service/tagreplacement |
Params | Required: src, trg, Optional: req |
Configuration of service
You can configure the service in ~/.t5memory/t5memory.conf (obsolete - use command line flags instead)
Logging | ||
---|---|---|
Level | Mnemonic | Description |
0 | DEVELOP | Can make the code work really slowly; should be used only when debugging specific places in the code, like binary search in files, etc. |
1 | DEBUG | Logs values of variables. Temporary files (in the MEM and TMP subdirectories), like base64 encoded/decoded TMX files and archives for import/export, are not deleted |
2 | INFO | Logs top-level function entrances, return codes, etc. Default value. |
3 | WARNING | Logs when commented-out or hardcoded code is reached. Usually commented code here has been replaced with new code; if not, it is marked at ERROR level |
4 | ERROR | Errors: why and where something fails during parsing, search, etc. |
5 | FATAL | You should not reach this code; something is really wrong. Other values are ignored. The set level stays the same until you change it in a new request or close the app. Logs are written to a file named by date/time under ~/.OtmMemoryService/Logs, and ERROR/FATAL messages are duplicated in another log file with a FATAL suffix |
6 | TRANSACTION | Logs only things like begin/end of a request, etc. There is no reason to set the level this high |
Logging can impact application speed significantly, especially during import or export. t5memory has 2 logging systems: one from the glog library, configured at launch via command line parameters, and an internal one that filters logs by their level. The internal level can be set with every request that has a JSON body by adding a ["loggingThreshold": 0] parameter (e.g. in the body of POST http://localhost:4040/t5memory/example_tm/), or at startup with a flag. It could previously also be set in a line of the t5memory.conf file (the config file is obsolete now) |
Working directory | |
---|---|
Path | Description |
~/.t5memory | The main directory of the service; it should always be under the home directory. Consists of nested folders and the t5memory.conf file (see Config file). All directories\files below are nested in it |
LOG | Includes log files; it should be cleaned up manually. One session (launch of the service) creates two files: 'Log_Thu May 12 10:15:48 2022 .log' and 'Log_Thu May 12 10:15:48 2022 .log_IMPORTANT' |
MEM | Main data directory. All TM files are stored here. One TM consists of .TMD (data file), .TMI (index file) and .MEM (properties file) with the same name as the TM name |
TABLE | The service's reserved read-only folder with tag tables, languages, etc. |
TEMP | For temporary files created mainly for import\export. On low debug levels (DEVELOP, DEBUG) it must be cleaned manually |
t5memory.conf | Main config file (see Config file) |
Config directory should be located in a specific place |
Config file - obsolete - use commandline flags instead | ||
---|---|---|
field | default | Description |
name | t5memory | name of service that we use under %service% in address |
port | 8080 | service port |
timeout | 3600 | service timeout |
threads | 1 | |
logLevel | 2 | logLevel - > see logging |
AllowedRAM_MB | 1500 | Ram limit to operate openning\closing TM(see Openning and closing TM) Doesn't include services RAM |
TriplesThreshold | 33 | Level of pre-fuzzy search filtering based on combinations of triples of tokens(excluding tags). Could impact fuzzy search perfomance. For higher values service is faster, but could skip some segments in result. Not always corelated with resulted fuzzyRate |
The config file should be located under ~/.t5memory/t5memory.conf. All fields have default values, so the service can start without the conf file. Configs are read and applied only once, at service start. Once the service has started, you can see the effective values in the logs.
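To illustrate the TriplesThreshold setting above: a pre-filter of this kind can be sketched as counting the token triples two segments share and discarding candidates below the threshold. This is a minimal illustrative sketch; the function and variable names are assumptions, not the real t5memory code.

```cpp
#include <set>
#include <string>
#include <tuple>
#include <vector>

using Triple = std::tuple<std::string, std::string, std::string>;

// Collect all consecutive token triples of a segment.
std::set<Triple> tokenTriples(const std::vector<std::string>& tokens) {
    std::set<Triple> triples;
    for (size_t i = 0; i + 2 < tokens.size(); ++i)
        triples.insert({tokens[i], tokens[i + 1], tokens[i + 2]});
    return triples;
}

// Pre-filter: keep a candidate only if it shares at least
// thresholdPercent% of the query's triples. Higher thresholds are
// faster but may skip some true fuzzy matches.
bool passesTriplesFilter(const std::vector<std::string>& query,
                         const std::vector<std::string>& candidate,
                         int thresholdPercent) {
    const auto q = tokenTriples(query);
    const auto c = tokenTriples(candidate);
    if (q.empty()) return true;  // segment too short to filter
    size_t shared = 0;
    for (const auto& t : q) shared += c.count(t);
    return shared * 100 >= q.size() * static_cast<size_t>(thresholdPercent);
}
```

This also shows why the filter is "not always correlated with the resulting fuzzyRate": triple overlap is a coarse proxy for edit distance.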
Conceptual information
Opening and closing TM | |
---|---|
In the first concept it was planned to implement routines to open and close a TM. While working on the concept we found some problems with this approach:
This leads to the following conclusion for the implementation of opening and closing TMs: OpenTM2 has to load a requested TM automatically on demand, and close a TM after it has not been used for some time. That means OpenTM2 has to track the timestamp of each TM's last request.
http://opentm2/translationmemory/[TM_Name]/openHandle GET – Opens a memory for queries by OpenTM2. Note: this method is not required, as memories are opened automatically when they are accessed for the first time. http://opentm2/translationmemory/[TM_Name]/openHandle DELETE – Closes a memory for queries by OpenTM2. Note: this method is not required, as memories are closed automatically after they have not been used for some time.
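The conclusion above (load on first access, close after idle time) can be sketched with a simple last-access map. This is an illustrative sketch, not the actual OpenTM2/t5memory implementation; all names are assumptions.

```cpp
#include <chrono>
#include <map>
#include <string>

using Clock = std::chrono::steady_clock;

// Tracks last-access timestamps so TMs can be loaded on demand and
// closed after an idle period.
class TMTracker {
    std::map<std::string, Clock::time_point> lastAccess_;
public:
    // Called on every request: loads the TM if needed and refreshes
    // its timestamp. Returns whether the TM was already open.
    bool touch(const std::string& tmName) {
        bool wasOpen = lastAccess_.count(tmName) > 0;
        lastAccess_[tmName] = Clock::now();  // load-on-demand + refresh
        return wasOpen;
    }
    // Called periodically: closes TMs idle for longer than maxIdle.
    // Returns how many TMs were closed.
    size_t closeIdle(Clock::duration maxIdle) {
        size_t closed = 0;
        const auto now = Clock::now();
        for (auto it = lastAccess_.begin(); it != lastAccess_.end();) {
            if (now - it->second > maxIdle) {
                it = lastAccess_.erase(it);
                ++closed;
            } else {
                ++it;
            }
        }
        return closed;
    }
    size_t openCount() const { return lastAccess_.size(); }
};
```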
Multithreading | |
---|---|
In 0.6.44, multithreading is implemented as follows:
{ "tmMutexTimeout": 5000, "tmListMutexTimeout": 4000, "requestTMMutexTimeout": 15000, ... } | |
Mutexes and request handling details:
} else {
    // a request that doesn't require a lock - generally one without a tmName in the URL, plus the status request
    res = execute();
}
}
return res;
{
    TimedMutexGuard l{mutex_access_tms, tmListTimeout, "tmListMutex"}; // lock the TM list
    if (tmListTimeout.failed()) {
        returnError(".Failed to lock tm list:");
        return false;
    }
}
// If a TM mutex fails in some nested function, tmListTimeout is marked as spoiled, so every other mutex that uses that timeout fails after the first failure. The executing boolean function then returns false, but an additional check whether the mutex was spoiled is needed to find out if the function returned false because it did not find (in this case) the TM in the list, or because its timeout was spoiled. All these checks are in place in the code for now.
bool TMManager::IsMemoryFailedToLoad(strMemName, tmListTimeout) {
    TimedMutexGuard l{mutex_access_tms, tmListTimeout, "tmListMutex"}; // lock the TM list
    if (tmListTimeout.failed()) {
        tmListTimeout.addToErrMsg(".Failed to lock tm list:");
        return false;
    }
    bool res = false;
    if (IsMemoryInList(strMemName, tmListTimeout)) {
        res = true;
    }
    if (tmListTimeout.failed()) { // if the timeout was spoiled, errMessage is extended each time, so the final output message contains a backtrace of functions and lines in the file
        tmListTimeout.addToErrMsg(".Failed to lock tm list:");
        return false;
    }
    return res;
}
The TM list can be used not only when requesting a TM, but also, for example, for a resource request, to free some space, or to flush TMs during shutdown. The Load call is outside of the mutex_requestTM mutex, so it is not blocked in the current version; opening (loading) of the TM files happens outside of that lock.
Q: So an active mutex on the TM list would still block every request, also to other TMs, right? How long would it be blocked, for example, if I import a new TMX into one TM - the whole time the TMX is imported? And the same question for update?
A: No, it would not be blocked the whole time for other TMs. Blocking a TM is necessary because of the big chunk of low-level code inherited from OpenTM2, which operates on pointers into RAM for a long time.
TM files structure and other related info | ||
---|---|---|
The info below applies to version 0_5_x. A TM file is just an archive with the tmi and tmd files. |
NUMBER PROTECTION TAGS (NP TAG, t5:n) | ||
---|---|---|
The NP feature is also implemented in the tagReplacer, but it takes a separate branch in the code: on import it just saves the original id, r and n attributes without generating new ones, and for fuzzy requests it just outputs the original data without searching for a matching tag in src and trg. NP tags therefore influence the ID generation for the other tags (or the matching, if it is a trg segment). "Press the encodedRegex, power button to turn on <bpt id="501" rid="1"/>text<ept rid="1"/>"
Tag replacement
This is the pseudo code that was used as a discussion base for finding the right algorithm for the implementation. It was not implemented exactly like this, but its logic should be valid and can be used to understand what should be going on.
Pseudocode for tag replacement in import call:
TAG_REPLACEMENT PSEUDOCODE
struct TagInfo
{
bool fPairTagClosed = true; // false for bpt tag - waiting for matching ept tag. If we'll find matching tag -> we'll set this to true
bool fTagAlreadyUsedInTarget = false; // would be set to true if we would already use this tag as matching for target
// this we generate to save in TM. this would be saved as <{generated_tagType} [x={generated_x}] [i={generated_i}]/>.
// we would skip x attribute for generated_tagType=EPT_ELEMENT and i for generated_tagType=PH_ELEMENT
int generated_i = -1; // for pair tags - generated identifier to find matching tag. the same as in original_i if it's not binded to other tag in segment
int generated_x = -1; // id of tag. should match original_x, if it's not occupied by other tags
TagType generated_tagType = UNKNOWN_ELEMENT; // replaced tagType, could be only PH_ELEMENT, BPT_ELEMENT, EPT_ELEMENT
// this cant be generated, only saved from provided data
int original_i = -1; // original paired tags i
int original_x = -1; // original id of tag
TagType original_tagType = UNKNOWN_ELEMENT; // original tagType, could be any tag
};
TagType could be one of the values in enum:
[
BPT_ELEMENT EPT_ELEMENT G_ELEMENT HI_ELEMENT SUB_ELEMENT BX_ELEMENT EX_ELEMENT
//standalone tags
BEGIN_STANDALONE_TAGS PH_ELEMENT X_ELEMENT IT_ELEMENT UT_ELEMENT
]
we use 3 lists of tags
SOURCE_TAGS
TARGET_TAGS
REQUEST_TAGS
as "id" we understand whichever of the following attributes is present in the original tag: 'x', 'id'
as "i" we understand whichever of the following attributes is present in the original tag: 'i', 'rid'
all single tags are treated as ph_tag
all opening pair tags are treated as bpt_tag
all closing pair tags are treated as ept_tag
-1 means that the value was not found/not used/not provided, etc.
for ept tags in generated_id we would use generated_id from matching bpt tag
if matching bpt tag is not found -> ???
TAG REPLACEMENT USE CASES {
IMPORT{
SOURCE_SEGMENT{
<single tags> -> would be saved as <ph>{ // for ph and all single tags
if(type == "lb"){
replace with newline
}else{
generate next generated_id incrementally
ignore content and attributes(except id) if provided
set generated_tagType to PH_ELEMENT
save original_tagType for matching
if id provided -> save as original_id for matching
save tag to SOURCE_TAGS
}
}
<opening pair tags> -> would be saved as <bpt>{
original type is <bpt>{
generate generated_i incrementally in source segment
generate generated_id incrementally
set generated_tagType to BPT_ELEMENT
save original_i (should that always be provided??)
save original_id if provided (should that always be provided??)
set fPairTagClosed to false; // it would be set to true if we would use this tag as matching
set original_type as BPT_ELEMENT
save tag to SOURCE_TAGS
}
original type is <bx>{
generate generated_i incrementally in source segment
generate generated_id incrementally
set generated_tagType to BPT_ELEMENT
save original_i (should that always be provided??)
save original_id if provided (should that always be provided??)
set fPairTagClosed to false; // it would be set to true if we would use this tag as matching
set original_type as BX_ELEMENT
save tag to SOURCE_TAGS
}
original type is other opening pair tags (like <g>){
generate generated_i incrementally in source segment
generate generated_id incrementally
set generated_tagType to BPT_ELEMENT
set fPairTagClosed to false; // it would be set to true if we would use this tag as matching
save tag type as original_tagType;
save tag to SOURCE_TAGS
}
}
<closing pair tags> -> would be saved as <ept>{
original type is <ept>{
search for matching bpt_tag in saved tags
//should we look in reverse order?
looking in SOURCE_TAGS for matchingTag which have [
matchingTag.fPairTagClosed == false
AND matchingTag.generated_tagType == BPT_ELEMENT //all OPENING PAIR TAGs always has BPT_ELEMENT here
AND matchingTag.original_tagType == BPT_ELEMENT
AND matchingTag.original_i == our_ept_tag.original_i
]
if found
set matchingTag.fPairTagClosed to true to eliminate matching one opening tag for different closing tags
set our_ept_tag.i to matchingTag.i
set our_ept_tag.id to matchingTag.id
else
generate next our_ept_tag.generated_i incrementally in source segment // in every segment(target, source, request) i starts from 1
generate next our_ept_tag.generated_id incrementally // should be unique across target, source and request segments
save tag in SOURCE_TAGS
}
original type is <ex>{
search for matching bpt_tag in saved tags
//should we look in reverse order?
looking in SOURCE_TAGS for matchingTag which have [
matchingTag.fPairTagClosed == false
AND matchingTag.generated_tagType == BPT_ELEMENT //all OPENING PAIR TAGs has BPT_ELEMENT here
AND matchingTag.original_tagType == BX_ELEMENT
AND matchingTag.original_i == our_ept_tag.original_i
]
if found
set matchingTag.fPairTagClosed to true to eliminate matching one opening tag for different closing tags
set our_ept_tag.i to matchingTag.i
set our_ept_tag.id to matchingTag.id
else
generate next our_ept_tag.generated_i incrementally in source segment // in every segment(target, source, request) i starts from 1
generate next our_ept_tag.generated_id incrementally // should be unique across target, source and request segments
save tag in SOURCE_TAGS
}
original type is other closing pair tags (like </g>){
search for matching bpt_tag in saved tags:
looking in SOURCE_TAGS in REVERSE for matchingTag which have
[ matchingTag.fPairTagClosed == false
AND matchingTag.generated_tagType == BPT_ELEMENT //OPENING_PAIR_TAG
AND matchingTag.original_tagType == our_tag.original_tagType
]
if found
set matchingTag.fPairTagClosed to true to eliminate matching one opening tag for different closing tags
set our_tag.generated_i to matchingTag.i
set our_tag.generated_id to matchingTag.id
else
generate next our_tag.generated_i incrementally in source segment // in every segment(target, source, request) i starts from 1
generate next our_tag.generated_id incrementally // should be unique across target, source and request segments
save tag in SOURCE_TAGS
}
}
}
TARGET_SEGMENT{
<single tags> -> would be saved as <ph>{ // for ph and all single tags
if(type == "lb"){
replace with newline
}else{
ignore content and attributes(except id) if provided
save original_tagType for matching
if id provided -> save as original_id for matching
search for matching ph_tag in saved tags
looking in SOURCE_TAGS for matchingTag which have [
matchingTag.fTagAlreadyUsedInTarget == false
AND matchingTag.generated_tagType == PH_ELEMENT //SINGLE TAG
AND matchingTag.original_tagType == our_ph_tag.original_tagType
AND matchingTag.original_id == our_ph_tag.original_id
]
if found
set matchingTag.fTagAlreadyUsedInTarget = true
set our_ph_tag.generated_id = matchingTag.generated_id // use id generated for source segment
else
generate new our_ph_tag.generated_id incrementally(should be unique for SOURCE and TARGET)
save tag in TARGET_TAGS // we should track only opening pair tags in target, so theoretically can skip this step
}
}
<opening tags> -> would be saved as <bpt>{
original type is <bpt>{
set generated_tagType to BPT_ELEMENT
save original_i (should that always be provided??)
save original_id if provided (should that always be provided??)
set fPairTagClosed to false; // it would be set to true if we would use this tag as matching
set original_type as BPT_ELEMENT
try to find matching source tag to get generated id:
looking in SOURCE_TAGS for matchingTag which have [
matchingTag.fTagAlreadyUsedInTarget == false
AND matchingTag.generated_tagType == BPT_ELEMENT //all OPENING PAIR TAGs always has BPT_ELEMENT here
AND matchingTag.original_tagType == BPT_ELEMENT
AND matchingTag.original_id == our_bpt_tag.original_id
]
if found:
set matchingTag.fTagAlreadyUsedInTarget to true
generate our_bpt_tag.generated_i incrementally in target segment
set our_bpt_tag.generated_id to matchingTag.generated_id
else:
generate our_bpt_tag.generated_i incrementally // unique between all segments
generate our_bpt_tag.generated_id incrementally // unique between all segments
save tag in TARGET_TAGS
}
original type is <bx>{
set generated_tagType to BPT_ELEMENT
save original_i (should that always be provided??)
save original_id if provided (should that always be provided??)
set fPairTagClosed to false; // it would be set to true if we would use this tag as matching
set original_type as BX_ELEMENT
try to find matching source tag to get generated id:
looking in SOURCE_TAGS for matchingTag which have [
matchingTag.fTagAlreadyUsedInTarget == false
AND matchingTag.generated_tagType == BPT_ELEMENT //all OPENING PAIR TAGs always has BPT_ELEMENT here
AND matchingTag.original_tagType == BX_ELEMENT
AND matchingTag.original_id == our_bpt_tag.original_id
]
if found:
set matchingTag.fTagAlreadyUsedInTarget to true
generate our_bpt_tag.generated_i incrementally in target segment
set our_bpt_tag.generated_id to matchingTag.generated_id
else:
generate our_bpt_tag.generated_i incrementally // unique between all segments
generate our_bpt_tag.generated_id incrementally // unique between all segments
save tag in TARGET_TAGS
}
original type is other opening pair tags (like <g>){
set generated_tagType to BPT_ELEMENT
we never have here original i attribute
save original_id if provided (should that always be provided??)
set fPairTagClosed to false; // it would be set to true if we would use this tag as matching
save original_type
try to find matching source tag to get generated id:
looking in SOURCE_TAGS for matchingTag which have [
matchingTag.fTagAlreadyUsedInTarget == false
AND matchingTag.generated_tagType == BPT_ELEMENT //all OPENING PAIR TAGs always has BPT_ELEMENT here
AND matchingTag.original_tagType == our_tag.original_tagType
AND matchingTag.original_id == our_tag.original_id
]
if found:
set matchingTag.fTagAlreadyUsedInTarget to true
generate our_tag.generated_i incrementally in target segment
set our_tag.generated_id to matchingTag.generated_id
else:
generate our_tag.generated_i incrementally // unique between all segments
generate our_tag.generated_id incrementally // unique between all segments
save tag in TARGET_TAGS
}
}
<closing tags> -> would be saved as <ept>{
original type is <ept>{
try to find matching bpt tag in TARGET_TAGS
looking in TARGET_TAGS for matchingTag which have [
matchingTag.fPairTagClosed == false
AND matchingTag.generated_tagType == BPT_ELEMENT //all OPENING PAIR TAGs always has BPT_ELEMENT here
AND matchingTag.original_tagType == BPT_ELEMENT
AND matchingTag.original_i == our_tag.original_i
]
if found:
set matchingTag.fPairTagClosed to true
set our_tag.generated_id to matchingTag.generated_id
set our_tag.generated_i to matchingTag.generated_i
else:
generate our_tag.generated_i incrementally // unique between all segments
generate our_tag.generated_id incrementally // unique between all segments
save tag in TARGET_TAGS // we should track only opening pair tags in target, so theoretically can skip this step
}
original type is <ex>{
try to find matching bpt tag in TARGET_TAGS
looking in TARGET_TAGS for matchingTag which have [
matchingTag.fPairTagClosed == false
AND matchingTag.generated_tagType == BPT_ELEMENT //all OPENING PAIR TAGs always has BPT_ELEMENT here
AND matchingTag.original_tagType == BX_ELEMENT
AND matchingTag.original_i == our_tag.original_i
]
if found:
set matchingTag.fPairTagClosed to true
set our_tag.generated_id to matchingTag.generated_id
set our_tag.generated_i to matchingTag.generated_i
else:
generate our_tag.generated_i incrementally // unique between all segments
generate our_tag.generated_id incrementally // unique between all segments
save tag in TARGET_TAGS // we should track only opening pair tags in target, so theoretically can skip this step
}
original type is other closing pair tags (like </g>){
search for matching bpt_tag in saved tags:
looking in TARGET_TAGS in REVERSE for matchingTag which have
[ matchingTag.fPairTagClosed == false
AND matchingTag.generated_tagType == BPT_ELEMENT //OPENING_PAIR_TAG
AND matchingTag.original_tagType == our_tag.original_tagType
]
if found:
set matchingTag.fPairTagClosed to true to eliminate matching one opening tag for different closing tags
set our_tag.generated_i to matchingTag.i
set our_tag.generated_id to matchingTag.id
else :
generate next our_tag.generated_i incrementally in target segment // in every segment(target, source, request) i starts from 1
generate next our_tag.generated_id incrementally // should be unique across target, source and request segments
save tag in TARGET_TAGS // we should track only opening pair tags in target, so theoretically can skip this step
}
}
}
}
}
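The closing-pair-tag branches above all follow the same pattern: reverse-search the collected tags for an unclosed opener of the same original type, reuse its generated values, otherwise mint fresh ones. A condensed illustrative sketch (names follow the pseudocode, not the production code):

```cpp
#include <string>
#include <vector>

enum TagType { UNKNOWN_ELEMENT, PH_ELEMENT, BPT_ELEMENT, EPT_ELEMENT, G_ELEMENT };

struct TagInfo {
    bool fPairTagClosed = true;
    int generated_i = -1;
    int generated_id = -1;
    TagType generated_tagType = UNKNOWN_ELEMENT;
    TagType original_tagType = UNKNOWN_ELEMENT;
};

// Match a closing tag (e.g. </g>) against the collected tags:
// scan in reverse for an unclosed BPT of the same original type,
// reuse its generated i/id, otherwise generate fresh values.
TagInfo matchClosingTag(std::vector<TagInfo>& collectedTags, TagType originalType,
                        int& nextI, int& nextId) {
    TagInfo ept;
    ept.generated_tagType = EPT_ELEMENT;
    ept.original_tagType = originalType;
    for (auto it = collectedTags.rbegin(); it != collectedTags.rend(); ++it) {
        if (!it->fPairTagClosed && it->generated_tagType == BPT_ELEMENT &&
            it->original_tagType == originalType) {
            it->fPairTagClosed = true;          // pair each opener only once
            ept.generated_i = it->generated_i;  // reuse the opener's i and id
            ept.generated_id = it->generated_id;
            collectedTags.push_back(ept);
            return ept;
        }
    }
    ept.generated_i = ++nextI;   // no opener found: mint fresh values
    ept.generated_id = ++nextId;
    collectedTags.push_back(ept);
    return ept;
}
```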
Tag replacement for fuzzy request pseudocode:
TAG_REPLACEMENT PSEUDOCODE
struct TagInfo
{
bool fPairTagClosed = true; // false for bpt tag - waiting for matching ept tag. If we'll find matching tag -> we'll set this to true
bool fTagAlreadyUsedInTarget = false; // would be set to true if we would already use this tag as matching for target
// this we generate to save in TM. this would be saved as <{generated_tagType} [x={generated_x}] [i={generated_i}]/>.
// we would skip x attribute for generated_tagType=EPT_ELEMENT and i for generated_tagType=PH_ELEMENT
int generated_i = -1; // for pair tags - generated identifier to find matching tag. the same as in original_i if it's not binded to other tag in segment
int generated_x = -1; // id of tag. should match original_x, if it's not occupied by other tags
TagType generated_tagType = UNKNOWN_ELEMENT; // replaced tagType, could be only PH_ELEMENT, BPT_ELEMENT, EPT_ELEMENT
// this cant be generated, only saved from provided data
int original_i = -1; // original paired tags i
int original_x = -1; // original id of tag
TagType original_tagType = UNKNOWN_ELEMENT; // original tagType, could be any tag
};
we use 3 lists of tags
SOURCE_TAGS
TARGET_TAGS
REQUEST_TAGS
as "id" we understand whichever of the following attributes is present in the original tag: 'x', 'id'
as "i" we understand whichever of the following attributes is present in the original tag: 'i', 'rid'
all single tags are treated as ph_tag
all opening pair tags are treated as bpt_tag
all closing pair tags are treated as ept_tag
-1 means that the value was not found/not used/not provided, etc.
for ept tags in generated_id we would use generated_id from matching bpt tag
if matching bpt tag is not found -> ???
TAG REPLACEMENT USE CASES {
REQUEST{
basically we convert the request segment to tmx tags (similarly to how we generate ph, bpt and ept tags at import), but keep the original data
then we try to match the tags generated from the request against the tags from the source; in matching source tags we replace the data with the original from the request (tagType, id and i attributes)
then we do the same with the target segment/tags
REQUEST_SEGMENT{
are we sending only xliff? so ph, bpt and ept tag shouldn't be handled here?
<single tags> { // for ph and all single tags
// here we can have PH, X, IT, UT tags, right?
generate generated_id incrementally
set generated_tagType to PH_ELEMENT
save original_id if provided (should that always be provided??)
save tag type as out_tag.original_tagType
save tag in REQUEST_TAGS
}
<opening tags> {
//this would be never send from translate5, right?
original type is <bpt>{
save tag in REQUEST_TAGS
}
original type is <bx>{
generate generated_i incrementally in source segment
generate generated_id incrementally
set generated_tagType to BPT_ELEMENT
save original_i (should that always be provided??)
save original_id if provided (should that always be provided??)
set fPairTagClosed to false; // it would be set to true if we would use this tag as matching
set fTagAlreadyUsedInTarget to false;
set original_type as BX_ELEMENT
save tag to REQUEST_TAGS
}
original type is <g>{
generate generated_i incrementally in source segment
generate generated_id incrementally
set generated_tagType to BPT_ELEMENT
we don't have original_i provided here, only original_id, right?
save original_id if provided (should that always be provided??)
set fPairTagClosed to false; // it would be set to true if we would use this tag as matching
set fTagAlreadyUsedInTarget to false;
set original_type as G_ELEMENT
save tag in REQUEST_TAGS
}
original type is <hi>{
generate generated_i incrementally in source segment
generate generated_id incrementally
set generated_tagType to BPT_ELEMENT
we don't have original_i provided here, only original_id, right?
save original_id if provided (should that always be provided??)
set fPairTagClosed to false; // it would be set to true if we would use this tag as matching
set fTagAlreadyUsedInTarget to false;
set original_type as HI_ELEMENT
save tag in REQUEST_TAGS
}
original type is <sub>{
generate generated_i incrementally in source segment
generate generated_id incrementally
set generated_tagType to BPT_ELEMENT
we don't have original_i provided here, only original_id, right?
save original_id if provided (should that always be provided??)
set fPairTagClosed to false; // it would be set to true if we would use this tag as matching
set fTagAlreadyUsedInTarget to false;
set original_type as SUB_ELEMENT
save tag in REQUEST_TAGS
}
}
<closing tags> {
//this would be never send from translate5, right?
original type is <ept>{
save tag in REQUEST_TAGS
}
original type is <ex>{
search for matching tag in saved tags:
looking in REQUEST_TAGS in REVERSE for matchingTag which have
[ matchingTag.fPairTagClosed == false
AND matchingTag.generated_tagType == BPT_ELEMENT //OPENING_PAIR_TAG
AND matchingTag.original_tagType == BX_ELEMENT // our_tag.original_tagType
AND matchingTag.original_i == our_tag.original_i
]
if found
set matchingTag.fPairTagClosed to true to eliminate matching one opening tag for different closing tags
set our_tag.generated_i to matchingTag.i
set our_tag.generated_id to matchingTag.id
else
generate next our_tag.generated_i incrementally in request segment // in every segment(target, source, request) i starts from 1
generate next our_tag.generated_id incrementally // should be unique across target, source and request segments
save tag in REQUEST_TAGS
}
original type is </g>{
search for matching tag in saved tags:
looking in REQUEST_TAGS in REVERSE for matchingTag which have
[ matchingTag.fPairTagClosed == false
AND matchingTag.generated_tagType == BPT_ELEMENT //OPENING_PAIR_TAG
AND matchingTag.original_tagType == G_ELEMENT // our_tag.original_tagType
]
if found
set matchingTag.fPairTagClosed to true to eliminate matching one opening tag for different closing tags
set our_tag.generated_i to matchingTag.i
set our_tag.generated_id to matchingTag.id
else
generate next our_tag.generated_i incrementally in request segment // in every segment(target, source, request) i starts from 1
generate next our_tag.generated_id incrementally // should be unique across target, source and request segments
save tag in REQUEST_TAGS
}
original type is </hi>{
search for matching tag in saved tags:
looking in REQUEST_TAGS in REVERSE for matchingTag which have
[ matchingTag.fPairTagClosed == false
AND matchingTag.generated_tagType == BPT_ELEMENT //OPENING_PAIR_TAG
AND matchingTag.original_tagType == HI_ELEMENT // our_tag.original_tagType
]
if found
set matchingTag.fPairTagClosed to true to eliminate matching one opening tag for different closing tags
set our_tag.generated_i to matchingTag.i
set our_tag.generated_id to matchingTag.id
else
generate next our_tag.generated_i incrementally in request segment // in every segment(target, source, request) i starts from 1
generate next our_tag.generated_id incrementally // should be unique across target, source and request segments
save tag in REQUEST_TAGS
}
original type is </sub>{
search for matching tag in saved tags:
looking in REQUEST_TAGS in REVERSE for matchingTag which have
[ matchingTag.fPairTagClosed == false
AND matchingTag.generated_tagType == BPT_ELEMENT //OPENING_PAIR_TAG
AND matchingTag.original_tagType == SUB_ELEMENT // our_tag.original_tagType
]
if found
set matchingTag.fPairTagClosed to true to eliminate matching one opening tag for different closing tags
set our_tag.generated_i to matchingTag.i
set our_tag.generated_id to matchingTag.id
else
generate next our_tag.generated_i incrementally in request segment // in every segment(target, source, request) i starts from 1
generate next our_tag.generated_id incrementally // should be unique across target, source and request segments
save tag in REQUEST_TAGS
}
}
}
NOTE: the source segment should contain only 3 types of tags - PH_ELEMENT, BPT_ELEMENT and EPT_ELEMENT - because all of them were regenerated together with their attributes at the import stage
At this point we read the source and target segments "as is", without any tag replacement in the lists, so original_id is the id that was produced as generated_id at the import stage.
SOURCE_SEGMENT{
<ph x="1" />{
search for matching tag in saved tags:
looking in REQUEST_TAGS in REVERSE for matchingTag which have
matchingTag.generated_tagType == PH_ELEMENT //or our_tag.original_tagType
AND matchingTag.generated_id == our_tag.original_id
]
if found
set our_tag.generated_tagType = matchingTag.original_tagType
set our_tag.generated_id = matchingTag.original_id
use that data to generate a tag like <our_tag.generated_tagType id="{our_tag.generated_id}" />
else
maybe just return <x/> tag?
save tag in SOURCE_TAGS
}
<bpt i="1" x="2"/> {
search for matching tag in saved tags:
looking in REQUEST_TAGS in REVERSE for matchingTag which have
[ matchingTag.generated_tagType == BPT_ELEMENT //or our_tag.original_tagType
AND matchingTag.generated_id == our_tag.original_id
]
if found
set our_tag.generated_tagType = matchingTag.original_tagType
set our_tag.generated_id = matchingTag.original_id
set our_tag.generated_i = matchingTag.original_i
if matchingTag.original_tagType == BX_ELEMENT // do BX_ELEMENT always have id and rid attributes provided?
use that data to generate a tag like <our_tag.generated_tagType id="{our_tag.generated_id}" rid="{our_tag.generated_id}" />
else:
[rid="{our_tag.generated_id}"] means optional - for example, if it's bigger than 0, then we should add this attribute
use that data to generate a tag like <our_tag.generated_tagType [id="{our_tag.generated_id}"] [rid="{our_tag.generated_id}"] >
else
maybe just return <bx/> tag?
save tag in SOURCE_TAGS
}
<ept i="1" /> {
search for matching tag in saved tags:
looking in REQUEST_TAGS in REVERSE for matchingTag which have
[ matchingTag.generated_tagType == EPT_ELEMENT //or our_tag.original_tagType
AND matchingTag.generated_id == our_tag.original_id // id should hold information about paired BPT_ELEMENT, or it's absence
]
if found
set our_tag.generated_tagType = matchingTag.original_tagType
set our_tag.generated_id = matchingTag.original_id
set our_tag.generated_i = matchingTag.original_i
use that data to generate a tag like <our_tag.generated_tagType id="{our_tag.generated_id}" rid="{our_tag.generated_id}" />
if matchingTag.original_tagType == EX_ELEMENT // do EX_ELEMENT always have id and rid attributes provided?
use that data to generate a tag like <our_tag.generated_tagType id="{our_tag.generated_id}" rid="{our_tag.generated_id}" />
else:
[rid="{our_tag.generated_id}"] means optional - for example, if it's bigger than 0, then we should add this attribute
use that data to generate a tag like </our_tag.generated_tagType>
else
maybe just return <ex/> tag? or add some specific attributes?
save tag in SOURCE_TAGS
}
}
}
NEW PSEUDO CODE
This is the code as actually implemented.
The tag replacement feature implementation is split into 2 functions:
GenerateReplacingTag - input: tagType, attributeList
output: tagInfo
this function generates the tagInfo data structure, which saves the original data (tagType and only the i/rid and x/id attributes) and generates new data that suits the context/segment
PrintTag - input: tagInfo
- output: text representation of the tag, with attributes depending on the context
this function prints the tag with its attributes (if they exist, i.e. are bigger than 0). For a fuzzy call it replaces the tags in the source and target segments with the matching tags from the fuzzy search request.
If a matching tag is not found, it generates a new tag in xliff format with id or rid attributes, counting up from the biggest id and rid values present in the requested segment +1
for the fuzzy search request segment this function prints the tag with the generated data - this is never used in production, but can be used to find out how the mechanism normalized the input fuzzy search request segment
(we base tag matching on this normalization.)
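A minimal sketch of the PrintTag half for the import case, emitting the three normalized forms <ph x=""/>, <bpt x="" i=""/> and <ept i=""/>. This is an assumption based on the description above, not the real function (which additionally handles the fuzzy-request replacement):

```cpp
#include <string>

enum TagType { PH_ELEMENT, BPT_ELEMENT, EPT_ELEMENT };

struct TagInfo {
    int generated_i = -1;
    int generated_id = -1;
    TagType generated_tagType = PH_ELEMENT;
};

// Emit the normalized textual form of a tag:
//   PH  -> <ph x="N"/>       (no i attribute)
//   BPT -> <bpt x="N" i="M"/>
//   EPT -> <ept i="M"/>      (no x attribute)
std::string printTag(const TagInfo& tag) {
    switch (tag.generated_tagType) {
    case PH_ELEMENT:
        return "<ph x=\"" + std::to_string(tag.generated_id) + "\"/>";
    case BPT_ELEMENT:
        return "<bpt x=\"" + std::to_string(tag.generated_id) +
               "\" i=\"" + std::to_string(tag.generated_i) + "\"/>";
    case EPT_ELEMENT:
        return "<ept i=\"" + std::to_string(tag.generated_i) + "\"/>";
    }
    return "";
}
```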
////////////////////////////////////
struct TagInfo
{
bool fPairTagClosed = true; // false for bpt tag - waiting for matching ept tag. If we'll find matching tag -> we'll set this to true
bool fTagAlreadyUsedInTarget = false; // would be set to true if we would already use this tag as matching for target
// this we generate to save in TM. this would be saved as <{generated_tagType} [x={generated_id}] [i={generated_i}]/>.
// we would skip x attribute for generated_tagType=EPT_ELEMENT and i for generated_tagType=PH_ELEMENT
int generated_i = -1; // for pair tags - generated identifier to find matching tag. the same as in original_i if it's not binded to other tag in segment
int generated_id = -1; // id of tag. should match original_id, if it's not occupied by other tags
TagType generated_tagType = UNKNOWN_ELEMENT; // replaced tagType, could be only PH_ELEMENT, BPT_ELEMENT, EPT_ELEMENT
// this cant be generated, only saved from provided data
int original_i = -1; // original paired tags i
int original_id = -1; // original id of tag
TagType original_tagType = UNKNOWN_ELEMENT; // original tagType, could be any tag
};
TagType could be one of the values in enum:
[
BPT_ELEMENT EPT_ELEMENT G_ELEMENT HI_ELEMENT SUB_ELEMENT BX_ELEMENT EX_ELEMENT
//standalone tags
BEGIN_STANDALONE_TAGS PH_ELEMENT X_ELEMENT IT_ELEMENT UT_ELEMENT
]
We apply a normalization process to the tags, which means replacing the original xliff/tmx tags and attributes with only 3 tags:
<ph x='1' />
<bpt x='2' i='1' />
<ept i='1' />
This means that we regenerate id/x in the source, target and request segments to make them uniform.
For source/target segments this replacement is done during the import process. For a fuzzy search request we do the tag replacement, then look for matches between the source and request segments (this happens in the PrintTag function), then replace each matched tag in the source with the original tag from the request.
Then we do the same with the target segment: we try to match the target tags against the generated tags in the request, and then replace the tags in the target
with the original tags from the fuzzy search request.
for example, we have this segments in import process
'source':"Select the <hi>net<ph/>work <g>BLK360</g> tag </hi>",
'target':"Select the <hi>net<ph/>work <g>BLK360</g> tag </hi>",
after tag replacement we would have this saved in tm:
'source' :'Select the <bpt x="1" i="1"/>net<ph x="2"/>work <bpt x="3" i="2"/>BLK360<ept i="2"/> tag <ept i="1"/>',
'target' :'Select the <bpt x="1" i="1"/>net<ph x="2"/>work <bpt x="3" i="2"/>BLK360<ept i="2"/> tag <ept i="1"/>',
Then, if a fuzzy request comes in with the segment:
"Select the <g>net<x/>work <g>BLK360</g> tag </g>"
after normalization we would get:
"Select the <bpt x="1" i="1"/>net<ph x="2"/>work <bpt x="3" i="2"/>BLK360<ept i="2"/> tag <ept i="1"/>"
We then try to find matching tags between the source and the normalized request segment; on a match, the tag in the source is replaced with the original tag from the fuzzy search request, and the same is then done for the target.
In the response we should have:
'source' :'Select the <g>net<x/>work <g>BLK360</g> tag </g>',
'target' :'Select the <g>net<x/>work <g>BLK360</g> tag </g>',
////////////////////////////TagReplacer class//////////////////////
Tag normalization rules:
- every standalone tag is treated as a ph tag, which has only an x attribute and looks like this: "<ph x="1"/>"
- every opening pair tag is treated as a bpt tag, which always has both i and x attributes and looks like this: "<bpt x="1" i="1"/>"
- every closing pair tag is treated as an ept tag, which has only an i attribute and looks like this: "<ept i="1"/>"
- we ignore/skip the content between <bpt> and </bpt> and replace it with a single <bpt/>-type tag; the same is true for <ph/> and <ept/>
- as id we take whichever of the following attributes is present in the original tag: 'x', 'id'
- as i we take whichever of the following attributes is present in the original tag: 'i', 'rid'
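As a sketch, the id/i extraction rules above could look like this in C++. The helper names and the attribute-map type are assumptions for illustration; in the real code the attributes come from the xerces-c parser:

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical helpers: pick the normalized "id" and "i" values out of the
// original tag attributes, following the rules above ('x' or 'id' -> id,
// 'i' or 'rid' -> i). Returns nothing when neither attribute is present.
using Attrs = std::map<std::string, std::string>;

std::optional<int> extractId(const Attrs& a) {
    if (auto it = a.find("x"); it != a.end()) return std::stoi(it->second);
    if (auto it = a.find("id"); it != a.end()) return std::stoi(it->second);
    return std::nullopt;
}

std::optional<int> extractI(const Attrs& a) {
    if (auto it = a.find("i"); it != a.end()) return std::stoi(it->second);
    if (auto it = a.find("rid"); it != a.end()) return std::stoi(it->second);
    return std::nullopt;
}
```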
TagReplacer{
// lists of tagInfo
SOURCE_TAGS
TARGET_TAGS
REQUEST_TAGS
activeSegment // one of SOURCE_SEGMENT (default), TARGET_SEGMENT, REQUEST_SEGMENT. Tells us how to handle tag replacement
iHighestI = 0; // incremented with each opening pair tag
iHighestId = 0; // incremented with each tag
fFuzzyRequest = false; // flag that tracks whether we are dealing with an import or a fuzzy request. Tells us how to handle tag replacement
// to track id and i attributes in the request and then generate new values for non-matching tags in source and target
iHighestRequestsOriginalI = 0; // while saving the original tag data of the request segment we store the biggest original i and id here;
iHighestRequestsOriginalId = 0; // if we can't find a match in the source segment, we generate an xliff tag ([bx, ex, x], or can we keep [bpt, ept, ph]?)
// using and incrementing these values
//functions
// during tag parsing by xercesc we call this function to:
// - collect and save the original tagType and attributes (only 'id' and 'i')
// - generate the normalized tag data
// - find matches between TARGET and SOURCE segment tags; if we have a match, use the data generated for the SOURCE tag, otherwise generate new unique data
// - save the generated tags in the lists, depending on the activeSegment value
// - return a tagInfo data structure
GenerateReplacingTag(tagType, attributes);
// accepts tagInfo data
// depending on the fFuzzyRequest and activeSegment values, either just prints the normalized tags with their generated attributes,
// or tries to match the tag from SOURCE/TARGET to REQUEST and prints the matching tag from REQUEST; if there is no match, generates a new xliff tag with unique attributes
PrintTag(tagInfo);
};
TagInfo{
fPairedTagClosed = false; // set to false for a bpt/ept tag that is still waiting for its matching ept/bpt tag
fTagAlreadyUsedInTarget = false; // used only when we save tags from the source segment and then try to match/bind them in the target
generated_i = 0; // for pair tags: generated identifier used to find the matching tag; equals original_i if it is not bound to another tag in the segment
generated_id = 0; // id of the tag; should match original_id if that is not occupied by other tags
generated_tagType = UNKNOWN_ELEMENT; // replaced tagType; can be PH_ELEMENT, BPT_ELEMENT or EPT_ELEMENT
original_i = 0; // i of the original paired tag
original_id = 0; // original id of the tag
original_tagType = UNKNOWN_ELEMENT; // original tagType
};
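The records above can be written as a compilable C++ sketch. Field and enum names mirror the pseudocode; the real t5memory sources may differ in detail:

```cpp
#include <vector>

// Tag types from the enum above, plus UNKNOWN_ELEMENT as the default.
enum TagType { UNKNOWN_ELEMENT, BPT_ELEMENT, EPT_ELEMENT, G_ELEMENT, HI_ELEMENT,
               SUB_ELEMENT, BX_ELEMENT, EX_ELEMENT,
               BEGIN_STANDALONE_TAGS, PH_ELEMENT, X_ELEMENT, IT_ELEMENT, UT_ELEMENT };

struct TagInfo {
    bool fPairedTagClosed = false;        // bpt/ept still waiting for its counterpart
    bool fTagAlreadyUsedInTarget = false; // set once a source tag is bound to a target tag
    int  generated_i  = 0;                // pair-matching identifier after normalization
    int  generated_id = 0;                // normalized id (negative for ept tags)
    TagType generated_tagType = UNKNOWN_ELEMENT; // PH_, BPT_ or EPT_ELEMENT
    int  original_i  = 0;                 // i/rid attribute of the original tag
    int  original_id = 0;                 // x/id attribute of the original tag
    TagType original_tagType = UNKNOWN_ELEMENT;  // tag type before normalization
};

struct TagReplacer {
    std::vector<TagInfo> sourceTags;   // SOURCE_TAGS
    std::vector<TagInfo> targetTags;   // TARGET_TAGS
    std::vector<TagInfo> requestTags;  // REQUEST_TAGS
    bool fFuzzyRequest = false;        // import (false) vs fuzzy request (true)
    int  iHighestI  = 0;               // incremented with each opening pair tag
    int  iHighestId = 0;               // incremented with each tag
    int  iHighestRequestsOriginalI  = 0; // biggest original i seen in the request
    int  iHighestRequestsOriginalId = 0; // biggest original id seen in the request
};
```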
//////////////////////////////////////////////////
GenerateReplacingTag{
SOURCE_SEGMENT/REQUEST_SEGMENT
//we handle SOURCE and REQUEST segments here the same way, but we
// use variables activeTagList, that should point to SOURCE_TAGS or REQUEST_TAGS
// to make code more generic
{
<single tags> -> would be saved as <ph>{ // for ph and all single tags
if(type == "lb"){
replace with newline
}else{
save original_tagType
save original_id if provided
if it's REQUEST_SEGMENT AND original_id > iHighestRequestsOriginalId
save original_id as new iHighestRequestsOriginalId
set generated_tagType to PH_ELEMENT
set fPairedTagClosed to true
generate generated_id incrementally ( increment iHighestId value, then use it )
save tag to activeTagList // SOURCE_TAGS or REQUEST_TAGS
}
}
<opening pair tags> -> would be saved as <bpt>{
save original_i if provided
save original_id if provided
save original_tagType
set generated_tagType to BPT_ELEMENT
//save biggest id and i attributes in request original data to generate new values
// that wouldn't overlap with other tags in case we wouldn't have matches
if it's REQUEST_SEGMENT AND original_i > iHighestRequestsOriginalI
save original_i as new iHighestRequestsOriginalI
if it's REQUEST_SEGMENT AND original_id > iHighestRequestsOriginalId
save original_id as new iHighestRequestsOriginalId
originalTagTypeToFind = UNKNOWN_ELEMENT // identifies which tag type we are looking for
if original_tagType is BPT_ELEMENT
set originalTagTypeToFind to EPT_ELEMENT
else if original_tagType is BX_ELEMENT
set originalTagTypeToFind to EX_ELEMENT
else
skip the search: other tag types can never have opening and closing tags in the wrong order,
that would be invalid <xml> and the parser would throw an INVALID_XML error
if originalTagTypeToFind is not UNKNOWN_ELEMENT
try to find the matching ept tag in this segment:
look in REVERSE order in activeTagList for a matchingTag which has [
matchingTag.fPairTagClosed == false
AND matchingTag.generated_tagType == EPT_ELEMENT // all normalized closing pair tags have EPT_ELEMENT here
AND matchingTag.original_tagType == originalTagTypeToFind
AND matchingTag.original_i == our_bpt_tag.original_i
]
if matchingTag found
set generated_i to matchingTag.generated_i
set generated_id to -matchingTag.generated_id // ept tags have negative ids/x's equal to the matching bpt's -x;
// if there is no matching bpt, the ept gets a unique, but still negative, value.
// negative values and 0 are never printed in PrintTag
set fPairTagClosed to true
set matchingTag.fPairTagClosed to true
else
generate generated_i incrementally ( increment iHighestI value, then use it )
generate generated_id incrementally ( increment iHighestId value, then use it )
set fPairTagClosed to false; // it will be set to true if this tag is later used as a matching tag
save tag to activeTagList
}
<closing pair tags> -> would be saved as <ept>{
save original_i if provided
save original_id if provided
save original_tagType
set generated_tagType to EPT_ELEMENT
//save biggest id and i attributes in request original data to generate new values
// that wouldn't overlap with other tags in case we wouldn't have matches
if it's REQUEST_SEGMENT AND original_i > iHighestRequestsOriginalI
save original_i as new iHighestRequestsOriginalI
if it's REQUEST_SEGMENT AND original_id > iHighestRequestsOriginalId
save original_id as new iHighestRequestsOriginalId
originalTagTypeToFind = UNKNOWN_ELEMENT // identifies which tag type we are looking for
if original_tagType is EPT_ELEMENT
set originalTagTypeToFind to BPT_ELEMENT
else if original_tagType is EX_ELEMENT
set originalTagTypeToFind to BX_ELEMENT
else
skip the search: other tag types can never have opening and closing tags in the wrong order,
that would be invalid <xml> and the parser would throw an INVALID_XML error
if originalTagTypeToFind is not UNKNOWN_ELEMENT
try to find the matching bpt tag in this segment:
look in REVERSE order in activeTagList for a matchingTag which has [
matchingTag.fPairTagClosed == false
AND matchingTag.generated_tagType == BPT_ELEMENT // all normalized opening pair tags have BPT_ELEMENT here
AND matchingTag.original_tagType == originalTagTypeToFind
AND matchingTag.original_i == our_ept_tag.original_i
]
if matchingTag found
set generated_i to matchingTag.generated_i
set generated_id to -matchingTag.generated_id // ept tags have negative ids/x's equal to the matching bpt's -x;
// if there is no matching bpt, the ept gets a unique, but still negative, value.
// negative values and 0 are never printed in PrintTag
set fPairTagClosed to true
set matchingTag.fPairTagClosed to true
else
generate generated_i incrementally ( increment iHighestI value, then use it )
generate generated_id incrementally ( increment iHighestId value, then negate it and use it )
set fPairTagClosed to false; // it will be set to true if this tag is later used as a matching tag
save tag to activeTagList
}
}
TARGET_SEGMENT
//here we try to find connections from original Target tags to original Source tags and use data,
// that was generated for matching SOURCE tag. If there are no matching SOURCE tag - generate new unique attributes
{
save original_tagType
save original_id if provided
save original_i if provided
set generated_tagType to - PH_ELEMENT if we have single tag
- BPT_ELEMENT if we have opening pair tag
- EPT_ELEMENT if we have closing pair tag
try to find matching source tag
looking in SOURCE_TAGS for matchingSourceTag which have [
matchingSourceTag.fAlreadyUsedInTarget == false
AND matchingSourceTag.original_tagType == our_tag.original_tagType
AND matchingSourceTag.original_id == our_tag.original_id
]
if found:
set generated_i to matchingSourceTag.generated_i
set generated_id to matchingSourceTag.generated_id
// maybe we should add here search for matching ept\bpt tag in TARGET_TAGS, to set valid fPairTagClosed for both
set matchingSourceTag.fAlreadyUsedInTarget to true
else
if generated_tagType is PH_ELEMENT
set fPairTagClosed = true
else
use matchingTagOriginalType and matchingTagGeneratedType to find the matching tag in TARGET_TAGS
if original_tagType is BPT_ELEMENT
set matchingTagOriginalType to EPT_ELEMENT
else if original_tagType is BX_ELEMENT
set matchingTagOriginalType to EX_ELEMENT
else if original_tagType is EPT_ELEMENT
set matchingTagOriginalType to BPT_ELEMENT
else if original_tagType is EX_ELEMENT
set matchingTagOriginalType to BX_ELEMENT
else
matchingTagOriginalType = original_tagType
if our_tag.generated_tagType == BPT_ELEMENT
set matchingTagGeneratedType to EPT_ELEMENT
else
set matchingTagGeneratedType to BPT_ELEMENT
try to find matching pair tag in this segment
looking in REVERSE order in TARGET for matchingPairTag which have [
matchingPairTag.fPairTagClosed == false
AND matchingPairTag.original_tagType == matchingTagOriginalType
AND matchingPairTag.generated_tagType == matchingTagGeneratedType
AND matchingPairTag.original_i == our_tag.original_i
]
if found:
set generated_i to matchingPairTag.generated_i
set generated_id to -matchingPairTag.generated_id // ept tags have negative ids/x's equal to the matching bpt's -x;
// if there is no matching bpt, the ept gets a unique, but still negative, value.
// negative values and 0 are never printed in PrintTag
set fPairTagClosed to true
set matchingPairTag.fPairTagClosed to true
else:
if we are dealing with pair tags -> generate generated_i incrementally ( increment iHighestI value, then use it )
generate generated_id incrementally ( increment iHighestId value, then negate it and use it )
set fPairTagClosed to false; // it will be set to true if this tag is later used as a matching tag
save tag in TARGET_TAGS
}
}
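The two lookups that GenerateReplacingTag relies on, the reverse scan for an unmatched pair counterpart and the forward scan binding target tags to source tags, can be sketched as C++ helpers. Function names and the reduced TagInfo are illustrative, mirroring the pseudocode rather than the actual implementation:

```cpp
#include <vector>

// Minimal subset of TagInfo used by the searches below (assumed field names).
enum TagType { UNKNOWN_ELEMENT, BPT_ELEMENT, EPT_ELEMENT, PH_ELEMENT };

struct TagInfo {
    bool fPairedTagClosed = false;        // counterpart not found yet
    bool fTagAlreadyUsedInTarget = false; // source tag already bound to a target tag
    TagType generated_tagType = UNKNOWN_ELEMENT;
    TagType original_tagType  = UNKNOWN_ELEMENT;
    int original_i  = 0;
    int original_id = 0;
};

// Reverse scan for the still-unmatched counterpart of a pair tag, as done in
// the SOURCE/REQUEST and TARGET branches above.
TagInfo* findUnmatchedCounterpart(std::vector<TagInfo>& tags,
                                  TagType generatedType, TagType originalType,
                                  int originalI) {
    for (auto it = tags.rbegin(); it != tags.rend(); ++it)
        if (!it->fPairedTagClosed && it->generated_tagType == generatedType
            && it->original_tagType == originalType && it->original_i == originalI)
            return &*it;
    return nullptr;
}

// Forward scan used by the TARGET branch: the first source tag with the same
// original type and id that has not been bound to a target tag yet.
TagInfo* findMatchingSourceTag(std::vector<TagInfo>& sourceTags,
                               TagType originalType, int originalId) {
    for (auto& t : sourceTags)
        if (!t.fTagAlreadyUsedInTarget && t.original_tagType == originalType
            && t.original_id == originalId)
            return &t;
    return nullptr;
}
```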
PrintTag{
variables: idToPrint = 0,
iToPrint = 0,
tagTypeToPrint = tag.generated_tagType
flags: fClosedTag = true; //for slash at the end of tags like <ph/>
fClosingTag = false; //for slash at the beginning of tag like </g>
if it's REQUEST_SEGMENT
// we need this only to track how tag replacement normalized tags in request segment
idToPrint = tag.generated_id
iToPrint = tag.generated_i
else
try to find the matching request tag:
look in REQUEST_TAGS for a matchingRequestTag which has [
matchingRequestTag.generated_id == our_tag.generated_id
AND matchingRequestTag.generated_tagType == our_tag.generated_tagType
]
if found:
set idToPrint to matchingRequestTag.original_id
set iToPrint to matchingRequestTag.original_i
set tagTypeToPrint to matchingRequestTag.original_tagType
set fClosingTag to tag.generated_tagType == EPT_ELEMENT
AND tagTypeToPrint != EPT_ELEMENT
AND tagTypeToPrint != EX_ELEMENT
else
//generate new id and i
generate idToPrint using iHighestRequestsOriginalId incrementally ( increment the value, then use it )
if generated_tagType is not PH_ELEMENT
//could be improved here if we need to
generate iToPrint using iHighestRequestsOriginalI incrementally ( increment the value, then use it )
if fClosingTag is true
return ["</" + tagTypeToPrint + ">"]
else
output = ["<" + tagTypeToPrint]
if idToPrint > 0
if fFuzzyRequest is true:
append to output [' id="' + idToPrint + '"']
else
append to output [' x="' + idToPrint + '"']
if iToPrint > 0
if fFuzzyRequest is true:
append to output [' rid="' + iToPrint + '"']
else
append to output [' i="' + iToPrint + '"']
//tag that has slash at the end looks like this: <tag />
fClosedTag = tagTypeToPrint == BPT_ELEMENT OR
tagTypeToPrint == EPT_ELEMENT OR
tagTypeToPrint == PH_ELEMENT OR
tagTypeToPrint == BX_ELEMENT OR
tagTypeToPrint == EX_ELEMENT OR
tagTypeToPrint == X_ELEMENT ; // other tags can only be opening (<g>) or closing (</g>)
if fClosedTag is true
append to output "/"
append to output ">"
return output
}
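The attribute printing at the end of PrintTag can be sketched as follows. This is an assumed helper, not the real function: it only shows that a fuzzy-request print uses the id/rid attribute names, an import-side print uses x/i, and non-positive values are suppressed:

```cpp
#include <string>

// Build the textual form of a tag as PrintTag does: optional id/x and rid/i
// attributes (skipped when <= 0), and a trailing slash for self-closed tags.
std::string printNormalizedTag(const std::string& tagName, int idToPrint,
                               int iToPrint, bool fFuzzyRequest, bool fClosedTag) {
    std::string out = "<" + tagName;
    if (idToPrint > 0)
        out += fFuzzyRequest ? " id=\"" + std::to_string(idToPrint) + "\""
                             : " x=\""  + std::to_string(idToPrint) + "\"";
    if (iToPrint > 0)
        out += fFuzzyRequest ? " rid=\"" + std::to_string(iToPrint) + "\""
                             : " i=\""   + std::to_string(iToPrint) + "\"";
    if (fClosedTag) out += "/";
    out += ">";
    return out;
}
```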
///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////