Comment: documented np tags

...

TM files structure and other related info

The information below applies to version 0_5_x.

Starting from version 0_5_0, the .mem file is no longer part of the TM files: a TM now consists only of .tmd and .tmi files. Both files have 2 KB headers that hold some useful information, such as the creation date and the version in which the file was created. In general, a change in the mid_version number means the files are binary incompatible. During reorganize, a new empty TM is created, the segments are reimported from the previous TM, the old files are deleted, and the new files are renamed to replace them. This means that reorganize also updates the t5memory creation version of the files to the newest one.


A TM file is just an archive containing the .tmi and .tmd files.

The .tmd and .tmi files should be flushed in a safe way: saved to disk under a temporary filename, which then replaces the old file. (This still needs to be implemented.)
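
The write-then-replace idea above can be sketched as follows. This is a minimal Python illustration, not t5memory code: the function name `flush_safely` and the use of a temporary file in the same directory are assumptions made for the example.

```python
import os
import tempfile


def flush_safely(path, data):
    """Write `data` (bytes) to a temporary file, then swap it in for `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the swap
        os.replace(tmp_path, path)  # replaces the old file in one step
    except BaseException:
        # On any failure, leave the old file untouched and drop the temp file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this approach a crash during the write leaves the previous .tmd or .tmi file intact, because the old file is only replaced after the new content is fully written.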

There is a TMManager (a singleton) that holds the list of TMs. Each TM instance has two binary trees, one for the data (.tmd) file and one for the index (.tmi) file, and each tree has its own filebuffer instance. (Previously there was a pool of filebuffers, and its file operation functions, such as write, read, close and open, handled the requests.)

A request handler is an instance of a class from the request handler class hierarchy. There is a class for each type of request. In general, it has the private functions "parseJSON" (parses the JSON if provided and returns an error if the JSON is invalid), "checkData" (checks that all required fields were provided), "requestTM" (requests a readOnly, write or service TM handler, and loads the TM if it is not in RAM yet) and "execute" (the original request code). It also has the public function "run", which is a template (strategy) method that operates the private functions listed above.
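
The handler structure described above can be illustrated with a small Python sketch. It is not the t5memory code: the class names, the `required_fields` attribute, the error format and the dictionary that stands in for TMManager are all assumptions made for the example. Only the step names mirror the functions named in the text.

```python
import json


class RequestHandler:
    """Base class: run() fixes the order of the steps, subclasses fill them in."""

    required_fields = ()  # hypothetical: JSON fields a request type needs

    def __init__(self, tm_name, body, loaded_tms):
        self.tm_name = tm_name
        self.body = body              # raw JSON text, or None if there is no body
        self.loaded_tms = loaded_tms  # stand-in for the TMs that TMManager holds in RAM
        self.data = {}
        self.tm = None

    def run(self):
        # Public entry point: stop at the first step that reports an error.
        for step in (self._parse_json, self._check_data, self._request_tm):
            error = step()
            if error is not None:
                return {"error": error}
        return self._execute()

    def _parse_json(self):
        if self.body is None:
            return None
        try:
            self.data = json.loads(self.body)
        except json.JSONDecodeError as exc:
            return f"invalid JSON: {exc.msg}"
        if not isinstance(self.data, dict):
            return "JSON body must be an object"
        return None

    def _check_data(self):
        missing = [f for f in self.required_fields if f not in self.data]
        return f"missing fields: {missing}" if missing else None

    def _request_tm(self):
        # Load the TM on first use; a real loader would read the .tmd/.tmi files.
        self.tm = self.loaded_tms.setdefault(self.tm_name, {"name": self.tm_name})
        return None

    def _execute(self):
        raise NotImplementedError


class FuzzySearchHandler(RequestHandler):
    required_fields = ("source",)

    def _execute(self):
        return {"tm": self.tm["name"], "searched": self.data["source"]}


loaded = {}
print(FuzzySearchHandler("tm1", '{"source": "hello"}', loaded).run())
print(FuzzySearchHandler("tm1", "{}", loaded).run())
```

Each request type only overrides the pieces that differ, while the order of parsing, validation, TM lookup and execution stays in one place.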

The TMs are stored in TMManager using smart pointers (pointers that track references to themselves and call the destructor automatically). This means that a request can remove a TM from the list while it is still active in another thread (for example in a fuzzy search). The RAM is then freed at the end of the last request that handles that TM.
If a call to delete a TM arrives in the middle of some request (such as a fuzzy search), the TM list is cleared first. The fuzzy request's thread still keeps its smart pointer, so the destructor is not called yet, but it will be called once the fuzzy request is done. The destructor tries to flush the filebuffers to the filesystem, but because the files are no longer on disk, the filebuffers do not create them again and just free the RAM (in that case a log entry is written saying that the filebuffer flush did not find the file in the folder).
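
The lifetime rule above can be imitated in Python, where CPython's reference counting plays the role of the smart pointer. This is only an analogy under that assumption: the `TM` class, the log list and the message text are made up for the example, and other Python implementations may run the finalizer later.

```python
import weakref


class TM:
    """Stand-in for a loaded TM; the finalizer plays the role of the destructor."""

    def __init__(self, name, log):
        self.name = name
        # Runs once, when the last reference to this object is dropped.
        weakref.finalize(self, log.append, f"{name}: buffers released")


log = []
tm_list = {"tm1": TM("tm1", log)}

in_flight = tm_list["tm1"]  # a fuzzy request keeps its own reference
del tm_list["tm1"]          # a delete request clears the TM from the list
still_alive = (log == [])   # the TM is still usable by the fuzzy request

del in_flight               # the fuzzy request finishes
print(still_alive, log)
```

The TM disappears from the list immediately, but its memory is only released when the last request that uses it lets go of its reference.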

From TMManager, a request can ask for one of 3 types of TM handlers: readonly, write or service. The readOnly/write requests are named from the inside-TM perspective (so operations on TM files in the filesystem are service requests).
A readOnly handler (concordance search, fuzzy search, exportTmx) is provided if there are no write handlers. A write handler (deleteEntry, updateEntry, importTmx) requires that there are no other write handlers and no readOnly handlers. Service handlers can mean different things for different requests. For example, a status request should be able to access something like a readonly handler, but it should not be blocked by write requests, since it is used to check import/reorganize status and progress. Some filesystem requests (deleteTM, createTM, cloneTM, importTM, exportTM (internal format)) need a different blocking mechanism, since most of them do not even require loading the TM into RAM.
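
The readOnly/write rules above can be written down as a small bookkeeping class. This Python sketch is an illustration under assumptions: the class name `TmAccess` and its methods are hypothetical, it is not thread-safe, and it treats every service request like a status request (never blocked), which leaves out the separate blocking that the filesystem requests would need.

```python
class TmAccess:
    """Tracks which handlers are currently handed out for one TM."""

    def __init__(self):
        self.readers = 0
        self.writers = 0

    def try_acquire(self, kind):
        """Return True and record the handler if `kind` may be granted now."""
        if kind == "service":
            # Simplification: status-like requests are never blocked.
            return True
        if kind == "readonly" and self.writers == 0:
            self.readers += 1
            return True
        if kind == "write" and self.writers == 0 and self.readers == 0:
            self.writers += 1
            return True
        return False

    def release(self, kind):
        if kind == "readonly":
            self.readers -= 1
        elif kind == "write":
            self.writers -= 1


access = TmAccess()
print(access.try_acquire("readonly"))  # True: no writer is active
print(access.try_acquire("write"))     # False: a reader is active
print(access.try_acquire("service"))   # True: status is not blocked
```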

If the TM is not in RAM, requesting a handler from TMManager tries to load the TM into RAM, taking into account the RAM limit explained in this document.


TAG REPLACEMENT

NUMBER PROTECTION TAGS (NP TAG, t5:n)

The NP feature is also implemented in tagReplacer, but it takes a separate branch in the code: on import it just saves the original id, r and n attributes without generating new ones, and for fuzzy requests it just outputs the original data without searching for a matching tag in src and trg. So NP tags influence ID generation for other tags (or matching, if it is a trg segment).
For fuzzy requests, TagReplacer uses GenerateNormalizedString to generate a copy of the string for src and for the input (from the fuzzy request) in which NP tags are replaced with their r attribute (so that each one equals 1 word in the match). In the fuzzy calculation the other tags are then removed.
So the input

"Press the <t5:n id="5" r="encodedRegex" n="25th of 2043"/>, power button to turn on <bpt x="501" i="1"/>text<ept i="1"/>"

would, for fuzzy requests, first give

"Press the encodedRegex, power button to turn on <bpt id="501" rid="1"/>text<ept rid="1"/>"

and for the fuzzy calculation it would become:

"Press the encodedRegex, power button to turn on text"

The saved segment

"Press the  <t5:n id="5" r="encodedRegex" n="2nd of 1999"/>, power button  to turn on the <bpt id="501" rid="1"/>text<ept rid="1"/>"

would become

"Press the encodedRegex, power button  to turn on the text"

The fuzzy match would then be 92%, because 13 words and 1 diff were counted: (13-1)/13 ≈ 0.92.
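
The two normalization steps and the percentage formula from the example above can be sketched in Python. This is an illustration, not GenerateNormalizedString itself: the function names and regular expressions are assumptions, the tag patterns assume attribute values without ">" characters, other tags are left untouched in the first step, and the word and diff counts are taken from the text rather than computed.

```python
import re

# An NP tag such as <t5:n id="5" r="encodedRegex" n="25th of 2043"/>
NP_TAG = re.compile(r'<t5:n\b[^>]*?\br="([^"]*)"[^>]*/>')
ANY_TAG = re.compile(r"<[^>]+>")


def replace_np_tags(segment):
    """Replace every NP tag with the value of its r attribute."""
    return NP_TAG.sub(lambda m: m.group(1), segment)


def strip_tags(segment):
    """Remove all remaining tags before the fuzzy calculation."""
    return ANY_TAG.sub("", segment)


def match_percent(total_words, diffs):
    """Percentage from the formula in the text: (words - diffs) / words."""
    return round(100 * (total_words - diffs) / total_words)


source = ('Press the <t5:n id="5" r="encodedRegex" n="25th of 2043"/>, '
          'power button to turn on <bpt x="501" i="1"/>text<ept i="1"/>')
print(strip_tags(replace_np_tags(source)))
# Press the encodedRegex, power button to turn on text
print(match_percent(13, 1))
# 92
```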

Tag replacement
Pseudocode for tag replacement in import call: 

TAG_REPLACEMENT PSEUDO CODE

...