The plug-in enables you to create, train and use language resources based on Large Language Models (LLMs) within translate5. You can select from various GPT models, some of which can be trained, to serve as the basis for a language resource. These GPT language resources can then be trained with prompts and translation examples as needed. In addition, they can be fine-tuned via the “Temperature”, “Top P”, “Presence Penalty” and “Frequency Penalty” parameters.
In translate5, the following sections are relevant for the management of GPT language resources:
Creation, fine-tuning and training of GPT language resources as well as prompt management are all available to project managers.
The available models are continuously queried from OpenAI’s platform and therefore always correspond to what is currently offered there.
If you use Azure, the models deployed in your Azure cloud are the ones available here.
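The following minimal sketch shows what such a model query looks like against OpenAI’s API, using the official `openai` Python library; it illustrates the mechanism only and is not the plugin’s actual implementation:

```python
# Minimal sketch: list the models currently available to your API key.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

for model in client.models.list():
    print(model.id)
```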
A language resource based on a GPT model is created like any other language resource in the language resource overview:
The language resource is then created and will be visible immediately afterwards in the language resource overview.
Start typing in drop-down fields to find options more quickly. For languages, for example, you can type the ISO code: “de-de” will find “German (Germany) (de-DE)”.
The following options are available for GPT language resources in the language resource overview:
Button | Explanation
---|---
“Edit” | Opens the “Edit language resource” window. The basic settings can no longer be edited, but clients can be added or removed for whom the language resource should be available.
“Delete” | Deletes the language resource. The deletion must be confirmed in a window that appears after clicking the button.
“Adjust OpenAI model” | Opens the “Adjust OpenAI model” window, in which you can adjust various parameters to fine-tune the GPT language resource.
“Fine-tune OpenAI model” | Opens the “Fine-tune OpenAI model” window, via which you can train and test the GPT resource with prompts.
This parameter determines how “random” or “creative” the language model should be when generating text. A low Temperature means that the model translates more conservatively and predictably, while a higher Temperature means that it can translate very creatively and therefore unpredictably.
The “Top P” parameter (also known as “nucleus sampling”) is a nuanced alternative to Temperature-based sampling. It acts like a “spotlight” that emphasizes the most probable words. With the default value of 1.0, all words are taken into account. This parameter helps to control the distribution of word choice and thus ensure the relevance and coherence of the generated text.
Attention: If Temperature is set to a very high value, it is possible that the model will generate contradictory or meaningless content.
It is advisable to adjust either Temperature or Top P, but not both.
Please have a look at this page for further information on the two parameters Temperature and Top P.
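To make this concrete, here is a minimal sketch with the official `openai` Python library that follows the advice above: one request steers via Temperature, the other via Top P. The model name, the prompt and the parameter values are illustrative assumptions, not defaults used by translate5:

```python
from openai import OpenAI

client = OpenAI()
messages = [
    {"role": "system", "content": "Translate the user's text from English into German."},
    {"role": "user", "content": "Switch off the printer before cleaning it."},
]

# Variant A: steer creativity via Temperature, leave Top P at its default (1.0).
conservative = client.chat.completions.create(
    model="gpt-4o",   # placeholder; use the model behind your language resource
    messages=messages,
    temperature=0.2,  # low value: predictable, close-to-the-source translation
)

# Variant B: steer via Top P instead, leave Temperature at its default (1.0).
nucleus = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    top_p=0.5,        # sample only from the most probable half of the probability mass
)

print(conservative.choices[0].message.content)
print(nucleus.choices[0].message.content)
```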
This parameter is used to encourage the model to include a wide range of tokens in the generated text. It is a value that is deducted from the log probability of every token that has already appeared in the generated text, regardless of how often. A high Presence Penalty value means that the model tends to generate tokens that are not yet contained in the generated text.
This parameter is used to prevent the model from using the same words or phrases too often within the generated text. It is a value that is deducted from the log probability of a token each time it appears in the generated text, so the deduction grows with the token’s frequency. A high Frequency Penalty value means that the model is more careful about using recurring tokens.
Please have a look at this page for further information on the two parameters Presence Penalty and Frequency Penalty.
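In OpenAI’s API, the two penalties are plain request parameters that both default to 0. A minimal sketch with illustrative values and a placeholder model name:

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",        # placeholder model name
    messages=[{"role": "user", "content": "Write three alternative slogans for a coffee brand."}],
    presence_penalty=0.6,  # one-off deduction for any token that has already appeared
    frequency_penalty=0.8, # deduction that grows with how often a token has appeared
)
print(response.choices[0].message.content)
```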
A GPT model can only ever process a limited number of tokens. This maximum number of tokens includes both the sent and the returned tokens. In the case of a (pre-)translation, this includes the prompt(s), the text or batch to be translated and the returned translations. An appropriate ratio must be maintained so that there is enough “space” for the returned tokens. This setting is only relevant for batch translations, such as those used for pre-translation.
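The arithmetic behind this ratio can be sketched with the `tiktoken` tokenizer library. The context window size, the assumption that a translation needs roughly as many tokens as its source, and the sample batch are all illustrative:

```python
import tiktoken

# cl100k_base is the encoding used by many recent GPT models; adjust if needed.
enc = tiktoken.get_encoding("cl100k_base")

CONTEXT_WINDOW = 8192  # assumed maximum tokens (sent + returned) of the model
prompt = "Translate the following segments from English into German:"
batch = ["Switch off the device.", "Remove the cover.", "Clean the filter."]

input_tokens = len(enc.encode(prompt)) + sum(len(enc.encode(s)) for s in batch)

# A translation is roughly as long as its source, so reserve at least as many
# tokens for the answer as the batch itself occupies.
reserved_for_output = input_tokens

if input_tokens + reserved_for_output > CONTEXT_WINDOW:
    print("Batch too large: split it into smaller batches.")
else:
    print(f"{input_tokens} tokens sent, {CONTEXT_WINDOW - input_tokens} tokens left for the translation.")
```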
You can add one or more of the preconfigured prompt sets.
As the use of terminology during training does not lead to good results, no TermCollections can be added in translate5 for this purpose. Instead, we recommend that you add the desired TermCollections as usual when creating tasks, so that GPT can take them into account for pre-translation during import.