Fine-Tuning

Fine-tuning (also called retraining) in the field of artificial intelligence refers to the targeted further training of an already pre-trained model, such as a large language model (LLM), with additional, specialised data. The base model has already acquired general skills (language, knowledge, reasoning) in a complex, expensive pre-training phase. During fine-tuning, this model is further adapted with a comparatively small, curated data set so that it performs a specific task better, hits a desired tone of voice or masters the specialist vocabulary of an industry. You are not starting from scratch, but refining an existing foundation.

How fine-tuning works

During fine-tuning, the internal parameters (weights) of the model are adjusted based on the new sample data. Typically, the training data set consists of pairs of input and desired output, such as customer enquiries and ideal responses. The model learns from these examples to shift its behaviour in the desired direction. Because fully retraining all of a model's billions of parameters is very computationally intensive, resource-saving methods have become established, above all LoRA (Low-Rank Adaptation) and related methods of "Parameter-Efficient Fine-Tuning" (PEFT). With these methods, only a small number of additional parameters are trained, while the base model remains largely frozen. This significantly reduces costs and computing requirements.
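
The low-rank idea behind LoRA can be illustrated with a minimal sketch in plain Python. The matrix sizes, the rank, the scaling factor and all names below are illustrative assumptions, not values from a real model or the API of any library. The sketch only shows how a frozen weight matrix is combined with two small trainable matrices and how few parameters those add; it does not perform any training.

```python
import random

random.seed(0)

# Illustrative sizes only (assumptions); real layers are far larger.
D_OUT, D_IN, RANK, ALPHA = 64, 64, 4, 8


def random_matrix(rows, cols, scale=1.0):
    """Return a rows x cols matrix of Gaussian random numbers."""
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


W = random_matrix(D_OUT, D_IN)              # pre-trained weights, kept frozen
A = random_matrix(RANK, D_IN, scale=0.01)   # small trainable matrix
B = [[0.0] * RANK for _ in range(D_OUT)]    # trainable, zero-initialised


def effective_weight(W, A, B, alpha=ALPHA, rank=RANK):
    """Frozen weights plus the scaled low-rank update: W + (alpha / rank) * B @ A."""
    delta = matmul(B, A)
    s = alpha / rank
    return [[w + s * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


full_params = D_OUT * D_IN                  # what full fine-tuning would update
lora_params = RANK * D_IN + D_OUT * RANK    # what the adapter trains instead

print(f"Full fine-tuning would train {full_params} weights; "
      f"the LoRA adapter trains {lora_params}.")
```

Because B starts at zero, the adapted layer initially behaves exactly like the frozen base layer; only training A and B moves it away from that starting point, and W itself is never modified.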

Distinction: embeddings

Embeddings are fundamentally different from fine-tuning, although the two are often confused. An embedding is a numerical representation (a vector of many numbers) that maps the meaning of a text, image or product into a mathematical space. Content with similar meaning lies close together in this space. Embeddings do not change the model; they are the tool for making similarity and meaning measurable. They are used for semantic search, recommendations and, particularly important, for Retrieval Augmented Generation (RAG), in which a model retrieves relevant documents at runtime and includes them in its response without being retrained.
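
What "close to each other" means can be sketched with cosine similarity, a common way to compare embedding vectors. The three-dimensional vectors below are hand-made toy values chosen for illustration; real embeddings come from an embedding model and typically have hundreds or thousands of dimensions. The function names are likewise assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; values near 1.0 mean similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy embeddings (made-up numbers, not produced by any real model).
toy_embeddings = {
    "running shoes": [0.9, 0.1, 0.0],
    "trail sneakers": [0.8, 0.2, 0.1],
    "espresso machine": [0.0, 0.1, 0.9],
}


def most_similar(query_vector, embeddings):
    """Return the name of the stored item whose vector is most similar to the query."""
    return max(embeddings,
               key=lambda name: cosine_similarity(query_vector, embeddings[name]))


shoes = toy_embeddings["running shoes"]
print(cosine_similarity(shoes, toy_embeddings["trail sneakers"]))
print(cosine_similarity(shoes, toy_embeddings["espresso machine"]))
```

In this toy data the two footwear items score much higher against each other than either does against the espresso machine, which is the property that semantic search and RAG retrieval rely on. No model weights are touched at any point.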

Fine-tuning, RAG or prompting - when to do what

One of the most common practical questions is: should you fine-tune a model, build a RAG system or simply instruct it skilfully (prompting)? The three approaches solve different problems:

Approach | Suitable for | Changes the model?
Prompting | Fast adaptation of behaviour and format with little effort | No
RAG / Embeddings | Integrating current, factual domain knowledge (e.g. your own product catalogue) | No
Fine-tuning | Consistently anchoring style, tone of voice, format and specialised behaviour | Yes

An important rule of thumb: Fine-tuning is good at teaching the model a certain behaviour or style, but bad at providing it with up-to-date factual knowledge. If you want to "teach" a model your own changing product catalogue, you are almost always better off with RAG - the facts are stored there in a searchable knowledge base and can be updated without having to retrain the model. Fine-tuning is worthwhile if a consistent style, a fixed response format or specialised behaviour is required that cannot be reliably achieved using a prompt.

A concrete example

A retailer wants AI customer service that firstly responds in the brand's own friendly, concise tone and secondly always knows the current product range and delivery times. The right solution combines both worlds: The tone of voice and response format are anchored by fine-tuning (or often already by good prompting); the current product and delivery knowledge comes via RAG from a constantly updated knowledge base. If you tried to "burn" the product range into the model using fine-tuning, it would be outdated the next time the product range was changed - a classic and expensive mistake.
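
The division of labour in this example can be sketched as follows. This is a simplified illustration under stated assumptions: retrieval uses plain word overlap as a stand-in for embedding-based search, the product data is invented, the function names are hypothetical, and the call to the actual language model is left out. The point is only that the facts live in an editable knowledge base, while the tone is set by the instruction (or by a fine-tuned model).

```python
def tokenize(text):
    """Lower-case a text and split it into a set of words, ignoring basic punctuation."""
    for mark in "?.,!":
        text = text.replace(mark, " ")
    return set(text.lower().split())


def retrieve(question, knowledge_base, top_k=1):
    """Return the top_k entries sharing the most words with the question."""
    query_words = tokenize(question)
    ranked = sorted(knowledge_base,
                    key=lambda doc: len(query_words & tokenize(doc)),
                    reverse=True)
    return ranked[:top_k]


def build_prompt(question, knowledge_base):
    """Combine a tone instruction with freshly retrieved facts."""
    context = "\n".join(retrieve(question, knowledge_base))
    return (
        "Answer in a friendly, concise tone. Use only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


# Invented example data standing in for a product knowledge base.
knowledge_base = [
    "The blue backpack ships within 2 days.",
    "The red tent is currently out of stock.",
]

question = "When does the blue backpack ship?"
print(build_prompt(question, knowledge_base))

# A changed delivery time is a data update, not a retraining job.
knowledge_base[0] = "The blue backpack ships within 5 days."
print(build_prompt(question, knowledge_base))
```

After the update, the second prompt carries the new delivery time without any change to the model, which is why changing facts belong in the knowledge base rather than in the weights.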

Costs, risks and limits

Fine-tuning is not a sure-fire success. It requires a high-quality, representative training data set - bad data leads to bad behaviour. There is a risk of "catastrophic forgetting", where the model loses general capabilities due to retraining. There are also costs for training, versioning and re-tuning when requirements change. And a fine-tuned model needs to be maintained: If a better base model appears, the work may have to be done again. For these reasons, the pragmatic order applies: first prompting, then RAG, and fine-tuning only if neither is sufficient.

Frequent misunderstandings

Firstly, fine-tuning is not the standard way to give a model "its own data"; RAG is usually more suitable for this. Secondly, fine-tuning does not generally make a model "smarter", but more specialised; it can become better in one area and worse in another. Thirdly, embeddings and fine-tuning are often lumped together, although they do completely different things: embeddings measure meaning, fine-tuning changes the model. Fourthly, more training data is not automatically better; quality and fit beat quantity.

Outlook

With increasingly capable base models and larger context windows, the boundary is shifting: many things that used to require fine-tuning can now be solved with good prompting or RAG. At the same time, efficient methods such as LoRA make fine-tuning cheaper and more accessible. For most companies, the recommendation remains to start with the easier tools and use fine-tuning specifically where consistent, specialised behaviour brings real added value. The English Wikipedia article on fine-tuning in deep learning provides a technical classification of the term.

FAQ

What is fine-tuning, simply explained?
The retraining of a finished AI model with your own specialised sample data so that it can better master a certain task or style.

What is the difference between fine-tuning and embeddings?
Fine-tuning changes the model itself. Embeddings are numerical representations of meaning that do not change the model, but make similarity measurable - for example for search and RAG.

Should I use fine-tuning or RAG?
For up-to-date factual knowledge (e.g. your own product catalogue), usually RAG. For consistent style, tone of voice or fixed behaviour, fine-tuning. A combination is often ideal.

What is LoRA?
LoRA (Low-Rank Adaptation) is a resource-saving fine-tuning method that trains only a small number of additional parameters and leaves the base model largely unchanged, which saves computing power and costs.
