llms.txt

05/26/2026

llms.txt is a proposed Markdown index file placed at the root of a domain (typically at https://beispiel.de/llms.txt). It is intended to tell language models and AI agents, in compressed form, which website content is particularly suitable for machine processing. Unlike the long-established robots.txt, which tells bots which paths they are allowed to crawl, llms.txt describes not access rights but content structure: which documents are canonical, which Markdown versions exist, and which reading order makes sense from a technical perspective. The proposal comes from Jeremy Howard (Answer.AI) and was published in September 2024.

Structure of an llms.txt file

An llms.txt file is a Markdown file with a clearly defined hierarchy. The first block contains the name of the website as an H1 heading and a short description as a blockquote. This is optionally followed by short introductory paragraphs in which the operator gives models context, such as the brand, the target group or special features of the content. The main part consists of sections under H2 headings, each grouping links to the most important documents. A variant called llms-full.txt extends the concept: it contains not only links but also the complete Markdown text of all relevant pages, so that a model can process the entire site without further HTTP requests. The format is deliberately kept lean because it has to fit into the already tight context window of a language model.
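The hierarchy described above can be sketched as a small Python renderer. This is a minimal illustration, not a reference implementation: the class names, the example shop and all URLs are hypothetical, and the "- [title](url): note" link style is assumed from the public proposal rather than stated in this article.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    title: str
    url: str
    note: str = ""  # optional short description placed after the link


@dataclass
class Section:
    heading: str
    links: list[Link] = field(default_factory=list)


def render_llms_txt(name: str, summary: str, intro: list[str], sections: list[Section]) -> str:
    """Render H1, blockquote, optional intro paragraphs and H2 link sections."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for paragraph in intro:
        lines += [paragraph, ""]
    for section in sections:
        lines += [f"## {section.heading}", ""]
        for link in section.links:
            suffix = f": {link.note}" if link.note else ""
            lines.append(f"- [{link.title}]({link.url}){suffix}")
        lines.append("")
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    # Hypothetical site data, for illustration only.
    print(render_llms_txt(
        "Example Shop",
        "Glossary and guides of a hypothetical online shop.",
        ["Content is aimed at small and medium-sized retailers."],
        [Section("Glossary", [Link("llms.txt", "https://example.com/glossary/llms-txt.md", "definition")])],
    ))
```

Keeping the renderer as a pure function makes it easy to feed the same data into an llms-full.txt variant later, if the Markdown sources are available.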

What llms.txt is intended for

Classic websites are optimised for human readers: HTML with navigation, adverts, cookie banners and visual styling. This form is inefficient for a language model because much of the token budget is spent on layout, JavaScript and irrelevant markup. llms.txt reduces the website to its semantic core and provides a machine-friendly index with which a model can find the content it actually needs for an answer more quickly and precisely. The format is particularly useful for documentation, technical knowledge bases and glossaries, because structure and content are already clearly separated there.
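How much of a page is markup rather than content can be estimated with a rough sketch like the following. It uses character counts as a crude stand-in for tokens, which is an assumption; real tokenizers will give different numbers. The class and function names are illustrative.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes and skips the contents of script and style tags."""

    SKIPPED = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def content_share(html: str) -> float:
    """Fraction of the raw HTML characters that is readable text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return len(text) / len(html) if html else 0.0
```

A low share on a typical page is a hint that a Markdown version linked from llms.txt would save a model a considerable part of its context window. Note that this sketch still counts navigation text as content.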

Current distribution and adoption

As of 2026, llms.txt is not an official web standard. Neither the IETF nor the W3C has ratified the proposal, and none of the major AI providers (OpenAI, Anthropic, Google, Perplexity) has publicly committed to reading llms.txt in production. Estimates put actual adoption at around 5 to 15 per cent of tech and documentation sites, with significantly higher adoption in the developer tools niche than in classic SME e-commerce. In practical terms, this means that a properly maintained llms.txt does not provide a measurable visibility boost today. It is more of an upfront investment in case one of the major providers officially supports it.

Relationship to robots.txt and Schema.org

llms.txt does not replace robots.txt or Schema.org. The three mechanisms serve different layers. The operator uses robots.txt to regulate which bots are allowed to visit which paths: the question of access. Schema.org markup provides structured data within a page: the question of semantics. llms.txt supplements these layers with a global view of the site: the question of architecture. If you maintain all three mechanisms properly, you cover the most important points of contact between a model and the website.
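Because the access layer and the architecture layer are maintained separately, they can drift apart. The following sketch checks whether any URL listed in an llms.txt is disallowed by robots.txt for a given bot. It relies on Python's standard urllib.robotparser; the function names, the user agent and the example URLs are hypothetical.

```python
import re
from urllib.robotparser import RobotFileParser

# Matches Markdown links of the form [title](https://...) and captures the URL.
LINK_PATTERN = re.compile(r"\[[^\]]+\]\((https?://[^)\s]+)\)")


def listed_urls(llms_txt: str) -> list[str]:
    """Return all absolute URLs linked from the llms.txt text."""
    return LINK_PATTERN.findall(llms_txt)


def blocked_links(llms_txt: str, robots_txt: str, user_agent: str) -> list[str]:
    """Return the llms.txt URLs that robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in listed_urls(llms_txt) if not parser.can_fetch(user_agent, url)]
```

An empty result means the two files are consistent for that bot: everything llms.txt recommends is also something the bot is allowed to fetch.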

llms.txt in an e-commerce context

For classic shop setups, such as those based on Shopware, llms.txt is more of a hygiene requirement than a growth driver. Useful content for the file includes links to main categories, brand hubs, glossary pages, FAQ areas and editorial guides. Product pages do not usually belong here because their volume would overflow the context window and their freshness is better conveyed via Schema.org and the sitemap. An additional llms-full.txt can be useful if a shop maintains a clearly defined editorial area, such as a topic blog or a knowledge base whose Markdown sources are available anyway.
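One way to apply this selection is to filter the shop's sitemap by path. The sketch below assumes a standard sitemap XML file and a set of path prefixes for editorial areas; the prefixes are hypothetical and will differ per shop.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical path prefixes for content that belongs in llms.txt.
EDITORIAL_PREFIXES = ("/category/", "/brands/", "/glossary/", "/faq/", "/guides/")


def editorial_urls(sitemap_xml: str) -> list[str]:
    """Return sitemap URLs under an editorial prefix, leaving product pages out."""
    root = ET.fromstring(sitemap_xml)
    urls = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    return [url for url in urls if urlparse(url).path.startswith(EDITORIAL_PREFIXES)]
```

An allow-list of prefixes is used deliberately: new URL types stay out of llms.txt until someone decides they belong there.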

Frequent misunderstandings

There are several misunderstandings surrounding the proposal. First, llms.txt is often mistaken for an "anti-scraping lever", which it is not; the format does not communicate whether a bot is allowed to train on or quote content, only what it will find. Second, the file is often confused with the older ai.txt, a similar-sounding but separate proposal with a different focus. Third, many operators overestimate the immediate effect: a maintained llms.txt is only useful if the crawling models actually read it, and that adoption currently lies largely with the providers, not the website operators. The pragmatic recommendation therefore remains to create the file, maintain it properly and synchronise it with your own sitemap and schema maintenance cycles, but not to expect a short-term ROI.

How to get started

A realistic start involves three steps. In the first step, a lean llms.txt is created with a brand name, short description and two to three H2 sections that link to the glossary, FAQ and central guides. In the second step, the operator checks whether an accompanying llms-full.txt makes sense; it is only worthwhile if the central Markdown content already exists or is easily derivable. In the third step, the file ends up in the build and deployment pipeline so that it grows automatically as new glossary entries or guides are created. This keeps the maintenance effort low and the site is prepared as soon as the major providers take the format seriously.
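The third step could look roughly like the following build script. It is a sketch under several assumptions: Markdown sources live in a content directory with one subfolder per section, the first H1 of each file serves as the link title, and BASE_URL as well as all function names are hypothetical.

```python
from pathlib import Path

BASE_URL = "https://example.com"  # hypothetical base URL of the site


def first_heading(markdown: str, fallback: str) -> str:
    """Return the text of the first H1 line, or fallback if there is none."""
    for line in markdown.splitlines():
        if line.startswith("# "):
            return line[2:].strip()
    return fallback


def build_llms_txt(name: str, summary: str, pages: dict[str, str]) -> str:
    """Build llms.txt from a mapping of relative paths to Markdown text."""
    sections: dict[str, list[str]] = {}
    for rel_path in sorted(pages):
        # The top-level folder becomes the section, e.g. 'glossary/x.md' -> 'glossary'.
        section = rel_path.split("/", 1)[0] if "/" in rel_path else "pages"
        title = first_heading(pages[rel_path], fallback=rel_path)
        sections.setdefault(section, []).append(f"- [{title}]({BASE_URL}/{rel_path})")
    lines = [f"# {name}", "", f"> {summary}", ""]
    for section, links in sections.items():
        lines += [f"## {section.capitalize()}", "", *links, ""]
    return "\n".join(lines).rstrip() + "\n"


def load_pages(content_dir: Path) -> dict[str, str]:
    """Read every Markdown file below content_dir, keyed by its relative path."""
    return {
        path.relative_to(content_dir).as_posix(): path.read_text(encoding="utf-8")
        for path in content_dir.rglob("*.md")
    }


def write_llms_txt(content_dir: Path, out_file: Path, name: str, summary: str) -> None:
    """Regenerate llms.txt; intended to be called from the deployment pipeline."""
    out_file.write_text(build_llms_txt(name, summary, load_pages(content_dir)), encoding="utf-8")
```

Called from the deployment pipeline, for example as write_llms_txt(Path("content"), Path("public/llms.txt"), "Example Shop", "Short description"), the file is rebuilt on every release, so new glossary entries or guides appear without manual edits.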
