Companion files for a structured AI ingestion framework
What These AI Files Are & Why They Matter
Over the last 20+ years, websites were optimized primarily for search engines like Google and Bing.
Today, we are entering a new phase in which people also find information through AI systems:
- ChatGPT
- Perplexity
- Claude
- Gemini
- AI Overviews inside Google
- Voice assistants
- AI research agents
These systems do not “rank pages” the way Google traditionally did.
They retrieve, interpret, summarize, and cite content.
To prepare your website for this shift, we implement a structured AI ingestion framework.
What Are These Files?
We added a small set of structured files that act as a discovery and clarity layer for AI systems.
Think of them as a sitemap, an instruction manual, and a structured index, written specifically for AI systems.
- They do not change how your site looks.
- They do not affect user experience.
- They do not affect your CMS.
They simply make your site easier for AI systems to:
- Understand
- Retrieve
- Summarize
- Cite correctly
- Recommend
File Breakdown
llms.txt
This is the most important file.
It tells AI systems:
- Who the company is
- What services are offered
- What the main conversion actions are (call, request quote, book appointment, etc.)
- Which pages are authoritative
- Which pages should be cited first
- What AI systems should not invent (pricing, guarantees, medical claims, etc.)
Think of it as a professional briefing document for AI agents.
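As a rough illustration of what such a briefing can look like, the sketch below renders one from a small data structure. It assumes the Markdown layout suggested by the public llms.txt proposal (a title line, a one-line summary, then sections of links). The company, URLs, class names, and the "Do not invent" section are hypothetical and chosen only for this example; they are not taken from the actual site.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    title: str
    url: str
    note: str = ""


@dataclass
class SiteBrief:
    # All field names are illustrative assumptions, not a fixed schema.
    name: str
    summary: str
    services: list[Page] = field(default_factory=list)
    actions: list[Page] = field(default_factory=list)
    do_not_invent: list[str] = field(default_factory=list)


def render_llms_txt(brief: SiteBrief) -> str:
    """Render the brief as Markdown-style text for an llms.txt file."""
    lines = [f"# {brief.name}", "", f"> {brief.summary}", ""]

    def section(heading: str, pages: list[Page]) -> None:
        if not pages:
            return
        lines.extend([f"## {heading}", ""])
        for page in pages:
            suffix = f": {page.note}" if page.note else ""
            lines.append(f"- [{page.title}]({page.url}){suffix}")
        lines.append("")

    section("Services", brief.services)
    section("Contact and booking", brief.actions)
    if brief.do_not_invent:
        # Assumed convention: a plain list of topics AI systems should not make up.
        lines.extend(["## Do not invent", ""])
        lines.extend(f"- {item}" for item in brief.do_not_invent)
        lines.append("")
    return "\n".join(lines)


EXAMPLE = SiteBrief(
    name="Example Plumbing Co.",
    summary="Residential plumbing services (hypothetical company).",
    services=[Page("Drain cleaning", "https://example.com/services/drain-cleaning",
                   "authoritative service page")],
    actions=[Page("Request a quote", "https://example.com/quote")],
    do_not_invent=["Pricing", "Guarantees"],
)

if __name__ == "__main__":
    print(render_llms_txt(EXAMPLE))
```

Keeping the briefing in one data structure means the plain-text file and any structured variants can be generated from the same source and stay in sync.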
llms.json / llms.yaml
These contain the same information as llms.txt, but in structured machine-readable formats.
Structured data is easier for software to parse reliably than free-form text.
These formats are intended to improve:
- Parsing accuracy
- Citation reliability
- Retrieval precision
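A minimal sketch of the structured variant, assuming the briefing is held as a plain dictionary. Every key name and URL below is a hypothetical choice made for illustration; there is no fixed schema implied.

```python
import json

# Hypothetical briefing data; every key name here is an illustrative assumption.
BRIEF = {
    "company": "Example Plumbing Co.",
    "services": [
        {"name": "Drain cleaning",
         "url": "https://example.com/services/drain-cleaning"},
    ],
    "conversion_actions": [
        {"type": "request_quote", "url": "https://example.com/quote"},
    ],
    "cite_first": ["https://example.com/services/drain-cleaning"],
    "do_not_invent": ["pricing", "guarantees"],
}


def to_llms_json(brief: dict) -> str:
    """Serialize the briefing data as pretty-printed JSON."""
    return json.dumps(brief, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    print(to_llms_json(BRIEF))
```

A YAML version would carry the same keys; writing it from Python needs a third-party library such as PyYAML, so it is left out of this sketch.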
AI Ingestion Manifest
This file tells AI crawlers:
- Which pages are most important
- Which pages represent services
- Which pages represent conversion actions
- Which areas to prioritize
- Which areas to de-prioritize (admin, tag archives, etc.)
It helps AI systems understand what actually matters on this website.
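One way to picture the manifest is as two lists of path patterns that a reader of the file can apply to any URL. The sketch below assumes that shape; the pattern lists, the `classify` helper, and the paths are hypothetical and not the site's actual manifest.

```python
from fnmatch import fnmatchcase

# Hypothetical manifest: path patterns to prioritize and to de-prioritize.
MANIFEST = {
    "prioritize": ["/services/*", "/contact", "/book"],
    "deprioritize": ["/admin/*", "/tag/*"],
}


def classify(path: str, manifest: dict = MANIFEST) -> str:
    """Return 'deprioritize', 'prioritize', or 'normal' for a URL path."""
    # De-prioritized patterns are checked first so they win on overlap.
    if any(fnmatchcase(path, pattern) for pattern in manifest["deprioritize"]):
        return "deprioritize"
    if any(fnmatchcase(path, pattern) for pattern in manifest["prioritize"]):
        return "prioritize"
    return "normal"


if __name__ == "__main__":
    for path in ("/services/drain-cleaning", "/tag/plumbing", "/about"):
        print(path, classify(path))
```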
Perplexity Ingestion Hints
Perplexity shows its sources prominently in its answers, which makes it one of the most citation-driven AI search tools right now.
This file:
- Encourages citation of specific service pages
- Discourages AI from inventing pricing or guarantees
- Guides answer structure
It is intended to increase:
- Citation probability
- Accuracy of summaries
- Proper linking
RAG Embeddings Index
RAG = Retrieval Augmented Generation.
This file provides:
- A structured list of service pages
- Page types
- Priority weighting
- Chunking recommendations
This allows AI systems to:
- Break content into logical segments
- Retrieve only relevant sections
- Avoid mixing services
- Avoid misinformation
This is a forward-looking file: it anticipates AI systems that build their own retrieval indexes from site content.
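To make the idea of page types and priority weighting concrete, here is a small sketch of how a retrieval pipeline might read such an index. The entry fields, weights, and URLs are assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    # Field names are illustrative assumptions, not a fixed schema.
    url: str
    page_type: str    # e.g. "service", "conversion", "info"
    priority: float   # higher means retrieve first
    chunk_hint: str   # suggested way to segment the page


RAG_INDEX = [
    IndexEntry("https://example.com/services/water-heaters", "service", 0.8, "by-heading"),
    IndexEntry("https://example.com/services/drain-cleaning", "service", 0.9, "by-heading"),
    IndexEntry("https://example.com/quote", "conversion", 1.0, "whole-page"),
]


def pages_for(page_type: str, index: list[IndexEntry] = RAG_INDEX) -> list[IndexEntry]:
    """Return entries of one page type, highest priority first."""
    matches = [entry for entry in index if entry.page_type == page_type]
    return sorted(matches, key=lambda entry: entry.priority, reverse=True)


if __name__ == "__main__":
    for entry in pages_for("service"):
        print(entry.priority, entry.url)
```

Filtering by page type before ranking is what keeps one service's content from being mixed into an answer about another.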
Vector Chunking Map
This defines:
- How pages should be broken into semantic sections
- Which content should stay together (e.g., contact info)
- Which content should be de-prioritized (navigation, menus, etc.)
It improves:
- Context integrity
- Answer quality
- AI citation consistency
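The three rules above (split into sections, keep some content together, de-prioritize navigation) can be sketched as a small chunking function. The map keys, the section labels, and the 400-character limit are hypothetical values picked for illustration.

```python
# Hypothetical chunking map; the keys and the size limit are assumptions.
CHUNKING_MAP = {
    "drop": {"navigation", "menu", "footer"},  # de-prioritized content
    "keep_whole": {"contact"},                 # never split these sections
    "max_chars": 400,                          # soft size limit per chunk
}


def chunk_page(sections, chunking_map=CHUNKING_MAP):
    """Turn (kind, text) sections into a list of chunk dictionaries."""
    chunks = []
    for kind, text in sections:
        if kind in chunking_map["drop"]:
            continue
        if kind in chunking_map["keep_whole"]:
            chunks.append({"kind": kind, "text": text.strip()})
            continue
        # Pack whole paragraphs together until the size limit would be exceeded.
        buffer = ""
        paragraphs = (p.strip() for p in text.split("\n\n"))
        for paragraph in filter(None, paragraphs):
            candidate = f"{buffer}\n\n{paragraph}" if buffer else paragraph
            if buffer and len(candidate) > chunking_map["max_chars"]:
                chunks.append({"kind": kind, "text": buffer})
                buffer = paragraph
            else:
                buffer = candidate
        if buffer:
            chunks.append({"kind": kind, "text": buffer})
    return chunks


if __name__ == "__main__":
    page = [
        ("navigation", "Home | Services | Contact"),
        ("service", "A" * 300 + "\n\n" + "B" * 300),
        ("contact", "Call 555-0100\n\nEmail info@example.com"),
    ]
    for chunk in chunk_page(page):
        print(chunk["kind"], len(chunk["text"]))
```

Splitting only at paragraph boundaries is a deliberate simplification; a single paragraph longer than the limit stays in one chunk.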
The robots.txt Changes
Your robots.txt file tells crawlers which parts of the website they are allowed to crawl.
Traditionally, it was written mainly with search engine crawlers such as Googlebot and Bingbot in mind.
Now we explicitly allow known AI and research crawlers.
Example:
User-agent: GPTBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: ClaudeBot
Allow: /
What this means:
- We are not blocking AI systems
- We are encouraging discovery
- We are staying ahead of competitors
- We are signaling openness to AI indexing
It does NOT:
- Harm Google rankings
- Reduce SEO performance
- Open security vulnerabilities
It simply states clearly that AI crawlers are permitted to access the site's public content.
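The effect of the example rules can be checked with the robots.txt parser in Python's standard library. In the sketch below, the three AI groups match the example above, while the catch-all group and the /admin/ path are assumptions added only to show how the groups interact.

```python
from urllib.robotparser import RobotFileParser

# The three AI groups come from the example above. The final catch-all
# group and the /admin/ path are hypothetical, added for illustration.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: *
Disallow: /admin/
"""


def allowed(user_agent: str, path: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given crawler may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for bot in ("GPTBot", "PerplexityBot", "ClaudeBot", "SomeOtherBot"):
        print(bot, allowed(bot, "/services/"), allowed(bot, "/admin/"))
```

One detail worth knowing: a crawler follows the group that names it and ignores the catch-all group. In this sketch, GPTBot is therefore also allowed into /admin/, so any path that should stay off-limits to AI crawlers needs its own Disallow line inside each AI group.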
Why This Is Good for the Website
Increased AI Citations
AI tools increasingly cite structured, clear sources.
More citations can increase:
- Brand mentions
- Referral traffic
- Authority perception
Higher Conversion Alignment
We explicitly define:
- Call actions
- Quote forms
- Booking pages
This makes it more likely that AI assistants will guide users toward:
- Contacting the company
- Requesting service
- Calling directly
Reduced Hallucinations
By defining:
- Authoritative pages
- No-invent rules
- Citation priorities
We reduce the risk of:
- Incorrect pricing claims
- False guarantees
- Inaccurate service descriptions
Competitive Advantage
Most businesses:
- Have no AI ingestion framework
- Give AI retrieval systems no explicit guidance
- Rely entirely on traditional SEO
This gives your site a first-mover advantage in AI search.
Future-Proofing
Search is evolving from "ranked results" to "AI-summarized answers with citations."
These files prepare the site for:
- AI Overviews
- Conversational search
- Voice search
- Autonomous research agents
- Multi-step AI decision workflows
