# How I optimized my blog for AI search engines
People don’t just find websites through classic search results anymore. Discovery is increasingly happening through answer engines, AI summaries, and chat-based tools.
Because of this shift, I wanted to make my blog easier for both traditional crawlers and modern AI systems to parse. Don’t get me wrong—there isn’t some secret “AI SEO” trick that guarantees your site will get cited. It’s really just about making things easier to crawl, understand, and summarize.
## The priorities
When tackling this, I split the work into two distinct buckets:
- Proven baseline improvements: things like structured data, canonical URLs, Open Graph metadata, solid sitemap coverage, and clear `robots.txt` rules.
- Optional machine-readable indexes: files like `llms.txt` and `llms-full.txt`. While useful, they aren’t standardized in quite the same way yet.
That distinction is actually pretty important. I definitely wouldn’t recommend skipping the basics just to jump straight into adding an llms.txt file.
## Structured data and metadata
Honestly, the most impactful improvements were the boring ones.
I went ahead and added site-level and article-level structured data. This way, machines can answer simple questions about the site without having to guess:
- what the site is about
- who writes it
- what a specific page represents
- how a single blog post fits into the broader site structure
This effort resulted in four main schema types: `WebSite`, `Person`, `BlogPosting`, and `BreadcrumbList`.
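For illustration, a `BlogPosting` entry in JSON-LD looks roughly like this. This is a sketch rather than my exact markup, and the date is a placeholder:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "How I migrated my blog from Gatsby to Astro",
  "description": "A complete guide to moving a personal blog from Gatsby to Astro.",
  "datePublished": "2025-01-15",
  "author": {
    "@type": "Person",
    "name": "Theodoros Kokosioulis"
  },
  "mainEntityOfPage": "https://theodoroskokosioulis.com/blog/gatsby-to-astro-migration"
}
</script>
```

The point is that every field answers one of the questions above directly, so a crawler never has to infer authorship or topic from the page body.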
Down at the HTML level, I also double-checked that every single page includes:
- a canonical URL
- a genuinely useful meta description
- Open Graph tags for rich previews
- Twitter card tags
These tweaks don’t just help classic search engines—they make the entire site much easier for other tools to interpret correctly.
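Concretely, the head of each post ends up with a cluster of tags along these lines (all values here are placeholders):

```html
<link rel="canonical" href="https://example.com/blog/my-post/" />
<meta name="description" content="A one-sentence summary of the post." />

<!-- Open Graph tags for rich previews -->
<meta property="og:type" content="article" />
<meta property="og:title" content="My post title" />
<meta property="og:description" content="A one-sentence summary of the post." />
<meta property="og:image" content="https://example.com/images/my-post-cover.png" />

<!-- Twitter card tags -->
<meta name="twitter:card" content="summary_large_image" />
<meta name="twitter:title" content="My post title" />
```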
## robots.txt for explicit crawler access
If you want AI crawlers to actually access your content, you need to be explicit about it.
My robots.txt still keeps the broad allow rule, but I’ve added named entries specifically for the bots I want to permit:
```
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Anthropic-ai
Allow: /

User-agent: Claude-Web
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: cohere-ai
Allow: /
```

Adding these isn’t a promise that every AI system will suddenly crawl, cite, or rank the site. It just makes your intent crystal clear and removes an easily avoidable blocker.
## llms.txt as an optional index
I also decided to add /llms.txt and /llms-full.txt.
I don’t treat these files as guaranteed ranking factors by any means. Instead, I view them as optional, low-effort indexes that make the site incredibly easy to inspect programmatically.
`llms.txt` serves as the quick, short version. `llms-full.txt` packs in fuller summaries and richer metadata.
A minimal llms.txt entry can look as simple as this:
```
## Technical Articles

### Web Development

- [How I migrated my blog from Gatsby to Astro](https://theodoroskokosioulis.com/blog/gatsby-to-astro-migration): A complete guide to moving a personal blog from Gatsby to Astro.
```

If a tool happens to use those files, it can understand the site a lot faster. If it doesn’t, no harm done—the baseline improvements still stand perfectly well on their own.
## security.txt and trust signals
While I was at it, I added a security contact file at /.well-known/security.txt (the standard RFC 9116 location). To be clear, this isn’t some weird growth hack. It’s just a clean, standardized way to expose a security contact, which happens to make the site look a bit more complete and intentional.
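The file itself is tiny. A minimal RFC 9116-compliant version needs only a contact and an expiry date; the values below are placeholders:

```
Contact: mailto:security@example.com
Expires: 2026-12-31T23:59:59.000Z
Preferred-Languages: en
```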
## Keeping the indexes updated
The most obvious downside of having an llms.txt file is the maintenance. Every time you publish a new post, you’ve got another place where metadata can quickly go stale.
I solved that headache with a GitHub Action that uses Cursor Agent. The workflow scopes itself to the posts changed in the current run and refreshes only the relevant index entries automatically, so the indexes stay aligned with the actual site content.
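I won’t reproduce the exact workflow here, but the shape is roughly the sketch below. It swaps the Cursor Agent step for a plain script, since the interesting part is the scoping: diff the content directory, then regenerate only the affected entries. The content paths and the `update_llms_index.py` script are hypothetical, not my actual setup:

```yaml
name: Refresh llms indexes

on:
  push:
    branches: [main]
    paths:
      - "src/content/blog/**"   # assumed content directory

jobs:
  refresh-indexes:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 2   # needed to diff against the previous commit

      - name: Find changed posts
        run: git diff --name-only HEAD^ HEAD -- src/content/blog > changed-posts.txt

      - name: Refresh only the affected index entries
        run: python scripts/update_llms_index.py changed-posts.txt   # hypothetical script

      - name: Commit updated indexes
        run: |
          git config user.name "github-actions"
          git config user.email "actions@users.noreply.github.com"
          git add public/llms.txt public/llms-full.txt
          git diff --cached --quiet || git commit -m "chore: refresh llms indexes"
          git push
```

Committing from the workflow keeps the indexes versioned alongside the posts, so a stale entry shows up in a diff like any other content bug.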
## Checklist
If you’re looking to do this yourself, here are the changes actually worth making:
| Item | Why I added it |
|---|---|
| Canonical URLs | One clear URL per page |
| Open Graph and Twitter tags | Better metadata for previews and parsers |
| `WebSite`, `Person`, `BlogPosting`, `BreadcrumbList` schemas | Machine-readable structure |
| `robots.txt` rules | Explicit crawler permissions |
| Sitemap and RSS feed | Baseline discovery signals |
| `/llms.txt` and `/llms-full.txt` | Optional machine-readable site indexes |
| `/.well-known/security.txt` | Standard security contact and trust signal |
## How I verified it
I checked my implementation using a few straightforward methods:
- Running pages through Google’s Rich Results Test to catch any glaring structured data issues.
- Validating the JSON-LD using the Schema.org validator.
- Visiting `/robots.txt`, `/llms.txt`, and `/.well-known/security.txt` directly in the browser just to confirm they’re public and up to date.
## What is actually worth doing
If you only have time to do three things, start here:
- Clean up your metadata and canonical URLs.
- Add structured data that accurately matches the page content.
- Keep your overall content organized and easy to crawl.
Once that’s done, you can think about adding an llms.txt file—but only if you’re willing to maintain it and actually see value in publishing a machine-readable index of your site.
My general rule for AI-facing SEO is pretty simple: nail the durable basics first, and then you can start experimenting with the optional extras.