How to run prompt-level SEO experiments for AI search
As LLM adoption continues to grow, optimizing brand visibility in AI-generated responses is becoming increasingly important. Consumers are turning to these models for answers, recommendations, recipes, vacation plans, and nearly everything else imaginable.
But what happens if your brand isn’t included in those responses? Can you influence the outcome? And what are some proven ways to improve your brand’s inclusion and visibility?
That’s where structured experimentation comes in. Prompt-level SEO requires more than assumptions or one-off wins. It requires repeatable testing frameworks that help isolate what actually influences LLM responses.
Build prompt-level SEO tests with a hypothesis framework
There are countless recommendations on how to improve your LLM presence. Experimentation is key to discovering what works for your industry and brand.
Hypothesis-driven testing is how we structure these experiments for our brands. It breaks each test into consistent parts that can be replicated across tests and situations.
This framework creates a common approach to testing and helps you quickly understand the test and its outputs. The structure consists of three main pieces: if, then, because.
- If: This part provides the hypothesis: what is the test action?
- “If we include more detailed product specifications in our content.”
- Then: What will happen once the “if” section is completed? The outcome.
- “Then we’ll see our brand get included in more product-specific prompts.”
- Because: This is why you believe this will occur. What is the theory behind this test?
- “Because LLMs value detailed and specific information in their prompt responses.”
This framework requires some basic fundamentals that ensure you’re thinking through the test. It also allows you to go back later and validate whether you have tested these specific elements in the past and what the premises, theories, and outcomes were.
This helps because, as models evolve, a past test may need to be revisited not because your action changed, but because the underlying theory did: the world shifts, and the "because" section shifts with it.
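As a rough sketch, the "if, then, because" framework can be captured in a small record so every test is archived the same way and can be revisited later. All names and the example values below are illustrative, not part of any particular tool:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class PromptSeoHypothesis:
    """One 'if, then, because' test record (field names are illustrative)."""
    if_action: str       # the change you will make
    then_outcome: str    # the outcome you expect once the action is complete
    because_theory: str  # why you believe that outcome will occur
    model: str           # model and version under test
    created: date = field(default_factory=date.today)

test = PromptSeoHypothesis(
    if_action="Include more detailed product specifications in our content",
    then_outcome="Our brand gets included in more product-specific prompts",
    because_theory="LLMs value detailed, specific information in their responses",
    model="gpt-4.1",
)
```

Storing records like this makes it easy to search past tests by premise or model version before designing a new one.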
Key considerations before running prompt-level SEO tests
Before we get to the recommendations for testing best practices, here are some considerations when running these tests:
- Model updates: These models are updated constantly. When a model moves from version 4.1 to 4.2, it's time to revisit your results. How did the update change the inputs and outputs?
- Prompt drift: Have you ever run the exact same prompt twice in a day, or on consecutive days? Often, the results change. Run each prompt more than once, and on consecutive days, to establish a true baseline. This is no different from personalized search results: brands get comfortable with the variance, but averages surface and become the benchmark. Prompt testing works much the same way.
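One way to turn repeated runs into a benchmark is to average brand inclusion across days. The sketch below assumes you record, for each prompt on each day, whether the brand appeared in the response:

```python
from statistics import mean

def baseline_inclusion_rate(runs: list[list[bool]]) -> float:
    """Average brand inclusion across repeated runs of the same prompt set.

    `runs` holds one list per day; each entry is True if the brand
    appeared in that prompt's response on that day's run.
    """
    per_day = [mean(day) for day in runs]  # daily inclusion rate
    return mean(per_day)                   # averaged across days

# Three days of the same five prompts: individual runs drift,
# but the average becomes the baseline to measure against.
days = [
    [True, False, True, True, False],
    [True, True, True, False, False],
    [False, False, True, True, True],
]
rate = baseline_inclusion_rate(days)  # 0.6
```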
Now that you have the framework of the test, let’s think about the core elements of tests that can be used in prompt-specific testing.
How to isolate variables: A methodological approach
Designing a reliable prompt-level SEO experiment requires isolating a single causal variable. This is crucial for confidently attributing changes in LLM response inclusion or position to a specific action.
1. Content changes
When testing content modifications, the variable must be surgical. A common pitfall is changing too much at once (e.g., updating a product description and the page’s schema).
- Best practice — The single-paragraph swap: Focus on modifying a single, targeted piece of text on the page, such as a product description, FAQ answer, or a specific feature bullet point.
- Methodology: For true isolation, implement A/B testing with a control page containing the original content and a test page containing the modified content. The prompt should be designed to target the specific information you changed. Measure the brand’s inclusion rate and position-in-response over a defined period (e.g., seven days). Keep in mind these models move at a variety of speeds; this work, much like SEO, is less microwave and more oven.
2. Structured data
Structured data (schema) provides explicit signals to both search engines and LLM ingestion layers. Testing this requires treating the schema update as the only change to the page.
- Variable isolation: Test adding new properties (e.g., brand, model, and offer details) without altering the visible HTML text. This isolates the impact of the machine-readable layer.
- Specific experiment — FAQ schema: A highly effective experiment is adding FAQ schema to pages that already have Q&A sections in their HTML, isolating the effect of the explicit schema markup on LLM ingestion. Our work with brands has shown this makes those sections measurably easier for LLMs to ingest.
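For reference, FAQ markup of this kind uses the standard schema.org FAQPage type, added as JSON-LD without touching the visible HTML. The question and answer text below are placeholders:

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "Does this product support feature X?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Yes. Feature X is included on all models."
      }
    }
  ]
}
```

Because the visible Q&A content already exists on the page, deploying this block is the only change, which keeps the variable isolated.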
3. Before-and-after prompt testing
This process involves establishing a stringent baseline, making the change, and then repeating the prompt query. This is an essential control method in lieu of true A/B testing on the LLM itself.
Protocol
- Phase 1 (baseline): Execute a set of 5-10 target prompts daily for seven consecutive days to establish a true average of inclusion and position-in-response, accounting for prompt drift.
- Action: Deploy the isolated change (e.g., content or schema update).
- Phase 2 (measurement): Re-run the exact same set of prompts daily for the next seven days.
- Analysis: Compare the average inclusion rate and position of Phase 1 versus Phase 2. This method is central to initial presence score analyses, such as using three buckets of 25 keywords and prompts for a total of 75 queries.
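The analysis step above can be sketched as a simple phase comparison. This assumes each prompt run is logged as an inclusion flag plus a 1-based position-in-response when the brand appears; the data shown is made up for illustration:

```python
from statistics import mean

def phase_summary(results: list[dict]) -> dict:
    """Summarize one phase: each entry is {"included": bool, "position": int | None},
    where position is the brand's 1-based rank in the response when included."""
    inclusion = mean(r["included"] for r in results)
    positions = [r["position"] for r in results if r["included"]]
    return {
        "inclusion_rate": inclusion,
        "avg_position": mean(positions) if positions else None,
    }

baseline = [{"included": True, "position": 3}, {"included": False, "position": None},
            {"included": True, "position": 4}, {"included": False, "position": None}]
after = [{"included": True, "position": 2}, {"included": True, "position": 1},
         {"included": True, "position": 3}, {"included": False, "position": None}]

lift = phase_summary(after)["inclusion_rate"] - phase_summary(baseline)["inclusion_rate"]
```

In practice, each phase would contain a full week of runs for every prompt in the set, so the comparison is between drift-adjusted averages rather than single executions.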
Encouraging reproducible experiments
With the speed of model evolution and the lack of detailed model insights, it’s difficult to ensure reproducibility of results. However, the goal is to move beyond simple “it worked once” findings to build a durable methodology.
Mandatory frameworks
Ensure every test is documented using the “if, then, because” hypothesis structure. This archives the premise, action, and expected outcome, allowing future teams to quickly validate whether a test remains relevant as LLMs evolve.
Technical integrity
- Version control: Document the specific model and version used for testing (e.g., “Gemini 4.1.2”). This allows for easy comparison when a model update occurs.
- Prompt libraries: Maintain an organized, time-stamped repository of the exact prompt queries used for baseline and measurement phases. This repository should track inclusion rate, position-in-response, and sentiment/framing for each query.
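A minimal prompt-library entry can be a time-stamped row per execution, written to a plain CSV so the repository survives tool changes. The column names here are illustrative, not a required format:

```python
import csv
import io
from datetime import datetime, timezone

# One time-stamped row per prompt execution (column names are illustrative).
FIELDS = ["timestamp", "model", "prompt", "included", "position", "sentiment"]

def log_run(writer: csv.DictWriter, model: str, prompt: str,
            included: bool, position, sentiment: str) -> None:
    writer.writerow({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "prompt": prompt,
        "included": included,
        "position": position,
        "sentiment": sentiment,
    })

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
log_run(writer, "gpt-4.1", "best running shoes for flat feet",
        included=True, position=2, sentiment="positive")
```

Keeping the model version in every row is what makes before-and-after comparisons possible once an update ships.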
Infrastructure consistency
Define the testing environment (e.g., clear browser cache, no login state) and, where possible, use APIs or synthetic testing platforms to remove the impact of personalization and location bias, which is analogous to controlling for personalized search results in traditional SEO.
Moving beyond one-off wins in AI search
The key to prompt-level SEO is rigorous methodology. By adopting a hypothesis-driven approach, surgically isolating variables (content, entities, schema), and establishing strict before-and-after testing protocols, you can confidently move past speculation.
The path to influencing LLM responses is paved with controlled, documented, and reproducible experiments.
Contributing authors are invited to create content for Search Engine Land and are chosen for their expertise and contribution to the search community. Our contributors work under the oversight of the editorial staff and contributions are checked for quality and relevance to our readers. Search Engine Land is owned by Semrush. Contributor was not asked to make any direct or indirect mentions of Semrush. The opinions they express are their own.

