Cisco AI Introduces FAPO: Pipeline-Aware Prompt Optimization With Step-Level Failure Attribution and Claude Code Orchestration

Getting prompts right is still the hardest part of shipping reliable LLM applications. Small wording changes can swing accuracy by 20 percent. What works on a few examples often breaks at scale. When a multi-step pipeline returns a wrong answer, finding the failing step means inspecting intermediate outputs by hand.

Cisco AI introduced FAPO to address that bottleneck. FAPO stands for Fully Automated Prompt Optimization. It is a Claude Code-driven system that optimizes LLM pipelines from baseline prompts to target accuracy. You supply a dataset and an initial prompt. FAPO then evaluates, classifies failures, proposes variants, validates them, and iterates. The whole loop is orchestrated by Claude Code agents. The project ships open source under Apache 2.0, and also supports Codex as the optimization agent.

In Cisco’s reported evaluation, FAPO beat GEPA, a state-of-the-art prompt optimizer, on 15 of 18 model-benchmark comparisons. On the two benchmarks where FAPO escalated to pipeline changes, the mean gain over GEPA reached +33.8pp.

TL;DR

FAPO is a Claude Code-driven system that autonomously optimizes multi-step LLM pipelines from baseline prompts to target accuracy, open source under Apache 2.0.
It escalates through three levels — prompt, parameter, then chain structure — using step-level failure attribution to decide what to change next.
In Cisco’s evaluation, FAPO beat GEPA on 15 of 18 model-benchmark comparisons, with a +14.1pp mean gain.
On HoVer and IFBench, where it escalated to pipeline changes, FAPO won all six pairs at a +33.8pp mean gain; AIME was GEPA’s only win, within sampling noise.
Guardrails against overfitting include training-split-only inspection, immutable variant files, and an independent reviewer on every proposal.

What is FAPO

FAPO is a multi-tenant evaluation and optimization framework. A tenant is a self-contained optimization project. Each tenant directory holds one task’s prompts, dataset, chain definition, scorer, and config. Tenants stay isolated, so unrelated tasks optimize side by side without interference.

The core engine is named hephaestus and is domain-agnostic. It handles evaluation, chain execution, and scoring. Chains are LangGraph state graphs that process each test case. Out of the box, FAPO supports three providers: OpenAI, Baseten, and SageMaker.

The one input you must bring is a dataset: paired inputs and expected outputs that define success. FAPO splits it into a validation set and a held-out test set. The validation set drives iteration; the test set is used only for a final one-shot evaluation. From a task description, Claude can scaffold the rest: the initial prompt, the chain, and the scorer.
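A split like this is usually made deterministic so that a case never drifts between sets across runs. The sketch below shows one way to do that, assuming a hash of case_id picks the bucket. The function name split_cases and the 20% test fraction are illustrative choices, not FAPO's documented behavior.

```python
import hashlib

def split_cases(cases, test_fraction=0.2):
    """Assign each case to validation or test by hashing its case_id (hypothetical helper)."""
    validation, test = [], []
    for case in cases:
        digest = hashlib.sha256(case["case_id"].encode("utf-8")).digest()
        # Map the first 8 bytes of the hash to a number in [0, 1).
        bucket = int.from_bytes(digest[:8], "big") / 2**64
        (test if bucket < test_fraction else validation).append(case)
    return validation, test

cases = [{"case_id": str(i)} for i in range(1000)]
validation, test = split_cases(cases)
```

Because the bucket depends only on case_id, adding new cases later does not move existing ones across the boundary.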

How the Optimization Loop Works

Once the pieces exist, FAPO runs a closed loop until target accuracy is reached. Each cycle runs six stages:

Evaluate — run the chain on the dataset, collect per-case scores and step-level outputs.
Attribute — classify failures by root cause using rule-based heuristics plus LLM analysis.
Propose — generate a variant targeting the dominant failure cluster.
Review — an independent agent validates the proposal for scope compliance and data leakage.
Compare — accept the variant only if it improves on the previous best, otherwise reject.
Iterate — continue until target accuracy is reached or the optimization budget is exhausted.
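The six stages above can be sketched as a plain control loop. This is a toy illustration and not FAPO's code: the optimize function, the report dictionary shape, and the toy_* stand-ins are all assumptions made so that the example runs on its own.

```python
def optimize(evaluate, propose, review, baseline, target, budget):
    """Run evaluate -> attribute -> propose -> review -> compare within a budget."""
    best, report = baseline, evaluate(baseline)          # Evaluate
    history = []
    for _ in range(budget):                              # Iterate
        if report["score"] >= target:
            break
        # Attribute: pick the failure class with the most failing cases.
        dominant = max(report["failures"], key=report["failures"].get)
        candidate = propose(best, dominant)              # Propose
        if not review(candidate):                        # Review
            history.append((candidate, "blocked"))
            continue
        candidate_report = evaluate(candidate)
        if candidate_report["score"] > report["score"]:  # Compare
            best, report = candidate, candidate_report
            history.append((candidate, "accepted"))
        else:
            history.append((candidate, "rejected"))
    return best, report["score"], history

# Toy stand-ins: a variant is a tuple of the failure classes it has fixed.
def toy_evaluate(variant):
    remaining = {"format": 30, "reasoning": 20, "retrieval": 10}
    for fix in variant:
        remaining[fix] = 0
    return {"score": 100 - sum(remaining.values()), "failures": remaining}

def toy_propose(variant, failure_class):
    return variant + (failure_class,)

def toy_review(variant):
    return len(variant) == len(set(variant))  # block proposals that repeat a fix

best, score, history = optimize(toy_evaluate, toy_propose, toy_review,
                                baseline=(), target=90, budget=10)
```

In this toy run the loop accepts a format fix, then a reasoning fix, and stops once the score reaches the target of 90.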

The system works at three escalating levels. Prompt edits are lowest cost and tried first. Parameter changes adjust config values like retrieval_k or temperature. Structural changes alter chain topology, such as adding a self-reflection node or switching to a ReAct pattern. FAPO exhausts one level before escalating to the next.
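The source does not say how FAPO decides that a level is exhausted. The sketch below assumes a simple patience rule, where a fixed number of consecutive rejected variants triggers escalation. The Escalator class and its patience parameter are hypothetical.

```python
LEVELS = ["prompt", "parameter", "structural"]

class Escalator:
    """Track the current level; move up after `patience` consecutive rejections."""

    def __init__(self, patience=2):
        self.patience = patience
        self.index = 0
        self.misses = 0

    @property
    def level(self):
        return LEVELS[self.index]

    def record(self, accepted):
        """Record a variant outcome and return the level for the next proposal."""
        if accepted:
            self.misses = 0
        else:
            self.misses += 1
            if self.misses >= self.patience and self.index < len(LEVELS) - 1:
                self.index += 1
                self.misses = 0
        return self.level

esc = Escalator(patience=2)
trace = [esc.record(outcome) for outcome in [True, False, False, False, False, False]]
```

Here trace moves from prompt to parameter after two rejections in a row, then to structural after two more, and stays there because no higher level exists.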

Step attribution sorts failures into four classes. Retrieval failures return empty or irrelevant content. Cascading failures begin when an early step produces empty output. Format failures hide the correct answer inside text the scorer cannot parse. Reasoning failures occur when good inputs still produce a wrong conclusion. Format and reasoning failures can be addressed at the prompt level. Retrieval and cascading failures call for structural changes.
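A rule-based pass over step outputs might look like the sketch below. It is only an approximation of the four classes described here: the classify_failure function, the step names, and the substring test used as a stand-in for relevance are assumptions, and the LLM analysis FAPO layers on top is not modeled.

```python
def classify_failure(steps, expected, retrieval_step="retrieve"):
    """Classify an already-failed case from its ordered (step_name, output) pairs.

    The last pair is the final answer. Returns 'retrieval', 'cascade',
    'format', or 'reasoning'.
    """
    outputs = dict(steps)
    target = expected.strip().lower()
    retrieved = outputs.get(retrieval_step, "").strip().lower()
    # Crude relevance proxy: retrieved text is empty or never mentions the answer.
    if retrieval_step in outputs and target not in retrieved:
        return "retrieval"
    # An empty intermediate step starves every step after it.
    if any(not text.strip() for _, text in steps[:-1]):
        return "cascade"
    # The right answer is present but wrapped in text the scorer rejects.
    if target in steps[-1][1].strip().lower():
        return "format"
    return "reasoning"

doc = "Paris is the capital of France."
results = [
    classify_failure([("retrieve", ""), ("extract", ""), ("answer", "unknown")], "Paris"),
    classify_failure([("retrieve", doc), ("extract", ""), ("answer", "unknown")], "Paris"),
    classify_failure([("retrieve", doc), ("extract", "capital: Paris"),
                      ("answer", "The answer is Paris.")], "Paris"),
    classify_failure([("retrieve", doc), ("extract", "capital: Paris"),
                      ("answer", "Lyon")], "Paris"),
]
```

The four calls return retrieval, cascade, format, and reasoning in that order. The rule order matters: an empty retrieval result is counted as a retrieval failure before the cascade check runs.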

Guardrails keep the optimizer from overfitting. It inspects only training-split cases, while validation and test expose aggregate scores only. Every variant is a new immutable file, never edited in place. An independent reviewer checks each proposal before it runs.
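The immutable-variant rule can be enforced at the file system level. The sketch below assumes a numbered file naming scheme (v001.txt, v002.txt, and so on) and a helper called write_variant, neither of which comes from FAPO. It relies on Python's exclusive-create mode, which refuses to overwrite an existing file.

```python
import tempfile
from pathlib import Path

def write_variant(directory, prompt_text):
    """Write prompt_text to the next free vNNN.txt file and never overwrite one."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    number = len(list(directory.glob("v*.txt"))) + 1
    path = directory / f"v{number:03d}.txt"
    # Mode "x" raises FileExistsError instead of replacing an existing file.
    with open(path, "x", encoding="utf-8") as handle:
        handle.write(prompt_text)
    return path

with tempfile.TemporaryDirectory() as tmp:
    first = write_variant(tmp, "Answer concisely.")
    second = write_variant(tmp, "Answer concisely. Always give an answer.")
    names = [first.name, second.name]
    first_text = first.read_text(encoding="utf-8")
```

Keeping every variant as its own file means a later comparison can always be traced back to the exact prompt text that produced a score.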

The Benchmark Case: FAPO vs. GEPA

The Cisco team evaluated FAPO against GEPA (Genetic-Pareto), a state-of-the-art prompt optimization method. GEPA uses evolutionary search with genetic operators to optimize prompts for multi-step pipelines. Both systems started from identical baseline pipelines and prompts. FAPO could escalate to structural changes when attribution found bottlenecks. GEPA was limited to prompt-level optimization.

The comparison spanned six benchmarks and three task models: GPT-4.1-mini, GPT-5.4-mini, and Gemma 3-12B. Claude Opus 4.6 served as both FAPO’s orchestrator and GEPA’s reflector. Scores below are averaged across the three task models.

Benchmark	Baseline	GEPA	FAPO	Gain vs. GEPA
HoVer	35.9	48.5	83.8	+35.3pp
IFBench	35.7	48.5	80.7	+32.2pp
LiveBench-Math	51.0	52.6	62.0	+9.4pp
HotpotQA	50.9	61.8	68.3	+6.5pp
Papillon	73.6	90.7	94.9	+4.2pp
AIME	16.7	16.0	12.9	-3.1pp

FAPO won 15 of 18 model-benchmark comparisons, with a mean gain of +14.1pp over GEPA. On HoVer and IFBench, where FAPO escalated to pipeline changes, it won all six model-benchmark pairs. The mean gain there was +33.8pp. On the four benchmarks without structural changes, FAPO still won 9 of 12 through prompt optimization alone. AIME was the only benchmark where GEPA led, by 3.1pp. The gap is smaller than the standard deviation across stochastic trials.
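The headline means can be reproduced from the table. The snippet below uses only the per-benchmark GEPA and FAPO scores listed above, and it assumes each benchmark contributes three model pairs, so the mean of the six per-benchmark gains equals the mean over all 18 pairs.

```python
scores = {
    # benchmark: (GEPA, FAPO), each averaged across the three task models
    "HoVer": (48.5, 83.8),
    "IFBench": (48.5, 80.7),
    "LiveBench-Math": (52.6, 62.0),
    "HotpotQA": (61.8, 68.3),
    "Papillon": (90.7, 94.9),
    "AIME": (16.0, 12.9),
}

# Gain in percentage points, rounded to match the table's precision.
gains = {name: round(fapo - gepa, 1) for name, (gepa, fapo) in scores.items()}

mean_all = sum(gains.values()) / len(gains)                  # about 14.08, reported as +14.1pp
mean_structural = (gains["HoVer"] + gains["IFBench"]) / 2    # about 33.75, reported as +33.8pp
```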

The capability comparison below summarizes the design differences between the two systems as Cisco describes them.

Capability	GEPA	FAPO
Optimization levels	Prompt text only	Prompt → parameter → structural
Can change chain structure	No	Yes, when attribution finds bottlenecks
How it is driven	Evolutionary search with genetic operators	Claude Code or Codex agent loop
Result across 18 model-benchmark pairs	Reference	Wins 15 of 18; +14.1pp mean

Where It Fits: Use Cases

FAPO targets multi-step LLM pipelines, not single prompts. A few concrete examples:

Multi-hop question answering: A chain retrieves documents, extracts facts, reasons over evidence, and formats an answer. In Cisco’s documented walkthrough, a multi-hop QA chain rose from 39.3% to 70.3% validation exact match across two iterations. Attribution then flagged the remaining failures as retrieval-limited, signaling a structural fix. Separately, on the HotpotQA benchmark, FAPO reached 68.3% test accuracy versus GEPA’s 61.8%.
Instruction following: On IFBench, format-constraint failures pushed FAPO to escalate beyond prompts, reaching 80.7% test accuracy.
Classification: A software-name-to-category task can be scaffolded by Claude Code, then optimized to exact-match targets.
ReAct agents: An MCP workflow extension optimizes a tool-calling ReAct agent using trajectory scoring and LLM-as-Judge scoring.

Getting Started

The fastest path is to let Claude Code create the tenant files. From the repo, describe your task in plain English, then add a JSONL dataset. Each line is one test case with case_id, task_type, context, expected, and metadata:

{"case_id": "1", "task_type": "qa", "context": {"question": "What is the capital of France?"}, "expected": {"answer": "Paris"}, "metadata": {}}
{"case_id": "2", "task_type": "qa", "context": {"question": "What is 2 + 2?"}, "expected": {"answer": "4"}, "metadata": {}}
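Before handing a file like this to the evaluator, it can help to check it locally. The sketch below validates the five fields named above and rejects duplicate case_id values. The load_cases helper is hypothetical and is not part of hephaestus.

```python
import json

REQUIRED_KEYS = {"case_id", "task_type", "context", "expected", "metadata"}

def load_cases(lines):
    """Parse JSONL lines and fail fast on missing keys or duplicate case_ids."""
    cases, seen = [], set()
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        case = json.loads(line)
        missing = REQUIRED_KEYS - case.keys()
        if missing:
            raise ValueError(f"line {number}: missing keys {sorted(missing)}")
        if case["case_id"] in seen:
            raise ValueError(f"line {number}: duplicate case_id {case['case_id']!r}")
        seen.add(case["case_id"])
        cases.append(case)
    return cases

sample = [
    '{"case_id": "1", "task_type": "qa", "context": {"question": "What is the capital of France?"}, "expected": {"answer": "Paris"}, "metadata": {}}',
    '{"case_id": "2", "task_type": "qa", "context": {"question": "What is 2 + 2?"}, "expected": {"answer": "4"}, "metadata": {}}',
]
cases = load_cases(sample)
```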

A scorer compares the chain output to the expected answer. It implements validate_case to catch bad data early and score_case to return a composite score:

from hephaestus.scoring.scorer import Scorer as BaseScorer

class Scorer(BaseScorer):
    def validate_case(self, case, scoring_profile):
        # Catch malformed cases early: every case needs an expected answer.
        assert "answer" in case.expected, "Missing 'answer' in expected"

    def score_case(self, case, output_text, scoring_profile):
        # Normalize both sides so whitespace and casing do not affect the match.
        expected = case.expected["answer"].strip().lower()
        predicted = output_text.strip().lower()
        # Exact match on a 0-100 scale.
        em = 100.0 if predicted == expected else 0.0
        return {"composite_score": em, "score_breakdown": {"exact_match": em}}
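An exact-match scorer like this one is strict by design, which is where format failures come from. The standalone snippet below repeats the same trim-and-lowercase comparison without importing hephaestus; the exact_match helper exists only for this illustration.

```python
def exact_match(output_text, expected_answer):
    """Same comparison as score_case above: trim, lowercase, compare."""
    predicted = output_text.strip().lower()
    expected = expected_answer.strip().lower()
    return 100.0 if predicted == expected else 0.0

bare = exact_match("  Paris\n", "Paris")                # 100.0: only whitespace differs
wrapped = exact_match("The answer is Paris.", "Paris")  # 0.0: correct answer, wrong format
```

The second case is the kind of failure that attribution would label as format rather than reasoning, since the model reached the right answer but the scorer could not see it.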

Verify the setup with a baseline evaluation:

export OPENAI_API_KEY="sk-..."
python -m hephaestus.cli eval --config tenants/my_project/configs/eval.json

Then invoke the optimization agent with a tenant, config, and success criteria such as composite_score >= 90. Claude Code produces a scope contract, then iterates autonomously. Every prompt variant, config, and per-variant analysis is written to disk, so each run stays auditable. A local read-only UI called FAPO Explorer browses the artifacts afterward.

Strengths and Weaknesses

Strengths

Pipeline-aware scoring attributes failures to the step that caused them, not just the final output.
Three-level escalation handles failures that prompts alone cannot fix.
Guardrails against overfitting: training-split-only inspection, immutable variants, and an independent reviewer.
Open source under Apache 2.0, with both Claude Code and Codex supported.

Weaknesses

Optimization quality is bounded by the dataset’s quality and coverage, which you must supply.
The project is recent, so independent production track records are still limited.
The default loop depends on agentic coding tools (Claude Code or Codex) rather than a standalone optimizer.

Interactive Explainer

[Interactive widget: FAPO Optimization Loop Explorer. Pick a benchmark, then step through FAPO’s six-stage loop to watch failure attribution drive each variant and accuracy move from the baseline past GEPA toward the FAPO result.]

Published numbers (baseline, GEPA, FAPO) come from Cisco Foundation AI’s report, averaged across three task models (GPT-4.1-mini, GPT-5.4-mini, Gemma 3-12B). The per-cycle trajectory in the widget is an illustrative reconstruction of how prompt → parameter → structural escalation reaches the published FAPO result; intermediate values are not from the source.

FAPO on GitHub: https://github.com/cisco-foundation-ai/fully-automated-prompt-optimization (Apache 2.0)
