Qwen’s Former Lead on What Hybrid Thinking Got Wrong — and Why He Now Backs Agents

Junyang Lin was the technical lead of Alibaba’s Qwen project. He announced he was stepping down on March 3, 2026. He now lists himself as an independent researcher on his personal site.

In a talk titled ‘Qwen: Towards a Generalist Model / Agent,’ he walks through the Qwen family. The talk ends on a single line: “Training models -> training agents.” He later expanded that line into a detailed post as an independent researcher. This article reads the talk and the post together.

What Lin’s Talk Actually Covers

The talk is a tour of the Qwen model family, not a single release. It moves through QwQ-32B, Qwen2.5-Max, Qwen3, Qwen2.5-VL, and Qwen2.5-Omni. Each stop shows benchmark charts against contemporaries. The named baselines include DeepSeek-R1, Grok 3 Beta, Gemini 2.5 Pro, and OpenAI’s o-series.

The Qwen3 stop carries the most detail. Lin highlights hybrid thinking modes: a thinking mode for step-by-step reasoning, and a non-thinking mode for near-instant responses. He adds dynamic thinking budgets, so callers can cap how much the model reasons. Qwen3 expanded multilingual support from 29 to 119 languages and dialects.
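A thinking budget can be pictured as a cap on the reasoning segment of the output. The sketch below is a toy illustration on a list of tokens, not Qwen3's actual mechanism: the `apply_thinking_budget` helper and the use of a `</think>` marker to close the reasoning are assumptions made for illustration. In a real decoder the cap would apply during generation, so the answer would be produced after the cut; here the answer is already present in the stream.

```python
from typing import List

THINK_CLOSE = "</think>"  # assumed delimiter that ends the reasoning segment


def apply_thinking_budget(tokens: List[str], budget: int) -> List[str]:
    """Cap the reasoning segment of a token stream at `budget` tokens.

    Tokens before THINK_CLOSE count as reasoning. If the reasoning is longer
    than the budget, it is cut and the close marker is kept so the answer
    part can follow.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    if THINK_CLOSE in tokens:
        close_at = tokens.index(THINK_CLOSE)
        reasoning, answer = tokens[:close_at], tokens[close_at + 1:]
    else:
        # The model never closed its reasoning: treat everything as reasoning.
        reasoning, answer = tokens, []
    return reasoning[:budget] + [THINK_CLOSE] + answer


stream = ["step1", "step2", "step3", "step4", "</think>", "answer"]
print(apply_thinking_budget(stream, budget=2))
# ['step1', 'step2', '</think>', 'answer']
```

A budget larger than the reasoning length leaves the stream unchanged, which matches the idea that the cap only bites on long traces.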

The presentation lists many model types and sizes from 0.6B to 235B parameters. It also lists quantized formats including GGUF, GPTQ, AWQ, and MLX, all under Apache 2.0. Two demos follow: a Web Dev demo and a Deep Research demo. The closing “Future work” slide lists more pretraining, RL with environment feedback, longer context, and more modalities. Its final line is the one that points at agents: “training models -> training agents.”

Qwen3 Architecture, As Shown in the Talk

The talk includes the Qwen3 architecture tables, reproduced below.

Model	Layers	Heads (Q/KV)	Tie Embedding / Experts (Total/Act.)	Context
Qwen3-0.6B	28	16 / 8	Tie: Yes	32K
Qwen3-1.7B	28	16 / 8	Tie: Yes	32K
Qwen3-4B	36	32 / 8	Tie: Yes	32K
Qwen3-8B	36	32 / 8	Tie: No	128K
Qwen3-14B	40	40 / 8	Tie: No	128K
Qwen3-32B	64	64 / 8	Tie: No	128K
Qwen3-30B-A3B	48	32 / 4	Experts: 128 / 8	128K
Qwen3-235B-A22B	94	64 / 4	Experts: 128 / 8	128K

The small dense models tie input and output embeddings and use a 32K context. The larger dense and MoE models drop tying and extend context to 128K. The two MoE models activate 8 of 128 experts per token.
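To make "8 of 128 experts per token" concrete, here is a minimal sketch of top-k expert routing in plain Python. The expert counts come from the table; the routing rule itself (softmax over router logits, keep the top 8, renormalize their weights) is one common choice and is assumed here rather than taken from the talk. The random logits stand in for a learned router.

```python
import math
import random
from typing import List, Tuple

NUM_EXPERTS = 128  # total experts per MoE layer, from the table
TOP_K = 8          # experts activated per token, from the table


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits: List[float], k: int = TOP_K) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


rng = random.Random(0)  # stand-in for a learned router's output
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
chosen = route_token(logits)

print(len(chosen))                          # 8
print(round(sum(w for _, w in chosen), 6))  # 1.0
print(f"{TOP_K / NUM_EXPERTS:.4f}")         # 0.0625
```

Only 6.25% of the experts run for any given token, which is why a model such as Qwen3-235B-A22B can hold far more parameters than it activates per token.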

Hybrid Thinking, and Why Merging is Hard

Lin presents hybrid thinking as a clean feature. The post explains why it was hard to build. Lin writes that thinking mode and instruct mode pull in opposite directions.

A strong instruct model is rewarded for directness, brevity, and low latency. A strong thinking model is rewarded for spending more tokens on hard problems. Merge the two carelessly, and both degrade. The thinking behavior gets bloated, and the instruct behavior gets less crisp.

Qwen3 tried the merge with a four-stage post-training pipeline. That pipeline included a long-CoT cold start, reasoning RL, and a “thinking mode fusion” step. Later in 2025, the 2507 line shipped separate Instruct and Thinking variants instead. Lin frames this as a data problem more than a model problem.

Anthropic took the opposite route, and Lin calls it a useful corrective. Claude 3.7 Sonnet shipped as a hybrid model with a user-set thinking budget. Claude 4 let reasoning interleave with tool use, aimed at coding and long-running tasks. His point: a longer reasoning trace does not make a model smarter. Thinking should be shaped by the target workload, not by the benchmark.

Interactive Explainer

[Interactive explainer: “Reasoning Thinking → Agentic Thinking.” Two ways a model can “think”: one deliberates, then answers; the other thinks in order to act, looping with an environment. The widget steps a reasoning model (“Deliberate internally, emit one answer · no world in the loop”) and an agentic system (“Plan · act · observe feedback · revise · repeat over long horizons”) through three tasks: fix a failing test, research a question, and solve a hard math problem. A second panel, “Test-time scaling: more thinking budget, more accuracy,” plots MathVision accuracy against maximum thinking length (4k, 8k, 16k, 24k tokens), as shown in Junyang Lin’s talk; the default readout is 45.6% at 8k tokens. Loops are illustrative.]

From ‘Reasoning’ Thinking to ‘Agentic’ Thinking

Lin draws a line between two eras. The first was reasoning thinking, defined by o1 and DeepSeek-R1. It taught the field that RL needs deterministic, verifiable rewards, so math, code, and logic became central. It also turned RL into a systems problem of large-scale rollouts and verification.
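A "deterministic, verifiable reward" is easiest to see on a math answer. The sketch below is an assumed, minimal checker, not any lab's actual grader: `verifiable_reward` compares a model's final answer to a reference by exact numeric equality and returns 1.0 or 0.0.

```python
from fractions import Fraction
from typing import Optional


def parse_number(text: str) -> Optional[Fraction]:
    """Parse '3/4', '0.75', or '12' into an exact Fraction; None if invalid."""
    try:
        return Fraction(text.strip())
    except (ValueError, ZeroDivisionError):
        return None


def verifiable_reward(model_answer: str, reference: str) -> float:
    """Return 1.0 if the answers are numerically equal, else 0.0."""
    got, want = parse_number(model_answer), parse_number(reference)
    if got is None or want is None:
        return 0.0
    return 1.0 if got == want else 0.0


print(verifiable_reward("0.75", "3/4"))   # 1.0
print(verifiable_reward(" 12 ", "12"))    # 1.0
print(verifiable_reward("about 1", "1"))  # 0.0
```

The same input always yields the same reward, and no judge model is needed. That property is what made math, code, and logic convenient targets for reasoning RL.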

The next era, in his framing, is agentic thinking: thinking in order to act. An agent formulates plans, decides when to act, uses tools, reads environment feedback, and revises. It is defined by closed-loop interaction with the world, not by a long internal monologue.

Lin lists what agentic thinking must handle that pure reasoning can avoid:

Deciding when to stop thinking and take an action
Choosing which tool to invoke, and in what order
Incorporating noisy or partial observations from the environment
Revising plans after failures
Maintaining coherence across many turns and many tool calls
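The items above describe a closed loop: plan, act, observe, revise, and stop when done or out of budget. The toy sketch below shows that loop shape only. `GuessEnv` and `Agent` are hypothetical names, and the number-guessing task stands in for a real environment such as a test runner or a browser.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class GuessEnv:
    """Toy environment: feedback says whether a guess is low, high, or right."""
    target: int

    def step(self, guess: int) -> str:
        if guess < self.target:
            return "too_low"
        if guess > self.target:
            return "too_high"
        return "correct"


@dataclass
class Agent:
    low: int
    high: int
    history: List[Tuple[int, str]] = field(default_factory=list)

    def plan(self) -> int:
        # Decide the next action from the current belief about the world.
        return (self.low + self.high) // 2

    def revise(self, guess: int, observation: str) -> None:
        # Fold the environment's feedback back into the plan.
        self.history.append((guess, observation))
        if observation == "too_low":
            self.low = guess + 1
        elif observation == "too_high":
            self.high = guess - 1


def run_episode(env: GuessEnv, agent: Agent, max_steps: int = 10) -> bool:
    for _ in range(max_steps):
        action = agent.plan()
        observation = env.step(action)
        agent.revise(action, observation)
        if observation == "correct":
            return True
    return False  # step budget exhausted


agent = Agent(low=0, high=100)
print(run_episode(GuessEnv(target=37), agent))  # True
print(agent.history)
# [(50, 'too_high'), (24, 'too_low'), (37, 'correct')]
```

Success here depends on using each observation, not on how long the agent deliberates before its first guess.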

The optimization target changes with the era. The table below summarizes the contrast Lin draws.

Dimension	Reasoning thinking	Agentic thinking
Judged by	Quality of internal deliberation before an answer	Whether progress is sustained while acting
Reward signal	Verifiable answers (math, code, logic)	Task success in an interactive environment
Core object of training	The model	The model plus its environment (the harness)
Infra bottleneck	Rollouts, verification, stable policy updates	Tool servers, sandboxes, train-serve decoupling
Main failure mode	Verbose, low-value reasoning traces	Reward hacking through tool access and env leaks

Use Cases, With Examples

The distinction changes how you build:

Coding agents: A reasoning model emits one patch from a stack trace. An agentic system runs the test harness, reads the real error, revises, and re-runs until the suite passes. Thinking here should help with codebase navigation, error recovery, and tool orchestration.
Deep research: A reasoning model writes a long answer from memory. An agentic system breaks the question into sub-queries, calls search, drops weak sources, and returns grounded citations. Qwen’s own Deep Research demo sits in this category.
Multi-agent orchestration: Lin expects ‘harness engineering’ to matter more. An orchestrator plans and routes work. Specialized sub-agents execute narrower tasks and help control context pollution.
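The orchestration pattern in the last item can be sketched in a few lines. Everything named here (`search_agent`, `code_agent`, `ROUTES`, `orchestrate`) is hypothetical, and the sub-agents are stub functions rather than model calls. The point is the data flow: each sub-agent keeps its own scratch log, and only a short summary returns to the orchestrator's context.

```python
from typing import Callable, Dict, List, Tuple

# Each sub-agent is a plain function here; in practice it would wrap a model call.
SubAgent = Callable[[str], Tuple[str, List[str]]]  # returns (summary, scratch log)


def search_agent(task: str) -> Tuple[str, List[str]]:
    scratch = [f"query: {task}", "read 5 pages", "dropped 3 weak sources"]
    return f"2 sources found for '{task}'", scratch


def code_agent(task: str) -> Tuple[str, List[str]]:
    scratch = [f"opened repo for: {task}", "ran tests", "1 failure fixed"]
    return f"patch ready for '{task}'", scratch


ROUTES: Dict[str, SubAgent] = {"research": search_agent, "code": code_agent}


def orchestrate(plan: List[Tuple[str, str]]) -> List[str]:
    """Route each (kind, task) step to a sub-agent and keep only its summary."""
    context: List[str] = []
    for kind, task in plan:
        summary, _scratch = ROUTES[kind](task)  # scratch stays with the sub-agent
        context.append(summary)
    return context


plan = [("research", "MoE routing"), ("code", "flaky test")]
for line in orchestrate(plan):
    print(line)
# 2 sources found for 'MoE routing'
# patch ready for 'flaky test'
```

Discarding the scratch logs is one simple way to limit context pollution; a real harness might instead store them for debugging or reward computation.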

A Concrete Hook: Qwen3 Thinking Toggle

Hybrid thinking is exposed directly in code. The enable_thinking flag switches modes in the chat template.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen3-8B"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Refactor this function and explain the change."}]

# enable_thinking=True  -> step-by-step thinking mode
# enable_thinking=False -> near-instant, non-thinking mode
text = tok.apply_chat_template(
    messages, tokenize=False,
    add_generation_prompt=True, enable_thinking=True,
)
inputs = tok(text, return_tensors="pt").to(model.device)

# Qwen's recommended sampling for thinking mode
out = model.generate(
    **inputs, max_new_tokens=2048, do_sample=True,
    temperature=0.6, top_p=0.95, top_k=20,
)

# Decode only the newly generated tokens, not the prompt
new_tokens = out[0][inputs["input_ids"].shape[1]:]
print(tok.decode(new_tokens, skip_special_tokens=True))
```

enable_thinking=True is the default, and the output wraps reasoning in a <think>...</think> block. Qwen3 also accepts soft switches. Appending /think or /no_think to a user turn flips the mode per message. That per-turn control is what dynamic thinking budgets build on.
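Downstream code usually wants the reasoning and the final answer separately. The helpers below are a small sketch, assuming the reasoning is delimited by `<think>` and `</think>` tags in the decoded string; `split_thinking` and `with_soft_switch` are hypothetical names and not part of the Qwen or transformers APIs.

```python
from typing import Tuple

THINK_OPEN, THINK_CLOSE = "<think>", "</think>"


def split_thinking(decoded: str) -> Tuple[str, str]:
    """Split decoded text into (reasoning, answer).

    Returns an empty reasoning string when no closing tag is present.
    """
    head, sep, tail = decoded.partition(THINK_CLOSE)
    if not sep:
        return "", decoded.strip()
    reasoning = head.replace(THINK_OPEN, "", 1).strip()
    return reasoning, tail.strip()


def with_soft_switch(user_text: str, think: bool) -> str:
    """Append the per-turn soft switch to a user message."""
    return f"{user_text} {'/think' if think else '/no_think'}"


sample = "<think>\nRename x to count.\n</think>\n\nRenamed `x` to `count`."
reasoning, answer = split_thinking(sample)
print(reasoning)  # Rename x to count.
print(answer)     # Renamed `x` to `count`.
print(with_soft_switch("Summarize this file.", think=False))
# Summarize this file. /no_think
```

String parsing is the simplest approach; splitting on the closing tag's token id before decoding is an alternative that avoids false matches if the answer itself mentions the tag.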

Why Agentic RL Infrastructure is Harder

The presentation’s core engineering point is about infrastructure. In reasoning RL, rollouts are mostly self-contained trajectories with clean evaluators. In agentic RL, the policy lives inside a harness of tool servers, browsers, terminals, and sandboxes.

That harness forces a new requirement: training and inference must be cleanly decoupled. Without it, rollout throughput collapses. A coding agent waiting on live test execution stalls inference and starves training. GPU utilization drops well below what reasoning RL achieves.
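The decoupling argument can be illustrated with a producer-consumer sketch. This is a single-process `asyncio` toy under stated assumptions: real systems split rollout and training across separate services and GPUs, the 10 ms sleep stands in for slow tool or test execution, and `rollout_worker`, `trainer`, and `main` are hypothetical names.

```python
import asyncio
from typing import List


async def rollout_worker(worker_id: int, queue: asyncio.Queue, episodes: int) -> None:
    """Produce trajectories; the sleep stands in for slow tool or test execution."""
    for ep in range(episodes):
        await asyncio.sleep(0.01)  # environment latency (assumed value)
        await queue.put({"worker": worker_id, "episode": ep, "reward": 1.0})


async def trainer(queue: asyncio.Queue, batch_size: int, total: int) -> List[int]:
    """Consume trajectories in batches without waiting on any single worker."""
    batch_sizes: List[int] = []
    seen = 0
    while seen < total:
        batch = [await queue.get()]
        while len(batch) < batch_size and seen + len(batch) < total:
            batch.append(await queue.get())
        seen += len(batch)
        batch_sizes.append(len(batch))  # a real trainer would update the policy here
    return batch_sizes


async def main(workers: int = 4, episodes: int = 3, batch_size: int = 5) -> List[int]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=16)  # bounded buffer between the two sides
    total = workers * episodes
    producers = [
        asyncio.create_task(rollout_worker(i, queue, episodes)) for i in range(workers)
    ]
    sizes = await trainer(queue, batch_size, total)
    await asyncio.gather(*producers)
    return sizes


print(asyncio.run(main()))  # [5, 5, 2]
```

The queue is the decoupling point: workers block on their own environments, while the trainer takes whatever trajectories are ready. The bounded `maxsize` also applies backpressure if training falls behind.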

Lin also reframes what to obsess over. In the SFT era, teams optimized data diversity. In the agent era, he argues teams should optimize environment quality: stability, realism, coverage, and exploit resistance. He names reward hacking as the hardest problem, because tool access enlarges the attack surface for spurious optimization.
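One narrow example of exploit resistance: a coding agent can make tests "pass" by editing the tests themselves. The sketch below assumes a hypothetical repository layout (`tests/`, `grader/`) and a hypothetical `Trajectory` record; it shows one guard among the many a real environment would need.

```python
from dataclasses import dataclass
from typing import List, Tuple

PROTECTED_PREFIXES: Tuple[str, ...] = ("tests/", "grader/")  # assumed layout


@dataclass
class Trajectory:
    tests_passed: bool
    files_written: List[str]


def guarded_reward(traj: Trajectory) -> float:
    """Give credit for passing tests only if no protected file was modified."""
    tampered = any(path.startswith(PROTECTED_PREFIXES) for path in traj.files_written)
    if tampered:
        return 0.0  # passing by editing the tests is treated as an exploit
    return 1.0 if traj.tests_passed else 0.0


honest = Trajectory(tests_passed=True, files_written=["src/parser.py"])
hacked = Trajectory(tests_passed=True, files_written=["tests/test_parser.py"])
print(guarded_reward(honest))  # 1.0
print(guarded_reward(hacked))  # 0.0
```

A path check like this closes one hole only. It does nothing about leaked answers, network access, or other side channels, which is part of why environment quality is hard to get right.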

Key Takeaways

Junyang Lin announced he was stepping down from Qwen on March 3, 2026, and now publishes as an independent researcher.
His talk ends on one thesis: the field is moving from training models to training agents.
Agentic thinking is judged by sustained action in an environment, not by internal deliberation.
Agentic RL needs decoupled train-serve infra and high-quality environments, not just verifiable rewards.
Reward hacking is the central risk once models gain real tool access.

Sources:

Primary source — the talk

https://www.youtube.com/watch?v=b0xlsQ_6wUQ

Primary source — Junyang Lin’s Blog

“From ‘Reasoning’ Thinking to ‘Agentic’ Thinking”: https://justinlin610.github.io/blog/from-reasoning-to-agentic-thinking/
His homepage (independent-researcher status): https://justinlin610.github.io/

Qwen3 technical details (architecture, 119 languages, hybrid thinking)

Qwen3 Technical Report (arXiv:2505.09388): https://arxiv.org/abs/2505.09388 · HTML: https://arxiv.org/html/2505.09388v1

Code verification (enable_thinking, /think /no_think, sampling)

Qwen docs Quickstart: https://qwen.readthedocs.io/en/latest/getting_started/quickstart.html
Qwen3-8B model card: https://huggingface.co/Qwen/Qwen3-8B
Qwen3-32B model card: https://huggingface.co/Qwen/Qwen3-32B

Departure facts (cited in the article)

TechCrunch: https://techcrunch.com/2026/03/03/alibabas-qwen-tech-lead-steps-down-after-major-ai-push/
Bloomberg: https://www.bloomberg.com/news/articles/2026-03-04/alibaba-qwen-head-who-warned-of-openai-gap-steps-down
VentureBeat: https://venturebeat.com/technology/did-alibaba-just-kneecap-its-powerful-qwen-ai-team-key-figures-depart-in

Supporting departure/context coverage (used for cross-checking, not all cited inline)

RecodeChinaAI (LatePost translation): https://www.recodechinaai.com/p/alibabas-qwen-lead-just-stepped-down
Simon Willison: https://simonwillison.net/2026/Mar/4/qwen/
Geopolitechs: https://www.geopolitechs.org/p/inside-the-stepping-down-of-qwens
OfficeChai: https://officechai.com/ai/alibaba-qwens-tech-lead-junyang-lin-steps-down/
MLQ News: https://mlq.ai/news/key-researcher-steps-down-from-alibabas-qwen-ai-project/
GenAI Assembling (essay analysis, used to first locate the essay): https://genaiassembling.substack.com/p/what-junyang-lin-saw

Two X posts

https://x.com/h100envy/status/2068987470960623783
https://x.com/h100envy/status/2073433806254624930

Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.
