Building a Code Dataset Pipeline from NVIDIA Nemotron-Pretraining-Code-v3 Metadata with Streaming, Pandas, and tiktoken

fig, ax = plt.subplots(2, 2, figsize=(14, 9))
lang_counts.head(12).iloc[::-1].plot.barh(ax=ax[0, 0], color="#76b900")
ax[0, 0].set_title("Top 12 languages (sample)"); ax[0, 0].set_xlabel("files")
df["ext"].value_counts().head(12).iloc[::-1].plot.barh(ax=ax[0, 1], color="#5b8def")
ax[0, 1].set_title("Top 12 file extensions (sample)"); ax[0, 1].set_xlabel("files")
df["depth"].clip(upper=12).plot.hist(bins=range(0, 14), ax=ax[1, 0], color="#f4a261", edgecolor="white")
ax[1, 0].set_title("Directory nesting depth"); ax[1, 0].set_xlabel("'/' count in path")
(df["repo"].value_counts().head(10).iloc[::-1]
 .plot.barh(ax=ax[1, 1], color="#9b5de5"))
ax[1, 1].set_title("Most common repos (sample)"); ax[1, 1].set_xlabel("files")
plt.tight_layout(); […]
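The plotting excerpt above reads per-file fields named "ext", "depth", and "repo", and labels depth as the count of '/' in the path. The excerpt does not show how those fields are built, so the following is only a minimal standard-library sketch under assumptions: the sample rows, the `path_features` helper, and the "(none)" label for extension-less files are all hypothetical, and the tutorial itself uses pandas rather than `Counter`.

```python
import posixpath
from collections import Counter


def path_features(path, max_depth=12):
    """Return (extension, clipped depth) for a repo-relative file path.

    Depth is the number of '/' characters, clipped at max_depth to mirror
    the clip(upper=12) call in the excerpt.
    """
    ext = posixpath.splitext(path)[1].lower() or "(none)"
    depth = min(path.count("/"), max_depth)
    return ext, depth


# Hypothetical metadata rows; the real dataset schema is not shown in the excerpt.
rows = [
    {"repo": "org/alpha", "path": "src/core/utils.py"},
    {"repo": "org/alpha", "path": "README.md"},
    {"repo": "org/beta", "path": "cmd/tool/main.go"},
    {"repo": "org/alpha", "path": "src/core/io/reader.py"},
]

ext_counts = Counter()
depth_counts = Counter()
repo_counts = Counter()

for row in rows:
    ext, depth = path_features(row["path"])
    ext_counts[ext] += 1
    depth_counts[depth] += 1
    repo_counts[row["repo"]] += 1

# most_common(k) plays the role of value_counts().head(k) in the excerpt.
print(ext_counts.most_common(12))
print(sorted(depth_counts.items()))
print(repo_counts.most_common(10))
```

Counting in a single pass like this also suits streamed rows, since nothing beyond the three counters has to be held in memory.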
Google Releases Gemini 3.5 Live Translate, a Streaming Speech-to-Speech Audio Model Covering 70+ Languages Across Meet, Translate, and the Live API

Google just announced Gemini 3.5 Live Translate. It is their latest audio model for live speech-to-speech translation. Speech-to-speech means spoken audio goes in, and translated spoken audio comes out. The model detects over 70 languages automatically and generates translated speech. It preserves the speaker’s intonation, pacing, and pitch in the output. Turn-by-turn systems wait for […]
10 Common RAG Mistakes We Keep Seeing in Production

I of this series with Angela Shi. This pitfalls article lists the failure modes we both kept seeing on production RAG systems, and that pushed us toward the four-brick contract in the first place. I’ll admit something. Even when we work on this series, we dump big documents into ChatGPT. One PDF, one question, send, […]
The Hardware That Makes AI Possible

We often describe AI as a software revolution, which it is! From breakthroughs in neural networks and transformers to large language models, it is easy to assume that these smart algorithms are responsible for the progress we have seen in recent years. But today, I want to shed light on how modern AI is […]
Prefill Once, Fan Out: KV Snapshot Sharing for Multi-Agent LLM Pipelines

A humorous-but-real tour of SwarmKV — KV-snapshot fan-out, copy-on-fork host buffers, and how to make a two-agent analytical pipeline ~1.95× faster (and the second branch’s activation latency 52× faster) by being mildly mean to llama.cpp. of the “Production-Grade Agentic Inference” series. Each part removes one kind of redundant work from an agentic LLM pipeline. Part […]
The Exact ML Project I’d Build to Get Hired in 2026

I get asked all the time: “What project should I build?” The question is filled with great intention, but it’s fundamentally flawed. over 100 applications and portfolios, and only a few times has someone’s project wowed me enough to progress them to the interview stage. So in this article, I’m going to give you the […]
How to sign PDFs easily online with a PDF signer

Signing PDFs has become an important task for businesses and individuals alike. Whether you’re handling contracts, legal agreements, or forms, the ability to quickly and securely sign PDFs online is essential. Fortunately, with the rise of online PDF signers, signing PDFs has never been easier.

Common challenges in signing PDFs

Signing PDFs might seem straightforward, […]
Autonomous AI Data Loss in DevOps: How to Survive It

Autonomous AI agents are altering the speed at which software is shipped. Unfortunately, they are also shrinking the time it takes for a mistake to become a catastrophe, creating a dangerous blind spot in many security strategies. The threat no longer comes just from external ransomware or malicious insiders. It comes from authorized, internal tools. […]
NVIDIA cuTile Python Tutorial: Building Tiled GPU Kernels for Vector Addition, Matrix Addition, and Matrix Multiplication in Colab

print("\n" + "=" * 90)
print("[5] cuTile kernels are defined only if cuda.tile imports successfully")
print("=" * 90)
if cutile_import_ok:
    ConstInt = ct.Constant[int]

    @ct.kernel
    def cutile_vec_add_direct_kernel(a, b, c, TILE: ConstInt):
        bid = ct.bid(0)
        a_tile = ct.load(a, index=(bid,), shape=(TILE,))
        b_tile = ct.load(b, index=(bid,), shape=(TILE,))
        c_tile = a_tile + b_tile
        ct.store(c, index=(bid,), tile=c_tile)

    @ct.kernel
    def cutile_vec_add_gather_kernel(a, b, […]
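To make the tiling idea in the vector-add kernel above easier to follow without a GPU, here is a plain-Python emulation. It does not use cuda.tile at all. It assumes, from how the excerpt reads, that block `bid` handles the contiguous elements starting at `bid * TILE`; the `tiled_vec_add` name and the handling of a short final tile are illustrative assumptions, not the library's behavior.

```python
def tiled_vec_add(a, b, tile):
    """Add two equal-length sequences one tile at a time (CPU emulation)."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    if tile <= 0:
        raise ValueError("tile must be positive")

    n = len(a)
    c = [0] * n
    num_blocks = (n + tile - 1) // tile  # ceiling division

    # On a GPU each block id would run in parallel; here we loop over them.
    for bid in range(num_blocks):
        start = bid * tile
        stop = min(start + tile, n)  # assumption: the last tile may be short
        a_tile = a[start:stop]       # stands in for the load of a
        b_tile = b[start:stop]       # stands in for the load of b
        c_tile = [x + y for x, y in zip(a_tile, b_tile)]
        c[start:stop] = c_tile       # stands in for the store into c
    return c


print(tiled_vec_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], tile=2))
```

The loop over `bid` is the only sequential part; each iteration touches a disjoint slice, which is what lets a real tiled kernel run the blocks independently.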
A New Study from Harvard and Perplexity Finds AI Agents Perform 26 Minutes of Autonomous Work per Session vs 33 Seconds for Search

A new working paper from Perplexity and Harvard offers field evidence on what AI agents do to knowledge work. It draws on production data from two Perplexity products: Search and Computer. The setup is a natural comparison. Search is a conversational answer engine. Computer is an agent that plans and executes tasks end to end. […]
