Synopsis
With new interviews thrice-weekly, The New Stack Makers stream of featured speakers and interviews is all about the new software stacks that change the way we develop and deploy software. For The New Stack Analysts podcast, please see https://soundcloud.com/thenewstackanalysts. For The New Stack @ Scale podcast, please see https://soundcloud.com/thenewstackatscale. Subscribe to TNS on YouTube at: https://www.youtube.com/c/TheNewStack
Episodes
-
Confronting AI’s Next Big Challenge: Inference Compute
06/08/2025 Duration: 24min
While AI training garners most of the spotlight (and investment), the demands of AI inference are shaping up to be an even bigger challenge. In this episode of The New Stack Makers, Sid Sheth, founder and CEO of d-Matrix, argues that inference is anything but one-size-fits-all. Different use cases, from low-cost to high-interactivity or throughput-optimized, require tailored hardware, and existing GPU architectures aren’t built to address all these needs simultaneously. “The world of inference is going to be truly heterogeneous,” Sheth said, meaning specialized hardware will be required to meet diverse performance profiles. A major bottleneck? The distance between memory and compute. Inference, especially in generative AI and agentic workflows, requires constant memory access, so minimizing the distance data must travel is key to improving performance and reducing cost. To address this, d-Matrix developed Corsair, a modular platform where memory and compute are vertically stacked (“like pancakes”), enabling fa
-
Databricks VP: Don’t Try to Speed AI Evolution through Brute Force
04/08/2025 Duration: 38min
In the latest episode of The New Stack Agents, Naveen Rao, VP of AI at Databricks and a former neuroscientist, reflects on the evolution of AI, neural networks, and the energy constraints that define both biological and artificial intelligence. Rao, who built circuit systems as a child and later studied the brain’s 20-watt efficiency at Duke and Brown, argues that current AI development, which relies on massive energy-intensive data centers, is unsustainable. He believes true intelligence should emerge from low-power, efficient systems, more aligned with biological computing. Rao warns that the industry is headed toward “model collapse,” where large language models (LLMs) begin training on AI-generated content instead of real-world data, leading to compounding inaccuracies and hallucinations. He stresses the importance of grounding AI in reality and moving beyond brute-force scaling. Rao sees intelligence not just as a function of computing power, but as a distributed, observational system: “life is a learning mac
-
How Fal.ai Went From Inference Optimization to Hosting Image and Video Models
25/07/2025 Duration: 52min
Fal.ai, once focused on machine learning infrastructure, has evolved into a major player in generative media. In this episode of The New Stack Agents, the hosts speak with Fal.ai CEO Burkay Gur and investor Glenn Solomon of Notable Capital. Originally aiming to optimize Python runtimes, Fal.ai shifted direction as generative AI exploded, driven by tools like DALL·E and ChatGPT. Today, Fal.ai hosts hundreds of models, from image to audio and video, and emphasizes fast, optimized inference to meet growing demand. Speed became Fal.ai’s competitive edge, especially as newer generative models require GPU power not just for training but also for inference. Solomon noted that while optimization alone isn't a sustainable business model, Fal’s value lies in speed and developer experience. Fal.ai offers both an easy-to-use web interface and developer-focused APIs, appealing to both technical and non-technical users. Gur also addressed generative AI’s impact on creatives, arguing that while the cost of creation has plummeted, t
-
Why AI Agents Need a New Kind of Browser
18/07/2025 Duration: 48min
Traditional headless browsers weren’t built for AI agents, often breaking when web elements shift even slightly. Paul Klein IV, founder of Browserbase and its open-source tool Stagehand, is tackling this by creating browser infrastructure designed specifically for AI control. On The New Stack Agents podcast, Klein explained that Stagehand enables AI agents to interpret vague, natural-language instructions and still function reliably, even when web pages change. This flexibility contrasts with brittle legacy tools built for deterministic testing. Instead of writing 100 scripts for 100 websites, one AI-powered script can now handle thousands. Klein’s broader vision is a world where AI can fully operate the web on behalf of users, automating tasks like filing taxes without human input. He acknowledges the technical challenges, from running browsers on servers to handling edge cases like time zones and emojis. The episode also touches on Klein’s concerns with AWS, which he says held a “partnership” meeting that fe
-
How AWS is Working to Help Developers with AI Reality
11/07/2025 Duration: 40min
In a recent episode of The New Stack Agents livestream, Antje Barth, AWS Developer Advocate for Generative AI, discussed the growing developer interest in building agentic and multi-agent systems. While foundational model knowledge is now common, Barth noted that developers are increasingly focused on tools, frameworks, and protocols for scaling agent-based applications. She emphasized the complexity of deploying such systems, particularly around navigating human-centric interfaces and minimizing latency in multi-agent communication. Barth highlighted AWS’s support for developers through tools like Amazon Q CLI and the newly launched open-source Strands SDK, which AWS used internally to accelerate development cycles. Strands enables faster, flexible agentic system development, while services like Bedrock Agents offer a managed, enterprise-ready solution. Security was another key theme. Barth stressed that safety must be a “day one” priority, with built-in support for authentication, secure communication, and ob
-
How Shortwave Wants To Reinvent Email With AI
03/07/2025 Duration: 36min
In this episode of The New Stack Agents, Andrew Lee, co-founder of Shortwave and Firebase, discusses the evolution of his Gmail-centric email client into an AI-first platform. Initially launched in 2020 with traditional improvements like better threading and search, Shortwave pivoted to agentic AI after the rise of large language models (LLMs). Early features like summarization and translation garnered hype but lacked deep utility. However, as models improved in 2023 (especially Anthropic’s Claude Sonnet 3.5), Shortwave leaned heavily into tool-calling agents that could execute complex, multi-step tasks autonomously. Lee notes Anthropic’s lead in this area, especially in chaining tools intelligently, unlike earlier models from OpenAI. Still, challenges remain with managing large numbers of tools without breaking model reasoning. Looking ahead, Lee envisions AI that can take proactive actions, like responding to emails, and dynamically generate interfaces tailored to tasks in real time. This shift could fundamental
-
Cracking the Complexity: Teleport CEO Pushes Identity-First Security
18/06/2025 Duration: 21min
In this on-the-road episode of The New Stack Makers, Editor in Chief Heather Joslyn speaks with Ev Kontsevoy, CEO and co-founder of Teleport, from the floor of KubeCon + CloudNativeCon Europe in London. The discussion centers on infrastructure security and the growing need for robust identity management. Citing alarming cybersecurity statistics, such as the $5 million average cost of a breach and rising attack frequency, Kontsevoy stresses that complexity is the root challenge in securing infrastructure. Today’s environments involve countless layers and technologies, each with its own identity and access controls, increasing the risk of human error and breaches. Kontsevoy argues for treating all entities (humans, laptops, servers, AI agents) as identities managed under a unified framework. Teleport provides a zero trust access platform that enforces strong, cryptographically backed identity across systems. He also highlights Teleport’s version 17 release, which boosts support for non-human identities and integrat
-
No SSH? What is Talos, this Linux Distro for Kubernetes?
12/06/2025 Duration: 19min
Container-based Linux distributions are gaining traction, especially for edge deployments that demand lightweight and secure operating systems. Talos Linux, developed by Sidero Labs, is purpose-built for Kubernetes with security-first features like a fully immutable file system and disabled SSH access. In a demo, Sidero CTO Andrew Rynhard and Head of Product Justin Garrison explained Talos’s design philosophy, highlighting its minimalism and focus on automation. Inspired by CoreOS, Talos removes traditional tools like systemd and Bash, replacing them with machined, a custom process manager written in Go. Talos emphasizes API-driven management rather than SSH, making Kubernetes cluster operations more scalable and consistent. Its design supports cloud, bare metal, Docker, and edge devices like Raspberry Pi. Kernel immutability is reinforced by ephemeral signing keys. Through Sidero's Omni SaaS, Talos nodes connect securely via WireGuard. The operating system handles all certificates and network connectivity int
-
Aptori Is Building an Agentic AI Security Engineer
03/06/2025 Duration: 18min
AI agents hold the promise of continuously testing, scanning, and fixing code for security vulnerabilities, but we're still progressing toward that vision. Startups like Aptori are helping bridge the gap by building AI-powered security engineers for enterprises. Aptori maps an organization’s codebase, APIs, and cloud infrastructure in real time to understand data flows and authorization logic, allowing it to detect and eventually remediate security issues. At Google Cloud Next, Aptori CEO Sumeet Singh discussed how earlier tools merely alerted developers to issues, often overwhelming them, but newer models like Gemini 2.5 Flash and Claude Sonnet 4 are improving automated code fixes, making them more practical. Singh and co-founder Travis Newhouse previously built AppFormix, which automated OpenStack cloud operations before being acquired by Juniper Networks. Their experiences with slow release cycles due to security bottlenecks inspired Aptori’s focus. While the goal is autonomous agents, Singh emphasizes the n
-
The AI Code Generation Problem Nobody's Talking About
29/05/2025 Duration: 19min
In this episode of The New Stack Makers, Nitric CEO Steve Demchuk discusses how the frustration of building frontend apps within rigid FinTech environments led to the creation of the Nitric framework, a tool designed to eliminate the friction between developers and cloud infrastructure. Unlike traditional Infrastructure as Code (IaC), where developers must manage both app logic and infrastructure definitions separately, Nitric introduces “Infrastructure from Code.” This approach allows developers to focus solely on application logic while the platform infers and automates infrastructure needs using SDKs and CLI tools across multiple languages and cloud providers. Demchuk emphasizes that Nitric doesn't remove platform team control but enforces it consistently. Guardrails defined by platform teams guide infrastructure provisioning, ensuring security and compliance, even as developers use AI tools to rapidly generate code. The result is a streamlined workflow where developers move faster, AI enhances productivit
-
The New Bottleneck: AI That Codes Faster Than Humans Can Review
27/05/2025 Duration: 20min
CodeRabbit, led by founder Harjot Gill, is tackling one of software development's biggest bottlenecks: the human code review process. While AI coding tools like GitHub Copilot have sped up code generation, they’ve inadvertently slowed down shipping due to increased complexity in code reviews. Developers now often review AI-generated code they didn’t write, leading to misunderstandings, bugs, and security risks. In an episode of The New Stack Makers, Gill discusses how CodeRabbit leverages advanced reasoning models (OpenAI’s o1 and o3-mini and Anthropic’s Claude series) to automate and enhance code reviews. Unlike rigid, rule-based static analysis tools, CodeRabbit builds rich context at scale by spinning up sandbox environments for pull requests and allowing AI agents to navigate codebases like human reviewers. These agents can run CLI commands, analyze syntax trees, and pull in external context from Jira or vulnerability databases. Gill envisions a hybrid future where AI handles the grunt work of code review,
-
Google Cloud Next Wrap-Up
22/05/2025 Duration: 18min
At the close of this year’s Google Cloud Next, The New Stack’s Alex Williams, AI editor Frederic Lardinois, and analyst Janakiram MSV discussed the event’s dominant theme: AI agents. The conversation focused heavily on agent frameworks, noting a shift from last year's third-party tools like LangChain, CrewAI, and Microsoft’s AutoGen, to first-party offerings from model providers themselves. Google’s newly announced Agent Development Kit (ADK) highlights this trend, following closely on the heels of OpenAI’s agent SDK. MSV emphasized the significance of this shift, calling it a major milestone as Google joins the race alongside Microsoft and OpenAI. Despite the buzz, Lardinois pointed out that many companies are still exploring how AI agents can fit into real-world workflows. The panel also highlighted how Google now delivers a full-stack AI development experience, from models to deployment platforms like Vertex AI. New enterprise tools like Agent Space and Agent Garden further signal Google’s commitment to m
-
Agentic AI and A2A in 2025: From Prompts to Processes
20/05/2025 Duration: 19min
Agentic AI represents the next phase beyond generative AI, promising systems that not only generate content but also take autonomous actions within business processes. In a conversation recorded at Google Cloud Next, Kevin Laughridge of Deloitte explains that businesses are moving from AI pilots to production-scale deployments. Agentic AI enables decision-making, reasoning, and action across complex enterprise environments, reducing the need for constant human input. A key enabler is Google’s newly announced open Agent2Agent (A2A) protocol, which allows AI agents from different vendors to communicate and collaborate securely across platforms. Over 50 companies, including PayPal, Salesforce, and Atlassian, are already adopting it. However, deploying agentic AI at scale requires more than individual tools; it demands an AI platform with runtime frameworks, UIs, and connectors. These platforms allow enterprises to integrate agents across clouds and systems, paving the way for AI that is collaborative, adaptive, a
-
Your AI Coding Buddy Is Always Available at 2 a.m.
15/05/2025 Duration: 20min
Aja Hammerly, director of developer relations at Google, sees AI as the always-available coding partner developers have long wished for, especially in those late-night bursts of inspiration. In a conversation with Alex Williams at Google Cloud Next, she described AI-assisted coding as akin to having a virtual pair programmer who can fill in gaps and offer real-time support. Hammerly urges developers to start their AI journey with tools that assist in code writing and explanation before moving into more complex AI agents. She distinguishes two types of DevEx AI: using AI to build apps and using it to eliminate developer toil. For Hammerly, this includes letting AI handle frontend work while she focuses on backend logic. The newly launched Firebase Studio exemplifies this dual approach, offering an AI-enhanced IDE with flexible tools like prototyping, code completion, and automation. Her advice? Developers should explore how AI fits into their unique workflow, because development, at its core, is deeply personal
-
Google AI Infrastructure PM On New TPUs, Liquid Cooling and More
13/05/2025 Duration: 19min
At Google Cloud Next '25, the company introduced Ironwood, its most advanced custom Tensor Processing Unit (TPU) to date. With 9,216 chips per pod delivering 42.5 exaflops of compute power, Ironwood doubles the performance per watt compared to its predecessor. Senior product manager Chelsie Czop explained that designing TPUs involves balancing power, thermal constraints, and interconnectivity. Google's long-term investment in liquid cooling, now in its fourth generation, plays a key role in managing the heat generated by these powerful chips. Czop highlighted the incremental design improvements made visible through changes in the data center setup, such as liquid cooling pipe placements. Customers often ask whether to use TPUs or GPUs, but the answer depends on their specific workloads and infrastructure. Some, like Moloco, have seen a 10x performance boost by moving directly from CPUs to TPUs. However, many still use both TPUs and GPUs. As models evolve faster than hardware, Google relies on collaborations w
-
Google Cloud Therapist on Bringing AI to Cloud Native Infrastructure
08/05/2025 Duration: 24min
At Google Cloud Next, Bobby Allen, Group Product Manager for Google Kubernetes Engine (GKE), emphasized GKE’s foundational role in supporting AI platforms. While AI dominates current tech conversations, Allen highlighted that cloud-native infrastructure like Kubernetes is what enables AI workloads to function efficiently. GKE powers key Google services like Vertex AI and is trusted by organizations including DeepMind, gaming companies, and healthcare providers for AI model training and inference. Allen explained that GKE offers scalability, elasticity, and support for AI-specific hardware like GPUs and TPUs, making it ideal for modern workloads. He noted that Kubernetes was built with capabilities, like high availability and secure orchestration, that are now essential for AI deployment. Looking forward, GKE aims to evolve into a model router, allowing developers to access the right AI model based on function, not vendor, streamlining the development experience. Allen described GKE as offering maximum control w
-
VMware's Kubernetes Evolution: Quashing Complexity
06/05/2025 Duration: 30min
Without this, developers waste time managing infrastructure instead of focusing on code. VMware addresses this with VCF, a pre-integrated Kubernetes solution that includes components like Harbor, Velero, and Istio, all managed by VMware. While some worry about added complexity from abstraction, Turner dismissed concerns about virtualization overhead, pointing to benchmarks showing 98.3% of bare metal performance for virtualized AI workloads. He emphasized that AI is driving nearly half of Kubernetes deployments, prompting VMware’s partnership with Nvidia to support GPU virtualization. Turner also highlighted VMware's open source leadership, contributing to major projects and ensuring Kubernetes remains cloud-independent and standards-based. VMware aims to simplify Kubernetes and AI workload management while staying committed to the open ecosystem.
-
Prequel: Software Errors Be Gone
05/05/2025 Duration: 05min
Prequel is launching a new developer-focused service aimed at democratizing software error detection, an area typically dominated by large cloud providers. Co-founded by Lyndon Brown and Tony Meehan, both former NSA engineers, Prequel introduces a community-driven observability approach centered on Common Reliability Enumerations (CREs). CREs categorize recurring production issues, helping engineers detect, understand, and communicate problems without reinventing solutions or working in isolation. Their open-source tools, cre and prereq, allow teams to build and share detectors that catch bugs and anti-patterns in real time without exposing sensitive data, thanks to edge processing using WebAssembly. The urgency behind Prequel’s mission stems from the rapid pace of AI-driven development, increased third-party code usage, and rising infrastructure costs. Traditional observability tools may surface symptoms, but Prequel aims to provide precise problem definitions and actionable insights. While observability giant
-
Arm’s Open Source Leader on Meeting the AI Challenge
01/05/2025 Duration: 18min
At Arm, open source is the default approach, with proprietary software requiring justification, says Andrew Wafaa, fellow and senior director of software communities. Speaking at KubeCon + CloudNativeCon Europe, Wafaa emphasized Arm’s decade-long commitment to open source, highlighting its investment in key projects like the Linux kernel, GCC, and LLVM. This investment is strategic, ensuring strong support for Arm’s architecture through vital tools and system software. Wafaa also challenged the hype around GPUs in AI, asserting that CPUs, especially those enhanced with Arm’s Scalable Matrix Extension (SME2) and Scalable Vector Extension (SVE2), are often more suitable for inference workloads. CPUs offer greater flexibility, and Arm’s innovations aim to reduce dependency on expensive GPU fleets. On the AI framework front, Wafaa pointed to PyTorch as the emerging hub, likening its ecosystem-building potential to Kubernetes. As a PyTorch Foundation board member, he sees PyTorch becoming the central open source platf
-
Why Kubernetes Cost Optimization Keeps Failing
29/04/2025 Duration: 17min
In today’s uncertain economy, businesses are tightening costs, including for Kubernetes (K8s) operations, which are notoriously difficult to optimize. Yodar Shafrir, co-founder and CEO of ScaleOps, explained at KubeCon + CloudNativeCon Europe that dynamic, cloud-native applications have constantly shifting loads, making resource allocation complex. Engineers must provision enough resources to handle spikes without overspending, but in large production clusters with thousands of applications, manual optimization often fails. This leads to 70–80% resource waste and performance issues. Developers typically prioritize application performance over operational cost, and AI workloads further strain resources. Existing optimization tools offer static recommendations that quickly become outdated due to the dynamic nature of workloads, risking downtime. Shafrir emphasized that real-time, fully automated solutions like ScaleOps' platform are crucial. By dynamically adjusting container-level resources based on real-time