Synopsis
With new interviews thrice-weekly, The New Stack Makers stream of featured speakers and interviews is all about the new software stacks that change the way we development and deploy software. For The New Stack Analysts podcast, please see https://soundcloud.com/thenewstackanalysts.For The New Stack @ Scale podcast, please see https://soundcloud.com/thenewstackatscaleSubcribe to TNS on YouTube at: https://www.youtube.com/c/TheNewStack
Episodes
-
Solving the Problems that Accompany API Sprawl with AI
15/01/2026 Duration: 19minAPI sprawl creates hidden security risks and missed revenue opportunities when organizations lose visibility into the APIs they build. According to IBM’s Neeraj Nargund, APIs power the core business processes enterprises want to scale, making automated discovery, observability, and governance essential—especially when thousands of APIs exist across teams and environments. Strong governance helps identify endpoints, remediate shadow APIs, and manage risk at scale. At the same time, enterprises increasingly want to monetize the data APIs generate, packaging insights into products and pricing and segmenting usage, a need amplified by the rise of AI.To address these challenges, Nargund highlights “smart APIs,” which are infused with AI to provide context awareness, event-driven behavior, and AI-assisted governance throughout the API lifecycle. These APIs help interpret and act on data, integrate with AI agents, and support real-time, streaming use cases.IBM’s latest API Connect release embeds AI across API manage
-
CloudBees CEO: Why Migration Is a Mirage Costing You Millions
13/01/2026 Duration: 34minA CloudBees survey reveals that enterprise migration projects often fail to deliver promised modernization benefits. In 2024, 57% of enterprises spent over $1 million on migrations, with average overruns costing $315,000 per project. In The New Stack Makers podcast, CloudBees CEO Anuj Kapur describes this pattern as “the migration mirage,” where organizations chase modernization through costly migrations that push value further into the future. Findings from the CloudBees 2025 DevOps Migration Index show leaders routinely underestimate the longevity and resilience of existing systems. Kapur notes that applications often outlast CIOs, yet new leadership repeatedly mandates wholesale replacement. The report argues modernization has been mistakenly equated with migration, which diverts resources from customer value to replatforming efforts. Beyond financial strain, migration erodes developer morale by forcing engineers to rework functioning systems instead of building new solutions. CloudBees advocates meeting d
-
Human Cognition Can’t Keep Up with Modern Networks. What’s Next?
07/01/2026 Duration: 23minIBM’s recent acquisitions of Red Hat, HashiCorp, and its planned purchase of Confluent reflect a deliberate strategy to build the infrastructure required for enterprise AI. According to IBM’s Sanil Nambiar, AI depends on consistent hybrid cloud runtimes (Red Hat), programmable and automated infrastructure (HashiCorp), and real-time, trustworthy data (Confluent). Without these foundations, AI cannot function effectively. Nambiar argues that modern, software-defined networks have become too complex for humans to manage alone, overwhelmed by fragmented data, escalating tool sophistication, and a widening skills gap that makes veteran “tribal knowledge” hard to transfer. Trust, he says, is the biggest barrier to AI adoption in networking, since errors can cause costly outages. To address this, IBM launched IBM Network Intelligence, a “network-native” AI solution that combines time-series foundation models with reasoning large language models. This architecture enables AI agents to detect subtle warning patterns,
-
From Group Science Project to Enterprise Service: Rethinking OpenTelemetry
30/12/2025 Duration: 17minAri Zilka, founder of MyDecisive.ai and former Hortonworks CPO, argues that most observability vendors now offer essentially identical, reactive dashboards that highlight problems only after systems are already broken. After speaking with all 23 observability vendors at KubeCon + CloudNativeCon North America 2025, Zilka said these tools fail to meaningfully reduce mean time to resolution (MTTR), a long-standing demand he heard repeatedly from thousands of CIOs during his time at New Relic.Zilka believes observability must shift from reactive monitoring to proactive operations, where systems automatically respond to telemetry in real time. MyDecisive.ai is his attempt to solve this, acting as a “bump in the wire” that intercepts telemetry and uses AI-driven logic to trigger actions like rolling back faulty releases.He also criticized the rising cost and complexity of OpenTelemetry adoption, noting that many companies now require large, specialized teams just to maintain OTel stacks. MyDecisive aims to turn Ope
-
Why You Can't Build AI Without Progressive Delivery
23/12/2025 Duration: 27minFormer GitHub CEO Thomas Dohmke’s claim that AI-based development requires progressive delivery frames a conversation between analyst James Governor and The New Stack’s Alex Williams about why modern release practices matter more than ever. Governor argues that AI systems behave unpredictably in production: models can hallucinate, outputs vary between versions, and changes are often non-deterministic. Because of this uncertainty, teams must rely on progressive delivery techniques such as feature flags, canary releases, observability, measurement and rollback. These practices, originally developed to improve traditional software releases, now form the foundation for deploying AI safely. Concepts like evaluations, model versioning and controlled rollouts are direct extensions of established delivery disciplines. Beyond AI, Governor’s book “Progressive Delivery” challenges DevOps thinking itself. He notes that DevOps focuses on development and operations but often neglects the user feedback loop. Using a framewo
-
How Nutanix Is Taming Operational Complexity
18/12/2025 Duration: 15minMost enterprises today run workloads across multiple IT infrastructures rather than a single platform, creating significant operational challenges. According to Nutanix CTO Deepak Goel, organizations face three major hurdles: managing operational complexity amid a shortage of cloud-native skills, migrating legacy virtual machine (VM) workloads to microservices-based cloud-native platforms, and running VM-based workloads alongside containerized applications. Many engineers have deep infrastructure experience but lack Kubernetes expertise, making the transition especially difficult and increasing the learning curve for IT administrators. To address these issues, organizations are turning to platform engineering and internal developer platforms that abstract infrastructure complexity and provide standardized “golden paths” for deployment. Integrated development environments (IDEs) further reduce friction by embedding capabilities like observability and security. Nutanix contributes through its hyper converged pl
-
Do All Your AI Workloads Actually Require Expensive GPUs?
18/12/2025 Duration: 29minGPUs dominate today’s AI landscape, but Google argues they are not necessary for every workload. As AI adoption has grown, customers have increasingly demanded compute options that deliver high performance with lower cost and power consumption. Drawing on its long history of custom silicon, Google introduced Axion CPUs in 2024 to meet needs for massive scale, flexibility, and general-purpose computing alongside AI workloads. The Axion-based C4A instance is generally available, while the newer N4A virtual machines promise up to 2x price performance.In this episode, Andrei Gueletii, a technical solutions consultant for Google Cloud joined Gari Singh, a product manager for Google Kubernetes Engine (GKE), and Pranay Bakre, a principal solutions engineer at Arm for this episode, recorded at KubeCon + CloudNativeCon North America, in Atlanta. Built on Arm Neoverse V2 cores, Axion processors emphasize energy efficiency and customization, including flexible machine shapes that let users tailor memory and CPU resource
-
Breaking Data Team Silos Is the Key to Getting AI to Production
17/12/2025 Duration: 30minEnterprises are racing to deploy AI services, but the teams responsible for running them in production are seeing familiar problems reemerge—most notably, silos between data scientists and operations teams, reminiscent of the old DevOps divide. In a discussion recorded at AWS re:Invent 2025, IBM’s Thanos Matzanas and Martin Fuentes argue that the challenge isn’t new technology but repeating organizational patterns. As data teams move from internal projects to revenue-critical, customer-facing applications, they face new pressures around reliability, observability, and accountability.The speakers stress that many existing observability and governance practices still apply. Standard metrics, KPIs, SLOs, access controls, and audit logs remain essential foundations, even as AI introduces non-determinism and a heavier reliance on human feedback to assess quality. Tools like OpenTelemetry provide common ground, but culture matters more than tooling.Both emphasize starting with business value and breaking down silos
-
Why AI Parallelization Will Be One of the Biggest Challenges of 2026
16/12/2025 Duration: 24minRob Whiteley, CEO of Coder, argues that the biggest winners in today’s AI boom resemble the “picks and shovels” sellers of the California Gold Rush: companies that provide tools enabling others to build with AI. Speaking onThe New Stack Makersat AWS re:Invent, Whiteley described the current AI moment as the fastest-moving shift he’s seen in 25 years of tech. Developers are rapidly adopting AI tools, while platform teams face pressure to approve them, as saying “no” is no longer viable. Whiteley warns of a widening gap between organizations that extract real value from AI and those that don’t, driven by skills shortages and insufficient investment in training. He sees parallels with the cloud-native transition and predicts the rise of “AI-native” companies. As agentic AI grows, developers increasingly act as managers overseeing many parallel AI agents, creating new challenges around governance, security, and state management. To address this, Coder introduced Mux, an open source coding agent multiplexer design
-
Kubernetes GPU Management Just Got a Major Upgrade
11/12/2025 Duration: 35minNvidia Distinguished Engineer Kevin Klues noted that low-level systems work is invisible when done well and highly visible when it fails — a dynamic that frames current Kubernetes innovations for AI. At KubeCon + CloudNativeCon North America 2025, Klues and AWS product manager Jesse Butler discussed two emerging capabilities: dynamic resource allocation (DRA) and a new workload abstraction designed for sophisticated AI scheduling.DRA, now generally available in Kubernetes 1.34, fixes long-standing limitations in GPU requests. Instead of simply asking for a number of GPUs, users can specify types and configurations. Modeled after persistent volumes, DRA allows any specialized hardware to be exposed through standardized interfaces, enabling vendors to deliver custom device drivers cleanly. Butler called it one of the most elegant designs in Kubernetes.Yet complex AI workloads require more coordination. A forthcoming workload abstraction, debuting in Kubernetes 1.35, will let users define pod groups with strict
-
The Rise of the Cognitive Architect
10/12/2025 Duration: 22minAt KubeCon North America 2025, GitLab’s Emilio Salvador outlined how developers are shifting from individual coders to leaders of hybrid human–AI teams. He envisions developers evolving into “cognitive architects,” responsible for breaking down large, complex problems and distributing work across both AI agents and humans. Complementing this is the emerging role of the “AI guardian,” reflecting growing skepticism around AI-generated code. Even as AI produces more code, humans remain accountable for reviewing quality, security, and compliance.Salvador also described GitLab’s “AI paradox”: developers may code faster with AI, but overall productivity stalls because testing, security, and compliance processes haven’t kept pace. To fix this, he argues organizations must apply AI across the entire development lifecycle, not just in coding. GitLab’s Duo Agent Platform aims to support that end-to-end transformation.Looking ahead, Salvador predicts the rise of a proactive “meta agent” that functions like a full team m
-
Why the CNCF's New Executive Director is Obsessed With Inference
09/12/2025 Duration: 25minJonathan Bryce, the new CNCF executive director, argues that inference—not model training—will define the next decade of computing. Speaking at KubeCon North America 2025, he emphasized that while the industry obsesses over massive LLM training runs, the real opportunity lies in efficiently serving these models at scale. Cloud-native infrastructure, he says, is uniquely suited to this shift because inference requires real-time deployment, security, scaling, and observability—strengths of the CNCF ecosystem. Bryce believes Kubernetes is already central to modern inference stacks, with projects like Ray, KServe, and emerging GPU-oriented tooling enabling teams to deploy and operationalize models. To bring consistency to this fast-moving space, the CNCF launched a Kubernetes AI Conformance Program, ensuring environments support GPU workloads and Dynamic Resource Allocation. With AI agents poised to multiply inference demand by executing parallel, multi-step tasks, efficiency becomes essential. Bryce predicts tha
-
Kubernetes Gets an AI Conformance Program — and VMware Is Already On Board
08/12/2025 Duration: 30minThe Cloud Native Computing Foundation has introduced the Certified Kubernetes AI Conformance Program to bring consistency to an increasingly fragmented AI ecosystem. Announced at KubeCon + CloudNativeCon North America 2025, the program establishes open, community-driven standards to ensure AI applications run reliably and portably across different Kubernetes platforms. VMware by Broadcom’s vSphere Kubernetes Service (VKS) is among the first platforms to achieve certification.In an interview with The New Stack, Broadcom leaders Dilpreet Bindra and Himanshu Singh explained that the program applies lessons from Kubernetes’ early evolution, aiming to reduce the “muddiness” in AI tooling and improve cross-platform interoperability. They emphasized portability as a core value: organizations should be able to move AI workloads between public and private clouds with minimal friction.VKS integrates tightly with vSphere, using Kubernetes APIs directly to manage infrastructure components declaratively. This approach, al
-
How etcd Solved Its Knowledge Drain with Deterministic Testing
05/12/2025 Duration: 21minThe etcd project — a distributed key-value store older than Kubernetes — recently faced significant challenges due to maintainer turnover and the resulting loss of unwritten institutional knowledge. Lead maintainer Marek Siarkowicz explained that as longtime contributors left, crucial expertise about testing procedures and correctness guarantees disappeared. This gap led to a problematic release that introduced critical reliability issues, including potential data inconsistencies after crashes.To rebuild confidence in etcd’s correctness, the new maintainer team introduced “robustness testing,” creating a framework inspired by Jepsen to validate both basic and distributed-system behavior. Their goal was to ensure linearizability, the “Holy Grail” of distributed systems, which required developing custom failure-injection tools and teaching the community how to debug complex scenarios.The team later partnered with Antithesis to apply deterministic simulation testing, enabling fully reproducible execution paths a
-
Helm 4: What’s New in the Open Source Kubernetes Package Manager?
03/12/2025 Duration: 24minHelm — originally a hackathon project called Kate’s Place — turned 10 in 2025, marking the milestone with the release of Helm 4, its first major update in six years. Created by Matt Butcher and colleagues as a playful take on “K8s,” the early project won a small prize but quickly grew into a serious effort when Deus leadership recognized the need for a Kubernetes package manager. Renamed Helm, it rapidly expanded with community contributors and became one of the first CNCF graduating projects.Helm 4 reflects years of accumulated design debt and evolving use cases. After the rapid iterations of Helm 1, 2, and 3, the latest version modernizes logging, improves dependency management, and introduces WebAssembly-based plugins for cross-platform portability—addressing the growing diversity of operating systems and architectures. Beyond headline features, maintainers emphasize that mature projects increasingly deliver “boring” but essential improvements, such as better logging, which simplify workflows and integrate
-
All About Cedar, an Open Source Solution for Fine-Tuning Kubernetes Authorization
02/12/2025 Duration: 16minKubernetes has relied on role-based access control (RBAC) since 2017, but its simplicity limits what developers can express, said Micah Hausler, principal engineer at AWS, on The New Stack Makers. RBAC only allows actions; it can’t enforce conditions, denials, or attribute-based rules. Seeking a more expressive authorization model for Kubernetes, Hausler explored Cedar, an authorization engine and policy language created at AWS in 2022 and later open-sourced. Although not designed specifically for Kubernetes, Cedar proved capable of modeling its authorization needs in a concise, readable way. Hausler highlighted Cedar’s clarity—nontechnical users can often understand policies at a glance—as well as its schema validation, autocomplete support, and formal verification, which ensures policies are correct and produce only allow or deny outcomes.Now onboarding to the CNCF sandbox, Cedar is used by companies like Cloudflare and MongoDB and offers language-agnostic tooling, including a Go implementation donated by S
-
Teaching a Billion People to Code: How JupyterLite Is Scaling the Impossible
01/12/2025 Duration: 19minJupyterLite, a fully browser-based distribution of JupyterLab, is enabling new levels of global scalability in technical education. Developed by Sylvain Corlay’s QuantStack team, it allows math and programming lessons to run entirely in students’ browsers — kernel included — without relying on Docker or cloud-scale infrastructure. Its most prominent success is Capytale, a French national deployment that supports half a million high school students and over 200,000 weekly sessions from essentially a single server, which hosts only teaching content while computation happens locally in each browser.QuantStack, founded in 2016 as what Corlay calls an “accidental startup,” has since grown into a 30-person team contributing across Jupyter, Conda-Forge, and Apache Arrow. But JupyterLite embodies its most ambitious goal: making programming education accessible to countries with rapidly growing youth populations, such as Nigeria, where traditional cloud-hosted notebooks are impractical. Achieving a billion-user future
-
2026 Will Be the Year of Agentic Workloads in Production on Amazon EKS
28/11/2025 Duration: 23minAWS’s approach to Elastic Kubernetes Service has evolved significantly since its 2018 launch. According to Mike Stefanik, Senior Manager of Product Management for EKS and ECR, today’s users increasingly represent the late majority—teams that want Kubernetes without managing every component themselves. In a conversation onThe New Stack Makers, Stefanik described how AI workloads are reshaping Kubernetes operations and why AWS open-sourced an MCP server for EKS. Early feedback showed that meaningful, task-oriented tool names—not simple API mirrors—made MCP servers more effective for LLMs, prompting AWS to design tools focused on troubleshooting, runbooks, and full application workflows. AWS also introduced a hosted knowledge base built from years of support cases to power more capable agents.While “agentic AI” gets plenty of buzz, most customers still rely on human-in-the-loop workflows. Stefanik expects that to shift, predicting 2026 as the year agentic workloads move into production. For experimentation, he r
-
From Cloud Native to AI Native: Where Are We Going?
28/11/2025 Duration: 44minAt KubeCon + CloudNativeCon 2025 in Atlanta, the panel of experts - Kate Goldenring of Fermyon Technologies, Idit Levine of Solo.io, Shaun O'Meara of Mirantis, Sean O'Dell of Dynatrace and James Harmison of Red Hat - explored whether the cloud native era has evolved into an AI native era — and what that shift means for infrastructure, security and development practices. Jonathan Bryce of the CNCF argued that true AI-native systems depend on robust inference layers, which have been overshadowed by the hype around chatbots and agents. As organizations push AI to the edge and demand faster, more personalized experiences, Fermyon’s Kate Goldenring highlighted WebAssembly as a way to bundle and securely deploy models directly to GPU-equipped hardware, reducing latency while adding sandboxed security.Dynatrace’s Sean O’Dell noted that AI dramatically increases observability needs: integrating LLM-based intelligence adds value but also expands the challenge of filtering massive data streams to understand user behavi
-
Amazon CTO Werner Vogels' Predictions for 2026
25/11/2025 Duration: 54minAWS re:Invent has long featured CTO Werner Vogels’ closing keynote, but this year he signaled it may be his last, emphasizing it’s time for “younger voices” at Amazon. After 21 years with the company, Vogels reflected on arriving as an academic and being stunned by Amazon’s technical scale—an energy that still drives him today. He released his annual predictions ahead of re:Invent, with this year’s five themes focused heavily on AI and broader societal impacts.Vogels highlights technology’s growing role in addressing loneliness, noting how devices like Alexa can offer comfort to those who feel isolated. He foresees a “Renaissance developer,” where engineers must pair deep expertise with broad business and creative awareness. He warns quantum-safe encryption is becoming urgent as data harvested today may be decrypted within five years. Military innovations, he notes, continue to influence civilian tech, for better and worse. Finally, he argues personalized learning can preserve children’s curiosity and better