Episodes

  • Instabase founder and CEO Anant Bhardwaj joins a16z Infra partner Guido Appenzeller to discuss the revolutionary impact of LLMs on analyzing unstructured data and documents (like letting banks verify identity and approve loans via WhatsApp) and shares his vision for how AI agents could take things even further (by automating actions based on those documents). In more detail, they discuss:

    Why legacy robotic process automation (RPA) struggles with unstructured inputs.

    How Instabase developed layout-aware models to extract insights from PDFs and complex documents.

    Why predictability, not perfection, is the key metric for generative AI in the enterprise.

    The growing role of AI agents at compile time (not runtime).

    A vision for decentralized, federated AI systems that scale automation across complex workflows.

    Follow everyone on X:

    Anant Bhardwaj

    Guido Appenzeller

    Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.

  • LMArena cofounders Anastasios N. Angelopoulos, Wei-Lin Chiang, and Ion Stoica sit down with a16z general partner Anjney Midha to talk about the future of AI evaluation. As benchmarks struggle to keep up with the pace of real-world deployment, LMArena is reframing the problem: what if the best way to test AI models is to put them in front of millions of users and let them vote? The team discusses how Arena evolved from a research side project into a key part of the AI stack, why fresh and subjective data is crucial for reliability, and what it means to build a CI/CD pipeline for large models.

    They also explore:

    Why expert-only benchmarks are no longer enough.

    How user preferences reveal model capabilities — and their limits.

    What it takes to build personalized leaderboards and evaluation SDKs.

    Why real-time testing is foundational for mission-critical AI.
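    The crowdsourced-voting mechanism described above can be illustrated with a simplified Elo-style rating update over pairwise votes. (LMArena's published methodology is based on Bradley-Terry statistical models; the functions and constants below are a toy sketch, not their implementation.)

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B given current ratings."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

def record_vote(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    """Shift both ratings toward the outcome of one pairwise user vote."""
    p_win = expected_score(ratings[winner], ratings[loser])
    ratings[winner] += k * (1.0 - p_win)
    ratings[loser] -= k * (1.0 - p_win)

ratings = {"model-a": 1000.0, "model-b": 1000.0}
record_vote(ratings, winner="model-a", loser="model-b")
# With equal starting ratings, one vote moves each rating by k/2 = 16 points.
```

    Because every fresh vote keeps moving the ratings, a continuously collected stream of human preferences is harder to overfit or game than a fixed benchmark set, which is the property the episode highlights.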

    Follow everyone on X:

    Anastasios N. Angelopoulos

    Wei-Lin Chiang

    Ion Stoica

    Anjney Midha

    Timestamps

    0:04 -  LLM evaluation: From consumer chatbots to mission-critical systems

    6:04 -  Style and substance: Crowdsourcing expertise

    18:51 -  Building immunity to overfitting and gaming the system

    29:49 -  The roots of LMArena

    41:29 -   Proving the value of academic AI research

    48:28 -  Scaling LMArena and starting a company

    59:59 -  Benchmarks, evaluations, and the value of ranking LLMs

    1:12:13 -  The challenges of measuring AI reliability

    1:17:57 -  Expanding beyond binary rankings as models evolve

    1:28:07 -  A leaderboard for each prompt

    1:31:28 -  The LMArena roadmap

    1:34:29 -  The importance of open source and openness

    1:43:10 -  Adapting to agents (and other AI evolutions)

    Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.

  • In this episode of AI + a16z, Distributional cofounder and CEO Scott Clark, and a16z partner Matt Bornstein, explore why building trust in AI systems matters more than just optimizing performance metrics. From understanding the hidden complexities of generative AI behavior to addressing the challenges of reliability and consistency, they discuss how to confidently deploy AI in production.

    Why is trust becoming a critical factor in enterprise AI adoption? How do traditional performance metrics fail to capture crucial behavioral nuances in generative AI systems? Scott and Matt dive into these questions, examining non-deterministic outcomes, shifting model behaviors, and the growing importance of robust testing frameworks.

    Among other topics, they cover:

    The limitations of conventional AI evaluation methods and the need for behavioral testing.

    How centralized AI platforms help enterprises manage complexity and ensure responsible AI use.

    The rise of "shadow AI" and its implications for security and compliance.

    Practical strategies for scaling AI confidently from prototypes to real-world applications.

    Follow everyone:

    Scott Clark

    Distributional

    Matt Bornstein

    Derrick Harris

    Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.

  • In this episode of the a16z AI podcast, a16z Infra partners Guido Appenzeller, Matt Bornstein, and Yoko Li explore how generative AI is reshaping software development. From its potential as a new high-level programming abstraction to its current practical impacts, they discuss whether AI coding tools will redefine what it means to be a developer.

    Why has coding emerged as one of AI's most powerful use cases? How much can AI truly boost developer productivity, and will it fundamentally change traditional computer science education? Guido, Yoko, and Matt dive deep into these questions, addressing the dynamics of "vibe coding," the enduring role of formal programming languages, and the critical challenge of managing non-deterministic behavior in AI-driven applications.

    Among other things, they discuss:

    The enormous market potential of AI-generated code, projected to deliver trillions in productivity gains.

    How "prompt-based programming" is evolving from Stack Overflow replacements into sophisticated development assistants.

    Why formal languages like Python and Java are here to stay, even as natural language interactions become common.

    The shifting landscape of programming education, and why understanding foundational abstractions remains essential.

    The unique complexities of integrating AI into enterprise software, from managing uncertainty to ensuring reliability.

    Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.

  • In this episode of AI + a16z, Anthropic's David Soria Parra — who created MCP (Model Context Protocol) along with Justin Spahr-Summers — sits down with a16z's Yoko Li to discuss the project's inception, exciting use cases for connecting LLMs to external sources, and what's coming next for the project. If you're unfamiliar with the wildly popular MCP project, this edited passage from their discussion is a great starting point to learn:

    David: "MCP tries to enable building AI applications in such a way that they can be extended by everyone else that is not part of the original development team through these MCP servers, and really bring the workflows you care about, the things you want to do, to these AI applications. It's a protocol that just defines how whatever you are building as a developer for that integration piece, and that AI application, talk to each other.

    "It's a very boring specification, but what it enables is hopefully ... something that looks like the current API ecosystem, but for LLM interactions."

    Yoko: "I really love the analogy with the API ecosystem, because they give people a mental model of how the ecosystem evolves ... Before, you may have needed a different spec to query Salesforce versus query HubSpot. Now you can use similarly defined API schema to do that.

    "And then when I saw MCP earlier in the year, it was very interesting in that it almost felt like a standard interface for the agent to interface with LLMs. It's like, 'What are the set of things that the agent wants to execute on that it has never seen before? What kind of context does it need to make these things happen?' When I tried it out, it was just super powerful and I no longer have to build one tool per client. I now can build just one MCP server, for example, for sending emails, and I use it for everything on Cursor, on Claude Desktop, on Goose."
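    The build-one-server, use-it-everywhere idea Yoko describes can be sketched as a toy tool registry: an application exposes tools through one uniform schema so any client that speaks that schema can discover and call them the same way. This is not the real MCP SDK or wire format; names like ToolServer, list_tools, and call_tool are illustrative only:

```python
import json
from typing import Callable

class ToolServer:
    """Toy registry exposing callable tools behind one uniform interface."""

    def __init__(self) -> None:
        self._tools: dict = {}

    def tool(self, description: str):
        """Decorator that registers a function as a discoverable tool."""
        def register(fn: Callable) -> Callable:
            self._tools[fn.__name__] = (fn, description)
            return fn
        return register

    def list_tools(self) -> list:
        """What a client sees: uniform metadata, not implementations."""
        return [{"name": name, "description": desc}
                for name, (_, desc) in self._tools.items()]

    def call_tool(self, request: str) -> str:
        """Dispatch a JSON request of the form {"name": ..., "arguments": ...}."""
        req = json.loads(request)
        fn, _ = self._tools[req["name"]]
        return json.dumps({"result": fn(**req.get("arguments", {}))})

server = ToolServer()

@server.tool("Send an email to a recipient")
def send_email(to: str, subject: str) -> str:
    # A real server would talk to a mail API; here we just echo.
    return f"queued mail to {to}: {subject}"
```

    Any client that understands the shared schema (in the episode: Cursor, Claude Desktop, Goose) can then list and invoke the same send_email tool without client-specific integration code, which is the "one MCP server instead of one tool per client" point Yoko makes above.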

    Learn more:

    A Deep Dive Into MCP and the Future of AI Tooling

    What Is an AI Agent?

    Benchmarking AI Agents on Full-Stack Coding

    Agent Experience: Building an Open Web for the AI Era

    Follow everyone on X:

    David Soria Parra

    Yoko Li

    Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.

  • In this episode of AI + a16z, a16z Infra partners Guido Appenzeller, Matt Bornstein, and Yoko Li discuss and debate one of the tech industry's buzziest words right now: AI agents. The trio digs into the topic from a number of angles, including:

    Whether a uniform definition of agent actually exists

    How to distinguish between agents, LLMs, and functions

    How to think about pricing agents

    Whether agents can actually replace humans

    The effects of data silos on agents that can access the web.

    They don't claim to have all the answers, but they raise many questions and insights that should interest anybody building, buying, and even marketing AI agents.

    Learn more:

    Benchmarking AI Agents on Full-Stack Coding

    Automating Developer Email with MCP and AI Agents

    A Deep Dive Into MCP and the Future of AI Tooling

    Agent Experience: Building an Open Web for the AI Era

    DeepSeek, Reasoning Models, and the Future of LLMs

    Agents, Lawyers, and LLMs

    Reasoning Models Are Remaking Professional Services

    From NLP to LLMs: The Quest for a Reliable Chatbot

    Can AI Agents Finally Fix Customer Support?

    Follow everybody on X:

    Guido Appenzeller

    Matt Bornstein

    Yoko Li

    Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.

  • In this episode, a16z General Partner Martin Casado sits down with Sujay Jayakar, co-founder and Chief Scientist at Convex, to talk about his team’s latest work benchmarking AI agents on full-stack coding tasks. From designing Fullstack Bench to the quirks of agent behavior, the two dig into what’s actually hard about autonomous software development, and why robust evals—and guardrails like type safety—matter more than ever. They also get tactical: which models perform best for real-world app building? How should developers think about trajectory management and variance across runs? And what changes when you treat your toolchain like part of the prompt? Whether you're a hobbyist developer or building the next generation of AI-powered devtools, Sujay’s systems-level insights are not to be missed.

    Drawing from Sujay’s work developing the Fullstack-Bench, they cover:

    Why full-stack coding is still a frontier task for autonomous agents

    How type safety and other “guardrails” can significantly reduce variance and failure

    What makes a good eval—and why evals might matter more than clever prompts

    How different models perform on real-world app-building tasks (and what to watch out for)

    Why your toolchain might be the most underrated part of the prompt

    And what all of this means for devs—from hobbyists to infra teams building with AI in the loop

    Learn More:

    Introducing Fullstack-Bench

    Follow everyone on X:

    Sujay Jayakar

    Martin Casado

    Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.

  • In this episode of AI + a16z, Resend founder and CEO Zeno Rocha sits down with a16z partner Yoko Li to discuss:

    How generative AI — powered by agents and, now, MCP — is reshaping the email experience for developers, as well as the overall world of programming.

    How Zeno's obsession with developer experience has evolved into designing for "agent experience" — a new frontier where LLM-powered agents are not only building products but also operating within them.

    How email, one of the most ubiquitous tools for developers and end users alike, is being reimagined for a future where agents send, parse, and optimize communication.

    What it means to build agent-friendly APIs.

    The emerging MCP protocol, and how AI is collapsing the creative loop for prosumers and developers alike.

    Learn more:

    What is AX (agent experience) and how to improve it

    A deep dive into MCP and the future of AI tooling

    Dracula theme

    Follow everyone on X:

    Zeno Rocha

    Yoko Li

    Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.

  • In this episode of AI + a16z, a16z Partner Joe Schmidt sits down with 11x CTO Prabhav Jain for an inside look at how AI-powered digital workers are reshaping sales and revenue operations. They discuss the evolution of agentic AI, the trade-offs between orchestration and autonomy, and the technical innovations driving 11x’s products, Alice and Mike.

    Prabhav breaks down the challenges of real-time voice AI, the complexities of multimodal agent interactions, and why the future of enterprise AI is about delivering measurable customer outcomes—not just automation. They also dive into the fast-moving landscape of model providers, the impact of open-source AI, and how startups can stay ahead in an environment of constant technological change.

    Plus, they explore 11x’s bold decision to re-architect its platform from the ground up, the lessons learned from scaling AI-powered sales automation, and what it takes to build truly effective digital workers.

    Key Takeaways:

    The difference between true AI agents and complex orchestrations—and why it matters.

    How 11x built Alice and Mike to deliver human-like sales performance at scale.

    The cutting-edge advancements shaping AI voice assistants and real-time multimodal interactions.

    Lessons from rebuilding an AI platform while supporting a fast-growing customer base.

    How AI startups can balance rapid iteration with long-term strategic bets.

    For anyone interested in AI-powered automation, enterprise sales, or the future of digital work, this episode offers a front-row seat to the latest innovations pushing the boundaries of AI agents.

    Learn more:

    11x

    Follow everybody on X:

    Prabhav Jain

    Joe Schmidt

    Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.

  • In this episode of AI + a16z, Sesame Cofounder and CTO Ankit Kumar joins a16z general partner Anjney Midha for a deep dive into the research and engineering behind their voice technology. They discuss the technical challenges of real-time speech generation, the trade-offs in balancing personality with efficiency, and why the team is open-sourcing key components of their model. Ankit breaks down the complexities of multimodal AI, full-duplex conversation modeling, and the computational optimizations that enable low-latency interactions.

    They also explore the evolution of natural language as a user interface and its potential to redefine human-computer interaction.
    Plus, we take audience questions on everything from scaling laws in speech synthesis to the role of in-context learning in making AI voices more expressive.

    Key Takeaways:
    How Sesame AI achieves natural voice interactions through real-time speech generation.

    The impact of open-sourcing their speech model and what it means for AI research.

    The role of full-duplex modeling in improving AI responsiveness.

    How computational efficiency and system latency shape AI conversation quality.

    The growing role of natural language as a user interface in AI-driven experiences.

    For anyone interested in AI and voice technology, this episode offers an in-depth look at the latest advancements pushing the boundaries of human-computer interaction.

    Learn more:

    The Maya + Miles demo

    Crossing the uncanny valley of conversational voice

    Sesame CSM 1B model

    Follow everybody on X:

    Ankit Kumar

    Anjney Midha

    Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.

  • In this episode of AI + a16z, Netlify CEO and Cofounder Matt Biilmann joins a16z General Partner Martin Casado to explore how AI is reshaping web development — not just through faster code generation, but by fundamentally shifting how we think about building for the web. At the center of this shift is Agent Experience (AX), a new paradigm where AI agents aren’t just tools, but active participants in development, shaping both the creative process and the underlying infrastructure.

    Matt shares how Netlify is evolving to meet this future, why the next 100 million web developers will collaborate with AI, and what’s at stake if the web doesn’t adapt — will we see a thriving, open, AI-powered internet, or a future dominated by walled gardens?

    Learn more:

    Introducing AX: Why Agent Experience Matters

    Follow everyone on X:

    Matt Biilmann

    Martin Casado

    Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.

  • In this episode of AI + a16z, a trio of security experts join a16z partner Joel de la Garza to discuss the security implications of the DeepSeek reasoning model that made waves recently. It's three separate discussions, focusing on different aspects of DeepSeek and the fast-moving world of generative AI.

    The first segment, with Ian Webster of Promptfoo, focuses on vulnerabilities within DeepSeek itself, and how users can protect themselves against backdoors, jailbreaks, and censorship.

    The second segment, with Dylan Ayrey of Truffle Security, focuses on the advent of AI-generated code and how developers and security teams can ensure it's safe. As Dylan explains, many problems lie in how the underlying models were trained and how their security alignment was carried out.

    The final segment features Brian Long of Adaptive, who highlights a growing list of risk vectors for deepfakes and other threats that generative AI can exacerbate. In his view, it's up to individuals and organizations to stay sharp about what's possible — while the arms race between hackers and white-hat AI agents kicks into gear.

    Learn more:

    What Are the Security Risks of Deploying DeepSeek-R1?

    Research finds 12,000 ‘Live’ API Keys and Passwords in DeepSeek's Training Data

    Follow everybody on social media:

    Ian Webster

    Dylan Ayrey

    Brian Long

    Joel de la Garza

    Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.

  • In this episode of AI + a16z, Aatish Nayak, head of product at Harvey, sits down with a16z partner Kimberly Tan to share his experience building AI products for enterprises — including the legal profession — and how to address areas like UX, trust, and customer engagement. Importantly, Aatish explains, industries like law don't need AGI or even the latest and greatest models; they need products that augment their existing workflows so they can better serve clients and still make it home for dinner.

    Learn more:

    BigLaw Bench

    Follow everyone on X:

    Aatish Nayak

    Kimberly Tan

    Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.

    In this episode of AI + a16z, a16z partner Alex Immerman sits down with Hebbia founder and CEO George Sivulka to discuss the potential for reasoning models and AI agents to supercharge knowledge-worker productivity — and the global economy along with it. As George explains, his customers are already saving significant time and effort on important, but monotonous, tasks, and improved models paired with savvy users will continue to reshape how industries including finance, law, and other professional services operate.

    Follow everyone on X:

    George Sivulka

    Alex Immerman

    Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.

  • In this episode of AI + a16z, Fivetran cofounder and CEO George Fraser and a16z partner Guido Appenzeller discuss how LLMs fit into the data management picture within large enterprises. In order to take advantage of a potentially revolutionary technology, organizations don't need to rip out their existing infrastructure, but they do need to rethink their data hygiene so language models can understand it.

    Follow everyone on X:

    George Fraser

    Guido Appenzeller

    Derrick Harris

    Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.

  • In this episode of AI + a16z, a16z General Partner Martin Casado and Rasa cofounder and CEO Alan Nichol discuss the past, present, and future of AI agents and chatbots. Alan shares his history working to solve this problem with traditional natural language processing (NLP), expounds on how large language models (LLMs) are helping to dull the many sharp corners of natural-language interactions, and explains how pairing them with inflexible business logic is a great combination.

    Learn more:

    Task-Oriented Dialogue with In-Context Learning

    GoEX: Perspectives and Designs Towards a Runtime for Autonomous LLM Applications

    CALM Summit

    Follow everyone on X:

    Alan Nichol

    Martin Casado

    Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.

  • A 2024 highlight reel, featuring founders sharing their insights, advice, and experiences building AI companies — from foundation-model labs to vertical applications. Topics include:

    Building AI tools for developers

    Getting into AI as a systems expert

    The researcher-to-founder journey

    Founding AI companies in specific industries

    Early lessons from selling AI agents

    And more

    Companies include:

    Ambience

    Anyscale

    Black Forest Labs

    CommandZero

    Databricks

    Decagon

    Ideogram

    Inngest

    Replicate

    Socket

    Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.

    In this episode of the AI + a16z podcast, Decagon cofounder/CEO Jesse Zhang and a16z partner Kimberly Tan discuss how LLMs are reshaping customer support, the strong market demand for AI agents, and how AI agents give startups a new pricing model to help disrupt incumbents.

    Here's an excerpt of Jesse explaining how conversation-based pricing can win over customers who are used to traditional seat-based pricing:

    "Our view on this is that, in the past, software is based per seat because it's roughly scaled based on the number of people that can take advantage of the software.

    "With most AI agents, the value . . . doesn't really scale in terms of the number of people that are maintaining it; it's just the amount of work output. . . . The pricing that you want to provide has to be a model where the more work you do, the more that gets paid.

    "So for us, there's two obvious ways to do that: you can pay per conversation, or you can pay per resolution. One fun learning for us has been that most people have opted into the per-conversation model . . . It just creates a lot more simplicity and predictability.

    . . .

    "It's a little bit tricky for incumbents if they're trying to launch agents because it just cannibalizes their seat-based model. . . . Incumbents have less risk tolerance, naturally, because they have a ton of customers. And if they're iterating quickly and something doesn't go well, that's a big loss for them. Whereas, younger companies can always iterate a lot faster, and the iteration process just inherently leads to better product. . .

    "We always want to pride ourselves on shipping speed, quality of the product, and just how hardcore our team is in terms of delivering things."
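    Jesse's pricing argument is easy to make concrete with toy arithmetic: seat-based cost scales with headcount, while conversation-based cost scales with work output. All figures below (per-seat price, per-conversation price, team size, volume) are made up for illustration:

```python
def seat_based_cost(seats: int, price_per_seat: float) -> float:
    """Traditional SaaS pricing: cost scales with the people using the tool."""
    return seats * price_per_seat

def conversation_based_cost(conversations: int, price_per_conversation: float) -> float:
    """Agent pricing: cost scales with the amount of work performed."""
    return conversations * price_per_conversation

# A hypothetical 5-seat support tool vs. an agent handling 10,000 conversations:
seat_cost = seat_based_cost(seats=5, price_per_seat=100.0)
agent_cost = conversation_based_cost(10_000, price_per_conversation=0.04)
```

    Per-resolution billing, the other model Jesse mentions, would swap conversation counts for resolved-ticket counts; as he notes, most customers chose per-conversation billing for its simplicity and predictability.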

    Learn more:

    RIP to RPA: The Rise of Intelligent Automation

    Big Ideas in Tech for 2025

    Follow everyone on X:

    Jesse Zhang

    Kimberly Tan

    Derrick Harris

    Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.

    This is a replay of our first episode from April 12, featuring Databricks VP of AI Naveen Rao and a16z partner Matt Bornstein discussing enterprise LLM adoption, hardware platforms, and what it means for AI to be mainstream. If you're unfamiliar with Naveen, he has been in the AI space for more than a decade, working on everything from custom hardware to LLMs, and has founded two successful startups — Nervana Systems and MosaicML.

    Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.

  • In this episode of AI + a16z, Replicate cofounder and CEO Ben Firshman, and a16z partner Matt Bornstein, discuss the art of building products and companies that appeal to software developers. Ben was the creator of Docker Compose, and Replicate has a thriving community of developers hosting and fine-tuning their own models to power AI-based applications.

    Here's an excerpt of Ben and Matt discussing the difference in the variety of applications built using multimedia models compared with language models:

    Matt: "I've noticed there's a lot of really diverse multimedia AI apps out there. Meaning that when you give someone an amazing primitive, like a FLUX API call or a Stable Diffusion API call, and Replicate, there's so many things they can do with it. And we actually see that happening — versus with language, where all LLM apps look kind of the same if you squint a little bit.

    "It's like you chat with something — there's obviously code, there's language, there's a few different things — but I've been surprised that even today we don't see as many apps built on language models as we do based on, say, image models."

    Ben: "It certainly maps with what we're seeing, as well. I think these language models, beyond just chat apps, are particularly good at turning unstructured information into structured information. Which is actually kind of magical. And computers haven't been very good at that before. That is really a kind of cool use case for it.

    "But with these image models and video models and things like that, people are creating lots of new products that were not possible before — things that were just impossible for computers to do. So yeah, I'm certainly more excited by all the magical things these multimedia models can make."

    Follow everyone on X:

    Ben Firshman

    Matt Bornstein

    Derrick Harris

    Learn more:

    Replicate's AI model hub

    Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.