arize.ai

Arize AI

Arize is an AI engineering platform focused on evaluation and observability. It helps engineers develop, evaluate, and observe AI applications and agents.

Pro tip: Better agents come from better harnesses.

If you want to build a better agent, start by tracing the run. A trace gives you everything the agent did: tool calls, retrieved context, intermediate decisions, eval results, and failed spans.
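To make the idea of a trace concrete, here is a minimal sketch of one way to model it in plain Python. The `Span` and `Trace` classes, the `kind` labels, and the sample data are all hypothetical and chosen for illustration; they are not the Arize data model.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """One step the agent took (hypothetical shape, for illustration only)."""
    name: str
    kind: str              # assumed labels: "tool", "retrieval", "llm", "eval"
    status: str = "ok"     # "ok" or "error"
    attributes: dict = field(default_factory=dict)


@dataclass
class Trace:
    """An ordered record of everything that happened in one agent run."""
    trace_id: str
    spans: list = field(default_factory=list)

    def failed_spans(self):
        # Spans that ended in an error, in the order they occurred.
        return [s for s in self.spans if s.status == "error"]

    def by_kind(self, kind):
        # All spans of one kind, e.g. every tool call in the run.
        return [s for s in self.spans if s.kind == kind]


# Made-up example run: two tool calls, one retrieval, one failure.
trace = Trace("run-1", [
    Span("search_docs", "tool", attributes={"query": "refund policy"}),
    Span("fetch_context", "retrieval", attributes={"num_docs": 3}),
    Span("send_email", "tool", status="error", attributes={"error": "timeout"}),
])

failed_names = [s.name for s in trace.failed_spans()]
tool_call_count = len(trace.by_kind("tool"))
```

With a structure like this, "what failed?" and "which tools ran?" become one-line queries instead of log archaeology.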

aparnadhinak shared tips on how to build better agents with aakashgupta in this video: youtube.com/watch…

Once you have traces, create one targeted eval.

Don't start with a giant eval framework. Pick one behavior that matters and evaluate it on real traces.
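As a rough illustration of "one targeted eval", the sketch below checks a single behavior: did the agent retrieve context before it answered? The behavior, the span labels (`"retrieval"`, `"answer"`, `"tool"`), and the sample traces are assumptions made up for this example.

```python
def retrieved_before_answer(span_kinds):
    """Pass if a 'retrieval' span occurs before the first 'answer' span."""
    if "answer" not in span_kinds:
        return False
    first_answer = span_kinds.index("answer")
    return "retrieval" in span_kinds[:first_answer]


def run_eval(traces):
    """Apply the single check to every trace and report a pass rate."""
    results = {tid: retrieved_before_answer(kinds) for tid, kinds in traces.items()}
    pass_rate = sum(results.values()) / len(results) if results else 0.0
    return results, pass_rate


# Each trace is reduced to the ordered list of its span kinds (made-up data).
traces = {
    "t1": ["retrieval", "tool", "answer"],
    "t2": ["tool", "answer"],
    "t3": ["answer", "retrieval"],
    "t4": ["retrieval", "answer"],
}

results, pass_rate = run_eval(traces)
```

One narrow check like this is easy to trust, and its failures point at specific traces to open next.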

youtube.com

Everyone is building in Claude Code. No one is running evals.

In the live session, Aparna showed two key loops:

- The agent loop
- The improvement loop that reads failed evals, opens the relevant traces, groups failures, and proposes the smallest safe change to the agent, evaluator, data sources, or tools.
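The first half of the improvement loop (read failed evals, group the failures) can be sketched in a few lines. The record fields (`trace_id`, `passed`, `failure_label`) and the labels themselves are hypothetical; proposing the actual change is left to a person or an agent and is not modeled here.

```python
from collections import defaultdict


def group_failures(eval_results):
    """Collect the trace IDs of failed evals under their failure label."""
    groups = defaultdict(list)
    for record in eval_results:
        if not record["passed"]:
            groups[record["failure_label"]].append(record["trace_id"])
    return dict(groups)


def largest_group(groups):
    """Return (label, trace_ids) for the most common failure, or None."""
    if not groups:
        return None
    label = max(groups, key=lambda k: len(groups[k]))
    return label, groups[label]


# Made-up eval output for four traces.
eval_results = [
    {"trace_id": "t1", "passed": True, "failure_label": None},
    {"trace_id": "t2", "passed": False, "failure_label": "missing_retrieval"},
    {"trace_id": "t3", "passed": False, "failure_label": "wrong_tool"},
    {"trace_id": "t4", "passed": False, "failure_label": "missing_retrieval"},
]

groups = group_failures(eval_results)
top_failure = largest_group(groups)
```

Starting from the largest group keeps the proposed change small: fix one failure mode, rerun the eval, repeat.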

The improvement loop is where things get interesting, and it's how agent systems become more reliable over time.

Learn more in our write-up: arize.com/blog…

arize.com

How to build a better agent harness with traces and evals

AI has changed how teams ship.

Attention is moving away from code and toward intent and validation, a shift that can massively increase team velocity when done correctly.

chintanturakhia will be breaking down how @coinbase has made significant changes to how they work, and what results they're seeing.

Grab your ticket. June 4th. Arize Observe. arize.com/obser…

aithreads

🛑 One AI Question with Robert Mackey

We asked our Account Manager: Why Arize?

His answer: Stop being reactive.

Arize gives you full visibility into every trace and span, moving you from "fixing bugs" to proactive observation. Ensure your AI agents deliver exactly what your customers expect, before they tell you something's wrong.

The fastest path to self-improving agents may be your orchestration layer.

Most teams already run agent workflows through schedulers, pipelines, and recurring jobs. The missing piece is turning those workflows into structured feedback loops.

Today, we’re open sourcing the Arize AX Airflow Provider. 🧵

Apache Airflow already orchestrates critical ML and data workflows across the industry.

Now it can orchestrate agent improvement loops too.

The new Arize AX Airflow Provider brings AX directly into Airflow, making it easier to:

• run evals automatically
• score agent outputs at scale
• route failures for inspection
• benchmark changes before deployment
• operationalize feedback loops

A few example DAGs we’re shipping:

→ Drift detection with auto rollback: Run daily evals against a stable baseline.

→ Prompt lifecycle management: Treat prompts like deployable artifacts with gated promotion workflows.

→ Behavioral regression testing: Catch issues aggregate scores miss, including rising refusal rates, formatting drift, or response quality regressions.

→ RAG evaluation pipelines: Export production traces, build eval datasets, and test retriever + generator performance.
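To give a feel for the checks such DAGs might run, here is a minimal plain-Python sketch of two of them: a drift check that decides whether to roll back, and a refusal-rate measure for behavioral regression. It does not use Airflow or the Arize AX provider; the function names, the 0.05 tolerance, and the refusal markers are assumptions picked for illustration.

```python
def should_roll_back(baseline_score, todays_score, tolerance=0.05):
    """Flag a rollback when today's eval score drops more than `tolerance`
    below the stable baseline (the threshold is an assumed value)."""
    return (baseline_score - todays_score) > tolerance


def refusal_rate(responses, markers=("i can't", "i cannot", "i'm unable")):
    """Fraction of responses containing a refusal phrase.

    The marker list is a crude, assumed heuristic; a real check would
    likely use a proper evaluator.
    """
    if not responses:
        return 0.0
    refused = sum(any(m in r.lower() for m in markers) for r in responses)
    return refused / len(responses)


# Made-up daily numbers and responses.
big_drop = should_roll_back(baseline_score=0.90, todays_score=0.80)
small_drop = should_roll_back(baseline_score=0.90, todays_score=0.88)

rate = refusal_rate([
    "Sure, here it is.",
    "I cannot help with that.",
    "I can't do that.",
    "Done.",
])
```

An aggregate score can stay flat while the refusal rate climbs, which is why the two checks are kept separate here.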

Airflow is a natural control plane for this because many teams already trust it to run production infrastructure.

Now it can help run agent evaluation infrastructure too.

Learn more: arize.com/blog…

arize.com

From production traces to better AI agents: Automating the LLMOps feedback loop