Test and Evaluate your AI systems
From experiment tracking to observability to human annotation,
Parea helps teams confidently ship LLM apps to production.
Evaluation
Test and track performance over time. Debug failures. Answer questions like “which samples regressed when I made a change?” and “does upgrading to this new model improve performance?”
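To make the regression question concrete, here is a minimal sketch in plain Python (no SDK assumed) that compares per-sample scores from two evaluation runs and lists the samples that got worse:

```python
# Minimal sketch, plain Python: given per-sample scores from two eval runs,
# list the samples whose score dropped after a change.
from typing import Dict, List

def find_regressions(
    baseline: Dict[str, float],   # sample_id -> score before the change
    candidate: Dict[str, float],  # sample_id -> score after the change
) -> List[str]:
    """Return the ids of samples that scored lower in the candidate run."""
    return [
        sample_id
        for sample_id, old_score in baseline.items()
        if candidate.get(sample_id, 0.0) < old_score
    ]

baseline_run = {"s1": 1.0, "s2": 0.5, "s3": 1.0}
candidate_run = {"s1": 1.0, "s2": 1.0, "s3": 0.0}
print(find_regressions(baseline_run, candidate_run))  # ['s3']
```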
Human Review
Collect human feedback from end users, subject matter experts, and product teams. Comment on, annotate, and label logs for QA and fine-tuning.
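As a rough illustration of the data a review workflow collects, the sketch below models an annotation attached to a log; the field names are illustrative assumptions, not the platform's schema:

```python
# Hypothetical shape of a human annotation on a logged LLM call.
# Field names are illustrative only, not the platform's actual schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Annotation:
    log_id: str
    reviewer: str
    label: str        # e.g. "correct" / "incorrect"
    comment: str = ""

@dataclass
class ReviewQueue:
    annotations: List[Annotation] = field(default_factory=list)

    def annotate(self, log_id: str, reviewer: str, label: str, comment: str = "") -> Annotation:
        note = Annotation(log_id, reviewer, label, comment)
        self.annotations.append(note)
        return note

queue = ReviewQueue()
queue.annotate("log_123", reviewer="sme@example.com", label="incorrect",
               comment="Cites the wrong policy section.")
```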
Prompt Playground & Deployment
Tinker with multiple prompts on individual samples, test them on large datasets, and deploy the best ones to production.
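The underlying idea, sketched in plain Python below: score each candidate prompt template against the same samples and keep the best one. The run_llm and score callables are placeholders you would supply, not part of any SDK:

```python
# Conceptual sketch: pick the prompt template with the highest average score
# on a fixed set of samples. run_llm and score are user-supplied placeholders.
from typing import Callable, Dict, List

def best_prompt(
    templates: List[str],                 # each template contains an {input} slot
    samples: List[Dict[str, str]],        # each sample has "input" and "target"
    run_llm: Callable[[str], str],        # your model call
    score: Callable[[str, str], float],   # e.g. exact match -> 1.0 or 0.0
) -> str:
    def avg_score(template: str) -> float:
        scores = [
            score(run_llm(template.format(input=s["input"])), s["target"])
            for s in samples
        ]
        return sum(scores) / len(scores)

    return max(templates, key=avg_score)
```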
Observability
Log production and staging data. Debug issues, run online evals, and capture user feedback. Track cost, latency, and quality in one place.
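Conceptually, each logged call reduces to a record like the one below; the field names and flat per-call cost are simplifying assumptions rather than the platform's log schema:

```python
# Sketch of an observability record for one LLM call: latency, cost, and an
# online eval score. Field names and the flat cost are simplifying assumptions.
import time
from dataclasses import asdict, dataclass
from typing import Callable

@dataclass
class LLMCallLog:
    prompt: str
    output: str
    latency_s: float
    cost_usd: float
    eval_score: float

def logged_call(
    prompt: str,
    run_llm: Callable[[str], str],             # your model call
    online_eval: Callable[[str, str], float],  # e.g. a cheap quality heuristic
    cost_per_call: float = 0.002,              # simplified flat cost
) -> LLMCallLog:
    start = time.perf_counter()
    output = run_llm(prompt)
    latency = time.perf_counter() - start
    entry = LLMCallLog(prompt, output, latency, cost_per_call, online_eval(prompt, output))
    print(asdict(entry))   # in practice, ship this to your logging backend
    return entry
```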
Datasets
Incorporate logs from staging & production into test datasets and use them to fine-tune models.
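As an illustration of the idea, the sketch below promotes production logs with positive user feedback into a JSONL fine-tuning dataset; the log fields shown are assumptions:

```python
# Sketch: filter logs by positive user feedback and write prompt/completion
# pairs as JSONL for fine-tuning. The log field names are assumptions.
import json
from typing import Dict, Iterable

def logs_to_finetune_jsonl(logs: Iterable[Dict], path: str) -> int:
    kept = 0
    with open(path, "w", encoding="utf-8") as f:
        for log in logs:
            if log.get("user_feedback", 0) >= 1:   # keep only thumbs-up logs
                f.write(json.dumps({"prompt": log["input"],
                                    "completion": log["output"]}) + "\n")
                kept += 1
    return kept

production_logs = [
    {"input": "Summarize: ...", "output": "A short summary.", "user_feedback": 1},
    {"input": "Translate: ...", "output": "An off-topic reply.", "user_feedback": 0},
]
print(logs_to_finetune_jsonl(production_logs, "finetune.jsonl"))  # 1
```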
Simple Python & JavaScript SDKs
Native integrations with major LLM providers & frameworks
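For a sense of what instrumentation might look like with the Python SDK, the sketch below wraps an OpenAI client and traces one function. The parea import path, Parea constructor, wrap_openai_client helper, and trace decorator are assumptions to verify against the current SDK docs:

```python
# Rough sketch of SDK-style instrumentation. The parea imports, Parea(...)
# constructor, wrap_openai_client, and @trace are assumptions -- check the docs.
import os

from openai import OpenAI
from parea import Parea, trace   # assumed package layout

client = OpenAI()
p = Parea(api_key=os.environ["PAREA_API_KEY"])   # assumed constructor
p.wrap_openai_client(client)                     # assumed: auto-logs OpenAI calls

@trace  # assumed decorator: records this function's inputs and outputs as a span
def answer(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

print(answer("Which samples regressed after the last prompt change?"))
```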
Pricing for teams of all sizes
Get started on the Builder plan for free. No credit card required.
Free
- All platform features
- Max. 2 team members
- 3k logs / month (1-month retention)
- 10 deployed prompts
- Discord community
Team
- 3 members included ($50 / month per additional member, up to 20)
- 100k logs / month included ($0.001 per additional log)
- 3-month data retention (6- or 12-month upgrade available)
- Unlimited projects
- 100 deployed prompts
- Private Slack channel
Enterprise
- On-prem/self-hosting
- Support SLAs
- Unlimited logs
- Unlimited deployed prompts
- SSO enforcement and custom roles
- Additional security and compliance features
AI Consulting
- Rapid Prototyping & Research
- Building domain-specific evals
- Optimizing RAG pipelines
- Upskilling your team on LLMs