Francisco Utrera
Founding AI Engineer at Embrace.ai, where I own the entire AI stack — from LLM orchestration and RAG to agentic tool execution and document processing.
I've built, from scratch, a production multi-model inference pipeline (OpenAI, Claude, Gemini), a provider-agnostic LLM SDK with ReAct agents, a RAG system with vector search and neural reranking, an agentic tool execution framework with curated API endpoint definitions and automatic credential handling, and a multi-modal document extraction pipeline. Across 68 microservices, I'm the sole or primary author of the AI core, knowledge intelligence, and document processing layers.
Before Embrace, I worked on adversarial robustness and transfer learning research at UC Berkeley, publishing work on how adversarially trained networks transfer better to downstream tasks.
What I Build
- LLM Orchestration — Multi-model routing across OpenAI, Claude, and Gemini with provider-agnostic abstractions, type-safe tool definitions via Zod, and ReAct agent loops with automatic tool execution
- RAG Architecture — Embedding pipelines, Pinecone vector search with overfetch, Cohere neural reranking, metadata hydration, and contextual document enrichment via Claude and Gemini
- Agentic Tool Execution — Curated API endpoint definitions per platform (Zendesk, Jira, Confluence, Salesforce) with automatic credential handling. Code sandbox execution via ECS Fargate. Citation tracking back to source documents.
- Document Intelligence — Multi-format extraction (PDF, DOCX, images, audio/video), intelligent strategy selection, LLM-powered enrichment with multi-modal element processing, and autonomous article generation from knowledge gaps
- Production Infrastructure — AWS Lambda + ECS Fargate, SQS/SNS event-driven architecture, GraphQL federation across 68 microservices, Inngest durable workflows, Langfuse observability with per-model cost attribution
Contact
- LinkedIn fjutrera
- Medium @fjulozada
- GitHub @utrerf
- Email fjulozada@gmail.com