Architecture Overview¶
This document provides a high-level overview of the LangGraph OpenAI Serve architecture, explaining how the different components work together to provide an OpenAI-compatible API for LangGraph workflows.
System Architecture¶
LangGraph OpenAI Serve consists of several key components:
- FastAPI Application: The web server that handles HTTP requests
- LangchainOpenaiApiServe: The core class that bridges LangGraph and the API
- Graph Registry: A registry that manages LangGraph instances
- API Routers: FastAPI routers for different API endpoints
- Schema Models: Pydantic models for data validation and serialization
Architecture Diagram¶
┌─────────────────────────────────────────────────────────┐
│                      HTTP Clients                        │
│     (OpenAI Python SDK, JavaScript SDK, curl, etc.)      │
└────────────────────────────┬────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                   FastAPI Application                    │
│                                                          │
│  ┌─────────────────┐   ┌─────────────────────────┐      │
│  │  Models Router  │   │ Chat Completions Router │      │
│  │   /v1/models    │   │  /v1/chat/completions   │      │
│  └────────┬────────┘   └──────────┬──────────────┘      │
│           │                       │                     │
│  ┌────────▼───────────────────────▼──────────────────┐  │
│  │              LangchainOpenaiApiServe              │  │
│  │                                                    │  │
│  │ ┌─────────────────────────────────────────────┐   │  │
│  │ │               Graph Registry                │   │  │
│  │ │                                             │   │  │
│  │ │ ┌───────────┐ ┌───────────┐ ┌───────────┐   │   │  │
│  │ │ │  Graph 1  │ │  Graph 2  │ │  Graph N  │   │   │  │
│  │ │ └───────────┘ └───────────┘ └───────────┘   │   │  │
│  │ └─────────────────────────────────────────────┘   │  │
│  └────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                   LangGraph Workflows                    │
└─────────────────────────────────────────────────────────┘
Component Details¶
FastAPI Application¶
The FastAPI application serves as the web server that handles HTTP requests. It can be:
- Created automatically by LangchainOpenaiApiServe
- Provided by the user when they want to integrate LangGraph OpenAI Serve with an existing FastAPI application (both options are sketched below)
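The following sketch illustrates both options. It is illustrative only: the LangchainOpenaiApiServe class name comes from this project, but the import path, the constructor keywords (app=, graphs=) and the serve.app attribute are assumptions made for the example.

```python
# Illustrative wiring only: the constructor keywords and the `app` attribute
# below are assumptions based on this overview, not a verbatim API reference.
from fastapi import FastAPI
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph_openai_serve import LangchainOpenaiApiServe  # assumed import path

# A trivial compiled workflow to register (see "Integration with LangGraph").
builder = StateGraph(MessagesState)
builder.add_node("noop", lambda state: {"messages": []})
builder.add_edge(START, "noop")
builder.add_edge("noop", END)
workflow = builder.compile()

# Option 1: attach to an existing FastAPI application.
app = FastAPI(title="My existing service")
serve = LangchainOpenaiApiServe(app=app, graphs={"my-workflow": workflow})

# Option 2: let LangchainOpenaiApiServe create the application for you.
serve = LangchainOpenaiApiServe(graphs={"my-workflow": workflow})
app = serve.app  # assumed attribute exposing the generated application
```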
LangchainOpenaiApiServe¶
This is the core class that connects LangGraph workflows with the OpenAI-compatible API. Its responsibilities include:
- Managing the FastAPI application
- Registering and managing LangGraph instances
- Providing routers for different API endpoints
- Handling CORS configuration when needed (see the CORS sketch below)
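In FastAPI terms, CORS handling usually comes down to adding the standard CORSMiddleware. Whether LangchainOpenaiApiServe wires this up exactly as shown, or exposes a dedicated option for it, is an assumption; the middleware call itself is plain FastAPI.

```python
# What "handling CORS configuration" typically looks like in FastAPI. Whether
# LangchainOpenaiApiServe applies exactly this middleware is an assumption.
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],   # tighten to known origins in production
    allow_methods=["*"],
    allow_headers=["*"],
)
```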
Graph Registry¶
The Graph Registry maintains a mapping between model names and LangGraph instances. When a request comes in for a specific model, the registry looks up the corresponding LangGraph workflow to execute. The registry allows:
- Registering multiple graphs with different names
- Retrieving graphs by name
- Listing available graphs (see the illustrative sketch below)
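A minimal stand-in for the registry is sketched below. The class and method names are hypothetical; only the responsibilities (register, look up, list) mirror the list above.

```python
# Hypothetical stand-in for the Graph Registry; not the library's actual class.
from langchain_core.runnables import Runnable


class SimpleGraphRegistry:
    def __init__(self) -> None:
        self._graphs: dict[str, Runnable] = {}

    def register(self, name: str, graph: Runnable) -> None:
        """Register a compiled LangGraph workflow under a model name."""
        self._graphs[name] = graph

    def get(self, name: str) -> Runnable:
        """Look up the workflow that should serve a given model name."""
        try:
            return self._graphs[name]
        except KeyError:
            raise KeyError(f"Unknown model: {name!r}") from None

    def list_models(self) -> list[str]:
        """Model names to report from the /v1/models endpoint."""
        return sorted(self._graphs)
```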
API Routers¶
LangGraph OpenAI Serve provides several FastAPI routers:
- Models Router: Handles the /v1/models endpoint to list available LangGraph workflows
- Chat Completions Router: Handles the /v1/chat/completions endpoint for chat interactions
- Health Router: Provides a health check endpoint at /health
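Because these routers mirror OpenAI's endpoints, any OpenAI client can call them. The example below uses the official OpenAI Python SDK and assumes the server runs locally on port 8000 with a graph registered as my-workflow.

```python
# Calling the routers with the OpenAI Python SDK; base URL, API key and model
# name are placeholders for your own deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Models Router: GET /v1/models lists the registered LangGraph workflows.
for model in client.models.list():
    print(model.id)

# Chat Completions Router: POST /v1/chat/completions runs the matching graph.
response = client.chat.completions.create(
    model="my-workflow",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```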
Schema Models¶
Pydantic models are used for data validation and serialization. These include:
- Request Models: Define the structure of API requests
- Response Models: Define the structure of API responses
- OpenAI Compatible Models: Models that match OpenAI's API schema (see the request example below)
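A simplified sketch of what the request side of these models can look like. The field names follow OpenAI's public chat completions schema; the exact class names and full field lists in this project may differ.

```python
# Simplified request schema sketch; field names follow OpenAI's chat
# completions API, but this is not the library's exact model definition.
from pydantic import BaseModel


class ChatMessage(BaseModel):
    role: str        # "system", "user" or "assistant"
    content: str


class ChatCompletionRequest(BaseModel):
    model: str                        # which registered graph to run
    messages: list[ChatMessage]
    temperature: float | None = None
    max_tokens: int | None = None
    stream: bool = False
```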
Request Flow¶
When a client makes a request to the API, the following sequence of events occurs:
1. Client Request: A client (like the OpenAI Python SDK) sends a request to the API
2. FastAPI Router: The appropriate router handles the request based on the endpoint
3. Request Validation: Pydantic models validate the request data
4. Graph Selection: The system looks up the requested LangGraph workflow in the registry
5. Graph Execution: The LangGraph workflow is executed with the provided messages
6. Response Formatting: The result is formatted according to the OpenAI API schema
7. Client Response: The response is sent back to the client
Example Flow for Chat Completion¶
Client Request (POST /v1/chat/completions)
│
▼
FastAPI Chat Router
│
▼
Request Validation (ChatCompletionRequest)
│
▼
Graph Selection (get_graph_for_model)
│
▼
Message Conversion (convert_to_lc_messages)
│
▼
Graph Execution (graph.ainvoke or graph.astream_events)
│
▼
Response Formatting
│
▼
Client Response
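Condensed into code, the non-streaming path looks roughly like the sketch below. get_graph_for_model and convert_to_lc_messages are the helpers named in the flow above (their signatures here are assumed), ChatCompletionRequest is the schema sketched earlier, and format_openai_response is a hypothetical stand-in for the formatting step.

```python
# Rough, non-streaming sketch of the flow above; helper signatures are assumed.
from fastapi import APIRouter

router = APIRouter()


@router.post("/v1/chat/completions")
async def chat_completions(request: ChatCompletionRequest):      # request validation
    graph = get_graph_for_model(request.model)                   # graph selection
    lc_messages = convert_to_lc_messages(request.messages)       # message conversion
    result = await graph.ainvoke({"messages": lc_messages})      # graph execution
    return format_openai_response(result, model=request.model)   # response formatting
```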
Streaming vs. Non-Streaming¶
LangGraph OpenAI Serve supports both streaming and non-streaming responses:
Non-Streaming Mode¶
In non-streaming mode:

1. The entire LangGraph workflow is executed
2. The final result is collected
3. A single response is returned to the client
Streaming Mode¶
In streaming mode:

1. The LangGraph workflow is executed with streaming enabled
2. Events from the workflow are captured in real-time
3. Each chunk of generated content is immediately sent to the client
4. The client receives and processes chunks as they arrive
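A minimal sketch of how the streaming path can be wired with FastAPI's StreamingResponse and the astream_events API. Filtering on on_chat_model_stream events is the usual way to pick up model tokens; the exact chunk schema the library emits is simplified here.

```python
# Minimal streaming sketch: forward model tokens to the client as SSE chunks.
import json

from fastapi.responses import StreamingResponse


async def stream_chat(graph, lc_messages, model_name: str) -> StreamingResponse:
    async def event_stream():
        async for event in graph.astream_events({"messages": lc_messages}, version="v2"):
            if event["event"] == "on_chat_model_stream":
                token = event["data"]["chunk"].content
                chunk = {
                    "object": "chat.completion.chunk",
                    "model": model_name,
                    "choices": [{"index": 0, "delta": {"content": token}}],
                }
                yield f"data: {json.dumps(chunk)}\n\n"
        yield "data: [DONE]\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")
```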
Integration with LangGraph¶
LangGraph OpenAI Serve integrates with LangGraph by:
- Accepting compiled LangGraph workflows, i.e. the output of graph.compile() (see the sketch below)
- Converting between OpenAI message formats and LangChain message formats
- Executing workflows with appropriate parameters (temperature, max_tokens, etc.)
- Handling both streaming and non-streaming execution modes
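The kind of workflow that gets registered, and what message conversion amounts to, is sketched below. The chat model is just an example; any LangChain-compatible model (or a plain Python function) can serve as a node.

```python
# Example of a compiled workflow plus the OpenAI -> LangChain message mapping.
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, MessagesState, START, END

llm = ChatOpenAI(model="gpt-4o-mini")  # example model choice


def call_model(state: MessagesState) -> dict:
    # MessagesState carries a running list of LangChain messages.
    return {"messages": [llm.invoke(state["messages"])]}


builder = StateGraph(MessagesState)
builder.add_node("call_model", call_model)
builder.add_edge(START, "call_model")
builder.add_edge("call_model", END)
graph = builder.compile()  # the compiled graph is what gets registered

# Roughly what "message conversion" means: OpenAI dicts -> LangChain messages.
openai_messages = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hi!"},
]
lc_messages = [
    SystemMessage(content=m["content"]) if m["role"] == "system" else HumanMessage(content=m["content"])
    for m in openai_messages
]
```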
Next Steps¶
- Read about integration with LangGraph for more details on how LangGraph workflows are executed
- Learn about OpenAI API compatibility to understand how the API matches OpenAI's interface