Architecture Overview¶
This document provides a high-level overview of the LangGraph OpenAI Serve architecture, explaining how the different components work together to provide an OpenAI-compatible API for LangGraph workflows.
System Architecture¶
LangGraph OpenAI Serve consists of several key components:
- FastAPI Application: The web server that handles HTTP requests
- LangchainOpenaiApiServe: The core class that bridges LangGraph and the API
- Graph Registry: A registry that manages LangGraph instances
- API Routers: FastAPI routers for different API endpoints
- Schema Models: Pydantic models for data validation and serialization
Architecture Diagram¶
┌─────────────────────────────────────────────────────────┐
│                      HTTP Clients                        │
│     (OpenAI Python SDK, JavaScript SDK, curl, etc.)      │
└────────────────────────────┬────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                   FastAPI Application                    │
│                                                          │
│  ┌─────────────────┐   ┌─────────────────────────┐      │
│  │  Models Router  │   │ Chat Completions Router │      │
│  │   /v1/models    │   │  /v1/chat/completions   │      │
│  └────────┬────────┘   └──────────┬──────────────┘      │
│           │                       │                     │
│  ┌────────▼───────────────────────▼──────────────────┐  │
│  │              LangchainOpenaiApiServe              │  │
│  │                                                    │  │
│  │ ┌─────────────────────────────────────────────┐   │  │
│  │ │               Graph Registry                │   │  │
│  │ │                                             │   │  │
│  │ │ ┌───────────┐ ┌───────────┐ ┌───────────┐   │   │  │
│  │ │ │  Graph 1  │ │  Graph 2  │ │  Graph N  │   │   │  │
│  │ │ └───────────┘ └───────────┘ └───────────┘   │   │  │
│  │ └─────────────────────────────────────────────┘   │  │
│  └────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                   LangGraph Workflows                    │
└─────────────────────────────────────────────────────────┘
Component Details¶
FastAPI Application¶
The FastAPI application serves as the web server that handles HTTP requests. It can be:
- Created automatically by LangchainOpenaiApiServe
- Provided by the user when they want to integrate LangGraph OpenAI Serve with an existing FastAPI application (both options are sketched below)
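The following sketch illustrates both options. It is illustrative only: the LangchainOpenaiApiServe class name comes from this project, but the import path, the constructor keywords (app=, graphs=) and the serve.app attribute are assumptions made for the example.

```python
# Illustrative wiring only: the constructor keywords and the `app` attribute
# below are assumptions based on this overview, not a verbatim API reference.
from fastapi import FastAPI
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph_openai_serve import LangchainOpenaiApiServe  # assumed import path

# A trivial compiled workflow to register (see "Integration with LangGraph").
builder = StateGraph(MessagesState)
builder.add_node("noop", lambda state: {"messages": []})
builder.add_edge(START, "noop")
builder.add_edge("noop", END)
workflow = builder.compile()

# Option 1: attach to an existing FastAPI application.
app = FastAPI(title="My existing service")
serve = LangchainOpenaiApiServe(app=app, graphs={"my-workflow": workflow})

# Option 2: let LangchainOpenaiApiServe create the application for you.
serve = LangchainOpenaiApiServe(graphs={"my-workflow": workflow})
app = serve.app  # assumed attribute exposing the generated application
```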
LangchainOpenaiApiServe¶
This is the core class that connects LangGraph workflows with the OpenAI-compatible API. Its responsibilities include:
- Managing the FastAPI application
- Registering and managing LangGraph instances
- Providing routers for different API endpoints
- Handling CORS configuration when needed (see the CORS sketch below)
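In FastAPI terms, CORS handling usually comes down to adding the standard CORSMiddleware. Whether LangchainOpenaiApiServe wires this up exactly as shown, or exposes a dedicated option for it, is an assumption; the middleware call itself is plain FastAPI.

```python
# What "handling CORS configuration" typically looks like in FastAPI. Whether
# LangchainOpenaiApiServe applies exactly this middleware is an assumption.
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],   # tighten to known origins in production
    allow_methods=["*"],
    allow_headers=["*"],
)
```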
Graph Registry¶
The Graph Registry maintains a mapping between model names and LangGraph instances. When a request comes in for a specific model, the registry looks up the corresponding LangGraph workflow to execute. The registry allows:
- Registering multiple graphs with different names
- Retrieving graphs by name
- Listing available graphs (see the illustrative sketch below)
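A minimal stand-in for the registry is sketched below. The class and method names are hypothetical; only the responsibilities (register, look up, list) mirror the list above.

```python
# Hypothetical stand-in for the Graph Registry; not the library's actual class.
from langchain_core.runnables import Runnable


class SimpleGraphRegistry:
    def __init__(self) -> None:
        self._graphs: dict[str, Runnable] = {}

    def register(self, name: str, graph: Runnable) -> None:
        """Register a compiled LangGraph workflow under a model name."""
        self._graphs[name] = graph

    def get(self, name: str) -> Runnable:
        """Look up the workflow that should serve a given model name."""
        try:
            return self._graphs[name]
        except KeyError:
            raise KeyError(f"Unknown model: {name!r}") from None

    def list_models(self) -> list[str]:
        """Model names to report from the /v1/models endpoint."""
        return sorted(self._graphs)
```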
API Routers¶
LangGraph OpenAI Serve provides several FastAPI routers:
- Models Router: Handles the /v1/models endpoint to list available LangGraph workflows
- Chat Completions Router: Handles the /v1/chat/completions endpoint for chat interactions
- Health Router: Provides a health check endpoint at /health
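Because these routers mirror OpenAI's endpoints, any OpenAI client can call them. The example below uses the official OpenAI Python SDK and assumes the server runs locally on port 8000 with a graph registered as my-workflow.

```python
# Calling the routers with the OpenAI Python SDK; base URL, API key and model
# name are placeholders for your own deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Models Router: GET /v1/models lists the registered LangGraph workflows.
for model in client.models.list():
    print(model.id)

# Chat Completions Router: POST /v1/chat/completions runs the matching graph.
response = client.chat.completions.create(
    model="my-workflow",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```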
Schema Models¶
Pydantic models are used for data validation and serialization. These include:
- Request Models: Define the structure of API requests
- Response Models: Define the structure of API responses
- OpenAI Compatible Models: Models that match OpenAI's API schema (see the request example below)
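A simplified sketch of what the request side of these models can look like. The field names follow OpenAI's public chat completions schema; the exact class names and full field lists in this project may differ.

```python
# Simplified request schema sketch; field names follow OpenAI's chat
# completions API, but this is not the library's exact model definition.
from pydantic import BaseModel


class ChatMessage(BaseModel):
    role: str        # "system", "user" or "assistant"
    content: str


class ChatCompletionRequest(BaseModel):
    model: str                        # which registered graph to run
    messages: list[ChatMessage]
    temperature: float | None = None
    max_tokens: int | None = None
    stream: bool = False
```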
Request Flow¶
When a client makes a request to the API, the following sequence of events occurs:
1. Client Request: A client (like the OpenAI Python SDK) sends a request to the API
2. FastAPI Router: The appropriate router handles the request based on the endpoint
3. Request Validation: Pydantic models validate the request data
4. Graph Selection: The system looks up the requested LangGraph workflow in the registry
5. Graph Execution: The LangGraph workflow is executed with the provided messages
6. Response Formatting: The result is formatted according to the OpenAI API schema
7. Client Response: The response is sent back to the client
Example Flow for Chat Completion¶
Client Request (POST /v1/chat/completions)
│
▼
FastAPI Chat Router
│
▼
Request Validation (ChatCompletionRequest)
│
▼
Graph Selection (get_graph_for_model)
│
▼
Message Conversion (convert_to_lc_messages)
│
▼
Graph Execution (graph.ainvoke or graph.astream_events)
│
▼
Response Formatting
│
▼
Client Response
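Condensed into code, the non-streaming path looks roughly like the sketch below. get_graph_for_model and convert_to_lc_messages are the helpers named in the flow above (their signatures here are assumed), ChatCompletionRequest is the schema sketched earlier, and format_openai_response is a hypothetical stand-in for the formatting step.

```python
# Rough, non-streaming sketch of the flow above; helper signatures are assumed.
from fastapi import APIRouter

router = APIRouter()


@router.post("/v1/chat/completions")
async def chat_completions(request: ChatCompletionRequest):      # request validation
    graph = get_graph_for_model(request.model)                   # graph selection
    lc_messages = convert_to_lc_messages(request.messages)       # message conversion
    result = await graph.ainvoke({"messages": lc_messages})      # graph execution
    return format_openai_response(result, model=request.model)   # response formatting
```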
Streaming vs. Non-Streaming¶
LangGraph OpenAI Serve supports both streaming and non-streaming responses:
Non-Streaming Mode¶
In non-streaming mode:

1. The entire LangGraph workflow is executed
2. The final result is collected
3. A single response is returned to the client
Streaming Mode¶
In streaming mode:

1. The LangGraph workflow is executed with streaming enabled
2. Events from the workflow are captured in real-time
3. Each chunk of generated content is immediately sent to the client
4. The client receives and processes chunks as they arrive
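A minimal sketch of how the streaming path can be wired with FastAPI's StreamingResponse and the astream_events API. Filtering on on_chat_model_stream events is the usual way to pick up model tokens; the exact chunk schema the library emits is simplified here.

```python
# Minimal streaming sketch: forward model tokens to the client as SSE chunks.
import json

from fastapi.responses import StreamingResponse


async def stream_chat(graph, lc_messages, model_name: str) -> StreamingResponse:
    async def event_stream():
        async for event in graph.astream_events({"messages": lc_messages}, version="v2"):
            if event["event"] == "on_chat_model_stream":
                token = event["data"]["chunk"].content
                chunk = {
                    "object": "chat.completion.chunk",
                    "model": model_name,
                    "choices": [{"index": 0, "delta": {"content": token}}],
                }
                yield f"data: {json.dumps(chunk)}\n\n"
        yield "data: [DONE]\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")
```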
Integration with LangGraph¶
LangGraph OpenAI Serve integrates with LangGraph by:
- Accepting compiled LangGraph workflows, i.e. the output of graph.compile() (see the sketch below)
- Converting between OpenAI message formats and LangChain message formats
- Executing workflows with appropriate parameters (temperature, max_tokens, etc.)
- Handling both streaming and non-streaming execution modes
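The kind of workflow that gets registered, and what message conversion amounts to, is sketched below. The chat model is just an example; any LangChain-compatible model (or a plain Python function) can serve as a node.

```python
# Example of a compiled workflow plus the OpenAI -> LangChain message mapping.
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, MessagesState, START, END

llm = ChatOpenAI(model="gpt-4o-mini")  # example model choice


def call_model(state: MessagesState) -> dict:
    # MessagesState carries a running list of LangChain messages.
    return {"messages": [llm.invoke(state["messages"])]}


builder = StateGraph(MessagesState)
builder.add_node("call_model", call_model)
builder.add_edge(START, "call_model")
builder.add_edge("call_model", END)
graph = builder.compile()  # the compiled graph is what gets registered

# Roughly what "message conversion" means: OpenAI dicts -> LangChain messages.
openai_messages = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hi!"},
]
lc_messages = [
    SystemMessage(content=m["content"]) if m["role"] == "system" else HumanMessage(content=m["content"])
    for m in openai_messages
]
```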
Next Steps¶
- Read about integration with LangGraph for more details on how LangGraph workflows are executed
- Learn about OpenAI API compatibility to understand how the API matches OpenAI's interface