Scale Your AI Infrastructure

Intelligent load balancing, model orchestration, API control, and API request conformance to transform multiple AI inference and embedding instances into a unified, high-availability fabric

MIT Licensed · Free & Open Source · .NET 8.0

Why OllamaFlow?

🎯 Multiple Virtual Endpoints

Create multiple virtual Ollama or OpenAI-compatible endpoints, each mapping to a set of backend instances running Ollama, vLLM, SharpAI, and more!

⚖️ Smart Load Balancing

Distribute requests intelligently across healthy backends using round-robin or random strategies

🔄 Automatic Model Sync

Ensure all Ollama backends have required models with automatic discovery and parallel downloads

🔒 Security and Compliance

Control how your hardware resources are used and enforce API request conformance so you can scale with confidence

❤️ Health Monitoring

Real-time health checks with configurable thresholds ensure requests only go to healthy backends

📊 Reduce Downtime

Handle backend failures seamlessly: health checks detect outages, and requests are automatically proxied to healthy endpoints

Compatibility with Ollama and OpenAI APIs

OllamaFlow translates between Ollama and OpenAI API requests based on what each backend supports, while adding intelligent routing, high availability, and management

Ollama APIs

/api/generate - Text generation
/api/chat - Chat completions
/api/pull - Model pulling
/api/push - Model pushing
/api/show - Model information
/api/tags - List models
/api/ps - Running models
/api/embed - Generate embeddings
/api/delete - Delete models
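
For example, a text generation request uses the standard Ollama request shape; a minimal sketch, assuming OllamaFlow is listening on port 43411 (as in the Docker quickstart below) and that a frontend exposes a model named llama3 (substitute your own):

# Generate text through the OllamaFlow frontend (Ollama API format)
curl http://localhost:43411/api/generate \
  -d '{"model": "llama3", "prompt": "Why is the sky blue?", "stream": false}'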

OpenAI APIs

/v1/completions - Completions
/v1/chat/completions - Chat completions
/v1/embeddings - Generate embeddings
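
The same frontend can accept OpenAI-format requests; a minimal sketch under the same assumptions (port 43411, a model named llama3):

# Chat completion through the OllamaFlow frontend (OpenAI API format)
curl http://localhost:43411/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Why is the sky blue?"}]}'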

Supported Backends

OllamaFlow works seamlessly with multiple AI inference platforms:

Ollama - Local LLM inference
vLLM - High-performance inference
SharpAI - .NET local inference
OpenAI - Cloud AI services
Any OpenAI-Compatible API - Flexible integration

Security & Control

Fine-grained security controls ensure AI endpoints are used in accordance with your objectives, so you can scale with confidence

🔒 API Restrictions

Enable or disable the embeddings and completions APIs, and enforce request parameter compliance: models, temperature, context size, and more

🎛️ Label-Based Control

Route requests to specific backends based on incoming labels, whether to comply with regulation or to target systems with specific attributes

🛡️ Multi-Tenant Isolation

Configure multiple frontends with different security policies to safely serve different teams or customers

API Explorer

Test and Validate Your APIs

The OllamaFlow API Explorer is a browser-based tool for testing, debugging, and evaluating AI inference APIs.

  • Support for Ollama and OpenAI API formats
  • Real-time API testing with streaming support (see the example after this list)
  • JSON syntax validation
  • Response body and header inspection
  • Load testing capabilities
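
Streaming can also be exercised from the command line; a minimal sketch, assuming the default port from the quickstart below and a model named llama3 (stream is set explicitly here for clarity):

# Stream a chat response incrementally (-N disables curl's output buffering)
curl -N http://localhost:43411/api/chat \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello!"}], "stream": true}'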

Key Features

Load Balancing

  • Round-robin and random distribution
  • Request routing based on health
  • Automatic failover
  • Configurable rate limiting

Ollama Model Management

  • Automatic model discovery
  • Intelligent synchronization
  • Dynamic model requirements
  • Parallel downloads
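
Model synchronization is automatic, but the proxied /api/pull endpoint (see the API list above) also lets you trigger a pull through the frontend explicitly; a sketch, assuming the default port and an illustrative model name:

# Request a model pull through the frontend (Ollama API format)
curl http://localhost:43411/api/pull \
  -d '{"model": "llama3"}'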

Enterprise Ready

  • Bearer token authentication (example after this list)
  • Comprehensive logging
  • Docker & Compose ready
  • SQLite persistence
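
When a frontend is configured to require authentication, requests carry a standard bearer token header; a sketch with a placeholder token and an illustrative embedding model name:

# Authenticated request (replace <your-token> with a configured token)
curl http://localhost:43411/v1/embeddings \
  -H "Authorization: Bearer <your-token>" \
  -H "Content-Type: application/json" \
  -d '{"model": "all-minilm", "input": "hello world"}'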

Use Cases

Scalable Inference

Perfect for GPU systems and dense CPU systems such as those powered by Ampere processors

GPU Cluster Management

Distribute AI workloads across multiple GPU servers for maximum performance and utilization

High Availability

Ensure your AI services stay online 24/7 with automatic failover and health monitoring

Development & Testing

Easily switch between different model configurations and test various deployment scenarios

Cost Optimization

Maximize hardware utilization across your infrastructure by intelligently routing requests

Multi-Tenant Scenarios

Isolate workloads while sharing infrastructure through multiple frontend configurations

Get Started in Minutes

Using Docker

# Pull the image
docker pull jchristn/ollamaflow:v1.1.0

# Run with configuration
docker run -d -p 43411:43411 \
  -v $(pwd)/ollamaflow.json:/app/ollamaflow.json \
  -v $(pwd)/ollamaflow.db:/app/ollamaflow.db \
  jchristn/ollamaflow:v1.1.0
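
Once the container is up, a quick smoke test lists the models visible through the proxy (assuming a frontend is mapped to the Ollama API paths):

# List models through the OllamaFlow frontend
curl http://localhost:43411/api/tags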

Using .NET

# Clone and build
git clone https://github.com/ollamaflow/ollamaflow.git
cd ollamaflow/src && dotnet build

# Run
cd OllamaFlow.Server/bin/Debug/net8.0
dotnet OllamaFlow.Server.dll

Complete Postman collection included for easy API testing!