File Uploads & RAG
Upload files to Hanzo Chat for context-aware responses using Retrieval-Augmented Generation (RAG).
Hanzo Chat supports file uploads with Retrieval-Augmented Generation (RAG) for context-aware responses. Upload documents, code, images, and data files — the system indexes them and retrieves relevant context for every message.
Supported File Types
| Category | Formats |
|---|---|
| Documents | PDF, DOCX, TXT, MD, RTF |
| Data | CSV, JSON, JSONL, XML, YAML |
| Code | Python, JavaScript, TypeScript, Go, Rust, Java, C/C++, and more |
| Images | PNG, JPG, GIF, WebP, SVG (for vision models) |
| Archives | ZIP (extracted and indexed) |
How It Works
- Upload — drag and drop files into the chat, or use the attachment button
- Indexing — the RAG API chunks files, creates embeddings, and stores them in PGVector
- Retrieval — when you send a message, relevant chunks are found via semantic search
- Generation — retrieved context is appended to your prompt for accurate, grounded responses
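The chunking step above can be sketched as a sliding window over the document text. This is a minimal illustration using character-based chunks with the default `CHUNK_SIZE`/`CHUNK_OVERLAP` values; the actual RAG API may split on token or sentence boundaries instead:

```python
def chunk_text(text: str, chunk_size: int = 1500, overlap: int = 100) -> list[str]:
    """Split text into overlapping character chunks (sliding window)."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap  # each chunk starts `overlap` chars before the previous one ends
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

# A 3000-character document yields chunks starting at 0, 1400, and 2800,
# with each consecutive pair sharing a 100-character overlap.
chunks = chunk_text("x" * 3000)
print(len(chunks))  # → 3
```

The overlap keeps sentences that straddle a chunk boundary retrievable from either side, at the cost of slightly more storage per document.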
Configuration
Docker Setup
RAG is included in the default Docker Compose setup. The rag-api service handles indexing and retrieval:
```yaml
# compose.yml (included by default)
services:
  rag:
    image: ghcr.io/hanzoai/chat-rag-api:latest
    environment:
      - RAG_OPENAI_API_KEY=${RAG_OPENAI_API_KEY:-${OPENAI_API_KEY}}
      - EMBEDDINGS_PROVIDER=${EMBEDDINGS_PROVIDER:-openai}
      - EMBEDDINGS_MODEL=${EMBEDDINGS_MODEL:-text-embedding-3-small}
    depends_on:
      - pgvector
  pgvector:
    image: pgvector/pgvector:pg16
    volumes:
      - pgvector_data:/var/lib/postgresql/data
```
Environment Variables
```bash
# Embeddings provider (required)
RAG_OPENAI_API_KEY=sk-...    # Or use HANZO_API_KEY with Hanzo embeddings

# Provider selection
EMBEDDINGS_PROVIDER=openai   # openai, azure, huggingface, huggingfacetei, ollama
EMBEDDINGS_MODEL=text-embedding-3-small

# Chunking parameters
CHUNK_SIZE=1500              # Characters per chunk
CHUNK_OVERLAP=100            # Overlap between chunks

# Database
RAG_PORT=8000
COLLECTION_NAME=hanzo_chat
POSTGRES_DB=rag
POSTGRES_USER=rag
POSTGRES_PASSWORD=your-password
```
Using Hanzo Embeddings
To use Zen embeddings instead of OpenAI:
```bash
RAG_OPENAI_API_KEY=${HANZO_API_KEY}
RAG_OPENAI_BASEURL=https://api.hanzo.ai/v1
EMBEDDINGS_MODEL=zen3-embedding
```
Usage Tips
Resend Files Toggle
By default, uploaded files are queried on every message in the conversation. To control when retrieval happens:
- Open conversation settings
- Toggle Resend Files to off
- Files are only queried when explicitly attached to a message
This is useful for long conversations where you want targeted file queries.
File Management
- Side Panel — view and manage all uploaded files
- Reuse files — attach previously uploaded files to any new conversation
- Storage types — "Host" storage (for RAG) vs "OpenAI" storage (for Assistants)
Best Practices
- Be specific — craft prompts that reference the file content directly
- Chunk size — increase `CHUNK_SIZE` for technical docs, decrease for short Q&A content
- Multiple files — upload related files together for cross-document queries
- File limits — configure per-endpoint in `chat.yaml`:
```yaml
fileConfig:
  endpoints:
    hanzo:
      fileLimit: 10            # Max files per message
      fileSizeLimit: 25        # MB per file
    default:
      totalSizeLimit: 100      # MB total per conversation
  serverFileSizeLimit: 100     # MB server-wide limit
  avatarSizeLimit: 2           # MB for avatars
```
What is RAG?
RAG (Retrieval-Augmented Generation) enhances LLM responses by grounding them in your uploaded documents:
- Retrieval — finds relevant chunks from your files using semantic search
- Augmentation — appends the retrieved context to your prompt
- Generation — the LLM uses both its training and your documents to respond
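The retrieval and augmentation steps can be sketched end to end. This is a toy illustration: the bag-of-words "embedding" stands in for a real embedding model, and the prompt template is invented for the example — the RAG API's actual retrieval uses PGVector and learned embeddings:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Indexing: embed each chunk once, up front
chunks = [
    "Invoices are due within 30 days of receipt.",
    "The deployment runs on Kubernetes in us-east-1.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# Retrieval: find the chunk most similar to the query
query = "when are invoices due"
q = embed(query)
best = max(index, key=lambda item: cosine(q, item[1]))[0]

# Augmentation: prepend the retrieved context to the prompt
prompt = f"Context:\n{best}\n\nQuestion: {query}"
print(best)  # → Invoices are due within 30 days of receipt.
```

The same shape scales up in production: chunks are embedded once at upload time, and only the query is embedded per message, so retrieval stays cheap relative to generation.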
Benefits:
- Accurate answers grounded in your specific documents
- Reduced hallucination about content in your files
- Up-to-date — works with any documents you upload, regardless of model training cutoff
- Private — your files stay on your infrastructure (self-hosted) or Hanzo's servers (hosted)