Code Extraction
CodeDox supports both LLM-powered and traditional extraction methods to identify and extract code snippets from documentation.
How It Works
LLM Extraction (Recommended)
- Content Cleaning: Crawl4AI removes navigation, ads, and irrelevant content
- LLM Analysis: AI analyzes the content for code blocks
- Metadata Extraction: Automatically detects:
- Programming language
- Code purpose and description
- Related dependencies
- Framework context
Traditional Extraction (Non-LLM)
CodeDox also supports basic extraction without LLMs: 1. Content Cleaning: Crawl4AI removes navigation, ads, and irrelevant content 2. Pattern Matching: Uses syntax highlighting and code block detection 3. Basic Metadata: Extracts language and basic structure information
Note: While traditional extraction works without LLMs, LLM extraction provides significantly more detailed titles, descriptions, and contextual understanding of the code.
Configuration
Local Models (Free - Recommended)
For cost-free extraction, use local models via Ollama or LM Studio:
# Using Ollama
# Configure in .env
CODE_LLM_API_KEY=ollama
CODE_LLM_PROVIDER=ollama
CODE_LLM_EXTRACTION_MODEL=Qwen3-Coder-30B-A3B-Instruct
CODE_LLM_BASE_URL=http://localhost:11434/v1
Recommended Local Models
- Qwen3-Coder-30B-A3B-Instruct: Excellent code understanding (best results)
- Qwen3-4B-Instruct-2507: Lightweight but effective
Cloud Models
For OpenAI or other cloud providers:
CODE_LLM_API_KEY=sk-...
CODE_LLM_PROVIDER=openai # or anthropic, groq
CODE_LLM_EXTRACTION_MODEL=gpt-4o-mini # Fast and affordable
Cost Optimization
- Content Hash Deduplication: Skips LLM calls for unchanged content during re-crawls
- Local Models: Zero API costs with excellent results using Qwen3 2507 non thinking models
Manual Upload
You can also upload markdown files directly: