LLM Collaboration for Code Generation
Updated Feb 17, 2026 - Python
Benchmark suite for evaluating LLMs and SLMs on coding and software-engineering tasks. Features HumanEval, MBPP, SWE-bench, and BigCodeBench with an interactive Streamlit UI. Supports cloud APIs (OpenAI, Anthropic, Google) and local models via Ollama. Tracks pass rates, latency, token usage, and costs.
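A minimal sketch of what one cloud-API evaluation pass might look like, assuming the OpenAI Python SDK (v1+); the task list, prompt wording, and the `passes` helper are illustrative placeholders, not the repository's actual interface:

```python
import time

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One HumanEval-style task: a function stub plus a hidden unit test (placeholder data).
TASKS = [
    {
        "prompt": 'def add(a, b):\n    """Return the sum of a and b."""\n',
        "test": "assert add(2, 3) == 5",
    },
]

def passes(candidate: str, test: str) -> bool:
    """Run the candidate plus its test. In-process exec() is for brevity only;
    a real harness isolates this step (see the subprocess sketch further down)."""
    try:
        exec(candidate + "\n" + test, {})
        return True
    except Exception:
        return False

n_passed, latencies, total_tokens = 0, [], 0
for task in TASKS:
    t0 = time.time()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": "Complete this function. Return only code.\n\n" + task["prompt"]}],
    )
    latencies.append(time.time() - t0)
    total_tokens += resp.usage.total_tokens          # token usage per call
    n_passed += passes(resp.choices[0].message.content, task["test"])

print(f"pass rate {n_passed / len(TASKS):.0%} | "
      f"avg latency {sum(latencies) / len(latencies):.2f}s | tokens {total_tokens}")
```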
🧪 Automated LLM coding benchmarks with Ollama - HumanEval & MBPP evaluation suite with safe execution, comprehensive logging, and detailed analysis tools
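A rough sketch of the local-model path with safer execution, assuming the `ollama` Python client and a running Ollama server; the model name, prompt, and `run_candidate` helper are hypothetical, not this project's actual API:

```python
import subprocess
import sys
import tempfile

import ollama

def run_candidate(code: str, test: str, timeout: float = 10.0) -> bool:
    """'Safe-ish' execution: run the candidate plus its test in a separate
    Python process with a hard timeout, instead of exec() in-process."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n\n" + test + "\n")
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path], capture_output=True, timeout=timeout
        )
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False

prompt = (
    "Write a Python function `is_palindrome(s)` that returns True if s reads "
    "the same forwards and backwards. Return only the code."
)
response = ollama.chat(
    model="qwen2.5-coder:7b",  # any locally pulled model works here
    messages=[{"role": "user", "content": prompt}],
)
candidate = response["message"]["content"]
print(run_candidate(candidate, "assert is_palindrome('racecar')"))
```

A subprocess with a timeout guards against infinite loops and crashes in generated code, though a production harness would typically add resource limits or containerization on top.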
Fine-tuning CodeT5 for Python code generation on the MBPP dataset. Features custom TensorFlow training loops, mixed precision, XLA optimization, and distributed multi-GPU strategies.
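A sketch of the training-loop ingredients named above (mixed precision, XLA, a distributed strategy), assuming TensorFlow 2.x with the Keras 2 API that Transformers' TF models target; the checkpoint name, hyperparameters, and batch layout are placeholders, and loading `Salesforce/codet5-base` into TF may require `from_pt=True` with PyTorch installed, since the published weights are PyTorch:

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

tf.keras.mixed_precision.set_global_policy("mixed_float16")  # fp16 compute, fp32 master weights

strategy = tf.distribute.MirroredStrategy()  # data-parallel across all visible GPUs
with strategy.scope():
    tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5-base")
    model = TFAutoModelForSeq2SeqLM.from_pretrained("Salesforce/codet5-base", from_pt=True)
    optimizer = tf.keras.mixed_precision.LossScaleOptimizer(tf.keras.optimizers.Adam(2e-5))

@tf.function(jit_compile=True)  # XLA-compile the per-replica step
def train_step(batch):
    with tf.GradientTape() as tape:
        outputs = model(
            input_ids=batch["input_ids"],
            attention_mask=batch["attention_mask"],
            labels=batch["labels"],
            training=True,
        )
        loss = tf.reduce_mean(outputs.loss)
        scaled_loss = optimizer.get_scaled_loss(loss)  # scale to avoid fp16 underflow
    scaled_grads = tape.gradient(scaled_loss, model.trainable_variables)
    grads = optimizer.get_unscaled_gradients(scaled_grads)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# In the full loop, each MBPP (prompt, solution) pair would be tokenized into
# input_ids/labels, batched into a tf.data.Dataset, distributed with
# strategy.experimental_distribute_dataset, and fed through strategy.run(train_step, ...).
```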