Frequently Asked Questions

What is EnviroLLM?

EnviroLLM gives you the tools to track and optimize resource usage when running models on your own hardware.

What can I do with it?

  • Monitor CPU, memory, GPU, and power usage in real time
  • Benchmark inference speed and measure tokens per second
  • Compare energy consumption across different models and quantizations
  • Test models with task-specific prompts (code generation, analysis, creative writing, etc.)
  • Get automatic model recommendations based on quality, speed, and efficiency
  • View interactive visualizations comparing energy vs. speed tradeoffs
  • Export benchmark data to CSV for further analysis

How do I start?

Run one command (no installation needed):

npx envirollm start

Then visit the dashboard to see your metrics in real time!

Requirements: Node.js and Python 3.7+

What technology stack does EnviroLLM use?

  • Frontend: Next.js, React, TypeScript, Tailwind CSS
  • Backend: Python, FastAPI, PyTorch/TensorFlow
  • CLI: Node.js, TypeScript
  • Deployment: Vercel (Frontend), Railway (Backend)

Which LLM tools does it work with?

The CLI automatically detects the most popular local LLM setups:

  • Ollama
  • LLaMA/LlamaCPP
  • Python scripts
  • Text Generation WebUI
  • KoboldCPP
  • Oobabooga
  • LM Studio
  • GPT4All

Can I contribute?

Absolutely! Everything's available on GitHub.

How do I benchmark models?

Via Web Interface:

  1. Start the backend: npx envirollm start
  2. Visit envirollm.com/optimize
  3. Click "Run Benchmark"
  4. Choose a task preset or write a custom prompt
  5. Select models to compare
  6. View results with energy/speed charts and quality scores

Via CLI:

npx envirollm benchmark --models llama3:8b,phi3:mini

What are task presets?

Task presets are pre-written prompts designed to test different workload types. We provide 7 presets:

  • Explanation: General knowledge and concept explanation
  • Code Generation: Programming tasks with documentation
  • Summarization: Concise information synthesis
  • Long-form Writing: Extended content (travel guides, articles)
  • Analytical Writing: Critical analysis and comparison
  • Data Analysis: SQL queries and technical problems
  • Creative Writing: Fiction and narrative generation

These help you test how models perform across different use cases and enable reproducible comparisons.
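
If you script your own benchmarks, presets are easy to mirror as a simple prompt table. The sketch below is illustrative only; the prompt texts are examples, not the presets EnviroLLM actually ships:

    # Illustrative preset table; these prompts are examples, not the
    # actual presets shipped with EnviroLLM.
    TASK_PRESETS = {
        "explanation": "Explain how DNS resolution works, step by step.",
        "code_generation": "Write a documented Python function that merges two sorted lists.",
        "summarization": "Summarize the following article in three sentences: ...",
        "long_form_writing": "Write a 600-word travel guide to Lisbon.",
        "analytical_writing": "Compare SQL and NoSQL databases for an early-stage product.",
        "data_analysis": "Write a SQL query returning the top 5 customers by revenue.",
        "creative_writing": "Write a short story about a lighthouse keeper who finds a map.",
    }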

Where is my data stored?

All benchmark results are stored locally on your machine at:

~/.envirollm/benchmarks.db

Your data never leaves your machine. You can:

  • Export to CSV via the web interface
  • Clean all data with: npx envirollm clean
  • View the SQLite database directly with any SQLite viewer (see the example below)
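
For example, Python's built-in sqlite3 module can inspect the database directly. The table list is discovered at runtime; the commented-out query uses a hypothetical table name, since the actual schema may differ:

    import sqlite3
    from pathlib import Path

    # Open the local benchmark database read-only.
    db_path = Path.home() / ".envirollm" / "benchmarks.db"
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)

    # List the actual tables first; the schema is not documented here.
    for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type='table'"):
        print(name)

    # Hypothetical query -- substitute a real table name from the output above:
    # for row in conn.execute("SELECT * FROM benchmarks LIMIT 5"):
    #     print(row)
    conn.close()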

How accurate are the energy measurements?

EnviroLLM uses a simplified power estimation model:

  • Base power: 50W (system idle)
  • CPU contribution: CPU usage × 2W
  • GPU power: Direct measurement via NVIDIA APIs (when available)

This provides relative comparisons between models rather than absolute values. The measurements are consistent enough to identify which models are more efficient and track trends over time.

For research-grade accuracy, consider using specialized hardware power meters.
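
In code, the model boils down to a one-line formula. The sketch below is a rough reconstruction, not EnviroLLM's actual implementation; it uses psutil for the CPU reading, and treating "CPU usage" as a 0-100 percentage is an assumption:

    import psutil

    BASE_POWER_W = 50.0      # system idle, per the model above
    CPU_W_PER_PERCENT = 2.0  # "CPU usage x 2W"

    def estimate_power_watts(gpu_power_w: float = 0.0) -> float:
        """Rough instantaneous draw under the simplified model above.

        gpu_power_w: a direct GPU reading (e.g. via NVIDIA's NVML)
        when available, otherwise 0.
        """
        cpu_percent = psutil.cpu_percent(interval=0.5)  # system-wide, 0-100
        return BASE_POWER_W + cpu_percent * CPU_W_PER_PERCENT + gpu_power_w

Energy over a run is then the average of these samples multiplied by the elapsed time in seconds (watt-seconds are joules).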

What's LLM-as-a-Judge?

LLM-as-a-Judge uses another LLM (by default, gemma3:1b running locally) to evaluate response quality on a 0-100 scale.

This helps you assess quality-efficiency tradeoffs: a faster, more energy-efficient model might produce lower-quality responses.

When it's used:

  • Automatically enabled when gemma3:1b is available
  • Falls back to heuristic scoring (word count, diversity, structure) if not available (see the sketch below)
  • Results are marked with a [J] badge in the UI
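
As a rough illustration, a heuristic fallback of that shape could look like the sketch below; the features and weights here are examples, not the scorer EnviroLLM actually ships:

    def heuristic_quality_score(text: str) -> float:
        """Illustrative 0-100 score from word count, diversity, and structure."""
        words = text.split()
        if not words:
            return 0.0
        # Length: saturates at roughly 200 words.
        length = min(len(words) / 200.0, 1.0)
        # Diversity: share of unique words.
        diversity = len({w.lower() for w in words}) / len(words)
        # Structure: reward paragraph breaks, list markers, and code fences.
        structure = min(
            sum(text.count(tok) for tok in ("\n\n", "\n- ", "```")) / 3.0, 1.0
        )
        return 100.0 * (0.4 * length + 0.3 * diversity + 0.3 * structure)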

Can I benchmark quantization differences?

Yes! This is one of the most useful features. Compare Q4, Q8, and FP16 quantizations of the same model:

  1. Pull the quantization variants you want to test:
    ollama pull llama3:8b
    ollama pull llama3:8b-q8
    ollama pull llama3:8b-fp16
  2. Select all variants in the benchmark interface
  3. Run the same prompt on all of them
  4. Compare energy consumption, speed, and quality scores

Research shows quantization can reduce energy by up to 45% while maintaining acceptable quality for many tasks.
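
If you export your results to CSV, a few lines of Python make the comparison concrete. The file name and column names below are assumptions about the export format; check your own header row and adjust:

    import csv

    # File and column names are assumptions; match them to your actual export.
    with open("benchmarks.csv", newline="") as f:
        rows = list(csv.DictReader(f))

    for row in rows:
        # 1 Wh = 3600 J, so this is joules per generated token (lower is better).
        j_per_token = float(row["energy_wh"]) * 3600 / float(row["tokens"])
        print(f"{row['model']}: {j_per_token:.2f} J/token")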

Why build this?

LLMs are a fascinating technology to me, but running them locally can be a black box. I wanted to build a tool that gives users visibility into, and control over, the environmental impact of their AI experiments. Since I can't influence cloud-based inference, focusing on local setups felt like a good way to contribute to more sustainable AI practices.