llm_eval

A suite of LLM evaluation experiments: benchmarking and other tests.

For verifiable domains: guided outputs (constrained decoding against a fixed answer format, such as multiple choice or a JSON schema).
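A minimal sketch of the multiple-choice case of guided output, assuming the model exposes next-token log-probabilities as a plain dictionary. The function name `constrained_choice` and the example scores are hypothetical illustrations, not part of this repository. The idea is to mask every token outside the allowed answer set and renormalize over what remains, so the prediction is always one of the valid options.

```python
from __future__ import annotations

import math


def constrained_choice(
    logprobs: dict[str, float], allowed: list[str]
) -> tuple[str, dict[str, float]]:
    """Restrict a next-token distribution to `allowed` and pick the best option.

    `logprobs` maps candidate tokens to log-probabilities (hypothetical input
    format). Tokens outside `allowed` are ignored; allowed options that the
    model did not score are treated as having probability zero.
    """
    neg_inf = float("-inf")
    # Mask: keep only the allowed options, defaulting unscored ones to -inf.
    masked = {opt: logprobs.get(opt, neg_inf) for opt in allowed}

    finite = [v for v in masked.values() if v != neg_inf]
    if not finite:
        raise ValueError("none of the allowed options has a score")

    # Renormalize over the allowed set with a numerically stable log-sum-exp.
    peak = max(finite)
    log_z = peak + math.log(sum(math.exp(v - peak) for v in finite))
    probs = {
        opt: (math.exp(v - log_z) if v != neg_inf else 0.0)
        for opt, v in masked.items()
    }

    best = max(probs, key=probs.get)
    return best, probs


# Example: the unconstrained favourite ("Sure") is not a valid answer,
# so the constrained prediction falls back to the best allowed letter.
scores = {"A": -2.0, "B": -0.5, "C": -3.0, "Sure": -0.1}
best, probs = constrained_choice(scores, ["A", "B", "C", "D"])
print(best)  # B
```

Because the output is guaranteed to be one of the allowed options, it can be compared to a gold label by exact match, which is what makes the domain verifiable. A JSON-schema constraint follows the same masking principle but applies it at every decoding step rather than to a single token.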