- NEC Laboratories Europe
- Heidelberg, Germany
- @kgashteo
Stars
A library for making RepE control vectors
The official repo for the Dialz Python library - a toolkit for steering vector research.
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models know themselves through automated interpretability.
[ICLR 2025] This is the official implementation for the paper: "Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation"
Large language model and dataset for natural language to first-order logic translation
Awesome things about LLM-powered agents. Papers / Repos / Blogs / ...
This repo was created for the bachelor's project at VU - AI.
ACL2023 - AlignScore, a metric for factual consistency evaluation.
Prefix-Tuning: Optimizing Continuous Prompts for Generation
Repository for "Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators"
This is the repository of our EMNLP 2024 Main conference paper "Revealing Personality Traits: A New Benchmark Dataset for Explainable Personality Recognition on Dialogues".
Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning"
Apache Wayang (incubating) is the first cross-platform data processing system.
Simple language-driven navigation tasks for studying compositional learning
From Chain-of-Thought prompting to OpenAI o1 and DeepSeek-R1 🍓
Codebase for "DAVE: Diagnostic benchmark for Audio Visual Evaluation" (NeurIPS 2025 Datasets & Benchmarks)
Systematic evaluation framework that automatically rates overthinking behavior in large language models.
Toolkit for linearizing PDFs for LLM datasets/training
Parsers for clinical trials data from clinicaltrials.gov
Fully open reproduction of DeepSeek-R1
A blueprint for AI development, focusing on applied examples of RAG, information extraction, analysis and fine-tuning in the age of LLMs and agents.
A course on aligning smol models.
Benchmarking Benchmark Leakage in Large Language Models