rajshah4/LLM-Evaluation

Resources for Evaluation of LLMs / Generative AI / RAG

This repository includes the slides and some of the notebooks that are used in my Evaluation and RAG workshops.

Some of the notebooks do require an OpenAI API key.

These notebooks are intended to illustrate key points of the talk; please don't try to bring them to production use. If you want to dig deeper or have issues, go to the source for each of these projects. Updated with my October 2025 MLOps workshop.

About the workshop

image

Conference Presentations

Generative AI Summit, Austin (Oct 2023) - Slides

ODSC West, San Francisco (Nov 2023) - Slides

Arize Holiday Conference (Dec 2023) - Slides

Data Innovation Conference (Apr 2024) - Slides

ODSC East, Boston (May 2025) - Slides

MLOps Generative AI Summit, Austin (Oct 2025) - Slides

Notebook links

Testing Properties of a System: Guidance AI

Langtest tutorials from John Snow Labs: Colab Notebooks

LLM Evaluation Harness from EleutherAI: GitHub or Colab notebook

Ragas showing a model as an evaluator: GitHub or Colab notebook

Ragas using LangFuse: Colab notebook

Evaluate LLMs and RAG, a practical example using LangChain and Hugging Face: GitHub

MLflow Automated Evaluation: Blog

LLM Grader on AWS: Video and Notebook

LLM AutoEval for RunPod by Maxime Labonne: Colab

Agno and Langfuse with a Research Agent: GitHub

Building resilient prompts using an evaluation flywheel: OpenAI

Videos

Evaluation for Large Language Models and Generative AI - A Deep Dive - YouTube

Constructing an Evaluation Approach for Generative AI Models - YouTube

Large Language Models (LLMs) Can Explain Their Predictions - YouTube & Slides

Practical Lessons in Building Generative AI: RAG and Text to SQL - YouTube

Unit Testing for Natural Language (LLMs) + LMUnit model - YouTube

Additional Resources

Josh Tobin's Evaluation talk: YouTube

Awesome-LLMOps

LLM Evaluation Tooling Review

Your AI Product Needs Evals

About

Sample notebooks and prompts for LLM evaluation
