Added ragas to compute string metrics for evaluation. #1039
Conversation
A couple of small refactoring requests.
if not result.result:
    console.print("[yellow] No query execution result available, skipping..", style="italic")

# Evaluate string metrics
if not query.expected:
this function is large enough now that it might make sense to separate the specific metric types into separate methods (doc_retrieval metrics, generated_answer metrics)
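A minimal sketch of the kind of split being suggested (the function names and the metrics container below are illustrative, not the actual driver code):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Metrics:
    # Illustrative container; the real driver defines its own metrics model.
    doc_retrieval_precision: Optional[float] = None
    doc_retrieval_recall: Optional[float] = None
    generated_answer_similarity: Optional[float] = None


def compute_doc_retrieval_metrics(retrieved_docs: list[str], expected_docs: list[str], metrics: Metrics) -> Metrics:
    # Set-based precision/recall over retrieved vs. expected document ids.
    retrieved, expected = set(retrieved_docs), set(expected_docs)
    if retrieved:
        metrics.doc_retrieval_precision = len(retrieved & expected) / len(retrieved)
    if expected:
        metrics.doc_retrieval_recall = len(retrieved & expected) / len(expected)
    return metrics


def compute_generated_answer_metrics(response: str, reference: str, metrics: Metrics) -> Metrics:
    # String-similarity scoring of the generated answer (e.g. via Ragas) would live here.
    metrics.generated_answer_similarity = float(response.strip() == reference.strip())  # placeholder
    return metrics
```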
I have made these changes.
apps/query-eval/queryeval/driver.py (outdated)
    response=result.result,
    reference=query.expected,
)
scores = asyncio.run(
Do we really need to make compute_string_metrics an async method? Can we keep it sync and have it wait for the processing internally?
This adds a little complexity and I'm not sure why it's needed.
I looked into this for a while as well. The Ragas method that computes the score is async, which is why we have to do this. Do you have any suggestions?
You can do a variant of https://chatgpt.com/share/67466a08-1488-8007-b4fa-2d2538c74071
Basically, do the waiting for async completion inside the compute_string_metrics method.
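A minimal sketch of that suggestion, assuming the Ragas 0.2-style SingleTurnSample / single_turn_ascore API; the metric choice and the function signature here are illustrative, not the PR's actual code:

```python
import asyncio

from ragas.dataset_schema import SingleTurnSample
from ragas.metrics import RougeScore  # illustrative string metric


def compute_string_metrics(response: str, reference: str) -> float:
    # Synchronous from the caller's point of view: the async Ragas scoring
    # call is driven to completion here with asyncio.run, so do_eval never
    # has to deal with the event loop.
    sample = SingleTurnSample(response=response, reference=reference)
    metric = RougeScore()
    return asyncio.run(metric.single_turn_ascore(sample))
```

One caveat: asyncio.run raises if it is called from inside an already-running event loop, so this pattern only works while do_eval itself stays synchronous.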
There will be no waiting as such, because do_eval runs one sample at a time. The Ragas authors made it an async function to handle large workloads.
Regardless, I have made the changes.
Right, I just mean in terms of the function APIs.
apps/query-eval/queryeval/driver.py (outdated)
    metrics.doc_retrieval_precision = len(retrieved_doc_set & expected_doc_set) / len(retrieved_doc_set)
    return metrics

def get_string_metrics(
nit: generated_answer_metrics