-
-
Notifications
You must be signed in to change notification settings - Fork 5.4k
Description
How are you running AnythingLLM?
Windows Installer (local)
What happened?
So, I've been trying to get agents to work with a local model. I've tried a few: Llama 3.2 Q8, Llama 3.2 Q6 Abliterated, Qwen2.5 32B Q4_K_M. If these models aren't going to figure it out, then any local model on a 24gb card isn't going to work.
I'm using LM Studio to run the models, and I was looking at the logs.
On a brand new chat I said @agent please list documents
This is all the logs from that session:
2024-10-06 00:25:09 [INFO]
Received POST request to /v1/chat/completions with body: {
"model": "qwen2.5-32b-instruct-abliterated-pass2",
"temperature": 0,
"messages": [
{
"content": "You are a program which picks the most optimal function and parameters to call.\n DO NOT HAVE TO PICK A FUNCTION IF IT WILL NOT HELP ANSWER OR FULFILL THE USER'S QUERY.\n When a function is selection, respond in JSON with no additional text.\n When there is no relevant function to call - return with a regular chat text response.\n Your task is to pick a **single** function that we will use to call, if any seem useful or relevant for the user query.\n\n All JSON responses should have two keys.\n 'name': this is the name of the function name to call. eg: 'web-scraper', 'rag-memory', etc..\n 'arguments': this is an object with the function properties to invoke the function.\n DO NOT INCLUDE ANY OTHER KEYS IN JSON RESPONSES.\n\n Here are the available tools you can use an examples of a query and response so you can understand how each one works.\n -----------\nFunction name: rag-memory\nFunction Description: Search against local documents for context that is relevant to the query or store a snippet of text into memory for retrieval later. Storing information should only be done when the user specifically requests for information to be remembered or saved to long-term memory. You should use this tool before search the internet for information. Do not use this tool unless you are explicity told to 'remember' or 'store' information.\nFunction parameters in JSON format:\n{\n \"action\": {\n \"type\": \"string\",\n \"enum\": [\n \"search\",\n \"store\"\n ],\n \"description\": \"The action we want to take to search for existing similar context or storage of new context.\"\n },\n \"content\": {\n \"type\": \"string\",\n \"description\": \"The plain text to search our local documents with or to store in our vector database.\"\n }\n}\nQuery: \"What is AnythingLLM?\"\nJSON: {\"action\":\"search\",\"content\":\"What is AnythingLLM?\"}\nQuery: \"What do you know about Plato's motives?\"\nJSON: {\"action\":\"search\",\"content\":\"What are the facts about Plato's motives?\"}\nQuery: \"Remember that you are a robot\"\nJSON: {\"action\":\"store\",\"content\":\"I am a robot, the user told me that i am.\"}\nQuery: \"Save that to memory please.\"\nJSON: {\"action\":\"store\",\"content\":\"<insert summary of conversation until now>\"}\n-----------\n-----------\nFunction name: document-summarizer\nFunction Description: Can get the list of files available to search with descriptions and can select a single file to open and summarize.\nFunction parameters in JSON format:\n{\n \"action\": {\n \"type\": \"string\",\n \"enum\": [\n \"list\",\n \"summarize\"\n ],\n \"description\": \"The action to take. 'list' will return all files available with their filename and descriptions. 'summarize' will open and summarize the file by the a document name.\"\n },\n \"document_filename\": {\n \"type\": \"string\",\n \"x-nullable\": true,\n \"description\": \"The file name of the document you want to get the full content of.\"\n }\n}\nQuery: \"Summarize example.txt\"\nJSON: {\"action\":\"summarize\",\"document_filename\":\"example.txt\"}\nQuery: \"What files can you see?\"\nJSON: {\"action\":\"list\",\"document_filename\":null}\nQuery: \"Tell me about readme.md\"\nJSON: {\"action\":\"summarize\",\"document_filename\":\"readme.md\"}\n-----------\n-----------\nFunction name: web-scraping\nFunction Description: Scrapes the content of a webpage or online resource from a provided URL.\nFunction parameters in JSON format:\n{\n \"url\": {\n \"type\": \"string\",\n \"format\": \"uri\",\n \"description\": \"A complete web address URL including protocol. Assumes https if not provided.\"\n }\n}\nQuery: \"What is anythingllm.com about?\"\nJSON: {\"url\":\"https://anythingllm.com\"}\nQuery: \"Scrape https://example.com\"\nJSON: {\"url\":\"https://example.com\"}\n-----------\n-----------\nFunction name: web-browsing\nFunction Description: Searches for a given query using a search engine to get better results for the user query.\nFunction parameters in JSON format:\n{\n \"query\": {\n \"type\": \"string\",\n \"description\": \"A search query.\"\n }\n}\nQuery: \"Who won the world series today?\"\nJSON: {\"query\":\"Winner of today's world series\"}\nQuery: \"What is AnythingLLM?\"\nJSON: {\"query\":\"AnythingLLM\"}\nQuery: \"Current AAPL stock price\"\nJSON: {\"query\":\"AAPL stock price today\"}\n-----------\n\n\n Now pick a function if there is an appropriate one to use given the last user message and the given conversation so far.",
"role": "system"
},
{
"content": "@agent please list documents",
"role": "user"
}
]
}
2024-10-06 00:25:09 [INFO] [LM STUDIO SERVER] Running chat completion on conversation with 2 messages.
2024-10-06 00:25:10 [INFO] [LM STUDIO SERVER] Accumulating tokens ... (stream = false)
2024-10-06 00:25:10 [INFO]
[LM STUDIO SERVER] [llama-3.2-3b-instruct] Generated prediction: {
"id": "chatcmpl-5r5c0mt7twxl2gxh7dv7r",
"object": "chat.completion",
"created": 1728199509,
"model": "llama-3.2-3b-instruct",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "{\"action\":\"list\",\"document_filename\":null}"
},
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 1014,
"completion_tokens": 10,
"total_tokens": 1024
},
"system_fingerprint": "llama-3.2-3b-instruct"
}
2024-10-06 00:25:10 [INFO] [LM STUDIO SERVER] Client disconnected. Stopping generation..
2024-10-06 00:25:10 [INFO]
Received POST request to /v1/chat/completions with body: {
"model": "qwen2.5-32b-instruct-abliterated-pass2",
"messages": [
{
"content": "You are a helpful ai assistant who can assist the user and use tools available to help answer the users prompts and questions. Tools will be handled by another assistant and you will simply receive their responses to help answer the user prompt - always try to answer the user's prompt the best you can with the context available to you and your general knowledge.",
"role": "system"
},
{
"content": "@agent please list documents",
"role": "user"
}
]
}
2024-10-06 00:25:10 [INFO] [LM STUDIO SERVER] Running chat completion on conversation with 2 messages.
2024-10-06 00:25:10 [INFO] [LM STUDIO SERVER] Accumulating tokens ... (stream = false)
2024-10-06 00:25:11 [INFO]
[LM STUDIO SERVER] [llama-3.2-3b-instruct] Generated prediction: {
"id": "chatcmpl-bp3dfvnepj6jsve66k8m",
"object": "chat.completion",
"created": 1728199510,
"model": "llama-3.2-3b-instruct",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "I don't have direct access to your files or documents, but I can guide you on how to find them.\n\nCould you please specify which type of documents you are looking for (e.g., Word documents, PDFs, Excel spreadsheets)? Also, where did you save the documents (on a computer, in cloud storage, etc.)?\n\nIf you're using Google Drive, OneDrive, or Dropbox, I can provide instructions on how to find and list your documents."
},
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 88,
"completion_tokens": 95,
"total_tokens": 183
},
"system_fingerprint": "llama-3.2-3b-instruct"
}
2024-10-06 00:25:11 [INFO] [LM STUDIO SERVER] Client disconnected. Stopping generation..
Looking at the instructions and the model is responding exactly as it's told, exactly as all the examples given. But all the examples given are wrong.
The system message gives this example:
Query: "What files can you see?"
JSON: {"action":"list","document_filename":null}
And the model returned this:
"content": "{\"action\":\"list\",\"document_filename\":null}"
I consulted Claude about it and it said the correct response was actually:
{
"name": "document-summarizer",
"arguments": {
"action": "list",
"document_filename": null
}
}
And when I explicitly told the model to use that and it did work. I asked Claude how it knew to do that, and it pointed out the "All JSON responses should have two keys" part, but there's not a single example that actually is provided with that, and it wasn't even clear to me. So I think the agents could work much better on smaller models if that was clear.
If the examples were:
Query: "What files can you see?"
JSON: {"name":"document-summarizer","arguments":{"action":"list","document_filename":null}}
then it would probably work.
Are there known steps to reproduce?
No response
EDIT: note: In all the transactions the logs show both qwen & llama, all these examples came from llama, but it was the same with qwen. Anything just didn't update the model names when I loaded or unloaded in LM Studio.