add vision api support for assistants #80

Closed
wants to merge 2 commits into from
ghost commented Dec 7, 2024

Proof of it working:

image

image

I got it working by following this JSON request example from the OpenAI community forum, which shows what a vision API request should look like when using assistants:

https://community.openai.com/t/api-calls-to-v1-threads-thread-id-messages-fails-with-file-and-file-ids/613781/7

Example code:

// openai
private OpenAiService service; // declared here because it is assigned below
private AssistantRequest assistantRequest;
private Assistant assistant;
private String assistantId;
private ThreadRequest threadRequest;
private com.theokanning.openai.assistants.thread.Thread thread;
private String threadId;
service = new OpenAiService(
        "API_KEY",
        Duration.ofSeconds(30));
assistantRequest = AssistantRequest.builder()
        .model("gpt-4-turbo")
        // add file search tool to assistant
        .tools(Collections.singletonList(new FileSearchTool()))
        .instructions("You are a helpful assistant.").temperature(0D)
        .build();
assistant = service.createAssistant(assistantRequest);
assistantId = assistant.getId();
System.out.println("assistantId:" + assistantId);
threadRequest = ThreadRequest.builder().build();
thread = service.createThread(threadRequest);
threadId = thread.getId();
System.out.println("threadId:" + threadId);
if (label.equalsIgnoreCase("chatgpt")) {
    if (args.length == 0) {
        sender.sendMessage('\r' + "Unknown translation key: command.unknown.argument");
        sender.sendMessage('\r' + label + " " + String.join(" ", args)
                + "Unknown translation key: command.context.here");
        return;
    } else if (args[0].startsWith("https://") && args.length >= 2) {
        String fn = args[0].substring(args[0].lastIndexOf("/") + 1,
                args[0].length());
        System.out.println(fn.substring(fn.lastIndexOf(".") + 1));
        // lower-case the extension so that e.g. ".PNG" is also treated as an image
        switch (fn.substring(fn.lastIndexOf(".") + 1).toLowerCase()) {
        case "jpg":
        case "jpeg":
        case "jpe":
        case "jif":
        case "jfif":
        case "jfi":
        case "png":
        case "gif":
        case "webp":
        case "tiff":
        case "tif":
        case "bmp":
        case "dib":
        case "heif":
        case "heic": {
            List<String> fileIds = new ArrayList<>();
            int i = 0;
            for (String arg : args) {
                if (arg.startsWith("https://")) {
                    i++;
                    fn = arg.substring(arg.lastIndexOf("/") + 1,
                            arg.length());
                    File outFile = new File(
                            System.getProperty("java.io.tmpdir"),
                            "openai4j");
                    outFile.mkdirs();
                    outFile = new File(outFile, fn);
                    System.out.println(fn);
                    byte[] b = null;
                    try {
                        if (!outFile.exists())
                            b = downloadFile(arg, outFile);
                    } catch (IOException e) {
                        // ClientProtocolException is an IOException, so one handler covers both.
                        // Skip this URL rather than uploading a file that was never written.
                        e.printStackTrace();
                        continue;
                    }
                    com.theokanning.openai.file.File file = service
                            .uploadFile("vision", outFile.getPath());
                    // get resource

                    String fileId = file.getId();
                    System.out.println("fileId:" + fileId);
                    fileIds.add(fileId);
                }
            }
            List<ImageContent> imageContents = new ArrayList<>();
            for (String fileId : fileIds) {
                imageContents.add(new ImageContent(
                        new ImageFile(fileId)));
            }
            MessageRequest messageRequest = MessageRequest
                    .builder()
                    // query user to search file
                    .content(
                            String.join(" ", Arrays.copyOfRange(args,
                                    i, args.length)))
                    .content(imageContents).build();
            // add msg to thread
            Message msg = service.createMessage(threadId,
                    messageRequest);

            int before = msg.getCreatedAt();

            // run
            RunCreateRequest runCreateRequest = RunCreateRequest
                    .builder().assistantId(assistantId)
                    .toolChoice(ToolChoice.AUTO).build();
            Run run = service.createRun(threadId, runCreateRequest);
            String runId = run.getId();

            do {
                run = service.retrieveRun(threadId, runId);
            } while (!(run.getStatus().equals("completed"))
                    && !(run.getStatus().equals("failed")));

            List<RunStep> runSteps = service.listRunSteps(threadId,
                    runId, new ListSearchParameters()).getData();

            for (RunStep runStep : runSteps) {
                System.out.println(runStep.getStepDetails());
            }
            service.listMessages(threadId,
                    new MessageListSearchParameters())
                    .getData()
                    .forEach(
                            message -> {
                                if (message.getCreatedAt() > before)
                                    sender.sendMessage('\r' + message
                                            .getContent().get(0)
                                            .getText().getValue());
                            });
            return;
        }
        default: {
            List<String> fileIds = new ArrayList<>();
            int i = 0;
            for (String arg : args) {
                if (arg.startsWith("https://")) {
                    i++;
                    fn = arg.substring(arg.lastIndexOf("/") + 1,
                            arg.length());
                    File outFile = new File(
                            System.getProperty("java.io.tmpdir"),
                            "openai4j");
                    outFile.mkdirs();
                    outFile = new File(outFile, fn);
                    System.out.println(fn);
                    byte[] b = null;
                    try {
                        if (!outFile.exists())
                            b = downloadFile(arg, outFile);
                    } catch (IOException e) {
                        // ClientProtocolException is an IOException, so one handler covers both.
                        // Skip this URL rather than uploading a file that was never written.
                        e.printStackTrace();
                        continue;
                    }
                    com.theokanning.openai.file.File file = service
                            .uploadFile("assistants", outFile.getPath());
                    // get resource

                    String fileId = file.getId();
                    System.out.println("fileId:" + fileId);
                    fileIds.add(fileId);
                }
            }
            List<Attachment> attachments = new ArrayList<>();
            for (String fileId : fileIds) {
                attachments.add(new Attachment(fileId, Collections
                        .singletonList(new FileSearchTool())));
            }
            MessageRequest messageRequest = MessageRequest
                    .builder()
                    // query user to search file
                    .content(
                            String.join(" ", Arrays.copyOfRange(args,
                                    i, args.length)))
                    .attachments(attachments).build();
            // add msg to thread
            Message msg = service.createMessage(threadId,
                    messageRequest);

            int before = msg.getCreatedAt();

            // run
            RunCreateRequest runCreateRequest = RunCreateRequest
                    .builder().assistantId(assistantId)
                    .toolChoice(ToolChoice.AUTO).build();
            Run run = service.createRun(threadId, runCreateRequest);
            String runId = run.getId();

            do {
                run = service.retrieveRun(threadId, runId);
            } while (!(run.getStatus().equals("completed"))
                    && !(run.getStatus().equals("failed")));

            List<RunStep> runSteps = service.listRunSteps(threadId,
                    runId, new ListSearchParameters()).getData();

            for (RunStep runStep : runSteps) {
                System.out.println(runStep.getStepDetails());
            }
            service.listMessages(threadId,
                    new MessageListSearchParameters())
                    .getData()
                    .forEach(
                            message -> {
                                if (message.getCreatedAt() > before)
                                    sender.sendMessage('\r' + message
                                            .getContent().get(0)
                                            .getText().getValue());
                            });
            return;
        }
        }
    }
    String input = String.join(" ", args);
    MessageRequest messageRequest = MessageRequest.builder()
    // query user to search file
            .content(input).build();
    // add msg to thread
    Message msg = service.createMessage(threadId, messageRequest);

    int before = msg.getCreatedAt();

    // run
    RunCreateRequest runCreateRequest = RunCreateRequest.builder()
            .assistantId(assistantId).toolChoice(ToolChoice.AUTO)
            .build();
    Run run = service.createRun(threadId, runCreateRequest);
    String runId = run.getId();

    do {
        run = service.retrieveRun(threadId, runId);
    } while (!(run.getStatus().equals("completed"))
            && !(run.getStatus().equals("failed")));

    List<RunStep> runSteps = service.listRunSteps(threadId, runId,
            new ListSearchParameters()).getData();

    for (RunStep runStep : runSteps) {
        System.out.println(runStep.getStepDetails());
    }
    service.listMessages(threadId, new MessageListSearchParameters())
            .getData()
            .forEach(
                    message -> {
                        if (message.getCreatedAt() > before)
                            sender.sendMessage('\r' + message
                                    .getContent().get(0).getText()
                                    .getValue());
                    });
}
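
A note on the polling loops in the example above: each one calls retrieveRun in a tight loop with no delay and only stops on "completed" or "failed", so a run that ends up cancelled or expired would spin forever. Below is a minimal sketch of a bounded poll with a pause between attempts. RunPoller, pollUntilTerminal, and the fake status source are hypothetical names for illustration and are not part of the library; the set of terminal statuses is an assumption based on the run states documented for the Assistants API.

```java
import java.time.Duration;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical helper: polls a status source until it reports a terminal state. */
final class RunPoller {
    // Assumed terminal run states; "requires_action" is deliberately not listed,
    // so a run waiting on tool output ends in the timeout below.
    static final Set<String> TERMINAL =
            Set.of("completed", "failed", "cancelled", "expired", "incomplete");

    static String pollUntilTerminal(Supplier<String> statusSource,
                                    Duration interval, int maxAttempts) {
        String status = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            status = statusSource.get();
            if (TERMINAL.contains(status)) {
                return status;
            }
            try {
                // Pause between polls instead of hammering the API.
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while polling", e);
            }
        }
        throw new IllegalStateException(
                "run still " + status + " after " + maxAttempts + " polls");
    }

    public static void main(String[] args) {
        // Fake status source standing in for
        // service.retrieveRun(threadId, runId).getStatus().
        Iterator<String> fake =
                List.of("queued", "in_progress", "completed").iterator();
        System.out.println(pollUntilTerminal(fake::next, Duration.ofMillis(10), 10));
    }
}
```

In the example code, the supplier would presumably be something like () -> service.retrieveRun(threadId, runId).getStatus(), with an interval of a second or so; the right interval and attempt limit depend on the workload.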

Lambdua (Owner) commented Dec 9, 2024

This part of the logic will be merged into MultiMediaContent, and it is recommended to use MultiMediaContent in the future. ImageContent will be removed in subsequent versions.

ghost (Author) commented Dec 9, 2024

As it stands right now, if you call messageRequest = MessageRequest.builder().content(imageContents).content(input).build(); the content gets overwritten by the text input variable, so GPT does not see the image you uploaded. To prevent user error, I deprecated that method. Instead, add the text input to the imageContents list, for example imageContents.add(new ImageContent(input)); where input is a String. GPT will then see both your text input and the images you uploaded.
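
To make the overwrite concrete, here is a small self-contained sketch. It does not use the library's real classes: FakeMessageBuilder and textAndImages are hypothetical stand-ins that assume both content(...) overloads write to a single field, which is the behaviour described in the comment above. The "text:" and "image_file:" string prefixes are likewise just placeholders for real content parts.

```java
import java.util.ArrayList;
import java.util.List;

final class ContentOverwriteDemo {
    /** Hypothetical stand-in for a builder whose content(...) overloads share one field. */
    static final class FakeMessageBuilder {
        private Object content;

        FakeMessageBuilder content(String text) {
            this.content = text;   // replaces whatever was set before
            return this;
        }

        FakeMessageBuilder content(List<String> parts) {
            this.content = parts;  // replaces whatever was set before
            return this;
        }

        Object build() {
            return content;
        }
    }

    /** Puts the text and every image reference into one list, so nothing is dropped. */
    static List<String> textAndImages(String text, List<String> fileIds) {
        List<String> parts = new ArrayList<>();
        parts.add("text:" + text);
        for (String id : fileIds) {
            parts.add("image_file:" + id);
        }
        return parts;
    }

    public static void main(String[] args) {
        List<String> images = List.of("image_file:file-abc");

        // Last call wins: only the text survives, the image reference is gone.
        Object lost = new FakeMessageBuilder()
                .content(images)
                .content("What is in this picture?")
                .build();
        System.out.println(lost);

        // Single call with a combined list keeps both the text and the image.
        Object kept = new FakeMessageBuilder()
                .content(textAndImages("What is in this picture?", List.of("file-abc")))
                .build();
        System.out.println(kept);
    }
}
```

The design point is simply that a single content(...) call carrying one combined list leaves no second call that could clobber the first.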

ghost (Author) commented Dec 9, 2024

image

Lambdua (Owner) commented Dec 9, 2024

Haha, I'll improve the builder method to handle this.

Lambdua (Owner) commented Dec 9, 2024

You can take a look at my most recent commits. Are the relevant methods in the current MessageRequest.builder().build() sufficient for daily use? If no changes are needed, I will release the next version soon.

ghost (Author) commented Dec 9, 2024

> You can take a look at my most recent commits. Are the relevant methods in the current MessageRequest.builder().build() sufficient for daily use? If no changes are needed, I will release the next version soon.

Assuming everything works as intended, it looks good. Thank you.

Lambdua closed this Dec 10, 2024