Joy Caption is a ComfyUI node using the LLaVA model to generate stylized image captions, supporting batch processing and GGUF models.
Updated Aug 29, 2025 · Python
Source code for paper "Multi-Classification In-Vehicle Intrusion Detection System using Packet- and Sequence-Level Characteristics from Time-Embedded Transformer with Autoencoder"
This repository contains code for generating captions for images using a Transformer-based model. The model used is the `VisionEncoderDecoderModel` from the Hugging Face Transformers library, specifically the `nlpconnect/vit-gpt2-image-captioning` model.
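A minimal sketch of how captioning with this checkpoint might look, assuming `transformers`, `torch`, and `Pillow` are installed. The helper names (`caption_images`, `clean_captions`) and the decoding settings in `GEN_KWARGS` are illustrative assumptions, not taken from the repository.

```python
# Sketch only: helper names and decoding settings are assumptions for illustration.
MODEL_NAME = "nlpconnect/vit-gpt2-image-captioning"
GEN_KWARGS = {"max_length": 16, "num_beams": 4}  # assumed beam-search settings


def clean_captions(captions):
    """Strip surrounding whitespace from each decoded caption."""
    return [c.strip() for c in captions]


def caption_images(image_paths):
    """Return one caption per image path using the ViT + GPT-2 checkpoint."""
    # Heavy imports are kept local so the module can be imported without them.
    import torch
    from PIL import Image
    from transformers import (
        AutoTokenizer,
        ViTImageProcessor,
        VisionEncoderDecoderModel,
    )

    model = VisionEncoderDecoderModel.from_pretrained(MODEL_NAME)
    processor = ViTImageProcessor.from_pretrained(MODEL_NAME)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model.eval()

    # The image processor expects RGB images and returns a batch of pixel tensors.
    images = [Image.open(p).convert("RGB") for p in image_paths]
    pixel_values = processor(images=images, return_tensors="pt").pixel_values

    with torch.no_grad():
        output_ids = model.generate(pixel_values, **GEN_KWARGS)

    # Decode token ids back to text, dropping special tokens such as end-of-text.
    captions = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
    return clean_captions(captions)
```

Calling `caption_images(["photo.jpg"])` (a hypothetical local file) downloads the checkpoint on first use and returns a one-element list of caption strings.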
A next-word prediction project that uses Transformers and GPT-2 for text generation. GPTTokenizer preprocesses the input, and the model is fine-tuned. Evaluation measures accuracy, perplexity, and fluency.
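Two of the metrics named above can be sketched in plain Python. This is an illustrative sketch, not the project's evaluation code: it assumes the per-token natural-log probabilities of the reference words are already available from the model, and the function names `perplexity` and `top1_accuracy` are hypothetical.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood), using natural logs."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def top1_accuracy(predicted, reference):
    """Fraction of positions where the predicted next word equals the reference."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("need at least one prediction")
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)
```

A model that assigns probability 0.25 to every reference token has a perplexity of about 4, which matches the intuition that it is as uncertain as a uniform choice among four words.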
Incredibly fast Whisper-large-v3 with speaker diarization
Speech-to-Text (STT) with Whisper