- Stanford
- Stanford, CA
- sayands.github.io
- @debsarkar_sayan
- @sayandsarkar.bsky.social
Stars
[CVPR 2025, Highlight] CrossOver: 3D Scene Cross-Modal Alignment
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
🚀 Lightning-fast computer vision models. Fine-tune SOTA models with just a few lines of code. Ready for cloud ☁️ and edge 📱 deployment.
[NeurIPS 2025, Spotlight] Rectified Point Flow: Generic Point Cloud Pose Estimation
Code for "ReSpace: Text-Driven 3D Indoor Scene Synthesis and Editing with Preference Alignment"
[RSS 2025] ROMAN: a view-invariant global localization method that matches objects from different robot views for reliable pose estimation even when a scene is observed from opposite views
A collection of onboarding diagrams for different projects online
Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation" (CVPR'25 Spotlight).
[ICCV 2023] SGAligner: 3D Scene Alignment with Scene Graphs
Code for the paper: "No Zero-Shot Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance" [NeurIPS'24]
[ICCV 2025 Oral] SceneSplat - Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining
PyTorchGeoNodes is a PyTorch module for differentiable shape programs / procedural models in forms of graphs. It can automatically translate Blender geometry node models into PyTorch code. Original…
A Framework for Open-Vocabulary Object Retrieval and Drawer Manipulation in Point Clouds
[CVPR 2025] WildGS-SLAM: Monocular Gaussian Splatting SLAM in Dynamic Environments
Spurfies: Sparse Surface Reconstruction using Local Geometry Priors
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
🟣 Computer Vision interview questions and answers to help you prepare for your next machine learning and data science interview in 2025.
[3DV 2025, Oral] LoopSplat: Loop Closure by Registering 3D Gaussian Splats
Repository for WACV23 paper "Automatically Annotating Indoor Images with CAD Models via RGB-D Scans"
SG-PGM: Partial Graph Matching Network with Semantic Geometric Fusion for 3D Scene Graph Alignment and Its Downstream Tasks
Official implementation of ECCV24 paper "SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding"
PyViz3D is a web-based visualizer for 3D objects and point clouds.
projectaria_tools is a C++/Python open-source toolkit to interact with Project Aria data
A quick guide (especially) for trending instruction finetuning datasets