I think the setup script in the SayCan Robot Pick-and-Place tabletop environment notebook loads pre-trained ViLD weights:
https://github.com/google-research/google-research/blob/master/saycan/SayCan-Robot-Pick-Place.ipynb

> # ViLD pretrained model weights.
> !gsutil cp -r gs://cloud-tpu-checkpoints/detection/projects/vild/colab/image_path_v2 ./

Were the weights at this path pre-trained by Google Research? I ask because I could not find any weights named `image_path_v2` in the official ViLD GitHub repository:
https://github.com/tensorflow/tpu/tree/master/models/official/detection/projects/vild
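
For context, here is a small sketch of how the bucket can be inspected directly with `gsutil ls` to see what actually ships under that path; the bucket URL is taken from the notebook, and the comment about the directory layout is my assumption, not something stated in the notebook:

```bash
# List what sits alongside the SayCan checkpoint in the public bucket.
gsutil ls gs://cloud-tpu-checkpoints/detection/projects/vild/colab/

# Recursively list the image_path_v2 directory itself; I would expect a
# TensorFlow SavedModel layout (saved_model.pb plus variables/), since the
# notebook loads it with tf.saved_model, but that is an assumption.
gsutil ls -r gs://cloud-tpu-checkpoints/detection/projects/vild/colab/image_path_v2/
```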