Feature request
I already opened a PR for this, but thought I should file this issue anyway.
Currently, `PromptEmbedding`'s random initialization uses `torch.nn.Embedding` to initialize its embeddings. However, different models have different embedding spaces, each with its own manifold of well-defined vocab embeddings, so this naive initialization is highly unlikely to produce embeddings that land on that manifold, and empirically this leads to very poor learning. In my testing, naive random initialization reduced accuracy by almost a factor of 3.
To help visualize, here's a PCA of Llama 3.1 8B's vocab embeddings vs. naively randomly initialized embeddings
vs. embeddings initialized by randomly sampling vocab tokens instead (the suggested fix/feature).
I implemented this as a new initialization option called `RANDOM_DISCRETE`. The changes should be backward compatible.
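For illustration, here's a minimal sketch of the idea (not the exact PR code): copy randomly sampled rows of the base model's input embedding matrix into the prompt embedding so the virtual tokens start on the vocab manifold. The function name `init_prompt_embedding_from_vocab` and the toy sizes are just placeholders for this example.

```python
import torch

def init_prompt_embedding_from_vocab(word_embeddings: torch.nn.Embedding,
                                      num_virtual_tokens: int) -> torch.nn.Embedding:
    """Initialize a prompt embedding by copying randomly sampled vocab rows,
    so the virtual tokens start inside the model's embedding distribution."""
    vocab_size, hidden_dim = word_embeddings.weight.shape
    # Sample random token ids (with replacement) from the vocabulary.
    sampled_ids = torch.randint(0, vocab_size, (num_virtual_tokens,))
    prompt_embedding = torch.nn.Embedding(num_virtual_tokens, hidden_dim)
    with torch.no_grad():
        # Overwrite the default init with the sampled vocab embeddings.
        prompt_embedding.weight.copy_(word_embeddings.weight[sampled_ids])
    return prompt_embedding

# Toy embedding table standing in for the base model's vocab embeddings.
base_embeddings = torch.nn.Embedding(32000, 4096)
prompt_embedding = init_prompt_embedding_from_vocab(base_embeddings, num_virtual_tokens=20)
```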
Your contribution
Here's my PR:
#2815