Today we're excited to announce Nomic Embed Text V2, our next-generation embedding model that brings the Mixture of Experts (MoE) architecture to text embeddings and is trained on a new, expanded multilingual dataset.
Building on our previous embedding model Nomic Embed Text V1, this release advances the performance and efficiency of Nomic Embed while maintaining our commitment to open source.
Highlights about Nomic Embed Text V2:
• The first embedding model based on a Mixture-of-Experts (MoE) architecture
• Trained on dozens of languages to support multilingual applications (see image below)
• Strong performance on the BEIR and MIRACL benchmarks relative to models of its parameter class
• We're open sourcing the pretraining data, finetuning data, and training code
Nomic Embed Text V2 joins our series of embedding models generally available for production workloads through the Nomic Atlas Embedding API, and is enterprise-ready via our fully secure and compliant Nomic Atlas Enterprise offering.
On the BEIR and MIRACL benchmarks, Nomic Embed Text V2 demonstrates strong performance against current state-of-the-art multilingual embedding models. Our model achieves competitive results while maintaining a significantly smaller parameter footprint through its MoE architecture.
We trained Nomic Embed Text V2 with an MoE architecture to get a model with strong performance that is also faster and lighter on memory during training and inference, because only a fraction of its parameters are active for any given input.
Unlike a dense model, which applies all of its parameters to every input, an MoE model dynamically routes each input to different "experts" - sparse subsets of parameters at each layer - ideally activating only the parameters needed to process that input. This allows compute to be used more efficiently when generating embeddings.
In our experiments, we found that alternating MoE layers with 8 experts and top-2 routing provides the optimal balance between performance and efficiency. This results in 475M total parameters in the model, but only 305M active during training and inference.
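To make the routing concrete, here is a minimal sketch of an MoE feed-forward layer with 8 experts and top-2 routing. It illustrates the mechanism only, not Nomic Embed's actual implementation; the class name and layer sizes are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    """Hypothetical MoE feed-forward layer: 8 experts, 2 active per token."""
    def __init__(self, hidden_dim=768, ffn_dim=3072, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_dim, num_experts)  # scores each token against every expert
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(hidden_dim, ffn_dim), nn.GELU(), nn.Linear(ffn_dim, hidden_dim))
            for _ in range(num_experts)
        ])

    def forward(self, x):  # x: (num_tokens, hidden_dim)
        weights = F.softmax(self.router(x), dim=-1)        # routing probabilities over experts
        top_w, top_idx = weights.topk(self.top_k, dim=-1)  # keep only the 2 best experts per token
        top_w = top_w / top_w.sum(dim=-1, keepdim=True)    # renormalize the kept weights
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                routed = top_idx[:, k] == e                # tokens sent to expert e in slot k
                if routed.any():
                    out[routed] += top_w[routed, k].unsqueeze(-1) * expert(x[routed])
        return out  # only 2 of the 8 expert FFNs ran for each token

Because only two of the eight expert feed-forward networks run for each token, compute per input stays well below what the total parameter count would suggest.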
Research into embedding model architecture has significant practical implications for working with text embeddings in production:
• Lower latency for high-volume applications of embeddings like retrieval
• Reduced deployment costs through more efficient parameter usage
• Broader access to embeddings in settings with constrained compute
Like its predecessor Nomic Embed Text V1.5, V2 incorporates Matryoshka representation learning, which allows embeddings to be truncated from 768 to 256 dimensions while maintaining embedding quality - further reducing computation and storage costs in production deployments.
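In practice, truncation is just a slice followed by re-normalization. A minimal sketch, assuming the Sentence Transformers interface shown in the quickstart below:

import torch.nn.functional as F
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v2-moe", trust_remote_code=True)
full = model.encode(["search_document: Hello!"], prompt_name="passage", convert_to_tensor=True)
truncated = F.normalize(full[:, :256], p=2, dim=1)  # 256-dim embeddings, re-normalized to unit length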
Our training data curation involves consistency filtering on a multilingual corpus derived from mC4 and multilingual CC News. This eliminates low-quality or misaligned text pairs from the training set, and yields 1.6 billion high-quality pairs across multiple languages.
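Roughly, a consistency filter keeps a (query, document) pair only if the document still ranks near the top for its query under an existing embedding model, judged within a sampled batch of candidates. A minimal sketch of that idea; the actual filtering model, batch size, and cutoff used for V2 are not specified here:

import numpy as np

def consistency_filter(query_embs, doc_embs, top_k=2):
    """Keep pair i only if doc i is among the top_k most similar documents
    to query i within this sampled batch (embeddings assumed L2-normalized)."""
    sims = query_embs @ doc_embs.T       # pairwise cosine similarities
    ranked = (-sims).argsort(axis=1)     # candidate docs, most similar first, per query
    return [i for i in range(len(query_embs)) if i in ranked[i, :top_k]]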
As with V1 of Nomic Embed, our training code and data are open-sourced for transparency and reproducibility in the contrastors repository on GitHub.
Embedding models help you build systems for semantic search, RAG, clustering, and more - and Nomic embedding models in particular have a proven track record of being both highly performant and efficient relative to other models of their size. We wrote about the many applications of embeddings beyond RAG on our blog, where you can learn more about why Nomic works on embeddings and how they power multiple aspects of our products.
Nomic Embed Text V2 will be available for use in Nomic Atlas and GPT4All. For commercial inquiries, please contact sales.
Nomic Embed Text V2 is available through multiple popular AI/ML frameworks and platforms. Quickstart examples for Transformers, Sentence Transformers, LangChain, and LlamaIndex follow below.
With Hugging Face Transformers:

pip install torch transformers

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("nomic-ai/nomic-embed-text-v2-moe")
model = AutoModel.from_pretrained("nomic-ai/nomic-embed-text-v2-moe", trust_remote_code=True)

# Inputs carry a task prefix; "search_document: " marks text to be indexed for retrieval
sentences = ['search_document: Hello!', 'search_document: ¡Hola!']

# Mean-pool token embeddings, using the attention mask to ignore padding tokens
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

model.eval()
with torch.no_grad():
    model_output = model(**encoded_input)

embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
embeddings = F.normalize(embeddings, p=2, dim=1)  # L2-normalize so dot products are cosine similarities
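Since the embeddings are normalized, cosine similarity is just a dot product; for example, continuing from the snippet above:

similarity = embeddings @ embeddings.T  # 2x2 matrix of pairwise cosine similarities
print(similarity)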
With Sentence Transformers:

pip install sentence-transformers

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v2-moe", trust_remote_code=True)

sentences = ["Hello!", "¡Hola!"]
# prompt_name="passage" applies the document task prefix, so no manual prefix is needed
embeddings = model.encode(sentences, prompt_name="passage")
With LangChain:

pip install langchain-huggingface

from langchain_huggingface import HuggingFaceEmbeddings

model = HuggingFaceEmbeddings(
    model_name="nomic-ai/nomic-embed-text-v2-moe",
    model_kwargs={'trust_remote_code': True},
    encode_kwargs={'normalize_embeddings': True},
)

# Prefix each document with "search_document: " before embedding
sentences = ["search_document: Hello!", "search_document: ¡Hola!"]
embeddings = model.embed_documents(sentences)
With LlamaIndex:

pip install llama-index-embeddings-huggingface

from llama_index.embeddings.huggingface import HuggingFaceEmbedding

model = HuggingFaceEmbedding(
    model_name="nomic-ai/nomic-embed-text-v2-moe",
    trust_remote_code=True,
)

# Prefix each document with "search_document: " before embedding
sentences = ["search_document: Hello!", "search_document: ¡Hola!"]
embeddings = model.get_text_embedding_batch(sentences)
Nomic Embed Text V2 will be powering embeddings and RAG for chatting with LocalDocs in GPT4All soon!
Nomic Embed Text V2 will be available for download and inference with Ollama soon!