Today we're excited to announce Nomic Embed Text V2, our next-generation embedding model that brings the Mixture of Experts (MoE) architecture to text embeddings and is trained on a new, expanded multilingual dataset.
Building on our previous embedding model Nomic Embed Text V1, this release advances the performance and efficiency of Nomic Embed while maintaining our commitment to open source.
Highlights about Nomic Embed Text V2:
• The first embedding model based on a Mixture-of-Experts (MoE) architecture
• Trained on dozens of languages to support multilingual applications (see image below)
• Strong performance on the BEIR and MIRACL benchmarks relative to models of its parameter class
• We're open sourcing the pretraining data, finetuning data, and training code
Nomic Embed Text V2 joins our series of embedding models generally available for production workloads through the Nomic Atlas Embedding API, and is enterprise-ready via our fully secure and compliant Nomic Atlas Enterprise offering.
On the BEIR and MIRACL benchmarks, Nomic Embed Text V2 demonstrates strong performance against current state-of-the-art multilingual embedding models. Our model achieves competitive results while maintaining a significantly smaller parameter footprint through its MoE architecture.
We trained Nomic Embed Text V2 with an MoE architecture to get a model with strong performance that is also faster and lighter on memory during training and inference, because only a fraction of its parameters are active for any given input.
Unlike a dense model, which applies all of its parameters to every input, an MoE model dynamically routes each input to different "experts" - sparse subsets of parameters at each layer - ideally activating only the parameters needed to process that input. This allows compute to be used more efficiently when generating embeddings.
In our experiments, we found that alternating MoE layers with 8 experts and top-2 routing provides the optimal balance between performance and efficiency. This results in 475M total parameters in the model, but only 305M active during training and inference.
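To make the routing concrete, here is a minimal sketch of an MoE feed-forward layer with 8 experts and top-2 routing. It illustrates the mechanism only, not Nomic Embed's actual implementation; the class name and layer sizes are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    """Hypothetical MoE feed-forward layer: 8 experts, 2 active per token."""
    def __init__(self, hidden_dim=768, ffn_dim=3072, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_dim, num_experts)  # scores each token against every expert
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(hidden_dim, ffn_dim), nn.GELU(), nn.Linear(ffn_dim, hidden_dim))
            for _ in range(num_experts)
        ])

    def forward(self, x):  # x: (num_tokens, hidden_dim)
        weights = F.softmax(self.router(x), dim=-1)        # routing probabilities over experts
        top_w, top_idx = weights.topk(self.top_k, dim=-1)  # keep only the 2 best experts per token
        top_w = top_w / top_w.sum(dim=-1, keepdim=True)    # renormalize the kept weights
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                routed = top_idx[:, k] == e                # tokens sent to expert e in slot k
                if routed.any():
                    out[routed] += top_w[routed, k].unsqueeze(-1) * expert(x[routed])
        return out  # only 2 of the 8 expert FFNs ran for each token

Because only two of the eight expert feed-forward networks run for each token, compute per input stays well below what the total parameter count would suggest.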
Research into embedding model architecture has significant practical implications for working with text embeddings in production:
• Lower latency for high-volume applications of embeddings like retrieval
• Reduced deployment costs through more efficient parameter usage
• Broader access to embeddings in settings with constrained compute
Like its predecessor Nomic Embed Text V1.5, V2 incorporates Matryoshka representation learning, which allows embeddings to be truncated from 768 to 256 dimensions while maintaining embedding quality - further reducing computation and storage costs in production deployments.
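In practice, truncation is just a slice followed by re-normalization. A minimal sketch, assuming the Sentence Transformers interface shown in the quickstart below:

import torch.nn.functional as F
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v2-moe", trust_remote_code=True)
full = model.encode(["search_document: Hello!"], prompt_name="passage", convert_to_tensor=True)
truncated = F.normalize(full[:, :256], p=2, dim=1)  # 256-dim embeddings, re-normalized to unit length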
Our training data curation involves consistency filtering on a multilingual corpus derived from mC4 and multilingual CC News. This eliminates low-quality or misaligned text pairs from the training set, and yields 1.6 billion high-quality pairs across multiple languages.
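Roughly, a consistency filter keeps a (query, document) pair only if the document still ranks near the top for its query under an existing embedding model, judged within a sampled batch of candidates. A minimal sketch of that idea; the actual filtering model, batch size, and cutoff used for V2 are not specified here:

import numpy as np

def consistency_filter(query_embs, doc_embs, top_k=2):
    """Keep pair i only if doc i is among the top_k most similar documents
    to query i within this sampled batch (embeddings assumed L2-normalized)."""
    sims = query_embs @ doc_embs.T       # pairwise cosine similarities
    ranked = (-sims).argsort(axis=1)     # candidate docs, most similar first, per query
    return [i for i in range(len(query_embs)) if i in ranked[i, :top_k]]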
As with V1 of Nomic Embed, our training code and data are open-sourced for transparency and reproducibility in the contrastors repository on GitHub.
Embedding models help you build systems for semantic search, RAG, clustering, and more - and Nomic embedding models in particular have a proven track record of being both highly performant and efficient relative to other models of their size. We wrote about the many applications of embeddings beyond RAG on our blog, where you can learn more about why Nomic works on embeddings and how they power multiple aspects of our products.
Nomic Embed Text V2 will be available for use in Nomic Atlas and GPT4All. For commercial inquiries, please contact sales.
Nomic Embed Text V2 is available through multiple popular AI/ML frameworks and platforms. Quickstart examples for Transformers, Sentence Transformers, LangChain, and LlamaIndex follow below.
With Hugging Face Transformers:

pip install torch transformers

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("nomic-ai/nomic-embed-text-v2-moe")
model = AutoModel.from_pretrained("nomic-ai/nomic-embed-text-v2-moe", trust_remote_code=True)

# Inputs carry a task prefix; "search_document: " marks text to be indexed for retrieval
sentences = ['search_document: Hello!', 'search_document: ¡Hola!']

# Mean-pool token embeddings, using the attention mask to ignore padding tokens
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

model.eval()
with torch.no_grad():
    model_output = model(**encoded_input)

embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
embeddings = F.normalize(embeddings, p=2, dim=1)  # L2-normalize so dot products are cosine similarities
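Since the embeddings are normalized, cosine similarity is just a dot product; for example, continuing from the snippet above:

similarity = embeddings @ embeddings.T  # 2x2 matrix of pairwise cosine similarities
print(similarity)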
With Sentence Transformers:

pip install sentence-transformers

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v2-moe", trust_remote_code=True)

sentences = ["Hello!", "¡Hola!"]
# prompt_name="passage" applies the document task prefix, so no manual prefix is needed
embeddings = model.encode(sentences, prompt_name="passage")
With LangChain:

pip install langchain-huggingface

from langchain_huggingface import HuggingFaceEmbeddings

model = HuggingFaceEmbeddings(
    model_name="nomic-ai/nomic-embed-text-v2-moe",
    model_kwargs={'trust_remote_code': True},
    encode_kwargs={'normalize_embeddings': True},
)

# Prefix each document with "search_document: " before embedding
sentences = ["search_document: Hello!", "search_document: ¡Hola!"]
embeddings = model.embed_documents(sentences)
With LlamaIndex:

pip install llama-index-embeddings-huggingface

from llama_index.embeddings.huggingface import HuggingFaceEmbedding

model = HuggingFaceEmbedding(
    model_name="nomic-ai/nomic-embed-text-v2-moe",
    trust_remote_code=True,
)

# Prefix each document with "search_document: " before embedding
sentences = ["search_document: Hello!", "search_document: ¡Hola!"]
embeddings = model.get_text_embedding_batch(sentences)
Nomic Embed Text V2 will be powering embeddings and RAG for chatting with LocalDocs in GPT4All soon!
Nomic Embed Text V2 will be available for download and inference with Ollama soon!