Author: GPT4All Team & OpenLIT Team

Observability For LLMs With GPT4All & OpenLIT Telemetry

GPT4All allows anyone to download and run LLMs offline, locally & privately, across various hardware platforms.

We've integrated the GPT4All Python client with OpenLIT, an OpenTelemetry-native tool designed for complete observability over your LLM stack, from models to GPUs. We hope this streamlines gathering useful technical data for anyone building applications with LLMs in Python.

Why LLM Observability?

Building LLM applications with observability brings several benefits: you can spot peak usage times and latency issues, track token usage and cost, and analyze prompts and responses over time to improve prompt structure and response accuracy.

OpenLIT: Advanced LLM Monitoring

OpenLIT enhances applications that use the GPT4All Python client with comprehensive monitoring and observability features, including tracing of prompts and completions, token and cost tracking, and optional GPU metrics.

Integrating OpenLIT with GPT4All in Python

Install OpenLIT & GPT4All:

pip install openlit gpt4all

Initialize OpenLIT in your GPT4All application:

import openlit
from gpt4all import GPT4All

openlit.init()
model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")  # downloads / loads a 4.66GB LLM
with model.chat_session():
    print(model.generate("Why are GPUs fast?", max_tokens=1024))

# rest of your GPT4All code…

Optionally, enable GPU monitoring:

openlit.init(collect_gpu_stats=True)

That’s it!

By default, the OpenLIT SDK sends traces to your console, which can be useful during development. To forward telemetry to an HTTP OpenTelemetry endpoint, such as the OpenTelemetry Collector, set the otlp_endpoint parameter in openlit.init() to the desired endpoint. Alternatively, configure the endpoint via the OTEL_EXPORTER_OTLP_ENDPOINT environment variable, as recommended in the OpenTelemetry documentation.
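
For example, a minimal sketch (the address below is a placeholder for a locally running OpenTelemetry Collector on its default OTLP/HTTP port; substitute your own endpoint):

import openlit

# Export traces to an OTLP/HTTP endpoint instead of the console.
# http://127.0.0.1:4318 is a placeholder for your collector's address.
openlit.init(otlp_endpoint="http://127.0.0.1:4318")

# Equivalent environment-variable configuration:
#   export OTEL_EXPORTER_OTLP_ENDPOINT="http://127.0.0.1:4318"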

To send telemetry to OpenTelemetry backends that require authentication, set the otlp_headers parameter to the headers your backend expects. Alternatively, configure the headers via the OTEL_EXPORTER_OTLP_HEADERS environment variable, as recommended in the OpenTelemetry documentation. You can also refer to the OpenLIT quickstart for GPT4All here.
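
For example, a minimal sketch. The token is a placeholder, and the key=value string format for otlp_headers is an assumption based on the OTEL_EXPORTER_OTLP_HEADERS convention; check the OpenLIT docs for the exact format your backend requires:

import openlit

# Both values below are placeholders; substitute your backend's endpoint
# and whatever authentication headers it requires.
openlit.init(
    otlp_endpoint="http://127.0.0.1:4318",
    otlp_headers="Authorization=Bearer <your-token>",
)

# Equivalent environment-variable configuration:
#   export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer <your-token>"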

Making Sense of the Collected Monitoring Data

You have two options for analyzing your monitoring data: self-hosting the OpenLIT UI or sending the collected traces and metrics to your preferred observability tool, such as Grafana or Elastic. The instructions for self-hosting the OpenLIT UI can be found here.
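
If you go the self-hosted route, the OpenLIT repository includes a Docker Compose setup; a typical flow looks roughly like this (assuming Docker is installed; see the repository's README for the current instructions):

git clone https://github.com/openlit/openlit.git
cd openlit
docker compose up -d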

The collected traces and metrics offer a comprehensive overview of system performance, covering areas such as request volume, latency, token usage, cost, and GPU utilization.

These metrics are invaluable for identifying peak usage times, latency issues, rate limits, and resource-allocation needs, and they support performance tuning and cost management. For instance, prompt and completion monitoring lets you analyze prompts and responses over time, informing improvements to prompt structure and response accuracy.

By providing a detailed breakdown of LLM performance, these metrics help ensure consistent operation across environments, support budgeting, and aid troubleshooting, ultimately improving overall system efficiency.

Get In Touch

If you have any questions, you can reach out to Nomic on Discord. GPT4All also has enterprise offerings for running LLMs on desktops at scale for your business - let us know if you are interested here.

To learn more about OpenLIT, visit the OpenLIT GitHub repo and reach out to the OpenLIT team on Slack.
