
Author: GPT4All Team

Scaling Inference Time Compute with On-Device Language Models in GPT4All


Introducing the GPT4All Reasoning System and the GPT4All Reasoner v1 model, bringing cutting-edge on-device inference-time compute capabilities to the GPT4All platform. This Reasoning release improves local AI capabilities, empowering users with advanced features such as Code Interpreter, Tool Calling, and Code Sandboxing — all running securely on your own hardware.

What is On-Device Inference-Time Compute?

Inference-time compute enables LLMs to iterate on their outputs during execution, improving reasoning, accuracy, and context comprehension. Previously limited to server-side LLMs, this technique is now available directly on your laptop, unlocking a new level of on-device AI performance.
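The loop behind this technique can be sketched in a few lines. Everything here is illustrative: `generate` is a hypothetical stand-in for a model call and is not part of any GPT4All API.

```javascript
// Illustrative sketch of inference-time iteration: draft an answer, ask the
// model to critique it, and revise until the critique passes or the step
// budget runs out. `generate` is a hypothetical stand-in for a model call.
function refine(prompt, generate, maxSteps = 3) {
  let answer = generate(prompt);
  for (let step = 0; step < maxSteps; step++) {
    const critique = generate(`Check this answer for errors: ${answer}`);
    if (critique === "OK") break;                             // model is satisfied
    answer = generate(`Revise using this feedback: ${critique}`);
  }
  return answer;  // extra compute spent at inference time, not extra parameters
}
```

The key point is that quality improves by spending more steps at run time, rather than by using a larger model.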

How many 'r's are there in the word strawberry?

[Screenshot: GPT4All counting the 'r's in "strawberry"]

Key Features in the GPT4All Reasoning System:

Secure, Local Code Execution with the Javascript Sandbox

This release introduces the GPT4All Javascript Sandbox, a secure and isolated environment for executing code tool calls. When using Reasoning models equipped with Code Interpreter capabilities, all code runs safely in this sandbox, ensuring user security and multi-platform compatibility.

The Javascript Sandbox is the backbone of tool-based workflows in GPT4All, enabling secure, repeatable code execution across platforms.

Getting Started with On-Device Reasoning Models

To start using the new capabilities of GPT4All Reasoner v1:

  1. Install the GPT4All App: Download the latest version of the app from our official site.
  2. Go to Models: Open the models menu and select "Reasoning."
  3. Enable Reasoning Mode: Load one of the Reasoning models. These custom-configured LLMs are optimized to work exclusively with GPT4All’s Reasoning system, giving you access to iterative inference-time compute.

Note: GPT4All Reasoner v1 is a modified version of Qwen Coder 7B that works with the GPT4All Reasoning System.

GPT4All Reasoning Examples

The following examples illustrate situations where GPT4All's Reasoning system accomplishes tasks that the underlying 7B parameter post-trained model cannot perform.

How many days left until Christmas?

[Screenshot: computing the days remaining until Christmas]
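The kind of snippet a Code Interpreter tool call might emit for this query looks like the following (a sketch, not GPT4All's actual output):

```javascript
// Count the days from `today` until the next December 25.
function daysUntilChristmas(today = new Date()) {
  let christmas = new Date(today.getFullYear(), 11, 25); // month is 0-based: 11 = December
  if (today > christmas) {
    christmas = new Date(today.getFullYear() + 1, 11, 25); // already passed: use next year
  }
  const msPerDay = 24 * 60 * 60 * 1000;
  return Math.round((christmas - today) / msPerDay);     // round absorbs DST offsets
}
```

Date arithmetic like this is exactly where a small model benefits from delegating to code instead of reasoning over the calendar token by token.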

Counting the 'r's in "strawberry" and related queries.

Counting is easy when you can tool call into the code sandbox!

[Screenshot: counting the 'r's in "strawberry" via the code sandbox]
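What the sandboxed tool call boils down to is an exact character count, rather than asking the model to eyeball its own token stream (illustrative code, not GPT4All's actual output):

```javascript
// Count how many times character `ch` appears in `word`.
function countChar(word, ch) {
  return [...word].filter(c => c === ch).length;
}
```

`countChar("strawberry", "r")` returns 3, an answer models often get wrong when reasoning over subword tokens directly.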

Approximating Integrals

[Screenshot: approximating an integral in the code sandbox]
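A sketch of the kind of numeric integration a Code Interpreter call can perform, here the composite trapezoidal rule (illustrative, not GPT4All's actual output):

```javascript
// Approximate the integral of f over [a, b] with n trapezoids.
function trapezoid(f, a, b, n = 1000) {
  const h = (b - a) / n;
  let sum = (f(a) + f(b)) / 2;   // endpoints weighted by 1/2
  for (let i = 1; i < n; i++) sum += f(a + i * h);
  return sum * h;
}
```

For example, `trapezoid(x => x * x, 0, 1)` approximates the exact value 1/3 to within about 1e-7.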

Playing with Prime Numbers

[Screenshot: exploring prime numbers in the code sandbox]
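Primality is another exact computation that a 7B model struggles to do token by token but a sandboxed tool call handles trivially. A minimal trial-division check (illustrative, not GPT4All's actual output):

```javascript
// Return true if n is prime, testing divisors up to sqrt(n).
function isPrime(n) {
  if (n < 2) return false;
  for (let d = 2; d * d <= n; d++) {
    if (n % d === 0) return false;
  }
  return true;
}
```

For instance, `isPrime(91)` correctly returns false (91 = 7 × 13), a factorization small models frequently miss.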

Synthetic Data Generation with a Remote Endpoint

[Screenshot: generating synthetic data via a remote endpoint]
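The shape of an OpenAI-compatible chat-completions request that a synthetic-data workflow might send to a remote endpoint can be sketched as below. The model name and prompt wording are placeholders, not GPT4All defaults.

```javascript
// Build a chat-completions request body for an OpenAI-compatible endpoint.
// "my-remote-model" is a placeholder model id, not a GPT4All default.
function buildSyntheticDataRequest(topic, count) {
  return {
    model: "my-remote-model",
    messages: [
      { role: "system", content: "Return examples as a JSON array." },
      { role: "user", content: `Generate ${count} training examples about ${topic}.` },
    ],
    temperature: 0.9,   // higher temperature encourages varied examples
  };
}
```

This body would be POSTed to the endpoint's `/v1/chat/completions` route with the usual `Authorization` header.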

Using Reasoning with Remote Models

You can use the GPT4All Reasoning system in conjunction with any model hosted at an OpenAI-compatible endpoint. To set this up:

  1. Configure a remote hosted model in GPT4All.
  2. Copy a reasoning template into the model.

What's Next and Current Limitations

On-device inference-time scaling improves local LLM capabilities without increasing model size. Any open-source language model can be configured to work with the GPT4All Reasoning system.

The first version of Reasoning demonstrates that small language models, equipped with inference-time compute infrastructure, punch far beyond their parameter class, accomplishing tasks usually reserved for larger models.

Subsequent versions will introduce expanded tool-use capabilities and improvements to inference-time iteration algorithms.

We do not provide a comprehensive quantitative evaluation of the system at this time but hope to do so in the near future.
