Last month, I was working on a project that required running an LLM locally, and I ran into a problem that many of you have probably faced: how do you share your locally running model with teammates, or access it from your phone while you’re away from your desk?
After some research and quite a bit of trial and error, I discovered that pairing ngrok with a local LLM is a game-changer for exactly this. I’ve been using the setup for several weeks now, and I wanted to share what I’ve learned because it has made my development workflow so much smoother.
Why I Started Running LLMs Locally
Before we dive into the technical stuff, let me explain why I went down this path in the first place. I was getting tired of API rate limits, wanted more control over my data, and honestly, the cost savings are significant when you’re doing a lot of experimentation.
The problem? Once you have a model running on your local machine, you’re pretty much stuck using it only on that machine. That’s where ngrok comes in.
What Exactly is ngrok for Local LLMs?
If you haven’t used ngrok before, think of it as a secure tunnel from the internet to your local machine. It gives you a public URL that forwards traffic to whatever you’re running locally – in our case, a local LLM.
I use it because it solves these real problems I was facing:
- I couldn’t test my LLM integrations on my phone
- My teammates couldn’t access the model I was running
- I couldn’t demo my work without bringing my entire laptop
- Setting up proper deployment just for testing felt like overkill
My Current Local LLM Setup
I’ve tried a few different approaches, and here’s what I’m currently using:
Ollama (My Go-To Choice)
I started with Ollama because it’s ridiculously easy to set up. If you haven’t tried it yet, here’s literally all you need to do:
# This took me about 2 minutes to set up
curl -fsSL https://ollama.ai/install.sh | sh
# I usually run Llama 2, but you can use whatever model you prefer
ollama pull llama2
ollama serve
Ollama runs on port 11434 by default. I keep this running in a terminal tab pretty much all the time now.
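Before pointing anything at the internet, I like to confirm the server is actually answering locally. Here’s a quick sanity check against Ollama’s /api/tags endpoint, which lists the models you’ve pulled (adjust the port if you’ve changed the default):
import requests
# Ask the local Ollama server which models it has pulled.
# If this fails, there's no point in setting up a tunnel yet.
resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
for model in resp.json().get("models", []):
    print(model["name"])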
LM Studio (When I Want a GUI)
Sometimes I want to tweak settings or try different models quickly. LM Studio is perfect for this – it’s got a clean interface and handles the technical stuff for you. It typically runs on port 1234.
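LM Studio’s local server speaks an OpenAI-compatible API, so you can talk to it the same way you’d talk to OpenAI. Here’s a minimal sketch with requests; the model name is a placeholder, since LM Studio generally answers with whichever model you have loaded:
import requests
# LM Studio exposes OpenAI-style endpoints on port 1234 when its local server is running.
resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "model": "local-model",  # placeholder; LM Studio uses the model you've loaded
        "messages": [{"role": "user", "content": "Say hello in five words."}],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])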
Text Generation WebUI (For the Heavy Lifting)
When I need more control or want to run larger models, I use oobabooga’s text-generation-webui. It’s more complex to set up, but the customization options are incredible:
# Fair warning: this setup took me longer than I expected the first time
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt
python server.py --listen --api
This usually runs on port 7860, and I love how much you can configure.
Setting Up ngrok Local LLM Integration
Here’s how I got ngrok working on my system:
Installation
I’m on macOS, so I used Homebrew:
brew install ngrok
If you’re on Windows or Linux, grab it from ngrok.com. The installation is straightforward.
The Authentication Step I Almost Forgot
You need to sign up for an ngrok account (free tier works fine for most use cases) and grab your auth token. Then run:
ngrok config add-authtoken YOUR_AUTHTOKEN
I forgot this step initially and was confused why nothing was working!
Exposing Your Local LLM with ngrok
This is where the setup really shines. Exposing your local LLM takes exactly one command:
# For Ollama
ngrok http 11434
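# Heads-up: if Ollama answers fine locally but the tunneled URL returns 403s,
# it's likely rejecting the unfamiliar Host header; rewriting it with ngrok's
# --host-header flag is the usual fix:
# ngrok http 11434 --host-header="localhost:11434"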
# For LM Studio
ngrok http 1234
# For Text Generation WebUI
ngrok http 7860
When you run this, ngrok gives you output like:
Forwarding https://abc123.ngrok.io -> http://localhost:11434
That HTTPS URL is your golden ticket – anyone can now access your local LLM through it.
Testing That Everything Works
I always test my setup with a quick curl command. Here’s what I use for Ollama:
curl https://your-ngrok-url.ngrok.io/api/generate \
  -d '{
    "model": "llama2",
    "prompt": "Explain why this setup is so useful",
    "stream": false
  }'
The first time I saw this work and got a response from my local model through a public URL, I was genuinely excited. It felt like magic.
Tricks I’ve Learned Along the Way
Custom Domains (If You Upgrade)
I eventually upgraded to a paid plan because I wanted consistent URLs. With a custom domain, you can do:
ngrok http 11434 --domain=my-llm.mydomain.com
This is especially useful if you’re building something that depends on a stable URL, like a webhook integration that has to be registered once.
Adding Basic Security
Early on, I realized anyone with my ngrok URL could use my LLM. That’s not always what you want, so I started using basic auth:
ngrok http 11434 --basic-auth="myusername:mypassword"
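Once the tunnel requires basic auth, every client has to send those credentials too. With requests that’s just the auth parameter; the URL and credentials below are the same placeholders as in the command above:
import requests
# The tunnel now sits behind HTTP Basic Auth, so pass the same credentials
# you gave ngrok via --basic-auth.
resp = requests.post(
    "https://your-ngrok-url.ngrok.io/api/generate",
    auth=("myusername", "mypassword"),
    json={"model": "llama2", "prompt": "ping", "stream": False},
    timeout=120,
)
print(resp.json()["response"])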
My ngrok Config File
I got tired of typing the same commands, so I created an ngrok.yml file:
version: "2"
authtoken: YOUR_AUTHTOKEN
tunnels:
  llm:
    proto: http
    addr: 11434
    basic_auth:
      - "myusername:mypassword"
    inspect: false  # skips the request inspector; set to true if you want traffic to show up at localhost:4040
Now I just run ngrok start llm and I’m good to go.
Security Stuff You Should Know
I learned some of this the hard way, so let me save you some trouble:
Always Use Authentication for Anything Important
I made the mistake of leaving a tunnel open without auth once. My LLM got hammered with requests and my laptop fan went crazy. Don’t be like me – always add some form of authentication.
Monitor Your Usage
ngrok has a web interface at http://localhost:4040 that shows all the requests coming through. I keep this open in a browser tab because it’s fascinating to see what’s happening in real time.
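The agent serves a small JSON API on the same port, which I use to grab the current public URL from scripts instead of copy-pasting it after every restart. A quick sketch:
import requests
# Ask the local ngrok agent (the same thing behind http://localhost:4040)
# for its active tunnels and pull out the public HTTPS URL.
tunnels = requests.get("http://localhost:4040/api/tunnels", timeout=5).json()["tunnels"]
public_url = next(t["public_url"] for t in tunnels if t["public_url"].startswith("https"))
print(public_url)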
Watch Your Local Resources
Remember that every request through ngrok is using your local machine’s resources. I learned to keep an eye on CPU and memory usage, especially when sharing the URL with others.
Real Examples I Actually Use
Here’s some Python code I use regularly to interact with my ngrok’ed LLM:
import requests

def ask_my_llm(question):
    # In practice I keep this URL in an environment variable so I don't have
    # to edit code every time the tunnel URL changes
    ngrok_url = "https://my-current-ngrok-url.ngrok.io"
    response = requests.post(
        f"{ngrok_url}/api/generate",
        json={
            "model": "llama2",
            "prompt": question,
            "stream": False,
        },
    )
    response.raise_for_status()
    return response.json()["response"]

# I use this for quick testing
answer = ask_my_llm("What's the weather like on Mars?")
print(answer)
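For longer answers I sometimes switch to streaming so tokens show up as they’re generated instead of all at once. Here’s a rough sketch of the same call with streaming turned on; Ollama sends back one JSON chunk per line:
import json
import requests

def stream_my_llm(question, ngrok_url="https://my-current-ngrok-url.ngrok.io"):
    # With "stream": True, Ollama returns newline-delimited JSON chunks,
    # each carrying a small piece of the response.
    with requests.post(
        f"{ngrok_url}/api/generate",
        json={"model": "llama2", "prompt": question, "stream": True},
        stream=True,
        timeout=300,
    ) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            print(chunk.get("response", ""), end="", flush=True)
            if chunk.get("done"):
                print()
                break

stream_my_llm("Explain why tunneling a local LLM is useful.")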
Problems I’ve Encountered (And How I Fixed Them)
“Connection Refused” Errors
This usually means I forgot to start my LLM server before running ngrok. Simple fix, but I’ve done it more times than I care to admit.
Slow Response Times
Local LLMs can be slow, especially larger models on older hardware. I’ve found that using smaller, quantized models helps a lot. The responses are still good, and they come back much faster.
The Tunnel Keeps Disconnecting
This happens with the free tier after some time. For development, I just restart it. For anything more permanent, the paid plans have persistent tunnels.
When I Don’t Use ngrok
Don’t get me wrong – ngrok is fantastic for development and testing, but I wouldn’t use it for production. For that, I’d go with:
- A proper VPS deployment
- Docker containers on cloud platforms
- Cloudflare Tunnel (a free alternative to ngrok)
My Current Workflow
Here’s how I typically work with this setup:
- Start my local LLM (usually Ollama)
- Fire up ngrok with my config
- Test with a quick curl command
- Share the URL with teammates or use it in my applications
- Monitor usage through the ngrok web interface
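I’ve wrapped the “test” and “monitor” steps into a tiny script that pulls the public URL from the ngrok agent and fires one throwaway prompt through the tunnel. It assumes Ollama and ngrok are already running, and you’d add an auth=(...) argument if your tunnel uses basic auth:
import requests
# Grab the active tunnel's public URL from the local ngrok agent, then send
# one test prompt through it to confirm the whole chain (ngrok -> Ollama) works.
tunnels = requests.get("http://localhost:4040/api/tunnels", timeout=5).json()["tunnels"]
public_url = next(t["public_url"] for t in tunnels if t["public_url"].startswith("https"))
resp = requests.post(
    f"{public_url}/api/generate",
    json={"model": "llama2", "prompt": "Reply with one word: ready", "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(f"{public_url} looks healthy: {resp.json()['response'].strip()}")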
It’s become such a natural part of my development process that I barely think about it anymore.
Wrapping Up
Setting up ngrok with local LLMs has genuinely improved how I work with AI models. The ability to quickly share, test, and integrate local models has sped up my development process significantly.
If you’re running local LLMs and haven’t tried this yet, I highly recommend giving it a shot. Start with the basic setup I’ve outlined here, and then experiment with the more advanced features as you get comfortable.
The first time you pull up your local LLM on your phone or share it with a colleague, you’ll understand why I’m so enthusiastic about this approach.