Why your NAS feels slow over a VPN (it's not your bandwidth)

The speed test says the line is fine, and the share still crawls. Both are true. The tax is round trips times latency, and a worked minute of editing shows exactly where it goes.

Competitor facts checked June 2026; verify before relying on them.

At the studio, the NAS share is fine. Bins open, clips scrub, nobody thinks about storage. Then you take the same project home, connect the VPN, and the same share turns into a different product: folders spin, scrubbing stutters, and a filename search takes long enough for you to make coffee. So you run a speed test, and the speed test comes back fine. The line is fast. The share is slow. Both readings are correct, because the speed test measures the one thing that isn't the problem.

Round trips, not throughput

SMB is a conversation. Opening a folder is not one request: it is a sequence of opens, attribute checks, and directory listings, each one a message to the server and a wait for the reply. Reading a file adds opens, locks, and a read per chunk. Most of those exchanges depend on the answer to the one before, so the waits stack up in single file.

On a LAN, each of those exchanges costs roughly 0.5 ms, so the chatter is invisible. Through a home VPN, a realistic figure is about 20 ms per round trip. (Both numbers are the stated model assumptions we use on our performance page, not measurements of your line.) Same protocol, same NAS, but every exchange now costs 40 times as much, and the protocol makes hundreds of them before any actual media moves. That multiplication is most of the story, and bandwidth barely touches it.

one ask, two wires

modeled June 2026 · assumptions in the text

The ask and the reply are identical on both wires; only the wait changes. At the studio a round trip costs roughly 0.5 ms; through a home VPN the model uses 20 ms, which is 40 times the wait, and SMB strings hundreds of these trips end to end before any media moves. Lane lengths are drawn to that 40-to-1 scale.

One minute of editing, counted

Our performance page models one ordinary minute at the timeline as six moves: open a project bin of 2,400 items, scrub a hero clip (12 touches, about 48 MB), open a second bin, scrub a B-roll clip (8 touches, about 32 MB), come back to the first clip, then find one clip by name in a library of about 131 K entries.

On the plain-share model (one round trip per 100 directory entries, no persistent client cache, no library index), that minute pays 1,390 wire round trips and about 38 s of waiting at the home-VPN profile. The same minute against local metadata and a block cache pays 20 round trips and about 7.3 s. The labeling matters, so here it is exactly as the site carries it: this is a model, built June 2026, with its assumptions stated; the SMB lane is a category sketch, not a benchmark of any vendor's NAS or client.

the worked minute · waiting, drawn on one time scale

plain SMB share, home-VPN profile · 1,390 round trips · ~38 s
local metadata and a block cache · 20 round trips · ~7.3 s

modeled June 2026 · category sketch, no vendor benchmarked

The same modeled minute on both lanes, bar lengths proportional to seconds waited: 1,390 round trips and about 38 s on the plain share against 20 round trips and about 7.3 s with local metadata and a block cache. First-touch media pays the wire identically in both lanes; most of the gap is the chatter.

Where the 38 seconds go is the instructive part. The two bin listings cost 24 round trips each, 48 in all. The two first-touch scrubs cost what physics says they must: their 20 round trips (12 touches plus 8) plus roughly 80 MB of blocks at line rate, and that cost is identical in both lanes. The revisit of the hero clip costs the share the full price a second time, another 12 round trips and 48 MB, because nothing was kept. And the name search is the massacre: with no index, the model walks the tree at one round trip per 100 entries, which is 1,310 round trips, about 26 s at 20 ms each, to find one clip. Add them up: 48 + 20 + 12 + 1,310 = 1,390.
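The tally can be reproduced in a few lines. This is a minimal sketch of the plain-share model as the text describes it, not the performance page's actual code; it assumes one round trip per scrub touch and a second bin the same size as the first (the text gives 24 round trips for each listing), and every name in it is illustrative.

```python
import math

ENTRIES_PER_TRIP = 100  # plain-share model: one round trip per 100 directory entries
RTT_S = 0.020           # home-VPN profile: 20 ms per round trip


def listing_trips(entries: int) -> int:
    """Round trips needed to list `entries` directory entries."""
    return math.ceil(entries / ENTRIES_PER_TRIP)


# (move, round trips). Scrubs assume one round trip per touch.
moves = [
    ("open project bin, 2,400 items", listing_trips(2_400)),
    ("scrub hero clip, 12 touches", 12),
    ("open second bin, assumed 2,400 items", listing_trips(2_400)),
    ("scrub B-roll clip, 8 touches", 8),
    ("revisit hero clip, nothing kept", 12),
    ("name search, ~131 K entries, no index", listing_trips(131_000)),
]

total_trips = sum(trips for _, trips in moves)
total_wait_s = total_trips * RTT_S

for name, trips in moves:
    print(f"{name:40s} {trips:5d} trips  {trips * RTT_S:6.2f} s")
print(f"{'total':40s} {total_trips:5d} trips  {total_wait_s:6.2f} s")
```

The waiting printed here is only the latency term; the transfer time for the bytes comes on top of it.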

Why more bandwidth barely helps

The minute has two cost terms. The first is bytes divided by bandwidth: the share moves about 128 MB (the 80 MB of scrub data, plus the hero clip's 48 MB a second time on the revisit), call it 10 s at 100 Mbit/s. The second is trips times latency: the other ~28 s, pure waiting that happens one reply at a time, in sequence.

latency times chattiness

round trips × round-trip time = waiting

the studio · 1,390 × 0.5 ms ≈ 0.7 s · invisible inside a minute

home VPN · 1,390 × 20 ms ≈ 27.8 s · the whole problem

modeled June 2026 · same minute, same trip count

The waiting term, isolated: the same 1,390 trips that hide inside 0.7 s at the studio become 27.8 s through the VPN, before a single block of media moves. Transfer time for the bytes sits on top of both.

Upgrade the home connection to 1 Gbit/s and the first term drops to about a second. The second term does not move at all, because latency is set by distance and routing, not by the plan you pay for. That is why the upgraded line still feels broken: you bought a wider pipe for a problem that was never about width. The speed test measures the first term and tells you everything is fine. It is measuring the term that was already cheap.
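The two terms are easy to check by hand. The sketch below plugs the article's modeled figures (1,390 round trips, 20 ms, about 128 MB) into a hypothetical helper, `minute_cost`, and assumes decimal units (1 MB = 8 Mbit) and a line that sustains its rated speed.

```python
def minute_cost(trips: int, rtt_ms: float, megabytes: float, mbit_per_s: float):
    """Return (waiting_s, transfer_s) for one modeled minute.

    waiting_s  = round trips x round-trip time (the latency term)
    transfer_s = bytes / bandwidth (the throughput term)
    """
    waiting_s = trips * rtt_ms / 1000
    transfer_s = megabytes * 8 / mbit_per_s
    return waiting_s, transfer_s


for mbit_per_s in (100, 1000):
    waiting_s, transfer_s = minute_cost(1390, 20, 128, mbit_per_s)
    print(
        f"{mbit_per_s:5d} Mbit/s: waiting {waiting_s:5.1f} s"
        f" + transfer {transfer_s:5.2f} s = {waiting_s + transfer_s:5.1f} s"
    )
```

Going from 100 Mbit/s to 1 Gbit/s shrinks only the transfer column; the waiting column is the same number on both lines.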

The honest fix menu

Vendor-neutral, including the ones that don't involve anything we make.

  • fix 01 · proxies

    Cut small editing copies and work against those. Cheap, proven, and real relief for playback. The catch: proxies are not the real media, so conform and grade still touch the originals, and someone owns the proxy pipeline forever.

  • fix 02 · sync-ahead tools

    Dropbox-class sync makes everything local eventually, and local is fast. The architecture is whole-file: Dropbox's own help page says an online-only file will automatically download when you open it (checked June 2026), so opening a 100 GB clip to check one shot means moving 100 GB before the first frame. How the sync-class tools stack up against the streaming-class ones is mapped in the LucidLink alternatives guide.

  • fix 03 · SMB tuning

    Real and bounded. The usual levers are macOS's /etc/nsmb.conf knobs and SMB3 multichannel, and the gains vary by setup. Tuning can trim some chatter, but it cannot remove the protocol's need to round-trip for opens, listings, and locks, and it does nothing about the missing cache or the missing index.

  • fix 04 · a caching gateway

    The industry's convergent answer. Every storage SaaS in this category now ships a cache that runs on hardware the customer supplies (checked June 2026). LucidLink's TeamCache wants what their page calls a commodity on-prem server with SSD storage (Enterprise and quote-only); Suite's Onsite Caching is a listed plan feature pointed at your own SSD or NAS; iconik's Storage Gateway runs on your hardware, packaged even for TrueNAS; Shade's client keeps a block cache and offline pins on your disk. When LucidLink, Suite, iconik, and Shade all sell the same fix, the fix is probably correct: keep the hot blocks and the metadata near the editor.

    What the rent on the managed versions adds up to is a job for the rent-vs-own calculator, and what leaving costs when you want your bytes back is its own post.

whole file vs blocks · one clip, drawn to scale

modeled June 2026 · assumptions in the text

A 100 GB clip as a strip, and the part a three-second look actually needs: three ~4 MB blocks, about 12 MB, or 0.012% of the clip. At true scale the touched sliver would be thinner than the strip's outline, so the marker is drawn about 28 times too wide and the inset is a zoom, not a proportion. A whole-file sync tool moves the entire strip before the first frame; block streaming moves the inset. You can drag the playhead on the performance page and watch the blocks land.
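The proportions in the figure can be checked with a short calculation. This sketch uses the caption's numbers (a 100 GB clip, three blocks of about 4 MB) and additionally assumes a 100 Mbit/s line and decimal units; the constant names are illustrative.

```python
CLIP_BYTES = 100 * 10**9          # the 100 GB clip
BLOCK_BYTES = 4 * 10**6           # ~4 MB per block
BLOCKS_TOUCHED = 3                # what a three-second look needs
LINE_BITS_PER_S = 100 * 10**6     # assumption: a 100 Mbit/s home line


def transfer_seconds(n_bytes: int) -> float:
    """Time to move n_bytes at line rate, ignoring latency and overhead."""
    return n_bytes * 8 / LINE_BITS_PER_S


touched_bytes = BLOCKS_TOUCHED * BLOCK_BYTES
touched_percent = touched_bytes / CLIP_BYTES * 100

print(f"touched: {touched_bytes / 10**6:.0f} MB = {touched_percent:.3f}% of the clip")
print(f"whole-file sync: {transfer_seconds(CLIP_BYTES) / 60:.0f} min before the first frame")
print(f"block streaming: {transfer_seconds(touched_bytes):.2f} s for the touched blocks")
```

Under those assumptions the whole file takes over two hours to arrive, while the three blocks take about a second.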

What local metadata and a block cache change

This is the part where we talk about our own product, flagged as such. JuiceMount keeps file metadata in a local SQLite mirror and file blocks in a local SSD cache, so the chatty half of the protocol never crosses the wire at all. The numbers below are the author's measurements on his own rig (Apple-Silicon Mac to TrueNAS over 10 GbE), not independent benchmarks; the methodology is public.

The waits that replace the round trips

| Operation | Author-measured | Note |
| --- | --- | --- |
| Directory open, 100 K+-entry volume | 15–120 ms | the same opens take 3–10 s through the raw FUSE mount the cache sits in front of |
| Filename search, ~131 K entries | ~29 ms | |
| Cached revisit reads, off local SSD | 226–571 MB/s | |
| Un-pinned read refusal, offline mode | 4–67 ms | instead of a ~30 s NFS retry hang |

Feed those measurements into the same modeled minute and it comes out at the 20 round trips and roughly 7.3 s from earlier, and the honest accounting cuts both ways: the first-touch scrubs still pay the wire in full, once per block, exactly like the share does. The cache only wins on what it already holds. It turns out an editor's minute is mostly revisits, listings, and searches, which is precisely the part a cache and an index can hold.
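To make the index half of that concrete, here is a minimal sketch of a name search answered from a local SQLite table instead of a tree walk. The schema, paths, and function names are hypothetical and are not JuiceMount's actual layout; the in-memory database stands in for a mirror file on the laptop.

```python
import math
import sqlite3

LIBRARY_ENTRIES = 131_000  # roughly the library size used in the worked minute

# Hypothetical local metadata mirror: one row per file, indexed by name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE INDEX files_by_name ON files (name)")
conn.executemany(
    "INSERT INTO files (path, name) VALUES (?, ?)",
    (
        (f"/projects/reel_{i // 1000:03d}/clip_{i:06d}.mov", f"clip_{i:06d}.mov")
        for i in range(LIBRARY_ENTRIES)
    ),
)


def find_by_name(name: str) -> list[str]:
    """Look a filename up in the local index; no request leaves the machine."""
    rows = conn.execute("SELECT path FROM files WHERE name = ?", (name,))
    return [path for (path,) in rows]


hits = find_by_name("clip_104217.mov")
walk_trips = math.ceil(LIBRARY_ENTRIES / 100)  # plain-share model: 100 entries per trip

print(hits)
print(f"index lookup: 0 wire round trips; unindexed walk: {walk_trips} round trips")
```

The lookup is an index probe on local disk, which is why its cost is measured in milliseconds rather than in round trips.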

The physics nobody escapes

No mount, ours included, beats the line your ISP sells you. Away from the studio, every byte that actually moves arrives through that pipe at that pipe's speed, for every product in this category. The only winning move is to stop paying the wire twice for the same block: pay once on first touch, keep it on local SSD, and answer the listings and searches from an index that never left the laptop. That is the entire trick, and the remote wiring behind it is three Tailscale steps in the docs. Everything else is the speed test telling you the line is fine, and the line agreeing, and the share still spinning.
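The pay-once rule fits in a few lines. The sketch below is an illustration, not JuiceMount's implementation: `BlockCache` and `fake_remote_fetch` are hypothetical names, a dict stands in for the local SSD, and eviction is left out.

```python
from typing import Callable


class BlockCache:
    """Read-through cache: a block crosses the wire only on first touch."""

    def __init__(self, fetch_block: Callable[[str, int], bytes]) -> None:
        self._fetch_block = fetch_block              # the remote (wire) read
        self._blocks: dict[tuple[str, int], bytes] = {}
        self.wire_fetches = 0

    def read_block(self, path: str, index: int) -> bytes:
        key = (path, index)
        if key not in self._blocks:                  # first touch: pay the wire
            self._blocks[key] = self._fetch_block(path, index)
            self.wire_fetches += 1
        return self._blocks[key]                     # revisit: served locally


def fake_remote_fetch(path: str, index: int) -> bytes:
    """Stand-in for a remote read; returns a recognizable payload."""
    return f"{path}#{index}".encode()


cache = BlockCache(fake_remote_fetch)

for i in range(12):                                  # first scrub of the hero clip
    cache.read_block("hero.mov", i)
first_touch_fetches = cache.wire_fetches

for i in range(12):                                  # coming back to the same clip
    cache.read_block("hero.mov", i)
revisit_fetches = cache.wire_fetches - first_touch_fetches

print(f"first touch: {first_touch_fetches} wire fetches, revisit: {revisit_fetches}")
```

The first scrub costs exactly what it would on a plain share; only the revisit is free.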

next step

Drive the worked minute yourself: the performance page's interactive model lets you pick studio, home VPN, or on-the-road profiles and watch the round trips land. If the remote setup is the part you're missing, the docs cover it as three Tailscale steps.

Think the model is unfair to SMB? Discuss on GitHub.

Where these numbers come from