
February 28, 2026

Narration Forge: adding audiobook playback to ProseForge

I built Narration Forge for a simple reason: I wanted something to listen to on my bike rides. That turned into a real ProseForge feature, a practical TTS bake-off between Orpheus, Qwen3-TTS, and Kokoro, and a clear winner that was fast, light, and actually shippable.

Inside ProseForge, that meant audiobook-style playback for stories, and it raised a practical engineering question: which text-to-speech stack could actually ship without turning into an infrastructure project?

I ended up evaluating three paths:

  • Orpheus 3B
  • Qwen3-TTS 0.6B
  • Kokoro 82M

In the end, Kokoro won. Not because the others were useless, but because it was the best balance of speed, memory footprint, voice variety, and deployability.

If ProseForge helps turn ideas into stories, Narration Forge helps turn those stories into something you can listen to while walking, commuting, or biking.

That matters to me personally, but it also fits the product. Stories should be easy to revisit in different forms.


Why this feature exists

ProseForge already helps with writing, iteration, and story flow. Narration Forge extends that into playback, so a story is not just something you read on a screen — it becomes something you can listen to while moving through the day.

That was the original motivation. Once I started building it, it became obvious that this was more than a personal convenience. It was a natural extension of the platform.


The TTS journey

Orpheus: impressive quality, painful deployment

Orpheus sounded great locally. It had strong conversational quality and distinct voices, but the deployment path was rough.

The custom RunPod image ballooned to about 52 GB. The deployment then crash-looped, and the image was ultimately deleted.

Locally, Orpheus also carried a heavy memory cost at about 5–6 GB RSS.

Qwen3-TTS: cool features, heavier tradeoffs

Qwen3-TTS was the most interesting from a feature perspective. It supports zero-shot voice cloning and a natural-language instruction parameter for style control, which opens up a lot of creative possibilities.

But the tradeoff showed up in runtime characteristics. Locally it was about 1.2 GB on disk, around 2.3 GB RSS, and generally slower than Kokoro, especially when using style instructions.

Kokoro: the one that actually shipped

Kokoro was the practical winner. It is much smaller, much lighter, and much easier to deploy.

The local footprint was roughly:

  • Model size: ~352 MB
  • Memory (RSS): ~607 MB
  • Speed: about 3x faster than real-time on warm CPU requests
  • Voices: 54
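To make the speed and memory numbers concrete: "3x faster than real-time" is usually expressed as a real-time factor (RTF), which is synthesis wall time divided by the duration of the audio produced. An RTF of about 0.33 corresponds to ~3x, so a 30-minute chapter would take roughly 10 minutes to synthesize. The post doesn't say how these figures were measured, so treat this as a minimal sketch of one way to measure RTF and peak RSS on warm requests. `synthesize` is a hypothetical stand-in for whichever TTS call you benchmark, assumed to return `(samples, sample_rate)`, and the `resource` module is Unix-only.

```python
import resource
import sys
import time
from typing import Callable, Sequence, Tuple

# Hypothetical signature: any TTS call that returns (samples, sample_rate).
Synthesizer = Callable[[str], Tuple[Sequence[float], int]]


def peak_rss_mb() -> float:
    """Peak resident set size of this process so far, in MB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def real_time_factor(synth_seconds: float, num_samples: int, sample_rate: int) -> float:
    """Synthesis time / audio duration. Below 1.0 means faster than real-time."""
    audio_seconds = num_samples / sample_rate
    return synth_seconds / audio_seconds


def benchmark(synthesize: Synthesizer, text: str, warmup_runs: int = 1) -> dict:
    """Time one warm request and report RTF, speedup, and peak RSS."""
    for _ in range(warmup_runs):
        synthesize(text)  # absorb one-time costs such as lazy loading
    start = time.perf_counter()
    samples, sample_rate = synthesize(text)
    elapsed = time.perf_counter() - start
    rtf = real_time_factor(elapsed, len(samples), sample_rate)
    return {"rtf": rtf, "speedup": 1.0 / rtf, "peak_rss_mb": peak_rss_mb()}


if __name__ == "__main__":
    def fake_synthesize(text: str) -> Tuple[Sequence[float], int]:
        # Hypothetical stand-in model: ~10 ms to produce 1 s of silence at 24 kHz.
        time.sleep(0.01)
        return [0.0] * 24_000, 24_000

    print(benchmark(fake_synthesize, "Once upon a time..."))
    # Worked example: 10 minutes of synthesis for a 30-minute chapter.
    print(real_time_factor(10 * 60, 30 * 60 * 24_000, 24_000))
```

The warm-up call mirrors the "warm CPU requests" framing: the first request often pays one-time costs that would otherwise inflate the measured RTF.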

On the deployment side, the difference was even clearer. The kokoro-onnx path produced a Docker image of only ~2–3 GB, compared with Orpheus at ~52 GB, and it was the one that actually shipped cleanly.


Smiley’s breakdown, because the stats are hot

A big part of why Kokoro won was the operational profile. Smiley put together a breakdown that made the choice pretty obvious:

Kokoro vs Qwen3 TTS comparison

Model | Download size | Runtime memory | Local speed | Voices | Deployment outcome
Orpheus 3B | ~2.4 GB | ~5–6 GB RSS | ~real-time | 8 | 52 GB image, crash-looped
Qwen3-TTS 0.6B | ~1.2 GB | ~2.3 GB RSS | slower, especially with style controls | 9 | good locally, heavier
Kokoro 82M | ~352 MB | ~607 MB RSS | ~3x faster than real-time | 54 | ~2–3 GB image, shipped

That is the kind of table that makes engineering decisions easier. Audio quality matters, but when one option is fast, light, and live, that counts for a lot.
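The post doesn't describe a formal scoring rule, and the decision also weighed audio quality and deployability, which don't reduce to a single number. Still, the "fast, light, and live" reasoning can be sketched as filter-then-rank over the table's own figures. The memory budget is a hypothetical deployment constraint, `pick` is an illustrative helper rather than anything in ProseForge, and Orpheus's RSS uses the 6 GB upper end of its measured range.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TTSOption:
    name: str
    runtime_rss_gb: float  # measured local RSS (upper end where a range was given)
    voices: int


# Figures taken from the comparison table; no new measurements.
OPTIONS: List[TTSOption] = [
    TTSOption("Orpheus 3B", 6.0, 8),
    TTSOption("Qwen3-TTS 0.6B", 2.3, 9),
    TTSOption("Kokoro 82M", 0.607, 54),
]


def pick(options: List[TTSOption], rss_budget_gb: float) -> Optional[TTSOption]:
    """Hypothetical rule: keep options within a memory budget, prefer the
    lightest, and break ties by voice count."""
    fits = [o for o in options if o.runtime_rss_gb <= rss_budget_gb]
    if not fits:
        return None
    return min(fits, key=lambda o: (o.runtime_rss_gb, -o.voices))


if __name__ == "__main__":
    for budget_gb in (0.5, 1.0, 4.0, 8.0):
        choice = pick(OPTIONS, budget_gb)
        print(budget_gb, choice.name if choice else "nothing fits")
```

Under any budget Kokoro fits into, it wins on memory alone. A fuller rubric would add speed and a go/no-go check on deployment, which is where Orpheus's 52 GB image fell out.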


What’s next

Narration Forge is landing as part of ProseForge’s broader writing workflow. The immediate goal is simple: turn stories into something you can listen to, not just write.

From there, I want to keep improving:

  • playback polish
  • chapter navigation
  • review flow integration
  • better author-facing controls for narration
  • more refinement around generation speed and quality

This one started because I wanted stories on my bike rides.

Now it is a real ProseForge feature.


Closing thought

Sometimes the best feature ideas are not abstract roadmap exercises. Sometimes they come from wanting a thing badly enough in your own life that you finally build it.

Narration Forge started there.

And now it is becoming part of ProseForge.


Sneak preview

Here’s another early look at the Android player while Narration Forge is still in development:

Narration Forge Android player - teaser

This is still evolving, but it is already far enough along that it feels real.


Thanks to Smiley, who handles DevOps for ProseForge, for helping break down the deployment and runtime tradeoffs across the TTS options.