For the curious. Your AI agent declares cache blocks when a workflow has multiple LLM calls sharing context. This page explains what those declarations do and why they reduce cost.
## Two cache layers
pflow has two independent cache layers. Don't confuse them; they solve different problems.

| Layer | What it is | How to opt out |
|---|---|---|
| Memo cache | pflow's local re-execution cache. Skips a node entirely if its inputs match a prior run. | `cache: false` per node, or `--no-cache` on `pflow run` |
| Provider prompt cache | Anthropic / OpenAI / Gemini server-side caching of the system prompt prefix. Reduces input cost; the LLM still runs. | Don't declare `## Cache` / `prompt_cache:` |
`--no-cache` only disables the memo layer. Provider prompt caching still fires when declared.
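As a concrete sketch, opting a single node out of the memo layer might look like this (the field layout is an assumption for illustration; only `cache: false` and `--no-cache` are documented on this page):

```yaml
# Hypothetical node snippet; only `cache: false` is from this page.
type: llm
cache: false   # skip the memo layer for this node; provider prompt caching is unaffected
prompt: |
  Summarize the latest run log.
```

To disable the memo layer for an entire run instead, pass the documented flag: `pflow run --no-cache`.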
## Declaring shared context
The `## Cache` block sits alongside `## Inputs` and `## Steps` in a workflow file. It lists stable values that flow into multiple LLM calls: workflow inputs and upstream node outputs.
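A minimal sketch of what such a block might look like (the labels and variable names here are invented for illustration; exact syntax may differ):

```markdown
## Cache

The style guide every reviewer step must follow:
${inputs.style_guide}

The full source file under review:
${steps.fetch_source.output}
```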
Each chunk pairs a line of prose with a `${var}` reference. The prose travels into the cached system prefix verbatim; what you write is what the LLM sees as the cache label.
LLM nodes opt in via `prompt_cache:`, listing chunks in the same order they appear in the `## Cache` block:
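For example (a hypothetical node config continuing the sketch above; the exact field shape is an assumption):

```yaml
type: llm
prompt_cache:
  - style_guide    # chunks listed in ## Cache order
  - source_file
prompt: |
  Review the diff against the style guide above.
```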
## TTL
| Value | Meaning | When to use |
|---|---|---|
| `5m` (default) | Provider's standard cache duration | Most workflows; matches typical run time |
| `1h` | Anthropic 1-hour cache, Gemini 3600s, OpenAI 24h retention | Long-running workflows or reruns within an hour |
`1h` writes cost roughly 2× the standard write rate on Anthropic, so it pays off when the cached prefix gets read at least 3 times within the hour.
## Minimum tokens
Provider caches only fire above a minimum token threshold:

| Provider | Threshold (tokens) |
|---|---|
| Anthropic Sonnet 4.5, Opus 4.1, Sonnet 3.7 | 1024 |
| Anthropic Sonnet 4.6, Haiku 3.5 | 2048 |
| Anthropic Opus 4.5+, Haiku 4.5 | 4096 |
| Gemini Flash | 1024 |
| OpenAI auto-cache | 1024 |
`pflow analyze-cache` warns when a declared subset is below threshold and suggests including more chunks to cross it.
## Batch prefix caching
When a batch node fans out N parallel LLM calls that share a stable prefix (e.g., the same prompt template with `${item.X}` substituted), pflow can cache that prefix automatically. This requires `prewarm: true` on the batch node:
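A hypothetical batch node (field names other than `prewarm:` are assumptions for illustration):

```yaml
type: batch
over: ${steps.list_files.output}   # fan out one LLM call per item
prewarm: true                      # write the cache once before fanning out
prompt: |
  Summarize this file for the changelog: ${item.path}
```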
With `prewarm: true`, pflow runs the first item synchronously to write the cache, then fans out the remaining N-1 calls in parallel as cache reads. Without `prewarm:`, all N calls write the cache simultaneously, paying the write cost N times for no read benefit.
`pflow run --dry-run` and `pflow analyze-cache` recommend `prewarm: true` when the savings ratio crosses 5%. The decision is the agent's: `prewarm: true` adds one call's latency to the batch in exchange for a ~5-10× cost reduction on the remaining items.
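As a rough sanity check on that ratio, assume Anthropic's published multipliers (5-minute cache write ≈ 1.25× the base input rate, cache read ≈ 0.1×; these figures come from provider pricing, not this page). For a batch of N = 20 items, writing the prefix N times costs 20 × 1.25 = 25 base-token units, while prewarming costs 1.25 + 19 × 0.1 ≈ 3.2, roughly an 8× reduction on the shared prefix.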
## Sub-workflows
Each `.pflow.md` file declares its own `## Cache` block, scoped to its own inputs and step outputs. Sub-workflows do not inherit the parent's cache block; they declare independently so they can run standalone with caching.
When a parent passes a value into a child workflow, both files can cache it independently. If the rendered prose labels are byte-identical across the boundary, the provider's cache fires across files (incidental, not orchestrated). `pflow analyze-cache` warns when prose labels diverge for the same logical value.
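To illustrate the byte-identical requirement, parent and child might each declare the same chunk word for word (the label and variable name are invented here):

```markdown
## Cache

The coding standards document every step must follow:
${inputs.standards}
```

If the child phrased its label even slightly differently, the rendered prefixes would differ and the provider would treat them as separate cache entries.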
## Discovering opportunities
`pflow analyze-cache` is the entry point for finding savings:
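For example (the file-argument shape is an assumption; `--from-trace` is documented below):

```bash
pflow analyze-cache my-workflow.pflow.md   # predict savings from the workflow file
pflow analyze-cache --from-trace           # compare actual vs. predicted ratios after a run
```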
It flags sub-threshold declarations, `prewarm:` opportunities, and diverging prose labels, and can propose a `## Cache` block for greenfield workflows. See the analyze-cache reference.
`pflow run --dry-run` emits a one-line nudge when actionable opportunities exist; it stays silent on optimal plans.
## What changes when caching is declared
- The system message your LLM call receives starts with the rendered cache content (prose + values), followed by your `prompt:`.
- pflow's memo cache key includes the rendered cache content, so changing a cached chunk's value invalidates memo entries correctly.
- Trace files record cache token counts (`cache_creation_input_tokens`, `cache_read_input_tokens`) so you can see actual vs. predicted ratios via `pflow analyze-cache --from-trace`.

