Agent commands. Your AI agent uses this node in workflows. You don’t configure it directly.
The LLM node calls AI models using LiteLLM. Use it when a step needs reasoning — summarization, classification, extraction, anything a Python expression can’t handle. It supports many providers (OpenAI, Anthropic, Google, OpenRouter, Ollama, and 100+ more) through a unified interface.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| prompt | str | Yes | - | Text prompt, or path to an external file (e.g., ./prompts/system.md) |
| model | str | No | See below | Model identifier |
| system | str | No | - | System prompt for behavior guidance |
| temperature | float | No | 1.0 | Sampling temperature (0.0-2.0) |
| max_tokens | int | No | - | Response-length ceiling. On reasoning models it caps thinking depth but never raises it; see Reasoning depth |
| reasoning_effort | str | No | - | xhigh/high/medium/low/minimal/none: how hard the model thinks |
| reasoning_max_tokens | int | No | - | Explicit thinking-token budget (mutually exclusive with reasoning_effort) |
| images | list | No | [] | Image URLs or file paths for vision models |
| output_schema | dict | No | - | JSON Schema for structured output |
| prompt_cache | list | No | [] | Names of ## Cache chunks to include as a cached system prefix. See Prompt caching |
| prewarm | bool | No | false | On batch nodes: make one short LLM call before the batch dispatches to warm up the provider’s cache, so every item runs at cache-read prices. Warms both declared ## Cache content (when prompt_cache: is set) and the fixed portion of the prompt template (the text before each ${item.X} reference). |
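To make the "fixed portion" of a batch prompt concrete: only the text that is identical for every item can be warmed. The Python sketch below reads that as the text up to the first per-item reference. `fixed_prefix` is a hypothetical helper for illustration, not pflow's implementation.

```python
import re

# Hypothetical helper, not part of pflow: finds the part of a prompt template
# that is identical for every batch item.
ITEM_REF = re.compile(r"\$\{item(\.[^}]*)?\}")

def fixed_prefix(template: str) -> str:
    """Return the text before the first ${item...} reference."""
    match = ITEM_REF.search(template)
    return template if match is None else template[: match.start()]

template = "You are a reviewer. Rate this ticket:\n${item.body}\nAuthor: ${item.author}"
print(repr(fixed_prefix(template)))
```

Under this reading, putting the long, stable instructions first and the per-item fields last gives prewarm the largest prefix to work with.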
Model resolution
If model is not specified in workflow params, pflow auto-detects based on your configured API keys.
Most users just need an API key:
pflow settings set-env OPENAI_API_KEY "sk-..."
See LLM model settings for the full resolution order and default models per provider.
Reasoning depth
reasoning_effort controls how hard a reasoning model thinks. max_tokens controls how long the response may get. They’re separate dials, and pflow keeps them that way: effort sets the thinking budget, and max_tokens only ever caps it — raising max_tokens to leave room for a longer answer never inflates reasoning spend.
This matters because Anthropic counts thinking and answer against one max_tokens pool and rejects any request where the thinking budget isn’t strictly smaller than max_tokens. pflow always derives the budget to sit under max_tokens, so that rejection can’t happen — whether you set reasoning_effort or an explicit reasoning_max_tokens. If you set both reasoning_max_tokens and a smaller max_tokens, the budget is capped to fit (the explicit budget is a request, not a guarantee).
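The capping rule above can be sketched in a few lines of Python. This is an illustration of the idea only: `fit_thinking_budget` and its `answer_reserve` parameter are assumptions, and pflow's real derivation may reserve more room for the answer.

```python
def fit_thinking_budget(requested_budget: int, max_tokens: int, answer_reserve: int = 1) -> int:
    """Clamp a requested thinking budget so it stays strictly below max_tokens.

    answer_reserve is a hypothetical knob: the number of tokens kept free
    for the visible answer (at least 1, so the result is always < max_tokens).
    """
    if answer_reserve < 1:
        raise ValueError("answer_reserve must be at least 1")
    if max_tokens <= answer_reserve:
        raise ValueError("max_tokens leaves no room for a thinking budget")
    ceiling = max_tokens - answer_reserve
    # The explicit budget is a request: it is honored only up to the ceiling.
    return min(requested_budget, ceiling)

print(fit_thinking_budget(8000, 4096))  # request larger than max_tokens: capped
print(fit_thinking_budget(2000, 4096))  # request already fits: unchanged
```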
One consequence on reasoning models: if you omit max_tokens, the provider may cap the visible answer low (LiteLLM defaults it to the thinking budget plus ~4096). Set max_tokens explicitly when you need a long answer from a reasoning model.
The same reasoning_effort value maps to provider-specific knobs under the hood — Anthropic and Gemini 2.5 get a token budget, OpenAI and Gemini 3 get their native effort/level. Models without a reasoning knob ignore the parameter.
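As a rough picture of that dispatch, here is a Python sketch. The family names and every token number are placeholders invented for the example; they are not the values pflow or LiteLLM use.

```python
# Illustrative only: family names and budget numbers are assumptions,
# not the tables pflow or LiteLLM actually use.
BUDGET_FAMILIES = {"anthropic", "gemini-2.5"}   # take a thinking-token budget
NATIVE_FAMILIES = {"openai", "gemini-3"}        # take a native effort/level value
EFFORT_TO_BUDGET = {                            # hypothetical token budgets
    "none": 0, "minimal": 1024, "low": 2048,
    "medium": 8192, "high": 16384, "xhigh": 32768,
}

def translate_effort(family: str, effort: str) -> dict:
    """Turn one reasoning_effort value into a provider-shaped setting."""
    if effort not in EFFORT_TO_BUDGET:
        raise ValueError(f"unknown reasoning_effort: {effort!r}")
    if family in BUDGET_FAMILIES:
        return {"kind": "budget", "tokens": EFFORT_TO_BUDGET[effort]}
    if family in NATIVE_FAMILIES:
        return {"kind": "native", "level": effort}
    return {}  # no reasoning knob: the parameter is ignored

print(translate_effort("anthropic", "medium"))
print(translate_effort("openai", "medium"))
print(translate_effort("llama", "medium"))
```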
Output
| Key | Type | Description |
|---|---|---|
| response | str or dict | Text response (str), or parsed JSON (dict) when output_schema is set |
| llm_usage | dict | Token usage metrics |
| error | str | Error message (only present on failure) |
Token usage structure
{
"model": "openai/gpt-5.2",
"input_tokens": 150,
"uncached_input_tokens": 150,
"output_tokens": 89,
"total_tokens": 239,
"cache_creation_input_tokens": 0,
"cache_read_input_tokens": 0,
"input_token_accounting": "total_includes_cache"
}
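If you post-process llm_usage yourself, a small helper can show how much of the input was served from cache. The Python sketch below assumes that "total_includes_cache" means input_tokens already counts cached tokens. That reading is inferred from the field name, so check it against your own runs.

```python
def cache_read_ratio(llm_usage: dict) -> float:
    """Fraction of input tokens that were read from the provider cache.

    Assumes input_tokens is the total, including cached tokens.
    """
    total = llm_usage.get("input_tokens", 0)
    if total == 0:
        return 0.0
    return llm_usage.get("cache_read_input_tokens", 0) / total

usage = {
    "model": "openai/gpt-5.2",
    "input_tokens": 150,
    "uncached_input_tokens": 150,
    "output_tokens": 89,
    "total_tokens": 239,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
    "input_token_accounting": "total_includes_cache",
}
print(cache_read_ratio(usage))
```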
Model support
Always include the provider prefix in the model: field. Bare names route inconsistently (Gemini bare names try Vertex; OpenAI bare names usually work but aren’t future-proof).
These providers are included with pflow; just set your API key:
| Provider | Example models |
|---|---|
| OpenAI | openai/gpt-5.2, openai/gpt-5.1, openai/gpt-4o |
| Anthropic | anthropic/claude-opus-4-5, anthropic/claude-sonnet-4-5, anthropic/claude-haiku-4-5 |
| Google | gemini/gemini-3.0-pro, gemini/gemini-2.5-flash |
# Set API keys (stored in ~/.pflow/settings.json)
pflow settings set-env OPENAI_API_KEY "sk-..."
pflow settings set-env ANTHROPIC_API_KEY "sk-ant-..."
pflow settings set-env GEMINI_API_KEY "..."
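If you generate workflows programmatically, you can catch bare model names before they reach the node. The Python sketch below is a hypothetical pre-flight check, not something pflow ships. It only verifies that a prefix is present, not that the provider exists.

```python
def split_model_id(model: str) -> tuple[str, str]:
    """Split 'provider/name' at the first slash; reject bare names."""
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"model {model!r} has no provider prefix, e.g. 'openai/gpt-4o'")
    return provider, name

print(split_model_id("openai/gpt-5.2"))
# Nested routes keep everything after the first slash as the model name.
print(split_model_id("openrouter/anthropic/claude-sonnet-4-5"))
```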
Other providers
LiteLLM is built into pflow and recognizes 100+ providers natively — no plugin install needed. Set the appropriate API key (or omit it for Ollama) and reference the model with its provider prefix.
OpenRouter
pflow settings set-env OPENROUTER_API_KEY "sk-or-..."
### summarize
Summarize content using OpenRouter.
- type: llm
- model: openrouter/anthropic/claude-sonnet-4-5
- prompt: Summarize this
Ollama (local models)
brew install ollama
ollama serve
ollama pull llama3.2
### summarize
Summarize content using a local model.
- type: llm
- model: ollama/llama3.2
- prompt: Summarize this
See the LiteLLM provider list for the full set of supported providers (Mistral, Bedrock, Azure OpenAI, Vertex AI, vLLM, and more).
Image support
For vision-capable models, pass image URLs or local file paths:
### describe
Describe the contents of a photo.
- type: llm
- prompt: What's in this image?
- model: openai/gpt-5.2
- images: ["photo.jpg"]
Supported formats: JPEG, PNG, GIF, WebP, PDF
Images can be:
- Local file paths: photo.jpg, /path/to/image.png
- URLs: https://example.com/image.jpg
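To check an images list before running a workflow, something like the following Python sketch can help. It assumes the format can be judged from the file extension, which is a simplification: the node itself may inspect content instead, and URLs without an extension would be rejected here. `classify_image` is a hypothetical helper.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions for the supported formats listed above (assumed mapping).
SUPPORTED_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".pdf"}

def classify_image(entry: str) -> str:
    """Return 'url' or 'path' for an images entry; reject unsupported extensions."""
    parsed = urlparse(entry)
    is_url = parsed.scheme in ("http", "https")
    # For URLs, look only at the path component so query strings are ignored.
    suffix = PurePosixPath(parsed.path if is_url else entry).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported image format: {entry!r}")
    return "url" if is_url else "path"

for entry in ["photo.jpg", "/path/to/image.png", "https://example.com/image.jpg"]:
    print(entry, "->", classify_image(entry))
```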
Examples
Basic prompt
### summarize
Summarize the content from the previous step.
- type: llm
- prompt: Summarize: ${read.content}
- model: openai/gpt-4o-mini
With system prompt
### translate
Translate the input text to Spanish.
- type: llm
- system: You are a translator. Respond only with the translation.
- prompt: Translate to Spanish: ${input.text}
- temperature: 0.3
Structured output
Use output_schema to get guaranteed JSON matching a schema. The schema is passed to the model’s constrained decoding API, so the model cannot produce non-conforming output.
### extract
Extract named entities from the document.
- type: llm
- prompt: Extract entities from: ${document.content}
- temperature: 0
```yaml output_schema
type: object
properties:
  people:
    type: array
    items:
      type: string
  places:
    type: array
    items:
      type: string
required:
  - people
  - places
```
When output_schema is set, response is a dict — downstream templates access fields directly: ${extract.response.people}.
Without output_schema, you can still get JSON by prompting for it. The template system auto-parses JSON strings when you use dot notation: ${extract.response.people}. But the model may not always comply — output_schema is the reliable approach.
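The difference between the two cases can be pictured with a small Python sketch of dot-notation lookup. It is an illustration, not pflow's template engine; `resolve` is a hypothetical function. With output_schema the response is already a dict. Without it, the string has to parse as JSON first, and that is the step that fails when the model does not comply.

```python
import json

def resolve(path: str, outputs: dict):
    """Walk a dotted path such as 'extract.response.people' through node outputs."""
    value = outputs
    for key in path.split("."):
        if isinstance(value, str):
            # Auto-parse step: raises json.JSONDecodeError (a ValueError)
            # when the model returned something that is not valid JSON.
            value = json.loads(value)
        value = value[key]
    return value

structured = {"extract": {"response": {"people": ["Ada"], "places": ["London"]}}}
prompted = {"extract": {"response": '{"people": ["Ada"], "places": ["London"]}'}}
print(resolve("extract.response.people", structured))
print(resolve("extract.response.people", prompted))
```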
Image analysis
### analyze
Analyze the contents of a user-provided image.
- type: llm
- prompt: Describe the main elements in this image
- model: openai/gpt-5.2
- images: ["${file_path}"]
External prompt file
For long or reusable prompts, reference an external file instead of inlining. The file path is relative to the workflow file. Template variables (${var}) inside the file are resolved normally.
### analyze
Analyze source code for issues.
- type: llm
- prompt: ./prompts/code-review.md
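"Relative to the workflow file" means the path is joined to the workflow's directory, not to the directory you run pflow from. A minimal Python sketch of that rule follows. The workflow location /work/flows/review.pflow.md is a made-up example and `resolve_prompt_path` is hypothetical.

```python
from pathlib import PurePosixPath

def resolve_prompt_path(workflow_file: str, prompt_ref: str) -> PurePosixPath:
    """Join a prompt reference to the directory that contains the workflow file."""
    return PurePosixPath(workflow_file).parent / prompt_ref

# Hypothetical workflow location, for illustration only.
print(resolve_prompt_path("/work/flows/review.pflow.md", "./prompts/code-review.md"))
```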
Error handling
| Error | Cause | Solution |
|---|---|---|
| Unknown model | Model ID not recognized | Run pflow settings llm show to see configured models, or check the LiteLLM provider list for supported model strings |
| API key required | Missing credentials | Set with pflow settings set-env <PROVIDER>_API_KEY <value> or export <PROVIDER>_API_KEY=... |
| Rate limit | Too many requests | No action needed; the node waits and retries automatically |
The node retries transient failures automatically (3 attempts, 1 second wait).
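If you wrap your own calls the same way, the policy looks roughly like this in Python. This is a sketch, not the node's code. It assumes "3 attempts" means three calls in total, and it retries on any exception, whereas the node retries only transient failures. The `sleep` parameter exists so the wait can be stubbed out.

```python
import time

def call_with_retry(call, attempts: int = 3, wait_seconds: float = 1.0, sleep=time.sleep):
    """Run call() up to `attempts` times, pausing wait_seconds between tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Exception as error:  # simplification: a real node filters for transient errors
            last_error = error
            if attempt < attempts:
                sleep(wait_seconds)  # no wait after the final failed attempt
    raise last_error

# Usage: result = call_with_retry(lambda: some_llm_call())  # some_llm_call is a placeholder
```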