Agent commands. Your AI agent uses this node in workflows. You don’t configure it directly.
The LLM node calls AI models using LiteLLM. Use it when a step needs reasoning — summarization, classification, extraction, anything a Python expression can’t handle. It supports many providers (OpenAI, Anthropic, Google, OpenRouter, Ollama, and 100+ more) through a unified interface.

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| prompt | str | Yes | - | Text prompt, or path to an external file (e.g., ./prompts/system.md) |
| model | str | No | See below | Model identifier |
| system | str | No | - | System prompt for behavior guidance |
| temperature | float | No | 1.0 | Sampling temperature (0.0-2.0) |
| max_tokens | int | No | - | Response-length ceiling. On reasoning models it caps thinking depth but never raises it (see Reasoning depth) |
| reasoning_effort | str | No | - | xhigh/high/medium/low/minimal/none: how hard the model thinks |
| reasoning_max_tokens | int | No | - | Explicit thinking-token budget (mutually exclusive with reasoning_effort) |
| images | list | No | [] | Image URLs or file paths for vision models |
| output_schema | dict | No | - | JSON Schema for structured output |
| prompt_cache | list | No | [] | Names of ## Cache chunks to include as a cached system prefix. See Prompt caching |
| prewarm | bool | No | false | On batch nodes: make one short LLM call before the batch dispatches to warm up the provider’s cache, so every item runs at cache-read prices. Warms both declared ## Cache content (when prompt_cache: is set) and the fixed portion of the prompt template (the text before each ${item.X} reference). |
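To make "the fixed portion of the prompt template" concrete, here is a minimal sketch of how that prefix could be located. It assumes the fixed portion is everything before the first ${item...} reference, since provider caches match on a shared prefix. The helper name fixed_prefix is hypothetical and is not part of pflow.

```python
import re

# Hypothetical helper, not pflow's implementation: finds the part of a
# prompt template that is identical for every batch item.
ITEM_REF = re.compile(r"\$\{item\b")


def fixed_prefix(template: str) -> str:
    """Return the text before the first ${item...} reference."""
    match = ITEM_REF.search(template)
    return template if match is None else template[: match.start()]


template = (
    "You are a strict reviewer.\n\n"
    "Review this file: ${item.path}\n${item.content}"
)
prefix = fixed_prefix(template)
print(repr(prefix))
```

Under this assumption, putting stable instructions first and per-item values last gives the longest shareable prefix.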

Model resolution

If model is not specified in workflow params, pflow auto-detects based on your configured API keys. Most users just need an API key:
pflow settings set-env OPENAI_API_KEY "sk-..."
See LLM model settings for the full resolution order and default models per provider.

Reasoning depth

reasoning_effort controls how hard a reasoning model thinks. max_tokens controls how long the response may get. They’re separate dials, and pflow keeps them that way: effort sets the thinking budget, and max_tokens only ever caps it. Raising max_tokens to leave room for a longer answer never inflates reasoning spend.

This matters because Anthropic counts thinking and answer against one max_tokens pool and rejects any request where the thinking budget isn’t strictly smaller than max_tokens. pflow always derives the budget to sit under max_tokens, so that rejection can’t happen, whether you set reasoning_effort or an explicit reasoning_max_tokens. If you set both reasoning_max_tokens and a smaller max_tokens, the budget is capped to fit (the explicit budget is a request, not a guarantee).

One consequence on reasoning models: if you omit max_tokens, the provider may cap the visible answer low (LiteLLM defaults it to the thinking budget plus ~4096). Set max_tokens explicitly when you need a long answer from a reasoning model.

The same reasoning_effort value maps to provider-specific knobs under the hood: Anthropic and Gemini 2.5 get a token budget, while OpenAI and Gemini 3 get their native effort/level. Models without a reasoning knob ignore the parameter.
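The capping rule can be sketched in a few lines. This is an illustration of the invariant only (thinking budget strictly smaller than max_tokens), not pflow's actual derivation. The function name cap_thinking_budget is hypothetical, and provider-specific limits are ignored.

```python
from typing import Optional


def cap_thinking_budget(requested_budget: int, max_tokens: Optional[int]) -> int:
    """Keep the thinking budget strictly below max_tokens.

    Hypothetical helper that illustrates the invariant described above;
    the real derivation inside pflow may differ.
    """
    if max_tokens is None:
        return requested_budget  # no ceiling given, nothing to cap against
    if max_tokens < 2:
        raise ValueError("max_tokens must leave room for at least one thinking token")
    # Never raise the budget; only shrink it so it fits under max_tokens.
    return min(requested_budget, max_tokens - 1)


print(cap_thinking_budget(8000, 4096))   # explicit budget larger than the ceiling
print(cap_thinking_budget(2000, 16000))  # larger ceiling leaves the budget unchanged
```

The second call shows the "separate dials" property: a generous max_tokens leaves the requested budget untouched.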

Output

| Key | Type | Description |
|---|---|---|
| response | str or dict | Text response (str), or parsed JSON (dict) when output_schema is set |
| llm_usage | dict | Token usage metrics |
| error | str | Error message (only present on failure) |

Token usage structure

{
  "model": "openai/gpt-5.2",
  "input_tokens": 150,
  "uncached_input_tokens": 150,
  "output_tokens": 89,
  "total_tokens": 239,
  "cache_creation_input_tokens": 0,
  "cache_read_input_tokens": 0,
  "input_token_accounting": "total_includes_cache"
}
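A short sketch of how a consumer might read these fields. It assumes that "total_includes_cache" means input_tokens already contains the cached tokens, so the uncached, cache-creation, and cache-read counts sum to it. That reading is an assumption drawn from the example values, and both helper names are hypothetical.

```python
def parts_match_total(usage: dict) -> bool:
    """Check the assumed identity: uncached + cache writes + cache reads == input."""
    parts = (
        usage["uncached_input_tokens"]
        + usage["cache_creation_input_tokens"]
        + usage["cache_read_input_tokens"]
    )
    return parts == usage["input_tokens"]


def cache_read_fraction(usage: dict) -> float:
    """Share of input tokens served from the provider cache (0.0 when no input)."""
    total = usage["input_tokens"]
    return usage["cache_read_input_tokens"] / total if total else 0.0


# The example structure from this page (no caching involved).
usage = {
    "model": "openai/gpt-5.2",
    "input_tokens": 150,
    "uncached_input_tokens": 150,
    "output_tokens": 89,
    "total_tokens": 239,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
    "input_token_accounting": "total_includes_cache",
}
print(parts_match_total(usage), cache_read_fraction(usage))
```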

Model support

These providers are included with pflow. Just set your API key.

Always include the provider prefix in the model: field. Bare names route inconsistently (Gemini bare names try Vertex; OpenAI bare names usually work but aren’t future-proof).

| Provider | Example models |
|---|---|
| OpenAI | openai/gpt-5.2, openai/gpt-5.1, openai/gpt-4o |
| Anthropic | anthropic/claude-opus-4-5, anthropic/claude-sonnet-4-5, anthropic/claude-haiku-4-5 |
| Google | gemini/gemini-3.0-pro, gemini/gemini-2.5-flash |
# Set API keys (stored in ~/.pflow/settings.json)
pflow settings set-env OPENAI_API_KEY "sk-..."
pflow settings set-env ANTHROPIC_API_KEY "sk-ant-..."
pflow settings set-env GEMINI_API_KEY "..."
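If you generate workflows programmatically, a small guard can catch bare model names before they reach the node. The sketch below assumes only what this page states, that a model identifier is a provider prefix, a slash, and a model name. The helper split_model_id is hypothetical and is not a pflow or LiteLLM API.

```python
from typing import Tuple


def split_model_id(model: str) -> Tuple[str, str]:
    """Split 'provider/name' and reject bare names such as 'gpt-4o'."""
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"model {model!r} is missing a provider prefix")
    return provider, name


print(split_model_id("anthropic/claude-sonnet-4-5"))
# Only the first segment is the provider; the rest stays with the model name.
print(split_model_id("openrouter/anthropic/claude-sonnet-4-5"))
```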

Other providers

LiteLLM is built into pflow and recognizes 100+ providers natively — no plugin install needed. Set the appropriate API key (or omit it for Ollama) and reference the model with its provider prefix.

OpenRouter

pflow settings set-env OPENROUTER_API_KEY "sk-or-..."
### summarize

Summarize content using OpenRouter.

- type: llm
- model: openrouter/anthropic/claude-sonnet-4-5
- prompt: Summarize this

Ollama (local models)

brew install ollama
ollama serve
ollama pull llama3.2
### summarize

Summarize content using a local model.

- type: llm
- model: ollama/llama3.2
- prompt: Summarize this
See the LiteLLM provider list for the full set of supported providers (Mistral, Bedrock, Azure OpenAI, Vertex AI, vLLM, and more).

Image support

For vision-capable models, pass image URLs or local file paths:
### describe

Describe the contents of a photo.

- type: llm
- prompt: What's in this image?
- model: openai/gpt-5.2
- images: ["photo.jpg"]
Supported formats: JPEG, PNG, GIF, WebP, PDF.

Images can be:
  • Local file paths: photo.jpg, /path/to/image.png
  • URLs: https://example.com/image.jpg
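As an illustration of the two accepted forms, the sketch below sorts entries of an images list into URLs and local files, and checks local files against the supported formats. The extension set and the helper classify_image are assumptions made for this example; how the node itself detects formats is not documented here.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed file extensions for the formats listed above (JPEG, PNG, GIF, WebP, PDF).
SUPPORTED_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".pdf"}


def classify_image(entry: str) -> str:
    """Return 'url' for http(s) links, 'file' for local paths with a supported suffix."""
    if urlparse(entry).scheme in ("http", "https"):
        return "url"  # remote images are passed through without a suffix check
    suffix = PurePosixPath(entry).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported image format: {entry!r}")
    return "file"


for entry in ["photo.jpg", "/path/to/image.png", "https://example.com/image.jpg"]:
    print(entry, "->", classify_image(entry))
```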

Examples

Basic prompt

### summarize

Summarize the content from the previous step.

- type: llm
- prompt: Summarize: ${read.content}
- model: openai/gpt-4o-mini

With system prompt

### translate

Translate the input text to Spanish.

- type: llm
- system: You are a translator. Respond only with the translation.
- prompt: Translate to Spanish: ${input.text}
- temperature: 0.3

Structured output

Use output_schema to get guaranteed JSON matching a schema. The schema is passed to the model’s constrained decoding API — the model literally cannot produce non-conforming output.
### extract

Extract named entities from the document.

- type: llm
- prompt: Extract entities from: ${document.content}
- temperature: 0

```yaml output_schema
type: object
properties:
  people:
    type: array
    items:
      type: string
  places:
    type: array
    items:
      type: string
required:
  - people
  - places
```
When output_schema is set, response is a dict, so downstream templates access fields directly: ${extract.response.people}.

Without output_schema, you can still get JSON by prompting for it. The template system auto-parses JSON strings when you use dot notation, so ${extract.response.people} works there too. But the model may not always comply; output_schema is the reliable approach.
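The difference between the two paths can be sketched as follows. The resolve function is a hypothetical stand-in for dot-notation lookup, not pflow's template engine: it parses a string as JSON only when a field is requested from it, which is why a non-JSON reply fails at lookup time.

```python
import json


def resolve(value, path: str):
    """Walk a dotted path, parsing JSON strings along the way (hypothetical)."""
    for key in path.split("."):
        if isinstance(value, str):
            # Raises ValueError (json.JSONDecodeError) if the text is not JSON.
            value = json.loads(value)
        value = value[key]
    return value


structured = {"people": ["Ada Lovelace"], "places": ["London"]}  # with output_schema
as_text = json.dumps(structured)                                 # prompted JSON, still a str

print(resolve(structured, "people"))
print(resolve(as_text, "people"))
```

Both calls return the same list, but only the first is protected against a reply that is not valid JSON.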

Image analysis

### analyze

Analyze the contents of a user-provided image.

- type: llm
- prompt: Describe the main elements in this image
- model: openai/gpt-5.2
- images: ["${file_path}"]

External prompt file

For long or reusable prompts, reference an external file instead of inlining. The file path is relative to the workflow file. Template variables (${var}) inside the file are resolved normally.
### analyze

Analyze source code for issues.

- type: llm
- prompt: ./prompts/code-review.md

Error handling

| Error | Cause | Solution |
|---|---|---|
| Unknown model | Model ID not recognized | Run pflow settings llm show to see configured models, or check the LiteLLM provider list for supported model strings |
| API key required | Missing credentials | Set with pflow settings set-env <PROVIDER>_API_KEY <value> or export <PROVIDER>_API_KEY=... |
| Rate limit | Too many requests | Wait and retry automatically (built-in retry) |
The node retries transient failures automatically (3 attempts, 1 second wait).
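The retry policy can be pictured with a minimal sketch. It assumes "3 attempts" means three calls in total with a fixed one-second pause between them. TransientError and call_with_retry are hypothetical names, and the node's real retry logic may classify errors differently.

```python
import time


class TransientError(Exception):
    """Stand-in for a rate-limit or network failure (hypothetical)."""


def call_with_retry(call, attempts=3, wait_seconds=1.0, sleep=time.sleep):
    """Run call(), retrying on TransientError up to `attempts` total tries."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(wait_seconds)


# Demo: fail twice, then succeed. Waits are recorded instead of performed.
outcomes = [TransientError("429"), TransientError("429"), "ok"]
waits = []


def flaky_call():
    outcome = outcomes.pop(0)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


result = call_with_retry(flaky_call, sleep=waits.append)
print(result, waits)
```

Passing sleep as a parameter keeps the example fast to run and makes the wait schedule easy to inspect.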