# LLM

> Call AI models with prompts and images

<Note>
  **Agent commands.** Your AI agent uses this node in workflows. You don't configure it directly.
</Note>

The LLM node calls AI models using [LiteLLM](https://docs.litellm.ai/). Use it when a step needs reasoning — summarization, classification, extraction, anything a Python expression can't handle. It supports many providers (OpenAI, Anthropic, Google, OpenRouter, Ollama, and 100+ more) through a unified interface.

## Parameters

| Parameter              | Type  | Required | Default   | Description                                                                                                                                                                                                                                                                                                   |
| ---------------------- | ----- | -------- | --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `prompt`               | str   | Yes      | -         | Text prompt, or path to an external file (e.g., `./prompts/system.md`)                                                                                                                                                                                                                                        |
| `model`                | str   | No       | See below | Model identifier                                                                                                                                                                                                                                                                                              |
| `system`               | str   | No       | -         | System prompt for behavior guidance                                                                                                                                                                                                                                                                           |
| `temperature`          | float | No       | `1.0`     | Sampling temperature (0.0-2.0)                                                                                                                                                                                                                                                                                |
| `max_tokens`           | int   | No       | -         | Response-length ceiling. On reasoning models it caps thinking depth but never raises it — see [Reasoning depth](#reasoning-depth)                                                                                                                                                                             |
| `reasoning_effort`     | str   | No       | -         | `xhigh`/`high`/`medium`/`low`/`minimal`/`none` — how hard the model thinks                                                                                                                                                                                                                                    |
| `reasoning_max_tokens` | int   | No       | -         | Explicit thinking-token budget (mutually exclusive with `reasoning_effort`)                                                                                                                                                                                                                                   |
| `images`               | list  | No       | `[]`      | Image URLs or file paths for vision models                                                                                                                                                                                                                                                                    |
| `output_schema`        | dict  | No       | -         | JSON Schema for structured output                                                                                                                                                                                                                                                                             |
| `prompt_cache`         | list  | No       | `[]`      | Names of `## Cache` chunks to include as a cached system prefix. See [Prompt caching](/how-it-works/prompt-caching)                                                                                                                                                                                           |
| `prewarm`              | bool  | No       | `false`   | On batch nodes: make one short LLM call before the batch dispatches to warm up the provider's cache, so every item runs at cache-read prices. Warms both declared `## Cache` content (when `prompt_cache:` is set) and the fixed portion of the prompt template (the text before each `${item.X}` reference). |

### Model resolution

If `model` is not set in the workflow params, pflow picks a default model based on the API keys you have configured.

Most users just need an API key:

```bash theme={null}
pflow settings set-env OPENAI_API_KEY "sk-..."
```

See [LLM model settings](/reference/cli/settings#llm-model-settings) for the full resolution order and default models per provider.

## Reasoning depth

`reasoning_effort` controls how hard a reasoning model thinks. `max_tokens` controls how long the response may get. They're separate dials, and pflow keeps them that way: effort sets the thinking budget, and `max_tokens` only ever *caps* it — raising `max_tokens` to leave room for a longer answer never inflates reasoning spend.

This matters because Anthropic counts thinking and answer against one `max_tokens` pool and rejects any request where the thinking budget isn't strictly smaller than `max_tokens`. pflow always derives the budget to sit under `max_tokens`, so that rejection can't happen — whether you set `reasoning_effort` or an explicit `reasoning_max_tokens`. If you set both `reasoning_max_tokens` and a smaller `max_tokens`, the budget is capped to fit (the explicit budget is a request, not a guarantee).

One consequence on reasoning models: if you omit `max_tokens`, the provider may cap the *visible* answer low (LiteLLM defaults it to the thinking budget plus \~4096). Set `max_tokens` explicitly when you need a long answer from a reasoning model.

The same `reasoning_effort` value maps to provider-specific knobs under the hood — Anthropic and Gemini 2.5 get a token budget, OpenAI and Gemini 3 get their native effort/level. Models without a reasoning knob ignore the parameter.
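The capping rule described in this section can be pictured with a small sketch. This is not pflow's implementation: `cap_thinking_budget` is a hypothetical helper, and the one-token margin is an assumption chosen only to satisfy "strictly smaller than `max_tokens`".

```python
from typing import Optional


def cap_thinking_budget(requested: int, max_tokens: Optional[int]) -> int:
    """Return a thinking budget that is strictly smaller than max_tokens.

    Hypothetical helper: pflow's real derivation and margin may differ.
    """
    if max_tokens is None:
        # No response ceiling was set, so there is nothing to cap against.
        return requested
    if max_tokens < 2:
        raise ValueError("max_tokens must be at least 2 to leave room for thinking")
    # "Strictly smaller" means the budget may be at most max_tokens - 1.
    return min(requested, max_tokens - 1)


print(cap_thinking_budget(10_000, 4_096))  # budget shrinks to fit under max_tokens
print(cap_thinking_budget(2_000, 8_000))   # a larger max_tokens leaves the budget alone
```

The second call shows the "separate dials" behavior: raising `max_tokens` gives the answer more room without increasing the thinking budget.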

## Output

| Key         | Type        | Description                                                            |
| ----------- | ----------- | ---------------------------------------------------------------------- |
| `response`  | str or dict | Text response (str), or parsed JSON (dict) when `output_schema` is set |
| `llm_usage` | dict        | Token usage metrics                                                    |
| `error`     | str         | Error message (only present on failure)                                |

### Token usage structure

```json theme={null}
{
  "model": "openai/gpt-5.2",
  "input_tokens": 150,
  "uncached_input_tokens": 150,
  "output_tokens": 89,
  "total_tokens": 239,
  "cache_creation_input_tokens": 0,
  "cache_read_input_tokens": 0,
  "input_token_accounting": "total_includes_cache"
}
```
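As a rough illustration of how these fields relate, the sketch below computes the share of input tokens served from cache. It assumes that `total_includes_cache` means `input_tokens` already counts cached tokens; that reading is inferred from the field name, and `cache_read_ratio` is a hypothetical helper rather than part of pflow.

```python
usage = {
    "model": "openai/gpt-5.2",
    "input_tokens": 150,
    "uncached_input_tokens": 150,
    "output_tokens": 89,
    "total_tokens": 239,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
    "input_token_accounting": "total_includes_cache",
}


def cache_read_ratio(usage: dict) -> float:
    """Share of input tokens read from the provider cache (0.0 to 1.0).

    Assumes input_tokens includes cached tokens (an assumption, see above).
    """
    total_input = usage["input_tokens"]
    if total_input == 0:
        return 0.0
    return usage["cache_read_input_tokens"] / total_input


# In the sample above, total_tokens is the sum of input and output tokens.
print(usage["input_tokens"] + usage["output_tokens"] == usage["total_tokens"])
print(cache_read_ratio(usage))
```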

## Model support

These providers are included with pflow. Set your API key and reference the model by its prefixed name.

Always include the provider prefix in the `model:` field. Bare names route inconsistently: bare Gemini names try Vertex, and bare OpenAI names usually work but aren't future-proof.

| Provider  | Example models                                                                           |
| --------- | ---------------------------------------------------------------------------------------- |
| OpenAI    | `openai/gpt-5.2`, `openai/gpt-5.1`, `openai/gpt-4o`                                      |
| Anthropic | `anthropic/claude-opus-4-5`, `anthropic/claude-sonnet-4-5`, `anthropic/claude-haiku-4-5` |
| Google    | `gemini/gemini-3.0-pro`, `gemini/gemini-2.5-flash`                                       |

```bash theme={null}
# Set API keys (stored in ~/.pflow/settings.json)
pflow settings set-env OPENAI_API_KEY "sk-..."
pflow settings set-env ANTHROPIC_API_KEY "sk-ant-..."
pflow settings set-env GEMINI_API_KEY "..."
```
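If you generate workflows programmatically, a quick check can catch bare model names before they reach the node. The sketch below only tests the `provider/model` shape; `has_provider_prefix` is a hypothetical helper and does not verify that the provider or model actually exists.

```python
def has_provider_prefix(model: str) -> bool:
    """True when the identifier looks like 'provider/model-name'."""
    # Split on the first slash only, so nested names such as
    # 'openrouter/anthropic/claude-sonnet-4-5' keep their full model part.
    provider, separator, name = model.partition("/")
    return bool(provider) and bool(separator) and bool(name)


for candidate in ["openai/gpt-4o", "gpt-4o", "openrouter/anthropic/claude-sonnet-4-5"]:
    print(candidate, has_provider_prefix(candidate))
```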

## Other providers

LiteLLM is built into pflow and recognizes 100+ providers natively — no plugin install needed. Set the appropriate API key (or omit it for Ollama) and reference the model with its provider prefix.

### OpenRouter

```bash theme={null}
pflow settings set-env OPENROUTER_API_KEY "sk-or-..."
```

```markdown theme={null}
### summarize

Summarize content using OpenRouter.

- type: llm
- model: openrouter/anthropic/claude-sonnet-4-5
- prompt: Summarize this
```

### Ollama (local models)

```bash theme={null}
brew install ollama
ollama serve
ollama pull llama3.2
```

```markdown theme={null}
### summarize

Summarize content using a local model.

- type: llm
- model: ollama/llama3.2
- prompt: Summarize this
```

See the [LiteLLM provider list](https://docs.litellm.ai/docs/providers) for the full set of supported providers (Mistral, Bedrock, Azure OpenAI, Vertex AI, vLLM, and more).

## Image support

For vision-capable models, pass image URLs or local file paths:

```markdown theme={null}
### describe

Describe the contents of a photo.

- type: llm
- prompt: What's in this image?
- model: openai/gpt-5.2
- images: ["photo.jpg"]
```

Supported formats: JPEG, PNG, GIF, WebP, PDF

Images can be:

* Local file paths: `photo.jpg`, `/path/to/image.png`
* URLs: `https://example.com/image.jpg`

## Examples

### Basic prompt

```markdown theme={null}
### summarize

Summarize the content from the previous step.

- type: llm
- prompt: Summarize: ${read.content}
- model: openai/gpt-4o-mini
```

### With system prompt

```markdown theme={null}
### translate

Translate the input text to Spanish.

- type: llm
- system: You are a translator. Respond only with the translation.
- prompt: Translate to Spanish: ${input.text}
- temperature: 0.3
```

### Structured output

Use `output_schema` to get JSON that is guaranteed to match a schema. The schema is passed to the model's constrained decoding API, so the model cannot produce non-conforming output.

````markdown theme={null}
### extract

Extract named entities from the document.

- type: llm
- prompt: Extract entities from: ${document.content}
- temperature: 0

```yaml output_schema
type: object
properties:
  people:
    type: array
    items:
      type: string
  places:
    type: array
    items:
      type: string
required:
  - people
  - places
```
````

When `output_schema` is set, `response` is a dict — downstream templates access fields directly: `${extract.response.people}`.

Without `output_schema`, you can still get JSON by prompting for it. The template system auto-parses JSON strings when you use dot notation: `${extract.response.people}`. But the model may not always comply — `output_schema` is the reliable approach.
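The difference between the two cases can be sketched in a few lines of Python. This is an illustration of the idea, not pflow's template engine: `resolve_path` is a hypothetical function that follows a dotted path and parses any JSON string it meets on the way.

```python
import json
from typing import Any


def resolve_path(value: Any, path: str) -> Any:
    """Follow a dotted path, parsing JSON strings along the way."""
    for key in path.split("."):
        if isinstance(value, str):
            # A string can only be traversed if it holds valid JSON.
            value = json.loads(value)
        value = value[key]
    return value


# With output_schema, response is already a dict.
as_dict = {"response": {"people": ["Ada"], "places": ["London"]}}
# Without it, response is text that happens to contain JSON.
as_text = {"response": '{"people": ["Ada"], "places": ["London"]}'}

print(resolve_path(as_dict, "response.people"))
print(resolve_path(as_text, "response.people"))
```

In this sketch, a response that is not valid JSON makes `json.loads` raise an error, which is the failure mode `output_schema` avoids.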

### Image analysis

```markdown theme={null}
### analyze

Analyze the contents of a user-provided image.

- type: llm
- prompt: Describe the main elements in this image
- model: openai/gpt-5.2
- images: ["${file_path}"]
```

### External prompt file

For long or reusable prompts, reference an external file instead of inlining. The file path is relative to the workflow file. Template variables (`${var}`) inside the file are resolved normally.

```markdown theme={null}
### analyze

Analyze source code for issues.

- type: llm
- prompt: ./prompts/code-review.md
```

## Error handling

| Error            | Cause                   | Solution                                                                                                                                                         |
| ---------------- | ----------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Unknown model    | Model ID not recognized | Run `pflow settings llm show` to see configured models, or check the [LiteLLM provider list](https://docs.litellm.ai/docs/providers) for supported model strings |
| API key required | Missing credentials     | Set with `pflow settings set-env <PROVIDER>_API_KEY <value>` or `export <PROVIDER>_API_KEY=...`                                                                  |
| Rate limit       | Too many requests       | Retried automatically by the node; if the limit persists, wait and rerun                                                                                         |

The node retries transient failures automatically: up to 3 attempts, with a 1-second wait between them.
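The retry behavior can be pictured as a simple loop with a fixed wait. The sketch below is a generic pattern under stated assumptions, not the node's actual code: `TransientError` and `call_with_retry` are hypothetical names, and which errors count as transient is decided by pflow, not by this example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Stand-in for a retryable failure such as a rate limit."""


def call_with_retry(
    call: Callable[[], T],
    attempts: int = 3,
    wait_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on TransientError with a fixed wait."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(wait_seconds)
    raise ValueError("attempts must be at least 1")


# Simulate two rate-limit failures followed by a success.
outcomes = iter([TransientError("rate limited"), TransientError("rate limited"), "ok"])


def flaky_call() -> str:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


# Skip real sleeping so the demo finishes immediately.
print(call_with_retry(flaky_call, sleep=lambda seconds: None))
```

Passing `sleep` as a parameter keeps the loop easy to test without waiting in real time.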
