# Batch processing

> Process arrays of items through workflows

<Info>
  **For the curious.** Your AI agent configures batch processing when needed. This page explains what happens when you ask it to process many items (files, API results, etc.) and what to expect during execution.
</Info>

Batch processing runs a single node multiple times — once for each item in an array. Think for-loop, but declarative: your agent adds a batch config to any node, and pflow handles the looping, concurrency, and error collection.

## When batch processing happens

Your agent uses batch processing when tasks involve:

* Processing each file in a directory listing
* Analyzing each item from an API response
* Running the same LLM prompt on multiple inputs
* Transforming each element in an array

**Example scenario:** When you ask to classify 100 GitHub issues, your agent configures a batch node to process each issue.

## How it works

A `batch` configuration is added to a node:

```markdown theme={null}
## Steps

### list_issues

Fetch issues from the GitHub API.

- type: http
- url: https://api.github.com/repos/owner/repo/issues

### classify

Classify each issue by type.

- type: llm
- prompt: Classify this issue: ${issue.title}
- batch:
    items: ${list_issues.response}
    as: issue
```

This runs the `classify` node once for each issue. The `as: issue` setting creates a template variable `${issue}` that is bound to the current item on each iteration.
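
Conceptually, this is the loop pflow runs on your behalf. The Python sketch below only illustrates the semantics and is not pflow's implementation: `render` and `run_batch` are hypothetical helpers, and the LLM call is replaced by returning the rendered prompt.

```python
import re

def render(template: str, variables: dict) -> str:
    """Resolve ${name} and ${name.field} placeholders (hypothetical helper)."""
    def lookup(match: re.Match) -> str:
        value = variables
        for part in match.group(1).split("."):
            value = value[part]
        return str(value)
    return re.sub(r"\$\{([^}]+)\}", lookup, template)

def run_batch(items: list, as_name: str, prompt: str) -> list:
    """Run the stubbed node once per item, binding the item to `as_name`."""
    outputs = []
    for item in items:
        outputs.append(render(prompt, {as_name: item}))
    return outputs

issues = [{"title": "Crash on startup"}, {"title": "Add dark mode"}]
prompts = run_batch(issues, "issue", "Classify this issue: ${issue.title}")
print(prompts)
# ['Classify this issue: Crash on startup', 'Classify this issue: Add dark mode']
```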

## Configuration options

| Field            | Type     | Required | Default       | Description                                                                          |
| ---------------- | -------- | -------- | ------------- | ------------------------------------------------------------------------------------ |
| `items`          | template | Yes      | -             | Array to iterate over (usually `${previous_node.key}`)                               |
| `as`             | string   | Yes      | -             | Name for the item variable (e.g., `"item"`, `"file"`, `"issue"`)                     |
| `parallel`       | bool     | No       | `false`       | Run items concurrently instead of sequentially                                       |
| `max_concurrent` | int      | No       | `10`          | Maximum parallel items (1-100)                                                       |
| `error_handling` | string   | No       | `"fail_fast"` | `"fail_fast"` or `"continue"`                                                        |
| `max_retries`    | int      | No       | `1`           | Total attempts per item when an exception escapes the node (`1` = no batch retry)    |
| `retry_wait`     | number   | No       | `0`           | Seconds to wait between batch item attempts                                          |

## Sequential vs parallel

### Sequential (default)

Items are processed one at a time, in order:

```markdown theme={null}
### process

Read each file sequentially.

- type: read-file
- file_path: ${file}
- batch:
    items: ${files}
    as: file
```

This mode is chosen when:

* Order matters
* Rate limits are strict
* Resources are limited

### Parallel

Multiple items are processed concurrently:

```markdown theme={null}
### process

Read each file in parallel.

- type: read-file
- file_path: ${file}
- batch:
    items: ${files}
    as: file
    parallel: true
    max_concurrent: 5
```

This mode is chosen when:

* Items are independent
* Speed is important
* API/LLM can handle concurrent requests

<Tip>
  Your agent typically starts with `max_concurrent: 5` for LLM calls to avoid rate limits, increasing gradually based on API tier.
</Tip>
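
The effect of `max_concurrent` can be pictured as a worker pool with a fixed size. The sketch below uses Python's standard `ThreadPoolExecutor` as a stand-in. It is an assumption about the general technique, not a description of pflow's scheduler, and `run_parallel` and `read_file_stub` are hypothetical names.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def run_parallel(items, worker, max_concurrent=10):
    """Run worker(item) for every item with at most max_concurrent in flight."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # pool.map yields results in input order, regardless of finish order.
        return list(pool.map(worker, items))

# Track how many stub calls overlap, to show the cap is respected.
stats = {"active": 0, "peak": 0}
lock = threading.Lock()

def read_file_stub(path):
    with lock:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
    time.sleep(0.01)  # stand-in for real I/O
    with lock:
        stats["active"] -= 1
    return f"contents of {path}"

files = [f"file{i}.txt" for i in range(20)]
contents = run_parallel(files, read_file_stub, max_concurrent=5)
print(len(contents), "files read, peak concurrency:", stats["peak"])
```

In this sketch the peak never exceeds 5, which is why a lower `max_concurrent` is the usual lever for staying under a rate limit.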

## Error handling

### Fail fast (default)

Execution stops immediately on first error:

```markdown theme={null}
### process

Process each file, stopping on first error.

- type: read-file
- file_path: ${file}
- batch:
    items: ${files}
    as: file
    error_handling: fail_fast
```

This mode is chosen when:

* Any failure means the whole task is invalid
* Errors should be fixed and re-run from scratch

### Continue on errors

All items are processed, with errors collected:

```markdown theme={null}
### process

Process each file, continuing on errors.

- type: read-file
- file_path: ${file}
- batch:
    items: ${files}
    as: file
    error_handling: continue
```

This mode is chosen when:

* Partial results are useful
* Some failures are expected
* All errors should be seen before fixing

The node output includes error details in this mode:

```json theme={null}
{
  "results": [...],
  "errors": [
    {
      "index": 3,
      "item": "file3.txt",
      "error": "File not found"
    }
  ]
}
```
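
The difference between the two modes comes down to what happens in the exception handler. The following Python sketch shows that logic under simplifying assumptions (sequential execution, a stubbed node). `run_items` and `read_stub` are hypothetical names, not pflow APIs.

```python
def run_items(items, worker, error_handling="fail_fast"):
    """Process items in order, either stopping or collecting on failure."""
    results, errors = [], []
    for index, item in enumerate(items):
        try:
            results.append({"item": item, "response": worker(item)})
        except Exception as exc:
            if error_handling == "fail_fast":
                raise  # abort the whole batch on the first error
            # "continue": record the failure and move on to the next item
            errors.append({"index": index, "item": item, "error": str(exc)})
    return {"results": results, "errors": errors}

def read_stub(path):
    """Stand-in for a read-file node; one path is missing."""
    if path == "file3.txt":
        raise FileNotFoundError("File not found")
    return f"contents of {path}"

paths = ["file1.txt", "file2.txt", "file3.txt", "file4.txt"]
output = run_items(paths, read_stub, error_handling="continue")
print(output["errors"])
# [{'index': 2, 'item': 'file3.txt', 'error': 'File not found'}]
```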

## Retries

Failed items can be automatically retried:

```markdown theme={null}
### process

Process each API call with retries and error tolerance.

- type: http
- url: ${call.url}
- batch:
    items: ${api_calls}
    as: call
    parallel: true
    max_retries: 3
    retry_wait: 2
    error_handling: continue
```

This configuration allows each item up to 3 total attempts (the first try plus up to 2 retries), waiting 2 seconds between attempts. This is common for:

* Transient API errors
* Rate limit recovery
* Network timeouts
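
Because `max_retries` counts total attempts, the retry loop looks roughly like the Python sketch below. This is an illustration of the semantics described above, not pflow's code. `run_with_attempts` and `flaky` are hypothetical, and the `sleep` parameter exists only so the example runs without real delays.

```python
import time

def run_with_attempts(call, max_retries=1, retry_wait=0.0, sleep=time.sleep):
    """Call `call` up to `max_retries` total times (assumes max_retries >= 1)."""
    for attempt in range(1, max_retries + 1):
        try:
            return call()
        except Exception:
            if attempt == max_retries:
                raise  # attempts exhausted: let the error surface
            sleep(retry_wait)  # wait before the next attempt

attempts = []

def flaky():
    """Fails twice with a transient error, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise TimeoutError("timed out")
    return "ok"

waits = []  # record the waits instead of actually sleeping
result = run_with_attempts(flaky, max_retries=3, retry_wait=2, sleep=waits.append)
print(result, len(attempts), waits)
# ok 3 [2, 2]
```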

Node-level `retry:` is separate from batch retry. It runs inside each batch item attempt:

```markdown theme={null}
### fetch

Fetch each API call with exponential node backoff.

- type: http
- url: ${call.url}
- retry:
    max: 3
    wait: 0.5
    backoff: exponential
- batch:
    items: ${api_calls}
    as: call
    parallel: true
```

For nodes that return an `"error"` action once their own retries are exhausted (`llm`, `shell`, `mcp`, `code`, file nodes), batch does not make another item attempt. Attempts multiply only for nodes that re-raise after node retries are exhausted, such as `http`, `claude-code`, or custom nodes using the default fallback.
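
As a rough worked example of that multiplication, the Python sketch below computes an upper bound on calls for a single item that never succeeds. It assumes that `retry.max` counts total node attempts, which this page does not state, and `worst_case_calls` is a hypothetical helper.

```python
def worst_case_calls(node_attempts: int, batch_attempts: int, reraises: bool) -> int:
    """Upper bound on calls for one item that never succeeds."""
    if reraises:
        # The exception escapes the node, so every batch attempt
        # reruns the node's own retries.
        return node_attempts * batch_attempts
    # The node returns an "error" action instead, so batch makes
    # no further attempt.
    return node_attempts

print(worst_case_calls(3, 3, reraises=True))   # http-style node: 9
print(worst_case_calls(3, 3, reraises=False))  # llm-style node: 3
```

Keeping this bound in mind helps when combining both retry layers against a rate-limited API.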

## What you'll see

During batch execution, pflow shows real-time progress:

```
  fetch-issues... ✓ 2.1s
  classify... 1/8 ✓
  classify... 2/8 ✓
  classify... 3/8 ✗
  ...
  classify... 8/8 ✓ 24.9s
```

Failed items are marked with `✗` and summarized at the end.

When a failed item is large, pflow shows a compact description instead of printing the full item. For example, a record with a label and a long payload is shown with the label, payload size, and a stable reference:

```text theme={null}
Batch 'classify' errors:
  [3] File not found
      item: label='issue-42'; payload=<str 1909 chars sha256=7c9a2b6f1d3e>
```

The original failed input remains available in runtime data and trace files for debugging. Terminal output, MCP output, JSON error responses, and generated reports use the compact form so the actionable error stays visible.
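
One way to produce a compact form like this is to replace long values with their type, length, and a short hash prefix. The Python sketch below is a guess at the general approach, not pflow's formatter: the 80-character threshold, the 12-character digest prefix, and the names `compact` and `describe_item` are all assumptions.

```python
import hashlib

def compact(value, limit=80):
    """Return repr(value) if short, else a size-and-hash placeholder."""
    text = value if isinstance(value, str) else repr(value)
    if len(text) <= limit:
        return repr(value)
    # A hash prefix gives a stable reference without printing the content.
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
    return f"<{type(value).__name__} {len(text)} chars sha256={digest}>"

def describe_item(item: dict) -> str:
    """Render a dict item as 'key=value' pairs with long values compacted."""
    return "; ".join(f"{key}={compact(val)}" for key, val in item.items())

item = {"label": "issue-42", "payload": "x" * 1909}
description = describe_item(item)
print(description)
```

The same input always yields the same digest, so the reference can be matched against the full item in a trace file.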

## Output structure

Batch nodes write a special output structure to the shared store:

```json theme={null}
{
  "node_id": {
    "results": [
      {"item": "input1", "response": "..."},
      {"item": "input2", "response": "..."}
    ],
    "count": 3,
    "success_count": 2,
    "error_count": 1,
    "batch_metadata": {
      "parallel": true,
      "timing": {
        "total_items_ms": 24900,
        "avg_item_ms": 8300
      }
    },
    "errors": [
      {"index": 2, "item": {}, "error": "..."}
    ]
  }
}
```

`results` contains only **successful** items — each pairs `item` (the original input) with the inner node's outputs. With `error_handling: continue`, failed items are excluded from `results` and appear only in `errors`. `count` is the total items attempted, `success_count` equals `len(results)`, and `error_count` equals `len(errors)`.

Inside a batch node, `${__index__}` gives the 0-based position of the current item. Index-based access to results (like `${node.results[0].field}`) requires `fail_fast` mode (the default). With `error_handling: continue`, use iteration (`items: ${node.results}`) instead — the validator blocks index access because filtered results don't preserve original positions.
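
The Python sketch below shows why positions stop lining up once failures are filtered out. The `output` dict is hypothetical sample data shaped like the structure above, and it assumes successful results keep their relative input order.

```python
items = ["a.txt", "b.txt", "c.txt", "d.txt"]

# Hypothetical output of a batch run in which the item at index 1 failed.
output = {
    "results": [
        {"item": "a.txt", "response": "A"},
        {"item": "c.txt", "response": "C"},
        {"item": "d.txt", "response": "D"},
    ],
    "errors": [{"index": 1, "item": "b.txt", "error": "File not found"}],
    "count": 4,
    "success_count": 3,
    "error_count": 1,
}

def counts_are_consistent(out: dict) -> bool:
    """Check the documented relationships between counts and lists."""
    return (out["success_count"] == len(out["results"])
            and out["error_count"] == len(out["errors"]))

# Position 1 in the input is b.txt, but position 1 in results is c.txt.
shifted = output["results"][1]["item"]

# Iterating (or keying by item) avoids relying on positions.
by_item = {entry["item"]: entry["response"] for entry in output["results"]}

print(counts_are_consistent(output), items[1], shifted)
# True b.txt c.txt
```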

Subsequent nodes can access results:

```markdown theme={null}
### summarize

Summarize all the classifications.

- type: llm
- prompt: Summarize these classifications: ${classify.results}
```

## Examples

### Process files from directory listing

````markdown theme={null}
## Steps

### list

List all markdown files.

- type: shell

```shell command
ls -1 *.md
```

### split

Convert the file listing into a JSON array.

- type: shell
- stdin: ${list.stdout}

```shell command
tr '\n' ',' | jq -Rc 'split(",") | map(select(length > 0))'
```

### read_all

Read each file in parallel.

- type: read-file
- file_path: ${filename}
- batch:
    items: ${split.stdout}
    as: filename
    parallel: true
    max_concurrent: 10
````
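
For readers less familiar with `tr` and `jq`, the `split` step above is equivalent to the following Python sketch. `listing_to_json_array` is a hypothetical name used only for illustration. Like the shell pipeline, it splits on commas, so a filename containing a comma would be broken into two entries.

```python
import json

def listing_to_json_array(stdout: str) -> str:
    """Turn newline-separated names into a compact JSON array string."""
    # Mirror `tr '\n' ','` followed by jq's split(",") and empty-string filter.
    names = [name for name in stdout.replace("\n", ",").split(",") if name]
    return json.dumps(names, separators=(",", ":"))

listing = "README.md\nguide.md\n"
print(listing_to_json_array(listing))
# ["README.md","guide.md"]
```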

### API pagination pattern

````markdown theme={null}
## Steps

### get_pages

Generate a list of page numbers.

- type: shell

```shell command
echo '[1,2,3,4,5]'
```

### fetch_all

Fetch each page of results in parallel.

- type: http
- url: https://api.example.com/items?page=${page}
- batch:
    items: ${get_pages.stdout}
    as: page
    parallel: true
    max_concurrent: 3
````

### Fault-tolerant LLM processing

```markdown theme={null}
### process

Summarize each document with retries and error tolerance.

- type: llm
- prompt: Summarize: ${doc.content}
- model: openai/gpt-4
- batch:
    items: ${documents}
    as: doc
    parallel: true
    max_concurrent: 5
    max_retries: 3
    retry_wait: 2
    error_handling: continue
```

### Per-item configuration

````markdown theme={null}
### compare

Compare quality across model configurations.

- type: llm
- model: ${config.model}
- reasoning_effort: ${config.effort}
- prompt: "Analyze this data: ${config.data}"

```yaml batch
items:
  - data: ${report}
    model: anthropic/claude-opus-4-5
    effort: high
  - data: ${report}
    model: openai/gpt-5.2
    effort: medium
as: config
parallel: true
```
````

Each item can override any node parameter through template variables — not just the prompt. Here `model` and `reasoning_effort` change per item while the prompt template stays the same.

## How your agent chooses settings

**For LLM calls**, your agent typically:

* Starts with `max_concurrent: 5`
* Monitors rate limits and costs
* Uses `retry_wait` for rate limit recovery

**For HTTP requests**, your agent typically:

* Checks API rate limits in documentation
* Uses `max_concurrent` to respect limits
* Adds retries for transient errors

**For file operations**, your agent typically:

* Uses parallel processing for reads (safe)
* Uses sequential mode for writes (avoids race conditions)
* Uses sequential mode when files depend on each other

## Limitations

* **No nested batch** - You can't batch a node that's already in a batch
* **No branching within batch** - Each item follows the same code path
* **Memory usage** - All results are held in memory until batch completes

## Related

* [Template variables](/how-it-works/template-variables) - Understanding `${item}` variables
* [Shell node](/reference/nodes/shell) - Often used to prepare arrays
* [HTTP node](/reference/nodes/http) - API pagination patterns
* [LLM node](/reference/nodes/llm) - Batch prompt processing
