For the curious. Your AI agent configures batch processing when needed. This explains what happens when you ask to process many items (files, API results, etc.) and what to expect during execution.
Batch processing runs a single node multiple times — once for each item in an array. Think for-loop, but declarative: your agent adds a batch config to any node, and pflow handles the looping, concurrency, and error collection.

When batch processing happens

Your agent uses batch processing when tasks involve:
  • Processing each file in a directory listing
  • Analyzing each item from an API response
  • Running the same LLM prompt on multiple inputs
  • Transforming each element in an array
Example scenario: When you ask to classify 100 GitHub issues, your agent configures a batch node to process each issue.

How it works

A batch configuration is added to a node:
## Steps

### list_issues

Fetch issues from the GitHub API.

- type: http
- url: https://api.github.com/repos/owner/repo/issues

### classify

Classify each issue by type.

- type: llm
- prompt: Classify this issue: ${issue.title}
- batch:
    items: ${list_issues.response}
    as: issue
This runs the classify node once for each issue. The as: issue setting creates a template variable ${issue} that changes with each iteration.

Configuration options

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| items | template | Yes | - | Array to iterate over (usually ${previous_node.key}) |
| as | string | Yes | - | Name for the item variable (e.g., "item", "file", "issue") |
| parallel | bool | No | false | Run items concurrently instead of sequentially |
| max_concurrent | int | No | 10 | Maximum parallel items (1-100) |
| error_handling | string | No | "fail_fast" | "fail_fast" or "continue" |
| max_retries | int | No | 0 | Retry failed items this many times |
| retry_wait | int | No | 1 | Seconds to wait between retries |
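
Only items and as are required; the other fields fall back to the defaults above. Written out explicitly, the minimal configuration from the previous section behaves the same as:
- batch:
    items: ${list_issues.response}
    as: issue
    parallel: false
    max_concurrent: 10
    error_handling: fail_fast
    max_retries: 0
    retry_wait: 1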

Sequential vs parallel

Sequential (default)

Items are processed one at a time, in order:
### process

Read each file sequentially.

- type: read-file
- file_path: ${file}
- batch:
    items: ${files}
    as: file
This mode is chosen when:
  • Order matters
  • Rate limits are strict
  • Resources are limited

Parallel

Multiple items are processed concurrently:
### process

Read each file in parallel.

- type: read-file
- file_path: ${file}
- batch:
    items: ${files}
    as: file
    parallel: true
    max_concurrent: 5
This mode is chosen when:
  • Items are independent
  • Speed is important
  • API/LLM can handle concurrent requests
Your agent typically starts with max_concurrent: 5 for LLM calls to avoid rate limits, increasing gradually based on API tier.

Error handling

Fail fast (default)

Execution stops immediately on first error:
### process

Process each file, stopping on first error.

- type: read-file
- file_path: ${file}
- batch:
    items: ${files}
    as: file
    error_handling: fail_fast
This mode is chosen when:
  • Any failure means the whole task is invalid
  • Errors should be fixed and re-run from scratch

Continue on errors

All items are processed, with errors collected:
### process

Process each file, continuing on errors.

- type: read-file
- file_path: ${file}
- batch:
    items: ${files}
    as: file
    error_handling: continue
This mode is chosen when:
  • Partial results are useful
  • Some failures are expected
  • All errors should be seen before fixing
The node output includes error details in this mode:
{
  "results": [...],
  "errors": [
    {
      "index": 3,
      "item": "file3.txt",
      "error": "File not found"
    }
  ]
}
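
Downstream nodes can reference the collected errors the same way they reference results. A sketch of a follow-up node that reports the failures (the node name and prompt are illustrative, and assume errors is addressable in templates just like results):
### report_failures

Explain which items failed and why.

- type: llm
- prompt: These items failed during batch processing, summarize the likely causes: ${process.errors}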

Retries

Failed items can be automatically retried:
### process

Process each API call with retries and error tolerance.

- type: http
- url: ${call.url}
- batch:
    items: ${api_calls}
    as: call
    parallel: true
    max_retries: 3
    retry_wait: 2
    error_handling: continue
This configuration retries each failed item up to 3 times, waiting 2 seconds between attempts, so a persistently failing item is attempted 4 times in total before being recorded as an error. Common in scenarios involving:
  • Transient API errors
  • Rate limit recovery
  • Network timeouts

What you’ll see

During batch execution, pflow shows real-time progress:
  fetch-issues... ✓ 2.1s
  classify... 1/8 ✓
  classify... 2/8 ✓
  classify... 3/8 ✗
  ...
  classify... 8/8 ✓ 24.9s
Failed items are marked with ✗ and summarized at the end.

Output structure

Batch nodes write a special output structure to the shared store:
{
  "node_id": {
    "results": [
      {"item": "input1", "response": "..."},
      {"item": "input2", "response": "..."}
    ],
    "batch_metadata": {
      "parallel": true,
      "total_items": 8,
      "successful_items": 7,
      "failed_items": 1,
      "timing": {
        "total_duration_ms": 24900,
        "avg_item_duration_ms": 3112
      }
    },
    "errors": [
      {"index": 2, "item": {}, "error": "..."}
    ]
  }
}
Each result pairs item (the original input) with the inner node’s outputs — so downstream nodes always know which output came from which input. When passing ${node.results} to an LLM, it sees both inputs and outputs together. Subsequent nodes can access results:
### summarize

Summarize all the classifications.

- type: llm
- prompt: Summarize these classifications: ${classify.results}
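
The batch_metadata fields can be referenced the same way, for example to report how many items failed. An illustrative node (assuming nested keys resolve like any other template value):
### report

Report how many classifications failed.

- type: llm
- prompt: Of ${classify.batch_metadata.total_items} issues, ${classify.batch_metadata.failed_items} failed to classify. Write a one-line status report.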

Examples

Process files from directory listing

## Steps

### list

List all markdown files.

- type: shell

```shell command
ls -1 *.md
```

### split

Convert the file listing into a JSON array.

- type: shell
- stdin: ${list.stdout}

```shell command
tr '\n' ',' | jq -Rc 'split(",") | map(select(length > 0))'
```

### read_all

Read each file in parallel.

- type: read-file
- file_path: ${filename}
- batch:
    items: ${split.stdout}
    as: filename
    parallel: true
    max_concurrent: 10

API pagination pattern

## Steps

### get_pages

Generate a list of page numbers.

- type: shell

```shell command
echo '[1,2,3,4,5]'
```

### fetch_all

Fetch each page of results in parallel.

- type: http
- url: https://api.example.com/items?page=${page}
- batch:
    items: ${get_pages.stdout}
    as: page
    parallel: true
    max_concurrent: 3

Fault-tolerant LLM processing

### process

Summarize each document with retries and error tolerance.

- type: llm
- prompt: Summarize: ${doc.content}
- model: gpt-4
- batch:
    items: ${documents}
    as: doc
    parallel: true
    max_concurrent: 5
    max_retries: 3
    retry_wait: 2
    error_handling: continue

How your agent chooses settings

For LLM calls, your agent typically:
  • Starts with max_concurrent: 5
  • Monitors rate limits and costs
  • Uses retry_wait for rate limit recovery
For HTTP requests, your agent typically:
  • Checks API rate limits in documentation
  • Uses max_concurrent to respect limits
  • Adds retries for transient errors
For file operations, your agent typically:
  • Uses parallel processing for reads (safe)
  • Uses sequential mode for writes (avoids race conditions; see the sketch after this list)
  • Uses sequential mode when files depend on each other
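
As a sketch of the sequential-write point above: a batch that appends every item to a single file relies on the default sequential mode, so writes never interleave. The summaries array and its text field are placeholders:
### append_report

Append each summary to one report file, one item at a time.

- type: shell
- stdin: ${summary.text}
- batch:
    items: ${summaries}
    as: summary

```shell command
cat >> report.md
```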

Limitations

  • No nested batch - You can’t batch a node that’s already in a batch
  • No branching within batch - Each item follows the same code path
  • Memory usage - All results are held in memory until batch completes
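
The nested-batch limitation can often be worked around by flattening upstream output into a single array before batching again. A sketch using a shell step with jq, borrowing the pagination example above (the exact filter depends on the real shape of the results, and it assumes the template serializes ${fetch_all.results} as JSON on stdin and that each response is itself an array):
### flatten

Combine the per-page response arrays into one flat list.

- type: shell
- stdin: ${fetch_all.results}

```shell command
jq -c 'map(.response) | add'
```
A later node can then batch over ${flatten.stdout} as a single flat array.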