For the curious: your AI agent configures batch processing when needed. This page explains what happens when you ask it to process many items (files, API results, etc.) and what to expect during execution.
Batch processing runs a single node multiple times — once for each item in an array. Think for-loop, but declarative: your agent adds a batch config to any node, and pflow handles the looping, concurrency, and error collection.
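Conceptually, a batch is just this loop. The Python below is an illustrative sketch of the semantics, not pflow's actual implementation; `run_node` is a hypothetical stand-in for executing the node once.

```python
# Illustrative sketch of batch semantics, not pflow's real code.
def run_batch(items, as_name, run_node):
    results = []
    for item in items:
        # Bind the current element to the `as` name so templates
        # like ${issue.title} resolve against it.
        output = run_node({as_name: item})
        results.append({"item": item, **output})
    return {"results": results}
```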
## When batch processing happens
Your agent uses batch processing when tasks involve:
- Processing each file in a directory listing
- Analyzing each item from an API response
- Running the same LLM prompt on multiple inputs
- Transforming each element in an array
Example scenario: When you ask to classify 100 GitHub issues, your agent configures a batch node to process each issue.
## How it works
A batch configuration is added to a node:
```markdown
## Steps

### list_issues
Fetch issues from the GitHub API.
- type: http
- url: https://api.github.com/repos/owner/repo/issues

### classify
Classify each issue by type.
- type: llm
- prompt: Classify this issue: ${issue.title}
- batch:
    items: ${list_issues.response}
    as: issue
```
This runs the `classify` node once for each issue. The `as: issue` setting creates a template variable `${issue}` whose value changes with each iteration.
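To make the substitution concrete: if `list_issues.response` held two issues, the prompt template would render twice, once per element. A toy Python illustration of the idea (this is not pflow's template engine):

```python
issues = [
    {"title": "Crash on startup"},
    {"title": "Add dark mode"},
]

# Each iteration rebinds `issue`, so the same template
# yields a different prompt per element.
for issue in issues:
    print(f"Classify this issue: {issue['title']}")
# -> Classify this issue: Crash on startup
# -> Classify this issue: Add dark mode
```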
## Configuration options
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `items` | template | Yes | - | Array to iterate over (usually `${previous_node.key}`) |
| `as` | string | Yes | - | Name for the item variable (e.g., `item`, `file`, `issue`) |
| `parallel` | bool | No | `false` | Run items concurrently instead of sequentially |
| `max_concurrent` | int | No | `10` | Maximum parallel items (1-100) |
| `error_handling` | string | No | `fail_fast` | `fail_fast` or `continue` |
| `max_retries` | int | No | `0` | Retry failed items this many times |
| `retry_wait` | int | No | `1` | Seconds to wait between retries |
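For readers who think in code, the options map onto a structure like this hypothetical Python dataclass. The field names and defaults come straight from the table above; the class itself is illustrative, not part of pflow.

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class BatchConfig:
    # Required: the array template and the per-item variable name.
    items: str   # e.g. "${previous_node.key}"
    as_: str     # e.g. "issue" ("as" is a reserved word in Python)
    # Optional knobs with the documented defaults.
    parallel: bool = False
    max_concurrent: int = 10        # clamped to 1-100
    error_handling: Literal["fail_fast", "continue"] = "fail_fast"
    max_retries: int = 0
    retry_wait: int = 1             # seconds between retries
```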
## Sequential vs parallel

### Sequential (default)
Items are processed one at a time, in order:
```markdown
### process
Read each file sequentially.
- type: read-file
- file_path: ${file}
- batch:
    items: ${files}
    as: file
```
This mode is chosen when:
- Order matters
- Rate limits are strict
- Resources are limited
### Parallel
Multiple items are processed concurrently:
```markdown
### process
Read each file in parallel.
- type: read-file
- file_path: ${file}
- batch:
    items: ${files}
    as: file
    parallel: true
    max_concurrent: 5
```
This mode is chosen when:
- Items are independent
- Speed is important
- API/LLM can handle concurrent requests
Your agent typically starts with `max_concurrent: 5` for LLM calls to avoid rate limits, increasing it gradually based on your API tier.
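A useful mental model for `max_concurrent` is a semaphore that caps how many items are in flight at once. The sketch below shows that model in Python with asyncio; it is conceptual, not pflow's implementation, and `process_item` is a hypothetical stand-in for the batched node.

```python
import asyncio

async def process_item(item):
    # Hypothetical stand-in for running the batched node on one item.
    await asyncio.sleep(0.1)
    return {"item": item, "response": "..."}

async def run_parallel_batch(items, max_concurrent=5):
    # The semaphore caps in-flight items, mirroring max_concurrent.
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(item):
        async with semaphore:
            return await process_item(item)

    # Every item is scheduled, but at most max_concurrent run at once.
    return await asyncio.gather(*(run_one(i) for i in items))
```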
## Error handling

### Fail fast (default)
Execution stops immediately on first error:
```markdown
### process
Process each file, stopping on first error.
- type: read-file
- file_path: ${file}
- batch:
    items: ${files}
    as: file
    error_handling: fail_fast
```
This mode is chosen when:
- Any failure means the whole task is invalid
- Errors should be fixed and re-run from scratch
### Continue on errors
All items are processed, with errors collected:
```markdown
### process
Process each file, continuing on errors.
- type: read-file
- file_path: ${file}
- batch:
    items: ${files}
    as: file
    error_handling: continue
```
This mode is chosen when:
- Partial results are useful
- Some failures are expected
- All errors should be seen before fixing
The node output includes error details in this mode:
```json
{
  "results": [...],
  "errors": [
    {
      "index": 3,
      "item": "file3.txt",
      "error": "File not found"
    }
  ]
}
```
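The difference between the two modes comes down to whether an item's failure stops the loop or is recorded and skipped. A minimal sketch, assuming error objects take the shape shown above (`run_node` is again a hypothetical stand-in):

```python
def run_with_error_handling(items, run_node, error_handling="fail_fast"):
    results, errors = [], []
    for index, item in enumerate(items):
        try:
            results.append({"item": item, **run_node(item)})
        except Exception as exc:
            if error_handling == "fail_fast":
                raise  # stop the whole batch on the first failure
            # "continue": record the failure and move on
            errors.append({"index": index, "item": item, "error": str(exc)})
    return {"results": results, "errors": errors}
```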
## Retries
Failed items can be automatically retried:
```markdown
### process
Process each API call with retries and error tolerance.
- type: http
- url: ${call.url}
- batch:
    items: ${api_calls}
    as: call
    parallel: true
    max_retries: 3
    retry_wait: 2
    error_handling: continue
```
This configuration retries each failed item up to 3 times, waiting 2 seconds between attempts. It's a common setup for:
- Transient API errors
- Rate limit recovery
- Network timeouts
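Per-item retry behavior is roughly the following, again sketched in Python with illustrative names: each item gets its first attempt plus up to `max_retries` retries, with a fixed `retry_wait` pause between attempts.

```python
import time

def run_with_retries(item, run_node, max_retries=3, retry_wait=2):
    # First attempt plus up to max_retries retries.
    for attempt in range(max_retries + 1):
        try:
            return run_node(item)
        except Exception:
            if attempt == max_retries:
                raise  # out of retries; surface the error
            time.sleep(retry_wait)  # pause before the next attempt
```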
## What you’ll see
During batch execution, pflow shows real-time progress:
```
fetch-issues... ✓ 2.1s
classify... 1/8 ✓
classify... 2/8 ✓
classify... 3/8 ✗
...
classify... 8/8 ✓ 24.9s
```
Failed items are marked with ✗ and summarized at the end.
## Output structure
Batch nodes write a special output structure to the shared store:
```json
{
  "node_id": {
    "results": [
      {"item": "input1", "response": "..."},
      {"item": "input2", "response": "..."}
    ],
    "batch_metadata": {
      "parallel": true,
      "total_items": 8,
      "successful_items": 7,
      "failed_items": 1,
      "timing": {
        "total_duration_ms": 24900,
        "avg_item_duration_ms": 3112
      }
    },
    "errors": [
      {"index": 2, "item": {}, "error": "..."}
    ]
  }
}
```
Each result pairs `item` (the original input) with the inner node’s outputs, so downstream nodes always know which output came from which input. When you pass `${node.results}` to an LLM, it sees both inputs and outputs together.
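Because of that pairing, any consumer can match outputs back to inputs by position or by the `item` field. A small Python illustration over the structure above (the values are made up):

```python
batch_output = {
    "results": [
        {"item": "input1", "response": "ok"},
        {"item": "input2", "response": "ok"},
    ],
    "errors": [{"index": 2, "item": "input3", "error": "timeout"}],
}

# Pair each output with the input that produced it.
for result in batch_output["results"]:
    print(f"{result['item']} -> {result['response']}")

# Collect failed inputs for a follow-up run.
retry_items = [err["item"] for err in batch_output["errors"]]
```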
Subsequent nodes can access results:
```markdown
### summarize
Summarize all the classifications.
- type: llm
- prompt: Summarize these classifications: ${classify.results}
```
## Examples

### Process files from directory listing
````markdown
## Steps

### list
List all markdown files.
- type: shell
```shell command
ls -1 *.md
```

### split
Convert the file listing into a JSON array.
- type: shell
- stdin: ${list.stdout}
```shell command
tr '\n' ',' | jq -Rc 'split(",") | map(select(length > 0))'
```

### read_all
Read each file in parallel.
- type: read-file
- file_path: ${filename}
- batch:
    items: ${split.stdout}
    as: filename
    parallel: true
    max_concurrent: 10
````
### Fetch paginated API results

````markdown
## Steps

### get_pages
Generate a list of page numbers.
- type: shell
```shell command
echo '[1,2,3,4,5]'
```

### fetch_all
Fetch each page of results in parallel.
- type: http
- url: https://api.example.com/items?page=${page}
- batch:
    items: ${get_pages.stdout}
    as: page
    parallel: true
    max_concurrent: 3
````
### Fault-tolerant LLM processing
```markdown
### process
Summarize each document with retries and error tolerance.
- type: llm
- prompt: Summarize: ${doc.content}
- model: gpt-4
- batch:
    items: ${documents}
    as: doc
    parallel: true
    max_concurrent: 5
    max_retries: 3
    retry_wait: 2
    error_handling: continue
```
## How your agent chooses settings
For LLM calls, your agent typically:
- Starts with `max_concurrent: 5`
- Monitors rate limits and costs
- Uses `retry_wait` for rate limit recovery

For HTTP requests, your agent typically:
- Checks API rate limits in the documentation
- Uses `max_concurrent` to respect those limits
- Adds retries for transient errors

For file operations, your agent typically:
- Uses parallel processing for reads (safe)
- Uses sequential mode for writes (avoids race conditions)
- Uses sequential mode when files depend on each other
## Limitations
- No nested batch - You can’t batch a node that’s already in a batch
- No branching within batch - Each item follows the same code path
- Memory usage - All results are held in memory until the batch completes