[AI Agent Pipeline #9] Pipeline Simplification


In the previous article, we covered the retry and rollback mechanism.

This article summarizes improvements discovered while operating the pipeline.

Through article 8, the pipeline had generated about 250 pieces of learning content. At first I ran it continuously to test the pipeline, so token consumption was fast and I often used up the entire weekly quota two or three days before the reset. Now I monitor token usage and run the scripts only when it looks like tokens will be left over that week, to use up the remaining quota.

Meanwhile, while writing this blog series, I went back over the design documents and processes, and along the way discovered several things worth improving.

1. Adding category-parser

While operating the pipeline, I ran into a problem. Once all 10 hardcoded categories had been processed, the question came up: where are the rest?

The metadata pipeline introduced in article 2 generates category.yaml and empty markdown files from topic documents. But the category list was hardcoded in the script.

# Previous approach: Category list hardcoded in script
get_topics() {
    local subject=$1
    case "$subject" in
        "javascript-core-concepts")
            echo "01-variables 02-type-system 03-operators ... 10-functions-basic"
            ;;
        ...
    esac
}

The topic document has 42 categories for javascript-core-concepts alone. But the script only had 10. Hardcoding all 169 categories across 10 subjects was unrealistic.

The solution was to extract the category list from the topic documents automatically. Every category in a topic document follows the ### N. Title heading pattern, so all of them can be extracted by parsing for it. I added a category-parser agent that finds this pattern, assigns an ID to each category, and saves the result to a manifest file.

{
  "categories": {
    "javascript-core-concepts": ["01-variables", "02-type-system", ..., "42-metaprogramming"]
  }
}
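
As a rough illustration of the extraction step: the actual work is done by the category-parser agent via its prompt, but an equivalent shell sketch, assuming topic documents are markdown files with ### N. Title headings, would look like this.

# Sketch: find "### 1. Variables"-style headings and turn them into IDs like
# "01-variables". The real extraction is done by the category-parser agent;
# the file path in the example below is an assumption.
parse_categories() {
    local topic_file=$1
    grep -E '^### [0-9]+\. ' "$topic_file" \
        | sed -E 's/^### ([0-9]+)\. +(.*)$/\1 \2/' \
        | while read -r num title; do
            # kebab-case the title and zero-pad the number
            slug=$(echo "$title" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9\n' '-' | sed 's/^-//; s/-$//')
            printf '%02d-%s\n' "$((10#$num))" "$slug"
        done
}

# Example: parse_categories docs/javascript-core-concepts.md
#          -> 01-variables, 02-type-system, ..., 42-metaprogramming (one per line)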

Now the script reads categories from the manifest file. When categories are added to topic documents, just run category-parser.
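
With the manifest in place, the hardcoded get_topics() from earlier reduces to a jq lookup. A minimal sketch, assuming the manifest path is held in MANIFEST_FILE:

# New approach (sketch): read the category list from the manifest instead of
# a hardcoded case statement. MANIFEST_FILE is an assumed variable/path.
get_topics() {
    local subject=$1
    jq -r --arg s "$subject" '.categories[$s] | join(" ")' "$MANIFEST_FILE"
}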


2. Removing practice-writer

After the pipeline was complete, I tested with topics other than development. I wanted to see if it could be applied to non-programming fields like history, languages, and mathematics.

Document generation and parsing worked fine. But looking at the content, the Code Patterns and Experiments sections generated by practice-writer felt out of place. Who needs code blocks to study history or a language? On top of that, this app targets mobile: there is no code editor and no execution environment.

I removed practice-writer.


3. Cascading Refactoring

While writing the blog posts for articles 1-8, I went back over the design documents and processes, and one improvement cascaded into the next.

3.1 Discovering IMPROVEMENT_NEEDED

content-validator's prompt still contained a rule around a field called IMPROVEMENT_NEEDED: if the validation score fell below 90, it would record in the file which agent needed improvement and restart the pipeline from that agent. It was a remnant of the initial approach mentioned in article 4, where each agent independently scored its output and directed corrections.

I thought all of that legacy had been removed, but on inspection a piece of it survived only in content-validator. The problem was that none of the other agents handled IMPROVEMENT_NEEDED: even if content-validator left improvement instructions, there was no logic anywhere to read and act on them. It was effectively dead code.

Why didn't I notice? Because the backup/rollback mechanism implemented in article 8 was doing its job. Rolling back to the backup and retrying on postcondition failure at each agent stage solved most problems. By the time content-validator ran, scores almost never fell below the threshold, and when they did, I simply started over from the beginning. With so few failures, I never dug into the exact cause.

There were two choices. Supplement each agent prompt so IMPROVEMENT_NEEDED works properly, or remove it. Since backup/rollback was already working effectively, I decided to remove it.

3.2 Introducing JSON Schema

Removing IMPROVEMENT_NEEDED wasn’t the end. A new processing flow was needed.

Each agent stage already retries with backup/rollback. But what happens when content-validator fails the score check? At that point it isn't certain that a file backup exists. With the generated content still in place, I needed to work out why the score failed, deliver that feedback to the responsible agent, and retry.

Previously, LLM responses were output as-is. When I tried to parse out which agent was the problem, the output format was inconsistent and accurate extraction was difficult. When I asked Claude, it suggested the --output-format json and --json-schema options to pin down the output format.

# Structured output with JSON schema
"$CLAUDE_PATH" -p "$prompt" \
    --output-format json \
    --json-schema "$json_schema" \
    ...

# Simple extraction with jq
local problem_section=$(jq -r ".structured_output.problem_section" "$json_file")
local feedback=$(jq -r ".structured_output.feedback" "$json_file")

I applied JSON Schema to all agent outputs, not just content-validator. As output logs became standardized, parsing worked stably.
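
For illustration, the schema passed in $json_schema for content-validator might look roughly like this; the score field and the exact shape are assumptions, not the project's actual schema.

# Illustrative schema for content-validator's structured output. Only
# problem_section and feedback are referenced in this article; "score" and
# the overall shape are assumptions.
json_schema=$(cat <<'EOF'
{
  "type": "object",
  "properties": {
    "score":           { "type": "number" },
    "problem_section": { "type": "string" },
    "feedback":        { "type": "string" }
  },
  "required": ["score", "problem_section", "feedback"]
}
EOF
)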

3.3 Discovering and Removing content-initiator

With JSON Schema applied, the output logs became simpler, and a problem I had known about but never paid attention to caught my eye: the logs always started at 2/7. I knew step 1 was being skipped; at the time I didn't care much, since the result was the same either way.

This time I decided to resolve it definitively. I verified why it was being skipped, whether it could be removed, and whether there was a difference between what the metadata pipeline generates and what content-initiator generates.

The metadata pipeline was already generating the markdown files, so step 1 of the content pipeline, content-initiator, was being skipped by the precondition check: the file already existed. content-initiator's role was to create an empty markdown file and write basic front matter, which is exactly the output the metadata pipeline already produced.

This didn't need an LLM agent; a shell function was enough. I reused the initialize_markdown_file() function from the metadata pipeline in the content pipeline as well.
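
For reference, a minimal sketch of what such an initialize_markdown_file() does; the front matter fields here are illustrative assumptions, not the project's actual ones.

# Sketch of initialize_markdown_file(): create the markdown file with basic
# front matter. The front matter fields below are illustrative assumptions.
initialize_markdown_file() {
    local file_path=$1
    local category=$2
    local title=$3
    mkdir -p "$(dirname "$file_path")"
    cat > "$file_path" <<EOF
---
title: "$title"
category: "$category"
status: draft
---
EOF
}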

3.4 Final Retry Flow

Here is the retry flow that came out of this process.

  1. Each agent stage: Rollback to backup and retry on postcondition failure (max 3 times)
  2. content-validator score below threshold: Parse problem_section and feedback from JSON → Re-execute relevant agent once
  3. Re-validation success: Pipeline complete
  4. Re-validation failure: Call reset_file_with_frontmatter() → Exit after file initialization

# revision mode when content-validator score below threshold
if [ "$agent_name" = "content-validator" ] && [ $postcond_result -eq 2 ]; then
    local problem_section=$(parse_agent_json "content-validator" "problem_section" "quiz")
    local feedback=$(parse_agent_json "content-validator" "feedback" "")
    local problem_agent=$(get_problem_agent "$problem_section")

    # Re-execute relevant agent once
    execute_revision_with_json "$problem_agent" "$revision_prompt" "$session_id"

    # Re-validate (content-validator is re-run here; its result is assumed in $revalidate_result)
    if [ $revalidate_result -eq 0 ]; then
        return 0  # Success
    else
        reset_file_with_frontmatter "$file_path"  # Initialize
        return 1
    fi
fi

To prevent infinite loops, this revision retry is limited to a single attempt. If it still fails, the file is reset and the run ends; the next run starts over from the beginning.
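
For completeness, get_problem_agent() in the snippet above is just a mapping from the failed section to the agent responsible for it. A sketch with hypothetical section and agent names:

# Sketch of get_problem_agent(): map the failed section reported by
# content-validator to the agent that wrote it. The section and agent names
# below are hypothetical placeholders, not the project's actual ones.
get_problem_agent() {
    case "$1" in
        quiz)    echo "quiz-writer" ;;
        concept) echo "concept-writer" ;;
        *)       echo "content-writer" ;;
    esac
}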


Conclusion

This article summarized improvements discovered while operating the pipeline.

  • Added category-parser: Dynamically parse from topic documents instead of hardcoded list of 169 categories
  • Removed practice-writer: Determined unnecessary after testing with non-programming topics
  • Cascading refactoring: Removed IMPROVEMENT_NEEDED → Introduced JSON Schema → Removed content-initiator → Completed final flow

The 7 content agents were reduced to 5, and replacing the complex IMPROVEMENT_NEEDED improvement loop with a simple fail-and-retry structure made debugging easier.

The next article will look back on the journey so far and wrap up the series.


This series shares experiences applying the AI-DLC (AI-assisted Document Lifecycle) methodology to an actual project. For more details about AI-DLC, please refer to the Economic Dashboard Development Series.