Hermes Agent System / Case Study
Turning YouTube Research Into Operational Memory
A local-first pipeline that converts useful video research into source-grounded assistant memory, reusable demos, and follow-on implementation opportunities.
Executive Summary
Useful AI and automation lessons were arriving as videos, but raw links are not durable knowledge. This workflow turns source material into searchable, provenance-preserving operating context without wasting tokens or publishing private internals.
Problem
What needed solving
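Lessons about AI and automation kept arriving as YouTube videos, and a saved link is not durable knowledge: its content cannot be searched, cited with its source, or reused later without rewatching. Each useful video needed to become searchable, source-grounded operating context, without spending tokens unnecessarily or introducing unverified claims.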
Context
Why this mattered
Fred is my personal assistant and portfolio system. A recurring workflow emerged: I share a YouTube link, Fred extracts the useful learning, preserves provenance, and decides whether the learning changes future operating procedure.
Constraints
Operating boundaries
- Prefer transcript/caption extraction before paid video-model ingestion.
- Use Gemini only as a minimal fallback for video evidence extraction.
- Keep claims source-grounded and mark video-derived claims as not independently verified unless separately checked.
- Do not store credentials or sensitive account details in artifacts.
- Produce public-safe artifacts that can support portfolio storytelling without exposing private assistant internals.
Build
What I built
- A YouTube learning pipeline that creates structured learning-ledger JSON entries.
- A portfolio-demo artifact for each indexed video showing source, method, actions taken, and caveats.
- Skill updates only when a video changes Fred’s operating procedure, avoiding one-off skill sprawl.
- A deterministic opportunity-finder loop that ranks next automations from accumulated learning artifacts.
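The opportunity finder's scoring rules are not described here, so the following is only a minimal Python sketch of what "deterministic" ranking can mean. It assumes each candidate carries a count of supporting artifacts, a flag for reusable local assets, and a rough effort estimate; `Candidate`, `score`, `rank`, the weights, and the example candidate names are all hypothetical. The property it illustrates is that the same inputs always produce the same order, with ties broken by name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    # Hypothetical inputs; the real opportunity finder's signals are not documented here.
    name: str
    supporting_artifacts: int  # ledger or demo entries that point at this idea
    has_local_assets: bool     # existing scripts or data that could be reused
    effort: int                # rough estimate, 1 (small) to 5 (large)


def score(c: Candidate) -> int:
    # Illustrative weights: reward repeated evidence and reusable assets, penalize effort.
    return 2 * c.supporting_artifacts + (3 if c.has_local_assets else 0) - c.effort


def rank(candidates: list[Candidate]) -> list[Candidate]:
    # Highest score first; ties broken by name so identical inputs always
    # give an identical ordering.
    return sorted(candidates, key=lambda c: (-score(c), c.name))


candidates = [
    Candidate("transcript-cache", supporting_artifacts=3, has_local_assets=True, effort=2),
    Candidate("clip-search", supporting_artifacts=1, has_local_assets=False, effort=4),
    Candidate("auto-tagging", supporting_artifacts=2, has_local_assets=True, effort=1),
]
for c in rank(candidates):
    print(score(c), c.name)
```

With these example inputs the order is transcript-cache (7), auto-tagging (6), clip-search (-2). Keeping the ranking as plain arithmetic rather than a model call is one way to make results reproducible and cheap to rerun as artifacts accumulate.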
Architecture
Workflow
- Input: YouTube URL or video ID from chat.
- Extraction: transcript-first path; Gemini minimal video ingestion only when transcript extraction fails or is blocked.
- Normalization: JSON learning-ledger entry with title, source URL, method, summary, actions taken, caveats, and next actions.
- Portfolio layer: Markdown demo artifact for each video-to-skill run.
- Decision layer: the opportunity finder scans accumulated artifacts and local assets to rank next build candidates.
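A minimal sketch of the extraction and normalization steps, assuming the transcript fetcher and the Gemini fallback are passed in as plain callables (the real clients are not shown, and `TranscriptUnavailable`, `extract`, `build_entry`, and `LedgerEntry` are hypothetical names). The ledger fields follow the normalization step in the workflow list, and every entry starts with a caveat that its video-derived claims are unverified.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Callable


class TranscriptUnavailable(Exception):
    """Raised by a transcript fetcher when captions are missing or blocked."""


@dataclass
class LedgerEntry:
    # Fields mirror the normalization step; names are illustrative.
    title: str
    source_url: str
    method: str  # "transcript" or "gemini-fallback"
    summary: str
    actions_taken: list[str] = field(default_factory=list)
    caveats: list[str] = field(default_factory=list)
    next_actions: list[str] = field(default_factory=list)


def extract(
    video_id: str,
    fetch_transcript: Callable[[str], str],
    video_fallback: Callable[[str], str],
) -> tuple[str, str]:
    """Return (text, method), trying the cheap transcript path first."""
    try:
        return fetch_transcript(video_id), "transcript"
    except TranscriptUnavailable:
        # Only pay for video-model ingestion when captions are unavailable.
        return video_fallback(video_id), "gemini-fallback"


def build_entry(video_id: str, title: str, summary: str, method: str) -> LedgerEntry:
    entry = LedgerEntry(
        title=title,
        source_url=f"https://www.youtube.com/watch?v={video_id}",
        method=method,
        summary=summary,
    )
    # Video-derived claims stay flagged until they are checked separately.
    entry.caveats.append("Video-derived claims; not independently verified.")
    return entry


# Usage with stand-in callables: captions are "blocked", so the fallback runs.
def blocked_transcript(video_id: str) -> str:
    raise TranscriptUnavailable(video_id)


text, method = extract("abc123", blocked_transcript, lambda vid: f"notes for {vid}")
# The raw text stands in for a real summarization step here.
entry = build_entry("abc123", "Example video", text, method)
print(json.dumps(asdict(entry), indent=2))
```

Recording `method` on every entry keeps the cost decision auditable: a ledger with many `gemini-fallback` entries would show where transcript extraction is failing.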
Security / Operations
Key decisions
- Token control mattered more than convenience, so transcripts are the default path and video-model calls are used only as a fallback.
- Artifacts preserve provenance because video summaries are decision support, not primary truth.
- The system writes demos and ledgers separately: ledgers are machine-readable, demos are human-readable.
- Skill maintenance is gated; Fred patches procedures only when the new source changes how work should be done.
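The split between machine-readable ledgers and human-readable demos, and the gated skill updates, could look roughly like this. The sketch assumes ledger entries are stored as JSON objects with the fields listed under Workflow, plus a hypothetical boolean `changes_procedure` that is set only when a source changes how work should be done; `render_demo` and `should_patch_skill` are illustrative names, not Fred's actual code.

```python
import json


def render_demo(entry: dict) -> str:
    """Render a human-readable Markdown demo from one machine-readable ledger entry."""
    lines = [
        f"# {entry['title']}",
        "",
        f"Source: {entry['source_url']}",
        f"Method: {entry['method']}",
        "",
        "## Actions taken",
        *[f"- {item}" for item in entry.get("actions_taken", [])],
        "",
        "## Caveats",
        *[f"- {item}" for item in entry.get("caveats", [])],
    ]
    return "\n".join(lines) + "\n"


def should_patch_skill(entry: dict) -> bool:
    # Hypothetical gate: only entries explicitly marked as changing operating
    # procedure may trigger a skill update; everything else stays in the ledger.
    return bool(entry.get("changes_procedure", False))


ledger_json = """{
  "title": "Example video",
  "source_url": "https://www.youtube.com/watch?v=abc123",
  "method": "transcript",
  "actions_taken": ["Indexed video in learning ledger"],
  "caveats": ["Video-derived claims; not independently verified."],
  "changes_procedure": false
}"""

entry = json.loads(ledger_json)
print(render_demo(entry))
print("patch skill:", should_patch_skill(entry))
```

Because the demo is derived from the ledger rather than written alongside it, the two outputs cannot drift apart, and the skill gate defaults to "no change" when the flag is absent.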
Impact
What changed
- Converted scattered video watching into a repeatable learning loop with ledger, demo, and opportunity outputs.
- Created a portfolio-visible proof point for practical agent operations: ingestion, provenance, cost control, and follow-through.
- Reduced future rediscovery by making indexed learnings searchable and reusable.
Evidence
Source trail
- Hermes + Aion UI is Insane (FREE)!
- Hermes Agent Curator Guide: FIX Your Agent Skills!
- Hermes Agent Full Tutorial for Beginners | Setup Guide
Next
What I’d improve
- Publish selected case studies as Astro pages on mikepatraw.com.
- Add richer cross-source clustering once enough artifacts exist.
- Add a redaction review gate before public deployment.
Public-safe notes
Publishing boundary
- Video-derived claims are treated as source-grounded learning, not independently verified truth unless separately checked.
- Private local paths, secrets, tokens, and account-specific details are excluded or redacted.
- This page is public-safe portfolio material, not a dump of internal assistant logs.
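As an illustration of the exclusion step, here is a minimal regex-based redaction pass. The patterns (home-directory usernames, email addresses, and key/value pairs named like API keys, tokens, or secrets) are assumptions chosen for the example, not the system's actual rules, and pattern matching alone would not replace the planned redaction review gate.

```python
import re

# Illustrative patterns only; a production pass would need a reviewed, broader list.
REDACTIONS = [
    # Username segment of macOS/Linux home paths ("<" excluded so output is stable).
    (re.compile(r"(/Users|/home)/[^/\s<]+"), r"\1/<user>"),
    # Email addresses.
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "<email>"),
    # Key/value pairs named like credentials; values already redacted are skipped.
    (re.compile(r"(?i)\b(api[_-]?key|token|secret)\s*[:=]\s*(?!<redacted>)\S+"), r"\1=<redacted>"),
]


def redact(text: str) -> str:
    """Apply every pattern in order and return the scrubbed text."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text


def is_public_safe(text: str) -> bool:
    """True when no pattern matches, i.e. a redaction pass would change nothing."""
    return not any(pattern.search(text) for pattern, _ in REDACTIONS)


note = "Ledger at /Users/alex/fred/ledger.json, contact me@example.com, api_key: sk-123"
clean = redact(note)
print(clean)
print(is_public_safe(note), is_public_safe(clean))
```

Making `redact` idempotent (running it twice changes nothing) lets `is_public_safe` double as a cheap pre-publish check.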