Research OS v2 — Multi-Agent NotebookLM Workflows

The Problem

Why Your Research Hits a Wall

You uploaded 50 papers. You asked NotebookLM to synthesize them. The answer was… fine. But it missed the real gaps. Here's why.

📚

Literature Overload

"I have 80 papers but NotebookLM only takes 50. I uploaded everything and the answers got vaguer, not sharper."

🔍

Gap Detection Failure

"My advisor says the gap is obvious. I've read every paper. I still can't see it. Comparing across 50 sources manually is impossible."

📊

Unstable Synthesis

"Sometimes NotebookLM gives brilliant insights. Other times it hallucinates citations. I can't trust the output without re-reading everything."

🔄

No Progress Tracking

"I don't know what I've already analyzed. Switching topics loses all context. Am I making progress or going in circles?"

Architecture

Research OS v2 — Four Layers

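A complete research operating system. Each layer has specialized prompts, tools, and quality controls. The layers run as a loop: Evaluate sends unresolved branches back to Act until every claim in your synthesis is grounded in a source and scored for confidence.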

📡

PERCEIVE · Layer 1

Multi-source ingestion → auto-parsing → quality assessment → version management. Feed the system 50+ papers with structured metadata extraction.

P1.1 paper-parse · P1.2 quality-gate · P1.3 version-diff
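P1.1 paper-parse and P1.2 quality-gate are prompts in the library; the Python below is not them. It is a minimal sketch of the same gate idea, assuming each parsed paper arrives as a record with the four fields the pipeline extracts (research question, methodology, findings, limitations). The `PaperRecord` type, its field names and the pass rule (all four fields non-empty) are hypothetical; a real quality gate would likely check more.

```python
from dataclasses import dataclass

@dataclass
class PaperRecord:
    # Hypothetical parsed-paper record; field names are illustrative only.
    title: str
    research_question: str = ""
    methodology: str = ""
    findings: str = ""
    limitations: str = ""

REQUIRED_FIELDS = ("research_question", "methodology", "findings", "limitations")

def quality_gate(paper: PaperRecord) -> list[str]:
    """Return the names of required fields that are missing or blank."""
    return [name for name in REQUIRED_FIELDS if not getattr(paper, name).strip()]

def split_by_gate(papers: list[PaperRecord]) -> tuple[list[PaperRecord], list[tuple[str, list[str]]]]:
    """Separate papers that pass the gate from those that need manual review."""
    passed, rejected = [], []
    for paper in papers:
        missing = quality_gate(paper)
        if missing:
            rejected.append((paper.title, missing))
        else:
            passed.append(paper)
    return passed, rejected

papers = [
    PaperRecord("Paper A", "Does X cause Y?", "RCT, n=200", "X raises Y", "Single site"),
    PaperRecord("Paper B", "How is Z measured?", "Survey", "", ""),
]
passed, rejected = split_by_gate(papers)
print([p.title for p in passed])  # ['Paper A']
print(rejected)                   # [('Paper B', ['findings', 'limitations'])]
```

Rejected papers are not discarded here, only routed to manual review, so a thin abstract does not silently drop a relevant study.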

↓

🗺️

PLAN · Layer 2

Gap detection across 4 categories → hypothesis generation at 3 risk levels → research tree with priority queue and status tracking.

P2.1 gap-detect · P2.3 hypothesis-factory · P2.4 research-tree
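A research tree with a priority queue can be sketched with a plain node type and Python's `heapq` min-heap. This is an illustration under stated assumptions, not the P2.4 research-tree prompt: the `Node` class, the three status values, the "lower number = sooner" priority convention and the low/medium/high risk labels for the three hypotheses are all hypothetical.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical status values; the page only says each branch has a status.
STATUSES = ("open", "in_progress", "resolved")

@dataclass
class Node:
    name: str
    priority: int                      # lower number = investigate sooner (assumption)
    status: str = "open"
    children: list["Node"] = field(default_factory=list)

    def set_status(self, status: str) -> None:
        if status not in STATUSES:
            raise ValueError(f"unknown status: {status}")
        self.status = status

def open_queue(root: Node) -> list[tuple[int, str]]:
    """Walk the tree and push every unresolved node onto a min-heap keyed by priority."""
    heap: list[tuple[int, str]] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.status != "resolved":
            heapq.heappush(heap, (node.priority, node.name))
        stack.extend(node.children)
    return heap

# One gap with three hypotheses at hypothetical low/medium/high risk levels.
gap = Node("Gap: untested shared assumption", priority=1, children=[
    Node("H1 (low risk)", priority=2),
    Node("H2 (medium risk)", priority=3),
    Node("H3 (high risk)", priority=5),
])
gap.children[0].set_status("resolved")
gap.children[2].set_status("in_progress")

queue = open_queue(gap)
order = [heapq.heappop(queue)[1] for _ in range(len(queue))]
print(order)
# ['Gap: untested shared assumption', 'H2 (medium risk)', 'H3 (high risk)']
```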

↓

⚡

ACT · Layer 3

4-level iterative querying → multi-agent debate (Fighting Arena) → grounded synthesis with source verification. Every claim is checked against the sources it cites before it enters the synthesis.

P3.2 deep-dive · P3.5 arena-debate · P3.6 ground-check
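P3.6 ground-check is a prompt you run against your sources; the sketch below is not that prompt. It shows one crude, swapped-in way to automate part of the same idea outside NotebookLM: for each claim, confirm that its supporting quote appears verbatim, after case and whitespace normalization, in the text of the source it cites. The claim dictionary shape, the verdict labels and the toy sources are hypothetical.

```python
import re

def normalize(text: str) -> str:
    """Lower-case and collapse runs of whitespace so line breaks don't block matches."""
    return re.sub(r"\s+", " ", text).strip().lower()

def ground_check(claims: list[dict], sources: dict[str, str]) -> list[dict]:
    """Attach a verdict to each claim: grounded, not_found, or unknown_source.

    claims:  [{"claim": ..., "source_id": ..., "quote": ...}, ...]  (hypothetical shape)
    sources: {source_id: full source text}
    """
    results = []
    for c in claims:
        source_text = sources.get(c["source_id"])
        if source_text is None:
            verdict = "unknown_source"
        elif normalize(c["quote"]) in normalize(source_text):
            verdict = "grounded"
        else:
            verdict = "not_found"
        results.append({**c, "verdict": verdict})
    return results

# Toy data for illustration only.
sources = {"smith2021": "We found that\nsleep loss impairs recall in adults."}
claims = [
    {"claim": "Sleep loss hurts memory", "source_id": "smith2021", "quote": "sleep loss impairs recall"},
    {"claim": "Effect is permanent", "source_id": "smith2021", "quote": "effects are permanent"},
    {"claim": "Replicated in children", "source_id": "lee2019", "quote": "children"},
]
verdicts = [r["verdict"] for r in ground_check(claims, sources)]
print(verdicts)  # ['grounded', 'not_found', 'unknown_source']
```

Exact substring matching catches invented quotes and wrong source IDs, but not a real quote that has been misread or taken out of context; that part still needs the debate and a human read.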

↓

📊

EVALUATE · Layer 4

Systematic bias audit (5 types) → 6-dimension confidence scoring → progress dashboard → weekly meta-reflection loop.

P4.1 bias-audit · P4.2 confidence · P4.3 meta-reflect
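The page says confidence is scored on 6 dimensions but does not name them. The sketch below assumes six hypothetical dimensions, each rated from 0 to 1, and combines them as a weighted mean on a 0-100 scale (the same range as the `[confidence 0-100]` field in the synthesis prompt). The dimension names, the equal default weights and the `confidence_score` function are illustrative, not the P4.2 prompt itself.

```python
from typing import Optional

# Hypothetical dimension names: the page says "6-dimension" but does not list them.
DIMENSIONS = ("source_count", "source_quality", "agreement",
              "recency", "directness", "replication")

def confidence_score(ratings: dict[str, float],
                     weights: Optional[dict[str, float]] = None) -> int:
    """Weighted mean of six ratings in [0, 1], scaled to a 0-100 score."""
    missing = [d for d in DIMENSIONS if d not in ratings]
    if missing:
        raise ValueError(f"missing ratings: {missing}")
    for d in DIMENSIONS:
        if not 0.0 <= ratings[d] <= 1.0:
            raise ValueError(f"{d} must be between 0 and 1")
    weights = weights or {d: 1.0 for d in DIMENSIONS}  # equal weights by default
    total_weight = sum(weights[d] for d in DIMENSIONS)
    weighted = sum(weights[d] * ratings[d] for d in DIMENSIONS)
    return round(100 * weighted / total_weight)

claim = {"source_count": 1.0, "source_quality": 0.8, "agreement": 0.9,
         "recency": 0.5, "directness": 0.7, "replication": 0.3}
print(confidence_score(claim))  # 70
```

Raising the weight of one dimension (for example tripling `agreement`) shifts the score toward that rating; keeping the weights explicit makes the scoring rule auditable in the same way the bias audit is.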

Multi-Notebook Hub-Spoke Architecture

🏛️ MASTER HUB: Cross-domain Map · Taxonomy · Gap Tracker

📚 Spoke A: Theory · 15 sources

🔬 Spoke B: Methods · 12 sources

📋 Spoke C: Evidence · 12 sources

🔬 WORKSHOP: Grounded Synthesis Arena
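Routing papers into the Spokes is essentially a lookup from taxonomy tag to notebook plus a capacity check. A minimal sketch, assuming each paper already carries one taxonomy tag: the tag strings are hypothetical, and the 50-source cap per notebook is an assumption taken from the "only takes 50" complaint at the top of this page (limits may differ by plan).

```python
from collections import defaultdict

# Taxonomy tag -> Spoke notebook, following the Theory / Methods / Evidence split.
# The tag strings themselves are hypothetical.
SPOKES = {"theory": "Spoke A", "methods": "Spoke B", "evidence": "Spoke C"}
# Assumed per-notebook source cap.
MAX_SOURCES_PER_NOTEBOOK = 50

def route(papers: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Assign (title, tag) pairs to Spokes; unknown tags are held back for manual triage."""
    layout: dict[str, list[str]] = defaultdict(list)
    for title, tag in papers:
        layout[SPOKES.get(tag, "unassigned")].append(title)
    for notebook, titles in layout.items():
        if notebook != "unassigned" and len(titles) > MAX_SOURCES_PER_NOTEBOOK:
            raise ValueError(f"{notebook} has {len(titles)} sources; split it into another Spoke")
    return dict(layout)

layout = route([("P1", "theory"), ("P2", "methods"), ("P3", "evidence"),
                ("P4", "theory"), ("P5", "policy")])
print(layout)
# {'Spoke A': ['P1', 'P4'], 'Spoke B': ['P2'], 'Spoke C': ['P3'], 'unassigned': ['P5']}
```

Failing loudly on an over-full Spoke, rather than silently truncating, is the point: it tells you when the taxonomy needs another Spoke.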

Workflows

Three Complete Closed Loops

Each workflow is a self-contained research engine. Pick the one that fits your time budget and paper count. All use the same OS v2 prompt library.

01

OS v2 Full Pipeline — 50+ Papers

Complete research cycle from ingestion to validated synthesis. ~14 days.

Phase 1 — Perceive

Systematic Ingestion

Pre-process 50+ papers through automated quality gates. Extract the research question (RQ), methodology, findings, and limitations from each paper. Rank papers by relevance × rigor × recency. Distribute them to the Hub-Spoke notebooks, aligned with your taxonomy.

P1.1 paper-parse · P1.2 quality-gate · Taxonomy template
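The relevance × rigor × recency ordering can be read as a product of three scores. A minimal sketch, assuming you rate relevance and rigor on a 0-1 scale yourself and derive recency from the publication year with a hypothetical 5-year half-life; the `Paper` type, the decay rule and the toy papers are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Paper:
    title: str
    relevance: float  # 0-1, your own screening judgment (assumption)
    rigor: float      # 0-1, e.g. informed by the quality gate (assumption)
    year: int

def recency(year: int, current_year: int, half_life: float = 5.0) -> float:
    """Exponential decay: a paper half_life years old scores 0.5 (hypothetical rule)."""
    age = max(0, current_year - year)
    return 0.5 ** (age / half_life)

def rank(papers: list[Paper], current_year: int) -> list[tuple[str, float]]:
    """Score each paper as relevance x rigor x recency and sort best-first."""
    scored = [(p.title, p.relevance * p.rigor * recency(p.year, current_year)) for p in papers]
    return sorted(scored, key=lambda item: item[1], reverse=True)

papers = [
    Paper("Older landmark study", relevance=0.9, rigor=0.9, year=2015),
    Paper("Recent small pilot", relevance=0.8, rigor=0.4, year=2024),
    Paper("Recent strong trial", relevance=0.7, rigor=0.8, year=2023),
]
for title, score in rank(papers, current_year=2025):
    print(f"{score:.2f}  {title}")
# 0.42  Recent strong trial
# 0.28  Recent small pilot
# 0.20  Older landmark study
```

Because the factors multiply, a paper that scores zero on any one of them drops to the bottom regardless of the others; if landmark papers should never sink that far, a longer half-life or a weighted sum is the usual adjustment.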

Phase 2 — Plan

Gap Detection & Hypothesis Generation

Run structural gap analysis across 4 categories (theoretical, methodological, empirical, synthesis). Generate hypotheses at 3 risk levels per gap. Build a prioritized research tree with status tracking.

P2.1 gap-detect · P2.2 priority-matrix · P2.3 hypothesis-factory · P2.4 research-tree
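A priority matrix can be built from the importance, feasibility and novelty labels that the Structural Gap Detection prompt (in the Prompt Library section) asks for. The numeric scales below, the importance × feasibility product and the novelty tie-break are a hypothetical choice for illustration; the P2.2 priority-matrix prompt may weigh gaps differently. The gap names are toy data.

```python
# Numeric scales for the rating labels; the numbers are a hypothetical choice.
IMPORTANCE = {"CRITICAL": 4, "HIGH": 3, "MEDIUM": 2, "LOW": 1}
FEASIBILITY = {"EASY": 3, "MODERATE": 2, "HARD": 1}
NOVELTY = {"OVERLOOKED": 1, "PARTIALLY_NOTED": 0}

def prioritize(gaps: list[dict]) -> list[dict]:
    """Sort gaps by importance x feasibility; ties go to the more overlooked gap."""
    def key(gap: dict) -> tuple[int, int]:
        score = IMPORTANCE[gap["importance"]] * FEASIBILITY[gap["feasibility"]]
        return (score, NOVELTY[gap["novelty"]])
    return sorted(gaps, key=key, reverse=True)

gaps = [
    {"name": "No causal tests", "importance": "CRITICAL", "feasibility": "HARD", "novelty": "PARTIALLY_NOTED"},
    {"name": "Undefined core construct", "importance": "HIGH", "feasibility": "EASY", "novelty": "OVERLOOKED"},
    {"name": "Single-country samples", "importance": "MEDIUM", "feasibility": "MODERATE", "novelty": "OVERLOOKED"},
]
ranked = [g["name"] for g in prioritize(gaps)]
print(ranked)
# ['Undefined core construct', 'Single-country samples', 'No causal tests']
```

Note the trade-off this product makes: a CRITICAL but HARD gap scores the same as a MEDIUM, MODERATE one, so it is worth reviewing the bottom of the list rather than only the top.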

Phase 3 — Act

Multi-Agent Investigation

For each hypothesis: targeted querying → cross-Spoke validation → adversarial debate via Fighting Arena → grounding check against sources. Update the research tree status. Iterate until all branches are resolved.

P3.2 deep-dive · P3.3 cross-validate · P3.5 arena-debate · P3.6 ground-check · P3.7 multi-layer synthesis
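The "iterate until all branches are resolved" loop can be sketched as a driver that pushes every still-open hypothesis through an ordered list of stages, with a round limit so the loop cannot spin forever. The stage functions below are toy stand-ins named after P3.2 and P3.6, not the real prompts, and the `investigate` function, the round limit of 3 and the "open"/"resolved" labels are hypothetical.

```python
from typing import Callable

Stage = Callable[[str], bool]  # a stage returns True if the hypothesis passes it

def investigate(hypotheses: list[str], stages: list[Stage], max_rounds: int = 3) -> dict[str, str]:
    """Repeat the stage pipeline on still-open hypotheses until all resolve or rounds run out."""
    status = {h: "open" for h in hypotheses}
    for _ in range(max_rounds):
        still_open = [h for h, s in status.items() if s == "open"]
        if not still_open:
            break
        for h in still_open:
            # all() stops at the first failing stage, so later stages are skipped.
            if all(stage(h) for stage in stages):
                status[h] = "resolved"
    return status

calls: dict[str, int] = {}

def deep_dive(h: str) -> bool:
    """Toy stage: records an attempt and always passes."""
    calls[h] = calls.get(h, 0) + 1
    return True

def ground_check(h: str) -> bool:
    """Toy stage: H2 only grounds on its second attempt; H3 never grounds."""
    if h == "H3":
        return False
    return h != "H2" or calls[h] >= 2

result = investigate(["H1", "H2", "H3"], [deep_dive, ground_check])
print(result)  # {'H1': 'resolved', 'H2': 'resolved', 'H3': 'open'}
```

Anything still "open" after the round limit is exactly what Phase 4 sends back to Phase 3, so the cap turns an endless loop into an explicit hand-off.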

Phase 4 — Evaluate

Quality Control & Reflection

Bias audit across 5 bias types. Confidence scoring on 6 dimensions per claim. Progress dashboard update. Weekly meta-reflection: "Am I making progress or going in circles?" Loop back to Phase 3 for unresolved branches.

P4.1 bias-audit · P4.2 confidence-score · P4.3 meta-reflect · Dashboard template
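The weekly "progress or circles?" question can be given a simple numeric check if the dashboard records the cumulative number of resolved research-tree branches each week. The rule below (no increase over the last two weeks means "circling") and the `progress_check` function are hypothetical; P4.3 meta-reflect is a reflective prompt, not this heuristic.

```python
def progress_check(resolved_counts: list[int], window: int = 2) -> str:
    """Classify weekly progress from cumulative counts of resolved research-tree branches.

    Hypothetical rule: if the count has not risen over the last `window` weeks,
    report "circling"; otherwise "progressing".
    """
    if len(resolved_counts) <= window:
        return "too early to tell"
    recent = resolved_counts[-(window + 1):]
    return "progressing" if recent[-1] > recent[0] else "circling"

print(progress_check([2, 5, 7, 9]))  # progressing
print(progress_check([2, 5, 5, 5]))  # circling
```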

02

Fighting Arena — Council of Agents

Multi-perspective adversarial debate. Works through one contested question in about 2 hours.

Round 1 — Position Statements

Define the Arena Question

From your research tree, select the most contested question. Each Spoke notebook generates a position using ONLY its sources. Advocate A argues for, Advocate B argues against, Advocate C reviews the raw evidence neutrally.

P3.5 advocate-role · Spoke A, B, C
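Round 1 amounts to sending each Spoke notebook the same question with a different role. A minimal sketch of assembling those role prompts as text, assuming Spoke A, B and C play Advocates A, B and C respectively (the page lists both but does not state the mapping). The `ROLES` table, the `position_prompt` function and the exact wording are hypothetical, not the P3.5 advocate-role prompt.

```python
# Spoke -> (role, stance). The Spoke-to-Advocate mapping is an assumption.
ROLES = {
    "Spoke A": ("Advocate A", "Argue FOR the claim in the arena question."),
    "Spoke B": ("Advocate B", "Argue AGAINST the claim in the arena question."),
    "Spoke C": ("Advocate C", "Take no side: report neutrally what the raw evidence shows."),
}

def position_prompt(spoke: str, question: str) -> str:
    """Build a Round 1 position prompt to paste into one Spoke notebook."""
    role, stance = ROLES[spoke]
    return "\n".join([
        f"You are {role}.",
        "Use ONLY the sources in this notebook.",
        stance,
        f"Arena question: {question}",
        "Cite the supporting source for every claim.",
    ])

for spoke in ROLES:
    print(position_prompt(spoke, "Does remote work reduce junior-employee learning?"))
    print("---")
```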

Round 2 — Adversarial Exchange

Structured Debate (3 Clashes)

An external LLM (Claude or GPT) moderates. In each clash, every advocate: acknowledges the strongest opposing claim → attacks the weakest opposing claim → presents counter-evidence → states what would change their mind. A Critic looks for logical flaws in all positions.

External LLM debate · Cross-Spoke verification
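The Round 2 protocol can be written out as an explicit turn order for the moderating LLM. This sketch assumes "3 Clashes" means three exchange rounds in which every advocate runs the four-step turn, with the Critic speaking at the end of each clash; the page does not say when the Critic speaks, and the `debate_script` function is hypothetical.

```python
ADVOCATES = ("Advocate A", "Advocate B", "Advocate C")
TURN_STEPS = (
    "Acknowledge the strongest opposing claim",
    "Attack the weakest opposing claim",
    "Present counter-evidence",
    "State what would change your mind",
)

def debate_script(clashes: int = 3) -> list[str]:
    """Ordered turns for the moderator: every advocate speaks each clash, then the Critic."""
    steps = "; ".join(f"({i}) {s}" for i, s in enumerate(TURN_STEPS, start=1))
    turns = []
    for n in range(1, clashes + 1):
        for advocate in ADVOCATES:
            turns.append(f"Clash {n} | {advocate}: {steps}")
        turns.append(f"Clash {n} | Critic: identify logical flaws in all positions")
    return turns

script = debate_script()
print(len(script))  # 12
print(script[3])    # Clash 1 | Critic: identify logical flaws in all positions
```

Writing the order down this way makes it easy to paste into the moderator's instructions and to check afterwards that no advocate skipped a step.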

Round 3 — Grounded Synthesis

Resolution & Integration

Moderator identifies: what was resolved, what remains contested, what new questions emerged. Workshop notebook grounds every claim in actual sources. Final integrated position with confidence levels and caveats.

P3.6 ground-check · P3.7 multi-layer synthesis · Workshop notebook

03

FUTURE Pipeline — For Working Professionals

45 minutes/week. Structured research momentum that survives a busy schedule.

Monday · F · Frame · 5 min
Tuesday · U · Upload · 10 min
Wednesday · T · Target & Think · 15 min
Thursday · U · Unpack & Use · 10 min
Friday · R · Reflect & Route · 5 min

📅

Monthly: E — Evolve & Extend

60 min deep synthesis · Research tree update · Strategic planning
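As a quick consistency check, the weekday slots should add up to the advertised 45 minutes per week and spell the acronym together with the monthly step. The sketch encodes the schedule as plain Python data (the tuple layout is an illustrative choice) and estimates a monthly total under an assumed 4-week month.

```python
# Weekly slots as listed above: (day, letter, step, minutes).
WEEKLY = [
    ("Monday", "F", "Frame", 5),
    ("Tuesday", "U", "Upload", 10),
    ("Wednesday", "T", "Target & Think", 15),
    ("Thursday", "U", "Unpack & Use", 10),
    ("Friday", "R", "Reflect & Route", 5),
]
MONTHLY = ("E", "Evolve & Extend", 60)

weekly_minutes = sum(minutes for *_, minutes in WEEKLY)
acronym = "".join(letter for _, letter, _, _ in WEEKLY) + MONTHLY[0]
# Assumes a 4-week month; a month with a fifth week adds another 45 minutes.
monthly_minutes = 4 * weekly_minutes + MONTHLY[2]

print(weekly_minutes, acronym, monthly_minutes)  # 45 FUTURE 240
```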

Prompt Library

25+ Prompts. Here Are 2.

Each prompt is battle-tested across the PERCEIVE → PLAN → ACT → EVALUATE cycle. Preview two of the most powerful — unlock the full library.

🔍

Gap Detection

PLAN Layer

Structural Gap Detection Engine

Scans your entire literature base across 4 gap categories. Finds what's missing that you didn't know to look for.

/* Scan all source summaries and perform structured gap analysis */
For each of the 4 categories below, identify gaps in my literature:

## Category 1: Theoretical Gaps
- Concepts used but never defined?
- Assumptions shared but never tested?
- Missing theoretical frameworks?

## Category 2: Methodological Gaps
- Overused methods?
- Underused methods?
- Missing validation?

## Category 3: Empirical Gaps
- Understudied populations/contexts?
- Correlation without causation tests?
- Time periods lacking data?

## Category 4: Synthesis Gaps
- Cross-domain contradictions?
- Unnamed repeating patterns?
- "Elephants in the room"?

For each gap, rate:
[CRITICAL/HIGH/MEDIUM/LOW] importance
[EASY/MODERATE/HARD] feasibility
[OVERLOOKED/PARTIALLY_NOTED] novelty

🎯 Used in: PLAN layer · ⚡ Output: 5-10 gaps
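If you ask the model to return its gap report as JSON, the rating vocabulary in the prompt above becomes something you can check mechanically before trusting the output. This sketch assumes a hypothetical JSON shape (a list of objects with `category`, `gap`, `importance`, `feasibility` and `novelty` keys); the prompt itself does not fix an output format, and `validate_gaps` is illustrative.

```python
import json

CATEGORIES = {"Theoretical", "Methodological", "Empirical", "Synthesis"}
IMPORTANCE = {"CRITICAL", "HIGH", "MEDIUM", "LOW"}
FEASIBILITY = {"EASY", "MODERATE", "HARD"}
NOVELTY = {"OVERLOOKED", "PARTIALLY_NOTED"}

def validate_gaps(raw: str) -> list[str]:
    """Return problems found in a JSON gap report; an empty list means it looks valid.

    json.loads raises json.JSONDecodeError if the model returned malformed JSON.
    """
    problems = []
    gaps = json.loads(raw)
    if not 5 <= len(gaps) <= 10:
        problems.append(f"expected 5-10 gaps, got {len(gaps)}")
    for i, gap in enumerate(gaps):
        for field, allowed in (("category", CATEGORIES), ("importance", IMPORTANCE),
                               ("feasibility", FEASIBILITY), ("novelty", NOVELTY)):
            if gap.get(field) not in allowed:
                problems.append(f"gap {i}: bad {field} {gap.get(field)!r}")
    return problems

good = {"category": "Empirical", "gap": "No data after 2020",
        "importance": "HIGH", "feasibility": "EASY", "novelty": "OVERLOOKED"}
bad = {**good, "importance": "URGENT"}
report = json.dumps([good] * 4 + [bad])
print(validate_gaps(report))  # ["gap 4: bad importance 'URGENT'"]
```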

✨

Grounded Synthesis

ACT Layer

Multi-Layer Synthesis with Confidence Scoring

Produces a synthesis that explicitly marks what's proven, probable, emerging, and unknown — with citations per claim.

/* Produce synthesis of [topic] with 4 confidence layers */
LAYER 1: Well-Established (multiple sources agree)
→ Confidence: HIGH
→ Requires: 3+ sources per claim
LAYER 2: Probable (2 sources agree, none disagree)
→ Confidence: MEDIUM
→ Requires: 2 sources per claim
LAYER 3: Emerging (1 strong source suggests)
→ Confidence: LOW
→ Flagged for validation
LAYER 4: Unknown (gaps identified, no data)
→ Action: Investigation needed

For each claim provide:
- [direct quote or summary]
- [confidence 0-100]
- [potential counter-evidence]

🎯 Used in: ACT + EVALUATE · ⚡ Output: Layered brief
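The four layers in the prompt above reduce to a rule on how many sources support or contradict a claim, which makes the model's layer assignments easy to spot-check. A minimal sketch: the `synthesis_layer` function is hypothetical, and it assumes any contradicting source keeps a claim out of Layers 1 and 2 (the prompt states "none disagree" only for Layer 2, so this is a conservative reading).

```python
def synthesis_layer(supporting: int, contradicting: int) -> str:
    """Assign one claim to a confidence layer from its source counts.

    Assumption: any contradicting source keeps a claim out of Layers 1 and 2.
    """
    if supporting >= 3 and contradicting == 0:
        return "LAYER 1: Well-Established (HIGH)"
    if supporting == 2 and contradicting == 0:
        return "LAYER 2: Probable (MEDIUM)"
    if supporting >= 1:
        return "LAYER 3: Emerging (LOW, flag for validation)"
    return "LAYER 4: Unknown (investigation needed)"

for counts in [(4, 0), (2, 0), (3, 1), (0, 0)]:
    print(counts, synthesis_layer(*counts))
```

Under this reading, a claim with three supporting sources and one contradicting source lands in Layer 3 rather than Layer 1, which matches the prompt's intent of flagging contested claims for validation.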

Before & After

OS v1 vs. OS v2

Quantified improvements across every dimension of the research process.

Dimension	OS v1 (Typical Use)	OS v2 (This System)	Δ
Notebook Setup	1 notebook, 50 sources crammed in	Hub + 3 Spokes + Workshop	↑ 3x capacity
Gap Detection	Manual, ask once, hope for the best	4-category automated structural analysis	↑ 3-5x gaps found
Perspectives	Single LLM answer (confirmation bias)	Multi-agent adversarial debate	↑ Bias reduction
Synthesis	Trust single output, no verification	4-layer confidence + source grounding	↑ 40-60% accuracy
Bias Checking	None	5-type systematic audit	↑ 70% catch rate
Progress	No tracking, no memory	Dashboard + weekly meta-reflection	↑ Continuous
Time to Insight	Hours of manual comparison	Structured pipeline, smart routing	↑ ~50% faster
Scalability	~20 papers before quality degrades	50-100+ papers with tiered architecture	↑ 5x scale

Stop Prompting.
Start Operating.
