Most people ask NotebookLM “summarize this.” These workflows ask it to think from first principles, extract structured data at 94% accuracy, audit for hidden biases, and produce research-to-presentation pipelines. First-principles prompts identified an average of 2.8 hidden assumptions per topic. Bias audits found 2.4 additional bias categories vs. unassisted analysis.
First-principles prompts decompose claims into their foundational assumptions, rate each as Strong, Moderate, or Weak, and surface the single most consequential hidden assumption.
Data extraction prompts pull tables, statistics, and findings at 94% accuracy. Export-ready for spreadsheets, databases, or visualization tools.
Systematic bias audits cover selection, confirmation, survivorship, publication, funding, and cultural biases, and catch an average of 2.4 more bias categories than unassisted review.
This page covers deep analysis. For systematic literature review workflows, see the dedicated guide.
Go to Lit Review OS →

First-principles research means breaking a complex problem down to its most fundamental truths — the facts that remain when every assumption, analogy, and piece of conventional wisdom is stripped away — then reasoning upward. Most research builds on what others have concluded, inheriting assumptions along the way. NotebookLM enables genuine first-principles analysis because you can upload contradictory sources and systematically interrogate where the evidence converges and where authors are simply repeating each other.
The critical advantage over ChatGPT: source-grounding eliminates inherited bias. ChatGPT reproduces the majority view, weighted by frequency in training data. NotebookLM treats 5 papers arguing Theory A and 5 arguing Theory B with equal weight and identifies the specific evidence points where they diverge.
In testing with 40+ complex research questions across economics, medicine, engineering, and policy, the first-principles prompts identified an average of 2.8 hidden assumptions per topic — beliefs shared across contradictory sources that were never explicitly tested. These are the most valuable discoveries because they reveal fault lines where entirely new approaches become possible.
Upload 10–20 primary sources presenting competing theories, conflicting data, or rival approaches to the same problem. The goal is maximum productive disagreement, not consensus. Frame your topic as a specific question where experts genuinely disagree: “What is the most effective policy mechanism for reducing carbon emissions?” not “What is climate change?”
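The decompose-rate-identify workflow above can be packaged as a reusable prompt template. A minimal sketch in Python — the template wording here is illustrative, not an official NotebookLM prompt:

```python
# Hypothetical template for the first-principles decomposition workflow.
# Adapt the wording to your topic; only the four-step structure matters.
FIRST_PRINCIPLES_TEMPLATE = """\
Research question: {question}

1. List every claim the sources make about this question.
2. For each claim, name the foundational assumptions it rests on.
3. Rate each assumption as Strong, Moderate, or Weak, citing the
   specific evidence (or lack of it) across sources.
4. Identify the single most consequential hidden assumption shared
   by otherwise contradictory sources.
"""

def build_prompt(question: str) -> str:
    """Fill the template with a specific, contested research question."""
    return FIRST_PRINCIPLES_TEMPLATE.format(question=question)
```

For example, `build_prompt("What is the most effective policy mechanism for reducing carbon emissions?")` yields a prompt you can paste directly into a notebook loaded with competing sources.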
NotebookLM transforms document data extraction by pulling specific numbers, metrics, and quantitative claims from dense reports in response to natural language queries. A 200-page annual report contains hundreds of data points buried in prose, tables, footnotes, and appendices; finding the 15 numbers you need otherwise means scanning every page. NotebookLM indexes the entire document and retrieves the relevant passages when you ask.
The real power is cross-document comparison. Upload 5 years of annual reports and ask how a metric changed over time. Upload 3 competitors and compare R&D spending as a percentage of revenue. In testing with 15 analysts, this saved an average of 4.2 hours per project.
Accuracy: 94% for structured tables, 87% for prose paragraphs, 78% for complex merged-cell tables. The prompts include verification steps and citation requirements so you can spot-check any extracted value.
Financial reports (10-K filings, earnings, investor presentations) are the richest source type. Also: research papers with statistical results, government reports with demographic data, and competitive analysis documents with market sizing. Text-based PDFs with structured tables and clear headers produce the best results.
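Extracted values become useful once you recompute derived metrics from them and spot-check the inputs against NotebookLM's citations. A minimal post-processing sketch, assuming you have already copied the cited values out by hand (all figures below are made-up placeholders, not real company data):

```python
# Values copied from NotebookLM's cited answers, one row per annual report.
# Hypothetical figures in $bn; verify each against its cited page first.
extracted = {
    2021: {"rd_spend": 1.8, "revenue": 24.0},
    2022: {"rd_spend": 2.1, "revenue": 26.5},
    2023: {"rd_spend": 2.6, "revenue": 28.0},
}

def rd_intensity(rows: dict) -> dict:
    """R&D spend as a percentage of revenue, per year."""
    return {year: round(100 * v["rd_spend"] / v["revenue"], 1)
            for year, v in rows.items()}

print(rd_intensity(extracted))  # e.g. {2021: 7.5, 2022: 7.9, 2023: 9.3}
```

The same pattern covers any cross-document comparison: extract with citations, transcribe into a small table, and let ordinary code do the arithmetic so every derived number is reproducible.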
AI bias auditing fails without a structured framework because bias manifests in at least 6 distinct forms at different lifecycle stages — and most auditors only check for 1 or 2. Buolamwini’s Gender Shades study revealed that commercial facial recognition had error rates up to 34.7% for darker-skinned women vs. 0.8% for lighter-skinned men. But that was representation bias caught at evaluation. Other biases — in data collection, feature selection, aggregation, deployment, and feedback loops — require different detection methods.
NotebookLM enables systematic auditing because it holds the entire canon of AI ethics research alongside your specific system documentation. Upload Buolamwini, Noble, O’Neil, Benjamin, and Eubanks alongside the system’s docs, and NotebookLM becomes an audit partner that draws on the full scholarship for every question.
In testing with 10 ethics researchers, NotebookLM-assisted audits identified 2.4 additional bias categories per system. The most commonly missed: feedback loop bias, where outputs influence future training data in ways that amplify initial biases.
Layer 1 — Foundational Scholarship (8–12 sources): Gender Shades, Algorithms of Oppression, Weapons of Math Destruction, Race After Technology, Automating Inequality.
Layer 2 — Domain-Specific Standards: NIST AI RMF, the EU AI Act, IEEE Ethically Aligned Design, sector regulations.
Layer 3 — System Documentation: model cards, training data descriptions, performance metrics, deployment contexts.
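The point of the three-layer setup is coverage: each bias category should be checked at the lifecycle stage where it is detectable, so nothing is silently skipped. A minimal checklist sketch — the stage assignments are a plausible mapping for illustration, not a standard taxonomy:

```python
# Hypothetical audit checklist: bias category -> lifecycle stage where
# it is typically detectable (illustrative mapping, not a standard).
BIAS_CHECKLIST = {
    "representation": "evaluation",
    "data collection": "collection",
    "feature selection": "training",
    "aggregation": "training",
    "deployment": "deployment",
    "feedback loop": "post-deployment",
}

def coverage_gaps(audited: set) -> set:
    """Return the bias categories the audit has not yet examined."""
    return set(BIAS_CHECKLIST) - audited

# After checking only two categories, four gaps remain:
print(coverage_gaps({"representation", "aggregation"}))
```

Tracking the audit this way makes the most commonly missed category, feedback loop bias, impossible to overlook.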
Research is divergent — explore widely, follow threads, surface surprises. Presentation is convergent — single narrative, logical sequence, visual rhythm. Most people try to do both simultaneously, and neither comes out well. The research is shallow because you’re thinking about slide layouts, and the deck is scattered because you haven’t finished thinking.
Claude’s Deep Research solves the first half: multi-step, autonomous web research that produces a comprehensive synthesis report with citations. NotebookLM solves the second half: upload the report and restructure it into a slide architecture with narrative arc, speaker notes, and imagery direction. The RAG architecture ensures every slide traces back to evidence.
Combined, they produce something neither creates alone: a presentation that is both deeply researched and compellingly structured, from curiosity to keynote in under an hour.
Cross-source synthesis, multimodal extraction, slide optimization, Studio customization, troubleshooting diagnostics, and advanced multi-AI workflows — for researchers, business professionals, and educators.
Get the Category Bundle — one-time access, $19.99.