This chapter corresponds to code in the researcherRAG repository

Documentation & Writing

Transform your RAG-assisted research into publishable documentation. This chapter covers structuring systematic reviews, generating PRISMA diagrams, managing bibliographies, and preparing publication-ready materials with RAG assistance.

📋 Prerequisites

  • ✓ Completed Stage 6 (Research conversations and analysis)
  • ✓ Research notes with verified citations
  • ✓ Key findings and evidence organized

Structuring Your Literature Review

A systematic review follows the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) format. Here's the standard structure:

PRISMA Systematic Review Outline

1. Title & Abstract (250-300 words)

Abstract Structure:

  • Background: Why is this review needed?
  • Objective: What is your research question?
  • Methods: PRISMA approach, databases, inclusion criteria
  • Results: Number of papers, main findings
  • Conclusions: Implications and significance
2. Introduction

2.1 Rationale: What gap does this review address?

2.2 Objectives: Specific research questions (PICO/SPIDER)

3. Methods (Most Critical Section)
  • 3.1 Protocol: Pre-registration (PROSPERO, OSF)
  • 3.2 Eligibility Criteria: Inclusion/exclusion with justification
  • 3.3 Information Sources: Databases searched (with dates)
  • 3.4 Search Strategy: Full Boolean queries
  • 3.5 Study Selection: Screening process (PRISMA flow)
  • 3.6 Data Collection: Extraction process and tools used
  • 3.7 Risk of Bias: Quality assessment method

⚠️ Disclose AI Usage: State that you used an AI-assisted RAG system for paper screening and data extraction, with human oversight and validation.

4. Results
  • 4.1 Study Selection: PRISMA flow diagram with numbers
  • 4.2 Study Characteristics: Table of included studies
  • 4.3 Risk of Bias: Quality assessment results
  • 4.4 Synthesis: Organized by themes or outcomes
5. Discussion
  • Summary of Evidence: What did you find?
  • Limitations: Of included studies and your review
  • Implications: For practice, policy, and research
6. Conclusion

Concise summary of main findings and implications. Future research directions.

Writing with RAG Assistance

Use your RAG system to help draft sections of your review. Here are effective prompts for each section:

Methods Section

Prompt your RAG system:

"Generate a Methods section for my systematic review. Include:
- Databases: [Semantic Scholar, OpenAlex, arXiv]
- Search dates: [2010-01-01 to 2024-12-31]
- Search strategy: [Your Boolean query]
- Inclusion criteria: [List your criteria]
- Screening process: [Describe PRISMA workflow]
- Total papers: [N identified, N screened, N included]

Format in PRISMA style."

Results Section

Prompt your RAG system:

"Synthesize findings on [specific theme]:
1. How many papers discuss this theme?
2. What are the main findings? (with citations)
3. Are there contradictions or consensus?
4. Organize by sub-themes if applicable.

Create a summary table with: Theme | Key Finding | Supporting Papers"

Discussion Section

Prompt your RAG system:

"Compare my findings to existing literature:
1. What are the main patterns across studies?
2. Which findings are well-established (cited in 5+ papers)?
3. Where are the contradictions or gaps?
4. What are the limitations mentioned by authors?
5. What future research directions are suggested?"

PRISMA Flow Diagram

The PRISMA flow diagram visualizes your systematic review process. Here's how to generate it:

Generate Your PRISMA Diagram

The easiest way to create a publication-ready PRISMA diagram is using Mermaid Live Editor:
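If you don't already have diagram code to work from, the following Mermaid template is a reasonable starting point (all counts are illustrative placeholders to be replaced with your own):

```mermaid
flowchart TD
    A[Records Identified<br/>n = 1,247] --> B[Records Screened<br/>n = 1,247<br/>XX duplicates removed]
    B -->|Excluded: n = 983| E1[Did Not Meet Criteria]
    B --> C[Full-Text Assessed<br/>n = 264]
    C -->|Excluded: n = 127| E2[Excluded With Reasons]
    C --> D[Studies Included<br/>n = 137]
```

Pasting this into the Mermaid Live Editor renders a four-stage flow from identification to inclusion.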

📊 Steps to Create Your PRISMA Diagram

Step 1: Copy the Mermaid code above

Select the PRISMA flow diagram code shown in the example above

Step 2: Open Mermaid Live Editor

Visit https://mermaid.live

Step 3: Paste and edit with your numbers

Replace the example numbers with your actual counts:

A[Records Identified<br/>n = YOUR_NUMBER]
B[Records Screened<br/>n = YOUR_NUMBER<br/>XX duplicates removed]
...

Step 4: Export as PNG or SVG

Click "Actions" → "PNG" or "SVG" to download. SVG is recommended for publications because it is a vector format that scales to any print resolution without quality loss.

💡 Pro Tip: Automate with Python (Advanced)

If you need to generate many PRISMA diagrams programmatically, you can use Python with the Graphviz or Matplotlib libraries. However, for most researchers, the Mermaid Live Editor approach above is simpler and produces publication-quality results.
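As a lighter-weight alternative to Graphviz, you can generate the Mermaid source itself programmatically. This is a minimal standard-library sketch (the function name and counts are illustrative, not part of the repository):

```python
def prisma_mermaid(identified, duplicates, screened, fulltext, included):
    """Emit Mermaid flowchart source for a simple PRISMA flow with the given counts."""
    excluded_screening = screened - fulltext   # excluded at title/abstract stage
    excluded_fulltext = fulltext - included    # excluded at full-text stage
    return "\n".join([
        "flowchart TD",
        f"    A[Records Identified<br/>n = {identified:,}] --> "
        f"B[Records Screened<br/>n = {screened:,}<br/>{duplicates} duplicates removed]",
        f"    B -->|Excluded: n = {excluded_screening:,}| E1[Did Not Meet Criteria]",
        f"    B --> C[Full-Text Assessed<br/>n = {fulltext:,}]",
        f"    C -->|Excluded: n = {excluded_fulltext:,}| E2[Excluded With Reasons]",
        f"    C --> D[Studies Included<br/>n = {included:,}]",
    ])

# Illustrative counts; paste the printed source into https://mermaid.live
print(prisma_mermaid(identified=1_247, duplicates=49, screened=1_198,
                     fulltext=264, included=137))
```

Because the exclusion counts are derived from the stage totals, the generated diagram is arithmetically consistent by construction.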

See GitHub examples/ for Python automation scripts.

Citation & Bibliography Management

Export your citations in standard formats for reference managers:

📚 Export BibTeX

python export_bibliography.py \
  --format bibtex \
  --output references.bib

# Import into LaTeX, Overleaf

📑 Export RIS

python export_bibliography.py \
  --format ris \
  --output references.ris

# Import into EndNote, Zotero

📄 Export APA

python export_bibliography.py \
  --format apa \
  --output references.docx

# Word document with formatted references

🌐 Export HTML

python export_bibliography.py \
  --format html \
  --output references.html

# Interactive reference list with DOI links
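If you are curious what the BibTeX output boils down to, here is a minimal self-contained sketch of the format; the record fields and function name are illustrative and not the actual API of export_bibliography.py:

```python
def to_bibtex(paper):
    """Format one paper record (a plain dict) as a BibTeX @article entry."""
    key = f"{paper['first_author'].lower()}{paper['year']}"  # e.g. smith2024
    fields = [
        ("author", " and ".join(paper["authors"])),
        ("title", paper["title"]),
        ("journal", paper["journal"]),
        ("year", str(paper["year"])),
        ("doi", paper["doi"]),
    ]
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@article{{{key},\n{body}\n}}"

# Illustrative metadata for a single included study
paper = {
    "first_author": "Smith",
    "authors": ["Smith, Jane", "Lee, Minho"],
    "title": "AI-Assisted Screening in Systematic Reviews",
    "journal": "Journal of Research Synthesis",
    "year": 2024,
    "doi": "10.1234/example",
}
print(to_bibtex(paper))
```

The same record dict could feed an RIS or APA formatter; only the output template changes.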

Supplementary Materials

Journals often require supplementary materials for systematic reviews. Prepare these files:

Supplementary Files Checklist

📊 S1: Complete Search Strategies

Full Boolean queries for each database with dates

📋 S2: Inclusion/Exclusion Criteria (Detailed)

Full documentation with examples and edge cases

📑 S3: List of Included Studies

All papers with full citations and DOIs

📈 S4: Data Extraction Forms

Template used for extracting data from papers

⚙️ S5: RAG System Configuration

Document AI tools used, models, and validation process

✅ S6: PRISMA Checklist (Completed)

Official 27-item PRISMA 2020 checklist

Generate Supplementary Materials

# Generate all supplementary files at once
python generate_supplementary.py \
  --config rag_config.yaml \
  --output supplementary/

# Creates:
# - S1_search_strategies.pdf
# - S2_criteria_detailed.pdf
# - S3_included_studies.xlsx
# - S4_data_extraction_form.xlsx
# - S5_rag_system_config.pdf
# - S6_prisma_checklist.pdf

Reproducibility Package

Make your research fully reproducible by providing a complete reproducibility package:

📦 Reproducibility Package Contents

✓ Code Repository: GitHub link to your RAG system setup

✓ Configuration Files: Exact settings (rag_config.yaml)

✓ Data Files: CSV of all papers (with metadata)

✓ Search Logs: Complete search history with dates

✓ Documentation: Step-by-step guide to reproduce

✓ Docker Image (Optional): Containerized environment

Preparing for Publication

Before Submission

  • ✓ Verify all citations are accurate
  • ✓ Check that PRISMA flow diagram numbers match the text
  • ✓ Complete the PRISMA 2020 checklist
  • ✓ Prepare supplementary materials
  • ✓ Disclose AI tool usage in the Methods section
  • ✓ Have co-authors review the manuscript
  • ✓ Proofread for formatting consistency
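The "numbers match" check amounts to a few lines of arithmetic: each PRISMA stage must reconcile with the next. A small sketch (all names and counts illustrative) that flags any inconsistency:

```python
def check_prisma_counts(identified, duplicates, screened,
                        excluded_screening, fulltext, excluded_fulltext, included):
    """Return a list of inconsistencies between PRISMA flow counts (empty = consistent)."""
    problems = []
    if identified - duplicates != screened:
        problems.append("identified - duplicates != screened")
    if screened - excluded_screening != fulltext:
        problems.append("screened - excluded (screening) != full-text assessed")
    if fulltext - excluded_fulltext != included:
        problems.append("full-text assessed - excluded (full-text) != included")
    return problems

# Illustrative counts; an empty list means the flow diagram adds up.
print(check_prisma_counts(1_296, 49, 1_247, 983, 264, 127, 137))
```

Running this against the counts reported in your manuscript and in the flow diagram catches transcription errors before reviewers do.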

Common Reviewer Questions

Q: "How did you ensure AI didn't miss relevant papers?"

A: Describe validation, manual review, and spot-checking process

Q: "Can this be reproduced?"

A: Point to reproducibility package, public repository, exact versions

Q: "What about AI biases?"

A: Explain human oversight, verification steps, and limitations addressed

AI Disclosure Statement

Complete transparency about AI use is essential for credibility and reproducibility. ScholaRAG uses AI at multiple stages, and your disclosure should reflect this accurately.

⚠️ Important: Distinguish AI Roles

ScholaRAG uses AI differently across stages:

  • Stages 1-3: AI assists protocol development (search queries, criteria)
  • Stages 4-5: AI assists screening (recommendations, not decisions)
  • Stages 6-7: AI assists synthesis (analysis, not interpretation)

Your disclosure must clarify that humans made all final decisions.

Complete Disclosure Template

Use this comprehensive template for your Methods section. Replace bracketed text with your specifics:

2.3 Study Selection (AI-Augmented PRISMA Screening)

We followed PRISMA 2020 guidelines with AI augmentation to enhance efficiency while maintaining methodological rigor.

AI System Configuration:

  • Model: Claude Sonnet 4.5 (Anthropic, version 2025-01-22)
  • Architecture: Retrieval-Augmented Generation (RAG) with ChromaDB vector database
  • Calibration: System configured with pre-defined inclusion/exclusion criteria, calibrated on [20] papers jointly screened by all reviewers

Screening Process:

1. Title/Abstract Screening (n=[1,247]):

  • AI provided recommendations (Include/Exclude) with justifications
  • Two independent reviewers ([XX, YY]) made final decisions
  • Reviewers could see AI justifications but were not bound by them
  • Conservative rule: any uncertainty → proceed to full-text
  • Inter-rater reliability (human-human): Cohen's kappa = [0.87]
  • AI-human agreement: [98.6%] ([18] overrides, all documented)

2. Full-Text Review (n=[264]):

  • AI extracted relevant sections (methods, population, outcomes)
  • Two reviewers independently assessed eligibility
  • Conflicts resolved through discussion (n=[12]) or third reviewer (n=[3])
  • Final included: n=[137]

Validation:

  • Manual re-screening of [100] randomly selected AI exclusions: [0] false negatives
  • Blind comparison (reviewers unaware of AI recommendations): kappa = [0.89]
  • Supplementary Table S7 provides complete AI-human decision comparison

Data Extraction and Synthesis (Stages 6-7):

The [137] included papers were processed using the RAG system for automated data extraction (verified by human reviewers), thematic analysis, and cross-study synthesis with citation tracking. All extracted data were spot-checked against source documents ([25%] random sample showed [98%] accuracy).

Ethical Considerations:

This approach was reviewed by [institution] as not involving human subjects. We maintain that AI augmentation enhances consistency, improves efficiency (~40 hours saved), maintains rigor (qualified human reviewers made all decisions), and ensures transparency (complete logs and code publicly available).

Reproducibility:

Complete methodology including AI configuration, prompts, screening logs, and source code: https://github.com/HosungYou/researcherRAG

Addressing Reviewer Criticisms

Expect these criticisms when using AI in systematic reviews. Prepare your responses:

🔴 Criticism 1: "AI might miss relevant papers"

Your Response:

"We implemented a conservative screening approach:

  • 1. High recall: AI configured to flag any potentially relevant papers
  • 2. Human verification: All AI exclusions reviewed by researchers
  • 3. Validation: Manual check of random sample (n=100) found zero false negatives
  • 4. Traditional backup: Two independent reviewers for all borderline cases

See Supplementary Table S7.2 for false negative analysis."

🟠 Criticism 2: "AI has inherent biases"

Your Response:

"We acknowledge this and took mitigation steps:

  • 1. Diverse prompt testing: Multiple phrasings tested for consistency
  • 2. Blind validation: Human reviewers blinded to AI recommendations (n=50 subset), kappa=0.89
  • 3. Bias monitoring: Tracked AI recommendation patterns, found no systematic bias
  • 4. Transparency: All AI decisions and justifications publicly available

See Supplementary Figure S7.1 for AI confidence score distribution."

🟡 Criticism 3: "This isn't a 'real' systematic review"

Your Response:

"Our approach fully adheres to PRISMA 2020:

  • ✓ Pre-registered protocol
  • ✓ Comprehensive search strategy
  • ✓ Dual independent screening (humans made final decisions)
  • ✓ Quality assessment by qualified reviewers
  • ✓ Transparent reporting

AI augmentation enhanced efficiency while maintaining rigor, much as a spell-checker assists writing without invalidating it. We argue that AI-augmented screening, when properly validated, improves consistency and reduces reviewer fatigue while preserving human oversight."

🔵 Criticism 4: "Results cannot be reproduced"

Your Response:

"We provide complete reproducibility package:

  • 1. Exact model: Claude Sonnet 4.5 version 2025-01-22 with API timestamp
  • 2. Configuration: All prompts, parameters, embedding settings
  • 3. Logs: Complete screening decisions with AI scores and human overrides
  • 4. Code: Open-source under MIT license
  • 5. Docker: Containerized environment for exact replication
  • 6. Fallback: Traditional 'human-only' protocol also provided

Repository: github.com/HosungYou/researcherRAG"

🟣 Criticism 5: "AI use raises ethical concerns"

Your Response:

"We consulted ethics board (IRB approved as non-human-subject research). Key principles maintained:

  • 1. Transparency: Full AI disclosure in methods
  • 2. Accountability: Humans responsible for all decisions
  • 3. Privacy: No patient data processed by AI
  • 4. Fairness: AI assisted, didn't replace human judgment
  • 5. Beneficence: Faster evidence synthesis benefits patients/policy

We argue that NOT using validated AI assistance may itself be less ethical: it increases researcher burden, delays important findings, and wastes funding on repetitive tasks."

Supplementary Materials: AI Validation Tables

Include these validation tables in your supplementary materials to demonstrate AI reliability:

Table S7.1: AI vs Human Screening Decisions

Screening Stage    Total    AI Include    AI Exclude    Human Override    Agreement (%)
Title/Abstract     1,247    264           983           18 (1.4%)         98.6%
Full-text          264      142           122           5 (1.9%)          98.1%

Table S7.2: False Negative Analysis

Validation Sample       Papers Checked    False Negatives    False Negative Rate
Random AI exclusions    100               0                  0%

Method: Two senior researchers independently re-screened 100 randomly selected papers that AI recommended for exclusion. No papers were incorrectly excluded, confirming high specificity of AI screening.
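Agreement figures like those in Table S7.1 can be computed directly from your screening logs with standard-library Python. This sketch (the decision lists are illustrative toy data) returns percent agreement and Cohen's kappa:

```python
from collections import Counter

def agreement_stats(rater_a, rater_b):
    """Percent agreement and Cohen's kappa for two parallel lists of decisions."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same label at random.
    expected = sum(counts_a[label] * counts_b[label]
                   for label in counts_a | counts_b) / n ** 2
    kappa = (observed - expected) / (1 - expected) if expected < 1 else 1.0
    return observed, kappa

# Illustrative toy data: the raters disagree on one of four papers.
ai = ["include", "include", "exclude", "exclude"]
human = ["include", "exclude", "exclude", "exclude"]
print(agreement_stats(ai, human))  # 75% raw agreement
```

Kappa corrects raw agreement for chance, which is why reviewers expect it alongside the percentage.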

✅ Best Practice: Validation Strategy

The most convincing evidence for reviewers is your validation data. Always include: (1) AI-human agreement rates, (2) false negative analysis, (3) blind comparison study, and (4) confidence score distributions. This demonstrates you've thoroughly validated your AI-augmented approach.


Conclusion: From Research to Publication

You've now completed the full ScholaRAG workflow, from defining your research question (Stage 1) through building your RAG system (Stages 2-5) and conducting research conversations (Stage 6), to finally writing up your findings (Stage 7).

🎉 Congratulations!

You've learned how to leverage AI-assisted RAG systems for systematic literature reviews while maintaining academic rigor, transparency, and reproducibility. Your systematic review is now ready for submission. Good luck with your publication!