Code Review Guide

✅ For Contributors & Code Reviewers

Best practices, testing strategies, and contribution guidelines for ScholaRAG. Follow this guide when submitting PRs or reviewing code.

Code Quality Standards

Python Code Style

  • PEP 8 compliance: Use black formatter and flake8 linter
  • Type hints: All function signatures must include type annotations
  • Docstrings: Use Google-style docstrings for all public methods
  • Error handling: Never use a bare except: clause; always catch specific exceptions
# ✅ Good
def screen_paper(self, title: str, abstract: str) -> Dict[str, Any]:
    """
    Screen a single paper using Claude API.

    Args:
        title: Paper title
        abstract: Paper abstract

    Returns:
        Dictionary with screening decision and reasoning

    Raises:
        ValueError: If abstract is empty
        APIError: If Claude API call fails
    """
    if not abstract or not abstract.strip():
        raise ValueError("Abstract cannot be empty")

    try:
        response = self.client.messages.create(...)
        return self._parse_response(response)
    except anthropic.APIError as e:
        raise APIError(f"Claude API failed: {e}")

# โŒ Bad
def screen_paper(self, title, abstract):
    if not abstract:
        return {"error": "no abstract"}
    try:
        response = self.client.messages.create(...)
        return response
    except:  # Too broad!
        return None  # What went wrong?

Testing Requirements

  • Unit Tests (Required): Test individual functions in isolation. Example: test deduplication logic with known duplicates.
  • Integration Tests (Recommended): Test a script end-to-end with mock data. Example: run 03_screen_papers.py with a sample CSV.
  • Manual Testing (Required): Test with a real project before submitting a PR. Example: run the full pipeline with an example research question.

Writing Unit Tests

# tests/test_deduplication.py
import pytest
from scripts.deduplicate import Deduplicator

def test_exact_doi_match():
    """Test that papers with same DOI are deduplicated"""
    papers = [
        {"doi": "10.1234/abc", "title": "Paper A"},
        {"doi": "10.1234/abc", "title": "Paper A (duplicate)"}
    ]

    dedup = Deduplicator()
    result = dedup.remove_duplicates(papers)

    assert len(result) == 1
    assert result[0]["title"] == "Paper A"

def test_title_similarity():
    """Test that similar titles are detected"""
    dedup = Deduplicator()

    title1 = "AI-Powered Chatbots for Language Learning"
    title2 = "AI Powered Chatbots for Language Learning"  # Minor difference

    assert dedup.is_duplicate_title(title1, title2, threshold=0.9)

def test_different_titles():
    """Test that different titles are not duplicates"""
    dedup = Deduplicator()

    title1 = "Machine Learning in Healthcare"
    title2 = "Deep Learning for Medical Diagnosis"

    assert not dedup.is_duplicate_title(title1, title2, threshold=0.9)
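
These tests assume a Deduplicator that exposes remove_duplicates and is_duplicate_title. As a reference point only, here is a minimal sketch that would satisfy them using difflib; the real scripts/deduplicate.py may be implemented differently.

# Minimal sketch of a Deduplicator satisfying the tests above
# (illustrative only; the real scripts/deduplicate.py may differ)
from difflib import SequenceMatcher
from typing import Dict, List

class Deduplicator:
    def remove_duplicates(self, papers: List[Dict]) -> List[Dict]:
        """Keep the first paper seen for each DOI; later papers with the same DOI are dropped."""
        seen_dois = set()
        unique = []
        for paper in papers:
            doi = paper.get("doi")
            if doi in seen_dois:
                continue
            if doi:
                seen_dois.add(doi)
            unique.append(paper)
        return unique

    def is_duplicate_title(self, title1: str, title2: str, threshold: float = 0.9) -> bool:
        """Treat titles as duplicates when their normalized similarity ratio meets the threshold."""
        ratio = SequenceMatcher(None, title1.lower(), title2.lower()).ratio()
        return ratio >= threshold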

Running Tests

# Run all tests
pytest

# Run specific test file
pytest tests/test_deduplication.py

# Run with coverage report
pytest --cov=scripts --cov-report=html

# Run only fast tests (skip slow integration tests)
pytest -m "not slow"

Common Code Smells to Avoid

Hardcoded values

Makes code inflexible and hard to test

โŒ Bad
def fetch_papers(self):
    url = "https://api.semanticscholar.org/graph/v1/paper/search"
    params = {"limit": 100}  # Hardcoded!
✅ Good
def fetch_papers(self):
    url = "https://api.semanticscholar.org/graph/v1/paper/search"
    limit = self.config.get('retrieval_settings.limit', 100)  # Configurable
    params = {"limit": limit}

Missing error context

Makes debugging impossible

โŒ Bad
try:
    df = pd.read_csv(file_path)
except Exception as e:
    print(f"Error: {e}")
    sys.exit(1)
✅ Good
try:
    df = pd.read_csv(file_path)
except FileNotFoundError:
    print(f"โŒ File not found: {file_path}")
    print("   Run 02_deduplicate.py first to generate this file")
    sys.exit(1)
except pd.errors.EmptyDataError:
    print(f"โŒ File is empty: {file_path}")
    print("   Check if previous stage succeeded")
    sys.exit(1)

Overly complex functions

Hard to test and maintain

โŒ Bad
def process_papers(self, papers: List[Dict]) -> List[Dict]:
    # 200 lines of code doing multiple things...
    # Fetching, deduplicating, screening, downloading all in one function!
✅ Good
def process_papers(self, papers: List[Dict]) -> List[Dict]:
    papers = self.deduplicate(papers)
    papers = self.screen(papers)
    papers = self.download_pdfs(papers)
    return papers

# Each method is 20-30 lines and testable independently

Ignoring config.yaml schema

Breaks user projects silently

โŒ Bad
# Reading config without validation
threshold = config['ai_prisma_rubric']['decision_confidence']['auto_include']
# KeyError if field missing!
✅ Good
# Safe config reading with defaults
threshold = config.get('ai_prisma_rubric', {}).get(
    'decision_confidence', {}
).get('auto_include', 90)

# Even better: validate at load time
def validate_config(self):
    required_fields = ['project_type', 'research_question']
    for field in required_fields:
        if field not in self.config:
            raise ValueError(f"Missing required field: {field}")

Pull Request Checklist

Before Submitting PR

Code Quality

  • Run black . to format code
  • Run flake8 . and fix all warnings
  • Add type hints to all function signatures
  • Write docstrings for all public methods

Testing

  • Write unit tests for new functions
  • Run pytest and ensure all tests pass
  • Test with real project end-to-end
  • Check coverage: pytest --cov (aim for >80%)

Documentation

  • Update ARCHITECTURE.md if file dependencies changed
  • Update relevant prompts/*.md if user workflow affected
  • Update config_base.yaml if new config fields added
  • Add entry to RELEASE_NOTES_vX.X.X.md

Backward Compatibility

  • Ensure existing config.yaml files still work
  • Add migration guide if breaking changes
  • Test with both project_type modes (knowledge_repository + systematic_review); see the sketch after this checklist
  • Verify old scripts still run with new changes

How to Extend ScholaRAG

Adding a New Database Source

  1. Step 1: Add database to config schema
    # templates/config_base.yaml
    databases:
      open_access:
        # ... existing databases
        pubmed:  # New database
          enabled: false
          email: ""  # Required for PubMed API
  2. Step 2: Implement fetch method in 01_fetch_papers.py
    def fetch_from_pubmed(self, query: str) -> List[Dict]:
        """Fetch papers from PubMed using Entrez API"""
        # Implementation details... (see the sketch after this list)
        pass
  3. Step 3: Update fetch loop
    if self.config['databases']['open_access']['pubmed']['enabled']:
        pubmed_papers = self.fetch_from_pubmed(query)
        self.save_results(pubmed_papers, 'pubmed_results.csv')
  4. Step 4: Update documentation
    • Update prompts/02_query_strategy.md to mention PubMed
    • Update ARCHITECTURE.md dependency map
    • โ€ข Add example to README.md
  5. Step 5: Write tests
    def test_fetch_from_pubmed():
        fetcher = PaperFetcher(project_path)
        papers = fetcher.fetch_from_pubmed("machine learning")
        assert len(papers) > 0
        assert all('title' in p for p in papers)
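
For Step 2, here is a hedged sketch of what fetch_from_pubmed could look like using NCBI's public E-utilities endpoints (esearch to get PubMed IDs, then esummary for metadata). The config access style and output field names are assumptions, and a production version should add retries, rate limiting, and NCBI's api_key parameter:

# Hypothetical sketch of fetch_from_pubmed using NCBI E-utilities
import requests
from typing import Dict, List

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def fetch_from_pubmed(self, query: str) -> List[Dict]:
    """Fetch paper metadata from PubMed; returns dicts with at least a 'title' field."""
    limit = self.config.get('retrieval_settings', {}).get('limit', 100)
    email = self.config.get('databases', {}).get('open_access', {}).get('pubmed', {}).get('email', '')

    # 1. esearch: turn the query into a list of PubMed IDs
    search = requests.get(f"{EUTILS}/esearch.fcgi", params={
        "db": "pubmed", "term": query, "retmax": limit,
        "retmode": "json", "email": email,
    }, timeout=30)
    search.raise_for_status()
    ids = search.json()["esearchresult"]["idlist"]
    if not ids:
        return []

    # 2. esummary: fetch title and journal metadata for those IDs
    summary = requests.get(f"{EUTILS}/esummary.fcgi", params={
        "db": "pubmed", "id": ",".join(ids), "retmode": "json", "email": email,
    }, timeout=30)
    summary.raise_for_status()
    result = summary.json()["result"]

    return [
        {"title": result[pmid].get("title", ""), "pmid": pmid, "source": "pubmed"}
        for pmid in result.get("uids", [])
    ]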

Adding a New Screening Criterion

  1. Step 1: Update config schema
    ai_prisma_rubric:
      sub_criteria:
        # ... existing criteria
        study_quality:  # New criterion
          description: "Methodological rigor"
          scoring_rubric: |
            100: RCT with blinding
            75: RCT without blinding
            50: Quasi-experimental
            0: Observational
  2. Step 2: Update screening prompt in 03_screen_papers.py
    prompt = f"""
    Rate this paper on study quality (0-100):
    100 = RCT with blinding
    75 = RCT without blinding
    ...
    
    Title: {title}
    Abstract: {abstract}
    
    Respond with: {{"study_quality": score, "reasoning": "..."}}
    """
  3. Step 3: Update aggregation logic
    final_score = (
        response['population'] * 0.2 +
        response['intervention'] * 0.3 +
        response['outcomes'] * 0.3 +
        response['study_quality'] * 0.2  # New criterion
    )
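
Because every new criterion changes the weighting, it is worth guarding that the weights still sum to 1.0 so final scores do not silently drift. A small hypothetical helper using the Step 3 weights:

# Hypothetical guard: fail fast if criterion weights no longer sum to 1.0
import math

CRITERION_WEIGHTS = {
    "population": 0.2,
    "intervention": 0.3,
    "outcomes": 0.3,
    "study_quality": 0.2,  # new criterion
}

def aggregate_score(response: dict) -> float:
    if not math.isclose(sum(CRITERION_WEIGHTS.values()), 1.0):
        raise ValueError("Criterion weights must sum to 1.0")
    return sum(response[name] * weight for name, weight in CRITERION_WEIGHTS.items())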

Debugging Tips

๐Ÿ› Script fails with cryptic error

  1. Check logs/ directory for detailed error messages
  2. Run the script with python -v scripts/XX.py for verbose import tracing from the interpreter
  3. Add print statements to identify which line fails
  4. Check if config.yaml has all required fields
  5. Verify previous stage completed successfully
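
For step 4, a quick way to list which top-level fields config.yaml actually defines (assumes PyYAML is available):

# Print the top-level keys present in config.yaml
python -c "import yaml; print(sorted(yaml.safe_load(open('config.yaml'))))"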

๐Ÿ› Papers not appearing in RAG results

  1. Check if PDFs downloaded: ls data/pdfs/ | wc -l
  2. Verify ChromaDB created: ls data/chroma/
  3. Test embedding generation: python -c "from openai import OpenAI; client = OpenAI(); print(client.embeddings.create(...))"
  4. Reduce retrieval_k in config.yaml to see if any results appear
  5. Try exact phrase from paper title as query

๐Ÿ› AI screening too lenient/strict

  1. Check project_type in config.yaml (knowledge_repository vs systematic_review)
  2. Verify thresholds: auto_include and auto_exclude values
  3. Review AI reasoning in data/02_screening/excluded.csv
  4. Adjust thresholds in config.yaml and re-run 03_screen_papers.py
  5. If needed, modify prompt in 03_screen_papers.py for clearer instructions
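
For reference, the thresholds mentioned in steps 2 and 4 live under ai_prisma_rubric.decision_confidence in config.yaml; the values below are illustrative, not recommended defaults:

# config.yaml (illustrative values)
ai_prisma_rubric:
  decision_confidence:
    auto_include: 90
    auto_exclude: 20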

Ready to Contribute?

Head to the GitHub repository to find open issues, submit PRs, or start discussions about new features.

View on GitHub →