Code Review Guide

✅ For Contributors & Code Reviewers

Best practices, testing strategies, and contribution guidelines for ScholaRAG. Follow this guide when submitting PRs or reviewing code.

Code Quality Standards

Python Code Style

  • PEP 8 compliance: Use black formatter and flake8 linter
  • Type hints: All function signatures must include type annotations
  • Docstrings: Use Google-style docstrings for all public methods
  • Error handling: Never use a bare except: clause; always catch specific exceptions
# ✅ Good
def screen_paper(self, title: str, abstract: str) -> Dict[str, Any]:
    """
    Screen a single paper using Claude API.

    Args:
        title: Paper title
        abstract: Paper abstract

    Returns:
        Dictionary with screening decision and reasoning

    Raises:
        ValueError: If abstract is empty
        APIError: If Claude API call fails
    """
    if not abstract or not abstract.strip():
        raise ValueError("Abstract cannot be empty")

    try:
        response = self.client.messages.create(...)
        return self._parse_response(response)
    except anthropic.APIError as e:
        raise APIError(f"Claude API failed: {e}")

# โŒ Bad
def screen_paper(self, title, abstract):
    if not abstract:
        return {"error": "no abstract"}
    try:
        response = self.client.messages.create(...)
        return response
    except:  # Too broad!
        return None  # What went wrong?

Testing Requirements

  • Unit Tests (Required): Test individual functions in isolation. Example: test deduplication logic with known duplicates.
  • Integration Tests (Recommended): Test a script end-to-end with mock data. Example: run 03_screen_papers.py with a sample CSV.
  • Manual Testing (Required): Test with a real project before submitting a PR. Example: run the full pipeline with an example research question.

Writing Unit Tests

# tests/test_deduplication.py
import pytest
from scripts.deduplicate import Deduplicator

def test_exact_doi_match():
    """Test that papers with same DOI are deduplicated"""
    papers = [
        {"doi": "10.1234/abc", "title": "Paper A"},
        {"doi": "10.1234/abc", "title": "Paper A (duplicate)"}
    ]

    dedup = Deduplicator()
    result = dedup.remove_duplicates(papers)

    assert len(result) == 1
    assert result[0]["title"] == "Paper A"

def test_title_similarity():
    """Test that similar titles are detected"""
    dedup = Deduplicator()

    title1 = "AI-Powered Chatbots for Language Learning"
    title2 = "AI Powered Chatbots for Language Learning"  # Minor difference

    assert dedup.is_duplicate_title(title1, title2, threshold=0.9)

def test_different_titles():
    """Test that different titles are not duplicates"""
    dedup = Deduplicator()

    title1 = "Machine Learning in Healthcare"
    title2 = "Deep Learning for Medical Diagnosis"

    assert not dedup.is_duplicate_title(title1, title2, threshold=0.9)
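
These tests assume a Deduplicator that exposes remove_duplicates and is_duplicate_title. As a reference point only, here is a minimal sketch that would satisfy them using difflib; the real scripts/deduplicate.py may be implemented differently.

# Minimal sketch of a Deduplicator satisfying the tests above
# (illustrative only; the real scripts/deduplicate.py may differ)
from difflib import SequenceMatcher
from typing import Dict, List

class Deduplicator:
    def remove_duplicates(self, papers: List[Dict]) -> List[Dict]:
        """Keep the first paper seen for each DOI; later papers with the same DOI are dropped."""
        seen_dois = set()
        unique = []
        for paper in papers:
            doi = paper.get("doi")
            if doi in seen_dois:
                continue
            if doi:
                seen_dois.add(doi)
            unique.append(paper)
        return unique

    def is_duplicate_title(self, title1: str, title2: str, threshold: float = 0.9) -> bool:
        """Treat titles as duplicates when their normalized similarity ratio meets the threshold."""
        ratio = SequenceMatcher(None, title1.lower(), title2.lower()).ratio()
        return ratio >= threshold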

Running Tests

# Run all tests
pytest

# Run specific test file
pytest tests/test_deduplication.py

# Run with coverage report
pytest --cov=scripts --cov-report=html

# Run only fast tests (skip slow integration tests)
pytest -m "not slow"

Common Code Smells to Avoid

Hardcoded values

Makes code inflexible and hard to test

โŒ Bad
def fetch_papers(self):
    url = "https://api.semanticscholar.org/graph/v1/paper/search"
    params = {"limit": 100}  # Hardcoded!
✅ Good
def fetch_papers(self):
    url = "https://api.semanticscholar.org/graph/v1/paper/search"
    limit = self.config.get('retrieval_settings.limit', 100)  # Configurable
    params = {"limit": limit}

Missing error context

Makes debugging impossible

โŒ Bad
try:
    df = pd.read_csv(file_path)
except Exception as e:
    print(f"Error: {e}")
    sys.exit(1)
✅ Good
try:
    df = pd.read_csv(file_path)
except FileNotFoundError:
    print(f"โŒ File not found: {file_path}")
    print("   Run 02_deduplicate.py first to generate this file")
    sys.exit(1)
except pd.errors.EmptyDataError:
    print(f"โŒ File is empty: {file_path}")
    print("   Check if previous stage succeeded")
    sys.exit(1)

Overly complex functions

Hard to test and maintain

โŒ Bad
def process_papers(self, papers: List[Dict]) -> List[Dict]:
    # 200 lines of code doing multiple things...
    # Fetching, deduplicating, screening, downloading all in one function!
✅ Good
def process_papers(self, papers: List[Dict]) -> List[Dict]:
    papers = self.deduplicate(papers)
    papers = self.screen(papers)
    papers = self.download_pdfs(papers)
    return papers

# Each method is 20-30 lines and testable independently

Ignoring config.yaml schema

Breaks user projects silently

โŒ Bad
# Reading config without validation
threshold = config['ai_prisma_rubric']['decision_confidence']['auto_include']
# KeyError if field missing!
✅ Good
# Safe config reading with defaults
threshold = config.get('ai_prisma_rubric', {}).get(
    'decision_confidence', {}
).get('auto_include', 90)

# Even better: validate at load time
def validate_config(self):
    required_fields = ['project_type', 'research_question']
    for field in required_fields:
        if field not in self.config:
            raise ValueError(f"Missing required field: {field}")

Pull Request Checklist

Before Submitting PR

Code Quality

  • Run black . to format code
  • Run flake8 . and fix all warnings
  • Add type hints to all function signatures
  • Write docstrings for all public methods

Testing

  • Write unit tests for new functions
  • Run pytest and ensure all tests pass
  • Test with real project end-to-end
  • Check coverage: pytest --cov (aim for >80%)

Documentation

  • Update ARCHITECTURE.md if file dependencies changed
  • Update relevant prompts/*.md if user workflow affected
  • Update config_base.yaml if new config fields added
  • Add entry to RELEASE_NOTES_vX.X.X.md

Backward Compatibility

  • Ensure existing config.yaml files still work
  • Add migration guide if breaking changes
  • Test with both project_type modes (knowledge_repository + systematic_review); see the sketch after this checklist
  • Verify old scripts still run with new changes

How to Extend ScholaRAG

Adding a New Database Source

  1. Step 1: Add database to config schema
    # templates/config_base.yaml
    databases:
      open_access:
        # ... existing databases
        pubmed:  # New database
          enabled: false
          email: ""  # Required for PubMed API
  2. Step 2: Implement fetch method in 01_fetch_papers.py
    def fetch_from_pubmed(self, query: str) -> List[Dict]:
        """Fetch papers from PubMed using Entrez API"""
        # Implementation details... (see the sketch after this list)
        pass
  3. Step 3: Update fetch loop
    if self.config['databases']['open_access']['pubmed']['enabled']:
        pubmed_papers = self.fetch_from_pubmed(query)
        self.save_results(pubmed_papers, 'pubmed_results.csv')
  4. Step 4: Update documentation
    • Update prompts/02_query_strategy.md to mention PubMed
    • Update ARCHITECTURE.md dependency map
    • โ€ข Add example to README.md
  5. Step 5: Write tests
    def test_fetch_from_pubmed():
        fetcher = PaperFetcher(project_path)
        papers = fetcher.fetch_from_pubmed("machine learning")
        assert len(papers) > 0
        assert all('title' in p for p in papers)
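
For Step 2, here is a hedged sketch of what fetch_from_pubmed could look like using NCBI's public E-utilities endpoints (esearch to get PubMed IDs, then esummary for metadata). The config access style and output field names are assumptions, and a production version should add retries, rate limiting, and NCBI's api_key parameter:

# Hypothetical sketch of fetch_from_pubmed using NCBI E-utilities
import requests
from typing import Dict, List

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def fetch_from_pubmed(self, query: str) -> List[Dict]:
    """Fetch paper metadata from PubMed; returns dicts with at least a 'title' field."""
    limit = self.config.get('retrieval_settings', {}).get('limit', 100)
    email = self.config.get('databases', {}).get('open_access', {}).get('pubmed', {}).get('email', '')

    # 1. esearch: turn the query into a list of PubMed IDs
    search = requests.get(f"{EUTILS}/esearch.fcgi", params={
        "db": "pubmed", "term": query, "retmax": limit,
        "retmode": "json", "email": email,
    }, timeout=30)
    search.raise_for_status()
    ids = search.json()["esearchresult"]["idlist"]
    if not ids:
        return []

    # 2. esummary: fetch title and journal metadata for those IDs
    summary = requests.get(f"{EUTILS}/esummary.fcgi", params={
        "db": "pubmed", "id": ",".join(ids), "retmode": "json", "email": email,
    }, timeout=30)
    summary.raise_for_status()
    result = summary.json()["result"]

    return [
        {"title": result[pmid].get("title", ""), "pmid": pmid, "source": "pubmed"}
        for pmid in result.get("uids", [])
    ]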

Adding a New Screening Criterion

  1. Step 1: Update config schema
    ai_prisma_rubric:
      sub_criteria:
        # ... existing criteria
        study_quality:  # New criterion
          description: "Methodological rigor"
          scoring_rubric: |
            100: RCT with blinding
            75: RCT without blinding
            50: Quasi-experimental
            0: Observational
  2. Step 2: Update screening prompt in 03_screen_papers.py
    prompt = f"""
    Rate this paper on study quality (0-100):
    100 = RCT with blinding
    75 = RCT without blinding
    ...
    
    Title: {title}
    Abstract: {abstract}
    
    Respond with: {{"study_quality": score, "reasoning": "..."}}
    """
  3. Step 3: Update aggregation logic
    final_score = (
        response['population'] * 0.2 +
        response['intervention'] * 0.3 +
        response['outcomes'] * 0.3 +
        response['study_quality'] * 0.2  # New criterion
    )
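
Because every new criterion changes the weighting, it is worth guarding that the weights still sum to 1.0 so final scores do not silently drift. A small hypothetical helper using the Step 3 weights:

# Hypothetical guard: fail fast if criterion weights no longer sum to 1.0
import math

CRITERION_WEIGHTS = {
    "population": 0.2,
    "intervention": 0.3,
    "outcomes": 0.3,
    "study_quality": 0.2,  # new criterion
}

def aggregate_score(response: dict) -> float:
    if not math.isclose(sum(CRITERION_WEIGHTS.values()), 1.0):
        raise ValueError("Criterion weights must sum to 1.0")
    return sum(response[name] * weight for name, weight in CRITERION_WEIGHTS.items())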

Debugging Tips

๐Ÿ› Script fails with cryptic error

  1. Check logs/ directory for detailed error messages
  2. Run the script with python -v scripts/XX.py for verbose import tracing from the interpreter
  3. Add print statements to identify which line fails
  4. Check if config.yaml has all required fields
  5. Verify previous stage completed successfully
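
For step 4, a quick way to list which top-level fields config.yaml actually defines (assumes PyYAML is available):

# Print the top-level keys present in config.yaml
python -c "import yaml; print(sorted(yaml.safe_load(open('config.yaml'))))"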

๐Ÿ› Papers not appearing in RAG results

  1. Check if PDFs downloaded: ls data/pdfs/ | wc -l
  2. Verify ChromaDB created: ls data/chroma/
  3. Test embedding generation: python -c "from openai import OpenAI; client = OpenAI(); print(client.embeddings.create(...))"
  4. Reduce retrieval_k in config.yaml to see if any results appear
  5. Try exact phrase from paper title as query

๐Ÿ› AI screening too lenient/strict

  1. Check project_type in config.yaml (knowledge_repository vs systematic_review)
  2. Verify thresholds: auto_include and auto_exclude values
  3. Review AI reasoning in data/02_screening/excluded.csv
  4. Adjust thresholds in config.yaml and re-run 03_screen_papers.py
  5. If needed, modify prompt in 03_screen_papers.py for clearer instructions
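
For reference, the thresholds mentioned in steps 2 and 4 live under ai_prisma_rubric.decision_confidence in config.yaml; the values below are illustrative, not recommended defaults:

# config.yaml (illustrative values)
ai_prisma_rubric:
  decision_confidence:
    auto_include: 90
    auto_exclude: 20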

Ready to Contribute?

Head to the GitHub repository to find open issues, submit PRs, or start discussions about new features.

View on GitHub →