๐ ๏ธ Tools & Technologies
Why ScholaRAG uses specific tools and technologies.
ChromaDB - The Vector Database
ChromaDB is a vector database - a special kind of database that understands meaning, not just exact matches. We covered what a vector database is earlier, but why ChromaDB specifically?
Why ChromaDB?
- Easy to use: No complex database setup - just install and run
- Runs locally: Your data stays on your computer (privacy!)
- Python-friendly: Integrates seamlessly with research scripts
- Fast semantic search: Find similar papers in milliseconds
๐ก In ScholaRAG:
ChromaDB stores all your papers as "embeddings" (meaning vectors). When you ask a question, it finds the most relevant papers based on conceptual similarity, not keyword matching.
Example: Searching "learning outcomes" will also find papers about "educational achievement" and "student performance" - without you specifying those exact terms!
Claude AI - The Screening Assistant
Claude is Anthropic's AI assistant - think of it as your tireless research assistant who can read hundreds of papers, apply screening criteria, and explain its reasoning.
Why Claude for screening?
- Large context window: Can read entire papers (200,000+ tokens)
- Strong reasoning: Applies complex inclusion/exclusion criteria accurately
- Explains decisions: Shows why a paper was included or excluded
- Consistent: Doesn't get tired or biased like human reviewers can
โ ๏ธ AI as Assistant, Not Replacement
Claude helps with initial screening and organization, but researchers should always review final decisions. AI accelerates the process; you maintain the quality.
๐ก In ScholaRAG:
Claude runs the 02_screening and 03_eligibility stages, applying your PRISMA criteria to hundreds of papers in minutes instead of weeks.
OpenAI - The Embedding Engine
OpenAI (the company behind ChatGPT) provides the embedding models that convert your papers into semantic vectors. Think of it as the translator that turns text into meaning-coordinates.
Why OpenAI embeddings?
- Industry standard: text-embedding-3-small is fast, accurate, and affordable
- Semantic quality: Captures nuanced meaning and context
- Multilingual: Works across languages (useful for international research)
- Well-documented: Easy to integrate and troubleshoot
How embeddings work:
The vector contains 1536 numbers representing the paper's meaning in "semantic space"
๐ก In ScholaRAG:
Script 04 uses OpenAI to create embeddings for all your papers, then stores them in ChromaDB for fast semantic search during the RAG stage.
GitHub - The Code Repository
GitHub is where we store and share code. Think of it as Google Drive for programmers - but with powerful features like version history, collaboration, and automatic backups.
Why GitHub?
- Version control: Every change is tracked - you can go back to any previous version
- Collaboration: Multiple researchers can work on the same project
- Open source: Share your methods with the research community
- Documentation: README files, wikis, and issue tracking built-in
Key GitHub concepts:
- Repository (repo): A project folder containing all files and history
- Commit: A saved snapshot of your changes (like "Save Version")
- Clone: Download a copy of a repository to your computer
- Fork: Create your own copy to customize
- Pull: Download latest updates from the repository
๐ก In ScholaRAG:
The ScholaRAG code lives on GitHub at github.com/HosungYou/researcherRAG. You clone it to your computer, customize for your research, and can contribute improvements back to the community.
Git - The Version Control System
Git is the underlying technology that powers GitHub. While GitHub is the website, Git is the tool that tracks changes. Think: Git = the engine, GitHub = the car.
Why Git?
- Time machine: Go back to any previous version of your code
- Safe experimentation: Try changes in "branches" without breaking your main code
- Accountability: See who changed what and when
- Industry standard: Used by virtually all software developers
Basic Git workflow:
git clone [repository-url]# Download project
git pull# Get latest updates
git add .# Stage your changes
git commit -m "message"# Save snapshot
git push# Upload to GitHub
โ ๏ธ For researchers:
You don't need to master Git to use ScholaRAG! Basic commands (clone, pull) are enough to get started. Think of it like using Word - you don't need to know how spell-check works internally.
Python Libraries (Packages)
Python libraries are pre-built tools that add functionality to Python. Think of them as specialized kitchen appliances - you don't build a blender from scratch, you just use one!
Key libraries in ScholaRAG:
- anthropic: Communicates with Claude AI
- openai: Creates semantic embeddings
- chromadb: Vector database for semantic search
- requests: Fetches data from academic APIs
- pandas: Organizes data in tables (like Excel, but in Python)
- python-dotenv: Reads API keys from .env files
Installing libraries:
pip install anthropic openai chromadb
This command downloads and installs all necessary tools automatically!
๐ก In ScholaRAG:
All required libraries are listed in requirements.txt. Just run pip install -r requirements.txt and everything installs automatically!