Commit 47861a3

Merge pull request #583 from scholarly-python-package/claude/create-claude-md-hfTx6
Add CLAUDE.md developer documentation
2 parents: 9269ff3 + 1f7b1a3

1 file changed

Lines changed: 88 additions & 0 deletions

CLAUDE.md

# CLAUDE.md

## Project Overview

`scholarly` is a Python module for retrieving author and publication data from Google Scholar programmatically. It parses HTML responses from Google Scholar and returns structured data.

- **Language:** Python 3.8+
- **License:** Unlicense (public domain)
- **Current version:** 1.7.11
- **PyPI:** `pip3 install scholarly`
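
In practice, "structured data" means dict-shaped records typed with `TypedDict` (defined in `data_types.py`). The following is a minimal, hypothetical `Author` with a handful of fields chosen for illustration (`scholar_id`, `name`, `affiliation`, `interests`, `citedby`). The real definitions contain more fields and may differ, so check `data_types.py` before relying on any key.

```python
from typing import List, TypedDict  # TypedDict is available in typing from Python 3.8


# Hypothetical subset of an author record; the real Author type has more fields.
class Author(TypedDict, total=False):
    scholar_id: str
    name: str
    affiliation: str
    interests: List[str]
    citedby: int


def summarize(author: Author) -> str:
    # total=False makes every key optional, so read fields with .get() and defaults
    name = author.get("name", "<unknown>")
    cites = author.get("citedby", 0)
    return f"{name} ({cites} citations)"


# At runtime a TypedDict value is an ordinary dict
record: Author = {"name": "Jane Doe", "citedby": 42}
print(summarize(record))  # Jane Doe (42 citations)
```

Because `TypedDict` adds no runtime behavior, records can be serialized or compared like plain dicts, while type checkers still flag misspelled keys.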

## Repository Structure

```
scholarly/                # Main package
    _scholarly.py         # Core API: search, fill methods (_Scholarly class)
    _navigator.py         # HTTP session management, proxy routing
    _proxy_generator.py   # Proxy service integrations (ScraperAPI, Bright Data, FreeProxy)
    author_parser.py      # HTML parsing for author profiles
    publication_parser.py # HTML parsing for publications
    data_types.py         # TypedDict definitions (Author, Publication, etc.)
test_module.py            # Full test suite (unittest-based)
docs/                     # Sphinx documentation
scripts/                  # Helper scripts
.github/workflows/        # CI/CD pipelines
```

## Setup

```bash
pip3 install -e .                      # Editable install for development
pip3 install -r requirements.txt       # Runtime dependencies
pip3 install -r requirements-dev.txt   # Dev dependencies (sphinx, coverage)
```

## Testing

```bash
python3 -m unittest -v test_module.py
```

- Uses the Python `unittest` framework (not pytest)
- Test classes: `TestScholarly`, `TestLuminati` (Bright Data was formerly named Luminati), `TestScraperAPI`, `TestTorInternal`, `TestScholarlyWithProxy`
- **6 of 17 test cases require premium proxy services** (ScraperAPI or Bright Data credentials). These are skipped when credentials are unavailable.
- Coverage: `coverage run test_module.py && coverage report`
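
Credential gating of this kind is typically done with `unittest.skipUnless`. This is a minimal sketch, not the suite's actual code: the environment variable `SCRAPER_API_KEY` and both test classes are hypothetical. Skipped tests still count toward the run, so the suite passes when credentials are absent.

```python
import os
import unittest

# Hypothetical variable name; the real suite may read credentials differently.
SCRAPER_API_KEY = os.getenv("SCRAPER_API_KEY")


class TestNoProxy(unittest.TestCase):
    def test_always_runs(self):
        # Stands in for a test that needs no premium proxy
        self.assertTrue(True)


# An unset or empty variable is falsy, so every test in this class is skipped
@unittest.skipUnless(SCRAPER_API_KEY, "ScraperAPI credentials not set")
class TestPremiumProxy(unittest.TestCase):
    def test_needs_credentials(self):
        # A real test would configure a ScraperAPI-backed session here
        self.assertIsNotNone(SCRAPER_API_KEY)
```

Running a file like this with `python3 -m unittest -v` reports the premium test as skipped, together with its reason, whenever the variable is missing.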

## Linting

Uses **flake8** only. No black, isort, mypy, or pre-commit hooks.

```bash
flake8
```

Configuration (`.flake8`):
- Max line length: **127**
- Max complexity: 10
- Selected rules: E9, E111, F63, F7, F82, F401
- Ignored: E261, E265
- Excluded: `scholarly/__init__.py`, `docs/conf.py`
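
Here is how the listed settings map onto flake8 option names. The sketch reconstructs a `[flake8]` section and parses it with the standard-library `configparser`. The option names (`max-line-length`, `max-complexity`, `select`, `ignore`, `exclude`) are standard flake8 options. The exact contents of the repository's `.flake8` are an assumption reconstructed from the bullets above.

```python
import configparser

# Reconstructed from the bullet list; the real .flake8 may order or spell options differently.
FLAKE8_CONFIG = """\
[flake8]
max-line-length = 127
max-complexity = 10
select = E9,E111,F63,F7,F82,F401
ignore = E261,E265
exclude = scholarly/__init__.py,docs/conf.py
"""

parser = configparser.ConfigParser()
parser.read_string(FLAKE8_CONFIG)
section = parser["flake8"]

# Comma-separated values come back as strings; split them into lists
max_line_length = section.getint("max-line-length")
selected = [code.strip() for code in section["select"].split(",")]
excluded = [path.strip() for path in section["exclude"].split(",")]

print(max_line_length, selected, excluded)
```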

## CI/CD (GitHub Actions)

- **pythonpackage.yml**: Main CI. Runs on Ubuntu, macOS, and Windows with Python 3.8; triggers on push/PR to `main`/`develop`, plus scheduled runs.
- **lint.yaml**: Flake8 linting (called by the main workflow).
- **proxytests.yml**: Proxy-dependent tests; runs on push to `main` only (uses GitHub secrets).
- **codeql-analysis.yml**: Security scanning on push/PR to `main`/`develop`.
- **publish-to-pypi.yml**: Publishes to PyPI on tagged commits.
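
The trigger rules above can be summarized as a small lookup. This is a hypothetical model written for illustration, not code from the repository; the workflow YAML files remain the source of truth. `branch` stands for the pushed branch or, for pull requests, the PR's base branch.

```python
from typing import List, Optional

MAIN_BRANCHES = {"main", "develop"}


def workflows_for(event: str, branch: Optional[str] = None, tag: Optional[str] = None) -> List[str]:
    """Return the workflow files the CI/CD notes say should run for an event.

    `event` uses GitHub's event names ("push", "pull_request", "schedule");
    for pull requests, `branch` is the PR's base branch.
    """
    on_main_branches = event in {"push", "pull_request"} and branch in MAIN_BRANCHES
    triggered: List[str] = []
    if on_main_branches or event == "schedule":
        # lint.yaml is invoked by the main workflow rather than triggered directly
        triggered += ["pythonpackage.yml", "lint.yaml"]
    if on_main_branches:
        triggered.append("codeql-analysis.yml")
    if event == "push" and branch == "main":
        triggered.append("proxytests.yml")  # relies on repository secrets
    if event == "push" and tag is not None:
        triggered.append("publish-to-pypi.yml")
    return triggered
```

For a typical contribution (a PR against `develop`), this model predicts the main CI, linting, and CodeQL, but not the proxy tests.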

## Contributing Conventions

- **Base branch for PRs:** `develop` (not `main`)
- **Create an issue first** before submitting a PR
- **Commit message style:** imperative mood, concise
  - Bug fixes: `Fix <description>` or `Handle <condition>`
  - Features: `Add <description>`
  - Tests: `Add a unit test to <description>` or `Test that <description>`
  - Version bumps: `Bump version to X.Y.Z`
- **Tests:** add tests for new features; ensure existing tests pass
- **Docs:** verify documentation consistency; build with `cd docs && make html`
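
Before pushing, a contributor could check a commit subject against the listed prefixes. The helper below is hypothetical (this guide lists no automated commit-message check). It only tests the prefixes and cannot judge imperative mood or concision.

```python
import re

# Hypothetical local check built from the examples in the conventions list.
SUBJECT_PATTERNS = (
    re.compile(r"(Fix|Handle) \S.*"),              # bug fixes
    re.compile(r"Add \S.*"),                       # features, and "Add a unit test to ..."
    re.compile(r"Test that \S.*"),                 # tests
    re.compile(r"Bump version to \d+\.\d+\.\d+"),  # version bumps (X.Y.Z)
)


def follows_convention(message: str) -> bool:
    """Return True if the first line of a commit message matches a listed prefix."""
    lines = message.splitlines()
    subject = lines[0].strip() if lines else ""
    return any(pattern.fullmatch(subject) for pattern in SUBJECT_PATTERNS)
```

`fullmatch` makes the version-bump pattern reject incomplete versions such as `Bump version to 1.7`.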

## Key Architecture Notes

- `_Scholarly` is the main singleton API class (instantiated as `scholarly` in `__init__.py`)
- Google Scholar responses are parsed with `BeautifulSoup` in the parser modules
- Anti-bot circumvention relies on proxy rotation (`_proxy_generator.py`) and user-agent spoofing
- `_navigator.py` manages the HTTP session and handles retries, redirects, and CAPTCHA detection
- Data types are `TypedDict` subclasses defined in `data_types.py`
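
The retry-and-rotate behavior attributed to `_navigator.py` and `_proxy_generator.py` follows a common pattern, sketched below. This is not scholarly's implementation. `fetch_with_rotation`, `looks_like_captcha`, the marker strings, and `CaptchaError` are all hypothetical. The HTTP call is injected as a plain callable so the sketch runs without network access or proxies.

```python
import itertools
from typing import Callable, Optional, Sequence

# Hypothetical block-page markers; real detection logic may check other signals.
CAPTCHA_MARKERS = ("captcha", "unusual traffic")


class CaptchaError(Exception):
    """Raised when no proxy produced a usable (non-CAPTCHA) page."""


def looks_like_captcha(html: str) -> bool:
    lowered = html.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)


def fetch_with_rotation(
    url: str,
    proxies: Sequence[str],
    fetch: Callable[[str, str], str],
    max_tries: int = 5,
) -> str:
    """Call fetch(url, proxy), rotating through proxies until a clean page comes back."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    proxy_cycle = itertools.cycle(proxies)
    last_error: Optional[Exception] = None
    for _ in range(max_tries):
        proxy = next(proxy_cycle)
        try:
            html = fetch(url, proxy)
        except OSError as exc:  # connection-level failure: try the next proxy
            last_error = exc
            continue
        if looks_like_captcha(html):
            continue  # blocked: rotate to the next proxy and retry
        return html
    raise CaptchaError(f"no usable response after {max_tries} tries") from last_error
```

Cycling through the pool in order keeps the sketch deterministic and easy to test. A production navigator would usually also back off between attempts and refresh the proxy pool.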
