Commit d5e27a2

Merge branch 'develop' into patch-1
2 parents: 0765945 + 8a0e780

16 files changed: 140 additions & 61 deletions

.github/CONTRIBUTING.md

Lines changed: 7 additions & 6 deletions
@@ -16,12 +16,13 @@ Additionally, if you are interesting in contributing to the codebase, submit a p

 ## How to contribute

-1. Create a fork of `scholarly-python-package/scholarly` repository.
-2. If you add a new feature, try to include tests in already existing test cases, or create a new test case if that is not possible.
-3. Make sure the unit tests pass before raising a PR. For all the unit tests to pass, you typically need to setup a premium proxy service such as `ScraperAPI` or `Luminati` (`Bright Data`). If you do not have an account, you may try to use `FreeProxy`. Without a proxy, 6 out of 17 test cases will be skipped.
-4. Check that the documentatation is consistent with the code. Check that the documentation builds successfully.
-5. Submit a PR, with `develop` as your base branch.
-6. After an initial code review by the maintainers, the unit tests will be run with the `ScraperAPI` key stored in the Github repository. Passing all tests cases is necessary before merging your PR.
+1. Create a fork of `scholarly-python-package/scholarly` repository. Make sure that "Copy the main branch only" is **not** checked off.
+2. After cloning your fork and checking out into the develop branch, run `python setup.py --help-commands` for more info on how to install dependencies and build. You may need to run it with `sudo`.
+3. If you add a new feature, try to include tests in already existing test cases, or create a new test case if that is not possible. For a comprehensive output, run `python -m unittest -v test_module.py`
+4. Make sure the unit tests pass before raising a PR. For all the unit tests to pass, you typically need to setup a premium proxy service such as `ScraperAPI` or `Luminati` (`Bright Data`). By default, `python setup.py install` will get `FreeProxy`. Without a proxy, 6 out of 17 test cases will be skipped.
+5. Check that the documentatation is consistent with the code. Check that the documentation builds successfully.
+6. Submit a PR, with `develop` as your base branch.
+7. After an initial code review by the maintainers, the unit tests will be run with the `ScraperAPI` key stored in the Github repository. Passing all tests cases is necessary before merging your PR.


 ## Build Docs

.github/workflows/codespell.yml

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+# Codespell configuration is within pyproject.toml
+---
+name: Codespell
+
+on:
+  push:
+    branches: [develop]
+  pull_request:
+    branches: [develop]
+
+permissions:
+  contents: read
+
+jobs:
+  codespell:
+    name: Check for spelling errors
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+      - name: Annotate locations with typos
+        uses: codespell-project/codespell-problem-matcher@v1
+      - name: Codespell
+        uses: codespell-project/actions-codespell@v2

CHANGELOG.md

Lines changed: 3 additions & 3 deletions
@@ -8,7 +8,7 @@
 ### Bugfixes
 - Fix pprint failures on Windows #413.
 - Thoroughly handle 1000 or more publications that are available (or not) according to public access mandates #414.
-- Fix errors in `download_mandates_csv` that may occassionally occur for agencies without a policy link #413.
+- Fix errors in `download_mandates_csv` that may occasionally occur for agencies without a policy link #413.

 ## Changes in v1.6.3

@@ -35,7 +35,7 @@
 ### Features
 - Download table of funding agencies as a CSV file with URL to the funding mandates included
-- Downlad top-ranking journals in general, under sub-categories and in different languages as a CSV file
+- Download top-ranking journals in general, under sub-categories and in different languages as a CSV file

 ### Bugfixes
 - #392
@@ -58,7 +58,7 @@
 ## Changes in v1.5.0
 ### Features
 - Fetch the public access mandates information from a Scholar profile and mark the publications whether or not they satisfy the open-access mandate.
-- Fetch an author's organization identifer from their Scholar profile
+- Fetch an author's organization identifier from their Scholar profile
 - Search for all authors affiliated with an organization
 - Fetch homepage URL from a Scholar profile
 ### Enhancements

CITATION.cff

Lines changed: 1 addition & 1 deletion
@@ -52,4 +52,4 @@ keywords:
   citation-index scholarly-articles
   citation-analysis scholar googlescholar
 license: Unlicense
-version: 1.5.0
+version: 1.7.11

CODE_OF_CONDUCT.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ permalink: /coc.html
 We as members, contributors, and leaders pledge to make participation in our
 community a harassment-free experience for everyone, regardless of age, body
 size, visible or invisible disability, ethnicity, sex characteristics, gender
-identity and expression, level of experience, education, socio-economic status,
+identity and expression, level of experience, education, socioeconomic status,
 nationality, personal appearance, race, religion, or sexual identity
 and orientation.

README.md

Lines changed: 1 addition & 1 deletion
@@ -53,7 +53,7 @@ This means your code that uses an earlier version of `scholarly` is guaranteed t

 ## Tests

-To check if your installation is succesful, run the tests by executing the `test_module.py` file as:
+To check if your installation is successful, run the tests by executing the `test_module.py` file as:

 ```bash
 python3 test_module

docs/quickstart.rst

Lines changed: 6 additions & 0 deletions
@@ -16,6 +16,12 @@ or use ``pip`` to install from github:

    pip install git+https://github.com/scholarly-python-package/scholarly.git

+or use ``conda`` to install from ``conda-forge``:
+
+.. code:: bash
+
+   conda install -c conda-forge scholarly
+
 or clone the package using git:

 .. code:: bash

pyproject.toml

Lines changed: 7 additions & 0 deletions
@@ -1,3 +1,10 @@
 [build-system]
 requires = ["setuptools", "wheel"]
 build-backend = "setuptools.build_meta"
+
+[tool.codespell]
+# Ref: https://github.com/codespell-project/codespell#using-a-config-file
+skip = '.git*'
+check-hidden = true
+ignore-regex = '\b(assertIn|Ewha Womans|citeseerx.ist.psu.edu\S*)\b'
+# ignore-words-list = ''

requirements.txt

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@ arrow
 beautifulsoup4
 bibtexparser
 deprecated
-fake_useragent
+fake-useragent
 free-proxy
 httpx
 python-dotenv

scholarly/_proxy_generator.py

Lines changed: 18 additions & 12 deletions
@@ -109,15 +109,15 @@ def SingleProxy(self, http=None, https=None):

         :param http: http proxy address
         :type http: string
-        :param https: https proxy adress
+        :param https: https proxy address
         :type https: string
         :returns: whether or not the proxy was set up successfully
         :rtype: {bool}

         :Example::

             >>> pg = ProxyGenerator()
-            >>> success = pg.SingleProxy(http = <http proxy adress>, https = <https proxy adress>)
+            >>> success = pg.SingleProxy(http = <http proxy address>, https = <https proxy address>)
         """
         self.logger.info("Enabling proxies: http=%s https=%s", http, https)
         proxy_works = self._use_proxy(http=http, https=https)

@@ -136,7 +136,8 @@ def _check_proxy(self, proxies) -> bool:
         :rtype: {bool}
         """
         with requests.Session() as session:
-            session.proxies = proxies
+            # Reformat proxy for requests. Requests and HTTPX use different proxy format.
+            session.proxies = {'http': proxies['http://'], 'https': proxies['https://']}
             try:
                 resp = session.get("http://httpbin.org/ip", timeout=self._TIMEOUT)
                 if resp.status_code == 200:
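The `_check_proxy` hunk above translates between two proxy conventions: HTTPX keys its proxy mounts by URL prefix (`http://`), while Requests uses bare scheme names (`http`). A minimal standalone sketch of that conversion (the helper name and sample addresses are illustrative, not part of scholarly's API):

```python
def httpx_to_requests_proxies(proxies: dict) -> dict:
    """Convert an HTTPX-style proxy mapping (URL-prefix keys)
    to the plain scheme-keyed mapping that requests expects."""
    return {
        "http": proxies["http://"],
        "https": proxies["https://"],
    }

# Hypothetical proxy address, for illustration only.
example = {"http://": "http://127.0.0.1:8080",
           "https://": "http://127.0.0.1:8080"}
requests_proxies = httpx_to_requests_proxies(example)
```

Keeping the conversion at the boundary lets the rest of the class store a single HTTPX-style mapping.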
@@ -161,7 +162,7 @@ def _check_proxy(self, proxies) -> bool:

     def _refresh_tor_id(self, tor_control_port: int, password: str) -> bool:
         """Refreshes the id by using a new Tor node.

-        :returns: Whether or not the refresh was succesful
+        :returns: Whether or not the refresh was successful
         :rtype: {bool}
         """
         try:

@@ -189,11 +190,12 @@ def _use_proxy(self, http: str, https: str = None) -> bool:
         :returns: whether or not the proxy was set up successfully
         :rtype: {bool}
         """
-        if http[:4] != "http":
+        # Reformat proxy for HTTPX
+        if http[:4] not in ("http", "sock"):
             http = "http://" + http
         if https is None:
             https = http
-        elif https[:5] != "https":
+        elif https[:5] not in ("https", "socks"):
             https = "https://" + https

         proxies = {'http://': http, 'https://': https}
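The `_use_proxy` change widens the scheme check so that `socks5://` URLs survive untouched while bare `host:port` strings still gain a default scheme. A standalone sketch of that normalization (the function name is hypothetical; the prefix logic mirrors the diff above):

```python
def normalize_proxy_urls(http: str, https: str = None) -> dict:
    """Give bare host:port strings an explicit scheme; pass
    http(s):// and socks:// style URLs through unchanged."""
    if http[:4] not in ("http", "sock"):
        http = "http://" + http
    if https is None:
        # Reuse the http proxy for https traffic by default.
        https = http
    elif https[:5] not in ("https", "socks"):
        https = "https://" + https
    return {"http://": http, "https://": https}
```

With the old `!= "http"` test, a `socks5://` address would have been wrongly prefixed with `http://`.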
@@ -365,8 +367,8 @@ def _get_webdriver(self):
     def _get_chrome_webdriver(self):
         if self._proxy_works:
             webdriver.DesiredCapabilities.CHROME['proxy'] = {
-                "httpProxy": self._proxies['http'],
-                "sslProxy": self._proxies['https'],
+                "httpProxy": self._proxies['http://'],
+                "sslProxy": self._proxies['https://'],
                 "proxyType": "MANUAL"
             }

@@ -381,8 +383,8 @@ def _get_firefox_webdriver(self):
         if self._proxy_works:
             # Redirect webdriver through proxy
             webdriver.DesiredCapabilities.FIREFOX['proxy'] = {
-                "httpProxy": self._proxies['http'],
-                "sslProxy": self._proxies['https'],
+                "httpProxy": self._proxies['http://'],
+                "sslProxy": self._proxies['https://'],
                 "proxyType": "MANUAL",
             }

@@ -432,7 +434,7 @@ def _handle_captcha2(self, url):
             self.logger.info("Google thinks we are DOSing the captcha.")
             raise e
         except (WebDriverException) as e:
-            self.logger.info("Browser seems to be disfunctional - closed by user?")
+            self.logger.info("Browser seems to be dysfunctional - closed by user?")
             raise e
         except Exception as e:
             # TODO: This exception handler should eventually be removed when

@@ -483,6 +485,10 @@ def _new_session(self, **kwargs):
             # ScraperAPI requests to work.
             # https://www.scraperapi.com/documentation/
             init_kwargs["verify"] = False
+        if 'proxies' in init_kwargs:
+            proxy = init_kwargs['proxies']['https://']
+            del init_kwargs['proxies']
+            init_kwargs['proxy'] = proxy
         self._session = httpx.Client(**init_kwargs)
         self._webdriver = None

@@ -498,7 +504,7 @@ def _close_session(self):
             self.logger.warning("Could not close webdriver cleanly: %s", e)

     def _fp_coroutine(self, timeout=1, wait_time=120):
-        """A coroutine to continuosly yield free proxies
+        """A coroutine to continuously yield free proxies

         It takes back the proxies that stopped working and marks it as dirty.
         """
