Commit 4878b57

Merge pull request #87 from OpenRefine/trnstlntk-howto-in-readme
Add user how-to to README.md
2 parents 9aac087 + 9f7eeeb commit 4878b57

1 file changed

Lines changed: 75 additions & 10 deletions

README.md

@@ -1,12 +1,79 @@
1-
Commons extension
2-
=================
1+
# Wikimedia Commons Extension for OpenRefine
2+
<img align="right" width="160" src="https://upload.wikimedia.org/wikipedia/commons/4/4a/Commons-logo.svg">
3+
This extension adds features for OpenRefine users who want to edit **media files** (images, videos, PDFs...) and their structured data on **[Wikimedia Commons](https://commons.wikimedia.org)**. For more information, documentation, and how-tos about OpenRefine for Wikimedia Commons, see **https://commons.wikimedia.org/wiki/Commons:OpenRefine**.
34

4-
This is an OpenRefine extension for Wikimedia Commons.
5-
It works with OpenRefine 3.6+.
5+
Features included in this extension:
6+
* Start an OpenRefine project by loading file names from one or more **Wikimedia Commons categories** (including category depth)
7+
* Add **columns** with Commons categories and/or M-ids of each file name
8+
* File names will already be **reconciled** when starting the project
9+
* A few dedicated **GREL commands** allow basic processing and extraction of Wikitext: `extractFromTemplate` and `value.extractCategories`
610

11+
It works with **OpenRefine 3.6.x and later**. It is not compatible with OpenRefine 3.5.x or earlier. *(OpenRefine supports editing Wikimedia Commons from version 3.6 onward; this is not possible in earlier versions.)*
712

8-
Building it
9-
-----------
13+
*This extension was first released in October 2022. It has been funded by a [Wikimedia project grant](https://meta.wikimedia.org/wiki/Grants:Project/CS%26S/Structured_Data_on_Wikimedia_Commons_functionalities_in_OpenRefine).*
14+
15+
## How to use this extension
16+
17+
### Install this extension in OpenRefine
18+
19+
Download the .zip file of the [latest release of this extension](https://github.com/OpenRefine/CommonsExtension/releases).
20+
Unzip this file and place the unzipped folder in your OpenRefine extensions folder. [Read more about installing extensions in OpenRefine's user manual](https://docs.openrefine.org/manual/installing#installing-extensions).
21+
22+
<img width="600" src="https://upload.wikimedia.org/wikipedia/commons/2/26/OpenRefine_-_Commons_Extension_-_location_to_install.png">
23+
24+
When this extension is installed correctly, you will see the additional option 'Wikimedia Commons' when starting a new project in OpenRefine.
25+
26+
### Start an OpenRefine project from one or more Wikimedia Commons categories
27+
28+
After installing this extension, click the 'Wikimedia Commons' option to start a new project in OpenRefine. You will be prompted to add one or more [Wikimedia Commons categories](https://commons.wikimedia.org/wiki/Commons:Categories).
29+
30+
<img src="https://upload.wikimedia.org/wikipedia/commons/5/53/OpenRefine_-_Commons_Extension_-_start_project_from_categories.png">
31+
32+
There's no need to type the Category: prefix.
33+
34+
You can specify category depth by typing or selecting a number in the input field after each category. Depth `0` means only files from the current category level; depth `1` will retrieve files from one sub-category level down, etc.
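
The depth setting can be pictured with a minimal Python sketch. This is only an illustration of the depth semantics described above, not the extension's actual code: the `CATEGORIES` tree, its file names, and the `files_in_category` helper are all hypothetical, and the sketch assumes a tree without cycles.

```python
from typing import Dict, List

# Hypothetical in-memory category tree (illustration only):
# each category lists its own files and its direct sub-categories.
CATEGORIES: Dict[str, dict] = {
    "Botany": {"files": ["File:Leaf.jpg"], "subcats": ["Flowers"]},
    "Flowers": {"files": ["File:Rose.jpg"], "subcats": ["Tulips"]},
    "Tulips": {"files": ["File:Tulip.jpg"], "subcats": []},
}


def files_in_category(name: str, depth: int) -> List[str]:
    """Collect files from `name`, descending at most `depth` sub-category levels."""
    node = CATEGORIES[name]
    files = list(node["files"])
    if depth > 0:
        for sub in node["subcats"]:
            # Each level down uses up one unit of depth.
            files.extend(files_in_category(sub, depth - 1))
    return files


print(files_in_category("Botany", 0))  # files of "Botany" itself only
print(files_in_category("Botany", 1))  # plus files one sub-category level down
```

With depth `0` only `File:Leaf.jpg` is returned; with depth `1` the file from `Flowers` is included as well, while `Tulips` (two levels down) is still left out.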
35+
36+
Next, in the project preview screen (`Configure parsing options`), you can choose to also include a column with each file's M-id (unique [MediaInfo identifier](https://www.mediawiki.org/wiki/Extension:WikibaseMediaInfo#MediaInfo_Entity)) and/or Commons categories.
37+
38+
File names will already be reconciled when your project starts.
39+
40+
When you load larger categories (thousands of files) in a new project, OpenRefine will start slowly and will give you a memory warning. [This is a known issue](https://github.com/OpenRefine/CommonsExtension/issues/72). Wait for a bit; the project will eventually start. The Commons Extension has been tested with a project of more than 450,000 files.
41+
42+
### GREL commands to extract data from Wikitext
43+
44+
The Wikimedia Commons Extension also enables two dedicated GREL commands, which help to extract specific information from the Wikitext of Wikimedia Commons files. *(GREL, General Refine Expression Language, is a dedicated scripting language used in OpenRefine for many flexible data operations. For a general reference on using GREL in OpenRefine, see https://docs.openrefine.org/manual/grelfunctions.)*
45+
46+
First, retrieve the Wikitext for the Commons files in your project. In the column menu of the column with reconciled file names, select `Edit column` > `Add column from reconciled values...` and select `Wikitext` in the resulting dialog window.
47+
48+
From this new column with Wikitext, you can now extract values and categories as described below. Start by selecting `Edit column` > `Add column based on this column...` in the column menu. In the next dialog window, you can use various specific GREL commands:
49+
50+
#### Extract values from template parameters: `extractFromTemplate`
51+
52+
<img width="600" src="https://upload.wikimedia.org/wikipedia/commons/b/be/OpenRefine_-_Commons_Extension_-_GREL_extractFromTemplate.png">
53+
54+
Use the following syntax:
55+
56+
```
57+
extractFromTemplate(value, "BHL", "source")[0]
58+
```
59+
60+
where you replace `BHL` with the name of the template (without curly brackets) and `source` with the parameter from which you want to extract the value. This GREL syntax will return the first (and usually the only) value of said parameter, e.g. `https://www.flickr.com/photos/biodivlibrary/10329116385`.
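
To make the idea concrete, here is a rough Python sketch of the kind of lookup this command performs. It is an assumption-laden illustration, not the extension's implementation: the `extract_from_template` function and the sample Wikitext are hypothetical, and the sketch ignores nested templates and `|` characters inside parameter values.

```python
import re
from typing import List


def extract_from_template(wikitext: str, template: str, parameter: str) -> List[str]:
    """Return the values of `parameter` in every non-nested {{template|...}} call."""
    values: List[str] = []
    # Simplification: only match template bodies that contain no braces.
    pattern = re.compile(r"\{\{\s*" + re.escape(template) + r"\s*\|([^{}]*)\}\}")
    for match in pattern.finditer(wikitext):
        for part in match.group(1).split("|"):
            key, sep, val = part.partition("=")
            # Keep only named parameters whose name matches exactly.
            if sep and key.strip() == parameter:
                values.append(val.strip())
    return values


# Hypothetical Wikitext for illustration.
wikitext = """{{BHL
| title = Example plate
| source = https://www.flickr.com/photos/biodivlibrary/10329116385
}}"""

print(extract_from_template(wikitext, "BHL", "source")[0])
```

The function returns a list because a template can appear more than once on a page, which is why the GREL expression above ends with `[0]` to pick the first match.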
61+
62+
#### Extract Wikimedia Commons categories: `value.extractCategories`
63+
64+
<img width="600" src="https://upload.wikimedia.org/wikipedia/commons/0/0d/OpenRefine_-_Commons_Extension_-_GREL_value.extractCategories.png">
65+
66+
Use the following syntax:
67+
68+
```
69+
value.extractCategories().join('#')
70+
```
71+
72+
This GREL syntax will return all categories mentioned in the Wikitext, separated by the `#` character, which you can then use to split the resulting cell further as needed.
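
The join-then-split pattern can be sketched in Python as follows. This is a simplified stand-in, not the extension's code: the `extract_categories` helper and the sample Wikitext are hypothetical, and whether the real command keeps the `Category:` prefix in its results is not shown here.

```python
import re
from typing import List


def extract_categories(wikitext: str) -> List[str]:
    """Return category names from [[Category:Name]] or [[Category:Name|sort key]] links."""
    # Capture everything after "Category:" up to the closing bracket or a sort-key pipe.
    return [m.strip() for m in re.findall(r"\[\[\s*Category\s*:\s*([^\]|]+)", wikitext)]


# Hypothetical Wikitext for illustration.
wikitext = (
    "Some description.\n"
    "[[Category:Botanical illustrations]]\n"
    "[[Category:Roses|Rosa]]"
)

joined = "#".join(extract_categories(wikitext))
print(joined)             # one cell value, categories separated by '#'
print(joined.split("#"))  # splitting on '#' recovers the individual categories
```

A separator such as `#` is a reasonable choice because it is not allowed in page titles on Wikimedia projects, so it should not collide with a category name.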
73+
74+
## Development
75+
76+
### Building from source
1077

1178
Run
1279
```
@@ -15,13 +82,11 @@ mvn package
1582

1683
This creates a zip file in the `target` folder, which can then be [installed in OpenRefine](https://docs.openrefine.org/manual/installing#installing-extensions).
1784

18-
Developing it
19-
-------------
85+
### Developing it
2086

2187
To avoid having to unzip the extension into the corresponding directory every time you want to test it, you can use a different setup: create a symbolic link from your extensions folder in OpenRefine to the local copy of this repository. With this setup, you do not need to run `mvn package` when making changes to the extension, but you will still need to compile it with `mvn compile` if you change Java files, and restart OpenRefine if you change any files.
2288

23-
Releasing it
24-
------------
89+
### Releasing it
2590

2691
- Make sure you are on the `master` branch and it is up to date (`git pull`)
2792
- Open `pom.xml` and set the version to the desired version number, such as `<version>0.1.0</version>`
