This extension adds several features for OpenRefine users who want to edit **media files** (images, videos, PDFs...) and their structured data on **[Wikimedia Commons](https://commons.wikimedia.org)**. For more information, documentation and how-tos about OpenRefine for Wikimedia Commons, see **https://commons.wikimedia.org/wiki/Commons:OpenRefine**.
Features included in this extension:
* Start an OpenRefine project by loading file names from one or more **Wikimedia Commons categories** (including category depth)
* Add **columns** with Commons categories and/or M-ids of each file name
* File names will already be **reconciled** when starting the project
* Two dedicated **GREL commands**, `extractFromTemplate` and `extractCategories`, allow basic processing of Wikitext and extraction of values from it
It works with **OpenRefine 3.6.x and later**. It is not compatible with OpenRefine 3.5.x or earlier. *(OpenRefine supports editing Wikimedia Commons from version 3.6 onwards; this is not possible in earlier versions.)*
*This extension was first released in October 2022. It has been funded by a [Wikimedia project grant](https://meta.wikimedia.org/wiki/Grants:Project/CS%26S/Structured_Data_on_Wikimedia_Commons_functionalities_in_OpenRefine).*
## How to use this extension
### Install this extension in OpenRefine
Download the .zip file of the [latest release of this extension](https://github.com/OpenRefine/CommonsExtension/releases).
Unzip this file and place the unzipped folder in your OpenRefine extensions folder. [Read more about installing extensions in OpenRefine's user manual](https://docs.openrefine.org/manual/installing#installing-extensions).
When this extension is installed correctly, you will see the additional option 'Wikimedia Commons' when starting a new project in OpenRefine.
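
On Linux or macOS, the unzip-and-place step can also be done from a terminal. The following is only a sketch: the `.zip` file name and the extensions folder path are assumptions, so replace both with the values for your own download and your own OpenRefine setup (the user manual linked above describes where the extensions folder lives on each platform).

```shell
# Assumed values: adjust both to your own system before running.
ZIP_FILE="${ZIP_FILE:-$HOME/Downloads/openrefine-commons-extension.zip}"  # hypothetical file name
EXT_DIR="${EXT_DIR:-$HOME/.local/share/openrefine/extensions}"            # assumed extensions folder

# Create the extensions folder if it does not exist yet.
mkdir -p "$EXT_DIR"

# Unzip the release into the extensions folder, if the download is present.
if [ -f "$ZIP_FILE" ]; then
  unzip -o "$ZIP_FILE" -d "$EXT_DIR"
else
  echo "Release .zip not found at $ZIP_FILE; download it first." >&2
fi

# List the installed extensions to confirm the result.
ls "$EXT_DIR"
```

Restart OpenRefine afterwards so that it picks up the new extension.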
### Start an OpenRefine project from one or more Wikimedia Commons categories
After installing this extension, click the 'Wikimedia Commons' option to start a new project in OpenRefine. You will be prompted to add one or more [Wikimedia Commons categories](https://commons.wikimedia.org/wiki/Commons:Categories).
You can specify category depth by typing or selecting a number in the input field after each category. Depth `0` means only files from the current category level; depth `1` will retrieve files from one sub-category level down, etc.
Next, in the project preview screen (`Configure parsing options`), you can choose to also include a column with each file's M-id (unique [MediaInfo identifier](https://www.mediawiki.org/wiki/Extension:WikibaseMediaInfo#MediaInfo_Entity)) and/or Commons categories.
File names will already be reconciled when your project starts.
When you load larger categories (thousands of files) in a new project, OpenRefine will start slowly and will give you a memory warning. [This is a known issue](https://github.com/OpenRefine/CommonsExtension/issues/72). Wait for a bit; the project will eventually start. The Commons Extension has been tested with a project of more than 450,000 files.
### GREL commands to extract data from Wikitext
The Wikimedia Commons Extension also enables two dedicated GREL commands, which help to extract specific information from the Wikitext of Wikimedia Commons files. *(GREL, General Refine Expression Language, is a dedicated scripting language used in OpenRefine for many flexible data operations. For a general reference on using GREL in OpenRefine, see https://docs.openrefine.org/manual/grelfunctions.)*
Firstly, retrieve the Wikitext from a list of Commons files in your project. In the column menu of the reconciled file names' column, select `Edit column` > `Add column from reconciled values...` and select `Wikitext` in the resulting dialog window.
From this new column with Wikitext, you can now extract values and categories as described below. Start by selecting `Edit column` > `Add column based on this column...` in the column menu. In the next dialog window, you can use various specific GREL commands:
#### Extract values from template parameters: `extractFromTemplate`
In the GREL expression for `extractFromTemplate`, replace `BHL` with the name of the template (without curly brackets) and `source` with the parameter from which you want to extract the value. The expression returns the first (and usually the only) value of that parameter, e.g. `https://www.flickr.com/photos/biodivlibrary/10329116385`.
#### Extract categories: `extractCategories`
The GREL expression for `extractCategories` returns all categories mentioned in the Wikitext, separated by the `#` character. You can then split the resulting cell on that character as needed.
## Development
### Building from source
Run

```
mvn package
```

This creates a zip file in the `target` folder, which can then be [installed in OpenRefine](https://docs.openrefine.org/manual/installing#installing-extensions).
### Developing it
To avoid having to unzip the extension into the extensions directory every time you want to test it, you can use a different setup: create a symbolic link from your OpenRefine extensions folder to your local copy of this repository. With this setup, you do not need to run `mvn package` after making changes to the extension. You still need to compile it with `mvn compile` if you change Java files, and to restart OpenRefine after changing any file.
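
The symbolic-link setup described above could look roughly like this on Linux or macOS. This is a sketch under stated assumptions: the repository location, the extensions folder path, and the link name `commons` are all illustrative, so adjust them to your own environment.

```shell
# Assumed values: adjust to your own system before running.
REPO_DIR="${REPO_DIR:-$PWD}"                                     # assumed: local clone of this repository
EXT_DIR="${EXT_DIR:-$HOME/.local/share/openrefine/extensions}"   # assumed extensions folder

# Create the extensions folder if needed, then link the repository into it.
mkdir -p "$EXT_DIR"
ln -sfn "$REPO_DIR" "$EXT_DIR/commons"   # "commons" is an illustrative link name

# After changing Java files, recompile (only runs if this is a Maven project).
if [ -f "$REPO_DIR/pom.xml" ]; then
  (cd "$REPO_DIR" && mvn compile)
fi
```

Restart OpenRefine after each change so that it reloads the extension.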
### Releasing it
- Make sure you are on the `master` branch and it is up to date (`git pull`)
- Open `pom.xml` and set the version to the desired version number, such as `<version>0.1.0</version>`