I am a hater of Wikipedia haters. It drives me up the wall when my kids' history teachers ban them from using it. I use it all the time, even for topics I pretend to have mastered.
Horsing around with Wikipedia
I’m writing a book about international law in Cairo. One chapter is a study of a huge probate case (Antun Yusuf ‘Abd al-Messih) that began in 1885. As I was investigating the background of Antun’s debtors, I started reading about Ali Pasha Sherif, whose luxury lifestyle was funded by loans of up to £100,000. I remembered Ali Pasha as one of the slaveholding elites described in Eve Troutt Powell’s 2003 A Different Shade of Colonialism.
Computational methods are key to my work on this book. I’m trying (so far without great success) to describe the basic producers of the wealth that fueled litigation of the sort that I’m examining. To that end, I’m working to tie all my datasets into Wikidata, which is the leading linked data hub. Wikidata is fueled in part by Wikipedia, and I was pleased to see that Ali Pasha has an entry.
Like many entries about Middle East topics, it’s quixotic. Its earliest iteration is a paraphrase of a page from an old AOL site by a California enthusiast of “Egyptian Arabian Sport Horses.” Not surprisingly, it has more to say about Ali’s horses than the man himself. But it said that he was Egypt’s foreign minister; of course I immediately thrilled to the idea of international legal precedent established by the personal debts of a colonized foreign secretary.
To the credit of my kids’ history teachers, Wikipedia was wrong: Ali Pasha was never Egypt’s foreign minister (though his brother was Ottoman foreign minister). To my credit, Wikipedia is no longer wrong: I corrected the error and tidied up the page. Thanks to the enthusiast who produced the original text, I learned some things about the Egyptian horse market in the 1870s and 80s, and I came to understand Ali’s debts to Antun better. And as I tried to figure out who had and had not been Egypt’s foreign minister, I discovered another enthusiast: عادل (Adil), who has made thousands and thousands and thousands of contributions to Arabic Wikipedia, including an incredible set of pages listing Egyptian ministers, among them a complete page on the foreign ministers.
Listing foreign ministers
In a burst of enthusiasm, I got down to work replicating Adil’s Arabic ministers list in English. I made it to 1930 before I ran out of steam. I was worn down by a few factors:
- my own desire to move on to the next thing before finishing the task at hand
- MediaWiki’s table syntax, which is a bit intricate to do by hand (I realize now I should just have used a converter)
- missing Wikipedia entries for many of the foreign ministers of Egypt (the topic of the rest of this section)
- a sense that I should fill out this dataset in a more portable format (which is the topic of the rest of this post)
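On the table-syntax point: a converter of that kind can be quite small. As a rough sketch (the function name and the sample row are placeholders, not real data), the following Python turns a list of headers and rows into basic MediaWiki table markup. It assumes that cells contain no pipe characters and need no escaping.

```python
def to_wikitable(headers, rows):
    """Render headers and rows as a basic MediaWiki 'wikitable'."""
    lines = ['{| class="wikitable"']
    # Header cells on one line are separated by "!!".
    lines.append("! " + " !! ".join(headers))
    for row in rows:
        lines.append("|-")  # "|-" starts a new table row
        # Data cells on one line are separated by "||".
        lines.append("| " + " || ".join(str(cell) for cell in row))
    lines.append("|}")
    return "\n".join(lines)


# Placeholder data for illustration only.
print(to_wikitable(
    ["Name", "Took office"],
    [["Example Minister", "1 January 1900"]],
))
```

Keeping the data as plain rows and generating the markup at the end also means the same rows can be reused in other formats.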
Wikipedia entries on Middle East topics are a treasure trove of resources, especially if you work your way through the “Languages” links at the bottom of the left-hand menu. The foreign ministers list looks very different in English, Arabic, and German. The Persian version is based on the Arabic page before Adil upgraded it, while the Russian version appears to be its own animal. Considering the importance of Egypt to France (and vice versa), it’s remarkable that there is no French page.
As I worked to mirror the Arabic list in English, I found that several ministers had Arabic pages without English counterparts. I made brief, rough stub pages for
None of these is great work, but they’re enough for others to build out if and when they feel the need.
As is often the case, the fine folks who volunteer to keep Wikipedia tidy were cautious about accepting thinly referenced new entries. Their position is reasonable enough. At the same time, the effort to expand Wikipedia beyond dead white men sometimes founders on the lack of easily accessible, English-language secondary sources. Wikipedia editors tend to reflexively expect references to such sources to verify notability. It’s easy enough when there’s a Times obituary (as there is for Tigrane Pasha). It’s more difficult if you build your page entirely on a paragraph in Zirikli’s al-A’lam, which is what I usually do.
Things get easier when there’s an entry in Arabic Wikipedia that you can mirror. Best of all is if the subject is an author who can be found in VIAF (the Virtual International Authority File) and WorldCat. Linking to those authority controls usually makes the case for notability clear. In addition, these catalogues provide an easy list of books to add to the Wikipedia entry.
It takes a bit of time to make a new page, however, and there were dozens of such pages required to fill out the list of Egypt’s foreign ministers. Many of these men were less notable, and I didn’t feel up to doing the digging.
Closed versus open formats
While I was searching up the lesser-known names on the list, I came across another amazing Wikipedia contribution from Adil: in 2016 he created a page for every Egyptian cabinet from 1878 to the present. Our brilliant friend produced tables listing every minister and ministry. He drew his information from encyclopedic works by Mohammad al-Gawadi, Younan Labib Rizk, and others, and he cites these sources scrupulously.
When I saw these lists, I thought that I could approach this problem differently. Instead of spending a month writing Wikipedia pages, I could spend a few days converting these well-formatted and standardized tables into a format that I could upload to Wikidata.
What is the difference between Wikipedia and Wikidata? Wikipedia is written by and for human readers. While there are automatic bots that crawl through the pages to standardize spelling and dates and categorize pages, almost all of the work is done “by hand.” Wikidata, on the other hand, is written for and (to a large extent) by automatic readers. It is an elaborately structured dataset based on a fascinating category structure (or ontology) that attempts to describe all existence.
I won’t bore the few readers who have made it this far with rhapsodies about Wikidata. Suffice it to say that this dataset nourishes all sorts of automated reasoning about our world. Those committed to combatting Eurocentrism do well to contribute data to this pool, which tilts very, very white, male, and wealthy. Egyptian ministerial data is not the ideal remedy for that problem. It is an improvement on what’s there, however. Even more important, any information that is already digitized, cleaned, and structured is fairly easy to upload. And so that’s what I resolved to do.
This undertaking is not purely altruistic. I expect to be able to use it to support my own research as well. Wikidata has a query service that uses the powerful SPARQL language. It’s fun for trivia: here’s a query for the first names of everyone (for whom a first name and birthplace are recorded) who was born in Cairo (press play to get it to run). It can also be fun for more serious questions, so long as the dataset is relatively complete. That’s what I set out to do with the ministers.
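For anyone who would rather try this kind of query from a script than from the web interface, here is a minimal Python sketch. It is not the exact query linked above. It assumes the Wikidata identifiers Q85 (Cairo), P19 (place of birth), and P735 (given name), and the function names are made up for illustration. It only builds the query text and a request URL, so nothing is sent over the network.

```python
from urllib.parse import urlencode

# Public endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"


def first_names_query(place_qid="Q85", limit=100):
    """Build SPARQL listing people born in a place, with their given names.

    Assumptions: Q85 is Cairo, P19 is place of birth, P735 is given name.
    """
    return f"""SELECT ?person ?givenNameLabel WHERE {{
  ?person wdt:P19 wd:{place_qid} ;
          wdt:P735 ?givenName .
  ?givenName rdfs:label ?givenNameLabel .
  FILTER(LANG(?givenNameLabel) = "en")
}}
LIMIT {int(limit)}"""


def query_url(sparql):
    """Return a GET URL asking the endpoint for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


query = first_names_query()
url = query_url(query)
print(query)
```

The printed query can be pasted into the query service as is. People with no recorded given name or birthplace simply do not match, which is why completeness of the dataset matters so much for serious questions.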
Documenting the process
The next part of this post is fairly dry. I’m recording details so that others searching for a “how to” can learn from my attempts, and so that I can remember what I did. If you’re not interested in the step-by-step, I urge you to jump to the end.
Preparing the data for OpenRefine
Copy table contents from Arabic foreign ministers pages. It would be possible to automate this, but I did it manually in fifteen minutes.
Paste tables into spreadsheet, figure out odd parts (mostly having to do with post-1952, when there were frequent prime minister changes but few minister changes), then paste results into plain text editor as tab-separated values. Keep working till you have one line per ministry.
Create new table in OpenRefine.
Parse dates in OpenRefine until they are standardized. The easiest way to do this is to split the date into several columns (by space). Then
Reconcile Prime Minister names with Wikidata. This is easy, because they’re already there.
Make a new column by fetching English labels (Property Len). Add “Cabinet” to the end. Fill down.
Filter wizara column by ordinal numbers (al-ula, etc), then add English equivalent to Cabinet column.
Some foreign ministers' terms cover two cabinets. Filter for these, clean up end dates, etc, then delete the excess rows.
Reconcile Arabic cabinet names against Wikidata. Straightforward enough, because the items are already there from the Wikipedia pages, though they are empty.
Add description, using “Egyptian government formed in ” plus year in start column.
Add head of state column, manually with khedives, reconcile, then fill down.
Add “follows” and “followed by” columns, then reconcile.
Map to schema.
Cut and paste all of the cabinet tables from the various cabinet pages, in sequence, into spreadsheet.
Filter for prime ministers; add a column with them.
Match them with their cabinets (by hand).
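Several of the steps above (standardizing dates, adding a description, and filling in “follows” and “followed by”) could also be scripted rather than done in OpenRefine. The sketch below is one possible way to do that in Python, not a record of the process described here. The column names `cabinet` and `start`, the function names, and the two sample rows are all placeholders, and it assumes dates are written as day, Egyptian Arabic month name, year.

```python
import csv
import io

# Month names as used in Egypt, mapped to month numbers.
ARABIC_MONTHS = {
    "يناير": 1, "فبراير": 2, "مارس": 3, "أبريل": 4,
    "مايو": 5, "يونيو": 6, "يوليو": 7, "أغسطس": 8,
    "سبتمبر": 9, "أكتوبر": 10, "نوفمبر": 11, "ديسمبر": 12,
}


def parse_date(text):
    """Split a 'day month year' string on spaces and return an ISO date."""
    day, month, year = text.split()
    return f"{int(year):04d}-{ARABIC_MONTHS[month]:02d}-{int(day):02d}"


def build_rows(tsv_text):
    """Read tab-separated rows, one per cabinet, and enrich them."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for record in reader:
        start = parse_date(record["start"])
        rows.append({
            "cabinet": record["cabinet"],
            "start": start,
            # Description pattern taken from the step above.
            "description": "Egyptian government formed in " + start[:4],
        })
    # ISO dates sort chronologically as plain strings.
    rows.sort(key=lambda row: row["start"])
    for i, row in enumerate(rows):
        row["follows"] = rows[i - 1]["cabinet"] if i > 0 else None
        row["followed_by"] = (
            rows[i + 1]["cabinet"] if i < len(rows) - 1 else None
        )
    return rows


# Placeholder rows for illustration only, not real cabinet data.
SAMPLE_TSV = (
    "cabinet\tstart\n"
    "Example Cabinet B\t10 يونيو 1882\n"
    "Example Cabinet A\t28 أغسطس 1878\n"
)

for row in build_rows(SAMPLE_TSV):
    print(row)
```

The output would still need to be reconciled against Wikidata and mapped to a schema, which is where OpenRefine earns its keep.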
Note: Image from Wikimedia Commons