Sunday, September 17, 2017

#Wikimedia and its #BLP approach


There is a huge controversy about the policies on "Biographies of Living People" (BLP). Central in all this is that there is no such policy at Wikidata. As a consequence, many seasoned Wikipedians are of the opinion that using Wikidata's data in Wikipedia is a violation of Wikipedia's BLP policy. At the same time there are seasoned Wikidatans who oppose a BLP policy similar to the one at Wikipedia. The problem is that Wikidata does need a BLP policy, but it needs to be a different one for various reasons.

  • An item in Wikidata can be really rudimentary; the item for Marian Latour, a Dutch author, was created because she won an award. This is allowed in Wikidata but the limited information is probably a violation of the English BLP policy. This information came from the Dutch Wikipedia.
  • The initial data of Wikidata were the interwiki links. This was a huge improvement for the Wikipedias and there are still many items that have no statements. This is used as an argument not to accept information from Wikidata.
  • Wikidata data is retrieved from a Wikipedia, information like "who won an award". Given the BLP policy of that Wikipedia it should be faultless, but it often is not due to disambiguation issues.
The first issue refers to a red link on the Dutch Wikipedia. When the red link is associated with the Wikidata item, there will not be a new disambiguation issue when a different Marian Latour is introduced. Currently there is only one Marian Latour known to Wikidata.
The second issue is one where Wikidata statistics indicate that statements are slowly but surely being added. They also prove that there is still so much to do...
The third issue is the main one. When an article is linked to Wikidata, articles in other languages should link to the same item or to a red link. Solving these issues requires coexistence and preferably collaboration. 

What we need in a Wikipedia is the ability to link a blue or red link to a Wikidata item. Obviously changing links is either blatantly obvious like for Manuel Echeverria or it requires a source. Technically the necessary change in the MediaWiki software may be "opt in" so that only people who care about this approach to quality make use of it. 

As far as I am concerned, when some Wikipedians find fault elsewhere and do not reflect on this proposal and the improvements it brings them, that is fine. What is relevant is that this approach allows for the best Wikidata practices and at the same time improves the BLP quality in all Wikimedia projects.
Thanks,
       GerardM

Saturday, September 09, 2017

The Manuel Echeverría "revenge"

When there are mistakes in a Wikipedia, it follows that once information is copied from that Wikipedia these mistakes find their way into Wikidata. So Manuel Echeverria did not receive the Xavier Villaurrutia Award; Manuel Echeverría did.

So the edit that made Mr Echeverria a recipient of the award was reverted. I fixed things by using the Spanish Wikipedia as a resource instead. The dates when people received the award were added, and a few people who were missing in Wikidata are now known as well.

I cannot be bothered to fix the English Wikipedia. There is no structural solution at this time and as far as I am concerned, there is no interest in one that has been proposed.

There is one additional reason why a solution would be advantageous; reverting edits is a hostile act when edits are made with the best intentions. By actively linking red links and black links to Wikidata, such reversions will become unnecessary.

The problem is that Wikipedians need to understand a problem that as far as they are concerned is elsewhere, and is only caused by the lack of quality of their project. It is with grim satisfaction that I know it serves them well.
Thanks,
     GerardM

Saturday, September 02, 2017

#Wikimedia - Where I make a stand / where I stand for

I was told that my priorities are not the shared priorities of our movement; this by a pivotal person in the WMF. I consider this a personal affront and I will spell out what I stand for and where I make a stand. When you want to personally verify the veracity of my commitment, read my blog and check out my involvement. I have blogged for over 10 years and the basics/citations are all there to find. I consider my position very much in line with what our movement is there for.

==Share in the sum of all knowledge==
This is the overarching aim of our movement. At this time we are congratulating ourselves with what we have achieved so far. There is a lot to celebrate particularly for the English reading world.

===Everything but English===
Given that only 40% of the world population can read English, our successes need to be measured by what we do for all the people in the world. I do not care for good intentions, I care for what can be observed. Financially there is no breakdown available of the amount spent on English versus the amount spent on all the rest. This is imho a diversity issue as potent as the gender gap. All the arguments why "English first" are structurally no different from any other "my group first" arguments. Just compare the amounts given to US American chapters versus the Indian chapter. In addition you may or may not consider the cost of the software that is developed with English Wikipedia in mind.

===Internationalisation and localisation===
I have searched briefly for "internationalisation" in the 2030 strategy papers. Could not find it. It is however the bedrock of Wikipedia. It is vital for any and all of the individual features of MediaWiki.

When you consider Wikimedia partners like the Internet Archive and their Open Library, we do not even consider how much we will achieve when together we reach out to the other 60% as well. Our internationalisation platform is open to our open source partners and translatewiki.net is in my opinion a strategic resource.

===Partners===
The successes of our GLAM partnerships prove collaboration serves mutual interests. There are plans to improve Commons, a key part is the Wikidatification that will open up Commons, not only in English but also in any and all other languages. Where we could make more of a difference is help where our partners indicate what is relevant to them. We can show them the effect of the cooperation in any language. At this time what we show is limited to images. This is something we should expand on.

====Internet Archive====
The Internet Archive provides a vital service to our Wikipedias. Its Wayback Machine allows us to prove that references that used to be on the Internet existed. Effectively it is an important tool when the aim is to prevent misinformation. Its Open Library has two parts. The part I am interested in is making free e-books available to readers. We would do better when we collaborate just a bit more and help them with their internationalisation and localisation.

====OCLC====
The libraries of this world collaborate in the OCLC and share their links in one system; the Virtual International Authority File. In its WorldCat system, the idea is that people can find books in the library near to them. Thanks to the references to local libraries, it is always possible to know if a book or an author is known in whatever country. What is important for us is to improve cooperation and the visibility of this collaboration for our readers and editors.

===Bringing things together===
I have helped bring data from Wikidata, OCLC and Open Library together. I am seeking the disambiguation of Open Library content using existing links to the Library of Congress to the VIAF and consequently to Wikidata. I am adding award winners because they provide arguments what articles to write or improve. Currently I am adding Dutch literature awards to show the Dutch National Library that this information exists and can be used. Recently I added botanical awards to show a group of botanists how small tasks like this add relevance.

===Outspoken stuff===
  • I am not a Wikipedian and consequently arguments specific to any Wikipedia are problematic, mostly irresponsible.
  • I care about diversity; issues around the gender gap do get extra attention from me but it is a secondary consideration.
  • I care about usability and use Reasonator and tools like Petscan and Awarder. The necessity to use Reasonator for so many years is proof perfect that usability does not have much of a priority. Having seen previous attempts at usability, I will consider it once it is available.
  • I expect that there will be more use for our data. Quality is key and collaboration on a meta scale is what will make this possible.
  • Wikidata is particularly useful in English. Theoretically other languages may profit from its multilingual nature. Institutional (WMF) interest is needed to improve this use of Wikidata. 
  • While I respect many efforts of the WMF, I find that its concentration on English Wikipedia has a very negative effect on a micro scale. It is not all bad but it is this division of labour and money that prevents us from having the most bang for our buck.
Thanks,
      GerardM

PS I resent that I felt the need to write this blogpost.

Sunday, August 27, 2017

#Wikidata - surge of new items

Lately there has been a surge of new items coming into Wikidata. They must be quite good when you consider the number of statements. The items with no statements are mainly part of the original load, the Wikipedia articles, and their number is slowly but surely decreasing (1.35% the last month).

With more items in Wikidata, there is more data to support, to edit. As it is, limits are put on the number of edits. This can be appreciated because of the current performance problems but it is obvious that as this upward trend continues, more people and more data will come to Wikidata to edit as well as to query.

There is plenty of data waiting in the wings to be added. The big challenge is promoting the data that is of use and will enable more collaboration both with people and with organisations.
Thanks,
      GerardM

Saturday, August 26, 2017

#OpenLibrary - Charles Horn and its other volunteers

There are several reasons why Open Library and Internet Archive deserve attention. They provide downloadable books in many languages and their Wayback Machine comes to the rescue when links in references in Wikipedia go stale. Have a look at the presentation from Wikimania 2017 (from 11:46).

The Internet Archive is officially one of the partners of the Wikimedia Foundation. When you ask who in the Wikimedia Foundation is the go-to person for contacts with Internet Archive, there is no answer. It is as if there is no structure in contacts with our partners even when it pays dividends to collaborate in a more structured way. When you consider the "Coleman Boat" it is just as if the macro elements are totally missing and it is left for the micro elements to make the difference.

Macro effects of collaboration with the Open Library would be:
  • references are made to downloadable eBooks from Wikipedia - People read books
  • localisations are made at translatewiki.net - People read books in "other" languages
  • books at Open Library are in Wikidata - links to eBooks are available
  • identifiers are widely shared and widely curated -  work of volunteers has the biggest impact
At a micro level, collaboration is happening. Charles Horn, a volunteer at Open Library is a stellar example. Charles added identifiers to Wikidata and VIAF in the Open Library database. He provided us with a large file of redirects and was instrumental in removing multiple identifiers to Open Library for authors.  He recently produced a Wikidata query to find duplicates and the Wikidata community was made aware of this maintenance work. 
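The kind of maintenance check described above can be sketched in a few lines of Python. This is only an illustration and not Charles's actual query: it assumes you already have a list of (Wikidata item, Open Library identifier) pairs, the function name is hypothetical, and the identifiers in the usage example are made up.

```python
from collections import defaultdict


def items_with_multiple_ids(pairs):
    """Return the Wikidata items that carry more than one distinct Open Library id.

    `pairs` is an iterable of (wikidata_item, open_library_id) tuples.
    This input shape is an assumption made for this sketch.
    """
    ids_by_item = defaultdict(set)
    for item, ol_id in pairs:
        ids_by_item[item].add(ol_id)
    # Keep only the items that point at several Open Library records;
    # those records are candidates for a redirect or merge at Open Library.
    return {item: sorted(ids) for item, ids in ids_by_item.items() if len(ids) > 1}


# Made-up identifiers, for illustration only.
sample = [("Q1", "OL1A"), ("Q1", "OL2A"), ("Q2", "OL3A")]
duplicates = items_with_multiple_ids(sample)
```

An item that shows up in the result is a candidate for the redirect and clean-up work; a person still has to decide which Open Library record is the one to keep.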

Many of the macro opportunities become possible when conditions at Open Library are met. One big issue is the need for disambiguation and de-duplication. This is not helped by the massive amounts of data involved and the lack of data on the individual author level. While individuals like Charles have an immense effect, it is in the collaboration on a macro level where even bigger differences can be made. Consider; many books include identifiers like an ISBN or a link to the Library of Congress. So it is possible to leverage a tool developed at the Wikimedia Foundation to retrieve associated metadata or to find associated data at the OCLC.

It takes just a bit of friendly prodding from the macro people at the associated organisations, some reassurance that there is support for these efforts, and there will be a lot of talent at the micro level making a big difference. Cooperation and coordination are what the organisations are to provide and we will share more of the knowledge that is available to all who come looking.
Thanks,
       GerardM

Sunday, August 20, 2017

#Wikidata - Martin Reints and {{Authority control}}

Martin Reints received the Herman Gorter Award in 1993. There is a Wikipedia article about him and consequently he was known in Wikidata. There was no "authority control" information for Mr Reints in Wikidata yet and this was quickly remedied.

The most interesting part is that the VIAF registration for Mr Reints already included a link to Wikidata. Proof perfect that librarians are actively working on keeping their house in order. There was an Open Library entry for Mr Reints and the Dutch article had a link to the DBNL-website for Dutch language authors.

Open Library I found is very much about books. Their data on the books they have is great; identifiers like ISBN-10 or ISBN-13 and links to the online catalog of the Library of Congress. This makes a lookup at the OCLC for identifiers of all the authors easy and disambiguation becomes more effective.

Wikidata is very much about data. You can query Wikidata for all the winners of the Herman Gorter Award and to the results you can add the links to VIAF or to the Open Library. This ability to query makes all kinds of applications possible like: "what books written by authors who won the Nobel Prize are available in your library?"
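A minimal sketch of such a query, written as a small Python helper that only builds the SPARQL text. The property numbers are the Wikidata ones for "award received" (P166), "VIAF ID" (P214) and "Open Library ID" (P648). The function name is hypothetical, the choice of "nl,en" for the labels is an assumption, and the item number of the award has to be looked up in Wikidata; it is deliberately not filled in here.

```python
def award_winners_query(award_qid: str) -> str:
    """Build a SPARQL query listing the winners of an award,
    with their VIAF and Open Library identifiers where Wikidata has them."""
    if not (award_qid.startswith("Q") and award_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item id: {award_qid!r}")
    return f"""
SELECT ?winner ?winnerLabel ?viaf ?openLibrary WHERE {{
  ?winner wdt:P166 wd:{award_qid} .                # award received
  OPTIONAL {{ ?winner wdt:P214 ?viaf . }}          # VIAF ID, if present
  OPTIONAL {{ ?winner wdt:P648 ?openLibrary . }}   # Open Library ID, if present
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "nl,en" . }}
}}
ORDER BY ?winnerLabel
""".strip()
```

The text it returns can be pasted into the Wikidata Query Service. The OPTIONAL parts mean that winners without a VIAF or Open Library link still show up, which is exactly how the gaps become visible.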
Thanks,
      GerardM

Saturday, August 19, 2017

#OpenLibrary and winners of the Herman Gorter Award

If you want to know if the Open Library is of relevance in other languages, you have to do some research. I wanted to find out if it has publications by the authors who won the prestigious Herman Gorter Award.

This award was conferred from 1945 to 2002, often to multiple authors. The first author not known to Open Library is H. C. ten Berge. He received the Herman Gorter Award in 1964. There were several authors where Wikidata did not have a link yet for Open Library.

Now consider this: what if we could query Wikidata for all the authors and their publications in Open Library? 

Just a little bit more metadata about books, publications is what we need... It is not really a big deal, only a few million additional records...

Many if not most of the books at Open Library have links to authorities like the Library of Congress. This makes it possible to link these books through the OCLC to "your library system". It knows about authors and that is what makes it possible to use tools instead of people to enrich Wikidata and open up all that is in the Open Library for all of us.
Thanks,
       GerardM