Showing posts with label datascience. Show all posts

Jun 5, 2020

DATASET: Text Analysis Challenge - Detect Looted Art - GLAMhack2020

Download  Provenance Texts Dataset: CSV 

Text Analysis Challenge: Detect Looted Art.

Help Automate Analysis, Flagging and Ranking of Museum Art Provenance Texts by the Probability of a Hidden History

The Question: How to sift through the millions of objects in museums to identify top priorities for intensive research by humans? 

The Goal: Automatically Classify and Rank 60,000+ art provenance texts by probability that further research will turn up a deliberately concealed history of looting, forced sale, theft or forgery. 

The Challenge: Analyse texts quickly for Red Flags, quantify, detect patterns, classify, rank, and learn. Whatever it takes to produce a reliable list of top suspects.

For this challenge several datasets will be provided.

1) DATASET: 60,000+ art provenance texts for analysis

(example)[Pierre Matisse Gallery, New York, New York], by 1932, to M. Gutmann, 1936. Maurice Wertheim, by 1937, bequest, to Fogg Art Museum, 1951.;NOTE: Provenance derived from "Degas to Matisse: The Maurice Wertheim Collection," John O'Brian, Harry N. Abrams, New York, 1988.1951.76 Collector, Paris, Said to have been bought directly from artist, 1918.;Maurice Wertheim, Purchased from the Valentine Gallery, 1937, Bequest to Fogg Art Museum, 1951.1951.52 Guillaume, Paris, France, 1925, 1935. By 1930, per caption of photo, Paul Guillaume dining room dated c. 1930, Georgel 2006;Maurice Wertheim, 1937, Bequest to Fogg Art Museum, 1951.1951.51 owner, Paris, sold, [through Hôtel Drouot, Paris, June 16, 1906, no 30.]. Dr. Alfred Wolff, Munich, (1912). Sir Michael Sadler, England, (1912). [De Hauke & Co., New York], sold, to A. Conger Goodyear, New York, (1929-1937) sold, [through Wildenstein && Co., New York];to Maurice Wertheim, New York (1937-1951) bequest, to Fogg Art Museum, 1951.1951.49

2) DATASET: 1000 Red Flag Names


Bignou, Etienne | Bignou | Etienne | Etienne Bignou
Billiet, Director | Billiet | Director | Director Billiet
Binder, Dr. Moritz Julius | Binder | Dr. Moritz Julius | Dr. Moritz Julius Binder
Bing Collection | Bing Collection | Bing Collection
Birtschansky, Zacharie | Birtschansky | Zacharie | Zacharie Birtschansky
Bisson, E. | Bisson | E. | E. Bisson
Blanc, Pierre | Blanc | Pierre | Pierre Blanc
Bleye, Willi | Bleye | Willi | Willi Bleye
Bloch, Dr. Vitale | Bloch | Dr. Vitale | Dr. Vitale Bloch
Bloch-Bauer Collection | Bloch-Bauer Collection | Bloch-Bauer Collection
Bode, Dr. | Bode | Dr. | Dr. Bode
Bodenschatz, General Karl | Bodenschatz | General Karl | General Karl Bodenschatz
Boedecker, Alfred | Boedecker | Alfred | Alfred Boedecker
Boehler, Julius, Jr. | Boehler | Julius | Julius Boehler
Boehler, Julius, Sr. | Boehler | Julius | Julius Boehler
Boehm, Dr. Franz | Boehm | Dr. Franz | Dr. Franz Boehm
Boehmer, Bernhard | Boehmer | Bernhard | Bernhard Boehmer

3) DATASET: Red Flag Words or Phrases

likely | private collector | ? | transfer | telephone
probably | anonymous | [ | removed | to at least
possibly | art market | until at least
maybe | unidentified | by 19
? | unknown | before 19
property of a European collector | according to
private collection
property of a lady

You're the doctor and the texts are your patients! Who's in good health and who's sick? How sick? With what disease? What kind of tests and measurements can we perform on the texts to help us reach a diagnosis? What kind of markers should we look for? What kind of patterns?

What digital methods can we use - and put into practice during the Hackathon - to "diagnose" these texts and prioritize them for "treatment" (i.e., additional provenance research)?

  • IDENTIFY Red Flag Names and Words in each Text?
  • COUNT Red Flag Names and Words in each Text?
  • CHARACTERIZE each Text (number of words? sentiment? completeness v gaps? other features to be identified that may be useful)?
  • ANALYZE for patterns, links and networks?
  • CALCULATE probability that the provenance conceals a Nazi-era history that will prove problematic if investigated in detail
  • RANK according to urgency for further in-depth provenance research 
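
A minimal Python sketch of the IDENTIFY, COUNT and RANK steps, assuming the Red Flag Words and Red Flag Names are available as plain lists of strings. The function names, the simple case-insensitive substring matching and the name weight of 3 are illustrative assumptions, not the method used in the hackathon code.

```python
def count_red_flags(text, red_flags):
    """Return {flag: count} for every red flag that occurs in the text.

    Matching is a case-insensitive substring search, so a phrase such as
    "by 19" also matches "by 1938". This is an assumption of the sketch and
    will over-match short flags (for example "likely" inside "unlikely").
    """
    lowered = text.lower()
    counts = {}
    for flag in red_flags:
        n = lowered.count(flag.lower())
        if n:
            counts[flag] = n
    return counts


def rank_texts(texts, red_flag_words, red_flag_names, name_weight=3):
    """Score each text and return the rows sorted from most to least suspicious.

    The score is (number of word hits) + name_weight * (number of name hits).
    The weight is a hypothetical value chosen only for illustration.
    """
    rows = []
    for text in texts:
        words = count_red_flags(text, red_flag_words)
        names = count_red_flags(text, red_flag_names)
        score = sum(words.values()) + name_weight * sum(names.values())
        rows.append({"text": text, "words": words, "names": names, "score": score})
    return sorted(rows, key=lambda row: row["score"], reverse=True)


# Invented example texts; the flags are taken from the lists above.
example_texts = [
    "Purchased from the artist, 1951.",
    "Private collection, Paris, by 1938; probably sold through Etienne Bignou.",
]
ranked = rank_texts(
    example_texts,
    red_flag_words=["probably", "private collection", "by 19"],
    red_flag_names=["Etienne Bignou"],
)
```

Keeping the per-flag counts alongside the score makes it possible to show a researcher why a text was ranked highly, not only that it was.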

(Voyant-Tools Whitelist containing names: keywords-14ca7131716f24c62c6529fcc143bbd2)

What might a successful result look like?

  • A list of 50 provenances from the DATASET ranked most likely to conceal looted art
  • A color-coded evaluation of each provenance (RED, ORANGE, GREEN) by likelihood of concealing looted art
  • Instructions on how to analyze the provenances, with the tools, functions or code to use (for example, how to use Voyant-Tools to count all the Red Flag Names and inject the result back into the spreadsheet)
  • Ideas for going further....
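
The first two results could be produced from any numeric suspicion score, however it was computed. The Python sketch below is one hedged way to do it: the thresholds 2 and 5, the function names and the example identifiers are hypothetical and would need tuning against provenances whose history is already known.

```python
def color_code(score, orange_at=2, red_at=5):
    """Map a numeric score to RED, ORANGE or GREEN.

    The two thresholds are placeholder values, not calibrated ones.
    """
    if score >= red_at:
        return "RED"
    if score >= orange_at:
        return "ORANGE"
    return "GREEN"


def top_suspects(scored, n=50):
    """Return the n highest-scoring (identifier, score, color) triples.

    `scored` is any iterable of (identifier, score) pairs, for example
    an accession number paired with its red flag count.
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [(ident, score, color_code(score)) for ident, score in ranked[:n]]


# Hypothetical identifiers and scores, for illustration only.
shortlist = top_suspects([("A", 1), ("B", 7), ("C", 3)], n=2)
```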

Triage: "assignment of degrees of urgency to wounds or illnesses to decide the order of treatment of a large number of patients or casualties."

Link to Glamhack2020 project for participants

Help us to test and improve and test and improve the code!

Link to Code on Github

Issues to resolve:
- While the word list counts and general name extraction seem to work pretty well, reconciliation with the list of 1000 Red Flag Names still needs work. To test: A tighter tolerance combined with a more complete listing of aliases might help.
- The extraction of transaction years after 1900 gives interesting though incomplete indicators. The decision to exclude dates in parentheses (as they are sometimes biographical dates and not transaction dates) needs to be reviewed and refined. 
(note: The texts are from multiple sources applying multiple formats and deliberately entered exactly as is)
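
Both open issues can be explored with a few lines of Python. The sketch below is not the code from the Github repository: it assumes an alias table mapping each spelling to a listed name, uses the standard library's difflib for the tolerance (the cutoff of 0.9 is a guess), and makes the parentheses rule a switch so that both choices can be compared. The alias "E. Bignou" is a hypothetical example.

```python
import difflib
import re


def match_red_flag_name(candidate, aliases, cutoff=0.9):
    """Return the listed name whose alias is closest to `candidate`, or None.

    `aliases` maps every known spelling to its entry in the Red Flag Names
    list. A higher cutoff means a tighter tolerance.
    """
    lookup = {alias.lower(): listed for alias, listed in aliases.items()}
    hits = difflib.get_close_matches(candidate.lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


def transaction_years(text, after=1900, skip_parentheses=True):
    """Extract four-digit years later than `after` from a provenance text.

    With skip_parentheses=True, anything inside (...) is ignored, because
    such dates are sometimes biographical rather than transaction dates.
    """
    if skip_parentheses:
        text = re.sub(r"\([^()]*\)", " ", text)
    years = [int(year) for year in re.findall(r"\b(?:19|20)\d{2}\b", text)]
    return [year for year in years if year > after]


aliases = {
    "Etienne Bignou": "Bignou, Etienne",
    "E. Bignou": "Bignou, Etienne",  # hypothetical alias
    "Vitale Bloch": "Bloch, Dr. Vitale",
}
match_red_flag_name("Etienne Bignon", aliases)          # -> "Bignou, Etienne"
match_red_flag_name("Pierre Matisse Gallery", aliases)  # -> None

fragment = "Dr. Alfred Wolff, Munich, (1912). Sir Michael Sadler, England, (1912)."
transaction_years(fragment)                          # -> []
transaction_years(fragment, skip_parentheses=False)  # -> [1912, 1912]
```

The fragment comes from the example provenance above, where the dates in parentheses do look like transaction dates. Running both settings over the full dataset and comparing the results is one way to review the exclusion rule.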

Results RAW (with uncorrected errors for analysis)

Apr 27, 2020

Tracing the Nazi extermination of Jewish art collectors with Wikidata Sparql Queries

How to represent the impact of the Holocaust?

Inhumanly huge numbers defy our capacity to understand.

How to depict both the fates of individuals and the larger context, without losing sight of either?

In this next series of posts we attempt to find a way to show what happened to Jewish art collectors and their world during and after the Holocaust.

Timeline from Wikidata Query

As Jewish art collectors, dealers, artists, curators, historians and museum personnel were being targeted for persecution, robbery and murder by Germany's Nazi government, covetous eyes fixed upon their precious art collections.

How can we document and visualise this massive double movement: the persecuted people on one hand and their possessions on the other?

We will begin with the people. Those who did not manage to flee, and who ended up murdered in Nazi camps or ghettos.

Each and every one of these individuals had a story: a family, friends, business and social relations, activities, passions, beliefs, enthusiasms, achievements, foibles; a life filled with events and people and - in the case of the individuals whose stories we trace here -  art.

How can we gather the huge amount of information and comprehend how it all fits together and what it means? 

This is not a small challenge.

One of the best tools for dealing with large amounts of linked information, such as relationships between people, places, things and events, is Wikidata.

Why Wikidata?


Because Wikidata is open yet structurally rigorous where it matters.
Wikidata has built into it a linked data structure that can be read by both humans and machines. 

Wikidata can be used for querying not only what is referenced in Wikidata itself, but also for linking to information that is held outside of Wikidata which shares a reference or authority file.


For the art world and the Holocaust, Wikidata has remarkably rich data. Though still a work in progress with much that remains to be done, Wikidata is already far more reliable and complete in the topics that concern us than any other database, authority file or linked dataset that I know of.


The task of telling the story of the Jewish art world and what happened to the people and the artworks during and after the Holocaust is too big for any one person or institution. There are so many people and events and places and objects, so many photos and documents, so many sources, so many languages. It is a job for every individual of good faith who wants to contribute to the sum of our knowledge.


Because Wikidata is driven and maintained and enriched by very clever and hardworking people, and whatever innovations and advances are achieved become available to all of us, for free, where amazingly powerful tools cost only the effort of learning how to use them.

In this next series of posts, I hope to share some ideas and practical tips for using Wikidata Sparql queries to better understand the fates of Jewish art collectors and their collections, both collectively and individually. 

I hope that Holocaust scholars, art historians, provenance researchers, families and their advocates will find some of this useful, and that Sparql mavens will engage with the queries to improve them for the benefit of all.

(My apologies in advance to people who actually know how to write Sparql queries, and my thanks in advance for improving upon these amateurish efforts.)


Sparql Query for above image

#With pictures

#art collectors, art dealers, art historians, curators, museum directors, restorers, galleries...
#died in Auschwitz-Birkenau Q7341
#died in Theresienstadt Q160175

SELECT ?item ?itemLabel ?pic ?datedied ?placediedLabel ?placedied ?birth ?place_birth ?VIAF_ID ?GND_ID ?Library_of_Congress_authority_ID ?ULAN_ID ?child ?childLabel ?ownedby ?ownedbyLabel ?depicts ?depictsLabel ?depictedby ?depictedbyLabel ?countryLabel ?ownerof ?ownerofLabel ?spouse ?employer ?employerLabel ?spouseLabel ?mother ?motherLabel ?father ?fatherLabel ?sibling ?siblingLabel ?sigperson ?sigpersonLabel ?party ?partyLabel ?partner ?partnerLabel WHERE {
{ ?item wdt:P106 wd:Q1792450.} UNION { ?item wdt:P31 wd:Q1007870. } UNION { ?item wdt:P106 wd:Q173950.} UNION { ?item wdt:P921 wd:Q328376.} UNION { ?item wdt:P106 wd:Q10732476.} UNION { ?item wdt:P106 wd:Q446966.} UNION { ?item wdt:P106 wd:Q22132694.} UNION { ?item wdt:P106 wd:Q674426.}

SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
{?item wdt:P20 wd:Q7341.} UNION {?item wdt:P20 wd:Q160175.}
  ?item wdt:P18 ?pic.
OPTIONAL { ?item wdt:P127 ?ownedby. }
OPTIONAL { ?item wdt:P570 ?datedied. }
OPTIONAL { ?item wdt:P20 ?placedied. }
OPTIONAL { ?item wdt:P180 ?depicts. }
OPTIONAL { ?item wdt:P921 ?plunder. }
OPTIONAL { ?item wdt:P1830 ?ownerof. }
OPTIONAL { ?item wdt:P108 ?employer. }
OPTIONAL { ?item wdt:P569 ?birth. }
OPTIONAL { ?item wdt:P40 ?child. }
OPTIONAL { ?item wdt:P214 ?VIAF_ID. }
OPTIONAL { ?item wdt:P19 ?place_birth. }
OPTIONAL { ?item wdt:P244 ?Library_of_Congress_authority_ID. }
OPTIONAL { ?item wdt:P227 ?GND_ID. }
OPTIONAL { ?item wdt:P245 ?ULAN_ID. }
OPTIONAL { ?item wdt:P26 ?spouse. }
OPTIONAL { ?item wdt:P27 ?country. }
OPTIONAL { ?item wdt:P3342 ?sigperson. }
OPTIONAL { ?item wdt:P102 ?party. }
OPTIONAL { ?item wdt:P1327 ?partner. }
OPTIONAL { ?item wdt:P25 ?mother. }
OPTIONAL { ?item wdt:P22 ?father. }
OPTIONAL { ?item wdt:P3373 ?sibling. }
OPTIONAL { ?item wdt:P1299 ?depictedby. }
OPTIONAL { ?item wdt:P39 ?position. }
FILTER (YEAR(?datedied) >= 1933 )
}
LIMIT 20000


Link to Sparql Query
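
For readers who would rather run such a query from a script than from the query page, here is a minimal Python sketch using only the standard library. It sends a much shorter query than the one above (a single occupation ID taken from that query, two places of death) to the public Wikidata Query Service endpoint. The function names and the User-Agent string are assumptions made for illustration.

```python
import json
import urllib.parse
import urllib.request

# Public SPARQL endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(place_qids, limit=100):
    """Build a reduced query: people with one occupation who died in the given places.

    wd:Q10732476 is one of the occupation IDs used in the full query above.
    """
    places = " ".join(f"wd:{qid}" for qid in place_qids)
    return f"""
SELECT ?item ?itemLabel ?datedied ?placediedLabel WHERE {{
  VALUES ?placedied {{ {places} }}
  ?item wdt:P106 wd:Q10732476 ;
        wdt:P20 ?placedied .
  OPTIONAL {{ ?item wdt:P570 ?datedied. }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" }}
}}
LIMIT {int(limit)}
""".strip()


def run_query(query):
    """Send the query to the endpoint and return the list of result bindings."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        url,
        # Placeholder User-Agent; replace it with your own contact details.
        headers={"User-Agent": "provenance-sketch/0.1 (example)"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)["results"]["bindings"]


# Auschwitz-Birkenau (Q7341) and Theresienstadt (Q160175), as in the query above.
query = build_query(["Q7341", "Q160175"], limit=20)
# rows = run_query(query)  # needs network access, so left commented out
```

Each returned row is a dictionary keyed by variable name, so a label can be read as rows[0]["itemLabel"]["value"].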


Next Posts in this series:

Tracing Jewish Art Collectors and other #LostArtPeople by Place of Death

How information about Jewish art collectors who died in the Holocaust goes missing in the semantic web of linked data.

Open Art Data