Showing posts with label open data. Show all posts

Jun 5, 2020

DATASET: Text Analysis Challenge - Detect Looted Art - GLAMhack2020

Download Provenance Texts Dataset: CSV

Text Analysis Challenge: Detect Looted Art.

Help Automate Analysis, Flagging and Ranking of Museum Art Provenance Texts by the Probability of a Hidden History

The Question: How to sift through the millions of objects in museums to identify top priorities for intensive research by humans? 

The Goal: Automatically Classify and Rank 60,000+ art provenance texts by the probability that further research will turn up a deliberately concealed history of looting, forced sale, theft or forgery.

The Challenge: Analyse texts quickly for Red Flags, quantify, detect patterns, classify, rank, and learn. Whatever it takes to produce a reliable list of top suspects.

For this challenge several datasets will be provided.

1) DATASET: 60,000+ art provenance texts for analysis

(example: provenance text | accession number)
[Pierre Matisse Gallery, New York, New York], by 1932, to M. Gutmann, 1936. Maurice Wertheim, by 1937, bequest, to Fogg Art Museum, 1951.;NOTE: Provenance derived from "Degas to Matisse: The Maurice Wertheim Collection," John O'Brian, Harry N. Abrams, New York, 1988. | 1951.76
Collector, Paris, Said to have been bought directly from artist, 1918.;Maurice Wertheim, Purchased from the Valentine Gallery, 1937, Bequest to Fogg Art Museum, 1951. | 1951.52
Guillaume, Paris, France, 1925, 1935. By 1930, per caption of photo, Paul Guillaume dining room dated c. 1930, Georgel 2006;Maurice Wertheim, 1937, Bequest to Fogg Art Museum, 1951. | 1951.51
owner, Paris, sold, [through Hôtel Drouot, Paris, June 16, 1906, no 30.]. Dr. Alfred Wolff, Munich, (1912). Sir Michael Sadler, England, (1912). [De Hauke & Co., New York], sold, to A. Conger Goodyear, New York, (1929-1937) sold, [through Wildenstein && Co., New York];to Maurice Wertheim, New York (1937-1951) bequest, to Fogg Art Museum, 1951. | 1951.49

2) DATASET: 1000 Red Flag Names


(excerpt)
Name as listed | Surname | Forename(s) or title | Name in natural order
Bignou, Etienne | Bignou | Etienne | Etienne Bignou
Billiet, Director | Billiet | Director | Director Billiet
Binder, Dr. Moritz Julius | Binder | Dr. Moritz Julius | Dr. Moritz Julius Binder
Bing Collection | Bing Collection | | Bing Collection
Birtschansky, Zacharie | Birtschansky | Zacharie | Zacharie Birtschansky
Bisson, E. | Bisson | E. | E. Bisson
Blanc, Pierre | Blanc | Pierre | Pierre Blanc
Bleye, Willi | Bleye | Willi | Willi Bleye
Bloch, Dr. Vitale | Bloch | Dr. Vitale | Dr. Vitale Bloch
Bloch-Bauer Collection | Bloch-Bauer Collection | | Bloch-Bauer Collection
Bode, Dr. | Bode | Dr. | Dr. Bode
Bodenschatz, General Karl | Bodenschatz | General Karl | General Karl Bodenschatz
Boedecker, Alfred | Boedecker | Alfred | Alfred Boedecker
Boehler, Julius, Jr. | Boehler | Julius | Julius Boehler
Boehler, Julius, Sr. | Boehler | Julius | Julius Boehler
Boehm, Dr. Franz | Boehm | Dr. Franz | Dr. Franz Boehm
Boehmer, Bernhard | Boehmer | Bernhard | Bernhard Boehmer
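A minimal sketch of how a provenance text could be reconciled with the Red Flag Names, assuming the four-column layout above and Python's standard-library difflib as the fuzzy matcher (a swapped-in technique, not necessarily the one used in the project code). Each name is searched for in its listed form, its surname and its natural-order form; forenames alone are skipped because they would match far too often. The three sample rows and the 0.85 threshold are illustrative assumptions.

```python
import re
from difflib import SequenceMatcher

# Hypothetical excerpt of the Red Flag Names list, one tuple per row:
# (name as listed, surname, forename(s) or title, name in natural order)
RED_FLAG_ROWS = [
    ("Bignou, Etienne", "Bignou", "Etienne", "Etienne Bignou"),
    ("Bloch, Dr. Vitale", "Bloch", "Dr. Vitale", "Dr. Vitale Bloch"),
    ("Boehler, Julius, Jr.", "Boehler", "Julius", "Julius Boehler"),
]

# Words, keeping apostrophes, periods ("Dr.") and hyphens ("Bloch-Bauer").
TOKEN = re.compile(r"[\w'.-]+")


def best_similarity(text, variant):
    """Best SequenceMatcher ratio between `variant` and any window of the same word count."""
    tokens = TOKEN.findall(text.lower())
    variant = " ".join(TOKEN.findall(variant.lower()))  # drops commas etc.
    size = len(variant.split())
    best = 0.0
    for i in range(len(tokens) - size + 1):
        window = " ".join(tokens[i:i + size])
        best = max(best, SequenceMatcher(None, variant, window).ratio())
    return best


def flag_names(text, rows=RED_FLAG_ROWS, threshold=0.85):
    """Map each listed name to its best score, keeping only scores >= threshold."""
    hits = {}
    for listed, surname, _forenames, natural in rows:
        variants = {v for v in (listed, surname, natural) if v}
        score = max(best_similarity(text, v) for v in variants)
        if score >= threshold:
            hits[listed] = round(score, 2)
    return hits
```

For example, flag_names("Etienne Bignou, Paris, 1929; sold to Julius Böhler, Munich.") flags both Bignou and Boehler: the umlaut spelling "Böhler" is close enough to "Julius Boehler" at this tolerance. Raising the threshold tightens the match; adding more alias variants per row widens coverage.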

3) DATASET: Red Flag Words or Phrases

likely | private collector | ? | transfer | telephone
probably | anonymous | [ | removed | to at least
possibly | art market | until at least
maybe | unidentified | by 19
? | unknown | before 19
property of a European collector | according to
private collection
property of a lady
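The COUNT step for the word list can be sketched with regular expressions. This assumes case-insensitive matching and uses only part of the list. Terms are escaped so that "?" and "[" are taken literally, and only a leading word boundary is used, so the date stems "by 19" and "before 19" catch "by 1932" or "before 1941".

```python
import re
from collections import Counter

# Excerpt of the Red Flag Words or Phrases list (case-insensitive matching assumed).
RED_FLAG_TERMS = [
    "likely", "probably", "possibly", "maybe", "?", "unknown", "anonymous",
    "unidentified", "private collection", "private collector", "art market",
    "by 19", "before 19", "until at least", "according to",
]


def term_pattern(term):
    """Escape the term; anchor it at a word start if it begins with a letter or digit.

    No trailing boundary, so "by 19" also matches "by 1932".
    """
    pattern = re.escape(term)
    if term[0].isalnum():
        pattern = r"\b" + pattern
    return re.compile(pattern, re.IGNORECASE)


def count_terms(text, terms=RED_FLAG_TERMS):
    """Return a Counter of red-flag term -> number of occurrences in text."""
    counts = Counter()
    for term in terms:
        hits = len(term_pattern(term).findall(text))
        if hits:
            counts[term] = hits
    return counts
```

The per-text Counter (or its total) can then be written back to the spreadsheet as one or more extra columns.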

You're the doctor and the texts are your patients! Who's in good health and who's sick? How sick? With what disease? What kinds of tests and measurements can we perform on the texts to help us reach a diagnosis? What kinds of markers should we look for? What kinds of patterns?

What digital methods can we use, and put into practice during the Hackathon, to "diagnose" these texts and prioritize them for "treatment" (i.e., additional provenance research)?

  • IDENTIFY Red Flag Names and Words in each Text?
  • COUNT Red Flag Names and Words in each Text?
  • CHARACTERIZE each Text (number of words? sentiment? completeness vs. gaps? other features to be identified that may be useful)?
  • ANALYZE for patterns, links and networks?
  • CALCULATE the probability that the provenance conceals a Nazi-era history that will prove problematic if investigated in detail?
  • RANK according to urgency for further in-depth provenance research?
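One possible way to string these steps together is a weighted score per text. Everything here is an assumption for illustration: the tiny term and name lists, the weights, plain substring matching, and the extra indicator that counts years between 1933 and 1945. The result is a ranking heuristic, not a calibrated probability.

```python
import re
from dataclasses import dataclass

# Hypothetical inputs: tiny excerpts of the Red Flag lists and made-up weights
# that would need tuning against provenances already researched by hand.
HEDGES = ("probably", "possibly", "likely", "maybe", "unknown", "?")
RED_FLAG_NAMES = ("bignou", "boehler", "bloch")
WEIGHTS = {"hedge": 2.0, "name": 5.0, "nazi_era_year": 3.0}
YEAR = re.compile(r"\b(1[89]\d{2}|20\d{2})\b")


@dataclass
class Profile:
    text: str
    words: int           # CHARACTERIZE: length of the text in words
    hedges: int          # COUNT: hedging words and question marks
    names: int           # IDENTIFY: distinct red-flag names present
    nazi_era_years: int  # years 1933-1945 mentioned anywhere in the text
    score: float         # CALCULATE: weighted sum of the counts above


def profile(text):
    lower = text.lower()
    hedges = sum(lower.count(h) for h in HEDGES)
    names = sum(1 for name in RED_FLAG_NAMES if name in lower)
    era = sum(1 for y in YEAR.findall(text) if 1933 <= int(y) <= 1945)
    score = (WEIGHTS["hedge"] * hedges + WEIGHTS["name"] * names
             + WEIGHTS["nazi_era_year"] * era)
    return Profile(text, len(text.split()), hedges, names, era, score)


def rank(texts):
    """RANK: highest score first; sorted() is stable, so ties keep input order."""
    return sorted((profile(t) for t in texts), key=lambda p: p.score, reverse=True)
```

A weighted sum keeps every indicator visible in its own column, which makes it easy to inspect why a text ranks high before trusting the ordering.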

(Voyant-Tools Whitelist containing names: keywords-14ca7131716f24c62c6529fcc143bbd2)

What might a successful result look like?

  • A list of 50 provenances from the DATASET ranked most likely to conceal looted art
  • A color-coded evaluation of each provenance (RED, ORANGE, GREEN) by likelihood of concealing looted art
  • Instructions on how to analyze the provenances and which tools, functions or code to use (for example, how to use Voyant-Tools to count all the Red Flag Names and inject the result back into the spreadsheet)
  • Ideas for going further...
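For the colour-coded evaluation and the top-50 list, here is a sketch that assumes each provenance already has a numeric score (for instance from counting Red Flag Names and Words). The cut-off values 10 and 4 and the sample ids are hypothetical and would need calibrating against provenances that have already been researched by hand.

```python
# Hypothetical cut-offs on an arbitrary numeric score scale.
RED_AT = 10.0
ORANGE_AT = 4.0


def traffic_light(score, red_at=RED_AT, orange_at=ORANGE_AT):
    """Map a numeric score to the RED / ORANGE / GREEN evaluation."""
    if score >= red_at:
        return "RED"
    if score >= orange_at:
        return "ORANGE"
    return "GREEN"


def triage(scores, top_n=50):
    """scores maps a text id to its score; returns the top_n as (id, score, colour)."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(text_id, score, traffic_light(score)) for text_id, score in ranked[:top_n]]
```

For example, triage({"A": 2.0, "B": 12.5, "C": 6.0}) puts B first as RED, then C as ORANGE and A as GREEN.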

Triage: "assignment of degrees of urgency to wounds or illnesses to decide the order of treatment of a large number of patients or casualties."

Link to Glamhack2020 project for participants

Help us to test and improve and test and improve the code!

Link to Code on Github

Issues to resolve:
- While the word list counts and general name extraction seem to work pretty well, reconciliation with the list of 1000 Red Flag Names still needs work. To test: a tighter tolerance combined with a more complete list of aliases might help.
- The extraction of transaction years after 1900 gives interesting though incomplete indicators. The decision to exclude dates in parentheses (as they are sometimes biographical dates and not transaction dates) needs to be reviewed and refined. 
(note: The texts are from multiple sources applying multiple formats and deliberately entered exactly as is)
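The year extraction and the parentheses rule can be sketched as follows. This is an assumed regex approach, not the project's code. Run on part of the Wertheim example provenance above, it shows the trade-off described in the second issue: there the dates in parentheses are ownership periods, so dropping them discards most of the transaction years.

```python
import re

YEAR = re.compile(r"\b(19\d{2}|20\d{2})\b")  # four-digit years from 1900 on
PARENTHESISED = re.compile(r"\([^()]*\)")     # innermost "( ... )" groups


def transaction_years(text, drop_parenthesised=True):
    """Years from 1900 on, optionally ignoring anything inside parentheses."""
    if drop_parenthesised:
        text = PARENTHESISED.sub(" ", text)
    return [int(year) for year in YEAR.findall(text)]


EXAMPLE = ("Dr. Alfred Wolff, Munich, (1912). [De Hauke & Co., New York], sold, "
           "to A. Conger Goodyear, New York, (1929-1937) sold;to Maurice Wertheim, "
           "New York (1937-1951) bequest, to Fogg Art Museum, 1951.")
```

With the rule applied, only 1951 survives for EXAMPLE; without it, the ownership ranges 1929-1937 and 1937-1951 come back too. A possible refinement would be to drop only parentheses that follow a personal name and keep those that follow a sale or transfer.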

Results RAW (with uncorrected errors for analysis)

Oct 31, 2019

DATASET: MoMA Provenance Research Project (PRP) artworks

Dataset name: Enhanced MoMA PRP

Description: This enhanced provenance dataset has been constructed from information available on the public website of the Museum of Modern Art (MoMA). It merges the list of artworks on the MoMA Provenance Research Project page with the provenance texts published on MoMA's detailed item pages. It is intended to facilitate research into Holocaust-era provenance for scholars, art historians and families.

Original data sources that were merged to create the new dataset:

Format: Google Sheet


Download: CSV

1. PRP Artworks with provenance 
(Artist,Title,Date,Medium,Dimensions,URL,Acc_Number,Department,Provenance,Publisher of Provenance,Author of Provenance)
2. About this file
3. Artists Count (pivot table with number of artworks by artists)
4. Department (pivot table with number of artworks by department)
5. Provenance text contains word "private" (pivot table with filter)
6. Provenance text contains name "Valentin" (pivot table with filter)
7. Provenance text contains the word "probably" (pivot table with filter)
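The "contains word" pivot tables can also be reproduced outside Google Sheets from the CSV download. Here is a minimal sketch with Python's csv module. It assumes the export's header row uses the column names listed for the first tab (only Department and Provenance are needed); the sample rows are invented. A plain substring test, like a sheet filter on "contains", also matches words such as "privately".

```python
import csv
import io
from collections import Counter

# Invented sample rows; column names follow tab 1 of the sheet (assumed to
# match the header row of the CSV export).
SAMPLE_CSV = (
    "Artist,Title,Department,Provenance\n"
    'A,T1,Painting & Sculpture,"Private collection, Paris; sold 1939"\n'
    'B,T2,Painting & Sculpture,"The artist; gift, 1940"\n'
    'C,T3,Drawings & Prints,"Private collector, Berlin"\n'
)


def count_by_department(csv_file, word="private"):
    """Count rows per Department whose Provenance contains `word` (case-insensitive)."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        if word.lower() in (row.get("Provenance") or "").lower():
            counts[row.get("Department") or "(blank)"] += 1
    return counts


print(count_by_department(io.StringIO(SAMPLE_CSV)))
```

On the real download this would be something like `with open("prp.csv", newline="", encoding="utf-8") as f: count_by_department(f)`, where the filename is hypothetical. Passing word="Valentin" or word="probably" mirrors the other two filtered tabs.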

Publisher: OAD

Date of Publication: October 31, 2019

Example of content: Provenance text contains word "private"

Selection of artworks listed on the MoMA Provenance Research Project that contain the word "private" in the provenance text published on the museum website

Oct 19, 2019

Art Museums in France

How open are French art museums concerning the acquisition history and provenance of the artworks in their collections? 

On Sunday, October 20, 2019 in Paris, as part of the lecture series on the French Art Market Under the Occupation, an international panel of experts will speak on Art Restitutions in France as seen from Abroad. Reservations online.

Speakers include Wesley Fisher, Agnes Peresztegi, Emmanuelle Polack, Anne Webber and David Zivie, with moderator Philippe Dagen.

Oct 10, 2019

Nazi Looted Art and the Fight for Open Data


In 2000, the Chicago Tribune, an American newspaper, examined the struggle to publish long-hidden archives on Holocaust-era art looting.

The article "KEY TO ART NAZIS STOLE MAY BE LOCKED AWAY", written by journalist and history professor Ron Grossman, recounts the struggle to provide open access to:

 a massive cache of World War II records documenting Nazi looting of works by some of the greatest artists in history 
The context of the article is that a United States government commission on Holocaust reparations is preparing to issue its final report, and there is fear that these crucial archives, which had been marked classified and locked away, will remain inaccessible despite the efforts of the Presidential Commission. The Commission is planning to publish a public database, but there are problems. Additional government funding is needed. A deadline looms.

Oct 2, 2019

Vlug Report: transcription of Part 1


The Vlug Report, written in December of 1945 and named after its author, the Dutch art historian Jean Vlug, is an important work of investigation into the massive looting of artworks in Holland during World War II.

The Vlug report details the activities of the Dienststelle Mühlmann, which obtained works of art for Hitler, Göring and other Nazis.
The author was Jean Vlug, who served in the Royal Netherlands Army. He was a Dutch "Monuments Man" and an investigator with the Art Looting Investigation Unit. His report, marked confidential and unavailable for decades, contains interviews with Nazi art looters as well as lists of artworks.
The following transcription concerns the first fifty pages of Vlug's report. The transcription is a work in progress. Please indicate any errors in the comments. Thank you.
For more information about the Vlug report, please see
The National Archives have published photographs of the Vlug report online at Fold3.
The photo above is from The Monuments Men Foundation website which honors Jan Vlug and requests more information about him.

Jun 6, 2017

Publishing open linked data: Tableminer and Austrian data portals

In this interesting paper, Tomas Knap looks at using Tableminer+ to help transform CSV files from Austrian data portals into Linked Open Data.

"To leverage CSV files to Linked Data, it is necessary to 1) classify CSV columns based on its content and context against existing knowledge bases 2) assign RDF terms (HTTP URLs, blank nodes and literals) to the particular cell values according to Linked Data principles (HTTP URL identifiers may be reused from one of the existing knowledge bases), 3) discover relations between columns based on the evidence for the relations in the existing knowledge bases, and 4) convert CSV data to RDF data properly using data types, language tags, well-known Linked Data vocabularies, etc" - 
from: "Increasing Quality of Austrian Open Data by Linking them to Linked Data Sources: Lessons Learned?" by Tomas Knap, Charles University in Prague, Faculty of Mathematics and Physics,  Czech Republic, Semantic Web Company, Vienna, Austria
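Steps 2 and 4 of that recipe can be illustrated with a small stdlib-only sketch that writes N-Triples. Assumptions: the columns have already been classified by hand (step 1) and mapped to schema.org terms, relation discovery (step 3) is skipped, and the base URI http://example.org/museum/, the German language tag and the sample row are placeholders. A real pipeline would more likely use an RDF library such as rdflib.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical mapping from hand-classified columns to schema.org properties.
BASE = "http://example.org/museum/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_GYEAR = "http://www.w3.org/2001/XMLSchema#gYear"
COLUMNS = {
    "name": ("http://schema.org/name", "@de"),                 # language-tagged literal
    "founded": ("http://schema.org/foundingDate", XSD_GYEAR),  # typed literal
}
SAMPLE_CSV = "id,name,founded\nm1,Beispielmuseum,1900\n"  # invented sample row


def literal(value, kind):
    """Serialise a cell as an N-Triples literal, escaping backslashes, quotes and newlines."""
    text = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    if kind.startswith("@"):
        return f'"{text}"{kind}'
    return f'"{text}"^^<{kind}>'


def csv_to_ntriples(csv_file, id_column="id", rdf_class="http://schema.org/Museum"):
    """Mint an HTTP URI per row (step 2) and emit typed or tagged triples (step 4)."""
    lines = []
    for row in csv.DictReader(csv_file):
        subject = f"<{BASE}{quote(row[id_column])}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{rdf_class}> .")
        for column, (prop, kind) in COLUMNS.items():
            if row.get(column):
                lines.append(f"{subject} <{prop}> {literal(row[column], kind)} .")
    return "\n".join(lines)


print(csv_to_ntriples(io.StringIO(SAMPLE_CSV)))
```

Reusing identifiers from an existing knowledge base, as the quoted recipe suggests, would replace the example.org subject URIs with those external URIs.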
Read more at: