Text Analysis Challenge: Detect Looted Art.
Help Automate Analysis, Flagging and Ranking of Museum Art Provenance Texts by the Probability of a Hidden History
The Question: How to sift through the millions of objects in museums to identify top priorities for intensive research by humans?
The Goal: Automatically Classify and Rank 60,000+ art provenance texts by probability that further research will turn up a deliberately concealed history of looting, forced sale, theft or forgery.
The Challenge: Analyse texts quickly for Red Flags, quantify, detect patterns, classify, rank, and learn. Whatever it takes to produce a reliable list of top suspects.
For this challenge several datasets will be provided.
1) DATASET:
60,000+ art provenance texts for analysis
(example)
https://www.harvardartmuseums.org/collections/object/296887 | [Pierre Matisse Gallery, New York, New York], by 1932, to M. Gutmann, 1936. Maurice Wertheim, by 1937, bequest, to Fogg Art Museum, 1951.;NOTE: Provenance derived from "Degas to Matisse: The Maurice Wertheim Collection," John O'Brian, Harry N. Abrams, New York, 1988. | 1951.76 |
https://www.harvardartmuseums.org/collections/object/229045 | Private Collector, Paris, Said to have been bought directly from artist, 1918.;Maurice Wertheim, Purchased from the Valentine Gallery, 1937, Bequest to Fogg Art Museum, 1951. | 1951.52 |
https://www.harvardartmuseums.org/collections/object/229044 | Paul Guillaume, Paris, France, 1925, 1935. By 1930, per caption of photo, Paul Guillaume dining room dated c. 1930, Georgel 2006;Maurice Wertheim, 1937, Bequest to Fogg Art Museum, 1951. | 1951.51 |
https://www.harvardartmuseums.org/collections/object/229043 | Unidentified owner, Paris, sold, [through Hôtel Drouot, Paris, June 16, 1906, no 30.]. Dr. Alfred Wolff, Munich, (1912). Sir Michael Sadler, England, (1912). [De Hauke & Co., New York], sold, to A. Conger Goodyear, New York, (1929-1937) sold, [through Wildenstein && Co., New York];to Maurice Wertheim, New York (1937-1951) bequest, to Fogg Art Museum, 1951. | 1951.49 |
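As a starting point, each example line above can be split into its parts. The following is a minimal sketch, assuming every record has the layout "URL | provenance text | accession number |" seen in the examples. The names `ProvenanceRecord` and `parse_record` are illustrative and not taken from the project code.

```python
from typing import NamedTuple, Optional


class ProvenanceRecord(NamedTuple):
    url: str        # link to the museum object page
    text: str       # the provenance text to analyse
    accession: str  # accession number, e.g. "1951.52"


def parse_record(line: str) -> Optional[ProvenanceRecord]:
    """Split one 'url | provenance text | accession |' line (assumed layout)."""
    line = line.strip()
    if not line:
        return None
    # Take the URL from the left and the accession number from the right,
    # so a pipe character inside the provenance text does not break parsing.
    url, sep, rest = line.partition(" | ")
    if not sep:
        return None
    rest = rest.rstrip().rstrip("|").rstrip()
    text, sep, accession = rest.rpartition(" | ")
    if not sep:
        return None
    return ProvenanceRecord(url.strip(), text.strip(), accession.strip())
```

Lines that do not fit the assumed layout return `None`, so they can be collected and inspected by hand rather than silently misread.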
2) DATASET:
1000 Red Flag Names
(example)
Bignou, Etienne | Bignou | Etienne | Etienne Bignou |
Billiet, Director | Billiet | Director | Director Billiet |
Binder, Dr. Moritz Julius | Binder | Dr. Moritz Julius | Dr. Moritz Julius Binder |
Bing Collection | Bing Collection | | Bing Collection |
Birtschansky, Zacharie | Birtschansky | Zacharie | Zacharie Birtschansky |
Bisson, E. | Bisson | E. | E. Bisson |
Blanc, Pierre | Blanc | Pierre | Pierre Blanc |
Bleye, Willi | Bleye | Willi | Willi Bleye |
Bloch, Dr. Vitale | Bloch | Dr. Vitale | Dr. Vitale Bloch |
Bloch-Bauer Collection | | | Bloch-Bauer Collection |
Blot | Blot | | Blot |
Bode, Dr. | Bode | Dr. | Dr. Bode |
Bodenschatz, General Karl | Bodenschatz | General Karl | General Karl Bodenschatz |
Boedecker, Alfred | Boedecker | Alfred | Alfred Boedecker |
Boehler, Julius, Jr. | Boehler | Julius | Julius Boehler |
Boehler, Julius, Sr. | Boehler | Julius | Julius Boehler |
Boehm, Dr. Franz | Boehm | Dr. Franz | Dr. Franz Boehm |
Boehmer, Bernhard | Boehmer | Bernhard | Bernhard Boehmer |
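One possible way to match names found in a provenance text against this list is approximate string matching with an adjustable tolerance. The sketch below uses `difflib` from the Python standard library. It assumes the four-column layout shown above ("Last, First | Last | First | First Last |") and indexes only the two full-name columns, since a bare surname such as "Blanc" or "Blot" would match too much. The function names and the 0.9 cutoff are assumptions for illustration, not the method used in the project code.

```python
import difflib


def build_alias_index(rows):
    """Map each lower-cased full-name variant to its 'Last, First' entry."""
    index = {}
    for row in rows:
        cells = [c.strip() for c in row.split("|")]
        canonical = cells[0]
        if not canonical:
            continue
        # Columns 0 and 3 hold the "Last, First" and "First Last" forms.
        for i in (0, 3):
            if i < len(cells) and cells[i]:
                # Entries sharing a variant (Boehler Jr./Sr.) keep the first one.
                index.setdefault(cells[i].lower(), canonical)
    return index


def match_name(candidate, index, cutoff=0.9):
    """Return the red-flag entry closest to `candidate`, or None if none is close."""
    hits = difflib.get_close_matches(candidate.lower(), list(index), n=1, cutoff=cutoff)
    return index[hits[0]] if hits else None
```

Raising `cutoff` towards 1.0 tightens the tolerance, and adding further spellings of a name as extra keys in the index is one way to handle aliases.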
3) DATASET:
Red Flag Words or Phrases
(example)
flaguncertainty | flaganonymity | flagpuncutation | flagmove | flagreliability |
likely | private collector | ? | transfer | telephone |
probably | anonymous | [ | removed | to at least |
possibly | art market | | | until at least |
maybe | unidentified | | | by 19 |
? | unknown | | | before 19 |
| property of a European collector | | | according to |
| private collection | | | |
| property of a lady | | | |
| anon. | | | |
You're the doctor and the texts are your patients! Who's in good health and who's sick? How sick? With what disease? What kind of tests and measurements can we perform on the texts to help us reach a diagnosis? What kind of markers should we look for? What kind of patterns?
What digital methods can we use - and put into practice during the Hackathon - to "diagnose" these texts and prioritize them for "treatment" (i.e., additional provenance research)?
- IDENTIFY Red Flag Names and Words in each Text?
- COUNT Red Flag Names and Words in each Text?
- CHARACTERIZE each Text (number of words? sentiment? completeness vs. gaps? other features to be identified that may be useful)?
- ANALYZE for patterns, links and networks?
- CALCULATE probability that the provenance conceals a Nazi-era history that will prove problematic if investigated in detail
- RANK according to urgency for further in-depth provenance research
(Voyant-Tools Whitelist containing names:
keywords-14ca7131716f24c62c6529fcc143bbd2 )
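The IDENTIFY and COUNT steps for Red Flag Words could be sketched as follows. `RED_FLAGS` is a small excerpt copied from the Red Flag Words table above, and the function name `count_flags` is an assumption. Matching is case-insensitive and anchored only at the start of a phrase, so that "by 19" still matches "by 1932" while "likely" does not match inside "unlikely".

```python
import re

# Excerpt of the Red Flag Words table, keyed by its column names.
RED_FLAGS = {
    "flaguncertainty": ["likely", "probably", "possibly", "maybe", "?"],
    "flaganonymity": ["private collector", "anonymous", "art market",
                      "unidentified", "unknown", "private collection"],
    "flagmove": ["transfer", "removed"],
    "flagreliability": ["until at least", "by 19", "before 19", "according to"],
}


def count_flags(text, flags=RED_FLAGS):
    """Count occurrences of the flag phrases in `text`, per category."""
    lowered = text.lower()
    counts = {}
    for category, phrases in flags.items():
        total = 0
        for phrase in phrases:
            pattern = re.escape(phrase.lower())
            if phrase[0].isalnum():
                # Do not match in the middle of a longer word.
                pattern = r"(?<!\w)" + pattern
            total += len(re.findall(pattern, lowered))
        counts[category] = total
    return counts
```

The per-category counts can be written back as extra columns next to each provenance text and later combined into a single score.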
What might a successful result look like?
- A list of 50 provenances from the DATASET ranked most likely to conceal looted art
- A color-coded evaluation of each provenance (RED, ORANGE, GREEN) by likelihood of concealing looted art
- Instructions on how to analyze the provenances, including the tools, functions or code to use (for example, how to use Voyant-Tools to count all the Red Flag Names and inject the result back into the spreadsheet)
- Ideas for going further....
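The first two result formats listed above (a ranked top 50 and a RED, ORANGE, GREEN evaluation) could be produced from any numeric score per provenance. This is a minimal sketch only: how the score is computed is left open, and the thresholds `orange_at` and `red_at` are placeholder assumptions that would need tuning against known cases.

```python
def color_for(score, orange_at=2, red_at=5):
    """Translate a numeric score into a traffic-light label (placeholder thresholds)."""
    if score >= red_at:
        return "RED"
    if score >= orange_at:
        return "ORANGE"
    return "GREEN"


def rank_provenances(scored, top_n=50):
    """Sort (identifier, score) pairs by descending score and keep the top_n.

    Returns a list of (identifier, score, color) tuples.
    """
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    return [(ident, score, color_for(score)) for ident, score in ranked[:top_n]]
```

For example, `rank_provenances([("obj-a", 1), ("obj-b", 6), ("obj-c", 3)])` puts "obj-b" first with the label RED, followed by "obj-c" as ORANGE and "obj-a" as GREEN. The identifiers and scores here are invented for illustration.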
Triage: "assignment of degrees of urgency to wounds or illnesses to decide the order of treatment of a large number of patients or casualties."
Link to Glamhack2020 project for participants
https://hack.glam.opendata.ch/project/7
Help us to test and improve and test and improve the code!
Link to Code on Github
https://github.com/parisdata/GLAMhack2020
Issues to resolve:
- While the word list counts and general name extraction seem to work pretty well, reconciliation with the list of 1000 Red Flag Names still needs work. To test: A tighter tolerance combined with a more complete listing of aliases might help.
- The extraction of transaction years after 1900 gives interesting though incomplete indicators. The decision to exclude dates in parentheses (as they are sometimes biographical dates and not transaction dates) needs to be reviewed and refined.
(Note: the texts come from multiple sources using multiple formats and were deliberately entered exactly as is.)
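The year-extraction issue listed above can be explored with a small regular-expression sketch. It is not the code from the GitHub repository. The function name and the `skip_parentheses` switch are assumptions, added so that both behaviours can be compared on the same text.

```python
import re

# Four-digit years from 1900 to 2099 that are not part of a longer number.
YEAR = re.compile(r"(?<!\d)(?:19|20)\d{2}(?!\d)")
# A parenthesised span without nested parentheses.
PARENS = re.compile(r"\([^()]*\)")


def transaction_years(text, skip_parentheses=True):
    """Return the years found in `text`, optionally ignoring parenthesised spans."""
    if skip_parentheses:
        text = PARENS.sub(" ", text)
    return [int(y) for y in YEAR.findall(text)]
```

Applied to the fragment "June 16, 1906, no 30.]. Dr. Alfred Wolff, Munich, (1912). to Maurice Wertheim, New York (1937-1951) bequest, to Fogg Art Museum, 1951.", the default call returns 1906 and 1951 only, while `skip_parentheses=False` also returns 1912, 1937 and the second 1951. In that example the parenthesised dates look like ownership dates rather than biographical ones, which shows why the exclusion rule needs refining.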