Using technology to unveil the past

Published on Wednesday, 31 July 2024 by Russ Cam

Tracing one's family history can be an emotionally charged but rewarding journey, especially when it involves piecing together fragments from tumultuous times. My neighbour, who is of Yugoslavian/Serbian descent, embarked on such a quest to uncover what happened to her great-grandfather following the Second World War. Her great-grandfather had been displaced during the war and was believed to have died in a displacement camp in Austria. The search for answers led her to various archives and eventually to a significant breakthrough, with the help of technology.

The Displacement Camp in Austria

By the spring of 1945, approximately eight million people had been displaced from their homes by the war. By autumn 1945, the Allies had repatriated six to seven million of these people. The one million or so people who remained (approximately 250,000 of whom were Jewish) became a somewhat more permanent problem in postwar Europe (primarily in Germany). To care for these people and provide them with temporary residence, Displaced Persons’ (DP) camps were created.

Many of the DP camps were set up on the sites of former concentration camps or military barracks. Conditions inside were often unsanitary due to severe overcrowding and a lack of supplies in the post-war period.

Displacement camps were a grim reality for many families torn apart by the war. These camps housed people who had been uprooted from their homes and were trying to find their way in the post-war world. My neighbour's great-grandfather was one of these unfortunate souls. After being displaced from Yugoslavia, he found himself in one such camp in Austria. His fate, like that of many others, remained shrouded in mystery for decades.

A Letter from the Austrian Government

The journey to uncover the past took a significant turn when my neighbour received a letter from the Upper Austrian State Archives:

Letter from Upper Austrian State Archives

This letter contained a link to scanned documents pertaining to her great-grandfather, sparking a mix of emotions - hope, anxiety, and a deep sense of connection to her heritage. The documents were official records from the late 1950s, originating from the Upper Austrian State Archives. Here are a few examples:

Example document from Upper Austrian State Archives
Another example document from Upper Austrian State Archives

The excitement of having these documents in her possession was palpable, but there was a challenge: the documents were all written in German, which posed a language barrier.

Harnessing Technology for Translation

My neighbour doesn't speak German and so was not able to fully understand the historical and bureaucratic terminology used in the documents. After the local Facebook community pages yielded only tepid offers to help translate the documents, I wondered how effectively Large Language Models (LLMs) might bridge the gap between past and present. Specifically, I wondered how well OpenAI's Chat Completions API and GPT-4o model would perform in translating the documents from German to English.

I put together the following small program for the task:

using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

class Program
{
    private static readonly HttpClient client = new();
    private const string ApiUrl = "https://api.openai.com/v1/chat/completions";

    static async Task Main(string[] args)
    {
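        // Fail early with a clear message if the API key is not configured
        var apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY")
            ?? throw new InvalidOperationException("Set the OPENAI_API_KEY environment variable.");
        client.DefaultRequestHeaders.Add("Authorization", $"Bearer {apiKey}");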
        
        foreach (var file in Directory.EnumerateFiles("documents", "*.jpg"))
        {
            var image = Convert.ToBase64String(await File.ReadAllBytesAsync(file));
            var messages = new JArray
            {
                new JObject
                {
                    { "role", "system" },
                    { "content", "You are a helpful German translator. You only output direct, " + 
                                 "verbatim, word for word, German to English translation, and nothing else." }
                },
                new JObject
                {
                    { "role", "user" },
                    { "content", new JArray
                        {
                            new JObject
                            {
                                { "type", "text" },
                                { "text", "Translate the following image from German to English." }
                            },
                            new JObject
                            {
                                { "type", "image_url" },
                                // The registered MIME type for JPG files is image/jpeg
                                { "image_url", new JObject { { "url", $"data:image/jpeg;base64,{image}" } } }
                            }
                        }
                    }
                }
            };

            Console.WriteLine($"processing {file}");
            var (content, finishReason) = await GetChatCompletionAsync(messages);
            var contents = new List<string> { content };

            while (finishReason == "length")
            {
                Console.WriteLine($"more processing {file}");
                messages.Add(new JObject { { "role", "assistant" }, { "content", content } });
                (content, finishReason) = await GetChatCompletionAsync(messages);
                contents.Add(content);
            }

            // Join the chunks without separators: a cut-off can land mid-sentence
            await File.WriteAllTextAsync(Path.ChangeExtension(file, "txt"), string.Concat(contents));
            Console.WriteLine($"processed {file}");
        }
    }

    private static async Task<(string, string)> GetChatCompletionAsync(JArray messages)
    {
        var requestBody = new JObject
        {
            { "model", "gpt-4o" },
            { "temperature", 0.2 },
            { "messages", messages }
        };

        var requestContent = new StringContent(JsonConvert.SerializeObject(requestBody), Encoding.UTF8, "application/json");
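        var response = await client.PostAsync(ApiUrl, requestContent);
        // Throw on HTTP errors (e.g. 401 or 429) instead of parsing an error payload
        response.EnsureSuccessStatusCode();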
        var responseContent = await response.Content.ReadAsStringAsync();
        var responseObject = JObject.Parse(responseContent);
        var content = responseObject["choices"][0]["message"]["content"].ToString();
        var finishReason = responseObject["choices"][0]["finish_reason"].ToString();
        
        return (content, finishReason);
    }
}

The code does the following:

  1. enumerates through the scanned JPG document files in a directory
  2. sets the behaviour of the system using the system role to provide verbatim translation
  3. prompts the assistant to translate the image from German to English
  4. base64 encodes the JPG in order to send it to OpenAI
  5. sets the temperature to 0.2 to keep the output more focused and deterministic
  6. sends the request and checks whether the finish reason is length, which indicates that the response was cut off because a token limit was reached (the maximum output tokens or the model's context length), so the model may have more useful content to provide. If this is the case, the response is appended to the messages sent so far for this JPG document, and another request is made
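
The continuation logic in step 6 can be separated from the HTTP call, which makes it easy to try out without an API key. The following is a minimal sketch, not the program's actual structure: the helper name TranslateWithContinuationAsync, the tuple-based message list, and the maxRounds cap (a guard against looping indefinitely) are all assumptions introduced for illustration. The complete delegate stands in for GetChatCompletionAsync.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper: keeps requesting continuations while the reply reports
// finish_reason == "length", making at most maxRounds requests in total.
// `complete` stands in for the HTTP call (GetChatCompletionAsync in the program above).
static async Task<string> TranslateWithContinuationAsync(
    List<(string Role, string Content)> messages,
    Func<IReadOnlyList<(string Role, string Content)>, Task<(string Content, string FinishReason)>> complete,
    int maxRounds = 5)
{
    var parts = new List<string>();
    for (var round = 0; round < maxRounds; round++)
    {
        var (content, finishReason) = await complete(messages);
        parts.Add(content);
        if (finishReason != "length")
            break;
        // Feed the partial answer back so the next request continues from it
        messages.Add(("assistant", content));
    }
    // Concatenate without separators: a cut-off can land mid-sentence
    return string.Concat(parts);
}

// Stub that simulates one truncated reply followed by a complete one
var replies = new Queue<(string, string)>(new[]
{
    ("first half, ", "length"),
    ("second half", "stop"),
});
var conversation = new List<(string Role, string Content)>
{
    ("user", "Translate the following image from German to English.")
};
var translated = await TranslateWithContinuationAsync(
    conversation, _ => Task.FromResult(replies.Dequeue()));
Console.WriteLine(translated); // prints "first half, second half"
```

Capping the number of rounds is a design choice rather than something the API requires; without it, a model that repeatedly hits the limit would keep the loop running.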

In all, there were 78 scanned documents to translate. It's quite amazing that such a small amount of code (albeit with a very large and prohibitively expensive model on the receiving end 😃) can orchestrate optical character recognition (OCR) and machine translation so easily. What's also interesting is that the formatting of the output text mirrors that of the documents; the last example image translated to:

**Brzak Milan**

**Additional Personal Information:**

1) No pre-military training,
   served in the First World War from 1916 to 1918 in the 5th Honved Regiment in Subotica,
   1919 to 1921 in the 11th Art. Reg., 1st Division, 1st Det., in Skopje,
   Rank: Soldier,

2) does not belong to any political party,

3) no foreign travels,

4) On 3.3.1957 at 18:30 left his apartment together with SYNDIC Mikola by train to Marien
   Sobota, where he arrived on 4.3.1957 at 22:00 and stayed overnight in the "Sreine Hotel"
   named Sveda. On 12.3.1957 at 14:30 left there with the train to Maribor, from there on
   foot to the Austrian border and crossed it at 20:00 in the municipal area of Tauka, 
   Bez. Jennersdorf, Burgenland, undetected. Subsequently arrested in Gend. Reported
   position in Tauka,

5) without border guide,

6) Parents:
   Brzak Isidor, born. Date unknown, died in 1911,
   Brzak Tada, born. Date unknown, died in 1905,

7) Siblings:
   Brzak Sima, born 1893 in Novi Sad, farm worker, fell in 1914,
   Brzak Gjorgis, born 1905 in Novi Sad, died 1924,
   Brzak Gordana, born 1899 in Novi Sad, died in 1918,

**15.3.1957**

In total, it took about 20 minutes to process all 78 documents, followed by a summarization step to produce a chronological series of events.
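
The summarization step isn't shown here, so the following is only a guess at one way it could be approached: gather the translated text files into a single prompt, label each with its file name, and ask the model for a dated timeline. The helper name BuildSummaryPrompt, the prompt wording, and the ordering by file name are all assumptions made for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper: joins translated pages into one prompt, labelling each
// page with its file name so events can be traced back to a source document.
static string BuildSummaryPrompt(IEnumerable<(string Name, string Text)> pages)
{
    var prompt = new StringBuilder();
    // Assumed instruction wording; adjust to taste
    prompt.AppendLine("Summarize the following translated documents as a chronological list of dated events.");
    foreach (var (name, text) in pages.OrderBy(p => p.Name, StringComparer.Ordinal))
    {
        prompt.AppendLine();
        prompt.AppendLine($"--- {name} ---");
        prompt.AppendLine(text.Trim());
    }
    return prompt.ToString();
}

// Usage: read the .txt files written by the translation program, if present
var translatedPages = Directory.Exists("documents")
    ? Directory.EnumerateFiles("documents", "*.txt")
        .Select(f => (Name: Path.GetFileName(f), Text: File.ReadAllText(f)))
    : Enumerable.Empty<(string Name, string Text)>();
Console.WriteLine(BuildSummaryPrompt(translatedPages));
```

The resulting string could be sent as a user message in the same way as the translation request. With many long documents, the combined prompt may exceed the model's context length, in which case summarizing in batches would be needed.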

Discovering the Past

The translated documents revealed valuable information about my neighbour's great-grandfather, Milan Brzak.

Milan Brzak was born in Novi Sad, Yugoslavia (modern-day Serbia) on July 26, 1897. He was a locksmith and machinist, and married his wife, Vida Damjanov, in 1916. Milan's parents, Isidor Brzak and Tada Brzak (née Curcic), unfortunately died when Milan was relatively young, in 1911 and 1905 respectively. Milan had three siblings:

  • Sima Brzak
    • Born 1893 in Novi Sad
    • Died 1914
  • Gordana Brzak
    • Born 1899 in Novi Sad
    • Died 1918
  • Gjorgis Brzak
    • Born 1905 in Novi Sad
    • Died 1924

and five children:

  • Ognyanka Brzak
  • Stojanka Brzak
  • Milovan Brzak
    • Born 1925 in Novi Sad
    • Locksmith and mechanic
    • Last resided in Novi Sad, Vodnikova 14 wh.
  • Predrag Brzak
    • Born 1927 in Novi Sad
    • Mechanic
    • Immigrated to Australia
  • Dragojub Brzak

The documents don't reveal much about Milan's movements during the First and Second World Wars. He served in the First World War from 1916 to 1918 in the 5th Honved Regiment in Subotica (modern-day Serbia), and from 1919 to 1921 in the 11th Artillery Regiment, 1st Division, in Skopje (modern-day North Macedonia). He was a soldier with no pre-military training.

The documents focus on Milan's movements in the 1950s, after the Second World War. Because of poor economic conditions in Yugoslavia, and political persecution that had prevented him from working independently since 1950, he illegally crossed the Yugoslav-Austrian border near Tauka, Jennersdorf district, Burgenland, Austria on March 12, 1957. He hoped to immigrate to Australia, where his son Predrag, my neighbour's grandfather, had already emigrated. He was interrogated by the Security Authority in Jennersdorf, and moved to a displacement camp in Asten, Wohnsiedlung 117.

Despite his efforts, Milan Brzak faced bureaucratic hurdles and did not receive an entry permit to Australia, on the grounds that he was too old. Throughout his time in Austria, he moved between various districts and sought to establish a stable life, including attempts to immigrate to other countries, like Sweden. His interactions with Austrian administrative bodies illustrate the challenges faced by refugees in obtaining permits and establishing residency during this period. Milan was given approval to stay in Austria, issued a work permit, and found work as a locksmith.

The last date in the provided records is for a moving registration form from 13 September 1960.

Conclusion

In a world where many families continue to seek answers about their ancestors, advancements in technology offer new tools for deciphering the mysteries of old records.

