NLP Work

I don’t remember how often I heard “natural language is vague and ambiguous” during my studies. Believe me when I tell you it is one of the most frequently used phrases in computational linguistics lectures – there were many examples, and I never doubted it.


But – there is always a but – when I joined Balzano Informatik AG, started on the ScanDiags project and began working on the radiology reports, I thought “medical reports should have a clear language” and “my only challenge will be different structures”. Well… no. At this point, I think the structure is the smallest issue on my long list of challenges.

Radiology reports, like any other natural language text, are ambiguous and sometimes vague. Even worse, they are written in German, a language that loves variety and allows you, for example, to string words together to create new words. That is done a lot in the reports – and on top of that, they sometimes use English words (e.g. “disk” instead of the German word “Bandscheibe”).

Let me give you a small preview of the blog post:

Did you ever wonder why computational linguists always say “natural language is ambiguous and various,” as though that explains why their tasks are complicated? I mean, I get it: there are many different languages, and they have different structures and rules. But extracting pathologies from anonymized medical reports shouldn’t be that difficult. You know, because pathologies have official medical names. For example, “spinal disk herniation” is just that. Perhaps you need to check for the Latin name “prolapsus disci intervertebralis” too – because in medicine everyone loves Latin. But that’s it, right? Hmm... unless they’re pressed for time and shorten it to “disk herniation.” I could imagine that, because it is obvious that a disk herniation affects the spine and not the knee… And now that I think about it, I have seen some other versions, like “herniated disk” or “slipped disk.” Oh, and in British English you would write “disc” with a ‘c’ instead of a ‘k’! Never mind, I’m starting to see why natural language is called “various.”
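
To make the point concrete, here is a minimal sketch of what even a naive matcher for this single pathology might look like. It only covers the English and Latin variants mentioned above. The pattern and the function name `mentions_disk_herniation` are made up for illustration, not taken from the ScanDiags code, and real German reports would need far more than this.

```python
import re

# Illustrative pattern only: it covers just the variants named in the text.
# re.VERBOSE lets us lay out and comment the alternatives; whitespace in the
# pattern is ignored, so spaces between words are matched with \s+.
HERNIATION_PATTERN = re.compile(
    r"""
    \b(?:
        (?:spinal\s+)?dis[ck]\s+herniation     # spinal disk herniation, disc herniation
      | (?:herniated|slipped)\s+dis[ck]        # herniated disk, slipped disc
      | prolapsus\s+disci\s+intervertebralis   # Latin name
    )\b
    """,
    re.IGNORECASE | re.VERBOSE,
)


def mentions_disk_herniation(text: str) -> bool:
    """Return True if the text contains one of the known variants."""
    return HERNIATION_PATTERN.search(text) is not None


if __name__ == "__main__":
    for sentence in [
        "Spinal disk herniation at L4/L5.",
        "Findings suggest a slipped disc.",
        "No abnormalities of the knee.",
    ]:
        print(sentence, "->", mentions_disk_herniation(sentence))
```

Every new spelling, abbreviation or word order means another branch in the pattern, which is exactly why the list of challenges keeps growing.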

