Balzano Informatik AG
The more the merrier!

NLP Work

A few months ago, I started working at Balzano Informatik AG. Our big project at the moment, which I’m working on, is called ScanDiags. My part is to work with the radiology reports, trying to identify which pathologies apply to which body parts. That sounds easy, but many challenges have to be addressed when developing a labeling (or extraction) pipeline.


I was asked whether I would be interested in writing a blog post, or perhaps several over time, about the NLP challenges we face. Today, the first blog post was published, and I promise more will follow!


Here is a short excerpt from the blog post:

For our specific problem of identifying pathologies and other important issues on MRIs, this means we want to have as many MRIs as possible to train our networks. But to efficiently train our networks for specific pathologies, the network needs to know what pathologies are visible on the MRIs we provide during the training steps. This means that if we give an MRI as input during the training for disk herniations, we need to tell the network for every single MRI whether a disk herniation is visible or not. Easy, right? Theoretically perhaps, but in practice, not so much. Although we can collaborate with different hospitals to collect more MRI data, the information about the pathologies is written down in an unstructured, continuous text called a “radiology report.” The problem is that the networks we want to train to identify pathologies do not understand radiology reports. Therefore, we need to extract the information about pathologies out of these continuous texts and provide it in a structured form to the network. We call this process “label extraction” and use many different natural language processing (NLP) methods to build a powerful pipeline which takes the radiology reports as input and gives highly structured labels as output.
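To make the idea of label extraction a bit more concrete, here is a minimal Python sketch. It is only an illustration and not the actual ScanDiags pipeline: the keyword patterns, the negation cues, the label names, and the function `extract_labels` are all assumptions made up for this example. It simply looks for a pathology mention in each sentence and treats it as absent if a negation word comes before it.

```python
import re
from typing import Dict

# Hypothetical keyword patterns, one per label. A real pipeline would need
# much richer vocabularies (synonyms, abbreviations, other languages).
PATHOLOGY_PATTERNS = {
    "disk_herniation": re.compile(r"\bdis[ck] herniation\b", re.IGNORECASE),
    "spinal_stenosis": re.compile(r"\bspinal stenosis\b", re.IGNORECASE),
}

# Hypothetical negation cues; real reports use many more phrasings.
NEGATION = re.compile(r"\b(no|without|not)\b", re.IGNORECASE)


def extract_labels(report: str) -> Dict[str, bool]:
    """Turn a free-text report into one True/False label per pathology."""
    labels = {name: False for name in PATHOLOGY_PATTERNS}
    # Naive sentence split on whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", report):
        for name, pattern in PATHOLOGY_PATTERNS.items():
            match = pattern.search(sentence)
            if match is None:
                continue
            # The mention counts as negated if a negation cue appears
            # earlier in the same sentence.
            negated = NEGATION.search(sentence[: match.start()]) is not None
            if not negated:
                labels[name] = True
    return labels


report = "Disc herniation at L4/L5 with nerve root contact. No spinal stenosis."
labels = extract_labels(report)
```

For the example report, `labels` marks `disk_herniation` as present and `spinal_stenosis` as absent. Even this toy version hints at why the task is harder than it sounds: negations, spelling variants, and sentence boundaries all need careful handling.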


Read the full blog post on the ScanDiags webpage.