Voice Recognition in Healthcare

Voice recognition in healthcare is one of the industry's most exciting and promising development trends. Voice recognition works by recording spoken words, extracting the individual sounds, and matching them against an existing database to infer what is being communicated. The software is designed to operate much like a human listener: it records a conversation, interprets it, and matches each element of the conversation to certain data sets and to individual speakers. (That last step, identifying who is speaking, is the defining difference between voice recognition and speech recognition.)
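To make that matching step concrete, here is a deliberately oversimplified sketch in Python. The hand-built lexicon stands in for the sound database, and the "extracted sounds" are assumed to arrive as phoneme groups; real systems use acoustic models and statistical language models rather than exact lookups, so treat this purely as an illustration of the lookup idea.

```python
# Highly simplified sketch of the "match sounds to a database" step.
# The phoneme groups are assumed to have been extracted already; the
# lexicon below is invented for illustration.

LEXICON = {
    ("HH", "AY", "P", "ER"): "hyper",
    ("HH", "AY", "P", "OW"): "hypo",
    ("TH", "AY", "R", "OY", "D"): "thyroid",
}

def match_phonemes(phoneme_groups):
    """Map each extracted phoneme group to its lexicon entry, if any."""
    words = [LEXICON.get(tuple(group), "<unknown>") for group in phoneme_groups]
    return " ".join(words)

print(match_phonemes([["HH", "AY", "P", "OW"], ["TH", "AY", "R", "OY", "D"]]))
# -> "hypo thyroid"
```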

In healthcare, voice recognition is being used primarily as a dictation replacement, with clinicians leveraging voice recognition technology to streamline their clinical documentation processes. These advanced voice recognition tools rely on a host of AI subsets that allow them to classify medically relevant speech and extract meaning from natural speech (a capability known as natural language processing, or NLP).



Use Difficulties

The pros and cons of many of these emerging technologies are well documented, but understanding why voice recognition in healthcare faces challenges in the first place is important in shaping our expectations, our designs, and our goals for it, both as producers and consumers.

One of the most common challenges facing voice recognition in healthcare is simply that it isn't perfect. The stakes in healthcare are critically high, and providers know this better than anyone, so much of the conversation, and many of the perceived shortcomings, center on the technology's current inability to be 100 percent accurate.

As an example, if a clinician is using voice recognition to document a visit for a patient suffering from hypothyroidism and the tool instead incorrectly interprets and logs hyperthyroidism, that poses a treatment risk to the patient and a malpractice risk to the clinician. Fortunately, misidentification of this kind is rare, and it does not represent a foundational problem with the technology; rather, it is a temporary hurdle that can and will be overcome through rigorous design and proper data mapping standards.

Data Mapping

Data mapping is the process in which source data is connected, or "mapped," to target data categories by defining the relationships between the two. In simple data mapping systems, this process can be set up and monitored by humans. In more advanced settings, such as voice recognition in healthcare, AI and machine learning can be coupled with human monitoring to create highly advanced data maps. This is accomplished by training a data model that can predict the target location of any piece of source data. By allowing additional human oversight of these maps, mapping managers can step into the prediction model and remap certain data criteria, which allows the system to grow more intelligent and more accurate over time.
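A minimal sketch of that model-plus-human-override pattern might look like the following Python. Everything here is hypothetical: predict_target() stands in for a trained model, and OVERRIDES is the human-maintained remapping layer described above.

```python
# Minimal sketch of ML-assisted data mapping with human oversight.
# predict_target() stands in for a trained model; OVERRIDES is the
# human-maintained correction layer.

OVERRIDES = {}  # source term -> corrected target, set by mapping managers

def predict_target(source_term):
    """Stand-in for a trained model predicting a target category."""
    if "thyroid" in source_term.lower():
        return "Endocrine disorders"
    return "Unclassified"

def map_term(source_term):
    """Apply human overrides first, then fall back to the model."""
    if source_term in OVERRIDES:
        return OVERRIDES[source_term]
    return predict_target(source_term)

# A mapping manager spots a bad prediction and remaps it; corrected pairs
# can later be fed back into training so the model improves over time.
OVERRIDES["fatigue"] = "General symptoms"
print(map_term("hypothyroidism"))  # -> Endocrine disorders
print(map_term("fatigue"))         # -> General symptoms
```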

To address the specific example above regarding hypothyroidism and hyperthyroidism, data mapping teams might develop a system that automatically flags phonetically similar medical conditions whenever they are processed by a voice recognition tool. In a situation like this, a care provider might be notified of a potential inconsistency and prompted to confirm and sign off on the correct condition, so as to ensure accuracy, safety, and security.
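One way such a flag could work is sketched below. A production system would use proper phonetic encodings and a curated medical vocabulary; here, Python's standard-library string similarity is a rough stand-in, and the term list and threshold are invented for illustration.

```python
import difflib

# Hypothetical guard for phonetically similar medical conditions.
CONFUSABLE_THRESHOLD = 0.8
MEDICAL_TERMS = ["hypothyroidism", "hyperthyroidism",
                 "hypotension", "hypertension"]

def needs_confirmation(transcribed_term):
    """Flag a term if another known term is spelled almost identically."""
    for term in MEDICAL_TERMS:
        if term == transcribed_term:
            continue
        ratio = difflib.SequenceMatcher(None, transcribed_term, term).ratio()
        if ratio >= CONFUSABLE_THRESHOLD:
            return True, term
    return False, None

flagged, lookalike = needs_confirmation("hypothyroidism")
if flagged:
    # In practice this would trigger a sign-off prompt in the clinician's UI.
    print(f"Please confirm: 'hypothyroidism' or '{lookalike}'?")
```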

This is a great example of how intentional design practices can mitigate some of the use difficulties that face voice recognition in healthcare. It takes creative thinking, advanced systems, and a degree of human oversight to create a robust system that will eventually be smart enough to run almost entirely autonomously and with extreme accuracy.

Additionally, as these voice recognition systems become more advanced and more accurate, they open up a wide range of possibilities for combatting some of the more systemic issues facing healthcare.

Combatting Structural Racism in Healthcare

In June, DeepScribe co-founder and COO Matthew Ko wrote a guest article for KevinMD in which he unpacks structural racism in health tech. Ko argues that even though technology may help us overcome subconscious clinical bias, it's important to understand that the technological tools clinicians use can fall victim to the same embedded, subconscious biases that humans do.

“Some of the state-of-the-art dictation and transcription tools are trained using voices that feature General American English,” he writes. “As a result, they do a poor job recognizing the voices of clinicians who may have immigrated from other parts of the world that speak in a manner accented beyond what the software may recognize as General American English.”

At the end of the day, even the most advanced AI systems are designed by humans, and the inherent, subconscious biases of humans are often mirrored in those systems. It's the reason we've all heard of machine bias and the racism it can reflect. This isn't necessarily because designers are intentionally racist or bad people; it is often a reflection of homogeneous design teams and a lack of rigorous modeling standards.

With voice recognition in healthcare, combatting structural racism requires a calculated approach in which data sets are inclusive and representative. Modern voice recognition technology needs to be able to synthesize multiple neural network models in order to decipher, in this instance, multiple dialects and accents. If voice recognition in healthcare relies only on homogeneous samples, the technology will only serve that over-represented group.

Instead, we must design systems that represent an ultra-broad population, with training samples and data sets drawn from that entire population. Accomplishing that gives us a much better chance at combatting structural racism in health tech, specifically as it relates to voice recognition in healthcare.
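One concrete way a team might check for this kind of bias is to measure recognition accuracy separately for each speaker group in an evaluation set, as in the sketch below. The transcripts and group labels are invented for illustration; a real audit would use held-out recordings labeled by accent or dialect.

```python
# Sketch of a fairness check: word error rate (WER) per speaker group.

def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
    return d[len(ref)][len(hyp)] / len(ref)

# (reference transcript, system output, speaker group) - illustrative only
evaluation_set = [
    ("patient reports chest pain", "patient reports chest pain", "group_a"),
    ("patient reports chest pain", "patient report chess pain", "group_b"),
]

by_group = {}
for ref, hyp, group in evaluation_set:
    by_group.setdefault(group, []).append(word_error_rate(ref, hyp))

for group, scores in by_group.items():
    print(group, sum(scores) / len(scores))
# A large gap between groups signals that the training data
# under-represents some speakers and needs rebalancing.
```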

Emotion Recognition and Depression Detection

This phenomenon has been discussed before, but it's worth circling back to understand how we might continue to leverage AI and voice recognition to better serve both patients and clinicians.

Studies have found considerable vocal differences between people who are depressed and those who are not. Data from these studies suggest that depressed patients take more time to express themselves, speak with greater hesitancy, and pause more often in their speech, and the frequency and length of those markers were found to be indicative of depression severity.
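Those markers are straightforward to compute once a recognizer supplies word-level timing. The sketch below shows the idea; the timestamps and threshold are invented here, and a real pipeline would take them from a speech recognizer's alignment output.

```python
# Sketch of the vocal markers described above, computed from
# word-level timestamps (invented for illustration).

words = [  # (word, start_seconds, end_seconds)
    ("i", 0.0, 0.3),
    ("have", 1.4, 1.7),   # long hesitation before this word
    ("been", 1.8, 2.1),
    ("tired", 3.6, 4.2),  # another long pause
]

PAUSE_THRESHOLD = 0.5  # seconds of silence that counts as a pause

pauses = []
for (_, _, prev_end), (_, start, _) in zip(words, words[1:]):
    gap = start - prev_end
    if gap >= PAUSE_THRESHOLD:
        pauses.append(gap)

total_time = words[-1][2] - words[0][1]
print("pause count:", len(pauses))
print("mean pause length:", sum(pauses) / len(pauses))
print("speaking rate (words/sec):", len(words) / total_time)
```

Features like these would feed a downstream model rather than stand alone; the point is that the raw signals are already present in ordinary transcription output.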

Additionally, there is research suggesting a correlation between voice cadence and coronary artery disease, and between vocal pitch and diseases such as Parkinson's. It's still too early to suggest voice recognition will be the key to all under-diagnosed diseases in healthcare, but it's important to understand how this technology may shape the way we deliver care in the future, and how it may help produce more positive health outcomes for both patients and clinicians.

