New findings suggest that crowd work by dermatology non-experts could be an efficient way to demarcate areas of affected skin in clinical images, for use in research and in training image-recognition computer systems.
Conducted by researchers at Vanderbilt University Medical Center in Nashville, Tenn., and published online ahead of print in Skin Research and Technology (Feb. 20, 2019), the study sought a way around a limiting factor in dermatology research: the time and expense of having experts evaluate and quantify hundreds of photographs.
In a press release, the authors of the study note that research into causes and cures for skin conditions often requires reliably isolating and quantifying the proportion of affected skin in a large number of research subjects. This is typically a task performed by research dermatologists, working on large sets of relevant medical photographs stored by hospitals and clinics, digitally drawing a perimeter around affected areas.
However, “the time and expense involved in having experts endlessly pore over these images is a major impediment, and from one study or one expert to the next the consistency in the application of the relevant visual evaluation scales tends to be poor,” said lead author Dr. Eric Tkaczyk, assistant professor of dermatology and biomedical engineering at Vanderbilt, in the release.
While artificial intelligence systems may eventually be able to do some of this work at greatly reduced cost, those machine learning systems must first be 'trained' on large data sets of already analyzed and annotated images.
“A solution for economically generating the needed training sets could streamline research into a host of diseases and conditions, and benefit patient evaluation to boot,” said Dr. Tkaczyk. “We wondered, particularly with today's gig economy, what sorts of results might be achieved by giving non-experts a few pointers and letting them demarcate images in a web interface. How might pooled non-expert evaluations stack up against expert evaluation?”
To test this possibility, Dr. Tkaczyk and his team had 41 three-dimensional images from three patients with chronic graft-versus-host disease (cGVHD) delineated by a board-certified dermatologist. Then, 410 two-dimensional projections of the raw images were each annotated by seven crowd workers—medical students and nurses. The consensus performance of the crowd workers was then compared to the work of the expert.
The researchers evaluated the crowd's work pixel by pixel, discarding the annotations that highlighted the fewest and the most pixels. Across the 410 images, the median pixel-by-pixel agreement between the expert evaluation and the pooled evaluations of four crowd workers was 76%.
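The article does not spell out the exact pooling rule, so the following is only a minimal sketch of how such a consensus evaluation might work: rank each worker's binary annotation mask by how many pixels it highlights, discard the extremes, pool the remaining four by a per-pixel majority vote (an assumed consensus rule, not necessarily the authors' method), and score the result against the expert's mask.

```python
import numpy as np

def pooled_accuracy(crowd_masks, expert_mask, keep=4):
    """Pool crowd annotations and score them against an expert mask.

    crowd_masks: list of binary 2-D arrays, one per crowd worker
    expert_mask: binary 2-D array drawn by the dermatologist
    keep:        number of annotations to pool after discarding extremes

    NOTE: the study does not describe its exact pooling rule; this
    sketch discards the annotations with the fewest and the most
    highlighted pixels, then takes a per-pixel majority vote over
    the remaining `keep` annotations (an assumption).
    """
    # Rank annotations by how many pixels each worker highlighted
    ranked = sorted(crowd_masks, key=lambda m: m.sum())
    # Drop the extremes and keep the middle `keep` annotations
    middle = ranked[(len(ranked) - keep) // 2:][:keep]
    # Per-pixel majority vote across the kept annotations
    consensus = np.mean(middle, axis=0) >= 0.5
    # Fraction of pixels on which the consensus matches the expert
    return float(np.mean(consensus == expert_mask))
```

With seven annotations per image, as in the study, this keeps the four middle annotations and reports a single agreement fraction per image; the 76% figure reported above is the median of that fraction across all 410 images.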
“This places this group of crowd workers, as a collective, very much on a par with expert evaluation for cGVHD,” Dr. Tkaczyk said. “Our results establish that crowdsourcing could aid machine learning in this realm, which stands to benefit research and clinical evaluation of this disease.”