
Skin Cancer MNIST (a skin cancer dataset)

Published: 2024-03-05 22:43:14

Original text:

Skin Cancer MNIST: HAM10000

A large collection of multi-source dermatoscopic images of pigmented lesions.

Training of neural networks for automated diagnosis of pigmented skin lesions is hampered by the small size and lack of diversity of available datasets of dermatoscopic images. We tackle this problem by releasing the HAM10000 ("Human Against Machine with 10000 training images") dataset. We collected dermatoscopic images from different populations, acquired and stored by different modalities. The final dataset consists of 10015 dermatoscopic images which can serve as a training set for academic machine learning purposes. Cases include a representative collection of all important diagnostic categories in the realm of pigmented lesions: actinic keratoses and intraepithelial carcinoma / Bowen's disease (akiec), basal cell carcinoma (bcc), benign keratosis-like lesions (solar lentigines / seborrheic keratoses and lichen-planus like keratoses, bkl), dermatofibroma (df), melanoma (mel), melanocytic nevi (nv), and vascular lesions (angiomas, angiokeratomas, pyogenic granulomas and hemorrhage, vasc).
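
The seven abbreviations listed above can serve as class labels. Below is a minimal Python sketch of a lookup table and a per-class counter. It assumes the metadata is a CSV whose diagnosis column is named `dx`. That column name, the `count_diagnoses` helper, and the sample rows are assumptions for illustration, not real dataset records.

```python
import csv
import io
from collections import Counter

# Abbreviation -> diagnostic category, as listed in the dataset description.
DX_NAMES = {
    "akiec": "Actinic keratoses and intraepithelial carcinoma / Bowen's disease",
    "bcc": "Basal cell carcinoma",
    "bkl": "Benign keratosis-like lesions",
    "df": "Dermatofibroma",
    "mel": "Melanoma",
    "nv": "Melanocytic nevi",
    "vasc": "Vascular lesions",
}

def count_diagnoses(csv_file, dx_column="dx"):
    """Count rows per diagnosis code; reject codes outside the seven classes."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        code = row[dx_column].strip().lower()
        if code not in DX_NAMES:
            raise ValueError(f"unknown diagnosis code: {code!r}")
        counts[code] += 1
    return counts

# Made-up rows for illustration only (not real HAM10000 records).
SAMPLE = io.StringIO("image_id,dx\nimg_a,nv\nimg_b,nv\nimg_c,mel\n")
sample_counts = count_diagnoses(SAMPLE)
for code, n in sample_counts.most_common():
    print(f"{code:5s} {n:3d}  {DX_NAMES[code]}")
```

To run this on the real metadata, pass an open file handle instead of `SAMPLE`. The resulting counts show how unevenly the classes are represented, which is worth checking before training.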

More than 50% of lesions are confirmed through histopathology (histo); the ground truth for the rest of the cases is either follow-up examination (followup), expert consensus (consensus), or confirmation by in-vivo confocal microscopy (confocal). The dataset includes lesions with multiple images, which can be tracked by the lesion_id column within the HAM10000_metadata file.
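
Because one lesion can contribute several images, a random per-image split could put images of the same lesion in both the training and validation data. One way to avoid this is to split by lesion rather than by image. Below is a minimal sketch, assuming the metadata CSV has columns named `lesion_id` and `image_id`. The `split_by_lesion` helper, the 20% validation fraction, and the sample rows are illustrative assumptions, not part of the dataset or its official tooling.

```python
import csv
import io
import random
from collections import defaultdict

def split_by_lesion(csv_file, val_fraction=0.2, seed=0):
    """Split image ids into train/val so each lesion lands on only one side."""
    images_by_lesion = defaultdict(list)
    for row in csv.DictReader(csv_file):
        images_by_lesion[row["lesion_id"]].append(row["image_id"])

    lesion_ids = sorted(images_by_lesion)  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(lesion_ids)
    n_val = max(1, round(len(lesion_ids) * val_fraction))
    val_lesions = set(lesion_ids[:n_val])

    train, val = [], []
    for lesion_id, image_ids in images_by_lesion.items():
        (val if lesion_id in val_lesions else train).extend(image_ids)
    return train, val, images_by_lesion

# Made-up rows for illustration only (not real HAM10000 records).
SAMPLE = io.StringIO(
    "lesion_id,image_id\n"
    "les_1,img_a\nles_1,img_b\n"
    "les_2,img_c\n"
    "les_3,img_d\nles_3,img_e\n"
    "les_4,img_f\nles_5,img_g\n"
)
train_ids, val_ids, groups = split_by_lesion(SAMPLE)
print(len(train_ids), "training images,", len(val_ids), "validation images")
```

The split is made over lesion identifiers, so the validation share is only approximately `val_fraction` of the images when lesions have different numbers of images.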

The test set is not public, but the evaluation server remains running (see the challenge website). Any publications written using the HAM10000 data should be evaluated on the official test set hosted there, so that methods can be fairly compared.

Translation:

Skin Cancer MNIST: HAM10000

A large collection of multi-source dermatoscopic images of pigmented lesions.

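Training artificial neural networks for the automated diagnosis of pigmented skin lesions is hindered by the small size and limited diversity of available dermatoscopic image datasets. We address this by releasing the HAM10000 ("Human Against Machine with 10000 training images") dataset. We collected dermatoscopic images from different populations, acquired and stored in different ways. The final dataset contains 10015 dermatoscopic images that can serve as a training set for academic machine learning. The cases form a representative collection of all important diagnostic categories of pigmented lesions: actinic keratoses and intraepithelial carcinoma / Bowen's disease (akiec), basal cell carcinoma (bcc), benign keratosis-like lesions (solar lentigines / seborrheic keratoses and lichen-planus like keratoses, bkl), dermatofibroma (df), melanoma (mel), melanocytic nevi (nv), and vascular lesions (angiomas, angiokeratomas, pyogenic granulomas and hemorrhage, vasc).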

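More than 50% of the lesions are confirmed by histopathology (histo); for the remaining cases the ground truth is follow-up examination (followup), expert consensus (consensus), or confirmation by in-vivo confocal microscopy (confocal). The dataset includes lesions with multiple images, which can be tracked through the lesion_id column in the HAM10000_metadata file.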

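The test set is not public, but the evaluation server is still running (see the challenge website). Any publication that uses the HAM10000 data should be evaluated on the official test set so that methods can be compared fairly.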

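You can download the dataset from the official website; I have also shared a copy on Baidu Netdisk. Follow my WeChat official account and reply "2020101702" to get the download link.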

 
