# 2.3.7 工具 | 2021-一个很有想法的工具——Ikarus，想要在单细胞水平直接鉴定肿瘤细胞

## 前言

题目：Identifying tumor cells at the single cell level

日期：2021-10-25

期刊：biorxiv预印本

链接：<https://www.biorxiv.org/content/10.1101/2021.10.15.463909v2>

代码: [https://github.com/BIMSBbioinfo/ikarus](#qian-yan)

图表复现: [https://github.com/BIMSBbioinfo/ikarus---auxiliary](#qian-yan)

## 设计初衷

目前主要有三类细胞注释工具：

* manifold matching software：aligns the shape of an unlabeled dataset to an expertly labeled dataset
* deep learning based tools：model the batch effects through latent space embedding
* gene set based classifiers：使用已知的marker辅助分类

目前使用最多的应该是使用marker gene辅助

作者提出一个问题：能否做出一个分类器（**重点就是选出一系列基因**），可以直接从一群细胞中区分出tumor和normal？这样可以帮助肿瘤抗原的自动预测，帮助检查肿瘤细胞状态，还可以帮助诊断病理组织

## 设计思路

Ikarus工具为了检查肿瘤细胞，设计了2步：

* 首先还是根据比较靠谱的，经过专家审核的单细胞数据集，从中找到一些共有的肿瘤细胞基因signature
* 构建逻辑回归分类器 + network-based propagation of cell labels using a custom built cell-cell network

作者先整合了结直肠癌+肺癌的10X数据，其中数据注释了细胞是否是癌细胞。接下来寻找癌细胞相关的marker gene：

* 每个数据差异分析，然后取交集（**但是不同癌症的marker基因直接取交集是否有意义呢？那些特异性的gene岂不是被抛弃？**）=> 得到162 个交集基因集
* 同理选出正常组织的marker基因，它包含两部分：cell type specific markers & genes which are specifically depleted in the tumor cells

![image-20211028111814469](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2021-10-28-031814.png)

## 验证

找几个数据集验证一下tumor和normal 各自的基因集，找了5种癌症类型的patient-derived xenograft (PDX)、cancer cell line encyclopedia (CCLE)，发现 tumor signature score was significantly higher than the normal signature score

![image-20211028112352974](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2021-10-28-032353.png)

用几个测试数据集比较了不同方法区分tumor和normal细胞，包括标准的机器学习（SVM, random forest, and logistic regression）、SingleCellNet、ACTINN。发现**Ikarus的平均准确度可以达到0.98（A图），并且AUROC也很高**。

使用Lambrechts lung cancer dataset（图DE），看到Ikarus也能基本得到和作者一样的肿瘤细胞分布

![image-20211028122522894](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2021-10-28-042523.png)

### **测试一下如果没有足够的基因，或者缺少某几个关键基因，Ikarus还能预测准确吗**

* 图A：当然，使用的tumor gene 数量越多越准确
* 图B-D：当缺少serum amyloid A (SAA1) 和fibrinogen beta chain (FGB)，会导致准确度大大降低（**也就是说这个工具还是很依赖关键的marker gene的，因此前期还是要筛选合适的marker基因作为输入**）；当然作者说只在Lambrechts这个数据集中发现了这个情况，其他没发现

![image-20211028123705043](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2021-10-28-043705.png)

### **作者认为自己的signature找的很有效，于是看看它们具体有哪些特性**

发现了

* 在各个数据中，这些基因之间的Pearson相关性竟然大部分接近0
* tumor gene signature is partially related to cell cycle, and DNA replication（C图）
* tumor gene signature preferentially overlapped with the cell cycle hallmark（D图）

![image-20211028124223866](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2021-10-28-044223.png)

### **看一下筛出来的基因集和预后的联系**

* **more than 75%** of tumor signature genes are predictive of unfavourable prognosis in **at least one cancer** type
* 在5种癌症中（liver, renal, pancreatic, lung and endometrial）这个基因集对预后不利
* 与CNV： significantly **higher overlap with the known CNV regions** in the majority of profiled cancer types

![image-20211028125011561](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2021-10-28-045011.png)


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://jieandze1314.osca.top/02/2.3.7-gong-ju-2021-yi-ge-hen-you-xiang-fa-de-gong-ju-ikarus-xiang-yao-zai-dan-xi-bao-shui-ping-zhi-j.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
