# 4.10 实战十 | CEL-seq | 小鼠造血干细胞

## 1 前言

数据来自[Grun et al. 2016](https://pubmed.ncbi.nlm.nih.gov/27345837/)的小鼠造血干细胞 haematopoietic stem cell (HSC) ，使用的技术是CEL-seq

### **数据准备**

```r
library(scRNAseq)
sce.grun.hsc <- GrunHSCData(ensembl=TRUE)
sce.grun.hsc
# class: SingleCellExperiment 
# dim: 21817 1915 
# metadata(0):
#   assays(1): counts
# rownames(21817): ENSMUSG00000109644
# ENSMUSG00000007777 ... ENSMUSG00000055670
# ENSMUSG00000039068
# rowData names(3): symbol chr originalName
# colnames(1915): JC4_349_HSC_FE_S13_
# JC4_350_HSC_FE_S13_ ...
# JC48P6_1203_HSC_FE_S8_
# JC48P6_1204_HSC_FE_S8_
# colData names(2): sample protocol
# reducedDimNames(0):
#   altExpNames(0):

table(sce.grun.hsc$sample)
# 
# JC20   JC21   JC26   JC27   JC28   JC30   JC32 
# 87     96     85     91     80     96     93 
# JC35   JC36   JC37   JC39    JC4   JC40   JC41 
# 96     80     87     93     84     96     94 
# JC43   JC44   JC45   JC46 JC48P4 JC48P6 JC48P7 
# 92     94     90     96     95     96     94
```

### **ID转换**

```r
library(AnnotationHub)
ens.mm.v97 <- AnnotationHub()[["AH73905"]]
anno <- select(ens.mm.v97, keys=rownames(sce.grun.hsc), 
               keytype="GENEID", columns=c("SYMBOL", "SEQNAME"))

# 这里全部对应
> sum(is.na(anno$SYMBOL))
[1] 0
> sum(is.na(anno$SEQNAME))
[1] 0

# 接下来只需要匹配顺序即可
rowData(sce.grun.hsc) <- anno[match(rownames(sce.grun.hsc), anno$GENEID),]

sce.grun.hsc
## class: SingleCellExperiment 
## dim: 21817 1915 
## metadata(0):
## assays(1): counts
## rownames(21817): ENSMUSG00000109644 ENSMUSG00000007777 ...
##   ENSMUSG00000055670 ENSMUSG00000039068
## rowData names(3): GENEID SYMBOL SEQNAME
## colnames(1915): JC4_349_HSC_FE_S13_ JC4_350_HSC_FE_S13_ ...
##   JC48P6_1203_HSC_FE_S8_ JC48P6_1204_HSC_FE_S8_
## colData names(2): sample protocol
## reducedDimNames(0):
## altExpNames(0):
```

## 2 质控

**依然是备份一下，把unfiltered数据主要用在质控的探索上**

```r
unfiltered <- sce.grun.hsc
```

**发现这个数据既没有MT也没有ERCC**

```r
grep('MT',rowData(sce.grun.hsc)$SEQNAME)
# integer(0)
```

能用的数据只有其中的`protocol`了，它表示细胞提取方法

```r
table(sce.grun.hsc$protocol)
# 
# micro-dissected cells 
# 1546 
# sorted hematopoietic stem cells 
# 369 

# 再看一下各个样本与提取方法的对应关系
table(sce.grun.hsc$protocol,sce.grun.hsc$sample)
```

![](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2020-07-21-111940.png)

根据背景知识，**大部分显微操作（micro-dissected）得到的细胞很多质量都较低**，和我们的质控假设相违背，于是这里就不把它们纳入过滤条件

```r
library(scater)
stats <- perCellQCMetrics(sce.grun.hsc)
# 只用sorted hematopoietic stem cells 计算过滤条件
qc <- quickPerCellQC(stats, batch=sce.grun.hsc$protocol,
    subset=grepl("sorted", sce.grun.hsc$protocol))

colSums(as.matrix(qc))
##   low_lib_size low_n_features        discard 
##            465            482            488

sce.grun.hsc <- sce.grun.hsc[,!qc$discard]
```

**做个图看看**

```r
colData(unfiltered) <- cbind(colData(unfiltered), stats)
unfiltered$discard <- qc$discard

gridExtra::grid.arrange(
    plotColData(unfiltered, y="sum", x="sample", colour_by="discard", 
        other_fields="protocol") + scale_y_log10() + ggtitle("Total count") +
        facet_wrap(~protocol),
    plotColData(unfiltered, y="detected", x="sample", colour_by="discard",
        other_fields="protocol") + scale_y_log10() + 
        ggtitle("Detected features") + facet_wrap(~protocol),
    ncol=1
)
```

可以看到，大多数的显微操作技术得到的细胞文库都比较小，相比于细胞分选方法，它在提取过程中对细胞损伤较大

![](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2020-07-21-112248.png)

## 3 归一化

使用去卷积方法

```r
library(scran)
set.seed(101000110)
clusters <- quickCluster(sce.grun.hsc)
sce.grun.hsc <- computeSumFactors(sce.grun.hsc, clusters=clusters)
sce.grun.hsc <- logNormCounts(sce.grun.hsc)
```

## 4 找表达量高变化基因

这里没有指定任何的批次，因为想保留这两种技术产生的任何差异

```r
set.seed(00010101)
dec.grun.hsc <- modelGeneVarByPoisson(sce.grun.hsc) 
top.grun.hsc <- getTopHVGs(dec.grun.hsc, prop=0.1)
```

做个图

```r
plot(dec.grun.hsc$mean, dec.grun.hsc$total, pch=16, cex=0.5,
    xlab="Mean of log-expression", ylab="Variance of log-expression")
curfit <- metadata(dec.grun.hsc)
curve(curfit$trend(x), col='dodgerblue', add=TRUE, lwd=2)
```

**看到这个线有点“太平缓”**，和之前见过的都不一样，感觉“中间少了一个峰”。**这是因为细胞中的基因表达量都比较低**，差别也不大【大家一起贫穷，于是贫富差距很小】，所以大部分细胞在纵坐标（衡量变化的方差）上体现不出来差距，也就导致了拟合的曲线不会有“峰”

> 可能会想，那为什么不是大家表达量都很高呢（大家都很富有，贫富差距不是也很小吗）？因为横坐标可以看到，从0-3.5，这个范围对于表达量来说确实很小，之前做的图有的都大于10、15

![](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2020-07-21-113627.png)

## 5 降维聚类

### **降维就采取最基础的方式：**

```r
set.seed(101010011)
sce.grun.hsc <- denoisePCA(sce.grun.hsc, technical=dec.grun.hsc, subset.row=top.grun.hsc)
sce.grun.hsc <- runTSNE(sce.grun.hsc, dimred="PCA")

# 检查PC的数量
ncol(reducedDim(sce.grun.hsc, "PCA"))
## [1] 9
```

### **聚类**

```r
snn.gr <- buildSNNGraph(sce.grun.hsc, use.dimred="PCA")
colLabels(sce.grun.hsc) <- factor(igraph::cluster_walktrap(snn.gr)$membership)

table(colLabels(sce.grun.hsc))
## 
##   1   2   3   4   5   6   7   8   9  10  11  12 
## 259 148 221 103 177 108  48 122  98  63  62  18
```

### **作图**

```r
short <- ifelse(grepl("micro", sce.grun.hsc$protocol), "micro", "sorted")
gridExtra:::grid.arrange(
    plotTSNE(sce.grun.hsc, colour_by="label"),
    plotTSNE(sce.grun.hsc, colour_by=I(short)),
    ncol=2
)
```

由于没有去除两个技术批次的差异，所以这里分的很开

![](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2020-07-21-114640.png)

## 6 找marker基因

```r
markers <- findMarkers(sce.grun.hsc, test.type="wilcox", direction="up",
    row.data=rowData(sce.grun.hsc)[,"SYMBOL",drop=FALSE])
```

检查一下cluster6的marker基因

```r
chosen <- markers[['6']]
best <- chosen[chosen$Top <= 10,]
length(best)
# [1] 16

# 将cluster6与其他clusters对比的AUC结果提取出来
aucs <- getMarkerEffects(best, prefix="AUC")
rownames(aucs) <- best$SYMBOL

library(pheatmap)
pheatmap(aucs, color=viridis::plasma(100))
```

看到溶菌酶相关基因（LYZ家族）、Camp、 Lcn2、 Ltf 都上调，表明**cluster6可能是神经元起源细胞**

![](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2020-07-21-115323.png)


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://jieandze1314.osca.top/04/04-10.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
