# 4.13 实战十三 | 10X | 小鼠乳腺上皮细胞

## 1 前言

数据来自[Bach et al. (2017)](https://pubmed.ncbi.nlm.nih.gov/29225342/)，使用的是妊娠期小鼠乳腺上皮细胞 + 10X技术建库

### **数据准备**

```r
library(scRNAseq)
sce.mam <- BachMammaryData(samples="G_1")
sce.mam
# class: SingleCellExperiment 
# dim: 27998 2915 
# metadata(0):
#   assays(1): counts
# rownames: NULL
# rowData names(2): Ensembl Symbol
# colnames: NULL
# colData names(3): Barcode Sample Condition
# reducedDimNames(0):
#   altExpNames(0):
```

### **数据初探**

```r
# 样本信息
sapply(names(colData(sce.mam)), function(x) head(colData(sce.mam)[,x]))
# Barcode              Sample Condition  
# [1,] "AAACCTGAGGATGCGT-1" "G_1"  "Gestation"
# [2,] "AAACCTGGTAGTAGTA-1" "G_1"  "Gestation"
# [3,] "AAACCTGTCAGCATGT-1" "G_1"  "Gestation"
# [4,] "AAACCTGTCGTCCGTT-1" "G_1"  "Gestation"
# [5,] "AAACGGGCACGAAATA-1" "G_1"  "Gestation"
# [6,] "AAACGGGCAGACGCTC-1" "G_1"  "Gestation"
```

### **ID转换**

依然是整合行名 + 添加染色体信息

```r
library(scater)
rownames(sce.mam) <- uniquifyFeatureNames(
    rowData(sce.mam)$Ensembl, rowData(sce.mam)$Symbol)

library(AnnotationHub)
ens.mm.v97 <- AnnotationHub()[["AH73905"]]
rowData(sce.mam)$SEQNAME <- mapIds(ens.mm.v97, keys=rowData(sce.mam)$Ensembl,
    keytype="GENEID", column="SEQNAME")

# 总共有13个线粒体基因
sum(grepl("MT",rowData(sce.mam)$SEQNAME))
# [1] 13
```

## 2 质控

**依然是备份一下，把unfiltered数据主要用在质控的探索上**

```r
unfiltered <- sce.mam
```

使用线粒体信息进行过滤

```r
is.mito <- rowData(sce.mam)$SEQNAME == "MT"
stats <- perCellQCMetrics(sce.mam, subsets=list(Mito=which(is.mito)))
qc <- quickPerCellQC(stats, percent_subsets="subsets_Mito_percent")

colSums(as.matrix(qc))
##              low_lib_size            low_n_features high_subsets_Mito_percent 
##                         0                         0                       143 
##                   discard 
##                       143
```

作图

```r
colData(unfiltered) <- cbind(colData(unfiltered), stats)
unfiltered$discard <- qc$discard

gridExtra::grid.arrange(
    plotColData(unfiltered, y="sum", colour_by="discard") + 
        scale_y_log10() + ggtitle("Total count"),
    plotColData(unfiltered, y="detected", colour_by="discard") + 
        scale_y_log10() + ggtitle("Detected features"),
    plotColData(unfiltered, y="subsets_Mito_percent", 
        colour_by="discard") + ggtitle("Mito percent"),
    ncol=3
)
```

![](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2020-07-21-150405.png)

再看看线粒体含量与文库大小的关系

```r
plotColData(unfiltered, x="sum", y="subsets_Mito_percent", 
    colour_by="discard") + scale_x_log10()
```

![](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2020-07-21-150514.png)

**最后过滤**

```r
dim(unfiltered);dim(sce.mam)
# [1] 27998  2915
# [1] 27998  2772
```

## 3 归一化

使用去卷积的方法

```r
library(scran)
set.seed(101000110)
clusters <- quickCluster(sce.mam)
sce.mam <- computeSumFactors(sce.mam, clusters=clusters)
sce.mam <- logNormCounts(sce.mam)
```

## 4 找高变异基因

这里由于是10X的数据，所以会有UMI信息，因此可以用基于泊松分布的模型构建方法

```r
set.seed(00010101)
dec.mam <- modelGeneVarByPoisson(sce.mam)
top.mam <- getTopHVGs(dec.mam, prop=0.1)
```

最后做个图

```r
plot(dec.mam$mean, dec.mam$total, pch=16, cex=0.5,
    xlab="Mean of log-expression", ylab="Variance of log-expression")
curfit <- metadata(dec.mam)
curve(curfit$trend(x), col='dodgerblue', add=TRUE, lwd=2)
```

![](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2020-07-21-150855.png)

## 5 降维聚类

### **降维**

```r
library(BiocSingular)
set.seed(101010011)
sce.mam <- denoisePCA(sce.mam, technical=dec.mam, subset.row=top.mam)
sce.mam <- runTSNE(sce.mam, dimred="PCA")

# 检查PC的数量
ncol(reducedDim(sce.mam, "PCA"))
## [1] 15
```

### **聚类**

> 有一个很重要的参数是`k` ，含义是：the number of nearest neighbors used to construct the graph。如果k设置越大，得到的图之间联通程度越高，cluster也越大。因此这个参数也是可以不断尝试的

我们这里由于细胞数量比较多，所以设置的k就比较大，得到的cluster就少而大

```r
snn.gr <- buildSNNGraph(sce.mam, use.dimred="PCA", k=25)
colLabels(sce.mam) <- factor(igraph::cluster_walktrap(snn.gr)$membership)

table(colLabels(sce.mam))
## 
##   1   2   3   4   5   6   7   8   9  10 
## 550 799 716 452  24  84  52  39  32  24
```

### 作图

```r
plotTSNE(sce.mam, colour_by="label")
```

![](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2020-07-21-151128.png)


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://jieandze1314.osca.top/04/04-13.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
