# 4.12 实战十二 | 10X | 小鼠嵌合体胚胎

## 1 前言

数据来自[Pijuan-Sala et al. (2019)](https://pubmed.ncbi.nlm.nih.gov/30787436/)，研究的是小鼠E8.5发育阶段的嵌合胚胎

**数据准备**

```r
# 自己下载
library(MouseGastrulationData)
sce.chimera <- WTChimeraData(samples=5:10)

# 或者加载之前分享的数据
load('sce.chimera.RData')
sce.chimera
## class: SingleCellExperiment 
## dim: 29453 20935 
## metadata(0):
## assays(1): counts
## rownames(29453): ENSMUSG00000051951 ENSMUSG00000089699 ...
##   ENSMUSG00000095742 tomato-td
## rowData names(2): ENSEMBL SYMBOL
## colnames(20935): cell_9769 cell_9770 ... cell_30702 cell_30703
## colData names(11): cell barcode ... doub.density sizeFactor
## reducedDimNames(2): pca.corrected.E7.5 pca.corrected.E8.5
## altExpNames(0):

names(colData(sce.chimera))
# [1] "cell"            "barcode"        
# [3] "sample"          "stage"          
# [5] "tomato"          "pool"           
# [7] "stage.mapped"    "celltype.mapped"
# [9] "closest.cell"    "doub.density"
```

**简单看一下colData中的各个信息**

![](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2020-07-21-135013.png)

其中包含了6个样本的信息，总共20935个细胞

```r
table(sce.chimera$sample)
# 
# 5    6    7    8    9   10 
# 2298 1026 2740 2904 4057 6401
```

**整合行名**

```r
library(scater)
rownames(sce.chimera) <- uniquifyFeatureNames(
    rowData(sce.chimera)$ENSEMBL, rowData(sce.chimera)$SYMBOL)
```

## 2 简单质控

之前作者已经对数据进行了质控，并把细胞做了标志，这里只需要把标记“stripped”、“Doublet”的细胞去掉即可

```r
drop <- sce.chimera$celltype.mapped %in% c("stripped", "Doublet")
table(drop)
# drop
# FALSE  TRUE 
# 19426  1509 
sce.chimera <- sce.chimera[,!drop]
```

## 3 归一化

看到原来数据中也计算了size factors，那么这里就不需要计算，直接应用

```r
sce.chimera <- logNormCounts(sce.chimera)
```

## 4 找表达量高变化基因

我们的数据有6个样本，可以说异质性非常高了。把它们当做不同的批次信息，并尽可能多地从中保存基因

```r
library(scran)
dec.chimera <- modelGeneVar(sce.chimera, block=sce.chimera$sample)
chosen.hvgs <- dec.chimera$bio > 0
table(chosen.hvgs)
# chosen.hvgs
# FALSE  TRUE 
# 14754 14699
```

## 5 数据整合并矫正批次效应

使用了一种“层次整合”的方法，就是先将同种表型样本整合起来（比如3个处理和3个对照先内部整合），再将不同表型的样本组合（将处理和对照整合）

> correctExperiments的含义是：Apply a correction to multiple SingleCellExperiment objects,

```r
library(batchelor)
set.seed(01001001)
# 下面的merge.order就设置了整合的顺序
merged <- correctExperiments(sce.chimera, 
    batch=sce.chimera$sample, 
    subset.row=chosen.hvgs,
    PARAM=FastMnnParam(
        merge.order=list(
            list(1,3,5), # WT (3 replicates)
            list(2,4,6)  # td-Tomato (3 replicates)
        )
    )
)
```

看下结果：**`lost.var` 值越大表示丢失的真实生物异质性越多**

```r
metadata(merged)$merge.info$lost.var
##              5         6         7         8        9       10
## [1,] 0.000e+00 0.0204433 0.000e+00 0.0169567 0.000000 0.000000
## [2,] 0.000e+00 0.0007389 0.000e+00 0.0004409 0.000000 0.015474
## [3,] 3.090e-02 0.0000000 2.012e-02 0.0000000 0.000000 0.000000
## [4,] 9.024e-05 0.0000000 8.272e-05 0.0000000 0.018047 0.000000
## [5,] 4.321e-03 0.0072518 4.124e-03 0.0078280 0.003831 0.007786
```

Large proportions of lost variance **(>10%)** suggest that correction is **removing genuine biological heterogeneity.**

## 6 聚类

```r
g <- buildSNNGraph(merged, use.dimred="corrected")
clusters <- igraph::cluster_louvain(g)
colLabels(merged) <- factor(clusters$membership)
```

**看分群与细胞类型之间关系**

```r
tab <- table(Cluster=colLabels(merged), Sample=merged$sample)
library(pheatmap)
pheatmap(log10(tab+10), color=viridis::viridis(100))
```

![](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2020-07-21-142727.png)

## 7 降维

```r
merged <- runTSNE(merged, dimred="corrected")
merged <- runUMAP(merged, dimred="corrected")

gridExtra::grid.arrange(
  plotTSNE(merged, colour_by="label", text_by="label", text_col="red"),
  plotTSNE(merged, colour_by="batch"),
  ncol=2
)
```

![](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2020-07-21-143149.png)
