# 4.12 实战十二 | 10X | 小鼠嵌合体胚胎

## 1 前言

数据来自[Pijuan-Sala et al. (2019)](https://pubmed.ncbi.nlm.nih.gov/30787436/)，研究的是小鼠E8.5发育阶段的嵌合胚胎

**数据准备**

```r
# 自己下载
library(MouseGastrulationData)
sce.chimera <- WTChimeraData(samples=5:10)

# 或者加载之前分享的数据
load('sce.chimera.RData')
sce.chimera
## class: SingleCellExperiment 
## dim: 29453 20935 
## metadata(0):
## assays(1): counts
## rownames(29453): ENSMUSG00000051951 ENSMUSG00000089699 ...
##   ENSMUSG00000095742 tomato-td
## rowData names(2): ENSEMBL SYMBOL
## colnames(20935): cell_9769 cell_9770 ... cell_30702 cell_30703
## colData names(11): cell barcode ... doub.density sizeFactor
## reducedDimNames(2): pca.corrected.E7.5 pca.corrected.E8.5
## altExpNames(0):

names(colData(sce.chimera))
# [1] "cell"            "barcode"        
# [3] "sample"          "stage"          
# [5] "tomato"          "pool"           
# [7] "stage.mapped"    "celltype.mapped"
# [9] "closest.cell"    "doub.density"
```

**简单看一下colData中的各个信息**

![](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2020-07-21-135013.png)

其中包含了6个样本的信息，总共20935个细胞

```r
table(sce.chimera$sample)
# 
# 5    6    7    8    9   10 
# 2298 1026 2740 2904 4057 6401
```

**整合行名**

```r
library(scater)
rownames(sce.chimera) <- uniquifyFeatureNames(
    rowData(sce.chimera)$ENSEMBL, rowData(sce.chimera)$SYMBOL)
```

## 2 简单质控

之前作者已经对数据进行了质控，并把细胞做了标志，这里只需要把标记“stripped”、“Doublet”的细胞去掉即可

```r
drop <- sce.chimera$celltype.mapped %in% c("stripped", "Doublet")
table(drop)
# drop
# FALSE  TRUE 
# 19426  1509 
sce.chimera <- sce.chimera[,!drop]
```

## 3 归一化

看到原来数据中也计算了size factors，那么这里就不需要计算，直接应用

```r
sce.chimera <- logNormCounts(sce.chimera)
```

## 4 找表达量高变化基因

我们的数据有6个样本，可以说异质性非常高了。把它们当做不同的批次信息，并尽可能多地从中保存基因

```r
library(scran)
dec.chimera <- modelGeneVar(sce.chimera, block=sce.chimera$sample)
chosen.hvgs <- dec.chimera$bio > 0
table(chosen.hvgs)
# chosen.hvgs
# FALSE  TRUE 
# 14754 14699
```

## 5 数据整合并矫正批次效应

使用了一种“层次整合”的方法，就是先将同种表型样本整合起来（比如3个处理和3个对照先内部整合），再将不同表型的样本组合（将处理和对照整合）

> correctExperiments的含义是：Apply a correction to multiple SingleCellExperiment objects,

```r
library(batchelor)
set.seed(01001001)
# 下面的merge.order就设置了整合的顺序
merged <- correctExperiments(sce.chimera, 
    batch=sce.chimera$sample, 
    subset.row=chosen.hvgs,
    PARAM=FastMnnParam(
        merge.order=list(
            list(1,3,5), # WT (3 replicates)
            list(2,4,6)  # td-Tomato (3 replicates)
        )
    )
)
```

看下结果：**`lost.var` 值越大表示丢失的真实生物异质性越多**

```r
metadata(merged)$merge.info$lost.var
##              5         6         7         8        9       10
## [1,] 0.000e+00 0.0204433 0.000e+00 0.0169567 0.000000 0.000000
## [2,] 0.000e+00 0.0007389 0.000e+00 0.0004409 0.000000 0.015474
## [3,] 3.090e-02 0.0000000 2.012e-02 0.0000000 0.000000 0.000000
## [4,] 9.024e-05 0.0000000 8.272e-05 0.0000000 0.018047 0.000000
## [5,] 4.321e-03 0.0072518 4.124e-03 0.0078280 0.003831 0.007786
```

Large proportions of lost variance **(>10%)** suggest that correction is **removing genuine biological heterogeneity.**

## 6 聚类

```r
g <- buildSNNGraph(merged, use.dimred="corrected")
clusters <- igraph::cluster_louvain(g)
colLabels(merged) <- factor(clusters$membership)
```

**看分群与细胞类型之间关系**

```r
tab <- table(Cluster=colLabels(merged), Sample=merged$sample)
library(pheatmap)
pheatmap(log10(tab+10), color=viridis::viridis(100))
```

![](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2020-07-21-142727.png)

## 7 降维

```r
merged <- runTSNE(merged, dimred="corrected")
merged <- runUMAP(merged, dimred="corrected")

gridExtra::grid.arrange(
  plotTSNE(merged, colour_by="label", text_by="label", text_col="red"),
  plotTSNE(merged, colour_by="batch"),
  ncol=2
)
```

![](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2020-07-21-143149.png)


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://jieandze1314.osca.top/04/04-12.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
