# 3.12 细胞轨迹推断

## 1 前言

接触单细胞数据，相信你一定听过：轨迹推断，英文名词是： Trajectory Analysis。和它相关的另一个名词是：拟时序分析（pseudotime），指的是细胞沿着这个轨迹，并且对潜在的生物活动进行量化。注意这里看字面意思就知道，并不是指真正的时间，而是指细胞与细胞之间的更替、转化的顺序或者是轨迹，可以理解为“一个连续过程的缩影”。

不同的生物过程对应的“拟时序”也是不同的：

![图片来自：https://www.youtube.com/watch?v=XmHDexCtjyw](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2020-07-16-070704.png)

许多生物过程都伴随着细胞状态的连续性变化，比如研究发育就会经常使用到。我们可以利用单细胞数据在高维空间画一条线，贯穿于多种细胞状态。最简单是点到点的一条路径，更复杂的还有一个点出发再生成多个分支。

看一下[现在做相关分析的工具](https://broadinstitute.github.io/2019_scWorkshop/pseudotime-cell-trajectories.html)：

![](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2020-07-16-064042.png)

自2014年以后，已经开发出了超50种方法，那么选择何种方法进行分析成为了一个难题，因为我们不可能每一种都试一下，但有评测文章发现：[**Slingshot**](https://bioconductor.org/packages/release/bioc/vignettes/slingshot/inst/doc/vignette.html)**、**[**TSCAN**](https://www.bioconductor.org/packages/release/bioc/vignettes/TSCAN/inst/doc/TSCAN.pdf)**、**[**Monocle DDRTree**](http://cole-trapnell-lab.github.io/monocle-release/docs_mobile/)**这几种方法都不错**

> 当然还有其他的几种方法值得推荐，还做成了一个流程图方便查阅 如果对评测感兴趣，可以看看[dynverse](https://github.com/dynverse/dynverse)

![](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2020-07-16-064932.png)

### 资源整理

* ELIXIR-SE 公开课视频推荐：[Trajectory inference analysis of scRNA-seq data from ELIXIR-SE](https://www.youtube.com/watch?v=XmHDexCtjyw) \
  他们也有[线上课程资料](https://github.com/NBISweden/excelerate-scRNAseq)
* Anthony Gitter整理了一份[轨迹推断工具的清单](https://github.com/agitter/single-cell-pseudotime)

### **做这个分析之前，最好先问几个问题：**

* 确定数据会体现发育轨迹吗？也就是研究的样本是不是和发育相关的？
* 数据中的细胞会体现出中间态吗？
* 是否认为轨迹会出现分支？

### **并且要注意：**

* 任何数据都可以强行画出轨迹，但不一定都有生物学意义！
* 先要保证目前找到的HVGs和降维结果符合我们的预期，才能继续向下分析&#x20;

## 2 学习Slingshot

它需要两个必须的输入文件：降维结果与细胞分群结果

因为它分析的基础假设就是：在低维空间上，细胞的位置是连续的并且是一个接一个的

### **2.1 数据准备**

```r
means <- rbind(
    # non-DE genes
    matrix(rep(rep(c(0.1,0.5,1,2,3), each = 300),100),
        ncol = 300, byrow = TRUE),
    # early deactivation
    matrix(rep(exp(atan( ((300:1)-200)/50 )),50), ncol = 300, byrow = TRUE),
    # late deactivation
    matrix(rep(exp(atan( ((300:1)-100)/50 )),50), ncol = 300, byrow = TRUE),
    # early activation
    matrix(rep(exp(atan( ((1:300)-100)/50 )),50), ncol = 300, byrow = TRUE),
    # late activation
    matrix(rep(exp(atan( ((1:300)-200)/50 )),50), ncol = 300, byrow = TRUE),
    # transient
    matrix(rep(exp(atan( c((1:100)/33, rep(3,100), (100:1)/33) )),50), 
        ncol = 300, byrow = TRUE)
)
counts <- apply(means,2,function(cell_means){
    total <- rnbinom(1, mu = 7500, size = 4)
    rmultinom(1, total, cell_means)
})
rownames(counts) <- paste0('G',1:750)
colnames(counts) <- paste0('c',1:300)
> counts[1:4,1:4]
   c1 c2 c3 c4
G1  0  1  2  0
G2  2  3  2  5
G3  3  8  5  4
G4  5 16  6  8
> dim(counts)
[1] 750 300

# 构建一个sce对象
sim <- SingleCellExperiment(assays = List(counts = counts))
> sim
class: SingleCellExperiment 
dim: 750 300 
metadata(0):
assays(1): counts
rownames(750): G1 G2 ... G749 G750
rowData names(0):
colnames(300): c1 c2 ... c299 c300
colData names(0):
reducedDimNames(0):
altExpNames(0):
```

### **2.2 基因过滤**

```r
geneFilter <- apply(assays(sim)$counts,1,function(x){
    sum(x >= 3) >= 10
})
sim <- sim[geneFilter, ]
# 过滤掉11个基因
> dim(sim)
[1] 739 300
```

### **2.3 归一化**

```r
FQnorm <- function(counts){
    rk <- apply(counts,2,rank,ties.method='min')
    counts.sort <- apply(counts,2,sort)
    refdist <- apply(counts.sort,1,median)
    norm <- apply(rk,2,function(r){ refdist[r] })
    rownames(norm) <- rownames(counts)
    return(norm)
}
assays(sim)$norm <- FQnorm(assays(sim)$counts)
```

### **2.4降维**

**方法一：PCA**

```r
pca <- prcomp(t(log1p(assays(sim)$norm)), scale. = FALSE)
rd1 <- pca$x[,1:2]

plot(rd1, col = rgb(0,0,0,.5), pch=16, asp = 1)
```

![](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2020-07-16-072820.png)

**方法二：diffusion maps**

```r
library(destiny, quietly = TRUE)
dm <- DiffusionMap(t(log1p(assays(sim)$norm)))
rd2 <- cbind(DC1 = dm$DC1, DC2 = dm$DC2)
plot(rd2, col = rgb(0,0,0,.5), pch=16, asp = 1)
```

![](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2020-07-16-073017.png)

**将两种结果都保存起来**

```r
reducedDims(sim) <- SimpleList(PCA = rd1, DiffMap = rd2)
```

### **2.5 聚类**

**方法一： Gaussian mixture modeling**

```r
library(mclust, quietly = TRUE)
#根据PCA结果
cl1 <- Mclust(rd1)$classification
colData(sim)$GMM <- cl1

library(RColorBrewer)
plot(rd1, col = brewer.pal(9,"Set1")[cl1], pch=16, asp = 1)
```

![](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2020-07-16-073137.png)

**方法二：k-means**

```r
cl2 <- kmeans(rd1, centers = 4)$cluster
colData(sim)$kmeans <- cl2

plot(rd1, col = brewer.pal(9,"Set1")[cl2], pch=16, asp = 1)
```

![](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2020-07-16-073234.png)

### **2.6 使用slingshot**

```r
sim <- slingshot(sim, clusterLabels = 'GMM', reducedDim = 'PCA')
summary(sim$slingPseudotime_1)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   8.633  21.118  21.415  34.367  43.186
```

* 如果要把slingshot的所有结果都提取出来，可以用`SlingshotDataSet`
* 像是SingleCellExperiment这一类对象的结果可以用 `metadata(sim)$slingshot` 获取

**对结果可视化**

```r
colors <- colorRampPalette(brewer.pal(11,'Spectral')[-6])(100)
plotcol <- colors[cut(sim$slingPseudotime_1, breaks=100)]

plot(reducedDims(sim)$PCA, col = plotcol, pch=16, asp = 1)
lines(SlingshotDataSet(sim), lwd=2, col='black')
```

![](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2020-07-16-073339.png)

## 3 学习TSCAN

> 说明文档在：<https://bioconductor.org/packages/3.11/bioc/vignettes/TSCAN/inst/doc/TSCAN.pdf>

### **3.1 准备数据**

```r
library(TSCAN)
data(lpsdata)
procdata <- preprocess(lpsdata)
```

这个`preprocess`函数做了三件事：

* 对表达量进行了log2(exp+1)
* 去掉了在超过一半细胞中表达量小于1的基因
* 将coefficient  ofcovariance小于1的基因去掉

### **3.2 构建拟时序**

使用`exprmclust`函数，进行了PCA降维以及model-based的聚类

```r
lpsmclust <- exprmclust(procdata)
# 然后看下结果
plotmclust(lpsmclust)
```

这个图上很乱，但不妨碍看到分了3群

![](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2020-07-16-074550.png)

接着获得排序

```r
lpsorder <- TSCANorder(lpsmclust)
```

### **3.3 基于找到的排序检测差异基因**

使用`difftest`函数

```r
diffval <- difftest(procdata,lpsorder)
```

根据q值找差异基因

```r
head(row.names(diffval)[diffval$qval < 0.05])
```

对其中某个差异基因作图

```r
# 以STAT2基因为例
STAT2expr <- log2(lpsdata["STAT2",]+1)
singlegeneplot(STAT2expr, TSCANorder(lpsmclust,flip=TRUE,orderonly=FALSE))
```

![](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2020-07-16-074954.png)

## 4 关于monocle

monocle2的拟时序分析前期数据准备可以看：[单细胞转录组学习笔记-18-scRNA包学习Monocle2](https://www.jianshu.com/p/356fa97342b3)

> 另外monocle3可以看：[跟着官网学习单细胞Monocle3](https://www.jianshu.com/p/8f57ac63f399)

以版本2为例，基本上还是分三步走：**从差异分析结果选合适基因=》降维=》细胞排序**

### **step1: 选合适基因**

```r
ordering_genes <- row.names (subset(diff_test_res, qval < 0.01))
cds <- setOrderingFilter(cds, ordering_genes)
plot_ordering_genes(cds)
```

### **step2: 降维**

```r
# 默认使用DDRTree的方法 
cds <- reduceDimension(cds, max_components = 2,
                            method = 'DDRTree')
```

### **step3: 细胞排序**

```r
cds <- orderCells(cds)
```

### **最后可视化**

```r
plot_cell_trajectory(cds, color_by = "Biological_Condition")
```

![](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2019-09-04-100349.png)

这个图就可以看到细胞的发展过程

另外，`plot_genes_in_pseudotime` 可以对基因在不同细胞中的表达量变化进行绘图

```r
plot_genes_in_pseudotime(cds[cg,],
                         color_by = "Biological_Condition")
```

![](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2019-09-04-100608.png)

## TODO：

* **4 ”关于monocle“章节中的**[单细胞转录组学习笔记-18-scRNA包学习Monocle2](https://www.jianshu.com/p/356fa97342b3)和

  > [跟着官网学习单细胞Monocle3](https://www.jianshu.com/p/8f57ac63f399) 放到补充篇


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://jieandze1314.osca.top/03/03-12.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
