# 3.15 不同R包数据的相互转换

## 1 前言

> 这部分内容是来自Seurat：<https://satijalab.org/seurat/v3.1/conversion_vignette.html>

单细胞数据格式目前有这么几大派：

* Bioconductor主导的SingleCellExperiment数据格式：例如scran、scater、monocle（尽管它的对象不直接使用SingleCellExperiment，但灵感来源于SingleCellExperiment，并且操作也是类似的）
* Seurat：SeuratObject格式
* scanpy：AnnData格式

这么一来，很多分析流程就被固定在某个包中了，比如使用Seurat会一用到底，也不会去学习scater或其他R包了，但也许就错过了其他R包好用的一些功能（比如我感觉`scater`的`uniquifyFeatureNames`就很好用）

既然有需求，就有开发者添加功能 ，这里Davis McCarthy 和Alex Wolf就为Seurat添加了和其他数据类型转换的函数

## 2 Seurat与SingleCellExperiment的相互转换

```r
library(scater)
# devtools::install_github(repo = "satijalab/seurat", ref = "loom")
library(loomR)
library(Seurat)
library(patchwork)
```

### **2.1 Seurat转SingleCellExperiment**

```r
# 使用Seurat内置数据
data("pbmc_small")
> pbmc_small
An object of class Seurat 
230 features across 80 samples within 1 assay 
Active assay: RNA (230 features)
 2 dimensional reductions calculated: pca, tsne

# 一个函数即可
pbmc.sce <- as.SingleCellExperiment(pbmc_small)
> pbmc.sce
class: SingleCellExperiment 
dim: 230 80 
metadata(0):
assays(2): counts logcounts
rownames(230): MS4A1 CD79B ... SPON2 S100B
rowData names(5): vst.mean vst.variance
  vst.variance.expected
  vst.variance.standardized vst.variable
colnames(80): ATGCCAGAACGACT CATGGCCTGTGCAT ...
  GGAACACTTCAGAC CTTGATTGATCTTC
colData names(8): orig.ident nCount_RNA ...
  RNA_snn_res.1 ident
reducedDimNames(2): PCA TSNE
spikeNames(0):
altExpNames(0):

# 接下来就是scater的操作了
p1 <- plotExpression(pbmc.sce, features = "MS4A1", x = "ident") + theme(axis.text.x = element_text(angle = 45, 
    hjust = 1))
p2 <- plotPCA(pbmc.sce, colour_by = "ident")
p1 + p2
```

![](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2020-07-18-121906.png)

### **2.2 SingleCellExperiment转Seurat**

```r
# 导入sce对象（https://scrnaseq-public-datasets.s3.amazonaws.com/scater-objects/manno_human.rds）
manno <- readRDS(file = "manno_human.rds")
> manno
class: SingleCellExperiment 
dim: 20560 4029 
metadata(0):
assays(2): counts logcounts
rownames(20560): 'MARC1' 'MARC2' ... ZZEF1 ZZZ3
rowData names(10): feature_symbol
  is_feature_control ... total_counts
  log10_total_counts
colnames(4029): 1772122_301_C02 1772122_180_E05
  ... 1772116-063_G02 1772099-259_H03
colData names(34): Species cell_type1 ...
  pct_counts_ERCC is_cell_control
reducedDimNames(0):
altExpNames(0):

manno <- runPCA(manno)
# 转为seurat对象
manno.seurat <- as.Seurat(manno, counts = "counts", data = "logcounts")

# 看下这个函数
# as.Seurat(
#     x,
#     counts = "counts",
#     data = "logcounts",
#     assay = "RNA",
#     project = "SingleCellExperiment",
#     ...
# )
# 既然有默认参数，因此直接按下面这么写就可以：
manno.seurat <- as.Seurat(manno)

> manno.seurat
An object of class Seurat 
20560 features across 4029 samples within 1 assay 
Active assay: RNA (20560 features)
 1 dimensional reduction calculated: PCA

Idents(manno.seurat) <- "cell_type1"
p1 <- DimPlot(manno.seurat, reduction = "PCA", group.by = "Source") + NoLegend()
p2 <- RidgePlot(manno.seurat, features = "ACTB", group.by = "Source")
p1 + p2
```

![](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2020-07-18-115930.png)

## 3 Seurat与loom的相互转换

还记得上次在[单细胞交响乐16-处理大型数据](https://www.jianshu.com/p/786752478e08)中说到：处理大型数据遇到内存不足时，可以使用这个`HDF5Array`R包（类似的还有 `bigmemory`, `matter`），它会将底层数据做成HDF5格式，用硬盘空间来存储数据，必要时再调用一部分数据到内存。loom格式就是处理HDF5使用的

### **3.1 Seurat转为loom**

```r
pbmc.loom <- as.loom(pbmc, filename = "pbmc3k.loom", verbose = FALSE)
pbmc.loom
## Class: loom
## Filename: /__w/1/s/output/pbmc3k.loom
## Access type: H5F_ACC_RDWR
## Attributes: version, chunks, LOOM_SPEC_VERSION, assay, last_modified
## Listing:
##        name    obj_type dataset.dims dataset.type_class
##   col_attrs   H5I_GROUP         <NA>               <NA>
##  col_graphs   H5I_GROUP         <NA>               <NA>
##      layers   H5I_GROUP         <NA>               <NA>
##      matrix H5I_DATASET 2638 x 13714          H5T_FLOAT
##   row_attrs   H5I_GROUP         <NA>               <NA>
##  row_graphs   H5I_GROUP         <NA>               <NA>

# 最后使用完要记得关上loom对象
pbmc.loom$close_all()
```

### **3.2 loom转为Seurat**

**首先读取：用 loomR 的connect**

```r
l6.immune <- connect(filename = "../data/l6_r1_immune_cells.loom", mode = "r")
l6.immune
## Class: loom
## Filename: /__w/1/s/data/l6_r1_immune_cells.loom
## Access type: H5F_ACC_RDONLY
## Attributes: CreationDate, last_modified
## Listing:
##        name    obj_type  dataset.dims dataset.type_class
##   col_attrs   H5I_GROUP          <NA>               <NA>
##  col_graphs   H5I_GROUP          <NA>               <NA>
##      layers   H5I_GROUP          <NA>               <NA>
##      matrix H5I_DATASET 14908 x 27998          H5T_FLOAT
##   row_attrs   H5I_GROUP          <NA>               <NA>
##  row_graphs   H5I_GROUP          <NA>               <NA>
```

**然后转换**

```r
l6.seurat <- as.Seurat(l6.immune)
VlnPlot(l6.seurat, features = c("Sparc", "Ftl1", "Junb", "Ccl4"), ncol = 2, pt.size = 0.1)
```

![](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2020-07-18-120927.png)

**最后处理完，记得关闭loom文件**

```r
l6.immune$close_all()
```

### **3.3 补充**

**如果使用Seurat V2**，还有一个自带的函数`Convert`

```r
data("pbmc_small")
pbmc_small
pfile <- Convert(from = pbmc_small, to = "loom", filename = "pbmc_small.loom", 
    display.progress = FALSE)
pfile
## Class: loom
## Filename: /home/paul/Documents/Satija/pbmc_small.loom
## Access type: H5F_ACC_RDWR
## Attributes: version, chunks
## Listing:
##        name    obj_type dataset.dims dataset.type_class
##   col_attrs   H5I_GROUP         <NA>               <NA>
##  col_graphs   H5I_GROUP         <NA>               <NA>
##      layers   H5I_GROUP         <NA>               <NA>
##      matrix H5I_DATASET     80 x 230          H5T_FLOAT
##   row_attrs   H5I_GROUP         <NA>               <NA>
##  row_graphs   H5I_GROUP         <NA>               <NA>
```

## 4 Scanpy转Seurat

Seurat有一个函数`ReadH5AD`可以读取AnnData的H5AD文件

```r
pbmc3k <- ReadH5AD(file = "pbmc3k.h5ad")
# 利用Seurat操作
Idents(pbmc3k) <- "louvain"
p1 <- DimPlot(pbmc3k, label = TRUE)
p2 <- VlnPlot(pbmc3k, features = c("CST3", "NKG7", "PPBP"), combine = FALSE)
wrap_plots(c(list(p1), p2), ncol = 2) & NoLegend()
```

![](https://jieandze1314-1255603621.cos.ap-guangzhou.myqcloud.com/blog/2020-07-18-121342.png)

目前还不能直接将Seurat写成H5AD文件，因此不能之间将Seurat转为Scanpy；但是可以将loom文件作为桥梁实现Seurat转Scanpy，例如`Scanpy` 有一个函数`scanpy.read_loom()`

> 参考：<https://scanpy.readthedocs.io/en/stable/api/scanpy.read_loom.html>


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://jieandze1314.osca.top/03/03-15.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
