Last updated: 2019-04-03

workflowr checks: (Click a bullet for more information)

✔ R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
✔ Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
✔ Seed: set.seed(20190110)

The command set.seed(20190110) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
✔ Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

✔ Repository version: 5870e6e

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    ._.DS_Store
    Ignored:    analysis/cache/
    Ignored:    build-logs/
    Ignored:    data/alevin/
    Ignored:    data/cellranger/
    Ignored:    data/processed/
    Ignored:    data/published/
    Ignored:    output/.DS_Store
    Ignored:    output/._.DS_Store
    Ignored:    output/03-clustering/selected_genes.csv.zip
    Ignored:    output/04-marker-genes/de_genes.csv.zip
    Ignored:    packrat/.DS_Store
    Ignored:    packrat/._.DS_Store
    Ignored:    packrat/lib-R/
    Ignored:    packrat/lib-ext/
    Ignored:    packrat/lib/
    Ignored:    packrat/src/

Untracked files:
    Untracked:  DGEList.Rds
    Untracked:  output/90-methods/package-versions.json
    Untracked:  scripts/build.pbs

Unstaged changes:
    Modified:   analysis/_site.yml
    Modified:   output/01-preprocessing/droplet-selection.pdf
    Modified:   output/01-preprocessing/parameters.json
    Modified:   output/01-preprocessing/selection-comparison.pdf
    Modified:   output/01B-alevin/alevin-comparison.pdf
    Modified:   output/01B-alevin/parameters.json
    Modified:   output/02-quality-control/qc-thresholds.pdf
    Modified:   output/02-quality-control/qc-validation.pdf
    Modified:   output/03-clustering/cluster-comparison.pdf
    Modified:   output/03-clustering/cluster-validation.pdf
    Modified:   output/04-marker-genes/de-results.pdf

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

Expand here to see past versions:

File	Version	Author	Date	Message
Rmd	5870e6e	Luke Zappia	2019-04-03	Adjust figures and fix names
html	33ac14f	Luke Zappia	2019-03-20	Tidy up website
Rmd	ae75188	Luke Zappia	2019-03-06	Revise figures after proofread
html	ae75188	Luke Zappia	2019-03-06	Revise figures after proofread
html	2693e97	Luke Zappia	2019-03-05	Add methods page
html	4a2f376	Luke Zappia	2019-02-20	Add default Cell Ranger and update figures
Rmd	431c180	Luke Zappia	2019-02-19	Add droplet selection figure
html	431c180	Luke Zappia	2019-02-19	Add droplet selection figure
Rmd	8f826ef	Luke Zappia	2019-02-08	Rebuild site and tidy
html	8f826ef	Luke Zappia	2019-02-08	Rebuild site and tidy
Rmd	2daa7f2	Luke Zappia	2019-01-25	Improve output and rebuild
html	2daa7f2	Luke Zappia	2019-01-25	Improve output and rebuild
Rmd	b4432ff	Luke Zappia	2019-01-18	Add emptyDrops p-value plot
html	b4432ff	Luke Zappia	2019-01-18	Add emptyDrops p-value plot
html	bf40049	Luke Zappia	2019-01-18	Add preprocessing

# scRNA-seq
library("DropletUtils")
library("SingleCellExperiment")

# Plotting
library("cowplot")
library("UpSetR")
library("grid")

# Tidyverse
library("tidyverse")

source(here::here("R/load.R"))
source(here::here("R/annotate.R"))
source(here::here("R/output.R"))

bpparam <- BiocParallel::MulticoreParam(workers = 10)

Introduction

In this document we are going to read in the unfiltered counts matrix produced by Cell Ranger and determine which of those droplets contain cells using the DropletUtils package.

path <- here::here("data/cellranger")
raw <- read10x(path, dataset = "Orgs123")

filt_barcodes <- readr::read_lines(file.path(path, "filtered_barcodes.tsv.gz"))

colData(raw)$CellRangerFilt <- paste(colData(raw)$Barcode, 
                                     colData(raw)$Sample, sep = "-") %in%
    filt_barcodes

The raw dataset has 33538 features and 2211840 droplets.

raw <- raw[Matrix::rowSums(counts(raw)) > 0, Matrix::colSums(counts(raw)) > 0]

After removing all zero features and droplets the dataset has 25020 features and 945038 droplets.

Barcode ranks

empty_thresh <- 100

bc_ranks <- barcodeRanks(counts(raw), lower = empty_thresh)

colData(raw)$BarcodeRank   <- bc_ranks$rank
colData(raw)$BarcodeTotal  <- bc_ranks$total
colData(raw)$BarcodeFitted <- bc_ranks$fitted

bc_data <- colData(raw) %>%
    as.data.frame() %>%
    select(Cell, Kept = CellRangerFilt, Rank = BarcodeRank,
           Total = BarcodeTotal, Fitted = BarcodeFitted) %>%
    arrange(Rank)

Let’s start by ordering the droplets according to their total counts and plotting this on a log scale. This let’s us see the distribution of total counts.

ggplot(bc_data, aes(x = Rank, y = Total)) +
    geom_point(shape = 1, aes(colour = Kept)) +
    geom_line(aes(y = Fitted), colour = "red") +
    geom_hline(yintercept = bc_ranks$knee,
               colour = "dodgerblue", linetype = "dashed") +
    annotate("text", x = 0, y = bc_ranks$knee, label = "Knee",
             colour = "dodgerblue", hjust = 0, vjust = -1) +
    geom_hline(yintercept = bc_ranks$inflection,
               colour = "forestgreen", linetype = "dashed") +
    annotate("text", x = 0, y = bc_ranks$inflection, label = "Inflection",
             colour = "forestgreen", hjust = 0, vjust = -1) +
    geom_hline(yintercept = empty_thresh,
               colour = "darkorchid", linetype = "dashed") +
    annotate("text", x = 0, y = empty_thresh, label = "Empty threshold",
             colour = "darkorchid", hjust = 0, vjust = -1) +
    scale_x_log10(labels = scales::number) +
    scale_y_log10(labels = scales::number) +
    scale_colour_manual(values = c("black", "violet")) +
    ylab("Total counts") +
    theme_minimal()

Expand here to see past versions of barcodes-plot-1.png:

Version	Author	Date
bf40049	Luke Zappia	2019-01-18

This is typical of what we see for 10x experiment where there is a sharp drop off between droplets with lots of counts and those without many. The inflection and knee points are methods for identifying the transition between distributions. These are roughly associated with the cells selected by Cell Ranger The empty threshold line indicates the point at which we assume droplets must be empty (total counts <= 100).

Traditional Cell Ranger

The version of Cell Ranger we have used here selects cells using a modified version of the EmptyDrops method (see below) but older versions used an alternative approach by calculating the 99th percentile of the total number of counts in top given expected number of cells and selecting droplets that had at least 10 percent of this many counts. For comparison purposes we will perform this method as well.

default1 <- defaultDrops(counts(raw[, colData(raw)$Sample == 1]))
default2 <- defaultDrops(counts(raw[, colData(raw)$Sample == 2]))
default3 <- defaultDrops(counts(raw[, colData(raw)$Sample == 3]))
colData(raw)$DefaultFilt <- c(default1, default2, default3)

plot_data <- colData(raw) %>%
    as.data.frame() %>%
    select(Cell, Sample, Total = BarcodeTotal, Kept = DefaultFilt) %>%
    group_by(Sample) %>%
    mutate(Rank = rank(-Total)) %>%
    arrange(Sample, Rank)

ggplot(plot_data, aes(x = Rank, y = Total)) +
    geom_point(shape = 1, aes(colour = Kept)) +
    scale_x_log10(labels = scales::number) +
    scale_y_log10(labels = scales::number) +
    scale_colour_manual(values = c("black", "forestgreen")) +
    facet_wrap(~ Sample, nrow = 1, labeller = label_both) +
    ylab("Total counts") +
    theme_minimal()

This method identifies 7006 droplets as containing cells.

Empty drops

We will now look at identifying which droplets to select using the EmptyDrops method. This method tests whether the composition of a droplet is significantly different from the ambient RNA in the sample which is obtained by pooling the empty droplets. Droplets with very large counts are also automatically retained.

set.seed(1)

emp_iters <- 30000

emp_drops <- emptyDrops(counts(raw), lower = empty_thresh, niters = emp_iters,
                        test.ambient = TRUE, BPPARAM = bpparam)

EmptyDrops calculates p-values using a permutation approach. Let’s check that we are usually a sufficient number of iterations. If there are any droplets that have non-significant p-values but are limited by the number of permuations the number should be increased.

emp_fdr <- 0.01
is_cell <- emp_drops$FDR <= emp_fdr
is_cell[is.na(is_cell)] <- FALSE

colData(raw)$EmpDropsLogProb <- emp_drops$LogProb
colData(raw)$EmpDropsPValue  <- emp_drops$PValue
colData(raw)$EmpDropsLimited <- emp_drops$Limited
colData(raw)$EmpDropsFDR     <- emp_drops$FDR
colData(raw)$EmpDropsFilt    <- is_cell

table(Limited = emp_drops$Limited, Significant = is_cell)

       Significant
Limited  FALSE   TRUE
  FALSE 935180    534
  TRUE       0   9324

Another way to check the EmptyDrops results is to look at the droplets below our empty threshold. We are assuming that these droplets only contain ambient RNA and therefore the null hypothesis should be true and the distribution of p-values should be approximately uniform.

colData(raw) %>%
    as.data.frame() %>%
    filter(BarcodeTotal <= empty_thresh,
           BarcodeTotal > 0) %>%
    ggplot(aes(x = EmpDropsPValue)) +
    geom_histogram() +
    xlab("p-value") +
    theme_minimal()

Expand here to see past versions of empty-drops-pvals-1.png:

Version	Author	Date
8f826ef	Luke Zappia	2019-02-08
2daa7f2	Luke Zappia	2019-01-25
b4432ff	Luke Zappia	2019-01-18

Peaks near zero would tell us that not all of the droplets below the threshold are truly empty and that we should lower it.

We can also plot the negative log-probability against the total counts to see which droplets EmptyDrops has selected.

colData(raw) %>%
    as.data.frame() %>%
    filter(!is.na(EmpDropsFilt)) %>%
    ggplot(aes(x = BarcodeTotal, y = -EmpDropsLogProb, colour = EmpDropsFilt)) +
    geom_point() +
    scale_colour_discrete(name = "Significant") +
    xlab("Total counts") +
    ylab("-log(probability)") +
    theme_minimal()

Expand here to see past versions of empty-drops-plot-1.png:

Version	Author	Date
8f826ef	Luke Zappia	2019-02-08
2daa7f2	Luke Zappia	2019-01-25
b4432ff	Luke Zappia	2019-01-18
bf40049	Luke Zappia	2019-01-18

There were 923875 droplets with less than 100 counts which were used to make up the ambient RNA pool. Of the remaining 21163 droplets 9858 were found to have profiles significantly different from the ambient RNA and should contain cells.

Comparison

Let’s quickly compare differences between the selection methods using an UpSet plot:

plot_data <- list(
    "Cell Ranger"    = colData(raw)$Cell[colData(raw)$DefaultFilt],
    "Cell Ranger v3" = colData(raw)$Cell[colData(raw)$CellRangerFilt],
    "EmptyDrops"     = colData(raw)$Cell[colData(raw)$EmpDropsFilt]
)

upset(fromList(plot_data), order.by = "freq",
      sets.x.label = "Number of cells", text.scale = c(2, 1.2, 2, 1.2, 2, 3))

Expand here to see past versions of compare-overlap-1.png:

Version	Author	Date
4a2f376	Luke Zappia	2019-02-20
8f826ef	Luke Zappia	2019-02-08
2daa7f2	Luke Zappia	2019-01-25
b4432ff	Luke Zappia	2019-01-18
bf40049	Luke Zappia	2019-01-18

We can see that most of the cells are identified by all methods. A large number of cells are also identified by the new Cell Ranger and EmptyDrops methods. These are likely to be those cells that fall below the total counts threshold selected by the traditional Cell Ranger algorithm. Our use of the EmptyDrops algorithm has identified even more cells than Cell Ranger v3 but there are very few cells that Cell Ranger identifed which EmptyDrops didn’t.

Selection

We are going to perform further quality control of these cells anyway so at this stage we will keep those that were selected by either Cell Ranger v3 or EmptyDrops.

selected <- raw[, colData(raw)$CellRangerFilt | colData(raw)$EmpDropsFilt]
selected <- selected[Matrix::rowSums(counts(selected)) > 0, ]

colData(selected)$SelMethod <- "Both"
colData(selected)$SelMethod[!colData(selected)$CellRangerFilt] <- "emptyDrops"
colData(selected)$SelMethod[!colData(selected)$EmpDropsFilt] <- "CellRanger"

Annotation

Now that we have a dataset that just contains actual cells we will add some extra annotation. This includes downloading feature annotation from BioMart and assigning cell cycle stages using the cyclone function in the scran package as well as calculating a range of quality control metrics using scater.

selected <- annotateSCE(selected,
                        org        = "human",
                        add_anno   = TRUE,
                        host       = "jul2018.archive.ensembl.org",
                        calc_qc    = TRUE,
                        calc_cpm   = TRUE,
                        cell_cycle = TRUE,
                        BPPARAM    = bpparam)

glimpse(as.data.frame(colData(selected)))

Observations: 10,002
Variables: 55
$ Cell                                           <chr> "Cell74", "Cell10…
$ Dataset                                        <chr> "Orgs123", "Orgs1…
$ Barcode                                        <chr> "AAACCTGAGAGTACAT…
$ Sample                                         <chr> "1", "1", "1", "1…
$ CellRangerFilt                                 <lgl> TRUE, TRUE, FALSE…
$ BarcodeRank                                    <dbl> 2281.5, 3193.5, 1…
$ BarcodeTotal                                   <dbl> 11084, 9414, 114,…
$ BarcodeFitted                                  <dbl> 11089.186, 9427.9…
$ DefaultFilt                                    <lgl> TRUE, TRUE, FALSE…
$ EmpDropsLogProb                                <dbl> -11721.5793, -948…
$ EmpDropsPValue                                 <dbl> 3.333222e-05, 3.3…
$ EmpDropsLimited                                <lgl> TRUE, TRUE, TRUE,…
$ EmpDropsFDR                                    <dbl> 0.000000000, 0.00…
$ EmpDropsFilt                                   <lgl> TRUE, TRUE, TRUE,…
$ SelMethod                                      <chr> "Both", "Both", "…
$ G1Score                                        <dbl> 0.035, 0.398, 0.9…
$ SScore                                         <dbl> 0.035, 0.398, 0.9…
$ G2MScore                                       <dbl> 0.561, 0.100, 0.0…
$ CellCycle                                      <chr> "G2M", "S", "G1",…
$ is_cell_control                                <lgl> FALSE, FALSE, FAL…
$ total_features_by_counts                       <int> 3193, 2663, 108, …
$ log10_total_features_by_counts                 <dbl> 3.504335, 3.42553…
$ total_counts                                   <dbl> 11084, 9414, 114,…
$ log10_total_counts                             <dbl> 4.044736, 3.97382…
$ pct_counts_in_top_50_features                  <dbl> 31.07182, 33.8963…
$ pct_counts_in_top_100_features                 <dbl> 44.92061, 49.2033…
$ pct_counts_in_top_200_features                 <dbl> 55.37712, 59.6983…
$ pct_counts_in_top_500_features                 <dbl> 67.68315, 71.8398…
$ total_features_by_counts_endogenous            <int> 3183, 2653, 108, …
$ log10_total_features_by_counts_endogenous      <dbl> 3.502973, 3.42390…
$ total_counts_endogenous                        <dbl> 10647, 9142, 114,…
$ log10_total_counts_endogenous                  <dbl> 4.027268, 3.96108…
$ pct_counts_endogenous                          <dbl> 96.05738, 97.1106…
$ pct_counts_in_top_50_features_endogenous       <dbl> 30.95708, 34.4891…
$ pct_counts_in_top_100_features_endogenous      <dbl> 44.38809, 49.5077…
$ pct_counts_in_top_200_features_endogenous      <dbl> 54.16549, 59.1227…
$ pct_counts_in_top_500_features_endogenous      <dbl> 66.63849, 71.3301…
$ total_features_by_counts_feature_control       <int> 10, 10, 0, 7, 10,…
$ log10_total_features_by_counts_feature_control <dbl> 1.0413927, 1.0413…
$ total_counts_feature_control                   <dbl> 437, 272, 0, 13, …
$ log10_total_counts_feature_control             <dbl> 2.6414741, 2.4361…
$ pct_counts_feature_control                     <dbl> 3.9426200, 2.8893…
$ pct_counts_in_top_50_features_feature_control  <dbl> 100, 100, NaN, 10…
$ pct_counts_in_top_100_features_feature_control <dbl> 100, 100, NaN, 10…
$ pct_counts_in_top_200_features_feature_control <dbl> 100, 100, NaN, 10…
$ pct_counts_in_top_500_features_feature_control <dbl> 100, 100, NaN, 10…
$ total_features_by_counts_MT                    <int> 10, 10, 0, 7, 10,…
$ log10_total_features_by_counts_MT              <dbl> 1.0413927, 1.0413…
$ total_counts_MT                                <dbl> 437, 272, 0, 13, …
$ log10_total_counts_MT                          <dbl> 2.6414741, 2.4361…
$ pct_counts_MT                                  <dbl> 3.9426200, 2.8893…
$ pct_counts_in_top_50_features_MT               <dbl> 100, 100, NaN, 10…
$ pct_counts_in_top_100_features_MT              <dbl> 100, 100, NaN, 10…
$ pct_counts_in_top_200_features_MT              <dbl> 100, 100, NaN, 10…
$ pct_counts_in_top_500_features_MT              <dbl> 100, 100, NaN, 10…

glimpse(as.data.frame(rowData(selected)))

Observations: 24,812
Variables: 20
$ ID                         <chr> "ENSG00000243485", "ENSG00000238009",…
$ Symbol                     <chr> "MIR1302-2HG", "AL627309.1", "AL62730…
$ Type                       <chr> "Gene Expression", "Gene Expression",…
$ Name                       <chr> "MIR1302-2HG", "AL627309.1", "AL62730…
$ ensembl_gene_id            <chr> "ENSG00000243485", "ENSG00000238009",…
$ entrezgene                 <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ external_gene_name         <chr> "MIR1302-2HG", "AL627309.1", "AL62730…
$ hgnc_symbol                <chr> "MIR1302-2HG", "", "", "", "FAM87B", …
$ chromosome_name            <chr> "1", "1", "1", "1", "1", "1", "1", "1…
$ description                <chr> "MIR1302-2 host gene", "", "", "", "f…
$ gene_biotype               <chr> "lincRNA", "lincRNA", "lincRNA", "lin…
$ percentage_gene_gc_content <dbl> 48.84, 39.91, 48.75, 40.42, 48.52, 52…
$ is_feature_control         <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FA…
$ is_feature_control_MT      <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FA…
$ mean_counts                <dbl> 0.00009998, 0.00119976, 0.00019996, 0…
$ log10_mean_counts          <dbl> 4.341859e-05, 5.207369e-04, 8.683285e…
$ n_cells_by_counts          <int> 1, 12, 2, 379, 10, 153, 204, 12, 879,…
$ pct_dropout_by_counts      <dbl> 99.99000, 99.88002, 99.98000, 96.2107…
$ total_counts               <dbl> 1, 12, 2, 415, 10, 160, 214, 12, 1045…
$ log10_total_counts         <dbl> 0.3010300, 1.1139434, 0.4771213, 2.61…

Figure

plot_data <- colData(raw) %>%
    as.data.frame() %>%
    select(Cell, Kept = EmpDropsFilt, Rank = BarcodeRank,
           Total = BarcodeTotal, Fitted = BarcodeFitted) %>%
    arrange(Rank)

emp_plot <- plot_data %>%
    filter(Total > 0) %>%
    ggplot(aes(x = Rank, y = Total)) +
    geom_point(shape = 1, aes(colour = Kept)) +
    geom_line(aes(y = Fitted), colour = "#00B7C6", size = 1) +
    geom_hline(yintercept = bc_ranks$knee,
               colour = "#00ADEF", linetype = "dashed") +
    annotate("text", x = 0, y = bc_ranks$knee, label = "Knee",
             colour = "#00ADEF", hjust = 0, vjust = -1) +
    geom_hline(yintercept = bc_ranks$inflection,
               colour = "#8DC63F", linetype = "dashed") +
    annotate("text", x = 0, y = bc_ranks$inflection, label = "Inflection",
             colour = "#8DC63F", hjust = 0, vjust = -1) +
    geom_hline(yintercept = empty_thresh,
               colour = "#7A52C7", linetype = "dashed") +
    annotate("text", x = 0, y = empty_thresh, label = "Empty threshold",
             colour = "#7A52C7", hjust = 0, vjust = -1) +
    scale_x_log10(labels = scales::number, breaks = 10 ^ seq(0, 6)) +
    scale_y_log10(labels = scales::number, breaks = 10 ^ seq(0, 5)) +
    scale_colour_manual(values = c("black", "#EC008C"),
                        labels = c("Empty", "Cell")) +
    ggtitle("EmptyDrops droplet selection") +
    ylab("Total counts") +
    theme_minimal() +
    theme(legend.position = "bottom",
          legend.title = element_blank(),
          plot.margin = unit(c(5.5, 20, 5.5, 5.5), "points"))

ggsave(here::here("output", DOCNAME, "droplet-selection.pdf"), emp_plot,
       width = 7, height = 5, scale = 1)
ggsave(here::here("output", DOCNAME, "droplet-selection.png"), emp_plot,
       width = 7, height = 5, scale = 1)

emp_plot

Expand here to see past versions of figure-1.png:

Version	Author	Date
ae75188	Luke Zappia	2019-03-06
4a2f376	Luke Zappia	2019-02-20
431c180	Luke Zappia	2019-02-19

plot_data <- colData(raw) %>%
    as.data.frame() %>%
    select(Name             = Cell,
           `Cell Ranger`    = DefaultFilt,
           `Cell Ranger v3` = CellRangerFilt,
           EmptyDrops       = `EmpDropsFilt`,
           `Total counts`   = BarcodeTotal) %>%
    mutate(`Cell Ranger`    = if_else(`Cell Ranger`,    1L, 0L),
           `Cell Ranger v3` = if_else(`Cell Ranger v3`, 1L, 0L),
           EmptyDrops       = if_else(EmptyDrops,       1L, 0L)) %>%
    mutate(`Total counts`   = log10(`Total counts`))

upset(plot_data, order.by = "freq",
      sets.x.label = "Number of cells",
      text.scale = c(2, 1.6, 2, 1.3, 2, 3),
      matrix.color = "#7A52C7",
      main.bar.color = "#7A52C7",
      sets.bar.color = "#7A52C7")
grid.edit("arrange", name = "arrange2")
comp_plot <- grid.grab()

ggsave(here::here("output", DOCNAME, "selection-comparison.pdf"), comp_plot,
       width = 7, height = 4, scale = 1.2)
ggsave(here::here("output", DOCNAME, "selection-comparison.png"), comp_plot,
       width = 7, height = 4, scale = 1.2)

Summary

Parameters

This table describes parameters used and set in this document.

params <- list(
    list(
        Parameter = "n_droplets",
        Value = ncol(raw),
        Description = "Number of droplets in the raw dataset"
    ),
    list(
        Parameter = "empty_thresh",
        Value = empty_thresh,
        Description = "Droplets with less than this many counts are empty"
    ),
    list(
        Parameter = "emp_iters",
        Value = emp_iters,
        Description = "Number of iterations for EmptyDrops p-values"
    ),
    list(
        Parameter = "emp_fdr",
        Value = emp_fdr,
        Description = "FDR cutoff for EmptyDrops"
    ),
    list(
        Parameter = "n_default",
        Value = sum(colData(raw)$DefaultFilt),
        Description = "Number of cells selected by the default Cell Ranger method"
    ),
    list(
        Parameter = "n_cellranger",
        Value = sum(colData(raw)$CellRangerFilt),
        Description = "Number of cells selected by the Cell Ranger v3 method"
    ),
    list(
        Parameter = "n_empdrops",
        Value = sum(colData(raw)$EmpDropsFilt),
        Description = "Number of cells selected by the EmptyDrops method"
    ),
    list(
        Parameter = "n_cells",
        Value = ncol(selected),
        Description = "Number of cells selected"
    )
)

params <- jsonlite::toJSON(params, pretty = TRUE)
knitr::kable(jsonlite::fromJSON(params))

Parameter	Value	Description
n_droplets	945038	Number of droplets in the raw dataset
empty_thresh	100	Droplets with less than this many counts are empty
emp_iters	30000	Number of iterations for EmptyDrops p-values
emp_fdr	0.01	FDR cutoff for EmptyDrops
n_default	7006	Number of cells selected by the default Cell Ranger method
n_cellranger	8291	Number of cells selected by the Cell Ranger v3 method
n_empdrops	9858	Number of cells selected by the EmptyDrops method
n_cells	10002	Number of cells selected

Output files

This table describes the output files produced by this document. Right click and Save Link As… to download the results.

write_rds(selected, here::here("data/processed/01-selected.Rds"))

dir.create(here::here("output", DOCNAME), showWarnings = FALSE)

for (sample in unique(colData(selected)$Sample)) {
    barcodes <- colData(selected) %>%
        as.data.frame() %>%
        filter(Sample == sample) %>%
        select(Barcode) %>%
        mutate(Barcode = paste0(Barcode, "-1"))
    
    readr::write_tsv(barcodes,
                     print(here::here("output", DOCNAME,
                                glue::glue("barcodes_Org{sample}.tsv"))),
                     col_names = FALSE)
}

readr::write_lines(params, here::here("output", DOCNAME, "parameters.json"))

knitr::kable(data.frame(
    File = c(
        getDownloadLink("parameters.json", DOCNAME),
        getDownloadLink("barcodes_Org1.tsv", DOCNAME),
        getDownloadLink("barcodes_Org2.tsv", DOCNAME),
        getDownloadLink("barcodes_Org3.tsv", DOCNAME),
        getDownloadLink("droplet-selection.png", DOCNAME),
        getDownloadLink("droplet-selection.pdf", DOCNAME),
        getDownloadLink("selection-comparison.png", DOCNAME),
        getDownloadLink("selection-comparison.pdf", DOCNAME)
    ),
    Description = c(
        "Parameters set and used in this analysis",
        "Selected barcodes for organoid 1",
        "Selected barcodes for organoid 2",
        "Selected barcodes for organoid 3",
        "Droplet selection figure (PNG)",
        "Droplet selection figure (PDF)",
        "Selection comparison figure (PNG)",
        "Selection comparison figure (PDF)"
    )
))

File	Description
parameters.json	Parameters set and used in this analysis
barcodes_Org1.tsv	Selected barcodes for organoid 1
barcodes_Org2.tsv	Selected barcodes for organoid 2
barcodes_Org3.tsv	Selected barcodes for organoid 3
droplet-selection.png	Droplet selection figure (PNG)
droplet-selection.pdf	Droplet selection figure (PDF)
selection-comparison.png	Selection comparison figure (PNG)
selection-comparison.pdf	Selection comparison figure (PDF)

Session information

devtools::session_info()

─ Session info ──────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.5.0 (2018-04-23)
 os       CentOS release 6.7 (Final)  
 system   x86_64, linux-gnu           
 ui       X11                         
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       Australia/Melbourne         
 date     2019-04-03                  

─ Packages ──────────────────────────────────────────────────────────────
 ! package              * version   date       lib source        
   assertthat             0.2.0     2017-04-11 [1] CRAN (R 3.5.0)
   backports              1.1.3     2018-12-14 [1] CRAN (R 3.5.0)
   bindr                  0.1.1     2018-03-13 [1] CRAN (R 3.5.0)
   bindrcpp               0.2.2     2018-03-29 [1] CRAN (R 3.5.0)
   Biobase              * 2.42.0    2018-10-30 [1] Bioconductor  
   BiocGenerics         * 0.28.0    2018-10-30 [1] Bioconductor  
   BiocParallel         * 1.16.5    2019-01-04 [1] Bioconductor  
   bitops                 1.0-6     2013-08-17 [1] CRAN (R 3.5.0)
   broom                  0.5.1     2018-12-05 [1] CRAN (R 3.5.0)
   callr                  3.1.1     2018-12-21 [1] CRAN (R 3.5.0)
   cellranger             1.1.0     2016-07-27 [1] CRAN (R 3.5.0)
   cli                    1.0.1     2018-09-25 [1] CRAN (R 3.5.0)
   colorspace             1.4-0     2019-01-13 [1] CRAN (R 3.5.0)
   cowplot              * 0.9.4     2019-01-08 [1] CRAN (R 3.5.0)
   crayon                 1.3.4     2017-09-16 [1] CRAN (R 3.5.0)
   DelayedArray         * 0.8.0     2018-10-30 [1] Bioconductor  
   desc                   1.2.0     2018-05-01 [1] CRAN (R 3.5.0)
   devtools               2.0.1     2018-10-26 [1] CRAN (R 3.5.0)
   digest                 0.6.18    2018-10-10 [1] CRAN (R 3.5.0)
   dplyr                * 0.7.8     2018-11-10 [1] CRAN (R 3.5.0)
   DropletUtils         * 1.2.2     2019-01-04 [1] Bioconductor  
   edgeR                  3.24.3    2019-01-02 [1] Bioconductor  
   evaluate               0.12      2018-10-09 [1] CRAN (R 3.5.0)
   forcats              * 0.3.0     2018-02-19 [1] CRAN (R 3.5.0)
   fs                     1.2.6     2018-08-23 [1] CRAN (R 3.5.0)
   generics               0.0.2     2018-11-29 [1] CRAN (R 3.5.0)
   GenomeInfoDb         * 1.18.1    2018-11-12 [1] Bioconductor  
   GenomeInfoDbData       1.2.0     2019-01-15 [1] Bioconductor  
   GenomicRanges        * 1.34.0    2018-10-30 [1] Bioconductor  
   ggplot2              * 3.1.0     2018-10-25 [1] CRAN (R 3.5.0)
   git2r                  0.24.0    2019-01-07 [1] CRAN (R 3.5.0)
   glue                   1.3.0     2018-07-17 [1] CRAN (R 3.5.0)
   gridExtra              2.3       2017-09-09 [1] CRAN (R 3.5.0)
   gtable                 0.2.0     2016-02-26 [1] CRAN (R 3.5.0)
   haven                  2.0.0     2018-11-22 [1] CRAN (R 3.5.0)
   HDF5Array              1.10.1    2018-12-05 [1] Bioconductor  
   here                   0.1       2017-05-28 [1] CRAN (R 3.5.0)
   hms                    0.4.2     2018-03-10 [1] CRAN (R 3.5.0)
   htmltools              0.3.6     2017-04-28 [1] CRAN (R 3.5.0)
   httr                   1.4.0     2018-12-11 [1] CRAN (R 3.5.0)
   IRanges              * 2.16.0    2018-10-30 [1] Bioconductor  
   jsonlite               1.6       2018-12-07 [1] CRAN (R 3.5.0)
   knitr                  1.21      2018-12-10 [1] CRAN (R 3.5.0)
 P lattice                0.20-35   2017-03-25 [5] CRAN (R 3.5.0)
   lazyeval               0.2.1     2017-10-29 [1] CRAN (R 3.5.0)
   limma                  3.38.3    2018-12-02 [1] Bioconductor  
   locfit                 1.5-9.1   2013-04-20 [1] CRAN (R 3.5.0)
   lubridate              1.7.4     2018-04-11 [1] CRAN (R 3.5.0)
   magrittr               1.5       2014-11-22 [1] CRAN (R 3.5.0)
 P Matrix                 1.2-14    2018-04-09 [5] CRAN (R 3.5.0)
   matrixStats          * 0.54.0    2018-07-23 [1] CRAN (R 3.5.0)
   memoise                1.1.0     2017-04-21 [1] CRAN (R 3.5.0)
   modelr                 0.1.3     2019-02-05 [1] CRAN (R 3.5.0)
   munsell                0.5.0     2018-06-12 [1] CRAN (R 3.5.0)
 P nlme                   3.1-137   2018-04-07 [5] CRAN (R 3.5.0)
   pillar                 1.3.1     2018-12-15 [1] CRAN (R 3.5.0)
   pkgbuild               1.0.2     2018-10-16 [1] CRAN (R 3.5.0)
   pkgconfig              2.0.2     2018-08-16 [1] CRAN (R 3.5.0)
   pkgload                1.0.2     2018-10-29 [1] CRAN (R 3.5.0)
   plyr                   1.8.4     2016-06-08 [1] CRAN (R 3.5.0)
   prettyunits            1.0.2     2015-07-13 [1] CRAN (R 3.5.0)
   processx               3.2.1     2018-12-05 [1] CRAN (R 3.5.0)
   ps                     1.3.0     2018-12-21 [1] CRAN (R 3.5.0)
   purrr                * 0.3.0     2019-01-27 [1] CRAN (R 3.5.0)
   R.methodsS3            1.7.1     2016-02-16 [1] CRAN (R 3.5.0)
   R.oo                   1.22.0    2018-04-22 [1] CRAN (R 3.5.0)
   R.utils                2.7.0     2018-08-27 [1] CRAN (R 3.5.0)
   R6                     2.3.0     2018-10-04 [1] CRAN (R 3.5.0)
   Rcpp                   1.0.0     2018-11-07 [1] CRAN (R 3.5.0)
   RCurl                  1.95-4.11 2018-07-15 [1] CRAN (R 3.5.0)
   readr                * 1.3.1     2018-12-21 [1] CRAN (R 3.5.0)
   readxl                 1.2.0     2018-12-19 [1] CRAN (R 3.5.0)
   remotes                2.0.2     2018-10-30 [1] CRAN (R 3.5.0)
   rhdf5                  2.26.2    2019-01-02 [1] Bioconductor  
   Rhdf5lib               1.4.2     2018-12-03 [1] Bioconductor  
   rlang                  0.3.1     2019-01-08 [1] CRAN (R 3.5.0)
   rmarkdown              1.11      2018-12-08 [1] CRAN (R 3.5.0)
   rprojroot              1.3-2     2018-01-03 [1] CRAN (R 3.5.0)
   rstudioapi             0.9.0     2019-01-09 [1] CRAN (R 3.5.0)
   rvest                  0.3.2     2016-06-17 [1] CRAN (R 3.5.0)
   S4Vectors            * 0.20.1    2018-11-09 [1] Bioconductor  
   scales                 1.0.0     2018-08-09 [1] CRAN (R 3.5.0)
   sessioninfo            1.1.1     2018-11-05 [1] CRAN (R 3.5.0)
   SingleCellExperiment * 1.4.1     2019-01-04 [1] Bioconductor  
   stringi                1.2.4     2018-07-20 [1] CRAN (R 3.5.0)
   stringr              * 1.3.1     2018-05-10 [1] CRAN (R 3.5.0)
   SummarizedExperiment * 1.12.0    2018-10-30 [1] Bioconductor  
   testthat               2.0.1     2018-10-13 [4] CRAN (R 3.5.0)
   tibble               * 2.0.1     2019-01-12 [1] CRAN (R 3.5.0)
   tidyr                * 0.8.2     2018-10-28 [1] CRAN (R 3.5.0)
   tidyselect             0.2.5     2018-10-11 [1] CRAN (R 3.5.0)
   tidyverse            * 1.2.1     2017-11-14 [1] CRAN (R 3.5.0)
   UpSetR               * 1.3.3     2017-03-21 [1] CRAN (R 3.5.0)
   usethis                1.4.0     2018-08-14 [1] CRAN (R 3.5.0)
   whisker                0.3-2     2013-04-28 [1] CRAN (R 3.5.0)
   withr                  2.1.2     2018-03-15 [1] CRAN (R 3.5.0)
   workflowr              1.1.1     2018-07-06 [1] CRAN (R 3.5.0)
   xfun                   0.4       2018-10-23 [1] CRAN (R 3.5.0)
   xml2                   1.2.0     2018-01-24 [1] CRAN (R 3.5.0)
   XVector                0.22.0    2018-10-30 [1] Bioconductor  
   yaml                   2.2.0     2018-07-25 [1] CRAN (R 3.5.0)
   zlibbioc               1.28.0    2018-10-30 [1] Bioconductor  

[1] /group/bioi1/luke/analysis/phd-thesis-analysis/packrat/lib/x86_64-pc-linux-gnu/3.5.0
[2] /group/bioi1/luke/analysis/phd-thesis-analysis/packrat/lib-ext/x86_64-pc-linux-gnu/3.5.0
[3] /group/bioi1/luke/analysis/phd-thesis-analysis/packrat/lib-R/x86_64-pc-linux-gnu/3.5.0
[4] /home/luke.zappia/R/x86_64-pc-linux-gnu-library/3.5
[5] /usr/local/installed/R/3.5.0/lib64/R/library

 P ── Loaded and on-disk path mismatch.

This reproducible R Markdown analysis was created with workflowr 1.1.1

Pre-processing