Creates a plot of a clustering tree overlaid on a scatter plot of individual samples.
Usage
clustree_overlay(x, ...)
# S3 method for matrix
clustree_overlay(
x,
prefix,
metadata,
x_value,
y_value,
suffix = NULL,
count_filter = 0,
prop_filter = 0.1,
node_colour = prefix,
node_colour_aggr = NULL,
node_size = "size",
node_size_aggr = NULL,
node_size_range = c(4, 15),
node_alpha = 1,
node_alpha_aggr = NULL,
edge_width = 1,
use_colour = c("edges", "points"),
alt_colour = "black",
point_size = 3,
point_alpha = 0.2,
point_shape = 18,
label_nodes = FALSE,
label_size = 3,
plot_sides = FALSE,
side_point_jitter = 0.45,
side_point_offset = 1,
...
)
# S3 method for data.frame
clustree_overlay(x, prefix, ...)
# S3 method for SingleCellExperiment
clustree_overlay(
x,
prefix,
x_value,
y_value,
exprs = "counts",
red_dim = NULL,
...
)
# S3 method for seurat
clustree_overlay(
x,
x_value,
y_value,
prefix = "res.",
exprs = c("data", "raw.data", "scale.data"),
red_dim = NULL,
...
)
# S3 method for Seurat
clustree_overlay(
x,
x_value,
y_value,
prefix = paste0(assay, "_snn_res."),
exprs = c("data", "counts", "scale.data"),
red_dim = NULL,
assay = NULL,
...
)
Arguments
- x
object containing clustering data
- ...
extra parameters passed to other methods
- prefix
string indicating columns containing clustering information
- metadata
data.frame containing metadata on each sample that can be used as node aesthetics
- x_value
numeric metadata column to use as the x axis
- y_value
numeric metadata column to use as the y axis
- suffix
string at the end of column names containing clustering information
- count_filter
count threshold for filtering edges in the clustering graph
- prop_filter
in proportion threshold for filtering edges in the clustering graph
- node_colour
either a value indicating a colour to use for all nodes or the name of a metadata column to colour nodes by
- node_colour_aggr
if
node_colour
is a column name than a string giving the name of a function to aggregate that column for samples in each cluster- node_size
either a numeric value giving the size of all nodes or the name of a metadata column to use for node sizes
- node_size_aggr
if
node_size
is a column name than a string giving the name of a function to aggregate that column for samples in each cluster- node_size_range
numeric vector of length two giving the maximum and minimum point size for plotting nodes
- node_alpha
either a numeric value giving the alpha of all nodes or the name of a metadata column to use for node transparency
- node_alpha_aggr
if
node_aggr
is a column name than a string giving the name of a function to aggregate that column for samples in each cluster- edge_width
numeric value giving the width of plotted edges
- use_colour
one of "edges" or "points" specifying which element to apply the colour aesthetic to
- alt_colour
colour value to be used for edges or points (whichever is NOT given by
use_colour
)- point_size
numeric value giving the size of sample points
- point_alpha
numeric value giving the alpha of sample points
- point_shape
numeric value giving the shape of sample points
- label_nodes
logical value indicating whether to add labels to clustering graph nodes
- label_size
numeric value giving the size of node labels is
label_nodes
isTRUE
- plot_sides
logical value indicating whether to produce side on plots
- side_point_jitter
numeric value giving the y-direction spread of points in side plots
- side_point_offset
numeric value giving the y-direction offset for points in side plots
- exprs
source of gene expression information to use as node aesthetics, for
SingleCellExperiment
objects it must be a name inassayNames(x)
, for aseurat
object it must be one ofdata
,raw.data
orscale.data
and for aSeurat
object it must be one ofdata
,counts
orscale.data
- red_dim
dimensionality reduction to use as a source for x_value and y_value
- assay
name of assay to pull expression and clustering data from for
Seurat
objects
Details
Data sources
Plotting a clustering tree requires information about which cluster each
sample has been assigned to at different resolutions. This information can
be supplied in various forms, as a matrix, data.frame or more specialised
object. In all cases the object provided must contain numeric columns with
the naming structure PXS
where P
is a prefix indicating that the column
contains clustering information, X
is a numeric value indicating the
clustering resolution and S
is any additional suffix to be removed. For
SingleCellExperiment
objects this information must be in the colData
slot
and for Seurat
objects it must be in the meta.data
slot. For all objects
except matrices any additional columns can be used as aesthetics.
Filtering
Edges in the graph can be filtered by adjusting the count_filter
and
prop_filter
parameters. The count_filter
removes any edges that represent
less than that number of samples, while the prop_filter
removes edges that
represent less than that proportion of cells in the node it points towards.
Node aesthetics
The aesthetics of the plotted nodes can be controlled in various ways. By
default the colour indicates the clustering resolution, the size indicates
the number of samples in that cluster and the transparency is set to 100%.
Each of these can be set to a specific value or linked to a supplied metadata
column. For a SingleCellExperiment
or Seurat
object the names of genes
can also be used. If a metadata column is used than an aggregation function
must also be supplied to combine the samples in each cluster. This function
must take a vector of values and return a single value.
Colour aesthetic
The colour aesthetic can be applied to either edges or sample points by
setting use_colour
. If "edges" is selected edges will be coloured according
to the clustering resolution they originate at. If "points" is selected they
will be coloured according to the cluster they are assigned to at the highest
resolution.
Dimensionality reductions
For SingleCellExperiment
and Seurat
objects precomputed dimensionality
reductions can be used for x or y aesthetics. To do so red_dim
must be set
to the name of a dimensionality reduction in reducedDimNames(x)
(for a
SingleCellExperiment
) or x@dr
(for a Seurat
object). x_value
and
y_value
can then be set to red_dimX
when red_dim
matches the red_dim
argument and X
is the column of the dimensionality reduction to use.
Examples
data(nba_clusts)
clustree_overlay(nba_clusts, prefix = "K", x_value = "PC1", y_value = "PC2")