The sina plot is a data visualization chart suitable for plotting a single continuous variable across the classes of a multiclass dataset. It is an enhanced jitter strip chart in which the width of the jitter is controlled by the density distribution of the data within each class.

```r
stat_sina(mapping = NULL, data = NULL, geom = "sina",
  position = "dodge", scale = "area", method = "density",
  bw = "nrd0", kernel = "gaussian", maxwidth = NULL, adjust = 1,
  bin_limit = 1, binwidth = NULL, bins = NULL, seed = NA, ...,
  na.rm = FALSE, show.legend = NA, inherit.aes = TRUE)

geom_sina(mapping = NULL, data = NULL, stat = "sina",
  position = "dodge", ..., na.rm = FALSE, show.legend = NA,
  inherit.aes = TRUE)
```
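As an illustration of the parameters in the signatures above, the sketch below builds two plots from the `mpg` dataset that ships with ggplot2 (chosen here only as convenient example data). Parameters such as `method`, `bins`, `maxwidth`, `adjust` and `seed` are passed to `geom_sina()` through `...` and forwarded to the stat. This is a usage sketch, not a recommended configuration.

```r
library(ggplot2)
library(ggforce)

# Bin-based spreading over 20 y-axis bins, with points confined to 80% of
# the slot width; `seed` fixes the random horizontal placement.
p_counts <- ggplot(mpg, aes(class, hwy)) +
  geom_sina(method = "counts", bins = 20, maxwidth = 0.8, seed = 42)

# Density-based spreading (the default) with half the default bandwidth,
# which makes the outline follow the data more closely.
p_density <- ggplot(mpg, aes(class, hwy)) +
  geom_sina(method = "density", adjust = 0.5, seed = 42)
```

Print either object to render it. Setting `seed` is mainly useful when a figure must look identical every time it is regenerated.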

## Details

There are two ways to define the horizontal (x-axis) limits within which the samples of each group are spread:

• `method = "density"`: A kernel density is estimated along the y-axis for every group, and the samples are spread within that curve. In effect, the points are placed randomly within a violin plot drawn with the same parameters.

• `method = "counts"`: The limits are defined by the number of samples that fall in the same y-axis bin (set with `bins` or `binwidth`). Samples in bins holding no more than `bin_limit` samples are not spread.
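To make the two methods concrete, here is a simplified base-R sketch for a single group. It is an illustration under stated assumptions (uniform jitter scaled by the local density or bin count, equal-width bins over the group's y-range, a default `maxwidth` of 0.96), not ggforce's source code; the helper name `sina_offsets()` is hypothetical.

```r
# Hypothetical helper: horizontal offsets for ONE group of y values,
# sketching the "density" and "counts" methods described above.
# Assumptions: uniform jitter scaled by local density, equal-width bins
# over the group's y-range, maxwidth = 0.96 by default.
sina_offsets <- function(y, method = c("density", "counts"),
                         maxwidth = 0.96, adjust = 1,
                         bins = 50, bin_limit = 1) {
  method <- match.arg(method)
  stopifnot(length(y) >= 3, diff(range(y)) > 0)

  if (method == "density") {
    # Gaussian kernel density along the y-axis, evaluated at each point
    dens <- stats::density(y, bw = "nrd0", adjust = adjust,
                           kernel = "gaussian")
    local_dens <- stats::approx(dens$x, dens$y, xout = y, rule = 2)$y
    scaled <- local_dens / max(dens$y)
  } else {
    # Number of points sharing each equal-width y-bin
    breaks <- seq(min(y), max(y), length.out = bins + 1)
    bin_index <- cut(y, breaks, include.lowest = TRUE, labels = FALSE)
    local_dens <- tabulate(bin_index, nbins = bins)[bin_index]
    # Points in bins with at most `bin_limit` members stay on the centre line
    local_dens[local_dens <= bin_limit] <- 0
    scaled <- if (max(local_dens) > 0) local_dens / max(local_dens) else local_dens
  }

  # Uniform jitter whose half-width is proportional to the scaled density
  stats::runif(length(y), min = -1, max = 1) * maxwidth * scaled / 2
}

# Usage: spread a bimodal sample around x = 1
set.seed(1)
y <- c(rnorm(200, 3, 0.7), rnorm(50, 7, 0.4))
x <- 1 + sina_offsets(y, method = "density")
```

Plotting `x` against `y` (for example with `plot(x, y)`) shows the points filling a violin-shaped outline, which is the effect the density method aims for.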

## Aesthetics

`geom_sina()` understands the following aesthetics (required aesthetics are in bold):

• **x**

• **y**

• color

• group

• size

• alpha
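A short sketch of these aesthetics in use, again assuming the `mpg` example data from ggplot2: the required `x` and `y` are mapped to columns, `colour` is mapped to a grouping variable, and `size` and `alpha` are set to fixed values for every point.

```r
library(ggplot2)
library(ggforce)

# x and y are required; colour is mapped to a column, while size and
# alpha are constants applied to every point.
p <- ggplot(mpg, aes(x = drv, y = displ)) +
  geom_sina(aes(colour = factor(cyl)), size = 1.5, alpha = 0.6)
```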

## Computed variables

density

The density (for `method = "density"`) or the sample count per bin (for `method = "counts"`) at each point

scaled

The density scaled by the maximum density in each group

n

The number of points in the group the point belongs to
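Computed variables can be mapped back onto aesthetics. The sketch below, assuming ggplot2 3.3.0 or later (where `after_stat()` is available) and the `mpg` example data, colours each point by its `scaled` density so that points in dense regions stand out.

```r
library(ggplot2)
library(ggforce)

# Colour each point by the within-group scaled density computed by the stat
p <- ggplot(mpg, aes(drv, displ)) +
  geom_sina(aes(colour = after_stat(scaled))) +
  labs(colour = "scaled")
```

The same pattern works with `after_stat(density)` or `after_stat(n)`.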

## Examples

```r
library(ggplot2)
library(ggforce)

ggplot(midwest, aes(state, area)) + geom_point()

# Box plots and violin plots convey information on the distribution but not
# on the number of samples, while jittered points do the opposite.
ggplot(midwest, aes(state, area)) + geom_violin()
ggplot(midwest, aes(state, area)) + geom_jitter()

# Sina does both!
ggplot(midwest, aes(state, area)) + geom_violin() + geom_sina()

p <- ggplot(midwest, aes(state, popdensity)) + scale_y_log10()
p + geom_sina()

# Colour the points based on the data set's columns
p + geom_sina(aes(colour = inmetro))

# Or any other way
cols <- midwest$popdensity > 10000
p + geom_sina(colour = cols + 1L)

# Sina plots with continuous x:
ggplot(midwest, aes(cut_width(area, 0.02), popdensity)) +
  geom_sina() +
  scale_y_log10()

### Sample gaussian distributions
# Unimodal
a <- rnorm(500, 6, 1)
b <- rnorm(400, 5, 1.5)

# Bimodal
c <- c(rnorm(200, 3, .7), rnorm(50, 7, 0.4))

# Trimodal
d <- c(rnorm(200, 2, 0.7), rnorm(300, 5.5, 0.4), rnorm(100, 8, 0.4))

df <- data.frame(
  'Distribution' = c(
    rep('Unimodal 1', length(a)),
    rep('Unimodal 2', length(b)),
    rep('Bimodal', length(c)),
    rep('Trimodal', length(d))
  ),
  'Value' = c(a, b, c, d)
)

# Reorder levels explicitly: since R 4.0.0, data.frame() keeps strings as
# characters, so levels(df$Distribution) would be NULL here
df$Distribution <- factor(
  df$Distribution,
  levels = c('Unimodal 1', 'Unimodal 2', 'Bimodal', 'Trimodal')
)

p <- ggplot(df, aes(Distribution, Value))
p + geom_boxplot()
p + geom_violin() + geom_sina()

# By default, the sina plot scales the width of each class relative to the
# class with the highest density. Turn group-wise scaling off with:
p + geom_violin() + geom_sina(scale = FALSE)
```
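The usage section lists `scale = "area"` as the default. Assuming an installed ggforce version whose `scale` argument also accepts `"count"` and `"width"` (mirroring the options of `geom_violin()`), the three modes can be compared side by side. The `mpg` data and the matching violin layer are choices made for this sketch only.

```r
library(ggplot2)
library(ggforce)

# One plot per scaling mode; the violin uses the same mode so the sina
# points and the outline are scaled consistently.
modes <- c("area", "count", "width")
plots <- lapply(modes, function(s) {
  ggplot(mpg, aes(class, displ)) +
    geom_violin(scale = s) +
    geom_sina(scale = s, seed = 1) +
    ggtitle(paste0('scale = "', s, '"'))
})
```

With `"area"` the widest class sets the reference width, `"count"` additionally shrinks classes with few observations, and `"width"` gives every class the same maximum width.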