scatterplot with no x variable - r

My data set has a response variable and a 2-level factor explanatory variable. Is there a function for creating a scatter plot with no x axis variable? I'd like the points to be randomly spread out along the x axis to make them easier to see, with the 2 groups differentiated by color. I'm able to create a plot by adding an "ID" variable, but is it possible to do it without one? The "ID" variable causes problems when I try to add + facet_grid(. ~ other.var) to view the same plot broken out by another factor variable.
#Create dummy data set
response <- runif(500)
group <- c(rep('group1',250), rep('group2',250))
ID <- c(seq(from=1, to=499, by=2), seq(from=2, to=500, by=2))
data <- data.frame(ID, group, response)
#plot results
ggplot() +
geom_point(data=data, aes(x=ID, y=response, color=group))

How about using geom_jitter, setting the x axis to some fixed value?
ggplot() +
geom_jitter(data=data, aes(x=1, y=response, color=group))
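As a minimal sketch of how the jitter approach could combine with the faceting mentioned in the question: the column other.var below is a made-up stand-in for the asker's second factor, and the jitter width of 0.4 is an arbitrary choice. Setting height = 0 keeps the response values unchanged, so only the horizontal position is random.

```r
library(ggplot2)

set.seed(42)
# Dummy data; other.var is a hypothetical second factor used only for faceting
data <- data.frame(
  group     = rep(c("group1", "group2"), each = 250),
  response  = runif(500),
  other.var = sample(c("a", "b"), 500, replace = TRUE)
)

# A single blank x level gives every panel the same dummy axis;
# the points are jittered horizontally only (height = 0)
p <- ggplot(data, aes(x = "", y = response, color = group)) +
  geom_jitter(width = 0.4, height = 0) +
  facet_grid(. ~ other.var) +
  labs(x = NULL) +
  theme(axis.ticks.x = element_blank())
p
```

Because no ID column is involved, each facet spreads its own points across the full panel width.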

You could plot x as the row number?
ggplot() +
geom_point(data=data, aes(x=1:nrow(data), y=response, color=group))
Or randomly order it first?
RandomOrder <- sample(1:nrow(data), nrow(data))
ggplot() +
geom_point(data=data, aes(x= RandomOrder, y=response, color=group))

Here's how you can scatter plot a variable against row index without intermediate variable:
ggplot(data = data, aes(y = response, x = seq_along(response), color = group)) +
geom_point()
To shuffle row index just add a sample function, like this:
ggplot(data = data, aes(y = response, x = sample(seq_along(response)), color = group)) +
geom_point()

Related

subgroups for discrete x Axis in ggplot2

I would like to create a sub grouping in a ggplot2 (geom_point), meaning that I would like to shift discrete x values slightly according to a subgroup (see Figure).
I could achieve that by changing the discrete values to continuous ones and adding a subgroup-dependent shift value (see Fig. B), and then manually adjusting the x labels. But I thought there is probably a more elegant way that deals with the spacing and labeling issues. Below is a minimal example which hopefully describes what I mean.
library(ggplot2)
set.seed(1)
df <- data.frame(
ID = rep(seq(1,8),2),
group = rep(LETTERS[1:4],4),
subgroup = c(rep("a",8),rep("b",8)),
value = runif(16)
)
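# Convert via factor() first: since R 4.0 data.frame() keeps strings as character,
# and as.numeric() on a character column such as "A" returns NA
df$xpos <- as.numeric(factor(df$group))+(as.numeric(factor(df$subgroup))/4)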
ggplot(data=df, aes(x=group, y= value, color=subgroup))+
geom_point()+
ggtitle("How it is")
ggplot(data=df, aes(x=xpos, y= value, color=subgroup))+
geom_point() +
ggtitle("How I would like it (without adjusted xAxes Labels)")
We can use position_dodge:
library(ggplot2)
ggplot(data=df, aes(x=group, y= value, color=subgroup))+
geom_point(position=position_dodge(width=0.5))+
ggtitle("How it is")
Data
set.seed(1)
df <- data.frame(
ID = rep(seq(1,8),2),
group = rep(LETTERS[1:4],4),
subgroup = c(rep("a",8),rep("b",8)),
value = runif(16)
)

Density over histogram using ggplot2

I have "long" format data frame which contains two columns: first col - values, second col- sex [Male - 1/Female - 2]. I wrote some code to make a histogram of entire dataset (code below).
ggplot(kz6, aes(x = values)) +
geom_histogram()
However, I also want to add a density over the histogram to emphasize the difference between sexes, i.e. I want to combine 3 plots: a histogram for the entire dataset and 2 density plots, one for each sex. I tried to use some examples (one, two, three, four), but it still does not work. The code for density alone works, while the combinations of histogram + density do not.
density <- ggplot(kz6, aes(x = x, fill = factor(sex))) +
geom_density()
both <- ggplot(kz6, aes(x = values)) +
geom_histogram() +
geom_density()
both_2 <- ggplot(kz6, aes(x = values)) +
geom_histogram() +
geom_density(aes(x = kz6[kz6$sex == 1,]))
P.S. Some examples contain y=..density.. What does it mean? How should I interpret it?
To plot a histogram and superimpose two densities, defined by a categorical variable, use appropriate aesthetics in the call to geom_density, like group or colour.
ggplot(kz6, aes(x = values)) +
geom_histogram(aes(y = ..density..), bins = 20) +
geom_density(aes(group = sex, colour = sex), adjust = 2)
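On the P.S. about y = ..density..: it maps the bar height to the density computed by the histogram stat instead of the raw count, so that the bar areas sum to 1 and the histogram shares a y scale with the density curves. A minimal sketch follows, assuming ggplot2 3.3.0 or later, where after_stat(density) is the newer spelling of ..density..; the simulated data is made up for illustration and is not the asker's.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for kz6: two groups with different means
kz6 <- data.frame(
  values = c(rnorm(100, mean = 0), rnorm(100, mean = 2)),
  sex    = rep(c("M", "F"), each = 100)
)

p <- ggplot(kz6, aes(x = values)) +
  # after_stat(density) rescales counts so the total bar area is 1
  geom_histogram(aes(y = after_stat(density)), bins = 20) +
  # one density curve per sex, drawn on the same density scale
  geom_density(aes(colour = sex))
p

# Check the interpretation: bar height times bar width sums to 1
bars <- layer_data(p, 1)
sum(bars$density * (bars$xmax - bars$xmin))
```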
Data creation code.
I will create a test data set from built-in data set iris.
kz6 <- iris[iris$Species != "virginica", 4:5]
kz6$sex <- "M"
kz6$sex[kz6$Species == "versicolor"] <- "F"
kz6$Species <- NULL
names(kz6)[1] <- "values"
head(kz6)

How to use sec_axis() for discrete data in ggplot2 R?

I have discrete data that looks like this:
height <- c(1,2,3,4,5,6,7,8)
weight <- c(100,200,300,400,500,600,700,800)
person <- c("Jack","Jim","Jill","Tess","Jack","Jim","Jill","Tess")
set <- c(1,1,1,1,2,2,2,2)
dat <- data.frame(set,person,height,weight)
I'm trying to plot a graph with the same x-axis (person) and 2 different y-axes (weight and height). All the examples I find either plot the secondary axis (sec_axis) or handle discrete data using base plots.
Is there an easy way to use sec_axis for discrete data in ggplot2?
Edit: Someone in the comments suggested I try the suggested reply. However, I now run into an error.
Here is my current code:
p1 <- ggplot(data = dat, aes(x = person, y = weight)) +
geom_point(color = "red") + facet_wrap(~set, scales="free")
p2 <- p1 + scale_y_continuous("height",sec_axis(~.*1.2, name="height"))
p2
I get the error: Error in x < range[1] :
comparison (3) is possible only for atomic and list types
Alternately, now I have modified the example to match this example posted.
p <- ggplot(dat, aes(x = person))
p <- p + geom_line(aes(y = height, colour = "Height"))
# adding the relative weight data, transformed to match roughly the range of the height
p <- p + geom_line(aes(y = weight/100, colour = "Weight"))
# now adding the secondary axis, following the example in the help file ?scale_y_continuous
# and, very important, reverting the above transformation
p <- p + scale_y_continuous(sec.axis = sec_axis(~.*100, name = "Relative weight [%]"))
# modifying colours and theme options
p <- p + scale_colour_manual(values = c("blue", "red"))
p <- p + labs(y = "Height [inches]",
x = "Person",
colour = "Parameter")
p <- p + theme(legend.position = c(0.8, 0.9))+ facet_wrap(~set, scales="free")
p
I get an error that says
"geom_path: Each group consists of only one observation. Do you need to
adjust the group aesthetic?"
I get the template, but no points get plotted
R matches function arguments by position when their names are not given explicitly. As mentioned by @Z.Lin in the comments, you need sec.axis= before your sec_axis call to indicate that you are passing it to the sec.axis argument of scale_y_continuous. If you don't do that, it is passed to the second argument of scale_y_continuous, which is breaks=. The error message thus arises because a sec_axis object is not an acceptable value for the breaks argument:
p1 <- ggplot(data = dat, aes(x = person, y = weight)) +
geom_point(color = "red") + facet_wrap(~set, scales="free")
p2 <- p1 + scale_y_continuous("weight", sec.axis = sec_axis(~.*1.2, name="height"))
p2
The first argument (name=) of scale_y_continuous is for the first y scale, whereas the sec.axis= argument is for the second y scale. I changed your first y scale name to correct that.
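The second message in the question ("geom_path: Each group consists of only one observation") is a separate issue. With a discrete x, ggplot2 by default puts each person in its own group, so geom_line has nothing to connect. A minimal sketch of one common workaround, assuming a single line across all persons within each facet is what is wanted, is to set group = 1:

```r
library(ggplot2)

dat <- data.frame(
  set    = c(1, 1, 1, 1, 2, 2, 2, 2),
  person = c("Jack", "Jim", "Jill", "Tess", "Jack", "Jim", "Jill", "Tess"),
  height = 1:8,
  weight = seq(100, 800, by = 100)
)

# group = 1 tells geom_line to treat all persons as one series
p <- ggplot(dat, aes(x = person, group = 1)) +
  geom_line(aes(y = height, colour = "Height")) +
  # weight is divided by 100 to share the primary axis ...
  geom_line(aes(y = weight / 100, colour = "Weight")) +
  # ... and the secondary axis multiplies by 100 to undo that scaling
  scale_y_continuous(name = "Height",
                     sec.axis = sec_axis(~ . * 100, name = "Weight")) +
  facet_wrap(~set, scales = "free")
p
```

In this particular data set weight / 100 equals height exactly, so the two lines are drawn on top of each other.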

Adding multiple points to a ggplot ecdf plot

I'm trying to generate a ggplot only C.D.F. plot for some of my data. I am also looking to be able to plot an arbitrary number of percentiles as points on top. I have a solution that works for adding a single point to my curve but fails for multiple values.
This works for plotting one percentile value
TestDf <- as.data.frame(rnorm(1000))
names(TestDf) <- c("Values")
percentiles <- c(0.5)
ggplot(data = TestDf, aes(x = Values)) +
stat_ecdf() +
geom_point(aes(x = quantile(TestDf$Values, percentiles),
y = percentiles))
However this fails
TestDf <- as.data.frame(rnorm(1000))
names(TestDf) <- c("Values")
percentiles <- c(0.25,0.5,0.75)
ggplot(data = TestDf, aes(x = Values)) +
stat_ecdf() +
geom_point(aes(x = quantile(TestDf$Values, percentiles),
y = percentiles))
With error
Error: Aesthetics must be either length 1 or the same as the data (1000): x, y
How can I add an arbitrary number of points to a stat_ecdf() plot?
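You need to pass the points as a new data set through the layer's data argument instead of computing them inside aes(). Expressions inside aes() are evaluated against the layer's data, which here is the 1000-row data frame inherited from the original ggplot() call, so they must have length 1 or 1000.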
ggplot(data = TestDf, aes(x = Values)) +
stat_ecdf() +
geom_point(data = data.frame(x=quantile(TestDf$Values, percentiles),
y=percentiles), aes(x=x, y=y))

Violin plots with additional points

Suppose I make a violin plot, with say 10 violins, using the following code:
library(ggplot2)
library(reshape2)
df <- melt(data.frame(matrix(rnorm(500),ncol=10)))
p <- ggplot(df, aes(x = variable, y = value)) +
geom_violin()
p
I can add a dot representing the mean of each variable as follows:
p + stat_summary(fun.y=mean, geom="point", size=2, color="red")
How can I do something similar but for arbitrary points?
For example, if I generate 10 new points, one drawn from each distribution, how could I plot those as dots on the violins?
You can give any function to stat_summary, provided it returns a single value, so one can use the function sample. Put extra arguments, such as size, in fun.args.
p + stat_summary(fun.y = "sample", geom = "point", fun.args = list(size = 1))
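In ggplot2 3.3.0 and later the fun.y argument is deprecated in favour of fun. A minimal sketch of the same idea with the newer argument name follows; the simulated data frame stands in for the melted data from the question, and the red colour and point size are arbitrary choices.

```r
library(ggplot2)

set.seed(1)
# Stand-in for the melted data: 10 variables with 50 draws each
vars <- paste0("X", 1:10)
df <- data.frame(
  variable = factor(rep(vars, each = 50), levels = vars),
  value    = rnorm(500)
)

# For each violin, draw one randomly sampled observation as a point
p <- ggplot(df, aes(x = variable, y = value)) +
  geom_violin() +
  stat_summary(fun = function(v) sample(v, 1),
               geom = "point", size = 2, colour = "red")
p
```

Each time the plot is built a fresh draw is taken, so call set.seed() beforehand if the points need to be reproducible.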
Assuming your points are qualified using the same group names (i.e., variable), you should be able to define them manually with:
library(dplyr)
newdf <- group_by(df, variable) %>% sample_n(10)
p + geom_point(data=newdf)
The points can be anything, including static numbers:
newdf <- data.frame(variable = unique(df$variable), value = seq(-2, 2, len=10))
p + geom_point(data=newdf)
I had a similar problem. Code below exemplifies the toy problem - How does one add arbitrary points to a violin plot? - and solution.
## Visualize data set that comes in base R
head(ToothGrowth)
## Make a violin plot with dose variable on x-axis, len variable on y-axis
# Convert dose variable to factor - Important!
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
# Plot
p <- ggplot(ToothGrowth, aes(x=dose, y=len)) +
geom_violin(trim = FALSE) +
geom_boxplot(width=0.1)
# Suppose you want to add 3 blue points
# [0.5, 10], [1,20], [2, 30] to the plot.
# Make a new data frame with these points
# and add them to the plot with geom_point().
TrueVals <- ToothGrowth[1:3,]
TrueVals$len <- c(10,20,30)
# Make dose variable a factor - Important for positioning points correctly!
TrueVals$dose <- as.factor(c(0.5, 1, 2))
# Plot with 3 added blue points
p <- ggplot(ToothGrowth, aes(x=dose, y=len)) +
geom_violin(trim = FALSE) +
geom_boxplot(width=0.1) +
geom_point(data = TrueVals, color = "blue")
