I am trying to display some data, where I don't only need to display a point using geom_point, but also want to trace a line to it from the axis. I figured I can do it with geom_segment, but I want to display a sequence of discrete dots instead.
Say I have data like this:
df2 <- tibble::tibble(x = c("a", "b", "c", "d"), y = c(3:6))
# A tibble: 4 × 2
x y
<chr> <int>
1 a 3
2 b 4
3 c 5
4 d 6
What I want to get is like the graph below, only with a dot at every integer between 0 and the value for each of the 4 categories (with the desired points marked manually in red):
ggplot(df2, aes(x=x)) + geom_point(aes(y=y)) + geom_point(aes(y=0))
This works... you could wrap it up in a function to make it more generalizable if needed.
First we use expand.grid to create all combinations of x and 1:(max(y) - 1), join it to the original data, and filter out the unnecessary ones.
library(dplyr)
df3 = left_join(expand.grid(x = unique(df2$x), i = 1:max(df2$y - 1),
stringsAsFactors = FALSE), # keep x as character so it matches df2$x
df2, by = "x") %>%
filter(i < y)
Once the data is constructed, the plotting is easy:
ggplot(df2, aes(x=x)) +
geom_point(aes(y=y)) +
geom_point(y = 0) +
geom_point(data = df3, aes(y = i), color = "red") +
expand_limits(y = 0)
I'm not sure if you actually want the dots to be red. If you want them all to look the same, you could use 1:max(df2$y) (omit the -1), use <= in the filter, and then plot only the resulting data frame.
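As a rough sketch of the "wrap it up in a function" idea, assuming the dots should run from 0 up to each value in steps of 1. The helper name expand_dots and its arguments are made up for illustration; they are not part of ggplot2 or dplyr.

```r
# Hypothetical helper: one row per dot, from 0 up to each category's value.
# Assumes the values are non-negative whole numbers.
expand_dots <- function(data, x = "x", y = "y") {
  counts <- data[[y]] + 1L              # number of dots per category, including 0
  data.frame(
    x = rep(data[[x]], times = counts),
    y = sequence(counts) - 1L           # 0, 1, ..., value within each category
  )
}

df2 <- data.frame(x = c("a", "b", "c", "d"), y = 3:6)
dots <- expand_dots(df2)
head(dots, 4)                           # the four dots for category "a": y = 0, 1, 2, 3
```

The result can go straight to ggplot(dots, aes(x, y)) + geom_point() if all dots should look the same.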
If you wanted to use a data.table approach, using a similar expansion methodology you could use:
library(data.table)
dt <- setDT(df2) # converts df2 to a data.table by reference
dt_expand<-dt[rep(seq(nrow(dt)),dt$y),]
dt_expand[,y2:=(1:.N),by=.(x)]
ggplot(dt_expand, aes(x=x)) + geom_point(aes(y=y2)) + geom_point(aes(y=0))
Note I didn't include the red coloring, but that is easily done if you want it.
Here is a solution in base R. The idea is to create 2 different datasets, one for the red points:
dat1 <- do.call(rbind,Map(function(x,y)data.frame(x=x,y=seq(0,y)),df2$x,df2$y))
And another for the black points:
dat2 <- do.call(rbind,Map(function(x,y)data.frame(x=x,y=c(0,y)),df2$x,df2$y))
Then the plot is just the juxtaposition of 2 layers of the same plot but with different data:
library(ggplot2)
ggplot(data=dat1,aes(x=x,y=y)) +
geom_point(col="red") +
geom_point(data=dat2)
Yet another option, which is similar to @Gregor's, in that it's creating a new data vector.
d <- data.frame(x = c("a", "b", "c" ,"d"), y = c(3:6))
new_points <- mapply(seq, 0, d$y)
new <- data.frame(new = unlist(lapply(new_points, as.data.frame)),
x = rep(letters[1:4], d$y + 1),
group = 1)
d <- merge(d, new, by = "x")
d$group <- as.factor(ifelse(d$y == d$new|d$new == 0, 2, d$group))
ggplot(d, aes(x, new, color = group)) +
geom_point() +
scale_color_manual(values = c("red", "black")) +
theme(legend.position = "none")
I'd like to insert another column value of my data into a gganimate animation title.
Example, here the states level variable is x and I'd like to add to title variable y:
df <- tibble(x = 1:10, y = c('a', 'a', 'b', 'd', 'c', letters[1:5]))
df
# A tibble: 10 x 2
x y
<int> <chr>
1 1 a
2 2 a
3 3 b
4 4 d
5 5 c
6 6 a
7 7 b
8 8 c
9 9 d
10 10 e
This works as expected:
ggplot(df, aes(x, x)) +
geom_point() +
labs(title = '{closest_state}') +
transition_states(x,
transition_length = 0.1,
state_length = 0.1)
This fails:
ggplot(df, aes(x, x)) +
geom_point() +
labs(title = '{closest_state}, another_var: {y}') +
transition_states(x,
transition_length = 0.1,
state_length = 0.1)
Error in eval(parse(text = text, keep.source = FALSE), envir) :
object 'y' not found
Also tried this, but y will not change:
ggplot(df, aes(x, x)) +
geom_point() +
labs(title = str_c('{closest_state}, another_var: ', df$y)) +
transition_states(x,
transition_length = 0.1,
state_length = 0.1)
Another option is to map y as the states level variable and use the frame variable instead of x. But in my application y is either a not-necessarily-unique character variable like above, or a numeric variable that is again not necessarily unique and not necessarily ordered. In that case gganimate (or ggplot?) will order it as it sees fit, so the final result is oddly not ordered by x:
ggplot(df, aes(x, x)) +
geom_point() +
labs(title = '{frame}, another_var: {closest_state}') +
transition_states(y,
transition_length = 0.1,
state_length = 0.1)
So how can I simply add the changing value of the unordered, non-numeric y variable?
Finally: This question was asked here but without a reproducible example, so it was not answered. I hope this one is better.
One dirty solution would be to paste together the variables and make a new one to use in the transition_states:
df <- mutate(df, title_var = factor(paste(x, y, sep="-"), levels = paste(x, y, sep="-")))
# # A tibble: 6 x 3
# x y title_var
# <int> <chr> <fct>
# 1 1 a 1-a
# 2 2 a 2-a
# 3 3 b 3-b
# 4 4 d 4-d
# 5 5 c 5-c
# 6 6 a 6-a
Then we could use gsub() in order to strip the unwanted part from closest_state, like this:
gsub(pattern = "\\d+-", replacement = "", "1-a")
"a"
So:
ggplot(df, aes(x, x)) +
geom_point() +
labs(title = '{gsub(pattern = "\\d+-", replacement = "", closest_state)}') +
transition_states(title_var, transition_length = 0.1, state_length = 0.1)
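A small base R sketch of why the levels argument matters in title_var, and of what the pattern strips. It rebuilds the same vectors as plain variables rather than columns of df, and anchors the pattern with ^, which is my own variation on the gsub() call above.

```r
x <- 1:10
y <- c("a", "a", "b", "d", "c", letters[1:5])
key <- paste(x, y, sep = "-")

# Explicit levels keep the states in data order; without them factor()
# would sort the keys as strings, and "10-e" would not come last.
title_var <- factor(key, levels = key)
levels(title_var)

# Stripping the leading digits and dash recovers y for each state.
sub("^\\d+-", "", levels(title_var))
```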
Another possibility, slightly more compact, from the author of gganimate himself, following the issue I opened:
https://github.com/thomasp85/gganimate/issues/252#issuecomment-450846868
According to Thomas:
There are multiple reasons why random columns from the input data
cannot be accessed so it is unlikely to get any better than this...
Here's a solution using dplyr, based on the gganimate developer Thomas's solution, provided by Giora.
library(tidyverse)
library(gganimate)
df <- tibble::tibble(x = 1:10, y = c('a', 'a', 'b', 'd', 'c', letters[1:5]))
a <- ggplot(df, aes(x, x)) +
geom_point() +
labs(title = "{closest_state}, another_var: {df %>% filter(x == closest_state) %>% pull(y)}") +
transition_states(x,
transition_length = 0.1,
state_length = 0.1)
animate(a)
The gganimate titles use glue syntax for the animated title elements, and you can include entire dplyr data manipulation pipelines within them.
You can refer to the closest_state variable provided by gganimate::transition_states() within your dplyr calls. Here, since the animation's frames are indexed by successive levels of x, I use filter() to subset df for a given frame based on the value of x and then refer to corresponding rows of column y, which contain additional information I'd like to display in the title. Using pull, you can grab the individual value of y corresponding to x and display it within the animation's title.
This is a clean and straightforward way to do it with the advantage that you can, e.g., compute summary values to display on-the-fly by adding summarize() and other calls in your magrittr pipeline.
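The lookup inside the braces does not have to be a dplyr pipeline. Here is a minimal base R sketch of the same idea; y_for_state is a hypothetical helper name, and it assumes closest_state is passed in as a character string.

```r
df <- data.frame(x = 1:10,
                 y = c("a", "a", "b", "d", "c", letters[1:5]),
                 stringsAsFactors = FALSE)

# Hypothetical helper: look up the y value that belongs to one state.
# The state is compared as character because it arrives as a string.
y_for_state <- function(closest_state, data = df) {
  data$y[as.character(data$x) == closest_state]
}

y_for_state("4")    # "d"
y_for_state("10")   # "e"
```

In the title this would presumably read "{closest_state}, another_var: {y_for_state(closest_state)}", assuming the helper is visible from wherever gganimate evaluates the title string.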
I am trying to present the following data:
x <- factor(c(1,2,3,4,5))
x
[1] 1 2 3 4 5
Levels: 1 2 3 4 5
value <- c(10,5,7,4,12)
value
[1] 10 5 7 4 12
y <- data.frame(x, value)
y
x value
1 1 10
2 2 5
3 3 7
4 4 4
5 5 12
I want to convert the above information into the following graphical representation:
What is this type of graph called? I checked out dot plots, but those only stack vertically.
This solution plots sets of three bar graphs faceted by x. The height of the bars within each set is determined by the integer quotient of value divided by 3, with the remainder added to the first one or two bars. Horizontal spacing is provided by natural geom spacing. Vertical spacing is created using white gridlines.
library(ggplot2)
library(reshape2)
Data
dataset <- data.frame('x' = 1:5, 'value' = c(10, 5, 7, 4, 12))
Since every value is supposed to be represented by three bars, we will add 3 columns to the dataset and distribute the magnitude of the value among them using integer division:
dataset[, c('col1', 'col2', 'col3')] <- floor(dataset$value / 3)
r <- dataset$value %% 3
dataset[r == 1, 'col1'] <- dataset[r == 1, 'col1'] + 1
dataset[r == 2, c('col1', 'col2')] <- dataset[r == 2, c('col1', 'col2')] + 1
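To check the arithmetic, here is the same split written as a standalone sketch with vectors instead of data frame columns (the names base and bars are mine). Each row should add back up to the original value.

```r
value <- c(10, 5, 7, 4, 12)
base <- value %/% 3      # whole dots every bar gets
r <- value %% 3          # leftover dots: 0, 1 or 2

bars <- cbind(col1 = base + (r >= 1),   # first leftover dot goes to col1
              col2 = base + (r == 2),   # second leftover dot goes to col2
              col3 = base)
bars
rowSums(bars)            # 10 5 7 4 12, the original values
```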
Now, we will melt the dataframe for the purposes of plotting:
dataset <- melt(dataset, id.vars = c('x', 'value'))
colnames(dataset)[4] <- 'magnitude' # avoiding colnames conflict
dataset$variable <- as.character(dataset$variable) # column ordering within a facet
Plot
First, we will make a regular bar graph. We can move facet labels to the bottom of the plot area using the switch parameter.
plt <- ggplot(data = dataset)
plt <- plt + geom_col(aes(x=variable, y = magnitude), fill = 'black')
plt <- plt + facet_grid(.~x, switch="both")
Then we will use theme_minimal() and add a few tweaks to the parameters that govern the appearance of gridlines. Specifically, we will make sure that minor XY gridlines and major X gridlines are blank, whereas major Y gridlines are white and plotted on top of the data.
plt <- plt + theme_minimal()
plt <- plt + theme(panel.grid.major.x = element_blank(),
panel.grid.major.y = element_line(colour = "white", size = 1.5),
panel.grid.minor = element_blank(),
panel.ontop = TRUE)
We can add value labels using geom_text(). We will only use x values from col2 records such that we're not plotting the value over each bar within each set (col2 happens to be the middle bar).
plt <- plt + geom_text(data = dataset[dataset$variable == 'col2', ],
aes(label = value, x = variable, y = magnitude + 0.5))
plt <- plt + theme(axis.text.x=element_blank()) # removing the 'col' labels
plt + xlab('x') + ylab('value')
The following code will do a graph similar to the one in the question.
I had to change the data.frame, yours was not fit to graph with geom_dotplot. The new variable z$value is a vector of the values 1:5 each repeated as many times as value.
library(ggplot2)
value <- c(10, 5, 7, 4, 12)
z <- sapply(value, function(v) c(1, rep(0, v - 1)))
z <- cumsum(unlist(z))
z <- data.frame(value = z)
ggplot(z, aes(x = jitter(value))) +
geom_dotplot() +
xlab("value")
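For what it's worth, the same vector can be built more directly with rep(). This is an alternative sketch, not the code from the answer above.

```r
value <- c(10, 5, 7, 4, 12)

# Bar index i repeated value[i] times: 1 ten times, 2 five times, and so on.
z <- data.frame(value = rep(seq_along(value), times = value))
table(z$value)   # counts per bar
```

The data frame z can then be passed to the same ggplot() call.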
Say I have the following data frame:
# Dummy data frame
df <- data.frame(x = rep(1:5, 2), y = runif(10), z = rep(c("A", "B"), each = 5))
# x y z
# 1 1 0.92024937 A
# 2 2 0.37246007 A
# 3 3 0.76632809 A
# 4 4 0.03418754 A
# 5 5 0.33770400 A
# 6 1 0.15367174 B
# 7 2 0.78498276 B
# 8 3 0.03341913 B
# 9 4 0.77484244 B
# 10 5 0.13309999 B
I'd like to plot cases where z == "A" as points and cases where z == "B" as lines. Simple enough.
library(ggplot2)
library(dplyr) # for filter() and %>%
# Plot data
g <- ggplot()
g <- g + geom_point(data = df %>% filter(z == "A"), aes(x = x, y = y))
g <- g + geom_line(data = df %>% filter(z == "B"), aes(x = x, y = y))
g
My data frame and aesthetic for the points and lines are identical, so this seems a bit verbose – especially if I want to do this lots of times (e.g., z == "A" through z == "Z"). Is there a way that I could state ggplot(df, aes(x = x, y = y)) and then subsequently state my filtering or subsetting criteria within the appropriate geoms?
I find the example in the question itself the most readable, although verbose. The second part of the question about dealing with more cases just requires a more sophisticated test in filter, using for example %in% (or grep, grepl, etc.) when dealing with multiple cases. Taking advantage of the possibility of accessing default plot data within a layer, and, as mentioned by @MrFlick, moving the mapping of aesthetics out of the individual layers results in more concise code. All earlier answers get the plot done, so in this respect my answer is not better than any of them...
library(ggplot2)
library(dplyr)
df <- data.frame(x = rep(1:5, 4),
y = runif(20),
z = rep(c("A", "B", "C", "Z"), each = 5))
g <- ggplot(data = df, aes(x = x, y = y)) +
geom_point(data = . %>% filter(z %in% c("A", "B", "C"))) +
geom_line(data = . %>% filter(z == "Z"))
g
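The data argument of a layer also accepts an ordinary function of the default plot data, so the same thing can be sketched without dplyr. keep_z below is a hypothetical helper of my own, not something ggplot2 provides.

```r
# Hypothetical helper: returns a function that keeps only the given z levels.
keep_z <- function(levels) {
  function(data) data[data$z %in% levels, , drop = FALSE]
}

df <- data.frame(x = rep(1:5, 4),
                 y = runif(20),
                 z = rep(c("A", "B", "C", "Z"), each = 5))

points_only <- keep_z(c("A", "B", "C"))
nrow(points_only(df))   # 15 rows: five each for A, B and C
```

In the plot this would read geom_point(data = keep_z(c("A", "B", "C"))) + geom_line(data = keep_z("Z")).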
Another option would be to spread the data and then just supply the y aesthetic.
library(tidyverse)
df %>% spread(z,y) %>%
ggplot(aes(x = x))+
geom_point(aes(y = A))+
geom_line(aes(y = B))
You can plot lines and points for all z records, but remove unwanted lines and points by passing NA to scale_linetype_manual and scale_shape_manual:
library(ggplot2)
ggplot(df, aes(x, y, linetype = z, shape = z)) +
geom_line() +
geom_point() +
scale_linetype_manual(values = c(NA, 1)) + # no line for A, solid line for B
scale_shape_manual(values = c(16, NA)) # points for A, none for B
I need to add a legend of the two lines (best fit line and 45 degree line) on TOP of my two plots. Sorry I don't know how to add plots! Please please please help me, I really appreciate it!!!!
Here is an example
type=factor(rep(c("A","B","C"),5))
xvariable=seq(1,15)
yvariable=2*xvariable+rnorm(15,0,2)
newdata=data.frame(type,xvariable,yvariable)
p = ggplot(newdata,aes(x=xvariable,y=yvariable))
p+geom_point(size=3)+ facet_wrap(~ type) +
geom_abline(intercept =0, slope =1,color="red",size=1)+
stat_smooth(method="lm", se=FALSE,size=1)
Here is another approach which uses aesthetic mapping to string constants to identify different groups and create a legend.
First an alternate way to create your test data (and naming it DF instead of newdata)
DF <- data.frame(type = factor(rep(c("A", "B", "C"), 5)),
xvariable = 1:15,
yvariable = 2 * (1:15) + rnorm(15, 0, 2))
Now the ggplot code. Note that for both geom_abline and stat_smooth, the colour is set inside an aes call, which means each of the two values used will be mapped to a different color and a guide (legend) will be created for that mapping.
ggplot(DF, aes(x = xvariable, y = yvariable)) +
geom_point(size = 3) +
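# intercept and slope go inside aes(): newer ggplot2 versions ignore the mapping when they are passed as plain arguments
geom_abline(aes(colour = "one-to-one", intercept = 0, slope = 1), size = 1) +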
stat_smooth(aes(colour="best fit"), method = "lm", se = FALSE, size = 1) +
facet_wrap(~ type) +
scale_colour_discrete("") +
theme(legend.position = "top") # the question asks for the legend on top
Try this:
# original data
type <- factor(rep(c("A", "B", "C"), 5))
x <- 1:15
y <- 2 * x + rnorm(15, 0, 2)
df <- data.frame(type, x, y)
# create a copy of original data, but set y = x
# this data will be used for the one-to-one line
df2 <- data.frame(type, x, y = x)
# bind original and 'one-to-one data' together
df3 <- rbind.data.frame(df, df2)
# create a grouping variable to separate stat_smoothers based on original and one-to-one data
df3$grp <- as.factor(rep(1:2, each = nrow(df)))
# plot
# use original data for points
# use 'double data' for the best-fit and one-to-one lines, set colours by group
ggplot(df, aes(x = x, y = y)) +
geom_point(size = 3) +
facet_wrap(~ type) +
stat_smooth(data = df3, aes(colour = grp), method = "lm", se = FALSE, size = 1) +
scale_colour_manual(values = c("red","blue"),
labels = c("best fit", "one-to-one"),
name = "") +
theme(legend.position = "top")
# If you rather want to stack the two keys in the legend you can add:
# guide = guide_legend(direction = "vertical")
#...as argument in scale_colour_manual
Please note that this solution does not extrapolate the one-to-one line outside the range of your data, which seemed to be the case for the original geom_abline.
Suppose I have the following data frames:
df1 = data.frame(c11 = c(1:5), c12 = c(1:5))
df2 = data.frame(c21 = c(1:5), c22 = (c(1:5))^0.5)
df3 = data.frame(c31 = c(1:5), c32 = (c(1:5))^2)
I want to plot these as lines in the same plot/panel. I can do this by
p <- ggplot() + geom_line(data=df1, aes(x=c11, y = c12)) +
geom_line(data=df2, aes(x=c21,y=c22)) +
geom_line(data=df3, aes(x=c31, c32))
All these will be black. If I want them in a different color, I can specify the color explicitly as an argument to geom_line(). My question is: can I specify a list of a few colors, say 5 colors such as red, blue, green, orange, and gray, and use that list so that I do not have to specify the color explicitly as an argument to each geom_line()? If the plot p contains 2 geom_line() statements, it would color them red and blue respectively. If it contains 3 geom_line() statements, it would color them red, blue, and green. Finally, how can I specify the legend for these plots? Even being able to give the colors as a vector at the end of p would be great. Please let me know if the question is not clear.
Thanks.
ggplot2 works best if you work with a melted data.frame that contains a different column to specify the different aesthetics. Melting is easier with common column names, so I'd start there. Here are the steps I'd take:
1. Rename the columns.
2. Melt the data, which adds a new variable that we'll map to the colour aesthetic.
3. Define your colour vector.
4. Specify the appropriate scale with scale_colour_manual.
library(ggplot2)
library(reshape2) # for melt()
names(df1) <- c("x", "y")
names(df2) <- c("x", "y")
names(df3) <- c("x", "y")
newData <- melt(list(df1 = df1, df2 = df2, df3 = df3), id.vars = "x")
#Specify your colour vector
cols <- c("red", "blue", "green", "orange", "gray")
#Plot data and specify the manual scale
ggplot(newData, aes(x, value, colour = L1)) +
geom_line() +
scale_colour_manual(values = cols)
Edited for clarity
The structure of newData:
'data.frame': 15 obs. of 4 variables:
$ x : int 1 2 3 4 5 1 2 3 4 5 ...
$ variable: Factor w/ 1 level "y": 1 1 1 1 1 1 1 1 1 1 ...
$ value : num 1 2 3 4 5 ...
$ L1 : chr "df1" "df1" "df1" "df1" ...
And the plot itself:
You don't have to melt, group or gather. It's pretty simple. Just add the color to the geom_line:
library(tidyverse)
df1 = data.frame(c11 = c(1:5), c12 = c(1:5))
df2 = data.frame(c21 = c(1:5), c22 = (c(1:5))^0.5)
df3 = data.frame(c31 = c(1:5), c32 = (c(1:5))^2)
p <- ggplot() + geom_line(data=df1, aes(x=c11, y = c12), color= "red") +
geom_line(data=df2, aes(x=c21,y=c22), color = "blue") +
geom_line(data=df3, aes(x=c31, c32), color = "green")
p
These sorts of questions become much easier to solve if you adjust your thinking to the way that ggplot2 approaches graphics. ggplot2 is organized around the idea that everything that appears in your graph should (in principle) exist as a column in your data frame. (There are exceptions, of course, but this is the general idea.)
So your attempt to build this graph piece by piece, one line at a time, each coming from different data frames and then assigning colors to them is very un-ggplot2ish. If you want to label things in your graph with different colors, your first thought should always be:
How can I encode this color labeling information as a variable?
In this case, the solution is fairly simple. Simply rbind your three data frames together (you'll need to make sure the colnames match up first) and create a new column, say grp that has three levels corresponding to your three data frames:
dat <- rbind(df1,df2,df3)
dat$grp <- rep(factor(1:3),times = c(nrow(df1),nrow(df2),nrow(df3)))
and then map the variable grp to the aesthetic color in the ggplot call:
ggplot(data = dat, aes(x=...,y=...,colour = grp)) +
geom_line()
Finally, if you don't like the default colors, you can specify your own using scale_colour_manual:
+ scale_colour_manual(values = c('green','blue','grey'))
or you can use some nice 'pre-chosen' palettes from scale_colour_brewer.
EDIT: I fixed a typo above to ensure that grp is a factor. Here's my final version:
df1 = data.frame(c1 = c(1:5), c2 = c(1:5))
df2 = data.frame(c1 = c(1:5), c2 = (c(1:5))^0.5)
df3 = data.frame(c1 = c(1:5), c2 = (c(1:5))^2)
dat <- rbind(df1,df2,df3)
dat$grp <- rep(factor(1:3),times=c(nrow(df1),nrow(df2),nrow(df3)))
ggplot(data = dat, aes(x = c1, y = c2, colour = grp)) +
geom_line()
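If there are more than a handful of data frames, the rbind-plus-grp step can be wrapped up. The following is only a sketch: stack_frames is a hypothetical helper, and it assumes every data frame in the list already has the same column names.

```r
# Hypothetical helper: stack a named list of data frames that share column
# names, recording which frame each row came from in a factor column.
stack_frames <- function(frames, id = "grp") {
  out <- do.call(rbind, unname(frames))
  sizes <- vapply(frames, nrow, integer(1))
  out[[id]] <- factor(rep(names(frames), times = sizes),
                      levels = names(frames))   # keep the list order as level order
  rownames(out) <- NULL
  out
}

df1 <- data.frame(c1 = 1:5, c2 = 1:5)
df2 <- data.frame(c1 = 1:5, c2 = (1:5)^0.5)
df3 <- data.frame(c1 = 1:5, c2 = (1:5)^2)

dat <- stack_frames(list(linear = df1, sqrt = df2, square = df3))
cols <- c(linear = "red", sqrt = "blue", square = "green")  # one colour per level
```

Plotting is then ggplot(dat, aes(c1, c2, colour = grp)) + geom_line() + scale_colour_manual(values = cols). The names of the list become the legend labels, and the named colour vector is matched to the levels by name.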