I have a data frame with the dimensions 625616 x 12. I would like to illustrate the data with a bubble plot. To illustrate my situation I will use the mtcars data set.
mtcars$cyl = as.factor(mtcars$cyl)
bp = ggplot(as.data.frame(mtcars), aes(x = wt, y = mpg, size = qsec)) + geom_point(shape = 21)
bp
As with my real data frame, this command uses only 3 of the 12 columns. Ideally, I would like to add a second set of bubbles in another colour (columns 4-6) to this bubble plot.
I tried to use the "add" function.
bp2 = ggplot(as.data.frame(mtcars), aes(x = wt2, y = mpg2, size = qsec2)) + geom_point(shape = 21)
plot(bp2, add = T)
Unfortunately, that doesn't work either.
If the different x, y and size variables are in the same data set, you can define them in the aesthetics of each geom_point:
df <- data.frame(x1 = rnorm(20), y1 = rnorm(20),
x2 = rnorm(20), y2 = rnorm(20),
z1 = rnorm(20), z2 = rnorm(20))
ggplot(df) +
geom_point(aes(x = x1, y = y1, size = z1), col = "red") +
geom_point(aes(x = x2, y = y2, size = z2), col = "blue")
In case you have two distinct data sets, you can define that in the geoms as well:
ggplot() +
geom_point(aes(x = x1, y = y1, size = z1), col = "red", data = df1) +
geom_point(aes(x = x2, y = y2, size = z2), col = "blue", data = df2)
Edit based on your comment: you can change the overall size of points by e.g. using scale_size_continuous(range = c(0, 10)) and changing 10 to another value.
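If the two sets of bubbles should also show up in a colour legend, one option is to stack them into a single long data frame and map colour to a column that names the set. The following is only a minimal sketch of that idea: the column name `set` and the labels "first" and "second" are made up for illustration, and abs() is used just to keep the size variable non-negative.

```r
library(ggplot2)

set.seed(1)
# Same layout as the example above: two sets of x, y and size columns.
# abs() keeps the size variable non-negative for this illustration.
df <- data.frame(x1 = rnorm(20), y1 = rnorm(20), z1 = abs(rnorm(20)),
                 x2 = rnorm(20), y2 = rnorm(20), z2 = abs(rnorm(20)))

# Stack both sets and label each row with the set it came from
# ("set", "first" and "second" are hypothetical names).
long <- rbind(
  data.frame(x = df$x1, y = df$y1, z = df$z1, set = "first"),
  data.frame(x = df$x2, y = df$y2, z = df$z2, set = "second")
)

bubbles <- ggplot(long, aes(x = x, y = y, size = z, colour = set)) +
  geom_point(shape = 21) +
  scale_colour_manual(values = c(first = "red", second = "blue")) +
  scale_size_continuous(range = c(1, 10))
bubbles
```

Because colour is now mapped inside aes(), ggplot2 draws a legend entry for each set, which the fixed col = "red" / col = "blue" version does not.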
I've created a scatterplot of the relationship between variables x and y1, but I also want to add a fitted line showcasing the relationship between variables x and y2 on the same graph.
I decided to combine the data to make it easier, as follows:
data1 <- data %>%
group_by(var) %>%
summarize(x = n(), y1 = mean(y1_var), y2 = mean(y2_var))
I hope this isn't too confusing. I don't know how to actually make the plot. I've been trying anything, with my latest attempt being:
data1 %>%
ggplot(aes(x = x, y = y1)) +
geom_point(color = "blue") +
geom_point(x = x, y = y2, color = "yellow") +
geom_smooth(method = "lm", se = FALSE)
I know I don't have a good understanding of ggplot2, but just to show sort of where I'm at.
Any help would be appreciated!
Without knowing what your data looks like, it is slightly confusing when you state y1 = mean(y1_var). So is y1 just the one mean value? How is that a scatter point?
library(dplyr)    # provides %>%
library(ggplot2)
y1 <- (1:100) + rnorm(100, mean = 1, sd = 1)
y2 <- (100:1) + rnorm(100, mean = 1, sd = 1)
x <- 1:100
df <- data.frame(x, y1, y2)
df %>%
  ggplot(aes(x = x, y = y1)) +
  geom_point(colour = "blue") +
  geom_point(aes(x = x, y = y2), colour = "yellow") +
  geom_smooth(aes(x = x, y = y2), method = "lm", se = FALSE)
I've created what it sounds like you're describing.
I am trying to create a line plot with 2 types of measurements, but my data is missing some x values. In "Line break when no data in ggplot2" I found how to create a plot that breaks the line where there is no data, but it does not allow plotting 2 lines (one for each Type).
1) When I try
ggplot(Data, aes(x = x, y = y, group = grp)) + geom_line()
it makes only one line, but with a break where there is no data
2) When I try
ggplot(Data, aes(x = x, y = y, col = Type)) +
geom_line()
it makes 2 lines, but without a break where there is no data
3) When I try
ggplot(Data, aes(x = x, y = y, col = Type, group = grp)) +
geom_line()
it makes an unreadable chart
4) Of course I could combine Type and grp into a new variable, but then the legend is not nice, and I get 4 groups (and colours) instead of 2.
5) I could also do something like this, but it does not produce a legend, and in my real dataset I have way too many Types to do that:
ggplot() +
geom_line(data = Data[Data$Type == "A",], aes(x = x, y = y, group = grp), col = "red") +
geom_line(data = Data[Data$Type == "B",], aes(x = x, y = y, group = grp), col = "blue")
Data sample:
Data <- data.frame(x = c(1:100, 201:300), y = rep(c(1, 2), 100), Type = rep(c("A", "B"), 100), grp = rep(c(1, 2), each = 100))
One way is to use interaction() to specify a grouping of multiple columns:
library(ggplot2)
Data <- data.frame(x = c(1:100, 201:300), y = rep(c(1, 2), 100), Type = rep(c("A", "B"), 100), grp = rep(c(1, 2), each = 100))
ggplot(Data, aes(x = x, y = y, col = Type, group = interaction(grp,Type))) +
geom_line()
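To check that this really gives two colours but four separately drawn line pieces, the data computed for the layer can be inspected. This is only a sketch for verifying the grouping; it uses layer_data() purely for inspection, and the names `pieces`, `n_groups` and `n_colours` are made up here.

```r
library(ggplot2)

Data <- data.frame(x = c(1:100, 201:300), y = rep(c(1, 2), 100),
                   Type = rep(c("A", "B"), 100), grp = rep(c(1, 2), each = 100))

p <- ggplot(Data, aes(x = x, y = y, col = Type, group = interaction(grp, Type))) +
  geom_line()

# interaction() creates one level per grp/Type combination
nlevels(interaction(Data$grp, Data$Type))

# The computed layer data has one group id per line piece,
# while the colour still depends on Type only.
pieces <- layer_data(p, 1)
n_groups <- length(unique(pieces$group))
n_colours <- length(unique(pieces$colour))
c(groups = n_groups, colours = n_colours)
```

The legend is built from the colour scale, not from the group aesthetic, which is why it still lists only the two Types.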
I have three dataframes, containing data for the same variables (x and y, grouped by variable case) but each dataframe contains data from a different source (test, sim and model). The levels of case are identical for test and model, but they are different for sim. For each value of case, I want all xy curves from different sources but with the same case to have the same color. I need to have a legend which clearly identifies the data source, but I would also like to use different geoms for different data sources. This is what I've been able to do:
rm(list=ls())
gc()
graphics.off()
library(ggplot2)
# build the dataframes
nx <- 10
x1 <- seq(0, 1, len = nx)
x2 <- x1+ 0.1
x3 <- x2+ 0.1
x4 <- x3+ 0.1
x <- c(x1, x2, x3, x4)
y1 <- 1 - x1
y2 <- 1.1 * y1
y3 <- 1.1 * y2
y4 <- 1.1 * y3
y <- c(y1, y2, y3, y4)
z1 <- (y1 + y2)/2
z2 <- (y2 + y3)/2
z3 <- (y3 + y4)/2
z4 <- (y4 + 1.1 * y4)/2
z <- c(z1, z2, z3, z4)
w <- y*1.01
case_y <- c("I-26_1", "I00", "I20_5", "I40_9")
case_z <- c("I-23_6", "I00", "I22_4", "I42_3")
case_y <- rep(case_y, each = nx)
case_z <- rep(case_z, each = nx)
foo <- data.frame(x = x, y = z, case = case_z, type = "test")
bar <- data.frame(x = x, y = y, case = case_y, type = "sim")
mod <- data.frame(x = x, y = w, case = case_z, type = "model")
# different data frames have different factor levels: to avoid this,
# I bind all dataframes and I reorder the levels of case
foobar <- rbind(foo, bar, mod)
case_levels <- c("I-26_1", "I-23_6", "I00", "I20_5", "I22_4", "I40_9", "I42_3")
foobar$case <- factor(foobar$case, levels = case_levels)
# now I can plot the resulting dataframe
p <- ggplot(data = foobar, aes(x = x, y = y, color = case)) +
geom_line(aes(linetype = type), size = 1)
p
The problem here is that it's difficult to discern sim and model. In order to make a more readable plot, I switch to geom_point for the model data:
foobar <- rbind(foo, bar)
case_levels <- c("I-26_1", "I-23_6", "I00", "I20_5", "I22_4", "I40_9", "I42_3")
foobar$case <- factor(foobar$case, levels = case_levels)
mod$case <- factor(mod$case, levels = case_levels)
# now I can plot the resulting dataframe
p <- ggplot(data = foobar, aes(x = x, y = y, color = case)) +
geom_line(aes(linetype = type), size = 1) +
geom_point(data = mod)
However, now I don't have a model label in the legend. How can I make sure that the model curves are clearly labeled in the legend, but they are also easy to discern visually from the sim and test curves?
EDIT Procrastinatus Maximus suggests an edit to Pierre Lafortune's code which should eliminate the space between the model label and the type legend, but apparently it eliminates the space between model and the case legend instead:
ggplot(data = foobar, aes(x = x, y = y, color = case)) +
geom_line(aes(linetype = type), size = 1) +
geom_point(data = mod, aes(shape=type)) +
scale_shape_discrete(name="") +
guides(colour = guide_legend(override.aes = list(linetype=c(1),
shape=c(NA)))) +
theme(legend.margin = margin(0,0,0,0), legend.spacing = unit(0, 'lines'))
The result is
This will get you closer to your goal. I will look to see if we can close the gap between the two legends.
ggplot(data = foobar, aes(x = x, y = y, color = case)) +
geom_line(aes(linetype = type), size = 1) +
geom_point(data = mod, aes(shape=type)) +
scale_shape_discrete(name="") +
guides(colour = guide_legend(override.aes = list(linetype=c(1),
shape=c(NA))))
Edit
##ProcrastinatusMaximus
ggplot(data = foobar, aes(x = x, y = y, color = case)) +
geom_line(aes(linetype = type), size = 1) +
geom_point(data = mod, aes(shape = type)) +
guides(color = guide_legend(override.aes = list(linetype = c(1), shape = c(NA)), order = 1),
linetype = guide_legend(order = 2),
shape = guide_legend(title = NULL, order = 3))+
theme(legend.margin = margin(0,0,0,0), legend.spacing = unit(0, 'lines'))
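The part doing the work in both versions is mapping shape inside aes() for the point layer (so it gets its own legend entry) and using override.aes to keep point symbols out of the colour legend. Below is a reduced, self-contained sketch of just that mechanism; the data frames `lines_df` and `pts_df` and their values are made up and stand in for foobar and mod.

```r
library(ggplot2)

# Made-up data: two cases, drawn as lines ("sim") and as points ("model").
lines_df <- data.frame(x = rep(1:5, 2), y = c(1:5, 2:6),
                       case = rep(c("a", "b"), each = 5), type = "sim")
pts_df <- transform(lines_df, y = y + 0.3, type = "model")

p <- ggplot(lines_df, aes(x = x, y = y, colour = case)) +
  geom_line(aes(linetype = type)) +
  # shape is mapped, so the point layer gets a legend entry of its own
  geom_point(data = pts_df, aes(shape = type)) +
  # draw only lines (no point symbols) in the colour legend keys
  guides(colour = guide_legend(override.aes = list(shape = NA)))
p
```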
Personally, I think all you need to do is change the order of the type, so that the solid line is in the middle. If you make the background white and the line colors a bit brighter, I think your figure is clear:
(p <- ggplot(data = foobar, aes(x = x, y = y, color = case)) +
geom_line(aes(linetype = rev(type)), size = 1) +
scale_color_manual(values = c("black","green","blue","purple","pink","red","brown"))+
theme_bw())
require(reshape2);require(ggplot2)
df <- data.frame(time = 1:10,
x1 = rnorm(10),
x2 = rnorm(10),
x3 = rnorm(10),
y1 = rnorm(10),
y2 = rnorm(10))
df <- melt(df, id = "time")
ggplot(df, aes(x = time, y = value, color = variable, group = variable,
size = variable, linetype = variable)) +
geom_line() +
scale_linetype_manual(values = c(rep(1, 3), 2, 2)) +
scale_size_manual(values = c(rep(.3, 3), 2, 2)) +
scale_color_manual(values = c(rep("grey", 3), "red", "green")) +
theme_minimal()
This example might not be very representative, but imagine running a bunch of regression models that are not important individually and only contribute to the overall picture, while I want to emphasize only the actual and averaged fit series. So basically the x variables are not important and should not appear in the legend.
I've tried to set scale_color_discrete(breaks = c("y1", "y2")) as suggested in some other posts. The problem is that all of the aesthetics are already set via manual scales, and adding another discrete scale overrides the properties that are already set for the graph (and messes up the whole thing). So ideally I'd want to see the exact same graph, but with only y1 and y2 displayed in the legend.
You can try subsetting the data set by the variable name and plotting them separately.
p <- ggplot(df, aes(x = time, y = value, color = variable,
group = variable, size = variable, linetype = variable)) +
geom_line(data=df[which(substr(df$variable,1,1)=='y'),])+
scale_linetype_manual(values = c(2, 2)) + scale_size_manual(values = c(2, 2)) +
scale_color_manual(values = c("red", "green")) +
theme_minimal() +
geom_line(data=df[which(substr(df$variable,1,1)=='x'),],
aes(x = time, y = value, group = variable),
color="grey",size=0.3,linetype=1)
# Plot elements that have attributes set outside of aes() will
# not appear on legend!
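A possible alternative that keeps a single geom_line() call is to pass breaks to the manual scales themselves, so nothing gets overridden. This is a sketch under two assumptions: the value vectors are named by variable (so matching does not depend on the order of the breaks), and only colour and linetype are shown, although the size scale can be handled the same way. The long data frame is built by hand here instead of with melt().

```r
library(ggplot2)

set.seed(42)
vars <- c("x1", "x2", "x3", "y1", "y2")
long <- data.frame(time = rep(1:10, times = length(vars)),
                   variable = rep(vars, each = 10),
                   value = rnorm(10 * length(vars)))

# Named vectors: every variable gets a value, matched by name.
cols <- c(x1 = "grey", x2 = "grey", x3 = "grey", y1 = "red", y2 = "green")
ltys <- c(x1 = 1, x2 = 1, x3 = 1, y1 = 2, y2 = 2)
shown <- c("y1", "y2")  # the only entries wanted in the legend

p <- ggplot(long, aes(x = time, y = value, colour = variable,
                      linetype = variable, group = variable)) +
  geom_line() +
  scale_colour_manual(values = cols, breaks = shown) +
  scale_linetype_manual(values = ltys, breaks = shown) +
  theme_minimal()
p
```

All five series are still drawn with their own colour and line type; breaks only restricts which of them get a legend key.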
If you look at this
ggplot(mtcars,aes(x=disp,y=mpg,colour=mpg))+geom_line()
you will see that the line colour varies according to the corresponding y value, which is what I want, but only section-by-section. I would like the colour to vary continuously according to the y value. Any easy way?
One possibility which comes to mind would be to use interpolation to create more x- and y-values, and thereby make the colours more continuous. I use approx to "linearly interpolate given data points". Here's an example on a simpler data set:
# original data and corresponding plot
df <- data.frame(x = 1:3, y = c(3, 1, 4))
library(ggplot2)
ggplot(data = df, aes(x = x, y = y, colour = y)) +
geom_line(size = 3)
# interpolation to make 'more values' and a smoother colour gradient
vals <- approx(x = df$x, y = df$y)
df2 <- data.frame(x = vals$x, y = vals$y)
ggplot(data = df2, aes(x = x, y = y, colour = y)) +
geom_line(size = 3)
If you wish the gradient to be even smoother, you may use the n argument in approx to adjust the number of points to be created ("interpolation takes place at n equally spaced points spanning the interval [min(x), max(x)]"). With a larger number of values, perhaps geom_point gives a smoother appearance:
vals <- approx(x = df$x, y = df$y, n = 500)
df2 <- data.frame(x = vals$x, y = vals$y)
ggplot(data = df2, aes(x = x, y = y, colour = y)) +
geom_point(size = 3)
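To make it concrete what approx returns, here is a small sketch on the same three points with n = 5 instead of the default of 50, so the result is easy to read. The interpolated values sit on the straight segments between the original points.

```r
df <- data.frame(x = 1:3, y = c(3, 1, 4))

# Five equally spaced x values between min(x) and max(x),
# with y linearly interpolated between the original points.
vals <- approx(x = df$x, y = df$y, n = 5)
vals$x  # 1.0 1.5 2.0 2.5 3.0
vals$y  # 3.0 2.0 1.0 2.5 4.0
```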
Since ggplot2 v0.8.5 one can use geom_line or geom_path with different lineend options (right now there are three options: round, butt and square). Selection depends on the nature of the data.
round would work on sharp edges (as in the OP's data):
library(ggplot2)
ggplot(mtcars, aes(disp, mpg, color = mpg)) +
geom_line(size = 3, lineend = "round")
square would work on a more continuous variable:
df <- data.frame(x = seq(0, 100, 10), y = seq(0, 100, 10) ^ 2)
ggplot(data = df, aes(x = x, y = y, colour = y)) +
geom_path(size = 3, lineend = "square")
Maybe this will work for you:
library(dplyr)
library(ggplot2)
my_mtcars <-
mtcars %>%
mutate(my_colors = cut(disp, breaks = c(0, 130, 200, 400, Inf)))
ggplot(my_mtcars, aes(x = disp, y = mpg, col = mpg)) +
geom_line() + facet_wrap(~ my_colors, scales = 'free_x')
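For reference, this is a quick sketch of what cut() produces with those breaks: a factor with one level per interval, which facet_wrap then uses to split the panels. The name `bins` is made up here.

```r
# Bin displacement into the same four intervals used above.
bins <- cut(mtcars$disp, breaks = c(0, 130, 200, 400, Inf))

levels(bins)  # "(0,130]" "(130,200]" "(200,400]" "(400,Inf]"
table(bins)   # number of cars falling into each interval
```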