I've set up a time series line graph in R for ratios of housing cost to income, but I can't assign a different point symbol to each variable as suggested in other posts. I get the error message "A continuous variable can not be mapped to shape" from the following (simplified to two variables):
ggplot(housing, aes(year)) +
geom_line(aes(y = Greenwich, colour = "Greenwich"))+
geom_point(aes(y = Greenwich, colour = "Greenwich", shape = 1)) +
scale_shape_identity() + #added missing "+"
geom_line(aes(y = median, colour = "median"))+
geom_point(aes(y = median, colour = "median", shape = 2)) + # added missing parenthesis
scale_shape_identity() + # removed extra parenthesis
ylab("house price to earnings (lower quartile)")+
theme(legend.title=element_blank())
Any suggestions most welcome.
You are pretty close:
library(ggplot2)
## toy data
set.seed(1) # make the random toy data reproducible
year <- runif(20,10,20)
Greenwich <- runif(20,30,50)
median <- runif(20,30,50)
data<-data.frame(year,Greenwich,median)
## map it
## shape is a constant inside aes(), so one scale_shape_identity() is enough to use it as-is;
## size is set outside aes() so it does not add its own legend
ggplot(data, aes(year)) +
geom_line(aes(y = Greenwich, colour = "Greenwich"))+ scale_shape_identity()+
geom_point(aes(y = Greenwich, colour = "Greenwich",shape = 12),size=8)+
geom_line(aes(y = median, colour = "median")) +
geom_point(aes(y = median, colour = "median",shape = 10),size=8)+
ylab("house price to earnings (lower quartile)")+
theme(legend.title=element_blank())
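If you would rather not hard-code a layer per variable, another route is to reshape to long format and map both colour and shape to the variable name. This is only a sketch under my own assumptions: the housing data frame here is made-up toy data, and the column names area and ratio are hypothetical, not from the original post.

```r
library(ggplot2)

# Toy stand-in for the asker's data (values are random, for illustration only)
set.seed(1)
housing <- data.frame(
  year = 2001:2020,
  Greenwich = runif(20, 30, 50),
  median = runif(20, 30, 50)
)

# Reshape to long format with base R: one row per year and series
long <- data.frame(
  year = rep(housing$year, 2),
  area = rep(c("Greenwich", "median"), each = nrow(housing)),
  ratio = c(housing$Greenwich, housing$median)
)

# area is discrete, so it can be mapped to shape without the
# "continuous variable" error
p <- ggplot(long, aes(x = year, y = ratio, colour = area, shape = area)) +
  geom_line() +
  geom_point(size = 3) +
  scale_shape_manual(values = c(Greenwich = 12, median = 10)) +
  ylab("house price to earnings (lower quartile)") +
  theme(legend.title = element_blank())
p
```

Because colour and shape map to the same variable, ggplot2 merges them into a single legend.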
I'm making sediment profile grain size distribution graphs, with stacked bar charts representing sand, silt and clay and an added line showing the median value for each depth. The graph looks good, yet the legend of my final output is mixing up some of my items.
Here is a breakdown of my code:
GS_as = data.frame(Depth = c(10,30,50,70,90),
clay = c(0.99,0,0,2.86,3.62),
silt = c(55.48,81.48,53.26,79.5,70.71),
sand = c(43.53,18.52,46.74,17.64,25.67))
library(reshape2) # provides melt()
long = melt(GS_as,id = "Depth")
df = data.frame(Depth = c(10,30,50,70,90),
value = c(34.8,24.84,48.9,12.7,19.73),
variable = c("median","median","median","median","median"))
ggplot(long,aes(x=Depth,y=value,fill=variable)) +
geom_bar(stat="identity") + coord_flip() +
scale_y_continuous(position = "right") +
scale_x_continuous(breaks = seq(10,900,by = 20),trans='reverse') +
scale_fill_grey() +
geom_line(data=df, aes(x= Depth, y = value,group=variable,colour=variable)) +
geom_point(data=df,aes(x= Depth, y = value,group=variable,colour=variable))
The final output gives me this graph.
Now, how do I remove median from the grayscale legend of grain sizes, and how do I remove the points from each grayscale box? The points should only be shown with the median as a separate variable. I've searched for a long time without finding a solution. I suspect I got to my final graph in a strange, unintuitive way.
Additionally, if it's possible, I would also like the median line and points to be black, the "variable" legend title removed, and all the items grouped under one level.
I appreciate any help you can give.
To fix your first issue, with the median showing up in the fill legend, you could make fill a local aes of geom_bar. For a black color you could set the color via scale_color_manual. The legend titles could be set or removed via labs, and finally (as far as I understand you) you could "group all the items under 1 level" via theme options, by removing the spacing between the legends and almost all the margin around them.
library(ggplot2)
ggplot(long, aes(x = Depth, y = value)) +
geom_bar(aes(fill = variable), stat = "identity") +
coord_flip() +
scale_y_continuous(position = "right") +
scale_x_continuous(breaks = seq(10, 900, by = 20), trans = "reverse") +
scale_fill_grey() +
geom_line(data = df, aes(x = Depth, y = value, group = variable, colour = variable)) +
geom_point(data = df, aes(x = Depth, y = value, group = variable, colour = variable)) +
scale_color_manual(values = c("black")) +
labs(fill = NULL, color = NULL) +
theme(legend.spacing.y = unit(0, "pt"), legend.margin = margin(1, 0, 0, 0))
In my dataset, I have segregated the data by a parameter par for either Black or Red noise, which are staggered in the representation. Now, for both species, I want to colour the "Black" noise black and the "Red" noise red. Furthermore, I want to join the points by par: specifically, No with a dashed line and Yes with a solid line. I tried the code below (and multiple versions of it), but no luck. Any suggestions?
#Data
set.seed(100)
sp <- factor(c("A","A","A","A","B","B","B","B"))
par <- factor(c("No","No","Yes","Yes","No","No","Yes","Yes"))
y <- rnorm(8, 2,3)
noise <- factor(c("Black","Red","Black","Red","Black","Red","Black","Red"))
df <- data.frame(sp, par, y, noise)
df$noise <- factor(df$noise, levels = c("Black","Red"))
library(ggplot2)
ggplot(data = df, aes(x = noise, y = y, fill = par, color = par)) +
geom_point(size = 4) +
facet_wrap(.~sp) +
theme_classic() +
scale_fill_manual(values = c("black","red")) + scale_color_manual(values = c("black","red")) +
geom_line(aes(linetype=par)) + scale_linetype_manual(name = "indicator", values = c(2,1,2))
geom_path(aes(group = par,linetype=par), geom = "path")
ERROR: geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?
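In your code, you forgot the + that links geom_path() to the rest of the ggplot() call. Also, since geom_point() and the line geoms need different aesthetics (colour by noise for the points, linetype by par for the lines), put the point-specific ones inside geom_point() and set group = par so the lines join the points within each par level.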
library(tidyverse)
ggplot(data = df, aes(x = noise, y = y, group = par, linetype = par)) +
geom_point(aes(fill = noise, color = noise), size = 4) +
facet_wrap(.~sp) +
theme_classic() +
scale_fill_manual(values = c("black","red")) +
scale_color_manual(values = c("black","red")) +
geom_line() +
scale_linetype_manual(name = "indicator", values = c(2,1)) + # No = dashed (2), Yes = solid (1)
geom_path()
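For reference, here is a self-contained sketch of the same idea with named values, so the level-to-style mapping is explicit. It rebuilds the asker's toy data; the only change I have assumed is dropping the duplicate geom_path(), since geom_line() already joins the two points per group. The group aesthetic matters because x is discrete: by default ggplot2 groups by the interaction of all discrete variables, which leaves one observation per group and triggers the error in the question.

```r
library(ggplot2)

# Same structure as the data in the question
set.seed(100)
df <- data.frame(
  sp = factor(rep(c("A", "B"), each = 4)),
  par = factor(rep(rep(c("No", "Yes"), each = 2), 2)),
  noise = factor(rep(c("Black", "Red"), 4), levels = c("Black", "Red")),
  y = rnorm(8, 2, 3)
)

p <- ggplot(df, aes(x = noise, y = y, group = par)) +
  geom_line(aes(linetype = par)) +            # one line per par level and facet
  geom_point(aes(colour = noise), size = 4) + # points coloured by noise
  facet_wrap(~ sp) +
  scale_colour_manual(values = c(Black = "black", Red = "red")) +
  scale_linetype_manual(name = "indicator",
                        values = c(No = "dashed", Yes = "solid")) +
  theme_classic()
p
```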
I am trying to add a legend for the mean and median to my histogram. I am also trying to change the scale on the y-axis that is labeled count. It is currently showing the density scale. I want the density plot but the count scale. Alternatively, I would be fine with a second scale or the counts at the end of the histogram. I am just not sure how to go about it. Below is some data and the current code. Thank you in advance.
studyData=data.frame(X=rchisq(1:100000, df=3))
colnames(studyData) <- "hoursstudying"
mu <- data.frame(mean(studyData$hoursstudying))
colnames(mu) <- "Mean"
med <- data.frame(median(studyData$hoursstudying))
colnames(med) <- "Median"
p <- ggplot(studyData, aes(x = hoursstudying)) +
geom_histogram(aes(y=(..density..)), binwidth = 1, colour = "black", fill = "lightblue") +
geom_density(alpha=.2, fill="#FF6666") +
geom_vline(data = mu, aes(xintercept = Mean),
color = "red", linetype = "dashed", size = 1) +
geom_vline(data = med, aes(xintercept = median(Median)),
color = "purple", size = 1) +
labs(title = "Hours Spent Completing Course Work") +
ylab("Count") +
xlab("Hours Studying") +
theme(plot.title = element_text(hjust = 0.5))
p
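You can put the count on the y axis instead of the density in much the same way you reference the internally computed density, using the "..XXXX.." notation: in this case, ..count.. (in ggplot2 3.3.0 and later the equivalent is after_stat(count), and the ..XXXX.. form is deprecated as of 3.4.0).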
You will need to change both y aesthetics for geom_histogram() and geom_density():
ggplot(studyData, aes(x = hoursstudying)) +
geom_histogram(aes(y=(..count..)), binwidth = 1, colour = "black", fill = "lightblue") +
geom_density(aes(y=..count..), alpha=.2, fill="#FF6666") +
# ... everything else is the same
Note: also, I echo the comment from u/Limey. The median and mean values in your original plot shared are clearly wrong... yet when I run the code I am getting the values looking correct. Not sure what that's about, OP, but perhaps that's a different question.
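One caveat worth spelling out: the density curve's count lines up with the histogram bars here only because binwidth = 1. A histogram bar holds roughly n * density * binwidth observations, while the density stat's count is n * density, so for another bin width the curve needs scaling by that width. A minimal sketch, assuming a bin width of 0.5, a smaller made-up sample than the original, and ggplot2 3.3.0 or later for after_stat():

```r
library(ggplot2)

# Smaller toy sample than the original post, same distribution
set.seed(1)
study <- data.frame(hoursstudying = rchisq(1000, df = 3))

p <- ggplot(study, aes(x = hoursstudying)) +
  # bars show raw counts per 0.5-wide bin
  geom_histogram(aes(y = after_stat(count)), binwidth = 0.5,
                 colour = "black", fill = "lightblue") +
  # the density stat's count is n * density; multiply by the bin width (0.5)
  # so the curve sits on the same scale as the bars
  geom_density(aes(y = after_stat(count * 0.5)), alpha = 0.2, fill = "#FF6666") +
  ylab("Count") +
  xlab("Hours Studying")
p
```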
Since @chemdork123 answered the question about the y-axis scale, I won't say anything about it. To add the median/mean values to the legend you need to map them as aesthetics, i.e. put color inside aes().
p <- ggplot(studyData, aes(x = hoursstudying)) +
geom_histogram(aes(y=(..density..)), binwidth = 1, colour = "black", fill = "lightblue") +
geom_density(alpha=.2, fill="#FF6666") +
geom_vline(data = mu, aes(xintercept = Mean,
color = "red"),
linetype = "dashed", size = 1) +
geom_vline(data = med, aes(xintercept = Median,
color = "purple"),
size = 1) +
scale_color_manual(values = c("purple", "red"),
labels = c("Median", "Mean")) +
labs(title = "Hours Spent Completing Course Work") +
ylab("Count") +
xlab("Hours Studying") +
theme(plot.title = element_text(hjust = 0.5))
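The version above relies on "purple" and "red" sorting alphabetically so that values and labels line up. A slightly more explicit variant, sketched here under my own assumptions, puts both statistics in one small data frame (the stats name and its columns are hypothetical) and uses named vectors in the manual scales. The linewidth argument assumes ggplot2 3.4.0 or later; on older versions use size instead.

```r
library(ggplot2)

# Smaller toy sample than the original post, same distribution
set.seed(42)
study <- data.frame(hoursstudying = rchisq(1000, df = 3))

# One row per summary statistic; the stat column drives the legend
stats <- data.frame(
  stat = c("Mean", "Median"),
  value = c(mean(study$hoursstudying), median(study$hoursstudying))
)

p <- ggplot(study, aes(x = hoursstudying)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 1,
                 colour = "black", fill = "lightblue") +
  geom_vline(data = stats,
             aes(xintercept = value, colour = stat, linetype = stat),
             linewidth = 1) +
  # named vectors make the level-to-style mapping explicit
  scale_colour_manual(values = c(Mean = "red", Median = "purple")) +
  scale_linetype_manual(values = c(Mean = "dashed", Median = "solid")) +
  labs(title = "Hours Spent Completing Course Work",
       x = "Hours Studying", y = "Density",
       colour = NULL, linetype = NULL) +
  theme(plot.title = element_text(hjust = 0.5))
p
```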
I want to plot the standard deviation for one line (one flow series; the plot will have two) as lines or a smoothed area. I've seen and applied some code from other sd examples, but it's not working for me.
My original data has several flow values for the same day, from which I've calculated the daily mean and sd. I'm stuck here: I don't know whether I can represent the daily sd with lines from the column I created (called sd), or whether I should use the original data.
The code below is a general example of what I'll apply to my data. The columns flow, flow1 and sd stand in for the daily means and sd calculated from the original data.
library(gridExtra)
library(ggplot2)
library(grid)
x <- data.frame(
date = seq(as.Date("2012-01-01"),as.Date("2012-12-31"), by="week"),
rain = sample(0:20,53,replace=T),
flow1 = sample(50:150,53,replace=T),
flow = sample(50:200,53,replace=T),
sd = sample (0:10,53, replace=T))
g.top <- ggplot(x, aes(x = date, y = rain, ymin=0, ymax=rain)) +
geom_linerange() +
scale_y_continuous(limits=c(22,0),expand=c(0,0), trans="reverse")+
theme_classic() +
theme(plot.margin = unit(c(5,5,-32,6),units="points"),
axis.title.y = element_text(vjust = 0.3))+
labs(y = "Rain (mm)")
g.bottom <- ggplot(x, aes(x = date)) +
geom_line(aes(y = flow, colour = "flow")) +
geom_line(aes(y = flow1, colour = "flow1")) +
stat_summary(geom="ribbon", fun.ymin="min", fun.ymax="max", aes(fill=sd), alpha=0.3) +
theme_classic() +
theme(plot.margin = unit(c(0,5,1,1),units="points"),legend.position="bottom") +
labs(x = "Date", y = "River flow (m/s)")
grid.arrange(g.top, g.bottom , heights = c(1/5, 4/5))
The above code gives Error: stat_summary requires the following missing aesthetics: y
Another option is geom_smooth, but as far as I understand it requires some line equation (I may be wrong; I'm new to R).
Something like this maybe?
library(dplyr) # select(), mutate(), %>%
library(tidyr) # gather()
g.bottom <- x %>%
select(date, flow1, flow, sd) %>%
gather(key, value, c(flow, flow1)) %>%
mutate(min = value - sd, max = value + sd) %>%
ggplot(aes(x = date)) +
geom_ribbon(aes(ymin = min, ymax = max, fill = key)) +
geom_line(aes(y = value, colour = key)) +
scale_fill_manual(values = c("grey", "grey")) +
theme_classic() +
theme(plot.margin = unit(c(0,5,1,1),units="points"),legend.position="bottom") +
labs(x = "Date", y = "River flow (m/s)")
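Since the question asks for the sd band on one series only, here is a sketch that skips the reshape and draws the ribbon straight from the wide data. I'm assuming the sd column belongs to flow (the question doesn't say which series it describes), and the fill colour and alpha are arbitrary choices.

```r
library(ggplot2)

# Same toy data as in the question, minus the rain column
set.seed(1)
x <- data.frame(
  date = seq(as.Date("2012-01-01"), as.Date("2012-12-31"), by = "week"),
  flow1 = sample(50:150, 53, replace = TRUE),
  flow = sample(50:200, 53, replace = TRUE),
  sd = sample(0:10, 53, replace = TRUE)
)

g <- ggplot(x, aes(x = date)) +
  # band of mean +/- 1 sd around flow only (assumption: sd describes flow)
  geom_ribbon(aes(ymin = flow - sd, ymax = flow + sd),
              fill = "grey70", alpha = 0.5) +
  geom_line(aes(y = flow, colour = "flow")) +
  geom_line(aes(y = flow1, colour = "flow1")) +
  theme_classic() +
  theme(legend.position = "bottom") +
  labs(x = "Date", y = "River flow (m/s)", colour = NULL)
g
```

geom_ribbon() takes ymin and ymax directly, so no stat_summary() (and no y aesthetic) is needed when the mean and sd are already computed.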
I am trying to make a plot with multiple lines and with stat_summary marking the mean values. When I apply geom_errorbar(), some of the bars are drawn at some distance from the mean points, so they appear to be 'flying'. What is happening?
Thanks!
My code:
#First I add another data set with SE, SD and mean.
cdata <- ddply(data2, c("OGTT","Treatment"), summarise,
N = sum(!is.na(Glucose)),
mean = mean(Glucose, na.rm=TRUE),
sd = sd(Glucose, na.rm=TRUE),
se = sd / sqrt(N))
#Then I merge it with my original data
totalglu<-merge(data2,cdata)
#Then I make the ggplot
p<-ggplot(data=totalglu, aes(x = factor(OGTT), y = Glucose, group = StudyID, color=StudyID)) +
geom_line() +
facet_grid(End.start ~Treatment)+
stat_summary(aes(group = Treatment), geom = "point", fun.y = mean, shape = 16, size = 2) +
theme(legend.position="none") +
labs(x = "OGTT time points (min)",y= "Glucose (mmol/l)")+
geom_errorbar(aes(ymin=mean-se,ymax=mean+se), width=.1, colour="black")
p
It appears that you are not using End.start when calculating the bars, but it is being used by stat_summary because of the faceting.
Try:
cdata <- ddply(data2, c("OGTT","Treatment","End.start"), summarise,
N = sum(!is.na(Glucose)),
mean = mean(Glucose, na.rm=TRUE),
sd = sd(Glucose, na.rm=TRUE),
se = sd / sqrt(N))
#Then I merge it with my original data
totalglu<-merge(data2,cdata)
#Then I make the ggplot
p<-ggplot(data=totalglu, aes(x = factor(OGTT), y = Glucose, group = StudyID, color=StudyID)) +
geom_line() +
facet_grid(End.start ~Treatment)+
stat_summary(aes(group = Treatment), geom = "point", fun.y = mean, shape = 16, size = 2) +
theme(legend.position="none") +
labs(x = "OGTT time points (min)",y= "Glucose (mmol/l)")+
geom_errorbar(aes(ymin=mean-se,ymax=mean+se), width=.1, colour="black")
p
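Though, without the actual starting data, I am not quite sure what data2 looks like, or how ddply is affecting things. Instead, I might suggest skipping cdata altogether and letting stat_summary compute the interval (mean_cl_normal needs the Hmisc package installed and draws a 95% confidence interval rather than ±1 SE):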
ggplot(data=totalglu, aes(x = factor(OGTT), y = Glucose, group = StudyID, color=StudyID)) +
geom_line() +
facet_grid(End.start ~Treatment)+
stat_summary(aes(group = Treatment), fun.data = mean_cl_normal) +
theme(legend.position="none") +
labs(x = "OGTT time points (min)",y= "Glucose (mmol/l)")
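If you want mean ± 1 SE bars as in the original cdata approach, mean_se() ships with ggplot2 and can be used the same way. Below is a sketch on invented data, since the real data2 was not posted: the toy data frame and its values are my assumptions, and fun = mean assumes ggplot2 3.3.0 or later (older versions use fun.y). Because stat_summary() computes within each facet panel, the bars cannot drift away from the points.

```r
library(ggplot2)

# Invented stand-in for data2: 6 subjects, 4 time points, 2 visits
set.seed(7)
toy <- expand.grid(
  StudyID = factor(1:6),
  OGTT = c(0, 30, 60, 120),
  End.start = c("Start", "End")
)
toy$Treatment <- ifelse(as.integer(toy$StudyID) <= 3, "A", "B")
toy$Glucose <- rnorm(nrow(toy), mean = 6, sd = 1)

p <- ggplot(toy, aes(x = factor(OGTT), y = Glucose)) +
  geom_line(aes(group = StudyID, colour = StudyID)) +
  # mean +/- 1 SE, computed per time point within each facet panel
  stat_summary(aes(group = Treatment), fun.data = mean_se,
               geom = "errorbar", width = 0.1) +
  stat_summary(aes(group = Treatment), fun = mean, geom = "point", size = 2) +
  facet_grid(End.start ~ Treatment) +
  theme(legend.position = "none") +
  labs(x = "OGTT time points (min)", y = "Glucose (mmol/l)")
p
```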