I learned from Stack Overflow that geom_path() can avoid joining separate stretches of the same colour with a line. In my plot, the red line is interrupted by stretches in other colours, and without this command the last point of one blue stretch is linked to the first point of the next blue stretch. The code and image are below:
p6 <- ggplot(data = M1.m, mapping = aes(x = as.Date(M1_Date, format='%d/%m/%Y'),
y = M1_Value, color = factor(NewGroup))) + geom_path(aes(group = 1)) + geom_point(size = 0.5)
p6 <- p6 + labs(x = "Date", y = "Value") + labs(color = "Value Type")
p6
When I use this, the gaps between stretches seem to be handled correctly, but there is a weird line linking the first data point to the last data point. Can you please tell me how to remove it?
The data set is too large to paste here, sorry about that; here is the link:
Data
Thank you!
geom_path() draws the line in the order in which the rows appear in the data frame. In my question, the column M1_Value holds two types of data covering the same period, so the last row of type 1 sits right before the first row of type 2, and that is the reason for the weird line. The solution is to add a new column (say, type) to the data frame and add group = type in aes(), so that no line is drawn between the two types. (BTW, this is similar to recording monthly temperatures: the temperature on the last day of one month gets linked to the temperature on the first day of the next month. The key for this kind of problem is to define the classification clearly; grouping by it makes the problem much easier.)
Answer:
Change previous code to:
p6 <- ggplot(data = M1.m, mapping = aes(x = as.Date(M1_Date, format='%d/%m/%Y'), y = M1_Value,
color = NewGroup, group = M1_Type)) + geom_path() + geom_point(size = 0.5)
p6 <- p6 + labs(x = "Date", y = "Value") + labs(color = "Value Type")
p6
and the plot changes to:
So the weird line is removed. Done!
Thanks for the help from following link:
R geom_path lines "closing", sometimes. How to keep them "open"?.
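The grouping idea from the answer above can be reproduced on a small example. This is only a sketch: the data frame toy and its columns date, value and type are hypothetical stand-ins for M1.m, M1_Date, M1_Value and M1_Type.

```r
library(ggplot2)

# Hypothetical toy data: two series over the same dates, stacked one after
# the other, as they would be after reshaping to long format
toy <- data.frame(
  date  = rep(as.Date("2020-01-01") + 0:2, times = 2),
  value = c(1, 2, 3, 4, 5, 6),
  type  = rep(c("type1", "type2"), each = 3)
)

# A single group for all rows: the path follows row order, so it runs from
# the last "type1" row back to the first "type2" row
p_single <- ggplot(toy, aes(x = date, y = value, colour = type)) +
  geom_path(aes(group = 1)) +
  geom_point()

# One group per series: each series gets its own path, with no joining line
p_grouped <- ggplot(toy, aes(x = date, y = value, colour = type, group = type)) +
  geom_path() +
  geom_point()
```

Printing p_single and p_grouped side by side should show the extra connecting segment only in the first plot.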
For me, a simpler solution was to arrange the dataframe using the variable plotted on the x-axis. In the example above, this would result in:
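library(dplyr)
# M1_Date is parsed from dd/mm/YYYY text in the plot, so sort on the parsed date, not on the raw value
p6 <- ggplot(data = M1.m %>% arrange(as.Date(M1_Date, format='%d/%m/%Y')), mapping = aes(x = as.Date(M1_Date, format='%d/%m/%Y'),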
y = M1_Value, color = factor(NewGroup))) + geom_path(aes(group = 1)) + geom_point(size = 0.5)
p6 <- p6 + labs(x = "Date", y = "Value") + labs(color = "Value Type")
p6
I haven't tested this on the data above. The solution given by Ericshaw did not work for me because I could not add a group aesthetic when already using linetype as an aesthetic.
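One caveat with sorting on the x variable, sketched under the assumption that the date column is stored as dd/mm/YYYY text (as the as.Date(..., format='%d/%m/%Y') call in the plot suggests): text sorts character by character, not chronologically. The vector dates below is made up for illustration.

```r
# Hypothetical dd/mm/YYYY strings, deliberately out of order
dates <- c("15/01/2020", "02/03/2020", "10/02/2020")

# Sorting the raw strings compares them character by character
lexical <- sort(dates, method = "radix")

# Sorting on the parsed dates gives chronological order
chronological <- dates[order(as.Date(dates, format = "%d/%m/%Y"))]

print(lexical)
print(chronological)
```

If the column is already of class Date, sorting on it directly is fine.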
I am trying to plot the average number of bikes rented per hour for each weekday, using ggplot in R as shown below. Since the weekday is stored as a number in my data, the legend shows numbers as its labels.
ggplot(bike_share_train, aes(x=hour, y=count, color=day))+
geom_point(data = day_summary_by_hour, aes(group = day))+
geom_line(data = day_summary_by_hour, aes(group = day))+
ggtitle("Bikes Rent By Weekday")+scale_colour_hue('Weekday')
I want to change those numbers to weekday names, and I tried the command below, but then the legend is missing from the plot. Please let me know how to fix this issue.
ggplot(bike_share_train, aes(x=hour, y=count, color=day))+geom_point(data = day_summary_by_hour, aes(group = day)) +
geom_line(data = day_summary_by_hour, aes(group = day))+
ggtitle("Bikes Rent By Weekday")+
scale_colour_hue('Weekday', breaks = levels(bike_share_train$day), labels=c('Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday'))
Although the OP found the solution, I'm proposing an alternative approach for when each group consists of only one observation.
library(ggplot2)
# create some sample data
bike_share_data<-data.frame(hour = c(1.5,2.3,1.3,2.2,1.5),
count = c(21,26,30,15,20),
day = factor(c("1","2","3","4","5"))) # a factor, so that levels() below returns the breaks
head(bike_share_data)
ggplot(bike_share_data, aes(x=hour, y=count, color=day))+
geom_point(data = bike_share_data, aes(group = 1))+
geom_line(data = bike_share_data, aes(group = 1))+
ggtitle("Bikes Rent By Weekday")+
scale_colour_hue('Weekday',
breaks = levels(bike_share_data$day),
labels=c('Monday','Tuesday','Wednesday','Thursday','Friday')
)
My sincere apologies to everyone for my mistake. I extracted the weekday and hour from the datetime value but forgot to change the data type of these fields from character to factor. That caused the issue in ggplot: levels() returns NULL for a character vector, so no breaks were passed to the scale. I have changed the data type and re-checked the plot, and the legend now shows.
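The character-versus-factor difference can be checked without any plotting. A minimal sketch, assuming the weekday codes are the strings "1" to "7" with "1" meaning Monday (that mapping is an assumption taken from the order of the labels in the question); day_chr and day_fct are illustrative names.

```r
# Weekday codes as extracted from a datetime: plain character strings
day_chr <- c("1", "2", "3", "1")

# A character vector has no levels, so levels() gives NULL
print(levels(day_chr))

# Converting to a factor with explicit levels and labels
weekday_names <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                   "Friday", "Saturday", "Sunday")
day_fct <- factor(day_chr, levels = as.character(1:7), labels = weekday_names)

print(levels(day_fct))
print(as.character(day_fct))
```

With the names stored as factor labels, the colour scale can pick them up directly, and breaks and labels no longer need to be passed by hand.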
I am trying to create a figure that summarises my data. The data are about the prevalence of drug use, obtained from different practices from different countries. Each practice has contributed a different amount of data, and I want to show all of this in my figure.
Here is a subset of the data to work on:
gr<-data.frame(matrix(0,36))
gr$drug<-c("a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b")
gr$practice<-c("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r")
gr$country<-c("c1","c1","c1","c1","c1","c1","c1","c1","c1","c1","c2","c2","c2","c2","c2","c2","c3","c3","c1","c1","c1","c1","c1","c1","c1","c1","c1","c1","c2","c2","c2","c2","c2","c2","c3","c3")
gr$prevalence<-c(9.14,5.53,16.74,1.93,8.51,14.96,18.90,11.18,15.00,20.10,24.56,22.29,19.41,20.25,25.01,25.87,29.33,20.76,18.94,24.60,26.51,13.37,23.84,21.82,23.69,20.56,30.53,16.66,28.71,23.83,21.16,24.66,26.42,27.38,32.46,25.34)
gr$prop<-c(0.027,0.023,0.002,0.500,0.011,0.185,0.097,0.067,0.066,0.023,0.433,0.117,0.053,0.199,0.098,0.100,0.594,0.406,0.027,0.023,0.002,0.500,0.011,0.185,0.097,0.067,0.066,0.023,0.433,0.117,0.053,0.199,0.098,0.100,0.594,0.406)
gr$low.CI<-c(8.27,4.80,12.35,1.83,7.22,14.53,18.25,10.56,14.28,18.76,24.25,21.72,18.62,19.83,24.36,25.22,28.80,20.20,17.73,23.15,21.06,13.12,21.79,21.32,22.99,19.76,29.60,15.41,28.39,23.25,20.34,24.20,25.76,26.72,31.92,24.73)
gr$high.CI<-c(10.10,6.37,22.31,2.04,10.00,15.40,19.56,11.83,15.74,21.52,24.87,22.86,20.23,20.68,25.67,26.53,29.86,21.34,20.21,26.10,32.79,13.63,26.02,22.33,24.41,21.39,31.48,17.98,29.04,24.43,22.01,25.12,27.09,28.05,33.01,25.95)
The code I wrote is this
p<-ggplot(data=gr, aes(x=factor(drug), y=as.numeric(gr$prevalence), ymax=max(high.CI),position="dodge",fill=practice,width=prop))
colour<-c(rep("gray79",10),rep("gray60",6),rep("gray39",2))
p + theme_bw()+
geom_bar(stat="identity",position = position_dodge(0.9)) +
labs(x="Drug",y="Prevalence") +
geom_errorbar(ymax=gr$high.CI,ymin=gr$low.CI,position=position_dodge(0.9),width=0.25,size=0.25,colour="black",aes(x=factor(drug), y=as.numeric(gr$prevalence), fill=practice)) +
ggtitle("Drug usage by country and practice") +
scale_fill_manual(values = colour)+ guides(fill=F)
The figure I obtain is this one, where the bars are all on top of each other, while I want them dodged.
I also obtain the following warning:
ymax not defined: adjusting position using y instead
Warning message:
position_dodge requires non-overlapping x intervals
Ideally I would get each bar near one another, with their error bars in the middle of its bar, all organised by country.
Also should I be concerned about the warning (which I clearly do not fully understand)?
I hope this makes sense. I hope I am close enough, but I don't seem to be going anywhere, some help would be greatly appreciated.
Thank you
ggplot's geom_bar() accepts the width parameter, but doesn't line them up neatly against one another in dodged position by default. The following workaround references the solution here:
library(dplyr)
library(ggplot2)
# calculate x-axis position for bars of varying width
gr <- gr %>%
group_by(drug) %>%
arrange(practice) %>%
mutate(pos = 0.5 * (cumsum(prop) + cumsum(c(0, prop[-length(prop)])))) %>%
ungroup()
x.labels <- gr$practice[gr$drug == "a"]
x.pos <- gr$pos[gr$drug == "a"]
ggplot(gr,
aes(x = pos, y = prevalence,
fill = country, width = prop,
ymin = low.CI, ymax = high.CI)) +
geom_col(col = "black") +
geom_errorbar(size = 0.25, colour = "black") +
facet_wrap(~drug) +
scale_fill_manual(values = c("c1" = "gray79",
"c2" = "gray60",
"c3" = "gray39"),
guide = "none") +
scale_x_continuous(name = "Drug",
labels = x.labels,
breaks = x.pos) +
labs(title = "Drug usage by country and practice", y = "Prevalence") +
theme_classic()
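The pos calculation in the mutate() step above places each bar at the midpoint between its left and right edges, where the edges are running totals of the widths. A small base R sketch with made-up widths (the vector prop here is illustrative, not taken from gr):

```r
# Hypothetical bar widths for one facet
prop <- c(0.5, 0.25, 0.25)

# Right edge of each bar: running total of the widths
right <- cumsum(prop)

# Left edge of each bar: running total of the widths of all previous bars
left <- cumsum(c(0, prop[-length(prop)]))

# Centre of each bar: midpoint of its two edges
pos <- 0.5 * (left + right)

print(data.frame(prop, left, right, pos))
```

Because each bar's left edge equals the previous bar's right edge, bars drawn at pos with width prop sit flush against one another.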
There is a lot of information you are trying to convey here - to contrast drug A and drug B across countries using the barplots and accounting for proportions, you might use the facet_grid function. Try this:
colour<-c(rep("gray79",10),rep("gray60",6),rep("gray39",2))
gr$drug <- paste("Drug", gr$drug)
p<-ggplot(data=gr, aes(x=factor(practice), y=as.numeric(prevalence),
ymax=high.CI,ymin = low.CI,
fill=practice, width=prop))
p + theme_bw()+ facet_grid(drug~country, scales="free") +
geom_bar(stat="identity") +
labs(x="Practice",y="Prevalence") +
geom_errorbar(position=position_dodge(0.9), width=0.25,size=0.25,colour="black") +
ggtitle("Drug usage by country and practice") +
scale_fill_manual(values = colour)+ guides(fill="none")
The bar widths are too small for country c1 and, as you indicated, one clinic there is quite influential.
Also, you can specify your aesthetics once in ggplot(aes(...)) without resetting them in each layer, and there is no need to include the data frame's name inside aes() in the ggplot call.
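A short sketch of that last point, using a made-up three-row data frame (toy and its columns are hypothetical): aesthetics set once in ggplot() are inherited by every layer, and columns are referred to by bare name.

```r
library(ggplot2)

# Hypothetical miniature version of the data
toy <- data.frame(
  practice   = c("a", "b", "c"),
  prevalence = c(9.1, 5.5, 16.7),
  low        = c(8.3, 4.8, 12.4),
  high       = c(10.1, 6.4, 22.3)
)

# Aesthetics are declared once; geom_col() and geom_errorbar() both inherit
# them, and no toy$ prefix is needed inside aes()
p <- ggplot(toy, aes(x = practice, y = prevalence, ymin = low, ymax = high)) +
  geom_col(fill = "gray79") +
  geom_errorbar(width = 0.25)
```

A layer only needs its own aes() when it uses a mapping that differs from the global one.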