Can't make a ggplot with multiple lines, geom_line() - r

I'm trying to plot two lines using flight data I gathered. My problem is that after trying different formulas, R is still only showing one line. I've separated my data according to regions (see image below). Can someone help me out with my formula?
If you need any additional information don't hesitate to ask, this is my first time posting on this channel.
ggplot(ica.vs.total, aes(x = Year, y = flights)) +
geom_line(aes(color = region, group = region), size = 1) +
theme_minimal()

When I run:
library(ggplot2)
ica.vs.total = data.frame(flights=c(215947,197757,185782,201023,279218,261045,213343,205609),
region=c('TotalFlights','TotalFlights','TotalFlights','TotalFlights',
'TotalFlightsICA','TotalFlightsICA','TotalFlightsICA','TotalFlightsICA'),
Year=c(2008,2009,2010,2011,2000,2001,2002,2003))
g = ggplot(ica.vs.total, aes(x = Year, y = flights)) +
geom_line(aes(color = region, group = region), size = 1)+
theme_minimal()
print(g)
I get the expected result (two lines, one per region), so double-check your code and data.
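If the same call still draws a single line on your own data, inspecting the data frame may help. The sketch below reuses the example data from this answer and assumes the column names region and Year from the question. On the real data, a region column with only one distinct value, or groups with a single row each, would be one possible explanation for seeing only one line.

```r
# Example data copied from the answer above
ica.vs.total <- data.frame(
  flights = c(215947, 197757, 185782, 201023, 279218, 261045, 213343, 205609),
  region  = rep(c("TotalFlights", "TotalFlightsICA"), each = 4),
  Year    = c(2008, 2009, 2010, 2011, 2000, 2001, 2002, 2003)
)

# How many distinct groups will geom_line() see?
n_regions <- length(unique(ica.vs.total$region))

# How many rows does each group have? A line needs at least two points.
rows_per_region <- table(ica.vs.total$region)

print(n_regions)
print(rows_per_region)
```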

Related

Remove a line linked from first to last data, by geom_path()

I learned on Stack Overflow that geom_path() can avoid joining separate parts of the data that are drawn on the same line. In my plot, the overall red line contains segments in other colors, and without this approach the last point of one blue segment is joined to the first point of the next blue segment. My code and image are below:
p6 <- ggplot(data = M1.m, mapping = aes(x = as.Date(M1_Date, format='%d/%m/%Y'),
y = M1_Value, color = factor(NewGroup))) + geom_path(aes(group = 1)) + geom_point(size = 0.5)
p6 <- p6 + labs(x = "Date", y = "Value") + labs(color = "Value Type")
p6
When I use this, the gaps between segments seem to be handled correctly, but there is a weird line connecting the first and last data points. Can you please tell me how to remove it?
The data is too large to paste here, sorry about that, so here is a link:
Data
Thank you!
geom_path() draws the line in the order in which the rows appear in the data frame. In my data, the column M1_Value holds two types of data covering the same period, so the last row of type 1 sits directly before the first row of type 2, and that is the reason for the weird line. The solution is to add a new column (say, type) to the data frame and set group = type in aes() to remove the line. (This is similar to plotting monthly temperature records, where the last day of one month would otherwise be joined to the first day of the next. The key to this kind of problem is to define the categories clearly and group by them.)
Answer:
Change previous code to:
p6 <- ggplot(data = M1.m, mapping = aes(x = as.Date(M1_Date, format='%d/%m/%Y'), y = M1_Value,
color = NewGroup, group = M1_Type)) + geom_path() + geom_point(size = 0.5)
p6 <- p6 + labs(x = "Date", y = "Value") + labs(color = "Value Type")
p6
and the plot changes to:
So the weird line is removed. Done!
Thanks to the following question for the help:
R geom_path lines "closing", sometimes. How to keep them "open"?.
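The effect of the group aesthetic can be seen on a tiny example. The sketch below uses made-up data (the column names date, value, and type are hypothetical, not those of M1.m) with two series over the same three dates. With group = 1, geom_path() treats all rows as one path and joins the end of series A to the start of series B. With group = type, each series is its own path.

```r
library(ggplot2)

# Made-up data: two series ("A" and "B") over the same dates
df <- data.frame(
  date  = rep(as.Date("2020-01-01") + 0:2, times = 2),
  value = c(1, 2, 3, 4, 5, 6),
  type  = rep(c("A", "B"), each = 3)
)

# One group: a single path through all six rows, including a jump from A to B
p_joined <- ggplot(df, aes(x = date, y = value)) +
  geom_path(aes(group = 1))

# One group per series: two separate paths, no connecting line
p_split <- ggplot(df, aes(x = date, y = value, group = type)) +
  geom_path()

# Count the paths ggplot2 will actually draw in each plot
n_joined <- length(unique(layer_data(p_joined)$group))
n_split  <- length(unique(layer_data(p_split)$group))
print(c(joined = n_joined, split = n_split))
```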
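For me, a simpler solution was to sort the data frame by the variable plotted on the x-axis. M1_Date is stored as day/month/year text, so it has to be sorted by the parsed date rather than by the raw string. In the example above, this would result in:
library(dplyr)  # provides %>% and arrange()
p6 <- ggplot(data = M1.m %>% arrange(as.Date(M1_Date, format='%d/%m/%Y')),
mapping = aes(x = as.Date(M1_Date, format='%d/%m/%Y'),
y = M1_Value, color = factor(NewGroup))) + geom_path(aes(group = 1)) + geom_point(size = 0.5)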
p6 <- p6 + labs(x = "Date", y = "Value") + labs(color = "Value Type")
p6
I haven't tested this on the data above. The solution given by Ericshaw did not work for me because I could not add a group aesthetic when already using linetype as an aesthetic.

How do I set the x axis continuous that each plot in the graph is scattered relatively

The left image is my current graph, and I would like it to look like the right one. I have two problems. First, even though I use geom_step(), no line is drawn connecting the dots. Second, the points in the right graph are spaced according to their dates, whereas mine are spread evenly across the whole x-axis.
Here is my code
ggplot() +
geom_step(data = tbl, mapping = aes(x = tbl$date, y = tbl$size)) +
geom_point(data = tbl, aes(x = tbl$date, y = tbl$size)) +
labs(x = 'Data', y = 'Size (Kilobytes)', title = 'stringr: timeline of version sizes')
I think I have to convert the current date format (yyyy-mm-dd) to just yyyy, but doing so would put some points in the same year. For example, the first three dates I have are 2009-11, 2009-11, and 2010-02, so if I keep only the year, two of them will land on the same spot. I don't know how to work around this, since I am still learning R.
Thank you in advance!
It takes some finagling with the date, but all you should have to do is add a function from the scales package to set your x-axis scale. It requires the date column to be of class POSIXct. I used some dummy data since you didn't post any.
library(ggplot2)
library(scales)
library(zoo)
tbl$date <- as.POSIXct(as.yearmon(tbl$date, format = "%Y-%m"))
ggplot() +
geom_step(data = tbl, mapping = aes(x = date, y = size)) +
geom_point(data = tbl, aes(x = date, y = size)) +
labs(x = 'Data', y = 'Size (Kilobytes)', title = 'stringr: timeline of version sizes') +
scale_x_datetime(labels = date_format("%Y"))
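As an alternative sketch (not the answer's method), the Date class with scale_x_date() gives the same kind of continuous time axis without zoo, and it keeps the day of the month, so two releases in the same month do not overlap. The dates and sizes below are made up for illustration. Only the column names date and size come from the question.

```r
library(ggplot2)

# Made-up release dates and sizes, for illustration only
tbl <- data.frame(
  date = c("2009-11-09", "2009-11-16", "2010-02-15", "2011-06-30", "2012-12-05"),
  size = c(6, 7, 10, 21, 30),
  stringsAsFactors = FALSE
)

# Convert text to Date so ggplot2 treats the x-axis as continuous time
tbl$date <- as.Date(tbl$date, format = "%Y-%m-%d")

p <- ggplot(tbl, aes(x = date, y = size)) +
  geom_step() +
  geom_point() +
  # Label the axis with years only; point positions still use the full date
  scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
  labs(x = "Date", y = "Size (Kilobytes)")
print(p)
```

Because the column is now a Date rather than text, geom_step() sees one continuous series and can connect the points.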

Missing legend in ggplot

I am trying to plot the average number of bikes rented per hour for each weekday using ggplot in R, as shown below. Since the weekday is stored as a number in my data, the legend shows numbers.
ggplot(bike_share_train, aes(x=hour, y=count, color=day))+
geom_point(data = day_summary_by_hour, aes(group = day))+
geom_line(data = day_summary_by_hour, aes(group = day))+
ggtitle("Bikes Rent By Weekday")+scale_colour_hue('Weekday')
I want to change those numbers to weekday names, so I tried the command below, but then the legend is missing from the plot. Please let me know how to fix this issue.
ggplot(bike_share_train, aes(x=hour, y=count, color=day))+geom_point(data = day_summary_by_hour, aes(group = day)) +
geom_line(data = day_summary_by_hour, aes(group = day))+
ggtitle("Bikes Rent By Weekday")+
scale_colour_hue('Weekday', breaks = levels(bike_share_train$day), labels=c('Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday'))
Although the OP found the solution, I'm proposing an alternative approach for the case where each group consists of only one observation.
library(ggplot2)
# create some sample data
bike_share_data <- data.frame(hour = c(1.5,2.3,1.3,2.2,1.5),
count = c(21,26,30,15,20),
# explicit factor: since R 4.0 strings stay character, and levels() would be NULL
day = factor(c("1","2","3","4","5")))
head(bike_share_data)
ggplot(bike_share_data, aes(x=hour, y=count, color=day))+
geom_point(data = bike_share_data, aes(group = 1))+
geom_line(data = bike_share_data, aes(group = 1))+
ggtitle("Bikes Rent By Weekday")+
scale_colour_hue('Weekday',
breaks = levels(bike_share_data$day),
labels=c('Monday','Tuesday','Wednesday','Thursday','Friday')
)
My sincere apologies to everyone for my mistake. I extracted the weekday and hour from the datetime value but forgot to convert these fields from character to factor, which caused the issue in ggplot. After changing the data type, the legend now shows.
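The character-versus-factor difference can be checked without plotting. The sketch below uses hypothetical weekday codes and assumes that code 1 means Monday, following the label order in the question. levels() returns NULL for a character vector, so breaks = levels(...) passes NULL to the scale, which is consistent with the missing legend.

```r
# Hypothetical weekday codes stored as text
day_chr <- c("1", "2", "3", "4", "5", "6", "7", "1")

# A character vector has no levels
print(levels(day_chr))

# Convert to a factor and attach readable labels in one step
day_fct <- factor(day_chr,
                  levels = as.character(1:7),
                  labels = c("Monday", "Tuesday", "Wednesday", "Thursday",
                             "Friday", "Saturday", "Sunday"))
print(levels(day_fct))
```

With the labels stored in the factor itself, the legend picks up the weekday names and the labels argument of the scale is no longer needed.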

dodge columns in ggplot2

I am trying to create a figure that summarises my data. The data is about the prevalence of drug use, obtained from different practices in different countries. Each practice has contributed a different amount of data, and I want to show all of this in my figure.
Here is a subset of the data to work on:
gr<-data.frame(matrix(0,36))
gr$drug<-c("a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b")
gr$practice<-c("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r")
gr$country<-c("c1","c1","c1","c1","c1","c1","c1","c1","c1","c1","c2","c2","c2","c2","c2","c2","c3","c3","c1","c1","c1","c1","c1","c1","c1","c1","c1","c1","c2","c2","c2","c2","c2","c2","c3","c3")
gr$prevalence<-c(9.14,5.53,16.74,1.93,8.51,14.96,18.90,11.18,15.00,20.10,24.56,22.29,19.41,20.25,25.01,25.87,29.33,20.76,18.94,24.60,26.51,13.37,23.84,21.82,23.69,20.56,30.53,16.66,28.71,23.83,21.16,24.66,26.42,27.38,32.46,25.34)
gr$prop<-c(0.027,0.023,0.002,0.500,0.011,0.185,0.097,0.067,0.066,0.023,0.433,0.117,0.053,0.199,0.098,0.100,0.594,0.406,0.027,0.023,0.002,0.500,0.011,0.185,0.097,0.067,0.066,0.023,0.433,0.117,0.053,0.199,0.098,0.100,0.594,0.406)
gr$low.CI<-c(8.27,4.80,12.35,1.83,7.22,14.53,18.25,10.56,14.28,18.76,24.25,21.72,18.62,19.83,24.36,25.22,28.80,20.20,17.73,23.15,21.06,13.12,21.79,21.32,22.99,19.76,29.60,15.41,28.39,23.25,20.34,24.20,25.76,26.72,31.92,24.73)
gr$high.CI<-c(10.10,6.37,22.31,2.04,10.00,15.40,19.56,11.83,15.74,21.52,24.87,22.86,20.23,20.68,25.67,26.53,29.86,21.34,20.21,26.10,32.79,13.63,26.02,22.33,24.41,21.39,31.48,17.98,29.04,24.43,22.01,25.12,27.09,28.05,33.01,25.95)
The code I wrote is this
p<-ggplot(data=gr, aes(x=factor(drug), y=as.numeric(gr$prevalence), ymax=max(high.CI),position="dodge",fill=practice,width=prop))
colour<-c(rep("gray79",10),rep("gray60",6),rep("gray39",2))
p + theme_bw()+
geom_bar(stat="identity",position = position_dodge(0.9)) +
labs(x="Drug",y="Prevalence") +
geom_errorbar(ymax=gr$high.CI,ymin=gr$low.CI,position=position_dodge(0.9),width=0.25,size=0.25,colour="black",aes(x=factor(drug), y=as.numeric(gr$prevalence), fill=practice)) +
ggtitle("Drug usage by country and practice") +
scale_fill_manual(values = colour)+ guides(fill=F)
The figure I obtain has all the bars on top of each other, whereas I want them dodged (side by side).
I also obtain the following warning:
ymax not defined: adjusting position using y instead
Warning message:
position_dodge requires non-overlapping x intervals
Ideally I would get each bar near one another, with their error bars in the middle of its bar, all organised by country.
Also should I be concerned about the warning (which I clearly do not fully understand)?
I hope this makes sense. I hope I am close enough, but I don't seem to be going anywhere, some help would be greatly appreciated.
Thank you
ggplot's geom_bar() accepts the width parameter, but it does not line up bars of varying width neatly against one another in a dodged position by default. The following workaround is adapted from the solution here:
library(ggplot2)
library(dplyr)
# calculate x-axis position for bars of varying width
gr <- gr %>%
group_by(drug) %>%
arrange(practice) %>%
mutate(pos = 0.5 * (cumsum(prop) + cumsum(c(0, prop[-length(prop)])))) %>%
ungroup()
x.labels <- gr$practice[gr$drug == "a"]
x.pos <- gr$pos[gr$drug == "a"]
ggplot(gr,
aes(x = pos, y = prevalence,
fill = country, width = prop,
ymin = low.CI, ymax = high.CI)) +
geom_col(col = "black") +
geom_errorbar(size = 0.25, colour = "black") +
facet_wrap(~drug) +
scale_fill_manual(values = c("c1" = "gray79",
"c2" = "gray60",
"c3" = "gray39"),
guide = "none") +
scale_x_continuous(name = "Drug",
labels = x.labels,
breaks = x.pos) +
labs(title = "Drug usage by country and practice", y = "Prevalence") +
theme_classic()
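The pos calculation in the mutate() step places each bar at the midpoint between its left and right edges when the bars are laid end to end. A small worked example with made-up widths (the values of prop below are hypothetical) shows the arithmetic:

```r
# Hypothetical bar widths for three bars laid side by side
prop <- c(0.2, 0.5, 0.3)

# Right edge of each bar: running total of the widths
right_edge <- cumsum(prop)

# Left edge of each bar: running total of the widths of the bars before it
left_edge <- cumsum(c(0, prop[-length(prop)]))

# Centre of each bar is the midpoint of its two edges
pos <- 0.5 * (right_edge + left_edge)
print(pos)
```

Here the bars span 0 to 0.2, 0.2 to 0.7, and 0.7 to 1.0, so their centres fall at 0.1, 0.45, and 0.85.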
You are trying to convey a lot of information here. To contrast drug A and drug B across countries using bar plots while accounting for the proportions, you might use the facet_grid() function. Try this:
colour<-c(rep("gray79",10),rep("gray60",6),rep("gray39",2))
gr$drug <- paste("Drug", gr$drug)
# position is not an aesthetic, so it is left out of aes()
p<-ggplot(data=gr, aes(x=factor(practice), y=as.numeric(prevalence),
ymax=high.CI,ymin = low.CI,
fill=practice, width=prop))
p + theme_bw()+ facet_grid(drug~country, scales="free") +
geom_bar(stat="identity") +
labs(x="Practice",y="Prevalence") +
geom_errorbar(position=position_dodge(0.9), width=0.25,size=0.25,colour="black") +
ggtitle("Drug usage by country and practice") +
scale_fill_manual(values = colour)+ guides(fill="none")
The bars are too narrow for country c1 and, as you indicated, one clinic is quite influential.
Also, you can specify the aesthetics once in ggplot(aes(...)) without repeating them in each layer, and there is no need to include the data frame's name (as in gr$prevalence) inside aes().

Strange behavior on ggplot2

I'm trying to make a map that identifies specific areas by coloring them. First, I made this plot to check that the data was OK (Setor is the sector number):
ggplot(aes(x = long, y = lat, fill = Setor), data = mapa2010) + geom_polygon(colour = 'black') # data is ok
Then I tried to make the plot filled by another variable (AGSN):
ggplot(aes(x = long, y = lat, fill = AGSN), data = mapa2010) + geom_polygon(colour = 'black')
The data is exactly the same; there are no lines of code between these two commands. I have already tried reordering the data, but the plot is still wrong.
Does anyone know why this happens and how to solve it?
Adding group = group to aes() for the second plot solves it. I don't know why only the second map needs it.
ggplot(aes(x = long, y = lat, fill = AGSN, group = group), data = mapa2010[with(mapa2010, order(AGSN, id, piece, order)), ]) + geom_polygon(colour = 'black')
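A likely explanation, offered as an assumption since mapa2010 is not shown: when no group is given, ggplot2 groups rows by the combination of all discrete variables in the plot. If Setor is discrete and unique to each polygon, filling by Setor happens to separate the polygons. If several polygons share one AGSN value, they are merged into a single path. The sketch below uses two made-up squares (the data frame squares and its values are hypothetical) to show the difference.

```r
library(ggplot2)

# Two hypothetical unit squares that share the same fill value
squares <- data.frame(
  long  = c(0, 1, 1, 0,  2, 3, 3, 2),
  lat   = c(0, 0, 1, 1,  0, 0, 1, 1),
  group = rep(c("s1", "s2"), each = 4),
  AGSN  = "x"
)

# No group aesthetic: both squares fall into one group and are drawn as one polygon
p_merged <- ggplot(squares, aes(x = long, y = lat, fill = AGSN)) +
  geom_polygon(colour = "black")

# Explicit group: each square is drawn as its own polygon
p_split <- ggplot(squares, aes(x = long, y = lat, fill = AGSN, group = group)) +
  geom_polygon(colour = "black")

# Count the polygons ggplot2 will actually draw in each plot
n_merged <- length(unique(layer_data(p_merged)$group))
n_split  <- length(unique(layer_data(p_split)$group))
print(c(merged = n_merged, split = n_split))
```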
