Making multi-line plots in R using ggplot2

I would like to compile some data into a ggplot() line plot, with each location in a different color.
It's rainfall at various places over 100 days, and the values differ widely between locations, which is giving me fits.
I've tried using different suggestions from this forum and they don't seem to be working well for this data. Sample data:
Time Location1 Location2 Location3
0 48 99.2966479761526 2
1 51 98.7287820735946 4
2 58 98.4803262236528 4.82842712474619
3 43 97.8941490454599 5.46410161513775
4 47 96.6091435402632 6
5 47 95.207282404881 6.47213595499958
6 41 94.8696538619697 6.89897948556636
7 34 94.6514389757067 7.29150262212918
8 40 93.7297335476615 7.65685424949238
9 57 93.2440731907263 8
My code thus far is
ggplot(Rain) +
geom_line(aes(x=Time,y=Location1,col="red")) +
geom_line(aes(x=Time,y=Location2,col="blue")) +
geom_line(aes(x=Time,y=Location3,col="green")) +
scale_color_manual(labels = c("Location 1","Location 2","Location 3"),
values = c("red","blue","green")) +
xlab("Time (Days)") + ylab("Rainfall (Inches)") + labs(color="Locations") +
ggtitle("Rainfall Over 100 Days In Three Locations")
So far it gives me everything I want, but for some reason the colors are wrong: it plots Location 1 in green, even though I specified red in my first geom_line().

library(tidyr)
library(ggplot2)
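# reshape to long format (one row per Time/Place pair) so a single color mapping covers every location
df_long <- gather(data = df1, Place, Rain, -Time)
ggplot(df_long) +
geom_line(aes(x=Time, y=Rain, color=Place))
Data (run this first to create df1):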
df1 <- read.table(text="Time Location1 Location2 Location3
0 48 99.2966479761526 2
1 51 98.7287820735946 4
2 58 98.4803262236528 4.82842712474619
3 43 97.8941490454599 5.46410161513775
4 47 96.6091435402632 6
5 47 95.207282404881 6.47213595499958
6 41 94.8696538619697 6.89897948556636
7 34 94.6514389757067 7.29150262212918
8 40 93.7297335476615 7.65685424949238
9 57 93.2440731907263 8",
header = TRUE, stringsAsFactors = FALSE)
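The color mix-up in the question most likely comes from putting a constant string inside aes(): ggplot2 treats "red", "blue" and "green" as data values, sorts them alphabetically (blue, green, red), and matches an unnamed values vector in scale_color_manual() to that sorted order, so the series labelled "red" receives the third value, "green". A minimal sketch, assuming only the first three rows of the question's data (Location2 and Location3 rounded), that reproduces the problem and fixes it by mapping the legend label and naming each color:

```r
library(ggplot2)

# First three rows of the question's data (rounded for brevity)
Rain <- data.frame(
  Time = 0:2,
  Location1 = c(48, 51, 58),
  Location2 = c(99.30, 98.73, 98.48),
  Location3 = c(2, 4, 4.83)
)

# As in the question: constant strings in aes() become discrete data values
p_bad <- ggplot(Rain) +
  geom_line(aes(x = Time, y = Location1, col = "red")) +
  geom_line(aes(x = Time, y = Location2, col = "blue")) +
  geom_line(aes(x = Time, y = Location3, col = "green")) +
  scale_color_manual(values = c("red", "blue", "green"))

# Fix: map the label you want in the legend, then name the color for each label
p_good <- ggplot(Rain) +
  geom_line(aes(x = Time, y = Location1, col = "Location 1")) +
  geom_line(aes(x = Time, y = Location2, col = "Location 2")) +
  geom_line(aes(x = Time, y = Location3, col = "Location 3")) +
  scale_color_manual(values = c("Location 1" = "red",
                                "Location 2" = "blue",
                                "Location 3" = "green"))

# Color actually used for the Location1 line in each version
unique(layer_data(p_bad, 1)$colour)   # "green"
unique(layer_data(p_good, 1)$colour)  # "red"
```

With the long-format answer, the same named-vector idea applies, using the column names Location1, Location2 and Location3 that gather() puts in Place.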

Related

Add category mean value to faceted scatter plots in ggplot

I am using facet wrap to plot Weight Gain versus Caloric Intake for four different diets. Diet is a four-level factor, Weight Gain and Caloric Intake are numeric. I am adding a regression line to each plot facet. What I want to do is add a horizontal line for the group mean weight gain for each diet in the plot (4 different mean values). The problem is when I use the geom_hline function it puts the global mean on all of the plots, which is not what I want.
I tried using stat_summary(fun.y=mean,geom="line"), but it gives me line segments joining each of the points in every plot.
Below is the code I am using, which draws the single global mean on every panel, followed by the data set. I've included the labeller code for completeness, but I really just need help with drawing the group mean lines.
Thanks in advance for any help.
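library(dplyr)   # filter(); without dplyr loaded, stats::filter() would be called instead
library(ggplot2)
# Calculate slopes and means to use for facet labels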
#
wgSlope<-rep(NA,nlevels(vitaminData$Diet))
dietMeans<-rep(NA,nlevels(vitaminData$Diet))
for (i in 1:nlevels(vitaminData$Diet)){
dietMeans[i]<-mean(filter(vitaminData,Diet==i)$WeightGain)
#
# Get regression lines and coefficients for each facet
#
fit<-lm(WeightGain~CaloricIntake,data=filter(vitaminData,Diet==i))
wgSlope[i]<-coef(fit)[2]
}
#
# Build facet labels
#
dietLabel<-c(`1`=
paste("Diet 1, Slope=",round(wgSlope[1],2),", Mean=",round(dietMeans[1],1)),
`2`=paste("Diet 2, Slope=",round(wgSlope[2],2),", Mean=",round(dietMeans[2],1)),
`3`=paste("Diet 3, Slope =",round(wgSlope[3],2),", Mean=",round(dietMeans[3],1)),
`4`=paste("Diet 4, Slope =",round(wgSlope[4],2),", Mean=",round(dietMeans[4],1)))
#
# Draw the plots
#
ggplot(data=vitaminData,
aes(y=WeightGain,x=CaloricIntake,color=Diet))+
theme_bw()+
geom_point(aes(color=Diet,fill=Diet,shape=Diet))+
geom_smooth(method="lm",se=FALSE,linetype=2,alpha=0.5)+
labs(x="Caloric Intake",y="Weight Gain")+
scale_color_manual(values=c("red","blue","orange","darkgreen"))+
geom_hline(yintercept=mean(vitaminData$WeightGain))+
facet_wrap(~Diet,labeller=labeller(Diet=dietLabel))+
theme(legend.position="none")
Diet WeightGain CaloricIntake
<fct> <dbl> <dbl>
1 1 48 35
2 1 67 44
3 1 78 44
4 1 69 51
5 1 53 47
6 2 65 40
7 2 49 45
8 2 37 37
9 2 73 53
10 2 63 42
11 3 79 51
12 3 52 41
13 3 63 47
14 3 65 47
15 3 67 48
16 4 59 53
17 4 50 52
18 4 59 52
19 4 42 45
20 4 34 38
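Here's an approach using dplyr: in place of the geom_hline(yintercept=mean(vitaminData$WeightGain)) layer, add a geom_hline() that gets its own per-Diet summary data, so facet_wrap() draws each mean only in its own panel. (Add library(dplyr) or library(tidyverse) if not already loaded.)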
geom_hline(data = vitaminData %>%
group_by(Diet) %>%
summarize(mean = mean(WeightGain)),
aes(yintercept = mean)) +
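For a self-contained check, here is a sketch that rebuilds the data set from the question and computes one mean per Diet. It swaps in base R's aggregate() for the dplyr group_by()/summarize() step (an alternative, not the answer's code) and omits the labeller and regression lines for brevity. Because the summary data frame keeps the Diet column, facet_wrap() routes each mean to its own panel:

```r
library(ggplot2)

# Data from the question
vitaminData <- data.frame(
  Diet = factor(rep(1:4, each = 5)),
  WeightGain = c(48, 67, 78, 69, 53, 65, 49, 37, 73, 63,
                 79, 52, 63, 65, 67, 59, 50, 59, 42, 34),
  CaloricIntake = c(35, 44, 44, 51, 47, 40, 45, 37, 53, 42,
                    51, 41, 47, 47, 48, 53, 52, 52, 45, 38)
)

# One row per Diet: the Diet column is what ties each mean to its facet
dietMeans <- aggregate(WeightGain ~ Diet, data = vitaminData, FUN = mean)

p <- ggplot(vitaminData, aes(x = CaloricIntake, y = WeightGain, color = Diet)) +
  geom_point() +
  # geom_hline() does not inherit the global x/y mapping, so only yintercept is needed
  geom_hline(data = dietMeans, aes(yintercept = WeightGain, color = Diet)) +
  facet_wrap(~Diet)
```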

ggplot2 for a newbie multiple columns grouped in a bar chart? [duplicate]

I have the following data
Input Rtime Rcost Rsolutions Btime Bcost
1 12-proc. 1 36 614425 40 36
2 15-proc. 1 51 534037 50 51
3 18-proc 5 62 1843820 66 66
4 20-proc 4 68 1645581 104400 73
5 20-proc(l) 4 64 1658509 14400 65
6 21-proc 10 78 3923623 453600 82
I want to create a grouped bar chart from this data such that the x-axis shows the Input field (as groups) and the y-axis shows the Rtime and Btime fields (the two bars) on a log scale.
All the solutions/examples I found online had similar data in a three-column layout. I do not know how to use the data I have to generate the grouped bar chart, or whether there is a way to convert this data into an R- and ggplot-compatible format (converting it manually is not an option because it is a huge file with a lot of rows).
Edit: [Figure: graph generated using gncs's solution]
As requested, a ggplot2 solution that also uses reshape2:
library(reshape2)
library(ggplot2)
df <- read.table(text = " Input Rtime Rcost Rsolutions Btime Bcost
1 12-proc. 1 36 614425 40 36
2 15-proc. 1 51 534037 50 51
3 18-proc 5 62 1843820 66 66
4 20-proc 4 68 1645581 104400 73
5 20-proc(l) 4 64 1658509 14400 65
6 21-proc 10 78 3923623 453600 82",header = TRUE,sep = "")
dfm <- melt(df[,c('Input','Rtime','Btime')],id.vars = 1)
ggplot(dfm,aes(x = Input,y = value)) +
geom_bar(aes(fill = variable),stat = "identity",position = "dodge") +
scale_y_log10()
Note a stylistic difference: because log10(1) = 0, ggplot2 draws the Rtime bars for 12-proc. and 15-proc. (where Rtime = 1) with zero height, so nothing is plotted, whereas barplot() draws a little stub (which, in my opinion, is a little misleading).
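To see why a value of 1 disappears: with scale_y_log10() the values are transformed first, and geom_bar() then draws each bar from 0 to the transformed value, so log10(1) = 0 gives a bar with no height. A small sketch, assuming just the 12-proc. row of the data (Rtime = 1, Btime = 40):

```r
library(ggplot2)

# 12-proc. row from the question, in long format
d <- data.frame(Input = "12-proc.", variable = c("Rtime", "Btime"), value = c(1, 40))

p <- ggplot(d, aes(x = Input, y = value, fill = variable)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_y_log10()

# ymin/ymax are in log10 units: one bar runs from 0 to 0, the other from 0 to log10(40)
bars <- layer_data(p)
bars[, c("fill", "ymin", "ymax")]
```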
I think I understand the problem; this is what I would suggest as a quick, short-term option:
data <- read.table("data.txt", header=TRUE)
# one row per series (the two bars), one column per Input group
heights <- rbind(Rtime = data$Rtime, Btime = data$Btime)
barplot(heights, legend.text = rownames(heights), names.arg=data$Input, log="y", beside=TRUE)
Is that what you want? It is kind of dirty, but it does the job.
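The base-graphics answer reads the data from a file; a self-contained sketch of the same idea, with the table pasted inline (an assumption so it runs as-is), might look like this. Here the legend is drawn with a separate legend() call, and the colors are chosen explicitly so the legend matches the bars. With beside = TRUE, barplot() returns the bar midpoints as a matrix with one row per series and one column per Input group:

```r
df <- read.table(text = "Input Rtime Rcost Rsolutions Btime Bcost
12-proc. 1 36 614425 40 36
15-proc. 1 51 534037 50 51
18-proc 5 62 1843820 66 66
20-proc 4 68 1645581 104400 73
20-proc(l) 4 64 1658509 14400 65
21-proc 10 78 3923623 453600 82", header = TRUE)

# Rows = series (the two bars), columns = Input groups
heights <- rbind(Rtime = df$Rtime, Btime = df$Btime)
cols <- c("grey30", "grey80")

mids <- barplot(heights, beside = TRUE, log = "y", col = cols,
                names.arg = df$Input)
legend("topleft", legend = rownames(heights), fill = cols)
```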
Update: code corrected.
As requested, a ggplot2 solution that also uses tidyr's pivot_longer() https://tidyr.tidyverse.org/reference/pivot_longer.html to transform the data into a format that geom_bar() can easily plot.
library(tidyr)   # pivot_longer() comes from tidyr, not dplyr
library(ggplot2)
df <- read.table(text = " Input Rtime Rcost Rsolutions Btime Bcost
1 12-proc. 1 36 614425 40 36
2 15-proc. 1 51 534037 50 51
3 18-proc 5 62 1843820 66 66
4 20-proc 4 68 1645581 104400 73
5 20-proc(l) 4 64 1658509 14400 65
6 21-proc 10 78 3923623 453600 82",
header = TRUE,sep = "")
dfm <- pivot_longer(df[, c("Input", "Rtime", "Btime")], -Input, names_to="variable", values_to="value")
## pivot_longer takes the input data frame, excludes the Input field from the transformation, turns the remaining column names into the variable "variable" (often called the "key"), and assigns the values to the variable "value".
ggplot(dfm,aes(x = Input,y = value)) +
geom_bar(aes(fill = variable),stat = "identity",position = "dodge") +
scale_y_log10()
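To make the comment about pivot_longer() concrete, a tiny sketch on a two-row subset with only Rtime and Btime (a subset chosen for brevity) shows the long format that geom_bar() consumes: one row per Input/variable pair, with the former column names in variable and the numbers in value.

```r
library(tidyr)

# Two rows of the question's data, wide format
df <- data.frame(Input = c("12-proc.", "15-proc."),
                 Rtime = c(1, 1),
                 Btime = c(40, 50))

# Every column except Input is stacked into variable/value pairs
dfm <- pivot_longer(df, -Input, names_to = "variable", values_to = "value")
print(dfm)
```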
joran's answer helped me a lot, but I had to pass stat="identity" to geom_bar(), like this:
ggplot(dfm, aes(x = Input,y = value)) +
geom_bar(aes(fill = variable), position = "dodge", stat="identity") +
scale_y_log10()
My version of R is 3.2.2 and ggplot2 version 1.0.1
Thanks.

Function for generating multiple line charts for all variables in a dataframe for different groups

I have 106 weeks of data for 5 different LOBs (lines of business). The variables are Traffic, Spend, Clicks, etc. In total there are 106*5 = 530 rows.
Dataframe looks like:
LOB Week Traffic Spend Clicks
A 1 34 12 5
A 2 37 32 6
A 3 41 57 7
A 4 52 42 12
A 5 27 37 8
... 106 weeks
B...106 weeks
C...106 weeks
D...106 weeks
E 1 43 22 12
E 2 65 16 14
E 3 76 18 9
E 4 25 14 11
E 5 53 15 15
... 106 weeks
I want to generate a line chart for Traffic with all 5 LOBs on the same chart, and similarly for the other metrics. For this I have written the following loop, but it is not doing what I want.
Code:
for ( i in seq(1,length( data),1) ) plot(data[,i],ylab=names(data[i]),type="l", col = "red", xlab = "Week", main = "")
Please suggest how this can be done.
You can use ggplot2:
ggplot(data, aes(x = Week, y = Traffic, color = LOB)) +
geom_line()
Please try to post a small reproducible example of your data so we can run the code.
Edit: as suggested by @Axeman, you may want to plot all metrics together. Here is his solution for visibility:
library(tidyr)
library(ggplot2)
d <- gather(data, metric, value, -Week, -LOB)
ggplot(d, aes(Week, value, color = LOB)) +
geom_line() +
facet_wrap(~metric, scales = 'free_y')
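If separate charts per metric are preferred (closer to the function the question asks for), one option is a small helper that builds one ggplot per metric column. This is a sketch under assumptions: plot_metric() is a hypothetical helper name, the toy data uses only LOBs A and E with the five weeks shown in the question, and it relies on the .data pronoun (available in ggplot2 3.0.0 and later) to select a column by name:

```r
library(ggplot2)

# Toy data in the question's layout: LOBs A and E, weeks 1-5
data <- data.frame(
  LOB = rep(c("A", "E"), each = 5),
  Week = rep(1:5, times = 2),
  Traffic = c(34, 37, 41, 52, 27, 43, 65, 76, 25, 53),
  Spend = c(12, 32, 57, 42, 37, 22, 16, 18, 14, 15),
  Clicks = c(5, 6, 7, 12, 8, 12, 14, 9, 11, 15)
)

# Hypothetical helper: one line chart for the named metric, one line per LOB
plot_metric <- function(df, metric) {
  ggplot(df, aes(x = Week, y = .data[[metric]], color = LOB)) +
    geom_line() +
    labs(y = metric, title = metric)
}

# Every column except the grouping/time columns is treated as a metric
metrics <- setdiff(names(data), c("LOB", "Week"))
plots <- lapply(metrics, plot_metric, df = data)
names(plots) <- metrics
```

Each chart can then be printed or saved in a loop, for example with for (m in metrics) ggsave(paste0(m, ".png"), plots[[m]]).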

geom_bar labeling for melted data / stacked barplot

I have a problem with drawing a stacked barplot with ggplot. My data looks like this:
timeInterval TotalWilling TotalAccepted SimID
1 16 12 Sim1
1 23 23 Sim2
1 63 60 Sim3
1 69 60 Sim4
1 61 60 Sim5
1 60 54 Sim6
2 16 8 Sim1
2 23 21 Sim2
2 63 52 Sim3
2 69 64 Sim4
2 61 45 Sim5
2 60 32 Sim6
3 16 14 Sim1
3 23 11 Sim2
3 63 59 Sim3
3 69 69 Sim4
3 61 28 Sim5
3 60 36 Sim6
I would like to draw a stacked barplot for each SimID over timeInterval, with Willing and Accepted stacked. I achieved the barplot with the following simple code:
library(reshape2)
library(ggplot2)
dat <- read.csv("myDat.csv")
meltedDat <- melt(dat,id.vars = c("SimID", "timeInterval"))
ggplot(meltedDat, aes(timeInterval, value, fill = variable)) + facet_wrap(~ SimID) +
geom_bar(stat="identity", position = "stack")
I get the following graph: [Figure: stacked bars of TotalWilling and TotalAccepted by timeInterval, one panel per SimID]
My problem is that I would like to put percentages on each stack: for the Willing part, Willing/(Willing + Accepted), and for the Accepted part, Accepted/(Accepted + Willing). That way I can see what percentage is willing and what percentage is accepted, for example 45 on the red part of a stack and 55 on the blue part. I cannot seem to achieve this kind of labeling.
Any hint is appreciated.
Adapted from "Showing data values on stacked bar chart in ggplot2":
library(reshape2)
library(ggplot2)
meltedDat <- melt(dat,id.vars = c("SimID", "timeInterval"))
# percentage of each part within its (timeInterval, SimID) stack, e.g. Willing/(Willing+Accepted)
meltedDat$pct <- ave(meltedDat$value, meltedDat$timeInterval, meltedDat$SimID, FUN = function(v) 100 * v / sum(v))
meltedDat$valuestr <- sprintf("%.1f%%", meltedDat$pct)
ggplot(meltedDat, aes(timeInterval, value, fill = variable)) + facet_wrap(~ SimID) +
geom_bar(stat="identity", position = "stack") +
geom_text(aes(label=valuestr), position = position_stack(vjust = 0.5), size=2)
Also, it looks like some of your variables may be coded as factors.
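A self-contained sketch of the percentage labels, with the question's data typed inline and base R's rbind() standing in for melt() (an assumption to avoid extra packages). ave() computes each part's share of its own (timeInterval, SimID) stack, and position_stack(vjust = 0.5), available in ggplot2 2.2.0 and later, centres each label in its segment; because geom_text() inherits fill = variable, its stacking order matches the bars:

```r
library(ggplot2)

# The question's data, typed inline
dat <- data.frame(
  timeInterval  = rep(1:3, each = 6),
  TotalWilling  = rep(c(16, 23, 63, 69, 61, 60), times = 3),
  TotalAccepted = c(12, 23, 60, 60, 60, 54,
                    8, 21, 52, 64, 45, 32,
                    14, 11, 59, 69, 28, 36),
  SimID = rep(paste0("Sim", 1:6), times = 3)
)

# Long format: one row per (timeInterval, SimID, variable); rbind() stands in for melt()
long <- rbind(
  data.frame(dat[c("timeInterval", "SimID")], variable = "TotalWilling",  value = dat$TotalWilling),
  data.frame(dat[c("timeInterval", "SimID")], variable = "TotalAccepted", value = dat$TotalAccepted)
)

# Share of each part within its own stack, e.g. Willing / (Willing + Accepted)
long$pct <- ave(long$value, long$timeInterval, long$SimID,
                FUN = function(v) 100 * v / sum(v))
long$label <- sprintf("%.1f%%", long$pct)

p <- ggplot(long, aes(timeInterval, value, fill = variable)) +
  facet_wrap(~ SimID) +
  geom_bar(stat = "identity", position = "stack") +
  # vjust = 0.5 centres each label vertically within its own segment
  geom_text(aes(label = label), position = position_stack(vjust = 0.5), size = 2)
```

For Sim1 at timeInterval 1 (Willing 16, Accepted 12) this labels the two segments 57.1% and 42.9%.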
