Change line colour in ggplot (geom_line) - R

I am still very new to ggplot2, but I've created the following graphic, in which I would like to switch the colors blue and red. It should be simple enough, but I cannot figure it out.
df <- structure(list(Sex = c("M", "M", "M", "M", "M", "M", "M", "W",
"W", "W", "W", "W", "W", "W"), age_cat = structure(c(1L, 2L,
3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L), .Label = c("<40",
"41-50", "51-60", "61-70", "71-80", "81-90", "90+"), class = "factor"),
DD = c(42L, 88L, 289L, 558L, 527L, 174L, 22L, 27L, 36L, 206L,
347L, 321L, 160L, 29L), pop = c(36642L, 16327L, 20232L, 18068L,
14025L, 5555L, 1293L, 35887L, 16444L, 20178L, 17965L, 14437L,
7150L, 2300L), proportion = c(0.114622564270509, 0.538984504195504,
1.428430209569, 3.08833296435687, 3.75757575757576, 3.13231323132313,
1.7014694508894, 0.0752361579402012, 0.218924835806373, 1.02091386658737,
1.9315335374339, 2.22345362609961, 2.23776223776224, 1.26086956521739
), lower = c(0.082621962613957, 0.432499099174075, 1.26946115577044,
2.8408823120445, 3.44891013496043, 2.69000601596679, 1.06929480146528,
0.0495867729368698, 0.153377923598767, 0.886828947142727,
1.73530361873497, 1.98914206124244, 1.90751612365318, 0.846006018532107
), upper = c(0.154905173671422, 0.663628389658291, 1.6015714811397,
3.35102939015014, 4.08561035940466, 3.6247050150149, 2.56476800800746,
0.10944592449968, 0.302956684874059, 1.16937842460687, 2.14353263545661,
2.47728262910991, 2.60770261266057, 1.80583393021473)), row.names = c(NA,
-14L), class = "data.frame")
Below is the script I've used, in which I get (Sex = M) in red and (Sex = W) in blue.
ggplot(data = prevalence2021GGPLOT,
aes(x = age_cat, y = proportion, color = Sex))+
geom_point()+
labs(title="Prevalence 2021", y="Prevalence (%)", x="Age category") +
geom_errorbar(aes(ymin=(lower),
ymax=(upper)), width=.2) +
theme_bw()+
geom_line(aes(group = unlist(Sex)))
How do I make Sex = M blue and Sex = W red?

You can use scale_color_manual to manually change the colours in ggplot2. The first colour corresponds to the first level of your colour variable (here Sex, so M comes before W).
ggplot(data = df,
aes(x = age_cat, y = proportion, color = Sex))+ geom_point()+
labs(title="Prevalence 2021", y="Prevalence (%)", x="Age category") +
geom_errorbar(aes(ymin=(lower),
ymax=(upper)), width=.2) +
theme_bw()+
geom_line(aes(group = unlist(Sex)))+
scale_color_manual(values=c("blue", "red"))
You can find documentation on ggplot2 graphics in R on data_to_viz and on the scale_color_manual reference page.
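To see which colour lands on which group, it can help to inspect the level order first. This is a minimal sketch, assuming Sex is a character column as in the dput above, so that the groups are sorted the way factor() would sort them. The names sex, sex_levels and cols are made up for illustration.

```r
# Hypothetical stand-in for the Sex column of the sample data
sex <- c("M", "M", "W", "W")

# Character values are sorted, so "M" comes before "W"
sex_levels <- levels(factor(sex))
print(sex_levels)

# Positional matching: first colour -> first level, second -> second
cols <- setNames(c("blue", "red"), sex_levels)
print(cols)
```

Passing the values in the opposite order would swap the two colours. Naming the vector, for example c(M = "blue", W = "red"), removes the dependence on order altogether.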

This works. I need to point to df to use your sample df. Edited it per suggestion.
ggplot(data = df,
aes(x = age_cat, y = proportion, color = Sex))+
geom_point()+
labs(title="Prevalence 2021", y="Prevalence (%)", x="Age category") +
geom_errorbar(aes(ymin=(lower),
ymax=(upper)), width=.2) +
theme_bw()+
geom_line(aes(group = unlist(Sex))) +
scale_color_manual(values=c(M="darkblue", W="darkred"))

Related

Adding a second y axis in R

I would like to use the following dataset to create a plot that combines a clustered bar chart and a line chart:
structure(list(X = 1:14, ORIGIN = c("AUS", "AUS", "DAL", "DAL",
"DFW", "DFW", "IAH", "IAH", "OKC", "OKC", "SAT", "SAT", "SHV",
"SHV"), DEST = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L), .Label = c("ATL", "SEA"), class = "factor"),
flight19.x = c(293L, 93L, 284L, 93L, 558L, 284L, 441L, 175L,
171L, 31L, 262L, 31L, 175L, 0L), flight19.y = c(5526L, 5526L,
6106L, 6106L, 23808L, 23808L, 15550L, 15550L, 2055L, 2055L,
3621L, 3621L, 558L, 558L)), row.names = c(NA, -14L), class = "data.frame")
In Excel, the chart I am envisioning looks something like this:
I have already tried using sec_axis() to generate a second axis. However, the line plot still appears to use the first y-axis instead of the second:
p1 <- ggplot()+
geom_bar(data = flight19, aes(ORIGIN, flight19.x, fill = DEST),stat = "identity", position = "dodge" )+
scale_fill_viridis(name = "Destinations", discrete = TRUE)+
labs(y= "Operation Counts", x = "Airports")
p2 <- p1 + geom_line(data = flight19, aes(as.character(ORIGIN), flight19.y, group = 1))+
geom_point(data = flight19, aes(as.character(ORIGIN), flight19.y, group = 1))+
scale_y_continuous(limit = c(0,600),sec.axis = sec_axis(~.*75/10, name = "Total Monthly Operations"))
The plot shows the warning below:
Warning messages:
1: Removed 12 row(s) containing missing values (geom_path).
2: Removed 12 rows containing missing values (geom_point).
And the code produces the plot below:
Could someone show me how to make the line plot correspond to the second axis?
Thanks so much in advance.
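sec_axis() only relabels the axis; it does not rescale the data. That is also why the 12 values of flight19.y above the limit of 600 were dropped in your attempt. Divide flight19.y by a suitable transformation factor when plotting and multiply by the same factor inside sec_axis(). Here I used 50 just to get nice y-axis labels.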
#create x-axis
flight19$x_axis <- paste0(flight19$ORIGIN,'\n',flight19$DEST)
# The transformation factor
#transf_fact <- max(flight19$flight19.y)/max(flight19$flight19.x)
transf_fact <- 50
ggplot(flight19, aes(x = x_axis)) +
geom_bar(aes(y = flight19.x),stat = "identity", fill = "blue") +
geom_line(aes(y = flight19.y/transf_fact,group=1), color = "orange") +
scale_y_continuous(name = "Operation Counts",
limit = c(0,600),
breaks = seq(0,600,100),
sec.axis = sec_axis(~ (.*transf_fact),
breaks = function(limit)seq(0,limit[2],5000),
labels = scales::dollar_format(prefix = "$",suffix = " k",scale = .001),
name = "Total Monthly Operations")) +
xlab("Airports") +
theme_bw()
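The commented-out line in the code above derives the factor from the data instead of picking a round number. Below is a minimal sketch of that variant, assuming a toy data frame (flights, with hypothetical column names count and total) rather than the full flight19 data.

```r
library(ggplot2)

# Toy data with the same shape as the question: small counts, large totals
flights <- data.frame(
  airport = c("AUS", "DAL", "DFW"),
  count   = c(293, 284, 558),
  total   = c(5526, 6106, 23808)
)

# Factor that brings the totals into the range of the counts
transf_fact <- max(flights$total) / max(flights$count)

p <- ggplot(flights, aes(x = airport)) +
  geom_col(aes(y = count), fill = "blue") +
  # Divide on the way in so the line fits the primary axis
  geom_line(aes(y = total / transf_fact, group = 1), colour = "orange") +
  # Multiply on the way out so the secondary labels show the original units
  scale_y_continuous(
    name = "Operation Counts",
    sec.axis = sec_axis(~ . * transf_fact, name = "Total Monthly Operations")
  )
p
```

A data-driven factor makes the tallest bar and the highest point of the line coincide, whereas a round factor such as 50 gives tidier secondary-axis breaks.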

How to plot a combined bar and line plot in ggplot2

I have the following data, which I am trying to plot as a combined bar and line plot (with CI):
[Image: a data frame of Feature, Count, Odds Ratio and Confidence Interval values for OR]
I am trying to get a plot like this:
[Image: a bar plot for count overlapped with a line plot for Odds Ratio with CI bars]
I tried to plot it in ggplot2 using the following code:
ggplot(feat)+
geom_bar(aes(x=Feat, y=Count),stat="identity", fill = "steelblue") +
geom_line(aes(x=Feat, y=OR*max(feat$Count)),stat="identity", group = 1) +
geom_point(aes(x=Feat, y=OR*max(feat$Count))) +
geom_errorbar(aes(x=Feat, ymin=CI1, ymax=CI2), width=.1, colour="orange",
position = position_dodge(0.05))
However, I am not getting the CI bars on the line graph, as can be seen in the picture; instead, I am getting them on the bar plot.
Can someone please help me sort out this issue?
Thanks
Edit - Dput:
df <- structure(list(Feat = structure(1:8, .Label = c("A", "B", "C",
"D", "E", "F", "G", "H"), class = "factor"), Count = structure(c(2L,
8L, 7L, 5L, 4L, 1L, 6L, 3L), .Label = c("13", "145", "2", "25",
"26", "3", "37", "43"), class = "factor"), OR = structure(c(4L,
2L, 1L, 5L, 3L, 7L, 6L, 8L), .Label = c("0.38", "1.24", "1.33",
"1.51", "1.91", "2.08", "2.27", "3.58"), class = "factor"), CI1 = structure(c(7L,
4L, 1L, 6L, 3L, 5L, 2L, 2L), .Label = c("0.26", "0.43", "0.85",
"0.89", "1.2", "1.24", "1.25"), class = "factor"), CI2 = structure(c(3L,
2L, 1L, 6L, 4L, 7L, 8L, 5L), .Label = c("0.53", "1.7", "1.82",
"1.98", "13.07", "2.83", "3.92", "6.13"), class = "factor")), class = "data.frame", row.names = c(NA,
-8L))
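Is this what you had in mind? The error bars have to be scaled by the same factor as the line, and sec_axis() then divides by that factor to label the second axis.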
ratio <- max(feat$Count)/max(feat$CI2)
ggplot(feat) +
geom_bar(aes(x=Feat, y=Count),stat="identity", fill = "steelblue") +
geom_line(aes(x=Feat, y=OR*ratio),stat="identity", group = 1) +
geom_point(aes(x=Feat, y=OR*ratio)) +
geom_errorbar(aes(x=Feat, ymin=CI1*ratio, ymax=CI2*ratio), width=.1, colour="orange",
position = position_dodge(0.05)) +
scale_y_continuous("Count", sec.axis = sec_axis(~ . / ratio, name = "Odds Ratio"))
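Note that the dput in the question stores Count, OR, CI1 and CI2 as factors, so max() and the multiplication by ratio would fail on it as posted. Below is a minimal sketch of the conversion, assuming feat is that data frame with the four columns turned into numbers. The three-row feat_raw is a shortened, hypothetical stand-in for the dput.

```r
# Shortened stand-in for the dput: numeric-looking columns stored as factors
feat_raw <- data.frame(
  Feat  = factor(c("A", "B", "C")),
  Count = factor(c("145", "43", "37")),
  OR    = factor(c("1.51", "1.24", "0.38")),
  CI1   = factor(c("1.25", "0.89", "0.26")),
  CI2   = factor(c("1.82", "1.7", "0.53"))
)

# Go through as.character() first; as.numeric() on a factor alone
# would return the internal level codes instead of the values
num_cols <- c("Count", "OR", "CI1", "CI2")
feat <- feat_raw
feat[num_cols] <- lapply(feat_raw[num_cols],
                         function(x) as.numeric(as.character(x)))

# The scaling factor used in the plot code can now be computed
ratio <- max(feat$Count) / max(feat$CI2)
print(ratio)
```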
Edit: Just for fun with the legend too.
ggplot(feat) +
geom_bar(aes(x=Feat, y=Count, fill = "Count"),stat="identity") + scale_fill_manual(values="steelblue") +
geom_line(aes(x=Feat, y=OR*ratio, color = "Odds Ratio"),stat="identity", group = 1) + scale_color_manual(values="orange") +
geom_point(aes(x=Feat, y=OR*ratio)) +
geom_errorbar(aes(x=Feat, ymin=CI1*ratio, ymax=CI2*ratio), width=.1, colour="orange",
position = position_dodge(0.05)) +
scale_y_continuous("Count", sec.axis = sec_axis(~ . / ratio, name = "Odds Ratio")) +
theme(legend.key=element_blank(), legend.title=element_blank(), legend.box="horizontal",legend.position = "bottom")
Since you asked about adding p values for comparisons in the comments, here is a way you can do that. Unfortunately, because you don't really want to add all the comparisons, there's a little bit of hard coding to do.
library(ggplot2)
library(ggsignif)
ggplot(feat,aes(x=Feat, y=Count)) +
geom_bar(aes(fill = "Count"),stat="identity") + scale_fill_manual(values="steelblue") +
geom_line(aes(x=Feat, y=OR*ratio, color = "Odds Ratio"),stat="identity", group = 1) + scale_color_manual(values="orange") +
geom_point(aes(x=Feat, y=OR*ratio)) +
geom_errorbar(aes(x=Feat, ymin=CI1*ratio, ymax=CI2*ratio), width=.1, colour="orange",
position = position_dodge(0.05)) +
scale_y_continuous("Count", sec.axis = sec_axis(~ . / ratio, name = "Odds Ratio")) +
theme(legend.key=element_blank(), legend.title=element_blank(), legend.box="horizontal",legend.position = "bottom") +
geom_signif(comparisons = list(c("A","H"),c("B","F"),c("D","E")),
y_position = c(150,60,40),
annotation = c("***","***","n.s."))

How to colormap from melt in R

I've mapped colors in R before. But something isn't clicking.
Ideally, I'd like to map color names to the variable value "student", but I'm getting a length error. However, the number of students being mapped to colors is equal. Also, I've tried creating two separate color columns - as a string and as an id. The colors then end up getting labeled on the legend. Adding the manual scale color options doesn't do much.
Here is a sample of the data:
m3 <- structure(list(student = structure(c(7L, 11L, 9L, 2L, 8L, 4L), .Label = c("a","b", "c", "d", "e", "f", "g","h", "i", "j", "k", "l", "m", "n","o", "p"), class = "factor"), colorz = structure(4:9, .Label = c("#66CC99","#9999CC", "#CC6666", "#FF0000FF", "#FF2000FF", "#FF4000FF","#FF6000FF", "#FF8000FF", "#FF9F00FF", "#FFBF00FF", "#FFDF00FF","#FFFF00FF", "green", "red"), class = "factor"), variable = structure(c(1L,1L, 1L, 1L, 1L, 1L), .Label = c("pre", "c1", "c2","b1", "c3", "c4", "b2", "u1", "u2","u3", "u4", "total"), class = "factor"), value = c(3, 31,49, 88, 31, 40), col = c("#FF0000FF", "#FF2000FF", "#FF4000FF","#FF6000FF", "#FF8000FF", "#FF9F00FF")), .Names = c("student","colorz", "variable", "value", "col"), row.names = c(NA, 6L), class = "data.frame")
And then graphing with:
ggplot(m3, aes(x=variable, y=value, group=student,linetype=student)) +
geom_line(size=.75) + geom_point(size=2) + xlab("test") +
ylab(paste("score")) + geom_hline(yintercept=70, linetype="dashed", size=3) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_color_identity()
The example is much smaller than the actual data.
Ideally, I'd like to be able to use something like:
color.names <- setNames(
c("#FF0000FF", "#FF2000FF", "#FF4000FF", "#FF6000FF", "#FF8000FF", "#FF9F00FF", "#CC6666", "#9999CC", "#66CC99", "#FFBF00FF", "#FFDF00FF", "#FFFF00FF", "green", "red"),
c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n"))
and call the colors. I'm not sure what's messing up. It could look as if I were trying to map 12 colors to 14 values, but I've tried 14 as well.
First, map color to student with aes(color = student), and then use scale_color_manual() instead of scale_color_identity(). Since your color vector is already named, ggplot will handle the matching based on names; if a name isn't in the palette, that value will be dropped and not plotted:
ggplot(m3, aes(x=variable, y=value, group=student, linetype=student, color = student)) +
geom_line(size=.75) + geom_point(size=2) + xlab("test") +
ylab(paste("score")) + geom_hline(yintercept=70, linetype="dashed", size=3) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_color_manual(values = color.names)
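Because the matching is done by name, it can be worth checking up front whether every student has an entry in the palette. This is a minimal sketch with a shortened, hypothetical palette and student column; unmatched is a name made up for illustration.

```r
# Named palette: names are the student labels, values are the colours
color.names <- setNames(
  c("#FF0000FF", "#FF2000FF", "#FF4000FF"),
  c("b", "d", "g")
)

# Hypothetical student column; "k" has no entry in the palette
student <- factor(c("g", "k", "b", "d"))

# Students that scale_color_manual(values = color.names) could not match
unmatched <- setdiff(levels(student), names(color.names))
print(unmatched)
```

An empty result means every level has a colour. Anything listed needs either a palette entry or to be filtered out before plotting.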

Is it possible to put space between stacks in ggplot2 stacked bar?

I took this example from here:
DF <- read.table(text="Rank F1 F2 F3
1 500 250 50
2 400 100 30
3 300 155 100
4 200 90 10", header=TRUE)
library(reshape2)
DF1 <- melt(DF, id.var="Rank")
library(ggplot2)
ggplot(DF1, aes(x = Rank, y = value, fill = variable)) +
geom_bar(stat = "identity")
Is it possible to create a stacked bar such as the following graph using ggplot2? I do not want to differentiate stacks by different colors.
EDIT: Based on Pascal's comments,
ggplot(DF1, aes(x = Rank, y = value)) +
geom_bar(stat = "identity",lwd=2, color="white")
I still have the white borders for the bars.
This is the closest I could get to your example figure. It is not much of an improvement beyond what you've already sorted but puts less of an emphasis on the white bar borders on the grey background.
library(ggplot2)
p <- ggplot(DF1, aes(x = Rank, y = value, group = variable))
p <- p + geom_bar(stat = "identity", position = "stack", lwd = 1.5,
width = 0.5, colour = "white", fill = "black")
p <- p + theme_classic()
p <- p + theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
p
That produces:
If you want to keep the grey background, you can find out exactly what shade of grey it is and use that colour for the bar outline while removing the background grid (the "grey" used below is not the right shade).
p <- ggplot(DF1, aes(x = Rank, y = value))
p <- p + geom_bar(stat = "identity", position = "stack", lwd = 1.5,
width = 0.5, colour = "grey", fill = "black")
p <- p + theme(panel.grid = element_blank())
p
An issue with this solution is that very small groups will not be seen (e.g., when Rank = 4 variable F3 = 10; this small value is completely covered by the white bar outline).
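For the grey-background variant, a closer match might look like the sketch below. It rests on an assumption: that the panel fill of the default theme_grey() is "grey92", which I believe holds for recent ggplot2 versions but is worth checking against your installed one. The name panel_grey is made up for illustration.

```r
library(ggplot2)

# Same long-format data as in the question, rebuilt by hand
DF1 <- data.frame(
  Rank     = rep(1:4, times = 3),
  variable = factor(rep(c("F1", "F2", "F3"), each = 4)),
  value    = c(500, 400, 300, 200, 250, 100, 155, 90, 50, 30, 100, 10)
)

# Assumption: "grey92" is the panel fill of the default theme_grey()
panel_grey <- "grey92"

p <- ggplot(DF1, aes(x = Rank, y = value, group = variable)) +
  # Outline in the panel colour so the gaps blend into the background
  geom_bar(stat = "identity", position = "stack", lwd = 1.5,
           width = 0.5, colour = panel_grey, fill = "black") +
  theme(panel.grid = element_blank())
p
```

The caveat about very small groups still applies, since the outline is drawn over the bar segments whatever its colour.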
Your sample data:
DF1 <- structure(list(Rank = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L,
3L, 4L), variable = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L), .Label = c("F1", "F2", "F3"), class = "factor"),
value = c(500L, 400L, 300L, 200L, 250L, 100L, 155L, 90L,
50L, 30L, 100L, 10L)), row.names = c(NA, -12L), .Names = c("Rank",
"variable", "value"), class = "data.frame")

Plot line graphs with with a factor variable on x-axis

I failed to plot a line graph with the initials of the months on the x-axis using this code:
yrange<-range(c(Estimate, lcl,ucl))
plot(nmonth, Estimate, type = "b", pch = 20, ylim = yrange,
xlab = "Months", ylab = expression(hat(beta) * " estimates" * " & " * " confidence " * " levels "))
lines(nmonth, ucl, lty = 2)
lines(nmonth, lcl, lty = 2)
abline(h = 0, lty = 3)
and with this as well.
ggplot(data=df1, aes(x=nmonth, y=Estimate)) + geom_line() + geom_point() + geom_line(size=0.1) + geom_line(aes(y = ucl)) + geom_line(aes(y = lcl))
Using the numeric month (nmonth) I can produce a plot, but the labels are not what I wished to have.
How can I plot with the initials of all the months on the x-axis?
Here is the data:
structure(list(Estimate = c(0.00571942142644563, 0.0111649330056159,
0.0143761435860972, 0.00739757934210567, 0.00110764672100624,
0.00168566337236168, 0.00392476757483504, 0.00234423892025447,
0.000166724737089459, -0.0014580012873366, -0.00197786373686253,
-0.00216289530501664), se = c(0.004018593736177, 0.0040534199847734,
0.0041113846550833, 0.00402501059422328, 0.00393358629717884,
0.00370406599461686, 0.003796651550619, 0.00392460643968604,
0.00376380927915926, 0.00391408378704714, 0.00388845564349082,
0.00394365265230613), nmonth = 1:12, month = structure(c(1L,
2L, 3L, 4L, 3L, 1L, 1L, 4L, 5L, 6L, 7L, 8L), .Label = c("J",
"F", "M", "A", "S", "O", "N", "D"), class = "factor"), lcl = c(-0.00215702229646129,
0.00322022983546004, 0.00631782966213393, -0.000491441422571959,
-0.00660218242146429, -0.00557430597708737, -0.0035166694643782,
-0.00534798970153017, -0.00721034145006269, -0.00912960550994899,
-0.00959923679810454, -0.00989245450353666), ucl = c(0.0135958651493525,
0.0191096361757718, 0.0224344575100605, 0.0152866001067833, 0.00881747586347677,
0.00894563272181073, 0.0113662046140483, 0.0100364675420391,
0.00754379092424161, 0.00621360293527579, 0.00564350932437948,
0.00556666389350337)), .Names = c("Estimate", "se", "nmonth",
"month", "lcl", "ucl"), class = "data.frame", row.names = c(NA,
-12L))
With ggplot2, it is easier if you first melt your data (melt() comes from the reshape2 package):
library(reshape2)
df <- melt(df, id.vars=c("month","nmonth"))
Then you can plot directly:
ggplot(data=df, aes(x=month, y=value, group=variable)) + geom_line(aes(color=variable))
Note that the graph is not correct, because you are using only the first letter of your month names: months that share an initial (J, M, A) collapse onto the same x position.
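One possible way around the repeated initials, not taken from the answer above, is to keep the numeric month on the x-axis and only replace the tick labels. This is a minimal sketch, assuming a simplified stand-in for the question's data; df1 and its values here are hypothetical.

```r
library(ggplot2)

# Hypothetical stand-in for the question's data: one estimate per month
df1 <- data.frame(
  nmonth   = 1:12,
  Estimate = c(0.006, 0.011, 0.014, 0.007, 0.001, 0.002,
               0.004, 0.002, 0.000, -0.001, -0.002, -0.002)
)
df1$lcl <- df1$Estimate - 0.008
df1$ucl <- df1$Estimate + 0.008

# First letter of each month name, in calendar order (initials repeat)
initials <- substr(month.abb, 1, 1)

p <- ggplot(df1, aes(x = nmonth, y = Estimate)) +
  geom_line() +
  geom_point() +
  geom_line(aes(y = ucl), linetype = "dashed") +
  geom_line(aes(y = lcl), linetype = "dashed") +
  geom_hline(yintercept = 0, linetype = "dotted") +
  # Keep the numeric position, only swap the tick labels
  scale_x_continuous(breaks = 1:12, labels = initials) +
  labs(x = "Months")
p
```

Because the positions stay numeric, the twelve months remain distinct even though several share a label.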
