I've mapped colors in R before. But something isn't clicking.
Ideally, I'd like to map color names to the values of the variable "student", but I'm getting a length error, even though the number of students equals the number of colors. I've also tried creating two separate color columns, one as a string and one as an id, but the colors then end up as labels in the legend. Adding the manual color scale options doesn't change much.
Here is a sample of the data:
m3 <- structure(list(student = structure(c(7L, 11L, 9L, 2L, 8L, 4L), .Label = c("a","b", "c", "d", "e", "f", "g","h", "i", "j", "k", "l", "m", "n","o", "p"), class = "factor"), colorz = structure(4:9, .Label = c("#66CC99","#9999CC", "#CC6666", "#FF0000FF", "#FF2000FF", "#FF4000FF","#FF6000FF", "#FF8000FF", "#FF9F00FF", "#FFBF00FF", "#FFDF00FF","#FFFF00FF", "green", "red"), class = "factor"), variable = structure(c(1L,1L, 1L, 1L, 1L, 1L), .Label = c("pre", "c1", "c2","b1", "c3", "c4", "b2", "u1", "u2","u3", "u4", "total"), class = "factor"), value = c(3, 31,49, 88, 31, 40), col = c("#FF0000FF", "#FF2000FF", "#FF4000FF","#FF6000FF", "#FF8000FF", "#FF9F00FF")), .Names = c("student","colorz", "variable", "value", "col"), row.names = c(NA, 6L), class = "data.frame")
And then graphing with: ggplot(m3, aes(x=variable, y=value, group=student,linetype=student)) + geom_line(size=.75) + geom_point(size=2) + xlab("test") + ylab(paste("score")) + geom_hline(yintercept=70, linetype="dashed", size=3) + theme(axis.text.x = element_text(angle = 45, hjust = 1)) + scale_color_identity()
The example is much smaller than the actual data.
Ideally, I'd like to be able to use something like: color.names <- setNames( c( "#FF0000FF", "#FF2000FF", "#FF4000FF", "#FF6000FF", "#FF8000FF", "#FF9F00FF","#CC6666", "#9999CC", "#66CC99", "#FFBF00FF", "#FFDF00FF", "#FFFF00FF","green","red"), c("a","b","c", "d","e","f","g","h","i","j","k","l","m","n" ))
and call the colors. I'm not sure what's messing up. It could look as if I were trying to map 12 colors to 14 values, but I've tried 14 as well.
First map color to student with aes(color = student), then use scale_color_manual() instead of scale_color_identity(). Since your color vector is already named, ggplot will match colors to students by name. If a student's name isn't in the palette, that value will be dropped and not plotted:
ggplot(m3, aes(x=variable, y=value, group=student, linetype=student, color = student)) +
geom_line(size=.75) + geom_point(size=2) + xlab("test") +
ylab(paste("score")) + geom_hline(yintercept=70, linetype="dashed", size=3) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_color_manual(values = color.names)
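To check in advance which students would end up without a color, the palette names can be compared with the students actually being plotted. This is a minimal sketch in base R, assuming the color.names vector from the question; `students`, `unmatched` and `pal` are names introduced here for illustration only.

```r
# Named palette from the question: 14 colors for students "a" to "n"
color.names <- setNames(
  c("#FF0000FF", "#FF2000FF", "#FF4000FF", "#FF6000FF", "#FF8000FF",
    "#FF9F00FF", "#CC6666", "#9999CC", "#66CC99", "#FFBF00FF",
    "#FFDF00FF", "#FFFF00FF", "green", "red"),
  letters[1:14]
)

# Students present in the sample data m3 (hypothetical helper vector)
students <- c("g", "k", "i", "b", "h", "d")

# Students without an entry in the palette; empty for the sample,
# but the factor levels "o" and "p" would show up here if they were plotted
unmatched <- setdiff(students, names(color.names))

# Palette restricted to the students being plotted, matched by name
pal <- color.names[students]
```

Passing either pal or the full color.names to scale_color_manual(values = ...) should then color each student by name rather than by position.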
I'm still very new to ggplot, but I've created the following graphic, in which I would like to switch the colors blue and red. It should be simple enough, but I cannot figure it out.
df <- structure(list(Sex = c("M", "M", "M", "M", "M", "M", "M", "W",
"W", "W", "W", "W", "W", "W"), age_cat = structure(c(1L, 2L,
3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L), .Label = c("<40",
"41-50", "51-60", "61-70", "71-80", "81-90", "90+"), class = "factor"),
DD = c(42L, 88L, 289L, 558L, 527L, 174L, 22L, 27L, 36L, 206L,
347L, 321L, 160L, 29L), pop = c(36642L, 16327L, 20232L, 18068L,
14025L, 5555L, 1293L, 35887L, 16444L, 20178L, 17965L, 14437L,
7150L, 2300L), proportion = c(0.114622564270509, 0.538984504195504,
1.428430209569, 3.08833296435687, 3.75757575757576, 3.13231323132313,
1.7014694508894, 0.0752361579402012, 0.218924835806373, 1.02091386658737,
1.9315335374339, 2.22345362609961, 2.23776223776224, 1.26086956521739
), lower = c(0.082621962613957, 0.432499099174075, 1.26946115577044,
2.8408823120445, 3.44891013496043, 2.69000601596679, 1.06929480146528,
0.0495867729368698, 0.153377923598767, 0.886828947142727,
1.73530361873497, 1.98914206124244, 1.90751612365318, 0.846006018532107
), upper = c(0.154905173671422, 0.663628389658291, 1.6015714811397,
3.35102939015014, 4.08561035940466, 3.6247050150149, 2.56476800800746,
0.10944592449968, 0.302956684874059, 1.16937842460687, 2.14353263545661,
2.47728262910991, 2.60770261266057, 1.80583393021473)), row.names = c(NA,
-14L), class = "data.frame")
Below is the script I've used, in which I get (Sex = M) in red and (Sex = W) in blue.
ggplot(data = prevalence2021GGPLOT,
aes(x = age_cat, y = proportion, color = Sex))+
geom_point()+
labs(title="Prevalence 2021", y="Prevalence (%)", x="Age category") +
geom_errorbar(aes(ymin=(lower),
ymax=(upper)), width=.2) +
theme_bw()+
geom_line(aes(group = unlist(Sex)))
How do I make Sex = M blue and Sex = W red?
You can use scale_color_manual to set the colours manually in ggplot2. Unnamed values are assigned in the order of the levels of the variable mapped to colour, so the first colour goes to the first level of Sex ("M") and the second to "W".
ggplot(data = df,
aes(x = age_cat, y = proportion, color = Sex))+ geom_point()+
labs(title="Prevalence 2021", y="Prevalence (%)", x="Age category") +
geom_errorbar(aes(ymin=(lower),
ymax=(upper)), width=.2) +
theme_bw()+
geom_line(aes(group = unlist(Sex)))+
scale_color_manual(values=c("blue", "red"))
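To see why c("blue", "red") lands on M and W in that order, the level order can be inspected directly. A small sketch in base R, assuming Sex is a character column as in the sample data; `sex`, `lv` and `pal` are names introduced here for illustration.

```r
# A character column is treated like a factor whose levels are the
# sorted unique values
sex <- c("M", "M", "W", "W")
lv <- levels(factor(sex))

# Unnamed colours are handed out in level order; naming them makes the
# pairing explicit and independent of that order
pal <- setNames(c("blue", "red"), lv)
```

A named vector like pal can be passed to scale_color_manual(values = pal), which avoids surprises if the level order ever changes.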
You can find the documentation for ggplot2 graphics in R here: data_to_viz, scale_color_manual
This works. I need to point to df to use your sample df. Edited it per suggestion.
ggplot(data = df,
aes(x = age_cat, y = proportion, color = Sex))+
geom_point()+
labs(title="Prevalence 2021", y="Prevalence (%)", x="Age category") +
geom_errorbar(aes(ymin=(lower),
ymax=(upper)), width=.2) +
theme_bw()+
geom_line(aes(group = unlist(Sex))) +
scale_color_manual(values=c(M="darkblue", W="darkred"))
Suppose I have a LOESS regression plot where the x-axis corresponds to a categorical variable:
library(ggplot2)
b <- structure(list(Expression = c(16.201081535896, 16.5138880401065,
16.4244615700828, 1.62923743262849, 3.35379087562868, 6.99935683212696,
4.81932543877313, 3.85300704208448, 7.32436891427261, 4.23627699164079,
6.95731601433845, 4.33315521361287, 5.50596153247422, 13.0788494583573,
13.6909487566244, 12.9520674350314), stage = structure(c(1L,
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L), .Label = c("A",
"B", "C", "D", "E"), class = "factor")), row.names = c(NA, 16L
), class = "data.frame")
ggplot(b, aes(as.numeric(stage), Expression)) +
geom_point() +
geom_smooth(span = 0.8) +
scale_x_continuous(breaks = as.numeric(b$stage), labels = b$stage, minor_breaks = NULL)
I want to use 2 different line types at different sections of a LOESS regression.
Specifically, I would like to have a dashed line between A and B, a continuous line between B and D, and a dashed line again between D and E.
So I follow the example in:
conditional plot linetype in ggplot2
But the connections on the left and right are lost, and only the central part of the LOESS regression remains.
line.groups <- plyr::mapvalues(b$stage,
from = c("A", "B", "C", "D", "E"),
to = c(0, 1, 1, 1, 2))
ggplot(b, aes(as.numeric(stage), Expression)) +
geom_point() +
geom_smooth(aes(group=line.groups, linetype=line.groups), span = 0.8) +
scale_linetype_manual(values=c(2,1,2)) +
guides(linetype=FALSE) +
scale_x_continuous(breaks = as.numeric(b$stage), labels = b$stage, minor_breaks = NULL)
Is there a way to change the linetype of the geom_smooth ggplot, conditional to the x-axis (where x is a factor)?
EDIT:
I tried using three separate calls to geom_smooth for each section as suggested by a comment, but the standard error bounds won't be "smooth" between each call.
ggplot(b, aes(as.numeric(stage), Expression)) +
geom_point() +
geom_smooth(data=b[b$stage %in% c("A", "B"),], linetype = "dashed", span = 0.8) +
geom_smooth(data=b[b$stage %in% c("B", "C", "D"),], linetype = "solid", span = 0.8) +
geom_smooth(data=b[b$stage %in% c("D", "E"),], linetype = "dashed",span = 0.8) +
scale_linetype_manual(values=c(2,1,2)) +
guides(linetype=FALSE) +
scale_x_continuous(breaks = as.numeric(b$stage), labels = b$stage, minor_breaks = NULL)
Thanks
For completeness, I will post here the solution offered by user OTStats in the comments above:
ggplot(b, aes(as.numeric(stage), Expression)) +
geom_point() +
geom_smooth(data=b[b$stage %in% c("A", "B"),], linetype = "dashed", span = 0.8,se = FALSE) +
geom_smooth(data=b[b$stage %in% c("B", "C", "D"),], linetype = "solid", span = 0.8, se = FALSE) +
geom_smooth(data=b[b$stage %in% c("D", "E"),], linetype = "dashed",span = 0.8, se = FALSE) +
geom_smooth(linetype = "blank",span = 0.4) +
guides(linetype=FALSE) +
scale_x_continuous(breaks = as.numeric(b$stage), labels = b$stage, minor_breaks = NULL)
Note that the level of smoothing (span) needs to be adjusted in the fourth call to geom_smooth to produce satisfactory results but, overall, this trick solves the problem.
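A different approach, not the one from the comments, is to fit the LOESS once, predict it on a fine grid, and draw that single fitted curve in pieces with different linetypes. The sketch below assumes the data frame b from the question and that stats::loess with span = 0.8 matches what geom_smooth fits by default for data of this size; `fit`, `grid` and `segment` are names introduced here.

```r
library(ggplot2)

# Data from the question, with the stage factor rebuilt via rep()
b <- data.frame(
  Expression = c(16.201081535896, 16.5138880401065, 16.4244615700828,
                 1.62923743262849, 3.35379087562868, 6.99935683212696,
                 4.81932543877313, 3.85300704208448, 7.32436891427261,
                 4.23627699164079, 6.95731601433845, 4.33315521361287,
                 5.50596153247422, 13.0788494583573, 13.6909487566244,
                 12.9520674350314),
  stage = factor(rep(c("A", "B", "C", "D", "E"), times = c(3, 3, 4, 3, 3)))
)
b$x <- as.numeric(b$stage)

# One LOESS fit over the whole range
fit <- loess(Expression ~ x, data = b, span = 0.8)

# Predict on a fine grid and label each grid point with its section
grid <- data.frame(x = seq(1, 5, length.out = 201))
grid$y <- predict(fit, newdata = grid)
grid$segment <- cut(grid$x, breaks = c(1, 2, 4, 5),
                    labels = c("A-B", "B-D", "D-E"), include.lowest = TRUE)

p <- ggplot(b, aes(x, Expression)) +
  geom_point() +
  # Confidence band only; the line itself is suppressed
  geom_smooth(span = 0.8, linetype = "blank") +
  # Same fit drawn in three pieces (a gap of one grid step separates them)
  geom_line(data = grid, aes(x, y, group = segment, linetype = segment)) +
  scale_linetype_manual(values = c("dashed", "solid", "dashed"), guide = "none") +
  scale_x_continuous(breaks = 1:5, labels = levels(b$stage), minor_breaks = NULL)
```

Because the curve and the band come from the same fit, no second span has to be tuned by eye; printing p draws the plot.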
If I plot df with the code below, I can put the n for each column above the column itself, as seen in this example plot. I would also like to put each column's percentage of the total in the label. For example, the label on the first column would read 127 (42.9%) instead of just 127. How could I do that?
df <- structure(list(Letter = structure(1:7,
.Label = c("A", "B", "C", "D", "E", "F", "G"),
class = "factor"), Freq = c(127L, 101L, 24L, 19L, 3L, 0L, 22L)),
.Names = c("Letter", "Freq"),
row.names = c(NA, -7L),
class = "data.frame")
ggplot(df, aes(Letter, Freq, label = Freq)) +
geom_col() +
geom_text(size = 3, position = position_dodge(width = 1), vjust = -0.25)
Just create the text you want to use as a label.
df$pct = df$Freq / sum(df$Freq) * 100
df$label = sprintf("%s (%s%%)", df$Freq, round(df$pct, 1))
ggplot(df, aes(Letter, Freq, label = label)) +
geom_col() +
geom_text(size = 3, position = position_dodge(width = 1), vjust = -0.25)
My dataset consists of 36 "sites", that is, 12 replicate triplets of 3 sites each. The dataset has two series, "R" and "D". Some R and D triplets relate to each other, as indicated by the numerical index after the letter: the series R2i and D2i belong together, as do R3i and D3i, and so on. As a twist, R7i and R1i have no equivalent in the D world.
In the plot, all four site triplets are coloured differently, but I want to colour the related groups identically, so that related triangles appear in the same colour.
In the example, the triangles D2 and R2 should have the same colour, and likewise D3 and R3.
Here is the code:
sites <- structure(list(Sample = c("R21", "R22", "R23", "R31", "R32",
"R33", "D21", "D22", "D23", "D31", "D32", "D33"), X = c(-0.00591212751574749,
0.341048420056647, 0.430793063675178, 0.432479460946573, 0.239326674010454,
0.202491749301479, -0.951185318446942, -0.596668772966298, -0.939366882995036,
-0.522651768953026, -0.23338622249853, -0.176826307377661), Y = c(-0.0742136034318636,
-0.345049510288858, 0.183433103229042, -0.108409938703458, -0.0276483081985604,
-0.129547387046024, 0.26657938925131, 0.759126587423588, 0.103436047537972,
-0.178345595609023, -0.116710668776298, -0.0292021298523572),
Treatment = c("B", "B", "B", "C", "C", "C", "H", "H", "H",
"I", "I", "I"), Group = structure(c(2L, 2L, 2L, 3L, 3L, 3L,
2L, 2L, 2L, 3L, 3L, 3L), .Label = c("A", "B", "C", "D", "E",
"F", "G"), class = "factor")), .Names = c("Sample", "X",
"Y", "Treatment", "Group"), row.names = c(4L, 5L, 6L, 7L, 8L,
9L, 22L, 23L, 24L, 25L, 26L, 27L), class = "data.frame")
library(plyr)
find_hull <- function(df) df[chull(df$X, df$Y), ]
hulls <- ddply(sites , "Treatment", find_hull)
ggplot()+
geom_point(data=sites, aes(X, Y, col=Treatment), alpha=1,show_guide=FALSE) +
geom_text(data=sites, aes(X, Y, label=Sample), size=3, show_guide=FALSE) +
geom_polygon(data = hulls, aes(X, Y, colour=Treatment, fill=Treatment), lty="dashed", alpha = 0.2, show_guide=FALSE) +
theme_bw()+
coord_fixed()
Treatment gives the site-triplet and Group indicates related groups.
The hulls data frame needs the factor levels in Treatment to connect the points correctly: if I pass Group to the colour aesthetic instead, all points will be connected.
ggplot()+
geom_point(data=sites, aes(X, Y, col=Treatment), alpha=1,show_guide=FALSE) +
geom_text(data=sites, aes(X, Y, label=Sample), size=3, show_guide=FALSE) +
geom_polygon(data = hulls, aes(X, Y, colour=Group, fill=Group), lty="dashed", alpha = 0.2, show_guide=FALSE) +
theme_bw()+
coord_fixed()
So I was wondering whether I can assign an individual colour palette to each factor, so that I can give the same colour to different factor levels where needed.
Any idea is appreciated, thank you!
You were very close. The solution is actually a bit easier than changing color palettes. You just need to add the group aesthetic.
ggplot() +
geom_point(data=sites, aes(X, Y, col=Treatment), alpha=1, show.legend=FALSE) +
geom_text(data=sites, aes(X, Y, label=Sample), size=3, show.legend=FALSE) +
# group = Treatment keeps one polygon per triplet; colour and fill follow Group
geom_polygon(data = hulls, aes(X, Y, group = Treatment, colour = Group, fill = Group),
lty="dashed", alpha = 0.2, show.legend=FALSE) +
theme_bw() +
coord_fixed()
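If the points should share colours as well, the palette idea from the question can also be made to work by building one named colour vector for Treatment from a per-Group palette. A minimal sketch in base R: the colour choices in `group_cols` are arbitrary placeholders, and `pairs` and `treat_cols` are names introduced here; only the Treatment and Group columns of sites are reproduced.

```r
# Treatment and Group columns as in the question's sites data
sites <- data.frame(
  Treatment = rep(c("B", "C", "H", "I"), each = 3),
  Group = factor(rep(c("B", "C", "B", "C"), each = 3),
                 levels = c("A", "B", "C", "D", "E", "F", "G"))
)

# One colour per related group (placeholder colours)
group_cols <- c(B = "steelblue", C = "firebrick")

# One row per Treatment with the Group it belongs to
pairs <- unique(sites[c("Treatment", "Group")])

# Named vector: each Treatment inherits the colour of its Group
treat_cols <- setNames(unname(group_cols[as.character(pairs$Group)]),
                       pairs$Treatment)
```

With colour and fill mapped to Treatment, adding scale_colour_manual(values = treat_cols) and scale_fill_manual(values = treat_cols) should then give related triplets the same colour.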
I failed to plot a line graph with the initials of the months on the x axis using this code:
yrange<-range(c(Estimate, lcl,ucl))
plot(nmonth, Estimate, type = "b", pch = 20, ylim = yrange,
xlab = "Months", ylab = expression(hat(beta) * " estimates" * " & " * " confidence " * " levels "))
lines(nmonth, ucl, lty = 2)
lines(nmonth, lcl, lty = 2)
abline(h = 0, lty = 3)
and with this as well.
ggplot(data=df1, aes(x=nmonth, y=Estimate)) + geom_line() + geom_point() + geom_line(size=0.1) + geom_line(aes(y = ucl)) + geom_line(aes(y = lcl))
Using the numeric month (nmonth) I can produce a plot, but the labels are not what I want.
How can I plot with all initials of the months on x axis?
The data is this one:
structure(list(Estimate = c(0.00571942142644563, 0.0111649330056159,
0.0143761435860972, 0.00739757934210567, 0.00110764672100624,
0.00168566337236168, 0.00392476757483504, 0.00234423892025447,
0.000166724737089459, -0.0014580012873366, -0.00197786373686253,
-0.00216289530501664), se = c(0.004018593736177, 0.0040534199847734,
0.0041113846550833, 0.00402501059422328, 0.00393358629717884,
0.00370406599461686, 0.003796651550619, 0.00392460643968604,
0.00376380927915926, 0.00391408378704714, 0.00388845564349082,
0.00394365265230613), nmonth = 1:12, month = structure(c(1L,
2L, 3L, 4L, 3L, 1L, 1L, 4L, 5L, 6L, 7L, 8L), .Label = c("J",
"F", "M", "A", "S", "O", "N", "D"), class = "factor"), lcl = c(-0.00215702229646129,
0.00322022983546004, 0.00631782966213393, -0.000491441422571959,
-0.00660218242146429, -0.00557430597708737, -0.0035166694643782,
-0.00534798970153017, -0.00721034145006269, -0.00912960550994899,
-0.00959923679810454, -0.00989245450353666), ucl = c(0.0135958651493525,
0.0191096361757718, 0.0224344575100605, 0.0152866001067833, 0.00881747586347677,
0.00894563272181073, 0.0113662046140483, 0.0100364675420391,
0.00754379092424161, 0.00621360293527579, 0.00564350932437948,
0.00556666389350337)), .Names = c("Estimate", "se", "nmonth",
"month", "lcl", "ucl"), class = "data.frame", row.names = c(NA,
-12L))
With ggplot2, it is easier if you first melt your data this way (melt() comes from the reshape2 package):
library(reshape2)
df <- melt(df1, id.vars=c("month","nmonth"))
Then you can directly do:
ggplot(data=df, aes(x=month, y=value, group=variable)) + geom_line(aes(color=variable))
Note that the graph is not correct, because the month initials are not unique (J, M and A each occur more than once), so several months collapse onto the same position on the x axis.
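One way around the duplicated initials is to keep the numeric month on the x axis and only relabel the breaks. This is a sketch under the assumption that the columns are named as in the question; the values are rounded from the question's data, and `month_initials` is a name introduced here.

```r
library(ggplot2)

# Values rounded from the question's data, for illustration only
df1 <- data.frame(
  nmonth   = 1:12,
  Estimate = c(0.0057, 0.0112, 0.0144, 0.0074, 0.0011, 0.0017,
               0.0039, 0.0023, 0.0002, -0.0015, -0.0020, -0.0022),
  lcl      = c(-0.0022, 0.0032, 0.0063, -0.0005, -0.0066, -0.0056,
               -0.0035, -0.0053, -0.0072, -0.0091, -0.0096, -0.0099),
  ucl      = c(0.0136, 0.0191, 0.0224, 0.0153, 0.0088, 0.0089,
               0.0114, 0.0100, 0.0075, 0.0062, 0.0056, 0.0056)
)

# First letter of each month name, in calendar order
month_initials <- substr(month.name, 1, 1)

p <- ggplot(df1, aes(x = nmonth, y = Estimate)) +
  geom_line() +
  geom_point() +
  geom_line(aes(y = ucl), linetype = "dashed") +
  geom_line(aes(y = lcl), linetype = "dashed") +
  geom_hline(yintercept = 0, linetype = "dotted") +
  # Positions stay numeric (1 to 12); only the tick labels are letters
  scale_x_continuous(breaks = 1:12, labels = month_initials) +
  labs(x = "Months")
```

The same idea applies to the base graphics version: draw the plot with xaxt = "n" and then add axis(1, at = 1:12, labels = month_initials).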