I am trying to show the distribution of data for three different methods (FAP, One PIT (onetrans), and Two PIT (twotrans), shown as facets below) of measuring forest fuels. The count on the y-axis is the number of sample points whose estimate falls in the binned value on the x-axis (Total.kg.m2), which is a continuous variable. I don't particularly care how wide the bins on the x-axis are, but I want only values that are exactly zero to sit above the "0" label. My current graph is misleading because no sample points estimate "0" for the FAP method. Below are my code and some example data. How can I do this more effectively? My data frame is called "cwd", but I have included a subset at the bottom.
My current graph:
The code for my current graph:
method_names <- c(`FAP` = "FAP", `onetrans` = "PIT - One Transect ", `twotrans` = "PIT - Two Transects")
ggplot(sampleData, aes(Total.kg.m2)) +
  geom_histogram(bins = 40, color = "black", fill = "white") +
  theme_bw() +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
        panel.background = element_blank(), axis.line = element_line(colour = "black"),
        legend.position = "none", axis.text = element_text(size = 10),
        axis.title = element_text(size = 12)) +
  scale_x_continuous(name = expression("kg m"^"-2"), breaks = seq(0, 16, 1)) +
  scale_y_continuous(name = "Count", breaks = seq(0, 80, 10), limits = c(0, 70)) +
  facet_grid(. ~ method) +
  facet_wrap(~method, ncol = 1, labeller = as_labeller(method_names)) +
  theme(strip.text.x = element_text(size = 14),
        strip.background = element_rect(color = "black", fill = "gray"))
I don't think geom_bar gets me what I want. I tried changing the binwidth to 0.05 in geom_histogram, but then the bins are too small. Essentially, I think I'm trying to convert my data from continuous numeric values to factors, but I'm not sure how to make it work.
Here is some sample data:
sampleData
Site Treatment Unit Plot Total.Tons.ac Total.kg.m2 method
130 Thinning CO 10 7 0.4500000 0.1008000 twotrans
351 Shelterwood CO 12 1 7.2211615 1.6175402 twotrans
88 Thinning NB 3 7 1.1400000 0.2553600 twotrans
224 Shelterwood NB 2 3 2.1136105 0.4734487 onetrans
54 Thinning SB 9 11 1.8857743 0.4224134 onetrans
74 Thinning SB 1 3 0.8500000 0.1904000 twotrans
328 Shelterwood DB 7 11 0.8740906 0.1957963 twotrans
341 Shelterwood CO 10 5 2.4210886 0.5423239 twotrans
266 Shelterwood WB 9 7 1.0092961 0.2260823 onetrans
405 Shelterwood WB 9 5 7.0029263 1.5686555 FAP
332 Shelterwood NB 8 7 2.8059152 0.6285250 twotrans
126 Thinning SB 9 11 1.4900000 0.3337600 twotrans
295 Shelterwood NB 2 5 7.6567281 1.7151071 twotrans
406 Shelterwood WB 9 7 3.0703135 0.6877502 FAP
179 Thinning FB 6 9 13.2916773 2.9773357 FAP
185 Thinning FB 7 9 5.3594318 1.2005127 FAP
39 Thinning FB 7 5 0.0000000 0.0000000 onetrans
187 Thinning NB 8 1 0.9477477 0.2122955 FAP
10 Thinning FB 2 7 0.0000000 0.0000000 onetrans
102 Thinning SB 5 11 0.0000000 0.0000000 twotrans
dput(sampleData)
structure(list(Site = structure(c(2L, 1L, 2L, 1L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label =
c("Shelterwood",
"Thinning"), class = "factor"), Treatment = structure(c(1L, 1L,
4L, 4L, 5L, 5L, 2L, 1L, 6L, 6L, 4L, 5L, 4L, 6L, 3L, 3L, 3L, 4L,
3L, 5L), .Label = c("CO", "DB", "FB", "NB", "SB", "WB"), class = "factor"),
Unit = c(10L, 12L, 3L, 2L, 9L, 1L, 7L, 10L, 9L, 9L, 8L, 9L,
2L, 9L, 6L, 7L, 7L, 8L, 2L, 5L), Plot = c(7L, 1L, 7L, 3L,
11L, 3L, 11L, 5L, 7L, 5L, 7L, 11L, 5L, 7L, 9L, 9L, 5L, 1L,
7L, 11L), Total.Tons.ac = c(0.45, 7.221161504, 1.14, 2.113610483,
1.885774282, 0.85, 0.874090569, 2.421088641, 1.009296069,
7.002926269, 2.805915201, 1.49, 7.656728085, 3.07031351,
13.29167729, 5.359431807, 0, 0.947747726, 0, 0), Total.kg.m2 = c(0.1008,
1.617540177, 0.25536, 0.473448748, 0.422413439, 0.1904, 0.195796287,
0.542323856, 0.22608232, 1.568655484, 0.628525005, 0.33376,
1.715107091, 0.687750226, 2.977335712, 1.200512725, 0, 0.212295491,
0, 0), method = structure(c(3L, 3L, 3L, 2L, 2L, 3L, 3L, 3L,
2L, 1L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 1L, 2L, 3L), .Label = c("FAP",
"onetrans", "twotrans"), class = "factor")), .Names = c("Site",
"Treatment", "Unit", "Plot", "Total.Tons.ac", "Total.kg.m2",
"method"), row.names = c(130L, 351L, 88L, 224L, 54L, 74L, 328L,
341L, 266L, 405L, 332L, 126L, 295L, 406L, 179L, 185L, 39L, 187L,
10L, 102L), class = "data.frame")
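The idea raised in the question, turning the continuous values into factors, can be sketched as follows. This is only one possible approach, not a confirmed solution: it bins the positive values with `cut()` and gives exact zeros a level of their own. The names `sample_df`, `kg_bin` and `bin_width`, and the bin width of 0.5, are assumptions made for illustration; the values are copied from the sample data above. The plotting part is skipped if ggplot2 is not installed.

```r
# Total.kg.m2 and method, copied from the sample data in the question.
sample_df <- data.frame(
  Total.kg.m2 = c(0.1008, 1.617540177, 0.25536, 0.473448748, 0.422413439,
                  0.1904, 0.195796287, 0.542323856, 0.22608232, 1.568655484,
                  0.628525005, 0.33376, 1.715107091, 0.687750226, 2.977335712,
                  1.200512725, 0, 0.212295491, 0, 0),
  method = c("twotrans", "twotrans", "twotrans", "onetrans", "onetrans",
             "twotrans", "twotrans", "twotrans", "onetrans", "FAP",
             "twotrans", "twotrans", "twotrans", "FAP", "FAP", "FAP",
             "onetrans", "FAP", "onetrans", "twotrans")
)

bin_width <- 0.5  # assumed width; any positive value works
upper <- ceiling(max(sample_df$Total.kg.m2) / bin_width) * bin_width
breaks <- seq(0, upper, by = bin_width)

# Intervals are open on the left, so an exact 0 falls outside (0, 0.5] and is NA.
binned <- cut(sample_df$Total.kg.m2, breaks = breaks, right = TRUE)

# Give exact zeros their own level, placed first on the axis.
sample_df$kg_bin <- factor(
  ifelse(sample_df$Total.kg.m2 == 0, "0", as.character(binned)),
  levels = c("0", levels(binned))
)

# Counts per method and bin; FAP has no exact zeros.
bin_counts <- table(sample_df$method, sample_df$kg_bin)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # drop = FALSE keeps empty bins on the axis; call print(p) to draw the plot.
  p <- ggplot(sample_df, aes(kg_bin)) +
    geom_bar(color = "black", fill = "white") +
    scale_x_discrete(drop = FALSE) +
    facet_wrap(~method, ncol = 1)
}
```

Because the x-axis is now discrete, the "0" bar contains only the exact zeros, at the cost of losing the continuous axis spacing.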
Below is the desired order: 10% runoff, 20% runoff, NOEC, LC50. Thank you in anticipation.
My code:
ggplot(pest_ana, aes(x = Pesticides, y = `Concentration (ug/L)`, fill = Concentrations)) +
  geom_bar(stat = 'identity', position = 'dodge') +
  scale_fill_discrete(breaks = c("10% runoff", "20% runoff", "NOEC", "LC50")) +
  scale_x_discrete(guide = guide_axis(n.dodge = 2)) +
  scale_y_continuous(trans = log10_trans(),
                     breaks = trans_breaks("log10", function(x) 10^x),
                     labels = trans_format("log10", math_format(10^.x)))
Pesticides Concentrations Concentration (ug/L) Individual
Cypermethrin NOEC 4.8 1
Deltamethrin NOEC 3.37 2
Actara NOEC 20000 3
Carbofuran NOEC 40.6 4
Methomyl NOEC 260 5
Endosulfan NOEC 10.2 6
Fenvalerate NOEC 6.02 7
Glyphosate NOEC 5000 8
Mancozeb NOEC 301 9
Cypermethrin LC50 0.03 1
Deltamethrin LC50 0.032 2
Actara LC50 322000 3
Carbofuran LC50 500 4
Methomyl LC50 4015 5
Endosulfan LC50 0.05 6
Fenvalerate LC50 15 7
Glyphosate LC50 36800 8
Mancozeb LC50 11680 9
Cypermethrin 20% runoff 3.95 1
Deltamethrin 20% runoff 0.69 2
Actara 20% runoff 3.95 3
Carbofuran 20% runoff 78.99 4
Methomyl 20% runoff 10.86 5
Endosulfan 20% runoff 41.47 6
Fenvalerate 20% runoff 8.85 7
Glyphosate 20% runoff 14.22 8
Mancozeb 20% runoff 74.05 9
Cypermethrin 10% runoff 1.97 1
Deltamethrin 10% runoff 0.35 2
Actara 10% runoff 1.97 3
Carbofuran 10% runoff 39.49 4
Methomyl 10% runoff 5.43 5
Endosulfan 10% runoff 20.73 6
Fenvalerate 10% runoff 4.42 7
Glyphosate 10% runoff 7.11 8
Mancozeb 10% runoff 37.03 9
You can convert Concentrations to a factor with its levels in the desired order before plotting, using the following code:
library(scales)
library(tidyverse)
df$Concentrations <- factor(df$Concentrations, levels = c("10% runoff", "20% runoff", "NOEC", "LC50"))
df %>%
ggplot(aes(x = Pesticides, y = `Concentration (ug/L)`, fill= Concentrations)) +
geom_bar(stat = 'identity', position='dodge') +
scale_fill_discrete (breaks = c("10% runoff","20% runoff","NOEC", "LC50"))+
scale_x_discrete(guide = guide_axis(n.dodge=2)) +
scale_y_continuous(trans = log10_trans(),
breaks = trans_breaks("log10", function(x) 10^x),
labels = trans_format("log10", math_format(10^.x)))
Data
df = structure(list(Pesticides = structure(c(3L, 4L, 1L, 2L, 9L, 5L,
6L, 7L, 8L, 3L, 4L, 1L, 2L, 9L, 5L, 6L, 7L, 8L, 3L, 4L, 1L, 2L,
9L, 5L, 6L, 7L, 8L, 3L, 4L, 1L, 2L, 9L, 5L, 6L, 7L, 8L), .Label = c("Actara",
"Carbofuran", "Cypermethrin", "Deltamethrin", "Endosulfan", "Fenvalerate",
"Glyphosate", "Mancozeb", "Methomyl"), class = "factor"), Concentrations = structure(c(3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = c("10% runoff", "20% runoff", "NOEC", "LC50"
), class = "factor"), `Concentration (ug/L)` = c(4.8, 3.37, 20000,
40.6, 260, 10.2, 6.02, 5000, 301, 0.03, 0.032, 322000, 500, 4015,
0.05, 15, 36800, 11680, 3.95, 0.69, 3.95, 78.99, 10.86, 41.47,
8.85, 14.22, 74.05, 1.97, 0.35, 1.97, 39.49, 5.43, 20.73, 4.42,
7.11, 37.03), Individual = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
9L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 9L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L)), row.names = c(NA,
-36L), class = "data.frame")
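To illustrate why setting the factor levels matters, here is a minimal sketch. The vector `conc` and the name `wanted` are made up for illustration; the labels are the ones from the question. By default `factor()` sorts the levels, which typically places "LC50" before "NOEC", while explicit `levels` fix the order used for the dodged bars.

```r
# Hypothetical vector holding the four labels from the question.
conc <- c("NOEC", "LC50", "20% runoff", "10% runoff")

# Default: levels are sorted, so the order is not the desired one.
default_levels <- levels(factor(conc))

# Explicit levels give the desired order.
wanted <- c("10% runoff", "20% runoff", "NOEC", "LC50")
ordered_conc <- factor(conc, levels = wanted)
levels(ordered_conc)
```

As far as I understand, the `breaks` argument of `scale_fill_discrete()` only affects the legend, so the bar order still has to come from the factor levels.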
Using the following code I made a violin plot for most of my variables, and added points where I didn't have sufficient information for some data. I'd like to add sample sizes to the right end of each violin, but I haven't been able to find a way to do this.
#dataset
str(threats)
'data.frame': 60 obs. of 3 variables:
$ threat : Factor w/ 7 levels "weather","competition",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Species : Factor w/ 5 levels "Bank","Barn",..: 1 1 1 1 1 1 1 1 1 1 ...
$ effect.abs : int 18 13 0 43 43 0 23 13 14 16 ...
#added to help 0 values with logarithmic axis scale
threats$effect.abs1<-threats$effect.abs+0.1
#subset of data with insufficient info for violin plot
#plotted with geom_dotplot
threats.sub<-subset(threats,
(threat=="competition") |
(threat=="disease" & Species =="Barn") |
(threat=="insect_availability") |
(threat=="weather" &
(Species=="Cliff" | Species=="Purple")) |
(threat=="incidental_loss") |
(threat=="predation" & Species=="Bank"))
ggplot() +
geom_dotplot(data=threats.sub, aes(x=Species, y=effect.abs1, fill=Species),
binaxis='y', stackdir='center', binwidth =.1) +
geom_violin(data=threats, aes(x=Species, y=effect.abs1, fill=Species)) +
coord_flip() +
facet_wrap(~threat, ncol=2, labeller = labeller(threat=facet.labels),
strip.position = "left") +
scale_y_log10(breaks=c(0.1,1,10,100), labels=c(0,1,10,100)) +
labs(x=("Threat"), y=("Absolute effect on adult survival (%)")) +
theme_bw() +
theme(axis.text=element_text(size=9, colour="black"),
axis.title=element_text(size=10, colour="black"),
axis.text.y=element_blank(),
axis.ticks.y=element_blank(),
panel.grid=element_blank(),
panel.border=element_rect(colour="black", size=1),
plot.margin=unit(c(.3,.3,.4,.4), "cm"),
strip.background=element_rect(fill=NA, colour=NA), #element_blank(),
legend.position="right")
My attempts to use the solution below (provided in other questions) only resulted in an error message.
give.n <- function(x){
return(c(y = mean(x), label = length(x)))
}
stat_summary(fun.data = give.n, geom = "text") #added to ggplot code above
Error in if (empty(data)) { : missing value where TRUE/FALSE needed
I would appreciate any help with this issue. I'd prefer to find a way for R to calculate the sample sizes (rather than providing each one myself), as I also keep getting the following warning message when I produce this figure and I'd like to double-check that all the data are being displayed correctly.
Warning messages:
1: In max(data$density) : no non-missing arguments to max; returning -Inf
2: In max(data$density) : no non-missing arguments to max; returning -Inf
3: In max(data$density) : no non-missing arguments to max; returning -Inf
Thanks!
As requested:
structure(list(threat = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
7L, 7L, 7L, 7L, 7L), .Label = c("weather", "competition", "incidental_loss",
"contaminants", "insect_availability", "disease", "predation"
), class = "factor"),
Species = structure(c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 5L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 5L, 5L, 5L,
5L, 5L, 1L, 2L, 2L, 2L, 2L), .Label = c("Bank", "Barn", "Cliff",
"Tree", "Purple"), class = "factor"),
effect.abs = c(18L,
13L, 0L, 43L, 43L, 0L, 23L, 13L, 14L, 16L, 18L, 29L, 0L, 40L,
0L, 20L, 53L, 0L, 17L, 15L, 13L, 25L, 19L, 25L, 0L, 0L, 0L, 14L,
20L, 0L, 0L, 0L, 0L, 4L, 1L, 0L, 1L, 1L, 1L, 1L, 12L, 0L, 30L,
95L, 10L, 3L, 7L, 12L, 14L, 100L, 0L, 23L, 13L, 5L, 0L, 58L,
20L, 4L, 9L, 0L)), row.names = c(NA, -60L), class = "data.frame")
The way to tackle this is to precompute your n's, e.g.:
summary_df <- df %>%
group_by(threat, Species, effect.abs1) %>%
summarise(n = n())
Then add it to your graph
+ geom_label(aes(x = 100, y = effect.abs1, label = n), data = summary_df)
Thanks to @Jack Brookes for the helpful comments that got me started on this. Here is my final solution for this issue.
#first summarize n's for all data
summary_df_all <- threats %>%
group_by(threat, Species) %>%
summarise(n = n(), maxE=max(effect.abs1))
#next summarize n's for the subset of data I'm not interested in getting the n's for
summary_df_sub <- threats.sub %>%
group_by(threat, Species) %>%
summarise(n = n(), maxE=max(effect.abs1)) %>%
mutate(probability = 0)
#combine these summaries, and filter out the points that will not be displayed
summary_df_violin <- left_join(summary_df_all, summary_df_sub,
by = c("threat", "Species")) %>%
mutate(probability = ifelse(is.na(probability), 1,
probability)) %>% filter(probability > 0)
#and plot
ggplot() +
geom_dotplot(data=threats.sub, aes(x=Species, y=effect.abs1, colour=Species, fill=Species),
binaxis='y', stackdir='center', binwidth =.09) +
geom_violin(data=threats, aes(x=Species, y=effect.abs1, colour=Species, fill=Species), size=1.1) +
#geom_label(aes(x=100, y=effect.abs1, label=n), data=summary_df)
geom_text(data=summary_df_violin, aes(y=maxE.x, x=Species, label=n.x), nudge_y=.2) +
coord_flip() +
facet_wrap(~threat, ncol=2, labeller = labeller(threat=facet.labels),
strip.position = "left") +
scale_y_log10(breaks=c(0.1,1,10,100), labels=c(0,1,10,100)) +
labs(x=("Threat"), y=("Absolute effect on adult survival (%)")) +
theme_bw() +
theme(axis.text=element_text(size=9, colour="black"),
axis.title=element_text(size=10, colour="black"),
axis.text.y=element_blank(),
axis.ticks.y=element_blank(),
panel.grid=element_blank(),
panel.border=element_rect(colour="black", size=1),
plot.margin=unit(c(.3,.3,.4,.4), "cm"),
strip.background=element_rect(fill=NA, colour=NA),
strip.text=element_text(size=9, colour="black"),
legend.position="right")
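The precompute-then-label idea can also be sketched without dplyr. The example below is a hedged illustration only: `toy` is a small made-up data frame standing in for `threats`, and `label_df`, `n` and `maxE` are names chosen here. It counts the rows per threat and species with base R `aggregate()`, records the largest value in each group, and uses that value to position a text label at the end of each violin. The plotting part is skipped if ggplot2 is not installed.

```r
# Hypothetical toy data standing in for `threats` (0.1 added as in the question).
toy <- data.frame(
  threat  = rep(c("weather", "disease"), each = 6),
  Species = c("Bank", "Bank", "Bank", "Bank", "Barn", "Barn",
              "Bank", "Bank", "Barn", "Barn", "Barn", "Barn"),
  effect.abs1 = c(18, 29, 13, 0, 43, 40, 30, 95, 10, 3, 7, 12) + 0.1
)

# Sample size per threat/species combination.
n_df <- aggregate(effect.abs1 ~ threat + Species, data = toy, FUN = length)
names(n_df)[3] <- "n"

# Largest value per combination, used to place the label past the violin.
max_df <- aggregate(effect.abs1 ~ threat + Species, data = toy, FUN = max)
names(max_df)[3] <- "maxE"

label_df <- merge(n_df, max_df, by = c("threat", "Species"))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Call print(p) to draw the plot.
  p <- ggplot(toy, aes(x = Species, y = effect.abs1)) +
    geom_violin(aes(fill = Species)) +
    geom_text(data = label_df,
              aes(x = Species, y = maxE, label = paste0("n = ", n)),
              hjust = -0.2) +
    coord_flip() +
    facet_wrap(~threat, ncol = 2) +
    scale_y_log10()
}
```

Printing `label_df` is also a quick way to check the group sizes against what the figure shows.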
Suppose I have two boxplots.
trial1 <- ggplot(completionTime, aes(fill=Condition, x=Scenario, y=Trial1))
trial1 + geom_boxplot()+geom_point(position=position_dodge(width=0.75)) + ylim(0, 160)
trial2 <- ggplot(completionTime, aes(fill=Condition, x=Scenario, y=Trial2))
trial2 + geom_boxplot()+geom_point(position=position_dodge(width=0.75)) + ylim(0, 160)
How can I plot Trial1 and Trial2 on the same plot, at the same respective x position? They have the same y range.
I looked at geom_boxplot(position="identity"), but that plots the two conditions (fill) at the same x position.
I want to plot two y columns at the same x position.
Edit: the dataset
User Condition Scenario Trial1 Trial2
1 1 ME a 67 41
2 1 ME b 70 42
3 1 ME c 40 15
4 1 ME d 65 23
5 1 ME e 45 45
6 1 SE a 100 34
7 1 SE b 54 23
8 1 SE c 70 23
9 1 SE d 56 15
10 1 SE e 30 20
11 2 ME a 42 23
12 2 ME b 22 12
13 2 ME c 28 8
14 2 ME d 22 8
15 2 ME e 38 37
16 2 SE a 59 18
17 2 SE b 65 14
18 2 SE c 75 7
19 2 SE d 37 9
20 2 SE e 31 7
dput()
structure(list(User = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Condition = structure(c(1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L), .Label = c("ME", "SE"), class = "factor"), Scenario =
structure(c(1L,
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L,
3L, 4L, 5L), .Label = c("a", "b", "c", "d", "e"), class = "factor"),
Trial1 = c(67L, 70L, 40L, 65L, 45L, 100L, 54L, 70L, 56L,
30L, 42L, 22L, 28L, 22L, 38L, 59L, 65L, 75L, 37L, 31L), Trial2 = c(41L,
42L, 15L, 23L, 45L, 34L, 23L, 23L, 15L, 20L, 23L, 12L, 8L,
8L, 37L, 18L, 14L, 7L, 9L, 7L)), .Names = c("User", "Condition",
"Scenario", "Trial1", "Trial2"), class = "data.frame", row.names = c(NA,
-20L))
You could try using interaction to combine two of your factors and plot against a third. For example, assuming you want to fill by condition as in your original code:
library(tidyr)
library(ggplot2)
completionTime %>%
  # reshape Trial1/Trial2 into long format: one row per user, scenario and trial
  gather(trial, value, -Scenario, -Condition, -User) %>%
  ggplot(aes(interaction(Scenario, trial), value)) +
  geom_boxplot(aes(fill = Condition))
Result:
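A variation on the same idea, sketched here with base R only, keeps Scenario alone on the x-axis and moves the trial into the fill, so all boxes for a scenario share one x position. This is an alternative illustration, not the answer's method. The data frame `completion` is a subset of the question's data (scenarios a and b), and the names `completion` and `long` are chosen here. The plotting part is skipped if ggplot2 is not installed.

```r
# Subset of the question's data: scenarios a and b for both users.
completion <- data.frame(
  User      = rep(1:2, each = 4),
  Condition = rep(rep(c("ME", "SE"), each = 2), times = 2),
  Scenario  = rep(c("a", "b"), times = 4),
  Trial1    = c(67, 70, 100, 54, 42, 22, 59, 65),
  Trial2    = c(41, 42, 34, 23, 23, 12, 18, 14)
)

# Stack Trial1 and Trial2 into one value column (long format).
id_cols <- completion[, c("User", "Condition", "Scenario")]
long <- data.frame(
  rbind(id_cols, id_cols),
  trial = rep(c("Trial1", "Trial2"), each = nrow(completion)),
  value = c(completion$Trial1, completion$Trial2)
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # One box per Condition/trial combination, dodged within each Scenario.
  # Call print(p) to draw the plot.
  p <- ggplot(long, aes(Scenario, value, fill = interaction(Condition, trial))) +
    geom_boxplot() +
    ylim(0, 160)
}
```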
This is my data structure:
Accession Source Name NucSource Order color Counts Normalized
1 Str1 Our Str1 ch 1 #1C9099 66827 2.318683e-01
2 Str1_plasmid Our Str1 pl 2 #1C9099 26 9.021169e-05
3 Str2 Our Str2 ch 3 #1C9099 288211 1.000000e+00
4 Str2_plasmid Our Str2 pl 4 #1C9099 71858 2.493243e-01
5 Str3 Our Str3 ch 5 #1C9099 40600 1.408690e-01
6 Str3_plasmid Our Str3 pl 6 #1C9099 25266 8.766494e-02
7 Str4 NCBI Str4 ch 7 #A6BDDB 21339 7.403951e-02
8 Str5 NCBI Str5 ch 8 #A6BDDB 37776 1.310706e-01
9 Str6 NCBI Str6 ch 9 #A6BDDB 3596 1.247697e-02
10 Str7 NCBI Str7 ch 10 #A6BDDB 5384 1.868076e-02
11 Str7_plasmid NCBI Str7 pl 11 #A6BDDB 40903 1.419203e-01
12 Str8 NCBI Str8 ch 12 #A6BDDB 8948 3.104670e-02
13 Str9 NCBI Str9 ch 13 #A6BDDB 16557 5.744750e-02
14 Str9_plasmid NCBI Str9 pl 14 #A6BDDB 3738 1.296966e-02
15 Str10 NCBI Str10 ch 15 #A6BDDB 10067 3.492927e-02
16 Str11 NCBI Str11 ch 16 #A6BDDB 7306 2.534948e-02
17 Str12 NCBI Str12 ch 17 #A6BDDB 10313 3.578281e-02
I run the following code on it:
p<-ggplot(data=myData, aes(x=Name, y=Normalized, fill=Source)) +
theme_few() +
xlab("Strain") + ylab("Normalized counts") +
geom_bar(stat="identity", aes(fill=myData$Source), colour="black", position="dodge") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.4)) +
geom_text(aes(label=myData$NucSource), vjust=-0.5) +
theme(legend.position="right") +
scale_fill_manual(values=as.character(color.convert$color)[2:3])
print(p)
And this is the result:
What I would like now is that for strains like "Str1", where I have both "ch" and "pl", the two bars are placed horizontally next to each other (likewise for "Str2", "Str3", "Str7", and "Str9"). For cases like "Str4", where I have only "ch", there should be only one bar.
So the bars should not be on top of each other but arranged side by side.
EDIT -- dput(head(myData, 20)):
structure(list(Accession = structure(c(16L, 17L, 12L, 13L, 14L, 15L,
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label = c("CP000517",
"CP002081", "CP002427", "CP002429", "CP002430_plasmid", "CP003799",
"CP009907", "CP009908_plasmid", "CP011386", "CP012381", "CP016827",
"FAM22155", "FAM22155_plasmid", "FAM8105", "FAM8105_plasmid",
"FAM8627", "FAM8627_plasmid"), class = "factor"), Source =
structure(c(2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L ), .Label = c("NCBI", "Our"), class = "factor"), Name =
structure(c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 5L, 6L, 7L, 7L, 8L, 9L, 9L,
10L, 11L, 12L), .Label = c("FAM8627", "FAM22155", "FAM8105",
"DPC 4571", "CNRZ32", "H9", "H10", "R0052", "KLDS1.8701", "MB2-1",
"CAUH18", "D76"), class = "factor"), NucSource = structure(c(1L, 2L,
1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L), .Label =
c("ch", "pl"), class = "factor"), Order = 1:17, color =
structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L ), .Label = c("#1C9099", "#A6BDDB", "#ECE2F0"), class =
"factor"),
Counts = c(66827L, 26L, 288211L, 71858L, 40600L, 25266L,
21339L, 37776L, 3596L, 5384L, 40903L, 8948L, 16557L, 3738L,
10067L, 7306L, 10313L), Normalized = c(0.231868318697066,
9.02116851889761e-05, 1, 0.249324279781133, 0.140869016102786,
0.0876649399224873, 0.0740395057787524, 0.131070639219183,
0.0124769699976753, 0.0186807581945172, 0.141920329203257,
0.0310466984258061, 0.0574474950643799, 0.0129696645860151,
0.0349292705691316, 0.0253494835381023, 0.0357828118982273
)), .Names = c("Accession", "Source", "Name", "NucSource", "Order", "color", "Counts", "Normalized"), row.names = c(NA, 17L),
class = "data.frame")
You need to dodge on a different column than fill:
ggplot(data=myData, aes(x = Name, y = Normalized, group = NucSource, fill = Source)) +
geom_text(aes(label = NucSource), vjust = -0.5) +
geom_col(colour="black", position="dodge") +
labs(x = "Strain", y = "Normalized counts") +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.4),
legend.position = "right")
PS: I changed some bits, because I was not sure which theme or extra packages you are using.
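As a small sketch of dodging on one column while filling by another, the example below uses the documented `group` aesthetic and applies the same `position_dodge()` to the bars and the text so that the labels sit over their own bars. The data frame `strains` is a five-row excerpt of the table in the question, and the names `strains`, `dodge` and `bars_per_strain` are chosen here. The plotting part is skipped if ggplot2 is not installed.

```r
# Five rows taken from the table in the question.
strains <- data.frame(
  Name       = c("Str1", "Str1", "Str2", "Str2", "Str4"),
  NucSource  = c("ch", "pl", "ch", "pl", "ch"),
  Source     = c("Our", "Our", "Our", "Our", "NCBI"),
  Normalized = c(0.2318683, 0.00009021169, 1, 0.2493243, 0.07403951)
)

# Str1 and Str2 should get two bars each, Str4 only one.
bars_per_strain <- table(strains$Name)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  dodge <- position_dodge(width = 0.9)
  # group decides what is dodged, fill decides the colour.
  # Call print(p) to draw the plot.
  p <- ggplot(strains, aes(Name, Normalized, fill = Source, group = NucSource)) +
    geom_col(colour = "black", position = dodge) +
    geom_text(aes(label = NucSource), position = dodge, vjust = -0.5) +
    labs(x = "Strain", y = "Normalized counts")
}
```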
I have a melted data.frame, dput(x), below:
## dput(x)
x <- structure(list(variable = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L,
4L, 4L), .Label = c("a", "b", "c", "d"), class = "factor"),
value = structure(c(1L,
2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L,
6L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c("Never Heard of",
"Heard of but Not at all Familiar",
"Somewhat Familiar", "Familiar", "Very Familiar", "Extremely Familiar"
), class = "factor"), freq = c(10L, 24L, 32L, 90L, 97L, 69L,
15L, 57L, 79L, 94L, 58L, 19L, 11L, 17L, 34L, 81L, 94L, 85L, 4L,
28L, 59L, 114L, 82L, 35L)), .Names = c("variable", "value", "freq"
), row.names = c(NA, -24L), class = "data.frame")
Which looks like this (for those of you who don't need a test set):
variable value freq
1 a Never Heard of 10
2 a Heard of but Not at all Familiar 24
3 a Somewhat Familiar 32
4 a Familiar 90
5 a Very Familiar 97
6 a Extremely Familiar 69
7 b Never Heard of 15
8 b Heard of but Not at all Familiar 57
9 b Somewhat Familiar 79
10 b Familiar 94
11 b Very Familiar 58
12 b Extremely Familiar 19
13 c Never Heard of 11
14 c Heard of but Not at all Familiar 17
15 c Somewhat Familiar 34
16 c Familiar 81
17 c Very Familiar 94
18 c Extremely Familiar 85
19 d Never Heard of 4
20 d Heard of but Not at all Familiar 28
21 d Somewhat Familiar 59
22 d Familiar 114
23 d Very Familiar 82
24 d Extremely Familiar 35
Now, I can make a nice and pretty plot akin to this:
ggplot(x, aes(variable, freq, fill = value)) +
geom_col(position = "fill") +
coord_flip() +
scale_y_continuous("", labels = scales::percent)
Question
What I would like to do is sort a,b,c,d by the highest to lowest "freq" of "Extremely Familiar"
?relevel and ?reorder haven't provided any constructive examples for this usage.
Your help is always appreciated.
Cheers,
BEB
Here is another way to do it:
tmp <- subset(x, value=="Extremely Familiar")
x$variable <- factor(x$variable, levels=levels(x$variable)[order(-tmp$freq)])
Here is one way:
tmpfun <- function(i) {
tmp <- x[i,]
-tmp[ tmp$value=='Extremely Familiar', 'freq' ]
}
x$variable <- reorder( x$variable, 1:nrow(x), tmpfun )
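The reordering can be checked on a reduced copy of the data. The sketch below is only an illustration: `x_small`, `top` and `new_levels` are names chosen here, and the data frame keeps just the "Familiar" and "Extremely Familiar" rows from the question. It sorts the levels of `variable` by the "Extremely Familiar" frequency, highest first.

```r
# Reduced copy of `x`: two response categories per variable, values from the question.
x_small <- data.frame(
  variable = factor(rep(c("a", "b", "c", "d"), each = 2)),
  value    = rep(c("Familiar", "Extremely Familiar"), times = 4),
  freq     = c(90, 69, 94, 19, 81, 85, 114, 35)
)

# Rows that drive the ordering.
top <- x_small[x_small$value == "Extremely Familiar", ]

# Level names sorted by that frequency, highest first.
new_levels <- as.character(top$variable[order(top$freq, decreasing = TRUE)])
x_small$variable <- factor(x_small$variable, levels = new_levels)

levels(x_small$variable)
```

With these frequencies (c = 85, a = 69, d = 35, b = 19) the levels come out as c, a, d, b. Note that with coord_flip() the first level is usually drawn at the bottom, so the levels may need to be reversed with rev() to put the highest on top.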