Why isn't my barplot rearranging properly when faceting with ggplot? - r

So I have made this barplot with this code, bars organised in descending order, great!
na.omit(insect_tally_native_ranges)%>%
group_by(native_ranges)%>%
dplyr::summarise(freq=sum(n))%>%
ggplot(aes(x=reorder(native_ranges,freq),y=freq))+
geom_col(color="#CD4F39",fill="#CD4F39",alpha=0.8)+
coord_flip()+
labs(x="Native ranges",
y="Number of invasive insect arrivals",
title="Species by native ranges")+
theme_minimal()
And now I wanted to do the same but faceting by a variable called Period, here's the code:
ggplot(native_freq_period,
aes(y=reorder(native_ranges,freq),x=freq))+
geom_barh(stat= "identity",
color="#CD4F39",
fill="#CD4F39",
alpha=0.8)+
labs(x="Native ranges",
y="Number of invasive insect arrivals",
title="Species by native ranges")+
theme_minimal()+
facet_wrap(~Period)
But the plot came out like this:
This is pretty annoying, because it is essentially the same code as above, so the levels of native_ranges should be sorted again. Instead I get this lumpy order that isn't even alphabetical. So reorder is reordering, but not by freq! I don't understand why.
Here is the data:
structure(list(native_ranges = structure(c(6L, 10L, 11L, 7L,
3L, 5L, 1L, 1L, 8L, 6L, 3L, 5L, 2L, 4L, 5L, 7L, 7L, 7L, 8L, 9L,
11L), .Label = c("Afrotropic", "Afrotropic/Neotropic", "Australasia",
"Australasia/Neotropic", "Indomalaya", "Nearctic", "Neotropic",
"Neotropic/Nearctic", "Neotropic/Nearctic/Australasia", "Palearctic",
"Palearctic/Indomalaya"), class = "factor"), Period = structure(c(4L,
4L, 4L, 4L, 4L, 4L, 3L, 4L, 4L, 3L, 3L, 3L, 4L, 4L, 2L, 1L, 2L,
3L, 2L, 4L, 3L), .Label = c("1896-1925", "1926-1955", "1956-1985",
"1986-2018"), class = "factor"), freq = c(21L, 13L, 12L, 11L,
10L, 10L, 4L, 4L, 4L, 3L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L)), row.names = c(NA, -21L), class = c("grouped_df", "tbl_df",
"tbl", "data.frame"), vars = "native_ranges", drop = TRUE, indices = list(
6:7, 12L, c(4L, 10L), 13L, c(5L, 11L, 14L), c(0L, 9L), c(3L,
15L, 16L, 17L), c(8L, 18L), 19L, 1L, c(2L, 20L)), group_sizes = c(2L,
1L, 2L, 1L, 3L, 2L, 4L, 2L, 1L, 1L, 2L), biggest_group_size = 4L, labels = structure(list(
native_ranges = structure(1:11, .Label = c("Afrotropic",
"Afrotropic/Neotropic", "Australasia", "Australasia/Neotropic",
"Indomalaya", "Nearctic", "Neotropic", "Neotropic/Nearctic",
"Neotropic/Nearctic/Australasia", "Palearctic", "Palearctic/Indomalaya"
), class = "factor")), row.names = c(NA, -11L), class = "data.frame", vars = "native_ranges", drop = TRUE))

You have to arrange the order of the variable first, before plotting. Since you didn't provide any reproducible data, I am using the following data:
drugs <- data.frame(drug = c("a", "b", "c"), effect = c(4.2, 9.7, 6.1))
ggplot(drugs, aes(drug, effect)) +
geom_col()
Now, to change the order of the variable, use factor:
drugs$drug <- factor(drugs$drug,levels = c("b","a","c")) #This is the order I want
ggplot(drugs, aes(drug, effect)) +
geom_col()
Here I provided the levels to factor manually. You can either provide them manually or compute the sorted order from the data first and pass that in. See below:
drugs$drug <- factor(drugs$drug,levels = drugs[order(drugs$effect),]$drug)
ggplot(drugs, aes(drug, effect)) +
geom_col()
This should work with facet_wrap as well.

OK, finally figured it out with help from the other answer. You need to create another column that summarizes the total frequency so you can then reorder by that column. There may be a more efficient way to do it, but I create a new summary data.frame and then join it back to the original and then reorder based on the new column.
summary_data <- data %>%
ungroup() %>%
group_by(native_ranges) %>%
summarize(total = sum(freq))
data <- data %>%
left_join(summary_data, by = "native_ranges")
ggplot(data, aes(y = reorder(native_ranges, total),x = freq)) +
geom_barh(stat= "identity",
color="#CD4F39",
fill="#CD4F39",
alpha=0.8) +
labs(x="Number of invasive insect arrivals",
y="Native ranges",
title="Species by native ranges") +
theme_minimal()+
facet_wrap(~Period)
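A shorter route that may be worth trying: reorder() summarises the second argument per level with FUN, and the default is mean, which is likely why the faceted plot looked sorted by something other than the totals. The sketch below uses a small made-up data frame (toy and its values are assumptions, not the question's data) to show how FUN = sum changes the level order.

```r
# Toy data (hypothetical values): three ranges observed across two periods
toy <- data.frame(
  native_ranges = factor(c("Nearctic", "Nearctic", "Neotropic",
                           "Neotropic", "Palearctic")),
  Period = c("1956-1985", "1986-2018", "1956-1985", "1986-2018", "1986-2018"),
  freq = c(3, 21, 1, 11, 13)
)

# Default FUN = mean: levels are ordered by the mean freq within each range
by_mean <- levels(reorder(toy$native_ranges, toy$freq))

# FUN = sum: levels are ordered by the total freq over all periods
by_sum <- levels(reorder(toy$native_ranges, toy$freq, FUN = sum))

print(by_mean)
print(by_sum)
```

In the plot itself this would amount to aes(y = reorder(native_ranges, freq, FUN = sum), x = freq), which avoids the extra summary table and join.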

Related

ggarrange() function overwrites the color of my boxplots

I am making two boxplots and want to arrange them beside each other. I have made each of them look like I want when displaying them separately but when I use ggarrange() the colors disappear. This is my code for the plots:
BOX1_data <- read.table(file = "clipboard",
sep = "\t", header=TRUE)
BOX1_data$Diagnosis <- as.factor(BOX1_data$Diagnosis)
BOX1plot <- ggplot(BOX1_data, aes(x=Diagnosis, y=No.Variants, fill= Diagnosis)) + geom_boxplot() +
scale_fill_brewer(palette = "Dark2") +
scale_x_discrete(labels = c("AC\nN=38", "SqCC\nN=15", "SCLC\nN=8", "BL disease\nN=16"))
BOX2_data <- read.table(file = "clipboard",
sep = "\t", header=TRUE)
BOX2_data$Stage <- as.factor(BOX2_data$Stage)
BOX2plot <- ggplot(BOX2_data, aes(x=Stage, y=No.Variants, fill = Stage)) + geom_boxplot(width = 0.4) +
scale_fill_brewer(palette = "Dark2") +
scale_x_discrete(labels = c("Stage I-III\nN=24", "Stage IV\nN=37"))
To arrange the plots I then write:
BOX_list <- list(BOX1plot, BOX2plot)
ggarrange(plotlist = BOX_list, labels = c('A', 'B'), ncol = 2)
The easiest way of getting rid of gridlines etc I thought was by using theme_set() and I think that this might be my problem.
My code is:
theme_set(theme_bw() + theme(panel.border = element_blank(), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), panel.background = element_blank(),
axis.line = element_line(colour = "grey")))
I realize that theme_bw() overwrites my colors in the boxes. But I have tried removing it, switching it for theme_transparent() (this removes all my labels) and neither works. I have searched for a way of just adding a transparency to my boxes in the theme so that my colors will shine through. I am also suspicious that maybe the palette that I chose might give me the same colors in the two plots which I also do not want. To add, if it matters, I have 4 groups in the first plot and 2 in the second.
dput(BOX1_data)
structure(list(Diagnosis = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("1", "2", "3", "4"), class = "factor"),
No.Variants = c(3L, 4L, 6L, 14L, 3L, 3L, 4L, 3L, 3L, 3L,
8L, 6L, 22L, 10L, 6L, 9L, 1L, 9L, 3L, 4L, 8L, 2L, 13L, 3L,
11L, 19L, 5L, 5L, 3L, 12L, 4L, 2L, 4L, 18L, 8L, 7L, 7L, 12L,
4L, 1L, 6L, 3L, 2L, 8L, 10L, 3L, 15L, 9L, 13L, 13L, 15L,
10L, 10L, 12L, 6L, 3L, 12L, 9L, 15L, 10L, 18L, 3L, 6L, 3L,
6L, 1L, 3L, 3L, 7L, 1L, 2L, 10L, 7L, 7L, 1L, 0L, 2L)), row.names = c(NA,
-77L), class = "data.frame")
dput(BOX2_data)
structure(list(No.Variants = c(3L, 4L, 6L, 14L, 3L, 3L, 4L, 3L,
3L, 3L, 8L, 6L, 22L, 10L, 6L, 9L, 1L, 9L, 3L, 4L, 8L, 2L, 13L,
3L, 11L, 19L, 5L, 5L, 3L, 12L, 4L, 2L, 4L, 18L, 8L, 7L, 7L, 12L,
4L, 1L, 6L, 3L, 2L, 8L, 10L, 3L, 15L, 9L, 13L, 13L, 15L, 10L,
10L, 12L, 6L, 3L, 12L, 9L, 15L, 10L, 18L), Stage = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1",
"2"), class = "factor")), row.names = c(NA, -61L), class = "data.frame")
Grateful for any tips!
As already pointed out, it seems the OP's issue with theme_set() removing the fill colors set in the two plots was solved by updating to a newer version of ggplot2. Here is a solution for the second part of the OP's question (which was clarified in the comments), reproduced here for convenience:
Now it is just the problem that I want the palette to continue on the second plot's boxes and not restart so that I will get different colors on all boxes.
To do this, note that there are 4 fill colors in the first plot, BOX1plot, and 2 fill colors in BOX2plot. For BOX1plot, we want the color palette to begin at the first color, but for BOX2plot, we want it to start at the 5th color in the palette. There's no way to do this through the scale_*_brewer() functions, so the approach here is to extract the Brewer palette with RColorBrewer::brewer.pal() and then use scale_fill_manual() to give each plot its own slice of that palette, based on the number of levels of each factor.
You can just "know" that you need to "use colors 1-4" for BOX1plot and "use color 5 and 6" for BOX2plot; however, it is much more elegant to just calculate this automatically based on the number of levels (in case you want to run this again). The code below does this:
library(ggplot2)
library(ggpubr)
library(RColorBrewer)
# ... read in your data as before
# create factors (as OP did before)
BOX1_data$Diagnosis <- as.factor(BOX1_data$Diagnosis)
BOX2_data$Stage <- as.factor(BOX2_data$Stage)
# make color palette based on Brewer "Dark2" palette
lev_diag <- length(levels(BOX1_data$Diagnosis))
lev_stage <- length(levels(BOX2_data$Stage))
lev_total <- lev_diag + lev_stage
my_colors <- brewer.pal(lev_total, "Dark2")
BOX1plot <- ggplot(BOX1_data, aes(x=Diagnosis, y=No.Variants, fill= Diagnosis)) + geom_boxplot() +
scale_fill_manual(values=my_colors[1:lev_diag]) +
scale_x_discrete(labels = c("AC\nN=38", "SqCC\nN=15", "SCLC\nN=8", "BL disease\nN=16"))
BOX2plot <- ggplot(BOX2_data, aes(x=Stage, y=No.Variants, fill = Stage)) + geom_boxplot(width = 0.4) +
scale_fill_manual(values = my_colors[(lev_diag+1):lev_total]) +
scale_x_discrete(labels = c("Stage I-III\nN=24", "Stage IV\nN=37"))
BOX_list <- list(BOX1plot, BOX2plot)
ggarrange(plotlist = BOX_list, labels = c('A', 'B'), ncol = 2)
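If more than two plots need to share one palette, the same slicing can be wrapped in a small helper. This is only a sketch: split_palette() is a hypothetical name, and the six hex codes are written out by hand (they are meant to stand in for the output of brewer.pal(6, "Dark2")) so that the example runs without extra packages.

```r
# Hypothetical helper: cut one color vector into consecutive slices,
# one slice per plot, sized by the number of factor levels in each plot.
# Assumes every count is at least 1.
split_palette <- function(colors, counts) {
  stopifnot(sum(counts) <= length(colors), all(counts >= 1))
  ends <- cumsum(counts)
  starts <- ends - counts + 1
  Map(function(s, e) colors[s:e], starts, ends)
}

# Stand-in for brewer.pal(6, "Dark2"); hard-coded to avoid a dependency
pal <- c("#1B9E77", "#D95F02", "#7570B3", "#E7298A", "#66A61E", "#E6AB02")

# 4 levels in the first plot, 2 in the second
parts <- split_palette(pal, c(4, 2))
print(parts)
```

Each element of parts could then be passed to scale_fill_manual(values = ...) for the corresponding plot. Note that Dark2 only provides up to 8 colors, so this approach runs out once the plots have more than 8 levels in total.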
If you have issues with ggarrange(), I would suggest the following approach using patchwork:
library(ggplot2)
library(patchwork)
#Data format
BOX1_data$Diagnosis <- as.factor(BOX1_data$Diagnosis)
#Plot 1
BOX1plot <- ggplot(BOX1_data, aes(x=Diagnosis, y=No.Variants, fill= Diagnosis)) + geom_boxplot() +
scale_fill_brewer(palette = "Dark2") +
scale_x_discrete(labels = c("AC\nN=38", "SqCC\nN=15", "SCLC\nN=8", "BL disease\nN=16"))
#Data format
BOX2_data$Stage <- as.factor(BOX2_data$Stage)
#Plot 2
BOX2plot <- ggplot(BOX2_data, aes(x=Stage, y=No.Variants, fill = Stage)) + geom_boxplot(width = 0.4) +
scale_fill_brewer(palette = "Dark2") +
scale_x_discrete(labels = c("Stage I-III\nN=24", "Stage IV\nN=37"))
#Arrange plots
BOX1plot+BOX2plot+plot_annotation(tag_levels = 'A')
The output:

Cannot use is.na() function in mutate_if function in R

I tried to use is.na() in mutate_if() but I get an error:
Error in is_logical(.p) : object 'n_day' not found
n_day is indeed in my data frame. I suspect the way is.na() takes its argument means I cannot use it in mutate_if(), but I don't know how to solve it.
The idea is this: if the value in n_day is NA, replace it with the value of n_cum on the same day.
Any help will be highly appreciated!
My code looks like this:
library(tidyverse)
t <- structure(list(city = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("a", "b"), class = "factor"),
time = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L,
3L, 4L, 5L, 6L, 7L, 8L), .Label = c("2012/1/1", "2012/1/2",
"2012/1/3", "2012/1/4", "2012/2/1", "2012/2/2", "2012/2/3",
"2012/2/4"), class = "factor"), n_cum = c(1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L)), class = "data.frame", row.names = c(NA,
-16L))
t
t2 <- t %>% group_by(city) %>%
mutate(n_day = n_cum - lag(n_cum))
t2 %>% mutate_if(is.na(n_day), n_day = n_cum)
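mutate_if is used to apply operations to multiple columns at once (see its documentation). Its second argument is a predicate that is checked against whole columns, so is.na(n_day) is evaluated outside the data frame, which is why n_day is not found. This is not what you are looking for here, as you only want to change one column.
The question can be solved using mutate and if_else: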
t2 %>% mutate(n_day = if_else(is.na(n_day),n_cum,n_day))
Alternatively, use mutate_at with an ifelse condition:
t2 %>% mutate_at(vars(n_day), ~ ifelse(is.na(.), n_cum, .))
To apply this to several variables, list them all inside the vars() helper.
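Another option, sketched here on a cut-down stand-in for the question's data (toy is an assumption), is dplyr::coalesce(), which takes the first non-missing value across its arguments position by position and so expresses "use n_cum where n_day is NA" directly.

```r
library(dplyr)

# Cut-down stand-in for the question's data frame (hypothetical values)
toy <- data.frame(
  city  = c("a", "a", "a", "b", "b"),
  n_cum = c(1L, 2L, 3L, 11L, 12L)
)

res <- toy %>%
  group_by(city) %>%
  mutate(
    n_day = n_cum - lag(n_cum),      # first row of each city becomes NA
    n_day = coalesce(n_day, n_cum)   # fill those NAs from n_cum
  ) %>%
  ungroup()

print(res)
```

Like if_else(), coalesce() expects its arguments to have compatible types; here both columns are integer, so no conversion is needed.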

Perform ACF plot for each type of group in R

Here is a small part of my data:
transport<- structure(list(date = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
11L, 12L), .Label = c("01.01.2001", "01.02.2001", "01.03.2001",
"01.04.2001", "01.05.2001", "01.06.2001", "01.07.2001", "01.08.2001",
"01.09.2001", "01.10.2001", "01.11.2001", "01.12.2001"), class = "factor"),
Market_82 = c(7000L, 7272L, 7668L, 7869L, 8057L, 8428L, 8587L,
8823L, 8922L, 9178L, 9306L, 9439L, 3725L, 4883L, 8186L, 7525L,
6335L, 4252L, 5642L, 1326L, 8605L, 3501L, 1944L, 7332L),
transport = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L
), .Label = c("plane", "train"), class = "factor")), .Names = c("date",
"Market_82", "transport"), class = "data.frame", row.names = c(NA,
-24L))
The grouping variable is transport.
For each type of transport I need an ACF plot of the time series,
something like this:
How can I produce an ACF plot for each transport type?
I have a lot of groups. How can I save the plots to the folder
C:/Users/admin/Documents/myplot
akrun's answer is spot on. Since you tagged the question with ggplot2, you could also use ggAcf from the forecast package.
The first step is to split your data.
transport_split <- split(transport, transport$transport)
If you want to include the respective element of column transport in the title, subtitle etc. try with Map
out <- Map(
f = function(x, y)
forecast::ggAcf(x$Market_82) + labs(title = y),
x = transport_split,
y = names(transport_split)
)
out$train
We can do this with Acf from forecast
library(forecast)
par(mfrow = c(2, 1))
lapply(split(transport['Market_82'], transport$transport), Acf)
If we also want the title, then
lst <- lapply(split(transport['Market_82'], transport$transport), acf, plot = FALSE)
par(mfrow = c(2, 1))
lapply(names(lst), function(x) plot(lst[[x]], main = x))
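Neither snippet above writes the plots to disk, which the question also asks for. A minimal sketch with base graphics devices follows, assuming one PNG file per group is acceptable. The folder below is a temporary stand-in (replace out_dir with the folder from the question), the file naming scheme is made up, and the data frame is a shortened copy of the question's data.

```r
# Stand-in output folder; swap in the real target folder as needed
out_dir <- file.path(tempdir(), "myplot")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Shortened stand-in for the question's data
transport <- data.frame(
  Market_82 = c(7000, 7272, 7668, 7869, 8057, 8428,
                3725, 4883, 8186, 7525, 6335, 4252),
  transport = rep(c("plane", "train"), each = 6)
)

# One numeric series per transport type
series <- split(transport$Market_82, transport$transport)

# Open a PNG device, draw the ACF with the group name as title, close it
files <- vapply(names(series), function(g) {
  path <- file.path(out_dir, paste0("acf_", g, ".png"))
  png(path, width = 800, height = 600)
  acf(series[[g]], main = g)
  dev.off()
  path
}, character(1))

print(files)
```

For the ggAcf version, ggplot2::ggsave() called on each element of out would be the analogous step.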

Suppressing data from a graph in R

I have a dataset, d, that contains personally identifiable data. The dataset has an X in place of every value that is suppressed:
column1 column2 column3
* FSM X
* Male 2.5
* Female X
A FSM 6
A Male 10.3
A Female 11.7
B FSM 14.8
B Male 21.5
B Female 25.3
I want to plot this with an X above the bars in a bar plot, where data has been suppressed, such as:
My code is:
p <- ggplot(d, aes(x=column1, y=column3, fill=column2)) +
geom_bar(position=position_dodge(), stat="identity", colour="black") +
geom_text(aes(label=column2),position= position_dodge(width=0.9), vjust=-.5) +
scale_y_continuous("Percentage",breaks=seq(0, max(d$column3), 2))
But of course, it can't plot 'X' on the graph and says:
Error: Discrete value supplied to continuous scale
How can I get the bar plotting to ignore the 'X' and still add the label if it's present?
Data dump:
structure(list(column1 = structure(c(1L, 1L, 1L, 2L, 2L, 2L,
3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L), .Label = c("*",
"A", "B", "C", "D", "E", "U"), class = "factor"), column2 = structure(c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L), .Label = c("FSM", "Male", "Female"), class = "factor"),
column3 = structure(c(21L, 1L, 2L, 18L, 3L, 4L, 7L, 12L,
14L, 16L, 15L, 13L, 10L, 9L, 8L, 11L, 6L, 5L, 20L, 19L, 17L
), .Label = c("1.93889541715629", "1.97444831591173", "10.1057579318449",
"11.7305458768873", "12.7758420441347", "14.4535840188014",
"14.8471615720524", "18.5830429732869", "19.9764982373678",
"20.0873362445415", "20.9606986899563", "21.5628672150411",
"24.1579558652729", "25.3193960511034", "25.7931844888367",
"29.2576419213974", "5.45876887340302", "6.11353711790393",
"6.16921269095182", "6.98689956331878", "X"), class = "factor")), .Names = c("column1",
"column2", "column3"), row.names = c(NA, -21L), class = "data.frame")
I'm happy to show a zero-height bar where there really are 0 instances, but where data has been suppressed I want to make that clear by printing an 'X', even though the bar will also show 0 instances.
First convert the height to numeric which gives NA for censored values. Then create a label column based on that. Then you need a column of zeroes for the y coordinate of the labels.
> d$column3=as.numeric(as.character(d$column3))
Warning message:
NAs introduced by coercion
> d$column4 = ifelse(is.na(d$column3),"X","")
> d$y=0
Then:
> p <- ggplot(d, aes(x=column1, y=column3, fill=column2))
> p + geom_bar(position=position_dodge(), stat="identity",
colour="black") +
geom_text(aes(label=column4,x=column1,y=y),
position=position_dodge(width=1), vjust=-0.5)
Giving:
It's a variant on labelling a geom_bar with the value of the bar. Almost a duplicate.

Melting data resulting in incorrect Y-values when plotting geom_bar(position="dodge")?

I have a dataframe called split2_data (actually a drop-leveled subset of a bigger data frame).
It contains a column "Loci", which are factors that I want as x-axes, and several columns of y-values (note: All of these values are <=1) that I would like to plot beside one another in their respective x factor.
The dataframe
structure(list(Loci = structure(1:8, .Label = c("C485", "C487_PigTa",
"C536", "Carey", "Cool", "Coyote", "Deadpool", "Epstein"), class = "factor"),
All = structure(c(5L, 6L, 7L, 1L, 2L, 4L, 3L, 8L), .Label = c("0.0246",
"0.0352", "0.0563", "0.0646", "0.2349", "0.3242", "0.3278",
"0.6854"), class = "factor"), X1_only = structure(c(4L, 3L,
2L, 1L, 6L, 6L, 6L, 5L), .Label = c("0.0133", "0.7292", "0.8586",
"0.9377", "0.961", "1"), class = "factor"), X78_only = structure(c(7L,
6L, 4L, 5L, 8L, 3L, 1L, 2L), .Label = c("0.0018", "0.0175",
"0.4958", "0.6055", "0.7472", "0.7563", "0.825", "1"), class = "factor"),
X8_removed = structure(c(5L, 6L, 8L, 1L, 2L, 3L, 4L, 7L), .Label = c("0.0181",
"0.0348", "0.1482", "0.1706", "0.2217", "0.2602", "0.6748",
"0.7123"), class = "factor"), X8_only = structure(c(6L, 7L,
3L, 8L, 5L, 4L, 1L, 2L), .Label = c("0.1266", "0.1945", "0.4389",
"0.4496", "0.7078", "0.709", "0.8882", "1"), class = "factor"),
X7_removed = structure(c(6L, 4L, 5L, 2L, 1L, 3L, 7L, 8L), .Label = c("0.0159",
"0.02", "0.0541", "0.3232", "0.3972", "0.4226", "0.4919",
"0.5951"), class = "factor"), X7_only = structure(c(3L, 4L,
7L, 5L, 6L, 8L, 1L, 2L), .Label = c("0.0082", "0.1759", "0.4957",
"0.5248", "0.6665", "0.6789", "0.8372", "1"), class = "factor"),
X5_removed = structure(c(5L, 7L, 6L, 1L, 3L, 4L, 2L, 8L), .Label = c("0.0195",
"0.0316", "0.08", "0.1069", "0.1549", "0.395", "0.4405",
"0.6298"), class = "factor"), X5_only = structure(c(1L, 2L,
6L, 7L, 3L, 5L, 7L, 4L), .Label = c("0.0871", "0.2022", "0.3532",
"0.3677", "0.5292", "0.7602", "1"), class = "factor"), X4_removed = structure(c(8L,
4L, 7L, 2L, 3L, 5L, 1L, 6L), .Label = c("0.0188", "0.0194",
"0.0511", "0.1716", "0.1862", "0.6454", "0.661", "0.8003"
), class = "factor"), X4_only = structure(c(2L, 5L, 1L, 6L,
7L, 3L, 8L, 4L), .Label = c("0.0026", "0.0378", "0.2884",
"0.4386", "0.5116", "0.6549", "0.6928", "1"), class = "factor"),
X3_removed = structure(c(5L, 7L, 6L, 1L, 2L, 3L, 4L, 8L), .Label = c("0.0612",
"0.0627", "0.0808", "0.1636", "0.2728", "0.477", "0.5307",
"0.6506"), class = "factor"), X3_only = structure(c(8L, 1L,
7L, 2L, 4L, 6L, 3L, 5L), .Label = c("0.0225", "0.2111", "0.2471",
"0.5087", "0.6294", "0.768", "0.8263", "0.8951"), class = "factor"),
X2_removed = structure(c(4L, 5L, 6L, 3L, 7L, 2L, 1L, 8L), .Label = c("0.0526",
"0.0608", "0.0854", "0.2036", "0.3168", "0.3668", "0.413",
"0.7608"), class = "factor"), X2_only = structure(c(5L, 3L,
6L, 4L, 2L, 8L, 1L, 7L), .Label = c("-", "0.0014", "0.0949",
"0.1637", "0.1818", "0.5521", "0.8585", "1"), class = "factor"),
X1_removed = structure(c(5L, 7L, 3L, 6L, 1L, 4L, 2L, 8L), .Label = c("0.0258",
"0.031", "0.0496", "0.0676", "0.1053", "0.1439", "0.2823",
"0.5465"), class = "factor")), .Names = c("Loci", "All",
"X1_only", "X78_only", "X8_removed", "X8_only", "X7_removed",
"X7_only", "X5_removed", "X5_only", "X4_removed", "X4_only",
"X3_removed", "X3_only", "X2_removed", "X2_only", "X1_removed"
), row.names = 9:16, class = "data.frame")
I can't think of how to do this in base R, and after some careful study of other questions here, this is the best that I can come up with:
library(reshape)
library(ggplot2)
require(ggplot2)
split2_datam<-melt(split2_data,id="Loci")
p2<- ggplot(split2_datam, aes(x =Loci, y = value, color = variable, width=.15)) + geom_bar(position="dodge") + ylab("P-value")+ geom_hline(yintercept=0.05)+ opts(axis.text.x = theme_text(angle=90, size=8)) + scale_y_discrete(breaks=seq(0,1)) + scale_fill_grey()
p2
#when I add stat="identity", the y values don't change- they just shrink relative to the x-axis
p2<- ggplot(split2_datam, aes(x =Loci, y = value, color = variable, width=.15)) + geom_bar(position="dodge", stat="identity") + ylab("P-value")+ geom_hline(yintercept=0.05)+ opts(axis.text.x = theme_text(angle=90, size=8)) + scale_y_discrete(breaks=seq(0,1)) + scale_fill_grey()
p2
The plot:
You'll notice that the different variables are often much greater than 1. They should not be. Any idea what's causing this/how to fix?
Other things I don't yet know how to do/fix (perhaps this question should be cross-referenced?):
I don't know why the greyscale isn't working
I don't know how to make the legend scale correctly with the plot
I don't understand why my columns have an 'X' appended to them (e.g. "X1_only" instead of "1_only")
Thank you so much in advance for any suggestions!
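Your data have been read in as factors, probably because there are some "-" characters mixed in with the numbers (see the levels of X2_only). ggplot2 then treats value as discrete, so the y axis shows factor levels rather than the numeric p-values.
You'll want to convert the "-" entries to NA when you read in your data, using na.strings = "-".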
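A small sketch of both fixes follows, using an inline three-row extract (the raw string is an assumption standing in for the real input file). It shows na.strings at read time and, for data that are already factors, conversion via character rather than directly to numeric.

```r
# Hypothetical extract: "-" marks a missing p-value
raw <- "Loci,X2_only\nC485,0.1818\nDeadpool,-\nEpstein,0.8585"

# Without na.strings, the "-" forces the whole column to a non-numeric type
bad <- read.csv(text = raw, stringsAsFactors = TRUE)

# With na.strings = "-", the column is read as numeric with an NA
good <- read.csv(text = raw, na.strings = "-")

# Repairing an existing factor column: go through character first.
# as.numeric() on the factor itself would return the internal level codes.
fixed <- suppressWarnings(as.numeric(as.character(bad$X2_only)))

str(bad)
str(good)
print(fixed)
```

On the column names: read.table() and read.csv() default to check.names = TRUE, which turns names such as 1_only into syntactically valid ones by prepending an X. Passing check.names = FALSE keeps the original names.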
