Suppressing data from a graph in R


I have a dataset, d, that contains personally identifiable data. Every value that has been suppressed is replaced with an X:
column1 column2 column3
* FSM X
* Male 2.5
* Female X
A FSM 6
A Male 10.3
A Female 11.7
B FSM 14.8
B Male 21.5
B Female 25.3
I want to plot this with an X above the bars in a bar plot, where data has been suppressed, such as:
My code is:
p <- ggplot(d, aes(x=column1, y=column3, fill=column2)) +
geom_bar(position=position_dodge(), stat="identity", colour="black") +
geom_text(aes(label=column2), position=position_dodge(width=0.9), vjust=-.5) +
scale_y_continuous("Percentage", breaks=seq(0, max(d$column3), 2))
But of course, it can't plot 'X' on the graph and says:
Error: Discrete value supplied to continuous scale
How can I get the bar plotting to ignore the 'X' and still add the label if it's present?
Data dump:
structure(list(column1 = structure(c(1L, 1L, 1L, 2L, 2L, 2L,
3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L), .Label = c("*",
"A", "B", "C", "D", "E", "U"), class = "factor"), column2 = structure(c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L), .Label = c("FSM", "Male", "Female"), class = "factor"),
column3 = structure(c(21L, 1L, 2L, 18L, 3L, 4L, 7L, 12L,
14L, 16L, 15L, 13L, 10L, 9L, 8L, 11L, 6L, 5L, 20L, 19L, 17L
), .Label = c("1.93889541715629", "1.97444831591173", "10.1057579318449",
"11.7305458768873", "12.7758420441347", "14.4535840188014",
"14.8471615720524", "18.5830429732869", "19.9764982373678",
"20.0873362445415", "20.9606986899563", "21.5628672150411",
"24.1579558652729", "25.3193960511034", "25.7931844888367",
"29.2576419213974", "5.45876887340302", "6.11353711790393",
"6.16921269095182", "6.98689956331878", "X"), class = "factor")), .Names = c("column1",
"column2", "column3"), row.names = c(NA, -21L), class = "data.frame")
I'm happy to print out 0 instances where there are 0 instances. Where data has been suppressed, though, I want to make that clear by printing an 'X', even though the bar will also show 0 instances.

First convert the height to numeric which gives NA for censored values. Then create a label column based on that. Then you need a column of zeroes for the y coordinate of the labels.
> d$column3=as.numeric(as.character(d$column3))
Warning message:
NAs introduced by coercion
> d$column4 = ifelse(is.na(d$column3),"X","")
> d$y=0
Then:
> p <- ggplot(d, aes(x=column1, y=column3, fill=column2))
> p + geom_bar(position=position_dodge(), stat="identity",
colour="black") +
geom_text(aes(label=column4,x=column1,y=y),
position=position_dodge(width=1), vjust=-0.5)
Giving:
It's a variant on labelling a geom_bar with the value of the bar. Almost a dupe.
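The as.character() step in the conversion above matters because column3 is a factor: calling as.numeric() on a factor directly returns the internal level codes rather than the printed values. A minimal base R sketch of the conversion and the label column, using a small made-up vector (the values here are illustrative, not the full dataset):

```r
# Toy stand-in for d$column3: a factor that mixes numbers and the marker "X"
column3 <- factor(c("X", "2.5", "X", "6", "10.3"))

# Pitfall: this returns the integer level codes, not the values
codes <- as.integer(column3)

# Go through character first; "X" becomes NA (the coercion warning is expected)
height <- suppressWarnings(as.numeric(as.character(column3)))

# Label is "X" where the value was suppressed, empty otherwise
label <- ifelse(is.na(height), "X", "")

print(data.frame(column3, codes, height, label))
```

The height column is what the bars are drawn from (NA draws no bar), and label is what geom_text() prints.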

Related

ggarrange() function overwrites the color of my boxplots

I am making two boxplots and want to arrange them beside each other. I have made each of them look like I want when displaying them separately but when I use ggarrange() the colors disappear. This is my code for the plots:
BOX1_data <- read.table(file = "clipboard",
sep = "\t", header=TRUE)
BOX1_data$Diagnosis <- as.factor(BOX1_data$Diagnosis)
BOX1plot <- ggplot(BOX1_data, aes(x=Diagnosis, y=No.Variants, fill= Diagnosis)) + geom_boxplot() +
scale_fill_brewer(palette = "Dark2") +
scale_x_discrete(labels = c("AC\nN=38", "SqCC\nN=15", "SCLC\nN=8", "BL disease\nN=16"))
BOX2_data <- read.table(file = "clipboard",
sep = "\t", header=TRUE)
BOX2_data$Stage <- as.factor(BOX2_data$Stage)
BOX2plot <- ggplot(BOX2_data, aes(x=Stage, y=No.Variants, fill = Stage)) + geom_boxplot(width = 0.4) +
scale_fill_brewer(palette = "Dark2") +
scale_x_discrete(labels = c("Stage I-III\nN=24", "Stage IV\nN=37"))
To arrange the plots I then write:
BOX_list <- list(BOX1plot, BOX2plot)
ggarrange(plotlist = BOX_list, labels = c('A', 'B'), ncol = 2)
The easiest way of getting rid of gridlines etc I thought was by using theme_set() and I think that this might be my problem.
My code is:
theme_set(theme_bw() + theme(panel.border = element_blank(), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), panel.background = element_blank(),
axis.line = element_line(colour = "grey")))
I realize that theme_bw() overwrites my colors in the boxes. But I have tried removing it, switching it for theme_transparent() (this removes all my labels) and neither works. I have searched for a way of just adding a transparency to my boxes in the theme so that my colors will shine through. I am also suspicious that maybe the palette that I chose might give me the same colors in the two plots which I also do not want. To add, if it matters, I have 4 groups in the first plot and 2 in the second.
dput(BOX1_data)
structure(list(Diagnosis = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("1", "2", "3", "4"), class = "factor"),
No.Variants = c(3L, 4L, 6L, 14L, 3L, 3L, 4L, 3L, 3L, 3L,
8L, 6L, 22L, 10L, 6L, 9L, 1L, 9L, 3L, 4L, 8L, 2L, 13L, 3L,
11L, 19L, 5L, 5L, 3L, 12L, 4L, 2L, 4L, 18L, 8L, 7L, 7L, 12L,
4L, 1L, 6L, 3L, 2L, 8L, 10L, 3L, 15L, 9L, 13L, 13L, 15L,
10L, 10L, 12L, 6L, 3L, 12L, 9L, 15L, 10L, 18L, 3L, 6L, 3L,
6L, 1L, 3L, 3L, 7L, 1L, 2L, 10L, 7L, 7L, 1L, 0L, 2L)), row.names = c(NA,
-77L), class = "data.frame")
dput(BOX2_data)
structure(list(No.Variants = c(3L, 4L, 6L, 14L, 3L, 3L, 4L, 3L,
3L, 3L, 8L, 6L, 22L, 10L, 6L, 9L, 1L, 9L, 3L, 4L, 8L, 2L, 13L,
3L, 11L, 19L, 5L, 5L, 3L, 12L, 4L, 2L, 4L, 18L, 8L, 7L, 7L, 12L,
4L, 1L, 6L, 3L, 2L, 8L, 10L, 3L, 15L, 9L, 13L, 13L, 15L, 10L,
10L, 12L, 6L, 3L, 12L, 9L, 15L, 10L, 18L), Stage = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1",
"2"), class = "factor")), row.names = c(NA, -61L), class = "data.frame")
Grateful for any tips!

As already pointed out, the OP's issue with theme_set() removing the fill colors set in the two plots seems to have been solved by updating to a newer version of ggplot2. Here is a solution for the second part of the OP's question, which was clarified in the comments and is reproduced here for convenience:
Now it is just the problem that I want the palette to continue on the second plot's boxes and not restart so that I will get different colors on all boxes.
To do this, note that there are 4 fill colors in the first plot, BOX1plot, and 2 fill colors in BOX2plot. For BOX1plot we want the palette to begin at its first color, but for BOX2plot we want it to start at the 5th color. There's no way to do this through the scale_*_brewer() functions, so the approach here is to get the Brewer palette from RColorBrewer::brewer.pal(), work out where each plot's colors begin and end from the number of levels of each factor, and pass those colors to scale_fill_manual().
You can just "know" that you need to "use colors 1-4" for BOX1plot and "use color 5 and 6" for BOX2plot; however, it is much more elegant to just calculate this automatically based on the number of levels (in case you want to run this again). The code below does this:
library(ggplot2)
library(ggpubr)
library(RColorBrewer)
# ... read in your data as before
# create factors (as OP did before)
BOX1_data$Diagnosis <- as.factor(BOX1_data$Diagnosis)
BOX2_data$Stage <- as.factor(BOX2_data$Stage)
# make color palette based on Brewer "Dark2" palette
lev_diag <- length(levels(BOX1_data$Diagnosis))
lev_stage <- length(levels(BOX2_data$Stage))
lev_total <- lev_diag + lev_stage
my_colors <- brewer.pal(lev_total, "Dark2")
BOX1plot <- ggplot(BOX1_data, aes(x=Diagnosis, y=No.Variants, fill= Diagnosis)) + geom_boxplot() +
scale_fill_manual(values=my_colors[1:lev_diag]) +
scale_x_discrete(labels = c("AC\nN=38", "SqCC\nN=15", "SCLC\nN=8", "BL disease\nN=16"))
BOX2plot <- ggplot(BOX2_data, aes(x=Stage, y=No.Variants, fill = Stage)) + geom_boxplot(width = 0.4) +
scale_fill_manual(values = my_colors[(lev_diag+1):lev_total]) +
scale_x_discrete(labels = c("Stage I-III\nN=24", "Stage IV\nN=37"))
BOX_list <- list(BOX1plot, BOX2plot)
ggarrange(plotlist = BOX_list, labels = c('A', 'B'), ncol = 2)
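The index arithmetic above generalizes to any number of plots. Below is a minimal base R sketch; split_palette() is a hypothetical helper written for illustration (it is not part of ggplot2 or RColorBrewer), and hcl.colors() is used only as a stand-in colour source so the snippet runs without extra packages.

```r
# Hypothetical helper: cut one colour vector into consecutive slices,
# one slice per plot, sized by each plot's number of factor levels.
split_palette <- function(colors, n_levels) {
  stopifnot(all(n_levels >= 1), sum(n_levels) <= length(colors))
  ends <- cumsum(n_levels)            # last index of each slice
  starts <- ends - n_levels + 1       # first index of each slice
  Map(function(s, e) colors[s:e], starts, ends)
}

# Stand-in palette (assumption: any character vector of colours works here)
my_colors <- grDevices::hcl.colors(6, "Dark 2")

pal <- split_palette(my_colors, c(diag = 4, stage = 2))
pal$diag   # colours 1-4, e.g. for scale_fill_manual(values = pal$diag)
pal$stage  # colours 5-6, e.g. for scale_fill_manual(values = pal$stage)
```

Keep in mind that Brewer's "Dark2" palette has at most 8 colours and that brewer.pal() expects n of at least 3, so the total number of levels across plots has to stay within that range.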

If you have issues with ggarrange(), I would suggest the following approach using patchwork:
library(ggplot2)
library(patchwork)
#Data format
BOX1_data$Diagnosis <- as.factor(BOX1_data$Diagnosis)
#Plot 1
BOX1plot <- ggplot(BOX1_data, aes(x=Diagnosis, y=No.Variants, fill= Diagnosis)) + geom_boxplot() +
scale_fill_brewer(palette = "Dark2") +
scale_x_discrete(labels = c("AC\nN=38", "SqCC\nN=15", "SCLC\nN=8", "BL disease\nN=16"))
#Data format
BOX2_data$Stage <- as.factor(BOX2_data$Stage)
#Plot 2
BOX2plot <- ggplot(BOX2_data, aes(x=Stage, y=No.Variants, fill = Stage)) + geom_boxplot(width = 0.4) +
scale_fill_brewer(palette = "Dark2") +
scale_x_discrete(labels = c("Stage I-III\nN=24", "Stage IV\nN=37"))
#Arrange plots
BOX1plot+BOX2plot+plot_annotation(tag_levels = 'A')
The output:

How to change the order and color scheme of stacked bar charts using ggplot2?

I am trying to get these ordered so that Space is stacked on top of Time and then order them in ascending order of Time. I also want to be able to pick the colors for each stack.
Any help would be appreciated! Thanks a lot!
Data below:
structure(list(Beg = structure(c(20L, 19L, 18L, 15L, 1L, 3L,
6L, 10L, 13L, 8L, 5L, 11L, 9L, 7L, 2L, 4L, 17L, 16L, 14L, 12L,
20L, 19L, 18L, 15L, 1L, 3L, 6L, 10L, 13L, 8L, 5L, 11L, 9L, 7L,
2L, 4L, 17L, 16L, 14L, 12L), .Label = c("a", "b", "c", "d", "e",
"f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r",
"s", "t"), class = "factor"), Cat = structure(c(2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L), .Label = c("Time", "Space"), class = "factor"),
Count = c(7824.92, 1006.79, 3570.93, 1484.5, 2885.32, 4194.84,
4348.94, 3603.31, 4826.33, 2225.49, 3350.02, 3778.35, 2698.51,
2247.01, 1705.17, 4742.72, 15231.15, 14083.26, 4437.68, 3109.09,
18875.45, 25816.95, 20836.93, 25501.53, 23996.55, 19427.12,
21467.89, 22472.71, 9876.27, 9548.99, 22171.83, 21179.33,
23358.26, 24763.62, 24551.94, 16726.11, 10691.68, 10537.26,
18012.88, 21453.15)), row.names = c(NA, -40L), class = "data.frame")

Adding to @NotThatKindODr's answer, you can order the bars in ascending order of time by reordering them with the fct_reorder function from the forcats package:
library(dplyr)
library(forcats)
df <- df %>%
mutate(Cat = fct_rev(Cat),
Beg = fct_reorder(Beg, Count, max, .desc = T))
ggplot(df, aes(x = Beg, y = Count, fill = Cat)) +
geom_col() +
ggtitle("All Stuff") +
theme_classic() +
coord_flip()
Which gives:

Essentially all you need to do is reverse the factor levels of Cat. Here I used the forcats package. Note that your data is called df in this code:
library(forcats)
library(dplyr)
df %>%
mutate(Cat = forcats::fct_rev(Cat)) %>%
ggplot() +
geom_col(aes(Beg, Count, fill = Cat)) +
ggtitle("All Stuff") +
coord_flip() +
theme_classic()
To pick the colors, add the following to the plot like any other ggplot layer, substituting "color1" and "color2" with the colors of your choice:
scale_fill_manual(values = c("color1", "color2"))
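If the bars should be ordered strictly by the Time count (rather than by a summary over both categories), the factor levels can also be set by hand in base R. This is a minimal sketch on made-up toy data; the column names match the question, but the values and the three Beg levels are illustrative assumptions.

```r
# Toy data with the same columns as the question (values are made up)
df <- data.frame(
  Beg   = factor(c("a", "b", "c", "a", "b", "c")),
  Cat   = factor(rep(c("Space", "Time"), each = 3), levels = c("Time", "Space")),
  Count = c(5, 1, 9, 30, 10, 20)
)

# Order the Beg levels by the Count of the Time rows only (ascending)
time_rows <- df[df$Cat == "Time", ]
beg_order <- as.character(time_rows$Beg[order(time_rows$Count)])
df$Beg <- factor(df$Beg, levels = beg_order)

# Reverse the Cat levels; this is the base R equivalent of forcats::fct_rev()
df$Cat <- factor(df$Cat, levels = rev(levels(df$Cat)))

levels(df$Beg)
levels(df$Cat)
```

The reordered df can then be passed to the same ggplot() call as before, since ggplot2 draws discrete axes and stacks in factor-level order.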

Why isn't my barplot rearranging properly when faceting with ggplot?

So I have made this barplot with this code, bars organised in descending order, great!
na.omit(insect_tally_native_ranges)%>%
group_by(native_ranges)%>%
dplyr::summarise(freq=sum(n))%>%
ggplot(aes(x=reorder(native_ranges,freq),y=freq))+
geom_col(color="#CD4F39",fill="#CD4F39",alpha=0.8)+
coord_flip()+
labs(x="Native ranges",
y="Number of invasive insect arrivals",
title="Species by native ranges")+
theme_minimal()
And now I wanted to do the same but faceting by a variable called Period, here's the code:
ggplot(native_freq_period,
aes(y=reorder(native_ranges,freq),x=freq))+
geom_barh(stat= "identity",
color="#CD4F39",
fill="#CD4F39",
alpha=0.8)+
labs(x="Native ranges",
y="Number of invasive insect arrivals",
title="Species by native ranges")+
theme_minimal()+
facet_wrap(~Period)
But the plot came out like this:
Which is pretty annoying because it is the same code as above and the levels for the variable native_ranges should be organised again. But instead it gives me this lumpy order that isn't even the alphabetic order. So the reorder part is reordering but not by freq! Don't understand.
Here is the data:
structure(list(native_ranges = structure(c(6L, 10L, 11L, 7L,
3L, 5L, 1L, 1L, 8L, 6L, 3L, 5L, 2L, 4L, 5L, 7L, 7L, 7L, 8L, 9L,
11L), .Label = c("Afrotropic", "Afrotropic/Neotropic", "Australasia",
"Australasia/Neotropic", "Indomalaya", "Nearctic", "Neotropic",
"Neotropic/Nearctic", "Neotropic/Nearctic/Australasia", "Palearctic",
"Palearctic/Indomalaya"), class = "factor"), Period = structure(c(4L,
4L, 4L, 4L, 4L, 4L, 3L, 4L, 4L, 3L, 3L, 3L, 4L, 4L, 2L, 1L, 2L,
3L, 2L, 4L, 3L), .Label = c("1896-1925", "1926-1955", "1956-1985",
"1986-2018"), class = "factor"), freq = c(21L, 13L, 12L, 11L,
10L, 10L, 4L, 4L, 4L, 3L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L)), row.names = c(NA, -21L), class = c("grouped_df", "tbl_df",
"tbl", "data.frame"), vars = "native_ranges", drop = TRUE, indices = list(
6:7, 12L, c(4L, 10L), 13L, c(5L, 11L, 14L), c(0L, 9L), c(3L,
15L, 16L, 17L), c(8L, 18L), 19L, 1L, c(2L, 20L)), group_sizes = c(2L,
1L, 2L, 1L, 3L, 2L, 4L, 2L, 1L, 1L, 2L), biggest_group_size = 4L, labels = structure(list(
native_ranges = structure(1:11, .Label = c("Afrotropic",
"Afrotropic/Neotropic", "Australasia", "Australasia/Neotropic",
"Indomalaya", "Nearctic", "Neotropic", "Neotropic/Nearctic",
"Neotropic/Nearctic/Australasia", "Palearctic", "Palearctic/Indomalaya"
), class = "factor")), row.names = c(NA, -11L), class = "data.frame", vars = "native_ranges", drop = TRUE))

You have to arrange the order of the variable first before plotting. Since you didn't provide any reproducible data I am using the following data
drugs <- data.frame(drug = c("a", "b", "c"), effect = c(4.2, 9.7, 6.1))
ggplot(drugs, aes(drug, effect)) +
geom_col()
Now to change the order of the variable use factor
drugs$drug <- factor(drugs$drug,levels = c("b","a","c")) #This is the order I want
ggplot(drugs, aes(drug, effect)) +
geom_col()
Here I provided the levels in factor manually. You can either provide them manually or sort the order of the variable first separately and provide. See below,
drugs$drug <- factor(drugs$drug,levels = drugs[order(drugs$effect),]$drug)
ggplot(drugs, aes(drug, effect)) +
geom_col()
This should work with facet_wrap as well.

OK, finally figured it out with help from the other answer. You need to create another column that summarizes the total frequency so you can then reorder by that column. There may be a more efficient way to do it, but I create a new summary data.frame and then join it back to the original and then reorder based on the new column.
summary_data <- data %>%
ungroup() %>%
group_by(native_ranges) %>%
summarize(total = sum(freq))
data <- data %>%
left_join(summary_data)
ggplot(data, aes(y = reorder(native_ranges, total),x = freq)) +
geom_barh(stat= "identity",
color="#CD4F39",
fill="#CD4F39",
alpha=0.8) +
labs(x="Native ranges",
y="Number of invasive insect arrivals",
title="Species by native ranges") +
theme_minimal()+
facet_wrap(~Period)
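One likely reason the original reorder(native_ranges, freq) looked wrong after faceting is that reorder() summarizes with FUN = mean by default. Once each range has several rows (one per Period), the levels are ordered by the mean freq per range, not the total. A small base R sketch with made-up numbers shows the difference, and also how ave() can add the per-range total without a join:

```r
# Toy data: several rows per native range, as after splitting by Period
d <- data.frame(
  native_ranges = c("Nearctic", "Nearctic", "Neotropic",
                    "Neotropic", "Neotropic", "Palearctic"),
  freq = c(21, 3, 11, 1, 1, 14)
)

# Default: levels ordered by the MEAN of freq within each range
lv_mean <- levels(reorder(d$native_ranges, d$freq))

# Ordered by the TOTAL freq per range instead
lv_sum <- levels(reorder(d$native_ranges, d$freq, FUN = sum))

# Alternative to the summarize + left_join step: per-range total as a column
d$total <- ave(d$freq, d$native_ranges, FUN = sum)

lv_mean
lv_sum
```

In the plot this corresponds to using reorder(native_ranges, freq, FUN = sum) inside aes(), which should give the same ordering as reordering by the joined total column.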

Problems passing arguments to ggplot in R script

I'm trying to figure out what I'm doing wrong passing arguments to ggplot. I've come a long way with existing posts, but have hit a wall here. Probably something stupid, but here goes (I'm leaving out some of the plot formatting since that is not where the problem is):
melted data set "lagres" is the same in both scenarios.
> str(lagres)
'data.frame': 30 obs. of 4 variables:
$ ST : Factor w/ 3 levels
$ year : Factor w/ 6 levels
$ variable: Factor w/ 2 levels
$ value : num
The first plotting call works great:
ggplot(lagres, aes(quarter, value, group = interaction(ERTp, variable), linetype = variable, color = ERTp, shape = variable ))
Trying to convert this to accept arguments and be re-used in a for-loop script does NOT work, even though the structure is really the same:
timevar <- "quarter"
grpvar <- "ERTp"
fplot <- function(lagres, timevar, grpvar, ylb, tlb){
plot <- ggplot(lagres, aes_string(x=timevar, y="value", group = interaction("variable", grpvar), linetype = "variable", color = grpvar, shape = "variable")) +
geom_line(size = 0.5) + geom_point(size = 3) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) + labs(y = ylb) +
ggtitle(paste(tlb, grpvar, today, sep = ", ")) +
theme(plot.title = element_text(lineheight = .8, face = "bold", hjust = 0.5))
plot
}
fplot(lagres, timevar, grpvar)
Error: geom_path: If you are using dotted or dashed lines, colour,
size and linetype must be constant over the line
The problem seems to lie with the "linetype" arg, as removing this results in an appropriate graph in terms of values/colors, but the lines connected wrong and obviously no separate line for each variable/grp.
Trying to analyze the problem further by looking at the structure of the argument, it looks like aes() and aes_string() parse the group interaction differently. Maybe this is the problem. Parsing the "aes()" formulation with raw variables, I get:
> str(aes(quarter, value, group = interaction(ERTp, variable), linetype = variable, color = ERTp, shape = variable ))
List of 6
$ x : symbol quarter
$ y : symbol value
$ group : language interaction(ERTp, variable)
$ linetype: symbol variable
$ colour : symbol ERTp
$ shape : symbol variable
Then, the "aes_string()" method with referenced arguments:
> str(aes_string(timevar, "value", group = interaction(grpvar, "variable"), linetype = "variable", color = grpvar, shape = "variable" ))
List of 6
$ group : Factor w/ 1 level "ST.variable": 1
$ linetype: symbol variable
$ colour : symbol ST
$ shape : symbol variable
$ x : symbol quarter
$ y : symbol value
So, having the group be either a "language interaction" vs. a 1-level factor, would make a difference? Can't figure out what to do about that parsing issue so the group interaction comes out properly. Saw somewhere that "paste()" could be used, but, no, that does not work. Passing ALL arguments (thus, no quoted text in the aes_string() formula) does not help either.
> dput(lagres)
structure(list(ST = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 3L, 1L, 2L,
3L, 2L, 3L, 1L, 3L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("GeraghtyM",
"Other", "WeenJ"), class = "factor"), quarter = structure(c(1L,
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L,
6L, 7L, 7L, 7L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L,
6L, 7L, 7L, 7L), .Label = c("2015-Q2", "2015-Q3", "2015-Q4",
"2016-Q1", "2016-Q2", "2016-Q3", "2016-Q4"), class = "factor"),
variable = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("ScanLag",
"TPADoorToLag"), class = "factor"), value = c(45.3333333333333,
60.2857142857143, 37.6, 0, 51.375, 95.4166666666667, 26.8,
42.75, 200, 28, 134, 68.2941176470588, 29, 42.8, 140.7, 0,
49.2222222222222, 103.833333333333, 0, 20.125, 0, 67.75,
48, 87, 93, 78, 49.5, 55, 65.6, 83, 59, 54, 153, 114, 111,
83, 8.66666666666667)), .Names = c("ST", "quarter", "variable",
"value"), row.names = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 24L, 25L,
26L, 27L, 29L, 30L, 31L, 33L, 35L, 36L, 37L, 38L, 39L, 40L, 41L,
42L), class = "data.frame", na.action = structure(c(22L, 23L,
28L, 32L, 34L), .Names = c("22", "23", "28", "32", "34"), class = "omit"))

aes_string isn't reading the interaction code that you are using. One way to avoid this is to simply make a new "interaction" variable in your dataset within the function prior to plotting.
For example:
fplot <- function(lagres, timevar, grpvar){
lagres$combine = interaction(lagres[["variable"]], lagres[[grpvar]])
plot <- ggplot(lagres, aes_string(x=timevar, y="value",
group = "combine", linetype = "variable",
color = grpvar, shape = "variable")) +
geom_line(size = 0.5) +
geom_point(size = 3)
plot
}
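The one-level factor seen in the str() output comes from R evaluating interaction() immediately on the two strings, before aes_string() ever sees it. A minimal base R sketch of the difference, using a tiny made-up data frame (the ST and variable values are illustrative):

```r
grpvar <- "ST"

# Evaluated right away on two length-one strings: a single-level factor
wrong <- interaction("variable", grpvar)
levels(wrong)

# Toy data with the same column names as lagres
lagres <- data.frame(
  ST       = c("A", "A", "B", "B"),
  variable = c("x", "y", "x", "y")
)

# Build the grouping column from the actual data columns instead
lagres$combine <- interaction(lagres[["variable"]], lagres[[grpvar]])
levels(lagres$combine)
```

With the combined column in the data, group = "combine" is a plain column name that aes_string() can handle. Note that aes_string() has been soft-deprecated since ggplot2 3.0.0; in newer code the same idea is usually written with aes() and the .data pronoun, for example .data[[grpvar]].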

`ddply` fails to apply logistic regression (GLM) by group to my dataset

I'm working out the LD50 (lethal dosage) for multiple populations from different experiments using the MASS package. It's simple enough when I subset the data and do one at a time, but I'm getting an error when I use ddply. Essentially I need an LD50 for each population at each temperature.
My data looks somewhat like this:
# dput(d)
d <- structure(list(Pop = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L), .Label = c("a", "b", "c"), class = "factor"), Temp = structure(c(1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("high", "low"), class = "factor"),
Dose = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L,
1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), Dead = c(0L,
11L, 12L, 14L, 2L, 16L, 17L, 7L, 5L, 3L, 17L, 15L, 9L, 20L,
8L, 19L, 7L, 2L, 20L, 14L, 9L, 15L, 1L, 15L), Alive = c(20L,
9L, 8L, 6L, 18L, 4L, 3L, 13L, 15L, 17L, 3L, 5L, 11L, 0L,
12L, 1L, 13L, 18L, 0L, 6L, 11L, 5L, 19L, 5L)), .Names = c("Pop",
"Temp", "Dose", "Dead", "Alive"), class = "data.frame", row.names = c(NA,
-24L))
The following works fine:
d$Mortality <- cbind(d$Alive, d$Dead)
a <- d[d$Pop=="a" & d$Temp=="high",]
library(MASS)
dose.p(glm(Mortality ~ Dose, family="binomial", data=a), p=0.5)[1]
But when I put this into ddply I get the following error:
library(plyr)
d$index <- paste(d$Pop, d$Temp, sep="_")
ddply(d, 'index', function(x) dose.p(glm(Mortality~Dose, family="binomial", data=x), p=0.5)[1])
Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1
I can get the right LD50 when I use a proportion but can't figure out where I've gone wrong with my approach (and had already written this question).

This may surprise you, but if you use the formula
cbind(Alive, Dead) ~ Dose
instead of
Mortality ~ Dose
the problem will be gone.
library(MASS)
library(plyr)
## `d` is as your `dput` result
## a function to apply
f <- function(x) {
fit <- glm(cbind(Alive, Dead) ~ Dose, family = "binomial", data = x)
dose.p(fit, p=0.5)[[1]]
}
## call `ddply`
ddply(d, .(Pop, Temp), f)
# Pop Temp V1
#1 a high 2.6946257
#2 a low 2.1834099
#3 b high 2.5000000
#4 b low 0.4830998
#5 c high 2.2899553
#6 c low 2.5000000
So what happened with Mortality ~ Dose? Let's set .inform = TRUE when calling ddply:
## `d` is as your `dput` result
d$Mortality <- cbind(d$Alive, d$Dead)
## a function to apply
g <- function(x) {
fit <- glm(Mortality ~ Dose, family = "binomial", data = x)
dose.p(fit, p=0.5)[[1]]
}
## call `ddply`
ddply(d, .(Pop, Temp), g, .inform = TRUE)
#Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1
#Error: with piece 1:
# Pop Temp Dose Dead Alive Mortality
#1 a high 1 0 20 20
#2 a high 2 11 9 9
#3 a high 3 12 8 8
#4 a high 4 14 6 6
Now we see that the variable Mortality has lost its dimension, and only the first column (Alive) is retained. For a glm with a binomial response, if the response is a single vector, glm expects 0-1 binary values or a factor with two levels. Here we have the integers 20, 9, 8, 6, ..., hence glm complains:
Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1
I have not found a way to make Mortality ~ Dose work inside ddply. I tried protecting the matrix column with I():
d$Mortality <- I(cbind(d$Alive, d$Dead))
but it still ends up with the same failure.
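For comparison, the same per-group fit can be sketched in base R with split() and sapply(), keeping cbind(Alive, Dead) inside the formula so no matrix column has to be stored in the data frame. This sketch makes two assumptions: it uses only the rows for populations a and b at high temperature from the question's data, and it computes the LD50 directly as -intercept/slope, which is the dose at which the fitted logit is 0 (p = 0.5), instead of calling MASS::dose.p().

```r
# Subset of the question's data: Pop a and b at Temp "high"
d <- data.frame(
  Pop   = rep(c("a", "b"), each = 4),
  Temp  = "high",
  Dose  = rep(1:4, times = 2),
  Dead  = c(0, 11, 12, 14, 5, 3, 17, 15),
  Alive = c(20, 9, 8, 6, 15, 17, 3, 5)
)

# LD50 for one group: dose where the fitted probability is 0.5
ld50 <- function(x) {
  fit <- glm(cbind(Alive, Dead) ~ Dose, family = binomial, data = x)
  b <- coef(fit)
  unname(-b[1] / b[2])
}

# One piece per Pop/Temp combination, then one LD50 per piece
pieces <- split(d, list(d$Pop, d$Temp), drop = TRUE)
res <- sapply(pieces, ld50)
res
```

The result is a named numeric vector with one entry per group (names of the form "a.high"), which can be turned into a data frame if needed.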
