Problems passing arguments to ggplot in R script

I'm trying to figure out what I'm doing wrong passing arguments to ggplot. I've come a long way with existing posts, but have hit a wall here. Probably something stupid, but here goes (I'm leaving out some of the plot formatting since that is not where the problem is):
The melted data set "lagres" is the same in both scenarios:
> str(lagres)
'data.frame': 30 obs. of 4 variables:
$ ST : Factor w/ 3 levels
$ year : Factor w/ 6 levels
$ variable: Factor w/ 2 levels
$ value : num
The first plotting call works great:
ggplot(lagres, aes(quarter, value, group = interaction(ERTp, variable), linetype = variable, color = ERTp, shape = variable ))
Trying to convert this to accept arguments and be re-used in a for-loop script does NOT work, even though the structure is really the same:
timevar <- "quarter"
grpvar <- "ERTp"
fplot <- function(lagres, timevar, grpvar, ylb, tlb){
plot <- ggplot(lagres, aes_string(x=timevar, y="value", group = interaction("variable", grpvar), linetype = "variable", color = grpvar, shape = "variable")) +
geom_line(size = 0.5) + geom_point(size = 3) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) + labs(y = ylb) +
ggtitle(paste(tlb, grpvar, today, sep = ", ")) +
theme(plot.title = element_text(lineheight = .8, face = "bold", hjust = 0.5))
plot
}
fplot(lagres, timevar, grpvar)
Error: geom_path: If you are using dotted or dashed lines, colour, size and linetype must be constant over the line
The problem seems to lie with the "linetype" argument: removing it gives a graph with the right values and colors, but the lines are connected wrongly and, obviously, there is no separate line for each variable/group.
Looking further at the structure of the arguments, it seems aes() and aes_string() handle the group interaction differently. Maybe this is the problem. With the aes() formulation and raw variable names, I get:
> str(aes(quarter, value, group = interaction(ERTp, variable), linetype = variable, color = ERTp, shape = variable ))
List of 6
$ x : symbol quarter
$ y : symbol value
$ group : language interaction(ERTp, variable)
$ linetype: symbol variable
$ colour : symbol ERTp
$ shape : symbol variable
Then, the "aes_string()" method with referenced arguments:
> str(aes_string(timevar, "value", group = interaction(grpvar, "variable"), linetype = "variable", color = grpvar, shape = "variable" ))
List of 6
$ group : Factor w/ 1 level "ST.variable": 1
$ linetype: symbol variable
$ colour : symbol ST
$ shape : symbol variable
$ x : symbol quarter
$ y : symbol value
So could it make a difference that group is a "language" object (the call interaction(ERTp, variable)) in one case and a 1-level factor in the other? I can't figure out how to get around this so the group interaction comes out properly. I saw somewhere that paste() could be used, but no, that does not work. Passing ALL arguments as variables (so there is no quoted text in the aes_string() call) does not help either.
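The 1-level factor in the second str() output can be reproduced without ggplot2 at all. Because interaction() is an ordinary function call in the argument list, R evaluates it before aes_string() ever sees it, and it operates on the two character strings themselves rather than on columns of lagres. A minimal base-R sketch, assuming grpvar = "ST" as in the str() output:

```r
# What aes_string(group = interaction(grpvar, "variable")) actually receives:
# interaction() runs first, on two length-1 character vectors, not on columns.
grpvar <- "ST"
g <- interaction(grpvar, "variable")

print(g)          # a single value whose only level is "ST.variable"
print(nlevels(g)) # 1
```

Every row therefore gets the same constant group, so one path has to carry both linetypes, which is what the geom_path error complains about.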
> dput(lagres)
structure(list(ST = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 3L, 1L, 2L,
3L, 2L, 3L, 1L, 3L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("GeraghtyM",
"Other", "WeenJ"), class = "factor"), quarter = structure(c(1L,
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L,
6L, 7L, 7L, 7L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L,
6L, 7L, 7L, 7L), .Label = c("2015-Q2", "2015-Q3", "2015-Q4",
"2016-Q1", "2016-Q2", "2016-Q3", "2016-Q4"), class = "factor"),
variable = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("ScanLag",
"TPADoorToLag"), class = "factor"), value = c(45.3333333333333,
60.2857142857143, 37.6, 0, 51.375, 95.4166666666667, 26.8,
42.75, 200, 28, 134, 68.2941176470588, 29, 42.8, 140.7, 0,
49.2222222222222, 103.833333333333, 0, 20.125, 0, 67.75,
48, 87, 93, 78, 49.5, 55, 65.6, 83, 59, 54, 153, 114, 111,
83, 8.66666666666667)), .Names = c("ST", "quarter", "variable",
"value"), row.names = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 24L, 25L,
26L, 27L, 29L, 30L, 31L, 33L, 35L, 36L, 37L, 38L, 39L, 40L, 41L,
42L), class = "data.frame", na.action = structure(c(22L, 23L,
28L, 32L, 34L), .Names = c("22", "23", "28", "32", "34"), class = "omit"))

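aes_string() isn't reading the interaction code the way you expect: interaction("variable", grpvar) is evaluated on the two strings themselves, giving a constant 1-level factor (as your str() output shows) instead of a mapping to columns. One way around this is to create a new interaction variable in your dataset inside the function, before plotting.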
For example:
library(ggplot2)

fplot <- function(lagres, timevar, grpvar){
  # Build the grouping factor from the actual columns, not from the strings
  lagres$combine <- interaction(lagres[["variable"]], lagres[[grpvar]])
  plot <- ggplot(lagres, aes_string(x = timevar, y = "value",
                                    group = "combine", linetype = "variable",
                                    color = grpvar, shape = "variable")) +
    geom_line(size = 0.5) +
    geom_point(size = 3)
  plot
}
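On current ggplot2 versions aes_string() is deprecated in favour of tidy evaluation inside aes(). A sketch of the same function using the .data pronoun keeps interaction() inside the mapping, so it is evaluated on the columns of the data. The toy data and the name fplot_tidy are hypothetical stand-ins for lagres and fplot; ggplot_build() is used only to count the groups the plot ends up with.

```r
library(ggplot2)

# Hypothetical toy data with the same column roles as `lagres`.
toy <- expand.grid(
  quarter  = factor(c("2015-Q2", "2015-Q3", "2015-Q4")),
  ST       = factor(c("GeraghtyM", "Other", "WeenJ")),
  variable = factor(c("ScanLag", "TPADoorToLag"))
)
toy$value <- seq_len(nrow(toy))

# Column names arrive as strings; .data[[...]] looks them up inside aes(),
# so interaction() is evaluated on columns, one group per variable x grpvar.
fplot_tidy <- function(df, timevar, grpvar) {
  ggplot(df, aes(
    x        = .data[[timevar]],
    y        = value,
    group    = interaction(variable, .data[[grpvar]]),
    linetype = variable,
    colour   = .data[[grpvar]],
    shape    = variable
  )) +
    geom_line() +
    geom_point(size = 3)
}

p <- fplot_tidy(toy, "quarter", "ST")

# Count the groups in the line layer: 3 ST levels x 2 variables.
built <- ggplot_build(p)
n_groups <- length(unique(built$data[[1]]$group))
```

In a for-loop this is called exactly like the original, e.g. fplot_tidy(lagres, timevar, grpvar).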


ggarrange() function overwrites the color of my boxplots

I am making two boxplots and want to arrange them beside each other. Each of them looks the way I want when displayed separately, but when I use ggarrange() the colors disappear. This is my code for the plots:
BOX1_data <- read.table(file = "clipboard",
sep = "\t", header=TRUE)
BOX1_data$Diagnosis <- as.factor(BOX1_data$Diagnosis)
BOX1plot <- ggplot(BOX1_data, aes(x=Diagnosis, y=No.Variants, fill= Diagnosis)) + geom_boxplot() +
scale_fill_brewer(palette = "Dark2") +
scale_x_discrete(labels = c("AC\nN=38", "SqCC\nN=15", "SCLC\nN=8", "BL disease\nN=16"))
BOX2_data <- read.table(file = "clipboard",
sep = "\t", header=TRUE)
BOX2_data$Stage <- as.factor(BOX2_data$Stage)
BOX2plot <- ggplot(BOX2_data, aes(x=Stage, y=No.Variants, fill = Stage)) + geom_boxplot(width = 0.4) +
scale_fill_brewer(palette = "Dark2") +
scale_x_discrete(labels = c("Stage I-III\nN=24", "Stage IV\nN=37"))
To arrange the plots I then write:
BOX_list <- list(BOX1plot, BOX2plot)
ggarrange(plotlist = BOX_list, labels = c('A', 'B'), ncol = 2)
I thought the easiest way to get rid of gridlines etc. was to use theme_set(), and I think this might be my problem.
My code is:
theme_set(theme_bw() + theme(panel.border = element_blank(), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), panel.background = element_blank(),
axis.line = element_line(colour = "grey")))
I realize that theme_bw() overwrites the fill colors of my boxes. But I have tried removing it, and switching it for theme_transparent() (which removes all my labels), and neither works. I have searched for a way to just add transparency to my boxes in the theme so that my colors show through. I also suspect that the palette I chose might give me the same colors in both plots, which I also do not want. In case it matters, I have 4 groups in the first plot and 2 in the second.
dput(BOX1_data)
structure(list(Diagnosis = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("1", "2", "3", "4"), class = "factor"),
No.Variants = c(3L, 4L, 6L, 14L, 3L, 3L, 4L, 3L, 3L, 3L,
8L, 6L, 22L, 10L, 6L, 9L, 1L, 9L, 3L, 4L, 8L, 2L, 13L, 3L,
11L, 19L, 5L, 5L, 3L, 12L, 4L, 2L, 4L, 18L, 8L, 7L, 7L, 12L,
4L, 1L, 6L, 3L, 2L, 8L, 10L, 3L, 15L, 9L, 13L, 13L, 15L,
10L, 10L, 12L, 6L, 3L, 12L, 9L, 15L, 10L, 18L, 3L, 6L, 3L,
6L, 1L, 3L, 3L, 7L, 1L, 2L, 10L, 7L, 7L, 1L, 0L, 2L)), row.names = c(NA,
-77L), class = "data.frame")
dput(BOX2_data)
structure(list(No.Variants = c(3L, 4L, 6L, 14L, 3L, 3L, 4L, 3L,
3L, 3L, 8L, 6L, 22L, 10L, 6L, 9L, 1L, 9L, 3L, 4L, 8L, 2L, 13L,
3L, 11L, 19L, 5L, 5L, 3L, 12L, 4L, 2L, 4L, 18L, 8L, 7L, 7L, 12L,
4L, 1L, 6L, 3L, 2L, 8L, 10L, 3L, 15L, 9L, 13L, 13L, 15L, 10L,
10L, 12L, 6L, 3L, 12L, 9L, 15L, 10L, 18L), Stage = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1",
"2"), class = "factor")), row.names = c(NA, -61L), class = "data.frame")
Grateful for any tips!
As already pointed out, the OP's issue with theme_set() removing the fill colors in the two plots seems to have been solved by updating to a newer version of ggplot2. Here is a solution for the second part of the OP's question (clarified in the comments), quoted here for convenience:
Now it is just the problem that I want the palette to continue on the second plot's boxes and not restart so that I will get different colors on all boxes.
To do this, note that BOX1plot needs 4 fill colors and BOX2plot needs 2. For BOX1plot the palette should start at its first color, but for BOX2plot it should start at the 5th color. There's no way to set such an offset through the scale_*_brewer() functions, so the approach here is to get the Brewer palette from RColorBrewer::brewer.pal() and pass the right slice of it to scale_fill_manual(), choosing where each slice begins and ends from the number of levels of each factor.
You could simply "know" that you need colors 1-4 for BOX1plot and colors 5 and 6 for BOX2plot; however, it is more elegant to calculate this automatically from the number of levels (in case you want to run this again). The code below does this:
library(ggplot2)
library(ggpubr)
library(RColorBrewer)
# ... read in your data as before
# create factors (as OP did before)
BOX1_data$Diagnosis <- as.factor(BOX1_data$Diagnosis)
BOX2_data$Stage <- as.factor(BOX2_data$Stage)
# make color palette based on Brewer "Dark2" palette
lev_diag <- length(levels(BOX1_data$Diagnosis))
lev_stage <- length(levels(BOX2_data$Stage))
lev_total <- lev_diag + lev_stage
my_colors <- brewer.pal(lev_total, "Dark2")
BOX1plot <- ggplot(BOX1_data, aes(x=Diagnosis, y=No.Variants, fill= Diagnosis)) + geom_boxplot() +
scale_fill_manual(values=my_colors[1:lev_diag]) +
scale_x_discrete(labels = c("AC\nN=38", "SqCC\nN=15", "SCLC\nN=8", "BL disease\nN=16"))
BOX2plot <- ggplot(BOX2_data, aes(x=Stage, y=No.Variants, fill = Stage)) + geom_boxplot(width = 0.4) +
scale_fill_manual(values = my_colors[(lev_diag+1):lev_total]) +
scale_x_discrete(labels = c("Stage I-III\nN=24", "Stage IV\nN=37"))
BOX_list <- list(BOX1plot, BOX2plot)
ggarrange(plotlist = BOX_list, labels = c('A', 'B'), ncol = 2)
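If more than two plots need to share one palette without repeating colors, the same offset arithmetic can be wrapped in a small helper. split_palette() is a hypothetical name, not part of RColorBrewer; it assumes the palette has enough colors in total (Dark2 has 8) and stops with an error otherwise.

```r
library(RColorBrewer)

# Hypothetical helper: hand out consecutive, non-overlapping slices of one
# Brewer palette, one slice per plot, sized by that plot's number of levels.
split_palette <- function(n_levels, palette = "Dark2") {
  total <- sum(n_levels)
  max_n <- brewer.pal.info[palette, "maxcolors"]
  if (total > max_n) {
    stop(sprintf("Palette '%s' has only %d colors, %d needed", palette, max_n, total))
  }
  # brewer.pal() needs n >= 3, so ask for at least 3 and keep the first `total`.
  cols <- brewer.pal(max(total, 3), palette)[seq_len(total)]
  # Tag each color with the index of the plot it belongs to, then split.
  split(cols, rep(seq_along(n_levels), times = n_levels))
}

# 4 Diagnosis levels for the first plot, 2 Stage levels for the second.
pal <- split_palette(c(4, 2))
```

Each plot then takes its own slice, e.g. scale_fill_manual(values = pal[[1]]) for BOX1plot and scale_fill_manual(values = pal[[2]]) for BOX2plot.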
If you have issues with ggarrange(), I would suggest the following approach using patchwork:
library(ggplot2)
library(patchwork)
#Data format
BOX1_data$Diagnosis <- as.factor(BOX1_data$Diagnosis)
#Plot 1
BOX1plot <- ggplot(BOX1_data, aes(x=Diagnosis, y=No.Variants, fill= Diagnosis)) + geom_boxplot() +
scale_fill_brewer(palette = "Dark2") +
scale_x_discrete(labels = c("AC\nN=38", "SqCC\nN=15", "SCLC\nN=8", "BL disease\nN=16"))
#Data format
BOX2_data$Stage <- as.factor(BOX2_data$Stage)
#Plot 2
BOX2plot <- ggplot(BOX2_data, aes(x=Stage, y=No.Variants, fill = Stage)) + geom_boxplot(width = 0.4) +
scale_fill_brewer(palette = "Dark2") +
scale_x_discrete(labels = c("Stage I-III\nN=24", "Stage IV\nN=37"))
#Arrange plots
BOX1plot+BOX2plot+plot_annotation(tag_levels = 'A')

Perform an ACF plot for each group in R

Here is my data (a small part of it):
transport<- structure(list(date = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
11L, 12L), .Label = c("01.01.2001", "01.02.2001", "01.03.2001",
"01.04.2001", "01.05.2001", "01.06.2001", "01.07.2001", "01.08.2001",
"01.09.2001", "01.10.2001", "01.11.2001", "01.12.2001"), class = "factor"),
Market_82 = c(7000L, 7272L, 7668L, 7869L, 8057L, 8428L, 8587L,
8823L, 8922L, 9178L, 9306L, 9439L, 3725L, 4883L, 8186L, 7525L,
6335L, 4252L, 5642L, 1326L, 8605L, 3501L, 1944L, 7332L),
transport = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L
), .Label = c("plane", "train"), class = "factor")), .Names = c("date",
"Market_82", "transport"), class = "data.frame", row.names = c(NA,
-24L))
The grouping variable is transport.
For each type of transport I need an ACF plot of its time series.
How can I produce an ACF plot for each transport type?
I have a lot of groups. How can I save the plots to this folder:
C:/Users/admin/Documents/myplot
akrun's answer is spot on. Since you tagged the question with ggplot2, you could also use ggAcf() from the forecast package.
The first step is to split your data.
library(ggplot2)  # for labs()
transport_split <- split(transport, transport$transport)
If you want to include the respective element of column transport in the title, subtitle etc. try with Map
out <- Map(
f = function(x, y)
forecast::ggAcf(x$Market_82) + labs(title = y),
x = transport_split,
y = names(transport_split)
)
out$train
We can do this with Acf() from forecast:
library(forecast)
par(mfrow = c(2, 1))
lapply(split(transport['Market_82'], transport$transport), Acf)
If we also want a title on each plot:
lst <- lapply(split(transport['Market_82'], transport$transport), acf, plot = FALSE)
par(mfrow = c(2, 1))
lapply(names(lst), function(x) plot(lst[[x]], main = x))
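To save one plot per group into a folder, as the question asks, a sketch with base graphics devices: open a file device, draw the ACF plot, close the device, once per group. It rebuilds only the two columns used from the question's data and assumes the rows are already in date order within each group. out_dir points to a temporary folder here and would be replaced by C:/Users/admin/Documents/myplot; pdf() is used because it works without a display, and png() follows the same pattern if raster images are preferred.

```r
# Data from the question (date column omitted; rows assumed in date order).
transport <- data.frame(
  Market_82 = c(7000, 7272, 7668, 7869, 8057, 8428, 8587, 8823, 8922, 9178, 9306, 9439,
                3725, 4883, 8186, 7525, 6335, 4252, 5642, 1326, 8605, 3501, 1944, 7332),
  transport = factor(rep(c("plane", "train"), each = 12))
)

# Temporary folder for this sketch; swap in "C:/Users/admin/Documents/myplot".
out_dir <- file.path(tempdir(), "myplot")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# One numeric series per transport type.
series <- split(transport$Market_82, transport$transport)

for (g in names(series)) {
  # Open a file device, draw the ACF plot into it, then close the file.
  pdf(file.path(out_dir, paste0("acf_", g, ".pdf")), width = 7, height = 5)
  acf(series[[g]], main = paste("ACF of Market_82:", g))
  dev.off()
}

saved <- list.files(out_dir, pattern = "^acf_.*\\.pdf$")
```

With many groups the loop does not change; each level of transport simply produces one more file.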

Why isn't my barplot rearranging properly when faceting with ggplot?

So I have made this barplot with this code, bars organised in descending order, great!
na.omit(insect_tally_native_ranges)%>%
group_by(native_ranges)%>%
dplyr::summarise(freq=sum(n))%>%
ggplot(aes(x=reorder(native_ranges,freq),y=freq))+
geom_col(color="#CD4F39",fill="#CD4F39",alpha=0.8)+
coord_flip()+
labs(x="Native ranges",
y="Number of invasive insect arrivals",
title="Species by native ranges")+
theme_minimal()
Now I want to do the same but facet by a variable called Period. Here's the code:
library(ggstance)  # provides geom_barh()
ggplot(native_freq_period,
aes(y=reorder(native_ranges,freq),x=freq))+
geom_barh(stat= "identity",
color="#CD4F39",
fill="#CD4F39",
alpha=0.8)+
labs(x="Native ranges",
y="Number of invasive insect arrivals",
title="Species by native ranges")+
theme_minimal()+
facet_wrap(~Period)
But in the faceted plot the bars come out in a different order. This is pretty annoying, because it is the same code as above and the levels of native_ranges should be ordered the same way again. Instead I get a lumpy order that isn't even alphabetical. So reorder() is reordering, but not by freq! I don't understand why.
Here is the data:
structure(list(native_ranges = structure(c(6L, 10L, 11L, 7L,
3L, 5L, 1L, 1L, 8L, 6L, 3L, 5L, 2L, 4L, 5L, 7L, 7L, 7L, 8L, 9L,
11L), .Label = c("Afrotropic", "Afrotropic/Neotropic", "Australasia",
"Australasia/Neotropic", "Indomalaya", "Nearctic", "Neotropic",
"Neotropic/Nearctic", "Neotropic/Nearctic/Australasia", "Palearctic",
"Palearctic/Indomalaya"), class = "factor"), Period = structure(c(4L,
4L, 4L, 4L, 4L, 4L, 3L, 4L, 4L, 3L, 3L, 3L, 4L, 4L, 2L, 1L, 2L,
3L, 2L, 4L, 3L), .Label = c("1896-1925", "1926-1955", "1956-1985",
"1986-2018"), class = "factor"), freq = c(21L, 13L, 12L, 11L,
10L, 10L, 4L, 4L, 4L, 3L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L)), row.names = c(NA, -21L), class = c("grouped_df", "tbl_df",
"tbl", "data.frame"), vars = "native_ranges", drop = TRUE, indices = list(
6:7, 12L, c(4L, 10L), 13L, c(5L, 11L, 14L), c(0L, 9L), c(3L,
15L, 16L, 17L), c(8L, 18L), 19L, 1L, c(2L, 20L)), group_sizes = c(2L,
1L, 2L, 1L, 3L, 2L, 4L, 2L, 1L, 1L, 2L), biggest_group_size = 4L, labels = structure(list(
native_ranges = structure(1:11, .Label = c("Afrotropic",
"Afrotropic/Neotropic", "Australasia", "Australasia/Neotropic",
"Indomalaya", "Nearctic", "Neotropic", "Neotropic/Nearctic",
"Neotropic/Nearctic/Australasia", "Palearctic", "Palearctic/Indomalaya"
), class = "factor")), row.names = c(NA, -11L), class = "data.frame", vars = "native_ranges", drop = TRUE))
You have to set the order of the variable's levels before plotting. To illustrate, I am using the following toy data:
drugs <- data.frame(drug = c("a", "b", "c"), effect = c(4.2, 9.7, 6.1))
ggplot(drugs, aes(drug, effect)) +
geom_col()
Now to change the order of the variable use factor
drugs$drug <- factor(drugs$drug,levels = c("b","a","c")) #This is the order I want
ggplot(drugs, aes(drug, effect)) +
geom_col()
Here I provided the levels in factor manually. You can either provide them manually or sort the order of the variable first separately and provide. See below,
drugs$drug <- factor(drugs$drug,levels = drugs[order(drugs$effect),]$drug)
ggplot(drugs, aes(drug, effect)) +
geom_col()
This should work with facet_wrap as well.
OK, I finally figured it out with help from the other answer. By default reorder() ranks the levels by the mean of freq, so a native range that appears in several periods is not ranked by its total. You need to create another column holding the total frequency per native range and then reorder by that column. There may be a more efficient way to do it, but I create a new summary data.frame, join it back to the original, and then reorder based on the new column.
library(dplyr)
library(ggstance)  # provides geom_barh()

summary_data <- data %>%
  ungroup() %>%
  group_by(native_ranges) %>%
  summarize(total = sum(freq))

data <- data %>%
  left_join(summary_data, by = "native_ranges")

ggplot(data, aes(y = reorder(native_ranges, total), x = freq)) +
  geom_barh(stat = "identity",
            color = "#CD4F39",
            fill = "#CD4F39",
            alpha = 0.8) +
  labs(x = "Number of invasive insect arrivals",
       y = "Native ranges",
       title = "Species by native ranges") +
  theme_minimal() +
  facet_wrap(~Period)
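An alternative sketch that skips the join: reorder() takes a FUN argument (mean by default), so the totals can be computed inline with FUN = sum. The small data frame below is hypothetical, shaped like the question's data, with range "A" appearing in two periods.

```r
# Hypothetical counts: "A" appears in two periods, "B" and "C" in one each.
df <- data.frame(
  native_ranges = factor(c("A", "A", "B", "C")),
  Period        = c("p1", "p2", "p1", "p2"),
  freq          = c(4, 4, 6, 5)
)

# Default FUN = mean: per-range means are A = 4, C = 5, B = 6
by_mean <- levels(reorder(df$native_ranges, df$freq))

# FUN = sum: per-range totals are C = 5, B = 6, A = 8
by_sum <- levels(reorder(df$native_ranges, df$freq, FUN = sum))
```

In the faceted plot this would be aes(y = reorder(native_ranges, freq, FUN = sum), x = freq), which should order the bars by total arrivals just like the unfaceted version.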

Why does ggtern distort my data?

I can't work out why the data points in my ternary diagram appear distorted. It is particularly visible on the FeO2 axis, where none of the values approaching 50% seem to plot correctly. Does ggtern require some data transformation, or am I missing something?
The dataset:
KiDaSm<-structure(list(Site = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Dakawa", "Fukuchani",
"Kilwa", "Mkokotoni", "Tumbe Chwaka", "Unguja Ukuu"), class = "factor"),
Sample = structure(c(7L, 8L, 9L, 10L, 11L, 14L, 15L, 16L,
17L, 19L, 20L, 21L, 23L, 24L, 25L, 26L), .Label = c("EB005",
"EB008", "EB009", "EB017", "EB018", "EB023", "EB028", "EB030",
"EB033", "EB034", "EB035", "EB036", "EB037", "EB038", "EB040",
"EBDAK002", "EBDAK006", "EBDAK007", "EBDAK009", "EBDAK012",
"EBDAK014", "EBDAK015", "EBDAK017", "EBDAK020", "EBDAK021",
"EBDAK022", "FKCH002", "FKCH003", "FKCH005", "FKCH006", "FKCH008",
"FKCH009", "FKCH010", "FKCH012", "FKCH014", "FKCH015", "FKCH016",
"FKCH017", "FKCH018", "FKCH019", "FKCH023", "MKK002", "MKK003",
"MKK007", "MKK009", "MKK011", "MKK013", "MKK014", "MKK017",
"MKK018", "MKK020", "MKK06", "TBCH001", "TBCH002", "TBCH003",
"TBCH005", "TBCH007", "TBCH008", "TBCH009", "TBCH010", "TBCH011",
"TBCH014", "TBCH017", "TBCH018", "TBCH021", "TBCH022", "UU001",
"UU003", "UU004", "UU005", "UU007", "UU008", "UU010", "UU011",
"UU012", "UU014", "UU018", "UU020", "UU022", "UU023", "UU026",
"UU031", "UU033"), class = "factor"), ID = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L
), .Label = c("ND", "Smelting", "Smithing"), class = "factor"),
Iron = c(52.2866002788889, 57.437955161, 55.880450631, 50.213473286,
53.068958017, 55.776340727, 56.764639409, 61.37738424, 75.741474131,
75.459980082, 69.785922113, 76.298245515, 75.860464737, 77.221978734,
76.602317775, 67.582636787), Aluminium = c(8.07348620588889,
6.9369729006, 6.4314347298, 7.7061493869, 7.3254949831, 7.2108549156,
7.2113019865, 8.2022565362, 4.570137602, 4.3668232665, 5.8538177888,
4.5660791632, 4.2671637947, 4.727287541, 4.7084385736, 6.0287010895
), Silicon = c(24.6786504477778, 22.516695383, 24.261662172,
26.81463386, 25.558654883, 23.062108874, 23.144722305, 26.480492462,
17.138349267, 16.917779397, 19.620246624, 16.265818105, 17.628059944,
15.696017597, 15.786928218, 22.04500569)), .Names = c("Site",
"Sample", "ID", "Iron", "Aluminium", "Silicon"), row.names = c(NA,
-16L), class = "data.frame")
My code:
library(ggtern)
ggtern(KiDaSm, aes(Iron,Silicon, Aluminium, color=Site, shape=Site )) + geom_point() +
labs(x = expression(FeO[2]), y=expression(SiO[2]), z=expression(Al[2]*O[3])) +
scale_color_manual(values = c("#FFC300", "#FF5733")) +
theme_bw()
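One likely explanation, stated as an assumption rather than a confirmed answer: a ternary diagram shows each row after "closure", that is, rescaled so the three plotted components sum to 100%. Here Iron + Silicon + Aluminium is only about 85-87 for the first samples, so an Iron value of about 52 would be drawn at roughly 61%. A base-R sketch on the first three rows of KiDaSm shows the rescaled values:

```r
# First three samples of KiDaSm, copied from the question.
comp <- data.frame(
  Iron      = c(52.2866002788889, 57.437955161, 55.880450631),
  Silicon   = c(24.6786504477778, 22.516695383, 24.261662172),
  Aluminium = c(8.07348620588889, 6.9369729006, 6.4314347298)
)

# The three components sum to well under 100 for these samples.
raw_total <- rowSums(comp)

# Closure: rescale each row so the three parts sum to 100.
closed <- 100 * prop.table(as.matrix(comp), margin = 1)

round(cbind(raw_total, closed), 1)
```

If those closed values match where the points appear in the diagram, the plot is not distorted; it is showing proportions of the three plotted components only.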

Suppressing data from a graph in R

I have a dataset, d, that contains personally identifiable data, so the dataset has an X in place of every suppressed value:
column1 column2 column3
* FSM X
* Male 2.5
* Female X
A FSM 6
A Male 10.3
A Female 11.7
B FSM 14.8
B Male 21.5
B Female 25.3
I want to plot this as a bar plot with an X above the bars where data has been suppressed.
My code is:
p <- ggplot(d, aes(x=column1, y=column3, fill=column2)) +
geom_bar(position=position_dodge(), stat="identity", colour="black") +
geom_text(aes(label=column2),position= position_dodge(width=0.9), vjust=-.5) +
scale_y_continuous("Percentage",breaks=seq(0, max(d$column3), 2))
But of course, it can't plot 'X' on the graph and says:
Error: Discrete value supplied to continuous scale
How can I get the bar plotting to ignore the 'X' and still add the label if it's present?
Data dump:
structure(list(column1 = structure(c(1L, 1L, 1L, 2L, 2L, 2L,
3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L), .Label = c("*",
"A", "B", "C", "D", "E", "U"), class = "factor"), column2 = structure(c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L), .Label = c("FSM", "Male", "Female"), class = "factor"),
column3 = structure(c(21L, 1L, 2L, 18L, 3L, 4L, 7L, 12L,
14L, 16L, 15L, 13L, 10L, 9L, 8L, 11L, 6L, 5L, 20L, 19L, 17L
), .Label = c("1.93889541715629", "1.97444831591173", "10.1057579318449",
"11.7305458768873", "12.7758420441347", "14.4535840188014",
"14.8471615720524", "18.5830429732869", "19.9764982373678",
"20.0873362445415", "20.9606986899563", "21.5628672150411",
"24.1579558652729", "25.3193960511034", "25.7931844888367",
"29.2576419213974", "5.45876887340302", "6.11353711790393",
"6.16921269095182", "6.98689956331878", "X"), class = "factor")), .Names = c("column1",
"column2", "column3"), row.names = c(NA, -21L), class = "data.frame")
I'm happy to show 0 where there are 0 instances, but where data has been suppressed I want to make that clear by printing an 'X' (the bar itself will also show 0).
First convert the height to numeric, which gives NA for the censored values. Then create a label column based on that. You also need a column of zeroes for the y coordinate of the labels.
> d$column3=as.numeric(as.character(d$column3))
Warning message:
NAs introduced by coercion
> d$column4 = ifelse(is.na(d$column3),"X","")
> d$y=0
Then:
> p <- ggplot(d, aes(x=column1, y=column3, fill=column2))
> p + geom_bar(position=position_dodge(), stat="identity",
colour="black") +
geom_text(aes(label=column4,x=column1,y=y),
position=position_dodge(width=1), vjust=-0.5)
It's a variant of labelling a geom_bar with the value of the bar; almost a duplicate.
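A sketch of the same approach on current ggplot2, using geom_col() (equivalent to geom_bar(stat = "identity")) and one shared position_dodge(width = 0.9) object so the labels line up with the bars. The data frame is the first six rows of the table in the question, rebuilt in the same format with "X" marking suppressed values.

```r
library(ggplot2)

# First six rows of the question's table: "X" marks a suppressed value.
d <- data.frame(
  column1 = factor(rep(c("*", "A"), each = 3)),
  column2 = factor(rep(c("FSM", "Male", "Female"), times = 2),
                   levels = c("FSM", "Male", "Female")),
  column3 = factor(c("X", "2.5", "X", "6", "10.3", "11.7"))
)

# Numeric heights (NA where suppressed), a label column, and a zero baseline.
d$value <- suppressWarnings(as.numeric(as.character(d$column3)))
d$label <- ifelse(is.na(d$value), "X", "")
d$y0 <- 0

# One dodge object shared by bars and text keeps them aligned.
dodge <- position_dodge(width = 0.9)

p <- ggplot(d, aes(x = column1, y = value, fill = column2)) +
  geom_col(position = dodge, colour = "black", na.rm = TRUE) +
  geom_text(aes(y = y0, label = label), position = dodge, vjust = -0.5) +
  scale_y_continuous("Percentage")

# Inspect the text layer to confirm which labels were placed.
built <- ggplot_build(p)
text_labels <- built$data[[2]]$label
```

na.rm = TRUE only silences the warning about the suppressed rows; those bars are simply not drawn, and the "X" sits at the zero baseline in their place.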
