As the title says, I'm having some difficulty creating a graph with ggplot2 where paired data points are linked by connecting lines. I keep running into the error message "geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?" and no connecting lines are drawn. From searching online, it looks like most people who encounter this error message can fix it by setting a group aesthetic within geom_line, but this is not working for me, even though (as far as I can tell) I do have two observations per group.
Here is the graph I have so far
and here is the code I used to create the graph:
plot = ggplot(data = df2_mean, aes(x = type, y = mean, fill = type)) + geom_col() + facet_wrap(df2_mean$GH_JSI, strip.position = 'bottom') +
labs(x = element_blank(), y = "Seconds", title = "Cumulative Investigation Time", fill = element_blank()) +
theme(axis.text.x = element_blank(), axis.ticks = element_blank(), plot.title = element_text(hjust = 0.5)) +
geom_errorbar(ymin = df2_mean$mean - df2_mean$SEM, ymax = df2_mean$mean + df2_mean$SEM, width = 0.2) +
geom_point(data = df2, mapping = aes(x = df2$type, y = df2$value)) +
coord_cartesian(ylim = c(0, max(df2$value)))
plot
Here are the dataframes used to make the graph:
> df2
GH_JSI value type ID IDnumeric
1 GH 88.2216 social 1388035-1 13880351
2 GH 152.1190 social 1388034-1 13880341
3 GH 34.1675 social 1388033-2 13880332
4 GH 150.7840 social 1388034-2 13880342
5 GH 225.4590 social 1388033-3 13880333
6 GH 184.2180 social 1388035-3 13880353
7 GH 149.4160 social 1388033-4 13880334
8 GH 77.6443 social 1388034-4 13880344
9 GH 162.3290 social 1388033-1 13880331
10 GH 110.8780 social 1388036-1 13880361
11 GH 158.4250 social 1388036-2 13880362
12 GH 225.8930 social 1388035-2 13880352
13 GH 217.2840 social 1388036-3 13880363
14 GH 94.8282 social 1388034-3 13880343
15 GH 146.5800 social 1388035-4 13880354
16 JSI 151.6180 social 1302238-1 13022381
17 JSI 127.1270 social 1302235-1 13022351
18 JSI 108.5420 social 1302235-2 13022352
19 JSI 80.6140 social 1302259-2 13022592
20 JSI 185.4190 social 1302235-3 13022353
21 JSI 184.4510 social 1302259-3 13022593
22 JSI 210.8110 social 1302235-4 13022354
23 JSI 185.4190 social 1302259-4 13022594
24 JSI 105.5060 social 1302259-1 13022591
25 JSI 113.2130 social 1302305-1 13023051
26 JSI 193.0930 social 1302305-2 13023052
27 JSI 189.3890 social 1302238-2 13022382
28 JSI 138.9060 social 1302305-3 13023053
29 JSI 151.5180 social 1302238-4 13022384
30 JSI 165.6660 social 1302305-4 13023054
31 GH 122.7890 object 1388035-1 13880351
32 GH 77.4775 object 1388034-1 13880341
33 GH 34.8348 object 1388033-2 13880332
34 GH 126.6270 object 1388034-2 13880342
35 GH 66.2996 object 1388033-3 13880333
36 GH 71.0377 object 1388035-3 13880353
37 GH 112.7790 object 1388033-4 13880334
38 GH 114.9820 object 1388034-4 13880344
39 GH 102.0690 object 1388033-1 13880331
40 GH 43.9439 object 1388036-1 13880361
41 GH 50.8842 object 1388036-2 13880362
42 GH 106.0390 object 1388035-2 13880352
43 GH 46.0127 object 1388036-3 13880363
44 GH 57.4575 object 1388034-3 13880343
45 GH 143.0760 object 1388035-4 13880354
46 JSI 135.0680 object 1302238-1 13022381
47 JSI 54.2543 object 1302235-1 13022351
48 JSI 53.2533 object 1302235-2 13022352
49 JSI 142.2090 object 1302259-2 13022592
50 JSI 30.7975 object 1302235-3 13022353
51 JSI 32.5993 object 1302259-3 13022593
52 JSI 60.0934 object 1302235-4 13022354
53 JSI 57.5909 object 1302259-4 13022594
54 JSI 66.9336 object 1302259-1 13022591
55 JSI 89.5229 object 1302305-1 13023051
56 JSI 31.3981 object 1302305-2 13023052
57 JSI 75.3420 object 1302238-2 13022382
58 JSI 103.7700 object 1302305-3 13023053
59 JSI 133.3670 object 1302238-4 13022384
60 JSI 116.7830 object 1302305-4 13023054
Note that for each "social" observation, there is a corresponding "object" observation with the same ID. These are the data points I would like to link with a connecting line.
> df2_mean
GH_JSI mean SEM type
1 GH 145.21644 14.49010 social
2 GH 85.08726 9.04591 object
3 JSI 152.75280 10.10692 social
4 JSI 78.86549 10.05670 object
And finally, here is how I attempted to add the connecting lines:
df2$IDnumeric = as.numeric(as.character(gsub("-", "", df2$ID)))
plot + geom_line(data = df2, mapping = aes(x = type, y = value, group = IDnumeric))
I've tried playing around with the arguments to geom_line quite a bit, but to no avail. I'm new to ggplot, so maybe there's something fundamental I'm missing here. Any help would be greatly appreciated!
Edit:
Here is the dput output of the dataframes
> dput(df2)
structure(list(GH_JSI = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L), class = "factor", .Label = c("GH", "JSI"
)), value = c(88.2216, 152.119, 34.1675, 150.784, 225.459, 184.218,
149.416, 77.6443, 162.329, 110.878, 158.425, 225.893, 217.284,
94.8282, 146.58, 151.618, 127.127, 108.542, 80.614, 185.419,
184.451, 210.811, 185.419, 105.506, 113.213, 193.093, 189.389,
138.906, 151.518, 165.666, 122.789, 77.4775, 34.8348, 126.627,
66.2996, 71.0377, 112.779, 114.982, 102.069, 43.9439, 50.8842,
106.039, 46.0127, 57.4575, 143.076, 135.068, 54.2543, 53.2533,
142.209, 30.7975, 32.5993, 60.0934, 57.5909, 66.9336, 89.5229,
31.3981, 75.342, 103.77, 133.367, 116.783), type = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("object",
"social"), class = "factor"), ID = structure(c(24L, 20L, 17L,
21L, 18L, 26L, 19L, 23L, 16L, 28L, 29L, 25L, 30L, 22L, 27L, 5L,
1L, 2L, 9L, 3L, 10L, 4L, 11L, 8L, 12L, 13L, 6L, 14L, 7L, 15L,
24L, 20L, 17L, 21L, 18L, 26L, 19L, 23L, 16L, 28L, 29L, 25L, 30L,
22L, 27L, 5L, 1L, 2L, 9L, 3L, 10L, 4L, 11L, 8L, 12L, 13L, 6L,
14L, 7L, 15L), class = "factor", .Label = c("1302235-1", "1302235-2",
"1302235-3", "1302235-4", "1302238-1", "1302238-2", "1302238-4",
"1302259-1", "1302259-2", "1302259-3", "1302259-4", "1302305-1",
"1302305-2", "1302305-3", "1302305-4", "1388033-1", "1388033-2",
"1388033-3", "1388033-4", "1388034-1", "1388034-2", "1388034-3",
"1388034-4", "1388035-1", "1388035-2", "1388035-3", "1388035-4",
"1388036-1", "1388036-2", "1388036-3")), IDnumeric = c(13880351,
13880341, 13880332, 13880342, 13880333, 13880353, 13880334, 13880344,
13880331, 13880361, 13880362, 13880352, 13880363, 13880343, 13880354,
13022381, 13022351, 13022352, 13022592, 13022353, 13022593, 13022354,
13022594, 13022591, 13023051, 13023052, 13022382, 13023053, 13022384,
13023054, 13880351, 13880341, 13880332, 13880342, 13880333, 13880353,
13880334, 13880344, 13880331, 13880361, 13880362, 13880352, 13880363,
13880343, 13880354, 13022381, 13022351, 13022352, 13022592, 13022353,
13022593, 13022354, 13022594, 13022591, 13023051, 13023052, 13022382,
13023053, 13022384, 13023054)), row.names = c(NA, -60L), class = "data.frame")
...
> dput(df2_mean)
structure(list(GH_JSI = structure(c(1L, 1L, 2L, 2L), .Label = c("GH",
"JSI"), class = "factor"), mean = c(145.21644, 85.08726, 152.7528,
78.8654866666667), SEM = c(14.4901035586421, 9.04591008912605,
10.1069213744132, 10.0566997660923), type = structure(c(1L, 2L,
1L, 2L), .Label = c("social", "object"), class = "factor")), class = "data.frame", row.names = c(NA,
-4L))
It looks like you need to change your facet_wrap call to facet_wrap(~GH_JSI, strip.position = 'bottom'). As written, facet_wrap(df2_mean$GH_JSI, ...) says the faceting should be done on the GH_JSI column from df2_mean, which shares the same values as the column of the same name in df2 but is not the same data.
This code works for me on your sample data:
ggplot(data = df2_mean, aes(x = type, y = mean, fill = type)) +
geom_col() +
geom_path(data = df2, aes(x = type, y = value, group = IDnumeric)) +
facet_wrap(~GH_JSI, strip.position = 'bottom') +
...
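For comparison, here is a minimal self-contained sketch of the same pattern, assuming only ggplot2. The data frames pairs and means and their values are hypothetical stand-ins for df2 and df2_mean. Faceting uses a formula, so every layer is split by its own GH_JSI column, and the line layer groups by the pairing ID. The mean is stored in a column called value so that the point and line layers can reuse the global y mapping. geom_line() is used instead of geom_path(); with exactly two points per ID both should draw the same segment.

```r
library(ggplot2)

# Hypothetical paired data in the same long format as df2: one row per ID and type
pairs <- data.frame(
  GH_JSI = rep(c("GH", "GH", "JSI", "JSI"), times = 2),
  ID     = rep(c("g1", "g2", "j1", "j2"), times = 2),
  type   = factor(rep(c("social", "object"), each = 4), levels = c("social", "object")),
  value  = c(88, 152, 151, 127, 122, 77, 135, 54)
)

# Group means, playing the role of df2_mean; the mean is stored as "value"
# so that the later layers can inherit the same y mapping
means <- aggregate(value ~ GH_JSI + type, data = pairs, FUN = mean)

p <- ggplot(means, aes(x = type, y = value)) +
  geom_col(aes(fill = type)) +
  geom_point(data = pairs) +
  geom_line(data = pairs, aes(group = ID)) +   # one line per ID joins its two points
  facet_wrap(~GH_JSI, strip.position = "bottom") +
  labs(x = NULL, y = "Seconds", fill = NULL)

p
```

Inspecting layer_data(p, 3) should show four line groups with two rows each, i.e. no single-observation groups.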
I am given a big data set with several columns. As an example:
set.seed(1)
x <- 1:15
y <- letters[1:3][sample(1:3, 15, replace = TRUE)]
z <- letters[10:12][sample(1:3, 15, replace = TRUE)]
r <- LETTERS[1:3][sample(1:3, 15, replace = TRUE)]
df <- data.frame("Number"=x, "Area"=y, "Section"=z, "Rating"=r)
dput(df)
structure(list(Number = 1:15, Area = structure(c(1L, 2L, 2L, 3L, 1L, 3L, 3L, 2L, 2L, 1L, 1L, 1L, 3L, 2L, 3L), .Label = c("a", "b", "c"), class = "factor"), Section = structure(c(2L, 3L, 3L, 2L, 3L, 3L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 3L, 2L), .Label = c("j", "k", "l"), class = "factor"), Rating = structure(c(2L, 2L, 2L, 1L, 3L, 3L, 3L, 1L, 3L, 2L, 3L, 2L, 3L, 2L, 2L), .Label = c("A", "B", "C"), class = "factor")), class = "data.frame", row.names = c(NA,-15L))
I would now like to create frequency tables and graphs split by rating and a chosen category, e.g. given via a string:
library(plyr)
Category<-"Section"
data_count <- ddply(df, .(get(Category),Rating), 'count')
data_rel_freq <- ddply(data_count, .(Rating), transform, rel_freq = freq/sum(freq))
dput(data_rel_freq)
structure(list(get.Category. = structure(c(2L, 2L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 2L, 2L, 3L, 3L), .Label = c("j", "k","l"), class = "factor"), Number = c(4L, 8L, 10L, 12L, 1L, 15L, 2L, 3L, 14L, 7L, 9L, 11L, 13L, 5L, 6L), Area = structure(c(3L, 2L, 1L, 1L, 1L, 3L, 2L, 2L, 2L, 3L, 2L, 1L, 3L, 1L, 3L), .Label = c("a", "b", "c"), class = "factor"), Section = structure(c(2L, 2L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 2L, 2L, 3L, 3L), .Label = c("j", "k", "l"), class = "factor"), Rating = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor"), freq = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), rel_freq = c(0.5, 0.5, 0.142857142857143, 0.142857142857143, 0.142857142857143, 0.142857142857143, 0.142857142857143, 0.142857142857143, 0.142857142857143, 0.166666666666667, 0.166666666666667, 0.166666666666667, 0.166666666666667, 0.166666666666667, 0.166666666666667)), class = "data.frame", row.names = c(NA, -15L))
Using ggplot2:
library(ggplot2)
library(scales)
ggplot(data_rel_freq,aes(x = Rating, y = rel_freq,fill = get(Category)))+
geom_bar(position = "fill",stat = "identity",color="black") +
scale_y_continuous(labels = percent_format())+
labs(x = "Rating", y="Relative Frequency")
The issue is that get(Category) ends up as a new column, get.Category.:
get.Category. Number Area Section Rating freq rel_freq
1 k 4 c k A 1 0.5000000
2 k 8 b k A 1 0.5000000
3 j 10 a j B 1 0.1428571
4 j 12 a j B 1 0.1428571
5 k 1 a k B 1 0.1428571
6 k 15 c k B 1 0.1428571
7 l 2 b l B 1 0.1428571
Moreover, the Number column should be summed, i.e. the other categories (here: Area) should be dropped, and there should be just one row for Section "k" with Rating "A".
We can use count() to get the frequency of the column named in Category ('Section') by converting that string to a symbol with rlang::sym() and evaluating it with !!. Inside ggplot, aes() also accepts a symbol, evaluated the same way:
library(tidyverse)
library(scales)
library(ggplot2)
df %>%
count(!! rlang::sym(Category), Rating) %>%
group_by(Rating) %>%
mutate(rel_freq = n/sum(n)) %>%
ggplot(., aes(x =Rating, y = rel_freq, fill = !! rlang::sym(Category))) +
geom_bar(position = "fill",stat = "identity",color="black") +
scale_y_continuous(labels = percent_format())+
labs(x = "Rating", y="Relative Frequency")
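The answer counts rows, but the question also asks for Number to be summed while the other categories (Area) are dropped. A minimal sketch of that extra step, assuming dplyr >= 1.0 (for the .groups argument of summarise()): group by the column named in Category via rlang::sym() and !!, count rows, sum Number, then compute rel_freq within each Rating. The df here is rebuilt by hand from the dput of df above, as character columns rather than factors.

```r
library(dplyr)
library(rlang)

# df rebuilt from the dput above (character columns instead of factors)
df <- data.frame(
  Number  = 1:15,
  Area    = c("a", "b", "b", "c", "a", "c", "c", "b", "b", "a", "a", "a", "c", "b", "c"),
  Section = c("k", "l", "l", "k", "l", "l", "j", "k", "j", "j", "k", "j", "k", "l", "k"),
  Rating  = c("B", "B", "B", "A", "C", "C", "C", "A", "C", "B", "C", "B", "C", "B", "B"),
  stringsAsFactors = FALSE
)

Category <- "Section"

data_rel_freq <- df %>%
  # one row per Category x Rating; Area is dropped, Number is summed
  group_by(!!sym(Category), Rating) %>%
  summarise(freq = n(), Number = sum(Number), .groups = "drop") %>%
  # relative frequency of each category within its Rating
  group_by(Rating) %>%
  mutate(rel_freq = freq / sum(freq)) %>%
  ungroup()

data_rel_freq
```

Section "k" with Rating "A" then appears once, with freq 2 and Number 12 (rows 4 and 8). The result has the Rating, rel_freq and Section columns that the ggplot call in the answer maps, so it should be usable there as is.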
I have the following graph:
I would like to make what I thought would be a very simple change: remove the top, right and bottom sides of the border around the left facet labels.
How do I remove those lines, or draw the equivalent of the right-hand lines? I would rather not muck about with grobs if possible, but won't say no to any solution that works.
Graph code:
library(ggplot2)
library(dplyr)
library(forcats)
posthoc1 %>%
mutate(ordering = -as.numeric(Dataset) + Test.stat,
Species2 = fct_reorder(Species2, ordering, .desc = F)) %>%
ggplot(aes(x=Coef, y=Species2, reorder(Coef, Taxa), group=Species2, colour=Taxa)) +
geom_point(size=posthoc1$Test.stat*.25, show.legend = FALSE) +
ylab("") +
theme_classic(base_size = 20) +
facet_grid(Taxa~Dataset, scales = "free_y", space = "free_y", switch = "y") +
geom_vline(xintercept = 0) +
theme(axis.text.x=element_text(colour = "black"),
strip.placement = "outside",
strip.background.x=element_rect(color = NA, fill=NA),
strip.background.y=element_rect(color = "black", fill=NA)) +
coord_cartesian(clip = "off") +
scale_x_continuous(limits=NULL)
Data:
posthoc1 <- structure(list(Dataset = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 5L, 5L, 5L, 5L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L), .Label = c("All.habitat", "Aut.habitat", "Habitat.season",
"Lit.season", "Spr.habitat"), class = "factor"), Species = structure(c(1L,
2L, 3L, 5L, 6L, 10L, 11L, 12L, 13L, 1L, 3L, 5L, 6L, 13L, 1L,
2L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 13L), .Label = c("Ar.sp1",
"Ar.sp2", "Arc.sp1", "B.pus", "Dal.sp1.bumps", "Dip.unID", "I.palladium",
"Pale", "Ph.sp3", "Port", "Somethus", "sty", "Sty.sp1"), class = "factor"),
Species2 = structure(c(2L, 9L, 1L, 4L, 5L, 7L, 11L, 12L,
13L, 2L, 1L, 4L, 5L, 13L, 2L, 9L, 4L, 5L, 6L, 10L, 8L, 7L,
11L, 13L), .Label = c("Arcitalitrus sp1", "Armadillidae sp1 ",
"Brachyiulus pusillus ", "Dalodesmidae sp1", "Diplopoda",
"Isocladosoma pallidulum ", "Ommatoiulus moreleti ", "Philosciidae sp2",
"Porcellionidae sp1", "Siphonotidae sp2", "Somethus sp1",
"Styloniscidae ", "Styloniscidae sp1"), class = "factor"),
Taxa = structure(c(3L, 3L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
1L, 2L, 2L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 3L, 2L, 2L, 3L), .Label = c("Amphipoda",
"Diplopoda", "Isopoda"), class = "factor"), Variable = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Autumn", "Litter",
"Spring", "Summer"), class = "factor"), Coef = c(1.911502938,
2.086917154, 1.571872993, 12.61184801, 15.6161116, -1.430032837,
-12.51944478, 12.33934516, -8.040249562, 8.08258816, 1.780142396,
12.88982576, 16.78107544, -13.22641153, 1.68810887, 2.093965381,
12.27209197, 15.08328526, -6.334640911, -11.29985948, -11.62658947,
-1.676293808, -6.246555908, -3.470297147), SE = c(0.403497472,
2.21607562, 0.348600794, 2.423896379, 0.509468128, 3.423013791,
2.382857733, 1.775086895, 2.087788334, 2.23631504, 0.33402261,
2.518562443, 0.459720131, 1.950974996, 0.2476205, 0.235648095,
1.815155489, 0.325804415, 2.564680067, 2.437104984, 2.212583358,
2.677618401, 2.324019051, 0.420436743), Test.stat = c(18.36532749,
13.27324683, 13.29039037, 20.50277493, 44.06097153, 10.55234932,
14.64951518, 13.22575401, 20.16415411, 16.55627107, 11.81407568,
15.15213717, 40.67205188, 12.62233207, 37.60085488, 16.90879258,
20.20215107, 80.30520371, 13.35250626, 13.01692428, 17.52987519,
20.03658771, 12.02467914, 53.5052683)), row.names = 10:33, class = "data.frame")
This solution is based on grobs: find the positions of the "strip-l" (left strip) grobs in the plot's gtable layout, then substitute their rect grobs with line grobs.
p <- posthoc1 %>%
mutate(ordering = -as.numeric(Dataset) + Test.stat,
Species2 = fct_reorder(Species2, ordering, .desc = F)) %>%
ggplot(aes(x=Coef, y=Species2, reorder(Coef, Taxa), group=Species2, colour=Taxa)) +
geom_point(size=posthoc1$Test.stat*.25, show.legend = FALSE) +
ylab("") +
theme_classic(base_size = 20) +
facet_grid(Taxa~Dataset, scales = "free_y", space = "free_y", switch = "y") +
geom_vline(xintercept = 0) +
theme(axis.text.x=element_text(colour = "black"),
strip.placement = "outside",
#strip.background.x=element_rect(color = "white", fill=NULL),
strip.background.y=element_rect(color = NA)
) +
coord_cartesian(clip = "off") +
scale_x_continuous(limits=NULL)
library(grid)
q <- ggplotGrob(p)
lg <- linesGrob(x=unit(c(0,0),"npc"), y=unit(c(0,1),"npc"),
gp=gpar(col="red", lwd=4))
for (k in grep("strip-l",q$layout$name)) {
q$grobs[[k]]$grobs[[1]]$children[[1]] <- lg
}
grid.draw(q)
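The indexing q$grobs[[k]]$grobs[[1]]$children[[1]] assumes a particular nesting of the strip grobs, which may differ between ggplot2 versions. A slightly more defensive sketch walks each "strip-l" grob recursively and swaps any rect grob it finds for the line grob, under the assumption that the strip background is the only rect inside a left strip. It uses mtcars as stand-in data, and replace_rects() is a hypothetical helper, not part of ggplot2 or grid.

```r
library(ggplot2)
library(grid)

# Hypothetical helper: recursively swap every rect grob inside `g` for `replacement`.
# gtables keep their pieces in $grobs, ordinary gTrees in $children.
replace_rects <- function(g, replacement) {
  if (inherits(g, "rect")) return(replacement)
  if (inherits(g, "gtable")) {
    for (i in seq_along(g$grobs)) g$grobs[[i]] <- replace_rects(g$grobs[[i]], replacement)
  } else if (inherits(g, "gTree")) {
    for (i in seq_along(g$children)) g$children[[i]] <- replace_rects(g$children[[i]], replacement)
  }
  g
}

# Stand-in plot with the row strips switched to the left, as in the question
p <- ggplot(mtcars, aes(x = mpg, y = wt)) +
  geom_point() +
  facet_grid(cyl ~ ., switch = "y") +
  theme_classic() +
  theme(strip.placement = "outside",
        strip.background.y = element_rect(colour = "black", fill = NA))

q <- ggplotGrob(p)

# A single vertical line along the left edge of each strip (x = 1 would give the right edge)
lg <- linesGrob(x = unit(c(0, 0), "npc"), y = unit(c(0, 1), "npc"),
                gp = gpar(col = "black", lwd = 2))

strip_idx <- grep("^strip-l", q$layout$name)
for (k in strip_idx) q$grobs[[k]] <- replace_rects(q$grobs[[k]], lg)

grid.newpage()
grid.draw(q)
```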
I am trying to show the distribution of data between three different methods (FAP, One PIT (onetrans), Two PIT (twotrans), shown in facets below) for measuring forest fuels. The count on the y-axis is the number of sample points that estimate the grouped value on the x-axis (Total.kg.m2), which is a continuous variable. I don't particularly care how big the bins on the x-axis are, but I want only values that are exactly zero to appear above the "0" label. My current graph is misleading because there are no sample points that estimate "0" for the FAP method. Below is some example data and my code. How can I do this more effectively? My dataframe is called "cwd", but I have included a subset at the bottom.
The code for my current graph:
method_names <- c(`FAP` = "FAP", `onetrans` = "PIT - One Transect ", `twotrans` ="PIT - Two Transects")
ggplot(sampleData, aes(Total.kg.m2)) +
geom_histogram(bins=40, color = "black", fill = "white") +
theme_bw() +
theme(panel.grid.major = element_blank(), panel.grid.minor =
element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"),
legend.position = "none",axis.text=element_text(size=10), axis.title =
element_text(size = 12)) +
scale_x_continuous(name= expression("kg m"^"-2"), breaks =seq(0,16,1)) +
scale_y_continuous(name = "Count", breaks = seq(0, 80,10), limits= c(0,70)) +
facet_grid(.~method) +
facet_wrap(~method, ncol =1, labeller = as_labeller(method_names)) +
theme(strip.text.x = element_text(size =14),
strip.background = element_rect(color = "black", fill = "gray"))
I don't think geom_bar gets me what I want, and when I tried changing the binwidth to 0.05 in geom_histogram, the bins became too small. Essentially, I think I'm trying to change my data from continuous numeric to factors, but I'm not sure how to make it work.
Here is some sample data:
sampleData
Site Treatment Unit Plot Total.Tons.ac Total.kg.m2 method
130 Thinning CO 10 7 0.4500000 0.1008000 twotrans
351 Shelterwood CO 12 1 7.2211615 1.6175402 twotrans
88 Thinning NB 3 7 1.1400000 0.2553600 twotrans
224 Shelterwood NB 2 3 2.1136105 0.4734487 onetrans
54 Thinning SB 9 11 1.8857743 0.4224134 onetrans
74 Thinning SB 1 3 0.8500000 0.1904000 twotrans
328 Shelterwood DB 7 11 0.8740906 0.1957963 twotrans
341 Shelterwood CO 10 5 2.4210886 0.5423239 twotrans
266 Shelterwood WB 9 7 1.0092961 0.2260823 onetrans
405 Shelterwood WB 9 5 7.0029263 1.5686555 FAP
332 Shelterwood NB 8 7 2.8059152 0.6285250 twotrans
126 Thinning SB 9 11 1.4900000 0.3337600 twotrans
295 Shelterwood NB 2 5 7.6567281 1.7151071 twotrans
406 Shelterwood WB 9 7 3.0703135 0.6877502 FAP
179 Thinning FB 6 9 13.2916773 2.9773357 FAP
185 Thinning FB 7 9 5.3594318 1.2005127 FAP
39 Thinning FB 7 5 0.0000000 0.0000000 onetrans
187 Thinning NB 8 1 0.9477477 0.2122955 FAP
10 Thinning FB 2 7 0.0000000 0.0000000 onetrans
102 Thinning SB 5 11 0.0000000 0.0000000 twotrans
dput(sampleData)
structure(list(Site = structure(c(2L, 1L, 2L, 1L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label =
c("Shelterwood",
"Thinning"), class = "factor"), Treatment = structure(c(1L, 1L,
4L, 4L, 5L, 5L, 2L, 1L, 6L, 6L, 4L, 5L, 4L, 6L, 3L, 3L, 3L, 4L,
3L, 5L), .Label = c("CO", "DB", "FB", "NB", "SB", "WB"), class = "factor"),
Unit = c(10L, 12L, 3L, 2L, 9L, 1L, 7L, 10L, 9L, 9L, 8L, 9L,
2L, 9L, 6L, 7L, 7L, 8L, 2L, 5L), Plot = c(7L, 1L, 7L, 3L,
11L, 3L, 11L, 5L, 7L, 5L, 7L, 11L, 5L, 7L, 9L, 9L, 5L, 1L,
7L, 11L), Total.Tons.ac = c(0.45, 7.221161504, 1.14, 2.113610483,
1.885774282, 0.85, 0.874090569, 2.421088641, 1.009296069,
7.002926269, 2.805915201, 1.49, 7.656728085, 3.07031351,
13.29167729, 5.359431807, 0, 0.947747726, 0, 0), Total.kg.m2 = c(0.1008,
1.617540177, 0.25536, 0.473448748, 0.422413439, 0.1904, 0.195796287,
0.542323856, 0.22608232, 1.568655484, 0.628525005, 0.33376,
1.715107091, 0.687750226, 2.977335712, 1.200512725, 0, 0.212295491,
0, 0), method = structure(c(3L, 3L, 3L, 2L, 2L, 3L, 3L, 3L,
2L, 1L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 1L, 2L, 3L), .Label = c("FAP",
"onetrans", "twotrans"), class = "factor")), .Names = c("Site",
"Treatment", "Unit", "Plot", "Total.Tons.ac", "Total.kg.m2",
"method"), row.names = c(130L, 351L, 88L, 224L, 54L, 74L, 328L,
341L, 266L, 405L, 332L, 126L, 295L, 406L, 179L, 185L, 39L, 187L,
10L, 102L), class = "data.frame")
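No answer to this question is included here. One possible approach, along the lines of the "continuous to factors" idea above, is to bin Total.kg.m2 yourself so that exact zeros get a level of their own, then draw the factor with geom_bar. The helper zero_bins() and the bin width of 0.5 are hypothetical choices, the helper assumes non-negative values, and the two columns are copied from the dput above.

```r
library(ggplot2)

# Two columns copied from the dput of sampleData above
sampleData <- data.frame(
  Total.kg.m2 = c(0.1008, 1.617540177, 0.25536, 0.473448748, 0.422413439,
                  0.1904, 0.195796287, 0.542323856, 0.22608232, 1.568655484,
                  0.628525005, 0.33376, 1.715107091, 0.687750226, 2.977335712,
                  1.200512725, 0, 0.212295491, 0, 0),
  method = factor(c("twotrans", "twotrans", "twotrans", "onetrans", "onetrans",
                    "twotrans", "twotrans", "twotrans", "onetrans", "FAP",
                    "twotrans", "twotrans", "twotrans", "FAP", "FAP",
                    "FAP", "onetrans", "FAP", "onetrans", "twotrans"))
)

# Hypothetical helper: exact zeros get their own "0" level; positive values are
# binned into right-closed intervals (0, w], (w, 2w], ... (assumes x >= 0)
zero_bins <- function(x, width = 0.5) {
  upper <- max(width, width * ceiling(max(x) / width))
  brks <- c(-Inf, 0, seq(width, upper, by = width))
  n <- length(brks)
  labs <- c("0", paste0("(", brks[2:(n - 1)], ", ", brks[3:n], "]"))
  cut(x, breaks = brks, labels = labs, right = TRUE)
}

sampleData$bin <- zero_bins(sampleData$Total.kg.m2)

p <- ggplot(sampleData, aes(x = bin)) +
  geom_bar(colour = "black", fill = "white") +
  scale_x_discrete(name = expression("kg m"^"-2"), drop = FALSE) +  # keep empty bins
  facet_wrap(~method, ncol = 1) +
  theme_bw()

p
```

Because the x-axis is now discrete, the "0" bar can only contain exact zeros, so the FAP panel shows no bar there. The trade-off is that bin widths are fixed by the breaks rather than tuned by geom_histogram.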
I have a dataset, d, that contains personally identifiable data, so the dataset has an X in place of every value that is suppressed:
column1 column2 column3
* FSM X
* Male 2.5
* Female X
A FSM 6
A Male 10.3
A Female 11.7
B FSM 14.8
B Male 21.5
B Female 25.3
I want to plot this as a bar plot with an X above the bars where data has been suppressed.
My code is:
p <- ggplot(d, aes(x=column1, y=column3, fill=column2)) +
geom_bar(position=position_dodge(), stat="identity", colour="black") +
geom_text(aes(label=column2),position= position_dodge(width=0.9), vjust=-.5)
scale_y_continuous("Percentage",breaks=seq(0, max(d$column3), 2)))
But of course, it can't plot 'X' on the graph and says:
Error: Discrete value supplied to continuous scale
How can I get the bar plotting to ignore the 'X' and still add the label if it's present?
Data dump:
structure(list(column1 = structure(c(1L, 1L, 1L, 2L, 2L, 2L,
3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L), .Label = c("*",
"A", "B", "C", "D", "E", "U"), class = "factor"), column2 = structure(c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L), .Label = c("FSM", "Male", "Female"), class = "factor"),
column3 = structure(c(21L, 1L, 2L, 18L, 3L, 4L, 7L, 12L,
14L, 16L, 15L, 13L, 10L, 9L, 8L, 11L, 6L, 5L, 20L, 19L, 17L
), .Label = c("1.93889541715629", "1.97444831591173", "10.1057579318449",
"11.7305458768873", "12.7758420441347", "14.4535840188014",
"14.8471615720524", "18.5830429732869", "19.9764982373678",
"20.0873362445415", "20.9606986899563", "21.5628672150411",
"24.1579558652729", "25.3193960511034", "25.7931844888367",
"29.2576419213974", "5.45876887340302", "6.11353711790393",
"6.16921269095182", "6.98689956331878", "X"), class = "factor")), .Names = c("column1",
"column2", "column3"), row.names = c(NA, -21L), class = "data.frame")
I'm happy to show 0 where there are 0 instances, but where data has been suppressed I want to make that clear by printing an 'X', even though the bar will also show 0 instances.
First convert the height to numeric, which gives NA for the suppressed values. Then create a label column based on that. Finally, you need a column of zeroes for the y coordinate of the labels.
> d$column3=as.numeric(as.character(d$column3))
Warning message:
NAs introduced by coercion
> d$column4 = ifelse(is.na(d$column3),"X","")
> d$y=0
Then:
> p <- ggplot(d, aes(x=column1, y=column3, fill=column2))
> p + geom_bar(position=position_dodge(), stat="identity",
colour="black") +
geom_text(aes(label=column4,x=column1,y=y),
position=position_dodge(width=0.9), vjust=-0.5)
It's a variant on labelling a geom_bar with the value of the bar; almost a duplicate.
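A self-contained version of the same idea, sketched on the values from the small table in the question (not the full dput) and assuming a ggplot2 version with geom_col(). Two small deviations from the answer, both judgment calls: suppressed bars are drawn with height 0 instead of being left as NA, matching "the bar will also show 0", and bars and labels share one position_dodge(width = 0.9) object so each X sits over its own bar.

```r
library(ggplot2)

# Values from the small table in the question; "X" marks suppressed data
d <- data.frame(
  column1 = rep(c("*", "A", "B"), each = 3),
  column2 = factor(rep(c("FSM", "Male", "Female"), times = 3),
                   levels = c("FSM", "Male", "Female")),
  column3 = c("X", "2.5", "X", "6", "10.3", "11.7", "14.8", "21.5", "25.3"),
  stringsAsFactors = FALSE
)

# "X" becomes NA; label those rows and draw their bars at height 0
d$value  <- suppressWarnings(as.numeric(d$column3))
d$label  <- ifelse(is.na(d$value), "X", "")
d$height <- ifelse(is.na(d$value), 0, d$value)

dodge <- position_dodge(width = 0.9)   # same dodge for bars and labels

p <- ggplot(d, aes(x = column1, fill = column2)) +
  geom_col(aes(y = height), position = dodge, colour = "black") +
  geom_text(aes(y = 0, label = label), position = dodge, vjust = -0.5) +
  scale_y_continuous("Percentage")

p
```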