dplyr: categorical counts from a single column across multiple variables - r

So I have been trying to make a bar plot of "yes/no" counts for hours now.
My dataset looks like this:
> stack
Site Plot Treatment Meters Retrieved
2 Southern 18 Control -5.00 y
3 Southern 18 Control 9.55 y
4 Southern 18 Control 4.70 y
5 Southern 27 Control -5.00 y
6 Southern 27 Control 20.00 n
9 Southern 18 Control -0.10 y
17 Southern 18 Control 20.00 y
23 Southern 31 Control 100.00 y
53 Southern 25 Mu 3.55 n
54 Southern 20 Mu 5.90 y
55 Southern 25 Mu -0.10 y
56 Southern 29 Mu 9.55 y
58 Southern 25 Mu 4.70 y
60 Southern 20 Mu 2.90 y
61 Southern 24 Mu 5.90 n
62 Southern 24 Mu 3.55 y
63 Southern 20 Mu 3.55 y
65 Southern 24 Mu 0.55 y
66 Southern 29 Mu 8.90 y
68 Southern 25 Mu 8.90 y
69 Southern 29 Mu 0.55 y
70 Southern 24 Mu 1.70 y
72 Southern 29 Mu -5.00 y
76 Southern 29 Mu 1.70 y
77 Southern 25 Mu 9.55 y
78 Southern 25 Mu 13.20 y
79 Southern 29 Mu 3.55 y
80 Southern 25 Mu 15.00 y
81 Southern 25 Mu -5.00 n
84 Southern 24 Mu 8.90 y
85 Southern 20 Mu 6.55 y
86 Southern 29 Mu 2.90 y
92 Southern 24 Mu -0.10 y
93 Southern 20 Mu 100.00 y
I want to get counts of both y (yes) and n (no) for the variable "Retrieved", grouped by "Treatment" and "Meters".
So it should look something like this:
Treatment Meters Yes No
Control -5.00 2 0
Control 9.55 1 2
Control 4.70 1 1
Control 20.00 0 2
Mu 3.55 4 0
Mu 5.90 0 1
Mu -0.10 2 2
Mu 9.55 1 0
With this data I want to make a stacked bar plot with x = Meters, y = count, and Treatment as facets or something similar.
This is my code, but it's not working:
plot_data <- stack %>%
count(Retrieved, Treatment, Meters) %>%
group_by(Treatment, Meters) %>%
mutate(count= n)
plot_data
ggplot(plot_data, aes(x = Meters, y = count, fill = Treatment)) +
geom_col(position = "fill") +
geom_label(aes(label = count(count)), position = "fill", color = "white", vjust = 1, show.legend = FALSE) +
scale_y_continuous(labels = count)
Could you please tell me what I'm doing wrong.

geom_bar is for precisely this case, and you won't even need to use group_by or count. (From the docs: "geom_bar makes the height of the bar proportional to the number of cases in each group".)
This should do what you're looking for:
ggplot(stack, aes(x = Meters, fill = Treatment)) +
geom_bar(position = "stack")
However, the bars will be very narrow because "Meters" is continuous and has a large range. You could address this by converting it into a factor. One way to do that would be to do this first:
stack <- stack %>%
mutate(Meters = as.factor(Meters))
If you want to get the counts in the format that you mentioned (in addition to creating the plot), you could do:
stack %>%
count(Treatment, Meters, Retrieved) %>%
spread(Retrieved, n, fill = 0) %>%
rename(Yes = y, No = n)
count does the group_by for you, so I didn't need to carry that over from your code. Then spread (from tidyr) creates the separate columns for y and n. Finally, I rename those columns to Yes and No.
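As a rough sketch of the layout the question describes (yes/no counts stacked per Meters value, one panel per Treatment), something like the following might work. Here stack_demo is a hypothetical six-row stand-in for the question's data, and pivot_wider is used as the newer tidyr replacement for spread; neither appears in the original answer.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical small stand-in for the question's `stack` data frame
stack_demo <- data.frame(
  Treatment = c("Control", "Control", "Control", "Mu", "Mu", "Mu"),
  Meters    = c(-5, -5, 20, 3.55, 3.55, 5.9),
  Retrieved = c("y", "y", "n", "n", "y", "y")
)

# Wide table of counts: one row per Treatment/Meters pair,
# one column per Retrieved value, missing combinations filled with 0
wide_counts <- stack_demo %>%
  count(Treatment, Meters, Retrieved) %>%
  pivot_wider(names_from = Retrieved, values_from = n, values_fill = 0) %>%
  rename(Yes = y, No = n)

# Stacked bars of yes/no counts, with one panel per treatment;
# factor(Meters) keeps the bars evenly spaced and readable
p <- ggplot(stack_demo, aes(x = factor(Meters), fill = Retrieved)) +
  geom_bar(position = "stack") +
  facet_wrap(~ Treatment, scales = "free_x") +
  labs(x = "Meters", y = "count")
```

Printing p draws the plot. Mapping fill to Retrieved rather than Treatment is what splits each bar into its yes and no parts.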

Related

How can you split ggplot from 12 individual bars to 3 groups of 4?

I have a bar graph with 12 individual bars. I would like to split them into their 3 respective groups, each with their own color so that they are recognized as the same group. I have been using ColorBrewer Set 3, because it is photocopy safe. When I use it on my plot it all turns one color.
In the plot, you can see the 3 groups - ELE, KEB, and SMI, each with 4 blocks. It would be great if they could be split up more cohesively.
# A tibble: 12 x 7
vid.order sum.correct n prop.correct z_score p_val sig
<chr> <int> <int> <dbl> <dbl> <dbl> <lgl>
1 ELE1 47 55 0.855 5.26 0.000000145 TRUE
2 ELE2 46 55 0.836 4.99 0.000000607 TRUE
3 ELE3 37 55 0.673 2.56 0.0104 TRUE
4 ELE4 47 55 0.855 5.26 0.000000145 TRUE
5 KEB1 40 55 0.727 3.37 0.000749 TRUE
6 KEB2 46 55 0.836 4.99 0.000000607 TRUE
7 KEB3 47 55 0.855 5.26 0.000000145 TRUE
8 KEB4 44 55 0.8 4.45 0.00000860 TRUE
9 SMI1 35 55 0.636 2.02 0.0431 TRUE
10 SMI2 46 55 0.836 4.99 0.000000607 TRUE
11 SMI3 41 55 0.745 3.64 0.000272 TRUE
12 SMI4 35 55 0.636 2.02 0.0431 TRUE
byBlot_sigtests %>%
ggplot(aes(x=vid.order, y=prop.correct))+
geom_bar(stat="identity", position=position_dodge(.9))+
labs(x="video", y="proportion natural selected")+
geom_hline(yintercept = .5) +
expand_limits(y=c(0,1)) +
scale_fill_brewer(palette = "Set3") +
jtools::theme_apa()
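Your plot turns one colour because no variable is mapped to fill, so scale_fill_brewer() has nothing to act on. Personally, I would create a column with the groups (ELE, KEB or SMI) and use that in aes(fill = ):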
library(data.table)
library(dplyr)
library(ggplot2)
library(jtools)
# convert the object to a data.table (by reference)
setDT(byBlot_sigtests)
# create a column with the groups (vid.order without the digits)
byBlot_sigtests[, group := gsub("[0-9]", "", vid.order)]
# plot, mapping the new column to fill
byBlot_sigtests %>%
ggplot(aes(x=vid.order, y=prop.correct, fill = group))+
geom_bar(stat="identity", position=position_dodge(.9))+
labs(x="video", y="proportion natural selected")+
geom_hline(yintercept = .5) +
expand_limits(y=c(0,1)) +
scale_fill_brewer(palette = "Set3") +
jtools::theme_apa()
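If you would rather not bring in data.table, the same grouping column could be sketched with dplyr alone. The data frame demo below is a hypothetical stand-in holding a few of the vid.order values from the question.

```r
library(dplyr)

# Hypothetical stand-in with a few vid.order values from the question
demo <- data.frame(vid.order = c("ELE1", "ELE2", "KEB1", "SMI4"))

# Strip the digits to recover the group label for each video
demo <- demo %>%
  mutate(group = gsub("[0-9]", "", vid.order))
```

The resulting group column can then be passed to aes(fill = group) exactly as in the plot above.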

ggplot() color each point manually

How do I create a scatter plot in ggplot() with each point coloured manually? The necessary colours are given in my dataframe.
> head(df)
x y col
1 0.72 2757 #2AAE89
2 0.72 2757 #2DFE83
3 0.72 2757 #40FE89
4 0.70 2757 #28FE97
5 0.86 2757 #007C7D
6 0.75 2757 #24FEA1
The colour of the points must be exactly as given in the dataframe
Luckily, there is a relatively easy solution: use scale_colour_identity(). See the following example:
library(ggplot2)
z <- " x y z col
1 0.72 2757 86 #2AAE89
2 0.72 2757 86 #2DFE83
3 0.72 2757 86 #40FE89
4 0.70 2757 82 #28FE97
5 0.86 2757 26 #007C7D
6 0.75 2757 79 #24FEA1"
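df <- read.table(text = z, header = TRUE, comment.char = "", stringsAsFactors = FALSE)
ggplot(df, aes(x, y, colour = col)) +
geom_point() +
scale_colour_identity()
EDIT: read.table() treats # as a comment character by default, which drops the hex colour codes, so comment.char = "" is needed when loading the data this way. The plotting syntax is unchanged.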

How to set the axis scale of a plot

I have two data tables:
vah_p_1
x y
0 4
0.25 5
0.27 6
0.29 7
0.31 8
0.33 10
0.34 13
0.36 16
0.37 20
0.38 23
0.39 28
0.4 37
0.41 43
0.42 55
0.43 67
0.44 81
0.45 94
0.46 118
0.47 143
0.48 187
0.49 225
vah_o_1
x y
-17.2 -9
-14.2 -8
-9.27 -7
-6.9 -6
-4.09 -5
0 -4
I need to plot the data from both tables in one graph (code below).
vah_p <- read.table(file='vah_p_1',header =TRUE)
y <- log2(vah_p$y)
x <- vah_p$x
mat_p <- data.frame(x,y)
error_p <- lm(y ~ x, mat_p)
error_p <- broom::tidy(error_p)
vah_o <- read.table(file='vah_o_1',header =TRUE)
y <- log2((vah_o$y)*(-1))
x <- vah_o$x
mat_o <- data.frame(x,y)
error_o <- lm(y ~ x, mat_o)
error_o <- broom::tidy(error_o)
library(ggplot2)
p <- ggplot(vah_p, aes(x = x, y = y)) +
geom_point() + geom_point(data = vah_o, aes(x = x, y = y))
p
Running this code produces the graph below.
(source: savepice.ru)
The graph looks very bad. I tried to adjust the axis scales so that it would look better, but I did not succeed. Please help.
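If I understand the problem correctly and you want to change the range of an axis, add ylim() (or xlim() for the x axis) to the plot, with the limits you want in place of min and max:
ggplot() + ylim(min, max)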
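A minimal sketch of how this might look for the plot in the question follows. The two data frames are short excerpts of the question's tables, and the range of -10 to 50 is an arbitrary choice for illustration. coord_cartesian() is shown as an alternative because ylim() drops points outside the range, whereas coord_cartesian() only zooms.

```r
library(ggplot2)

# Short excerpts of the two tables from the question
vah_p <- data.frame(x = c(0, 0.25, 0.33, 0.41, 0.49),
                    y = c(4, 5, 10, 43, 225))
vah_o <- data.frame(x = c(-17.2, -9.27, -4.09, 0),
                    y = c(-9, -7, -5, -4))

# Both tables in one scatter plot, as in the question
p <- ggplot(vah_p, aes(x = x, y = y)) +
  geom_point() +
  geom_point(data = vah_o)

# ylim() sets the y-axis range; points outside it are removed with a warning
p_limited <- p + ylim(-10, 50)

# coord_cartesian() zooms to the same range without discarding any data
p_zoomed <- p + coord_cartesian(ylim = c(-10, 50))
```

With these excerpts, only the point at y = 225 lies outside the chosen range, so it disappears from p_limited.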

line connecting missing data R

I would like a line plot in R of the days a bird spent away from its nest.
I have missing data that is making it difficult to show the general trend. I want to replace the line for the days that I don't have information for with a dotted line. I have absolutely no idea how to do this. Is it possible to do in R?
> time.away.1
hrs.away days.rel
1 0.380 -2
2 0.950 -1
3 1.000 0
4 0.200 1
5 0.490 12
6 0.280 13
7 0.130 14
8 0.750 20
9 0.160 21
10 1.830 22
11 0.128 26
12 0.126 27
13 0.500 28
14 0.250 31
15 0.230 32
16 0.220 33
17 0.530 40
18 3.220 41
19 0.430 42
20 1.960 45
21 1.490 46
22 24.000 56
23 24.000 57
24 24.000 58
25 24.000 59
26 24.000 60
27 24.000 61
My attempt:
plot(hrs.away ~ days.rel, data=time.away.1,
type="o",
main="Time Away Relative to Nest Age",
ylab="Time spent away",
xlab="Days Relative to Initiation",
ylim=c(0,4))
Here is a way to do it, using diff to build a variable that flags whether each point directly follows the previous day (TRUE) or comes after a gap of missing days (FALSE). Note that I renamed your data to dat.
## make the flag variable
dat$type <- c(TRUE, diff(dat$days.rel) == 1)
plot(hrs.away ~ days.rel, data=dat,
type="p",
main="Time Away Relative to Nest Age",
ylab="Time spent away",
xlab="Days Relative to Initiation",
ylim=c(0,4))
legend("topright", c("missing", "sampled"), lty=c(2,1))
## Add line segments
len <- nrow(dat)
with(dat,
segments(x0=days.rel[-len], y0=hrs.away[-len],
x1=days.rel[-1], y1=hrs.away[-1],
lty=ifelse(type[-1], 1, 2),
lwd=ifelse(type[-1], 2, 1))
)
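To see what the flag built with diff contains, here is a small sketch on the first six days.rel values from the question. The variable names are chosen for illustration only.

```r
# First six days.rel values from the question's data
days_rel <- c(-2, -1, 0, 1, 12, 13)

# diff() gives the gap between consecutive days; a gap of exactly 1
# means no days are missing. The first point has no predecessor,
# so it is flagged TRUE by convention.
gaps <- diff(days_rel)
type <- c(TRUE, gaps == 1)
```

Here type is FALSE only for the fifth point (day 12), which follows an 11-day gap, so the segment leading into it is the one drawn dashed.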
For the ggplot version, you can make another data.frame with the lagged variables used above:
library(ggplot2)
dat2 <- with(dat, data.frame(x=days.rel[-len], xend=days.rel[-1],
y=hrs.away[-len], yend=hrs.away[-1],
type=factor(as.integer(type[-1]))))
ggplot() +
geom_point(data=dat, aes(x=days.rel, y=hrs.away)) +
geom_segment(data=dat2, aes(x=x, xend=xend, y=y, yend=yend, lty=type, size=type)) +
scale_linetype_manual(values=2:1) +
scale_size_manual(values=c(0.5,1)) +
ylim(0, 4) + theme_bw()

Plotting gridded field

I have a gridded field that I plotted with the image function
df <- datainSUB
yr mo dy hr lon lat cell sst avg moavg
1900 6 5 17 -73.5 -60.5 83 2.4 2.15 3.15
1900 6 7 17 -74.5 -60.5 83 3.9 2.15 3.15
1900 8 17 17 -70.5 -60.5 83 -0.9 2.15 0.60
1900 8 18 17 -73.5 -60.5 83 2.1 2.15 0.60
1900 9 20 17 -71.5 -60.5 83 0.2 2.15 2.20
1900 9 21 17 -74.5 -61.5 83 1.6 2.15 2.20
gridplot <- function(df){
pdf(paste(df$mo,".pdf"))
# Compute the ordered x- and y-values
LON <- seq(-180, 180, by = space)
LAT <- seq(-90, 90, by = space)
# Build the matrix to be plotted
moavg <- matrix(NA, nrow=length(LON), ncol=length(LAT))
moavg[cbind(match(round(df$lon, -1), LON), match(round(df$lat, -1), LAT))] <- df$moavg
# Plot the image
image(LON, LAT, moavg)
map(add=T,col="saddlebrown",interior = FALSE, database="world")
dev.off()
}
I want to add a colour legend to the plot but I don't know how to do that. Maybe ggplot is better?
Many thanks
Add the following line after plotting your data:
legend(x="topright", "your legend goes here", fill="saddlebrown")
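Since the question also asks whether ggplot might be better: a ggplot2 sketch along these lines would produce a colour bar for the plotted values automatically. The data frame grid_demo is a hypothetical excerpt built from a few lon, lat and moavg values in the question's table, and the viridis palette is an arbitrary choice.

```r
library(ggplot2)

# Hypothetical excerpt: a few grid cells with values from the question
grid_demo <- data.frame(
  lon   = c(-73.5, -74.5, -70.5, -71.5, -74.5),
  lat   = c(-60.5, -60.5, -60.5, -60.5, -61.5),
  moavg = c(3.15, 3.15, 0.60, 2.20, 2.20)
)

# One tile per grid cell, coloured by moavg; the fill scale adds
# a colour bar legend without any extra legend() call
p <- ggplot(grid_demo, aes(x = lon, y = lat, fill = moavg)) +
  geom_tile() +
  scale_fill_viridis_c(name = "moavg") +
  coord_quickmap()
```

Printing p draws the tiles with the legend. A coastline layer could be added on top, but that is left out here to keep the sketch free of extra packages.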
