How to add line using other data to ggplot? - r

I would like to add a line to my data:
Part of my data:
dput(d[1:20,])
structure(list(MW = c(10.8, 10.9, 11, 11.7, 12.8, 16.6, 16.9,
17.1, 17.4, 17.6, 18.5, 19.1, 19.2, 19.7, 19.9, 20.1, 22.4, 22.7,
23.4, 24), Fold = c(21.6, 21.8, 22, 23.4, 25.6, 33.2, 33.8, 34.2,
34.8, 35.2, 37, 38.2, 38.4, 39.4, 39.8, 40.2, 44.8, 45.4, 46.8,
48), Column = c(33.95, 33.95, 33.95, 33.95, 33.95, 33.95, 33.95,
33.95, 33.95, 33.95, 33.95, 33.95, 33.95, 33.95, 33.95, 33.95,
33.95, 33.95, 33.95, 33.95), Bool = c(1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), .Names = c("MW", "Fold",
"Column", "Bool"), row.names = c(NA, 20L), class = "data.frame")
data for line:
> dput(bb[1:20,])
structure(c(1.95, 3.2, 3.7, 4.05, 4.5, 4.7, 4.75, 5.05, 5.2,
5.2, 5.2, 5.25, 5.3, 5.35, 5.35, 5.4, 5.4, 5.45, 5.5, 5.5, 10,
33.95, 58.66, 84.42, 110.21, 134.16, 164.69, 199.1, 234.35, 257.19,
361.84, 432.74, 506.34, 581.46, 651.71, 732.59, 817.56, 896.24,
971.77, 1038.91), .Dim = c(20L, 2L), .Dimnames = list(NULL, c("b",
"a")))
And as the last the code which I use to create this plot:
first_dot <- ggplot(d, aes(x=MW, y=Column)) +
geom_point() +
scale_x_continuous(limits=c(0, 650), breaks=c(0, 200, 400, 650)) +
scale_y_continuous(limits=c(0, 1200), breaks=c(0, 400, 800, 1200))
first_dot + geom_line(bb, aes(x=b, y=a))
I get this error all the time when I try to run it:
Error: ggplot2 doesn't know how to deal with data of class uneval
Do you have any idea what I am doing wrong?
This is how the data look without the line:
And this is (approximately) how it should look after adding the line:

Just in case somebody lands here looking for an answer to the [underlying] question of "how to plot a line from a different data frame on top of an existing plot":
The key point is to pass the second dataset through the data= argument of geom_line. The first positional argument of geom_line() is mapping, not data, so leaving out data= is a common reason the call fails. The correct code, as @Roland put it in the comments to the question, is:
ggplot(d, aes(x=MW, y=Column)) +
  geom_point() +
  scale_x_continuous(limits=c(0, 650), breaks=c(0, 200, 400, 650)) +
  scale_y_continuous(limits=c(0, 1200), breaks=c(0, 400, 800, 1200)) +
  geom_line(data=as.data.frame(bb), aes(x=b, y=a))

Your bb variable is a numeric matrix, which ggplot cannot plot directly; it needs a data frame. Could you try the following:
first_dot + geom_line(data=data.frame(bb), aes(x=b, y=a))
Note that this converts the bb variable to a data.frame and passes it by name through data=.
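A minimal self-contained sketch of the same pattern, using made-up data (the names `pts` and `curve_mat` are hypothetical and only stand in for `d` and `bb`):

```r
library(ggplot2)

# Hypothetical data: the points live in a data frame, the curve in a numeric matrix
pts <- data.frame(MW = c(10, 20, 30), Column = c(34, 34, 34))
curve_mat <- cbind(b = c(2, 4, 6), a = c(10, 60, 110))

p <- ggplot(pts, aes(x = MW, y = Column)) +
  geom_point() +
  # data must be passed by name, because the first positional
  # argument of geom_line() is `mapping`, not `data`
  geom_line(data = as.data.frame(curve_mat), aes(x = b, y = a))
```

Printing `p` should draw the three points with the line from the second dataset on top.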

Related

Identifying uniroot for multiple independent variables in R

I am trying to calculate the value of x where y = 0. I was able to do it for a single x using the following code:
# data (defined first so the code below runs as written)
x = c(33.05, 14.22, 15.35, 13.52, 8.7, 13.73, 8.28, 21.02, 9.97,
11.98, 12.87, 5.05, 11.23, 11.65, 10.05, 12.58, 13.88, 9.66,
4.62, 4.56, 5.35, 3.7, 3.29, 4.87, 3.75, 6.55, 4.51, 7.77, 4.7,
4.18, 25.14, 18.08, 10.41)
y = c(16.22699279, 15.78620732, 9.656361014, -17.32805679, -20.85685895,
7.601993251, -4.776053714, 10.50972236, 3.853479771, 7.713563136,
8.579366561, 14.16989395, 7.484692081, -1.2807472, -12.13759458,
-0.29138513, -5.238157067, -2.033194068, -38.12157566, -33.61912493,
-9.763657548, -0.240863712, 9.090638907, 7.345492608, 6.949676888,
-19.94866471, 0.995659732, -1.162616185, 5.497998429, 1.656653092,
2.116687436, 22.23175649, 10.33039543)
# fit the line, then find where it crosses y = 0
lm.model <- lm(y ~ x)
cc <- coef(lm.model)
f <- function(x) cc[2]*x + cc[1]
plot(x, y)
abline(coef(lm.model))
abline(h=0, col="blue")
(threshold <- uniroot(f, interval = c(0, 100))$root)
abline(v=threshold, col="blue")
But I have multiple x variables. How can I apply this to all of them at once?
Here is some example data:
df = structure(list(y = c(16.2269927925813, 15.7862073196372, 9.65636101412767,
-17.3280567922775, -20.8568589521297, 7.6019932507973, -4.77605371404423,
10.5097223644541, 3.85347977129367, 7.71356313645697, 8.57936656085966,
14.1698939499927, 7.4846920807874, -1.28074719969249, -12.1375945758837,
-0.291385130176774, -5.23815706681139, -2.03319406769161, -38.1215756639013,
-33.6191249261727, -9.76365754821171, -0.240863712421707, 9.09063890677045,
7.34549260800693, 6.94967688778232, -19.9486647079697, 0.995659731521127,
-1.16261618452931, 5.49799842947493, 1.65665309209479, 2.11668743610013,
22.2317564898722, 10.3303954315884), x1 = c(8.56, 8.66, 9.09,
8.36, 8.3, 8.63, 8.78, 8.44, 8.34, 8.46, 8.33, 8.19, 8.58, 8.65,
8.75, 8.34, 8.77, 9.06, 9.31, 9.11, 9.26, 9.81, 9.68, 9.79, 9.26,
9.53, 8.89, 8.89, 10.37, 9.58, 10.27, 10.16, 10.27), x2 = c(164,
328.3, 0, 590.2, 406.6, 188.4, 423.8, 355.3, 337.6, 0, 0, 200.1,
0, 315.8, 547.5, 225.6, 655.7, 387.2, 0, 487.4, 400.4, 0, 234.9,
275.5, 0, 0, 613.2, 207.4, 184.4, 162.8, 220, 174.8, 0), x3 = c(4517.7,
2953.4, 2899.3, 2573.8, 3310.7, 3880.3, 3016.8, 3552.3, 2960.1,
323, 2638.5, 3343.1, 3274.7, 3218, 3268.3, 3507.9, 3709.2, 3537.5,
2634.4, 1964.6, 3333.7, 2809.7, 3326.8, 3524.5, 3893.9, 3166.7,
3992.1, 4324.7, 3077.9, 3069.9, 4218.9, 3897.4, 2693.9), x4 = c(14.7,
14.5, 15.5, 17, 16.2, 15.9, 15.7, 15.3, 13.5, 14, 15.4, 16.2,
15.6, 15.7, 15.1, 15.8, 15.3, 14.9, 15.7, 16.3, 15.21000004,
16.7, 15.6, 16.2, 15.7, 16.3, 17.3, 16.9, 15.7, 14.9, 13.81999969,
14.90754509, 12.42847157), x5 = c(28.3, 29.1, 28.3, 29.1, 28.7,
29.3, 28.9, 28.4, 29.3, 29.3, 29.1, 29, 29.9, 29.5, 28.4, 30.3,
29.1, 29.1, 29, 29.5, 29.3, 28.5, 29, 28.7, 29.4, 28.8, 29.2,
30.1, 28.3, 28.7, 24.96999931, 25.79496384, 25.3072052), x6 = c(33.05,
14.22, 15.35, 13.52, 8.7, 13.73, 8.28, 21.02, 9.97, 11.98, 12.87,
5.05, 11.23, 11.65, 10.05, 12.58, 13.88, 9.66, 4.62, 4.56, 5.35,
3.7, 3.29, 4.87, 3.75, 6.55, 4.51, 7.77, 4.7, 4.18, 25.14, 18.08,
10.41), x7 = c(13.8425, 11.1175, 8.95, 13.5375, 5.4025, 13.5625,
13.735, 14.14, 8.0875, 5.565, 12.255, 3.3075, 6.345, 4.8125,
4.0325, 11.475, 10.32, 17.71, 2.3375, 3.92, 5.7, 2.42, 8.3075,
7.4725, 7.7925, 10.8725, 8.005, 11.7475, 13.405, 8.425, 47.155,
26.1, 6.6675), x8 = c(0.95, 3.01, 1.92, 1.51, 2.61, 1.32, 3.55,
1.21, 2.14, 1.1, 1.32, 0.76, 1.34, 5.41, 9.38, 6.55, 4.44, 7.37,
9.84, 12.68, 15.52, 23.01, 18.59, 21.64, 19.69, 25.22, 22.38,
25.03, 37.42, 22.26, 2.1, 3.01, 0.82), x9 = c(26.2, 25.8, 25.8,
25.5, 26, 24.7, 22.9, 25.3, 26.3, 26.1, 22.5, 25.9, 26.4, 25.2,
25.8, 25.4, 25, 23.2, 26.4, 25.8, 26.6, 26.2, 25.8, 26.8, 25,
25.4, 25.6, 26.1, 25.7, 25.8, 24.78000069, 24.98148918, 26.39899826
), x10 = c(35.4, 39, 37.5, 36.4, 37.1, 36.2, 37.3, 36.4, 37.5,
36, 36.6, 35.6, 37.3, 38.3, 37, 37.5, 37.5, 39.6, 37.8, 36.8,
36.6, 38.4, 38.9, 38.4, 38.4, 37.7, 39.1, 37.7, 37.8, 39.4, 36.25,
35.57029343, 35.57416534), x11 = c(653.86191565, 383.1, 457.1,
591.4, 549.2, 475.2, 626.4, 308.8, 652.4, 77, 380.9, 530.5, 393,
712.1, 623.4, 515.7, 706.4, 713.4, 343.7, 559.5, 630.1, 292.3,
578.6, 628.88904574, 480.96959685, 591.35600287, 804.8, 419.6,
403.7, 361.2, 515.07101438, 434.66682808, 299.9531298), x12 = c(163.9793854,
167.9, 135, 215.8, 213, 188.4, 260.6, 191.8, 337.6, 55, 147.6,
200.1, 140.7, 315.8, 189.6, 225.6, 469.3, 201.8, 140, 297.2,
204.6, 142.5, 234.9, 275.494751, 153.7796173, 147.6174622, 433.6,
207.4, 184.4, 162.8, 219.9721832, 174.8355713, 106.8163605),
x13 = c(92, 67, 67, 50, 70, 87, 68, 86, 70, 11, 66, 79, 70,
61, 75, 78, 78, 77, 69, 35, 72, 76, 69, 84, 93, 73, 81, 99,
80, 76, 101, 86, 80), x14 = c(70, 42, 46, 34, 55, 60, 51,
65, 49, 1, 40, 56, 54, 41, 48, 57, 46, 50, 41, 22, 47, 47,
49, 57, 70, 52, 56, 70, 48, 50, 74, 66, 47), x15 = c(21,
12, 13, 10, 14, 16, 10, 13, 10, 0, 9, 14, 16, 20, 14, 14,
13, 15, 10, 7, 17, 8, 14, 14, 14, 11, 17, 19, 12, 11, 17,
17, 9), x16 = c(1076.8, 783.7, 711.8, 1041.9, 957.4, 939.3,
662.9, 768.1, 770.3, 0, 399.2, 606.2, 724.1, 960.8, 943.8,
737.8, 1477.4, 1191.7, 371.3, 956.4, 1251.7, 345.7, 1210.7,
845, 598.1, 821.7, 1310.6, 940.1, 581, 520, 313.5, 606.8,
201.2), x17 = c(163.9793854, 167.9, 128.4, 215.8, 213, 188.4,
260.6, 191.8, 337.6, 55, 147.6, 200.1, 140.7, 315.8, 189.6,
225.6, 469.3, 201.8, 140, 297.2, 204.6, 142.5, 234.9, 157.7472534,
153.7796173, 147.6174622, 133.1873627, 150.2, 184.4, 162.8,
219.9721832, 174.8355713, 106.8163605)), row.names = c(NA,
33L), class = "data.frame")
You can use purrr::map to loop through every x.
library(dplyr)
library(purrr)
thresholds <- df %>%
  select(-y) %>%
  map_dbl(function(x){
    lm.model <- lm(df$y ~ x)
    cc <- coef(lm.model)
    f <- function(x) cc[2]*x + cc[1]
    plot(x, df$y)
    abline(coef(lm.model))
    abline(h=0, col="blue")
    threshold <- tryCatch(uniroot(f, interval = c(0, 100))$root, error = function(cond){NA})
    abline(v=threshold, col="blue")
    return(threshold)})
For some x's, uniroot(f, interval = c(0, 100))$root yields an error:
Error in uniroot(f, interval = c(0, 100)) : f() values at end points not of opposite sign
So the tryCatch is used to return NA for the threshold associated with that x, instead of breaking the code.
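Because the fitted function is a straight line, its root can also be written in closed form as -intercept/slope, which does not depend on a search interval. The sketch below uses made-up, exactly linear data (not the data from the question) to compare that with uniroot() and to show the tryCatch fallback:

```r
# Toy data (hypothetical), exactly linear so the root is known: y = 2*x - 10
x <- c(1, 2, 3, 4, 8)
y <- 2 * x - 10

cc <- coef(lm(y ~ x))
# Root of cc[1] + cc[2]*x = 0, solved algebraically;
# unname() drops the coefficient label
root_exact <- unname(-cc[1] / cc[2])

f <- function(x) cc[2] * x + cc[1]
# Interval brackets the root, so uniroot() succeeds
root_num <- tryCatch(uniroot(f, interval = c(0, 100))$root,
                     error = function(cond) NA_real_)
# Interval does not bracket the root, so uniroot() errors and NA is returned
root_missed <- tryCatch(uniroot(f, interval = c(10, 100))$root,
                        error = function(cond) NA_real_)
```

Note that the closed-form value is returned even when the crossing lies outside 0 to 100, so it answers a slightly different question from the interval-restricted search.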
Result:
> thresholds
       x1        x2        x3        x4        x5        x6        x7        x8        x9
 9.023314        NA        NA 15.459841 28.727293 10.514728 10.493577  9.669244 25.522480
      x10       x11       x12       x13       x14       x15       x16       x17
37.370852        NA        NA 73.398380 50.239522 13.022176        NA        NA
Edit: binding the graphs together
library(ggplot2)
graphs <- df %>%
  select(-y) %>%
  imap(function(x, name){
    lm.model <- lm(df$y ~ x)
    cc <- coef(lm.model)
    f <- function(x) cc[2]*x + cc[1]
    threshold <- tryCatch(uniroot(f, interval = c(0, 100))$root, error = function(cond){NA})
    g = ggplot(mapping = aes(x)) +
      geom_point(aes(y = df$y)) +
      geom_line(aes(y = cc[2]*x + cc[1])) +
      geom_hline(yintercept = 0, color = "blue") +
      labs(title = name, y = "y", x = "x")
    # draw the vertical line only when a root was found
    if(!is.na(threshold)) {g = g + geom_vline(xintercept = threshold, color = "blue")}
    return(g)})
ggpubr::ggarrange(plotlist = graphs)
Result:
Obs1: let me know if you want any aesthetic changes to the graphs.
Obs2: I assumed that you don't need the thresholds vector defined in the first attempt; if you still need it, it's easy to add it back to the answer.
Edit 2: graph with common axis
For a common axis, it is better to use facets instead of ggarrange. To do that, we first need to save the fitted data for all variables and plot afterwards, so the ggplot expression moves out of the map. We now also save the threshold info.
graphs <- df %>%
  select(-y) %>%
  imap(function(x, name){
    lm.model <- lm(df$y ~ x)
    cc <- coef(lm.model)
    f <- function(x) cc[2]*x + cc[1]
    threshold <- tryCatch(uniroot(f, interval = c(0, 100))$root, error = function(cond){NA})
    list(threshold = threshold,
         data = tibble(y = df$y, "name" = name, "x" = x, "fitted" = cc[2]*x + cc[1]))})
Now we use the purrr::transpose() function to build one dataset for the data and another for the thresholds. This function does something like:
list(x1 = list(threshold, data), x2 = ...) >>> list(threshold = list(x1, x2, ...), data = list(x1, x2, ...))
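A small sketch of that reshaping on a made-up list (the contents of `res` are hypothetical placeholders shaped like the output of imap() above):

```r
library(purrr)

# Hypothetical per-variable results: one inner list per x variable
res <- list(
  x1 = list(threshold = 9.02, data = "data for x1"),
  x2 = list(threshold = NA_real_, data = "data for x2")
)

# Turn the list inside out: outer names become inner names and vice versa
flipped <- transpose(res)
names(flipped)            # "threshold" "data"
names(flipped$threshold)  # "x1" "x2"
```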
df2 = graphs %>%
  transpose() %>%
  `$`(data) %>%
  bind_rows() %>%
  mutate(name = factor(name, paste0("x", 1:17)))
# both datasets have the name column, to be used inside `facet_wrap()`
thresholds = graphs %>%
  transpose() %>%
  `$`(threshold) %>%
  {tibble(int = as.numeric(.), name = names(.))}
ggplot(df2, aes(x)) +
  geom_point(aes(y = y)) +
  geom_line(aes(y = fitted)) +
  facet_wrap(vars(name), scales = "free_x") +
  geom_hline(yintercept = 0, color = "blue") +
  geom_vline(aes(xintercept = int), thresholds, color = "blue", linetype = 2) +
  geom_label(aes(label = round(int, 2), x = int*1, y = min(df$y)), thresholds, size = 4)
Result:
Obs1: the position and size of the labels can easily be altered. Another option is to use the thresholds as axis breaks.
Obs2: this method can be slow for large datasets. A more efficient option is to save only threshold and cc inside map, and then build the dataset afterwards.

Plotting 95% confidence intervals of log-log linear model predicted values

I fit length and weight data to a log-log linear model and created a regression line where the response has been back transformed to the original scale.
Next, I would like to add two lines to the scatterplot representing upper and lower 95% confidence intervals.
I'm no expert in R or stats, but I'm trying to get better! What might be the best way to go about doing this? Any help would be greatly appreciated.
NOTE: the length and weight data used in this example are from the 'ChinookArg' data frame from the AFS package.
library(ggplot2)
df <- data.frame(tl = c(120.1, 115, 111.2, 110.2, 110, 109.7, 105, 100.1, 98, 92.1,
99, 97.9, 94.9, 92.2, 94.9, 92.7, 92.9, 89.9, 88.7, 92, 87.7,
85.1, 85.1, 82.9, 82.9, 83.8, 82.2, 81, 78.8, 78.8, 74.9, 68.1,
66.8, 59.9, 113.8, 112.9, 108.1, 109.7, 103.7, 103.2, 99.9, 99,
103, 103, 99.4, 97.9, 97.2, 96.7, 95.1, 92.2, 93, 92.2, 91.2,
88.1, 94.6, 94.3, 92.5, 88.1, 89.8, 88.8, 87.9, 86, 87.4, 68.5,
80.5, 79, 77.6, 72.8, 77.3, 78.8, 74.5, 72.6, 73.3, 74, 75.2,
76.6, 72, 70.6, 71.8, 70.2, 68.2, 67.3, 67.7, 65.9, 66.3, 64.7,
63, 62.7, 64.2, 61.3, 64.2, 60.1, 59.4, 57.7, 57.4, 56.5, 54.1,
54.1, 56, 52, 50.8, 49.3, 43.8, 39.8, 39, 35.4, 36.9, 32.1, 31.9,
29.2, 25.2, 18),
w = c(17.9, 17.2, 16.8, 15.8, 14.3, 13.8, 12.8, 11.7, 12.8, 14.8,
9.7, 7.3, 7.8, 9.1, 11.8, 11.3, 11.9, 11.8, 10.8, 5.9, 5.9, 9,
9.8, 8.7, 7.8, 5.7, 6.7, 8.7, 8.4, 7.9, 6.5, 7.3, 5.2, 3.9, 15,
16, 13.3, 11.3, 10.9, 9.8, 9.9, 10.3, 12.6, 10, 10.2, 8.3, 7.9,
8.9, 9.4, 8.9, 8.1, 8.3, 8.3, 8.3, 6.2, 6.6, 6.6, 8.3, 6.3, 6.3,
6.8, 6.8, 5.5, 5, 6.1, 6.6, 7.2, 6.1, 4.1, 4.8, 4.6, 5, 3.7,
3, 2.5, 3.1, 2.4, 2.5, 3, 3.7, 3.5, 2.9, 2.4, 2.3, 3.5, 3.6,
3.7, 3, 2.5, 2.4, 1.6, 1.4, 2.5, 2.6, 1.9, 1.5, 1.8, 2.8, 3.1,
1.4, 1.8, 1, 0.7, 0.7, 0.7, 0.5, 3, 2.8, 0.3, 0.3, 0.3, 0.1))
model<- lm(log(w)~(log(tl)), data = df)
nmodel<- data.frame(tl = seq(from = min(df$tl), to = max(df$tl), length= 100))
nmodel$predicted<- exp(predict(model, nmodel, type = "response"))
plot <- ggplot()+
geom_line(aes(x = tl, y = predicted), col = "black", data = nmodel)+
geom_point(data = df, aes(x=tl, y=w))+
xlab("Length")+
ylab("Weight")
plot
Just add the interval argument to predict() and specify you want the confidence interval.
nmodel<- data.frame(tl = seq(from = min(df$tl), to = max(df$tl), length= 100))
model_preds <- exp(predict(model, nmodel, type = "response", interval = "confidence"))
nmodel <- cbind(nmodel, model_preds)
plot <- ggplot()+
geom_line(aes(x = tl, y = fit), col = "black", data = nmodel)+
geom_line(aes(x = tl, y = lwr), col = "red", data = nmodel)+
geom_line(aes(x = tl, y = upr), col = "red", data = nmodel)+
geom_point(data = df, aes(x=tl, y=w))+
xlab("Length")+
ylab("Weight")
plot
Note that I removed the predicted column, because when you run the predict() function as shown above, it also provides a fit column, which amounts to the same thing.
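As an alternative presentation, the same lwr and upr columns can be drawn as a shaded band with geom_ribbon() instead of two separate lines. The sketch below runs on simulated data (the `toy` data frame and its coefficients are made up for illustration) and, like the code above, computes the interval on the log scale before back-transforming with exp():

```r
library(ggplot2)

# Hypothetical toy data following a rough power law
set.seed(1)
toy <- data.frame(tl = seq(20, 120, by = 5))
toy$w <- exp(-11 + 3 * log(toy$tl) + rnorm(nrow(toy), sd = 0.2))

toy_model <- lm(log(w) ~ log(tl), data = toy)
grid_df <- data.frame(tl = seq(min(toy$tl), max(toy$tl), length.out = 100))
# Interval is computed on the log scale, then back-transformed;
# predict() returns the columns fit, lwr and upr
band <- exp(predict(toy_model, grid_df, interval = "confidence"))
grid_df <- cbind(grid_df, band)

p <- ggplot(grid_df, aes(x = tl)) +
  geom_ribbon(aes(ymin = lwr, ymax = upr), fill = "red", alpha = 0.2) +
  geom_line(aes(y = fit)) +
  geom_point(data = toy, aes(y = w)) +
  xlab("Length") +
  ylab("Weight")
```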

Forcing tableGrob() to fit in .tiff file

I am trying to output a tableGrob to a .tiff file, but the table is a bit long. I've tried to make the whole table fit into the .tiff file, but it is always cut off unless I reduce the font to an unreadable size. Is there a way to force the tableGrob to fit into the .tiff file without any cut-off?
table.plot <- structure(list(A = c(2.8, 0.5, 1.3, 5.7, 6.5, 1.1, 3.3, 1, 16.9,
8.6, 6.3, 22.2, 14.8, 1.3, 7.9, 12.4, 31, 9.9, 13.2, 26.2, 2),
B = c(13.7, 10.6, 12.7, 20.6, 13.2, 11.2, 14.7, 11.7, 22.3,
12.9, 12.9, 19.5, 20.6, 11.1, 17, 20.3, 43.1, 18.2, 20.9,
26.7, 10.1), C = c(0.4, 0, 0.3, 1, 0.3, 0.1, 0.5, 0.2, 1.2,
0.3, 0.3, 0.9, 1, 0.1, 0.7, 1, 3.3, 0.8, 1.1, 1.6, 0), D = c(29.7,
18, 23.9, 46.2, 33.1, 20.1, 32.6, 21.1, 93.1, 39.9, 33, 116.8,
76.1, 21.4, 49, 66.7, 166.5, 53.7, 69.1, 127.4, 21)), row.names = c("G01",
"G02", "G03", "G04", "G05", "G06", "G07", "G08", "G09", "G10",
"G11", "G12", "H01", "H02", "H03", "H04", "H05", "H06", "H07",
"H08", "Host.1"), class = "data.frame")
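library(grid)      # unit(), rectGrob(), gpar()
library(gridExtra) # tableGrob(), ttheme_minimal()
library(gtable)    # gtable_add_grob()
tt1 <- ttheme_minimal(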
core = list(fg_params = list(fontsize = 8),
padding = unit(c(0.1, 0.1), "mm")),
colhead = list(bg_params = list(fill = "white"),
fg_params = list(fontsize = 8, fontface = "bold")),
rowhead = list(fg_params = list(fontsize = 8, fontface = "bold"))
)
g <- tableGrob(table.plot, theme = tt1)
g <- gtable_add_grob(g,
grobs = rectGrob(gp = gpar(fill = NA, lwd = 2)),
t = 2, b = nrow(g), l = 1, r = ncol(g))
g <- gtable_add_grob(g,
grobs = rectGrob(gp = gpar(fill = NA, lwd = 2)),
t = 1, l = 1, r = ncol(g))
save_plot("table is cut off.tiff", g, dpi = 300)
Assuming that save_plot comes from the cowplot package (there is more than one package with a function called save_plot), you can specify the height and aspect ratio like this:
save_plot("table isnt cut off.tiff", g, dpi = 300, base_height = 5.5, base_asp = 0.4)
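Another way to avoid guessing the height and aspect ratio is to measure the table itself and size the output from that. This is only a sketch under some assumptions: it uses a stand-in table built from `mtcars`, writes with the base tiff() device rather than save_plot(), and the measured size may differ slightly between devices because of font metrics, hence the small margin:

```r
library(grid)
library(gridExtra)

# Hypothetical stand-in table; substitute your own grob
g <- tableGrob(head(mtcars, 10))

# Natural size of the table in inches
# (grid opens a default device for the measurement if none is active)
w_in <- convertWidth(sum(g$widths), "in", valueOnly = TRUE)
h_in <- convertHeight(sum(g$heights), "in", valueOnly = TRUE)

# Write a TIFF just large enough for the table plus a small margin,
# skipping the step if this R build has no TIFF support
out_file <- file.path(tempdir(), "table.tiff")
if (capabilities("tiff")) {
  tiff(out_file, width = w_in + 0.2, height = h_in + 0.2, units = "in", res = 300)
  grid.draw(g)
  dev.off()
}
```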

Why does the fill order reverse for negative values in stacked bar charts? (R, ggplot2)

I am trying to plot the performance of individuals on a number of tasks. The performance is rated in categories, and I would like to show each individual's overall performance as a stacked bar chart where Y represents the percentage of answers in each performance category, with positive values for good and negative values for bad performance or missing answers. Here's a toy dataset and the current plot I've managed to produce:
df <- data.frame(SRC = rep(LETTERS[1:14],each=6),
CAT = rep(c("Excellent","VeryGood","Good","Poor","Failing","Missing"),times=14),
PERCENT = c(29.3, 23.3, 30, -13.3, -4, 0, 16.7, 15.3, 38.7, -14.7, -4.7,
-10, 12, 9.3, 30.7, -19.3, -19.3, -9.3, 2.7, 6.7, 20, -23.3,
-14, -33.3, 16, 23.3, 20.7, -10.7, -9.3, -20, 24.7, 22, 12.7,
-8, -2, -30.7, 14, 15.3, 23.3, -4, -4.7, -38.7, 4.7, 6, 60, -24,
-4.7, -0.7, 8, 13.3, 57.3, -16, -3.3, -2, 8, 11.3, 62, -12.7,
-5.3, -0.7, 9.3, 14.7, 64.7, -10, -1.3, 0, 20.1, 20.9, 32.5,
-1.5, 0, 0, 14.2, 10.4, 33.2, -6.6, -2.8, 0, 14.7, 18.7, 55.3,
-10.7, 0, -0.7))
df$CAT <- ordered(df$CAT,levels=c("Excellent","VeryGood","Good","Poor","Failing","Missing"))
ggplot(df, aes(x=SRC, y=PERCENT, fill=CAT,group=CAT,group=SRC)) +
geom_bar(position="stack", stat="identity")
This is the figure:
It's almost what I want, except CAT is ordered reversely for negative values and I want the bars to stack according to the factor levels and the fill legend for negative values as well, i.e. Poor>Failing>Missing. This surely came up before, but I couldn't find a solution here or elsewhere. Thanks in advance!
Using the forcats library:
library(dplyr)
library(forcats)
library(ggplot2)
df %>%
  mutate(CAT=fct_relevel(CAT,"Excellent","VeryGood","Good","Missing","Failing","Poor")) %>%
  ggplot(aes(x=SRC, y=PERCENT, fill=CAT)) +
  geom_bar(position="stack", stat="identity")
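To see what that fct_relevel() call does on its own, here is a small sketch on a standalone factor (the objects `cat_levels`, `CAT` and `CAT2` are only for illustration). It keeps the positive categories in place and reverses the order of the three negative ones:

```r
library(forcats)

cat_levels <- c("Excellent", "VeryGood", "Good", "Poor", "Failing", "Missing")
CAT <- factor(cat_levels, levels = cat_levels)

# Listing every level explicitly sets the complete new order
CAT2 <- fct_relevel(CAT, "Excellent", "VeryGood", "Good", "Missing", "Failing", "Poor")
levels(CAT2)
# "Excellent" "VeryGood" "Good" "Missing" "Failing" "Poor"
```

The legend follows the level order, so it will list the negative categories in this reversed order as well.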

R - Generating a matrix/function from a dataset to use in a contour plot

I am new to contour plots in R and I am trying to create one to show changes in nutrient concentration with depth and salinity.
My dataset currently looks like this (link):
> head(DF)
  salinity depth  silicon
1     32.9  0.00 3.872717
2     32.9  0.00 3.906963
3     32.9  0.00 3.872717
4     33.4  3.56 3.119292
5     33.5  3.56 3.076484
6     33.0  0.00 3.675799
What I would like is for depth to be on the y-axis, salinity on the x-axis and the silicon concentration to be displayed based on colour.
From what I have read, in order to create a contour plot I need to turn the data I currently have into a matrix (by creating a function?).
Is this something that can be achieved? I'm not sure if I am going about this completely the wrong way, but essentially what I would like is something like this (apologies for image quality):
But with salinity instead of time and silicon concentration instead of temperature.
Thanks,
Kez
Copy-pastable data:
DF <- structure(list(salinity = c(32.9, 32.9, 32.9, 33.4, 33.5, 33,
33, 33.2, 33.3, 33.1, 33.1, 33.1, 33.7, 33.7, 34, 34, 34, 33.6,
34.3, 34.3, 34.8, 35.8, 34.7, 34.4, 34.3, 34.5, 34.4, 34.9, 34.9,
34.9, 34.8, 35, 35, 36, 34.9, 35, 35.2, 35.1, 30.2, 33.4, 34.5,
34.9, 33.4, 33.4, 35.1, 35.1, 34.6, 35.1, 34.43, 34.67, 34.67,
34.96, 34.76, 35.11, 34.14, 34.97, 25.13, 35.16, 35.11, 35.11,
35.11, 35.15), depth = c(0, 0, 0, 3.56, 3.56, 0, 0, 4.493, 4.493,
0, 0, 0, 4.362, 4.362, 9.9, 9.9, 0, 0, 5.826, 5.826, 11.725,
11.725, 11.725, 0, 0, 2.766, 2.766, 9.355, 9.355, 0, 0, 12.46,
12.46, 12.46, 0, 0, 12.427, 12.427, 1.2, 3.6, 6.2, 11, 1.1, 1.1,
4.2, 12.8, 6.9, 10.4, 1.16, 4.5, 4.5, 15.35, 1.13, 8.25, 17.92,
1.05, 14.25, 20.54, 0.97, 0.97, 7.67, 19.6), silicon = c(3.872716895,
3.90696347, 3.872716895, 3.119292237, 3.076484018, 3.675799087, 3.855593607,
3.547374429, 3.299086758, 4.591894977, 4.566210046, 4.857305936, 2.759703196,
2.5456621, 2.597031963, 2.126141553, 2.417237443, 2.331621005, 1.989155251,
1.835045662, 1.946347032, 1.937785388, 1.526826484, 1.638127854, 1.929223744,
1.698059361, 1.894977169, 1.312785388, 1.698059361, 1.329908676, 1.484018265,
1.621004566, 1.175799087, 1.167237443, 1.218607306, 1.038812785, 1.552511416,
1.141552511, 5.329861111, 1.684027778, 2.612847222, 1.840277778, 1.588541667,
1.553819444, 2.682291667, 1.692708333, 1.111111111, 1.935763889, 0.815972222,
1.197916667, 1.197916667, 1.796875, 1.258680556, 1.059027778, 1.25, 0.512152778,
1.336805556, 1.284722222, 0.998263889, 0.928819444, 0.399305556, 1.814236111
)), .Names = c("salinity", "depth", "silicon"), class = "data.frame", row.names = c(NA,
-62L))
EDIT: For anyone interested, with the help of Frank's post below I was able to create the following with my full data set:
You can use the interp function from the akima package to interpolate. Otherwise, you have to determine how to deal with areas that have missing data.
library(akima)
s <- interp(DF$salinity, DF$depth, DF$silicon, duplicate="mean",
xo=seq(min(DF$salinity), max(DF$salinity), length=50),
yo=seq(min(DF$depth), max(DF$depth), length=50))
# you can choose values other than length = 50.
# Note that I used duplicate = "mean", but you can pick your own way of handling duplicates
Then, there are a number of options for plotting, each with lots of room for customization. Here are a few choices:
filled.contour(s, color = terrain.colors)
image(s, col=rainbow(60))
library(fields); image.plot(s)
library(ggplot2)
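# each x value is repeated once per y value; this also works when the grid is not square
ggs <- data.frame(salinity = rep(s$x, each=length(s$y)), depth = rep(s$y, times=length(s$x)), silicon = as.vector(t(s$z)))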
p <- ggplot(ggs, aes(salinity, depth, fill=silicon))
p + geom_raster() + scale_fill_continuous(low="green", high="red") + theme_bw()
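The matrix-to-long-format step can also be done with expand.grid(), which avoids the transpose. The sketch below checks the alignment on a tiny made-up grid (`s_toy` is a hypothetical stand-in with the same x, y, z layout as the list returned by interp(), where z[i, j] belongs to x[i] and y[j]):

```r
# Hypothetical 3 x 2 grid: z[i, j] = x[i] + y[j], so alignment is easy to verify
s_toy <- list(x = c(1, 2, 3), y = c(10, 20),
              z = outer(c(1, 2, 3), c(10, 20), "+"))

# expand.grid() varies its first argument fastest, which matches the
# column-major order of as.vector(z)
long <- expand.grid(salinity = s_toy$x, depth = s_toy$y)
long$silicon <- as.vector(s_toy$z)
```

Each row of `long` then pairs one salinity and depth value with the matching cell of z.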
