The Problem
I'm trying to find a solution to overcome a deficient experimental design in establishing sampling points. The aim is to subset the original dataset, forcing sampling points stratification based on 2 factors with several levels.
I need a general formulation of the problem that may allow me to redefine the set of criteria levels.
Note
I've found examples of subseting tables based on criteria, the most relevant is a post from Brian Diggs but I cannot find a general way to apply that solution to my particular case.
The Dataset
My data.frame have 3 columns, sample id and two factors (f1 and f2).
Criteria are based on interval of values for f1 and f2.
dat <- structure(list(id = 1:203, f1 = c(22, 20.8, 20.7, 22, 12.1, 8,
20.6, 22, 22, 21.6, 0, 22, 21.4, 15.9, 21.2, 19.1, 12.5, 16.6,
14, 21.2, 14.7, 20.7, 20.5, 5.4, 19.1, 18.9, 22, 22, 22, 0, 0,
22, 1.3, 1, 0, 9.4, 7.9, 14.5, 0, 1.5, 0, 20.3, 18, 17.3, 1,
22, 0, 15, 17.9, 4.3, 19.5, 21.2, 21.2, 14.6, 2.3, 0, 6.7, 17.9,
9.5, 19, 21.6, 16.6, 11.7, 13.7, 1.5, 1, 7.6, 3.7, 18.5, 13.5,
20.9, 18.2, 11.5, 7.3, 6.5, 21.1, 22, 20.5, 20.5, 20, 16.2, 18.6,
22, 15.1, 14.4, 10.8, 17.1, 5.7, 15.1, 12.8, 14.5, 8.8, 16.8,
18.7, 1, 6.3, 1.8, 14.6, 22, 16.2, 12.9, 9.1, 2, 7.6, 7, 11.7,
1, 1, 9.6, 11, 2, 2, 14, 14.9, 7.8, 11.4, 8.3, 7.6, 9.1, 4.5,
18, 11.4, 3.1, 4.3, 9.3, 8.1, 1.4, 5.2, 14.7, 3.6, 5, 2.7, 10.3,
11.3, 17.9, 5.2, 1, 1.5, 13.2, 0, 1, 7.4, 1.7, 11.5, 20.2, 0,
14.7, 17, 15.2, 22, 22, 22, 17.2, 15.3, 10.9, 18.7, 11.2, 18.5,
20.3, 21, 20.8, 15, 21, 16.9, 18.5, 18.5, 10.3, 12.6, 15, 19.8,
21, 17.2, 16.3, 18.3, 10.3, 17.8, 11.2, 1.5, 1, 0, 1, 14, 19.1,
6.1, 19.2, 17.1, 14.5, 18.4, 22, 20.3, 6, 13, 18.3, 8.5, 15.3,
10.6, 7.2, 6.2, 1, 7.9, 2, 20, 16.3), f2 = c(100, 100, 92.9,
38.5, 100, 90.9, 100, 100, 100, 91.7, 0, 100, 71.4, 100, 100,
53.8, 28.6, 91.7, 100, 100, 64.3, 100, 92.9, 78.6, 100, 100,
27.3, 83.3, 14.3, 0, 0, 9.1, 23.1, 12.5, 0, 100, 81.8, 100, 0,
15.4, 0, 83.3, 100, 75, 7.1, 81.8, 0, 21.4, 84.6, 25, 80, 90.9,
100, 71.4, 50, 0, 46.2, 90.9, 14.3, 66.7, 90.9, 84.6, 46.2, 91.7,
33.3, 7.7, 71.4, 27.3, 46.2, 100, 100, 100, 60, 54.5, 46.2, 53.8,
91.7, 100, 100, 66.7, 45.5, 57.1, 15.4, 75, 75, 76.9, 53.8, 25,
90.9, 84.6, 91.7, 90.9, 100, 54.5, 23.1, 63.6, 30.8, 90.9, 92.9,
100, 92.3, 90.9, 12.5, 38.5, 15.4, 84.6, 27.3, 7.1, 75, 21.4,
7.7, 15.4, 84.6, 100, 69.2, 63.6, 64.3, 53.8, 92.3, 33.3, 11.1,
61.5, 66.7, 23.1, 85.7, 81.8, 41.7, 69.2, 76.9, 38.5, 9.1, 23.1,
85.7, 90, 100, 100, 14.3, 36.4, 84.6, 0, 7.7, 61.5, 25, 50, 100,
0, 63.6, 36.4, 76.9, 100, 100, 100, 100, 90.9, 100, 100, 100,
100, 100, 83.3, 100, 100, 100, 100, 50, 54.5, 71.4, 100, 85.7,
100, 75, 100, 76.9, 83.3, 100, 92.3, 33.3, 76.9, 33.3, 0, 40,
91.7, 100, 53.8, 100, 100, 100, 100, 100, 92.3, 76.9, 23.1, 84.6,
33.3, 100, 92.3, 46.2, 100, 9.1, 53.8, 7.7, 20, 42.9)), .Names = c("id",
"f1", "f2"), class = "data.frame", row.names = c(NA, -203L))
The expected output
Sampling points should ideally be grouped following a crossed design (it is not a complete factorial design).
For Factor f1: 0, 1-15, 30-60, 80-95, 100
For Factor f2: 0, 5-10, 15-20
I need to find points given all combinations of f1 and f2 intervals, something like this fashion:
gr <- expand.grid(f1=c('0', '1-15', '30-60', '80-95', '100'),
f2=c('0', '5-10', '15-20'))
> gr
f1 f2
1 0 0
2 1-15 0
3 30-60 0
4 80-95 0
5 100 0
6 0 5-10
7 1-15 5-10
8 30-60 5-10
9 80-95 5-10
10 100 5-10
11 0 15-20
12 1-15 15-20
13 30-60 15-20
14 80-95 15-20
15 100 15-20
The solution should split dat based on lines of gr.
This is not a complete factorial design since not all combinations will fulfill this particular criteria combination but it is important to identify NA's as well.
Any help will be appreciated. Please let me know if I'm providing sufficient information.
Use cut, to split f1 and f2 into factors based on your breakpoints, paste the factor together, and then split based on the combined factor.
dat$f1.group<-cut(dat$f1,c(0,1,15,30,60,80,90,95,100))
dat$f2.group<-cut(dat$f1,c(0,5,10,15,20))
gr<-expand.grid(levels(dat$f1.group),levels(dat$f2.group))
names(gr)<-c('f1.group','f2.group')
gr$combined = paste(gr$f1.group,gr$f2.group)
dat<-merge(gr,dat)[c('id','f1','f2','combined')]
split(dat,dat$combined)
That will get you a list of data.frame, with one element for each combo defined in gr. You can them easily sample by these strata.
Related
I am trying to calculate the value of x where y = 0. I could able to do it for single x using the following code
lm.model <- lm(y ~ x)
cc <- coef(lm.model)
f <- function(x) cc[2]*x + cc[1]
plot(x, y)
abline(coef(lm.model))
abline(h=0, col="blue")
(threshold <- uniroot(f, interval = c(0, 100))$root)
abline(v=threshold, col="blue")
x = c(33.05, 14.22, 15.35, 13.52, 8.7, 13.73, 8.28, 21.02, 9.97,
11.98, 12.87, 5.05, 11.23, 11.65, 10.05, 12.58, 13.88, 9.66,
4.62, 4.56, 5.35, 3.7, 3.29, 4.87, 3.75, 6.55, 4.51, 7.77, 4.7,
4.18, 25.14, 18.08, 10.41)
y = c(16.22699279, 15.78620732, 9.656361014, -17.32805679, -20.85685895,
7.601993251, -4.776053714, 10.50972236, 3.853479771, 7.713563136,
8.579366561, 14.16989395, 7.484692081, -1.2807472, -12.13759458,
-0.29138513, -5.238157067, -2.033194068, -38.12157566, -33.61912493,
-9.763657548, -0.240863712, 9.090638907, 7.345492608, 6.949676888,
-19.94866471, 0.995659732, -1.162616185, 5.497998429, 1.656653092,
2.116687436, 22.23175649, 10.33039543)
But I have multiple x variables. Now how can I apply it for multiple x variables at a time?
Here is an example data
df = structure(list(y = c(16.2269927925813, 15.7862073196372, 9.65636101412767,
-17.3280567922775, -20.8568589521297, 7.6019932507973, -4.77605371404423,
10.5097223644541, 3.85347977129367, 7.71356313645697, 8.57936656085966,
14.1698939499927, 7.4846920807874, -1.28074719969249, -12.1375945758837,
-0.291385130176774, -5.23815706681139, -2.03319406769161, -38.1215756639013,
-33.6191249261727, -9.76365754821171, -0.240863712421707, 9.09063890677045,
7.34549260800693, 6.94967688778232, -19.9486647079697, 0.995659731521127,
-1.16261618452931, 5.49799842947493, 1.65665309209479, 2.11668743610013,
22.2317564898722, 10.3303954315884), x1 = c(8.56, 8.66, 9.09,
8.36, 8.3, 8.63, 8.78, 8.44, 8.34, 8.46, 8.33, 8.19, 8.58, 8.65,
8.75, 8.34, 8.77, 9.06, 9.31, 9.11, 9.26, 9.81, 9.68, 9.79, 9.26,
9.53, 8.89, 8.89, 10.37, 9.58, 10.27, 10.16, 10.27), x2 = c(164,
328.3, 0, 590.2, 406.6, 188.4, 423.8, 355.3, 337.6, 0, 0, 200.1,
0, 315.8, 547.5, 225.6, 655.7, 387.2, 0, 487.4, 400.4, 0, 234.9,
275.5, 0, 0, 613.2, 207.4, 184.4, 162.8, 220, 174.8, 0), x3 = c(4517.7,
2953.4, 2899.3, 2573.8, 3310.7, 3880.3, 3016.8, 3552.3, 2960.1,
323, 2638.5, 3343.1, 3274.7, 3218, 3268.3, 3507.9, 3709.2, 3537.5,
2634.4, 1964.6, 3333.7, 2809.7, 3326.8, 3524.5, 3893.9, 3166.7,
3992.1, 4324.7, 3077.9, 3069.9, 4218.9, 3897.4, 2693.9), x4 = c(14.7,
14.5, 15.5, 17, 16.2, 15.9, 15.7, 15.3, 13.5, 14, 15.4, 16.2,
15.6, 15.7, 15.1, 15.8, 15.3, 14.9, 15.7, 16.3, 15.21000004,
16.7, 15.6, 16.2, 15.7, 16.3, 17.3, 16.9, 15.7, 14.9, 13.81999969,
14.90754509, 12.42847157), x5 = c(28.3, 29.1, 28.3, 29.1, 28.7,
29.3, 28.9, 28.4, 29.3, 29.3, 29.1, 29, 29.9, 29.5, 28.4, 30.3,
29.1, 29.1, 29, 29.5, 29.3, 28.5, 29, 28.7, 29.4, 28.8, 29.2,
30.1, 28.3, 28.7, 24.96999931, 25.79496384, 25.3072052), x6 = c(33.05,
14.22, 15.35, 13.52, 8.7, 13.73, 8.28, 21.02, 9.97, 11.98, 12.87,
5.05, 11.23, 11.65, 10.05, 12.58, 13.88, 9.66, 4.62, 4.56, 5.35,
3.7, 3.29, 4.87, 3.75, 6.55, 4.51, 7.77, 4.7, 4.18, 25.14, 18.08,
10.41), x7 = c(13.8425, 11.1175, 8.95, 13.5375, 5.4025, 13.5625,
13.735, 14.14, 8.0875, 5.565, 12.255, 3.3075, 6.345, 4.8125,
4.0325, 11.475, 10.32, 17.71, 2.3375, 3.92, 5.7, 2.42, 8.3075,
7.4725, 7.7925, 10.8725, 8.005, 11.7475, 13.405, 8.425, 47.155,
26.1, 6.6675), x8 = c(0.95, 3.01, 1.92, 1.51, 2.61, 1.32, 3.55,
1.21, 2.14, 1.1, 1.32, 0.76, 1.34, 5.41, 9.38, 6.55, 4.44, 7.37,
9.84, 12.68, 15.52, 23.01, 18.59, 21.64, 19.69, 25.22, 22.38,
25.03, 37.42, 22.26, 2.1, 3.01, 0.82), x9 = c(26.2, 25.8, 25.8,
25.5, 26, 24.7, 22.9, 25.3, 26.3, 26.1, 22.5, 25.9, 26.4, 25.2,
25.8, 25.4, 25, 23.2, 26.4, 25.8, 26.6, 26.2, 25.8, 26.8, 25,
25.4, 25.6, 26.1, 25.7, 25.8, 24.78000069, 24.98148918, 26.39899826
), x10 = c(35.4, 39, 37.5, 36.4, 37.1, 36.2, 37.3, 36.4, 37.5,
36, 36.6, 35.6, 37.3, 38.3, 37, 37.5, 37.5, 39.6, 37.8, 36.8,
36.6, 38.4, 38.9, 38.4, 38.4, 37.7, 39.1, 37.7, 37.8, 39.4, 36.25,
35.57029343, 35.57416534), x11 = c(653.86191565, 383.1, 457.1,
591.4, 549.2, 475.2, 626.4, 308.8, 652.4, 77, 380.9, 530.5, 393,
712.1, 623.4, 515.7, 706.4, 713.4, 343.7, 559.5, 630.1, 292.3,
578.6, 628.88904574, 480.96959685, 591.35600287, 804.8, 419.6,
403.7, 361.2, 515.07101438, 434.66682808, 299.9531298), x12 = c(163.9793854,
167.9, 135, 215.8, 213, 188.4, 260.6, 191.8, 337.6, 55, 147.6,
200.1, 140.7, 315.8, 189.6, 225.6, 469.3, 201.8, 140, 297.2,
204.6, 142.5, 234.9, 275.494751, 153.7796173, 147.6174622, 433.6,
207.4, 184.4, 162.8, 219.9721832, 174.8355713, 106.8163605),
x13 = c(92, 67, 67, 50, 70, 87, 68, 86, 70, 11, 66, 79, 70,
61, 75, 78, 78, 77, 69, 35, 72, 76, 69, 84, 93, 73, 81, 99,
80, 76, 101, 86, 80), x14 = c(70, 42, 46, 34, 55, 60, 51,
65, 49, 1, 40, 56, 54, 41, 48, 57, 46, 50, 41, 22, 47, 47,
49, 57, 70, 52, 56, 70, 48, 50, 74, 66, 47), x15 = c(21,
12, 13, 10, 14, 16, 10, 13, 10, 0, 9, 14, 16, 20, 14, 14,
13, 15, 10, 7, 17, 8, 14, 14, 14, 11, 17, 19, 12, 11, 17,
17, 9), x16 = c(1076.8, 783.7, 711.8, 1041.9, 957.4, 939.3,
662.9, 768.1, 770.3, 0, 399.2, 606.2, 724.1, 960.8, 943.8,
737.8, 1477.4, 1191.7, 371.3, 956.4, 1251.7, 345.7, 1210.7,
845, 598.1, 821.7, 1310.6, 940.1, 581, 520, 313.5, 606.8,
201.2), x17 = c(163.9793854, 167.9, 128.4, 215.8, 213, 188.4,
260.6, 191.8, 337.6, 55, 147.6, 200.1, 140.7, 315.8, 189.6,
225.6, 469.3, 201.8, 140, 297.2, 204.6, 142.5, 234.9, 157.7472534,
153.7796173, 147.6174622, 133.1873627, 150.2, 184.4, 162.8,
219.9721832, 174.8355713, 106.8163605)), row.names = c(NA,
33L), class = "data.frame")
You can use purrr::map to loop through every x.
library(dplyr)
library(purrr)
thresholds <- df %>%
select(-y) %>%
map_dbl(function(x){
lm.model <- lm(df$y ~ x)
cc <- coef(lm.model)
f <- function(x) cc[2]*x + cc[1]
plot(x, df$y)
abline(coef(lm.model))
abline(h=0, col="blue")
threshold <- tryCatch(uniroot(f, interval = c(0, 100))$root, error = function(cond){NA})
abline(v=threshold, col="blue")
return(threshold)})
For some x's, uniroot(f, interval = c(0, 100))$root yields an error: Error
in uniroot(f, interval = c(0, 100)) : f() values at end points not of opposite sign
So the tryCatch is used to return NA for the threshold associated with that x, instead of breaking the code.
Result:
> thresholds
x1 x2 x3 x4 x5 x6 x7 x8 x9
9.023314 NA NA 15.459841 28.727293 10.514728 10.493577 9.669244 25.522480
x10 x11 x12 x13 x14 x15 x16 x17
37.370852 NA NA 73.398380 50.239522 13.022176 NA NA
Edit: binding the graphs together
graphs <- df %>%
select(-y) %>%
imap(function(x, name){
lm.model <- lm(df$y ~ x)
cc <- coef(lm.model)
f <- function(x) cc[2]*x + cc[1]
threshold <- tryCatch(uniroot(f, interval = c(0, 100))$root, error = function(cond){NA})
g = ggplot(mapping = aes(x)) +
geom_point(aes(y = df$y)) +
geom_line(aes(y = cc[2]*x + cc[1])) +
geom_hline(yintercept = 0, color = "blue") +
labs(title = name, y = "y", x = "x")
if(!is.na(threshold)) {g = g + geom_vline(xintercept = threshold, color = "blue")}
return(g)})
ggpubr::ggarrange(plotlist = graphs)
Result:
Obs2: i assumed that you don't need the thresholds vector defined in the first attempt, if you still need it, it's easy to add it back to the answer
Obs1: let me know if you want any aesthetic change on the graphs
Edit 2: graph with common axis
To use a common axis is better to use facets instead of ggarrange. In order to do that, we need to first save the fitted data for all variables, then plot, so the ggplot expression goes out of the map. Also, we now save the treshold info.
graphs <- df %>%
select(-y) %>%
imap(function(x, name){
lm.model <- lm(df$y ~ x)
cc <- coef(lm.model)
f <- function(x) cc[2]*x + cc[1]
threshold <- tryCatch(uniroot(f, interval = c(0, 100))$root, error = function(cond){NA})
list(threshold = threshold,
data = tibble(y = df$y, "name" = name, "x" = x, "fitted" = cc[2]*x + cc[1]))})
Now we use the purrr::transpose() function to build a dataset for the data and other for the treshold. This functions does something like:
list(x1 = list(treshold, data), x2 = ...) >>> list(treshold = list(x1, x2, ...), data = list(x1, x2, ...))
df2 = graphs %>%
transpose() %>%
`$`(data) %>%
bind_rows() %>%
mutate(name = factor(name, paste0("x", 1:17)))
thresholds = graphs %>%
transpose() %>%
`$`(threshold) %>%
{tibble(int = as.numeric(.), name = names(.))} #both datasets have the name column, to be used inside `facet_wrap()`
ggplot(df2, aes(x)) +
geom_point(aes(y = y)) +
geom_line(aes(y = fitted)) +
facet_wrap(vars(name), scales = "free_x") +
geom_hline(yintercept = 0, color = "blue") +
geom_vline(aes(xintercept = int), thresholds, color = "blue", linetype = 2) +
geom_label(aes(label = round(int, 2), x = int*1, y = min(df$y)), thresholds, size = 4)
Result:
Obs1: the labels position and size can be easily altered. Another option is using the thresholds as a axis break
Obs2: this method can be slow for large datasets. A more efficient option is to save only threshold and cc inside map, and then building the dataset after it.
I fit length and weight data to a log-log linear model and created a regression line where the response has been back transformed to the original scale.
Next, I would like to add two lines to the scatterplot representing upper and lower 95% confidence intervals.
I'm no expert in R or stats, but I'm trying to get better! What might be the best way to go about doing this? Any help would be greatly appreciated.
NOTE: the length and weight data used in this example is from the 'ChinookArg' data frame from AFS package.
library(ggplot2)
df <- data.frame(tl = c(120.1, 115, 111.2, 110.2, 110, 109.7, 105, 100.1, 98, 92.1,
99, 97.9, 94.9, 92.2, 94.9, 92.7, 92.9, 89.9, 88.7, 92, 87.7,
85.1, 85.1, 82.9, 82.9, 83.8, 82.2, 81, 78.8, 78.8, 74.9, 68.1,
66.8, 59.9, 113.8, 112.9, 108.1, 109.7, 103.7, 103.2, 99.9, 99,
103, 103, 99.4, 97.9, 97.2, 96.7, 95.1, 92.2, 93, 92.2, 91.2,
88.1, 94.6, 94.3, 92.5, 88.1, 89.8, 88.8, 87.9, 86, 87.4, 68.5,
80.5, 79, 77.6, 72.8, 77.3, 78.8, 74.5, 72.6, 73.3, 74, 75.2,
76.6, 72, 70.6, 71.8, 70.2, 68.2, 67.3, 67.7, 65.9, 66.3, 64.7,
63, 62.7, 64.2, 61.3, 64.2, 60.1, 59.4, 57.7, 57.4, 56.5, 54.1,
54.1, 56, 52, 50.8, 49.3, 43.8, 39.8, 39, 35.4, 36.9, 32.1, 31.9,
29.2, 25.2, 18),
w = c(17.9, 17.2, 16.8, 15.8, 14.3, 13.8, 12.8, 11.7, 12.8, 14.8,
9.7, 7.3, 7.8, 9.1, 11.8, 11.3, 11.9, 11.8, 10.8, 5.9, 5.9, 9,
9.8, 8.7, 7.8, 5.7, 6.7, 8.7, 8.4, 7.9, 6.5, 7.3, 5.2, 3.9, 15,
16, 13.3, 11.3, 10.9, 9.8, 9.9, 10.3, 12.6, 10, 10.2, 8.3, 7.9,
8.9, 9.4, 8.9, 8.1, 8.3, 8.3, 8.3, 6.2, 6.6, 6.6, 8.3, 6.3, 6.3,
6.8, 6.8, 5.5, 5, 6.1, 6.6, 7.2, 6.1, 4.1, 4.8, 4.6, 5, 3.7,
3, 2.5, 3.1, 2.4, 2.5, 3, 3.7, 3.5, 2.9, 2.4, 2.3, 3.5, 3.6,
3.7, 3, 2.5, 2.4, 1.6, 1.4, 2.5, 2.6, 1.9, 1.5, 1.8, 2.8, 3.1,
1.4, 1.8, 1, 0.7, 0.7, 0.7, 0.5, 3, 2.8, 0.3, 0.3, 0.3, 0.1))
model<- lm(log(w)~(log(tl)), data = df)
nmodel<- data.frame(tl = seq(from = min(df$tl), to = max(df$tl), length= 100))
nmodel$predicted<- exp(predict(model, nmodel, type = "response"))
plot <- ggplot()+
geom_line(aes(x = tl, y = predicted), col = "black", data = nmodel)+
geom_point(data = df, aes(x=tl, y=w))+
xlab("Length")+
ylab("Weight")
plot
Just add the interval argument to predict() and specify you want the confidence interval.
nmodel<- data.frame(tl = seq(from = min(df$tl), to = max(df$tl), length= 100))
model_preds <- exp(predict(model, nmodel, type = "response", interval = "confidence"))
nmodel <- cbind(nmodel, model_preds)
plot <- ggplot()+
geom_line(aes(x = tl, y = fit), col = "black", data = nmodel)+
geom_line(aes(x = tl, y = lwr), col = "red", data = nmodel)+
geom_line(aes(x = tl, y = upr), col = "red", data = nmodel)+
geom_point(data = df, aes(x=tl, y=w))+
xlab("Length")+
ylab("Weight")
plot
Note that I removed the predicted column, because when you run the predict() function as shown above, it also provides a fit column, which amounts to the same thing.
I have a problem with comparing two sets of curves by using the Kolmogorow-Smirnow-test.
What I would like the program to do, is to compare each variation of Curve 1 with each variation of Curve 2. To accomplish that, I have tried to build a for-loop that iterates through Curve 1, and within that loop another loop that iterates through Curve 2.
Unfortunately, when executing the code, I get an error message about
"not enough x-Data“
When I try running the test by comparing one variation of each curve manually, it works, so I think the problem is the combination of the two loops and the KS-test.
If anyone has experienced a similar error and was able to solve the issue, I would highly appreciate any advice on how to fix it. Thank you!
Example data.frames:
Kurve1 <- structure(list(Punkte = 1:21,
Trial.1 = c(105.5, 85.3, 63.1, 54.9, 42, 34.1, 30.7,
24.2, 20.1, 15.7, 14, 11, 9.3, 7.2, 6.6,
5.3, 4.2, 3.3, 2.6, 1.8, 0.9),
Trial.2 = c(103.8, 85.2, 64.3, 54.1, 41.8, 35.9, 29,
23.7, 20.2, 15.9, 13.5, 11, 9.3, 7.3, 6.4,
5.5, 4.3, 3.4, 2.5, 1.9, 0.9),
Trial.3 = c(104.8, 87.2, 64.9, 52.8, 40.8, 35.6, 29.1,
24.5, 20.4, 16.2, 13.7, 11.2, 9.2, 7.5,
6.4, 5.5, 4.2, 3.5, 2.5, 1.8, 0.9),
Trial.4 = c(106.9, 83.9, 67.1, 55.1, 44.1, 34.1, 29.3,
22.9, 19.4, 16.7, 13.6, 10.8, 9.4, 7.4,
6.1, 5.6, 4.4, 3.5, 2.4, 1.9, 0.9),
Trial.5 = c(104.8, 84.3, 68.7, 54.8, 45.3, 35.2, 28.9,
23.1, 20.1, 16.9, 13.3, 11, 9.6, 7.1, 6.3,
5.4, 4.5, 3.4, 2.3, 2, 0.9)),
class = "data.frame", row.names = c(NA, -21L))
Kurve2 <- structure(list(Punkte = 1:21,
Trial.1 = c(103.5, 81.2, 66.2, 54.5, 45.1, 39.1, 30.9,
27, 21.9, 19.3, 16.6, 14.9, 12.9, 11, 10.1,
9.2, 8, 7.1, 6.3, 6.2, 5),
Trial.2 = c(104, 81, 66.9, 55.2, 46, 38.7, 31.2, 27.3,
22.3, 20, 17.2, 15.2, 12.9, 11.1, 10.2,
9.1, 8, 7.1, 6.4, 5.9, 5),
Trial.3 = c(103.9, 81.9, 67.2, 53.8, 45.4, 38.5, 31.5,
26.8, 22.2, 19.8, 17.4, 15.1, 13, 10.9,
10.1, 9.2, 8.1, 7.1, 6.4, 6, 4.9),
Trial.4 = c(104.2, 84.1, 68.7, 55.4, 45.1, 36.3, 32,
26.9, 22.8, 19.8, 16.8, 14.8, 13.2, 10.9,
10.3, 9.1, 8.2, 7.2, 6.3, 6.1, 5),
Trial.5 = c(103.8, 83.2, 69.2, 55.7, 44.8, 36.4, 31.4,
26.7, 22.1, 18.9, 16.9, 14.4, 13, 11.1,
10.2, 9, 7.9, 7, 6.3, 6.1, 5.1)),
class = "data.frame", row.names = c(NA, -21L))
The code I used for the loop:
for(i in 1:ncol(Kurve1)){
for(j in 1:ncol(Kurve2)){
ks.test(Kurve1$Trial.[i], Kurve2$Trial.[j], alternative = "greater")
}
}
This will work:
for(i in 1:(ncol(Kurve1) - 2)){
for(j in (i + 1):(ncol(Kurve2) - 1)){
print(paste0("Trial.", i, " - Trial.", j))
ks_result <- ks.test(Kurve1[, paste0("Trial.", i)],
Kurve2[, paste0("Trial.", j)],
alternative="greater")
print(ks_result)
}
}
Explanation:
As it is doesn't make sense to run the KS test for the same column, and also doesn't make sense to run for both Trial.1 ~ Trial.2 and Trial.2 ~ Trial.1, etc., you have to run your outer for loop from 1 to the last but one ((ncol(Kurve1) - 2)) index for Trial.* columns, and you have to run your inner for loop from the next index as the outer loop has (i + 1) to the last index ((ncol(Kurve2) - 1)) for Trial.* columns.
You can not paste strings like Trial.[i], you have to use the paste function for that. As with that the Kurve1$paste0("Trial.", i) notation not working, you have to use the extract operator [ to get the column you need (Kurve1[, paste0("Trial.", i)])
As in a (nested) for loop the ks.test runs silently, a have added a print to be able to see the results. I have also added a line print(paste0("Trial.", i, " - Trial.", j)) to tag the actual result with the columns for which it belongs.
This question already has answers here:
Pasting two vectors with combinations of all vectors' elements
(8 answers)
Closed 3 years ago.
I'd like to create a vector/list containing a series of variables that are the result of the combination of two vectors containing (i) specific variable names and (ii) specific variable ID (same for all the variables).
Here are reported a short version of the two vectors:
the variable names:
names<-c("XPTS", "TROCK", "JFSG")
and the variable IDs:
values<-c(1, 1.1, 1.2, 1.3, 2, 2.1, 2.2, 2.3, 3, 3.1, 3.2, 3.3, 4, 4.1, 4.2, 4.3, 5, 5.1, 5.2, 5.3, 6, 6.1, 6.2, 6.3, 7, 7.1, 7.2, 7.3, 8, 8.1, 8.2, 8.3, 9, 9.1, 9.2, 9.3, 10, 10.1, 10.2, 10.3, 11, 11.1, 11.2, 11.3, 12, 12.1, 12.2, 12.3, 13, 13.1, 13.2, 13.3, 14, 14.1, 14.2, 14.3, 15, 15.1, 15.2, 15.3, 16, 16.1, 16.2, 16.3, 17, 17.1, 17.2, 17.3, 18, 18.1, 18.2, 18.3, 19, 19.1, 19.2, 19.3, 20, 20.1, 20.2, 20.3, 21, 21.1, 21.2, 21.3, 22, 22.1, 22.2, 22.3, 23, 23.1, 23.2, 23.3, 24, 24.1, 24.2, 24.3, 25, 25.1, 25.2, 25.3, 26, 26.1, 26.2, 26.3, 27, 27.1, 27.2, 27.3, 28, 28.1, 28.2, 28.3, 29, 29.1, 29.2, 29.3, 30, 30.1, 30.2, 30.3, 31, 31.1, 31.2, 31.3, 32, 32.1, 32.2, 32.3, 33, 33.1, 33.2, 33.3, 34, 34.1, 34.2, 34.3, 35, 35.1, 35.2, 35.3, 36, 36.1, 36.2, 36.3, 37, 37.1, 37.2, 37.3, 38, 38.1, 38.2, 38.3, 39, 39.1, 39.2, 39.3, 40, 40.1, 40.2, 40.3, 41, 41.1, 41.2, 41.3, 42, 42.1, 42.2, 42.3, 43, 43.1, 43.2, 43.3, 44, 44.1, 44.2, 44.3, 45, 45.1, 45.2, 45.3, 46, 46.1, 46.2, 46.3, 47, 47.1, 47.2, 47.3, 48, 48.1, 48.2, 48.3, 49, 49.1, 49.2, 49.3, 50)
I'd live to obtain a list of variable names as follows:
"XPTS_1","XPTS_1.1","XPTS_1.2", ..., "XPTS_49.3","XPTS_50","TROCK_1","TROCK_1.1",...,"TROCK_49.3","TROCK_50","JFSG_1","JFSG_1.1",...,"JFSG_49.3","JFSG_50"
The variable names are not only those reported but might change, so I'd like to have a dynamic loop for dealing with it. The one I wrote, as follows, doesn't fit my purpose:
variables_ID<-for (i in 1:length(values)) {
paste(names, values[i], sep = "_")
}
since I get only
"XPTS_50" "TROCK_50" "JFSG_50"
We can use outer
out1 <- c(t(outer(names, values, paste, sep="_")))
NOTE: transposed just to show that we get identical results with rep
Or use rep to replicate the 'names' and then paste
out2 <- paste(rep(names, each = length(values)), values, sep="_")
all.equal(out1, out2)
#[1] TRUE
head(out1)
#[1] "XPTS_1" "XPTS_1.1" "XPTS_1.2" "XPTS_1.3" "XPTS_2" "XPTS_2.1"
tail(out1)
#[1] "JFSG_48.3" "JFSG_49" "JFSG_49.1" "JFSG_49.2" "JFSG_49.3" "JFSG_50"
Or using CJ
library(data.table)
CJ(names, values)[, paste(names, values, sep="_")]
Or with tidyverse
library(tidyverse)
crossing(names, values) %>%
unite(names, names, values) %>%
pull(names)
I want to plot 12 different monthly time series data over the year range 1984-2018. I want to get a graph like the following but graphics::plot() always gives me an error - what's the solution?
Code:
datats <- ts(data, start=c(1984,1), end=c(2018,12), frequency=12)
plot(datats, ylab="PPT_MM")
Error:
Error in plotts(x = x, y = y, plot.type = plot.type, xy.labels = xy.labels, :
cannot plot more than 10 series as "multiple"
Plot: image of plot
My data:
data<-structure(list(Year = 1984:2018, X1 = c(24.2, 5.3, 59, 31.7,
93.7, 133.2, 42.8, 29.3, 12, 256.5, 0, 28.5, 22.4, 3.8, 295.4,
16.1, 7, 244.8, 79.5, 31.6, 45.1, 12.2, 14, 61.4, 117.4, 74.2,
51, 9.1, 0, 50.4, 0, 13.3, 45.1, 20.7, 25.7), X2 = c(0, 110.5,
0, 0, 13.8, 4, 23.2, 13.3, 6.1, 84.9, 103.5, 83.3, 56.5, 0, 219.4,
2.2, 0, 0, 7.5, 17.2, 47.9, 19.2, 20.5, 44.8, 7.3, 26.3, 64.1,
71.8, 4.6, 0, 80, 16.8, 65.1, 26.2, 2.8), X3 = c(3.4, 78.7, 59.4,
23.3, 109.3, 51.1, 217.3, 43.5, 5, 60.9, 75.3, 150.1, 73.7, 46,
118, 121, 52.5, 113, 98.9, 115.2, 83.1, 101.7, 30, 20.5, 73,
3.2, 241.9, 209.8, 1.6, 113.1, 207.4, 39, 27.4, 15.2, 91.6),
X4 = c(51.4, 278.3, 192.4, 56.7, 203.7, 193.7, 251.5, 80.5,
192.9, 20.8, 82.4, 49.6, 96.4, 208.5, 123, 113.8, 68.5, 88.9,
120.4, 153.2, 121.5, 165.1, 117, 143.9, 129.3, 145.4, 107.9,
1, 286.4, 209.3, 44, 55.3, 82.4, 40.3, 83.4), X5 = c(0.6,
84.6, 72.7, 39.4, 23.6, 37, 64.9, 57.5, 32, 13.7, 29.8, 33.1,
42.8, 21.2, 162.6, 9.8, 15.6, 15.3, 126.6, 133.8, 59.8, 100.5,
70, 41.7, 4.5, 29.7, 120.9, 37.7, 112.5, 37.7, 14.8, 34.3,
23.2, 35.6, 22.7), X6 = c(0, 0, 5.7, 61.9, 10.1, 0, 5.1,
3.1, 1.1, 6.3, 8.2, 0.9, 19.3, 0.5, 38.7, 5, 6.2, 4.3, 1.4,
0, 0.7, 0.9, 2, 2.7, 0.3, 5.2, 1.7, 0, 36.9, 0.3, 9.7, 26.3,
5.7, 2.2, 2.7), X7 = c(7, 1.4, 0.4, 3.6, 0, 3.8, 0, 1.4,
6.8, 0.5, 3.3, 4.1, 2.2, 1.2, 15.4, 2.4, 0.3, 4.3, 0, 0,
0, 1, 1.9, 26.8, 1.3, 0, 2.7, 3.4, 0, 2.7, 13, 6.1, 4.8,
5, 3.5), X8 = c(6.4, 1.9, 0.3, 11.6, 3, 14.6, 3.4, 8.9, 0,
3.1, 9.9, 3.2, 2.2, 4.3, 2.9, 4.9, 1.8, 2.5, 0.2, 26.3, 0,
3.5, 4.7, 5.2, 0.2, 0, 1.3, 0.7, 11.8, 0.6, 32.4, 4.1, 3.8,
7.2, 2.8), X9 = c(15.7, 0.2, 0, 0, 15.4, 0.3, 0, 3.4, 0.7,
0.6, 4.3, 5.1, 0.7, 0, 1.8, 0, 2.3, 0, 8.8, 21.5, 1, 1, 3,
4.3, 9.1, 1.2, 0.6, 5.9, 0, 2.5, 13.6, 3, 3.4, 6.8, 2.22),
X10 = c(154.4, 56, 2.6, 0.3, 33.5, 96.9, 48.8, 46.5, 31,
26, 110.9, 103.7, 0, 83.2, 3.3, 20.6, 41, 7.3, 21.2, 31.8,
47.6, 10.2, 14.5, 18.3, 23.9, 41.3, 32.1, 50.2, 22.3, 64,
25.4, 17.2, 7.4, 23.6, 87.6), X11 = c(211.2, 75.1, 180.8,
93.5, 120.8, 106, 208.5, 119.9, 141.2, 150.8, 406.3, 46,
187.7, 270.3, 113.9, 257, 189.8, 169, 144.3, 121.1, 161.3,
125, 117.2, 127.9, 122.8, 34.4, 116, 180.3, 119.7, 1024.7,
104.8, 104.5, 51.5, 38.2, 206.8), X12 = c(341.4, 121, 127.2,
12, 180.2, 129.5, 110.2, 156.7, 186.2, 67.3, 143.7, 87.6,
1.5, 177.3, 15.8, 108.6, 98.8, 43.6, 182.4, 24.1, 89.5, 100.5,
95, 82.4, 44.9, 127.1, 59.6, 28.2, 73.2, 919.5, 0, 122.9,
51.4, 17.9, 249.6)), class = "data.frame", row.names = c(NA,
-35L))
Function documentation (https://rdrr.io/r/stats/plot.ts.html) gives us some hints...
Apparently it is not possible to plot more than 10 charts on the same panel. Try for example:
datats <- ts(data[,-1], start=c(1984,1), end=c(2018,12), frequency=12)
plot(datats[,c(1:10)], ylab="PPT_MM")
It works! Then to use this method 10 looks like to be a limit.
You can use alternatively some ggplot:
library(reshape2)
library(dplyr)
library(ggplot2)
melt(data, id.vars = "Year") %>%
ggplot(aes(x = Year, y = value , group = 1)) +
geom_line() + geom_point() +
facet_wrap(. ~ variable)