Kolmogorov-Smirnov test in R - For-loop

I have a problem with comparing two sets of curves by using the Kolmogorov-Smirnov test.
What I would like the program to do, is to compare each variation of Curve 1 with each variation of Curve 2. To accomplish that, I have tried to build a for-loop that iterates through Curve 1, and within that loop another loop that iterates through Curve 2.
Unfortunately, when executing the code, I get the error message:
"not enough x-Data"
When I try running the test by comparing one variation of each curve manually, it works, so I think the problem is the combination of the two loops and the KS-test.
If anyone has experienced a similar error and was able to solve the issue, I would highly appreciate any advice on how to fix it. Thank you!
Example data.frames:
Kurve1 <- structure(list(Punkte = 1:21,
Trial.1 = c(105.5, 85.3, 63.1, 54.9, 42, 34.1, 30.7,
24.2, 20.1, 15.7, 14, 11, 9.3, 7.2, 6.6,
5.3, 4.2, 3.3, 2.6, 1.8, 0.9),
Trial.2 = c(103.8, 85.2, 64.3, 54.1, 41.8, 35.9, 29,
23.7, 20.2, 15.9, 13.5, 11, 9.3, 7.3, 6.4,
5.5, 4.3, 3.4, 2.5, 1.9, 0.9),
Trial.3 = c(104.8, 87.2, 64.9, 52.8, 40.8, 35.6, 29.1,
24.5, 20.4, 16.2, 13.7, 11.2, 9.2, 7.5,
6.4, 5.5, 4.2, 3.5, 2.5, 1.8, 0.9),
Trial.4 = c(106.9, 83.9, 67.1, 55.1, 44.1, 34.1, 29.3,
22.9, 19.4, 16.7, 13.6, 10.8, 9.4, 7.4,
6.1, 5.6, 4.4, 3.5, 2.4, 1.9, 0.9),
Trial.5 = c(104.8, 84.3, 68.7, 54.8, 45.3, 35.2, 28.9,
23.1, 20.1, 16.9, 13.3, 11, 9.6, 7.1, 6.3,
5.4, 4.5, 3.4, 2.3, 2, 0.9)),
class = "data.frame", row.names = c(NA, -21L))
Kurve2 <- structure(list(Punkte = 1:21,
Trial.1 = c(103.5, 81.2, 66.2, 54.5, 45.1, 39.1, 30.9,
27, 21.9, 19.3, 16.6, 14.9, 12.9, 11, 10.1,
9.2, 8, 7.1, 6.3, 6.2, 5),
Trial.2 = c(104, 81, 66.9, 55.2, 46, 38.7, 31.2, 27.3,
22.3, 20, 17.2, 15.2, 12.9, 11.1, 10.2,
9.1, 8, 7.1, 6.4, 5.9, 5),
Trial.3 = c(103.9, 81.9, 67.2, 53.8, 45.4, 38.5, 31.5,
26.8, 22.2, 19.8, 17.4, 15.1, 13, 10.9,
10.1, 9.2, 8.1, 7.1, 6.4, 6, 4.9),
Trial.4 = c(104.2, 84.1, 68.7, 55.4, 45.1, 36.3, 32,
26.9, 22.8, 19.8, 16.8, 14.8, 13.2, 10.9,
10.3, 9.1, 8.2, 7.2, 6.3, 6.1, 5),
Trial.5 = c(103.8, 83.2, 69.2, 55.7, 44.8, 36.4, 31.4,
26.7, 22.1, 18.9, 16.9, 14.4, 13, 11.1,
10.2, 9, 7.9, 7, 6.3, 6.1, 5.1)),
class = "data.frame", row.names = c(NA, -21L))
The code I used for the loop:
for(i in 1:ncol(Kurve1)){
  for(j in 1:ncol(Kurve2)){
    ks.test(Kurve1$Trial.[i], Kurve2$Trial.[j], alternative = "greater")
  }
}

This will work:
for(i in 1:(ncol(Kurve1) - 2)){
  for(j in (i + 1):(ncol(Kurve2) - 1)){
    print(paste0("Trial.", i, " - Trial.", j))
    ks_result <- ks.test(Kurve1[, paste0("Trial.", i)],
                         Kurve2[, paste0("Trial.", j)],
                         alternative = "greater")
    print(ks_result)
  }
}
Explanation:
Since it doesn't make sense to run the KS test on the same column, and it also doesn't make sense to run it for both Trial.1 ~ Trial.2 and Trial.2 ~ Trial.1, etc., the outer for loop runs from 1 to the second-to-last index of the Trial.* columns ((ncol(Kurve1) - 2)), and the inner for loop runs from the index after the outer one ((i + 1)) to the last Trial.* index ((ncol(Kurve2) - 1)).
You cannot build column names like Trial.[i]; you have to use the paste0 function for that. Since the Kurve1$paste0("Trial.", i) notation does not work either, you have to use the extraction operator [ to get the column you need: Kurve1[, paste0("Trial.", i)].
Because ks.test runs silently inside a (nested) for loop, I have added a print() call so the results are visible. I have also added the line print(paste0("Trial.", i, " - Trial.", j)) to tag each result with the columns it belongs to.
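If you do want every Curve 1 trial tested against every Curve 2 trial (all 25 pairs), a variation of the same approach, sketched here as an illustration rather than taken from the answer above, collects the D statistics and p-values in a data frame instead of printing them:
# Sketch: run the KS test for every pair of trials and store the results
results <- data.frame()
for (i in 1:(ncol(Kurve1) - 1)) {
  for (j in 1:(ncol(Kurve2) - 1)) {
    ks_result <- ks.test(Kurve1[, paste0("Trial.", i)],
                         Kurve2[, paste0("Trial.", j)],
                         alternative = "greater")
    results <- rbind(results,
                     data.frame(curve1  = paste0("Trial.", i),
                                curve2  = paste0("Trial.", j),
                                D       = unname(ks_result$statistic),
                                p.value = ks_result$p.value))
  }
}
results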


Saving outputs as dataframe in loop

Here I have tried to test my daily climate station data (ordered into multiple columns) for the presence of autocorrelation and to select a trend test method accordingly. The code is intended to:
find the monthly average
test the acf
select a trend test method
save the result of the trend test for each month to Excel.
So far I have managed to write code that does this for each month, but I encountered a problem saving the result as a data frame and exporting it to Excel (Error in Result[k, ] <- Outcome : incorrect number of subscripts on matrix). Can anyone help me? I have attached sample data and the code I have written (one generic way of collecting the results is sketched after the code below).
Data:
structure(list(Date = structure(c(5479, 5480, 5481, 5482, 5483,
5484, 5485, 5486, 5487, 5488, 5489, 5490, 5491, 5492, 5493, 5494,
5495, 5496, 5497, 5498, 5499, 5500, 5501, 5502, 5503, 5504, 5505,
5506, 5507, 5508, 5509, 5510, 5511, 5512, 5513, 5514), class = "Date"),
Adaba = c(6.7, 7.6, 4.9, 6.2, 7.8, 3.1, 4.5, 4.9, 4.2, 5.8,
6.7, 6.1, 5.7, 5.8, 6.4, 5.3, 5.1, 7.6, 7.1, 5.8, 6.7, 6.5,
8.9, 7.6, 7.6, 11.3, 9.5, 11.3, 7.8, 7.6, 6.7, 7.1, 7.6,
7.5, 6.7, 6.5), Bedessa = c(15.1, 14.1, 10.8, 9.9, 10.7,
10.7, 12.4, 13.5, 13, 11.4, 12.9, 13, 13.6, 13, 10.8, 11.9,
13, 10.8, 9.7, 10.8, 9.2, 8.7, 9.2, 10.9, 9.7, 8.8, 12, 10.8,
11.4, 10.3, 10.8, 14.1, 13.5, 13, 14.1, 15.5), Beletu = c(15.3,
14.9, 15.1, 15.7, 15.5, 15.3, 14.8, 15.3, 15.5, 15.2, 14.7,
15.8, 15.9, 14.6, 13.7, 15.2, 15.3, 15.7, 16.2, 15, 15.4,
12.5, 12.6, 12.9, 13.4, 13.2, 11.5, 11.6, 11.7, 12.5, 12.6,
12.6, 12.7, 12, 10.7, 11.8)), row.names = c(NA, 36L), class = "data.frame")
code:
Wabi <- read.csv("Tmin_17.csv", header = TRUE, sep = ",")
# read.csv returns a data frame; check the class
class(Wabi)
# packages for Excel export, date handling, data manipulation and the modified Mann-Kendall trend tests
library(xlsx)
library(lubridate)
library(dplyr)
library(modifiedmk)
# to make new columns of month and year from the original data
Wabi$Date <- as.Date(Wabi$Date, format("%m/%d/%Y"))
# add the package lubridate
Wabi$month <- lubridate::month(Wabi$Date)
Wabi$year <- lubridate::year(Wabi$Date)
# to view the change in the original data
head(Wabi)
N=34
Result <- matrix(nrow = 100,ncol = 2)
# this loop computes monthly means and runs the trend tests
for (k in 1:192){
  for (j in 2:17) {
    colnum <- colnames(Wabi[j])
    Wabi_mon <- Wabi %>% group_by(year, month) %>% summarise_at(.vars = colnum, .funs = mean)
    for (i in 1:12) {
      test <- acf((Wabi_mon %>% group_by(month) %>% filter(month == i))[3], lag.max = 1)
      Trendtest1 <- as.data.frame(mmky(as.data.frame((Wabi_mon %>% group_by(month) %>% filter(month == i))[3])[,]))
      Trendtest2 <- as.data.frame(mkttest(as.data.frame((Wabi_mon %>% group_by(month) %>% filter(month == i))[3])[,]))
      if (abs(test$acf[2]) > abs(((-1 - 1.96*(N - 1)^0.5))/N - 1))
        Outcome <- Trendtest1
      else
        Outcome <- Trendtest2
      Result[k,] <- Outcome
    }
  }
}
Result <- data.frame(Result)
class(Result)
write.xlsx(Result,file = "tmin_trend.xlsx",sheetName = "Sheet1")
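For the saving step specifically, a generic pattern (my sketch, not taken from the post; the trend-test call is replaced by a placeholder) is to collect each iteration's output in a list and bind everything once at the end, so the container never has to match the shape of Outcome:
# Sketch: collect per-iteration results in a list, then bind once at the end
res_list <- list()
for (i in 1:12) {
  Outcome <- data.frame(month = i, statistic = NA_real_)  # placeholder for the trend-test output
  res_list[[length(res_list) + 1]] <- Outcome
}
Result <- do.call(rbind, res_list)
Result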

Imputing based on percentage of NA values

I want to impute temperature values from 6 different weather stations. The data are measured every 30 minutes. I want to impute the values only when there are more than 20 % NA values in a day or month. So I am grouping the values per day/month, calculating the share of NAs per day/month, and then I want to filter to keep the days/months which have less than 20 % NA in order to impute on the rest. What is the best way to do that? I have problems coding the filter, because I am not really sure whether it filters the way I want. Also, what is the best method to impute the missing values later on? I tried to familiarize myself with the imputeTS package, but I am not sure which method I should use: na_seadec, na_seasplit, or something else?
My data (sample, created with slice_sample, n=20 from the dplyr package)
df <- structure(list(td = structure(c(1591601400, 1586611800, 1574420400,
1583326800, 1568898000, 1561969800, 1577010600, 1598238000, 1593968400,
1567800000, 1590967800, 1584981000, 1563597000, 1589117400, 1599796800,
1563467400, 1569819600, 1571014800, 1573320600, 1577154600), tzone = "UTC", class = c("POSIXct",
"POSIXt")), Temp_Dede = c(13.7, NA, NA, 6.4, 14.9, 19.1, 1.3,
14.2, 21.1, 15.1, 10, 5, 14.1, 24.2, 8.8, 25.3, 14.9, 19.7, NA,
6.2), Temp_188 = c(13.1, 12.6, 8.9, 6.3, 14.5, 18.8, 1.4, 14.2,
20.9, 13.1, 10.4, 5.1, 12.2, 24.2, 9.4, 25.9, 14.8, 18.9, NA,
6.1), Temp_275 = c(13.9, 12.6, 8.8, 6, 14.3, 18.9, 1.4, 13.5,
20.4, 12.2, 11.1, 4.6, 12.5, 23.3, 9.9, 24, 14.8, 19.2, 6.9,
5.9), Temp_807 = c(13.9, 13.1, 8.8, 6.2, 14.3, 19.1, 1.4, 14.7,
20.5, 13.3, 10.6, 4.9, 12.8, 23.1, 10.3, 24.8, 14.7, 19.1, 6.9,
6.1), Temp_1189 = c(13.7, 12.3, 8.8, 5.6, 14.1, 18.4, 1.4, 13.3,
19.9, 13.3, 10.7, 4.4, 13.6, 24, 9.8, 24.9, 14.7, 19.1, 6.9,
5.7), Temp_1599 = c(13.2, 12.7, 8.8, 5.1, 14.3, 18.3, 1.8, 14.2,
20.3, 13.2, 10.6, 4.4, 12.1, 22.9, 9.8, 25.8, 14.8, 19.2, 6.9,
5.9)), row.names = c(NA, -20L), class = "data.frame")
This is the code I've been using so far; it only groups by day in the first step. Some months of the data have several complete days missing, so I will also need to filter months with > 20 % NAs after that.
df %>% group_by(Datum) %>%
filter_at(vars(Temp_Dede, Temp_188, Temp_275, Temp_807, Temp_1189, Temp_1599),~mean(is.na(.) <0.2))
I am not sure what to do next and I am stuck.
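One possible starting point (my sketch, not from the thread; it assumes a recent dplyr with across()/if_all() and derives the day from the td timestamp, since the sample data has no Datum column) is to compute the NA share per day explicitly and then filter on it:
library(dplyr)

# NA share per station and day
na_frac_by_day <- df %>%
  mutate(day = as.Date(td)) %>%
  group_by(day) %>%
  summarise(across(starts_with("Temp_"), ~ mean(is.na(.x))))

# keep only the days where every station has less than 20 % missing values
complete_enough <- na_frac_by_day %>%
  filter(if_all(starts_with("Temp_"), ~ .x < 0.2))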

How to identify a range of data points between a minimum and a maximum in a dataframe in R?

My measured variable V1 follows cycles: it goes up to reach a maximum and down to reach a minimum. I call "cycle" the range of data points between 2 consecutive maxima (cycle 1 is maximum 1 - minimum 1 - maximum 2, cycle 2 is maximum 2 - minimum 2 - maximum 3). The minima and maxima of each cycle are different.
My 2 questions are:
how to identify the range of data points in V1 corresponding to each cycle?
how to extract all the minima and all the maxima in V1?
I have used ggplot to identify my minima and maxima using stat_peaks() and stat_valleys(). I want to find a way of doing it without plotting it, to apply it to many data frames.
library(ggplot2)
library(ggpmisc)
#I plotted my data to visualize the minima (in yellow) and maxima (in blue) with stat_peaks and stat_valleys.
plot <- ggplot(df, aes(x=V0, y=V1))+
  geom_point()+
  stat_peaks(color="yellow", span=61)+
  stat_valleys(color="blue", span=101)
#I used the ggplot_build function to extract the values of the highlighted peaks and valleys.
pb <- ggplot_build(plot)
I wanted to identify the 10 largest values in pb for which colour == "yellow" and the 10 lowest values in pb for which colour == "blue" but it does not work because pb is not a dataframe.
dput(df[1:200, c(1,2)])
structure(list(V0 = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
0.9, 1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2, 2.1,
2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3, 3.1, 3.2, 3.3, 3.4,
3.5, 3.6, 3.7, 3.8, 3.9, 4, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7,
4.8, 4.9, 5, 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, 6,
6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, 7, 7.1, 7.2, 7.3,
7.4, 7.5, 7.6, 7.7, 7.8, 7.9, 8, 8.1, 8.2, 8.3, 8.4, 8.5, 8.6,
8.7, 8.8, 8.9, 9, 9.1, 9.2, 9.3, 9.4, 9.5, 9.6, 9.7, 9.8, 9.9,
10, 10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 11,
11.1, 11.2, 11.3, 11.4, 11.5, 11.6, 11.7, 11.8, 11.9, 12, 12.1,
12.2, 12.3, 12.4, 12.5, 12.6, 12.7, 12.8, 12.9, 13, 13.1, 13.2,
13.3, 13.4, 13.5, 13.6, 13.7, 13.8, 13.9, 14, 14.1, 14.2, 14.3,
14.4, 14.5, 14.6, 14.7, 14.8, 14.9, 15, 15.1, 15.2, 15.3, 15.4,
15.5, 15.6, 15.7, 15.8, 15.9, 16, 16.1, 16.2, 16.3, 16.4, 16.5,
16.6, 16.7, 16.8, 16.9, 17, 17.1, 17.2, 17.3, 17.4, 17.5, 17.6,
17.7, 17.8, 17.9, 18, 18.1, 18.2, 18.3, 18.4, 18.5, 18.6, 18.7,
18.8, 18.9, 19, 19.1, 19.2, 19.3, 19.4, 19.5, 19.6, 19.7, 19.8,
19.9, 20), V1 = c(32.56, 31.97, 29.08, 27.34, 25.34, 22.58,
20.93, 17.93, 14.65, 12.2, 9.88, 7, 5.52, 3.96, 3.26, 2.76, 3.23,
3.38, 3.5, 3.67, 4.24, 7.1, 9.94, 14.58, 17.57, 21.64, 23.83,
27.28, 29.48, 33.13, 34.37, 36.74, 37.13, 36.52, 35.87, 36, 35.49,
33.81, 32.89, 30.47, 29.87, 27.84, 25.83, 23.31, 21.39, 18.63,
16.42, 12.9, 10.6, 7.43, 5.95, 4.52, 3.76, 2.61, 2.94, 3.42,
2.89, 3.38, 3.64, 4.2, 5.74, 9.48, 12.71, 17.46, 19.76, 23.93,
27.46, 31.99, 34.07, 40.37, 46.48, 42.89, 48.33, 56.99, 47.16,
43.53, 39.86, 37.48, 30.36, 26.01, 23.03, 20.57, 15.92, 13.87,
11.61, 8.58, 6.52, 4.79, 3.88, 2.9, 2.94, 3.22, 3.45, 3.66, 3.89,
6.01, 8.37, 12.83, 15.06, 18.68, 21.2, 24.12, 26.97, 28.48, 26.69,
37.06, 40.15, 39.36, 35.73, 35.61, 35.83, 35.14, 31.55, 30.05,
25.34, 24.24, 23.4, 21.09, 18.32, 16.04, 13.18, 10.07, 8.23,
5.78, 4.71, 3.44, 3.48, 3.71, 3.72, 3.9, 4.56, 6.93, 9.3, 14.04,
14.66, 16.25, 18.43, 20.76, 21.86, 23.87, 26.63, 24.85, 29.98,
26.67, 26.99, 27.36, 25.08, 25.24, 26.48, 24.1, 22.66, 22.28,
23.29, 21.87, 21.02, 19.53, 22.75, 22.04, 20.64, 19.05, 19.4,
21, 18.93, 25.38, 23.59, 21.48, 21.9, 23.75, 23.38, 25.06, 25.2,
26.38, 25.22, 28.62, 27.38, 34.16, 35.94, 34.03, 28.95, 24.33,
24.76, 25.56, 24.96, 21.99, 23.53, 23.76, 24.5, 22.39, 23.01,
23.42, 24, 22.65, 21.44, 22.15, 21.72, 18.46, 17.65, 15.34, 16.11,
14.93)), row.names = c(NA, 200L), class = "data.frame")
You can add a variable to your data frame that labels the maxima and minima quite easily with the following line:
df$is_min_max <- c(FALSE, diff(as.numeric(diff(df$V1) > 0)) != 0, FALSE)
I'll explain how this works:
You can find out the difference between consecutive points in your data by doing
diff(df$V1)
so you can see where your data are going up or down by doing
as.numeric(diff(df$V1) > 0)
Which will give you a 1 between two points on an upward gradient and 0 on a downward gradient. So if you do
diff(as.numeric(diff(df$V1) > 0))
You will get a +1 or -1 at the points where the direction changes.
So if you do:
diff(as.numeric(diff(df$V1) > 0)) != 0
You will get a logical vector of the points that are local maxima and minima. Note that the start and end points have been removed because we have double-diffed, so we need to add a FALSE at either end:
c(FALSE, diff(as.numeric(diff(df$V1) > 0)) != 0, FALSE)
So we could add this to your data frame as
df$is_min_max <- c(FALSE, diff(as.numeric(diff(df$V1) > 0)) != 0, FALSE)
You haven't included the actual data in your example, so I will show an example here using a simple sine wave:
df <- data.frame(x = seq(1, 20, 0.1), V1 = sin(seq(1, 20, 0.1)))
plot(df$x, df$V1)
And now we can just find our local maxima and minima...
df$is_min_max <- c(FALSE, diff(as.numeric(diff(df$V1) > 0)) != 0, FALSE)
And plot them:
points(df$x[df$is_min_max], df$V1[df$is_min_max], col = "red", cex = 3 )
Note that this will show up every change in direction, so if there are local "wobbles" in your data you will find maxima and minima there too. Removing these is possible but a little more complex.
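If you also need the maxima and minima as two separate sets (the OP's second question), one possible extension of the same double-diff idea, sketched here rather than taken from the answer above, is to check the sign of the double diff at each flagged point:
# Sketch: split the flagged turning points into maxima and minima.
# dd is -1 where the series turns from rising to falling (a maximum)
# and +1 where it turns from falling to rising (a minimum).
dd <- diff(as.numeric(diff(df$V1) > 0))
turn_idx <- which(df$is_min_max)
maxima <- df[turn_idx[dd[turn_idx - 1] == -1], ]
minima <- df[turn_idx[dd[turn_idx - 1] ==  1], ]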
The coordinates of peaks and valleys are contained in pb:
The peaks are in
pb$data[[2]]
xintercept yintercept label x y PANEL group x.label y.label shape colour size fill alpha stroke
1 7.9 0.9989413 7.9 7.9 0.9989413 1 -1 7.9 0.9989 19 yellow 1.5 NA NA 0.5
2 14.1 0.9993094 14.1 14.1 0.9993094 1 -1 14.1 0.9993 19 yellow 1.5 NA NA 0.5
The valleys are in
pb$data[[3]]
xintercept yintercept label x y PANEL group x.label y.label shape colour size fill alpha stroke
1 11 -0.9999902 11 11 -0.9999902 1 -1 11 -1 19 blue 1.5 NA NA 0.5
Note that the order of list elements may change depending on the order of ggplot function calls (layers).
Also note that the sample data provided by the OP is too small with respect to the spans given in the calls stat_peaks(color="yellow", span=61) and stat_valleys(color="blue", span=101), respectively.
Therefore, I have used the sample data from Allan's answer:
df <- data.frame(V0 = seq(1, 20, 0.1), V1 = sin(seq(1, 20, 0.1)))
which highlights two peaks and one valley using OP's code:
library(ggplot2)
library(ggpmisc)
plot <- ggplot(df, aes(x=V0, y=V1))+
  geom_point()+
  stat_peaks(color="yellow", span=61)+
  stat_valleys(color="blue", span=101)
plot
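If you want to go on and use those coordinates without reading them off the console (my addition, assuming the same layer order as shown above), you can pull the relevant columns out of the build object:
pb <- ggplot_build(plot)
peaks   <- pb$data[[2]][, c("x", "y")]   # stat_peaks layer
valleys <- pb$data[[3]][, c("x", "y")]   # stat_valleys layer
peaks
valleys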

Creating variables by combining a vector of names and a vector of values [duplicate]

This question already has answers here:
Pasting two vectors with combinations of all vectors' elements
(8 answers)
Closed 3 years ago.
I'd like to create a vector/list of variable names that result from combining two vectors: (i) a vector of specific variable names and (ii) a vector of variable IDs (the same IDs for every variable).
Here is a short version of the two vectors:
the variable names:
names<-c("XPTS", "TROCK", "JFSG")
and the variable IDs:
values<-c(1, 1.1, 1.2, 1.3, 2, 2.1, 2.2, 2.3, 3, 3.1, 3.2, 3.3, 4, 4.1, 4.2, 4.3, 5, 5.1, 5.2, 5.3, 6, 6.1, 6.2, 6.3, 7, 7.1, 7.2, 7.3, 8, 8.1, 8.2, 8.3, 9, 9.1, 9.2, 9.3, 10, 10.1, 10.2, 10.3, 11, 11.1, 11.2, 11.3, 12, 12.1, 12.2, 12.3, 13, 13.1, 13.2, 13.3, 14, 14.1, 14.2, 14.3, 15, 15.1, 15.2, 15.3, 16, 16.1, 16.2, 16.3, 17, 17.1, 17.2, 17.3, 18, 18.1, 18.2, 18.3, 19, 19.1, 19.2, 19.3, 20, 20.1, 20.2, 20.3, 21, 21.1, 21.2, 21.3, 22, 22.1, 22.2, 22.3, 23, 23.1, 23.2, 23.3, 24, 24.1, 24.2, 24.3, 25, 25.1, 25.2, 25.3, 26, 26.1, 26.2, 26.3, 27, 27.1, 27.2, 27.3, 28, 28.1, 28.2, 28.3, 29, 29.1, 29.2, 29.3, 30, 30.1, 30.2, 30.3, 31, 31.1, 31.2, 31.3, 32, 32.1, 32.2, 32.3, 33, 33.1, 33.2, 33.3, 34, 34.1, 34.2, 34.3, 35, 35.1, 35.2, 35.3, 36, 36.1, 36.2, 36.3, 37, 37.1, 37.2, 37.3, 38, 38.1, 38.2, 38.3, 39, 39.1, 39.2, 39.3, 40, 40.1, 40.2, 40.3, 41, 41.1, 41.2, 41.3, 42, 42.1, 42.2, 42.3, 43, 43.1, 43.2, 43.3, 44, 44.1, 44.2, 44.3, 45, 45.1, 45.2, 45.3, 46, 46.1, 46.2, 46.3, 47, 47.1, 47.2, 47.3, 48, 48.1, 48.2, 48.3, 49, 49.1, 49.2, 49.3, 50)
I'd like to obtain a list of variable names as follows:
"XPTS_1","XPTS_1.1","XPTS_1.2", ..., "XPTS_49.3","XPTS_50","TROCK_1","TROCK_1.1",...,"TROCK_49.3","TROCK_50","JFSG_1","JFSG_1.1",...,"JFSG_49.3","JFSG_50"
The variable names are not limited to those reported and might change, so I'd like a dynamic loop to deal with that. The one I wrote, as follows, doesn't fit my purpose:
variables_ID <- for (i in 1:length(values)) {
  paste(names, values[i], sep = "_")
}
since I get only
"XPTS_50" "TROCK_50" "JFSG_50"
We can use outer
out1 <- c(t(outer(names, values, paste, sep="_")))
NOTE: transposed just to show that we get identical results with rep
Or use rep to replicate the 'names' and then paste
out2 <- paste(rep(names, each = length(values)), values, sep="_")
all.equal(out1, out2)
#[1] TRUE
head(out1)
#[1] "XPTS_1" "XPTS_1.1" "XPTS_1.2" "XPTS_1.3" "XPTS_2" "XPTS_2.1"
tail(out1)
#[1] "JFSG_48.3" "JFSG_49" "JFSG_49.1" "JFSG_49.2" "JFSG_49.3" "JFSG_50"
Or using CJ
library(data.table)
CJ(names, values)[, paste(names, values, sep="_")]
Or with tidyverse
library(tidyverse)
crossing(names, values) %>%
  unite(names, names, values) %>%
  pull(names)

Resolving minFactor error when using nls in R

I am running nls models in R on several different datasets, using the self-starting Weibull Growth Curve function, e.g.
MOD <- nls(Response ~ SSweibull(Time, Asym, Drop, lrc, pwr), data = DATA)
With data like this, it works as expected:
GOOD.DATA <- data.frame("Time" = c(1:150), "Response" = c(31.2, 20.0, 44.3, 35.2,
31.4, 27.5, 24.1, 25.9, 23.3, 21.2, 21.3, 19.8, 18.4, 17.3, 16.3, 16.3,
16.6, 15.9, 15.9, 15.8, 15.1, 15.6, 15.1, 14.5, 14.2, 14.2, 13.7, 14.1,
13.7, 13.4, 13.0, 12.6, 12.3, 12.0, 11.7, 11.4, 11.1, 11.0, 10.8, 10.6,
10.4, 10.1, 11.6, 12.0, 11.9, 11.7, 11.5, 11.2, 11.5, 11.3, 11.1, 10.9,
10.9, 11.4, 11.2, 11.1, 10.9, 10.9, 10.7, 10.7, 10.5, 10.4, 10.4, 10.3,
10.1, 10.0, 9.9, 9.7, 9.6, 9.7, 9.6, 9.5, 9.5, 9.4, 9.3, 9.2, 9.1, 9.0,
8.9, 9.0, 8.9, 8.8, 8.8, 8.7, 8.6, 8.5, 8.4, 8.3, 8.3, 8.2, 8.1, 8.0,
8.0, 8.0, 7.9, 7.9, 7.8, 7.7, 7.6, 7.6, 7.6, 7.6, 7.5, 7.5, 7.5, 7.5,
7.4, 7.4, 7.3, 7.2, 7.2, 7.1, 7.1, 7.0, 7.0, 6.9, 6.9, 6.8, 6.8, 6.7,
6.7, 6.6, 6.6, 6.5, 6.5, 6.4, 6.4, 6.4, 6.3, 6.3, 6.2, 6.2, 6.2, 6.1,
6.1, 6.1, 6.0, 6.0, 5.9, 5.9, 5.9, 5.9, 5.8, 5.8, 5.8, 5.8, 5.8, 5.8,
5.8, 5.7))
But with this data set:
BAD.DATA <- data.frame("Time" = c(1:150), "Response" = c(89.8, 67.0,
51.4, 41.2, 39.4, 38.5, 34.3, 30.9, 29.9, 34.8, 32.5, 30.1, 28.5, 27.0,
26.2, 24.7, 23.8, 23.6, 22.6, 22.0, 21.3, 20.7, 20.1, 19.6, 19.0, 18.4,
17.9, 17.5, 17.1, 23.1, 22.4, 21.9, 23.8, 23.2, 22.6, 22.0, 21.6, 21.1,
20.6, 20.1, 19.7, 19.3, 19.0, 19.2, 18.8, 18.5, 18.3, 19.5, 19.1, 18.7,
18.5, 18.3, 18.0, 17.7, 17.5, 17.3, 17.0, 16.7, 16.7, 16.9, 16.6, 16.4,
16.1, 15.9, 15.8, 15.6, 15.4, 15.2, 15.0, 14.8, 14.7, 14.5, 14.4, 14.2,
14.0, 13.9, 13.7, 13.6, 15.4, 15.2, 15.1, 15.0, 14.9, 14.7, 14.6, 14.5,
14.4, 14.3, 14.4, 14.2, 14.1, 14.0, 13.8, 13.7, 13.6, 13.5, 13.4, 13.2,
13.3, 13.2, 13.1, 13.0, 12.9, 12.8, 12.7, 12.6, 12.5, 12.5, 12.4, 12.3,
12.2, 12.1, 12.1, 11.9, 12.8, 12.7, 12.6, 12.5, 12.4, 14.2, 14.1, 14.0,
14.1, 14.0, 13.9, 13.8, 13.7, 13.7, 13.6, 13.5, 13.4, 13.3, 13.3, 13.2,
13.1, 13.0, 12.9, 12.9, 12.8, 12.7, 12.6, 12.9, 12.8, 12.7, 12.6, 12.5,
12.5, 12.4, 12.3, 12.2))
I get the error:
Error in nls(y ~ cbind(1, -exp(-exp(lrc) * x^pwr)), data = xy, algorithm = "plinear",
: step factor 0.000488281 reduced below 'minFactor' of 0.000976562
By including the control argument I am able to change the minFactor for GOOD.DATA:
MOD <- nls(Response ~ SSweibull(Time, Asym, Drop, lrc, pwr), data = GOOD.DATA,
control = nls.control(minFactor = 1/4096))
But that model was running without errors anyway. With BAD.DATA and several other datasets, including the control argument has no effect and I just get the same error message.
Questions
How can I change the minFactor for the BAD.DATA?
What's causing the error? (i.e. what is it about the data set that triggers the error?)
Will changing the minFactor resolve this error, or is this one of R's obscure error messages and it actually indicates a different issue?
It seems the control option does not work in your case because the code breaks at getInitial while self-starting, that is, before your control parameters are ever used. One way forward would be to specify starting parameters yourself instead of relying on the naive self-start. For nls it is often the case that playing with the initial parameters makes or breaks the fit; I am not entirely sure about the specific Weibull case, but it should behave the same.
To see that the fit never reaches your control settings, you can pass nls.control(printEval = TRUE) and observe that nothing is printed.
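For illustration only, here is a minimal sketch of that idea: write out the mean function used by SSweibull (Asym - Drop * exp(-exp(lrc) * x^pwr)) and supply explicit starting values. The values below are rough guesses read off BAD.DATA and may need tuning; whether the fit converges still depends on them.
# Sketch: fit the Weibull growth curve with explicit starting values
# instead of the SSweibull self-start (starting values are guesses)
start_vals <- list(Asym = 12, Drop = -80, lrc = -1, pwr = 0.5)
MOD <- nls(Response ~ Asym - Drop * exp(-exp(lrc) * Time^pwr),
           data = BAD.DATA,
           start = start_vals,
           control = nls.control(minFactor = 1/4096, maxiter = 200, printEval = TRUE))
summary(MOD)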
