How to get next number in sequence in R

I need to automate the process of getting the next number(s) in a given sequence.
Can we write a function that takes two inputs:
a vector of numbers (e.g. 3, 7, 13, 21)
how many next numbers to return
seqNext <- function(sequ, n) {
..
}
seqNext( c(3,7,13,21), 3)
# 31 43 57
seqNext( c(37,26,17,10), 1)
# 5

By the power of maths!
x1 <- c(3,7,13,21)
dat <- data.frame(x=seq_along(x1), y=x1)
predict(lm(y ~ poly(x, 2), data=dat), newdata=list(x=5:15))
# 1 2 3 4 5 6 7 8 9 10 11
# 31 43 57 73 91 111 133 157 183 211 241
When dealing with successive differences that change their sign, the pattern of output values ends up switching from decreasing to increasing:
x2 <- c(37,26,17,10)
dat <- data.frame(x=seq_along(x2), y=x2)
predict(lm(y ~ poly(x,2), data=dat), newdata=list(x=1:10))
# 1 2 3 4 5 6 7 8 9 10
#37 26 17 10 5 2 1 2 5 10
# first differences (diff): -11 -9 -7 -5 -3 -1 1 3 5
# second differences (diff of diff): 2 2 2 2 2 2 2 2
As a function:
seqNext <- function(x,n) {
L <- length(x)
dat <- data.frame(x=seq_along(x), y=x)
unname(
predict(lm(y ~ poly(x, 2), data=dat), newdata=list(x=seq(L+1,L+n)))
)
}
seqNext(x1,5)
#[1] 31 43 57 73 91
seqNext(x2,5)
#[1] 5 2 1 2 5
This is also easily extensible to circumstances where the pattern might be n orders deep, e.g.:
x3 <- c(100, 75, 45, 5, -50)
diff(x3)
#[1] -25 -30 -40 -55
diff(diff(x3))
#[1] -5 -10 -15
diff(diff(diff(x3)))
#[1] -5 -5
seqNext <- function(x,n,degree=2) {
L <- length(x)
dat <- data.frame(x=seq_along(x), y=x)
unname(
predict(lm(y ~ poly(x, degree), data=dat), newdata=list(x=seq(L+1,L+n)))
)
}
seqNext(x3,n=5,degree=3)
#[1] -125 -225 -355 -520 -725
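The repeated diff() calls above suggest another way to extend such a sequence without fitting a model: keep differencing until the differences are constant, then add the levels back up. The following is only a sketch of that idea. The name seqNextDiff is made up for illustration and is not part of either answer, and it assumes the input is an exact polynomial sequence (with noisy or non-integer data the exact comparison may never find a constant level, in which case it simply uses the deepest level available).

```r
# Hypothetical helper (not from the original answers): extend a sequence by
# building its difference table until the differences become constant.
seqNextDiff <- function(x, n) {
  # Collect the last element of each level of the difference table
  tails <- numeric(0)
  d <- x
  while (length(d) > 1 && length(unique(d)) > 1) {
    tails <- c(tails, d[length(d)])
    d <- diff(d)
  }
  tails <- c(tails, d[length(d)])  # constant (or deepest available) difference

  out <- numeric(n)
  for (i in seq_len(n)) {
    # Propagate from the deepest level back up to the sequence itself
    for (j in rev(seq_along(tails))[-1]) {
      tails[j] <- tails[j] + tails[j + 1]
    }
    out[i] <- tails[1]
  }
  out
}

seqNextDiff(c(3, 7, 13, 21), 3)
seqNextDiff(c(100, 75, 45, 5, -50), 5)
```

For c(3,7,13,21) and c(100,75,45,5,-50) this should agree with the lm()-based results shown above, without having to choose the degree by hand.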

seqNext <- function(x, n) {
k <- length(x); d <- diff(x[(k - 2):k])
x[k] + 1:n * d[2] + cumsum(1:n) * diff(d[1:2])
}
seqNext(c(3,7,13,21),3)
# [1] 31 43 57
seqNext(c(37,26,17,10),1)
# [1] 5
seqNext(c(137,126,117,110),10)
# [1] 105 102 101 102 105 110 117 126 137 150
seqNext(c(105,110,113,114),5)
# [1] 113 110 105 98 89

Related

Looping a loop to do everything at once

I am trying to simulate the following "game":
There is a population of 100 units
You randomly sample 10 of these units, record the IDs of the units you saw, and then put them back into the population
You then take a second sample, record the IDs of the units you saw in this second sample along with the first sample, and then put the second sample back into the population
Repeat this many times
I wrote the following code in R that performs the above procedure:
library(dplyr)
var_1 = rnorm(100,10,10)
var_2 = rnorm(100,1,10)
var_3 = rnorm(100,5,10)
response = rnorm(100,1,1)
my_data = data.frame(var_1, var_2, var_3, response)
my_data$id = 1:100
results <- list()
results2<- list()
for (i in 1:100)
{
iteration_i = i
sample_i = my_data[sample(nrow(my_data), 10), ]
results_tmp = data.frame(iteration_i, sample_i)
results[[i]] <- results_tmp
}
results_df <- do.call(rbind.data.frame, results)
test_1 <- data.frame(results_df %>%
group_by(id) %>%
filter(iteration_i == min(iteration_i)) %>%
distinct)
summary_file = data.frame(test_1 %>% group_by(iteration_i) %>% summarise(Count = n()))
cumulative = cumsum(summary_file$Count)
summary_file$Cumulative = cumulative
summary_file$unobserved = 100 - cumulative
The result looks something like this:
> summary_file
iteration_i Count Cumulative unobserved
1 1 10 10 90
2 2 8 18 82
3 3 9 27 73
4 4 8 35 65
5 5 6 41 59
6 6 5 46 54
7 7 7 53 47
8 8 7 60 40
9 9 4 64 36
10 10 3 67 33
11 11 4 71 29
12 12 4 75 25
13 13 1 76 24
14 14 4 80 20
15 15 1 81 19
16 16 2 83 17
17 17 2 85 15
18 18 1 86 14
19 20 1 87 13
20 22 1 88 12
21 23 2 90 10
22 24 1 91 9
23 25 1 92 8
24 27 2 94 6
25 28 1 95 5
26 30 1 96 4
27 35 1 97 3
28 37 1 98 2
29 44 1 99 1
30 46 1 100 0
I would now like to repeat this "game" many times.
I would like to keep the "summary_file" for each "game" (e.g. summary_file_1, summary_file_2, summary_file_3, etc.)
I would then like to create a "total" summary file that shows the number of iterations that were required in each game to observe all units.
This total_summary_file would look something like this:
game_id iterations_required
1 game_1 47
2 game_2 45
3 game_3 44
4 game_4 42
5 game_5 42
Currently, I am just copy/pasting my earlier code several times and storing the results, then I append everything at the end and calculate the summary statistics - but I am trying to find a way to "loop the loop" and do everything at once. I do not know if it is possible to introduce a command like "results_df_i <- do.call(rbind.data.frame, results_i)" into the loop and efficiently create everything at the same time instead of manually copy/pasting the earlier loop.
You're making this a lot less efficient than it could be. To get, say, 100 repeated samples of 10 from the set 1:100 (with replacement), we can do replicate(100, sample(100, 10, TRUE)).
We can then coerce this into a vector and count the number of unique values in the first 10, 20, 30, ... entries. The number of these prefixes that contain fewer than 100 unique values is the number of iterations that pass before every unit has been seen.
If we put this inside an sapply, we don't even need an explicit loop, which means we can create the results data frame in a single call:
set.seed(1)
n_games <- 10
results <- data.frame(game_id = paste("game", seq(n_games), sep = "_"),
iterations_required = sapply(seq(n_games), function(x) {
samp <- c(replicate(100, sample(100, 10, TRUE)))
sum(sapply(1:100 * 10, function(n) length(unique(samp[1:n]))) < 100)
}))
results
#> game_id iterations_required
#> 1 game_1 59
#> 2 game_2 44
#> 3 game_3 54
#> 4 game_4 59
#> 5 game_5 57
#> 6 game_6 58
#> 7 game_7 96
#> 8 game_8 60
#> 9 game_9 71
#> 10 game_10 33
Created on 2022-06-11 by the reprex package (v2.0.1)
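For comparison, a single game can also be written as an explicit loop that stops as soon as every unit has been seen. This is a minimal sketch rather than a replacement for the code above: play_game, its arguments and total_summary are hypothetical names, each draw is taken without replacement within the draw (as sample(nrow(my_data), 10) does in the question), and the returned count includes the draw that completes the set.

```r
# Hypothetical helper: play one game and return the number of draws needed
# until every unit has been seen at least once.
play_game <- function(pop_size = 100, sample_size = 10) {
  seen <- logical(pop_size)      # FALSE for every unit at the start
  iterations <- 0
  while (!all(seen)) {
    # One draw: sample_size distinct units; nothing is removed from the population
    seen[sample.int(pop_size, sample_size)] <- TRUE
    iterations <- iterations + 1
  }
  iterations
}

set.seed(1)
n_games <- 10
total_summary <- data.frame(
  game_id = paste0("game_", seq_len(n_games)),
  iterations_required = replicate(n_games, play_game())
)
total_summary
```

Because the loop exits at the first complete iteration, no upper limit on the number of iterations has to be chosen in advance.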
There are lots of ways to get your desired outcome; wrapping your loop in a function and running the function multiple times is another potential solution:
library(dplyr)
var_1 = rnorm(100,10,10)
var_2 = rnorm(100,1,10)
var_3 = rnorm(100,5,10)
response = rnorm(100,1,1)
my_data = data.frame(var_1, var_2, var_3, response)
my_data$id = 1:100
results <- list()
results2<- list()
sample_func <- function(output_file_name) {
for (i in 1:100)
{
iteration_i = i
sample_i = my_data[sample(nrow(my_data), 10), ]
results_tmp = data.frame(iteration_i, sample_i)
results[[i]] <- results_tmp
}
results_df <- do.call(rbind.data.frame, results)
test_1 <- data.frame(results_df %>%
group_by(id) %>%
filter(iteration_i == min(iteration_i)) %>%
distinct)
summary_file = data.frame(test_1 %>% group_by(iteration_i) %>% summarise(Count = n()))
cumulative = cumsum(summary_file$Count)
summary_file$Cumulative = cumulative
summary_file$unobserved = 100 - cumulative
output_file_name <- summary_file
return(output_file_name)
}
list_of_names <- paste0("game_", 1:10)
output <- lapply(list_of_names, sample_func)
names(output) <- list_of_names
head(output, n = 2)
#> $game_1
#> iteration_i Count Cumulative unobserved
#> 1 1 10 10 90
#> 2 2 9 19 81
#> 3 3 8 27 73
#> 4 4 8 35 65
#> 5 5 6 41 59
#> 6 6 6 47 53
#> 7 7 6 53 47
#> 8 8 4 57 43
#> 9 9 4 61 39
#> 10 10 3 64 36
#> 11 11 2 66 34
#> 12 12 1 67 33
#> 13 13 4 71 29
#> 14 14 1 72 28
#> 15 15 2 74 26
#> 16 16 2 76 24
#> 17 17 3 79 21
#> 18 18 4 83 17
#> 19 19 2 85 15
#> 20 20 2 87 13
#> 21 21 1 88 12
#> 22 24 1 89 11
#> 23 25 2 91 9
#> 24 26 1 92 8
#> 25 27 1 93 7
#> 26 30 1 94 6
#> 27 31 1 95 5
#> 28 33 1 96 4
#> 29 34 1 97 3
#> 30 36 1 98 2
#> 31 41 1 99 1
#> 32 66 1 100 0
#>
#> $game_2
#> iteration_i Count Cumulative unobserved
#> 1 1 10 10 90
#> 2 2 10 20 80
#> 3 3 7 27 73
#> 4 4 7 34 66
#> 5 5 8 42 58
#> 6 6 5 47 53
#> 7 7 7 54 46
#> 8 8 5 59 41
#> 9 9 1 60 40
#> 10 10 7 67 33
#> 11 11 3 70 30
#> 12 12 3 73 27
#> 13 13 1 74 26
#> 14 14 3 77 23
#> 15 15 4 81 19
#> 16 16 3 84 16
#> 17 17 2 86 14
#> 18 18 1 87 13
#> 19 19 2 89 11
#> 20 20 1 90 10
#> 21 21 2 92 8
#> 22 22 1 93 7
#> 23 25 2 95 5
#> 24 27 1 96 4
#> 25 29 2 98 2
#> 26 30 1 99 1
#> 27 41 1 100 0
The output is a list, so you can use purrr to apply functions to each element (e.g. https://purrr.tidyverse.org/reference/lmap.html) or use the bind_rows() function to create a single dataframe for further use, e.g.
df2 <- bind_rows(output, .id = "game") %>%
group_by("Game" = factor(game, levels = list_of_names)) %>%
summarise(rows_in_output = n(),
number_of_iterations = max(iteration_i))
df2
#> # A tibble: 10 × 3
#> Game rows_in_output number_of_iterations
#> <fct> <int> <int>
#> 1 game_1 32 66
#> 2 game_2 27 41
#> 3 game_3 27 48
#> 4 game_4 32 50
#> 5 game_5 27 35
#> 6 game_6 27 71
#> 7 game_7 28 68
#> 8 game_8 27 48
#> 9 game_9 29 43
#> 10 game_10 29 66
Created on 2022-06-17 by the reprex package (v2.0.1)
Or you can use list2env() to have each individual dataframe in your environment, e.g.
list2env(output, envir = .GlobalEnv)
ls()
#> [1] "df2" "game_1" "game_10" "game_2" "game_3" "game_4"
#> [7] "game_5" "game_6" "game_7" "game_8" "game_9" "list_of_names"
#> [13] "my_data" "output" "response" "results" "results2" "sample_func"
#> [19] "var_1" "var_2" "var_3"
This seems easily solvable using recursion:
fun <- function(x, i=1, size = 10){
a <- setdiff(x,sample(100, 10, TRUE)) # Remove the seen from x
if(length(a)) Recall(a, i+1) else i # if we have unobserved, call fun again
}
Now we can have as many games as we want:
data.frame(game = paste0('game',seq(10)), results = replicate(10, fun(1:100)))
game results
1 game1 62
2 game2 40
3 game3 51
4 game4 50
5 game5 34
6 game6 83
7 game7 38
8 game8 40
9 game9 53
10 game10 41
You could also do
hist(replicate(1000, fun(1:100)), breaks = 30)
Edit:
Note that this can be edited to take any size and vector, e.g.:
fun <- function(x, size = 10, y=x, i=1){
a <- setdiff(x,sample(y, size, TRUE))
cat('i', i, '\t a: ',a, '\n')
if(length(a)>0) Recall(a, size, y, i+1) else i
}
set.seed(117);fun(1:10, 1)
i 1 a: 1 2 4 5 6 7 8 9 10 # 3 removed
i 2 a: 1 2 4 5 7 8 9 10 # 6 removed
i 3 a: 1 2 4 5 7 8 9 # 10 removed
i 4 a: 1 2 4 5 8 9 # 7 removed
i 5 a: 1 2 5 8 9 # 4 removed
i 6 a: 1 2 5 8 9 # Nothing removed
i 7 a: 1 5 8 9 # 2 removed
i 8 a: 1 5 8 # 9 removed
i 9 a: 1 5 # Nothing removed
i 10 a: 1 5 # Nothing removed
i 11 a: 5 # 1 removed
i 12 a: # 5 removed
Using Markov chains, we can produce the cumulative distribution function for the number of iterations required for a game (up to machine precision). The resulting CDF can be sampled directly using findInterval.
We can simplify things slightly by starting with the second iteration, since the first iteration will always result in 90 unseen units.
First, set up a matrix for all possible transitions:
m <- matrix(c(rep(90:1, each = 11), sequence(rep(11,90), 90:1, -1)), ncol = 2, dimnames = list(NULL, c("from", "to")))
m <- m[m[,2] >= 0L,]
Then create a transition matrix with row 1 representing the state where all units have been seen and row 91 representing the state where 10 units have been seen:
mTrans <- matrix(0, 91, 91)
The number of previously unseen units selected follows the hypergeometric distribution.
mTrans[m + 1L] <- dhyper(m[,1] - m[,2], m[,1], 100L - m[,1], 10L)
Row 1 represents an absorbing state since all units have been seen.
mTrans[1, 1] <- 1
mTrans contains the probabilities of each state after the second iteration.
Initialize a while loop and calculate the CDF.
mm <- mTrans %*% mTrans
maxIter <- 1000L
p <- numeric(maxIter)
iter <- 3L
while (p[iter] < 1) {
if ((iter <- iter + 1L) > maxIter) {
p <- c(p, numeric(maxIter))
maxIter <- maxIter*2L
}
mm <- mm %*% mTrans
p[iter] <- mm[91, 1]
}
p <- p[1:iter]
iter
#> [1] 345
Within machine precision, the CDF reaches 1 at iteration 345. Plot the CDF:
plot(p, xlab = "iterations", ylab = "cumulative probability")
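Using findInterval we can quickly generate a large number of random samples of the iterations required. findInterval returns the last index whose CDF value does not exceed the uniform draw, so adding 1 gives the iteration at which the game finishes.
ngames <- 1e6L # one million games
results <- data.frame(game_id = 1:ngames, iterations_required = findInterval(runif(ngames), p) + 1L)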
head(results)
#> game_id iterations_required
#> 1 1 73
#> 2 2 69
#> 3 3 40
#> 4 4 41
#> 5 5 44
#> 6 6 43
Get a histogram of the sample number of iterations required.
hist(results$iterations_required)
OP here! I think I was able to find an answer to my own question:
library(dplyr)
var_1 <- rnorm(100, 10, 10)
var_2 <- rnorm(100, 1, 10)
var_3 <- rnorm(100, 5, 10)
response <- rnorm(100, 1, 1)
my_data <- data.frame(var_1, var_2, var_3, response)
my_data$id <- 1:100
simulate <- function() {
results <- list()
results2 <- list()
for (i in 1:100) {
iteration_i <- i
sample_i <- my_data[sample(nrow(my_data), 10), ]
results_tmp <- data.frame(iteration_i, sample_i)
results[[i]] <- results_tmp
}
results_df <- do.call(rbind.data.frame, results)
test_1 <- data.frame(results_df %>%
group_by(id) %>%
filter(iteration_i == min(iteration_i)) %>%
distinct)
summary_file <- data.frame(test_1 %>%
group_by(iteration_i) %>%
summarise(Count=n()))
cumulative <- cumsum(summary_file$Count)
summary_file$Cumulative <- cumulative
summary_file$unobserved <- 100 - cumulative
return(summary_file)
}
# now, loop 10 times!
results <- list()
for (i in 1:10) {
game_i <- i
s_i <- simulate()
results_tmp <- data.frame(game_i, s_i)
results[[i]] <- results_tmp
}
final_file <- do.call(rbind.data.frame, results)
Thanks for your help everyone!

how to find the same value if their length are not the same?

I would like to find which rows of a data frame contain the values held in another vector, even though the two have different lengths.
Here is the code:
> a
[1] 3 5 6
> num
x y z
1 112 55 0
2 23 21 1
3 121 56 2
4 132 15 3
5 123 15 4
6 132 45 5
7 132 41 6
8 179 45 7
Is there a function, like match, that I can use to find which rows of num have a z value that also appears in a? I tried this code, but it does not work:
for(i in 1:length(num)){
for (j in 1: length(a)){
if (num$z[i]==a[j]){
return(row(num[i]))
}
}
}
The only output is a warning.
See the code below. lapply returns a list, and the list which_rows contains, for each column of num, the rows in which a value of a appears.
which_rows <- lapply(num, function(x) which(x %in% a))
which_rows
$x
integer(0)
$y
integer(0)
$z
[1] 4 6 7
You were almost there with the match function:
match(a, num$z)
# [1] 4 6 7
To get the full rows of num whose z matches a value in a:
num[match(a,num$z),]
# x y z
# 4 132 15 3
# 6 132 45 5
# 7 132 41 6
or, with dplyr:
num %>% filter(z %in% a)
# x y z
# 1 132 15 3
# 2 132 45 5
# 3 132 41 6
To get just the y:
num$y[match(a,num$z)]
# [1] 15 45 41
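The two approaches differ when the lookup values are in a different order or are missing from the column: match() returns, for each element of a, the first matching row (or NA), while which(... %in% ...) returns every matching row in row order. A small sketch, with the data from the question rebuilt as a data frame (the vector b is an extra example that is not in the question):

```r
a <- c(3, 5, 6)
num <- data.frame(
  x = c(112, 23, 121, 132, 123, 132, 132, 179),
  y = c(55, 21, 56, 15, 15, 45, 41, 45),
  z = 0:7
)

rows_match <- match(a, num$z)      # one position per element of a: 4 6 7
rows_which <- which(num$z %in% a)  # all matching rows, in row order: 4 6 7

# With unordered or missing lookup values the results diverge
b <- c(6, 3, 9)
match(b, num$z)      # 7 4 NA
which(num$z %in% b)  # 4 7
```

Use match() when the result has to line up with a, and %in% when only the set of matching rows matters.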

How to transform NA values with the R mutate function?

I'm trying to use the mutate function in order to create a variable based on conditions involving three others.
These conditions were created using case_when, as you can see in the code below.
But some of my conditions use NA values, and these seem to be causing an error in the mutate function.
Check it out, please:
# About the variables being used:
unique(x1)
# [1] 1 0 NA
str(pemg$x1)
# num [1:1622989] 1 0 0 1 1 0 1 1 0 0 ...
unique(x2)
# [1] 16 66 38 11 8 6 14 17 53 59 10 31 50 19 48 42 44 21 54 55 56 18 57 61 13 43 7 4 15
# [30] 39 5 20 3 37 23 51 36 52 68 58 27 65 62 2 12 32 41 49 46 35 34 45 81 69 33 40 0 70
# [59] 9 47 63 29 25 22 64 24 60 30 67 26 71 72 28 1 75 80 87 77 73 78 76 79 74 83 92 102 85
# [88] 86 90 82 91 84 88 93 89 96 95 105 115 106 94 100 99 97 104 98 103 108 109 101 117 107 114 113 NA 112
# [117] 110 111
str(pemg$x2)
# num [1:1622989] 16 66 38 11 8 6 14 17 53 59 ...
unique(x3)
# [1] 6 3 4 5 0 8 2 1 11 9 10 7 NA 15
str(pemg$x3)
# num [1:1622989] 6 3 4 5 3 0 5 8 4 2 ...
df <- mutate(df,
y = case_when(
x1 == 1 & x2 >= 7 & x3 == 0 ~ 1,
x1 == 1 & x2 >= 8 & x3 == 1 ~ 1,
x1 == 1 & x2 >= 10 & x3 == 3 ~ 1,
x1 == 1 & x2 >= 11 & x3 == 4 ~ 1,
x1 == 1 & x2 >= 12 & x3 == 5 ~ 1,
x1 == 1 & x2 >= 13 & x3 == 6 ~ 1,
x1 == 1 & x2 >= 14 & x3 == 7 ~ 1,
x1 == 1 & x2 >= 15 & x3 == 8 ~ 1,
x1 == 1 & x2 >= 16 & x3 == 9 ~ 1,
x1 == 1 & x2 >= 17 & x3 == 10 ~ 1,
x1 == 1 & x2 >= 18 & x3 == 11 ~ 1,
x1 == 1 & !is.na(x3) ~ 0,
x1 == 1 & x3 %in% 12:16 ~ 0,
x2 %in% 0:7 ~ NA,
x2 > 18 ~ NA,
x1 == 0 ~ NA,
is.na(x3) ~ NA))
# Error: Problem with `mutate()` input `defasado`.
# x must be a double vector, not a logical vector.
# i Input `defasado` is `case_when(...)`.
# Run `rlang::last_error()` to see where the error occurred.
last_error()
# <error/dplyr_error>
# Problem with `mutate()` input `y`.
# x must be a double vector, not a logical vector.
# i Input `y` is `case_when(...)`.
# Backtrace:
# 1. dplyr::mutate(...)
# 2. dplyr:::mutate.data.frame(...)
# 3. dplyr:::mutate_cols(.data, ...)
# Run `rlang::last_trace()` to see the full context.
last_trace()
# <error/dplyr_error>
# Problem with `mutate()` input `defasado`.
# x must be a double vector, not a logical vector.
# i Input `defasado` is `case_when(...)`.
# Backtrace:
# x
# 1. +-dplyr::mutate(...)
# 2. \-dplyr:::mutate.data.frame(...)
# 3. \-dplyr:::mutate_cols(.data, ...)
# <parent: error/rlang_error>
# must be a double vector, not a logical vector.
# Backtrace:
# x
# 1. +-mask$eval_all_mutate(dots[[i]])
# 2. \-dplyr::case_when(...)
# 3. \-dplyr:::replace_with(...)
# 4. \-dplyr:::check_type(val, x, name)
# 5. \-dplyr:::glubort(header, "must be {friendly_type_of(template)}, not {friendly_type_of(x)}.")
Can someone give me a hint on how to solve this?
The problem here is the result types of your case_when. if_else from dplyr is stricter than ifelse from base R: all result values have to be of the same type. Since case_when is a vectorization of multiple if_else calls, you have to tell R which type of NA the output should be:
library(dplyr)
# does not work
dplyr::tibble(d = c(6,2,4, NA, 5)) %>%
dplyr::mutate(v = case_when(d < 4 ~ 0,
is.na(d) ~ NA))
# works
dplyr::tibble(d = c(6,2,4, NA, 5)) %>%
dplyr::mutate(v = case_when(d < 4 ~ 0,
is.na(d) ~ NA_real_))
You need to make sure your NAs are of the right type. In your case, wrap the NA after the ~ in as.numeric(). For example:
x2 %in% 0:7 ~ as.numeric(NA)
R has different types of NA. The one you are using is of logical type, but you need the double type NA_real_ in order to be consistent with the output of your other conditions. For more information, see this: https://stat.ethz.ch/R-manual/R-patched/library/base/html/NA.html
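The different NA constants can be inspected directly in base R. This short sketch only uses typeof() and the NA constants documented on the help page linked above; the variable name na_types is arbitrary.

```r
# Each NA constant carries its own type; a plain NA is logical.
na_types <- c(
  plain     = typeof(NA),             # "logical"
  real      = typeof(NA_real_),       # "double"
  integer   = typeof(NA_integer_),    # "integer"
  character = typeof(NA_character_),  # "character"
  coerced   = typeof(as.numeric(NA))  # "double", same type as NA_real_
)
na_types
```

Only the double variants match the numeric 0 and 1 on the right-hand side of the other case_when conditions. As far as I know, dplyr 1.1.0 and later also accept a plain NA here, but the typed form works in both older and newer versions.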
In base R, a simple operation like this can be done compactly: we construct a logical vector and assign NA to the column values it selects. Unlike with case_when, we don't have to specify the type of NA, as it is converted automatically.
df1$d[df1$d %in% 0:7] <- NA

Error: Numerical expression has 6 elements: only the first used and Warning in GA

Here is my code and I am wondering why there are warning messages saying "number of items to replace is not a multiple of replacement length"?
library(GA)
library(readxl)
data1 <- read_excel("C:/Users/nadiahalim/OneDrive/CS954/Scheduling/data1.xlsx")
View(data1)
data1
Output for data1:
# A tibble: 6 x 4
Jobj Pj Dj Wj
<dbl> <dbl> <dbl> <dbl>
1 1 25 61 3
2 2 7 102 7
3 3 42 86 1
4 4 36 44 3
5 5 18 150 1
6 6 29 134 4
I want to reorder Jobj in such a way that it gives the minimum completion time, denoted by Cj. I used for loops in my code as follows:
Cj=0
i<- sample(data1$Jobj)
for(i in 1:i){
fitness <-function(j){
for(j in data1$Jobj){
Cj[j] <- data1$Pj[j]+sum(Cj[j-1])
}
print(Cj)
}
fitness()
}
Output for this code:
numerical expression has 6 elements: only the first used
[1] 25 32 74 110 128 157
[1] 25 32 74 110 128 157
[1] 25 32 74 110 128 157
[1] 25 32 74 110 128 157
[1] 25 32 74 110 128 157
Then I run the GA code:
GA <- ga(type = "permutation", fitness = fitness, lower = 1, upper = 6, maxiter= 5, run = 20, optim = TRUE)
The output for GA is as follows; the warning states that the number of items to replace is not a multiple of replacement length:
number of items to replace is not a multiple of replacement length
[1] 25 32 74 110 128 157
number of items to replace is not a multiple of replacement length
GA | iter = 3 | Mean = 25 | Best = 25
[1] 25 32 74 110 128 157
number of items to replace is not a multiple of replacement length
[1] 25 32 74 110 128 157
number of items to replace is not a multiple of replacement length
[1] 25 32 74 110 128 157
Then I run:
summary(GA)
Output of summary(GA):
-- Genetic Algorithm -------------------
GA settings:
Type = permutation
Population size = 50
Number of generations = 5
Elitism = 2
Crossover probability = 0.8
Mutation probability = 0.1
GA results:
Iterations = 5
Fitness function value = 25
Solutions =
x1 x2 x3 x4 x5 x6
[1,] 4 1 6 3 5 2
[2,] 6 3 1 5 4 2
[3,] 6 2 3 4 1 5
[4,] 3 6 4 1 5 2
[5,] 4 6 2 1 5 3
[6,] 5 3 4 1 2 6
[7,] 2 1 6 3 5 4
[8,] 4 1 6 3 2 5
[9,] 3 6 1 2 5 4
[10,] 3 6 2 1 5 4
...
[35,] 4 5 6 2 1 3
[36,] 3 4 5 6 2 1
When I run summary(GA), it seems to work, but I am curious about the warning messages. I would be grateful if someone could help me with this. Thank you in advance.
What you are trying to do in the fitness function can easily be achieved without a for loop.
For example, the output that you get from the for loop is
#[1] 25 32 74 110 128 157
and using cumsum we get the same thing.
cumsum(data1$Pj)
#[1] 25 32 74 110 128 157
The main issue, however, is that the fitness function should return a single value and not a vector of values as cumsum does. So if you change the function to something like this
fitness <- function(j) sum(j)
you can do:
fitness(data1$Pj)
#[1] 157
library(GA)
GA <- ga(type = "permutation", fitness = fitness, lower = 1, upper = 6,
maxiter= 5, run = 20, optim = TRUE)
which returns no warning.
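Note that sum(j) is the same for every permutation of 1:6, so it silences the warning but cannot rank job orders. As a hedged sketch of a fitness function that does depend on the order, the following assumes the goal is to minimise the total completion time (the sum of all Cj). That objective, the name fitness_perm, and the hard-coded Pj vector copied from data1 are assumptions for illustration only. Since ga() maximises its fitness function, the total is negated.

```r
# Processing times copied from the Pj column of data1 in the question
Pj <- c(25, 7, 42, 36, 18, 29)

# Hypothetical fitness: takes a permutation of job indices, returns one number.
fitness_perm <- function(perm) {
  Cj <- cumsum(Pj[perm])  # completion time of each job in the proposed order
  -sum(Cj)                # negate so that maximising means minimising sum(Cj)
}

fitness_perm(1:6)        # original job order
fitness_perm(order(Pj))  # shortest processing time first
```

It could then be passed to ga() in place of fitness, with the other arguments of the call above left unchanged.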

Generating interaction variables in R dataframes

Is there a way - other than a for loop - to generate new variables in an R dataframe, which will be all the possible 2-way interactions between the existing ones?
i.e. supposing a dataframe with three numeric variables V1, V2, V3, I would like to generate the following new variables:
Inter.V1V2 (= V1 * V2)
Inter.V1V3 (= V1 * V3)
Inter.V2V3 (= V2 * V3)
Example using for loop :
x <- read.table(textConnection('
V1 V2 V3 V4
1 9 25 18
2 5 20 10
3 4 30 12
4 4 34 16'
), header=TRUE)
dim.init <- dim(x)[2]
for (i in 1: (dim.init - 1) ) {
for (j in (i + 1) : (dim.init) ) {
x[dim(x)[2] + 1] <- x[i] * x[j]
names(x)[dim(x)[2]] <- paste("Inter.V",i,"V",j,sep="")
}
}
Here is a one liner for you that also works if you have factors:
> model.matrix(~(V1+V2+V3+V4)^2,x)
(Intercept) V1 V2 V3 V4 V1:V2 V1:V3 V1:V4 V2:V3 V2:V4 V3:V4
1 1 1 9 25 18 9 25 18 225 162 450
2 1 2 5 20 10 10 40 20 100 50 200
3 1 3 4 30 12 12 90 36 120 48 360
4 1 4 4 34 16 16 136 64 136 64 544
attr(,"assign")
[1] 0 1 2 3 4 5 6 7 8 9 10
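If only the interaction columns are wanted, named as in the question, the model matrix can be filtered on the ":" that model.matrix puts in interaction column names. This is a sketch under that assumption; the object names mm, inter and x_inter are made up for illustration, and the data frame is rebuilt from the question's example.

```r
x <- data.frame(
  V1 = 1:4,
  V2 = c(9, 5, 4, 4),
  V3 = c(25, 20, 30, 34),
  V4 = c(18, 10, 12, 16)
)

# All main effects and two-way interactions of every column
mm <- model.matrix(~ .^2, data = x)

# Keep only the interaction columns, whose names contain ":"
inter <- mm[, grepl(":", colnames(mm), fixed = TRUE), drop = FALSE]

# Rename "V1:V2" to "Inter.V1V2", matching the naming used in the question
colnames(inter) <- paste0("Inter.", gsub(":", "", colnames(inter), fixed = TRUE))

x_inter <- cbind(x, inter)
x_inter
```

The filter relies on none of the original column names containing a colon.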
Here you go, using combn and apply:
> x2 <- t(apply(x, 1, combn, 2, prod))
Setting the column names can be done with two paste commands:
> colnames(x2) <- paste("Inter.V", combn(1:4, 2, paste, collapse="V"), sep="")
Lastly, if you want all your variables together, just cbind them:
> x <- cbind(x, x2)
> x
V1 V2 V3 V4 Inter.V1V2 Inter.V1V3 Inter.V1V4 Inter.V2V3 Inter.V2V4 Inter.V3V4
1 1 9 25 18 9 25 18 225 162 450
2 2 5 20 10 10 40 20 100 50 200
3 3 4 30 12 12 90 36 120 48 360
4 4 4 34 16 16 136 64 136 64 544
I think this question should be complemented with the poly/polym functions, which go further: they generate not only the interactions between the variables but also their powers up to the selected degree. They can also produce orthogonal interactions, which may be very useful.
The direct solution to the problem asked would be:
> polym(x$V1, x$V2, x$V3, x$V4, degree = 2, raw = TRUE)
1.0.0.0 2.0.0.0 0.1.0.0 1.1.0.0 0.2.0.0 0.0.1.0 1.0.1.0 0.1.1.0 0.0.2.0 0.0.0.1 1.0.0.1 0.1.0.1 0.0.1.1 0.0.0.2
[1,] 1 1 9 9 81 25 25 225 625 18 18 162 450 324
[2,] 2 4 5 10 25 20 40 100 400 10 20 50 200 100
[3,] 3 9 4 12 16 30 90 120 900 12 36 48 360 144
[4,] 4 16 4 16 16 34 136 136 1156 16 64 64 544 256
attr(,"degree")
[1] 1 2 1 2 2 1 2 2 2 1 2 2 2 2
Columns 4, 7, 8, 11, 12 and 13 contain what was requested in the question. The other columns hold the remaining terms (the variables themselves and their squares). If you would like to get orthogonal interactions, just set raw = FALSE.
