Saving vectors from a loop in a list in R

Hi, I have never edited one of my questions before, but I'll give it a try. What the code actually means is not that important. For my problem, only saving the vectors "liste" in a new list is relevant :D
test <- list()
test <- replicate(5, sample(1:100, 50), simplify = FALSE) # Creates a list of 5 vectors
> test[[1]]
[1] 90 96 20 86 32 77 83 33 64 29 88 97 78 81 40 60 89 19 31 59 26 38 34 71 5 80 85
[28] 3 70 87 41 50 6 18 37 58 9 76 91 62 12 30 42 94 72 95 100 10 68 82
S <- test[[1]]
x <- diff(S) # the following algorithm creates "liste" (a vector) for test[[1]]
trendtest <- list()
k <- NULL
d <- NULL
t <- vector("list",length(x))
A <- vector("list",length(x))
z <- vector("list",length(x)-2)
za <- vector("list",length(x)-2)
liste <- NULL
dreisum <- sapply(1:(length(x)-2), function(i) sum(x[c(i,(i+1))]))
dreisumi <- lapply(1:(length(x)-2), function(i) dreisum[i:(length(x)-2)])
zdreisumi<- lapply(1:(length(x)-4), function(i) dreisumi[[i]] [3:length(dreisumi[[i]])]<0)
zadreisumi<- lapply(1:(length(S)-4), function(i) dreisumi[[i]][3:length(dreisumi[[i]])]>0)
Si <- lapply(1:(length(x)-2), function(i) S[i:(length(x))])
i <- 1
h <- 1
while(i<(length(x)-3) & h!=Inf){
k <- c(k,k <- (S[i]-S[i+2])/(-2))
d <- c(d,d <- (S[i+2]*i-S[i]*(i+2))/(-2))
t[[i]] <- i:(length(x))
A[[i]] <- k[length(liste)+1]*t[[i]]+d[length(liste)+1]
A[[i]][3] <- S[i+2]
z[[i]] <- Si[[i]][3:length(Si[[i]])]<A[[i]][3:length(A[[i]])]
za[[i]] <- Si[[i]][3:length(Si[[i]])]>A[[i]][3:length(A[[i]])]
if(k[length(liste)+1]>0 & S[i+3]>A[[i]][4] & is.element(TRUE,z[[i]])){h <- (min(which(z[[i]]!=FALSE))+1)}else{
if(k[length(liste)+1]>0 & S[i+3]<A[[i]][4] & is.element(TRUE,za[[i]])){h <- (min(which(za[[i]]!=FALSE))+1)}else{
if(k[length(liste)+1]<0 & S[i+3]>A[[i]][4] & is.element(TRUE,z[[i]])){h <- (min(which(z[[i]]!=FALSE))+1)}else{
if(k[length(liste)+1]<0 & S[i+3]<A[[i]][4] & is.element(TRUE,za[[i]])){h <- (min(which(za[[i]]!=FALSE))+1)}else{
if(k[length(liste)+1]>0 & S[i+3]>A[[i]][4] & (all(z[[i]]==FALSE))){h <- (min(which(zdreisumi[[i]]!=FALSE))+2)}else{
if(k[length(liste)+1]>0 & S[i+3]<A[[i]][4] & (all(za[[i]]==FALSE))){h <- (min(which(zdreisumi[[i]]!=FALSE))+2)}else{
if(k[length(liste)+1]<0 & S[i+3]>A[[i]][4] & (all(z[[i]]==FALSE))){h <- (min(which(zadreisumi[[i]]!=FALSE))+2)}else{
if(k[length(liste)+1]<0 & S[i+3]<A[[i]][4] & (all(za[[i]]==FALSE))){h <- (min(which(zadreisumi[[i]]!=FALSE))+2)}}}}}}}}
liste <- c(liste,i)
i <- i+h-1
if((length(x)-3)<=i & i<=length(x)){liste <- c(liste,i)}}
> liste
[1] 1 3 7 10 12 16 18 20 24 27 30 33 36 39 41 46
The code itself is not really the interesting part of my problem, because it works! The example above is for test[[1]]. What I want is a for-loop (or whatever) that takes ALL vectors in "test" and saves ALL 5 resulting vectors "liste" in a new list (let's call it "trendtest" ... whatever :D)

The following will do what you ask for:
Delete the line trendtest <- list().
Take the code from x <- diff(S) to last line (except the very last line that only prints liste) and insert it at the position indicated by the placeholder __CODE_HERE__.
trendtest <- lapply(test, FUN = function(S) {
__CODE_HERE__
return(liste)
})
This is the "R way" of doing what you want. Alternatively, you could do the following (which is closer to your initial approach, but less the "R way"):
trendtest <- vector("list", length(test))
for (u in seq_along(test)) { # seq_along() is safer than 1:length(test) if test is empty
S <- test[[u]]
__CODE_HERE__
trendtest[[u]] <- liste
}
Note that there will be an error message which is due to the sample data (which doesn't fit the algorithm provided) and is unrelated to saving liste in trendtest.
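The same pattern can be sketched end to end by wrapping the per-vector work in a function. The function find_turning_points below is a hypothetical stand-in (it is not the trend algorithm from the question); it only shows that whatever the function returns for one vector ends up as one element of the result list.

```r
set.seed(1)
test <- replicate(5, sample(1:100, 50), simplify = FALSE)  # list of 5 vectors

# Hypothetical stand-in for the algorithm: positions where the series
# changes direction. Replace the body with the real code that builds "liste".
find_turning_points <- function(S) {
  x <- diff(S)
  which(diff(sign(x)) != 0) + 1
}

# lapply() calls the function once per vector and collects the results
trendtest <- lapply(test, find_turning_points)
length(trendtest)
```

Each element trendtest[[u]] then corresponds to test[[u]], exactly as in the for-loop version.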

Related

Is there a R function to have 2 for loops

I have a problem that I initially solved for scenario 1. In scenario 1, I have a dataframe df and need to plot the mean of the variables that are numeric.
df
A B C D E F
1 asd 29 sf 36 sf 44
2 fsd 24 gfd 56 gfd 34
3 gs 46 asd 39 asd 37
4 asd 50 gfg 26 gfg 23
5 sf 43 fg 56 fg 37
6 dfg 29 er 35 er 51
7 sdfg 32 tr 27 tr 28
8 fgdsgd 24 qw 31 qw 36
I have code to plot the mean of the numeric variables. The code is shown below:
p2 <- list()
cs <- names(Filter(is.numeric, df))
for(i in cs)
{
p2[i] <- mean(df[,i])
do.call(rbind,p2) %>% as.data.frame()
}
p2 <- as.data.frame(p2)
p2 <- unlist(p2)
p2 <- stack(p2)
ggplot(data=p2,aes(x=ind,y=values))+geom_bar(stat = "identity")+ylab("Mean")
But I need another loop. The scenario above covers only the mean. Now I also need the median, sd and many more, so I have put the names of these functions in a vector:
gh <- list()
mea <- c("mean","median","sd")
p2 <- list()
cs <- names(Filter(is.numeric, df))
for(i in cs)
{
for(j in mea)
{
p2[i] <- gh[[j]](df[i])
do.call(rbind,p2) %>% as.data.frame()
}
}
p2 <- as.data.frame(p2)
p2 <- unlist(p2)
p2 <- stack(p2)
ggplot(data=p2,aes(x=ind,y=values))+geom_bar(stat = "identity")+ylab("Mean")
Could you help me build scenario 2, i.e. the code above?
Ideally it should plot the mean, median and sd for all the variables that are numeric.
Here is how I would do it without loops, using the mtcars data set (in which every column is numeric):
apply(mtcars,2,function(x){
if (is.numeric(x)){
return(c("mean"=mean(x),"median"=median(x),"sd"=sd(x)))
}
})
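Note that apply() converts a data frame to a matrix first, so with mixed character and numeric columns (as in the question's df) every column would arrive as character. A minimal sketch that avoids this, assuming a small made-up data frame shaped like the one in the question, is to filter the numeric columns first and then loop over a named list of functions. The names stat_funs, num_df and stats are illustrative only.

```r
# Made-up data resembling the question's df (character and numeric columns)
df <- data.frame(
  A = c("asd", "fsd", "gs", "asd"),
  B = c(29, 24, 46, 50),
  D = c(36, 56, 39, 26),
  stringsAsFactors = FALSE
)

# Named list of summary functions; add more entries as needed
stat_funs <- list(mean = mean, median = median, sd = sd)

# Keep only numeric columns, then apply every function to every column
num_df <- Filter(is.numeric, df)
stats <- sapply(num_df, function(col) sapply(stat_funs, function(f) f(col)))
stats  # rows: mean, median, sd; columns: B, D

# Long format (one row per statistic/variable pair), convenient for plotting
stats_long <- as.data.frame(as.table(stats))
```

The long-format data frame has one row per combination of statistic and variable, which is the shape a grouped bar chart typically expects.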

Extracting chunks from a matrix by columns

Say I have a matrix with 1000 columns. I want to create a new matrix from alternating blocks of n columns of the original matrix (take n columns, skip n columns), starting from column i.
Say n=3 and i=5; then the columns I need from the old matrix are 5,6,7,11,12,13,17,18,19 and so on.
Use two seq() calls to create the start and stop bounds, then mapply() over them to build the actual column index intervals. Then use normal bracket notation to extract from your matrix.
set.seed(1)
# using 67342343's test case
M <- matrix(runif(100^2), ncol = 100)
n <- 3
i <- 5
starts <- seq(i, ncol(M), n*2)
stops <- seq(i+(n-1), ncol(M), n*2)
col_index <- c(mapply(seq, starts, stops)) # thanks Jaap and Sotos
col_index
[1] 5 6 7 11 12 13 17 18 19 23 24 25 29 30 31 35 36 37 41 42 43 47 48 49 53 54 55 59 60 61 65 66 67 71 72 73 77 78
[39] 79 83 84 85 89 90 91 95 96 97
M[, col_index]
Another solution is based on the fact that R recycles a logical index vector that is shorter than the vector being indexed:
i <- 5; n <- 3
M <- matrix(runif(100^2), ncol = 100)
id <- seq(i, ncol(M), by = 1)[rep(c(TRUE, FALSE), each = n)]
M_sub <- M[, id]
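To see why the recycling approach works, here is a small sketch on 20 hypothetical columns instead of 100. The logical mask has length 2 * n and is reused from its start whenever it runs out, so it keeps n positions, drops n positions, and so on.

```r
i <- 5; n <- 3
cols <- seq(i, 20)                     # candidate columns 5, 6, ..., 20
mask <- rep(c(TRUE, FALSE), each = n)  # TRUE TRUE TRUE FALSE FALSE FALSE
id <- cols[mask]                       # mask is recycled along cols
id
# [1]  5  6  7 11 12 13 17 18 19
```

This matches the column set requested in the question for n = 3 and i = 5.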
I would write a function that determines the indices of the columns you want, and then call that function as needed.
col_indexes <- function(mat, start = 1, by = 1){
n <- ncol(mat)
inx <- seq(start, n, by = 2*by)
inx <- c(sapply(inx, function(i) i:(i + by -1)))
inx[inx <= n]
}
m <- matrix(0, nrow = 1, ncol = 20)
icol <- col_indexes(m, 5, 3)
icol
[1] 5 6 7 11 12 13 17 18 19
Here is a method using outer.
c(outer(5:7, seq(0L, 95L, 6L), "+"))
[1] 5 6 7 11 12 13 17 18 19 23 24 25 29 30 31 35 36 37 41 42 43 47 48 49 53
[26] 54 55 59 60 61 65 66 67 71 72 73 77 78 79 83 84 85 89 90 91 95 96 97
To generalize this, you could do
idx <- c(outer(seq(i, i + n - 1), seq(0L, ncol(M) - i, 2 * n), "+"))
idx <- idx[idx <= ncol(M)] # drop indices past the last column, if any
The idea is to construct the initial set of columns (5:7 or seq(i, i + n - 1)), calculate the offsets of every subsequent set (seq(0L, 95L, 6L) or seq(0L, ncol(M) - i, 2 * n)), then use outer to calculate the sum of every combination of these two vectors.
You can then subset the matrix using [, as in M[, idx].

Fisher's exact test on values from large dataframe and bypassing errors

I have a dataframe with 214 columns and many rows, and I want to perform a Fisher's exact test for each row using values from 4 columns.
An example subset of relevant information from my dataframe looks like:
Variant DB.count.1 DB.count.2 pop.count.1 pop.count.2
A 23 62 35 70
B 81 4 39 22
C 51 42 49 52
D NA NA 65 8
E 73 21 50 33
F 72 13 81 10
G 61 32 75 21
H NA NA 42 22
I NA NA 60 20
J 80 12 72 24
I am trying to use a for loop to:
create a contingency table for each row for the Fisher's exact test to compare DB.counts to pop.counts
run a Fisher's exact test using this contingency table to determine if there is a difference between DB.counts and pop.counts
output the p-value result to a new column on my dataframe
As you can see, there are NA values in some positions and therefore in some contingency tables. This will cause an error, which is fine, but when the code hits such an error I would like it to write a value such as "." or "error" to the column and skip to the next row/contingency table.
i.e. I would like an output which looks like this:
Variant DB.count.1 DB.count.2 pop.count.1 pop.count.2 fishers
A 23 62 35 70 0.4286
B 81 4 39 22 <0.0001
C 51 42 49 52 0.3921
D NA NA 65 8 error
E 73 21 50 33 0.0143
F 72 13 81 10 0.5032
G 61 32 75 21 0.0744
H NA NA 42 22 error
I NA NA 60 20 error
J 80 12 72 24 0.0425
The code I currently have (based on R loop over Fisher test - Error message) is:
df$fishers <- for (i in 1:nrow(df))
{
table <- matrix(c(df[i,4], df[i,5], df[i,2], df[i,3]), ncol = 2, byrow = TRUE)
fisher.test(table, alternative="greater")
}
This seems to create the contingency tables the way I want, but the problem of bypassing the errors and writing the p-value to the new column remains. I have tried to use try and tryCatch but have been unsuccessful in doing so.
I am an R beginner so really appreciate any advice on how to improve my questions or any advice for my problem! Thank you!
Edit 1: I have now tried using the data.table package as below and have got what I need from data sets with no "NA" values but how do I skip the errors and make the code continue? Thanks!!!
library(data.table)
dt <- data.table(df)
dt[, p.val := fisher.test(matrix(c(pop.count.1, pop.count.2, DB.count.1, DB.count.2), ncol=2), workspace=1e9)$p.value, by=Variant]
df <- as.data.frame(dt)
You can include an if-else statement in your loop like this:
res <- NULL
for (i in 1:nrow(df)){
table <- matrix(c(df[i,4], df[i,5], df[i,2], df[i,3]), ncol = 2, byrow = TRUE)
# if any NA occurs in your table save an error in p else run the fisher test
if(any(is.na(table))) p <- "error" else p <- fisher.test(table, alternative="greater")$p.value
# save all p values in a vector
res <- c(res,p)
}
df$fishers <- res
Or put the code in a function and use apply instead of a loop:
foo <- function(y){
# include here as.numeric to be sure that your values are numeric:
table <- matrix(as.numeric(c(y[4], y[5], y[2], y[3])), ncol = 2, byrow = TRUE)
if(any(is.na(table))) p <- "error" else p <- fisher.test(table, alternative="greater")$p.value
p
}
df$fishers <- apply(df, 1, foo)
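Since the question mentions tryCatch, here is a hedged sketch of that route as an alternative to the explicit NA check. It assumes a three-row subset of the example data, and the helper name safe_fisher is made up for illustration. It returns NA_real_ instead of the string "error" so that the new column stays numeric.

```r
# Three rows taken from the question's example (row D contains NAs)
df <- data.frame(
  Variant     = c("A", "B", "D"),
  DB.count.1  = c(23, 81, NA),
  DB.count.2  = c(62, 4, NA),
  pop.count.1 = c(35, 39, 65),
  pop.count.2 = c(70, 22, 8)
)

# Hypothetical helper: run the test for one row; on any error return NA
safe_fisher <- function(p1, p2, d1, d2) {
  tryCatch(
    fisher.test(matrix(c(p1, p2, d1, d2), ncol = 2, byrow = TRUE))$p.value,
    error = function(e) NA_real_
  )
}

# mapply() walks the four columns in parallel, one row at a time
df$fishers <- mapply(safe_fisher,
                     df$pop.count.1, df$pop.count.2,
                     df$DB.count.1, df$DB.count.2)
df
```

Rows whose table contains NA make fisher.test() fail, the error handler catches that, and the loop over rows simply continues.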

Using apply in a 'window'

Is there a way to use apply functions on 'windows' or 'ranges'? This example should serve to illustrate:
a <- 11:20
Now I want to calculate the sums of consecutive elements. i.e.
[11+12, 12+13, 13+14, ...]
The ways I can think of handling this are:
a <- 11:20
b <- NULL
for(i in 1:(length(a)-1))
{
b <- c(b, a[i] + a[i+1])
}
# b is 23 25 27 29 31 33 35 37 39
or alternatively,
d <- sapply( 1:(length(a)-1) , function(i) a[i] + a[i+1] )
# d is 23 25 27 29 31 33 35 37 39
Is there a better way to do this?
I'm hoping there's something like:
e <- windowapply( a, window=2, function(x) sum(x) ) # fictional function
# e should be 23 25 27 29 31 33 35 37 39
Here's an alternative using rollapply from the zoo package (load it with library(zoo) first):
> rollapply(a, width=2, FUN=sum )
[1] 23 25 27 29 31 33 35 37 39
The zoo package also offers a rollsum function:
> rollsum(a, 2)
[1] 23 25 27 29 31 33 35 37 39
We can define a general moving() function:
moving <- function(f){
g <- function(i , x, n , f, ...) f(x[(i-n+1):i], ...)
function(x, n, ...) {
N <- length(x)
vapply(n:N, g, x , n , f, FUN.VALUE = numeric(1), ...)
}
}
The function moving() returns a function that, in turn, can be used to generate any moving_f() function:
moving_sum <- moving(sum)
moving_sum(x = 11:20, n = 2)
Similarly, extra arguments can be passed through to the wrapped function:
moving_mean <- moving(mean)
moving_mean(x = rpois(22, 6), n = 5, trim = 0.1)
You can achieve your windowapply function by first creating a list of indices and then using an *apply function over them, so that they are used as extraction indices:
j <- lapply(seq_len(length(a) - 1), function(i) c(i, i + 1))
sapply(j, function(j) sum(a[j]))
## [1] 23 25 27 29 31 33 35 37 39
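The fictional windowapply() from the question can be sketched in base R along the same lines. This is only one possible shape for such a helper, not an existing function; the argument names follow the question.

```r
# Sketch of the windowapply() imagined in the question (not a base R function)
windowapply <- function(x, window, FUN, ...) {
  stopifnot(window >= 1, window <= length(x))
  starts <- seq_len(length(x) - window + 1)  # first index of each window
  sapply(starts, function(s) FUN(x[s:(s + window - 1)], ...))
}

a <- 11:20
e <- windowapply(a, window = 2, sum)
e
# [1] 23 25 27 29 31 33 35 37 39
```

For the special case of summing neighbours, the fully vectorized a[-1] + a[-length(a)] gives the same result without any looping.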

how to do Loop in R

I have a question regarding loops in R.
For example, at t=0 there are 100 people alive. Each person stays alive with probability exp(-mu), where I set mu=0.1.
I want to generate 10 samples of the number of people alive at t=1. I did the following:
command:
set.seed(123)
alive <- 100
mu <- 0.1
sample <- 10
alive1 <- rbinom(sample,alive,exp(-mu))
alive1
# [1] 92 88 91 87 86 95 90 87 90 91
Now I want to keep doing this until time t=20.
command :
alive2 <- rbinom(10,alive1,exp(-mu))
alive2
alive3 <- rbinom(10,alive2,exp(-mu))
alive3
....
alive20 <-rbinom (10,alive19,exp(-mu))
alive20
output :
alive2 <- rbinom(10,alive1,exp(-mu))
alive2
# [1] 78 80 81 78 81 82 83 83 83 77
alive3 <- rbinom(10,alive2,exp(-mu))
alive3
# [1] 67 71 72 63 72 73 75 75 77 72
...
However, I do not want to keep repeating the command, especially if I want to extend the time to a longer period. How do I write a loop in R for this problem?
Thanks!
set.seed(123)
alive <- vector("list", 20)
mu <- 0.1
n <- 10
alive[[1]] <- rbinom(n, 100, exp(-mu))
for(i in 2:20)
alive[[i]] <- rbinom(n, alive[[i-1]], exp(-mu))
I renamed the variable sample to n to avoid confusion with the commonly used function sample().
set.seed(123)
alive <- 100
mu <- 0.1
sample <- 10
alive1 <- rbinom(sample,alive,exp(-mu))
for ( i in 2:20)
{
assign(
paste0("alive",i),
rbinom(10,get(paste0("alive",(i-1))),exp(-mu))
)
}
Or @Backlin's suggestion of putting it in a list:
set.seed(123)
alive <- 100
mu <- 0.1
sample <- 10
Aliveset <- vector(mode = "list", length = 20)
Aliveset[[1]] <- rbinom(sample,alive,exp(-mu))
for ( i in 2:20)
{
Aliveset[[i]] <- rbinom(10,Aliveset[[i-1]],exp(-mu))
}
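As a further variation on the list approach, the step-by-step update can be written with Reduce() and accumulate = TRUE, which keeps every intermediate state. This is a sketch under the same assumptions as above (mu = 0.1, 10 samples, 100 people at t=0); the names steps and step_fun are illustrative.

```r
set.seed(123)
mu <- 0.1
n <- 10      # number of samples
steps <- 20  # number of time steps

# One time step: each survivor stays alive with probability exp(-mu).
# The second argument (the step number) is not used.
step_fun <- function(prev, t) rbinom(n, prev, exp(-mu))

# Element 1 is the state at t=0, element k + 1 the state at t=k
alive <- Reduce(step_fun, seq_len(steps), accumulate = TRUE, init = rep(100, n))
length(alive)
# [1] 21
```

Extending the simulation to a longer period only requires changing steps.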
