Loop to select pairwise series - r

I have a data.frame in R with 40 series and I want to select pairs of series to apply a function to (i.e. series 1 and series 21, series 2 and series 22). However, I'm getting an error with the following code:
for(i in 1:ncol(Date)) {
pairwise <-Date[, c(i,i+20)]
}
I want to use pairwise in other function.
Could someone please help me?
Thanks in advance

It is because you are requesting columns higher than 40 when i > 20. See this example:
set.seed(1)
DF <- data.frame(matrix(rnorm(40*100), ncol = 40))
## simple function to apply/use
foo <- function(x1, x2) return(x1 - x2)
## something to hold results
res <- matrix(ncol = ncol(DF), nrow = nrow(DF))
## loop - oops error
for(i in seq_len(ncol(DF))) {
    res[,i] <- foo(DF[,i], DF[,i+20])
}
You get this error:
Error in `[.data.frame`(DF, , i + 20) : undefined columns selected
That is because i takes values 1, ..., 40. As soon as i >= 21, (i + 20) > 40 and you only have 40 columns of data. A simple modification is to loop only over the first 20 columns:
## something to hold results
res <- matrix(ncol = ncol(DF) / 2, nrow = nrow(DF))
for(i in seq_len(ncol(DF)/2)) {
    res[,i] <- foo(DF[,i], DF[,i+20])
}
That works if all you want is col 1 and col 21, col 2 and col 22, etc. If you want all pairwise comparisons, then you need to try something different, as a single loop won't work.
(Before someone pulls me up for woefully inefficient use of a loop, that example was just that, an example with no imagination applied to the function foo(). In this case, DF[, 1:20] - DF[, 21:40] will give the same result as in res.)
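For the "all pairwise comparisons" case mentioned above, one possible approach is to enumerate the column pairs with combn() and apply foo() to each pair. This is only a sketch under the same assumptions as the example (40 numeric columns, foo() taking two vectors); the names half, pairs, res_offset and res_all are made up for illustration.

```r
set.seed(1)
DF <- data.frame(matrix(rnorm(40 * 100), ncol = 40))
foo <- function(x1, x2) x1 - x2

## fixed offset pairing (col i with col i + 20) without an explicit loop
half <- ncol(DF) / 2
res_offset <- mapply(foo, DF[seq_len(half)], DF[half + seq_len(half)])

## all unordered pairs of columns: each column of `pairs` holds two column indices
pairs <- combn(ncol(DF), 2)
res_all <- apply(pairs, 2, function(p) foo(DF[[p[1]]], DF[[p[2]]]))

dim(res_offset)  ## one column per (i, i + 20) pair
dim(res_all)     ## one column per unordered pair
```

Column k of res_all corresponds to the pair stored in pairs[, k], so the index matrix doubles as a lookup table for the results.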

Related

Nested for loop in R is giving me bracket error despite using correct amount of brackets

So I made a matrix with 4 rows to represent 4 individuals (each with an ID). I'm trying to use a nested for loop to incorporate a time increment in the first loop, then in the second loop add a row for every individual for the new time and incorporate a function in the 3rd column and each increase in time will add on the value of the function to the value in the same column from the function from the previous time step. I'm starting small with 4 individuals and 5 time steps, but for some reason I'm getting an error message about an unexpected '}', but I've gone through and double checked the brackets and parentheses multiple times. I'm not sure what the issue is with the for loop or if it's going to end up doing what I intend for it to do.
uptake <- function(x){
vmax <- x[1]
km <- x[2]
s <- x[3]
result <- vmax*(s/(km+s))
return(result)
}
agents <- matrix(0, nrow = 4, ncol = 6)
colnames(agents) <- c("Time", "ID", "Uptake rate (V)", "vmax", "km", "s")
agents[,1] <- 0
agents[,2] <- c(1:4)
agents[,4] <- 1.4
agents[,5] <- 17
agents[,6] <- 1.4
for (i in seq(1, 5, 1)){
for (j in 1:nrow(agents$Time = i-1)){
agents[j,] <-rbind(agents, c(i, agents[j,2], agents[j,3] +
uptake(agents[j,4:6]), agents[j,4],
agents[j,5], agents[j,6]))}}
and this is the error code I'm getting:
Error: unexpected '}' in:
" uptake(agents[j,4:6]), agents[j,4],
agents[j,5], agents[j,6]))}"
I appreciate any advice and insight!!
Is this what you are trying to do?
for (i in seq(1, 5, 1)) {
  for (j in seq_len(sum(agents[,"Time"] == i - 1))) {
    agents <- rbind(
      agents,
      c(
        i,
        agents[j,2],
        agents[j,3] + uptake(agents[j,4:6]),
        agents[j,4],
        agents[j,5],
        agents[j,6]
      )
    )
  }
}
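Note that in the loop above j always runs over rows 1 to 4, which are the time-0 rows, so the uptake column does not build on the previous time step. If the running total described in the question is what is wanted, a possible alternative is to preallocate all rows and fill each time step from the one before it. This is a sketch, not the asker's code; n_agents, n_steps, prev and curr are names introduced here for illustration.

```r
uptake <- function(x) {
  vmax <- x[1]
  km <- x[2]
  s <- x[3]
  vmax * (s / (km + s))
}

n_agents <- 4  # assumed number of individuals
n_steps <- 5   # assumed number of time steps after time 0

# one row per individual per time point, allocated up front
agents <- matrix(0, nrow = n_agents * (n_steps + 1), ncol = 6)
colnames(agents) <- c("Time", "ID", "Uptake rate (V)", "vmax", "km", "s")
agents[, "Time"] <- rep(0:n_steps, each = n_agents)
agents[, "ID"] <- rep(seq_len(n_agents), times = n_steps + 1)
agents[, "vmax"] <- 1.4
agents[, "km"] <- 17
agents[, "s"] <- 1.4

for (i in seq_len(n_steps)) {
  prev <- which(agents[, "Time"] == i - 1)  # rows of the previous time step
  curr <- which(agents[, "Time"] == i)      # rows to fill in now
  # running total: previous value plus this step's uptake for each individual
  agents[curr, 3] <- agents[prev, 3] +
    apply(agents[prev, 4:6, drop = FALSE], 1, uptake)
}
```

Preallocating avoids growing the matrix with rbind() on every iteration, and indexing by the Time column makes it explicit which rows feed into which.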

Dataframe output from a for-loop

I am trying to populate the output of a for loop into a data frame. The loop is repeating across the columns of a dataset called "data". The output is to be put into a new dataset called "data2". I specified an empty data frame with 4 columns (i.e. ncol=4). However, the output generates only the first two columns. I also get a warning message: "In matrix(value, n, p) : data length [2403] is not a sub-multiple or multiple of the number of columns [2]"
Why does the dataframe called "data2" have 2 columns, when I have specified 4 columns? This is my code:
a <- 0
b <- 0
GM <- 0
GSD <- 0
data2 <- data.frame(ncol=4, nrow=33)
for (i in 1:ncol(data))
{
if (i==34) {break}
a[i] <- colnames(data[i])
b <- data$cycle
GM[i] <- geoMean(data[,i], na.rm=TRUE)
GSD[i] <- geoSD(data[,i], na.rm=TRUE)
data2[i,] <- c(a[i], b, GM[i], GSD[i])
}
data2
If you look at the ?data.frame() help page, you'll see that it does not take arguments nrow and ncol--those are arguments for the matrix() function.
This is how you initialize data2, and you can see it starts with 2 columns: one named ncol, the other named nrow.
data2 <- data.frame(ncol=4, nrow=33)
data2
# ncol nrow
# 1 4 33
Instead you could try data2 <- as.data.frame(matrix(NA, ncol = 4, nrow = 33)), though if you share a small sample of data and your expected result there may be more efficient ways than explicit loops to get this job done.
Generally, if you do loop, you want to do as much outside of the loop as possible. This is just guesswork without sample data, but these changes seem like a start at improving your code.
a <- colnames(data)
b <- data$cycle ## this never changes, no need to redefine every iteration
GM <- numeric(ncol(data)) ## better to initialize vectors to the correct length
GSD <- numeric(ncol(data))
data2 <- as.data.frame(matrix(NA, ncol = 4, nrow = 33))
for (i in 1:ncol(data))
{
if (i==34) {break}
GM[i] <- geoMean(data[,i], na.rm=TRUE)
GSD[i] <- geoSD(data[,i], na.rm=TRUE)
## it's weird to assign a row of data.frame at once...
## maybe you should keep it as a matrix?
data2[i,] <- c(a[i], b, GM[i], GSD[i])
}
data2
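As a loop-free alternative, the summary columns can be computed with vapply() and assembled into the data frame in one step. The sketch below makes several assumptions: the data frame is a made-up stand-in, and geo_mean() and geo_sd() are hypothetical base-R helpers used in place of geoMean() and geoSD(), whose package is not stated in the question.

```r
# hypothetical stand-in for the asker's data: positive numeric columns
set.seed(42)
data <- data.frame(x1 = rlnorm(10), x2 = rlnorm(10), x3 = rlnorm(10))

# assumed definitions: exponentiated mean and SD of the logged values
geo_mean <- function(x) exp(mean(log(x), na.rm = TRUE))
geo_sd <- function(x) exp(sd(log(x), na.rm = TRUE))

# one row per column of `data`, built without an explicit loop
data2 <- data.frame(
  variable = colnames(data),
  GM = vapply(data, geo_mean, numeric(1)),
  GSD = vapply(data, geo_sd, numeric(1)),
  row.names = NULL
)
data2
```

Each column keeps its own type this way (character for the names, numeric for the statistics), which assigning a mixed c(...) vector to a row does not guarantee.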

nested double loop in R

mat <- matrix(0,ncol=6, nrow=100)
d=c(1,2,4,8,16,32)
for(i in 1:6)
{
for(j in d)
{
mat[,i]=rep(j,100)
}
}
mat
I should get a 100 x 6 matrix with columns of 1,2,4,8,16,32. However, I simply get 32 in every column. Does anyone have any idea how I can fix this? I do want to use loops, even if it's one loop that's fine.
The answer from @neilfws is more elegant. If you are committed to using a loop for some reason, you can do this:
mat <- matrix(0,ncol=6, nrow=100)
d=c(1,2,4,8,16,32)
for(i in 1:6)
{
  j <- d[i]
  mat[,i]=rep(j,100)
}
mat
The issue is that you were looping through all of d for each column.
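To see why the nested version ends up with 32 everywhere, it may help to run the inner loop on its own for a single column. The short vector col1 below is a made-up stand-in for one column of mat.

```r
d <- c(1, 2, 4, 8, 16, 32)

# inner loop in isolation: every pass overwrites the previous assignment
col1 <- numeric(3)
for (j in d) col1 <- rep(j, 3)
col1  # 32 32 32, only the last element of d survives

# indexing d with the column counter keeps one value per column
mat <- matrix(0, nrow = 100, ncol = length(d))
for (i in seq_along(d)) mat[, i] <- d[i]
```

Using seq_along(d) instead of 1:6 also keeps the loop correct if the length of d changes.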
Based on your description: 100 rows x 6 columns, column 1 = value 1...column 6 = value 32, this should generate what you want.
matrix(data = rep(c(1,2,4,8,16,32), each = 100),
nrow = 100,
ncol = 6)

How to apply a distribution function for each row in data frame

I know similar questions have been asked in this site here, here, and here, but none of them tackles my problem.
I have a data frame and I want to apply the rdirichlet function (from gtools) to each row, so each row should be treated as alpha.
data = NULL
data <- data.frame(rbind(
oct = c(60, 32, 8),
sep = c(53, 35, 12),
ago = c(54, 40, 6)
))
data <- data/100*1000
library(gtools) # contains the function
sim <- 10000 # simulation
My first attempt was to use apply. It does work, but the output is not convenient for further analysis; each row's computation becomes a vector:
p = apply(data, 1, function(x) rdirichlet(sim, alpha = x + 1))
I also tried a loop, without success:
p = NULL
for(i in 1:length(data)) {
p[i] <- rdirichlet(sim, alpha = data[i] + 1)
}
Any tip how can I solve this?
Well, firstly, inside the anonymous function passed to apply you should refer to x rather than data, to match the x in function(x):
apply(data, 1, function(x) rdirichlet(sim, alpha = x + 1))
This works for me, as in it provides an output with three columns and 30000 rows.
Two important things here. First, vectorizing is the best way to go:
ans <- apply(data, 1, function(x) rdirichlet(sim, alpha = x + 1))
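By doing this, you'll receive each row's draws as one column: rdirichlet returns a sim x k matrix, and apply flattens it column by column, so the first sim values of a column belong to the first component, the next sim values to the second, and so on.
Then you'll need to subset the result, for example:
margin <- ans[1:sim, 1] - ans[(sim + 1):(2 * sim), 1]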
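If the flattened columns are awkward to work with, another option is to keep one sim x k matrix per row of data in a named list. The sketch below is self-contained, so it uses a hypothetical base-R stand-in, rdirichlet_base(), that draws Dirichlet samples by normalising independent gamma variates; gtools::rdirichlet(sim, alpha) could be swapped in at that point. The smaller sim is only there to keep the example quick.

```r
# stand-in for gtools::rdirichlet: normalised Gamma(alpha_j, 1) draws
rdirichlet_base <- function(n, alpha) {
  k <- length(alpha)
  g <- matrix(rgamma(n * k, shape = rep(alpha, each = n)), nrow = n, ncol = k)
  g / rowSums(g)
}

data <- data.frame(rbind(
  oct = c(60, 32, 8),
  sep = c(53, 35, 12),
  ago = c(54, 40, 6)
))
data <- data / 100 * 1000
sim <- 1000  # smaller than in the question, for illustration only

set.seed(1)
alpha <- as.matrix(data) + 1
# one sim x 3 matrix of draws per row of `data`
draws <- lapply(seq_len(nrow(alpha)), function(i) rdirichlet_base(sim, alpha[i, ]))
names(draws) <- rownames(data)

# example: difference in the first component between two rows
margin <- draws$oct[, 1] - draws$sep[, 1]
```

With this layout draws$oct[, 2] is simply the second component for the first row, so no index arithmetic is needed.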

How to sum over range and calculate a series in R?

Here is the formula which I am trying to calculate in R.
So far, this is my approach using a simplified example
t <- seq(1, 2, 0.1)
expk <- function(k){exp(-2*pi*1i*t*k)}
set.seed(123)
dat <- ts(rnorm(100), start = c(1994,3), frequency = 12)
arfit <- ar(dat, order = 4, aic = FALSE) # represent \phi in the formula
tmp1 <- numeric(4)
for (i in seq_along(arfit$ar)){
ek <- expk(i)
arphi <- arfit$ar[i]
tmp1[i] <- ek * arphi
}
tmp2 <- sum(tmp1)
denom = abs(1-tmp2)^2
s2 <- t/denom
Error : Warning message:
In tmp1[i] <- ek * arphi :
number of items to replace is not a multiple of replacement length
I was trying to avoid using for loop and tried using sapply as in solutions to this question.
denom2 <- abs(1- sapply(seq_along(arfit$ar), function(x)sum(arfit$ar[x]*expf(x))))^2
but that doesn't seem to be correct. The problem is how to sum the series (over index k) when it also takes values from another vector, in this case t, which is in the numerator.
Any solutions?
Any suggestion for a test dataset, maybe using 0 and 1, to check whether the calculation in this loop is done correctly?
Typing up the answer determined in chat. Here's a solution involving vapply.
First correct expk to:
expk <- function(k){sum(exp(-2*pi*1i*t*k))}
Then you can create this function and vapply it:
myFun <- function(i) return(expk(i) * arfit$ar[i])
tmp2 <- sum(vapply(seq_along(arfit$ar), myFun, complex(1)))
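The warning in the question arises because expk(i) returns one value per element of t, while tmp1[i] can hold only one. If the intent is instead to evaluate the sum over k separately for each value of t (an assumption, since the formula itself is not shown here), a possible vectorised sketch uses outer() to build the t-by-k grid of exponentials. The names phi, k and E are introduced for illustration.

```r
t <- seq(1, 2, 0.1)
set.seed(123)
dat <- ts(rnorm(100), start = c(1994, 3), frequency = 12)
arfit <- ar(dat, order.max = 4, aic = FALSE)

phi <- as.vector(arfit$ar)  # AR coefficients, one per lag k
k <- seq_along(phi)

# E[m, j] = exp(-2 * pi * i * t[m] * k[j]); rows index t, columns index k
E <- exp(-2 * pi * 1i * outer(t, k))

# for each t, sum over k of phi[k] * exp(-2 * pi * i * t * k)
tmp2 <- drop(E %*% phi)
denom <- Mod(1 - tmp2)^2
s2 <- t / denom
```

Under this reading s2 has one value per element of t, whereas the vapply() version above collapses everything to a single number.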
