Use a for loop to run a function over each column in R

I have a matrix with several columns of data, and I want to use a for loop to run a function on each column (specifically a for loop; please, no answers that don't involve one).
x <- runif(10,0,10)
y <- runif(10,10,20)
z <- runif(10,20,30)
tab <- cbind(x,y,z)
x y z
[1,] 9.5262742 16.22999 21.93228
[2,] 5.8183264 14.53771 21.81774
[3,] 3.9509342 17.36694 22.46594
[4,] 3.0245614 19.46411 25.80411
[5,] 5.0284351 13.89636 21.61767
[6,] 3.0291715 17.50267 26.28110
[7,] 8.4727471 16.77365 27.60535
[8,] 3.3816903 15.23395 22.01265
[9,] 0.3182083 13.97575 29.25909
[10,] 2.6499290 16.71129 27.05160
for (i in 1:ncol(tab)){
print(mean(i))
}
I have almost no familiarity with R and have had trouble finding a solution that specifically uses a for loop to run a function and output a result per column.

Well, strictly using a for loop, I think this would do what you want:
x <- runif(10,0,10)
y <- runif(10,10,20)
z <- runif(10,20,30)
tab <- cbind(x,y,z)
for (i in 1:ncol(tab)){
print(mean(tab[, i]))
}
You need to index the matrix by using [row, column]. When you want to select all rows for a specific column (which is your case), just leave the row field empty. So that's why you have to use [, i], where i is your index.
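As a minimal sketch of the same indexing idea, the loop below stores each column mean in a preallocated vector instead of printing it. The name col_means, the seed, and the use of seq_len() are illustrative choices, not part of the original answer.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible
x <- runif(10, 0, 10)
y <- runif(10, 10, 20)
z <- runif(10, 20, 30)
tab <- cbind(x, y, z)

# Preallocate one slot per column (col_means is a hypothetical name)
col_means <- numeric(ncol(tab))
names(col_means) <- colnames(tab)

for (i in seq_len(ncol(tab))) {
  # tab[, i] selects all rows of column i
  col_means[i] <- mean(tab[, i])
}
col_means
```

seq_len(ncol(tab)) behaves like 1:ncol(tab) here, but it yields an empty sequence for a matrix with zero columns, so the loop body is skipped instead of running with invalid indices.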

Related

Problems with the names of the elements of a list

I have created a list whose elements are themselves a list of matrices. I want to be able to extract the vectors of observations for each variable
p13 = 0.493;p43 = 0.325;p25 = 0.335;p35 = 0.574;p12 = 0.868
std_e2 = sqrt(1-p12^2)
std_e3 = sqrt(1-(p13^2+p43^2))
std_e5 = sqrt(1-(p25^2+p35^2+2*p25*p35*(p13*p12)))
set.seed(1234)
z1<-c(0,1)
z2<-c(0,1)
z3<-c(0,1)
z4<-c(0,1)
z5<-c(0,1)
s<-expand.grid(z1,z2,z3,z4,z5); s
s<-s[-1,];s
shift<-3
scenari<-s*shift;scenari
scenario_1<-scenari[1];scenario_1
genereting_fuction<-function(n){
sample<-list()
for (i in 1:nrow(scenario_1)){
X1=rnorm(n)+scenari[i,1]
X4=rnorm(n)+scenari[i,4]
X2=X1*p12+std_e2*rnorm(n)+scenari[i,2]
X3=X1*p13+X4*p43+std_e3*rnorm(n)+scenari[i,3]
X5=X2*p25+X3*p35+std_e5*rnorm(n)+scenari[i,5]
sample[[i]]=cbind(X1,X2,X3,X4,X5)
colnames(sample[[i]])<-c("X1","X2","X3","X4","X5")
}
sample
}
set.seed(123)
dati_fault<- lapply(rep(10, 100), genereting_fuction)
dati_fault[[1]]
[[1]]
X1 X2 X3 X4 X5
[1,] 2.505826 1.736593 1.0274581 -0.6038358 1.9967656
[2,] 4.127593 3.294344 2.8777777 1.2386725 3.0207723
[3,] 1.853050 1.312617 1.1875699 0.5994921 1.0471564
[4,] 4.481019 3.330629 2.1880050 -0.1087338 2.7331061
[5,] 3.916191 3.306036 0.7258404 -1.1388570 1.0293168
[6,] 3.335131 2.379439 1.2407679 0.3198553 1.6755424
[7,] 3.574675 3.769436 1.1084120 -1.0065481 2.0034434
[8,] 3.203620 2.842074 0.6550587 -0.8516120 -0.1433508
[9,] 2.552959 2.642094 2.5376430 2.0387860 3.5318055
[10,] 2.656474 1.607934 2.2760391 -1.3959822 1.0095796
I want to save only the X1 values in an object, and likewise for the other variables.
The code below gives you a list of matrices, each with one row per scenario and n columns.
genereting_fuction <- function(n, scenario, scenari){
# added argument because you assume global variable use
nr <- nrow(scenario)
sample <- vector("list", length = nr) # sample<-list()
# creating a list is better than expanding it each iteration
for (i in 1:nr){
X1=rnorm(n)+scenari[i,1]
X4=rnorm(n)+scenari[i,4]
X2=X1*p12+std_e2*rnorm(n)+scenari[i,2]
X3=X1*p13+X4*p43+std_e3*rnorm(n)+scenari[i,3]
X5=X2*p25+X3*p35+std_e5*rnorm(n)+scenari[i,5]
sample[[i]]=cbind(X1,X2,X3,X4,X5)
colnames(sample[[i]])<-c("X1","X2","X3","X4","X5")
}
sample
}
set.seed(123)
dati_fault<- lapply(rep(3, 2), function(x) genereting_fuction(x, scenario_1, scenari))
dati_fault
lapply(dati_fault, function(x) {
tmp <- lapply(x, function(y) y[,"X1"])
tmp <- do.call(rbind, tmp)
})
If you want to assemble this list of matrices, for example with cbind, I suggest you use a single large n value rather than the lapply with rep inside it.
I also bet there is an easier way to simulate this number of scenarios, but it is hard to say without knowing the context of your code.
Finally, try to solve your issue with a minimal example: working with a list of 100 lists of 31 matrices of size 10×5 is a bit messy!
Good luck!
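A minimal sketch of that advice, using a small toy list instead of the full simulation: toy stands in for one element of dati_fault, and by_var is a hypothetical name for the result. The loop collects every variable, not just X1.

```r
set.seed(42)  # arbitrary seed for reproducibility

# Toy stand-in for one element of dati_fault:
# a list of 3 matrices, each 4 rows x 2 columns
toy <- lapply(1:3, function(k) {
  m <- cbind(rnorm(4), rnorm(4))
  colnames(m) <- c("X1", "X2")
  m
})

vars <- colnames(toy[[1]])
by_var <- vector("list", length(vars))
names(by_var) <- vars

for (v in vars) {
  # One row per matrix (scenario), one column per observation
  by_var[[v]] <- do.call(rbind, lapply(toy, function(m) m[, v]))
}
dim(by_var$X1)
```

Each element of by_var is then a matrix holding a single variable, which can be saved or analysed on its own.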

How to iterate over a matrix using a function in R?

I have created a function to order a vector of length 2, using the following code
x = (c(6,2))
orders = function(x){
for(i in 1:(length(x)-1)){
if(x[i+1] < x[i]){
return(c(x[i+1], x[i]))} else{
(return(x))
}}}
orders(x)
I have been asked to use this function to process a dataset with 2 columns as follows. Iterate over the rows of the data set, and if the element in the 2nd column of row i is less than the element in the first column of row i, switch the order of the two entries in the row by making a suitable call to the function you just wrote.
I've tried using the following code
set.seed(1128719)
data=matrix(rnorm(20),byrow=T,ncol=2)
df = for (i in 1:2) {
for(j in 1:10){
data = orders(c(x[i], x[j]))
return(data)
}
}
The output is null. I'm not quite sure where I'm going wrong.
Any suggestions?
I modified your code a bit but tried to keep the style the same.
There is no need for a loop inside orders: for a vector of length 2, 1:(length(x)-1) always evaluates to 1:1, so i only ever takes the value 1.
orders = function(x){
# Since the function only works on vectors of length 2,
# it's good practice to raise an error right at the start
if (length(x) != 2) {
stop("x must be a vector of length 2")
}
if (x[2] < x[1]) {
return(c(x[2], x[1]))
} else {
return(x)
}
}
orders(c(6, 2))
set.seed(1128719)
data <- matrix(rnorm(20),byrow=T,ncol=2)
A for loop itself can't be usefully assigned to a variable (it returns NULL), but we can use the loop to modify the matrix data in place:
for (row in 1:nrow(data)) {
data[row, ] <- orders(data[row,])
}
data
Edit:
This is the input:
[,1] [,2]
[1,] -0.04142965 0.2377140
[2,] -0.76237866 -0.8004284
[3,] 0.18700893 -0.6800310
[4,] 0.76499646 0.4430643
[5,] 0.09193440 -0.2592316
[6,] 1.17478053 -0.4044760
[7,] -1.62262500 0.1652850
[8,] -1.54848857 0.7475451
[9,] -0.05907252 -0.8324074
[10,] -1.11064318 -0.1148806
This is the output I get:
[,1] [,2]
[1,] -0.04142965 0.23771403
[2,] -0.80042842 -0.76237866
[3,] -0.68003104 0.18700893
[4,] 0.44306433 0.76499646
[5,] -0.25923164 0.09193440
[6,] -0.40447603 1.17478053
[7,] -1.62262500 0.16528496
[8,] -1.54848857 0.74754509
[9,] -0.83240742 -0.05907252
[10,] -1.11064318 -0.11488062
Here are two ways of ordering the two-column matrix.
This is the test matrix posted in the question.
set.seed(1128719)
data <- matrix(rnorm(20), byrow = TRUE, ncol = 2)
1. With a function orders.
The function expects as input a 2 element vector. If they are out of order, return the vector with its elements reversed, else return the vector as is.
orders <- function(x){
stopifnot(length(x) == 2)
if(x[2] < x[1]){
x[2:1]
}else{
x
}
}
Test the function.
x <- c(6,2)
orders(x)
#[1] 2 6
Now with the matrix data.
df1 <- t(apply(data, 1, orders))
2. Vectorized code.
Create a logical index that is TRUE wherever the elements of a row are out of order, and reverse only those rows.
df2 <- data
inx <- data[,2] < data[,1]
df2[inx, ] <- data[inx, 2:1]
The results are the same.
identical(df1, df2)
#[1] TRUE

Retaining lagged value to compound towards end value

I would like to please ask for your help concerning the following issue.
In a table-like object where each row corresponds to an observation in time, I would like to obtain the value from the previous row for one particular variable (:= p0), multiply it with an element of another column (:= returnfactor) and write the result to the current row as an element of another column (:= p1).
I have written
matrix <- cbind (
1:10,
1+rnorm(10, 0, 0.05),
NA,
NA
)
colnames(matrix) <- c("timeid", "returnfactor", "p0", "p1")
matrix[1, "p0"] <- 100
for (i in 1:10)
{
if (i==1)
{
matrix[i, "p1"] <- matrix[1, "p0"] * matrix[i, "returnfactor"]
}
else
{
matrix[i, "p0"] <- matrix[i-1, "p1"]
matrix[i, "p1"] <- matrix[i, "p0"] * matrix[i, "returnfactor"]
}
}
That is, I implemented what I want to achieve using a loop. However, this loop is too slow. Obviously, I am new to R.
Could you please give me a hint on how to improve the speed using the capabilities R has to offer? I assume there is no need for a loop here, but I don't know how else to approach it. In SAS, I relied on the data step reading the data row by row, together with the retain statement.
Yours sincerely,
Sinistrum
We can indeed improve this. The key thing to notice is that the values of both p0 and p1 are mostly cumulative products. In particular, calling the matrix mat rather than matrix, we have
mat[, "p1"] <- mat[1, "p0"] * cumprod(mat[, "returnfactor"])
mat[-1, "p0"] <- head(mat[, "p1"], -1)
where head(mat[, "p1"], -1) just takes all the mat[, "p1"] except for its last element. This gives
# timeid returnfactor p0 p1
# [1,] 1 0.9903601 100.00000 99.03601
# [2,] 2 1.0788946 99.03601 106.84941
# [3,] 3 1.0298117 106.84941 110.03478
# [4,] 4 0.9413212 110.03478 103.57806
# [5,] 5 0.9922179 103.57806 102.77200
# [6,] 6 0.9040545 102.77200 92.91149
# [7,] 7 0.9902371 92.91149 92.00440
# [8,] 8 0.8703836 92.00440 80.07913
# [9,] 9 1.0657001 80.07913 85.34033
# [10,] 10 0.9682228 85.34033 82.62846
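As a quick check, the sketch below runs the loop from the question and the cumprod version side by side on the same return factors and compares the results. The names rf and loop_mat and the seed are illustrative assumptions; the loop is condensed but follows the same logic as the question's.

```r
set.seed(1)  # arbitrary seed for reproducibility
rf <- 1 + rnorm(10, 0, 0.05)

# Loop version, following the logic of the question
loop_mat <- cbind(timeid = 1:10, returnfactor = rf, p0 = NA, p1 = NA)
loop_mat[1, "p0"] <- 100
for (i in 1:10) {
  if (i > 1) loop_mat[i, "p0"] <- loop_mat[i - 1, "p1"]
  loop_mat[i, "p1"] <- loop_mat[i, "p0"] * loop_mat[i, "returnfactor"]
}

# Vectorized version, as in the answer
mat <- cbind(timeid = 1:10, returnfactor = rf, p0 = NA, p1 = NA)
mat[1, "p0"] <- 100
mat[, "p1"] <- mat[1, "p0"] * cumprod(mat[, "returnfactor"])
mat[-1, "p0"] <- head(mat[, "p1"], -1)

# Compare up to floating-point tolerance
all.equal(loop_mat, mat)
```

all.equal is used rather than identical because the two versions multiply in a slightly different order, so tiny floating-point differences are possible.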

Populate matrix rows by looping over vector where loop interval is larger than 1

I'm trying to populate a matrix via a loop, given a vector of values and an empty matrix:
n <- c(seq(10,100,10))
out <- matrix(ncol=1, nrow=length(n))
Running this simple loop as an example:
for(i in n){
dostuff <- i*2
out[i,1] <- dostuff
}
gives the error message
Error in `[<-`(`*tmp*`, i, 1, value = dostuff) : subscript out of bounds,
as the interval within the vector that the loop is based on is larger than 1 and therefore does not fit the 1:10 index of the matrix rows. Removing i from the out row index only repeats the result of the last iteration:
for(i in n){
dostuff <- i*2
out[,1] <- dostuff
}
There is obviously something fundamental about loops that I don't understand. I have looked, e.g. here and here, but have not been able to find a good solution. This is the result I'm looking for:
[,1]
[1,] 20
[2,] 40
[3,] 60
[4,] 80
[5,] 100
[6,] 120
[7,] 140
[8,] 160
[9,] 180
[10,] 200
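One possible way to get this result, sketched here as an assumption rather than an answer from the original thread, is to loop over the positions of n instead of its values. The position k then serves as the row index, and n[k] supplies the value.

```r
n <- seq(10, 100, 10)
out <- matrix(ncol = 1, nrow = length(n))

for (k in seq_along(n)) {
  # k is the row position (1 to 10); n[k] is the value (10, 20, ..., 100)
  dostuff <- n[k] * 2
  out[k, 1] <- dostuff
}
out
```

The original loop failed because for (i in n) makes i take the values 10, 20, ..., 100, and out has only 10 rows, so every index after the first iteration is out of bounds.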

cbind, grep and quotation marks in R

Consider a minimum working example (for, e.g. a binomial model):
test.a.tset <- rnorm(10)
test.b.tset <- rnorm(10)
c <- runif(10)
c[c < 0.5] <- 0
c[c >= 0.5] <- 1
df <- data.frame(test.a.tset,test.b.tset,c)
Using a regex, I want to regress c on all variables with the structure test."anything".tset:
summary(glm(paste("c ~ ",paste(colnames((df[, grep("test\\.\\w+\\.tset", colnames(df))])),
collapse = "+"), sep = ""), data = df, family=binomial))
So far, no problems. Now we get to the part where cbind comes into play. Suppose I want to use a different statistical model (e.g. rbprobitGibbs from the bayesm package), which requires a design matrix as input.
Thus, I need to transform the data frame into the appropriate format.
X <- cbind(df$test.a.tset,df$test.b.tset)
Or, alternatively, if I want to use regex again (where I even add a second grep to ensure that only the part inside the quotation marks is selected):
X2 <- cbind(grep("[^\"]+",paste(paste("df$", colnames((df[, grep("test\\.\\w+\\.tset", colnames(df))])),
sep = ""), collapse = ","), value = TRUE))
But there is a difference:
> X
[,1] [,2]
[1,] -0.4525601 -1.240484170
[2,] 0.3135625 1.240519383
[3,] -0.2883953 -0.554670224
[4,] -1.3696994 -1.373690426
[5,] 0.8514529 -0.063945537
[6,] -1.1804205 -0.314132743
[7,] -1.0161170 -0.001605679
[8,] 1.0072168 0.938921869
[9,] -0.8797069 -1.158626865
[10,] -0.9113297 1.641201924
> X2
[,1]
[1,] "df$test.a.tset,df$test.b.tset"
From my point of view the problem seems to be that grep returns the selected value as a string inside quotation marks and that, while glm sort of ignores the quotation marks in "df$test.a.tset,df$test.b.tset", cbind does not.
I.e. the call for X2 after the paste is actually read as:
X2 <- cbind("df$test.a.tset,df$test.b.tset")
Question: Is there a way to get the same result for X2 as for X using a regex?
The code grep("test\\.\\w+\\.tset", colnames(df)) will return the indexes of columns that match your pattern. If you wanted to build a matrix using just those columns, you could just use:
X3 <- as.matrix(df[,grep("test\\.\\w+\\.tset", colnames(df))])
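A self-contained sketch of this approach, checked against the hand-written cbind from the question. The seed, the use of rbinom to build c, and the drop = FALSE argument are illustrative additions, not part of the original posts.

```r
set.seed(1)  # arbitrary seed for reproducibility
df <- data.frame(test.a.tset = rnorm(10),
                 test.b.tset = rnorm(10),
                 c = rbinom(10, 1, 0.5))  # simplified 0/1 response

# Positions of the columns whose names match the pattern
idx <- grep("test\\.\\w+\\.tset", colnames(df))

# drop = FALSE keeps a data frame even if only one column matches
X3 <- as.matrix(df[, idx, drop = FALSE])

# Same values as the hand-written cbind() from the question;
# X3 additionally carries the column names
X <- cbind(df$test.a.tset, df$test.b.tset)
all.equal(unname(X3), X)
```

The point is that columns are selected by index or by name directly, so there is no need to paste together a string such as "df$test.a.tset,df$test.b.tset", which cbind treats as a single character value.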
