I have created a list whose elements are themselves lists of matrices. I want to be able to extract the vectors of observations for each variable.
p13 = 0.493;p43 = 0.325;p25 = 0.335;p35 = 0.574;p12 = 0.868
std_e2 = sqrt(1-p12^2)
std_e3 = sqrt(1-(p13^2+p43^2))
std_e5 = sqrt(1-(p25^2+p35^2+2*p25*p35*(p13*p12)))
set.seed(1234)
z1<-c(0,1)
z2<-c(0,1)
z3<-c(0,1)
z4<-c(0,1)
z5<-c(0,1)
s<-expand.grid(z1,z2,z3,z4,z5); s
s<-s[-1,];s
shift<-3
scenari<-s*shift;scenari
scenario_1<-scenari[1];scenario_1
genereting_fuction<-function(n){
sample<-list()
for (i in 1:nrow(scenario_1)){
X1=rnorm(n)+scenari[i,1]
X4=rnorm(n)+scenari[i,4]
X2=X1*p12+std_e2*rnorm(n)+scenari[i,2]
X3=X1*p13+X4*p43+std_e3*rnorm(n)+scenari[i,3]
X5=X2*p25+X3*p35+std_e5*rnorm(n)+scenari[i,5]
sample[[i]]=cbind(X1,X2,X3,X4,X5)
colnames(sample[[i]])<-c("X1","X2","X3","X4","X5")
}
sample
}
set.seed(123)
dati_fault<- lapply(rep(10, 100), genereting_fuction)
dati_fault[[1]]
[[1]]
X1 X2 X3 X4 X5
[1,] 2.505826 1.736593 1.0274581 -0.6038358 1.9967656
[2,] 4.127593 3.294344 2.8777777 1.2386725 3.0207723
[3,] 1.853050 1.312617 1.1875699 0.5994921 1.0471564
[4,] 4.481019 3.330629 2.1880050 -0.1087338 2.7331061
[5,] 3.916191 3.306036 0.7258404 -1.1388570 1.0293168
[6,] 3.335131 2.379439 1.2407679 0.3198553 1.6755424
[7,] 3.574675 3.769436 1.1084120 -1.0065481 2.0034434
[8,] 3.203620 2.842074 0.6550587 -0.8516120 -0.1433508
[9,] 2.552959 2.642094 2.5376430 2.0387860 3.5318055
[10,] 2.656474 1.607934 2.2760391 -1.3959822 1.0095796
I want to save only the elements of X1 in one object, and likewise for each of the other variables.
The code below gives you a list of matrices, each with one row per scenario and n columns.
genereting_fuction <- function(n, scenario, scenari){
# added argument because you assume global variable use
nr <- nrow(scenario)
sample <- vector("list", length = nr) # sample<-list()
# creating a list is better than expanding it each iteration
for (i in 1:nr){
X1=rnorm(n)+scenari[i,1]
X4=rnorm(n)+scenari[i,4]
X2=X1*p12+std_e2*rnorm(n)+scenari[i,2]
X3=X1*p13+X4*p43+std_e3*rnorm(n)+scenari[i,3]
X5=X2*p25+X3*p35+std_e5*rnorm(n)+scenari[i,5]
sample[[i]]=cbind(X1,X2,X3,X4,X5)
colnames(sample[[i]])<-c("X1","X2","X3","X4","X5")
}
sample
}
set.seed(123)
dati_fault<- lapply(rep(3, 2), function(x) genereting_fuction(x, scenario_1, scenari))
dati_fault
lapply(dati_fault, function(x) {
tmp <- lapply(x, function(y) y[,"X1"])
tmp <- do.call(rbind, tmp)
})
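To repeat this for every variable rather than X1 alone, one option is to loop over the column names. The following is a minimal sketch on small made-up data, not the simulation above; the names make_matrix, nested, extract_var and by_var are hypothetical and introduced only for illustration.

```r
set.seed(1)
vars <- c("X1", "X2", "X3")

# Hypothetical helper: an n x length(vars) matrix of random values
make_matrix <- function(n) {
  matrix(rnorm(n * length(vars)), nrow = n, dimnames = list(NULL, vars))
}

# Same shape as dati_fault: a list of runs, each run a list of matrices
nested <- replicate(2, replicate(3, make_matrix(4), simplify = FALSE),
                    simplify = FALSE)

# For one variable: one matrix per run, scenarios in rows, observations in columns
extract_var <- function(nested, v) {
  lapply(nested, function(run) {
    do.call(rbind, lapply(run, function(m) m[, v]))
  })
}

# One list entry per variable, named after the variable
by_var <- setNames(lapply(vars, function(v) extract_var(nested, v)), vars)
dim(by_var$X1[[1]])
```

Here by_var$X2[[2]] holds X2 for the second run, with one row per scenario.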
If you want to combine this list of matrices into one, for example with cbind, I suggest you simply use a single large value of n instead of the lapply with rep inside it.
I also suspect there is an easier way to simulate this number of scenarios, but it is hard to say without knowing the context of your code.
Finally, try to work through your issue with a minimal example: a list of 100 lists of 31 matrices of size 10 x 5 is a bit messy!
Good luck!
I have several columns of data in a matrix, and I want to use a for loop (specifically a for loop; please, no answers that don't involve one) to run a function on each column.
x <- runif(10,0,10)
y <- runif(10,10,20)
z <- runif(10,20,30)
tab <- cbind(x,y,z)
x y z
[1,] 9.5262742 16.22999 21.93228
[2,] 5.8183264 14.53771 21.81774
[3,] 3.9509342 17.36694 22.46594
[4,] 3.0245614 19.46411 25.80411
[5,] 5.0284351 13.89636 21.61767
[6,] 3.0291715 17.50267 26.28110
[7,] 8.4727471 16.77365 27.60535
[8,] 3.3816903 15.23395 22.01265
[9,] 0.3182083 13.97575 29.25909
[10,] 2.6499290 16.71129 27.05160
for (i in 1:ncol(tab)){
print(mean(i))
}
I have almost no familiarity with R and have had trouble finding a solution that specifically uses a for loop to run a function and output a result per column.
Well, strictly using a for loop, I think this does what you want:
x <- runif(10,0,10)
y <- runif(10,10,20)
z <- runif(10,20,30)
tab <- cbind(x,y,z)
for (i in 1:ncol(tab)){
print(mean(tab[, i]))
}
You need to index the matrix by using [row, column]. When you want to select all rows for a specific column (which is your case), just leave the row field empty. So that's why you have to use [, i], where i is your index.
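If the per-column results are needed later rather than just printed, they can be stored as the loop runs. This is a small sketch that still uses only a for loop; the name col_means and the seed are assumptions made for illustration.

```r
set.seed(42)
tab <- cbind(x = runif(10, 0, 10),
             y = runif(10, 10, 20),
             z = runif(10, 20, 30))

# Preallocate one slot per column, labelled with the column names
col_means <- numeric(ncol(tab))
names(col_means) <- colnames(tab)

for (i in seq_len(ncol(tab))) {
  # tab[, i] selects all rows of column i
  col_means[i] <- mean(tab[, i])
}
col_means
```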
I would like to ask for your help with the following issue.
In a table-like object where each row corresponds to an observation in time, I would like to obtain the value from the previous row for one particular variable (:= p0), multiply it with an element of another column (:= returnfactor) and write the result to the current row as an element of another column (:= p1).
The original post illustrated the starting table and the desired result with two pictures, which are not reproduced here.
I have written
matrix <- cbind (
1:10,
1+rnorm(10, 0, 0.05),
NA,
NA
)
colnames(matrix) <- c("timeid", "returnfactor", "p0", "p1")
matrix[1, "p0"] <- 100
for (i in 1:10)
{
if (i==1)
{
matrix[i, "p1"] <- matrix[1, "p0"] * matrix[i, "returnfactor"]
}
else
{
matrix[i, "p0"] <- matrix[i-1, "p1"]
matrix[i, "p1"] <- matrix[i, "p0"] * matrix[i, "returnfactor"]
}
}
That is, I implemented what I would like to reach using a loop. However, this loop is too slow. Obviously, I am new to R.
Could you please give me a hint on how to improve the speed using the capabilities R has to offer? I assume there is no need for a loop here, but I do not know how else to approach it. In SAS, I relied on the data step reading the data row by row together with the retain statement.
Yours sincerely,
Sinistrum
We can indeed improve this. The key thing to notice is that the values of both p0 and p1 are essentially cumulative products. In particular, writing mat for your matrix, we have
mat[, "p1"] <- mat[1, "p0"] * cumprod(mat[, "returnfactor"])
mat[-1, "p0"] <- head(mat[, "p1"], -1)
where head(mat[, "p1"], -1) just takes all the mat[, "p1"] except for its last element. This gives
# timeid returnfactor p0 p1
# [1,] 1 0.9903601 100.00000 99.03601
# [2,] 2 1.0788946 99.03601 106.84941
# [3,] 3 1.0298117 106.84941 110.03478
# [4,] 4 0.9413212 110.03478 103.57806
# [5,] 5 0.9922179 103.57806 102.77200
# [6,] 6 0.9040545 102.77200 92.91149
# [7,] 7 0.9902371 92.91149 92.00440
# [8,] 8 0.8703836 92.00440 80.07913
# [9,] 9 1.0657001 80.07913 85.34033
# [10,] 10 0.9682228 85.34033 82.62846
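As a sanity check, the loop from the question and the cumprod version can be run on the same input and compared. This is a sketch that assumes the matrix is set up as in the question; the object names loop_res and vec_res and the seed are introduced here for illustration.

```r
set.seed(1)
mat <- cbind(timeid = 1:10,
             returnfactor = 1 + rnorm(10, 0, 0.05),
             p0 = NA,
             p1 = NA)
mat[1, "p0"] <- 100

# Row-by-row version, as in the question
loop_res <- mat
for (i in seq_len(nrow(loop_res))) {
  if (i > 1) loop_res[i, "p0"] <- loop_res[i - 1, "p1"]
  loop_res[i, "p1"] <- loop_res[i, "p0"] * loop_res[i, "returnfactor"]
}

# Vectorised version: p1 is the start value times the running product
vec_res <- mat
vec_res[, "p1"] <- vec_res[1, "p0"] * cumprod(vec_res[, "returnfactor"])
vec_res[-1, "p0"] <- head(vec_res[, "p1"], -1)

# Compare up to floating-point tolerance
all.equal(loop_res, vec_res)
```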
I am using R to apply a self-written function, which takes two numeric vectors plus a numeric parameter as input, to the columns of a data frame. Each column of the data frame is a numeric vector, and I want to perform pairwise computations and create a matrix that holds the result of the computation for every possible combination of columns. So essentially I want behaviour similar to that of the cor() function.
# Data
> head(d)
1 2 3 4
1 -1.01035342 1.2490665 0.7202516 0.101467379
2 -0.50700743 1.4356733 0.9032172 -0.001583743
3 -0.09055243 0.4695046 2.4487632 -1.082570048
4 1.11230416 0.2885735 0.3534247 -0.728574628
5 -1.96115691 0.4831158 1.5650052 0.648675605
6 1.20434218 1.7668086 0.2170858 -0.161570792
> cor(d)
1 2 3 4
1 1.00000000 0.08320968 -0.06432155 0.04909430
2 0.08320968 1.00000000 -0.04557743 -0.01092765
3 -0.06432155 -0.04557743 1.00000000 -0.01654762
4 0.04909430 -0.01092765 -0.01654762 1.00000000
I found this useful answer: Perform pairwise comparison of matrix
Based on this I wrote this function which makes use of another self-written function compareFunctions()
createProbOfNonEqMatrix <- function(df,threshold){
combinations <- combn(ncol(df),2)
predDF <- matrix(nrow = length(density(df[,1])$y)) # df creation for predicted values from density function
for(i in 1:ncol(df)){
predCol <- density(df[,i])$y # convert df of original values to df of predicted values from density function
predDF <- cbind(predDF,predCol)
}
predDF <- predDF[,2:ncol(predDF)]
colnames(predDF) <- colnames(df) # give the predicted values column names as in the original df
predDF <- as.matrix(predDF)
out.mx <- apply( X=combinations,MARGIN = 2,FUN = "compareFunctions",
predicted_by_first = predDF[,combinations[1]],
predicted_by_second = predDF[,combinations[2]],
threshold = threshold)
return(out.mx)
}
The predicted_by_first, predicted_by_second and threshold are inputs for compareFunctions. However I get the following error:
Error in FUN(newX[, i], ...) : unused argument (newX[, i])
In desperation I tried this:
createProbOfNonEqMatrix <- function(df,threshold){
combinations <- combn(ncol(df),2)
predDF <- matrix(nrow = length(density(df[,1])$y))
for(i in 1:ncol(df)){
predCol <- density(df[,i])$y
predDF <- cbind(predDF,predCol)
}
predDF <- predDF[,2:ncol(predDF)]
colnames(predDF) <- colnames(df)
predDF <- as.matrix(predDF)
out.mx <- apply(
X=combinations,MARGIN = 2,FUN = function(x) {
diff <- abs(predDF[,x[1]]-predDF[,x[2]])
boolean <- diff<threshold
acceptCount <- length(boolean[boolean==TRUE])
probability <- acceptCount/length(diff)
return(probability)
}
)
return(out.mx)
}
It does seem to be working but instead of returning the pairwise matrix it gives me a vector:
> createProbOfNonEqMatrix(d,0.001)
[1] 0.10351562 0.08203125 0.13476562 0.13085938 0.14843750 0.10937500
Will you be able to guide me on how to make the desired pairwise matrix even if it implies writing the function code again within apply()? Also, if you could give me an idea on how to keep track of what pairwise comparisons are performed it will be greatly appreciated.
Thank you,
Alex
Your output gives you the result of the calculation in the order of the pairs in combinations: (1,2), (1,3), (1,4), (2,3), (2,4), (3,4). If you want to organise this into a symmetric square matrix you can do a basic manipulation on the result, e.g. as follows:
combinations<-combn(ncol(d),2) # same column pairs as inside your function
out.mx<-c(0.10351562, 0.08203125, 0.13476562, 0.13085938, 0.14843750, 0.10937500)
out.mtx<-matrix(nrow=ncol(d),ncol=ncol(d))
out.mtx[,]<-1
for (i in 1:length(combinations[1,])){
a<-combinations[1,i]
b<-combinations[2,i]
out.mtx[a,b]<-out.mtx[b,a]<-out.mx[i]
}
out.mtx
which gives you
[,1] [,2] [,3] [,4]
[1,] 1.00000000 0.1035156 0.08203125 0.1347656
[2,] 0.10351562 1.0000000 0.13085938 0.1484375
[3,] 0.08203125 0.1308594 1.00000000 0.1093750
[4,] 0.13476562 0.1484375 0.10937500 1.0000000
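An alternative that also keeps track of which pair each cell belongs to is to fill a matrix labelled with the column names directly. The sketch below is only an illustration: pairwise_matrix and share_within are hypothetical names, and share_within is a simplified stand-in for compareFunctions that compares the raw values rather than density estimates.

```r
# Hypothetical helper: apply fun to every pair of columns of df
pairwise_matrix <- function(df, fun, ...) {
  p <- ncol(df)
  out <- matrix(NA_real_, p, p,
                dimnames = list(colnames(df), colnames(df)))
  for (a in seq_len(p)) {
    for (b in seq_len(p)) {
      out[a, b] <- fun(df[[a]], df[[b]], ...)
    }
  }
  out
}

# Stand-in comparison: share of positions where the two vectors
# differ by less than the threshold
share_within <- function(u, v, threshold) mean(abs(u - v) < threshold)

set.seed(1)
d <- as.data.frame(matrix(rnorm(40), ncol = 4))
res <- pairwise_matrix(d, share_within, threshold = 0.5)
res
```

The row and column names of res identify the two columns compared in each cell, in the same way as the output of cor().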
I wish to combine equivalent, deeply-nested columns from all elements of a reasonably long list. What I would like to do, though it's not possible in R, is this:
combined.columns <- my.list[[1:length(my.list)]]$my.matrix[,"my.column"]
The only thing I can think of is to manually type out all the elements in cbind() like this:
combined.columns <- cbind(my.list[[1]]$my.matrix[,"my.column"], my.list[[2]]$my.matrix[,"my.column"], . . . )
This answer is pretty close to what I need, but I can't figure out how to make it work for the extra level of nesting.
There must be a more elegant way of doing this, though. Any ideas?
Assuming all your matrices have the same name for the column you wish to extract, you could use sapply:
set.seed(123)
my.list <- vector("list")
my.list[[1]] <- list(my.matrix = data.frame(A=rnorm(10,sd=0.3), B=rnorm(10,sd=0.3)))
my.list[[2]] <- list(my.matrix = data.frame(C=rnorm(10,sd=0.3), B=rnorm(10,sd=0.3)))
my.list[[3]] <- list(my.matrix = data.frame(D=rnorm(10,sd=0.3), B=rnorm(10,sd=0.3)))
sapply(my.list, FUN = function(x) x$my.matrix[,"B"])
Some sample data:
myList <- list(list(myMat = matrix(1:10, 2, dimnames=list(NULL, letters[1:5])),
myVec = 1:10),
list(myMat = matrix(10:1, 2, dimnames=list(NULL, letters[1:5])),
myVec = 10:1))
We can get column a of myMat a few different ways. Here's one that uses with.
sapply(myList, with, myMat[,"a"])
# [,1] [,2]
# [1,] 1 10
# [2,] 2 9
This mapply version might be better suited to a more deeply nested problem. It works too and might be faster than sapply.
mapply(function(x, y, z) x[[y]][,z] , myList, "myMat", "a")
# [,1] [,2]
# [1,] 1 10
# [2,] 2 9
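A further option, shown here only as a sketch, is vapply, which behaves like sapply but requires the expected type and length of each result to be stated up front, so a missing or malformed column raises an error instead of silently changing the shape of the output. The name col_a is introduced for illustration.

```r
myList <- list(list(myMat = matrix(1:10, 2, dimnames = list(NULL, letters[1:5])),
                    myVec = 1:10),
               list(myMat = matrix(10:1, 2, dimnames = list(NULL, letters[1:5])),
                    myVec = 10:1))

# Each extracted column must be an integer vector of length 2
col_a <- vapply(myList, function(x) x$myMat[, "a"], FUN.VALUE = integer(2))
col_a
```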
Consider a minimum working example (for, e.g. a binomial model):
test.a.tset <- rnorm(10)
test.b.tset <- rnorm(10)
c <- runif(10)
c[c < 0.5] <- 0
c[c >= 0.5] <- 1
df <- data.frame(test.a.tset,test.b.tset,c)
Using a regex, I want to regress c on all variables with the structure test."anything".tset:
summary(glm(paste("c ~ ",paste(colnames((df[, grep("test\\.\\w+\\.tset", colnames(df))])),
collapse = "+"), sep = ""), data = df, family=binomial))
So far, no problems. Now we get to the part where cbind comes into play. Suppose I want to use a different statistical model (e.g. rbprobitGibbs from the bayesm package), which requires a design matrix as input.
Thus, I need to transform the data frame into the appropriate format.
X <- cbind(df$test.a.tset,df$test.b.tset)
Or, alternatively, if I want to use regex again (where I even add a second grep to ensure that only the part inside the quotation marks is selected):
X2 <- cbind(grep("[^\"]+",paste(paste("df$", colnames((df[, grep("test\\.\\w+\\.tset", colnames(df))])),
sep = ""), collapse = ","), value = TRUE))
But there is a difference:
> X
[,1] [,2]
[1,] -0.4525601 -1.240484170
[2,] 0.3135625 1.240519383
[3,] -0.2883953 -0.554670224
[4,] -1.3696994 -1.373690426
[5,] 0.8514529 -0.063945537
[6,] -1.1804205 -0.314132743
[7,] -1.0161170 -0.001605679
[8,] 1.0072168 0.938921869
[9,] -0.8797069 -1.158626865
[10,] -0.9113297 1.641201924
> X2
[,1]
[1,] "df$test.a.tset,df$test.b.tset"
From my point of view, the problem seems to be that grep returns the selected value as a character string and that, while glm in effect ignores the quotation marks in "df$test.a.tset,df$test.b.tset", cbind does not.
I.e. the call for X2 after the paste is actually read as:
X2 <- cbind("df$test.a.tset,df$test.b.tset")
Question: Is there a way to get the same result for X2 as for X using a regex?
The code grep("test\\.\\w+\\.tset", colnames(df)) returns the indices of the columns that match your pattern. If you want to build a matrix from just those columns, you can use:
X3 <- as.matrix(df[,grep("test\\.\\w+\\.tset", colnames(df))])
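To confirm that this matches the hand-built X, the two can be compared on a small example. This is a sketch with a few assumptions of its own: the seed, the use of rbinom for the response, the ^ and $ anchors added to the pattern, and value = TRUE so that grep returns the column names instead of their positions. drop = FALSE keeps the selection a data frame even if only one column matches.

```r
set.seed(1)
df <- data.frame(test.a.tset = rnorm(10),
                 test.b.tset = rnorm(10),
                 c = rbinom(10, 1, 0.5))

# Names of the columns matching the pattern
cols <- grep("^test\\.\\w+\\.tset$", colnames(df), value = TRUE)

# Design matrix built from the matched columns
X3 <- as.matrix(df[, cols, drop = FALSE])

# Hand-built matrix from the question, for comparison
X <- cbind(df$test.a.tset, df$test.b.tset)

# Same values; X3 additionally carries the column names
all.equal(unname(X3), X)
```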