I'm trying to get a matrix of values using replicate(), but when the if() statement returns a NULL, that creates a list instead of a matrix. I've spent time reading web pages and Questions here but just can't seem to get something that works. I've tried variations on invisible and sink, but still haven't been able to get an output that doesn't return a NULL value.
Here is an example that gives a NULL value for the 5th entry.
How do I get the if() statement to not return anything, including NULL?
set.seed(10)
reps <- 10
f_myfun2 <- function(i){
x1 <- rep(1:4, each=5)
x2 <- rep(1:5, times=4)
n <- length(x1)
y <- 0.20 + 0.30*x1 + 0.7*x2 + 0.50*rnorm(n)
cis <- confint(lm( y ~ x1*x2 ))
int_lower <- cis[4,1]
int_upper <- cis[4,2]
if(int_lower > 0 | int_upper < 0){
# I don't want to return anything, including a NULL value
# tried various things including invisible, sink, etc
}
else{
cis2 <- confint(lm( y ~ x1 + x2))
c(cis2[2,], cis2[3,])
}
}
sims <- replicate(reps, f_myfun2(1))
sims # [[5]] is NULL rather than just missing
str(sims) # now it's a list rather than a matrix without the NULL
is(sims)
If your final goal is to combine all of the results into one data frame or matrix, you don't have to worry about the NULL values: rbind() drops them automatically when you combine.
set.seed(10)
sims <- replicate(reps, f_myfun2(1))
result <- do.call(rbind, sims)
result
# 2.5 % 97.5 % 2.5 % 97.5 %
# [1,] 0.26428935 0.5953288 0.5766971 0.8384067
# [2,] 0.20815417 0.5763875 0.5750148 0.8661288
# [3,] 0.16616864 0.5437533 0.5694641 0.8679710
# [4,] 0.05366132 0.5055326 0.5076891 0.8649247
# [5,] 0.26292580 0.6246576 0.6441824 0.9301565
# [6,] 0.21173249 0.5972766 0.5297051 0.8345045
# [7,] -0.01499442 0.5077043 0.5399975 0.9532272
# [8,] 0.23639931 0.5871463 0.5270991 0.8043890
# [9,] -0.09686737 0.2529322 0.4378804 0.7144212
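The do.call(rbind, sims) call above only works when sims is a list, which here happens only because at least one replicate returned NULL. A minimal sketch of a more defensive variant, assuming a hypothetical stand-in function f_toy (not the original f_myfun2) that sometimes returns NULL: ask replicate() for a list every time with simplify = FALSE, drop the NULL entries explicitly, then bind.

```r
set.seed(1)

# Hypothetical stand-in for f_myfun2: returns NULL for some draws,
# and a named length-2 vector otherwise.
f_toy <- function() {
  x <- rnorm(1)
  if (x > 0) NULL else c(lower = x - 1, upper = x + 1)
}

# simplify = FALSE guarantees a list, whether or not any NULL occurs
sims <- replicate(10, f_toy(), simplify = FALSE)

# Drop the NULL entries explicitly, then stack the rest row by row
kept <- Filter(Negate(is.null), sims)
result <- do.call(rbind, kept)
result
```

The explicit Filter() step is not strictly needed, since rbind() ignores NULL arguments, but it makes the intent visible and gives you the number of discarded replicates via length(sims) - length(kept).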
I have created a list whose elements are themselves lists of matrices. I want to be able to extract the vectors of observations for each variable.
p13 = 0.493;p43 = 0.325;p25 = 0.335;p35 = 0.574;p12 = 0.868
std_e2 = sqrt(1-p12^2)
std_e3 = sqrt(1-(p13^2+p43^2))
std_e5 = sqrt(1-(p25^2+p35^2+2*p25*p35*(p13*p12)))
set.seed(1234)
z1<-c(0,1)
z2<-c(0,1)
z3<-c(0,1)
z4<-c(0,1)
z5<-c(0,1)
s<-expand.grid(z1,z2,z3,z4,z5); s
s<-s[-1,];s
shift<-3
scenari<-s*shift;scenari
scenario_1<-scenari[1];scenario_1
genereting_fuction<-function(n){
sample<-list()
for (i in 1:nrow(scenario_1)){
X1=rnorm(n)+scenari[i,1]
X4=rnorm(n)+scenari[i,4]
X2=X1*p12+std_e2*rnorm(n)+scenari[i,2]
X3=X1*p13+X4*p43+std_e3*rnorm(n)+scenari[i,3]
X5=X2*p25+X3*p35+std_e5*rnorm(n)+scenari[i,5]
sample[[i]]=cbind(X1,X2,X3,X4,X5)
colnames(sample[[i]])<-c("X1","X2","X3","X4","X5")
}
sample
}
set.seed(123)
dati_fault<- lapply(rep(10, 100), genereting_fuction)
dati_fault[[1]]
[[1]]
X1 X2 X3 X4 X5
[1,] 2.505826 1.736593 1.0274581 -0.6038358 1.9967656
[2,] 4.127593 3.294344 2.8777777 1.2386725 3.0207723
[3,] 1.853050 1.312617 1.1875699 0.5994921 1.0471564
[4,] 4.481019 3.330629 2.1880050 -0.1087338 2.7331061
[5,] 3.916191 3.306036 0.7258404 -1.1388570 1.0293168
[6,] 3.335131 2.379439 1.2407679 0.3198553 1.6755424
[7,] 3.574675 3.769436 1.1084120 -1.0065481 2.0034434
[8,] 3.203620 2.842074 0.6550587 -0.8516120 -0.1433508
[9,] 2.552959 2.642094 2.5376430 2.0387860 3.5318055
[10,] 2.656474 1.607934 2.2760391 -1.3959822 1.0095796
I want to save only the elements of X1 in one object, and likewise for each of the other variables.
The code below gives you a list of matrices, with one scenario per row and n columns.
genereting_fuction <- function(n, scenario, scenari){
# added arguments because the original relied on global variables
nr <- nrow(scenario)
sample <- vector("list", length = nr) # sample<-list()
# pre-allocating the list is better than growing it on each iteration
for (i in 1:nr){
X1=rnorm(n)+scenari[i,1]
X4=rnorm(n)+scenari[i,4]
X2=X1*p12+std_e2*rnorm(n)+scenari[i,2]
X3=X1*p13+X4*p43+std_e3*rnorm(n)+scenari[i,3]
X5=X2*p25+X3*p35+std_e5*rnorm(n)+scenari[i,5]
sample[[i]]=cbind(X1,X2,X3,X4,X5)
colnames(sample[[i]])<-c("X1","X2","X3","X4","X5")
}
sample
}
set.seed(123)
dati_fault<- lapply(rep(3, 2), function(x) genereting_fuction(x, scenario_1, scenari))
dati_fault
lapply(dati_fault, function(x) {
tmp <- lapply(x, function(y) y[,"X1"])
tmp <- do.call(rbind, tmp)
})
If you want to assemble this list of matrices (for example with cbind), I suggest simply using a single large n value rather than lapply with rep inside it.
There is probably also an easier way to simulate this number of scenarios, but it's hard to say without knowing the context of your code.
Also, try to reproduce your issue with a minimal example; working with a list of 100 lists of 31 matrices of size 10 x 5 is a bit messy!
Good luck!
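To make the extraction step easier to follow, here is a minimal sketch on a much smaller nested list. The names make_mat, nested and x1_by_rep are hypothetical and only for illustration. Unlike the rbind version above, sapply() puts one scenario per column.

```r
set.seed(42)

# Hypothetical helper: a 4 x 2 matrix with named columns
make_mat <- function() {
  m <- matrix(rnorm(8), ncol = 2)
  colnames(m) <- c("X1", "X2")
  m
}

# Toy structure: 2 replications, each a list of 3 scenario matrices
nested <- replicate(2,
                    replicate(3, make_mat(), simplify = FALSE),
                    simplify = FALSE)

# For each replication, pull column "X1" out of every scenario matrix;
# sapply() binds the resulting vectors as columns (rows = observations)
x1_by_rep <- lapply(nested, function(rep_i) {
  sapply(rep_i, function(m) m[, "X1"])
})

dim(x1_by_rep[[1]])
```

Swapping "X1" for another column name gives the same structure for the other variables.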
I have created a function to order a vector of length 2, using the following code
x = (c(6,2))
orders = function(x){
for(i in 1:(length(x)-1)){
if(x[i+1] < x[i]){
return(c(x[i+1], x[i]))} else{
(return(x))
}}}
orders(x)
I have been asked to use this function to process a dataset with 2 columns as follows. Iterate over the rows of the
data set, and if the element in the 2nd column of row i is less than the element in the first
column of row i, switch the order of the two entries in the row by making a suitable call to
the function you just wrote.
I've tried using the following code
set.seed(1128719)
data=matrix(rnorm(20),byrow=T,ncol=2)
df = for (i in 1:2) {
for(j in 1:10){
data = orders(c(x[i], x[j]))
return(data)
}
}
The output is null. I'm not quite sure where I'm going wrong.
Any suggestions?
I modified your code a bit but tried to keep the 'style' the same.
There is no need for a loop inside orders: for a vector of length 2,
1:(length(x)-1) always evaluates to 1:1,
so i only ever takes the value 1.
orders = function(x){
# Since the function only works on vectors of length 2,
# it's good practice to raise an error right at the start
#
if (length(x) != 2) {
stop("x must be a vector of length 2")
}
if (x[2] < x[1]) {
return(c(x[2], x[1]))
} else {
return(x)
}
}
orders(c(6, 2))
set.seed(1128719)
data <- matrix(rnorm(20),byrow=T,ncol=2)
A for loop itself can't usefully be assigned to a variable (it returns NULL),
but we can use the loop to modify the matrix 'data'
in place:
for (row in 1:nrow(data)) {
data[row, ] <- orders(data[row,])
}
data
Edit:
This is the input:
[,1] [,2]
[1,] -0.04142965 0.2377140
[2,] -0.76237866 -0.8004284
[3,] 0.18700893 -0.6800310
[4,] 0.76499646 0.4430643
[5,] 0.09193440 -0.2592316
[6,] 1.17478053 -0.4044760
[7,] -1.62262500 0.1652850
[8,] -1.54848857 0.7475451
[9,] -0.05907252 -0.8324074
[10,] -1.11064318 -0.1148806
This is the output I get:
[,1] [,2]
[1,] -0.04142965 0.23771403
[2,] -0.80042842 -0.76237866
[3,] -0.68003104 0.18700893
[4,] 0.44306433 0.76499646
[5,] -0.25923164 0.09193440
[6,] -0.40447603 1.17478053
[7,] -1.62262500 0.16528496
[8,] -1.54848857 0.74754509
[9,] -0.83240742 -0.05907252
[10,] -1.11064318 -0.11488062
Here are two ways of ordering the rows of the two-column matrix.
This is the test matrix posted in the question.
set.seed(1128719)
data <- matrix(rnorm(20), byrow = TRUE, ncol = 2)
1. With a function orders.
The function expects as input a 2 element vector. If they are out of order, return the vector with its elements reversed, else return the vector as is.
orders <- function(x){
stopifnot(length(x) == 2)
if(x[2] < x[1]){
x[2:1]
}else{
x
}
}
Test the function.
x <- c(6,2)
orders(x)
#[1] 2 6
Now with the matrix data.
df1 <- t(apply(data, 1, orders))
2. Vectorized code.
Creates a logical index with TRUE whenever the elements are out of order and reverse only those elements.
df2 <- data
inx <- data[,2] < data[,1]
df2[inx, ] <- data[inx, 2:1]
The results are the same.
identical(df1, df2)
#[1] TRUE
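A third option, not part of the answer above, is to take the element-wise minimum and maximum of the two columns with pmin() and pmax(). This is only a sketch under the assumption that the matrix has exactly two columns; df3 is a name introduced here for illustration.

```r
set.seed(1128719)
data <- matrix(rnorm(20), byrow = TRUE, ncol = 2)

# Smaller value of each row goes to column 1, larger value to column 2
df3 <- cbind(pmin(data[, 1], data[, 2]),
             pmax(data[, 1], data[, 2]))
df3
```

Note that this does not call orders at all, so it does not satisfy the exercise as stated; it is just a compact way to check the result.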
I would like to please ask for your help concerning the following issue.
In a table-like object where each row corresponds to an observation in time, I would like to obtain the value from the previous row for one particular variable (:= p0), multiply it with an element of another column (:= returnfactor) and write the result to the current row as an element of another column (:= p1).
Illustrated via two pictures (images not reproduced here), I want to go from the initial table to the filled-in table.
I have written
matrix <- cbind (
1:10,
1+rnorm(10, 0, 0.05),
NA,
NA
)
colnames(matrix) <- c("timeid", "returnfactor", "p0", "p1")
matrix[1, "p0"] <- 100
for (i in 1:10)
{
if (i==1)
{
matrix[i, "p1"] <- matrix[1, "p0"] * matrix[i, "returnfactor"]
}
else
{
matrix[i, "p0"] <- matrix[i-1, "p1"]
matrix[i, "p1"] <- matrix[i, "p0"] * matrix[i, "returnfactor"]
}
}
That is, I implemented what I would like to reach using a loop. However, this loop is too slow. Obviously, I am new to R.
Could you please give me a hint on how to improve the speed using the capabilities R has to offer? I assume there is no need for a loop here, though I don't know how else to approach it. In SAS, I used its row-by-row reading of data sets and the retain statement in a data step.
Yours sincerely,
Sinistrum
We can indeed improve this. The key thing to notice is that values of both p0 and p1 involve mostly cumulative products. In particular, we have
mat[, "p1"] <- mat[1, "p0"] * cumprod(mat[, "returnfactor"])
mat[-1, "p0"] <- head(mat[, "p1"], -1)
where head(mat[, "p1"], -1) just takes all the mat[, "p1"] except for its last element. This gives
# timeid returnfactor p0 p1
# [1,] 1 0.9903601 100.00000 99.03601
# [2,] 2 1.0788946 99.03601 106.84941
# [3,] 3 1.0298117 106.84941 110.03478
# [4,] 4 0.9413212 110.03478 103.57806
# [5,] 5 0.9922179 103.57806 102.77200
# [6,] 6 0.9040545 102.77200 92.91149
# [7,] 7 0.9902371 92.91149 92.00440
# [8,] 8 0.8703836 92.00440 80.07913
# [9,] 9 1.0657001 80.07913 85.34033
# [10,] 10 0.9682228 85.34033 82.62846
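As a sanity check, the loop from the question and the cumprod() version can be run side by side on the same input. This is a minimal sketch; the names loop, vec and same are introduced here for illustration, and the seed is arbitrary.

```r
set.seed(1)
n <- 10
mat <- cbind(timeid = 1:n,
             returnfactor = 1 + rnorm(n, 0, 0.05),
             p0 = NA,
             p1 = NA)
mat[1, "p0"] <- 100

# Loop version, following the logic of the question
loop <- mat
for (i in seq_len(n)) {
  if (i > 1) loop[i, "p0"] <- loop[i - 1, "p1"]
  loop[i, "p1"] <- loop[i, "p0"] * loop[i, "returnfactor"]
}

# Vectorized version, following the answer
vec <- mat
vec[, "p1"] <- vec[1, "p0"] * cumprod(vec[, "returnfactor"])
vec[-1, "p0"] <- head(vec[, "p1"], -1)

# Compare with a numerical tolerance rather than exact equality
same <- isTRUE(all.equal(loop, vec))
same
```

all.equal() is used instead of identical() because the two versions multiply in a different order and may differ in the last few bits.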
I have a database with columns theme (value 0 or 1), level (value 1 to 9) and startTime (double value). For every level, I want to perform a t-test on the startTime values. Here is my code:
database <- read.csv("database.csv")
themeData <- database[database$theme == 1, ]
noThemeData <- database[database$theme == 0, ]
for (i in 1:9) {
x <- themeData[themeData$level == i, ]
y <- noThemeData[noThemeData$level == i, ]
t.test(x$startTime,y$startTime,
alternative = "less")
}
Unfortunately, no t-tests are being executed. In the end, x and y simply get the value for i=9. What am I doing wrong?
Your code is doing busy work: it performs the t.test calculations, but since a for loop discards the value of each expression in its body, the results are never stored or printed. You need to store them in a vector or list (pre-allocated is always better), like so:
res <- replicate(9, NULL)
for (i in 1:9) {
x <- themeData[themeData$level == i, ]
y <- noThemeData[noThemeData$level == i, ]
res[[i]] <- t.test(x$startTime,y$startTime,
alternative = "less")
}
res[[2]]
This can be "good enough" in that it saves all the test result objects in a list for later processing. A slightly better method is to use one of the *apply functions. The first two that come to mind as directly applicable here, lapply and sapply(..., simplify=FALSE), differ only in minor ways, so you can choose either. The example below uses the built-in mtcars data in place of the question's data:
res <- lapply(c(4, 6, 8), function(thiscyl) {
am0 <- subset(mtcars, am == 0 & cyl == thiscyl)
am1 <- subset(mtcars, am == 1 & cyl == thiscyl)
t.test(am0$mpg, am1$mpg)
})
This is especially beneficial if (unlike here) the tests take a long time: you perform the tests once and preserve the result objects, so you can do lots of things with the results without having to rerun the tests. For instance, if you wanted just the p-values:
sapply(res, `[`, "p.value")
# $p.value
# [1] 0.01801712
# $p.value
# [1] 0.187123
# $p.value
# [1] 0.7038727
or more tersely:
sapply(res, `[[`, "p.value")
# [1] 0.01801712 0.18712303 0.70387268
Another example, the confidence intervals, in a matrix:
t(sapply(res, `[[`, "conf.int"))
# [,1] [,2]
# [1,] -9.232108 -1.117892
# [2,] -3.916068 1.032735
# [3,] -2.339549 1.639549
You can always look at a single model with, say, res[[2]], but if you need to see all of them you can use just res and see the whole gamut.
res[[2]]
# Welch Two Sample t-test
# data: am0$mpg and am1$mpg
# t = -1.5606, df = 4.4055, p-value = 0.1871
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
# -3.916068 1.032735
# sample estimates:
# mean of x mean of y
# 19.12500 20.56667
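Applied to the layout described in the question, the same idea might look like the sketch below. The data frame is simulated (database.csv is not available), so the column values are assumptions; only the column names theme, level and startTime come from the question.

```r
set.seed(1)

# Simulated stand-in for database.csv: 10 rows per theme/level combination
database <- data.frame(
  theme     = rep(0:1, each = 90),
  level     = rep(rep(1:9, each = 10), times = 2),
  startTime = runif(180, min = 1, max = 10)
)

# One data frame per level, then one one-sided t-test per level
by_level <- split(database, database$level)
res <- lapply(by_level, function(d) {
  t.test(d$startTime[d$theme == 1],
         d$startTime[d$theme == 0],
         alternative = "less")
})

# Named vector of p-values, one per level
p_values <- sapply(res, `[[`, "p.value")
p_values
```

Because split() names the list elements by level, res[["3"]] retrieves the test for level 3 directly.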
Consider a minimum working example (for, e.g. a binomial model):
test.a.tset <- rnorm(10)
test.b.tset <- rnorm(10)
c <- runif(10)
c[c < 0.5] <- 0
c[c >= 0.5] <- 1
df <- data.frame(test.a.tset,test.b.tset,c)
Using a regex, I want to regress c on all variables with the structure test."anything".tset:
summary(glm(paste("c ~ ",paste(colnames((df[, grep("test\\.\\w+\\.tset", colnames(df))])),
collapse = "+"), sep = ""), data = df, family=binomial))
So far, no problems. Now we get to the part where cbind comes into play. Suppose I want to use a different statistical model (e.g. rbprobitGibbs from the bayesm package), which requires a design matrix as input.
Thus, I need to transform the data frame into the appropriate format.
X <- cbind(df$test.a.tset,df$test.b.tset)
Or, alternatively, if I want to use regex again (where I even add a second grep to ensure that only the part inside the quotation marks is selected):
X2 <- cbind(grep("[^\"]+",paste(paste("df$", colnames((df[, grep("test\\.\\w+\\.tset", colnames(df))])),
sep = ""), collapse = ","), value = TRUE))
But there is a difference:
> X
[,1] [,2]
[1,] -0.4525601 -1.240484170
[2,] 0.3135625 1.240519383
[3,] -0.2883953 -0.554670224
[4,] -1.3696994 -1.373690426
[5,] 0.8514529 -0.063945537
[6,] -1.1804205 -0.314132743
[7,] -1.0161170 -0.001605679
[8,] 1.0072168 0.938921869
[9,] -0.8797069 -1.158626865
[10,] -0.9113297 1.641201924
> X2
[,1]
[1,] "df$test.a.tset,df$test.b.tset"
From my point of view the problem seems to be that grep returns the selected value as a string inside quotation marks and that, while glm sort of ignores the quotation marks in "df$test.a.tset,df$test.b.tset", cbind does not.
I.e. the call for X2 after the paste is actually read as:
X2 <- cbind("df$test.a.tset,df$test.b.tset")
Question: Is there a way to get the same result for X2 as for X using a regex?
The code grep("test\\.\\w+\\.tset", colnames(df)) will return the indexes of columns that match your pattern. If you wanted to build a matrix using just those columns, you could just use:
X3 <- as.matrix(df[,grep("test\\.\\w+\\.tset", colnames(df))])
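A short sketch to confirm that the regex-based matrix holds the same values as the hand-built X. The only difference is that X3 keeps the column names. The data are regenerated with an arbitrary seed, and cols and same_values are names introduced here; drop = FALSE is added as a precaution so the result stays two-dimensional even if only one column matches.

```r
set.seed(1)
df <- data.frame(test.a.tset = rnorm(10),
                 test.b.tset = rnorm(10),
                 c = rbinom(10, 1, 0.5))

# value = TRUE returns the matching column names instead of their positions
cols <- grep("test\\.\\w+\\.tset", colnames(df), value = TRUE)

X  <- cbind(df$test.a.tset, df$test.b.tset)
X3 <- as.matrix(df[, cols, drop = FALSE])

# Same numbers; X3 additionally carries the column names
same_values <- isTRUE(all.equal(X, unname(X3)))
same_values
```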