I am trying to store the results of the code below. However, I could only come up with a solution that saves the results of the model with the smallest sum of squared residuals. This was useful until the best results landed at the limits of the ranges of both c and gamma, so I now need to assess the characteristics of other points. For this I need to store the results of every iteration. Does anyone know how to do this in this case?
Thanks in advance!
dlpib1 <- info$dlpib1
scale <- sqrt(var(dlpib1))
RSS.m <- 10
for (c in seq(-0.03,0.05,0.001)){
for (gamma in seq(1,100,0.2))
{
trans <- (1+exp(-(gamma/scale)*(dlpib1-c)))^-1
grid.regre <-lm(dlpib ~ dlpib1 + dlpib8 + trans + trans*dlpib1 +
+ I(trans*dlpib4) ,data=info)
coef <- grid.regre$coefficients
RSS <- sum(grid.regre$residuals^2)
if (RSS < RSS.m){
RSS.m <- RSS
gamma.m <- gamma
c.m <- c
coef.m <- coef
}
}
}
grid <- c(RSS=RSS.m,gamma=gamma.m,c=c.m,coef.m)
grid
The easiest way to store model results by iterations is in a list:
List = list()
for(i in 1:100)
{
LM = lm(rnorm(10)~rnorm(10))
List[[length(List)+1]] = LM
}
You can probably avoid the for loop altogether. However, as for how to accomplish your task, you simply need to index whatever object you are storing the value in. For example,
# outside the for loop
trans <- list()
# inside the for loop
trans[[paste(gamma, c, sep="_")]] <- ...
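A minimal, self-contained sketch of that indexing idea follows. The simulated data frame, the coarse grid, and the simplified model formula are assumptions made purely for illustration; they stand in for the asker's real `info` and full model.

```r
# Simulated stand-in for the asker's data (an assumption, not the real `info`)
set.seed(1)
info <- data.frame(dlpib = rnorm(50), dlpib1 = rnorm(50))
scale <- sd(info$dlpib1)

# outside the for loop
results <- list()

for (c in seq(-0.03, 0.05, 0.04)) {
  for (gamma in seq(1, 3, 1)) {
    trans <- (1 + exp(-(gamma / scale) * (info$dlpib1 - c)))^-1
    fit <- lm(dlpib ~ dlpib1 + trans, data = info)
    # inside the for loop: one named entry per (gamma, c) pair
    results[[paste(gamma, c, sep = "_")]] <- list(
      gamma = gamma,
      c     = c,
      RSS   = sum(residuals(fit)^2),
      coef  = coef(fit)
    )
  }
}

length(results)    # 9 entries: 3 values of c times 3 values of gamma
names(results)     # the "gamma_c" keys
results[[1]]$RSS   # RSS of the first grid point
```

Storing a small list per grid point keeps the RSS, the parameters, and the coefficients together, so any point can be inspected later, not only the minimum.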
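To save the RSS and coefficients of every iteration, you could preallocate storage and fill it by position, something like this:
dlpib1 <- info$dlpib1
scale <- sqrt(var(dlpib1))
c.grid <- seq(-0.03,0.05,0.001)
gamma.grid <- seq(1,100,0.2)
# one slot per (c, gamma) combination
N <- length(c.grid) * length(gamma.grid)
RSS.m <- rep(0,N)
coef.m <- vector("list", N)
i <- 0
for (c in c.grid){
for (gamma in gamma.grid)
{
# increment first so the first result lands in position 1, not 0
i <- i+1
trans <- (1+exp(-(gamma/scale)*(dlpib1-c)))^-1
grid.regre <-lm(dlpib ~ dlpib1 + dlpib8 + trans + trans*dlpib1 +
+ I(trans*dlpib4) ,data=info)
coef.m[[i]] <- grid.regre$coefficients
RSS.m[i] <- sum(grid.regre$residuals^2)
}
}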
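Another way to keep the RSS of every grid point, without explicit loops, is to build the grid with `expand.grid` and fill in one column. This is only a sketch: the simulated data, the smaller grid, the simplified formula, and the helper name `rss_at` are all assumptions for illustration.

```r
set.seed(1)
info <- data.frame(dlpib = rnorm(50), dlpib1 = rnorm(50))  # simulated stand-in
scale <- sd(info$dlpib1)

# One row per (c, gamma) combination
grid <- expand.grid(c = seq(-0.03, 0.05, 0.01), gamma = seq(1, 10, 1))

# Hypothetical helper: fit the model at one grid point and return its RSS
rss_at <- function(c, gamma) {
  info$trans <- (1 + exp(-(gamma / scale) * (info$dlpib1 - c)))^-1  # local copy
  sum(residuals(lm(dlpib ~ dlpib1 + trans, data = info))^2)
}

grid$RSS <- mapply(rss_at, grid$c, grid$gamma)

nrow(grid)                      # 90 rows: 9 values of c times 10 of gamma
head(grid[order(grid$RSS), ])   # the best few grid points, not just the minimum
```

Because every row carries its own c and gamma, it is easy to check whether the low-RSS points cluster at the edge of the grid.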
I was wondering if I could create a for loop where i goes up by decimals. I have tried writing:
for (i in seq(2,6,.1))
{
data1 <- data[data$x1 > i,]
model <- lm(y~x1, data = data1)
r = summary(model1)$r.squared
result[[i]] = r
}
but the result only gives 5 observations from taking only the integers from 2-6.
Is there a way to get around this.
result[[i]] inside your loop will never work with decimal values of i,
because list indexes must be integers: R truncates a non-integer index, so 2.1, 2.2, ..., 2.9 all write to element 2.
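A short demonstration of what happens with a decimal index, followed by the usual workaround of indexing by position in the sequence (the variable names here are illustrative only):

```r
result <- list()
result[[2.1]] <- "first"
result[[2.9]] <- "second"   # the index is truncated to 2 again, overwriting "first"
length(result)              # 2
result[[2]]                 # "second"

# Indexing by position in the sequence keeps every value
steps <- seq(2, 6, 0.1)
out <- numeric(length(steps))
for (k in seq_along(steps)) {
  out[k] <- steps[k]^2
}
length(out)                 # 41
```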
Other than that, you can loop in increments of .1, if you change the way you think about .1 increments:
for (i in seq(20, 60)) {
div <- i / 10
data1 <- data[data$x1 > div,]
model <- lm(y~x1, data = data1)
# i runs from 20 to 60, so shift it to fill positions 1 to 41
result[[i - 19]] = summary(model)$r.squared
}
A for loop is not the most idiomatic way to do this in R. The approach below is better (though still not the best), and it stays close to your original code.
data = data.frame(y = runif(1000), x1 = runif(1000, 1,7))
f = function(x, data)
{
data1 <- data[data$x1 > x,]
model <- lm(y ~ x1, data = data1)
r <- summary(model)$r.squared
return(r)
}
results = lapply(seq(2,6,.1), f, data)
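As a possible variation, `vapply` returns a plain numeric vector instead of a list, which makes it easy to keep each R-squared next to its threshold. This is a sketch on simulated data; the column names `threshold` and `r.squared` are just illustrative choices.

```r
set.seed(42)
data <- data.frame(y = runif(1000), x1 = runif(1000, 1, 7))  # simulated data

# R-squared of y ~ x1 on the rows where x1 exceeds the threshold x
f <- function(x, data) {
  data1 <- data[data$x1 > x, ]
  summary(lm(y ~ x1, data = data1))$r.squared
}

thresholds <- seq(2, 6, 0.1)
results <- data.frame(
  threshold = thresholds,
  r.squared = vapply(thresholds, f, numeric(1), data = data)
)

nrow(results)   # 41, one row per threshold
head(results)
```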
Consider a hypothetical example:
sim <- function(n,p){
x <- rbinom(n,1,p)
y <- (x==0) * rnorm(n)
z <- (x==1) * rnorm(n,5,2)
dat <- data.frame(x, y, z)
return(dat)
}
Now I want to write another function, simfun, which calls the sim function above and checks whether the y and z columns of the data frame are less than or equal to a value k.
simfun <- function(n, p, k){
dat <- sim(n, p)
dat$threshold <- (dat$y<=k & dat$z<=k)
return(dat$threshold)
}
But is it standard to use the argument of sim as the argument of simfun? Can I write simfun <- function(k) and call the sim function inside simfun?
I'd say it's fairly standard to do this sort of thing in R. A few pointers to consider:
Usually you should explicitly declare the argument names so as not to create any unwanted behaviour if changes are made. I.e., instead of sim(n, p), write sim(n = n, p = p).
To get simfun() down to just a k argument will require default values for n and p. There are lots of ways to do this. One way would be to hardcode inside simfun itself. E.g.:
simfun <- function(k) {
dat <- sim(n = 100, p = c(.4, .6))
dat$threshold <- (dat$y<=k & dat$z<=k)
return(dat$threshold)
}
simfun(.5)
A more flexible way would be to add default values in the function declaration. When you do this, it's good practice to put variables with default values AFTER variables without default values. So k would come first, as follows:
simfun <- function(k, n = 100, p = c(.4, .6)){
dat <- sim(n = n, p = p)
dat$threshold <- (dat$y<=k & dat$z<=k)
return(dat$threshold)
}
simfun(.5)
The second option is generally preferable because you can still change n or p if you need to.
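To illustrate that flexibility, here is a small self-contained sketch that repeats the two functions and then calls simfun with and without overriding the defaults:

```r
sim <- function(n, p) {
  x <- rbinom(n, 1, p)
  y <- (x == 0) * rnorm(n)
  z <- (x == 1) * rnorm(n, 5, 2)
  data.frame(x, y, z)
}

simfun <- function(k, n = 100, p = c(.4, .6)) {
  dat <- sim(n = n, p = p)
  dat$y <= k & dat$z <= k
}

set.seed(1)
length(simfun(.5))                   # 100, using the defaults
length(simfun(.5, n = 10))           # 10, overriding n only
length(simfun(.5, p = .9, n = 20))   # 20, named arguments can come in any order
```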
While not great, you could define n and p separately, outside the function, and let simfun look them up in the enclosing environment:
n <- 1
p <- .5
simfun <- function(k){
dat <- sim(n, p)
dat$threshold <- (dat$y<=k & dat$z<=k)
return(dat$threshold)
}
You can read more about R Environments here: http://adv-r.had.co.nz/Environments.html
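A tiny sketch of why relying on outside variables is fragile: the function reads whatever value the variable has at call time, not at definition time. The function names `get_n` and `get_n2` are made up for this example.

```r
n <- 1
get_n <- function() n   # n is not an argument, so R looks it up where the function was defined
get_n()                 # 1

n <- 50
get_n()                 # 50: the function sees the current value of the outside n

# An argument with the same name takes precedence over the outside variable
get_n2 <- function(n = 10) n
get_n2()                # 10
```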
I'm pretty new to R and I'm stuck on one problem.
I've already found out how to create many linear models at once, and I made a function that computes the AIC for each lm, but I cannot display the results with a header that shows the name of the lm. I mean, I want to get a data frame with a header such as lm(a~b+c, data=data), together with the AIC result for that lm.
Here's what I have written so far (with big help from Stack Overflow, of course):
vars <- c("azot_stand", "przeplyw", "pH", "twardosc", "fosf_stand", "jon_stand", "tlen_stand", "BZO_stand", "spadek_stand")
N <- list(1,2,3,4,5,6,7,8)
COMB <- sapply(N, function(m) combn(x=vars[1:8], m))
COMB2 <- list()
k=0
for(i in seq(COMB)){
tmp <- COMB[[i]]
for(j in seq(ncol(tmp))){
k <- k + 1
COMB2[[k]] <- formula(paste("azot_stand", "~", paste(tmp[,j], collapse=" + ")))
}
}
res <- vector(mode="list", length(COMB2))
for(i in seq(COMB2)){
res[[i]] <- lm(COMB2[[i]], data=s)
}
aic <- vector(mode="list", length(COMB2))
d=0
for(i in seq(res)){
aic[[i]] <- AIC(res[[i]])
}
View(aic)
show(COMB2)
I guess that I am missing something in the aic part, but I don't know what...
With formula you can obtain the formula of a regression model. Since you want to store the formula with the AIC, I would create a data.frame containing both:
aic <- data.frame(model = character(length(res)), aic = numeric(length(res)),
stringsAsFactors = FALSE)
for(i in seq(res)){
aic$model[i] <- deparse(formula(res[[i]]), width.cutoff = 500)
aic$aic[i] <- AIC(res[[i]])
}
Normally you would use format to convert a formula to a character. However, for long formulas this results in multiple lines. Therefore, I use deparse (which is also used by format) and pass it the width.cutoff argument.
You cannot use res[[i]]$call as this is always equal to lm(formula = COMB2[[i]], data = s).
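The difference between the stored call and the formula can be seen in a small sketch. It uses the built-in attitude data set and two arbitrary formulas, both chosen here only for illustration:

```r
forms <- list(rating ~ complaints, rating ~ complaints + learning)
res <- vector("list", length(forms))
for (i in seq_along(forms)) {
  res[[i]] <- lm(forms[[i]], data = attitude)
}

deparse(res[[1]]$call)                          # "lm(formula = forms[[i]], data = attitude)"
deparse(formula(res[[1]]), width.cutoff = 500)  # "rating ~ complaints"

# Table of formulas and AIC values, best model first
aic <- data.frame(
  model = vapply(res, function(m) deparse(formula(m), width.cutoff = 500),
                 character(1)),
  aic   = vapply(res, AIC, numeric(1)),
  stringsAsFactors = FALSE
)
aic[order(aic$aic), ]
```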
Other suggestions
The first part of your code can be simplified. I would write something like:
s <- attitude
vars <- names(attitude)[-1]
yvar <- names(attitude)[1]
models <- character(0)
for (i in seq_along(vars)) {
comb <- combn(vars, i)
models <- c(models,
paste(yvar, " ~ ", apply(comb, 2, paste, collapse=" + ")))
}
res <- lapply(models, function(m) lm(as.formula(m), data = s))
It is shorter and also has the advantage that magic constants such as the 8 and azot_stand are defined outside the main code and can easily be modified.
I also noticed that you use azot_stand both as target variable and predictor (it is also part of vars). I don't think you will want to do that.
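One way to avoid that, sketched with the variable names from the question, is to drop the response from the candidate predictors with setdiff before building the formulas. No data is needed to check the result, since the formulas are just strings at this point.

```r
vars <- c("azot_stand", "przeplyw", "pH", "twardosc", "fosf_stand",
          "jon_stand", "tlen_stand", "BZO_stand", "spadek_stand")
yvar <- "azot_stand"
predictors <- setdiff(vars, yvar)   # the 8 remaining variables

# All non-empty subsets of the predictors, written as formula strings
models <- unlist(lapply(seq_along(predictors), function(m) {
  comb <- combn(predictors, m)
  paste(yvar, "~", apply(comb, 2, paste, collapse = " + "))
}))

length(models)   # 255, i.e. 2^8 - 1
models[1]        # "azot_stand ~ przeplyw"
```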
I get an error when running the code below. I have not figured out what I am doing wrong - sorry if it is obvious, I am new to R. The idea is to "generate" 100 regressions and output the estimated slope 100 times.
set.seed(21)
x <- seq(1,40,1)
for (i in 1:100 ) {
y[i] = 2*x+1+5*rnorm(length(x))
reg[i] <- lm(y[i]~x)
slp[i] <- coef(reg[i])[2]
}
There are several problems with the way you use indexing. You'll probably need to spend some time again on a short tutorial about R for beginners, and not "rush" to loops and regressions...
In the end, you want to have a vector containing 100 slope values. You need to define this (empty) vector 'slp' prior to running the loop and then fill each ith element with its value in the loop.
On the other hand,
1) at each iteration you don't fill the ith element of y but create a whole new vector y with as many values as there are in x...
2) you don't need to keep every regression so you don't need to "index" your object reg.
So here it is:
set.seed(21)
x <- seq(1,40,1)
slp=rep(NA,100)
for (i in 1:100) {
y = 2*x+1+5*rnorm(length(x))
reg <- lm(y~x)
slp[i]<-coef(reg)[2]
}
print(slp)
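If you did want to keep every fitted model after all, a preallocated list indexed with [[i]] would be the natural container. A minimal sketch, where the name `regs` is just an illustrative choice:

```r
set.seed(21)
x <- seq(1, 40, 1)
regs <- vector("list", 100)   # preallocated list, one slot per model

for (i in seq_along(regs)) {
  y <- 2 * x + 1 + 5 * rnorm(length(x))
  regs[[i]] <- lm(y ~ x)      # [[i]] stores the whole model object
}

# Extract the slope from each stored model
slp <- vapply(regs, function(m) unname(coef(m)[2]), numeric(1))
length(slp)                    # 100
summary(regs[[1]])$r.squared   # any stored model can still be inspected later
```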
In addition to the other answers, there is a better (more efficient and easier) possibility. lm accepts a matrix as input for y:
set.seed(21)
y <- matrix(rep(2*x + 1, 100) + 5 *rnorm(length(x) * 100), ncol = 100)
reg1 <- lm(y ~ x)
slp1 <- coef(reg1)[2,]
all.equal(slp, slp1)
#[1] TRUE
If you had a function other than lm and needed a loop, you should use replicate instead of a for loop:
set.seed(21)
slp2 <- replicate(100, {
y = 2*x+1+5*rnorm(length(x))
reg <- lm(y~x)
unname(coef(reg)[2])
})
all.equal(slp, slp2)
#[1] TRUE
You need to create the matrix/vector y, reg, slp first, to be able to write to position i like: y[i] <-. You can do something along these lines:
set.seed(21)
x <- seq(1,40,1)
slp <- numeric(100)
for (i in 1:100 ) {
y <- 2*x+1+5*rnorm(length(x))
reg <- lm(y~x)
slp[i] <- coef(reg)[2]
}
> slp
[1] 2.036344 1.953487 1.949170 1.961897 2.098186 2.027659 2.002638 2.107278
[9] 2.036880 1.980800 1.893701 1.925230 1.927503 2.073176 2.101303 1.943719
...
[97] 1.966039 2.041239 2.063801 2.066801
I am trying to create an R code that puts another loop inside of the one I've already created. Here is my code:
t <- rep(1,1000)
omega <- seq(from=1,to=12,by=1)
for(i in 1:1000){
omega <- setdiff(omega,sample(1:12,1))
t[i] <- length(omega)
remove <- 0
f <- length(t [! t %in% remove]) + 1
}
When I run this code, I get the number of trials, f, that it takes to reach the zero vector, but I want to do 10000 iterations of this experiment.
replicate is probably how you want to run the outer loop. There's also no need for the f assignment to be inside the loop. Here I've moved it outside and changed it to simply count the elements of t that are greater than 0, plus 1.
result <- replicate(10000, {
t <- rep(1, 1000)
omega <- 1:12
for(i in seq_along(t)) {
omega <- setdiff(omega,sample(1:12,1))
t[i] <- length(omega)
}
sum(t > 0) + 1
})
I suspect your code could be simplified in other ways as well, and also that you could just write down the distribution that you're looking for without simulation. I believe your variable of interest is just how long until you get at least one of each of the numbers 1:12, yes?
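If that reading is right, this is the classic coupon collector problem, whose expected number of draws for 12 equally likely values is 12 * (1 + 1/2 + ... + 1/12), roughly 37.24. The sketch below compares that value with a direct simulation; the helper name `draws_needed` is made up for this example, and it assumes each draw is uniform on 1:12 as in the question.

```r
# Hypothetical helper: number of uniform draws from 1:k until every value has appeared
draws_needed <- function(k = 12) {
  seen <- logical(k)
  n <- 0
  while (!all(seen)) {
    seen[sample.int(k, 1)] <- TRUE
    n <- n + 1
  }
  n
}

set.seed(1)
sims <- replicate(10000, draws_needed())

expected <- 12 * sum(1 / (1:12))   # closed-form expectation, about 37.24
mean(sims)                         # simulated mean, should be close to `expected`
expected
```

Only the mean is compared here; the full distribution could be tabulated from `sims` with table() or hist().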
Are you just looking to run your existing loop 10,000 times, like below? Note that t and omega have to be reset at the start of each experiment.
f <- rep(NA, 10000)
for(j in 1:10000) {
t <- rep(1,1000)
omega <- seq(from=1,to=12,by=1)
for(i in 1:1000){
omega <- setdiff(omega,sample(1:12,1))
t[i] <- length(omega)
}
remove <- 0
f[j] <- length(t [! t %in% remove]) + 1
}