Counter in Variable Selection

I am running the following code, which is working fine:
model <- vector("list", 100)
coefs <- vector("list", 100)      # renamed from 'summary' so it does not mask summary()
stepwise <- vector("list", 100)
for (i in 1:100) {
  model[[i]] <- lm(r[[i]] ~ x1[[i]] + x2[[i]] + x3[[i]] + noise1[[i]] + noise2[[i]] + noise3[[i]] + noise4[[i]] + noise5[[i]] + noise6[[i]] + noise7[[i]])
  coefs[[i]] <- summary(model[[i]])$coefficients
  stepwise[[i]] <- step(model[[i]], direction = "both")$coefficients
}
I want to set up a counter that keeps track of the variables stored in the stepwise list, i.e. a count of how many times each variable (x1, x2, x3, noise1, noise2, noise3, noise4, noise5, noise6, noise7) occurs across the 100 models. I was thinking of something like this:
createCounter <- function(VALUE){
  for (i in 1:100){
    output <- VALUE <- VALUE+i
    return(output)
  }
}
but I don't know how to adapt it so that R counts a variable whenever the stepwise list contains it. Any help would be appreciated.

Well, step()$coefficients returns a named vector: the values are the coefficient estimates, and the names are the variable names. So you can extract and count the variable names across all 100 models with:
table(unlist(lapply(stepwise, function(x) names(x))))
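To see the counting step end to end, here is a minimal sketch on simulated data (the data, the variable set and the names vars, d and counts are hypothetical). It assumes each data set is stored as a data frame, so the coefficient names come out as plain x1, x2, ...; with the list-based formula in the question they would read x1[[i]] and so on, but each variable still gets its own count.

```r
# Hypothetical simulation: 100 data sets of 50 rows in which only x1 and x2 affect r
set.seed(1)
vars <- c("x1", "x2", "noise1", "noise2", "noise3")
stepwise <- vector("list", 100)
for (i in 1:100) {
  d <- data.frame(matrix(rnorm(50 * 5), ncol = 5, dimnames = list(NULL, vars)))
  d$r <- 2 * d$x1 - d$x2 + rnorm(50)
  fit <- lm(r ~ ., data = d)
  # trace = 0 suppresses the AIC table that step() prints at every stage
  stepwise[[i]] <- coef(step(fit, direction = "both", trace = 0))
}
# Names of the retained coefficients, pooled over all models, then tabulated
counts <- table(unlist(lapply(stepwise, names)))
counts
```

The (Intercept) entry always equals the number of models, because the default scope of step() never drops the intercept; use counts[names(counts) != "(Intercept)"] if you only want the predictors.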

Related

Subsetting numeric variable but not factor in R

I would like to fit linear mixed-effects models in a for loop in which the response Y and the random effect are always hard-coded and the X variables (var.nam[i]) are looped through. I wrote the code and (I believe) it works, but I would also like to subset the data depending on the type (numeric or factor) of the X variable var.nam[i]:
when the X variable is numeric, exclude all observations equal to 0
when the X variable is a factor, do not subset
A short sample of my code is here:
for(i in 1:length(var.nam)) {
  formula[i] <- paste0("Y", "~", paste0(c(var.nam[i], c("Season"),
                                         c("Sex"),
                                         c("Age"),
                                         c("BMI"),
                                         c("(1|HID)")), collapse="+"))
  model <- lmer(formula[i], data = subset(data, paste0(c(var.nam[i])) != 0))
  # loop continues...
}
As written, it applies the subset to every X variable (var.nam[i]) regardless of its type. Is there a workaround, or a different way to subset, that would work in this specific case?

It is hard to check whether this solution works without data or the complete for loop, but since you want to subset conditionally, adding an if/else statement should make this possible:
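library(lme4)  # provides lmer()
formulas <- character(length(var.nam))
for (i in seq_along(var.nam)) {
  formulas[i] <- paste0("Y ~ ", paste(c(var.nam[i], "Season", "Sex", "Age", "BMI", "(1|HID)"), collapse = " + "))
  # Test the column itself: mode(var.nam[i]) would always be "character" (it is just the name)
  x <- data[[var.nam[i]]]
  # Drop zero rows only for numeric X; keep all rows for factors
  data1 <- if (is.numeric(x)) data[!is.na(x) & x != 0, ] else data
  model <- lmer(as.formula(formulas[i]), data = data1)
  # loop continues...
}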
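The key point is that the type check has to look at the column, data[[var.nam[i]]], rather than at its name: var.nam[i] is always a character string. The sketch below illustrates this on a hypothetical toy data set; subset_for() is a made-up helper, and lm() stands in for lmer() so that it runs without lme4.

```r
# Hypothetical toy data: Y, one numeric predictor (dose) and one factor (sex)
data <- data.frame(
  Y    = c(1.2, 2.3, 3.1, 4.8, 5.0, 6.1),
  dose = c(0, 1, 2, 0, 3, 4),
  sex  = factor(c("m", "f", "m", "f", "m", "f"))
)
var.nam <- c("dose", "sex")

# The pitfall: mode() of the *name* is "character" whatever the column type
mode(var.nam[1])

# Hypothetical helper: drop zero rows only when the column is numeric
subset_for <- function(df, var) {
  x <- df[[var]]
  if (is.numeric(x)) df[!is.na(x) & x != 0, , drop = FALSE] else df
}

# lm() stands in for lmer() here so the sketch runs with base R only
fits <- lapply(var.nam, function(v) {
  lm(reformulate(v, response = "Y"), data = subset_for(data, v))
})
rows_used <- sapply(fits, nobs)   # observations each model was fitted on
rows_used
```

The dose model uses the four rows with non-zero dose, while the sex model keeps all six.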

R - Running a regression for tibbles identified by an id out of a dataframe in long format

I would like to loop over a data frame, data_all_long, in long format that contains time-series tuples identified by a stock ID (stockId).
Using unique(), I stored all the distinct IDs in uniqueIDs.
numberofrows is the length of the list containing the time-series tuples from above.
Ideally, the tmp variable should temporarily hold all the rows for one specific ID, so that I can run the regression for that ID and store the result in a vector.
The overall outcome should be a vector with all the regression coefficients for the different IDs.
for (i in uniqueIDs) {
  for (j in 1:numberofrows) {
    tmp <- rbind(tmp, filter(data_all_long, stockId == i))
  }
  beta[, i] <- lm(mrf ~ stockreturn, data = tmp)
}
Does anyone here have any ideas?

From what I understand of the problem, the following might do it.
The trick is to split the data by ID and sapply an anonymous function to each of the resulting data frames. This function fits the model and extracts the coefficients.
sp <- split(data_all_long, data_all_long$stockId)   # one data frame per stock ID
beta <- sapply(sp, function(tmp) {
  fit <- lm(mrf ~ stockreturn, data = tmp)
  coef(fit)
})
# beta is a matrix with one column per ID and rows (Intercept) and stockreturn
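A self-contained version of the split/sapply approach on simulated data may help check the shape of the result. The data, the stock IDs A to C and their true slopes are hypothetical; the column names follow the question.

```r
# Hypothetical long-format data: 3 stocks, 20 periods each
set.seed(42)
data_all_long <- data.frame(
  stockId     = rep(c("A", "B", "C"), each = 20),
  stockreturn = rnorm(60)
)
# Give each stock a different true slope (1, 2, 3) plus a little noise
slope <- c(A = 1, B = 2, C = 3)
data_all_long$mrf <- slope[data_all_long$stockId] * data_all_long$stockreturn +
  rnorm(60, sd = 0.1)

sp <- split(data_all_long, data_all_long$stockId)
beta <- sapply(sp, function(tmp) coef(lm(mrf ~ stockreturn, data = tmp)))
beta                    # 2 x 3 matrix, one column per stock
beta["stockreturn", ]   # just the slopes, named by stock
```

No inner loop or rbind() is needed: split() already produces one data frame per ID.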

R nested for-loop only producing result with last column in dataframe

CONTEXT:
I'm working with respondent-level survey data. Each row of my data frame represents an individual person's survey responses.
My data frame consists of individual-level utility estimates from a Maximum Difference experiment AND categorical variables indicating which of several subgroups each survey respondent belongs to.
Each subgroup variable is a single categorical variable with exactly two levels. However, in my desired output I'd like a data frame where each level of each subgroup has its own column.
OBJECTIVE:
I want to create a function that, for each user-defined subgroup, runs repeated t-tests over every Maximum Difference item in the data frame, extracts elements of the t-test output, and stores those elements in a data frame.
Using t-statistic results as an example, the end result should look like this:
          Males_T_stat  Females_T_stat
MD_item1  2.71          2.5
MD_item2  1.71          1.5
MD_item3  0.71          0.5
CURRENT CODE:
Right now, I'm focused on writing code that iteratively runs the t-tests and stores each test's entire output object in a list. The code I've used to attempt this, unsuccessfully, is below:
Create a test data frame:
dat <- data.frame(
md1 = 1:60,
gender = factor(rep(c("m", "f"), 30)),
generation = factor(rep(c("a", "b"), 30)),
md2 = 61:120
)
Specify the names of my respondent subgroups (i.e., the categorical variables) and select the item and subgroup columns:
library(dplyr)  # provides %>%, select() and contains()
groupnames <- c("gender", "generation")
item_vec <- dat %>% select(contains("md"))
group_vec <- dat[groupnames]
Convert the item and subgroup selections to data frames. This step may be superfluous, but I'm more comfortable working with data frames.
item_vec <- data.frame(item_vec)
group_vec <- data.frame(group_vec)
So far, I've tried using nested for loops to run the t-tests and store each test's output in a list. This code partially works: for each subgroup named in group_vec, it produces t-test results for the last item in item_vec only. However, I want the results for EVERY item in item_vec, which is where I'm currently stuck.
res <- list()
for (i in 1:length(group_vec)) {
  res[[i]] <- list(test)
  for (j in 1:length(item_vec)) {
    test <- (t.test(item_vec[[j]] ~ group_vec[[i]]))
    res[i] <- list(test)
  }
}
res
Thank you in advance for any help you can provide!

In the nested loop, replace
res[i] <- list(test)
with
res[[i]][[j]] <- test
because j loops over item_vec. If we assign only to res[[i]] or res[i], each item's test overwrites the previous one for that group, and since nothing comes after the last item, only the last result remains for each variable in group_vec.
Also, it may be better to initialize res as
res <- vector('list', length(group_vec))
and then make the changes in the for loop. In a fresh session, res[[i]] should be initialized as an empty list rather than list(test), because test does not exist yet on the first iteration:
for (i in seq_along(group_vec)) {
  res[[i]] <- vector('list', length(item_vec))   # one slot per item
  for (j in seq_along(item_vec)) {
    test <- t.test(item_vec[[j]] ~ group_vec[[i]])
    res[[i]][[j]] <- test   # store in slot j instead of overwriting the group's result
  }
}
names(res) <- names(group_vec)
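The question's final goal is a table of statistics rather than a list of test objects. A minimal sketch with sapply(), rebuilt on the question's test data, could look like this; the names items, groups and t_stats are hypothetical. Note that a two-sample t-test yields a single statistic per comparison, so this gives one column per subgroup variable rather than one per level.

```r
dat <- data.frame(
  md1 = 1:60,
  gender = factor(rep(c("m", "f"), 30)),
  generation = factor(rep(c("a", "b"), 30)),
  md2 = 61:120
)
items  <- c("md1", "md2")
groups <- c("gender", "generation")

# Rows = items, columns = subgroup variables; unname() drops the "t" label
# so that the row names are the item names
t_stats <- sapply(groups, function(g) {
  sapply(items, function(it) unname(t.test(dat[[it]] ~ dat[[g]])$statistic))
})
t_stats
as.data.frame(t_stats)
```

The same pattern works for other elements of the htest object, for example $p.value or $estimate.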

Creating for loops in R using subset data

I recently started programming in R and am trying to compute slopes for a data set. This is my code:
slopes <- vector()
gdd.values <- length(unique(data.gdd$GDD))
for (i in 1:gdd.values) {
  subset.data <- data.gdd[which(data.gdd$GDD == i), ]
  volume <- apply(subset.data[,4,6], 1, prod)
  species.richness <- apply(subset.data[,7:59], 1, sum)
  slopes[i] <- lm(log(species.richness) ~ log(volume))$coefficients[2]
}
When I run it, slopes remains empty, although all the other objects look fine (none of them are empty). Let me know if you spot any obvious mistakes. Thanks!

Currently, you are iterating over 1 to the number of unique values, not over the unique values themselves. So, as @RobJensen comments, adjust the for loop to iterate over unique(data.gdd$GDD). This is why some or all of the returned values are missing: unless the GDD values happen to be 1, 2, 3, ..., subset.data may contain no rows because the filter data.gdd$GDD == i matches nothing.
However, consider a more streamlined approach using the often overlooked by(), which splits the data set by the needed grouping factor(s) and applies a function to each subset; the returned results can then be bound into a vector:
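coeff_list <- by(data.gdd, data.gdd$GDD, FUN = function(df) {
  # df[,4,6] would pass 6 as the 'drop' argument; assuming columns 4 to 6 hold the
  # dimensions whose product is the volume (use c(4, 6) if only those two are meant)
  volume <- apply(df[, 4:6], 1, prod)
  species.richness <- rowSums(df[, 7:59])
  coef(lm(log(species.richness) ~ log(volume)))[2]
})
# by() simplifies scalar results to an atomic array, which do.call() rejects,
# so convert it to a named numeric vector instead
slopes <- setNames(as.vector(coeff_list), names(coeff_list))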
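To check the by() approach without the original data.gdd, here is a minimal sketch on a hypothetical data set with a shrunk layout (GDD in column 1, three dimension columns, five species columns). The GDD values are deliberately not 1, 2, 3, which also shows why the original loop found no rows.

```r
set.seed(7)
# Hypothetical data: 3 GDD groups of 10 rows; counts shifted by 1 so log() is defined
data.gdd <- data.frame(
  GDD = rep(c(100, 250, 400), each = 10),
  len = runif(30, 1, 5), wid = runif(30, 1, 5), hgt = runif(30, 1, 5),
  matrix(rpois(150, 5) + 1, ncol = 5)
)
sum(data.gdd$GDD == 1)   # 0 rows: what the original 1:gdd.values loop would select

coeff_list <- by(data.gdd, data.gdd$GDD, function(df) {
  volume <- apply(df[, 2:4], 1, prod)        # product of the three dimensions
  species.richness <- rowSums(df[, 5:9])     # total count per row
  coef(lm(log(species.richness) ~ log(volume)))[2]
})
slopes <- setNames(as.vector(coeff_list), names(coeff_list))
slopes   # one slope per GDD value, named by that value
```

Naming the slopes by GDD value keeps the link between each slope and its group, which the positional slopes[i] in the original loop would lose.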

Running for loop across multiple groups

I am running the following imputation task in R as a for loop:
library(tmvtnorm)  # provides gmm.tmvnorm() and rtmvnorm()
myData <- essuk[c(2,3,4,5,6,12)]
myDataImp <- matrix(0, dim(myData)[1], dim(myData)[2])
lower <- c(0)
upper <- c(Inf)
for (k in 1:5) {
  gmm.fit1 <- gmm.tmvnorm(matrix(myData[,k], length(myData[,k]), 1), lower=lower, upper=upper)
  useMu <- matrix(gmm.fit1$coefficients[1], 1, 1)
  useSigma <- matrix(gmm.fit1$coefficients[2], 1, 1)
  replaceThese <- myData[,k] <= 0
  myDataImp[,k] <- myData[,k]
  myDataImp[replaceThese,k] <- rtmvnorm(n=sum(replaceThese), c(useMu), c(useSigma), c(-Inf), c(0))
}
The steps are pretty straightforward:
Define the data set and an empty imputation data set.
For columns 1-5, fit a truncated normal model.
Extract the model estimates to be used for imputation.
Draw values from the fitted distribution, truncated to (-Inf, 0], and use them to replace values <= 0 in the imputation data set.
However, I want to do this separately for multiple groups rather than for the full sample. Column 12 of the data set contains the group membership (integers from 1 to 72).
I have tried several options, including splitting the data frame with data_list <- split(myData, myData$V12) and using lapply(). However, this does not work because of how the model estimates are formatted:
Error in as.data.frame.default(data) :
  cannot coerce class '"gmm"' to a data.frame
I have also thought about the possibility of doing a nested for loop, although I am not sure how that could be accomplished. Any suggestions are much appreciated.

What about using subset()?
myData$V12 <- as.factor(myData$V12)
listofresults <- list()
for (i in levels(myData$V12)) {
  groupData <- subset(myData, V12 == i)
  # your analysis here, run on groupData: result saved in myDataImp
  # store it as a list element; c() would flatten each matrix into one long vector
  listofresults[[i]] <- myDataImp
}
Not the most elegant, but it should work.
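The split/lapply route from the question can also work if the per-column loop is wrapped in a function that takes one group's data and returns that group's imputed matrix. A minimal sketch under loud assumptions: to keep it runnable without tmvtnorm, impute_column() is a hypothetical base-R stand-in (mean and sd of the positive values, then inverse-CDF draws from that normal truncated to (-Inf, 0]), NOT the GMM estimator used in the question; swap your gmm.tmvnorm()/rtmvnorm() code into it. The data and the helper names are hypothetical.

```r
set.seed(3)
# Hypothetical stand-in for the gmm.tmvnorm()/rtmvnorm() step
impute_column <- function(x) {
  mu <- mean(x[x > 0]); s <- sd(x[x > 0])
  replace_these <- x <= 0
  # Inverse-CDF sampling: uniforms on (0, P(X <= 0)) mapped back through qnorm()
  u <- runif(sum(replace_these), 0, pnorm(0, mu, s))
  x[replace_these] <- qnorm(u, mu, s)
  x
}

# The question's per-column loop, wrapped so it runs on one group at a time
impute_group <- function(df, cols = 1:5) {
  out <- as.matrix(df[, cols])
  for (k in seq_along(cols)) out[, k] <- impute_column(out[, k])
  out
}

# Hypothetical data: 5 numeric columns plus group id V12 (4 groups instead of 72)
myData <- data.frame(matrix(rnorm(200 * 5, mean = 1), ncol = 5),
                     V12 = rep(1:4, each = 50))
imputed <- lapply(split(myData, myData$V12), impute_group)
myDataImp <- do.call(rbind, imputed)   # rows come back ordered by group
```

Because the function returns a plain matrix, lapply() never has to handle the fitted model objects themselves. Here the data are already sorted by V12, so the stacked result lines up with the original rows; with unsorted data, reorder before comparing.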
