Dynamically create subsets in R with a for loop

I am trying to create different subsets of a table, shifting one column upwards by one row in each iteration. So far I have done this with the following code, but not dynamically:
subset_cor_lag00 <- subset(data_24h, select = c(price_return, sentiment_return, tweet_return))
korr_tab_lag00 <- cor(subset_cor_lag00)
subset_cor_lag01 <- transform(subset_cor_lag00, price_return = lead(price_return))
subset_cor_lag01 <- na.omit(subset_cor_lag01)
korr_tab_lag01 <- cor(subset_cor_lag01)
Now I am trying to do this dynamically, but I got stuck. Maybe someone has a hint; I would really appreciate it. I tried this:
for(i in 1:5) {
paste0("subset_cor_lag0", i) <- transform(paste0("subset_cor_lag0", i-1), price_return = lead(price_return))
paste0("subset_cor_lag0", i) <- na.omit(paste0("subset_cor_lag0", i))
paste0("korr_tab_lag0", i) <- cor(paste0("subset_cor_lag0", i))
}

You can use assign for this, but usually having sequentially named variables isn't nice to work with. The better way is to use a list:
subset_cor_lag = list(subset(data_24h, select = c(price_return, sentiment_return, tweet_return)))
for(i in 2:6) {
temp = transform(subset_cor_lag[[i - 1]], price_return = lead(price_return))
subset_cor_lag[[i]] = na.omit(temp)
}
korr_tab = lapply(subset_cor_lag, cor)
## add names, if desired:
name_vec = paste0("lag", 0:5)
names(subset_cor_lag) = name_vec
names(korr_tab) = name_vec
You can then access, e.g., subset_cor_lag[["lag2"]] or subset_cor_lag[[3]], which is easy to do programmatically in a loop or with lapply.
See my answer at How to make a list of data frames? for more discussion and examples.
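As a rough sketch of the same list-based idea, the loop can also be written with lapply over the lag values. The data frame below is a made-up stand-in for data_24h, and shift_price is a hypothetical base-R helper (not part of the original answer) that replaces repeated lead() plus na.omit(); it assumes the data contain no other missing values.

```r
# Hypothetical stand-in for data_24h; the real data are not shown in the question.
set.seed(1)
data_24h <- data.frame(
  price_return     = rnorm(20),
  sentiment_return = rnorm(20),
  tweet_return     = rnorm(20)
)

# Hypothetical helper: shift price_return up by k rows and drop the k trailing
# rows that no longer have a matching price value.
shift_price <- function(df, k) {
  if (k == 0) return(df)
  n <- nrow(df)
  shifted <- df[seq_len(n - k), , drop = FALSE]
  shifted$price_return <- df$price_return[(k + 1):n]
  shifted
}

lags <- 0:5
subset_cor_lag <- lapply(lags, shift_price, df = data_24h)
names(subset_cor_lag) <- paste0("lag", lags)

# One correlation matrix per lag, stored under the same names.
korr_tab <- lapply(subset_cor_lag, cor)
```

With this layout, korr_tab[["lag3"]] gives the correlation matrix for a shift of three rows, and adding more lags only means changing the lags vector.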

Related

Variable length differ in R

I'm performing ANOVA tests on a dataset with many columns, and I'm trying to loop over the columns to make things easier, but I always get the same error: "variable lengths differ".
Here is my code for the loop:
for(i in 5:125){
WL<- colnames(NB[i])
model <- lm(WL ~ Treatment , data = NB)
if(!exists("aovNB")){
aovNB<-anova(model)
}
if(exists("aovNB")){
aovNB <- rbind(aovNB,anova(model))
}
}
I'm wondering whether it is possible to store the column names in the WL variable this way, so that I can use it to read each of my columns in turn.
Thanks to anyone who can solve this. I'm using base R.
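In lm(WL ~ Treatment), WL is a length-one character vector rather than the column it names, so its length differs from that of Treatment. Use reformulate/as.formula to create a formula from strings. Also, instead of rbinding the results inside the loop, store them in a list.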
cols <- colnames(NB)[5:125]
result <- vector('list', length(cols))
for(i in seq_along(cols)){
model <- lm(reformulate('Treatment', cols[i]) , data = NB)
result[[i]] <- anova(model)
}
If needed you can combine them using result <- do.call(rbind, result)
We may do this with paste
cols <- colnames(NB)[5:125]
result <- vector('list', length(cols))
for(i in seq_along(cols)) {
result[[i]] <- anova(lm(as.formula(paste(cols[i], '~ Treatment')), data = NB))
}
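To try either approach without the NB data, here is a small sketch on the built-in iris dataset. Using Species in place of Treatment and the four measurement columns in place of columns 5:125 is purely an assumption made so the example runs.

```r
# iris ships with R; Species stands in for Treatment (assumption for illustration).
cols <- colnames(iris)[1:4]

result <- vector("list", length(cols))
names(result) <- cols

for (i in seq_along(cols)) {
  # Builds e.g. Sepal.Length ~ Species from the column name string.
  f <- reformulate("Species", response = cols[i])
  result[[i]] <- anova(lm(f, data = iris))
}

# First row of each ANOVA table is the Species term; pull out its p-value.
p_values <- sapply(result, function(a) a[["Pr(>F)"]][1])

# Optional: stack all tables into one data frame.
combined <- do.call(rbind, result)
```

Keeping the tables in a named list makes it easy to look up one response, for example result[["Petal.Width"]], before deciding whether a combined table is needed at all.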

Variable name in R with for

I am trying to create several named datasets in a for loop, but the loop variable is not picked up as part of the object name:
c.n_vars<-ncol(DATA)
for (i in 3:c.n_vars)
{
datasets[i] <- ts(DATA[,i],start = c(2009,1),frequency = 12)
}
The idea is to create:
datasets_1
datasets_2
datasets_3....
Is this possible?
In R we try not to create lots of similarly named objects. They are difficult to work with and will cause headaches for you later on. Instead we put related objects in lists:
c.n_vars <- ncol(DATA)
datasets <- vector(mode = "list",length = c.n_vars - 2)
for (i in 3:c.n_vars){
datasets[[i - 2]] <- ts(DATA[,i],start = c(2009,1),frequency = 12)  # i starts at 3, so offset by 2 to fill slots 1..(c.n_vars - 2)
}
If you want the list items to have names you can name them:
names(datasets) <- paste0("dataset_", seq_along(datasets))
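A loop-free variant of the same idea is sketched below. DATA here is a made-up data frame with two non-series columns followed by three monthly series, since the original data are not shown.

```r
# Hypothetical DATA: two identifier columns, then three monthly series.
set.seed(42)
DATA <- data.frame(
  id    = 1:24,
  label = letters[1:24],
  x1    = rnorm(24),
  x2    = rnorm(24),
  x3    = rnorm(24)
)

# Convert every column from the third onwards into a monthly ts object.
series_cols <- 3:ncol(DATA)
datasets <- lapply(DATA[series_cols], ts, start = c(2009, 1), frequency = 12)

# Name the list items instead of creating separate dataset_1, dataset_2, ... objects.
names(datasets) <- paste0("dataset_", seq_along(datasets))
```

Individual series are then available as datasets$dataset_1 or datasets[[1]], and functions such as lapply can be applied to all of them at once.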

add non permanent vectors to data frame using rbind

I have vectors that are created at run time, and I would like to merge them into one data frame.
I'm using the following loop to create those vectors:
for (i in campagin_id){
h <- basicHeaderGatherer()
doc <- getURI(paste0(automations_url,
"/",i,
"?apikey=",accessToken,
"&count=",pagination), headerfunction = h$update)
assign(paste0('web_id',i),c(i,as.integer(substring(h$value()[as.integer(grep(SearchTerm, h$value()))],
as.integer(regexpr(SearchTerm,h$value()[as.integer(grep(SearchTerm, h$value()))]))+nchar(SearchTerm)-1,as.integer(regexpr(SearchTerm,h$value()[as.integer(grep(SearchTerm, h$value()))]))+nchar(SearchTerm)+StringLength-2))))
}
This gives me a set of vectors, and I would like to merge them with rbind, something like this:
rbind(web_id0f09cc8ddd,web_id18a71f70a8)
The issue is that I don't know how many vectors I will get; I only know the beginning of each vector name. So I'm trying to run the following loop:
for (i in campagin_id) {
web_id <- do.call("rbind",list(paste0('web_id',i)))
}
But it inserts only one vector into the data frame.
campagin_id contains all the i values I need at a given time.
Thanks
do.call is the right idea, but rbind is a slow operation. You should add your vectors to a list one-at-a-time, and then do a single rbind at the end, something like this (untested, obviously, as the example isn't reproducible, but it should give you the idea):
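# preallocate one named slot per id (list(length = n) would create a single element called "length")
result_list = vector("list", length(campagin_id))
names(result_list) = campagin_id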
for (i in campagin_id) {
h <- basicHeaderGatherer()
doc <- getURI(
paste0(
automations_url,
"/",
i,
"?apikey=",
accessToken,
"&count=",
pagination
),
headerfunction = h$update
)
result_list[[i]] = c(i, as.integer(
substring(
h$value()[as.integer(grep(SearchTerm, h$value()))],
as.integer(regexpr(SearchTerm, h$value()[as.integer(grep(SearchTerm, h$value()))])) +
nchar(SearchTerm) - 1,
as.integer(regexpr(SearchTerm, h$value()[as.integer(grep(SearchTerm, h$value()))])) +
nchar(SearchTerm) + StringLength - 2
)
))
}
results = do.call(rbind, result_list)
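Because the request code above cannot be run without the API, here is a minimal offline sketch of the collect-then-rbind pattern. The ids and the fake_lookup function are invented placeholders for the HTTP call and header parsing.

```r
# Hypothetical campaign ids (the variable name is kept as spelled in the question).
campagin_id <- c("0f09cc8ddd", "18a71f70a8", "2b3c4d5e6f")

# Placeholder for the getURI() call plus header parsing; returns one integer per id.
fake_lookup <- function(id) nchar(id) * 100L

# Build one single-row data frame per id and keep them all in a list.
result_list <- lapply(campagin_id, function(id) {
  data.frame(id = id, value = fake_lookup(id), stringsAsFactors = FALSE)
})

# A single rbind at the end stacks every row, however many ids there were.
web_id <- do.call(rbind, result_list)
```

The loop in the question passed list(paste0('web_id', i)) to do.call, which is a list holding one string rather than the vectors themselves; collecting the actual values in a list avoids that problem.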

Improvement upon a for-loop: create a series of subsets without looping

My objective is to create a number of time-series subsets from a list of variables. I wrote this with a for-loop. However, I'm looking for more elegant ideas on how to do this with an existing R function that doesn't require a loop.
All ideas and intros to new functions in R are much appreciated.
A reproducible example of the code:
russell_sim <- arima.sim(model=list(ar=c(.9,-.2)),n=449)
russell_sim <- ts(russell_sim, start = c(1980,1), end = c(2017,5) ,frequency = 12)
pmi_sim <- arima.sim(model=list(ar=c(.9,-.2)),n=449)
pmi_sim <- ts(pmi_sim, start = c(1980,1), end = c(2017,5) ,frequency = 12)
big_list<- list(russell = russell_sim, pmi= pmi_sim)
for (i in 1: length(big_list)) {
assign(paste(names(x = big_list)[i], "_before08", sep = ""), window(big_list[[i]], start=c(1981,1), end=c(2007, 12)) )
}
Thank you.
You can make use of the handy list2env function but you will need to edit the list first to get your desired output:
# New List to edit
big_list_before08 <- big_list
# change your observations
big_list_before08 <- lapply(big_list_before08, function(x) window(x, start = c(1981,1),
end = c(2007,12)))
# change the individual list element names
names(big_list_before08) <- paste0(names(big_list),"_before08")
# save to the global environment
list2env(big_list_before08, envir = .GlobalEnv)
Let me know if you have any questions!
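If separate global objects are not strictly required, a possible alternative is to stop after the lapply step and work with the named list directly. The sketch below reuses the simulated series from the question; the name before08 is an arbitrary choice.

```r
set.seed(123)

# Two simulated monthly series covering 1980-01 to 2017-05 (449 observations).
russell_sim <- ts(arima.sim(model = list(ar = c(.9, -.2)), n = 449),
                  start = c(1980, 1), end = c(2017, 5), frequency = 12)
pmi_sim <- ts(arima.sim(model = list(ar = c(.9, -.2)), n = 449),
              start = c(1980, 1), end = c(2017, 5), frequency = 12)

big_list <- list(russell = russell_sim, pmi = pmi_sim)

# Extra arguments to lapply are passed on to window() for every element.
before08 <- lapply(big_list, window, start = c(1981, 1), end = c(2007, 12))
names(before08) <- paste0(names(big_list), "_before08")
```

Each subset is then reachable as before08$russell_before08, and further steps can be applied to all subsets with another lapply call.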

Indexing certain elements in a nested list, for all nests

I have a list which contains more lists of lists:
results <- sapply(c(paste0("cv_", seq(1:50)), "errors"), function(x) NULL)
## Locations for results to be stored
step_results <- sapply(c("myFit", "forecast", "errors"), function(x) NULL)
step_errors <- sapply(c("MAE", "MSE", "sign_accuracy"), function(x) NULL)
final_error <- sapply(c("MAE", "MSE", "sign_accuracy"), function(x) NULL)
for(i in 1:50){results[[i]] <- step_results}
for(i in 1:50){results[[i]][[3]] <- step_errors}
results$errors <- final_error
Now in this whole structure, I would like to sum up all the values in sign_accuracy and save them in results$errors$sign_accuracy
I could maybe do this with a for-loop, indexing with i:
## This is just an example - it won't actually work!
sign_acc <- matrix(nrow = 50, ncol = 2)
for (i in 1:50){
sign_acc[i, ] <- `results[[i]][[3]][[3]]`
results$errors$sign_accuracy <- sign_acc
}
If I remember correctly, in Matlab there is something like list(:), which means all elements. In Python I have seen something like list(0:-1), which also means all elements.
What is the elegant R equivalent? I don't really like loops.
I have seen methods using the apply family of functions. With something like apply(data, "[[", 2), but can't get it to work for deeper lists.
Did you try with c(..., recursive)?
Here is an option with a short example at the end:
sumList <- function(l, label) {
lc <- c(l, recursive=TRUE)
filter <- grepl(paste0("\\.",label, "$"), names(lc)) | (names(lc) == label)
nums <- lc[filter]
return(sum(as.numeric(nums)))
}
ex <- list(a=56,b=list("5",a=34,list(c="3",a="5")))
sumList(ex,"a")
In this case, you can do what you want with
results$errors$sign_accuracy <- do.call(sum, lapply(results, function(x){x[[3]][[3]]}))
lapply loops through the first layer of results and pulls out the third element of the third element of each. do.call(sum, ...) then takes all of those values and sums them.
The real problems with lists arise when the nesting is more irregular, or when you need to loop through more than one index. It can always be done in the same way, but it gets extraordinarily ugly very quickly.
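For a regular nesting like the one in the question, extracting by name can be a little easier to read than positional indexing. The sketch below uses a small made-up structure with five folds and invented error values; it is only meant to illustrate the pattern.

```r
# Hypothetical filled-in results: 5 folds, each holding an errors list.
results <- lapply(1:5, function(i) {
  list(
    myFit    = NULL,
    forecast = NULL,
    errors   = list(MAE = i * 0.1, MSE = i * 0.01, sign_accuracy = i / 5)
  )
})
names(results) <- paste0("cv_", 1:5)

# Option 1: one anonymous function that walks down by name.
sign_acc <- vapply(results, function(x) x$errors$sign_accuracy, numeric(1))

# Option 2: chain "[[" one level at a time.
errors_only <- lapply(results, "[[", "errors")
sign_acc2   <- sapply(errors_only, "[[", "sign_accuracy")

# Sum over all folds.
total_sign_accuracy <- sum(sign_acc)
```

vapply is used in the first option because it checks that every fold returns exactly one number, which helps catch folds where the value is missing.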
