Is there a way to simplify this code using a loop?
VariableList <- c(v0,v1,v2, ... etc)
National_DF <- df[,VariableList]
AL_DF <- AL[,VariableList]
AR_DF <- AR[,VariableList]
AZ_DF <- AZ[,VariableList]
... etc
I want the end result to have each as a data frame, since they will be used later in the model. Each state ('AL', 'AR', 'AZ', etc.) is a data frame. The v{#} names represent out-of-place variables from the RAW data frame. The goal is to reorder the fields, and drop some of them, in preparation for model use.
Continuing the answer from your previous question, we can arrange the data in the same lapply call before creating dataframes.
VariableList <- c('v0','v1','v2')
data <- unlist(lapply(mget(ls(pattern = '_DF$')), function(df) {
index <- sample(1:nrow(df), 0.7*nrow(df))
df <- df[, VariableList]
list(train = df[index,], test = df[-index,])
}), recursive = FALSE)
Then copy the list elements into the global environment:
list2env(data, .GlobalEnv)
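If the state data frames can be kept in a named list rather than looked up by name, the same column selection can be done without touching the global environment. The following is only a minimal sketch: the toy data frames AL and AR, their values, and the column called extra are made up for illustration and do not come from the question.

```r
# Toy stand-ins for the state data frames (hypothetical values)
AL <- data.frame(extra = 1:3, v2 = 7:9, v0 = 1:3, v1 = 4:6)
AR <- data.frame(extra = 4:6, v2 = 3:1, v0 = 9:7, v1 = 6:4)

VariableList <- c("v0", "v1", "v2")

# Keep the state data frames together in one named list
states <- list(AL = AL, AR = AR)

# Reorder and drop columns in every data frame with a single call
states_DF <- lapply(states, function(df) df[, VariableList, drop = FALSE])

# Optional: give each element the "_DF" suffix used in the question
names(states_DF) <- paste0(names(states_DF), "_DF")

names(states_DF$AL_DF)
```

Each element of states_DF is still an ordinary data frame, so it can be passed to a model as states_DF$AL_DF or states_DF[["AL_DF"]].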
my first question on Stack Overflow so bear with me ;-)
I wrote a function to row-bind all objects whose names meet a regex criterion into a dataframe.
Curiously, if I run the lines out of the function, it works perfectly. But within the function, an empty data frame is returned.
Reproducible example:
offers_2022_05 <- data.frame(x = 3)
offers_2022_06 <- data.frame(x = 6)
bind_multiple_dates <- function(prefix) {
objects <- ls(pattern = sprintf("%s_[0-9]{4}_[0-9]{2}", prefix))
data <- bind_rows(mget(objects, envir = .GlobalEnv), .id = "month")
return(data)
}
bind_multiple_dates("offers")
# A tibble: 0 × 0
However, this works:
prefix <- "offers"
objects <- ls(pattern = sprintf("%s_[0-9]{4}_[0-9]{2}", prefix))
data <- bind_rows(mget(objects, envir = .GlobalEnv), .id = "month")
data
month x
1 offers_2022_05 3
2 offers_2022_06 6
I suppose it has something to do with the environment, but I can't really figure it out. Is there a better way to do this? I would like to keep the code as a function.
Thanks in advance :-)
By default, ls() looks in the current environment. Inside the function, the current environment is the function's own frame, and the data frames are not defined there, so ls() finds nothing. You can point ls() at the calling environment explicitly with the envir= parameter. For example:
bind_multiple_dates <- function(prefix) {
objects <- ls(pattern = sprintf("%s_[0-9]{4}_[0-9]{2}", prefix), envir=parent.frame())
data <- bind_rows(mget(objects, envir = parent.frame()), .id = "month") # look up the objects where ls() found them
return(data)
}
The "better" way to do this is to not create a bunch of separate variables like offers_2022_05 and offers_2022_06 in the first place. Variables should not have data or indexes in their name. It would be better to create the data frames in a list directly from the beginning. Often this is easily accomplished with a call to lapply or purrr::map. See this existing question for more info
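As a rough illustration of the list-first approach, the sketch below builds the monthly data frames directly in a named list and then row-binds them with a month column. It uses base R only (do.call with rbind instead of bind_rows), and the values are toy data mirroring the question; how the real monthly data is produced is an assumption left open here.

```r
months <- c("2022_05", "2022_06")
values <- c(3, 6)  # toy values mirroring the question's example

# Build the data frames directly inside a named list
offers <- lapply(values, function(v) data.frame(x = v))
names(offers) <- months

# Add the list name as a "month" column, then stack everything
with_month <- Map(function(df, m) {
  data.frame(month = m, df, stringsAsFactors = FALSE)
}, offers, names(offers))

combined <- do.call(rbind, with_month)
rownames(combined) <- NULL

combined
```

Because the data never lives in separately named variables, there is no need for ls(), mget() or any environment lookup.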
This is an extension of question on Function and looping in training and testing set using r.
How can I write the result of the function (func1) given below to an external folder, with the common columns followed by each model's own additional output column? Moreover, how can I write the output for each unique data_by_plot result to an external folder? I used write.table(func1, "c:\\Document\\project\\result"), but I couldn't get it the way I want. My code is given below. I tried different ways using cbind and rbind, but they don't give me what I want.
My code is :
result<- c()
data$groups <- paste(data$Plot, data$Species, sep = "_")
data_by_plot <- split (data$Count, data$groups)
func1<- do.call(rbind, lapply(data_by_plot, function(df){
Training<-df[1:20,]
Testing<-df[21:30,]
Model1<-lm(count~1, data = Training)
Pred1<-Testing$Count[i]- Model1$coefficients
Model2<-lm(Diff~1, data = Training)
Pred2<-Testing$Count[i]- Model2$coefficients
Model3<-lm(Diff~1+LogCount, data = Training)
Pred3<-Testing$Count[i]- Model3$coefficients
Model4<-lm(Diff~1+Count, data = Training)
Pred4<-Testing$Count[i]- Model4$coefficients
result <- Reduce(merge, list(Pred1, Pred2, Pred3, Pred4))
return(result)
})
Ok with clarification from your comment above. There are two ways you could do this. You could incorporate it into the function itself or pull out the result of the function and pass it to an export function.
I think the easiest way would be the former, so create an export function:
export.function <- function(result){
path <- "//folder/" # whatever your path is to the folder
result <- as.data.frame(result) # turn into a data frame
write.csv(result, file = paste0(path, "result.csv")) # write the data itself to the file
}
This writes the result, as a data frame, to a CSV file named "result.csv" in the designated path.
Then add it:
result<- c()
data$groups <- paste(data$Plot, data$Species, sep = "_")
data_by_plot <- split(data, data$groups) # split the whole data frame, not only the Count column
func1<- do.call(rbind, lapply(data_by_plot, function(df){
Training<-df[1:20,]
Testing<-df[21:30,]
Model1<-lm(Count~1, data = Training)
Pred1<-Testing$Count[i]- Model1$coefficients
Model2<-lm(Diff~1, data = Training)
Pred2<-Testing$Count[i]- Model2$coefficients
Model3<-lm(Diff~1+LogCount, data = Training)
Pred3<-Testing$Count[i]- Model3$coefficients
Model4<-lm(Diff~1+Count, data = Training)
Pred4<-Testing$Count[i]- Model4$coefficients
result <- Reduce(merge, list(Pred1, Pred2, Pred3, Pred4))
export.function(result)
result # return the result so that do.call(rbind, ...) still receives it
}))
The other way would be to just do:
results <- func1 # func1 already holds the combined result; it is not a function
export.function(results)
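To get one file per data_by_plot group, the export step needs a distinct file name for each group. The sketch below is one way to do that under some assumptions: export_result is a hypothetical helper (not part of the original answer), the per-group results are toy data frames, and files go to tempdir() rather than a real project folder.

```r
# Hypothetical helper: write one result to "<name>.csv" inside `dir`
export_result <- function(result, name, dir = tempdir()) {
  file <- file.path(dir, paste0(name, ".csv"))
  write.csv(as.data.frame(result), file = file, row.names = FALSE)
  invisible(file)  # return the path so the caller can check it
}

# Toy per-group results standing in for the output of the modelling step
results_by_group <- list(
  P1_oak  = data.frame(Pred1 = c(0.5, -0.2)),
  P2_pine = data.frame(Pred1 = c(1.1, 0.3))
)

# Write one file per group, named after the list element
files <- mapply(export_result, results_by_group, names(results_by_group),
                MoreArgs = list(dir = tempdir()))

files
```

Since split() names its output by group, names(data_by_plot) could serve as the file names in the same way.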
I wrote a function that should return a temporary data frame, but when I apply this function to my old data frame, the temporary data frame is empty. How can I solve this?
I tried this code :
data_a <- as.data.frame(cbind(pop=c("a1","b2","c3","d4","d5"),
PA1=c(1,40,430,4330,43330),
PA2=c(2,50,530,5330,53330)))
perm_all <- function(dat,vname,loc1, loc2){
popu <- dat["vname"]
locci_1 <- sample(dat["loc1"], replace = F)
locci_2 <- sample(dat["loc2"], replace = F)
data_a_1 <- as.data.frame(cbind(popu, locci_1, locci_2))
return(data_a_1)
}
data_3 <- perm_all(dat= "data_a",vname="pop",loc1="PA1",loc2="PA2")
I've tried to convert the data_a with
data_a <- as.matrix(data_a)
and
popu <- sample(dat[,1], replace = F)
but they didn't work either.
Thanks :)
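There may be several issues. First, be aware that in R versions before 4.0.0 the data.frame family of functions converts strings to factors by default, which may not be what you want. Also note that cbind() on a character vector and numeric vectors builds a character matrix, so PA1 and PA2 are stored as text rather than numbers.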
Then, as @NURAIMIAZIMAH pointed out, your function needs the data frame itself, not its name as a string, so:
data_3 <- perm_all(dat= data_a,vname="pop",loc1="PA1",loc2="PA2")
is a good start.
Moreover, you pass values for vname, loc1 and loc2, but inside the function you use the literal strings "vname", "loc1" and "loc2" instead of the arguments, because the quotation marks were left in. Remove them:
perm_all <- function(dat,vname,loc1, loc2){
popu <- dat[vname]
locci_1 <- sample(dat[loc1], replace = F)
locci_2 <- sample(dat[loc2], replace = F)
data_a_1 <- as.data.frame(cbind(popu, locci_1, locci_2))
return(data_a_1)
}
Now your function should run, but maybe not in the way you would like, because there won't be any permutation in your data_3 table. If you look carefully, dat[loc1] returns a one-column data frame, not a vector, so sample() has only a single column to shuffle. You want a vector to permute, so subset the data frame like this: dat[,loc1].
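The difference between the two ways of subsetting can be checked on a small example. This is only a sketch with a made-up three-row data frame d; the seed is arbitrary and only there for reproducibility.

```r
d <- data.frame(pop = c("a1", "b2", "c3"), PA1 = c(1, 40, 430))

# Single-bracket indexing by name returns a one-column data frame
one_col_df <- d["PA1"]
is.data.frame(one_col_df)

# sample() shuffles the elements of its argument; the elements of a data
# frame are its columns, and there is only one, so the rows keep their order
identical(sample(one_col_df)$PA1, d$PA1)

# Indexing with [, name] returns the underlying vector, which sample() permutes
set.seed(1)  # arbitrary seed
shuffled <- sample(d[, "PA1"])
is.numeric(shuffled)
```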
This code below should do what you expect.
data_a <- as.data.frame(cbind(pop=c("a1","b2","c3","d4","d5"),
PA1=c(1,40,430,4330,43330),
PA2=c(2,50,530,5330,53330)))
perm_all <- function(dat,vname,loc1, loc2){
popu <- dat[vname]
locci_1 <- sample(dat[,loc1], replace = F)
locci_2 <- sample(dat[,loc2], replace = F)
data_a_1 <- as.data.frame(cbind(popu, locci_1, locci_2))
return(data_a_1)
}
data_3 <- perm_all(dat= data_a,vname="pop",loc1="PA1",loc2="PA2")
See you.
Let's say I have 5 datasets in a list (each named df_1, df_2, and so on), each with a variable called cons. I'd like to execute a function over cons in each dataset in the list, and create a new variable whose name has the suffix of the corresponding dataset.
So in the end df_1 will have a variable called something like cons_1 and df_2 will have a variable called cons_2. The problem I run into is the variable looping and trying to create dynamic names.
Any suggestions?
This is actually pretty straightforward:
df_names <- paste("df", 1:5, sep = "_")
cons_names <- paste("cons", 1:5, sep = "_")
for (i in 1:5) {
# get the df from the current env by name
df_i <- get(df_names[i])
# do whatever you need to do and assign the result
df_i[[cons_names[i]]] <- some_operation(df_i)
# write the modified copy back under its original name
assign(df_names[i], df_i)
}
But it would make more sense to keep your data frames in a list to avoid using get, which can be sketchy:
for (i in 1:5) {
df_list[[i]][[cons_names[i]]] <- some_operation(df_list[[i]])
}
Using the purrr package, this would be an alternative solution:
library(purrr)
lst <- list(mtcars_1 = mtcars,
mtcars_2 = mtcars,
mtcars_3 = mtcars,
mtcars_4 = mtcars,
mtcars_5 = mtcars)
map(seq_along(lst), function(x) {
lst[[x]][paste0("mpg_", x)] <- some_operation(lst[[x]]['mpg']); lst[[x]]
})
Subset each data frame from the list, create the new mpg variable with the index of the current data frame, and perform whatever operation you want on the mpg variable. The result is a list with all the previous data frames, each with its new variable.
Since this new list doesn't have the data frame names, you can always just add them with setNames(newlist, names(lst))
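For comparison, a base R version with Map() keeps the list names automatically, so no setNames() step is needed. This is a sketch under assumptions: some_operation is a placeholder that simply doubles its input, and the two small data frames are invented for the example.

```r
# Placeholder for the real computation (hypothetical)
some_operation <- function(x) x * 2

df_list <- list(df_1 = data.frame(cons = 1:3),
                df_2 = data.frame(cons = 4:6))

# Walk over the data frames and their positions together
df_list <- Map(function(df, i) {
  df[[paste0("cons_", i)]] <- some_operation(df$cons)
  df  # return the modified data frame
}, df_list, seq_along(df_list))

names(df_list$df_2)
```

Map() takes the names of its first list argument, so the result is still addressed as df_list$df_1, df_list$df_2, and so on.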
I made a loop that assigns the result of a function to a newly created variable. After that, that variable is used to create another one.
This second step fails to produce the expected result.
library(stringr)
for (i in 1:length(Ids)){
nam <- paste("data", Ids[i], sep = "_")
assign(nam, GetReportData(query, token,paginate_query = F))
newvar=paste(nam,"contentid",sep="$")
originStr=paste(nam,"pagePath",sep="$")
assign(newvar,str_extract(originStr,"&id=[0-9]+"))
}
Don't create a bunch of variables; store related values in a list to make it easier to retrieve them. You didn't supply any input to test with, but I'm guessing this does the same thing.
library(stringr)
mydata <- lapply(1:length(Ids), function(i) {
dd <- GetReportData(query, token,paginate_query = F)
dd$contentid <- str_extract(dd$pagePath,"&id=[0-9]+")
dd
})
This will return a list of data.frames. You can access them with mydata[[1]], mydata[[2]], etc rather than data_1, data_2, etc
If you absolutely insist on creating a bunch of variables, just make sure to do all your transformations on an actual object, and then save that object when you are done. You can never use assign with names that contain $ or [, as described in the help page: "assign does not dispatch assignment methods, so it cannot be used to set elements of vectors, names, attributes, etc." For example:
for(i in 1:length(Ids)) {
dd <- GetReportData(query, token,paginate_query = F)
dd$contentid <- str_extract(dd$pagePath,"&id=[0-9]+")
assign(paste("data",Ids[i],sep="_"), dd) # same naming as in the question
}
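To see what goes wrong in the original loop, it can help to try assign() with a name containing $ on a small object. The sketch below uses a made-up data_1 with two toy pagePath values; the extracted strings are typed in by hand rather than computed, so that the example needs no packages.

```r
data_1 <- data.frame(pagePath = c("/a?x=1&id=42", "/b"))

# assign() treats its first argument as a plain name, so this creates a new
# variable literally called `data_1$contentid`; data_1 itself is not modified
assign("data_1$contentid", c("&id=42", NA))

exists("data_1$contentid")
"contentid" %in% names(data_1)

# Working on the object itself and then assigning it behaves as intended
dd <- data_1
dd$contentid <- c("&id=42", NA)
assign("data_1", dd)

names(data_1)
```

The oddly named variable can still be reached with backticks or get("data_1$contentid"), but it has no connection to the data frame.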