R new variable assignment - r

I made a loop that assigns the result of a function to a newly created variable. After that, the new variable is used to create another one.
This second step fails to produce the expected result.
library(stringr)
for (i in 1:length(Ids)){
nam <- paste("data", Ids[i], sep = "_")
assign(nam, GetReportData(query, token,paginate_query = F))
newvar=paste(nam,"contentid",sep="$")
originStr=paste(nam,"pagePath",sep="$")
assign(newvar,str_extract(originStr,"&id=[0-9]+"))
}

Don't create a bunch of variables; store related values in a list so they are easier to retrieve. You didn't supply any input to test with, but I'm guessing this does the same thing.
library(stringr)
mydata <- lapply(seq_along(Ids), function(i) {
  # fetch the report, then derive the new column on the actual data frame
  dd <- GetReportData(query, token, paginate_query = FALSE)
  dd$contentid <- str_extract(dd$pagePath, "&id=[0-9]+")
  dd
})
This will return a list of data.frames. You can access them with mydata[[1]], mydata[[2]], etc rather than data_1, data_2, etc
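If you want to look the data frames up by name rather than by position, you can name the list elements. A minimal sketch, assuming made-up `Ids` and a hypothetical `fake_report()` helper standing in for `GetReportData()`, which the question does not show:

```r
# Hypothetical stand-ins: the real Ids and GetReportData() are not shown in the question.
Ids <- c("a1", "b2")
fake_report <- function(id) {
  data.frame(pagePath = paste0("/page?ref=", id, "&id=42"),
             stringsAsFactors = FALSE)
}

mydata <- lapply(Ids, fake_report)
names(mydata) <- paste("data", Ids, sep = "_")

# Retrieve one data frame by name instead of by position
mydata[["data_b2"]]$pagePath
```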
If you absolutely insist on creating a bunch of variables, make sure to do all your transformations on an actual object, and then save that object when you are done. You can never use assign with names that contain $ or [, as described in the help page: "assign does not dispatch assignment methods, so it cannot be used to set elements of vectors, names, attributes, etc." For example:
for (i in seq_along(Ids)) {
  # build the complete data frame first, then assign it once
  dd <- GetReportData(query, token, paginate_query = FALSE)
  dd$contentid <- str_extract(dd$pagePath, "&id=[0-9]+")
  assign(paste("data", i, sep = "_"), dd)
}
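To see what goes wrong in the original loop, here is a small sketch with a made-up one-row data frame: assign() treats the whole string as a variable name, so a name containing $ creates a new, oddly named variable instead of adding a column.

```r
dd <- data.frame(pagePath = "/page?x=1&id=42", stringsAsFactors = FALSE)

# assign() does not parse `$`; the full string becomes the variable name
assign("dd$contentid", "&id=42")

created_odd_variable <- exists("dd$contentid")  # a variable with `$` in its name now exists
column_was_added <- "contentid" %in% names(dd)  # the data frame itself is untouched
```

The same applies to the second paste() in the question: str_extract() there receives the literal string "data_..$pagePath", not the column.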

Related

Function does return empty data frame

my first question on Stack Overflow so bear with me ;-)
I wrote a function to row-bind all objects whose names meet a regex criterion into a dataframe.
Curiously, if I run the lines out of the function, it works perfectly. But within the function, an empty data frame is returned.
Reproducible example:
offers_2022_05 <- data.frame(x = 3)
offers_2022_06 <- data.frame(x = 6)
bind_multiple_dates <- function(prefix) {
objects <- ls(pattern = sprintf("%s_[0-9]{4}_[0-9]{2}", prefix))
data <- bind_rows(mget(objects, envir = .GlobalEnv), .id = "month")
return(data)
}
bind_multiple_dates("offers")
# A tibble: 0 × 0
However, this works:
prefix <- "offers"
objects <- ls(pattern = sprintf("%s_[0-9]{4}_[0-9]{2}", prefix))
data <- bind_rows(mget(objects, envir = .GlobalEnv), .id = "month")
data
month x
1 offers_2022_05 3
2 offers_2022_06 6
I suppose it has something to do with the environment, but I can't really figure it out. Is there a better way to do this? I would like to keep the code as a function.
Thanks in advance :-)
By default, ls() looks for variables in the current environment. Here the current environment is the function body, and the data frames are not defined inside the function's scope. You can point ls() at the calling environment explicitly with the envir= parameter. For example:
bind_multiple_dates <- function(prefix) {
  # look up and fetch the objects in the same (calling) environment
  objects <- ls(pattern = sprintf("%s_[0-9]{4}_[0-9]{2}", prefix), envir = parent.frame())
  data <- bind_rows(mget(objects, envir = parent.frame()), .id = "month")
  return(data)
}
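The difference between the two lookups can be checked in isolation. The sketch below uses hypothetical helper names and wraps everything in a function, so it does not depend on what is in your global environment:

```r
demo_lookup <- function() {
  offers_2022_05 <- data.frame(x = 3)

  # ls() with no envir lists the variables local to the function that calls it
  inner_default <- function() ls(pattern = "^offers_")

  # parent.frame() points ls() at the environment the function was called from
  inner_parent <- function() ls(pattern = "^offers_", envir = parent.frame())

  list(default = inner_default(), parent = inner_parent())
}

res <- demo_lookup()
res$default  # empty: inner_default() has no local variables
res$parent   # finds offers_2022_05 in the caller
```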
The "better" way to do this is to not create a bunch of separate variables like offers_2022_05 and offers_2022_06 in the first place. Variables should not have data or indexes in their names. It would be better to create the data frames in a list directly from the beginning. Often this is easily accomplished with a call to lapply or purrr::map. See this existing question for more info.
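As a rough sketch of the list-based layout, using the two toy data frames from the question: the month lives in the list names rather than in variable names. This version uses base R's do.call(rbind, ...) as a stand-in for bind_rows(..., .id = "month") so that it runs without dplyr.

```r
# The month is stored as the list name, not in a variable name
offers <- list(
  "2022_05" = data.frame(x = 3),
  "2022_06" = data.frame(x = 6)
)

# Add the list name as a column, then stack the pieces
combined <- do.call(
  rbind,
  Map(function(d, m) cbind(month = m, d), offers, names(offers))
)
rownames(combined) <- NULL
combined
```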

R function used to rename columns of a data frames

I have a data frame, say acs10. I need to relabel the columns. To do so, I created another data frame, named as labelName with two columns: The first column contains the old column names, and the second column contains names I want to use, like the table below:
column_1    column_2
oldLabel1   newLabel1
oldLabel2   newLabel2
Then, I wrote a for loop to change the column names:
for (i in seq_len(nrow(labelName))) {
  names(acs10)[names(acs10) == labelName[i, 1]] <- labelName[i, 2]
}
This works.
However, when I tried to put the for loop into a function, because I need to rename column names for other data frames as well, the function failed. The function I wrote looks like below:
renameDF <- function(dataF,varName){
for (i in seq_len(nrow(varName))){
names(dataF)[names(dataF) == varName[i,1]] <- varName[i,2]
print(varName[i,1])
print(varName[i,2])
print(names(dataF))
}
}
renameDF(acs10, labelName)
where dataF is the data frame whose names I need to change, and varName is another data frame where old variable names and new variable names are paired. I used print(names(dataF)) to debug, and the printout suggests that the function works. However, calling the function does not actually change the column names. I suspect it has something to do with scope, but I want to know how to make it work.
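In your function you need to return the changed data frame. The function modifies its own copy of dataF, so the caller's data frame stays unchanged unless you assign the returned value back to it.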
renameDF <- function(dataF,varName){
for (i in seq_len(nrow(varName))){
names(dataF)[names(dataF) == varName[i,1]] <- varName[i,2]
}
return(dataF)
}
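A short usage sketch with made-up data (the contents of acs10 and labelName here are assumptions for illustration) shows why the result has to be assigned back:

```r
acs10 <- data.frame(oldLabel1 = 1, oldLabel2 = 2, other = 3)
labelName <- data.frame(
  column_1 = c("oldLabel1", "oldLabel2"),
  column_2 = c("newLabel1", "newLabel2"),
  stringsAsFactors = FALSE
)

renameDF <- function(dataF, varName) {
  for (i in seq_len(nrow(varName))) {
    names(dataF)[names(dataF) == varName[i, 1]] <- varName[i, 2]
  }
  return(dataF)
}

renamed <- renameDF(acs10, labelName)  # a renamed copy
names_before <- names(acs10)           # acs10 itself is still unchanged here

acs10 <- renameDF(acs10, labelName)    # overwrite the original with the result
names(acs10)
```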
You can also simplify this and avoid the for loop by using match:
renameDF <- function(dataF,varName){
names(dataF) <- varName[[2]][match(names(dataF), varName[[1]])]
return(dataF)
}
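One caveat with the match() version: match() returns NA for any column name that is not listed in varName, so such columns would end up with NA names. A possible guard, sketched with made-up names, keeps the old name wherever there is no match:

```r
old_names <- c("oldLabel1", "oldLabel2", "oldLabel3")
dict_old  <- c("oldLabel1", "oldLabel2")
dict_new  <- c("newLabel1", "newLabel2")

idx <- match(old_names, dict_old)  # NA where a name is missing from the dictionary

# Fall back to the existing name when there is no dictionary entry
new_names <- ifelse(is.na(idx), old_names, dict_new[idx])
new_names
```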
This should do the whole thing in one line.
colnames(acs10)[colnames(acs10) %in% labelName$column_1] <- labelName$column_2[match(colnames(acs10)[colnames(acs10) %in% labelName$column_1], labelName$column_1)]
This also works when a column name isn't in the data dictionary, but it's a bit more convoluted:
library(tibble)
df <- tribble(~column_1,~column_2,
"oldLabel1", "newLabel1",
"oldLabel2", "newLabel2")
d <- tibble(oldLabel1 = NA, oldLabel2 = NA, oldLabel3 = NA)
fun <- function(dat, dict) {
names(dat) <- sapply(names(dat), function(x) ifelse(x %in% dict$column_1, dict[dict$column_1 == x,]$column_2, x))
dat
}
fun(d, df)
You can create a function containing just one line of code.
renameDF <- function(df, varName){
setNames(df,varName[[2]][pmatch(names(df),varName[[1]])])
}

Refer to a variable by pasting strings then make changes and see them reflected in the original variable

my_mtcars_1 <- mtcars
my_mtcars_2 <- mtcars
my_mtcars_3 <- mtcars
for(i in 1:3) {get(paste0('my_mtcars_', i))$blah <- 1}
Error in get(paste0("my_mtcars_", i))$blah <- 1 :
target of assignment expands to non-language object
I would like each of my 3 data frames to have a new field called blah that has a value of 1.
How can I iterate over a range of numbers in a loop and refer to DFs by name by pasting the variable name into a string and then edit the df in this way?
These three options all assume you want to modify the data frames and keep them in the environment.
So, if they must be data frames (in your environment, modified in a loop), you could do something like this:
for(i in 1:3) {
obj_name = paste0('my_mtcars_', i)
obj = get(obj_name)
obj$blah = 1
assign(obj_name, obj, envir = .GlobalEnv) # Send back to global environment
}
I agree with @Duck that a list is a better format (and preferred to the above loop). So, if you use a list and need the results in your environment, use what Duck suggested together with list2env() and send everything back to the .GlobalEnv, i.e. (in one ugly line):
list2env(lapply(mget(ls(pattern = "my_mtcars_")), function(x) {x[["blah"]] = 1; x}), .GlobalEnv)
Or, if you are amenable to working with data.table, you could use the set() function to add columns:
library(data.table)
# assuming my_mtcars_* is already a data.table
for(i in 1:3) {
set(get(paste0('my_mtcars_', i)), NULL, "blah", 1)
}
As a suggestion, it is better to manage the data inside a list and use lapply() instead of a loop:
#List
List <- list(my_mtcars_1 = mtcars,
my_mtcars_2 = mtcars,
my_mtcars_3 = mtcars)
#Variable
List2 <- lapply(List,function(x) {x$blah <- 1;return(x)})
It is also easy to collect existing data frames into a list with code like this:
#List
List <- mget(ls(pattern = 'my_mt'))
So there is no need to define each dataset individually.
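Putting these pieces together, here is a sketch of the full round trip: collect the data frames with mget(), modify them with lapply(), and write them back with list2env(). It uses a scratch environment (the name `env` is an assumption for illustration) instead of .GlobalEnv, so it can be run without touching your workspace.

```r
# Scratch environment standing in for the global environment
env <- new.env()
env$my_mtcars_1 <- mtcars
env$my_mtcars_2 <- mtcars

# Collect by name pattern, add the column, and send the results back
frames <- mget(ls(envir = env, pattern = "^my_mtcars_[0-9]+$"), envir = env)
frames <- lapply(frames, function(x) { x$blah <- 1; x })
list2env(frames, envir = env)

head(env$my_mtcars_1$blah)
```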
We can use tidyverse
library(dplyr)
library(purrr)
map(mget(ls(pattern = '^my_mtcars_\\d+$')), ~ .x %>%
mutate(blah = 1)) %>%
list2env(.GlobalEnv)

Getting Looped Output into an Appended Object

I am trying to make a basic sensitivity analysis script. The outputs come out as I want via the print I added to the end of the script. The issue is that I would like a tibble or other object with all the outputs appended together, which I can export as a csv or xlsx.
I created two functions: sens_analysis, which runs all the code, and multiply_across, which multiplies each column of the table by each possible percentage. You need multiply_across to run sens_analysis.
I would normally like a title, but instead I added an indicator column that I can sort by.
I made everything with mtcars so it should be easy to replicate. The issue is that I just have a huge print at the end, not an object that I can manipulate or pull from for other analysis.
I have tried rbind, bind_rows, and appending rows in a variety of ways, as well as building a new object. As you can see in the code at line (18), I make something called output that I have tried to populate, which hasn't gone well.
rm(list = ls())
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)
library(magrittr)
library(xtable)
data<-mtcars
percent<-c(.05,.1,.15)
goods<-c("hp","gear","wt")
weight<-c(6,7,8)
disagg<-"cyl"
func<-median
sens_analysis<-function(data=data, goods=goods, weight=weight, disagg=disagg, precent=percent, func=func){
output<-NULL%>%
as.tibble()
basket<-(rbind(goods,weight))
percent<-c(0,percent,(percent*-1))
percent_to_1<-percent+1
data_select<-data%>%
dplyr::select(c(goods,disagg))%>%
group_by_at(disagg)%>%
summarise_at(.vars = goods ,.funs = func)%>%
as_tibble()
data_select_weight<-purrr::map2(data_select[,-1], as.numeric(basket[2,]),function(var, weight){
var*weight
})%>% as_tibble %>%
add_column(data_select[,1], .before = 1)
colnames(data_select_weight)[1]<-disagg
multiply_across(data_select_weight,percent_to_1)
return(output)
#output2<-rbind(output2,output)
}
############################
multiply_across<-function(data=data_select_weight,list=percent_to_1){
varlist<-names(data[,-1])
for(i in varlist){
df1 = data[,i]
for(j in list){
df<-data
df[,i]<-round(df1*j,2)
df<-mutate(df, total = round(rowSums(df[,-1]),2))%>%
mutate(type=paste0(i," BY ",(as.numeric(j)-1)*100,"% OVER ",disagg))%>%
print(df)
#output<-bind_rows(output,df)
#output<-bind_rows(output,df)
#output[[j]]<-df[[j]]
}
}
}
##############################################################################################
sens_analysis(data,goods,weight,disagg,percent,func)
If you run the code as is, the result is just a bunch of printed tibbles that aren't stored in an object. Ideally, for future analysis of the data or ease of use, a table of the outputs appended together would be best.
I figured it out and will add my answer here in case someone else hits this issue.
I created a list within the loops and then bound its elements together.
The key part is binding the rows after the for loops have finished.
multiply_across<-function(data=data_select_weight,
list=percent_to_1){
varlist <- colnames(data[, -1])
output_list <- list()
for (i in varlist) {
df1 <- data[,i]
for (j in list) {
name <- paste0(i, " BY ", (as.numeric(j)-1)*100, "% OVER ", disagg)
df <- as_tibble(data)
df[,i] <- round(df1*j, 2)
df <- mutate(df, total = round(rowSums(df[,-1]),2))%>%
mutate(type = paste0(i, " BY ", (as.numeric(j)-1)*100, "% OVER ", disagg))
df<-df[,c(6,1,2,3,4,5)]
output_list[[paste0(i," BY ",(as.numeric(j)-1)*100)]] <- (assign(paste0(i," BY ",(as.numeric(j)-1)*100,"% OVER ",disagg),df))
}
}
bind_rows(lapply(output_list,
as.data.frame.list,
stringsAsFactors=F))
}
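Stripped of the sensitivity-analysis details, the pattern is: store each iteration's data frame in a list, then bind once after the loop. A minimal sketch with made-up numbers, using base R's do.call(rbind, ...) in place of bind_rows():

```r
base_table <- data.frame(cyl = c(4, 6), hp = c(100, 150))
factors <- c(0.95, 1, 1.05)

# Pre-allocate one slot per iteration
results <- vector("list", length(factors))

for (k in seq_along(factors)) {
  df <- base_table
  df$hp <- round(df$hp * factors[k], 2)
  df$type <- paste0("hp BY ", round((factors[k] - 1) * 100), "%")
  results[[k]] <- df  # store, don't print
}

# Bind once, outside the loop
combined <- do.call(rbind, results)
```

Binding once at the end also avoids repeatedly copying a growing data frame inside the loop.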

R, creating variables on the fly in a list using assign statement

I want to create variable names on the fly inside a list and assign them values in R, but I am unable to get the desired result. Here is the logic of my code:
Upon the function call dat_in <- readf(1,2), an input file is read based on a product and a site. After reading, a particular column (the 13th here) is assigned to a variable aot500. I want the function to return this variable for each combination of product and site. For example, I need the names in the list statement to be aot500.AF, aot500.CM, aot500.RB. I am having trouble with the return statement: there is no error, but there is nothing in dat_in, whereas I expect it to have dat_in$aot500.AF etc. What is wrong in the return statement? Furthermore, I want to read the files for all combinations in a single call to the function, say using a for loop, and I wonder how the return statement would handle a list of more variables.
prod <- c('inv','tot')
site <- c('AF','CM','RB')
readf <- function(pp, kk) {
fname.dsa <- paste("../data/site_data_",prod[pp],"/daily_",site[kk],".dat",sep="")
inp.aod <- read.csv(fname.dsa,skip=4,sep=",",stringsAsFactors=F,na.strings="N/A")
aot500 <- inp.aod[,13]
return(list(assign(paste("aot500",siteabbr[kk],sep="."),aot500)))
}
There is almost never a need to use assign(). We can solve the problem in two steps: read the files into a list, then give the list elements names.
(Not tested as we don't have your files)
prod <- c('inv', 'tot')
site <- c('AF', 'CM', 'RB')
# get combo of site and prod
prod_site <- expand.grid(prod, site)
colnames(prod_site) <- c("prod", "site")
# Step 1: read the files into a list
res <- lapply(1:nrow(prod_site), function(i){
fname.dsa <- paste0("../data/site_data_",
prod_site[i, "prod"],
"/daily_",
prod_site[i, "site"],
".dat")
inp.aod <- read.csv(fname.dsa,
skip = 4,
stringsAsFactors = FALSE,
na.strings = "N/A")
inp.aod[, 13]
})
# Step 2: assign names to a list
names(res) <- paste("aot500", prod_site$prod, prod_site$site, sep = ".")
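To check how the names line up with the list elements without the real files, here is a sketch in which the file reading is replaced by a placeholder value (the `i * 10` is purely illustrative). expand.grid() varies its first argument fastest, and the names are built from the same rows, so they stay in step with the elements.

```r
prod_site <- expand.grid(prod = c("inv", "tot"),
                         site = c("AF", "CM", "RB"),
                         stringsAsFactors = FALSE)

# Placeholder for reading column 13 of each file
res <- lapply(seq_len(nrow(prod_site)), function(i) i * 10)

names(res) <- paste("aot500", prod_site$prod, prod_site$site, sep = ".")

names(res)
res[["aot500.tot.CM"]]  # element from the 4th row of prod_site
```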
I propose two answers, one based on dplyr and one based on base R.
You'll probably have to adapt the filename in the readAOT_500 function to your particular case.
Base R answer
#' Function that reads AOT_500 from the given product and site file
#' @param prodsite character vector containing 2 elements
#' name of a product and name of a site
readAOT_500 <- function(prodsite,
selectedcolumn = c("AOT_500"),
path = tempdir()){
cat(path, prodsite)
filename <- paste0(path, prodsite[1],
prodsite[2], ".csv")
dtf <- read.csv(filename, stringsAsFactors = FALSE)
dtf <- dtf[selectedcolumn]
dtf$prod <- prodsite[1]
dtf$site <- prodsite[2]
return(dtf)
}
# Load one file for example
readAOT_500(c("inv", "AF"))
listofsites <- list(c("inv","AF"),
c("tot","AF"),
c("inv", "CM"),
c( "tot", "CM"),
c("inv", "RB"),
c("tot", "RB"))
# Load all files in a list of data frames
prodsitedata <- lapply(listofsites, readAOT_500)
# Combine all data frames together
prodsitedata <- Reduce(rbind,prodsitedata)
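One detail worth knowing when adapting the filename: paste0(path, ...) joins tempdir() and the file name with no separator, so the files sit next to the temp directory rather than inside it. The dplyr section writes its fake files the same way, so the answer is consistent as written. If you prefer files inside the directory, a sketch with file.path() and made-up data could look like this:

```r
path <- tempdir()

# file.path() inserts the separator between directory and file name
filename <- file.path(path, paste0("inv", "AF", ".csv"))

# Made-up data, written and read back to confirm the round trip
write.csv(data.frame(AOT_500 = c(0.1, 0.2)), filename, row.names = FALSE)
dtf <- read.csv(filename, stringsAsFactors = FALSE)
dtf
```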
dplyr answer
I use Hadley Wickham's packages to clean data.
library(dplyr)
library(tidyr)
daily_CM <- read.csv("~/downloads/daily_CM.dat",skip=4,sep=",",stringsAsFactors=F,na.strings="N/A")
# Generate all combinations of product and site.
prodsite <- expand.grid(prod = c('inv','tot'),
site = c('AF','CM','RB')) %>%
# Group variables to use do() later on
group_by(prod, site)
Create 6 fake files by sampling from the data you provided
You can skip this section when you have real data.
I used various sample length so that the number of observations
differs for each site.
prodsite$samplelength <- sample(1:495,nrow(prodsite))
prodsite %>%
do(stuff = write.csv(sample_n(daily_CM,.$samplelength),
paste0(tempdir(),.$prod,.$site,".csv")))
Read many files using dplyr::do()
prodsitedata <- prodsite %>%
do(read.csv(paste0(tempdir(),.$prod,.$site,".csv"),
stringsAsFactors = FALSE))
# Select only the columns you are interested in
prodsitedata2 <- prodsitedata %>%
select(prod, site, AOT_500)
