A for() loop to overwrite existing data.frames

I'm testing some operations on the classic '99 Czech bank data set, trying to run a tidyverse task on several data.frames in my global environment, but the loop I've created keeps overwriting the same object val, which was supposed to be a stand-in for the data frames themselves:
x <- c("loans93","loans94","loans95","loans96",
"loans97","loans98")
x <- base::mget(x, envir=as.environment(-1), mode= "any", inherits=F)
for (val in x) {
  val <- val %>%
    select(account_id, district_id, balance, status, date) %>%
    group_by(account_id, district_id, status, date) %>%
    summarise(balance = mean(balance, na.rm = TRUE)) %>%
    ungroup()
}
What am I doing wrong? I've searched for similar questions, but people keep answering with lapply solutions; I just need the result saved back into my data frames instead of into this "val" object I keep getting.

You could try this. Assigning to the loop variable val only changes a local copy, so the original data frames are never touched; instead, fetch each data frame by name with get() and write the result back by name with assign():
df_names <- c("loans93", "loans94", "loans95",
              "loans96", "loans97", "loans98")
for (df_name in df_names) {
  get(df_name) %>%
    head %>% ## replace with desired manipulations
    assign(value = .,
           x = paste0(df_name, '_manipulated'), ## or just: df_name to overwrite the original
           envir = globalenv())
}
Aside: list2env is handy to "spawn" list members (e.g. data frames) into the environment. Example:
list_of_dataframes <- list(
  iris_short = head(iris),
  cars_short = head(cars)
)
list2env(x = list_of_dataframes, envir = globalenv()) # envir is required
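Putting the two ideas together, here is a minimal sketch of the list-based round trip for the question above (it assumes dplyr is loaded and the loans93 to loans98 data frames exist in the global environment):
library(dplyr)

df_names <- c("loans93", "loans94", "loans95",
              "loans96", "loans97", "loans98")

# collect the data frames into a named list, transform each one,
# then write them back under their original names
mget(df_names, envir = globalenv()) %>%
  lapply(function(df) {
    df %>%
      select(account_id, district_id, balance, status, date) %>%
      group_by(account_id, district_id, status, date) %>%
      summarise(balance = mean(balance, na.rm = TRUE)) %>%
      ungroup()
  }) %>%
  list2env(envir = globalenv())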

Create a list of all dataframes/tibbles in the global environment?

How does one create a named list of all dataframes/tibbles in the global environment in R? Is there a way to do this without manually hardcoding all dataframes/tibbles?
I.e. if the global environment contains the dataframes/tibbles df_1, my_data_1, science_1, all_data, how does one create an output that looks like:
files_list <- list(
  df_1 = df_1,
  my_data_1 = my_data_1,
  science_1 = science_1,
  all_data = all_data
)
We may Filter the elements that are a data.frame or tibble in the environment we are working in. E.g., in the global env, it can be
Filter(length, eapply(.GlobalEnv,
  function(x) if (is.data.frame(x) || tibble::is_tibble(x)) x))
We can get all objects first, then keep only the data.frames
library(purrr)
mget(ls()) %>% keep(is.data.frame)
A base R way, combining the methods of @GuedesBF and @akrun, could be using ls, mget and Filter. Since a tibble also inherits from data.frame, is.data.frame covers both.
Filter(is.data.frame, mget(ls()))
#Filter(is.data.frame, mget(ls(.GlobalEnv))) # more explicit, using the global env
Please try the code below, which will generate a data frame:
library(dplyr)
library(tibble)
library(tidyr)

naml <- list()
for (i in seq_along(ls(envir = .GlobalEnv))) {
  j <- ls(envir = .GlobalEnv)[i]
  if (any(class(get(j)) == 'data.frame')) name <- j else name <- NA
  if (any(class(get(j)) == 'data.frame')) class <- class(get(j))[3] else class <- NA
  if (!is.na(name) & !is.na(class)) {
    df <- data.frame(namex = name, classx = class)
    naml[[j]] <- df
  }
}
df2 <- do.call(rbind, naml) %>%
  rownames_to_column('name') %>%
  pivot_wider(names_from = name, values_from = namex)

Problem with mutate keyword and functions in R

I have a problem with the use of mutate(); please check the next code block.
output1 <- mytibble %>%
  mutate(newfield = FND(mytibble$ndoc))
output1
where the FND function is a filter applied to a large tibble (5 GB):
FND <- function(n){
  result <- LARGETIBBLE %>% filter(LARGETIBBLE$id == n)
  return(paste(unique(result$somefield), collapse = " "))
}
I want to execute the FND function for each row of the output1 tibble, but it only executes once.
Never use $ in dplyr pipes; it is very rarely needed there. You can change your FND function to:
library(dplyr)
FND <- function(n){
  LARGETIBBLE %>% filter(id == n) %>% pull(somefield) %>%
    unique %>% paste(collapse = " ")
}
Now apply this function to every ndoc value in mytibble.
mytibble %>% mutate(newfield = purrr::map_chr(ndoc, FND))
You can also use sapply :
mytibble$newfield <- sapply(mytibble$ndoc, FND)
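A type-stable base alternative (a suggestion beyond the original answer) is vapply, which guarantees a character result:
mytibble$newfield <- vapply(mytibble$ndoc, FND, character(1))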
FND(mytibble$ndoc) is syntax better suited to plain data frames. When you use functions such as mutate on a tibble, there is no need to specify the name of the tibble, only that of the column; the %>% pipe already makes sure that only data from the tibble is used. Thus your example would be:
output1 <- mytibble %>%
  mutate(newfield = FND(ndoc))
FND <- function(n){
  result <- LARGETIBBLE %>% filter(id == n)
  return(paste(unique(result$somefield), collapse = " "))
}
That is the theory; however, I do not know whether your FND function will work in practice. Try it, and if it does not, give a practical example with data and what you are trying to achieve.
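To make the first answer concrete, here is a toy example with made-up data (this LARGETIBBLE and mytibble are invented stand-ins, not the asker's real data):
library(dplyr)
library(purrr)

LARGETIBBLE <- tibble(id = c(1, 1, 2, 2),
                      somefield = c("a", "b", "a", "c"))
mytibble <- tibble(ndoc = c(1, 2))

FND <- function(n) {
  LARGETIBBLE %>% filter(id == n) %>% pull(somefield) %>%
    unique() %>% paste(collapse = " ")
}

mytibble %>% mutate(newfield = map_chr(ndoc, FND))
# ndoc = 1 yields "a b", ndoc = 2 yields "a c"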

How to properly parse (?) mdsets in expss within a loop?

I'm new to R and don't know all the basic concepts yet. The task is to produce one merged table from multiple response sets. I am trying to do this using the expss library and a loop.
This is the code in R without a loop (works fine):
#libraries
#blah, blah...
#path
df.path = "C:/dataset.sav"
#dataset load
df = read_sav(df.path)
#table
table_undropped1 = df %>%
  tab_cells(mdset(q20s1i1 %to% q20s1i8)) %>%
  tab_total_row_position("none") %>%
  tab_stat_cpct() %>%
  tab_pivot()
There are 10 multiple response sets, so I need to create 10 tables in the manner shown above; then I transpose those tables and merge them. To simplify the code (and learn something new) I decided to produce the tables using a loop. However, nothing works. I looked for a solution, and I think the closest to correct is:
#this generates a message: '1' not found
for(i in 1:10) {
  assign(paste0("table_undropped",i),1) = df %>%
    tab_cells(mdset(assign(paste0("q20s",i,"i1"),1) %to% assign(paste0("q20s",i,"i8"),1)))
    tab_total_row_position("none") %>%
    tab_stat_cpct() %>%
    tab_pivot()
}
Still, it causes the error described in the comment above the code.
Alternatively, an SPSS macro for this would be (shown only to better express the problem, because I have to avoid SPSS):
define macro1 (x = !tokens (1)
/y = !tokens (1))
!do !i = !x !to !y.
mrsets
/mdgroup name = !concat($SET_,!i)
variables = !concat("q20s",!i,"i1") to !concat("q20s",!i,"i8")
value = 1.
ctables
/table !concat($SET_,!i) [colpct.responses.count pct40.0].
!doend
!enddefine.
*** MACRO CALL.
macro1 x = 1 y = 10.
In other words I am looking for a working substitute of !concat() in R.
%to% is not suited for parametric variable selection. There is a set of special functions for parametric variable selection and assignment. One of them is mdset_t:
for(i in 1:10) {
  table_name = paste0("table_undropped", i)
  ..$table_name = df %>%
    tab_cells(mdset_t("q20s{i}i{1:8}")) %>% # expressions in the curly brackets will be evaluated and substituted
    tab_total_row_position("none") %>%
    tab_stat_cpct() %>%
    tab_pivot()
}
However, it is not good practice to store all the tables as separate variables in the global environment. A better approach is to save all the tables in a list:
all_tables = lapply(1:10, function(i)
  df %>%
    tab_cells(mdset_t("q20s{i}i{1:8}")) %>%
    tab_total_row_position("none") %>%
    tab_stat_cpct() %>%
    tab_pivot()
)
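If you need to refer to individual tables later, you can name the list elements afterwards, e.g.:
names(all_tables) <- paste0("table_undropped", 1:10)
all_tables[["table_undropped3"]]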
UPDATE.
Generally speaking, there is no need to merge. You can do all your work with tab_*:
my_big_table = df %>%
  tab_total_row_position("none")
for(i in 1:10) {
  my_big_table = my_big_table %>%
    tab_cells(mdset_t("q20s{i}i{1:8}")) %>% # expressions in the curly brackets will be evaluated and substituted
    tab_stat_cpct()
}
my_big_table = my_big_table %>%
  tab_pivot(stat_position = "inside_columns") # here we say that we need to combine subtables horizontally

How to make this work without global variable in R?

I am parsing some JSON metadata files into similar data frames. I am using tidyjson. I finally made it work like this:
append_arrays_and_objects <- function (tbl) {
  objs <- tbl %>%
    filter(is_json_object(.)) %>% gather_object %>%
    append_values_string
  arr <- tbl %>%
    filter(is_json_array(.)) %>% gather_array %>%
    append_values_string
  if (nrow(objs) > 0) append_arrays_and_objects(objs)
  if (nrow(arr) > 0) append_arrays_and_objects(arr)
  print(objs)
  print(arr)
  res1 <- merge(objs, arr, all = TRUE)
  result <<- merge(result, res1, all = TRUE)
  result
}

#parse microdata
result <- data.frame()
md <- dataHighest$JSON %>%
  enter_object(microdata) %>%
  append_arrays_and_objects
rm(result)
It just bothers me that I can't make it work without the global data frame result. When I tried returning any combination of local data frames, I always ended up with a data frame containing only the "first level" of the nesting. I think that once all the data has been collected, I can no longer pass it back up the recursion. It should be trivial to solve?
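The likely culprit is that the results of the two recursive calls are discarded: the if lines call the function but ignore its return value, which is why a global accumulator seems necessary. A minimal sketch of a version without the global, assuming the tidyjson verbs behave as in the code above, folds each recursive result into a local value and returns it:
append_arrays_and_objects <- function(tbl) {
  objs <- tbl %>%
    filter(is_json_object(.)) %>% gather_object %>%
    append_values_string
  arr <- tbl %>%
    filter(is_json_array(.)) %>% gather_array %>%
    append_values_string
  # merge this level first, then fold in what the recursive calls return
  res <- merge(objs, arr, all = TRUE)
  if (nrow(objs) > 0) res <- merge(res, append_arrays_and_objects(objs), all = TRUE)
  if (nrow(arr) > 0) res <- merge(res, append_arrays_and_objects(arr), all = TRUE)
  res
}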

Which environment should be called when using eval( ) in a function?

I've got a set of functions that I'm trying to work with and I'm struggling to figure out why the assignment isn't working. Here are the functions I'm using:
new_timeline <- function() {
  timeline = structure(list(), class = "timeline")
  timeline$title <- list("text" = list("headline" = NULL, "text" = NULL),
                         "start_date" = list("year" = NULL, "month" = NULL, "day" = NULL),
                         "end_date" = list("year" = NULL, "month" = NULL, "day" = NULL))
  return(timeline)
}

.add_date <- function(self, date, time_type) {
  valid_date <- stringr::str_detect(date, "^[0-9]{4}(-[0-9]{1,2}){0,2}$")
  if (!valid_date) {
    stringr::str_interp("Your ${time_type} date does not appear to be formatted correctly. It must be of the form 'yyyy-mm-dd'. Only the year is required.") %>% stop()
  }
  date_elements <- date %>% as.character() %>% stringr::str_split(" ") %>% unlist()
  date <- date_elements[1] %>% stringr::str_split("-") %>% unlist()
  stringr::str_interp("self$title$${time_type}_date$year <- date[1]") %>% parse(text = .) %>% eval()
  if (!is.na(date[2])) stringr::str_interp("self$title$${time_type}_date$month <- date[2]") %>% parse(text = .) %>% eval()
  if (!is.na(date[3])) stringr::str_interp("self$title$${time_type}_date$day <- date[3]") %>% parse(text = .) %>% eval()
  return(self)
}

edit_title <- function(self, headline = NULL, text = NULL, start_date = NULL, end_date = NULL) {
  if (class(self) != "timeline") stop("The object passed must be a timeline object.")
  if (is.null(headline) && is.null(self$title$text$headline)) stop("Headline cannot be empty when adding a new title.")
  if (!is.null(headline)) self$title$text$headline <- headline
  if (!is.null(text)) self$title$text$text <- text
  if (!is.null(start_date)) self <- .add_date(self, date = start_date, time_type = "start")
  if (!is.null(end_date)) self <- .add_date(self, date = end_date, time_type = "end")
  return(self)
}
EDIT: The above code has been severely reduced per a request in the comments. The code is still sufficient to reproduce the error.
I know that's a bit long-winded, so I apologize. The first function establishes a new timeline object. The third function allows us to change the title of the timeline object and the second function is a helper function that handles dates. The code would be used like this:
library(magrittr)
#devtools::install_github("hadley/stringr")
library(stringr)
tl <- new_timeline()
tl <- tl %>% edit_title(headline = "My Timeline", text = "Example", start_date = "2015-10-18")
The code runs with no errors, but when I call tl$title$start_date$year, it comes back as NULL. Using an answer I got to a previous question of mine, I tried to set envir = globalenv() within the eval function. When I do that, the function returns an error saying that the object self cannot be found.
So I'm under the impression that self is held in the parent.frame(), and I added both to a list: envir = list(globalenv(), parent.frame()). This causes the function to run without error, but there's still no assignment.
Where am I going wrong? Thanks in advance!
As mentioned in the comments, I think you could probably do away with all of the code parsing and just use variables with [[ for your assignments (see the sketch after the code below). Anyway, when you use the pipe operator a bunch of function wrapping happens, so determining how many frames to go back is painful. Here are a couple of solutions modifying the .add_date function.
You already found one, using <<-, since it searches back through the parent environments until it finds the variable (or doesn't, and assigns it in the global environment).
Another would be just storing the function environment() and passing that to eval.
A third would be counting how many frames deep you go, and using sys.frame to tell eval which environment to look in.
.add_date <- function(self, date, time_type) {
  valid_date <- stringr::str_detect(date, "^[0-9]{4}(-[0-9]{1,2}){0,2}$")
  if (!valid_date) {
    stringr::str_interp("Your ${time_type} date does not appear to be formatted correctly. It must be of the form 'yyyy-mm-dd'. Only the year is required.") %>% stop()
  }
  ## Examining environments
  e <- environment()     # current env
  efirst <- sys.nframe() # frame number
  print(paste("Currently in frame", efirst))
  envs <- stringr::str_interp("${date}") %>% parse(text = .) %>% {.; sys.frames()} # list of frames
  elast <- stringr::str_interp("${date}") %>% parse(text = .) %>% {.; sys.nframe()} # number of the last frame
  print(paste("Went", elast, "frames deep."))
  ## Go back this many frames in eval
  goback <- efirst - elast
  date_elements <- date %>% as.character() %>% stringr::str_split(" ") %>% unlist()
  date <- date_elements[1] %>% stringr::str_split("-") %>% unlist()
  ## Solution 1: use sys.frame
  stringr::str_interp("self$title$${time_type}_date$year <- date[1]") %>%
    parse(text = .) %>% eval(envir = sys.frame(goback))
  ## Solution 2: use the environment defined in the function
  if (!is.na(date[2])) stringr::str_interp("self$title$${time_type}_date$month <- date[2]") %>%
    parse(text = .) %>% eval(envir = e)
  return(self)
}
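For completeness, a minimal sketch of the [[-indexing approach mentioned at the top of this answer, which avoids parse()/eval() entirely (date validation omitted for brevity):
.add_date <- function(self, date, time_type) {
  date <- date %>% as.character() %>% stringr::str_split("-") %>% unlist()
  slot <- paste0(time_type, "_date") # "start_date" or "end_date"
  self$title[[slot]]$year <- date[1]
  if (!is.na(date[2])) self$title[[slot]]$month <- date[2]
  if (!is.na(date[3])) self$title[[slot]]$day <- date[3]
  return(self)
}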
