How to make this work without a global variable in R?

I am parsing JSON files containing metadata into similarly structured data frames, using tidyjson. I finally made it work like this:
append_arrays_and_objects <- function (tbl) {
objs <- tbl %>%
filter(is_json_object(.)) %>% gather_object %>%
append_values_string
arr <- tbl %>%
filter(is_json_array(.)) %>% gather_array %>%
append_values_string
if (nrow(objs) > 0) append_arrays_and_objects(objs)
if (nrow(arr) > 0) append_arrays_and_objects(arr)
print(objs)
print(arr)
res1 <- merge(objs,arr, all=TRUE)
result <<- merge(result,res1, all=TRUE)
result
}
#parse microdata
result <- data.frame()
md <- dataHighest$JSON %>%
enter_object(microdata) %>%
append_arrays_and_objects
rm(result)
It just bothers me that I can't make it work without the global data frame result. When I tried returning any combination of the local data frames, I always ended up with only the "first level" data frame. It seems that once all the data has been collected at the deeper levels, I cannot pass it back up. Should be trivial to solve?
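One likely cause is that the function calls itself but discards what those recursive calls return, so only the `<<-` assignment ever keeps the deeper levels. The sketch below shows the general pattern of combining the return values of the recursive calls instead. It deliberately uses a plain nested list and base R rather than tidyjson, and `collect_leaves` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: walk a nested list and RETURN one data frame of leaves.
collect_leaves <- function(x, path = "") {
  empty <- data.frame(path = character(0), value = character(0),
                      stringsAsFactors = FALSE)
  if (!is.list(x)) {
    # Base case: a leaf becomes a one-row data frame.
    return(data.frame(path = path, value = as.character(x),
                      stringsAsFactors = FALSE))
  }
  keys <- names(x)
  if (is.null(keys)) keys <- as.character(seq_along(x))
  # Keep what each recursive call returns instead of dropping it.
  parts <- lapply(seq_along(x), function(i) {
    collect_leaves(x[[i]], paste0(path, "/", keys[i]))
  })
  # Combine the children; nothing is written with <<-.
  do.call(rbind, c(list(empty), parts))
}

nested <- list(a = 1, b = list(c = "x", d = list(2, 3)))
result <- collect_leaves(nested)
```

Applied to the original function, the same idea would mean assigning the results of the two recursive calls to local variables and merging them into the value that is returned.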

Related

A for() loop to overwrite existing data.frames

I'm testing some operations with the classic '99 Czech bank data set, trying to run a tidyverse task on several data frames in my global environment, but the loop I've created keeps overwriting the same object val, which was supposed to be a placeholder for the data frames themselves:
x <- c("loans93","loans94","loans95","loans96",
"loans97","loans98")
x <- base::mget(x, envir=as.environment(-1), mode= "any", inherits=F)
for (val in x) {
val <- val %>%
select(account_id, district_id, balance, status, date) %>%
group_by(account_id, district_id, status, date) %>%
summarise(balance=mean(balance, na.rm=T)) %>%
ungroup()
}
What am I doing wrong? I've searched for similar questions, but people keep answering with lapply solutions. I just need the result saved to my data frames instead of to this "val" object I keep getting.
You could try this:
df_names <- c("loans93", "loans94", "loans95",
"loans96", "loans97", "loans98")
for(df_name in df_names){
get(df_name) %>%
head %>% ## replace with desired manipulations
assign(value = .,
x = paste0(df_name,'_manipulated'), ## or just: df_name to overwrite original
envir = globalenv())
}
Aside: list2env is handy to "spawn" list members (e.g. data frames) into an environment. Example:
list_of_dataframes <- list(
iris_short = head(iris),
cars_short = head(cars)
)
list2env(x = list_of_dataframes, envir = globalenv()) ## without envir, a new environment is created instead
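The loop in the question changes only `val`, which is a copy of each list element, so the originals are never touched. As an alternative to `assign`, a minimal sketch is to keep the data frames in a named list and loop over the names, writing each result back into the list. The data below are small hypothetical stand-ins for the loans tables, and base `aggregate` stands in for the dplyr pipeline.

```r
# Hypothetical stand-ins for the loans93 ... loans98 data frames.
loans <- list(
  loans93 = data.frame(account_id = c(1, 1, 2), balance = c(10, 20, 30)),
  loans94 = data.frame(account_id = c(3, 3), balance = c(5, 15))
)

# Loop over names, not values, so each result can be stored back in the list.
for (nm in names(loans)) {
  loans[[nm]] <- aggregate(balance ~ account_id, data = loans[[nm]], FUN = mean)
}
```

If separate objects are still wanted afterwards, the list can be passed to `list2env` with an explicit `envir`.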

Using separate items in a list as input for the pipe operator in R

I have written a script that takes a list of URLs as input and scrapes certain information from the websites. I did this with a for loop, but the processing time is already very long and I expect the list to grow over time, so I wanted to recode my script in a more efficient way. My idea was to eliminate the for loop and use pipe operators to reduce the processing time. My original (working) code is as follows:
imo <- c()
mmsi <- c()
for(i in 1:nrow(data)){
url <- sprintf("https://www.marinevesseltraffic.com/vessels?vessel=%s&flag=&page=1&sort=lenght&direction=desc",data$NAME[i])
page <- read_html(url)
CSSextract1 <- html_nodes(page, '.td_imo')
CSSextract2 <- html_nodes(page, '.td_mmsi')
imos <- html_text(CSSextract1)[2]
imo[i] <- imos
mmsis <- html_text(CSSextract2)[2]
mmsi[i] <- mmsis
}
data$IMO <- gsub("[\r \n \t]", "", imo)
data$MMSI <- gsub("[\r \n \t]", "", mmsi)
data$NAME <- gsub("\\+", " ", data$NAME)
I have rewritten the code, trying to eliminate the for loop, as follows:
CSSex1 <- function(page){
CSSextract <- html_nodes(page,'.td_imo')
return(CSSextract)
}
data$url <- sprintf("https://www.marinevesseltraffic.com/vessels?vessel=%s&flag=&page=1&sort=lenght&direction=desc",data$NAME)
data$mmsi <- data$url %>% read_html() %>% CSSex1() %>% html_text()[2]
However, it gives me the error:
Error: `x` must be a string of length 1
I assume that, the way I coded it, the whole vector (data$url) is now taken as input at once, so my question is:
Is it possible, and if so how, to take each element of data$url as input without using a (for) loop?
You may wish to set up url as a column of a data frame (data) to try:
mmsi_func <- function(x) {
z <- x %>%
read_html() %>%
CSSex1() %>%
html_text()
z[2]
}
data <- data %>%
rowwise() %>%
dplyr::mutate(mmsi = mmsi_func(url))
or something along those lines. I am not sure what the expected output is supposed to look like, but if it is a list rather than a vector, you can use this minor adjustment for a list column in the dataframe:
mmsi_func <- function(x) {
z <- x %>%
read_html() %>%
CSSex1() %>%
html_text()
z[2]
}
data <- data %>%
rowwise() %>%
dplyr::mutate(mmsi = list(mmsi_func(url)))
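The error arises because the function at the head of the pipe accepts a single string, while the pipe hands it the whole column. A minimal base R sketch of the per-element pattern is shown below. `fetch_one` is a hypothetical stand-in for the scraping function so that the example runs without network access.

```r
# Hypothetical stand-in for the scraping step: like read_html(), it accepts
# exactly one string.
fetch_one <- function(url) {
  stopifnot(is.character(url), length(url) == 1)
  toupper(url)  # placeholder for the real scraping work
}

urls <- c("a", "b", "c")

# vapply calls fetch_one once per element and checks each result is one string.
res <- vapply(urls, fetch_one, character(1), USE.NAMES = FALSE)
```

Note that this, like rowwise(), still performs one call per URL, so it probably will not be much faster than the for loop if the time is dominated by the web requests.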

Problem with mutate keyword and functions in R

I have a problem with the use of mutate; please check the following code block.
output1 <- mytibble %>%
mutate(newfield = FND(mytibble$ndoc))
output1
where FND is a function that filters a large file (5 GB):
FND <- function(n){
result <- LARGETIBBLE %>% filter(LARGETIBBLE$id == n)
return(paste(unique(result$somefield),collapse=" "))
}
I want to execute the FND function for each row of the output1 tibble, but it executes only once.
Avoid using $ inside dplyr pipes; it is rarely needed there. You can change your FND function to:
library(dplyr)
FND <- function(n){
LARGETIBBLE %>% filter(id == n) %>% pull(somefield) %>%
unique %>% paste(collapse = " ")
}
Now apply this function to every ndoc value in mytibble.
mytibble %>% mutate(newfield = purrr::map_chr(ndoc, FND))
You can also use sapply :
mytibble$newfield <- sapply(mytibble$ndoc, FND)
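To see why the original call runs only once, it may help to compare a single call on the whole column with one call per value. The sketch below uses base R and small hypothetical data (`lookup`, `find_fields`) in place of LARGETIBBLE and FND.

```r
# Hypothetical small stand-in for LARGETIBBLE.
lookup <- data.frame(id = c(1, 1, 2, 3),
                     somefield = c("a", "b", "c", "d"),
                     stringsAsFactors = FALSE)

# Same shape as FND: expects a single id and returns one string.
find_fields <- function(n) {
  paste(unique(lookup$somefield[lookup$id == n]), collapse = " ")
}

ids <- c(1, 2)

# One call with the whole vector: ids is recycled against lookup$id and a
# single string comes back.
whole_column <- find_fields(ids)

# One call per id: a result for every element.
per_row <- sapply(ids, find_fields)
```

Here `whole_column` is a single string, while `per_row` has one entry per id, which is the behaviour that map_chr and sapply provide in the answer above.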
The form FND(mytibble$ndoc) is base data frame style. When you use functions such as mutate on a tibble, there is no need to specify the name of the tibble, only that of the column. The %>% pipe already ensures that the columns are looked up in the tibble. Thus your example would be:
output1 <- mytibble %>%
mutate(newfield = FND(ndoc))
FND <- function(n){
result <- LARGETIBBLE %>% filter(id == n)
return(paste(unique(result$somefield),collapse=" "))
}
In theory this should work; however, I do not know whether your function FND will work. Try it, and if it does not, give a practical example with data and what you are trying to achieve.

Convert first row into column names for multiple data frames in a loop

I have multiple data frames. In each one of them, I need to convert the first row into column names.
My code is as follows:
Assign all data frames to cities:
cities <- objects()
library(janitor)
for (i in cities){
paste0("file_",i) <- assign(i, get(i) %>% row_to_names(row_number = 1))
}
This code creates the following error:
Error in paste0("file_", i) <- assign(i, get(i) %>% row_to_names(row_number = 1)) :
target of assignment expands to non-language object
How can I fix this problem?
We can use paste0 inside the assign call:
library(janitor) # provides row_to_names
library(dplyr)
for (i in cities){
assign(paste0("file_",i), get(i) %>% row_to_names(row_number = 1))
}
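The original error occurs because the left-hand side of `<-` must be a name, not a call such as paste0() that merely produces a string; `assign` exists to bind a value to a name that is computed at run time. A self-contained base R sketch follows. `promote_first_row` is a hypothetical stand-in for janitor's row_to_names, and a separate environment is used so the example does not touch the global workspace.

```r
# Hypothetical stand-in for janitor::row_to_names(row_number = 1).
promote_first_row <- function(df) {
  names(df) <- as.character(unlist(df[1, ]))
  df[-1, , drop = FALSE]
}

# Keep the example objects in their own environment.
env <- new.env()
assign("city_a",
       data.frame(V1 = c("name", "x"), V2 = c("pop", "10"),
                  stringsAsFactors = FALSE),
       envir = env)

# The new name is built as a string and passed to assign().
for (nm in ls(env)) {
  assign(paste0("file_", nm), promote_first_row(get(nm, envir = env)),
         envir = env)
}
```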

Getting Looped Output into an Appended Object

I am trying to write a basic sensitivity analysis script. The outputs come out as I want via the print I added at the end of the script. The issue is that I would like a tibble or other object with all the outputs appended together, which I can export as a csv or xlsx.
I created two functions: sens_analysis, which runs all the code, and multiply_across, which applies each possible percentage to each possible column of the table. You need multiply_across to run sens_analysis.
I would normally like a title, but instead I added an indicator column that I can sort by.
I made everything with mtcars so it should be easy to replicate. The issue is that I just have a huge print at the end, not an object that I can manipulate or pull from for other analysis.
I have been trying rbind, bind_rows, and appending rows in a variety of ways,
or building a new object. As you can see in the code at line (18), I make something called output that I have tried to populate, which hasn't gone well.
rm(list = ls())
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)
library(magrittr)
library(xtable)
data<-mtcars
percent<-c(.05,.1,.15)
goods<-c("hp","gear","wt")
weight<-c(6,7,8)
disagg<-"cyl"
func<-median
sens_analysis<-function(data=data, goods=goods, weight=weight, disagg=disagg, percent=percent, func=func){
output<-NULL%>%
as.tibble()
basket<-(rbind(goods,weight))
percent<-c(0,percent,(percent*-1))
percent_to_1<-percent+1
data_select<-data%>%
dplyr::select(c(goods,disagg))%>%
group_by_at(disagg)%>%
summarise_at(.vars = goods ,.funs = func)%>%
as_tibble()
data_select_weight<-purrr::map2(data_select[,-1], as.numeric(basket[2,]),function(var, weight){
var*weight
})%>% as_tibble %>%
add_column(data_select[,1], .before = 1)
colnames(data_select_weight)[1]<-disagg
multiply_across(data_select_weight,percent_to_1)
return(output)
#output2<-rbind(output2,output)
}
############################
multiply_across<-function(data=data_select_weight,list=percent_to_1){
varlist<-names(data[,-1])
for(i in varlist){
df1 = data[,i]
for(j in list){
df<-data
df[,i]<-round(df1*j,2)
df<-mutate(df, total = round(rowSums(df[,-1]),2))%>%
mutate(type=paste0(i," BY ",(as.numeric(j)-1)*100,"% OVER ",disagg))%>%
print(df)
#output<-bind_rows(output,df)
#output<-bind_rows(output,df)
#output[[j]]<-df[[j]]
}
}
}
##############################################################################################
sens_analysis(data,goods,weight,disagg,percent,func)
If you run the code as is, the result is just a bunch of printed tibbles that aren't stored in an object. Ideally, for future analysis of the data or ease of use, a table of the outputs appended together would be best.
I figured it out and will add my answer here in case someone else hits this issue.
I created a list within the loops and then bound the list elements together.
The key point is to bind the rows after the for loops have finished, not inside them.
multiply_across<-function(data=data_select_weight,
list=percent_to_1){
varlist <- colnames(data[, -1])
output_list <- list()
for (i in varlist) {
df1 <- data[,i]
for (j in list) {
name <- paste0(i, " BY ", (as.numeric(j)-1)*100, "% OVER ", disagg)
df <- as_tibble(data)
df[,i] <- round(df1*j, 2)
df <- mutate(df, total = round(rowSums(df[,-1]),2))%>%
mutate(type = paste0(i, " BY ", (as.numeric(j)-1)*100, "% OVER ", disagg))
df<-df[,c(6,1,2,3,4,5)]
output_list[[paste0(i," BY ",(as.numeric(j)-1)*100)]] <- (assign(paste0(i," BY ",(as.numeric(j)-1)*100,"% OVER ",disagg),df))
}
}
bind_rows(lapply(output_list,
as.data.frame.list,
stringsAsFactors=F))
}
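The same collect-then-bind idea can be sketched in base R without the tidyverse. The table, multipliers, and column names below are hypothetical and much smaller than the mtcars example; `round` is added only to keep the labels free of floating-point noise.

```r
# Hypothetical small table standing in for data_select_weight.
base_tbl <- data.frame(group = c("a", "b"), hp = c(100, 200), wt = c(10, 20))
multipliers <- c(1, 1.1)

pieces <- list()  # one data frame per (column, multiplier) scenario
for (v in c("hp", "wt")) {
  for (m in multipliers) {
    df <- base_tbl
    df[[v]] <- df[[v]] * m
    df$total <- df$hp + df$wt
    df$type <- paste0(v, " BY ", round((m - 1) * 100), "%")
    pieces[[length(pieces) + 1]] <- df  # store instead of printing
  }
}

# Bind once, after both loops have finished.
combined <- do.call(rbind, pieces)
```

The combined data frame can then be written out with write.csv or processed further like any other table.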
