Getting Looped Output into an Appended Object - r

So I am trying to make a basic sensitivity analysis script. The outputs come out as I want via the print() at the end of the script. The issue is that I would also like a tibble (or some other object) with all the outputs appended together, which I can export as a CSV or XLSX.
I created two functions: sens_analysis(), which runs the whole analysis, and multiply_across(), which multiplies each possible percentage across each possible column of your table. You need multiply_across() to run sens_analysis().
I would normally like a title for each output, but instead I added an indicator column that I can sort by.
I made everything with mtcars, so it should be easy to replicate. The issue is that I just get a huge printout at the end, not an object that I can manipulate or pull from for other analysis.
I have tried rbind(), bind_rows(), and appending rows in a variety of other ways, as well as building a new object. As you can see at the top of sens_analysis(), I create something called output that I have tried to populate, which hasn't gone well.
rm(list = ls())
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)
library(magrittr)
library(xtable)

data <- mtcars
percent <- c(0.05, 0.1, 0.15)
goods <- c("hp", "gear", "wt")
weight <- c(6, 7, 8)
disagg <- "cyl"
func <- median
sens_analysis <- function(data = data, goods = goods, weight = weight,
                          disagg = disagg, percent = percent, func = func) {
  output <- NULL %>%
    as_tibble()
  basket <- rbind(goods, weight)
  percent <- c(0, percent, percent * -1)
  percent_to_1 <- percent + 1
  data_select <- data %>%
    dplyr::select(c(goods, disagg)) %>%
    group_by_at(disagg) %>%
    summarise_at(.vars = goods, .funs = func) %>%
    as_tibble()
  data_select_weight <- purrr::map2(data_select[, -1], as.numeric(basket[2, ]),
                                    function(var, weight) var * weight) %>%
    as_tibble() %>%
    add_column(data_select[, 1], .before = 1)
  colnames(data_select_weight)[1] <- disagg
  multiply_across(data_select_weight, percent_to_1)
  return(output)
  # output2 <- rbind(output2, output)
}
############################
multiply_across <- function(data = data_select_weight, list = percent_to_1) {
  varlist <- names(data[, -1])
  for (i in varlist) {
    df1 <- data[, i]
    for (j in list) {
      df <- data
      df[, i] <- round(df1 * j, 2)
      df <- mutate(df, total = round(rowSums(df[, -1]), 2)) %>%
        mutate(type = paste0(i, " BY ", (as.numeric(j) - 1) * 100, "% OVER ", disagg))
      print(df)
      # output <- bind_rows(output, df)
      # output[[j]] <- df[[j]]
    }
  }
}
##############################################################################################
sens_analysis(data, goods, weight, disagg, percent, func)
If you run the code straight up, the expected result is a series of printed tibbles that aren't stored in any object. Ideally, for future analysis on the data or ease of use, I'd have a single table with all the outputs appended together.

So I figured it out and will add my answer here in case someone else hits this issue.
I collected each result into a list inside the loops and then bound the list elements together.
The key part is the bind_rows() call after the inner for-loop.
multiply_across <- function(data = data_select_weight,
                            list = percent_to_1) {
  varlist <- colnames(data[, -1])
  output_list <- list()
  for (i in varlist) {
    df1 <- data[, i]
    for (j in list) {
      name <- paste0(i, " BY ", (as.numeric(j) - 1) * 100, "% OVER ", disagg)
      df <- as_tibble(data)
      df[, i] <- round(df1 * j, 2)
      df <- mutate(df, total = round(rowSums(df[, -1]), 2)) %>%
        mutate(type = name)
      df <- df[, c(6, 1, 2, 3, 4, 5)]  # move the type indicator column to the front
      output_list[[name]] <- df        # store each scenario under its label
    }
  }
  bind_rows(output_list)               # one tibble with all scenarios appended
}
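
Since this version of multiply_across() returns the combined tibble, sens_analysis() can simply return multiply_across(data_select_weight, percent_to_1) instead of the unused output object. As an alternative, here is a more compact sketch of the same loop using purrr (my addition, not part of the original answer; it assumes dplyr >= 1.0 for relocate() and relies on the global disagg just as the original does):

multiply_across <- function(data, list) {
  varlist <- colnames(data)[-1]
  # one tibble per (column, multiplier) pair, row-bound into a single result
  purrr::map_dfr(varlist, function(i) {
    purrr::map_dfr(list, function(j) {
      df <- data
      df[[i]] <- round(df[[i]] * j, 2)
      df %>%
        dplyr::mutate(total = round(rowSums(df[, -1]), 2),
                      type = paste0(i, " BY ", (j - 1) * 100, "% OVER ", disagg)) %>%
        dplyr::relocate(type)
    })
  })
}

The returned tibble can then go straight to readr::write_csv() or writexl::write_xlsx() for export.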

Related

Applying Function to Multiple Files & Handling the resulting dataframes

I've got a function that extracts a dataframe from a file, which I'm trying to apply in a for loop across multiple files and then combine the results into a single dataframe.
From what I've read, this seemed like the best way to attack it, but I get an empty list back when I was hoping for a list of dataframes that could be combined using bind_rows().
This is the code I'm using:
combined_functions <- function(file_name) {
  # combines the get_dfm_df and getcorp functions: get the dfm tibble straight from the file name
  data_frame_returned <- get_dfm_df(getcorp(file_name))
  data_frame_returned
}

list_of_dataframes <- list()
file.list <- dir(pattern = "DOCX$")
for (file in file.list) {
  dataframe_of_file <- combined_functions(file)
  append(list_of_dataframes, dataframe_of_file)
}
bind_rows(list_of_dataframes, .id = "column_label") # https://stackoverflow.com/questions/2851327/convert-a-list-of-data-frames-into-one-data-frame
It creates an empty list and gets the list of file names; combined_functions() uses each name to create a dataframe from the file and should, to my understanding, append that dataframe to the list. After all the files in the directory have been processed, bind_rows() should combine them into one overall dataframe, but it only returns an empty tibble, and list_of_dataframes is also empty.
I've tried the solution in this answer but it didn't help:
Append a data frame to a list
https://www.dropbox.com/sh/z8vh50b370gcb1j/AAAcbnfAUOM6-y8uWn4-lUWLa?dl=0
This is a link to the raw files I am using in this case, but I think the problem is a general one.
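
For what it's worth, the usual cause of an empty list in this pattern is that append() returns a new list rather than modifying list_of_dataframes in place, so its result is discarded on every iteration. A minimal sketch of the fix, reusing the names above:

list_of_dataframes <- list()
file.list <- dir(pattern = "DOCX$")
for (file in file.list) {
  # assign the result back into the list; append() alone does not change it
  list_of_dataframes[[file]] <- combined_functions(file)
}
combined_df <- bind_rows(list_of_dataframes, .id = "column_label")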
Appendix:
These are the functions that combined_functions() refers to. They work on individual cases, so I'm confident they are not the cause of the problem, but I've included them for completeness anyway.
rm(list = ls())
library(quanteda)
library(quanteda.corpora)
library(readtext)
library(LexisNexisTools)
library(tidyverse)
library(tools)
getcorp <- function(file_name){
  # takes the LexisNexis word document, converts it into a quanteda corpus object,
  # and returns the duplicates df, the corpus, and the date from the filename in a list
  LNToutput <- lnt_read(file_name)
  duplicates_df <- lnt_similarity(LNToutput = LNToutput,
                                  threshold = 0.99)
  duplicates_df <- duplicates_df[duplicates_df$Similarity > 0.99] # https://github.com/JBGruber/LexisNexisTools creates dataframe of duplicate articles
  LNToutput <- LNToutput[!LNToutput@meta$ID %in% duplicates_df$ID_duplicate, ] # removes these duplicates from the main object
  corp <- lnt_convert(LNToutput, to = "quanteda") # to return multiple values from an R function, they must be placed in a list
  corp_date_from_file_name <- basename(file_name)
  file_date <- as.Date(corp_date_from_file_name, format = "%d_%m_%y")
  list_of_returns <- list(duplicates_df, corp, file_date) # duplicates df in first position, corpus in second, file date in third
  list_of_returns
}
get_dfm_df <- function(corp_list){
  # takes the corpus from getcorp, applies the Lexicoder dictionary, adds neg_pos etc. to their equivalent columns,
  # calculates the percentage each category makes up of the sentiment-bearing words, and adds the date from the file name
  corpus_we_want <- corp_list[[2]]
  sentiment_df <- dfm(corpus_we_want, dictionary = data_dictionary_LSD2015) %>% # applies the dictionary
    convert("data.frame") %>%
    cbind(docvars(corpus_we_want)) %>% # https://stackoverflow.com/questions/60419692/how-to-convert-dfm-into-dataframe-but-keeping-docvars
    as_tibble() %>%
    mutate(combined_negative = negative + neg_positive,
           combined_positive = positive + neg_negative) %>%
    mutate(pos_percentage = combined_positive / (combined_positive + combined_negative),
           neg_percentage = combined_negative / (combined_positive + combined_negative)) %>%
    mutate(date = corp_list[[3]])
  sentiment_df
}

How to make this work without a global variable in R?

I am parsing some metadata-containing JSON files into similar dataframes, using tidyjson. I finally made it work like this:
append_arrays_and_objects <- function(tbl) {
  objs <- tbl %>%
    filter(is_json_object(.)) %>%
    gather_object %>%
    append_values_string
  arr <- tbl %>%
    filter(is_json_array(.)) %>%
    gather_array %>%
    append_values_string
  if (nrow(objs) > 0) append_arrays_and_objects(objs)
  if (nrow(arr) > 0) append_arrays_and_objects(arr)
  print(objs)
  print(arr)
  res1 <- merge(objs, arr, all = TRUE)
  result <<- merge(result, res1, all = TRUE) # accumulate into the global result
  result
}

# parse microdata
result <- data.frame()
md <- dataHighest$JSON %>%
  enter_object(microdata) %>%
  append_arrays_and_objects
rm(result)
It just bothers me that I can't make it work without the global dataframe result. Whenever I tried returning any combination of local dataframes instead, I ended up with a dataframe holding only the "first level" of depth; once all the data has been collected, I can't seem to pass it back up. This should be trivial to solve?
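
One way to drop the global, sketched here under the assumption that the tidyjson verbs behave exactly as above (untested against the original data): make each recursive call return its merged result and fold that into the caller's value, instead of assigning to result with <<-.

append_arrays_and_objects <- function(tbl) {
  objs <- tbl %>%
    filter(is_json_object(.)) %>%
    gather_object %>%
    append_values_string
  arr <- tbl %>%
    filter(is_json_array(.)) %>%
    gather_array %>%
    append_values_string
  res <- merge(objs, arr, all = TRUE)
  # fold the deeper levels into the return value instead of a global
  if (nrow(objs) > 0) res <- merge(res, append_arrays_and_objects(objs), all = TRUE)
  if (nrow(arr) > 0) res <- merge(res, append_arrays_and_objects(arr), all = TRUE)
  res
}

md <- dataHighest$JSON %>%
  enter_object(microdata) %>%
  append_arrays_and_objects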

Change a for loop to a function to scrape a website

I am trying to scrape a website using the following:
industryurl <- "https://finance.yahoo.com/industries"

library(rvest)

read <- read_html(industryurl) %>%
  html_table()

library(plyr)
industries <- ldply(read, data.frame)
industries <- industries[-1, ]

read <- read_html(industryurl)
industryurls <- html_attr(html_nodes(read, "a"), "href")
links <- industryurls[grep("/industry/", industryurls)]

industryurl <- "https://finance.yahoo.com"
links <- paste0(industryurl, links)
links
##############################################################################################
store <- NULL
tbl <- NULL

for (i in links) {
  store[[i]] <- read_html(i)
  tbl[[i]] <- html_table(store[[i]])
}
#################################################################################################
I am mostly interested in the code between the ########## markers. I want to apply a function instead of a for loop, because I am running into timeout issues with Yahoo and I want the extraction to look more human-like (it is not too much data).
My question is: how can I take links, apply a function, and set a delay timer while reading in the contents that the for loop currently fetches?
I can paste my own version of the for loop, which does not work.
This is the function I came up with:

## First argument is the link you need
## The second argument is the total time for Sys.sleep
extract_function <- function(define_link, define_time){
  print(paste0("The system will stop for: ", define_time, " seconds"))
  Sys.sleep(define_time)
  first <- read_html(define_link)
  print(paste0("It will now return the table for link ", define_link))
  return(html_table(first))
}

## I added the following tryCatch wrapper so a failing link returns NA instead of stopping the loop
link_try_catch <- function(define_link, define_time){
  out <- tryCatch(extract_function(define_link, define_time),
                  error = function(e) NA)
  return(out)
}

## You can now retrieve the data using the links vector in two ways
## Picking the first ten, so it should not crash on link 5
p <- lapply(1:10, function(i) link_try_catch(links[i], 1))
## OR (I subset the vector just for demo purposes)
p2 <- lapply(links[1:10], function(i) extract_function(i, 1))
Hope it helps
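
One further tweak (my addition, not from the answer above): since Sys.sleep() accepts fractional seconds, the delay can be randomized per request to look less mechanical.

## random delay between 1 and 3 seconds per request
p3 <- lapply(links[1:10], function(l) link_try_catch(l, runif(1, min = 1, max = 3)))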

Simple loop with subset and variable name assignment

I am learning R and I don't understand why this simple assignment does not work. I would like to subset by year using the filter() function from the dplyr package. After several attempts, here is a reproducible example using the gapminder dataset.
I could use the subset() function, lapply(), or even an anonymous function to solve this problem, but here I just want to understand why this specific code is not working.
library(gapminder)
library(dplyr)
for (i in unique(gapminder$year)) {
  paste0("gapminder", i) <- print(gapminder %>%
    filter(year == i))
}
With or without print(), the problem is the same.
It's because your assignment target is a function call (paste0()), which is not a valid name to assign to.
If you remove that part, it prints each filtered dataframe:
library(gapminder)
library(dplyr)
for (i in unique(gapminder$year)) {
  print(gapminder %>% filter(year == i))
}
You could assign each to a list, like so:
my_list <- list()
library(gapminder)
library(dplyr)

for (i in seq_along(unique(gapminder$year))) {
  year_filter <- unique(gapminder$year)[i] # each iteration we get another year
  my_list[[i]] <- gapminder %>% filter(year == year_filter)
  cat(paste0("gapminder", year_filter, " ")) # use cat if you want to print at each iteration
}
paste0() just concatenates vectors after converting them to character; it does not create a variable.
Use the assign() function to store the output:
for (i in unique(gapminder$year)) {
  assign(paste0("gapminder", i), print(gapminder %>% filter(year == i)))
}
If you want to retrieve a specific output, use the get() function:
out_i <- get(paste0("gapminder", i))
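
As an aside (not from the answers above): if the end goal is simply one data frame per year, base R's split() builds the whole named list in a single call, with no loop at all.

gapminder_by_year <- split(gapminder, gapminder$year)
gapminder_by_year[["1952"]] # the subset for a single year, retrieved by name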

R new variable assignment

I made a loop that assigns the result of a function to a newly created variable; that variable is then used to create another one.
The second step fails to produce the expected result.
library(stringr)

for (i in 1:length(Ids)) {
  nam <- paste("data", Ids[i], sep = "_")
  assign(nam, GetReportData(query, token, paginate_query = F))
  newvar <- paste(nam, "contentid", sep = "$")
  originStr <- paste(nam, "pagePath", sep = "$")
  assign(newvar, str_extract(originStr, "&id=[0-9]+"))
}
Don't create a bunch of variables; store related values in lists to make them easier to retrieve. You didn't supply any input to test with, but I'm guessing this does the same thing:
library(stringr)

mydata <- lapply(1:length(Ids), function(i) {
  dd <- GetReportData(query, token, paginate_query = F)
  dd$contentid <- str_extract(dd$pagePath, "&id=[0-9]+")
  dd
})
This will return a list of data.frames. You can access them with mydata[[1]], mydata[[2]], etc., rather than data_1, data_2, etc.
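
If you prefer name-based access, you can also name the list elements after the ids (a small addition on top of the answer, assuming Ids holds the identifiers used above):

names(mydata) <- paste0("data_", Ids)
mydata[[paste0("data_", Ids[1])]] # retrieve the first report by name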
If you absolutely insist on creating a bunch of variables, do all your transformations on an actual object, and then save that object when you are done. You can never use assign() with names that contain $ or [, as described on its help page: "assign does not dispatch assignment methods, so it cannot be used to set elements of vectors, names, attributes, etc." For example:
for (i in 1:length(Ids)) {
  dd <- GetReportData(query, token, paginate_query = F)
  dd$contentid <- str_extract(dd$pagePath, "&id=[0-9]+")
  assign(paste("data", i, sep = "_"), dd)
}
