R function overwriting variable name in global environment - r

I'm having trouble getting a function to save separate data frames. It keeps overwriting the name of a variable in the function. The code I have is
LFStable = function(LFS_number, vector_number, titles_number){
LFS_number <- get_cansim_vector(vector_number,
start_time = as.Date(startdate),
end_time = today(),
refresh = FALSE)
Temp <- data.frame(get_cansim_vector_info(vector_number))
Temp <- Temp %>% select(title, VECTOR, table) %>% separate(title,
titles_number,";")
LFS_number <<- left_join(LFS_number,Temp, by = "VECTOR")
rm(Temp)
}
LFStable(LFS0063, V0063, title0063)
LFS_number is the number of the table I'm pulling from. vector_number is the list of vectors from Stats Canada I need from that table. titles_number is the name of the columns I need from the table.
It is making the table in the correct format but instead of naming the table "LFS0063" it names it "LFS_number" and then overwrites it when I run the function again for another table.
How can I get the table to be saved to the global environment with the name I gave it?
Thanks for reading and trying to help!
Edit: based on the comments from #MrFlick and #r2evans , I changed the code to
LFStable = function(LFS_number, vector_number, titles_number){
Temp_table <- get_cansim_vector(vector_number,
start_time = as.Date(startdate),
end_time = today(),
refresh = FALSE)
Temp <- data.frame(get_cansim_vector_info(vector_number))
Temp <- Temp %>% select(title, VECTOR, table) %>% separate(title,
titles_number,";")
LFS_number <<- left_join(Temp_table,Temp, by = "VECTOR")
rm(Temp, vector_number)
}
LFStable(LFS0063, V0063, title0063)
Which produces the same problem as before.
OR
Temp_table <- left_join(Temp_table,Temp, by = "VECTOR")
assign(LFS_number,Temp_table,envir=.GlobalEnv)
rm(Temp, vector_number)
}
LFStable(LFS0063, V0063, title0063)
which gives an error saying "invalid first argument". I created empty data frames with the LFS_number names to assign them to before running the function.

Related

How to create a loop of ppcor function?

I am trying to create a loop to go through and perform a correlation (and in future a partial correlation) using ppcor function on variables stored within a data frame. The first variable (A) will remain the same for all correlations, whilst the second variable (B) will be the next variable along in the next column within my data frame. I have around 1000 variables.
I show the mtcars dataset below as an example, as it is in the same layout as my data.
I've been able to complete the operation successfully when performed manually using cbind to bind 2 columns (the 2 variables of interest) prior to running ppcor on the array ("tmp_df"). I have then been able to bind the output from correlation operation ("mpg_cycl"), ("mpg_disp") into a single object. However I can't get any of this operation to work in a loop. Any ideas please?
library("MASS")
install.packages("ppcor")
library("ppcor")
mtcars_df <- as.data.frame(mtcars)
tmp_df = cbind(mtcars_df$mpg, mtcars_df$cycl)
mpg_cycl <- pcor(as.matrix(tmp_df), method = 'spearman')
tmp_df1= cbind(mtcars_df$mpg, mtcars_df$disp)
mpg_disp <- pcor(as.matrix(tmp_df1), method = 'spearman')
combined_table <- do.call(cbind, lapply(list("mpg_cycl" = mpg_cycl,
mpg_disp" = mpg_disp), as.data.frame, USE.NAMES = TRUE))
attempting to loop above operation ## (ammended after last reviewer's comments:
for (i in mtcars_df[2:7]){
tmp_df = (cbind(i, mtcars_df$mpg)
i <- pcor(as.matrix(tmp_df), method = 'spearman')
write.csv(i, file = paste0("MyDataOutput",i[1],".csv")
}
I expected the loop to output two of the correlations results to MyDataOutput csv file. But this generates an error message, I thought i was in the correct place?:
Error: unexpected symbol in:
" tmp_df = (cbind(i, mtcars_df$mpg)
i"
Even adding a curly bracket at the end does not resolve issue so I have left this out as it introduces another error message '}'
I have redone some of your code and fixed missing ), }, ". The for cyckle now outputs file with name + name of the variable. Hope this will help.
library("MASS")
#install.packages("ppcor")
library("ppcor")
mtcars_df <- as.data.frame(mtcars)
tmp_df = cbind(mtcars_df$mpg, mtcars_df$cycl)
mpg_cycl <- pcor(as.matrix(tmp_df), method = 'spearman')
tmp_df1= cbind(mtcars_df$mpg, mtcars_df$disp)
mpg_disp <- pcor(as.matrix(tmp_df1), method = 'spearman')
combined_table <- do.call(cbind, lapply(list("mpg_cycl" = mpg_cycl,
"mpg_disp" = mpg_disp), as.data.frame, USE.NAMES = TRUE))
for(i in colnames(mtcars_df[2:7])){
tmp_df = mtcars_df[c(i,"mpg")]
i_resutl <- pcor(as.matrix(tmp_df), method = 'spearman')
write.csv(i_resutl, file = paste0("MyDataOutput_",i,".csv"))
}
for merging before saving:
dta <- c()
for(i in colnames(mtcars_df[2:7])){
tmp_df = mtcars_df[c(i,"mpg")]
i_resutl <- pcor(as.matrix(tmp_df), method = 'spearman')
dta <- rbind(dta,c(i,(unlist( i_resutl))))
}

Trycatch in for loop- continue to next r dataRetrieval

I have a list containing the following site id numbers:
sitelist <- c("02074500", "02077200", "208111310", "02081500", "02082950")
I want to use the dataRetrieval package to collect additional information about these sites and save it into individual .csv files. Site number "208111310" does not exist, so it returns an error and stops the code.
I want the code to ignore site numbers that do not return data and continue to the next number in sitelist.
I've tried trycatch in several ways but can't get the correct syntax. Here is my for loop without trycatch.
for (i in sitelist){
test_gage <- readNWISdv(siteNumbers = i,
parameterCd = pCode)
df = test_gage
df = subset(df, select= c(site_no, Date, X_00060_00003))
names(df)[3] <- c("flow in m3/s")
df$Year <- as.character(year(df$Date))
write.csv(df, paste0("./gage_flow/",i,".csv"), row.names = F)
rm(list=setdiff(ls(),c("sitelist", "pCode")))
}
You can use the variable error in the function trycatch to specify what happened when an error occurs and store the return value using operator <<-.
for (i in sitelist){
test_gage <- NULL
trycatch(error=function(message){
test_gage <<- readNWISdv(siteNumbers = i,parameterCd = pCode)
}
df = test_gage
df = subset(df, select= c(site_no, Date, X_00060_00003))
names(df)[3] <- c("flow in m3/s")
df$Year <- as.character(year(df$Date)) write.csv(df, paste0("./gage_flow/",i,".csv"), row.names = F)
rm(list=setdiff(ls(),c("sitelist", "pCode")))
}
If you want to catch the warnings also just give a second argument to trycatch.
trycatch(error=function(){},warning=function(){})

Iterating through values in R

I'm new-ish to R and am having some trouble iterating through values.
For context: I have data on 60 people over time, and each person has his/her own dataset in a folder (I received the data with id #s 00:59). For each person, there are 2 values I need - time of response and picture response given (a number 1 - 16). I need to convert this data from wide to long format for each person, and then eventually append all of the datasets together.
My problem is that I'm having trouble writing a loop that will do this for each person (i.e. each dataset). Here's the code I have so far:
pam[x] <- fromJSON(file = "PAM_u[x].json")
pam[x]df <- as.data.frame(pam[x])
#Creating long dataframe for times
pam[x]_long_times <- gather(
select(pam[x]df, starts_with("resp")),
key = "time",
value = "resp_times"
)
#Creating long dataframe for pic_nums (affect response)
pam[x]_long_pics <- gather(
select(pam[x]df, starts_with("pic")),
key = "picture",
value = "pic_num"
)
#Combining the two long dataframes so that I have one df per person
pam[x]_long_fin <- bind_cols(pam[x]_long_times, pam[x]_long_pics) %>%
select(resp_times, pic_num) %>%
add_column(id = [x], .before = 1)
If you replace [x] in the above code with a person's id# (e.g. 00), the code will run and will give me the dataframe I want for that person. Any advice on how to do this so I can get all 60 people done?
Thanks!
EDIT
So, using library(jsonlite) rather than library(rjson) set up the files in the format I needed without having to do all of the manipulation. Thanks all for the responses, but the solution was apparently much easier than I'd thought.
I don't know the structure of your json files. If you are not in the same folder, like the json files, try that:
library(jsonlite)
# setup - read files
json_folder <- "U:/test/" #adjust you folder here
files <- list.files(path = paste0(json_folder), pattern = "\\.json$")
# import data
pam <- NULL
pam_df <- NULL
for (i in seq_along(files)) {
pam[[i]] <- fromJSON(file = files[i])
pam_df[[i]] <- as.data.frame(pam[[i]])
}
Here you generally read all json files in the folder and build a vector of a length of 60.
Than you sequence along that vector and read all files.
I assume at the end you can do bind_rowsor add you code in the for loop. But remember to set the data frames to NULL before the loop starts, e.g. pam_long_pics <- NULL
Hope that helped? Let me know.
Something along these lines could work:
#library("tidyverse")
#library("jsonlite")
file_list <- list.files(pattern = "*.json", full.names = TRUE)
Data_raw <- tibble(File_name = file_list) %>%
mutate(File_contents = map(File_name, fromJSON)) %>% # This should result in a nested tibble
mutate(File_contents = map(File_contents, as_tibble))
Data_raw %>%
mutate(Long_times = map(File_contents, ~ gather(key = "time", value = "resp_times", starts_with("resp"))),
Long_pics = map(File_contents, ~ gather(key = "picture", value = "pic_num", starts_with("pic")))) %>%
unnest(Long_times, Long_pics) %>%
select(File_name, resp_times, pic_num)
EDIT: you may or may not need not to include as_tibble() after reading in the JSON files, depending on how your data looks like.

R rbind - numbers of columns of arguments do not match

How can I ignore a data set if some column names don't exist in it?
I have a list of weather data from a stream but I think certain key weather conditions don't exist and therefore I have this error below with rbind:
Error in rbind(deparse.level, ...) :
numbers of columns of arguments do not match
My code:
weatherDf <- data.frame()
for(i in weatherData) {
# Get the airport code.
airport <- i$airport
# Get the date.
date <- as.POSIXct(as.numeric(as.character(i$timestamp))/1000, origin="1970-01-01", tz="UTC-1")
# Get the data in dailysummary only.
dailySummary <- i$dailysummary
weatherDf <- rbind(weatherDf, ldply(
list(dailySummary),
function(x) c(airport, format(as.Date(date), "%Y-%m-%d"), x[["meanwindspdi"]], x[["meanwdird"]], x[["meantempm"]], x[["humidity"]])
))
}
So how can I make sure these key conditions below exist in the data:
meanwindspdi
meanwdird
meantempm
humidity
If any of them does not exit, then ignore the bunch of them. Is it possible?
EDIT:
The content of weatherData is in jsfiddle (I can't post it here as it is too long and I dunno where is the best place to show the data publicly for R...)
EDIT 2:
I get some error when I try to export the data into a txt:
> write.table(weatherData,"/home/teelou/Desktop/data/data.txt",sep="\t",row.names=FALSE)
Error in data.frame(date = list(pretty = "January 1, 1970", year = "1970", :
arguments imply differing number of rows: 1, 0
What does it mean? It seems that there are some errors in the data...
EDIT 3:
I have exported my entire data in .RData to my google drive:
https://drive.google.com/file/d/0B_w5RSQMxtRSbjdQYWJMX3pfWXM/view?usp=sharing
If you use RStudio, then you can just import the data.
EDIT 4:
target_names <- c("meanwindspdi", "meanwdird", "meantempm", "humidity")
# If it has data then loop it.
if (!is.null(weatherData)) {
# Initialize a data frame.
weatherDf <- data.frame()
for(i in weatherData) {
if (!all(target_names %in% names(i)))
next
# Get the airport code.
airport <- i$airport
# Get the date.
date <- as.POSIXct(as.numeric(as.character(i$timestamp))/1000, origin="1970-01-01", tz="UTC-1")
# Get the data in dailysummary only.
dailySummary <- i$dailysummary
weatherDf <- rbind(weatherDf, ldply(
list(dailySummary),
function(x) c(airport, format(as.Date(date), "%Y-%m-%d"), x[["meanwindspdi"]], x[["meanwdird"]], x[["meantempm"]], x[["humidity"]])
))
}
# Rename column names.
colnames(weatherDf) <- c("airport", "key_date", "ws", "wd", "tempi", 'humidity')
# Convert certain columns weatherDf type to numberic.
columns <-c("ws", "wd", "tempi", "humidity")
weatherDf[, columns] <- lapply(columns, function(x) as.numeric(weatherDf[[x]]))
}
Inspect the weatherDf:
> View(weatherDf)
Error in .subset2(x, i, exact = exact) : subscript out of bounds
You can use next to skip the current iteration of the loop and go to the next iteration:
target_names <- c("meanwindspdi", "meanwdird", "meantempm", "humidity")
for(i in weatherData) {
if (!all(target_names %in% names(i)))
next
# continue with loop...

Which environment should be called when using eval( ) in a function?

I've got a set of functions that I'm trying to work with and I'm struggling to figure out why the assignment isn't working. Here are the functions I'm using:
new_timeline <- function() {
timeline = structure(list(), class="timeline")
timeline$title <- list("text" = list("headline" = NULL, "text" = NULL),
"start_date" = list("year" = NULL, "month" = NULL, "day" = NULL),
"end_date" = list("year" = NULL, "month" = NULL, "day" = NULL))
return(timeline)
}
.add_date <- function(self, date, time_type) {
valid_date <- stringr::str_detect(date, "^[0-9]{4}(-[0-9]{1,2}){0,2}$")
if (!valid_date) {
stringr::str_interp("Your ${time_type} date does not appear to be formatted correctly. It must be of the form 'yyyy-mm-dd'. Only the year is required.") %>% stop()
}
date_elements <- date %>% as.character() %>% stringr::str_split(" ") %>% unlist()
date <- date_elements[1] %>% stringr::str_split("-") %>% unlist()
stringr::str_interp("self$title$${time_type}_date$year <- date[1]") %>% parse(text = .) %>% eval()
if (!is.na(date[2])) stringr::str_interp("self$title$${time_type}_date$month <- date[2]") %>% parse(text = .) %>% eval()
if (!is.na(date[3])) stringr::str_interp("self$title$${time_type}_date$day <- date[3]") %>% parse(text = .) %>% eval()
return(self)
}
edit_title <- function(self, headline = NULL, text = NULL, start_date = NULL, end_date = NULL) {
if (class(self) != "timeline") stop("The object passed must be a timeline object.")
if (is.null(headline) && is.null(self$title$text$headline)) stop("Headline cannot be empty when adding a new title.")
if (!is.null(headline)) self$title$text$headline <- headline
if (!is.null(text)) self$title$text$text <- text
if (!is.null(start_date)) self <- .add_date(self, date = start_date, time_type = "start")
if (!is.null(end_date)) self <- .add_date(self, date = end_date, time_type = "end")
return(self)
}
EDIT: The above code has been severely reduced per a request in the comments. The code is still sufficient to reproduce the error.
I know that's a bit long-winded, so I apologize. The first function establishes a new timeline object. The third function allows us to change the title of the timeline object and the second function is a helper function that handles dates. The code would be used like this:
library(magrittr)
#devtools::install_github("hadley/stringr")
library(stringr)
tl <- new_timeline()
tl <- tl %>% edit_title(headline = "My Timeline", text = "Example", start_date = "2015-10-18")
The code runs with no errors, but when I call tl$title$start_date$year, it comes back as NULL. Using an answer I got in this previous question I asked, I tried to set envir = globalenv() within the eval function. When I do that, the function returns an error saying that object self cannot be found.
So I'm under the impression that self is held in the parent.frame(). So I add both of these to a list: envir = list(globalenv(), parent.frame()). This causes the function to run without error, but there's still no assignment.
Where am I going wrong? Thanks in advance!
As mentioned in the comments, I think you could probably do away with all of the code parsing and just pass variables in [[ for your assignments. Anyway, when you use the pipe operator a bunch of function wrapping happens so determining how many frames to go back is painful. Here are a couple solutions modifying the .add_date function.
You already found one, using <<-, since it searches back through the parent environments until it finds the variable (or doesnt and assigns it in the global).
Another would be just storing the function environment() and passing that to eval.
A third would be counting how many frames deep you go, and using sys.frame to tell eval which environment to look in.
.add_date <- function(self, date, time_type) {
valid_date <- stringr::str_detect(date, "^[0-9]{4}(-[0-9]{1,2}){0,2}$")
if (!valid_date) {
stringr::str_interp("Your ${time_type} date does not appear to be formatted correctly. It must be of the form 'yyyy-mm-dd'. Only the year is required.") %>% stop()
}
## Examining environemnts
e <- environment() # current env
efirst <- sys.nframe() # frame number
print(paste("Currently in frame", efirst))
envs <- stringr::str_interp("${date}") %>% parse(text=.) %>% {.; sys.frames()} # list of frames
elast <- stringr::str_interp("${date}") %>% parse(text=.) %>% {.; sys.nframe()} # number of last
print(paste("Went", elast, "frames deep."))
## Go back this many frames in eval
goback <- efirst-elast
date_elements <- date %>% as.character() %>% stringr::str_split(" ") %>% unlist()
date <- date_elements[1] %>% stringr::str_split("-") %>% unlist()
## Solution 1: use sys.frame
stringr::str_interp("self$title$${time_type}_date$year <- date[1]") %>%
parse(text = .) %>% eval(envir=sys.frame(goback))
## Solution 2: use environment defined in function
if (!is.na(date[2])) stringr::str_interp("self$title$${time_type}_date$month <- date[2]") %>%
parse(text = .) %>% eval(envir=e)
return(self)
}

Resources