tryCatch in for loop - continue to next (R, dataRetrieval)

I have a list containing the following site id numbers:
sitelist <- c("02074500", "02077200", "208111310", "02081500", "02082950")
I want to use the dataRetrieval package to collect additional information about these sites and save it into individual .csv files. Site number "208111310" does not exist, so it returns an error and stops the code.
I want the code to ignore site numbers that do not return data and continue to the next number in sitelist.
I've tried tryCatch in several ways but can't get the syntax right. Here is my for loop without tryCatch.
for (i in sitelist){
test_gage <- readNWISdv(siteNumbers = i,
parameterCd = pCode)
df = test_gage
df = subset(df, select= c(site_no, Date, X_00060_00003))
names(df)[3] <- c("flow in m3/s")
df$Year <- as.character(year(df$Date))
write.csv(df, paste0("./gage_flow/",i,".csv"), row.names = F)
rm(list=setdiff(ls(),c("sitelist", "pCode")))
}

Wrap the call in tryCatch and use its error argument to specify what happens when an error occurs. The value returned by the handler (NULL here) becomes the result of tryCatch, so you can test for it and skip to the next site with next.
for (i in sitelist) {
  # Return NULL instead of stopping when the request fails
  test_gage <- tryCatch(
    readNWISdv(siteNumbers = i, parameterCd = pCode),
    error = function(e) NULL
  )
  # Skip sites that errored or returned no rows
  if (is.null(test_gage) || nrow(test_gage) == 0) next
  df <- subset(test_gage, select = c(site_no, Date, X_00060_00003))
  names(df)[3] <- "flow in m3/s"
  df$Year <- as.character(year(df$Date))
  write.csv(df, paste0("./gage_flow/", i, ".csv"), row.names = FALSE)
  rm(list = setdiff(ls(), c("sitelist", "pCode")))
}
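If you also want to catch warnings, pass a warning handler as a further argument to tryCatch. Each handler takes the condition object as its single argument (expr stands for the code you want to run).
tryCatch(expr, error = function(e) {}, warning = function(w) {})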
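The skip-on-error pattern can be tried without network access. The following is a minimal sketch, assuming a hypothetical fetch_site() function that stands in for readNWISdv() and fails for one site id. The handler returns NULL, and next is called from the loop body rather than from inside the handler.

```r
# Hypothetical stand-in for readNWISdv(): fails for one site id.
fetch_site <- function(site) {
  if (site == "208111310") stop("site not found: ", site)
  data.frame(site_no = site, flow = 1)
}

sitelist <- c("02074500", "208111310", "02081500")
results <- list()

for (i in sitelist) {
  res <- tryCatch(
    fetch_site(i),
    error = function(e) {
      # Report the problem and hand NULL back to the loop
      message("Skipping ", i, ": ", conditionMessage(e))
      NULL
    }
  )
  if (is.null(res)) next   # move on to the next site id
  results[[i]] <- res
}

names(results)
```

Only the sites that returned data end up in results, so any per-site processing can safely follow the is.null() check.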

Related

R function overwriting variable name in global environment

I'm having trouble getting a function to save separate data frames. It keeps overwriting the name of a variable in the function. The code I have is
LFStable = function(LFS_number, vector_number, titles_number){
LFS_number <- get_cansim_vector(vector_number,
start_time = as.Date(startdate),
end_time = today(),
refresh = FALSE)
Temp <- data.frame(get_cansim_vector_info(vector_number))
Temp <- Temp %>% select(title, VECTOR, table) %>% separate(title,
titles_number,";")
LFS_number <<- left_join(LFS_number,Temp, by = "VECTOR")
rm(Temp)
}
LFStable(LFS0063, V0063, title0063)
LFS_number is the number of the table I'm pulling from. vector_number is the list of vectors from Stats Canada I need from that table. titles_number is the name of the columns I need from the table.
It is making the table in the correct format but instead of naming the table "LFS0063" it names it "LFS_number" and then overwrites it when I run the function again for another table.
How can I get the table to be saved to the global environment with the name I gave it?
Thanks for reading and trying to help!
Edit: based on the comments from @MrFlick and @r2evans, I changed the code to
LFStable = function(LFS_number, vector_number, titles_number){
Temp_table <- get_cansim_vector(vector_number,
start_time = as.Date(startdate),
end_time = today(),
refresh = FALSE)
Temp <- data.frame(get_cansim_vector_info(vector_number))
Temp <- Temp %>% select(title, VECTOR, table) %>% separate(title,
titles_number,";")
LFS_number <<- left_join(Temp_table,Temp, by = "VECTOR")
rm(Temp, vector_number)
}
LFStable(LFS0063, V0063, title0063)
Which produces the same problem as before.
OR
Temp_table <- left_join(Temp_table,Temp, by = "VECTOR")
assign(LFS_number,Temp_table,envir=.GlobalEnv)
rm(Temp, vector_number)
}
LFStable(LFS0063, V0063, title0063)
which gives an error saying "invalid first argument". I created empty data frames with the LFS_number names to assign them to before running the function.
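Two things are worth noting about the attempts above: LFS_number <<- ... always assigns to the literal name LFS_number, and assign() expects the target name as a character string, which is why passing a data frame gives "invalid first argument". The sketch below shows two possible ways around this. build_table() and build_and_assign() are hypothetical toy functions, not part of the cansim package.

```r
# Toy stand-in for the table-building code; build_table is hypothetical.
build_table <- function(vector_ids) {
  data.frame(VECTOR = vector_ids, value = seq_along(vector_ids))
}

# Option 1: return the data frame and let the caller choose the name.
LFS0063 <- build_table(c("v1", "v2"))

# Option 2: pass the target name as a string and assign() it globally.
build_and_assign <- function(name, vector_ids) {
  stopifnot(is.character(name), length(name) == 1)
  assign(name, build_table(vector_ids), envir = .GlobalEnv)
  invisible(NULL)
}
build_and_assign("LFS0064", c("v3", "v4", "v5"))
```

Option 1 is usually easier to follow, because the function has no side effects on the global environment.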

Function for looping over common variable names with different suffixes in R

I have some code which I'm looking to replicate many times, each for a different country as the suffix.
Assuming 3 countries as a simple example:
country_list <- c('ALB', 'ARE', 'ARG')
I'm trying to create a series of variables called a_m5_ALB, a_m5_ARE, a_m5_ARG etc which have various functions e.g. addcol or round_df applied to reg_math_ALB, reg_math_ARE, reg_math_ARG etc
for (i in country_list) {
paste("a_m5", i , sep = "_") <- addcol(paste("reg_math", i , sep = "_"))
}
for (i in country_list) {
paste("a_m5", i , sep = "_") <- round_df(paste("reg_math", i , sep = "_"))
}
where addcol and round_df are defined as:
addcol = function(y){
dat1 = mutate(y, p.value = ((1 - pt(q = abs(reg.t.value), df = dof))*2))
return(dat1)
}
round_df <- function(x, digits) {
numeric_columns <- sapply(x, mode) == 'numeric'
x[numeric_columns] <- round(x[numeric_columns], digits)
x
}
The loop errors when any of the functions are added in brackets before the paste variable part but it works if doing it manually e.g.
a_m5_ALB <- addcol(reg_math_ALB)
Please could you help? I think it's the application of the function in a loop which i'm getting wrong.
Errors:
Error in UseMethod("mutate_") :
no applicable method for 'mutate_' applied to an object of class "character"
Error in round(x[numeric_columns], digits) :
non-numeric argument to mathematical function
Thank you
From your examples, you're really in a case where everything should be in a single dataframe. Here, keeping separate variables for each country is not the right tool for the job. Say you have your per-country dataframes saved as csv, you can rewrite everything as:
library(tidyverse)
country_list <- c('ALB', 'ARE', 'ARG')
read_data <- function(ctry){
read_csv(paste0("/path/to/file/", "reg_math_", ctry)) %>%
add_column(country = ctry)
}
total_df <- map_dfr(country_list, read_data)
total_df %>%
mutate(p.value = (1 - pt(q = abs(reg.t.value), df = dof))*2) %>%
mutate(across(where(is.numeric), round, digits = digits))
And it gives you immediate access to all other dplyr functions that are great for this kind of manipulation.
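If the per-country data frames already exist as separate objects, another option is to collect them into a named list and apply the function to each element. The original loop fails because paste() returns a character string, not the object with that name. This is a minimal base R sketch with made-up toy data; add_p() is a hypothetical helper mirroring addcol.

```r
# Toy per-country data frames; column names follow the question.
reg_math_ALB <- data.frame(reg.t.value = c(2.1, -0.5), dof = c(30, 30))
reg_math_ARE <- data.frame(reg.t.value = c(1.2, 3.4), dof = c(25, 25))
reg_math_ARG <- data.frame(reg.t.value = c(-2.8, 0.1), dof = c(40, 40))

country_list <- c("ALB", "ARE", "ARG")

# mget() looks the objects up by name and returns them as a list.
reg_math <- mget(paste("reg_math", country_list, sep = "_"))
names(reg_math) <- country_list

# Hypothetical helper: two-sided p-value from the t statistic.
add_p <- function(d) {
  d$p.value <- (1 - pt(abs(d$reg.t.value), df = d$dof)) * 2
  d
}

a_m5 <- lapply(reg_math, add_p)
a_m5$ALB
```

Each result is then available as a_m5$ALB, a_m5$ARE and so on, instead of as separately named variables.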

How to create a loop of ppcor function?

I am trying to create a loop to go through and perform a correlation (and in future a partial correlation) using ppcor function on variables stored within a data frame. The first variable (A) will remain the same for all correlations, whilst the second variable (B) will be the next variable along in the next column within my data frame. I have around 1000 variables.
I show the mtcars dataset below as an example, as it is in the same layout as my data.
I've been able to complete the operation successfully when performed manually using cbind to bind 2 columns (the 2 variables of interest) prior to running ppcor on the array ("tmp_df"). I have then been able to bind the output from correlation operation ("mpg_cycl"), ("mpg_disp") into a single object. However I can't get any of this operation to work in a loop. Any ideas please?
library("MASS")
install.packages("ppcor")
library("ppcor")
mtcars_df <- as.data.frame(mtcars)
tmp_df = cbind(mtcars_df$mpg, mtcars_df$cycl)
mpg_cycl <- pcor(as.matrix(tmp_df), method = 'spearman')
tmp_df1= cbind(mtcars_df$mpg, mtcars_df$disp)
mpg_disp <- pcor(as.matrix(tmp_df1), method = 'spearman')
combined_table <- do.call(cbind, lapply(list("mpg_cycl" = mpg_cycl,
mpg_disp" = mpg_disp), as.data.frame, USE.NAMES = TRUE))
Attempting to loop the above operation (amended after the last reviewer's comments):
for (i in mtcars_df[2:7]){
tmp_df = (cbind(i, mtcars_df$mpg)
i <- pcor(as.matrix(tmp_df), method = 'spearman')
write.csv(i, file = paste0("MyDataOutput",i[1],".csv")
}
I expected the loop to write the correlation results to the MyDataOutput csv file. Instead it generates an error message, although I thought i was in the correct place:
Error: unexpected symbol in:
" tmp_df = (cbind(i, mtcars_df$mpg)
i"
Even adding a curly bracket at the end does not resolve the issue, so I have left it out, as it introduces another error message about '}'.
I have reworked some of your code and fixed the missing ), } and ". The for loop now writes one file per variable, named MyDataOutput_ followed by the variable name. Hope this helps.
library("MASS")
#install.packages("ppcor")
library("ppcor")
mtcars_df <- as.data.frame(mtcars)
tmp_df = cbind(mtcars_df$mpg, mtcars_df$cyl)
mpg_cycl <- pcor(as.matrix(tmp_df), method = 'spearman')
tmp_df1= cbind(mtcars_df$mpg, mtcars_df$disp)
mpg_disp <- pcor(as.matrix(tmp_df1), method = 'spearman')
combined_table <- do.call(cbind, lapply(list("mpg_cycl" = mpg_cycl,
"mpg_disp" = mpg_disp), as.data.frame, USE.NAMES = TRUE))
for(i in colnames(mtcars_df[2:7])){
  # Pair each variable with mpg, selecting columns by name
  tmp_df = mtcars_df[c(i,"mpg")]
  i_result <- pcor(as.matrix(tmp_df), method = 'spearman')
  write.csv(i_result, file = paste0("MyDataOutput_",i,".csv"))
}
To merge the results before saving:
dta <- c()
for(i in colnames(mtcars_df[2:7])){
  tmp_df = mtcars_df[c(i,"mpg")]
  i_result <- pcor(as.matrix(tmp_df), method = 'spearman')
  # One row per variable: its name followed by the flattened pcor output
  dta <- rbind(dta,c(i,(unlist( i_result))))
}
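As an alternative to growing a matrix with rbind inside the loop, the results can be collected into a list of one-row data frames and combined once at the end. This is a sketch only: it uses base R's cor.test() as a stand-in for pcor() so that it runs without extra packages, and the output column names are my own choice.

```r
vars <- colnames(mtcars)[2:7]

rows <- lapply(vars, function(v) {
  # Spearman correlation of mpg with each variable in turn;
  # exact = FALSE avoids the ties warning for exact p-values.
  ct <- cor.test(mtcars$mpg, mtcars[[v]], method = "spearman", exact = FALSE)
  data.frame(variable = v,
             estimate = unname(ct$estimate),
             p.value = ct$p.value)
})

# Combine the one-row data frames into a single table
result <- do.call(rbind, rows)
result
```

The combined table keeps numeric columns numeric, and can be written out with a single write.csv(result, ...) call.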

What is wrong with my pattern matching and replacement function

I have a dataframe with temperatures in the format XX,X instead of XX.X.
I can use the following code to successfully change them...
df$tempMedian <- sub(",",".",df$tempMedian)
df$tempMedian <- as.numeric(df$tempMedian)
I've tried writing the following function to do the same thing:
comma_to_point <- function(data, colname){
data$colname <- sub(",", ".", data$colname)
data$colname <- as.numeric(data$colname)
}
When I call the function:
comma_to_point(df, tempMedian)
I get the following error:
"Error in `$<-.data.frame`(`tmp`, colname, value = character(0)) :
replacement has 0 rows, data has 365"
My dataframe is 365 obs long.
Give this a shot
comma_to_point <- function(data, colname){
data[[colname]] <- sub(",", ".", data[[colname]])
data[[colname]] <- as.numeric(data[[colname]])
return (data)
}
df = comma_to_point(df, "tempMedian")
When using a variable var='my_column' to reference a column in a data.frame, you can't do df$var, since R will think var is the name of the column. Instead you can get the column with df[[var]].
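The difference between the two forms can be seen on a small example. This is a minimal sketch with a made-up two-row data frame:

```r
df <- data.frame(tempMedian = c("12,5", "13,0"), stringsAsFactors = FALSE)
colname <- "tempMedian"

# $ takes the name literally: there is no column called "colname"
is.null(df$colname)

# [[ ]] evaluates the variable and uses its value as the column name
df[[colname]] <- as.numeric(sub(",", ".", df[[colname]], fixed = TRUE))
df$tempMedian
```

This is also why the function must return the modified data frame: the change is made to a copy inside the function, not to the caller's df.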

Function to create new binary variables within existing dataframe?

This question is related to a previous topic:
How to use custom function to create new binary variables within existing dataframe?
I would like to use a similar function but be able to use a vector to specify ICD9 diagnosis variables within the dataframe to search for (e.g., "diag_1", "diag_2","diag_1", etc )
I tried
y<-c("diag_1","diag_2","diag_1")
diagnosis_func(patient_db, y, "2851", "Anemia")
but I get the following error:
Error in `[[<-`(`*tmp*`, i, value = value) :
recursive indexing failed at level 2
Below is the working function by Benjamin from the referenced post. However, it works for only one diagnosis variable at a time. Ultimately I need to create a new binary variable that indicates whether a patient has a specific diagnosis by querying the 25 diagnosis variables of the dataframe.
*target_col is the ICD9 diagnosis variable ("diag_1"..."diag_20"); this is the argument I would like to pass as a vector.
diagnosis_func <- function(data, target_col, icd, new_col){
pattern <- sprintf("^(%s)",
paste0(icd, collapse = "|"))
data[[new_col]] <- grepl(pattern = pattern,
x = data[[target_col]]) + 0L
data
}
diagnosis_func(patient_db, "diag_1", "2851", "Anemia")
This non-function version works for multiple diagnosis. However I have not figured out how to use it in a function version as above.
pattern = paste("^(", paste0("2851", collapse = "|"), ")", sep = "")
df$anemia<-ifelse(rowSums(sapply(df[c("diag_1","diag_2","diag_3")], grepl, pattern = pattern)) != 0,"1","0")
Any help or guidance on how to get this function to work would be greatly appreciated.
Best,
Albit
Try this modified version of Benjamin's function:
diagnosis_func <- function(data, target_col, icd, new_col){
pattern <- sprintf("^(%s)",
paste0(icd, collapse = "|"))
new <- apply(data[target_col], 2, function(x) grepl(pattern=pattern, x)) + 0L
data[[new_col]] <- ifelse(rowSums(new)>0, 1,0)
data
}
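A quick check of the multi-column version on made-up data. The function is repeated so that the snippet runs on its own, and patient_db here is a hypothetical four-row example, not the asker's data.

```r
diagnosis_func <- function(data, target_col, icd, new_col) {
  pattern <- sprintf("^(%s)", paste0(icd, collapse = "|"))
  # One logical column per diagnosis variable, converted to 0/1
  new <- apply(data[target_col], 2, function(x) grepl(pattern = pattern, x)) + 0L
  # Flag the patient if any diagnosis column matched
  data[[new_col]] <- ifelse(rowSums(new) > 0, 1, 0)
  data
}

# Hypothetical toy data
patient_db <- data.frame(
  diag_1 = c("2851", "4019", "25000", "4280"),
  diag_2 = c("4019", "4280", "28510", "4019"),
  diag_3 = c("V5861", "25000", "4019", "2851"),
  stringsAsFactors = FALSE
)

y <- c("diag_1", "diag_2", "diag_3")
out <- diagnosis_func(patient_db, y, "2851", "Anemia")
out$Anemia
```

Because the pattern is anchored with ^, a code such as "28510" also counts as a match, so the third patient is flagged as well.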
