purrr:pmap not passing values from a list - r

Calling a function from pmap throws an error because the arguments are not being passed individually.
I tried creating lists of the parameters, but this also resulted in an error.
library(tidyverse)
library(dplyr)
periods <- c(10,11,12)
redemption <- rep(100,3)
firstcallDate <- c("2014-01-01","2015-01-01","2016-01-01")
testdf <- data.frame(redemption, periods, firstcallDate)
testdf$firstcallDate <- as.Date(testdf$firstcallDate)
testdf$CallSch <- NA
CallScheduleGen <- function (redemption, periods, firstcallDate, ...) {
  Price <- rep(as.double(redemption), periods)
  Date <- seq(firstcallDate, by = "1 month", length.out = periods)
  callSch <- data.frame(Price, Date)
  return(callSch)
}
testdf$CallSch <- pmap_dfr(testdf,CallScheduleGen)
I am expecting a data frame to be created in each cell of the CallSch column of testdf. pmap appears to pass the input arguments to the function as a list rather than element-wise.
Can anyone suggest an approach? I need this to be vectorized rather than written as a loop.
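One likely cause, offered as a sketch rather than a confirmed diagnosis: pmap_dfr row-binds the three schedules into a single data frame of 10 + 11 + 12 = 33 rows, which cannot be assigned to a column of a 3-row data frame. The base R version below stores one schedule per row in a list-column instead. Iterating over row indices is an assumption made here so that the Date class of firstcallDate is certain to be preserved.

```r
periods <- c(10, 11, 12)
redemption <- rep(100, 3)
firstcallDate <- as.Date(c("2014-01-01", "2015-01-01", "2016-01-01"))
testdf <- data.frame(redemption, periods, firstcallDate)

# Same generator as in the question: one schedule (data frame) per call
CallScheduleGen <- function(redemption, periods, firstcallDate, ...) {
  data.frame(
    Price = rep(as.double(redemption), periods),
    Date  = seq(firstcallDate, by = "1 month", length.out = periods)
  )
}

# Build a list with one data frame per row and store it as a list-column
testdf$CallSch <- lapply(seq_len(nrow(testdf)), function(i) {
  CallScheduleGen(
    testdf$redemption[i],
    testdf$periods[i],
    testdf$firstcallDate[i]
  )
})

# Each cell of CallSch now holds its own schedule
nrow(testdf$CallSch[[1]])
```

purrr::pmap(), which returns a list instead of a row-bound data frame, would be the closest purrr counterpart to the lapply call.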

Related

Pass a function input as column name to data.frame function

I have a function taking a character input. Within the function, I want to use the data.frame() function. Within the data.frame() function, one column name should be the function's character input.
I tried it like this and it didn't work:
frame_create <- function(data, character_input){
  ...
  some_vector <- c(1:50)
  temp_frame <- data.frame(character_input = some_vector, ...)
  return(temp_frame)
}
Either assign the name with names() or use setNames(), because = does not evaluate its left-hand side. In tidyverse functions such as tibble() or lst(), the column can be created with := and !!.
frame_create <- function(data, character_input){
  some_vector <- 1:50
  temp_frame <- data.frame(some_vector)
  names(temp_frame) <- character_input
  return(temp_frame)
}
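The setNames() variant mentioned above could look like the following minimal sketch. The helper name make_named_frame and the column name "my_column" are made up for illustration.

```r
# make_named_frame is a hypothetical helper, not part of the original question
make_named_frame <- function(character_input) {
  some_vector <- 1:50
  # setNames() attaches the column name after the data frame is built,
  # so the name can come from a variable
  setNames(data.frame(some_vector), character_input)
}

result <- make_named_frame("my_column")
names(result)
```

Because the name is applied after construction, no evaluation on the left-hand side of = is needed.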
Can you explain your requirement for using a function to create a new dataframe column? If you have a dataframe df and you want to make a copy with a new column appended then the trivial solution is:
df2 <- df
df2$new_col <- 1:50
Example of row-binding multiple data frames in R:
cars1 <- mtcars
cars2 <- cars1
cars3 <- cars2
list1 <- list(cars1, cars2, cars3)
all_cars <- Reduce(rbind, list1)

Storing Result from API Call in Data Frame through Loop in R

I would like to store a result of an API call in a data frame. The code should loop through different time periods and countries.
If I do one example, it's like this:
testapicall <- jsonlite::fromJSON("https.api.companyname/jurisdiction=eu_ger&date=2018-01-01:2018-01-31&api_token=XYZ")
testapicall[["results"]]$total_number
Now I want to get this "total number" for different jurisdictions and date ranges. One column should be the country name, one should be the date (e.g., 01-2018), and one should be the total_number.
To set up the loop, I've split the API URL into 3 parts:
base_string1 <- "https.api.companyname/jurisdiction="
base_string2 <- "&date="
end_string <- "&api_token=XYZ"
Then, I can create the dates and countries like this:
dates <- seq(as.Date("1990-01-01"), as.Date("2022-01-01"), by = "month")
dates <- paste(head(dates, -1), tail(dates-1, - 1), sep = ":")
countries<- paste0("eu_", c("fra", "ger"))
Now, I'd like to get the results for each country-date into a data frame, but this is the part I am not sure how to do. I know I have to make an empty data frame and then fill it somehow.
for (y in dates){
  for(c in countries){
    # jurisdiction takes the country, date takes the date range
    api_string <- paste0(base_string1, c, base_string2, y, end_string)
    json <- jsonlite::fromJSON(api_string)
    json[["results"]]$total_number
  }
}
Any help is appreciated!
You can use map_dfr from purrr to iterate over the countries and dates and generate a dataframe with a single row for each iteration. map_dfr will row bind all the dataframes together.
library(purrr)
map_dfr(dates, function(date){
  map_dfr(countries, function(country){
    # jurisdiction takes the country, date takes the date range
    api_string <- paste0(base_string1, country, base_string2, date, end_string)
    json <- jsonlite::fromJSON(api_string)
    data.frame(country = country,
               date = date,
               total_number = json[["results"]]$total_number)
  })
})
Consider expand.grid to build all combinations of country and month into a data frame, and then mapply to create a new column that retrieves the API data elementwise for each country and month range.
Also, consider a user-defined function that uses tryCatch (useful with API calls) to return NA on error instead of stopping on problematic URLs.
# INPUTS
dates <- seq(as.Date("1990-01-01"), as.Date("2022-01-01"), by="month")
countries <- paste0("eu_", c("fra", "ger"))
# USER-DEFINED METHODS
get_api_data <- function(cnty, rng) {
  url <- paste0(
    "https.api.companyname/jurisdiction=", cnty,
    "&date=", rng, "&api_token=XYZ"
  )
  # tryCatch returns the handler's value on error, so capture its result
  api_response <- tryCatch(
    jsonlite::fromJSON(url),
    error = function(e) {
      message(url, " : ", conditionMessage(e))
      NULL
    }
  )
  if (is.null(api_response)) return(NA_real_)
  return(api_response$results$total_number)
}
add.months <- function(date, n)
  seq.Date(date, by=paste(n, "months"), length = 2)[2]
# BUILD DATA FRAME
api_results_df <- expand.grid(
  country = countries, date = dates
) |> within({
  month_add <- sapply(date, add.months, n=1) |> `class<-`("Date")
  ranges <- paste(date, month_add-1, sep=":")
  # PASS COLUMN DATA ELEMENTWISE INTO DEFINED METHOD
  total_number <- mapply(get_api_data, cnty=country, rng=ranges)
  rm(ranges, month_add)
})

How to apply function to each group of data frame

I have a dataframe called covars with three ethnicities. How do I apply function Get_STATs so I can get the output for each ethnicity?
Right, now I am running it like this:
tt <- covars[covars$ETHNICITY == "HISPANIC",]
Get_STATs(tt)
tt <- covars[covars$ETHNICITY == "WHITE",]
Get_STATs(tt)
tt <- covars[covars$ETHNICITY == "ASIAN",]
Get_STATs(tt)
I tried to run it like this
aggregate(covars, by = list(covars$ETHNICITY), FUN = Get_STATs)
which generates the error: $ operator is invalid for atomic vectors
aggregate applies the function to each column separately, so we may need by instead, which passes each group's data frame to the function:
do.call(rbind, by(covars, covars$ETHNICITY, FUN = Get_STATs))
Or split into a list and loop over the list and apply the function
do.call(rbind, lapply(split(covars, covars$ETHNICITY), Get_STATs))
If we need the ETHNICITY names as well
lst1 <- split(covars, covars$ETHNICITY)
do.call(rbind, Map(cbind, ETHNICITY = names(lst1), lapply(lst1, Get_STATs)))
If Get_STATs is written with dplyr verbs that respect grouping, you can pass it a grouped data frame:
tt <-
  covars %>%
  group_by(ETHNICITY) %>%
  Get_STATs()
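As an illustration of the split and lapply approach from the first answer, here is a minimal sketch. Both the covars data and the body of Get_STATs are invented stand-ins, since the question shows neither.

```r
# Hypothetical data: the real covars is not shown in the question
covars <- data.frame(
  ETHNICITY = c("HISPANIC", "WHITE", "ASIAN", "WHITE", "ASIAN", "HISPANIC"),
  AGE = c(30, 40, 50, 60, 70, 20)
)

# Hypothetical stand-in for Get_STATs: takes one group's data frame
# and returns a one-row summary
Get_STATs <- function(d) {
  data.frame(n = nrow(d), mean_age = mean(d$AGE))
}

# One data frame per ethnicity, then one summary row per group
groups <- split(covars, covars$ETHNICITY)
stats_by_group <- do.call(rbind, lapply(groups, Get_STATs))
stats_by_group
```

The row names of stats_by_group carry the ETHNICITY values, because split names each list element after its group.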

Create new column in each dataframe of list and fill with string from character vector based on position (R)

I have a list of data frames and a character vector with strings. The number of dataframes and the number of strings in the chr are the same.
I'd like to populate a specific column in each dataframe in the list with the string at the corresponding position in the character vector
dfs<-list(mtcars[,1:4], iris[,1:4])
dfs <- lapply(dfs, function(x) transform(x, mycol=""))
z <- c("red", "blue")
As the final output I'd like
dfs[[1]]$mycol to be populated with red and
dfs[[2]]$mycol to be populated with blue
Conceptually, I think I need to do something like this:
dfs <- lapply(dfs, function(n) dfs[[n]]$mycol <- z[n]), but I get the error
Error in z[n] : invalid subscript type 'list'
The real data is a list of 97 elements
You can also try creating directly mycol with mapply():
#Data
dfs<-list(mtcars[,1:4], iris[,1:4])
z <- c("red", "blue")
#Code
L <- mapply(function(x,y) {x$mycol<-y;return(x)},x=dfs,y=z,SIMPLIFY = F)
This is what you're after
lapply(1:n, function(x) transform(dfs[x], mycol = z[x]))
When you want to apply over several objects by index, simply apply over 1:n (where n is the number of data frames) and use the index to subset each object inside the anonymous function.
EDIT: transform only works when every object it references can be found from the environment in which it evaluates its arguments. The previous code therefore throws the error object 'x' not found, because dfs exists in .GlobalEnv while x exists only in the function environment.
The below code works
lapply(c(1:n), function(x) {
toreplace <<- z[x] # forcing to parent envir using <<-
base::transform(dfs[x], mycol = toreplace)
})
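The index-based idea can also be sketched without transform and without <<-, by assigning the column directly inside the anonymous function. This is a minimal variant on the question's sample data, and the name dfs_tagged is made up for illustration.

```r
dfs <- list(mtcars[, 1:4], iris[, 1:4])
z <- c("red", "blue")

# Loop over positions so the same index selects both the data frame
# and the matching string
dfs_tagged <- lapply(seq_along(dfs), function(i) {
  df <- dfs[[i]]      # [[ ]] extracts the data frame itself, not a list
  df$mycol <- z[i]    # the single string is recycled down the column
  df
})

unique(dfs_tagged[[1]]$mycol)
unique(dfs_tagged[[2]]$mycol)
```

seq_along(dfs) is used instead of 1:n so the loop length always matches the list, which should also hold for the 97-element list mentioned in the question.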
We could use Map with transform in base R
Map(transform, dfs, mycol = z)
Or map2 from purrr
library(purrr)
library(dplyr)
map2(dfs, z, ~ .x %>%
mutate(mycol = .y))

Compute median per column in loop

I have this loop to compute the mean per column, which works.
for (i in 1:length(DF1)) {
  tempA <- DF1[i] # save column of DF1 onto temp variable
  names(tempA) <- 'word' # label temp variable for inner_join function
  DF2 <- inner_join(tempA, DF0, by='word') # match words with numeric value from look-up DF0
  tempB <- as.data.frame(t(colMeans(DF2[-1]))) # compute mean of column
  DF3 <- rbind(tempB, DF3) # save results together
}
The script uses the dplyr package for inner_join.
DF0 is the look-up database with 4 columns (word, value1, value2, value3).
DF1 is the text data with one word per cell.
DF3 is the output.
Now I want to compute the median instead of the mean. It seemed easy enough with the colMedians function from 'robustbase', but I can't get the below to work.
library(robustbase)
for (i in 1:length(DF1)) {
  tempA <- DF1[i]
  names(tempA) <- 'word'
  DF2 <- inner_join(tempA, DF0, by='word')
  tempB <- as.data.frame(t(colMedians(DF2[-1])))
  DF3 <- rbind(tempB, DF3)
}
The error message reads:
Error in colMedians(tog[-1]) : Argument 'x' must be a matrix.
I've tried to format DF2 as a matrix prior to the colMedians function, but still get the error message:
Error in colMedians(tog[-1]) : Argument 'x' must be a matrix.
I don't understand what is going on here. Thanks for the help!
Happy to provide sample data and error traceback, but trying to keep it as crisp and simple as possible.
According to the comment by the OP, the following solved the problem.
I have added a call to library(dplyr).
My contribution was colMedians(data.matrix(DF2[-1]), na.rm = TRUE).
library(robustbase)
library(dplyr)
for (i in 1:length(DF1)) {
  tempA <- DF1[i]
  names(tempA) <- 'word'
  DF2 <- inner_join(tempA, DF0, by='word')
  tempB <- colMedians(data.matrix(DF2[-1]), na.rm = TRUE)
  DF3 <- rbind(tempB, DF3)
}
Stumbled on this answer which helped me fix the loop as following:
DF3Mean <- data.frame() # instantiate data frame
DF4Median <- data.frame() # instantiate data frame
for (i in 1:length(DF1)) {
  tempA <- DF1[i] # save column of DF1 onto temp variable
  names(tempA) <- 'word' # label temp variable for inner_join function
  DF2 <- inner_join(tempA, DF0, by='word') # match words with numeric value from look-up DF0
  tempMean <- as.data.frame(t(colMeans(DF2[-1]))) # compute mean of each column
  DF3Mean <- rbind(tempMean, DF3Mean) # save results together
  tempMedian <- apply(DF2[ ,2:4], 2, median) # compute median for columns 2, 3, and 4
  DF4Median <- rbind(tempMedian, DF4Median) # save results together
}
I guess I was too fixated on the colMedians function.
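For reference, the per-column median step can be tried in isolation with base R only. The small DF2 below is invented sample data in the shape described in the question (a word column followed by three numeric columns).

```r
# Hypothetical joined data: one word column plus three numeric columns
DF2 <- data.frame(
  word   = c("a", "b", "c"),
  value1 = c(1, 5, 3),
  value2 = c(10, 20, 60),
  value3 = c(2, 2, 8)
)

# Drop the word column, then take the median of each remaining column.
# vapply works on data frame columns directly, so no matrix conversion is needed.
col_medians <- vapply(DF2[-1], median, numeric(1), na.rm = TRUE)
col_medians
```

This avoids the matrix requirement of colMedians altogether, at the cost of being slower on very wide data.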
