issues with Iterating over f(x, y) using map2_ - r

I am using a wrapper for the LastFM API to search for track Tags.
the wrapper function is...
devtools::install_github("juyeongkim/lastfmr")
track_getInfo("track", "artist", api_key= lastkey)
I defined my own function as
INFOLM <- function(x= track, y= artist) {
output <- track_getInfo(x,y,api_key = lastkey)
output <- flatten(output)
output_1 <- output[["tag"]][["name"]]
return(output_1)
}
Then prepared my list elements from my larger data frame
artist4lf <- c(small_descriptive[1:10,2])
track4lf <- c(small_descriptive[1:10,3])
x<- vector("list", length = length(track4lf))
y<- vector("list", length = length(artist4lf))
names(x) <- track4lf
names(y) <- artist4lf
Then...
map2_df(track4lf, artist4lf, INFOLM)
I get a 0x0 tibble back everytime... does anyone have a suggestion?

I think your INFOLM function will work if you just delete the api_key argument from the track_getInfo function.
Also, not sure you need to use purrr::map2 here, you should be able to use your small_descriptive dataframe with rowwise and mutate to add the column(s) you want.
Here's a go, using testdf as if it's your small_descriptive dataframe with only the track and artist columns.
library(lastfmr)
library(dplyr)
library(tidyr)
testdf <- tribble(
~Artist, ~Track,
"SmashMouth", "All Star",
"Garth Brooks", "The Dance"
)
INFOLM <- function(x= track, y= artist) {
output <- track_getInfo(x,y)
output <- flatten(output)
output_1 <- output[["tag"]][["name"]]
return(paste(output_1, collapse = ","))
}
testdf %>% rowwise %>%
mutate(stuff = INFOLM(Track, Artist)) %>%
tidyr::separate(stuff, c("Tag1", "Tag2", "Tag3", "Tag4", "Tag5"), sep = ",")

Related

Create a list of all dataframes/tibbles in the global environment?

How does one create a named list of all dataframes/tibbles in the global environment in R? Is there a way to do this without manually hardcoding all dataframes/tibbles?
I.e. if the global environment contains the dataframes/tibbles df_1, my_data_1, science_1, all_data, how does one create an output that looks like:
files_list <- list(
df_1 = df_1,
my_data_1 = my_data_1,
science_1 = science_1,
all_data = all_data
)
We may Filter the elements that are data.frame or tibble in the environment that we are working on - e.g. in the global env, it can be
Filter(length, eapply(.GlobalEnv,
function(x) if(is.data.frame(x)||is_tibble(x)) x))
We can get all objects first, then keep only the data.frames
library(purrr)
mget(ls()) %>% keep(is.data.frame)
A base way, combining methods of #GuedesBF and #akrun could be using ls, mget and Filter.
Filter(is.data.frame, mget(ls()))
#Filter(is.data.frame, mget(ls(.GlobalEnv))) #More explicit using globEnv
Please try the below code which will generate a df
naml <- list()
for (i in seq_along(ls(envir =.GlobalEnv))) {
j <- ls(envir =.GlobalEnv)[i]
if (any(class(get(j))=='data.frame')) name <- {j} else name <- NA
if (any(class(get(j))=='data.frame')) class <- class(get(j))[3] else class <- NA
if (!is.na(name) & !is.na(class)) {
df <- data.frame(namex=name,classx=class)
naml[[j]] <- df
}
}
df2 <- do.call(rbind, naml) %>% rownames_to_column('name') %>%
pivot_wider(names_from = name, values_from = namex)

Function for looping over common variable names with different suffixes in R

I have some code which I'm looking to replicate many times, each for a different country as the suffix.
Assuming 3 countries as a simple example:
country_list <- c('ALB', 'ARE', 'ARG')
I'm trying to create a series of variables called a_m5_ALB, a_m5_ARE, a_m5_ARG etc which have various functions e.g. addcol or round_df applied to reg_math_ALB, reg_math_ARE, reg_math_ARG etc
for (i in country_list) {
paste("a_m5", i , sep = "_") <- addcol(paste("reg_math", i , sep = "_"))
}
for (i in country_list) {
paste("a_m5", i , sep = "_") <- round_df(paste("reg_math", i , sep = "_"))
}
where addcol and round_df are defined as:
addcol = function(y){
dat1 = mutate(y, p.value = ((1 - pt(q = abs(reg.t.value), df = dof))*2))
return(dat1)
}
round_df <- function(x, digits) {
numeric_columns <- sapply(x, mode) == 'numeric'
x[numeric_columns] <- round(x[numeric_columns], digits)
x
}
The loop errors when any of the functions are added in brackets before the paste variable part but it works if doing it manually e.g.
a_m5_ALB <- addcol(reg_math_ALB)
Please could you help? I think it's the application of the function in a loop which i'm getting wrong.
Errors:
Error in UseMethod("mutate_") :
no applicable method for 'mutate_' applied to an object of class "character"
Error in round(x[numeric_columns], digits) :
non-numeric argument to mathematical function
Thank you
From your examples, you're really in a case where everything should be in a single dataframe. Here, keeping separate variables for each country is not the right tool for the job. Say you have your per-country dataframes saved as csv, you can rewrite everything as:
library(tidyverse)
country_list <- c('ALB', 'ARE', 'ARG')
read_data <- function(ctry){
read_csv(paste0("/path/to/file/", "reg_math_", ctry)) %>%
add_column(country = ctry)
}
total_df <- map_dfr(country_list, read_data)
total_df %>%
mutate(p.value = (1 - pt(q = abs(reg.t.value), df = dof))*2) %>%
mutate(across(where(is.numeric), round, digits = digits))
And it gives you immediate access to all other dplyr functions that are great for this kind of manipulation.

Retain value from nested for loop

So basically I am trying the following loop:
rawData = read.csv(file = "SampleData.csv")
companySplit = split(rawData, rawData$Company)
NameOfCompany <- numeric()
DateOfOrder <- character()
WhichProducts <- numeric()
for (i in 1:length(companySplit)){
company_DateSplit = split(companySplit[[i]], companySplit[[i]]$Date)
for (j in 1:length(company_DateSplit)){
WhichProducts[j] <- (paste0(company_DateSplit[[j]]$ID, collapse=","))
DateOfOrder[j] <- (paste0(company_DateSplit[[j]]$Date[1]))
NameOfCompany[j] <- (paste0(companySplit[[i]]$Company[[1]]))
}
}
df <- data.frame(NameOfCompany,DateOfOrder, WhichProducts)
write.csv(df, file = "basket.csv")
If you check basket.csv there is output for only company D. It is not writing because of nesting of for loops I guess. I am not able to get out of it.
I need exact output as basket.csv but for all companies.
Here are the CSVs:
Input Data: Link
Output of code basket.csv: Link
The output should look like this:
Company,Date, All IDs comma seperated.
e.g.
A,Jan-18,(1,2,4)
A,Feb-18,(1,4)
B,Jan-18,(2,3,4)
I'm able to get it from the above code. But Not able to save it in CSV for all A,B,C,D companies. It saves values for only company D which is the last value in looping. (check output file link)
The initial error is that you import your data without the parameter stringsAsFactors = FALSE which happens all the time. Also, looping in R is usually less efficient and harder to reason about than using a more functional approach. I think what you're trying to do can be done with the aggregate function
rawData <- read.csv(file = "SampleData.csv", stringsAsFactors = FALSE)
df <- aggregate(ID ~ Company + Date, data = rawData, FUN = paste, collapse = ",")
colnames(df) <- c("NameOfCompany", "DateOfOrder", "ID")
df = split(df, df$NameOfCompany)
Or using a tidy approach
df <- rawData %>% group_by(Company, Date) %>%
summarise(WhichProducts=paste(ID,collapse=',')) %>%
rename(DateOfOrder = Date) %>%
rename(NameOfCompany = Company) %>%
group_split()

How do I make a function in R with a mass amount of code?

I think this is a fairly basic question, as I am a new R user, but I want to make it so that I can activate the entire code below with a single entry/word (I presumed it would be a function). If this has already been asked, I apologize, and please refer me to the link where it is answered. Thank you in advance for all help.
My code:
head(yelp, 10)
str(yelp)
yelp_flat<- flatten(yelp)
str(yelp_flat)
library(tibble)
yelp_tbl <- as_data_frame(yelp_flat)
yelp_tbl
yelp_tbl$newcolumn <- NULL
yelp_tbl$newcolumn1 <- NULL
yelp_tbl$shotClock <- NULL
yelp_tbl$period <- NULL
yelp_tbl$wallClock <- NULL
yelp_tbl$gameClock <- NULL
yelp_tbl$gameClockStopped <- NULL
yelp_tbl$ball <- NULL
head(yelp_tbl)
good <- unnest(yelp_tbl) #extracts xyz from original dataframe
library(tidyr)
player <- good %>% separate(xyz, c("player_x", "player_y", "player_z"), sep = ",")
finish <- player %>% separate(xyz1, c("player_x", "player_y", "player_z"), sep = ",")
k <- finish %>% separate(player_x, c("trash", "player_x"), sep = "c")
k$trash <- NULL
r <- k %>% separate(player_z, c("player_z", "tra"), sep = "\\)")
u <- r %>% separate(player_x, c("kol", "player_x"), sep = "\\(")
Away_Team <- u
Away_Team$garbage <- NULL
Away_Team$playerId1<- NULL
Away_Team$aplayer_x <- NULL
Away_Team$aplayer_y <- NULL
Away_Team$aplayer_z <- NULL
Away_Team$dispose <- NULL
Away_Team$brack <- NULL
Away_Team$kol <- NULL
Away_Team$tra <- NULL
View(Away_Team)
yelp_tbl
yelp_tbl$newcolumn <- NULL
yelp_tbl$newcolumn1 <- NULL
yelp_tbl$shotClock <- NULL
yelp_tbl$period <- NULL
yelp_tbl$wallClock <- NULL
yelp_tbl$gameClock <- NULL
yelp_tbl$gameClockStopped <- NULL
yelp_tbl$ball <- NULL
head(yelp_tbl)
good <- unnest(yelp_tbl) #extracts xyz from original dataframe
library(tidyr)
player <- good %>% separate(xyz, c("player_x", "player_y", "player_z"), sep = ",")
finish <- player %>% separate(xyz1, c("player_x", "player_y", "player_z"), sep = ",")
k <- finish %>% separate(player_x, c("trash", "player_x"), sep = "c")
k$trash <- NULL
r <- k %>% separate(player_z, c("player_z", "tra"), sep = "\\)")
u <- r %>% separate(player_x, c("kol", "player_x"), sep = "\\(")
Home_Team <- u
Home_Team$garbage <- NULL
Home_Team$playerId1<- NULL
Home_Team$hplayer_x <- NULL
Home_Team$hplayer_y <- NULL
Home_Team$hplayer_z <- NULL
Home_Team$dispose <- NULL
Home_Team$brack <- NULL
Home_Team$kol <- NULL
Home_Team$tra <- NULL
View(Home_Team)
View (Away_Team)
Table <- rbind(Home_Team, Away_Team)
View(Table) #order frameIdx to see correct order
So, indeed you should make a function. Here are some steps to follow:
1. Put all your code in your function
my_function <- function(){
# Your code
}
2. Identify what you have as an input (aka, what your are not building in your code), they will become the argument of your function
my_function <- function(arg1, arg2, ...){
# Your code
}
In your example, I identified yelp
3. Identify what you want to output (ideally only one object), they will be in the return of your function
my_function <- function(arg1, arg2, ...){
# Your code
return(output)
}
In your example I identified Table
4. Take all the import/library and put them outside your function
library(lib1)
my_function <- function(arg1, arg2, ...){
# Your code
return(output)
}
EDIT using #r2evans suggestion: Using libraryis generally used instead of require, here and here is some literature on it.
In your code I identified tidyr and tibble
5. Identify what you want to print/View and what was just for debugging. Add a print to print, suppres what you don't want
6. Add some comments/slice your code
For example I would add something like # Creating XXX table
7. Improve code quality
You should try to minimize the number of line of code (for example using loops and avoiding code to be in double). Make variables names explicit (instead of k, u, r...)
Regarding loop, in your code you drop some columns on at a time, you could do a loop to drop them in order. (It's what I have done bellow). It helps to make your code easier to read/debug. In this particular case, as Gregor said it is heaven faster to drop them all at once with using a list of column names (if you are interested check his comment).
Here you go:
There are still some improvement to do especially regarding point number 7 and 5.
library(tibble)
library(tidyr)
yelp_function <- function(yelp){
# Printing the input
print(head(yelp, 10))
print(str(yelp))
# Flatten table
yelp_flat<- flatten(yelp)
print(str(yelp_flat))
# Create yelp_tbl and drop some columns
yelp_tbl <- as_data_frame(yelp_flat)
# Drop some columns
for (col in c("newcolumn", "newcolumn1", "shotClock", "period", "wallClock", "gameClock", "gameClockStopped", "ball")){
yelp_tbl[, col] <- NULL
}
print(head(yelp_tbl))
# Build some table
good <- unnest(yelp_tbl) #extracts xyz from original dataframe
player <- good %>% separate(xyz, c("player_x", "player_y", "player_z"), sep = ",")
finish <- player %>% separate(xyz1, c("player_x", "player_y", "player_z"), sep = ",")
k <- finish %>% separate(player_x, c("trash", "player_x"), sep = "c")
k$trash <- NULL
r <- k %>% separate(player_z, c("player_z", "tra"), sep = "\\)")
u <- r %>% separate(player_x, c("kol", "player_x"), sep = "\\(")
# Build away team
Away_Team <- u
# Build yelp table: I'm not quite sure why you are rebdoing that... Is this code necessary?
yelp_tbl
# Drop some columns
for (col in c("newcolumn", "newcolumn1", "shotClock", "period", "wallClock", "gameClock", "gameClockStopped", "ball")){
yelp_tbl[, col] <- NULL
}
print(head(yelp_tbl))
good <- unnest(yelp_tbl) #extracts xyz from original dataframe
# Build some table
player <- good %>% separate(xyz, c("player_x", "player_y", "player_z"), sep = ",")
finish <- player %>% separate(xyz1, c("player_x", "player_y", "player_z"), sep = ",")
k <- finish %>% separate(player_x, c("trash", "player_x"), sep = "c")
k$trash <- NULL
r <- k %>% separate(player_z, c("player_z", "tra"), sep = "\\)")
u <- r %>% separate(player_x, c("kol", "player_x"), sep = "\\(")
## Build home_team
Home_Team <- u
# Drop some columns
for (col in c("garbage", "playerId1", "aplayer_x", "aplayer_y", "aplayer_z", "dispose", "brack", "kol", "tra")){
Away_Team[, col] <- NULL
Home_Team[, col] <- NULL
}
# Merge
Table <- rbind(Home_Team, Away_Team)
# Return
return(Table)
}
View(Table) #order frameIdx to see correct order
Run it:
To run your code you now just have to execute the function with the needed argument:
yelp_function(yelp)
NB 1: please note that I didn't tested the code since you didn't provide data to run it. To improve your question you should give some data using dputfunction.
NB 2: There is always room for improvement in the code so you might want to go further and llok into refactoring to avoid having code in double. Control your inputs with some sanity check...
It's rather simple.
You do this:
foo <- function{
#all your code goes here
}
Then you call your function by typing (in console for instance):
foo()

Data Harvesting in R: Get nested lists, unlist, make edits, re-nest them back

the following code harvests data from a website. I retrieve a list of lists, I want to unlist one of the lists, edit it, then re-nest it back into the data into the form the data was received. Here is my code below, it fails one the re-nesting.
library(jsonlite)
library(plyr)
library(ckanr)
library(purrr)
library(dplyr)
ckanr_setup(url = "https://energydata.info/")
package_search(q = 'organization:world-bank-grou')$count
json_data2 <- fromJSON("https://energydata.info/api/3/action/package_search?q=organization:world-bank-grou", flatten = TRUE)
dat2 <- json_data2$result
str(dat2)
###########
#Get the datasets and unlist metadata
###########
df <- as.data.frame(json_data2$result$results)
Tags <- select(df, id, topic)
#Make some edits
Tags$topic <- tolower(Tags$topic)
res <- rbind.fill(lapply(Tags,function(y){as.data.frame(t(y),stringsAsFactors=FALSE)}))
res$V1 = paste0("Some edit:",res$V1)
res$V2 = paste0("Some edits:", res$V2)
res$V3 = paste0("Some edit:", res$V3)
res[res=="Some edit:NA"]<-NA
res$V1 <- gsub(" ", "_", res$V1)
res$V2 <- gsub(" ", "_", res$V2)
res$V3 <- sub(" ", "_", res$V3)
res
###########
#Re-nest
###########
#turning res df back into list of lists
nestedList <- flatten(by_row(res, ..f = function(x) flatten_chr(x), .labels = FALSE)) #FAILS HERE
ERROR: Error in flatten(by_row(res, ..f = function(x) flatten_chr(x),
.labels = FALSE)) : could not find function "by_row"
Unclear from the question wording exactly what kind of list of lists you want to end up with, but maybe this is what you're looking for?
res %>%
rowwise() %>%
as.list()
or
res %>%
t() %>%
as.data.frame() %>%
rowwise() %>%
as.list()

Resources