Retain value from nested for loop - r

So basically I am trying the following loop:
rawData = read.csv(file = "SampleData.csv")
companySplit = split(rawData, rawData$Company)

NameOfCompany <- numeric()
DateOfOrder <- character()
WhichProducts <- numeric()

for (i in 1:length(companySplit)) {
  company_DateSplit = split(companySplit[[i]], companySplit[[i]]$Date)
  for (j in 1:length(company_DateSplit)) {
    WhichProducts[j] <- paste0(company_DateSplit[[j]]$ID, collapse = ",")
    DateOfOrder[j] <- paste0(company_DateSplit[[j]]$Date[1])
    NameOfCompany[j] <- paste0(companySplit[[i]]$Company[[1]])
  }
}

df <- data.frame(NameOfCompany, DateOfOrder, WhichProducts)
write.csv(df, file = "basket.csv")
If you check basket.csv, there is output only for company D. I guess it is not being written correctly because of the nested for loops, and I am not able to figure it out.
I need exactly the same output as basket.csv, but for all companies.
Here are the CSVs:
Input Data: Link
Output of code basket.csv: Link
The output should look like this:
Company, Date, all IDs comma separated.
e.g.
A,Jan-18,(1,2,4)
A,Feb-18,(1,4)
B,Jan-18,(2,3,4)
I'm able to get this from the above code, but not able to save it in the CSV for all companies A, B, C, D. It saves values only for company D, which is the last company in the loop (check the output file link).

The first issue is that you import your data without stringsAsFactors = FALSE, a very common pitfall. The other problem is that your result vectors are indexed only by j, which restarts at 1 for every company, so each company's results overwrite the previous company's and only the last one (D) survives. Looping in R is also usually less efficient and harder to reason about than a more functional approach; I think what you're trying to do can be done with the aggregate function:
rawData <- read.csv(file = "SampleData.csv", stringsAsFactors = FALSE)
df <- aggregate(ID ~ Company + Date, data = rawData, FUN = paste, collapse = ",")
colnames(df) <- c("NameOfCompany", "DateOfOrder", "ID")
df = split(df, df$NameOfCompany)
Or using a tidy approach:
library(dplyr)

df <- rawData %>%
  group_by(Company, Date) %>%
  summarise(WhichProducts = paste(ID, collapse = ",")) %>%
  rename(DateOfOrder = Date) %>%
  rename(NameOfCompany = Company) %>%
  group_split()
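If the end goal is still a single basket.csv covering every company, the combined result can be written out before any split; and if one file per company is wanted instead, the named list returned by split() can be looped over. A minimal sketch, assuming df is the aggregated data frame from either approach above (df_by_company is just an illustrative name):
# one file containing all companies
write.csv(df, file = "basket.csv", row.names = FALSE)

# or one file per company
df_by_company <- split(df, df$NameOfCompany)
for (nm in names(df_by_company)) {
  write.csv(df_by_company[[nm]], file = paste0("basket_", nm, ".csv"), row.names = FALSE)
}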

Related

How to pass the unquoted name of a variable (and not its value) into a function dynamically?

I wish to write a function to analyze several identical variables in several datasets
I built the function below, but it does not work well. I am not sure how to pass a name into a function dynamically. Could someone help?
There are 10 identical variables (testvar1, testvar2, ..., testvar10) in 15 different datasets (mydata1, mydata2, ..., mydata15).
library(readxl)

to_analyze <- function(data = "mydata1", var = testvar1) {
  # reading my file in
  excelfile <- paste(data, "xlsx", sep = ".")
  dataset_name <- read_excel(excelfile)
  # populating the testvar1, testvar2, ...
  dataset_name$var_interest <- dataset_name$var # this does not work
  # I was hoping it would give dataset_name$var_interest <- dataset_name$testvar1
  # creating a smaller dataset
  eco <- dataset_name %>%
    select(id, var_interest) # I want var_interest to be testvar1 (not the value but the name)
  ## doing some analysis on that dataset
}
# creating another function for all the datasets (15 total)
fct_all <- function(x) {
  for (i in 1:15) {
    iq <- as.double(i)
    dsn <- paste("mydata", deparse(iq), sep = "")
    to_analyze(data = dsn, var = x)
  }
}

# applying the function for all the variables
all_var <- c(testvar1, testvar2, testvar3)
fct_all(all_var)
You can equivalently use data$var1 or data[['var1']]. With the second form, since you provide the variable name as a string, you can easily replace the string with a dynamic variable name:
var <- 'var1'
dataset_name$var_interest <- dataset_name[[var]]
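Applied to the function from the question, the fix looks roughly like this (a sketch that keeps var as a string and uses base subsetting for the smaller dataset):
library(readxl)

to_analyze <- function(data = "mydata1", var = "testvar1") {
  excelfile <- paste(data, "xlsx", sep = ".")
  dataset_name <- read_excel(excelfile)
  # dynamic column lookup by name
  dataset_name$var_interest <- dataset_name[[var]]
  # smaller dataset with only the columns of interest
  eco <- dataset_name[, c("id", "var_interest")]
  ## doing some analysis on that dataset
  eco
}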
Note that dplyr is quite flexible and accepts both strings and symbols, which means you can further simplify your function with one of the following forms:
library(dplyr)

# Using select then rename, with var given as a string
to_analyze <- function(data = "mydata1", var = "testvar1") {
  eco <- paste(data, "xlsx", sep = ".") %>%
    read_excel() %>%
    select(id, var) %>%
    rename(var_interest = var)
  ...
}
For the sake of interest, you could also use quasiquotation, but given that you'll eventually wrap your variables into a vector, I guess it's not that useful here. It would be something of the form:
library(dplyr)

to_analyze <- function(data = "mydata1", var = quo(testvar1)) {
  quo_var <- enquo(var)
  eco <- paste(data, "xlsx", sep = ".") %>%
    read_excel() %>%
    select(id, !!quo_var) %>%
    rename(var_interest = !!quo_var)
  ...
}
This form allows you to call your function with the raw variable name rather than a string: to_analyze(data = "mydata1", var = testvar1). But as said before, it is probably not the most useful option in your case.
Thank you @Romain. Your code worked well. I just changed the single quotes to backticks:
var <- `var`
to_analyze <- function(data = "mydata1", var = "testvar1") {
  eco <- paste(data, "xlsx", sep = ".") %>%
    read_excel() %>%
    select(id, var) %>%
    rename(var_interest = var)
  ...
}
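To then run to_analyze over every dataset/variable combination, something along these lines could replace the original fct_all (a sketch; the dataset and variable names simply follow the naming pattern from the question, and results are collected in a list):
datasets  <- paste0("mydata", 1:15)
variables <- paste0("testvar", 1:10)

results <- list()
for (d in datasets) {
  for (v in variables) {
    results[[paste(d, v, sep = "_")]] <- to_analyze(data = d, var = v)
  }
}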

Loop in R to run script

I need to run a script for each station (I was replacing the station numbers one by one in the script), but there are more than 100 stations.
I thought a loop in the script could save me time. I've never written a loop before, and I don't know if it's possible to do what I want. I've tried the code below, but it doesn't work.
Just a bit of my df08 data (txt):
RowNum,date,code,gauging_station,precp
1,01/01/2008 01:00,1586,315,0.4
2,01/01/2008 01:00,10990,16589,0.2
3,01/01/2008 01:00,17221,30523,0.6
4,01/01/2008 01:00,34592,17344,0
5,01/01/2008 01:00,38131,373,0
6,01/01/2008 01:00,44287,370,0
7,01/01/2008 01:00,53903,17314,0.4
8,01/01/2008 01:00,56005,16596,0
9,01/01/2008 01:00,56349,342,0
10,01/01/2008 01:00,57294,346,0
11,01/01/2008 01:00,64423,533,0
12,01/01/2008 01:00,75266,513,0
13,01/01/2008 01:00,96514,19187,0
Code:
station <- sample(50:150, 53, replace = F)

for (i in station) {
  df08_1 <- filter(df08, V7 == station[i])
  colnames(df08_1) <- c("Date", "gauging_station", "code", "precp")
  df08_1 <- unique(df08_1)
  final <- df08_1 %>%
    group_by(Date = floor_date(Date, "1 hour"), gauging_station, code) %>%
    summarize(precp = sum(precp))
  write.csv(final, file = "../station [i].csv", row.names = FALSE)
}
If you're not averse to using some tidyverse packages, I think you could simplify this a bit:
Updated with your new sample data - this runs ok on my computer:
Code:
library(dplyr)
dat %>%
  select(-RowNum) %>%
  distinct() %>%
  group_by(date_hour = lubridate::floor_date(date, 'hour'), gauging_station, code) %>%
  summarize(precp = sum(precp)) %>%
  split(.$gauging_station) %>%
  purrr::map(~ write.csv(.x,
                         file = paste0('../', .x$gauging_station, '.csv'),
                         row.names = FALSE))
Data:
dat <- data.table::fread("RowNum,date,code,gauging_station,precp
1,01/01/2008 01:00,1586,315,0.4
2,01/01/2008 01:00,10990,16589,0.2
3,01/01/2008 01:00,17221,30523,0.6
4,01/01/2008 01:00,34592,17344,0
5,01/01/2008 01:00,38131,373,0
6,01/01/2008 01:00,44287,370,0
7,01/01/2008 01:00,53903,17314,0.4
8,01/01/2008 01:00,56005,16596,0
9,01/01/2008 01:00,56349,342,0
10,01/01/2008 01:00,57294,346,0
11,01/01/2008 01:00,64423,533,0
12,01/01/2008 01:00,75266,513,0
13,01/01/2008 01:00,96514,19187,0") %>%
mutate(date = as.POSIXct(date, format = '%m/%d/%Y %H:%M'))
I can't comment for lack of reputation, but if the code works when you replace station [i] with the actual station number, it sounds like each station is part of the df08 object (a data frame) and has to be extracted from it.
If I understand you correctly, I would do this as follows:
stations <- c(1:100) # put your station IDs into a vector

for (i in stations) { # run the script for each entry in the vector
  # assuming that 'V7' is the name of the (unnamed) seventh column of df08,
  # it could work like this:
  df08_1 <- filter(df08, df08$V7 == i) # if your station names are strings like
                                       # 'station 1', use paste("station", i, sep = " ")
  colnames(df08_1) <- c("Date", "gauging_station", "code", "precp")
  df08_1 <- unique(df08_1)
  final <- df08_1 %>%
    group_by(Date = floor_date(Date, "1 hour"), gauging_station, code) %>%
    summarize(precp = sum(precp)) # floor_date here presumably comes from lubridate
  # automatically generate the file names; you can modify the string to whatever you want
  write.csv(final, file = paste("../station", i, ".csv", sep = ""), row.names = FALSE)
}
If this and all of the other examples don't work, could you provide us with some dummy data to work with, just to see what the df08 dataframe looks like? And also what the floor_date() function does?

Issues with iterating over f(x, y) using map2_

I am using a wrapper for the LastFM API to search for track tags.
The wrapper function is:
devtools::install_github("juyeongkim/lastfmr")
track_getInfo("track", "artist", api_key= lastkey)
I defined my own function as
INFOLM <- function(x = track, y = artist) {
  output <- track_getInfo(x, y, api_key = lastkey)
  output <- flatten(output)
  output_1 <- output[["tag"]][["name"]]
  return(output_1)
}
Then I prepared my list elements from my larger data frame:
artist4lf <- c(small_descriptive[1:10,2])
track4lf <- c(small_descriptive[1:10,3])
x<- vector("list", length = length(track4lf))
y<- vector("list", length = length(artist4lf))
names(x) <- track4lf
names(y) <- artist4lf
Then...
map2_df(track4lf, artist4lf, INFOLM)
I get a 0x0 tibble back every time... does anyone have a suggestion?
I think your INFOLM function will work if you just delete the api_key argument from the track_getInfo function.
Also, I'm not sure you need purrr::map2 here; you should be able to use your small_descriptive data frame with rowwise and mutate to add the column(s) you want.
Here's a go, using testdf as if it were your small_descriptive data frame with only the track and artist columns.
library(lastfmr)
library(dplyr)
library(tidyr)
testdf <- tribble(
  ~Artist, ~Track,
  "SmashMouth", "All Star",
  "Garth Brooks", "The Dance"
)

INFOLM <- function(x = track, y = artist) {
  output <- track_getInfo(x, y)
  output <- flatten(output)
  output_1 <- output[["tag"]][["name"]]
  return(paste(output_1, collapse = ","))
}

testdf %>%
  rowwise %>%
  mutate(stuff = INFOLM(Track, Artist)) %>%
  tidyr::separate(stuff, c("Tag1", "Tag2", "Tag3", "Tag4", "Tag5"), sep = ",")
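If you do want to stick with map2, one option is to have the mapped function return a one-row tibble so the results can be row-bound. A sketch, assuming the fixed INFOLM above and the track4lf/artist4lf vectors from the question:
library(purrr)
library(tibble)

tag_table <- map2_dfr(track4lf, artist4lf,
                      ~ tibble(Track = .x, Artist = .y, Tags = INFOLM(.x, .y)))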

How do I make a function in R with a mass amount of code?

I think this is a fairly basic question, as I am a new R user, but I want to make it so that I can activate the entire code below with a single entry/word (I presumed it would be a function). If this has already been asked, I apologize, and please refer me to the link where it is answered. Thank you in advance for all help.
My code:
head(yelp, 10)
str(yelp)
yelp_flat<- flatten(yelp)
str(yelp_flat)
library(tibble)
yelp_tbl <- as_data_frame(yelp_flat)
yelp_tbl
yelp_tbl$newcolumn <- NULL
yelp_tbl$newcolumn1 <- NULL
yelp_tbl$shotClock <- NULL
yelp_tbl$period <- NULL
yelp_tbl$wallClock <- NULL
yelp_tbl$gameClock <- NULL
yelp_tbl$gameClockStopped <- NULL
yelp_tbl$ball <- NULL
head(yelp_tbl)
good <- unnest(yelp_tbl) #extracts xyz from original dataframe
library(tidyr)
player <- good %>% separate(xyz, c("player_x", "player_y", "player_z"), sep = ",")
finish <- player %>% separate(xyz1, c("player_x", "player_y", "player_z"), sep = ",")
k <- finish %>% separate(player_x, c("trash", "player_x"), sep = "c")
k$trash <- NULL
r <- k %>% separate(player_z, c("player_z", "tra"), sep = "\\)")
u <- r %>% separate(player_x, c("kol", "player_x"), sep = "\\(")
Away_Team <- u
Away_Team$garbage <- NULL
Away_Team$playerId1<- NULL
Away_Team$aplayer_x <- NULL
Away_Team$aplayer_y <- NULL
Away_Team$aplayer_z <- NULL
Away_Team$dispose <- NULL
Away_Team$brack <- NULL
Away_Team$kol <- NULL
Away_Team$tra <- NULL
View(Away_Team)
yelp_tbl
yelp_tbl$newcolumn <- NULL
yelp_tbl$newcolumn1 <- NULL
yelp_tbl$shotClock <- NULL
yelp_tbl$period <- NULL
yelp_tbl$wallClock <- NULL
yelp_tbl$gameClock <- NULL
yelp_tbl$gameClockStopped <- NULL
yelp_tbl$ball <- NULL
head(yelp_tbl)
good <- unnest(yelp_tbl) #extracts xyz from original dataframe
library(tidyr)
player <- good %>% separate(xyz, c("player_x", "player_y", "player_z"), sep = ",")
finish <- player %>% separate(xyz1, c("player_x", "player_y", "player_z"), sep = ",")
k <- finish %>% separate(player_x, c("trash", "player_x"), sep = "c")
k$trash <- NULL
r <- k %>% separate(player_z, c("player_z", "tra"), sep = "\\)")
u <- r %>% separate(player_x, c("kol", "player_x"), sep = "\\(")
Home_Team <- u
Home_Team$garbage <- NULL
Home_Team$playerId1<- NULL
Home_Team$hplayer_x <- NULL
Home_Team$hplayer_y <- NULL
Home_Team$hplayer_z <- NULL
Home_Team$dispose <- NULL
Home_Team$brack <- NULL
Home_Team$kol <- NULL
Home_Team$tra <- NULL
View(Home_Team)
View (Away_Team)
Table <- rbind(Home_Team, Away_Team)
View(Table) #order frameIdx to see correct order
So, indeed you should make a function. Here are some steps to follow:
1. Put all your code in your function
my_function <- function() {
  # Your code
}
2. Identify what you have as input (i.e., what you are not building inside your code); these will become the arguments of your function
my_function <- function(arg1, arg2, ...) {
  # Your code
}
In your example, I identified yelp
3. Identify what you want to output (ideally only one object); it will go in the return of your function
my_function <- function(arg1, arg2, ...) {
  # Your code
  return(output)
}
In your example I identified Table
4. Take all the imports/library() calls and put them outside your function
library(lib1)

my_function <- function(arg1, arg2, ...) {
  # Your code
  return(output)
}
EDIT using @r2evans' suggestion: library is generally preferred over require; here and here is some literature on it.
In your code I identified tidyr and tibble
5. Identify what you want to print/View and what was only there for debugging. Add print() around what you want to keep, and remove what you don't
6. Add some comments and split your code into sections
For example, I would add something like # Creating XXX table
7. Improve code quality
You should try to minimize the number of lines of code (for example by using loops and avoiding duplicated code), and make variable names explicit (instead of k, u, r, ...).
Regarding loops: in your code you drop columns one at a time; you could use a loop to drop them in order (that's what I have done below), which makes the code easier to read and debug. In this particular case, as Gregor said, it is even faster to drop them all at once using a vector of column names (if you are interested, check his comment).
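For instance, the repeated deletions could be collapsed into a single subsetting step over a vector of names; a small sketch of that idea, using the column names from your code:
cols_to_drop <- c("newcolumn", "newcolumn1", "shotClock", "period",
                  "wallClock", "gameClock", "gameClockStopped", "ball")
yelp_tbl <- yelp_tbl[, !(names(yelp_tbl) %in% cols_to_drop)]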
Here you go:
There are still some improvements to make, especially regarding points 5 and 7.
library(tibble)
library(tidyr)

yelp_function <- function(yelp) {
  # Printing the input
  print(head(yelp, 10))
  print(str(yelp))

  # Flatten table
  yelp_flat <- flatten(yelp)
  print(str(yelp_flat))

  # Create yelp_tbl and drop some columns
  yelp_tbl <- as_data_frame(yelp_flat)

  # Drop some columns
  for (col in c("newcolumn", "newcolumn1", "shotClock", "period", "wallClock",
                "gameClock", "gameClockStopped", "ball")) {
    yelp_tbl[, col] <- NULL
  }
  print(head(yelp_tbl))

  # Build some tables
  good <- unnest(yelp_tbl) # extracts xyz from original dataframe
  player <- good %>% separate(xyz, c("player_x", "player_y", "player_z"), sep = ",")
  finish <- player %>% separate(xyz1, c("player_x", "player_y", "player_z"), sep = ",")
  k <- finish %>% separate(player_x, c("trash", "player_x"), sep = "c")
  k$trash <- NULL
  r <- k %>% separate(player_z, c("player_z", "tra"), sep = "\\)")
  u <- r %>% separate(player_x, c("kol", "player_x"), sep = "\\(")

  # Build away team
  Away_Team <- u

  # Build yelp table: I'm not quite sure why you are redoing that... Is this code necessary?
  yelp_tbl

  # Drop some columns
  for (col in c("newcolumn", "newcolumn1", "shotClock", "period", "wallClock",
                "gameClock", "gameClockStopped", "ball")) {
    yelp_tbl[, col] <- NULL
  }
  print(head(yelp_tbl))

  # Build some tables
  good <- unnest(yelp_tbl) # extracts xyz from original dataframe
  player <- good %>% separate(xyz, c("player_x", "player_y", "player_z"), sep = ",")
  finish <- player %>% separate(xyz1, c("player_x", "player_y", "player_z"), sep = ",")
  k <- finish %>% separate(player_x, c("trash", "player_x"), sep = "c")
  k$trash <- NULL
  r <- k %>% separate(player_z, c("player_z", "tra"), sep = "\\)")
  u <- r %>% separate(player_x, c("kol", "player_x"), sep = "\\(")

  ## Build home team
  Home_Team <- u

  # Drop some columns
  for (col in c("garbage", "playerId1", "aplayer_x", "aplayer_y", "aplayer_z",
                "dispose", "brack", "kol", "tra")) {
    Away_Team[, col] <- NULL
    Home_Team[, col] <- NULL
  }

  # Merge
  Table <- rbind(Home_Team, Away_Team)

  # Return
  return(Table)
}
View(Table) #order frameIdx to see correct order
Run it:
To run your code you now just have to execute the function with the needed argument:
yelp_function(yelp)
NB 1: please note that I didn't test the code since you didn't provide data to run it. To improve your question you should provide some data using the dput function.
NB 2: There is always room for improvement, so you might want to go further and look into refactoring to avoid duplicated code, and control your inputs with some sanity checks...
It's rather simple.
You do this:
foo <- function() {
  # all your code goes here
}
Then you call your function by typing (in console for instance):
foo()

Iterating through values in R

I'm new-ish to R and am having some trouble iterating through values.
For context: I have data on 60 people over time, and each person has his/her own dataset in a folder (I received the data with id #s 00:59). For each person, there are 2 values I need - time of response and picture response given (a number 1 - 16). I need to convert this data from wide to long format for each person, and then eventually append all of the datasets together.
My problem is that I'm having trouble writing a loop that will do this for each person (i.e. each dataset). Here's the code I have so far:
pam[x] <- fromJSON(file = "PAM_u[x].json")
pam[x]df <- as.data.frame(pam[x])

# Creating long dataframe for times
pam[x]_long_times <- gather(
  select(pam[x]df, starts_with("resp")),
  key = "time",
  value = "resp_times"
)

# Creating long dataframe for pic_nums (affect response)
pam[x]_long_pics <- gather(
  select(pam[x]df, starts_with("pic")),
  key = "picture",
  value = "pic_num"
)

# Combining the two long dataframes so that I have one df per person
pam[x]_long_fin <- bind_cols(pam[x]_long_times, pam[x]_long_pics) %>%
  select(resp_times, pic_num) %>%
  add_column(id = [x], .before = 1)
If you replace [x] in the above code with a person's id# (e.g. 00), the code will run and will give me the dataframe I want for that person. Any advice on how to do this so I can get all 60 people done?
Thanks!
EDIT
So, using library(jsonlite) rather than library(rjson) set up the files in the format I needed without having to do all of the manipulation. Thanks all for the responses, but the solution was apparently much easier than I'd thought.
I don't know the structure of your json files. If your working directory is not the same folder as the json files, try this:
library(jsonlite)

# setup - read files
json_folder <- "U:/test/" # adjust your folder here
files <- list.files(path = json_folder, pattern = "\\.json$", full.names = TRUE)

# import data
pam <- NULL
pam_df <- NULL
for (i in seq_along(files)) {
  pam[[i]] <- fromJSON(files[i])
  pam_df[[i]] <- as.data.frame(pam[[i]])
}
Here you read all the json files in the folder: list.files() builds a vector of length 60, and then you sequence along that vector and read each file.
I assume at the end you can do bind_rows, or add your own code inside the for loop; but remember to set the data frames to NULL before the loop starts, e.g. pam_long_pics <- NULL.
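That final combining step could look roughly like this (a sketch, assuming pam_df is the list built in the loop above, with the list position used as the person id):
library(dplyr)

all_people <- bind_rows(pam_df, .id = "id")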
Hope that helped? Let me know.
Something along these lines could work:
#library("tidyverse")
#library("jsonlite")

file_list <- list.files(pattern = "*.json", full.names = TRUE)

Data_raw <- tibble(File_name = file_list) %>%
  mutate(File_contents = map(File_name, fromJSON)) %>% # This should result in a nested tibble
  mutate(File_contents = map(File_contents, as_tibble))

Data_raw %>%
  mutate(Long_times = map(File_contents, ~ gather(.x, key = "time", value = "resp_times", starts_with("resp"))),
         Long_pics = map(File_contents, ~ gather(.x, key = "picture", value = "pic_num", starts_with("pic")))) %>%
  unnest(Long_times, Long_pics) %>%
  select(File_name, resp_times, pic_num)
EDIT: you may or may not need to include as_tibble() after reading in the JSON files, depending on what your data looks like.
