How to operate on variables in R functions

I am trying to do the following operations on data frame variables:
ptinr <- read.csv('ptinr.CSV')
ptinr$project <- gsub("_19T228z1xx","", ptinr$project)
ptinr$Subject <- as.integer(gsub("CTMS-",'', ptinr$Subject))
ptinr$Subject <- sprintf("%03d", ptinr$Subject)
ptinr$Subject <- paste0(ptinr$project,'-',ptinr$Subject)
I want to convert this to a function and pass the file name. Any suggestions?

Do you mean this kind of function?
f <- function(fname) {
  ptinr <- read.csv(fname)
  ptinr$project <- gsub("_19T228z1xx", "", ptinr$project)
  ptinr$Subject <- as.integer(gsub("CTMS-", "", ptinr$Subject))
  ptinr$Subject <- sprintf("%03d", ptinr$Subject)
  ptinr$Subject <- paste0(ptinr$project, "-", ptinr$Subject)
  ptinr
}

An option with tidyverse
library(readr)
library(stringr)
library(dplyr)
f1 <- function(fname) {
  read_csv(fname) %>%
    mutate(project = str_remove(project, '_19T228z1xx'),
           Subject = glue::glue('{project}-',
                                '{sprintf("%03d", as.integer(str_remove(Subject, "CTMS-")))}'))
}
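Either version can then be called with just the file name; a quick usage sketch, assuming ptinr.CSV sits in the working directory:
pt_clean <- f("ptinr.CSV")    # base R version
pt_clean <- f1("ptinr.CSV")   # tidyverse version
head(pt_clean)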

Related

Write a function to manipulate and then write a dataframe

I would like to read in multiple .csv files (data frames) from a folder and apply a function that I create to all of them. Finally, the function should write out new .csv files.
I want the function to do the following three things:
df$Class <- gsub("null", "OTHER", df$Class)
df$Class <- gsub(': ', ',', df$Class)
df <- df %>% select(c(Image, everything(.), -Name))
I don't really know how to put these things into a function, but I've tried:
file_names <- list.files(pattern = "\\.csv$")
tidy_up_fxn <- function(file_names) {
  df <- do.call(bind_rows, lapply(file_names, data.table::fread))
  df$Class <- gsub("null", "OTHER", df$Class)
  df$Class <- gsub(': ', ',', df$Class)
  df <- df %>% select(c(Image, everything(.), -Name))
  out <- function(df)
  fwrite(out, file = file_names, sep = ",")
}
tidy_up_fxn(file_names)
When I run it, R gets busy for a few seconds and then nothing happens. Please help me correct my function!
The following works the way I intended:
library(dplyr)
library(data.table)

file_names <- list.files(pattern = "\\.csv$")
tidy_up_fxn <- function(file_names) {
  df <- bind_rows(lapply(file_names, data.table::fread))
  df$Class <- gsub("null", "OTHER", df$Class)
  df$Class <- gsub(': ', ',', df$Class)
  df <- df %>% select(c(Image, everything(.), -Name))
  fwrite(df, file = "new.csv", sep = ",")
}
tidy_up_fxn(file_names)
Thank you all!!
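For what it's worth, if the aim is still one cleaned .csv per input file (as the question originally described) rather than a single combined new.csv, a per-file variant could look like the sketch below; the "clean_" prefix for the output names is just an assumption:
library(dplyr)
library(data.table)

tidy_up_one <- function(file_name) {
  df <- fread(file_name)
  df$Class <- gsub("null", "OTHER", df$Class)
  df$Class <- gsub(': ', ',', df$Class)
  df <- df %>% select(c(Image, everything(.), -Name))
  # write each cleaned file next to the original, with a "clean_" prefix
  fwrite(df, file = paste0("clean_", file_name), sep = ",")
}
invisible(lapply(file_names, tidy_up_one))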

generate variable names in for loop

Hope you don't mind if this is too easy for you.
In R, I am using fromJSON() to read from 3 URLs (tier 1 URLs). In each JSON file there is a "link" field that gives me another URL (tier 2 URL), which I then read with read.table() to get my final data. My code now looks like this:
# note, this code does not run
urlJohn <- www.foo1.com
urlJane <- www.foo2.com
urlJoe <- www.foo3.com

tempJohn <- fromJSON(urlJohn)
tempJohn[["data"]][["rows"]]$link %<>%
  {clean up this data}
dataJohn <- read.table(tempJohn[["data"]][["rows"]]$link,
                       header = TRUE,
                       sep = ",")

tempJane <- fromJSON(urlJane)
tempJane[["data"]][["rows"]]$link %<>%
  {clean up this data}
dataJane <- read.table(tempJane[["data"]][["rows"]]$link,
                       header = TRUE,
                       sep = ",")

tempJoe <- fromJSON(urlJoe)
tempJoe[["data"]][["rows"]]$link %<>%
  {clean up this data}
dataJoe <- read.table(tempJoe[["data"]][["rows"]]$link,
                      header = TRUE,
                      sep = ",")
As you can see, I am just copying and pasting code blocks. What I wish for is something like this:
# note, this code also does not run
urlJohn <- www.foo1.com
urlJane <- www.foo2.com
urlJoe <- www.foo3.com

source <- c("John", "Jane", "Joe")
for (i in source) {
  temp <- paste(temp, i, sep = "")
  url <- paste(url, i, sep = "")
  data <- paste(data, i, sep = "")
  temp <- fromJSON(url)
  temp[["data"]][["rows"]]$link %<>%
    {clean up this data}
  data <- read.table(temp[["data"]][["rows"]]$link,
                     header = TRUE,
                     sep = ",")
}
What do I need to do to make the for loop work? If my question is not clear, please ask me to clarify it.
I usually find lapply more convenient than a for loop, although you can easily convert this to a for loop if needed.
URLs <- c('www.foo1.com', 'www.foo2.com', 'www.foo3.com')

lapply(URLs, function(x) {
  temp <- jsonlite::fromJSON(x)
  temp[["data"]][["rows"]]$link %<>% {clean up this data}
  read.table(temp[["data"]][["rows"]]$link, header = TRUE, sep = ",")
}) -> list_data

list_data
Thanks to @Ronak Shah. The R community strongly favors non-for-loop solutions.
The way to get my desired result is lapply.
Below is non-running code in mnemonic form:
URLs <- c('www.foo1.com', 'www.foo2.com', 'www.foo3.com')

lapply(URLs, function(x) {
  temp <- jsonlite::fromJSON(x)
  x <- temp[["data"]][["rows"]]$link %<>% {clean up this data}
  y <- read.table(temp[["data"]][["rows"]]$link, header = TRUE, sep = ",")
  return(list(x, y))
})
And this is a running example.
x <- list(alpha = 1:10,
          beta = exp(-3:3),
          logic = c(TRUE, FALSE, FALSE, TRUE))

lapply(x, function(x) {
  temp <- sum(x) / 2
  temp2 <- list(x,
                temp)
  return(temp2)
})
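If you still want each result reachable by a name (the dataJohn, dataJane, dataJoe of the original code), giving the URL vector names carries those names through lapply, so no variable names need to be generated at all. A non-running sketch in the same mnemonic style as above:
URLs <- c(John = 'www.foo1.com', Jane = 'www.foo2.com', Joe = 'www.foo3.com')

list_data <- lapply(URLs, function(x) {
  temp <- jsonlite::fromJSON(x)
  # {clean up this data}
  read.table(temp[["data"]][["rows"]]$link, header = TRUE, sep = ",")
})

# lapply keeps the names of its input, so each element is reachable by name
list_data$John
list_data$Jane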

Writing a csv file in R with parameter in the file name

I am doing a small log-processing project in R. I am trying to write a function that takes a data frame and writes it to a csv file whose name is built from some parameters (data frame name, today's date, etc.).
I have made some progress but didn't manage to write the csv. I hope the code is reproducible and good.
library(dplyr)

wrt_csv <- function(df) {
  dfname <- deparse(substitute(df))
  dfpath <- paste0('"', "./logs/", dfname, "_", Sys.Date(), '.csv"')
  dfpath <- as.data.frame(dfpath)
  df %>% write_excel_csv(dfpath)
}
wrt_csv(mtcars)
EDIT: this is a final version that works well. Thanks to Ronak Shah.
library(readr) # for write_excel_csv()

wd <- getwd()
wrt_csv <- function(df) {
  dfname <- deparse(substitute(df))
  dfpath <- paste0(wd, '/logs/', dfname, '_', Sys.Date(), '.csv')
  df %>% write_excel_csv(dfpath)
}
I do, however, now have a bunch of data frames that I want to run the function on. Should I put them in a list? This didn't quite work:
l <- list(df1, df2)
lapply(l, wrt_csv)
Any thoughts?
Thanks!
Keep dfpath as a string. Try:
wrt_csv <- function(df) {
  dfname <- deparse(substitute(df))
  dfpath <- paste0('./logs/', dfname, '_', Sys.Date(), '.csv')
  write.csv(df, dfpath, row.names = FALSE)
  #Or same as OP
  #df %>% write_excel_csv(dfpath)
}
wrt_csv(mtcars)
We can also do
wrt_csv <- function(df) {
  dfname <- deparse(substitute(df))
  dfpath <- sprintf('./logs/%s_%s.csv', dfname, Sys.Date())
  write.csv(df, dfpath, row.names = FALSE)
}
wrt_csv(mtcars)
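As for the follow-up about a list of data frames: inside lapply(), deparse(substitute(df)) can no longer see the original variable names, so the names have to be passed along explicitly. One sketch of how that might look, using a named list and a hypothetical helper wrt_csv_named() (df1 and df2 stand in for the OP's data frames):
wrt_csv_named <- function(df, dfname) {
  dfpath <- sprintf('./logs/%s_%s.csv', dfname, Sys.Date())
  write.csv(df, dfpath, row.names = FALSE)
}

l <- list(df1 = df1, df2 = df2)
# Map() pairs each data frame with its name
invisible(Map(wrt_csv_named, l, names(l)))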

Is there a way that I could use a vector instead of this for loop?

I'm trying to loop through a bunch of files in R and access info in each one. Needless to say, the loop is unbearably slow. Is there a way I could vectorize this?
library("rjson")
all_files=list.files(path="~/p/a/t/h", recursive = TRUE)
for(i in seq_along(all_files)) {
temp = fromJSON(file = all_files[i])
if (length(temp$tags) != 0){
songTags <- c(songTags, temp$tags)
songTrack_id <- c(songTrack_id, temp$track_id)
}
}
Growing objects in a loop is usually very expensive/slow. You can use lapply/sapply.
all_data <- do.call(rbind, lapply(all_files, function(x) {
  temp <- jsonlite::fromJSON(x)
  if (length(temp$tags))
    list(tags = temp$tags, track_id = temp$track_id)
}))
Or a shorter option using purrr's map_df
all_data <- purrr::map_df(all_files, ~ {
  temp <- jsonlite::fromJSON(.x)
  if (length(temp$tags))
    list(tags = temp$tags, track_id = temp$track_id)
})
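If you still want the two separate vectors from the original loop rather than one combined table, they can be pulled out of all_data afterwards; this assumes the map_df version, which returns a data frame with those two columns:
songTags <- all_data$tags
songTrack_id <- all_data$track_id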
Untested:
rawdat <- lapply(all_files, function(f) fromJSON(file = f))
hastags <- sapply(rawdat, function(x) "tags" %in% names(x))
if (any(hastags)) {
  songTags <- unlist(lapply(rawdat[hastags], `[[`, "tags"))
  songTracks <- unlist(lapply(rawdat[hastags], `[[`, "track_id"))
}

str_subset with curly curly in R

I have a small function that reads in files containing a certain string, using str_subset. It works if I pass the argument in quotes, but I want to be able to do it without them. I thought I could do this with curly-curly, but it isn't working.
Working example passing with quotes:
#creating csv file
library(tidyverse)
write_csv(mtcars, "C:\\Users\\testSTACK.csv")

#reading function
read_in_fun <- function(x) {
  setwd("C:\\Users")
  d <- list.files() #lists all files in folder
  file <- d %>%
    str_subset(pattern = x)
  #read in
  df <- read_csv(file)
  arg_name <- deparse(substitute(x))
  var_name <- paste("df_new", arg_name, sep = "_")
  assign(var_name, df, env = .GlobalEnv)
}
read_in_fun("STACK")
#this works, returns df called:
df_new_"STACK"
Now, if I try to pass it without quotes using the curly-curly approach:
read_in_fun <- function(x) {
  setwd("C:\\Users")
  d <- list.files() #lists all files in folder
  file <- d %>%
    str_subset(pattern = {{x}})
  #read in
  df <- read_csv(file)
  arg_name <- deparse(substitute(x))
  var_name <- paste("df_new", arg_name, sep = "_")
  assign(var_name, df, env = .GlobalEnv)
}
read_in_fun(STACK)
#Error in type(pattern) : object 'STACK' not found
I also tried using enquo:
read_in_fun <- function(x) {
  x_quo <- enquo(x)
  setwd("C:\\Users")
  d <- list.files() #lists all files in folder
  file <- d %>%
    str_subset(pattern = !!as_label(x_quo)) #OR !!(x_quo)
  #read in
  df <- read_csv(file)
  arg_name <- deparse(substitute(x))
  var_name <- paste("df_new", arg_name, sep = "_")
  assign(var_name, df, env = .GlobalEnv)
}
read_in_fun(STACK)
# Error during wrapup: Quosures can only be unquoted within a quasiquotation context.
My desired output is a df called df_new_STACK. Can curly-curly be used in this way? Thanks.
Using ensym should work.
read_in_fun <- function(x) {
  x_sym <- ensym(x)
  d <- list.files()
  file <- d %>%
    str_subset(pattern = as_label(x_sym))
  #read in
  df <- read_csv(file)
  arg_name <- deparse(substitute(x))
  var_name <- paste("df_new", arg_name, sep = "_")
  assign(var_name, df, env = .GlobalEnv)
}
read_in_fun(STACK)
df_new_STACK
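A side note on the design, not part of the accepted answer: assigning into .GlobalEnv from inside a function is generally discouraged; returning the data frame and naming it at the call site keeps the function self-contained. A minimal sketch of that variant, reusing the same ensym trick:
read_in_fun2 <- function(x) {
  x_sym <- rlang::ensym(x)
  file <- stringr::str_subset(list.files(), pattern = rlang::as_label(x_sym))
  readr::read_csv(file)
}

df_new_STACK <- read_in_fun2(STACK)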
