Creating a Loop Function with read_csv - RStudio - r

I have code in RStudio that imports a CSV file whose path is built from several criteria using the paste function.
Name <- "Sam"
Location <- "Barnsley"
Code <- "A"
Test2 <- read_csv(paste("C:/Users/....,Opposition , " (",Code,")/Vs ",Location, " (",Code,") Export for ",Name,".csv",sep = ""),skip = 8)
I usually follow this import with a few lines of code for calculations. For argument's sake, call them: Run Code Series
I would like to adapt this code so that it takes a list of names and works through them one by one, importing each file and then running the calculations.
Desired:
Name <- c("Sam","David","Paul","John")
I then want the import code to run for each name, with Run Code Series executed after each import and before the next name is imported.

I believe from your question that you want to end with a separate dataframe for each name. If so, you could do it like this:
library(readr)  # provides read_csv
Names <- c("Sam","David","Paul","John")
Location <- "Barnsley"
Code <- "A"
for(i in Names){
  Test2 <- read_csv(paste("C:/Users/....,Opposition" , " (", Code,")/Vs ", Location, " (",Code,") Export for ", i, ".csv", sep = ""), skip = 8)
  # Run Code Series (your calculations on Test2 go here)
  assign(paste("df_for_", i, sep = ""), Test2)
}
This will go through your list of names and within the loop, open the file as Test2. You perform your calculations on Test2, and then assign it to a dataframe for the particular name in the list using paste. Also your quotes in your read_csv line do not match up, so that will need to be corrected.
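If separate global variables are not strictly required, a named list avoids assign() altogether. The following is a minimal sketch under stated assumptions: the directory layout, the build_path() helper and the process_one() step are hypothetical stand-ins (the real path in the question is elided), and base read.csv() is used in place of readr's read_csv() so that the sketch runs without extra packages.

```r
# Hypothetical helper: builds one file path from the pieces used in the question.
build_path <- function(dir, name, location, code) {
  file.path(dir, paste0("Vs ", location, " (", code, ") Export for ", name, ".csv"))
}

# Hypothetical stand-in for the "Run Code Series" calculations.
process_one <- function(df) {
  df$total <- df$a + df$b
  df
}

# Demo setup: write small files (with 8 leading lines to skip) into a temp directory.
demo_dir <- file.path(tempdir(), "loop_demo")
dir.create(demo_dir, showWarnings = FALSE)
Names <- c("Sam", "David", "Paul", "John")
Location <- "Barnsley"
Code <- "A"
for (nm in Names) {
  writeLines(c(rep("junk header line", 8), "a,b", "1,2", "3,4"),
             build_path(demo_dir, nm, Location, Code))
}

# Import each file, run the calculations, and keep the result under the person's name.
results <- lapply(Names, function(nm) {
  df <- read.csv(build_path(demo_dir, nm, Location, Code), skip = 8)
  process_one(df)
})
names(results) <- Names

results[["Sam"]]
```

Each data frame is then available as results[["Sam"]], results[["David"]], and so on, and the whole set can be passed to lapply() for further work.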

Related

How to work with nested for loops in R with same list?

I am trying to do some R coding for my project. I have to read some .csv files from one directory into R and assign each one to a data frame named like df_subject1_activity1. I have tried nested loops, but it is not working.
For example, my directory is named "Test" and contains six .csv files:
subject1activity1.csv,
subject1activity2.csv,
subject1activity3.csv,
subject2activity1.csv,
subject2activity2.csv,
subject2activity3.csv
Now I want to write code to load these .csv files into R and assign data frame names as follows:
subject1activity1 = df_subject1_activity1
subject1activity2 = df_subject1_activity2
... and so on, using a for loop.
My expected output is:
df_subject1_activity1
df_subject1_activity2
df_subject1_activity3
df_subject2_activity1
df_subject2_activity2
df_subject2_activity3
I have tried the following code:
setwd(dirname(getActiveDocumentContext()$path))
new_path <- getwd()
new_path
data_files <- list.files(pattern=".csv") # Identify file names
data_files
for(i in 1:length(data_files)) {
for(j in 1:4){
assign(paste0("df_subj",i,"_activity",j)
read.csv2(paste0(new_path,"/",data_files[i]),sep=",",header=FALSE))
}
}
I am not getting the desired output.
I am new to R; can anyone please help?
Thanks
One solution is to use the vroom package (https://www.tidyverse.org/blog/2019/05/vroom-1-0-0/), e.g.
library(tidyverse)
library(vroom)
library(fs)
files <- fs::dir_ls(glob = "subject*.csv") # the file names in the question contain no underscore
data <- purrr::map(files, ~vroom::vroom(.x))
list2env(data, envir = .GlobalEnv)
# You can also combine all the dataframes if they have the same columns, e.g.
library(data.table)
concat <- data.table::rbindlist(data, fill = TRUE)
You are almost there. As always, if you are unsure, it is never a bad idea to code clearly using more lines.
data_files <- list.files(pattern="\\.csv$", full.names=TRUE) # Identify file names
data_files
for( data_file in data_files) {
  ## check that the data file matches our expected pattern:
  if(!grepl( "subject[0-9]activity[0-9]", basename(data_file) )) {
    warning( "skipping file ", basename(data_file) )
    next
  }
  ## start creating the variable name from the filename
  ## remove the .csv extension
  var.name <- sub( "\\.csv$", "", basename(data_file), ignore.case=TRUE )
  ## prepend 'df' and introduce underscores:
  var.name <- paste0(
    "df",
    gsub( "(subject|activity)", "_\\1", var.name ) ## this looks for literal 'subject' and 'activity' and, if found, adds an underscore in front of it
  )
  ## now read the file (comma-separated with no header row, as in the question)
  data.from.file <- read.csv2( data_file, sep=",", header=FALSE )
  ## and assign it to our variable name
  assign( var.name, data.from.file )
}
I don't have your files to test with, but should the above fail, you should be able to run the code line by line and easily see where it starts to go wrong.
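The renaming step can be checked in isolation, without any files on disk. This short sketch applies the same sub() and gsub() calls to file names from the question; make_var_name() is a hypothetical wrapper introduced only for illustration.

```r
# Hypothetical wrapper around the renaming logic used in the loop above.
make_var_name <- function(file_name) {
  # drop the .csv extension, then put an underscore before each keyword
  stem <- sub("\\.csv$", "", basename(file_name), ignore.case = TRUE)
  paste0("df", gsub("(subject|activity)", "_\\1", stem))
}

file_names <- c("subject1activity1.csv", "subject1activity2.csv",
                "subject2activity3.csv")
var_names <- vapply(file_names, make_var_name, character(1), USE.NAMES = FALSE)
print(var_names)
# [1] "df_subject1_activity1" "df_subject1_activity2" "df_subject2_activity3"
```

Testing the name construction separately makes it easier to see whether a failure comes from the naming or from reading the file.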

Concatenate text using paste to call a vector in r

I'm very new to R so may still be thinking in spreadsheets. I'd like to loop over a vector of names (list), pass each name to a function (effect), and add text to the front and end of the name ("data$" in front, ".time0" or ".time1" at the end) so that it refers to a specific column of a data frame I already have loaded (i.e., data$variable.time0 and data$variable.time1).
paste just gives me a character string "data$variable.time0" or "data$variable.time1", rather than the column of the data frame I want. Can I convert this string to a reference somehow?
effect <- function(i){
  time0 <- paste("data$",i,".time0", sep = "")
  time1 <- paste("data$",i,".time1", sep = "")
  #code continues but not relevant here
}
for (i in list){
  effect(i)
}
You can use eval(parse(text = "...")) to evaluate a character string as R code.
Try
time0 <- eval(parse(text = paste("data$",i,".time0", sep = "")))
within your loop.
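A common alternative to eval(parse()) is to index the data frame by column name with [[ ]], which accepts a character string directly. A minimal sketch, assuming a small made-up data frame; the column names and the calculation inside effect() are illustrative only.

```r
# Made-up data following the naming scheme from the question.
data <- data.frame(
  weight.time0 = c(10, 12, 14),
  weight.time1 = c(11, 15, 13),
  height.time0 = c(100, 110, 120),
  height.time1 = c(101, 112, 125)
)
vars <- c("weight", "height")

# [[ ]] looks a column up by its name as a string, so no parsing is needed.
effect <- function(i) {
  time0 <- data[[paste0(i, ".time0")]]
  time1 <- data[[paste0(i, ".time1")]]
  mean(time1 - time0)  # illustrative calculation only
}

effects <- vapply(vars, effect, numeric(1))
print(effects)
```

Because the column name is built as an ordinary string, a misspelt name returns NULL from [[ ]] rather than evaluating arbitrary code.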

interactive multiple file upload in different variables using R

I am trying to let the user define how many drugs' data they want to upload for a specific therapy. Based on that number, my function should let the user select a data file for each drug and store the data in variables such as drug_1_data, drug_2_data, etc.
I have written some code but it doesn't work.
Could someone please help?
no_drugs <- readline("how many drugs for this therapy? Ans:")
i=0
while(i < no_drugs) {
i <- i+1
caption_to_add <- paste("drug",i, sep = "_")
mydata <- choose.files( caption = caption_to_add) # caption describes data for which drug
file_name <- noquote(paste("drug", i, "data", sep = "_")) # to create variable that will save uploaded .csv file
file_name <- read.csv(mydata[i],header=TRUE, sep = "\t")
}
In your example, mydata is a one-element character vector, so subsetting it with i greater than 1 returns NA. Furthermore, in your first assignment of file_name you set it to a non-quoted character vector, but you then overwrite it with the data (and in every iteration of the loop you lose the data you created in the previous step). I think what you wanted was something more along the lines of:
file_name <- paste("drug", i, "data", sep = "_")
assign(file_name, read.delim(mydata, header=TRUE))
# I changed the function to read.delim since the separator is a tab
However, I would also recommend putting all the data in a list (it might be easier to apply operations to multiple drug data frames that way), using something like this:
n_drugs <- as.numeric(readline("how many drugs for this therapy? Ans:"))
drugs <- vector("list", n_drugs)
for(i in seq_len(n_drugs)) {
  caption_to_add <- paste("drug",i, sep = "_")
  mydata <- choose.files( caption = caption_to_add)
  # use [[ ]] so the whole data frame is stored as one list element
  drugs[[i]] <- read.delim(mydata,header=TRUE)
}
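The list approach can be tried without any interactive prompts. This is a non-interactive sketch, assuming the file paths are already known; here they are temporary tab-separated files created for the demo rather than paths picked with choose.files(), and the dose and response columns are made up.

```r
# Demo setup: two small tab-separated files standing in for the user's selections.
paths <- c(tempfile(fileext = ".txt"), tempfile(fileext = ".txt"))
writeLines(c("dose\tresponse", "1\t0.2", "2\t0.5"), paths[1])
writeLines(c("dose\tresponse", "1\t0.1", "2\t0.4", "3\t0.9"), paths[2])

n_drugs <- length(paths)
drugs <- vector("list", n_drugs)
for (i in seq_len(n_drugs)) {
  # [[ ]] stores the whole data frame as one list element
  drugs[[i]] <- read.delim(paths[i], header = TRUE)
}
names(drugs) <- paste("drug", seq_len(n_drugs), "data", sep = "_")

# One operation applied to every drug's data frame.
row_counts <- sapply(drugs, nrow)
print(row_counts)
```

Naming the list elements gives the same drug_1_data, drug_2_data labels the question asked for, accessed as drugs[["drug_1_data"]] instead of as separate variables.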

using cat in R to create a formatted R script

I want to read an R file or script, modify the name of the external data file being read and export the modified R code into a new R file or script. Other than the name of the data file being read (and the name of the new R file) I want the two R scripts to be identical.
I can come close, except that I cannot figure out how to retain the blank lines I use for readability and error reduction.
Here is the original R file being read. Note that some of the code in this file is non-sensical, but to me that is irrelevant. This code does not need to run.
# apple.pie.all.purpose.flour.arsc.Jun23.2013.r
library(my.library)
aa <- 10 # aa
bb <- c(1:7) # bb
my.data = convert.txt("../applepieallpurposeflour.txt",
group.df = data.frame(recipe =
c("recipe1", "recipe2", "recipe3", "recipe4", "recipe5")),
covariates = c(paste( "temp", seq_along(1:aa), sep="")))
ingredient <- c('all purpose flour')
function(make.pie){ make a pie }
Here is R code I use to read the above file, modify it and export the result. This R code runs and is the only code that needs to run to achieve the desired result (except that I cannot get the format of the new R script to match that of the original R script exactly, i.e., blank lines present in the original R script are not present in the new R script):
setwd('c:/users/mmiller21/simple r programs/')
# define new fruit
new.fruit <- 'peach'
# read flour file for original fruit
flour <- readLines('apple.pie.all.purpose.flour.arsc.Jun23.2013.r')
# create new file name
output.flour <- paste(new.fruit, ".pie.all.purpose.flour.arsc.Jun23.2013.r", sep="")
# add new file name
flour.a <- gsub("# apple.pie.all.purpose.flour.arsc.Jun23.2013.r",
paste("# ", output.flour, sep=""), flour)
# add line to read new data file
cat(file = output.flour,
gsub( "my.data = convert.txt\\(\"../applepieallpurposeflour.txt",
paste("my.data = convert.txt\\(\"../", new.fruit, "pieallpurposeflour.txt",
sep=""), flour.a),
sep=c("","\n"), fill = TRUE
)
Here is the resulting new R script:
# peach.pie.all.purpose.flour.arsc.Jun23.2013.r
library(my.library)
aa <- 10 # aa
bb <- c(1:7) # bb
my.data = convert.txt("../peachpieallpurposeflour.txt",
group.df = data.frame(recipe =
c("recipe1", "recipe2", "recipe3", "recipe4", "recipe5")),
covariates = c(paste( "temp", seq_along(1:aa), sep="")))
ingredient <- c('all purpose flour')
function(make.pie){ make a pie }
There is one blank line in the newly-created R file, but how can I insert all of the blank lines present in the original R script? Thank you for any advice.
EDIT: I cannot seem to duplicate the blank lines here on StackOverflow. They seem to be deleted automatically. StackOverflow is even deleting the indentation I am using and I cannot seem to replace it. Sorry about this. Automatic deletion of blank lines and indentation is problematic when the issue at hand is specifically about formatting. I cannot seem to fix the post to display the R code as formatted in my script. However, the code does display correctly when I am actively editing the post.
EDIT: June 27, 2013: The deletion of empty rows and indentation in the code for the original R file and in the code for the middle R file appears to be associated with my laptop rather than with StackOverflow. When I view this post and my answers on my office desktop the format is correct. When I view this post and my answers with my laptop the empty rows and indentation are gone. Perhaps my laptop monitor is malfunctioning. Sorry about assuming initially that the problem was with StackOverflow.
Here is a function that will create a new R file for every combination of two variables. Sorry the formatting of the code below is not better. The code does run and does work as intended (provided the name of the original R file ends in ".arsc.Jun26.2013.r" instead of in ".arsc.Jun23.2013.r" used in the original post):
setwd('c:/users/mmiller21/simple r programs/')
# define fruits of interest
fruits <- c('apple', 'pumpkin', 'pecan')
# define ingredients of interest
ingredients <- c('all.purpose.flour', 'sugar', 'ground.cinnamon')
# define every combination of fruit and ingredient
fruits.and.ingredients <- expand.grid(fruits, ingredients)
old.fruit <- as.character(rep('apple', nrow(fruits.and.ingredients)))
old.ingredient <- as.character(rep('all.purpose.flour', nrow(fruits.and.ingredients)))
fruits.and.ingredients2 <- cbind(old.fruit , as.character(fruits.and.ingredients[,1]),
old.ingredient, as.character(fruits.and.ingredients[,2]))
colnames(fruits.and.ingredients2) <- c('old.fruit', 'new.fruit', 'old.ingredient', 'new.ingredient')
# begin function
make.pie <- function(old.fruit, new.fruit, old.ingredient, new.ingredient) {
new.ingredient2 <- gsub('\\.', '', new.ingredient)
old.ingredient2 <- gsub('\\.', '', old.ingredient)
new.ingredient3 <- gsub('\\.', ' ', new.ingredient)
old.ingredient3 <- gsub('\\.', ' ', old.ingredient)
# file name
old.file <- paste(old.fruit, ".pie.", old.ingredient, ".arsc.Jun26.2013.r", sep="")
new.file <- paste(new.fruit, ".pie.", new.ingredient, ".arsc.Jun26.2013.r", sep="")
# read original fruit and original ingredient
flour <- readLines(old.file)
# add new file name
flour.a <- gsub(paste("# ", old.file, sep=""),
paste("# ", new.file, sep=""), flour)
# read new data file
old.data.file <- print(paste("my.data = convert.txt(\"../", old.fruit, "pie", old.ingredient2, ".txt\",", sep=""), quote=FALSE)
new.data.file <- print(paste("my.data = convert.txt(\"../", new.fruit, "pie", new.ingredient2, ".txt\",", sep=""), quote=FALSE)
flour.b <- ifelse(flour.a == old.data.file, new.data.file, flour.a)
flour.c <- ifelse(flour.b == paste('ingredient <- c(\'', old.ingredient3, '\')', sep=""),
paste('ingredient <- c(\'', new.ingredient3, '\')', sep=""), flour.b)
cat(flour.c, file = new.file, sep=c("\n"))
}
apply(fruits.and.ingredients2, 1, function(x) make.pie(x[1], x[2], x[3], x[4]))
Here is one solution that reproduces the original R script (except for the two desired changes) while also preserving the formatting of that original R script in the new R script.
setwd('c:/users/mmiller21/simple r programs/')
new.fruit <- 'peach'
flour <- readLines('apple.pie.all.purpose.flour.arsc.Jun23.2013.r')
output.flour <- paste(new.fruit, ".pie.all.purpose.flour.arsc.Jun23.2013.r", sep="")
flour.a <- gsub("# apple.pie.all.purpose.flour.arsc.Jun23.2013.r",
paste("# ", output.flour, sep=""), flour)
flour.b <- gsub( "my.data = convert.txt\\(\"../applepieallpurposeflour.txt",
paste("my.data = convert.txt\\(\"../", new.fruit, "pieallpurposeflour.txt", sep=""), flour.a)
for(i in 1:length(flour.b)) {
if(i == 1) cat(flour.b[i], file = output.flour, sep=c("\n"), fill=TRUE )
if(i > 1) cat(flour.b[i], file = output.flour, sep=c("\n"), fill=TRUE, append = TRUE)
}
Again, I apologize for my inability to format the above R code in a readable way. I have never encountered this problem on StackOverflow and do not know the solution. Regardless, the above R script solves the problem I described in the original post.
To see the formatting of the original R script you will have to click the edit button under the original post.
EDIT: June 25, 2013
I do not know what I was doing differently yesterday, but today I found that the following simple cat statement, in place of the for-loop immediately above, creates the new R script while preserving the formatting of the original R script.
cat(flour.b, file = output.flour, sep=c("\n"))
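The read, substitute, and write cycle can be tried end to end on a throwaway file. A minimal sketch, assuming a short made-up script in a temporary directory; it uses writeLines(), which writes one element per line and so keeps empty strings as blank lines, and fixed = TRUE so that the dots and parentheses in the search strings are matched literally rather than as regular expression syntax.

```r
# Made-up original script, including blank lines.
old_file <- file.path(tempdir(), "apple.pie.demo.r")
writeLines(c("# apple.pie.demo.r",
             "",
             "aa <- 10",
             "",
             "my.data = convert.txt(\"../applepie.txt\")"),
           old_file)

new_fruit <- "peach"
new_file <- file.path(tempdir(), paste0(new_fruit, ".pie.demo.r"))

script <- readLines(old_file)
# Replace the file name in the header comment and the data file name.
script <- gsub("# apple.pie.demo.r", paste0("# ", basename(new_file)),
               script, fixed = TRUE)
script <- gsub("../applepie.txt", paste0("../", new_fruit, "pie.txt"),
               script, fixed = TRUE)

writeLines(script, new_file)
readLines(new_file)
```

The blank lines survive because readLines() returns them as empty strings and the output step writes every element, empty or not, on its own line.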

Executing function on objects of name 'i' within for-loop in R

I am still pretty new to R and very new to for-loops and functions, but I searched quite a bit on stackoverflow and couldn't find an answer to this question. So here we go.
I'm trying to create a script that will (1) read in multiple .csv files and (2) apply a function to strip twitter handles from urls in and do some other things to these files. I have developed script for these two tasks separately, so I know that most of my code works, but something goes wrong when I try to combine them. I prepare for doing so using the following code:
library(stringr)  # provides str_match
# specify directory for your files and replace 'file' with the first, unique part of the
# files you would like to import
mypath <- "~/Users/you/data/"
mypattern <- "file+.*csv"
# Get a list of the files
file_list <- list.files(path = mypath,
                        pattern = mypattern)
# List of names to be given to data frames
data_names <- str_match(file_list, "(.*?)\\.")[,2]
# Define function for preparing datasets
handlestripper <- function(data){
  data$handle <- str_match(data$URL, "com/(.*?)/status")[,2]
  data$rank <- c(1:500)
  names(data) <- c("dateGMT", "url", "tweet", "twitterid", "rank")
  data <- data[,c(4, 1:3, 5)]
  data  # return the prepared data frame visibly
}
That all works fine. The problem comes when I try to execute the function handlestripper() within the for-loop.
# Read in data
for(i in data_names){
filepath <- file.path(mypath, paste(i, ".csv", sep = ""))
assign(i, read.delim(filepath, colClasses = "character", sep = ","))
i <- handlestripper(i)
}
When I execute this code, I get the following error: Error in data$URL : $ operator is invalid for atomic vectors. I know that this means that my function is being applied to the string I called from within the vector data_names, but I don't know how to tell R that, in this last line of my for-loop, I want the function applied to the objects of name i that I just created using the assign command, rather than to i itself.
Inside your loop, you can change this:
assign(i, read.delim(filepath, colClasses = "character", sep = ","))
i <- handlestripper(i)
to
tmp <- read.delim(filepath, colClasses = "character", sep = ",")
assign(i, handlestripper(tmp))
I think you should make as few get and assign calls as you can, but there's nothing wrong with indexing your loop with names as you are doing. I do it all the time, anyway.
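To follow the advice about keeping assign() and get() calls to a minimum, the data frames can live in a named list instead. A minimal sketch, assuming made-up CSV files in a temporary directory and a simplified, hypothetical add_handle() in place of handlestripper(); base sub() is used so that no packages are needed.

```r
# Demo setup: two small CSV files in a temporary directory.
demo_dir <- file.path(tempdir(), "handle_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (nm in c("file1", "file2")) {
  write.csv(data.frame(URL = c("https://twitter.com/alice/status/1",
                               "https://twitter.com/bob/status/2")),
            file.path(demo_dir, paste0(nm, ".csv")), row.names = FALSE)
}

# Hypothetical, simplified stand-in for handlestripper().
add_handle <- function(data) {
  # keep only the path segment between ".com/" and "/status"
  data$handle <- sub("^.*\\.com/([^/]+)/status.*$", "\\1", data$URL)
  data
}

file_list <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
data_names <- sub("\\.csv$", "", basename(file_list))

# Read and process each file; the list is indexed by name, as in the loop above.
tweets <- lapply(file_list, function(f) {
  add_handle(read.csv(f, colClasses = "character"))
})
names(tweets) <- data_names

tweets[["file1"]]$handle
```

With this layout the function always receives a data frame, never the name of one, which sidesteps the error described in the question.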

Resources