Renaming files in a folder with R

I have many files in my working directory that share the same name followed by a number, as in "name_#.csv". They each contain time series data in the same format.
The file names are very long, so when I import them into data frames the df names are super long, and I'd like to rename each as "df_#" so that I can then create another function to plot each individually, or quickly plot, for example, the first three without typing in this megalong name.
I don't want to concatenate anything onto the current names; I want to replace each name completely, ending with the number in the list as it iterates through the files.
Here is an example I have so far.
name = list.files(pattern="*.csv")
for (i in 1:length(name)) assign(paste0("df", name[i]), read.csv(name[i], skip = 15 ))
This is just adding a 'df' to the front and not changing the whole name.
I'm also not sure if it makes sense to proceed this way. Essentially my data is three replicates of time series data on the same sample, and I eventually want to take three files at a time and plot them on the same graph, and so forth until the end of the files.

You can name the object in your global environment, without renaming the original file in the folder, by telling R what name to assign in the loop. This takes only a few modifications to your original code. For instance:
# Define file path to desired folder
file_path <- "Desktop/SO Example/" # example file path, though it could be the working directory
# Your code to get CSV file names in the folder
name <- list.files(path = file_path, pattern = "*.csv")
# Modify the loop to assign a new name
for (x in seq_along(name)) {
  assign(paste0("df_", x),
         read.csv(paste0(file_path, name[x]), skip = 15))
}
This will load the data as df_1, df_2, etc. In your assign() call you were using paste0("df", name[i]), which concatenates "df" with the file name in position i, not with the value of i in the loop; that is why you were getting "df" prepended to each file name on import.
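Since the eventual goal is plotting the replicates in groups of three, it may be even simpler to keep everything in one named list rather than scattering df_1, df_2, ... through the global environment. A minimal sketch, assuming the files sit in the working directory as in the question's own code, and that the first two columns of each file are the time and the measured value (hypothetical positions, adjust to your data):
# read every file into one list and name the elements df_1, df_2, ...
dfs <- lapply(name, read.csv, skip = 15)
names(dfs) <- paste0("df_", seq_along(dfs))
# overlay the first three replicates on one graph
plot(dfs[[1]][[1]], dfs[[1]][[2]], type = "l")
lines(dfs[[2]][[1]], dfs[[2]][[2]], col = "red")
lines(dfs[[3]][[1]], dfs[[3]][[2]], col = "blue")
Plotting replicates 4 to 6, 7 to 9, and so on is then just a matter of indexing, e.g. dfs[4:6].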

Related

Is there a way to report the file name of an imported dataset?

I'm trying to set up an export to a .xlsx file that will include the name of the dataset in its title.
I have the functions and everything working fine to add objects into the title and export it, but I don't know how to report the name of my original dataset as an object that I can then add into the function.
(Using RStudio 1.3)
Before analysing my data, I import the dataset, "DS". I then call this "input".
input <- DS
data("input")
After all analysis is done, I set up the name I want to append and call it "name". I made it include the row name, the column name, and then ".xlsx" at the end to save it as a .xlsx file (it was just saving without a file extension before that).
name <- paste(analysis.score$pairs$row,
              analysis.score$pairs$column,
              ".xlsx", sep = "_")
write.xlsx(analysis.score, name)
My resulting file will be something like "row_column_.xlsx"
What I need is a command to report what the file name of the dataset is (in this example DS), so that I can include it in the name to paste onto the file.
I've tried using names(input) but it returns the names of all the columns in the file.
I have a number of datasets to analyse, and would like to just have to enter each dataset title once at the beginning of the script.
Sorry if this doesn't make sense, I'm very new to this (started Monday)
Thanks!
I don't know of an import function that stores, as an attribute of the data, the name of the file that held the data outside of R. That said, it would be pretty easy to make one.
my_import <- function(filename, ...) {
  require(rio)
  require(stringr)
  x <- import(filename, ...)
  ## strip off leading absolute or relative path information
  attr(x, "filename") <- str_extract(filename, "[\\w\\d\\.\\_\\-]*$")
  return(x)
}
Then, as long as you have the rio and stringr packages installed, you would be able to do the following:
df <- my_import("my_file.xlsx")
attr(df, "filename")
# [1] "my_file.xlsx"
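As a side note, base R's basename() strips leading path information in the same way, so the regular expression could be avoided entirely by writing the attribute line as attr(x, "filename") <- basename(filename). For example:
basename("some/path/to/my_file.xlsx")
# [1] "my_file.xlsx"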

Reading multiple .csv files with varying names from URL

I'm trying to read multiple .csv files from a URL starting with http. All files can be found on the same website. Generally, the structure of the file names is: yyyy_mm_dd_location_XX.csv
Now, there are three different locations (let's say locA, locB, locC), and for each there is a file for every day of the month. So the file names would be e.g. "2009_10_01_locA_XX.csv", "2009_10_02_locA_XX.csv", and so forth.
The structure, meaning the number of columns, is the same for all csv files; the length, however, is not.
I'd like to combine all these files into one csv file, but I'm having problems reading them from the website because of the changing names.
Thanks a lot for any ideas!
Here is a way to programmatically generate the names of the files and then run download.file() to download them. Since no reproducible example was given with the question, you will need to change the code to point to the correct HTTP location of the files.
library(lubridate)
startDate <- as.Date("2019-10-01", "%Y-%m-%d")
dateVec <- startDate + 0:4 # create additional dates by adding integers
downloadFileNames <- unlist(lapply(dateVec, function(x) {
  locs <- c("locA", "locB", "locC")
  # zero-pad month and day to match the yyyy_mm_dd pattern in the file names
  paste(year(x), sprintf("%02d", month(x)), sprintf("%02d", day(x)),
        locs, "XX", sep = "_")
}))
head(downloadFileNames)
We print the head() of the vector to show the correct naming pattern.
> head(downloadFileNames)
[1] "2019_10_01_locA_XX" "2019_10_01_locB_XX" "2019_10_01_locC_XX"
[4] "2019_10_02_locA_XX" "2019_10_02_locB_XX" "2019_10_02_locC_XX"
Next, we'll create a directory to store the files, and download them.
# create a subdirectory to store the files
if(!dir.exists("./data")) dir.create("./data")
# download files, as https://www.example.com/2019_10_01_locA_XX.csv
# to ./data/2019_10_01_locA_XX.csv, etc.
result <- lapply(downloadFileNames, function(x) {
  download.file(paste0("https://www.example.com/", x, ".csv"),
                paste0("./data/", x, ".csv"))
})
Once the files are downloaded, we can use list.files() to retrieve the path names, read the data with read.csv(), and combine them into a single data frame with do.call().
theFiles <- list.files("./data", pattern = "\\.csv$", full.names = TRUE)
dataList <- lapply(theFiles,read.csv)
data <- do.call(rbind,dataList)
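If you also need to know later which location each row came from (the combined data frame otherwise carries no record of its source), a small variation on the above, reusing theFiles, tags each data frame with its file name before binding:
dataList <- lapply(theFiles, function(f) cbind(source = basename(f), read.csv(f)))
data <- do.call(rbind, dataList)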

Why does "write.dat" (R) save data files within folders?

In order to conduct some analysis using a particular software package, I am required to have a separate ".dat" file for each participant, with each file named as the participant number, all saved in one directory.
I have tried to do this using the "write.dat" function in R (from the 'multiplex' package).
I have written a loop that outputs a ".dat" file for each participant in a dataset. I would like each file that is outputted to be named the participant number, and for them all to be stored in the same folder.
## Using write.dat
participants_ID <- unique(newdata$SJNB)
for (i in 1:length(participants_ID)) {
  data_list[[i]] <- newdata %>%
    filter(SJNB == participants_ID[i])
  write.dat(data_list[[i]], paste0("/Filepath/Directory/", participants_ID[i], ".dat"))
}
## Using write_csv this works perfectly:
participants_ID <- unique(newdata$SJNB)
for (i in 1:length(participants_ID)) {
  newdata %>%
    filter(SJNB == participants_ID[i]) %>%
    write_csv(paste0("/Filepath/Directory/", participants_ID[i], ".csv"), append = FALSE)
}
If I use the function "write_csv", this works perfectly (saving .csv files for each participant). However, if I use the function "write.dat", each participant's file is saved inside a separate folder: the folder name is the participant number, and the file inside the folder is called "data_list[[i]]". In order to get all of the data_list files into the same directory, I then have to rename them, which is time consuming.
I could theoretically output the files to .csv and then convert them to .dat, but I'm just intrigued to know if there's anything I could do differently to get the write.dat function to work the way I'm trying it :)
The documentation on write.dat is subminimal, but it would appear that you have confused a directory path with a file name. You have in effect created a directory named "/Filepath/Directory/participants_ID[i].dat", and that's where each output file is placed. That you cannot assign a name to the .dat file itself appears to be a defect in the package as supplied.
However, not all is lost. Inside your loop, replace your write.dat line with the following lines, or something similar (not tested):
Edit
It occurs to me that there's a smoother solution, albeit using the dreaded eval:
Again inside the loop (assuming participants_ID[i] is a character string):
# eval() alone just returns the string; parse() is needed to turn it into code
eval(parse(text = paste0(participants_ID[i], " <- data_list[[i]]")))
eval(parse(text = paste0("write.dat(", participants_ID[i], ", '/Filepath/Directory/')")))
Previous answer
write.dat(data_list[[i]], "/Filepath/Directory/")
# the file written above is literally named "data_list[[i]]"; rename it after the participant
thecommand <- paste0("mv '/Filepath/Directory/data_list[[i]]' '/Filepath/Directory/",
                     participants_ID[i], ".dat'")
system(thecommand)
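If shelling out to mv is a concern (it will not work on Windows, for instance), a more portable sketch of the same idea uses base R's file.rename(), under the same untested assumption that write.dat writes a file literally named data_list[[i]]:
write.dat(data_list[[i]], "/Filepath/Directory/")
# rename the literally-named output file after the participant
file.rename(from = "/Filepath/Directory/data_list[[i]]",
            to = paste0("/Filepath/Directory/", participants_ID[i], ".dat"))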

R 3.4.1: Read data from multiple .csv files

I'm trying to build up a function that can import/read several data tables in .csv files, and then compute statistics on the selected files.
Each of the 332 .csv files contains a table with the same column names: Date, Pollutant and id. There are a lot of missing values.
This is the function I wrote so far, to compute the mean of values for a pollutant:
pollutantmean <- function(directory, pollutant, id = 1:332) {
  library(dplyr)
  setwd(directory)
  good <- c()
  for (i in (id)) {
    task1 <- read.csv(sprintf("%03d.csv", i))
  }
  p <- select(task1, pollutant)
  good <- c(good, complete.cases(p))
  mean(p[good, ])
}
The problem I have is that each time the loop runs, a new file is read and the data already read are replaced by the data from the new file.
So I end up with a function that works perfectly fine with a single file, but not when I want to select multiple files:
e.g. if I ask for id = 10:20, I end up with the mean calculated only on file 20.
How could I change the code so that I can select multiple files?
Thank you!
My answer offers a way of doing what you want to do (if I understood everything correctly) without using a loop. My two assumptions are: (1) you have 332 *.csv files with the same header (column names), so all files are of the same structure, and (2) you can combine your tables into one big data frame.
If these two assumptions are correct, I would use a list of your file names to import the files as data frames (so this answer does not contain a loop function!).
# This creates a list with the names of your files. You have to provide the path to this folder.
file_list <- list.files(path = [your path where your *.csv files are saved in], full.names = TRUE)
# This will create a list of data frames.
mylist <- lapply(file_list, read.csv)
# This will 'row-bind' the data frames of the list into one big data table
# (rbindlist() is from the data.table package).
library(data.table)
mydata <- rbindlist(mylist)
# Now you can perform your calculations on this big table, using your column
# information to filter or subset it if you only need part of the data.
I hope this helps.
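Once everything is in one big table, the statistic follows directly. For example, using the column names given in the question (Date, Pollutant, id), the overall pollutant mean, or the mean for a subset of ids, could be computed like this:
mean(mydata$Pollutant, na.rm = TRUE) # over all files
mean(mydata$Pollutant[mydata$id %in% 10:20], na.rm = TRUE) # over files 10 to 20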
Maybe something like this?
library(dplyr)
pollutantmean <- function(directory, pollutant, id = 1:332) {
  od <- setwd(directory)
  on.exit(setwd(od))
  task_list <- lapply(sprintf("%03d.csv", id), read.csv)
  # pool the pollutant column across all requested files, then average the
  # non-missing values (a mean of per-file means would weight files unequally)
  p <- bind_rows(lapply(task_list, function(x) select(x, all_of(pollutant))))
  mean(p[[pollutant]], na.rm = TRUE)
}
Notes:
- Put all your library() calls at the beginning of your scripts; they will be much easier to read. Never put them inside a function.
- Setting a working directory inside a function is also a bad idea: when the function returns, that change will still be in effect and you might get lost. The better way is to set working directories outside functions, but since you've set it inside the function, I've adapted the code accordingly (on.exit() restores the original directory when the function exits).
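For completeness, a call to the function above would then look something like this (assuming the 332 files live in a folder named specdata, a hypothetical name, and using the Pollutant column name from the question):
pollutantmean("specdata", "Pollutant", id = 10:20)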

To stack up results in one masterfile in R

Using this script I have created a specific folder for each csv file and then saved all my further analysis results in this folder. The names of the folder and the csv file are the same. The csv files are stored in the main/master directory.
Now, I have created a csv file in each of these folders which contains a list of all the fitted values.
I would now like to do the following:
1. Set the working directory to the particular filename
2. Read the fitted values file
3. Add a row/column stating the name of the site/unique ID
4. Add it to the master file, which is stored in the main directory, with a title specifying the site name/filename. It can be stacked by rows or by columns, it doesn't really matter.
5. Come back to the main directory to pick the next file
6. Repeat the loop
Using merge(), rbind() or cbind() combines all the data under one set of column names. I want to keep all the sites separate for comparison at a later stage.
This is what I'm using at the moment and I'm lost on how to proceed further.
setwd( "path") # main directory
path <-"path" # need this for convenience while switching back to main directory
# import all files and create a character type array
files <- list.files(path=path, pattern="*.csv")
for(i in seq(1, length(files), by = 1)){
fileName <- read.csv(files[i]) # repeat to set the required working directory
base <- strsplit(files[i], ".csv")[[1]] # getting the filename
setwd(file.path(path, base)) # setting the working directory to the same filename
master <- read.csv(paste(base,"_fiited_values curve.csv"))
# read the fitted value csv file for the site and store it in a list
}
I want to construct a for loop to make one master file from the files in the different directories. I do not want to merge everything under one set of column names.
For example, if I have 50 similar csv files and each has two columns of data, I would like one csv file which accommodates all of it, but in its original format rather than appended to existing rows/columns. So I would then have 100 columns of data.
Please tell me what further information I can provide.
For reading a group of files from a number of different directories, with path names patha, pathb, pathc:
paths <- c("patha", "pathb", "pathc")
files <- unlist(sapply(paths, function(path) list.files(path, pattern = "*.csv", full.names = TRUE)))
listContainingAllFiles <- lapply(files, read.csv)
If you want to be really quick about it, you can grab fread from data.table:
library(data.table)
listContainingAllFiles <- lapply(files, fread)
Either way this will give you a list of all objects, kept separate. If you want to join them together vertically/horizontally, then:
do.call(rbind, listContainingAllFiles)
do.call(cbind, listContainingAllFiles)
EDIT: Note that the latter only makes sense if your rows actually correspond to one another across files. It makes far more sense to just create a field tracking which location the data comes from.
If you want to include the names of the files as the way of determining the sample location (I don't see where you're getting this info from in your example), then you want to do this as you read in the files, so:
listContainingAllFiles <- lapply(files,
                                 function(file) data.frame(filename = file,
                                                           read.csv(file)))
Then later you can split that column to get your details (assuming, of course, you have a standard naming convention), as sketched below.
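For example, with the yyyy_mm_dd_location_XX.csv convention from an earlier question, the location is the fourth underscore-separated field of each file name (a sketch; adjust the index to your own convention):
parts <- strsplit(basename(as.character(listContainingAllFiles[[1]]$filename)), "_")
location <- sapply(parts, `[`, 4) # e.g. "locA" from "2019_10_01_locA_XX.csv"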
