Rename date-named files with R

I have one file for each day of the years 2012 to 2016, named like this: 2012-01-01.csv, 2012-01-02.csv, ... up to 2016-12-31.csv. But the files for 2017 and 2018 are named like this: 20170101.csv, 20170102.csv, ... Could someone help me insert the hyphens into the second set of names so that they follow the same format as the first?
Thanks!

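You can try list.files() together with file.rename(). Only names made of eight digits plus .csv are picked up, so the files that already have hyphens are left alone:
old <- list.files(pattern = "^\\d{8}\\.csv$")
# "20170101.csv" -> Date 2017-01-01 -> "2017-01-01.csv"
dates <- as.Date(sub("\\.csv$", "", old), format = "%Y%m%d")
new <- paste0(format(dates, format = "%Y-%m-%d"), ".csv")
file.rename(old, new)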
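The one-liner above renames files immediately. A slightly more defensive sketch (the functions hyphenate_dates and rename_compact_dates are hypothetical, not from the original answers) separates the name mapping from the renaming, rejects eight-digit names that are not real dates, previews the plan by default, and refuses to overwrite an existing file:

```r
# Hypothetical helper: turn "YYYYMMDD.csv" into "YYYY-MM-DD.csv"; any other name passes through unchanged
hyphenate_dates <- function(files) {
  is_compact <- grepl("^\\d{8}\\.csv$", files)
  new <- files
  new[is_compact] <- sub("^(\\d{4})(\\d{2})(\\d{2})\\.csv$", "\\1-\\2-\\3.csv",
                         files[is_compact])
  # Eight digits are not always a real date (e.g. "20171345.csv"), so check them
  stamp <- as.Date(sub("\\.csv$", "", new[is_compact]), format = "%Y-%m-%d")
  if (anyNA(stamp)) {
    stop("not a valid date: ", paste(files[is_compact][is.na(stamp)], collapse = ", "))
  }
  new
}

# Hypothetical wrapper: return the planned renames, and only rename when dry_run = FALSE
rename_compact_dates <- function(dir = ".", dry_run = TRUE) {
  old <- list.files(dir, pattern = "\\.csv$")
  new <- hyphenate_dates(old)
  plan <- data.frame(from = old[old != new], to = new[old != new],
                     stringsAsFactors = FALSE)
  if (dry_run) return(plan)
  # Stop rather than overwrite a file that already carries the target name
  if (any(file.exists(file.path(dir, plan$to)))) stop("a target file already exists")
  file.rename(file.path(dir, plan$from), file.path(dir, plan$to))
}
```

Calling rename_compact_dates() in the data folder first shows what would change; running it again with dry_run = FALSE performs the renames.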

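Edit: use this instead:
setwd("C:\\Users\\...Path to your data")
DataFileNames <- list.files(pattern="\\.csv$")
# Insert hyphens into names that start with eight digits (YYYYMMDD); other names are unchanged
NewDataFileNames <- sub("^(\\d{4})(\\d{2})(\\d{2})(.*)", "\\1-\\2-\\3\\4", DataFileNames)
file.rename(DataFileNames, NewDataFileNames)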
Old answer:
There's a lot of missing information in your question, but you should be able to adjust the code below to suit your needs. Mostly, you'll need to edit the read.csv line if your files have headers, and adjust other parameters.
Note: This writes all your tables back to disk, so make sure the data is imported properly by the read.csv call inside the lapply before you run the write.table call (last line). The original eight-digit files are left in place next to the hyphenated copies.
setwd("C:\\Users\\...Path to your data")
DataFileNames <- list.files(pattern="\\.csv$")
Datafiles <- lapply(DataFileNames, read.csv, header=FALSE)
# Keep the .csv extension when inserting the hyphens
DataFileNames <- sub("^(\\d{4})(\\d{2})(\\d{2})\\.csv$", "\\1-\\2-\\3.csv", DataFileNames)
# col.names = FALSE mirrors header = FALSE above (write.csv would add a V1, V2, ... header row)
lapply(seq_along(Datafiles), function(x) write.table(Datafiles[[x]], DataFileNames[x], sep = ",", row.names = FALSE, col.names = FALSE))

Related

Naming a .csv with text that can be updated each year

I'm looking for a way to automate updating file names. The code will be used annually to download several .csv files. I would like to be able to change the 2020_2021 portion of the name to whatever the assessment year is (e.g. 2021_2022, 2022_2023, etc.) at the beginning of the script, so the file names don't have to be updated manually.
write.csv(SJRML_00010,
file = "SJRML__00010_2020_2021.csv")
write.csv(SJRML_00095,
file = "SJRML_00095_2020_2021.csv")
write.csv(SJRML_00480,
file = "SJRML_00480_2020_2021.csv")
lastyear <- 2020
prevassessment <- sprintf("%i_%i", lastyear, lastyear+1)
nextassessment <- sprintf("%i_%i", lastyear+1, lastyear+2)
prevassessment
# [1] "2020_2021"
filenames <- c("SJRML__00010_2020_2021.csv", "SJRML_00095_2020_2021.csv")
gsub(prevassessment, nextassessment, filenames, fixed = TRUE)
# [1] "SJRML__00010_2021_2022.csv" "SJRML_00095_2021_2022.csv"
You can run gsub on a whole vector of file names or on one name at a time, depending on how you structure your processing.
To create a .csv whose name can be updated, set the year once:
Year <- "_2020"
then
write.csv(file_name, paste0("file_name", Year, ".csv"))
This writes file_name_2020.csv.
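A small sketch of the "set it once at the top of the script" idea. It assumes the station codes are the ones in the question and that every file should follow the single-underscore pattern SJRML_<code>_<years>.csv (the double underscore in SJRML__00010 looks like a typo, but that is an assumption). assessment_filenames is a hypothetical helper:

```r
# Hypothetical helper: build this year's output names from a single start year
assessment_filenames <- function(start_year, stations, prefix = "SJRML") {
  assessment <- sprintf("%d_%d", start_year, start_year + 1)  # e.g. "2020_2021"
  sprintf("%s_%s_%s.csv", prefix, stations, assessment)
}

# The only line that needs editing each year
start_year <- 2020
stations <- c("00010", "00095", "00480")
out_files <- assessment_filenames(start_year, stations)
out_files
```

Each data frame can then be written in a loop, for example for (i in seq_along(stations)) write.csv(get(paste0("SJRML_", stations[i])), out_files[i]), assuming the objects are named SJRML_00010, SJRML_00095 and so on, as in the question.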

R find and replace text in .xlsx files

My task is very simple, and I would like to use R to solve it. I have hundreds of Excel files (.xlsx) in a folder, and I want to replace a specific text without altering the formatting of the worksheet and while preserving the rest of the text in the cell, for example:
Text to look for:
F13 A
Replace for:
F20
Text in a current cell:
F13 A Year 2019
Desired result:
F20 Year 2019
I have googled a lot and haven't found anything appropriate, even though it seems to be a common task. I have a solution using PowerShell, but it is very slow, and I can't believe that there is no simple way using R. I'm sure someone has had the same problem before; I'll take any suggestions.
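You can try the following. Note that reading a sheet into a data frame and writing it back with write.xlsx keeps only the cell values of the first sheet, not the original formatting:
text_to_look <- 'F13 A'
text_to_replace <- 'F20'
all_files <- list.files('/path/to/files', pattern = '\\.xlsx$', full.names = TRUE)
lapply(all_files, function(x) {
df <- openxlsx::read.xlsx(x)
#Or use readxl package
#df <- readxl::read_excel(x)
# Replace only the matched text inside character cells, keeping the rest ("F13 A Year 2019" -> "F20 Year 2019")
df[] <- lapply(df, function(col) if (is.character(col)) gsub(text_to_look, text_to_replace, col, fixed = TRUE) else col)
# Writes to the working directory under the same base name
openxlsx::write.xlsx(df, basename(x))
})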
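The substitution step is where the rest of the cell text is either kept or lost: gsub(..., fixed = TRUE) swaps only the matched text, whereas assigning text_to_replace to every matching cell overwrites the whole cell. A minimal sketch of that step on its own, assuming the sheet is already in a data frame; replace_in_df and the toy cells data are hypothetical, and this does nothing about formatting:

```r
# Hypothetical helper: literal substring replacement in every character column
replace_in_df <- function(df, pattern, replacement) {
  df[] <- lapply(df, function(col) {
    # fixed = TRUE treats the pattern as plain text, not a regular expression
    if (is.character(col)) gsub(pattern, replacement, col, fixed = TRUE) else col
  })
  df
}

# Toy data: numeric columns are passed through untouched
cells <- data.frame(label = c("F13 A Year 2019", "F13 B Year 2019"), n = 1:2,
                    stringsAsFactors = FALSE)
out <- replace_in_df(cells, "F13 A", "F20")
out
```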

Replace values within dataframe with filename while importing using read.table (R)

I am trying to clean up some data in R. I have a bunch of .txt files: each .txt file is named with an ID (e.g. ABC001), and there is a column (let's call it ID_Column) in the .txt file that contains the same ID. Each column has 5 rows (or fewer, since some files have missing data). However, some of the files have incorrect or missing IDs (e.g. ABC01). Here's an image of what each file looks like:
https://i.stack.imgur.com/lyXfV.png
What I am trying to do here is to import everything AND replace the ID_Column with the filename (which I know to all be correct).
Is there any way to do this easily? I think this can probably be done with a for loop but I would like to know if there is any other way. Right now I have this:
all_files <- list.files(pattern=".txt")
data <- do.call(rbind, lapply(all_files, read.table, header=TRUE))
So, basically, I want to know if it is possible to use lapply (or any other function) to replace data$ID_Column with the filenames in all_files. I am having trouble as each filename is only represented once in all_files, while each ID_Column in data is represented 5 times (but not always, due to missing data). I think the solution is to create a function and call it within lapply, but I am having trouble with that.
Thanks in advance!
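I would just make a function that uses read.table and overwrites ID_Column with the file's name (minus the .txt extension).
all_files <- list.files(pattern = "\\.txt$")
data <- do.call(rbind, lapply(all_files, function(x) {
  a <- read.table(x, header = TRUE)
  # Replace whatever ID the file contains with its file name, e.g. "ABC001.txt" -> "ABC001"
  a$ID_Column <- sub("\\.txt$", "", x)
  a
}))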
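A self-contained sketch of the same idea that can be run end to end, assuming every file has a header row and a column literally named ID_Column. read_with_ids and the two toy files (one with a mistyped ID) are made up for illustration:

```r
# Hypothetical helper: read every .txt file in `dir` and set ID_Column from the file name
read_with_ids <- function(dir) {
  files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  pieces <- lapply(files, function(f) {
    a <- read.table(f, header = TRUE, stringsAsFactors = FALSE)
    # rep(..., nrow(a)) sizes the new column to however many rows this file has
    a$ID_Column <- rep(sub("\\.txt$", "", basename(f)), nrow(a))
    a
  })
  do.call(rbind, pieces)
}

# Toy files: ABC001.txt contains the wrong ID "ABC01"
demo_dir <- file.path(tempdir(), "id_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("ID_Column value", "ABC01 1", "ABC01 2"), file.path(demo_dir, "ABC001.txt"))
writeLines(c("ID_Column value", "ABC002 3"), file.path(demo_dir, "ABC002.txt"))
data <- read_with_ids(demo_dir)
data
```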

Using lapply in R on one column of a csv file

I have a folder with several hundred csv files. I want to use lapply to calculate the mean of one column within each csv file and save that value into a new csv file with two columns: column 1 would be the name of the original file, and column 2 would be the mean value of the chosen field from that file. Here's what I have so far:
setwd("C:/~~~~")
list.files()
filenames <- list.files()
read_csv <- lapply(filenames, read.csv, header = TRUE)
dataset <- lapply(filenames[1], mean)
write.csv(dataset, file = "Expected_Value.csv")
Which gives the error message:
Warning message: In mean.default("2pt.csv"[[1L]], ...) : argument is not numeric or logical: returning NA
So I think I have (at least) two problems that I cannot figure out.
First, why doesn't R recognize that column 1 is numeric? I double- and triple-checked the csv files, and I'm sure this column is numeric.
Second, how do I get the output file to return two columns the way I described above? I haven't gotten far with the second part yet.
I wanted to get the first part to work first. Any help is appreciated.
I didn't use lapply but have done something similar. Hope this helps!
##list directory from where all files are to be read
directory <- "C:/mydir/"
##read all csv file names from directory
x <- list.files(directory, pattern = "\\.csv$")
xpath <- paste0(directory, x)
##create empty data frame
df <- NULL
##For loop to read each file and save metric and file name
for (i in seq_along(xpath))
{
file <- read.csv(xpath[i], header = TRUE)
d <- data.frame(mean = mean(file[, 1]), filename = x[i])
df <- rbind(df, d)
}
###write all output to csv (final.csv lands in the same folder, so exclude or move it before re-running)
write.csv(df, file = "C:/mydir/final.csv", row.names = FALSE)
The resulting CSV file looks like this:
mean filename
1999.000661 hist_03082015.csv
1999.035121 hist_03092015.csv
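The warning in the question comes from lapply(filenames[1], mean), which takes the mean of the file name string "2pt.csv" rather than of any data. A sketch that reads each file first and returns the two requested columns (file name, then mean) with lapply's sibling vapply; file_means, the choice of column 1, and the toy files are assumptions for illustration:

```r
# Hypothetical helper: one row per CSV in `dir`, holding the mean of column `col`
file_means <- function(dir, col = 1) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  # Read each file, then average the chosen column of its data
  means <- vapply(files, function(f) mean(read.csv(f)[[col]]), numeric(1))
  data.frame(filename = basename(files), mean = unname(means),
             stringsAsFactors = FALSE)
}

# Toy input files with made-up values
demo_dir <- file.path(tempdir(), "means_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(points = c(2, 3)), file.path(demo_dir, "2pt.csv"), row.names = FALSE)
write.csv(data.frame(points = c(3, 0, 0)), file.path(demo_dir, "3pt.csv"), row.names = FALSE)
result <- file_means(demo_dir)
# Written outside demo_dir so a re-run does not pick the summary up as input
write.csv(result, file.path(tempdir(), "Expected_Value.csv"), row.names = FALSE)
result
```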
Thanks for the two answers. After much review, it turns out that there was a much easier way to accomplish my goal. The csv files that I had were originally one file, which I had split into multiple files by location. At the time, I thought this was necessary to calculate the mean for each location. Clearly, that was a mistake. I went back to the original file and used aggregate. Code:
setwd("C:/~~")
allshots <- read.csv("All_Shots.csv", header=TRUE)
EV <- aggregate(allshots$points, list(Location = allshots$Loc), mean)
write.csv(EV, file= "EV_location.csv")
This was a simple solution. Thanks again for the answers. I'll need to get better at lapply for future projects, so they were not a waste of time.
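One small detail of the aggregate call above: with the vector interface, the mean column in the result is named x. The formula interface keeps the original column names (Loc and points, rather than Location and x). A toy example, with made-up values standing in for All_Shots.csv:

```r
# Toy stand-in for All_Shots.csv; Loc and points are the column names used above
allshots <- data.frame(Loc = c("Corner", "Paint", "Corner", "Paint"),
                       points = c(3, 2, 0, 2))
# Mean of points for each Loc; the result columns are named Loc and points
EV <- aggregate(points ~ Loc, data = allshots, FUN = mean)
EV
```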

How to not overwrite file in R

I am trying to copy and paste tables from R into Excel. Consider the following code from a previous question:
data <- list.files(path=getwd())
n <- length(data)
for (i in 1:n)
{
data1 <- read.csv(data[i])
outline <- data1[,2]
outline <- as.data.frame(table(outline))
print(outline) # this prints all n tables
name <- paste0(i,"X.csv")
write.csv(outline, name)
}
This code writes each table into a separate Excel file (i.e. "1X.csv", "2X.csv", etc.). Is there any way of "shifting" each table down some rows instead of writing each table to a new file?
output <- as.data.frame(output)
wb = loadWorkbook("X.xlsx", create=TRUE)
createSheet(wb, name = "output")
writeWorksheet(wb,output,sheet="output",startRow=1,startCol=1)
writeNamedRegion(wb,output,name="output")
saveWorkbook(wb)
But this does not copy the dataframes exactly into Excel.
I think, as mentioned in the comments, the way to go is to first merge the data frames in R and then write them into a single output file:
# get vector of .csv filenames (all.csv itself is picked up too if the script is re-run)
filenames <- list.files(path=getwd(), pattern="\\.csv$")
# for each filename: load file and create outline
outlines <- lapply(filenames, function(filename) {
data <- read.csv(filename)
outline <- data[,2]
outline <- as.data.frame(table(outline))
outline
})
# merge all outlines into one data frame (by appending them row-wise)
outlines.merged <- do.call(rbind, outlines)
# save merged data frame
write.csv(outlines.merged, "all.csv")
Despite what Microsoft would like you to believe, .csv files are not Excel files; they are a common file type that can be read by Excel and many other programs.
The best approach depends on what you really want to do. Do you want all the tables to be read into a single worksheet in Excel? If so, you could just write to a single file using the append argument of write.table (write.csv ignores append, with a warning). Or use a connection that you keep open so that each new table is appended. You may want to use cat to put a couple of newlines before each new table.
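A sketch of the open-connection option: every table goes into one CSV, separated by a blank line, so opening the file in Excel shows the tables stacked on a single sheet. append_tables and the toy tables are hypothetical; write.table with sep = "," produces the CSV rows:

```r
# Hypothetical helper: write several tables into one CSV through a single open connection
append_tables <- function(tables, path) {
  con <- file(path, open = "w")
  on.exit(close(con))
  for (tab in tables) {
    # Each call continues where the previous one stopped on the open connection
    write.table(tab, con, sep = ",", row.names = FALSE)
    cat("\n", file = con)  # one blank line between tables
  }
  invisible(path)
}

out_path <- file.path(tempdir(), "all_tables.csv")
append_tables(list(data.frame(outline = c("a", "b"), Freq = 2:1),
                   data.frame(outline = "c", Freq = 4L)), out_path)
lines <- readLines(out_path)
lines
```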
Your second attempt looks like it uses the XLConnect package (but you don't say, so it could be something else). I would think this is the best approach; how is the result different from what you are expecting?
