File picking using pattern in R [duplicate] - r

This question already has answers here:
R-project filepath from concatenation
(1 answer)
Passing directory path as parameter in R
(1 answer)
Closed 8 years ago.
I have a directory containing multiple files named 001.csv, 002.csv and so on. I want to pick, inside a function, only the files whose numbers I pass as an argument to that function.
For example:
myFiles<-function(x=1:30){
# I should pick only the files 001.csv to 030.csv.
}
I tried pattern matching, but I am not sure how to build the pattern from another variable that holds a vector of numbers. I also tried the paste function to build the full file path, but it gave me file names like 1.csv rather than 001.csv:
tt<-function(dirname,type,nums=1:30){
  filenames<-list.files(dirname)
  c<-nums
  myVector<-0
  for(i in 1:length(c)){
    myVector[i]<-paste(dirname,"/",c[i],".csv",sep="")
    #print(myVector[i])
  }
}

One way to get the correct names is to pad the numbers with leading zeros using formatC, e.g.
paste0(formatC(1:30, width = 3, format = "d", flag = "0"), ".csv")
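Combining the padding step with the directory listing, the function from the question could be sketched roughly as follows. This is only a sketch: pick_files is a hypothetical name, and it assumes every file is named with exactly three digits followed by .csv.

```r
# Hypothetical helper (not from the original post): return the full paths of
# the files 001.csv, 002.csv, ... that correspond to `nums` and actually exist.
pick_files <- function(dirname, nums = 1:30) {
  # sprintf("%03d", n) zero-pads to three digits, much like formatC above
  wanted <- sprintf("%03d.csv", nums)
  # keep only the requested names that are present in the directory
  present <- intersect(wanted, list.files(dirname))
  file.path(dirname, present)
}
```

Calling pick_files("some_dir", 1:5) would then return the paths of 001.csv to 005.csv, skipping any that are missing, and the result can be passed to lapply(..., read.csv).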

Related

R -find and replace within a script, iteratively [duplicate]

This question already has an answer here:
R: list files based on pattern
(1 answer)
Closed 1 year ago.
I have a somewhat complex script that is working well. It imports multiple .csvs, combines them, adjusts them, re-sorts them and writes them out as multiple new .csvs. All good.
The problem is that I need to run this script on each of 2100 files. Each .csv file has a name incorporating a seven or eight digit non-numeric string which also has other specific identifiers. There are numerous files with the same string suffix and the script works on all of them at once. An example of the naming system:
gfdlesm2g_45Fall_17100202.csv
ccsm4_45Fall_10270102.csv
bnuesm_45Fall_5130205.csv
mirocesmchem_45Fall_5010007.csv
The script begins with fnames <- dir("~/Desktop/modified_files/", pattern = "*_45Fall_1030001.csv")
And I need to replace the "1030001", in this case, with the next number. Right now I am using Find and Replace in RStudio to replace the seven (or eight) digit number each time the script has completed. I know there has to be a better way than to do this all manually for 2100 files.
All the research I've found is for iterating within a dataframe or whatever, in the columns or rows, and I can't process how to make this work for my needs.
I am thinking that if I made a vector of all the numbers (really they're names), like "01080204", "01090003", "01100001", "18020116", "18020125", "15080303", "16020301", "03170006", "04010101", "04010201", etc
There must be a way to say, in code, "now pick the next name, and run the script". I looked at the lapply, mapply, sapply family and couldn't seem to figure it out.
If you are looking for files whose names contain _45Fall_ you can use list.files. The pattern argument is a regular expression, not a glob, so no leading * is needed:
fnames <- list.files("~/Desktop/modified_files/", pattern = "_45Fall_\\d+\\.csv$")
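To then run the script once per ID instead of editing the number by hand, one option is to group the file names by ID and loop over the groups. The sketch below makes some assumptions: process_group is a hypothetical stand-in for the body of the existing script, and the last file name in the example vector is made up so that one ID has two files.

```r
# File names as list.files() would return them; the first four come from the
# question, the last one is a made-up name sharing an ID with the first.
fnames <- c("gfdlesm2g_45Fall_17100202.csv",
            "ccsm4_45Fall_10270102.csv",
            "bnuesm_45Fall_5130205.csv",
            "mirocesmchem_45Fall_5010007.csv",
            "ccsm4_45Fall_17100202.csv")

# Extract the digits between "_45Fall_" and ".csv" as the ID of each file
ids <- sub(".*_45Fall_(\\d+)\\.csv$", "\\1", fnames)

# One list element per ID, each holding the file names that share that ID
groups <- split(fnames, ids)

# Hypothetical stand-in for the body of the existing script
process_group <- function(files) {
  length(files)  # replace with the real import/combine/write steps
}

# Run the script body once per ID
results <- lapply(groups, process_group)
```

Because the IDs are taken from the file names themselves, there is no need to type out a vector of all 2100 names.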

how to get the last part of strings with different lengths ended by ".nc" [duplicate]

This question already has answers here:
Get filename without extension in R
(9 answers)
Find file name from full file path
(4 answers)
Closed 3 years ago.
I have several download links (i.e., strings), and each string has different length.
For example let's say these fake links are my strings:
My_Link1 <- "http://esgf-data2.diasjp.net/pr/gn/v20190711/pr_day_MRI-AGCM3-2-H_highresSST_gn_20100101-20141231.nc"
My_Link2 <- "http://esgf-data2.diasjp.net/gn/v20190711/pr_-present_r1i1p1f1_gn_19500101-19591231.nc"
My goals:
A) I want to keep only the last part of each string, ending in .nc, and get these results:
pr_day_MRI-AGCM3-2-H_highresSST_gn_20100101-20141231.nc
pr_-present_r1i1p1f1_gn_19500101-19591231.nc
B) I want to keep only the last part of each string without the trailing .nc, and get these results:
pr_day_MRI-AGCM3-2-H_highresSST_gn_20100101-20141231
pr_-present_r1i1p1f1_gn_19500101-19591231
I tried to find a way on the net, but I failed. It seems this can be done in Python as documented here:
How to get everything after last slash in a URL?
Does anyone know the same method in R?
Thanks so much for your time.
A shortcut to get the last part of the string is basename:
basename(My_Link1)
#[1] "pr_day_MRI-AGCM3-2-H_highresSST_gn_20100101-20141231.nc"
For the second goal, to remove the trailing ".nc" we can use sub, anchoring the pattern to the end of the string with $:
sub("\\.nc$", "", basename(My_Link1))
#[1] "pr_day_MRI-AGCM3-2-H_highresSST_gn_20100101-20141231"
With a regex, here is another way to get the file name (goal A):
sub(".*/", "", My_Link1)
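Both functions are vectorised, so several links can be handled in one call. The sketch below uses the two links from the question; the variable names links, file_names and stems are chosen here for illustration, and tools::file_path_sans_ext is used as an alternative to the sub call.

```r
links <- c(
  "http://esgf-data2.diasjp.net/pr/gn/v20190711/pr_day_MRI-AGCM3-2-H_highresSST_gn_20100101-20141231.nc",
  "http://esgf-data2.diasjp.net/gn/v20190711/pr_-present_r1i1p1f1_gn_19500101-19591231.nc"
)

# Goal A: everything after the last slash
file_names <- basename(links)

# Goal B: the same names with the extension removed
stems <- tools::file_path_sans_ext(file_names)
```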

R - how to write a function to read a CSV file [duplicate]

This question already has answers here:
Calculate the mean of one column from several CSV files
(2 answers)
Closed 4 years ago.
I have CSV files named "001", "002",..."100" stored in the working directory. I need to write a function to read any of these files. I tried the function below, but it doesn't work.
func = function(ID)
{
inp = read.csv("ID.csv")
}
I think this is because "ID.csv" is a character whereas ID is a numeric variable, but I am not sure. Can someone please explain the reason and suggest the right code?
Sounds like you sort of understand the problem. "ID.csv" is a string literal, so read.csv is literally looking for a file named ID.csv. If I were you, I would pass ID as a string that matches the file names (i.e. "001" instead of 1). Then try this:
func = function(ID)
{
  # build the file name from the argument and return the data frame
  read.csv(paste(ID, ".csv", sep = ""))
}
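If you would rather call the function with a number, the ID can be zero-padded before the file name is built. A minimal sketch, assuming three-digit file names; read_by_id and its dir parameter are hypothetical additions, not part of the answer above.

```r
# Hypothetical variant: accepts a number (1) or a string ("001").
# `dir` is an added parameter that defaults to the working directory.
read_by_id <- function(ID, dir = ".") {
  if (is.numeric(ID)) {
    ID <- sprintf("%03d", ID)  # 1 -> "001"
  }
  read.csv(file.path(dir, paste0(ID, ".csv")))
}
```

With this, read_by_id(1) and read_by_id("001") read the same file.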

Naming output files in R [duplicate]

This question already has answers here:
Concatenate a vector of strings/character
(8 answers)
Closed 6 years ago.
I'm working in R and I would like to export a txt file putting in its name the value of a particular variable; I read about the command paste and it works perfectly here:
write.table(mydata,file=paste(cn,"data.txt"))
where cn is the value to put at the beginning of the file name data.txt. I would like to automatically put this file in an output folder where I keep all the other results. I tried something like this:
write.table(mydata,file=paste(cn,"./output/data.txt"))
But it doesn't work. Any suggestion?
paste() just creates a string by concatenating the individual values, using a space as the default separator, so your second call produces a name like "<cn> ./output/data.txt". Put the folder first and set sep = "":
write.table(mydata, file = paste("./output/", cn, "data.txt", sep = ""))
or with paste0(...), which is equivalent to paste(..., sep = ""):
write.table(mydata, file = paste0("./output/", cn, "data.txt"))
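Another option is file.path, which joins the folder and the file name with the path separator, combined with dir.create so the folder exists before writing. In this sketch mydata and cn are placeholder values, and a temporary folder stands in for the real output folder.

```r
# Placeholder inputs, standing in for the variables in the question
mydata <- data.frame(x = 1:3)
cn <- "run1"

# Assumption: a temporary folder is used here; in practice this would be "output"
out_dir <- file.path(tempdir(), "output")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# file.path() inserts the separator; paste0() builds the file name itself
out_file <- file.path(out_dir, paste0(cn, "data.txt"))
write.table(mydata, file = out_file)
```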

In R, How to remove some unwanted characters from the CSV file names and also extract dates? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I have a folder which contains some 2000 CSVs whose file names contain the characters '[ ]', e.g. [Residential]20151001_0000_1.csv
I want to:
1. Remove '[]' from the names so that the file name becomes Residential_20151001_0000_1.csv, and place the new files within a new folder.
2. Then read all the files from that new folder into one data frame (without header), after skipping the first row of each file.
3. Also extract 20151001 as a date (e.g. 2015-10-01) into a new vector, so that the result is:
File Name                        Date
Residential_20151001_0000_1.csv  2015-10-01
This code will answer your first question, albeit with a small change in logic.
First, let's create a backup of all the CSVs containing [] by copying them to another folder. For example, if your CSVs are in the directory "/Users/xxxx/Desktop/Sub", we will copy them into the folder Backup.
Therefore,
library(tools)  # for file_path_sans_ext()
setwd("/Users/xxxx/Desktop/Sub")
dir.create("Backup")
# "\\.csv$" is a regular expression matching names that end in .csv
files <- data.frame(file = list.files(path = ".", pattern = "\\.csv$"), stringsAsFactors = FALSE)
# file.copy() is vectorised, so no loop is needed
file.copy(from = file.path("/Users/xxxx/Desktop/Sub", files$file), to = "/Users/xxxx/Desktop/Sub/Backup")
This has now copied all the CSV files to the folder Backup.
Now let's rename the files in your original working directory by removing the "[]".
I have taken a slightly longer route by creating a data frame with the old names and the new names, to make things easier for you.
Name<-file_path_sans_ext(files$file)
files<-cbind(files, Name)
files$Name<-gsub("\\[", "",files$Name)
files$Name<-gsub("\\]", "_",files$Name)
files$Name<-paste(files$Name,".csv",sep="")
This data frame looks like:
files
                              file                            Name
1 [Residential]20150928_0000_4.csv Residential_20150928_0000_4.csv
2 [Residential]20151001_0000_1.csv Residential_20151001_0000_1.csv
3 [Residential]20151101_0000_3.csv Residential_20151101_0000_3.csv
4 [Residential]20151121_0000_2.csv Residential_20151121_0000_2.csv
5 [Residential]20151231_0000_5.csv Residential_20151231_0000_5.csv
Now let's rename the files to remove the "[]". The idea here is to replace file with Name:
# file.rename() is vectorised: one call renames every file
file.rename(from = file.path("/Users/xxxx/Desktop/Sub", files$file),
            to = file.path("/Users/xxxx/Desktop/Sub", files$Name))
You've renamed your files now. If you run list.files(path = ".", pattern = "\\.csv$"), you will get the new file names:
"Residential_20150928_0000_4.csv"
"Residential_20151001_0000_1.csv"
"Residential_20151101_0000_3.csv"
"Residential_20151121_0000_2.csv"
"Residential_20151231_0000_5.csv"
Try it!
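If you prefer what the question literally asked for (new files in a new folder, originals untouched), the backup and rename steps can be folded into a single copy. This is a sketch under assumptions: copy_with_clean_names is a hypothetical helper, and the names are assumed to contain one "[" and one "]" as in the example.

```r
# Hypothetical helper: copy every CSV from `src` into `dest`, dropping "["
# and turning "]" into "_" in the file name. The originals are left untouched.
copy_with_clean_names <- function(src, dest) {
  dir.create(dest, showWarnings = FALSE, recursive = TRUE)
  old <- list.files(src, pattern = "\\.csv$")
  # fixed = TRUE treats the brackets as plain characters, not regex syntax
  new <- sub("]", "_", sub("[", "", old, fixed = TRUE), fixed = TRUE)
  # file.copy() pairs each source path with its destination path
  ok <- file.copy(from = file.path(src, old), to = file.path(dest, new))
  data.frame(file = old, Name = new, copied = ok)
}
```

The returned data frame has the same file and Name columns as above, plus a copied column showing whether each copy succeeded.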
In order:
1. After googling "r replace part of string" I found: R - how to replace parts of variable strings within data frame. This should get you up and running for this issue.
2. For skipping the first line, read the documentation of read.csv. There you will find the skip argument.
3. Have a look at the strftime/strptime functions. Alternatively, have a look at lubridate.
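The second and third points can be sketched with base R alone. Both helpers below are hypothetical; the first assumes all files have the same number of columns, and the second assumes names shaped like Residential_20151001_0000_1.csv, with the date as the eight digits after the first underscore.

```r
# Hypothetical helper: read every CSV in `dir` without a header, skipping the
# first row of each file, and stack the results into one data frame.
# Assumes all files have the same number of columns.
read_all_csv <- function(dir) {
  paths <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  tables <- lapply(paths, read.csv, header = FALSE, skip = 1)
  do.call(rbind, tables)
}

# Hypothetical helper: pull the 8-digit date out of names such as
# "Residential_20151001_0000_1.csv" and convert it to a Date.
extract_date <- function(file_names) {
  stamp <- sub("^[^_]*_(\\d{8})_.*$", "\\1", file_names)
  as.Date(stamp, format = "%Y%m%d")
}
```

A file name and date table like the one in the question can then be built with data.frame(File.Name = nm, Date = extract_date(nm)), where nm is the vector of file names.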
