Reading a file into R with partly unknown filename - r

Is there a way to read a file into R where I do not know the complete file name. Something like.
read.csv("abc_*")
In this case I do not know the complete file name after abc_

If you have exactly one file matching your criteria, you can do it like this:
read.csv(dir(pattern='^abc_')[1])
If there is more than one file, this approach would just use the first hit. In a more elaborated version you could loop over all matches and append them to one dataframe or something like that.
Note that the pattern uses regular expressions and thus is a bit different from what you did expect (and what I wrongly assumed at my first shot to answer the question). Details can be found using ?regex
If you have a directory you want to submit, you have do modify the dir command accordingly:
read.csv(dir('path/to/your/file', full.names=T, pattern="^abc"))
The submitted path in your case may be c:\\users\\user\\desktop, and then the pattern as above. full.names=T forces dir() to output a whole path and not only the file name. Try running dir(...) without the read.csv to understand what is happening there.
If you want to give your path as a complete string, it again gets a bit more complicated:
filepath <- 'path/to/your/file/abc_'
read.csv(dir(dirname(filepath), full.names=T, pattern=paste("^", basename(filepath), sep='')))
That process will fail if your filename contains any regular expression keywords. You would have to substitute then with their corresponding escape sequences upfront. But that again is another topic.

Related

How to create a list of file with dbf extension?

I faced a problem when creating a list of files with .dbf (lowercase) extension. Here is the code:
listdbf <- dir(pattern = "*.dbf")
If I run thid code nothing happens.
However when I run code with uppercase extension specification listdbf <- dir(pattern = "*.DBF") everything is ok.
But I definetely need to write code with lowercase extension specification.
How can I overcome this issue?
I can't be sure with out the list of files in the directory, but I suspect it maybe related to the pattern is taken as a regex, which means '.' is a wild card representing any character. You need to escape it to represent the dot/period character. I think the pattern you really want is pattern = "\\.dbf$". This will also ensure it's at the end of the name.

Using R to read all files in a specific format and with specific extension

I want to read all the files in the xlsx format starting with a string named "csmom". I have used list.files function. But I do not know how to set double pattern. Please see the code. I want to read all the files starting csmom string and they all should be in .xlsx format.
master1<-list.files(path = "C:/Users/Admin/Documents/csmomentum funal",pattern="^csmom")
master2<-list.files(path = "C:/Users/Admin/Documents/csmomentum funal",pattern="^\\.xlsx$")
#jay.sf solution works for creating a regular expression to pull out the condition that you want.
However, generally speaking if you want to cross two lists to find the subset of elements that are contained in both (in your case the files that satisfy both conditions), you can use intersect().
intersect(master1, master2)
Will show you all the files that satisfy pattern 1 and pattern 2.

readcsv fails to read # character in Julia

I've been using asd=readcsv(filename) to read a csv file in Julia.
The first row of the csv file contains strings which describe the column contents; the rest of the data is a mix of integers and floats. readcsv reads the numbers just fine, but only reads the first 4+1/2 string entries.
After that, it renders "". If I ask the REPL to display asd[1,:], it tells me it is 1x65 Array{Any,2}.
The fifth column in the first row of the csv file (this seems to be the entry it chokes on) is APP #1 bias voltage [V]; but asd[1,5] is just APP . So it looks to me as though readcsv has choked on the "#" character.
I tried using "quotes=false" keyword in readcsv, but it didn't help.
I used to use xlsread in Matlab and it worked fine.
Has anybody out there seen this sort of thing before?
The comment character in Julia is #, and this applies when reading files from delimited text files.
But luckily, the readcsv() and readdlm() functions have an optional argument to help in these situations.
You should try readcsv(filename; comment_char = '/').
Of course, the example above assumes that you don't have any / characters in your first line. If you do, then you'll have to change that / above to something else.

list of files with space in the name

I would like to get the list of files with a specific extention in a folder. However, these files has space in the name. So for example, imagining I have files named file test1.txt, file test2.txt, file test3.txt, file test4.txt, if I do
list.files(pattern="file test*.txt")
I got
character(0)
NOTA: Apparentely, using simply pattern="file test*" it works fine but I need the extention file as well.
Try:
list.files(pattern="file test.*.txt")
Actually, what this says is:
list.files(pattern="file test(.*).txt")
(which also works). . refers to any character and * refers to the idea that this character should be present 0 or more times (see ?regex).
In your kast example you said that using pattern="file test*" works but you need a way to search for the extension as well.
All you have to do is Change your code to pattern="file test.*.txt". This would make your code search for any filename that matched "file testX.txt" with any one character in place of X.

Using R to list all files with a specified extension

I'm very new to R and am working on updating an R script to iterate through a series of .dbf tables created using ArcGIS and produce a series of graphs.
I have a directory, C:\Scratch, that will contain all of my .dbf files. However, when ArcGIS creates these tables, it also includes a .dbf.xml file. I want to remove these .dbf.xml files from my file list and thus my iteration. I've tried searching and experimenting with regular expressions to no avail. This is the basic expression I'm using (Excluding all of the various experimentation):
files <- list.files(pattern = "dbf")
Can anyone give me some direction?
files <- list.files(pattern = "\\.dbf$")
$ at the end means that this is end of string. "dbf$" will work too, but adding \\. (. is special character in regular expressions so you need to escape it) ensure that you match only files with extension .dbf (in case you have e.g. .adbf files).
Try this which uses globs rather than regular expressions so it will only pick out the file names that end in .dbf
filenames <- Sys.glob("*.dbf")
Peg the pattern to find "\\.dbf" at the end of the string using the $ character:
list.files(pattern = "\\.dbf$")
Gives you the list of files with full path:
Sys.glob(file.path(file_dir, "*.dbf")) ## file_dir = file containing directory
I am not very good in using sophisticated regular expressions, so I'd do such task in the following way:
files <- list.files()
dbf.files <- files[-grep(".xml", files, fixed=T)]
First line just lists all files from working dir. Second one drops everything containing ".xml" (grep returns indices of such strings in 'files' vector; subsetting with negative indices removes corresponding entries from vector).
"fixed" argument for grep function is just my whim, as I usually want it to peform crude pattern matching without Perl-style fancy regexprs, which may cause surprise for me.
I'm aware that such solution simply reflects drawbacks in my education, but for a novice it may be useful =) at least it's easy.

Resources