I want to find a file foo.bar on a disk (in order to read that file). If I am using Windows, I can use:
list.files(path = "C:/", pattern = "^foo\\.bar$", recursive = TRUE, full.names = TRUE)
but this is quite slow (11 seconds on my machine) -- in contrast, Search Everything returns the result in less than 2 seconds. Is there a faster pure R way? If there are multiple matches, any of the matches will do. Cross-platform solutions are preferred.
I am very new to Julia and I have a question about reading some files. I need to read 12,500 .txt files from the same directory and save them all into one array, but I'm having performance issues. Is there a fast way of doing this? My code takes around 60 seconds, which is far more than I can afford. Here is what I have:
function load_train(directory)
    data = []
    dir = joinpath("./aclImdb/train/", directory)
    for f in readdir(dir)
        s = read(joinpath(dir, f), String)
        push!(data, s)
    end
    data
end
trainPos = load_train("pos/")
I need to change files' creation/modification dates for an exercise.
According to http://theautomatic.net/2018/12/18/how-to-change-file-last-modified-date-with-r/ , I can find file info with the file.info() function.
I can also change the modification dates of all the files in a folder by combining sapply and Sys.setFileTime:
sapply(list.files("path", full.names = TRUE),
function(FILE) Sys.setFileTime(FILE, "1975-01-01"))
How can I also change the creation date (ctime <S3: POSIXct>)?
What I expect: for the exercise, I want to randomly change the creation and modification dates of the files (within a range of a year, so the files are not all created on the same day).
What I tried (changing only the modification date):
sapply(list.files("C:/Users/cariou-w/Nextcloud/sync-uncloud/1920-1921/master/web_scrapping/plans", full.names = TRUE),
function(FILE) Sys.setFileTime(FILE, paste0("2020-11-", sample(days,1))))
There is no generic cross-platform way to change the file creation time in R; many file systems do not even track it. You can use system or system2 to call whatever shell command is appropriate for your platform, e.g. on OS X: touch -t 202001011234 /path/to/my/file
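As a rough sketch of that approach (assuming a Unix-like system with touch on the PATH; the folder path and the date range are placeholders, and note that touch -t sets the modification/access time, not the true creation time):
# Sketch: give every file in a folder a random timestamp by shelling out to `touch`.
files <- list.files("path/to/folder", full.names = TRUE)

# Random times spread over one year starting 2020-01-01 (placeholder range)
random_times <- as.POSIXct("2020-01-01") + runif(length(files), 0, 365 * 24 * 3600)

for (i in seq_along(files)) {
  stamp <- format(random_times[i], "%Y%m%d%H%M")  # touch -t expects [[CC]YY]MMDDhhmm
  system2("touch", c("-t", stamp, shQuote(files[i])))
}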
I have an R script that takes a file as input, and I want a general way to know whether the input is a file that exists, and is not a directory.
In Python you would do it this way: How do I check whether a file exists using Python?, but I was struggling to find anything similar in R.
What I'd like is something like below, assuming that the file.txt actually exists:
input.good = "~/directory/file.txt"
input.bad = "~/directory/"
is.file(input.good) # should return TRUE
is.file(input.bad)  # should return FALSE
R has something called file.exists(), but this doesn't distinguish files from directories.
There is a dir.exists function in all recent versions of R.
file.exists(f) && !dir.exists(f)
The solution is to use file_test()
This gives shell-style file tests, and can distinguish files from folders.
E.g.
input.good = "~/directory/file.txt"
input.bad = "~/directory/"
file_test("-f", input.good) # returns TRUE
file_test("-f", input.bad) #returns FALSE
From the manual:
Usage:

file_test(op, x, y)

Arguments:

op: a character string specifying the test to be performed. Unary tests (only x is used) are "-f" (existence and not being a directory), "-d" (existence and directory) and "-x" (executable as a file or searchable as a directory). Binary tests are "-nt" (strictly newer than, using the modification dates) and "-ot" (strictly older than): in both cases the test is false unless both files exist.

x, y: character vectors giving file paths.
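For example, the other tests quoted above work the same way (file names here are placeholders):
file_test("-d", "~/directory/")          # TRUE: exists and is a directory
file_test("-nt", "new.txt", "old.txt")   # TRUE if new.txt is strictly newer than old.txt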
You can also use is_file(path) from the fs package.
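For instance (this assumes the fs package is installed; the paths are the ones from the question):
fs::is_file("~/directory/file.txt")  # TRUE for a regular file
fs::is_file("~/directory/")          # FALSE for a directory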
I am new to programming and I am running this script to clean a large text file (over 12,000 lines) and write it to another .txt file. The problem is that when I run this with a smaller file (roughly 500 lines) it executes quickly, so my conclusion was that it is taking time due to the size of the file. If someone can guide me on how to make this code efficient, it would be highly appreciated.
input_file = open('bNEG.txt', 'rt', encoding='utf-8')
l_p = LanguageProcessing()
sentences=[]
for lines in input_file.readlines():
    tokeniz = l_p.tokeniz(lines)
    cleaned_url = l_p.clean_URL(tokeniz)
    remove_words = l_p.remove_non_englishwords(cleaned_url)
    stopwords_removed = l_p.remove_stopwords(remove_words)
    cleaned_sentence=' '.join(str(s) for s in stopwords_removed)+"\n"
    output_file = open('cNEG.txt', 'w', encoding='utf-8')
    sentences.append(cleaned_sentence)
    output_file.writelines(sentences)
input_file.close()
output_file.close()
EDIT: Below is the corrected code as mentioned in the answer, with a few other alterations to suit my requirements:
input_file = open('chromehistory_log.txt', 'rt', encoding='utf-8')
output_file = open('dNEG.txt', 'w', encoding='utf-8')
l_p = LanguageProcessing()
#sentences=[]
for lines in input_file.readlines():
    #print(lines)
    tokeniz = l_p.tokeniz(lines)
    cleaned_url = l_p.clean_URL(tokeniz)
    remove_words = l_p.remove_non_englishwords(cleaned_url)
    stopwords_removed = l_p.remove_stopwords(remove_words)
    #print(stopwords_removed)
    if stopwords_removed==[]:
        continue
    else:
        cleaned_sentence=' '.join(str(s) for s in stopwords_removed)+"\n"
        #sentences.append(cleaned_sentence)
        output_file.writelines(cleaned_sentence)
input_file.close()
output_file.close()
To have the discussion as an answer:
There are two problems here:
You open/create the output file and write the data inside the loop, once for every line of the input file. Additionally, you collect all the data in a list (sentences).
You have two possibilities:
a) Create the file before the loop and write just cleaned_sentence inside the loop (and drop the collecting sentences list).
b) Collect everything in sentences and write sentences once, after the loop.
The disadvantage of a) is that it is a bit slower than b) (as long as the OS does not have to swap memory for b). The advantage is that it uses much less memory and works no matter how big the file is or how little memory the machine has.
What is the most robust way to move an entire directory from, say, /tmp/RtmpK4k1Ju/oldname to /home/jeroen/newname? The easiest way is file.rename; however, this doesn't always work, for example when from and to are on different disks. In that case the entire directory needs to be copied recursively.
Here is something I came up with, however it's a bit involved, and I'm not sure it will work cross-platform. Is there a better way?
dir.move <- function(from, to){
  stopifnot(!file.exists(to));
  if(file.rename(from, to)){
    return(TRUE)
  }
  stopifnot(dir.create(to, recursive=TRUE));
  setwd(from)
  if(all(file.copy(list.files(all.files=TRUE, include.dirs=TRUE), to, recursive=TRUE))){
    #success!
    unlink(from, recursive=TRUE);
    return(TRUE)
  }
  #fail!
  unlink(to, recursive=TRUE);
  stop("Failed to move ", from, " to ", to);
}
I think file.copy should be sufficient.
file.copy(from, to, overwrite = recursive, recursive = FALSE,
          copy.mode = TRUE)
From ?file.copy:
from, to: character vectors, containing file names or paths. For
‘file.copy’ and ‘file.symlink’ ‘to’ can alternatively
be the path to a single existing directory.
and:
recursive: logical. If ‘to’ is a directory, should directories in
‘from’ be copied (and their contents)? (Like ‘cp -R’ on
POSIX OSes.)
From the description of recursive we know that from can contain directories, so listing all the files before copying, as your code above does, is unnecessary. Just remember that the to directory will be the parent of the copied from: for example, after file.copy("dir_a/", "new_dir/", recursive = TRUE), there will be a dir_a under new_dir.
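A quick sketch of that behaviour (directory and file names are placeholders):
dir.create("dir_a"); file.create("dir_a/x.txt")
dir.create("new_dir")
file.copy("dir_a", "new_dir", recursive = TRUE)
list.files("new_dir", recursive = TRUE)  # "dir_a/x.txt"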
Your code handles the deletion part well: unlink has a handy recursive option, which file.remove doesn't.
unlink(x, recursive = FALSE, force = FALSE)
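Putting the pieces together, a minimal sketch of the rename-then-copy fallback (move_dir is a hypothetical helper name, not part of base R):
move_dir <- function(from, to) {
  stopifnot(!file.exists(to))
  # Fast path: rename works when 'from' and 'to' are on the same filesystem.
  if (file.rename(from, to)) {
    return(invisible(TRUE))
  }
  # Fallback: recursively copy into the parent of 'to', rename, then delete 'from'.
  parent <- dirname(to)
  if (!dir.exists(parent)) dir.create(parent, recursive = TRUE)
  if (!file.copy(from, parent, recursive = TRUE)) {
    stop("Failed to copy ", from, " to ", parent)
  }
  # file.copy() recreates 'from' (under its basename) inside 'parent'
  file.rename(file.path(parent, basename(from)), to)
  unlink(from, recursive = TRUE)
  invisible(TRUE)
}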
Why not just invoke the system directly:
> system('mv /tmp/RtmpK4k1Ju/oldname /home/jeroen/newname')