file.create syntax - r

Super big newbie to R. I'm a bit stuck on the file.create function. I've used it successfully to create a file in the set working directory and also when I've already created a separate file path and assigned that file path to a variable.
However, why can't I use file.create and simply list the desired file path and file name without the file.path function? Does the file.create function not possess the capacity to automatically create the file in the specified directory, but requires the file.path function to secure the path to the directory?
Any clarification would be greatly appreciated. I do apologize if this question is rather elementary but I'd like to get the fundamentals down.
Here's the code that worked:
BasicDir <- "/Users/slam1924/Desktop/LearnR Tutorials"
setwd(BasicDir)
file.create("myfile.doc")
fp1 <- file.path("/Users/slam1924/Desktop/Vocal Covers", "mytext.doc")
fp1
file.create(fp1)
Alternative:
file.create(file.path("/Users/slam1924/Desktop/Vocal Covers", "mytext.doc"))
Here's the code that failed:
file.create("/Users/slam1924/Desktop/Vocal Covers", "mytext.doc")

Start by reading the help for the function: help(file.create). The usage is file.create(..., showWarnings = TRUE), where ... is one or more file names.
Under Details you'll see
file.create creates files with the given names if they do not already
exist and truncates them if they do.
So when you try
file.create("/Users/slam1924/Desktop/Vocal Covers", "mytext.doc")
It's trying to create two files, one of which ("/Users/slam1924/Desktop/Vocal Covers") is likely already a directory.
If one of the names is an existing directory, the call returns FALSE for it and you'll see a warning like:
[1] FALSE
Warning message:
In file.create("data") :
cannot create file 'data', reason 'Permission denied'
You could fix this by sending the function one string. Change your code that failed to:
file.create("/Users/slam1924/Desktop/Vocal Covers/mytext.doc")
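To see the difference in practice, here is a minimal sketch that works in a temporary directory. The directory and file names are made up for illustration, and the warning text you get for the directory case may differ by platform.

```r
# Work in a scratch directory so nothing on the Desktop is touched.
base_dir <- file.path(tempdir(), "Vocal Covers")
dir.create(base_dir, showWarnings = FALSE)

# Each unnamed argument is its own file name, so this creates two files.
two_files <- file.create(file.path(base_dir, "a.doc"),
                         file.path(base_dir, "b.doc"))

# Passing the directory itself as a "file name" fails with a warning.
on_directory <- suppressWarnings(file.create(base_dir))

# file.path() only joins the pieces with "/", so a single string works too.
target <- paste0(base_dir, "/mytext.doc")
one_file <- file.create(target)
same_path <- identical(target, file.path(base_dir, "mytext.doc"))
```

So file.path() is a convenience for building the string, not something file.create() needs in order to find the directory.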

Related

How to Read Data from .rda with read.table [duplicate]

I am trying to load an .rda file in r which was a saved dataframe. I do not remember the name of it though.
I have tried
a<-load("al.rda")
which then does not let me do anything with a. I get the error
Error:object 'a' not found
I have also tried to use the = sign.
How do I load this .rda file so I can use it?
I restarted R with load("al.rda") and I now get the following error
Error: C stack usage is too close to the limit
Use 'attach' and then 'ls' with a name argument. Something like:
attach("al.rda")
ls("file:al.rda")
The data file is now on your search path in position 2, most likely. Do:
search()
ls(pos=2)
for enlightenment. Typing the name of any object saved in al.rda will now get it, unless you have something in search path position 1, but R will probably warn you with some message about a thing masking another thing if there is.
However I now suspect you've saved nothing in your RData file. Two reasons:
You say you don't get an error message
load says there's nothing loaded
I can duplicate this situation. If you do save(file="foo.RData") then you'll get an empty RData file - what you probably meant to do was save.image(file="foo.RData") which saves all your objects.
How big is this .rda file of yours? If its under 100 bytes (my empty RData files are 42 bytes long) then I suspect that's what's happened.
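A quick way to check this without touching your workspace is to load the file into a scratch environment and list what arrives. This is only a sketch: rda_contents is a made-up helper name, and the two files are created in a temporary location just for the demonstration.

```r
# List the objects stored in an .rda/.RData file without touching the workspace.
rda_contents <- function(path) {
  e <- new.env()
  load(path, envir = e)  # objects land in e, not in the global environment
  ls(e)
}

# An "empty" file: save() was given no object names, so nothing is stored.
empty_file <- tempfile(fileext = ".RData")
suppressWarnings(save(file = empty_file))

# A file that really contains a data frame called d.
full_file <- tempfile(fileext = ".RData")
d <- data.frame(a = 11:13, b = letters[1:3])
save(d, file = full_file)

rda_contents(empty_file)  # character(0)
rda_contents(full_file)   # "d"
file.size(empty_file)     # tiny compared with a file that holds data
```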
I had to reinstall R; somehow the installation was corrupt. The simple command I expected to work,
load("al.rda")
finally worked.
I had a similar issue, and it was solved without reinstalling R. For example, doing
load("al.rda") works fine, however if you do
a <- load("al.rda") it will not work.
The load function does return the list of variables that it loaded. I suspect you actually get an error when you load "al.rda". What exactly does R output when you load?
Example of how it should work:
d <- data.frame(a=11:13, b=letters[1:3])
save(d, file='foo.rda')
a <- load('foo.rda')
a # prints "d"
Just to be sure, check that the load function you actually call is the original one:
find("load") # should print "package:base"
EDIT Since you now get an error when you load the file, it is probably corrupt in some way. Try this and say what it prints:
file.info("al.rda") # Prints the file size etc...
readBin("al.rda", "raw", 50) # reads first 50 bytes from the file
Without having access to the file, it's hard to investigate more... Maybe you could share the file somehow (http://www.filedropper.com or similar)?
I usually use save to save only a single object, and I then use the following utility method to retrieve that object into a given variable name using load, but into a temporary namespace to avoid overwriting existing objects. Maybe it will be helpful for others as well:
load_first_object <- function(fname){
e <- new.env(parent = parent.frame())
load(fname, e)
return(e[[ls(e)[1]]])
}
The method can of course be extended to also return named objects and lists of objects, but this simple version is for me the most useful.
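One possible extension along those lines, as a sketch only: return everything in the file as a named list. The name load_all_objects is made up here, and the demo file is written to a temporary location.

```r
# Load every object from an .rda file into a named list,
# leaving the calling environment untouched.
load_all_objects <- function(fname) {
  e <- new.env()
  load(fname, envir = e)
  mget(ls(e), envir = e)  # names are taken from the saved object names
}

# Demo: save two objects, then read them back as a list.
tmp <- tempfile(fileext = ".rda")
d <- data.frame(a = 11:13, b = letters[1:3])
x <- 1:3
save(d, x, file = tmp)

objs <- load_all_objects(tmp)
names(objs)  # "d" "x"
```

This also answers the original problem of not remembering the object's name: names(objs) tells you what was saved.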

readxl::read_xls returns "libxls error: Unable to open file"

I have multiple .xls files (~100MB each) from which I would like to load multiple sheets into R as data frames. I have tried various functions, such as xlsx::read.xlsx2 and XLConnect::readWorksheetFromFile, both of which always run for a very long time (>15 mins) without finishing, so I have to force-quit RStudio to keep working.
I also tried gdata::read.xls, which does finish, but it takes more than 3 minutes per one sheet and it cannot extract multiple sheets at once (which would be very helpful to speed up my pipeline) like XLConnect::loadWorkbook can.
The time it takes these functions to execute (and I am not even sure the first two would ever finish if I let them go longer) is way too long for my pipeline, where I need to work with many files at once. Is there a way to get these to go/finish faster?
In several places, I have seen a recommendation to use the function readxl::read_xls, which seems to be widely recommended for this task and should be faster per sheet. This one, however, gives me an error:
> # Minimal reproducible example:
> setwd("/Users/USER/Desktop")
> library(readxl)
> data <- read_xls(path="test_file.xls")
Error:
filepath: /Users/USER/Desktop/test_file.xls
libxls error: Unable to open file
I also did some elementary testing to make sure the file exists and is in the correct format:
> # Testing existence & format of the file
> file.exists("test_file.xls")
[1] TRUE
> format_from_ext("test_file.xls")
[1] "xls"
> format_from_signature("test_file.xls")
[1] "xls"
The test_file.xls used above is available here.
Any advice would be appreciated in terms of making the first functions run faster or the read_xls run at all - thank you!
UPDATE:
It seems that some users are able to open the file above using the readxl::read_xls function, while others are not, both on Mac and Windows, using the most up to date versions of R, Rstudio, and readxl. The issue has been posted on the readxl GitHub and has not been resolved yet.
I downloaded your dataset and read each excel sheet in this way (for example, for sheets "Overall" and "Area"):
install.packages("readxl")
library(readxl)
library(data.table)
dt_overall <- as.data.table(read_excel("test_file.xls", sheet = "Overall"))
area_sheet <- as.data.table(read_excel("test_file.xls", sheet = "Area"))
Finally, I get dt like this (for example, only part of the dataset for the "Area" sheet):
You can just as well use the read_xls function instead of read_excel.
I checked: it also works correctly and is even a little faster, since read_excel is a wrapper over the read_xls and read_xlsx functions from the readxl package.
You can also use the excel_sheets function from the readxl package to list the sheet names of your Excel file, which lets you read every sheet in turn.
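Reading every sheet could look roughly like this. It is a sketch, not a benchmarked solution: read_all_sheets is a made-up name, and path.expand() is included only as a precaution for paths that start with "~".

```r
# Read all sheets of one workbook into a named list of data frames.
# Requires the readxl package to be installed.
read_all_sheets <- function(path) {
  path <- path.expand(path)            # turn "~/file.xls" into a full path
  sheets <- readxl::excel_sheets(path) # character vector of sheet names
  out <- lapply(sheets, function(s) readxl::read_excel(path, sheet = s))
  names(out) <- sheets
  out
}

# Usage (not run here):
# all_sheets <- read_all_sheets("test_file.xls")
# all_sheets[["Area"]]
```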
UPDATE
I benchmarked the following packages/functions with the microbenchmark package: gdata::read.xls, XLConnect::readWorksheetFromFile and readxl::read_excel.
Note that XLConnect is a Java-based solution, so it requires a lot of RAM.
I found that I was unable to open the file with read_xls immediately after downloading it, but if I opened the file in Excel, saved it, and closed it again, then read_xls was able to open it without issue.
My suggested workaround for handling hundreds of files is to build a little C# command line utility that opens, saves, and closes an Excel file. The source code is below; the utility can be compiled with Visual Studio Community Edition.
using System.IO;
using Excel = Microsoft.Office.Interop.Excel;
namespace resaver
{
    class Program
    {
        static void Main(string[] args)
        {
            string srcFile = Path.GetFullPath(args[0]);
            Excel.Application excelApplication = new Excel.Application();
            excelApplication.Application.DisplayAlerts = false;
            Excel.Workbook srcworkBook = excelApplication.Workbooks.Open(srcFile);
            srcworkBook.Save();
            srcworkBook.Close();
            excelApplication.Quit();
        }
    }
}
Once compiled, the utility can be called from R using e.g. system2().
I will propose a different workflow. If you happen to have LibreOffice installed, you can convert your Excel files to csv programmatically. I use Linux, so I do it in bash, but I'm sure it is possible on macOS as well.
So open a terminal and navigate to the folder with your excel files and run in terminal:
for i in *.xls
do soffice --headless --convert-to csv "$i"
done
Now in R you can use data.table::fread to read your files with a loop:
Scenario 1: the structure of files is different
If the structure of files is different, then you wouldn't want to rbind them together. You could run in R:
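library(data.table)
files <- list.files("path/to/files", pattern = "\\.csv$", full.names = TRUE)
all_files <- list()
for (i in seq_along(files)){
  # strip the directory and the .csv extension to get the list name
  fileName <- gsub("(^.*/)(.*)(\\.csv$)", "\\2", files[i])
  all_files[[fileName]] <- fread(files[i])
}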
If you want to extract your named elements within the list into the global environment, so that they can be converted into objects, you can use list2env:
list2env(all_files, envir = .GlobalEnv)
Please be aware of two things: First, in the gsub call, the direction of the slash. And second, list2env may overwrite objects in your Global Environment if they have the same name as the named elements within the list.
Scenario 2: the structure of files is the same
In that case it's likely you want to rbind them all together. You could run in R:
library(data.table)
files <- list.files("path/to/files", pattern = "\\.csv$", full.names = TRUE)
# rbindlist() takes a list of tables, so read them all first and bind once
joined <- rbindlist(lapply(files, fread), fill = TRUE)
On my system, I had to use path.expand.
R> file = "~/blah.xls"
R> read_xls(file)
Error:
filepath: ~/Dropbox/signal/aud/rba/balsheet/data/a03.xls
libxls error: Unable to open file
R> read_xls(path.expand(file)) # fixed
Resaving the file solves the problem easily.
I ran into this problem before too, and found the answer in this discussion.
I used read_excel() to open those files.
I was seeing a similar error and wanted to share a short-term solution.
library(readxl)
download.file("https://mjwebster.github.io/DataJ/spreadsheets/MLBpayrolls.xls", "MLBPayrolls.xls")
MLBpayrolls <- read_excel("MLBpayrolls.xls", sheet = "MLB Payrolls", na = "n/a")
Yields (on some systems in my classroom but not others):
Error: filepath: MLBPayrolls.xls libxls error: Unable to open file
The temporary solution was to paste the URL of the xls file into Firefox and download it via the browser. Once this was done we could run the read_excel line without error.
This was happening today on Windows 10, with R 3.6.2 and R Studio 1.2.5033.
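One thing that may be worth trying here, though it is only a guess about the cause: on Windows, download.file() writes in text mode ("w") by default for most file types, which can corrupt binary files such as .xls. Passing mode = "wb" avoids that. The helper name fetch_binary below is made up for this sketch.

```r
# Download a file in binary mode so that .xls content is written byte for byte.
fetch_binary <- function(url, destfile) {
  status <- download.file(url, destfile, mode = "wb", quiet = TRUE)
  if (status != 0) stop("download failed with status ", status)
  invisible(destfile)
}

# Usage (not run here):
# fetch_binary("https://mjwebster.github.io/DataJ/spreadsheets/MLBpayrolls.xls",
#              "MLBpayrolls.xls")
# MLBpayrolls <- readxl::read_excel("MLBpayrolls.xls",
#                                   sheet = "MLB Payrolls", na = "n/a")
```

Using the same spelling of the file name in both calls also rules out problems on case-sensitive file systems.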
If you downloaded the .xls file from the internet, MS Excel itself first shows a prompt asking you to confirm that you trust the source. I am guessing this is why R (read_xls) can't open it either: the file is considered unsafe. Save it as an .xlsx file and then use read_xlsx() or read_excel().
Even though this is not a code-based solution, I just changed the file type: instead of xls, I saved the file as csv or xlsx and then opened it as a regular file.
It worked for me, because when I opened my xls file, this message popped up: "The file format and extension of 'file.xls' don't match. The file could be corrupted or unsafe..."

box_dir_create() function from boxr package is not creating a folder in Box

I've been having difficulty getting boxr to successfully create a folder within my Box directory. My code reads:
library(boxr)
box_auth()
my_file_dir <- box_setwd("76009318507")
box_dir_create(dir_name="TEST", parent_dir_id = my_file_dir)
after running which, I get the following output:
box.com remote folder reference
name :
dir id :
size :
modified at :
created at :
uploaded by :
owned by :
shared link : None
parent folder name :
parent folder id :
Checking my box directory, I find no folders have been created.
I've tried using additional arguments within box_dir_create, but according to the documentation only dir_name and parent_dir_id are accepted.
Any help is much appreciated. I understand this is a somewhat obscure R package, so I've included links to the documentation below:
https://cran.r-project.org/web/packages/boxr/boxr.pdf
https://github.com/r-box/boxr
I got an answer via the package's developer, and I figured I'd pay it forward for any fellow travelers in the future.
It turns out that box_setwd() sets a default directory but returns nothing. Using
box_dir_create(dir_name="TEST", parent_dir_id = "76009318507")
creates the folder successfully. It will not do so if a folder of the same name is already created.
After more digging, I was also told that box_dir_create() is quietly passing back a lot of useful information, including the newly created directory's ID. To access it you can save the function results as a variable, like so:
b <- box_dir_create("test_dir")
names(b) # lots of info
b$id # what you want
box_ul(b$id, "image_file.jpg") # is this file by file?
box_push(b$id, "image_directory/") # or a directory wide operation?
Thanks for your help, and I hope this helps someone else down the road. Cheers!

R extension write local data

I am creating a package and would like to store settings data locally, since it is unique for each user of the package and so that the setting does not have to be set each time the package is loaded.
How can I do this in the best way?
You could keep the necessary data in an object and save it with saveRDS()
whenever a change is made, or when the user leaves or explicitly asks to save.
It saves the R object as-is under a file name in the specified path.
saveRDS(obj, "path/to/filename.rds")
You can load it again with readRDS() the next time the package starts.
The good thing about readRDS() is that it returns the object, so you can assign it to any name. The file stores only the object, not its original name, so nothing else is added to your workspace.
newly.assigned.name <- readRDS("path/to/filename.rds")
# note: without an assignment the object is only printed, not restored
readRDS("path/to/filename.rds")
Where to store
Windows
Maybe here:
You can use %systemdrive%%homepath% environment variable to accomplish
this.
The two command variables when concatenated gives you the desired
user's home directory path as below:
Running echo %systemdrive% on command prompt gives:
C:
Running echo %homepath% on command prompt gives:
\Users\
When used together it becomes:
C:\Users\
Linux/macOS
Either in the package location of the user,
path.to.package <- find.package("name.of.your.package",
                                lib.loc = NULL, quiet = FALSE,
                                verbose = getOption("verbose"))
# and then construct the path to the final destination with
destination.folder.path <- file.path(path.to.package,
                                     "subfoldername", "filename")
# Use `file.path()` to construct such paths: it joins the components with the platform's file separator (`.Platform$file.sep`), so no separator is hard-coded for Unix-derived systems (Linux/macOS) versus Windows.
# Note that the package directory may not be writable for the user, for example in a system-wide library.
Or use the user's $HOME directory and store a file there whose name begins with ".". This is the convention on Unix systems (Linux/macOS) for files that hold the configuration of software programs,
e.g. ".your-packages-name.rds".
If anybody has a better solution, please help!
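One alternative that avoids choosing a location per operating system: since R 4.0.0, tools::R_user_dir() returns a per-user directory for a package's config, data or cache files. Below is a minimal sketch built on it together with saveRDS()/readRDS(). The helper names and the package name "mypkg" are made up, and the demo writes to a temporary directory rather than the real per-user location.

```r
# Hypothetical helpers; "mypkg" stands in for your package name.
# R_user_dir() needs R >= 4.0.0; it is only evaluated if no dir is supplied.
settings_file <- function(pkg, dir = tools::R_user_dir(pkg, which = "config")) {
  file.path(dir, "settings.rds")
}

save_settings <- function(settings, pkg,
                          dir = tools::R_user_dir(pkg, which = "config")) {
  path <- settings_file(pkg, dir)
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  saveRDS(settings, path)
  invisible(path)
}

load_settings <- function(pkg,
                          dir = tools::R_user_dir(pkg, which = "config"),
                          default = list()) {
  path <- settings_file(pkg, dir)
  # Fall back to the default when nothing has been saved yet.
  if (file.exists(path)) readRDS(path) else default
}

# Demo in a throwaway directory instead of the real per-user location.
demo_dir <- tempfile("settings-demo")
save_settings(list(theme = "dark"), "mypkg", dir = demo_dir)
restored <- load_settings("mypkg", dir = demo_dir)
```

In a package you would typically call load_settings() from .onLoad() and save_settings() whenever the user changes a setting.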

How to capture list of csv files with complicated pattern by system.file()?

I have a list of csv files that I need to use in my R package as external data. I use system.file() to make these csv files available to my package vignette code. I looked into R's regular expression features on SO to get an initial idea of how to do this. However, my function getExtDat does not work: it raises an error because it does not capture the pattern of the files in the inst/extdata directory, so some csv files are missing from my output. I expect all csv files to be found through system.file() and printed to the console or stored in an object. Can anyone point out how to fix getExtDat? How can I capture all csv files in inst/extdata that match the pattern and print them in an R session? Is there an efficient way to deal with csv files that follow a complicated naming pattern?
Note:
I asked a similar question on SO, but my post was not precisely stated (the old post was deleted). Here is a corrected version. Thanks for the help.
Here is the external data set in my package, a list of csv files in extdata:
myPkg
- inst
- extdata
- wgEncodeOpenChromChipK562CmycAlnRep1.csv
- wgEncodeOpenChromChipK562CmycAlnRep2.csv
- wgEncodeOpenChromChipK562CmycAlnRep3.csv
- wgEncodeSydhTfbsK562CmycIfna6hStdAlnRep1.csv
- wgEncodeSydhTfbsK562CmycIfna6hStdAlnRep2.csv
- wgEncodeSydhTfbsK562CmycIfna30StdAlnRep1.csv
- wgEncodeSydhTfbsK562CmycIfna30StdAlnRep2.csv
- wgEncodeSydhTfbsK562CmycIfng6hStdAlnRep1.csv
- wgEncodeSydhTfbsK562CmycIfng6hStdAlnRep2.csv
- wgEncodeSydhTfbsK562CmycIggrabAlnRep1.csv
- wgEncodeSydhTfbsK562CmycIggrabAlnRep2.csv
- wgEncodeSydhTfbsK562CmycStdAlnRep1.csv
- wgEncodeSydhTfbsK562CmycStdAlnRep2.csv
- R
I intend to use system.file() to load the external data in my package vignette code, because the "wgEncode" pattern appears in all csv file names.
My desired output: all csv files in inst/extdata are detected through system.file() and printed to the console (or stored in an object):
output :
wgEncodeOpenChromChipK562CmycAlnRep1.csv
wgEncodeOpenChromChipK562CmycAlnRep2.csv
wgEncodeOpenChromChipK562CmycAlnRep3.csv
wgEncodeSydhTfbsK562CmycIfna6hStdAlnRep1.csv
wgEncodeSydhTfbsK562CmycIfna6hStdAlnRep2.csv
wgEncodeSydhTfbsK562CmycIfna30StdAlnRep1.csv
wgEncodeSydhTfbsK562CmycIfna30StdAlnRep2.csv
wgEncodeSydhTfbsK562CmycIfng6hStdAlnRep1.csv
wgEncodeSydhTfbsK562CmycIfng6hStdAlnRep2.csv
wgEncodeSydhTfbsK562CmycIggrabAlnRep1.csv
wgEncodeSydhTfbsK562CmycIggrabAlnRep2.csv
wgEncodeSydhTfbsK562CmycStdAlnRep1.csv
wgEncodeSydhTfbsK562CmycStdAlnRep2.csv
or :
csvFile <- print(getExtDat)
How can I achieve my desired output? Is there a way to fix my function so that it captures all csv files through system.file() and prints them to the console? Thanks in advance :)
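# After installation the contents of inst/ sit at the top level of the package,
# so the files live in "extdata", not "inst/extdata".
extdata_dir <- system.file("extdata", package = "package_name")
CSVfiles <- list.files(extdata_dir, pattern = "^wgEncode.*\\.csv$")
CSVfiles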
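To go one step further and read the matching files into a named list, something like the following could work. It is a sketch: the helper names are made up, and the demo uses a temporary directory with dummy files in place of an installed package's extdata folder.

```r
# Find csv files whose names start with a given prefix.
list_ext_csv <- function(dir, prefix = "wgEncode") {
  list.files(dir, pattern = paste0("^", prefix, ".*\\.csv$"), full.names = TRUE)
}

# Read them into a list named after the files (without the .csv extension).
read_ext_csv <- function(dir, prefix = "wgEncode") {
  files <- list_ext_csv(dir, prefix)
  data <- lapply(files, read.csv)
  names(data) <- tools::file_path_sans_ext(basename(files))
  data
}

# Demo with dummy files in a scratch directory.
demo_dir <- tempfile("extdata")
dir.create(demo_dir)
write.csv(data.frame(x = 1:2),
          file.path(demo_dir, "wgEncodeOpenChromChipK562CmycAlnRep1.csv"),
          row.names = FALSE)
write.csv(data.frame(x = 3:4),
          file.path(demo_dir, "wgEncodeSydhTfbsK562CmycStdAlnRep1.csv"),
          row.names = FALSE)
writeLines("not a csv", file.path(demo_dir, "README.txt"))

ext_data <- read_ext_csv(demo_dir)
names(ext_data)
```

In a package vignette the directory would come from system.file("extdata", package = "myPkg") instead of demo_dir.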

Resources