How to Read Data from .rda with read.table [duplicate]

How to Read Data from .rda with read.table [duplicate] - r

I am trying to load an .rda file in r which was a saved dataframe. I do not remember the name of it though.
I have tried
a<-load("al.rda")
which then does not let me do anything with a. I get the error
Error:object 'a' not found
I have also tried to use the = sign.
How do I load this .rda file so I can use it?
I restared R with load("al.rda) and I know get the following error
Error: C stack usage is too close to the limit

Use 'attach' and then 'ls' with a name argument. Something like:
attach("al.rda")
ls("file:al.rda")
The data file is now on your search path in position 2, most likely. Do:
search()
ls(pos=2)
for enlightenment. Typing the name of any object saved in al.rda will now get it, unless you have something in search path position 1, but R will probably warn you with some message about a thing masking another thing if there is.
However I now suspect you've saved nothing in your RData file. Two reasons:
You say you don't get an error message
load says there's nothing loaded
I can duplicate this situation. If you do save(file="foo.RData") then you'll get an empty RData file - what you probably meant to do was save.image(file="foo.RData") which saves all your objects.
How big is this .rda file of yours? If its under 100 bytes (my empty RData files are 42 bytes long) then I suspect that's what's happened.

I had to reinstall R...somehow it was corrupt. The simple command which I expected of
load("al.rda")
finally worked.

I had a similar issue, and it was solved without reinstall R. for example doing
load("al.rda) works fine, however if you do
a <- load("al.rda") will not work.

The load function does return the list of variables that it loaded. I suspect you actually get an error when you load "al.rda". What exactly does R output when you load?
Example of how it should work:
d <- data.frame(a=11:13, b=letters[1:3])
save(d, file='foo.rda')
a <- load('foo.rda')
a # prints "d"
Just to be sure, check that the load function you actually call is the original one:
find("load") # should print "package:base"
EDIT Since you now get an error when you load the file, it is probably corrupt in some way. Try this and say what it prints:
file.info("a1.rda") # Prints the file size etc...
readBin("a1.rda", "raw", 50) # reads first 50 bytes from the file
Without having access to the file, it's hard to investigate more... Maybe you could share the file somehow (http://www.filedropper.com or similar)?

I usually use save to save only a single object, and I then use the following utility method to retrieve that object into a given variable name using load, but into a temporary namespace to avoid overwriting existing objects. Maybe it will be helpful for others as well:
load_first_object <- function(fname){
e <- new.env(parent = parent.frame())
load(fname, e)
return(e[[ls(e)[1]]])
}
The method can of course be extended to also return named objects and lists of objects, but this simple version is for me the most useful.

Related

Can convert a string to an object but can't save() it -- why? [duplicate]

I am repeatedly applying a function to read and process a bunch of csv files. Each time it runs, the function creates a data frame (this.csv.data) and uses save() to write it to a .RData file with a unique name. Problem is, later when I read these .RData files using load(), the loaded variable names are not unique, because each one loads with the name this.csv.data....
I'd like to save them with unique tags so that they come out properly named when I load() them. I've created the following code to illustrate .
this.csv.data = list(data=c(1:9), unique_tag = "some_unique_tag")
assign(this.csv.data$unique_tag,this.csv.data$data)
# I want to save the data,
# with variable name of <unique_tag>,
# at a file named <unique_tag>.dat
saved_file_name <- paste(this.csv.data$unique_tag,"RData",sep=".")
save(get(this.csv.data$unique_tag), saved_file_name)
but the last line returns:
"Error in save(get(this_unique_tag), file = data_tag) :
object ‘get(this_unique_tag)’ not found"
even though the following returns the data just fine:
get(this.csv.data$unique_tag)

Just name the arguments you use. With your code the following works fine:
save(list = this.csv.data$unique_tag, file=saved_file_name)

My preference is to avoid the name in the RData file on load:
obj = local(get(load('myfile.RData')))
This way you can load various RData files and name the objects whatever you want, or store them in a list etc.

You really should use saveRDS/readRDS to serialize your objects.
save and load are for saving whole environments.
saveRDS(this.csv.data, saved_file_name)
# later
mydata <- readRDS(saved_file_name)

you can use
save.image("myfile.RData")

This worked for me:
env <- new.env()
env[[varname]] <- object_to_save
save(list=c(varname), envir=env, file='out.Rda')
You could probably do it without a new env (but I didn't try this):
.GlobalEnv[[varname]] <- object_to_save
save(list=c(varname), envir=.GlobalEnv, file='out.Rda')
You might even be able to remove the envir variable.

readxl::read_xls returns "libxls error: Unable to open file"

I have multiple .xls (~100MB) files from which I would like to load multiple sheets (from each) into R as a dataframe. I have tried various functions, such as xlsx::xlsx2 and XLConnect::readWorksheetFromFile, both of which always run for a very long time (>15 mins) and never finish and I have to force-quit RStudio to keep working.
I also tried gdata::read.xls, which does finish, but it takes more than 3 minutes per one sheet and it cannot extract multiple sheets at once (which would be very helpful to speed up my pipeline) like XLConnect::loadWorkbook can.
The time it takes these functions to execute (and I am not even sure the first two would ever finish if I let them go longer) is way too long for my pipeline, where I need to work with many files at once. Is there a way to get these to go/finish faster?
In several places, I have seen a recommendation to use the function readxl::read_xls, which seems to be widely recommended for this task and should be faster per sheet. This one, however, gives me an error:
> # Minimal reproducible example:
> setwd("/Users/USER/Desktop")
> library(readxl)
> data <- read_xls(path="test_file.xls")
Error:
filepath: /Users/USER/Desktop/test_file.xls
libxls error: Unable to open file
I also did some elementary testing to make sure the file exists and is in the correct format:
> # Testing existence & format of the file
> file.exists("test_file.xls")
[1] TRUE
> format_from_ext("test_file.xls")
[1] "xls"
> format_from_signature("test_file.xls")
[1] "xls"
The test_file.xls used above is available here.
Any advice would be appreciated in terms of making the first functions run faster or the read_xls run at all - thank you!
UPDATE:
It seems that some users are able to open the file above using the readxl::read_xls function, while others are not, both on Mac and Windows, using the most up to date versions of R, Rstudio, and readxl. The issue has been posted on the readxl GitHub and has not been resolved yet.

I downloaded your dataset and read each excel sheet in this way (for example, for sheets "Overall" and "Area"):
install.packages("readxl")
library(readxl)
library(data.table)
dt_overall <- as.data.table(read_excel("test_file.xls", sheet = "Overall"))
area_sheet <- as.data.table(read_excel("test_file.xls", sheet = "Area"))
Finally, I get dt like this (for example, only part of the dataset for the "Area" sheet):
Just as well, you can use the read_xls function instead read_excel.
I checked, it also works correctly and even a little faster, since read_excel is a wrapper over read_xls and read_xlsx functions from readxl package.
Also, you can use excel_sheets function from readxl package to read all sheets of your Excel file.
UPDATE
Benchmarking is done with microbenchmark package for the following packages/functions: gdata::read.xls, XLConnect::readWorksheetFromFile and readxl::read_excel.
But XLConnect it's a Java-based solution, so it requires a lot of RAM.

I found that I was unable to open the file with read_xl immediately after downloading it, but if I opened the file in Excel, saved it, and closed it again, then read_xl was able to open it without issue.
My suggested workaround for handling hundreds of files is to build a little C# command line utility that opens, saves, and closes an Excel file. Source code is below, the utility can be compiled with visual studio community edition.
using System.IO;
using Excel = Microsoft.Office.Interop.Excel;
namespace resaver
{
class Program
{
static void Main(string[] args)
{
string srcFile = Path.GetFullPath(args[0]);
Excel.Application excelApplication = new Excel.Application();
excelApplication.Application.DisplayAlerts = false;
Excel.Workbook srcworkBook = excelApplication.Workbooks.Open(srcFile);
srcworkBook.Save();
srcworkBook.Close();
excelApplication.Quit();
}
}
}
Once compiled, the utility can be called from R using e.g. system2().

I will propose a different workflow. If you happen to have LibreOffice installed, then you can convert your excel files to csv programatically. I have Linux, so I do it in bash, but I'm sure it can be possible in macOS.
So open a terminal and navigate to the folder with your excel files and run in terminal:
for i in *.xls
do soffice --headless --convert-to csv "$i"
done
Now in R you can use data.table::fread to read your files with a loop:
Scenario 1: the structure of files is different
If the structure of files is different, then you wouldn't want to rbind them together. You could run in R:
files <- dir("path/to/files", pattern = ".csv")
all_files <- list()
for (i in 1:length(files)){
fileName <- gsub("(^.*/)(.*)(.csv$)", "\\2", files[i])
all_files[[fileName]] <- fread(files[i])
}
If you want to extract your named elements within the list into the global environment, so that they can be converted into objects, you can use list2env:
list2env(all_files, envir = .GlobalEnv)
Please be aware of two things: First, in the gsub call, the direction of the slash. And second, list2env may overwrite objects in your Global Environment if they have the same name as the named elements within the list.
Scenario 2: the structure of files is the same
In that case it's likely you want to rbind them all together. You could run in R:
files <- dir("path/to/files", pattern = ".csv")
joined <- list()
for (i in 1:length(files)){
joined <- rbindlist(joined, fread(files[i]), fill = TRUE)
}

On my system, i had to use path.expand.
R> file = "~/blah.xls"
R> read_xls(file)
Error:
filepath: ~/Dropbox/signal/aud/rba/balsheet/data/a03.xls
libxls error: Unable to open file
R> read_xls(path.expand(file)) # fixed

Resaving your file and you can solve your problem easily.
I also find this problem before but I get the answer from your discussion.
I used the read_excel() to open those files.

I was seeing a similar error and wanted to share a short-term solution.
library(readxl)
download.file("https://mjwebster.github.io/DataJ/spreadsheets/MLBpayrolls.xls", "MLBPayrolls.xls")
MLBpayrolls <- read_excel("MLBpayrolls.xls", sheet = "MLB Payrolls", na = "n/a")
Yields (on some systems in my classroom but not others):
Error: filepath: MLBPayrolls.xls libxls error: Unable to open file
The temporary solution was to paste the URL of the xls file into Firefox and download it via the browser. Once this was done we could run the read_excel line without error.
This was happening today on Windows 10, with R 3.6.2 and R Studio 1.2.5033.

If you have downloaded the .xls data from the internet, even if you are opening it in Ms.Excel, it will open a prompt first asking to confirm if you trust the source, see below screenshot, I am guessing this is the reason R (read_xls) also can't open it, as it's considered unsafe. Save it as .xlsx file and then use read_xlsx() or read_excel().

Even thought this is not a code-based solution, I just changed the type file. For instance, instead of xls I saved as csv or xlsx. Then I opened it as regular one.
I worked it for me, because when I opened my xlsfile, I popped up the message: "The file format and extension of 'file.xls'' don't match. The file could be corrupted or unsafe..."

Reading a binary file in R [duplicate]

I saw some similar qestions and I tried to work it out on my own, but I couldn't. This is my problem:
I have to load a isfar.RData file to use it in other computation (which are not important to describe here). And I would like to simply see how looks data in this isfar.RData file e.g. what numbers, columns, rows it carries.
First I load my file:
isfar<-load("C:/Users/isfar.RData")
When I try to obtain this information (I'm using Rcmdr) by ls() function or marking isfar at the beginning after loading I get in the output window: [1] "isfar" instead of the table. Why?
Thanks a lot, I appreciate all of the answers! Hope it's comprehensible what I wrote, Im not a native speaker.

I think the problem is that you load isfar data.frame but you overwrite it by value returned by load.
Try either:
load("C:/Users/isfar.RData")
head(isfar)
Or more general way
load("C:/Users/isfar.RData", ex <- new.env())
ls.str(ex)

you can try
isfar <- get(load('c:/users/isfar.Rdata'))
this will assign the variable in isfar.Rdata to isfar . After this assignment, you
can use str(isfar) or ls(isfar) or head(isfar) to get a rough look of the isfar.

Look at the help page for load. What load returns is the names of the objects created, so you can look at the contents of isfar to see what objects were created. The fact that nothing else is showing up with ls() would indicate that maybe there was nothing stored in your file.
Also note that load will overwrite anything in your global environment that has the same name as something in the file being loaded when used with default behavior. If you mainly want to examine what is in the file, and possibly use something from that file along with other objects in your global environment then it may be better to use the attach function or create a new environment (new.env) and load the file into that environment using the envir argument to load.

This may fit better as a comment but I don't have enough reputation, so I put it here.
It worth mentioning that the load() function will retain the object name that was originally saved no matter how you name the .Rdata file.
Please check the name of the data.frame object used in the save() function. If you were using RStudio, you could check the upper right panel, Global Environment-Data, to find the name of the data you load.

If you have a lot of variables in your Rdata file and don't want them to clutter your global environment, create a new environment and load all of the data to this new environment.
load(file.path("C:/Users/isfar.RData"), isfar_env <- new.env() )
# Access individual variables in the RData file using '$' operator
isfar_env$var_name
# List all of the variable names in RData:
ls(isfar_env)

You can also import the data via the "Import Dataset" tab in RStudio, under "global environment."
Use the text data option in the drop down list and select your .RData file from the folder.
Once the import is complete, it will display the data in the console.
Hope this helps.

It sounds like the only varaible stored in the .RData file was one named isfar.
Are you really sure that you saved the table? The command should have been:
save(the_table, file = "isfar.RData")
There are many ways to examine a variable.
Type it's name at the command prompt to see it printed. Then look at str, ls.str, summary, View and unclass.

You don't seem to need to assign it to a variable. That bit magically happens. In fact, assigning it to a variable might mean you end up with two variables with the same data.
get(load('C:/Users/isfar.Rdata'))
Or if it's in the same folder as your R code...
get(load('isfar.Rdata'))

isfar<-load("C:/Users/isfar.RData")
if(is.data.frame(isfar)){
names(isfar)
}
If isfar is a dataframe, this will print out the names of its columns.

num <- seq(0, 5, length.out=10) #create object num
num
[1] 0.00 1.25 2.50 3.75 5.00
save(num, file = 'num.RData') #save num ro RData
rm(num) #remove num
load("num.RData") #load num from RData
num
[1] 0.00 1.25 2.50 3.75 5.00
> isfar<-load("num.RData")
> typeof(isfar)
[1] "character"
> isfar #list objects saved in RData
[1] "num"

How to see data from .RData file?

I saw some similar qestions and I tried to work it out on my own, but I couldn't. This is my problem:
I have to load a isfar.RData file to use it in other computation (which are not important to describe here). And I would like to simply see how looks data in this isfar.RData file e.g. what numbers, columns, rows it carries.
First I load my file:
isfar<-load("C:/Users/isfar.RData")
When I try to obtain this information (I'm using Rcmdr) by ls() function or marking isfar at the beginning after loading I get in the output window: [1] "isfar" instead of the table. Why?
Thanks a lot, I appreciate all of the answers! Hope it's comprehensible what I wrote, Im not a native speaker.

I think the problem is that you load isfar data.frame but you overwrite it by value returned by load.
Try either:
load("C:/Users/isfar.RData")
head(isfar)
Or more general way
load("C:/Users/isfar.RData", ex <- new.env())
ls.str(ex)

you can try
isfar <- get(load('c:/users/isfar.Rdata'))
this will assign the variable in isfar.Rdata to isfar . After this assignment, you
can use str(isfar) or ls(isfar) or head(isfar) to get a rough look of the isfar.

Look at the help page for load. What load returns is the names of the objects created, so you can look at the contents of isfar to see what objects were created. The fact that nothing else is showing up with ls() would indicate that maybe there was nothing stored in your file.
Also note that load will overwrite anything in your global environment that has the same name as something in the file being loaded when used with default behavior. If you mainly want to examine what is in the file, and possibly use something from that file along with other objects in your global environment then it may be better to use the attach function or create a new environment (new.env) and load the file into that environment using the envir argument to load.

This may fit better as a comment but I don't have enough reputation, so I put it here.
It worth mentioning that the load() function will retain the object name that was originally saved no matter how you name the .Rdata file.
Please check the name of the data.frame object used in the save() function. If you were using RStudio, you could check the upper right panel, Global Environment-Data, to find the name of the data you load.

If you have a lot of variables in your Rdata file and don't want them to clutter your global environment, create a new environment and load all of the data to this new environment.
load(file.path("C:/Users/isfar.RData"), isfar_env <- new.env() )
# Access individual variables in the RData file using '$' operator
isfar_env$var_name
# List all of the variable names in RData:
ls(isfar_env)

You can also import the data via the "Import Dataset" tab in RStudio, under "global environment."
Use the text data option in the drop down list and select your .RData file from the folder.
Once the import is complete, it will display the data in the console.
Hope this helps.

It sounds like the only varaible stored in the .RData file was one named isfar.
Are you really sure that you saved the table? The command should have been:
save(the_table, file = "isfar.RData")
There are many ways to examine a variable.
Type it's name at the command prompt to see it printed. Then look at str, ls.str, summary, View and unclass.

You don't seem to need to assign it to a variable. That bit magically happens. In fact, assigning it to a variable might mean you end up with two variables with the same data.
get(load('C:/Users/isfar.Rdata'))
Or if it's in the same folder as your R code...
get(load('isfar.Rdata'))

isfar<-load("C:/Users/isfar.RData")
if(is.data.frame(isfar)){
names(isfar)
}
If isfar is a dataframe, this will print out the names of its columns.

num <- seq(0, 5, length.out=10) #create object num
num
[1] 0.00 1.25 2.50 3.75 5.00
save(num, file = 'num.RData') #save num ro RData
rm(num) #remove num
load("num.RData") #load num from RData
num
[1] 0.00 1.25 2.50 3.75 5.00
> isfar<-load("num.RData")
> typeof(isfar)
[1] "character"
> isfar #list objects saved in RData
[1] "num"

How to save() with a particular variable name

I am repeatedly applying a function to read and process a bunch of csv files. Each time it runs, the function creates a data frame (this.csv.data) and uses save() to write it to a .RData file with a unique name. Problem is, later when I read these .RData files using load(), the loaded variable names are not unique, because each one loads with the name this.csv.data....
I'd like to save them with unique tags so that they come out properly named when I load() them. I've created the following code to illustrate .
this.csv.data = list(data=c(1:9), unique_tag = "some_unique_tag")
assign(this.csv.data$unique_tag,this.csv.data$data)
# I want to save the data,
# with variable name of <unique_tag>,
# at a file named <unique_tag>.dat
saved_file_name <- paste(this.csv.data$unique_tag,"RData",sep=".")
save(get(this.csv.data$unique_tag), saved_file_name)
but the last line returns:
"Error in save(get(this_unique_tag), file = data_tag) :
object ‘get(this_unique_tag)’ not found"
even though the following returns the data just fine:
get(this.csv.data$unique_tag)

Just name the arguments you use. With your code the following works fine:
save(list = this.csv.data$unique_tag, file=saved_file_name)

My preference is to avoid the name in the RData file on load:
obj = local(get(load('myfile.RData')))
This way you can load various RData files and name the objects whatever you want, or store them in a list etc.

You really should use saveRDS/readRDS to serialize your objects.
save and load are for saving whole environments.
saveRDS(this.csv.data, saved_file_name)
# later
mydata <- readRDS(saved_file_name)

you can use
save.image("myfile.RData")

This worked for me:
env <- new.env()
env[[varname]] <- object_to_save
save(list=c(varname), envir=env, file='out.Rda')
You could probably do it without a new env (but I didn't try this):
.GlobalEnv[[varname]] <- object_to_save
save(list=c(varname), envir=.GlobalEnv, file='out.Rda')
You might even be able to remove the envir variable.