Loading someone else's .rdata file, can't access the data - r

My professor has sent me an .rdata file and wants me to do some analysis on the contents. Although I'm decent with R, I've never saved my work in .rdata files, and consequently haven't ever worked with them.
When I try to load the file, it looks like it's working:
> load('/home/swansone/Desktop/anes.rdata')
> ls()
[1] "25383-0001-Data"
But I can't seem to get at the data:
> names("25383-0001-Data")
NULL
I know that there is data in the .rdata file (it's 13 MB, so there's definitely a lot in there). Am I doing something wrong? I'm at a loss.
Edit:
I should note, I've also tried not using quotes:
> names(25383-0001-Data)
Error: object "Data" not found
And renaming:
> ls()[1] <- 'nes'
Error in ls()[1] <- "nes" : invalid (NULL) left side of assignment

You're going to run into a lot of issues with an object whose name doesn't begin with a letter, or with a . followed by a letter (as mentioned in An Introduction to R).
Use backticks to access this object (the "Names and Identifiers" section of help("`") explains why this works) and assign it to a new object with a syntactically valid name.
Data <- `25383-0001-Data`
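Once the data has a syntactically valid name, the usual accessors work again; for example (the output naturally depends on what the file contains):
names(Data)   # variable names of the data set
str(Data)     # a quick look at its structure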

It probably has to do with the unusual dashes in the name; backquotes work:
names(`25383-0001-Data`)
Edit:
More for reference (since Joshua already answered the main question perfectly), you can also reassign an object from ls() (what Wilduck tried in the question) using get(). This might be useful if the name of the object contains very weird characters:
foo <- 1:5
bar <- get(ls()[1])
bar
[1] 1 2 3 4 5
This of course requires the index of foo in ls() to be [1], but looking up the index of the required object is not too hard.
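Applied to the object from the question, get() takes the name as a string, so the dashes cause no trouble:
nes <- get("25383-0001-Data")
names(nes)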

Related

R: How far does it go? (Plus venting)

I have an object called defaultPacks, containing the names of packages installed on all the computers I use. Much abbreviated:
defaultPacks <- c(
"AER",
"plyr",
"dplyr"
)
I want to save this object to file in a shared directory all of them can reach. I am using Dropbox for this, with sync always paused when R is running.
save(defaultPacks,
  file = file.path("C:","Users","andrewH","Dropbox","R_PROJ","sharedSettings.rdata"))
Then I want to load the object and install the packages the names of which are in the object defaultPacks.
SyncPacks <- function(fileString){
  defaultPacks <- load(file=fileString)
  install.packages(defaultPacks, repos="http://cran.us.r-project.org")
}
SyncPacks(file.path("C:","Users","andrewH","Dropbox","R_PROJ","sharedSettings.rdata"))
If I do this, I get a warning:
Warning in install.packages: package ‘defaultPacks’ is not available (for R version 3.2.1)
I look at what is in defaultPacks immediately after I load and assign it: the string "defaultPacks". So load seems to be returning just a string rather than an object.
So I go back to my save, and try
save(get(defaultPacks), file.path(etc.))
This gives me a different error:
Error in save(get("defaultPacks"), file = file.path("C:", "Users", "andrewH", :
object ‘get("defaultPacks")’ not found.
Then I tried dynGet() with the same result.
So where before it was treating a symbol as a string, now it is treating a function as a string.
So I try the list option for save:
save(list = defaultPacks, file = file.path(etc))
And get yet another error:
Error in save(list = defaultPacks, file = file.path("C:", "Users", "andrewH", :
objects ‘AER’, ‘plyr’, ‘dplyr’, (etc.) not found
So where before I couldn't get to my character vector, now I am shooting right past it, evaluating defaultPacks to find the strings, and then treating each string as a symbol, and evaluating it to its (nonexistent) object.
So, I want to know how to make this work. But I am asking for something more than that. I have this problem, or an analogous problem, all the time. After several years of using R, I still have it a couple of times a week. I don't know how many steps of evaluation R is going to take on any given occasion. I hand a function an object name, and the function treats it as a string. I hand a function a string, and the R function converts it to a symbol and tries to evaluate it. Here, I don't understand why the save function does not save the object I gave it, and then give it back with load.
I've read the discussions on scoping in ten different R books, from Chambers "Software for Data Analysis" to Wickham's "Advanced R." Twice. Four times in some cases. I know about the four environments of a function, and the difference between the call stack and the chain of environmental parents. And yet, it is clear that I am missing something basic. It is not just that I don't know why save does not take a name in its ... argument and save it as an object (unless the problem is at the load end). I don't know how I can know. The function description says, of the ...s, "the names of the objects to be saved (as symbols or character strings)." So why is it saving a name as a string? Or why is load returning a string, if save saved an object? And how could I predict that?
Experienced R programmers, I know you can tell in advance how a given R function is going to treat one of its arguments. You know how far it will be evaluated. You can make it go as far as you want it to, and then STOP. You don't have to write str()'s into your functions every time you want to figure out what the heck it thinks its arguments mean. How do you do it?
Bloody "R Inferno". It's an understatement.
One way of seeing the problem is to note that the value of defaultPacks changes from before to after these operations.
> fname = tempfile()
> orig = defaultPacks = c("AER", "plyr", "dplyr")
> save(defaultPacks, file=fname)
> defaultPacks = load(fname)
> identical(orig, defaultPacks)
[1] FALSE
The problem starts with an understanding of what save() does. From ?save, the object that is saved is named defaultPacks and it has value c("AER", "plyr", "dplyr"). save() could save multiple objects, each with a name and associated value, so it somehow has to save the name of each object.
load() restores the objects that save() has written, and returns (from ?load) a "character vector of the names of objects created". In this case load() restores (creates in the global environment) the symbol defaultPacks, populates it with the character vector of default packages, and returns the name (i.e., character vector of length 1 "defaultPacks") of the object(s) it has restored. The return value then overwrites the restored value, and we have defaultPacks = "defaultPacks".
install.packages doesn't do anything fancy with its first argument, which from ?install.packages is a "character vector of the names of packages whose current versions should be downloaded". The argument supplied happens to be the symbol defaultPacks, but the error comes from the value of that symbol, which is the character vector "defaultPacks".
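One minimal way to fix SyncPacks() while keeping save()/load() is therefore to treat load()'s return value as a name and fetch the actual object with get() (a sketch, reusing the repos setting from the question):
SyncPacks <- function(fileString){
  loadedNames <- load(file = fileString)   # e.g. "defaultPacks": names of the restored objects
  pkgs <- get(loadedNames[1])              # the restored character vector itself
  install.packages(pkgs, repos = "http://cran.us.r-project.org")
}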
save() and load() more or less have to work the way they do to support multiple objects. On the other hand saveRDS() and readRDS() (ok, why read instead of load?) have a contract to save a single object. The name of the saved object does not need to be stored to be able to recover the values associated with it. So saveRDS(defaultPacks, fname); defaultPacks = readRDS(fname) works, and in particular the value of defaultPacks before and after this series of operations remains unchanged.
> orig = defaultPacks = c("AER", "plyr", "dplyr")
> saveRDS(defaultPacks, fname)
> defaultPacks = readRDS(fname)
> identical(orig, defaultPacks)
[1] TRUE
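The question's helper then collapses to something like this (a sketch; "sharedSettings.rds" stands in for whatever shared path you use):
saveRDS(defaultPacks, file = "sharedSettings.rds")
SyncPacks <- function(fileString){
  install.packages(readRDS(fileString), repos = "http://cran.us.r-project.org")
}
SyncPacks("sharedSettings.rds")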
Without meaning to be too much of a jerk, the answer to the question "Experienced R programmers...how do you do it?" is implied by the ? above -- by carefully reading the manual. Also, there are not that many places in base R where evaluation is non-standard -- formulas and library() are the main culprits -- so recognizing what the problem is not can help you focus on what is actually going on.

getting the name of a dataframe from loading a .rda file in R

I am trying to load an .rda file in r which was a saved dataframe. I do not remember the name of it though.
I have tried
a<-load("al.rda")
which then does not let me do anything with a. I get the error
Error:object 'a' not found
I have also tried to use the = sign.
How do I load this .rda file so I can use it?
I restarted R with load("al.rda") and I now get the following error
Error: C stack usage is too close to the limit
Use 'attach' and then 'ls' with a name argument. Something like:
attach("al.rda")
ls("file:al.rda")
The data file is now on your search path in position 2, most likely. Do:
search()
ls(pos=2)
for enlightenment. Typing the name of any object saved in al.rda will now get it, unless you have something in search path position 1, but R will probably warn you with some message about a thing masking another thing if there is.
However I now suspect you've saved nothing in your RData file. Two reasons:
You say you don't get an error message
load says there's nothing loaded
I can duplicate this situation. If you do save(file="foo.RData") then you'll get an empty RData file - what you probably meant to do was save.image(file="foo.RData") which saves all your objects.
How big is this .rda file of yours? If it's under 100 bytes (my empty RData files are 42 bytes long) then I suspect that's what's happened.
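A quick way to see the difference (the exact sizes vary by platform, but the empty file stays tiny):
x <- 1:10
save(file = "empty.RData")        # no objects named: writes an (essentially) empty file
save.image(file = "all.RData")    # writes every object in the workspace
file.info(c("empty.RData", "all.RData"))$size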
I had to reinstall R...somehow it was corrupt. The simple command which I expected of
load("al.rda")
finally worked.
I had a similar issue, and it was solved without reinstalling R. For example,
load("al.rda")
works fine; however,
a <- load("al.rda")
will not work.
The load function does return a character vector of the names of the variables it loaded. I suspect you actually get an error when you load "al.rda". What exactly does R output when you load?
Example of how it should work:
d <- data.frame(a=11:13, b=letters[1:3])
save(d, file='foo.rda')
a <- load('foo.rda')
a # prints "d"
Just to be sure, check that the load function you actually call is the original one:
find("load") # should print "package:base"
EDIT Since you now get an error when you load the file, it is probably corrupt in some way. Try this and say what it prints:
file.info("a1.rda") # Prints the file size etc...
readBin("a1.rda", "raw", 50) # reads first 50 bytes from the file
Without having access to the file, it's hard to investigate more... Maybe you could share the file somehow (http://www.filedropper.com or similar)?
I usually use save to save only a single object, and I then use the following utility method to retrieve that object into a given variable name using load, but into a temporary namespace to avoid overwriting existing objects. Maybe it will be helpful for others as well:
load_first_object <- function(fname){
  # load into a temporary environment so nothing in the caller's workspace is overwritten
  e <- new.env(parent = parent.frame())
  load(fname, e)
  # return the first object found in that environment
  return(e[[ls(e)[1]]])
}
The method can of course be extended to also return named objects and lists of objects, but this simple version is for me the most useful.
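For reference, one possible extension along those lines (a hypothetical helper, not part of the original answer) returns every object in the file as a named list:
load_all_objects <- function(fname){
  e <- new.env(parent = parent.frame())
  load(fname, e)
  as.list(e)   # one named element per object saved in the file
}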

Kindly check the R command

I am doing the following with the Cooccur library in R.
> fb<-read.table("Fb6_peaks.bed")
> f1<-read.table("F16_peaks.bed")
everything is ok with the first two commands and I can also display the data:
> fb
> f1
But when I give the next command as given below
> explore_pairs(c("fb", "f1"))
I get an error message:
Error in sum(sapply(tf1_s, score_sample, tf2_hits = tf2_s, hit_list = hit_l)) :
invalid 'type' (list) of argument
Could anyone suggest something?
Despite promising, in the article they published over a year ago, to release a version to the Bioconductor repository, the authors have still not delivered. The gz file attached to the article is not in a form that my installation recognizes. You really should be corresponding with the authors about this question.
The nature of the error message suggests that the function is expecting a different data class. You should be looking at the specification for the arguments in the help(explore_pairs) file. If it is expecting 2 matrices, then wrapping data.matrix around the arguments may solve the problem, but if it is expecting a class created by one of that package's functions then you need to take the necessary steps to construct the right objects.
The help file for explore_pairs does exist (at least in the MAN directory) and says the first argument should be a character vector with further provisos:
\arguments{
\item{factornames}{an vector of character strings, each naming a GFF-like
data frame containing the binding profile of a DNA-binding factor.
There is also a load utility, load_GFF, which I assume is designed for creation of such files.
Try renaming your data frame:
names(fb)=c("seq","start","end")
Check the example datasets. The column names are as above. I set the names and it worked.
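In other words, something along these lines (assuming each peaks file has exactly the three BED-style columns, as in the package's example data):
names(fb) <- c("seq", "start", "end")
names(f1) <- c("seq", "start", "end")
explore_pairs(c("fb", "f1"))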

Examining contents of .rdata file by attaching into a new environment - possible?

I am interested in listing objects in an RDATA file and loading only selected objects, rather than the whole set (in case some may be big or may already exist in the environment). I'm not quite clear on how to do this when there are conflicts in names, as attach() doesn't work as nicely.
1: For examining the contents of an R data file without loading it: This question is similar to, but different from, the one asked at listing contents of an R data file without loading
In that case, the solution offered was:
attach(filename)
ls(pos = 2)
detach()
If there are naming conflicts between objects in the file and those in the global environment, this warning appears:
The following object(s) are masked _by_ '.GlobalEnv':
I tried creating a new environment, but I cannot seem to attach into that.
For instance, this produces the same error:
lsfile <- function(filename){
tmpEnv <- new.env()
evalq(attach(filename), envir = tmpEnv)
tmpls <- ls(pos = 2)
detach()
return(tmpls)
}
lsfile(filename)
Maybe I've made a mess of things with evalq (or eval). Is there some other way to avoid the naming conflict?
2: If I want to access an object - if there are no naming conflicts, I can just work with the one from the .rdat file, or copy it to a new one. If there are conflicts, how does one access the object in the file's namespace?
For instance, if my file is "sample.rdat", and the object is surveyData, and a surveyData object already exists in the global environment, then how can I access the one from the file:sample.rdat namespace?
I currently solve this problem by loading everything into a temporary environment, and then copy out what's needed, but this is inefficient.
Since this question has just been referenced let's clarify two things:
attach() simply calls load() so there is really no point in using it instead of load
if you want selective access to prevent masking it's much easier to simply load the file into a new environment:
e = local({load("foo.RData"); environment()})
You can then use ls(e) and access contents like e$x. You can still use attach on the environment if you really want it on the search path.
FWIW .RData files have no index (the objects are stored in one big pairlist), so you can't list the contained objects without loading. If you want convenient access, convert it to the lazy-load format instead, which simply adds an index so each object can be loaded separately (see Get specific object from Rdata file).
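A rough sketch of that conversion (it relies on the unexported tools:::makeLazyLoadDB, so treat it as illustrative rather than a supported API):
e <- local({load("foo.RData"); environment()})
tools:::makeLazyLoadDB(e, "foo_lazy")   # writes foo_lazy.rdb / foo_lazy.rdx
lazyLoad("foo_lazy")                    # binds the objects as promises; each one is
                                        # only read from disk when first used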
I just use an env= argument to load():
> x <- 1; y <- 2; z <- "foo"
> save(x, y, z, file="/tmp/foo.RData")
> ne <- new.env()
> load(file="/tmp/foo.RData", env=ne)
> ls(env=ne)
[1] "x" "y" "z"
> ne$z
[1] "foo"
>
The cost of this approach is that you do read the whole RData file---but on the other hand that seems to be unavoidable anyway as no other method seems to offer a list of the 'content' of such a file.
You can suppress the warning by setting warn.conflicts=FALSE on the call to attach. If an object is masked by one in the global environment, you can use get to retrieve it from your attached data.
x <- 1:10
save(x, file="x.rData")
#attach("x.rData", pos=2, warn.conflicts=FALSE)
attach("x.rData", pos=2)
(x <- 1)
# [1] 1
(x <- get("x", pos=2))
# [1] 1 2 3 4 5 6 7 8 9 10
Thanks to #Dirk and #Joshua.
I had an epiphany. The foreach package (with the SMP or MC backends) seems to produce environments that inherit from, but do not conflict with, the global environment.
library(foreach)  # a parallel backend (e.g. doMC) must be registered for %dopar%
lsfile <- function(list_files){
  aggregate_ls <- foreach(ix = 1:length(list_files)) %dopar% {
    attach(list_files[ix])
    tmpls <- ls(pos = 2)
    return(tmpls)
  }
  return(aggregate_ls)
}
lsfile("f1.rdat")
lsfile(dir(pattern = "*rdat"))
This is useful to me because I can now parallelize this. This is a bare-bones version, and I will modify it to give more detailed information, but so far it seems to be the only way to avoid conflicts, even without having to ignore the warnings.
So, question #1 can be resolved by either ignoring the warnings (as #Joshua suggested) or by using whatever magic foreach summons.
For part 2, loading an object, I think #Joshua has the right idea - "get" will do.
The foreach magic can also work, by using the .noexport option. However, this has risks: whatever isn't specifically excluded will be inherited/exported from the global environment (I could do ls(), but there's always the possibility of attached datasets). For safety, this means that get() must still be used to avoid the risk of a naming conflict. Loading into a subenvironment avoids the naming conflict, but doesn't avoid the loading of unnecessary objects.
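A sketch of that .noexport variant (illustrative only; the helper name is made up and a parallel backend still has to be registered for %dopar%):
library(foreach)
lsfile_noexport <- function(list_files){
  foreach(f = list_files, .noexport = ls(envir = globalenv())) %dopar% {
    attach(f)
    objs <- ls(pos = 2)
    detach(pos = 2)
    objs
  }
}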
#Joshua's answer is far simpler than my foreach detour.

Cleaning up the workspace "hiding" objects [duplicate]

This question already has answers here:
hiding personal functions in R
(7 answers)
Closed 9 years ago.
It's getting to the point where I have about 100 or so personal functions that I use for line by line data analysis. I generally use f.<mnemonic> nomenclature for my functions, but I'm finding that they're starting to get in the way of my work. Is there any way to hide them from the workspace? Such that ls() doesn't show them, but I can still use them?
If you have that many functions which you use on a repeated basis, consider putting them into a package. They can then live in their own namespace, which removes ls() clutter and also allows you to remove the f. prefix.
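If you want a quick start on that route, package.skeleton() can seed a package from the functions already in your workspace (the package name here is made up):
package.skeleton(name = "myfns", list = ls(pattern = "^f\\."))
# then edit the generated myfns/ directory and install it, e.g. R CMD INSTALL myfns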
You can also put the function definitions into a separate environment, and then attach() that environment. (This is similar to Hong Ooi's suggestion, without the added step of making that into a loadable package.) I have this code in my .Rprofile file to set up some utility functions I commonly use:
my.fns <- new.env()     # create the environment that will hold the functions
local(env = my.fns, {   # everything created below goes into my.fns
  foo <- function (bar) {
    # whatever foo does
  }
  # put as many function definitions here as you want
})
attach(my.fns)
All the functions inside my.fns are now available at the command line, but the only thing that shows up in ls() is my.fns itself.
Try this to leave out the "f-dots":
fless <- function() { ls(env=.GlobalEnv)[!grepl("^f\\.", ls(env=.GlobalEnv) )]}
The ls() function looks at objects in an environment. If you only used (as I initially did):
fless <- function() ls()[!grepl("^f\\.", ls())]
You get ... nothing. Adding .GlobalEnv moves the focus for ls() out to the usual workspace. The indexing is pretty straightforward. You are just removing (with the ! operator) anything that starts with "f." and since the "." is a special character in regex expressions, you need to escape it, ... and since the "\" is also a special character, the escape needs to be doubled.
A couple of options not already mentioned are
objects with names beginning with . are not shown by ls() (by default; you can turn this on with the argument all.names = TRUE in the ls() call), so you could rename everything to .f.<mnemonic> in the source files (a tiny illustration follows the sys.source() example below).
In a similar vein to #Aaron's answer, but use sys.source() to source directly into an environment.
An example using sys.source() is shown below:
env <- attach(NULL, name = "myenv")
sys.source(fnames, env)
where fnames is the name/path of a file from which to read your functions (call sys.source() once per file if you have several).
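As a tiny illustration of the first option (the function name is made up):
.f.hello <- function() message("hello")   # the leading dot hides it from a plain ls()
ls()                    # .f.hello does not appear
ls(all.names = TRUE)    # now it does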
