How to open an R data file in R window - r

I have some data in R that I intend to analyze. However, the file is not displaying the data. Instead, it is only showing the name of a variable in the data. The following is the procedure I used to load the data and the output produced.
load("C:\Users\user\AppData\Local\Temp\1_29_923-Macdonell.RData")
data=load("C:\Users\user\AppData\Local\Temp\1_29_923-Macdonell.RData")
data
[1] "HeightFinger"
How do I get to view the data?

If you read ?load, it says that the return value of load is:
A character vector of the names of objects created, invisibly.
This suggests (but admittedly does not state) that the true work of the load command is by side-effect, in that it inserts the objects into an environment (defaulting to the current environment, often but not always .GlobalEnv). You should immediately have access to them from where you called load(...).
For instance, if I can guess at variables you might have in your rda file:
x
# Error: object 'x' not found
# either one of these on windows, NOT BOTH
dat = load("C:\\Users\\user\\AppData\\Local\\Temp\\1_29_923-Macdonell.RData")
dat = load("C:/Users/user/AppData/Local/Temp/1_29_923-Macdonell.RData")
dat
# [1] "x" "y" "z"
x
# [1] 42
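Applied to the question, where load() reported a single object named "HeightFinger", that object can be used directly by name, or fetched through the name that load() returned. A minimal sketch, assuming a made-up stand-in data frame (the real contents of the file are unknown, and the column names below are hypothetical):

```r
# Hypothetical stand-in for whatever the question's .RData file contains
HeightFinger <- data.frame(height = c(60.5, 62.0), finger = c(10.1, 10.4))
rdata_path <- tempfile(fileext = ".RData")
save(HeightFinger, file = rdata_path)
rm(HeightFinger)

# load() recreates the object as a side effect and returns only its name
obj_names <- load(rdata_path)
obj_names

# Use the object by its original name ...
head(HeightFinger)

# ... or look it up through the returned name
str(get(obj_names[1]))
```

The second form is handy when the name stored in the file is not known in advance.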
If you want them to be not stored in the current environment, you can set up an environment to place them in. (I use parent=emptyenv(), but that's not strictly required. There are some minor ramifications to not including that option, none of them earth-shattering.)
myenv <- new.env(parent = emptyenv())
dat = load("C:/Users/user/AppData/Local/Temp/1_29_923-Macdonell.RData",
envir = myenv)
dat
# [1] "x" "y" "z"
x
# Error: object 'x' not found
ls(envir = myenv)
# [1] "x" "y" "z"
From here you can get at your data in any number of ways:
ls.str(myenv) # similar in concept to str() but for environments
# x : num 42
# y : num 1
# z : num 2
myenv$x
# [1] 42
get("x", envir = myenv)
# [1] 42
Side note:
You may have noticed that I used dat as my variable name instead of data. Though you are certainly allowed to use that, it can bite you if you use variable names that match existing variables or functions. For instance, all of your code will work just fine as long as you load your data. If, however, you run some of your code without pre-loading your objects into your data variable, you'll likely get an error such as:
mean(data$x)
# Error in data$x : object of type 'closure' is not subsettable
That error message is not immediately self-evident. The problem is that if not previously defined as in your question, then data here refers to the function data. In programming terms, a closure is a special type of function, so the error really should have said:
# Error in data$x : object of type 'function' is not subsettable
meaning that though dat can be subsetted and dat$x means something, you cannot use the $ subset method on a function itself. (You can't do mean$x when referring to the mean function, for example.) Regardless, even though this here-modified error message is less confusing, it is still not clearly telling you what/where the problem is located.
Because of this, many seasoned programmers will suggest you use unique variable names (perhaps more than just x :-). If you use my suggestion and name it dat instead, then the mistake of not preloading your data will instead error with:
mean(dat$x)
# Error in mean(dat$x) : object 'dat' not found
which is a lot more meaningful and easier to troubleshoot.

There are two ways to save R objects, and you've got them mixed up. In the first way, you save() any collection of objects in an environment to a file. When you load() that file, those objects are re-created with their original names in your current environment. This is how R saves and restores workspaces.
The second way stores (serializes) a single R object into a file with the saveRDS() function, and recreates it in your environment with the readRDS() function. If you don't assign the results of readRDS(), it'll just print to your screen and drift away.
Examples below:
# Make a simple dataframe
testdf <- data.frame(x = 1:10,
y = rnorm(10))
# Save it out using the save() function
savedir <- tempdir()
savepath <- file.path(savedir, "saved.Rdata")
save(testdf, file = savepath)
# Delete it
rm(testdf)
# Load without assigning - and it's back in your environment
load(savepath)
testdf
# But if you assign the results of load, you just get the name of the object
wrong <- load(savepath)
wrong
# Compare with the RDS:
rds_path <- file.path(savedir, "testdf.rds")
saveRDS(testdf, file = rds_path)
rm(testdf)
testdf <- readRDS(file = rds_path)
testdf
Why the two different approaches? The save()-environment approach is good for creating a checkpoint of your entire environment that you can restore later - that's what R uses it for - but that's about it. It's too easy for such an environment to get cluttered, and if an object you load() has the same name as an object in your current environment, it will overwrite that object:
testdf$z <- "blah"
load(savepath)
testdf # testdf$z is gone
The RDS method lets you assign the name on read, as you're looking to do here. It's a little more annoying to save multiple objects, sure, but you probably shouldn't be saving objects very often anyway - recreating objects from scratch is the best way to ensure that your R code does what you think it does.
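If you are handed an .RData file that you cannot re-save as RDS, a small wrapper can give a similar "assign the name on read" feel. This is only a sketch: read_rdata_single is a hypothetical helper name, and it assumes the file holds exactly one object.

```r
# Hypothetical helper: load an .RData file into a private environment
# and return its single object, so the caller chooses the name.
read_rdata_single <- function(path) {
  env <- new.env()
  nm <- load(path, envir = env)
  if (length(nm) != 1L) {
    stop("expected exactly one object, found ", length(nm))
  }
  env[[nm]]
}

# Usage with a throwaway file
testdf <- data.frame(x = 1:3)
rdata_path <- tempfile(fileext = ".RData")
save(testdf, file = rdata_path)

renamed <- read_rdata_single(rdata_path)
renamed$x
```

Because the load happens in a private environment, nothing in the calling environment is overwritten.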

Related

How do you solve "could not find function "deparse<-" | "as.name<-" | "eval<-"" errors when trying to dynamically name dataframes in R? [duplicate]

I am using R to parse a list of strings in the form:
original_string <- "variable_name=variable_value"
First, I extract the variable name and value from the original string and convert the value to numeric class.
parameter_value <- as.numeric("variable_value")
parameter_name <- "variable_name"
Then, I would like to assign the value to a variable with the same name as the parameter_name string.
variable_name <- parameter_value
What is/are the function(s) for doing this?
assign is what you are looking for.
assign("x", 5)
x
[1] 5
but buyer beware.
See R FAQ 7.21
http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-can-I-turn-a-string-into-a-variable_003f
You can use do.call:
do.call("<-",list(parameter_name, parameter_value))
There is another simple solution found there:
http://www.r-bloggers.com/converting-a-string-to-a-variable-name-on-the-fly-and-vice-versa-in-r/
To convert a string to a variable:
x <- 42
eval(parse(text = "x"))
[1] 42
And the opposite:
x <- 42
deparse(substitute(x))
[1] "x"
The function you are looking for is get():
assign ("abc",5)
get("abc")
Confirming that the memory address is identical:
getabc <- get("abc")
pryr::address(abc) == pryr::address(getabc)
# [1] TRUE
Reference: R FAQ 7.21 How can I turn a string into a variable?
Use x=as.name("string"). You can then use x to refer to the variable with name string.
I don't know, if it answers your question correctly.
Use strsplit to parse your input and, as Greg mentioned, assign to assign the variables.
original_string <- c("x=123", "y=456")
pairs <- strsplit(original_string, "=")
lapply(pairs, function(x) assign(x[1], as.numeric(x[2]), envir = globalenv()))
ls()
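In the spirit of the "buyer beware" caveat and R FAQ 7.21, the same parsing can also collect the values in a named list instead of creating global variables. A minimal sketch; the name params is just an illustration:

```r
original_string <- c("x=123", "y=456")
pairs <- strsplit(original_string, "=", fixed = TRUE)

# First element of each pair is the name, second is the value
param_names  <- vapply(pairs, function(p) p[1], character(1))
param_values <- vapply(pairs, function(p) as.numeric(p[2]), numeric(1))

# Store everything in one named list rather than in the global environment
params <- as.list(setNames(param_values, param_names))

params$x
params[["y"]]
```

Lookups by string then need no assign() or get(), and the parsed values cannot clobber existing variables.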
assign is good, but I have not found a function for referring back to the variable you've created in an automated script. (as.name seems to work the opposite way). More experienced coders will doubtless have a better solution, but this solution works and is slightly humorous perhaps, in that it gets R to write code for itself to execute.
Say I have just assigned value 5 to x (var.name <- "x"; assign(var.name, 5)) and I want to change the value to 6. If I am writing a script and don't know in advance what the variable name (var.name) will be (which seems to be the point of the assign function), I can't simply put x <- 6 because var.name might have been "y". So I do:
var.name <- "x"
#some other code...
assign(var.name, 5)
#some more code...
#write a script file (1 line in this case) that works with whatever variable name
write(paste0(var.name, " <- 6"), "tmp.R")
#source that script file
source("tmp.R")
#remove the script file for tidiness
file.remove("tmp.R")
x will be changed to 6, and if the variable name was anything other than "x", that variable will similarly have been changed to 6.
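For comparison, the same update can probably be done without a temporary script, since assign() and get() both take the variable name as a string. A small sketch under that assumption:

```r
var.name <- "x"
assign(var.name, 5)

# Read the current value through the string name
current <- get(var.name)

# Overwrite it, again through the string name
assign(var.name, current + 1)

get(var.name)
```

This works whatever string var.name holds, as long as it is a valid object name.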
I was working with this a few days ago, and noticed that sometimes you will need to use the get() function to print the results of your variable.
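For example:
varnames = c('jan', 'feb', 'march')
# full.names = TRUE returns paths that read.csv can open from any working directory
file_names = list.files('path to multiple csv files saved on drive', full.names = TRUE)
assign(varnames[1], read.csv(file_names[1])) # This will assign the variable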
From there, if you try to print the variable varnames[1], it returns 'jan'.
To work around this, you need to do
print(get(varnames[1]))
If you want to convert string to variable inside body of function, but you want to have variable global:
test <- function() {
do.call("<<-",list("vartest","xxx"))
}
test()
vartest
[1] "xxx"
Maybe I didn't understand your problem right, because of the simplicity of your example. To my understanding, you have a series of instructions stored in character vectors, and those instructions are very close to being properly formatted, except that you'd like to cast the right member to numeric.
If my understanding is right, I would like to propose a slightly different approach, that does not rely on splitting your original string, but directly evaluates your instruction (with a little improvement).
original_string <- "variable_name=\"10\"" # Your original instruction, but with an actual numeric on the right, stored as character.
library(magrittr) # Or library(tidyverse), but that seems like overkill if the point is just to import the pipe operator
eval(parse(text=paste(eval(original_string), "%>% as.numeric")))
print(variable_name)
#[1] 10
Basically, what we are doing is that we 'improve' your instruction variable_name="10" so that it becomes variable_name="10" %>% as.numeric, which is an equivalent of variable_name=as.numeric("10") with magrittr pipe-stream syntax. Then we evaluate this expression within current environment.
Hope that helps someone who'd wander around here 8 years later ;-)
Other than assign, one other way to assign value to string named object is to access .GlobalEnv directly.
# Equivalent
assign('abc',3)
.GlobalEnv$'abc' = 3
Accessing .GlobalEnv gives some flexibility, and my use case was assigning values to a string-named list. For example,
.GlobalEnv$'x' = list()
.GlobalEnv$'x'[[2]] = 5 # works
var = 'x'
.GlobalEnv[[glue::glue('{var}')]][[2]] = 5 # programmatic names from glue()

How to get the underlying data of a MA plot?

I would like to use the plotMA function of limma.
The example of the documentation works fine:
A <- runif(1000,4,16)
y <- A + matrix(rnorm(1000*3,sd=0.2),1000,3)
status <- rep(c(0,-1,1),c(950,40,10))
y[,1] <- y[,1] + status
plotMA(y, array=1, status=status, values=c(-1,1), hl.col=c("blue","red"))
Now I would like to access the underlying data that is used for the plot as I would like to use the data in a different context, not just the plot. I currently don't see a way to access the data; of course I could implement the method myself and only use the data, but that feels wrong.
Is there a way to access the underlying data used for the MA plot?
Looking at the code of plotMA we see that several variables are created and used for plotting. These variables are not returned however.
You could now copy and paste the function to write your own version, which plots and also returns the data. This is, however, error-prone: if a new version of the function is released, you may end up relying on old code.
So what you can do instead is use trace to insert arbitrary code into plotMA, notably some code which stores the data in your global environment. I illustrate the idea with a toy example:
f <- function(x) {
y <- x + rnorm(length(x))
plot(x, y)
invisible()
}
If we would like to use y in this function we could do something like this
trace(f, exit = quote(my_y <<- y))
# [1] "f"
ls()
# [1] "f"
f(1:10)
# Tracing f(1:10) on exit
ls()
# [1] "f" "my_y"
And now we can access my_y.
What you should do:
Look at the code of plotMA
Identify which part of the data you need (e.g. x, y and sel)
Use trace(plotMA, exit = quote({my_data <<- list(x, y, sel)}), where = asNamespace("limma"))
Run plotMA
Access the data via my_data
Note. Check out ?trace to fully understand its possibilities. In particular, you may want to inject your code not at the end (exit) but at another position, for example because intermediate variables are overwritten and you need the first results. For that you would need to use the at parameter of trace.
Update
Maybe the easiest is to get a full dump of all local variables defined in the function:
trace("plotMA", exit = quote(var_dump <<- mget(ls())), where = asNamespace("limma"))
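The full-dump idea can be tried on a self-contained toy function before pointing it at limma. The sketch below is an assumption-laden illustration: toy_fun and its variables are made up, print = FALSE only silences the tracing message, and untrace() is added to restore the original function afterwards.

```r
# Toy function whose local variables are normally lost on return
toy_fun <- function(x) {
  y <- x * 2
  s <- sum(y)
  invisible(NULL)
}

# On exit, copy every local variable into a global list
trace(toy_fun, exit = quote(var_dump <<- mget(ls())), print = FALSE)
toy_fun(1:3)

# Remove the tracing code again so later calls behave normally
untrace(toy_fun)

names(var_dump)
var_dump$s
```

Calling untrace() matters in longer sessions, because otherwise every later call keeps overwriting var_dump.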

rredis nested calls cause error .redisError("Invalid argument") in redisZRange() (and probably other functions as well)

Having the following sorted set (just for a test)
library(rredis)
redisConnect()
redisZAdd("AsomepossiblychangedtextSensor", 1, "w")
redisZAdd("AsomepossiblychangedtextSensor", 1, "x")
redisZAdd("AsomepossiblychangedtextSensor", 2, "y")
it's possible to select it using
redisZRange("AsomepossiblychangedtextSensor")
Imagine that the text between "A" and "Sensor" can change. One could then show the name of the key like this:
redisKeys("A*Sensor")
which returns the full name "AsomepossiblychangedtextSensor".
If I want to combine it and show this set
redisZRange(redisKeys("A*Sensor"))
an error is returned
.redisError("Invalid argument") : Invalid argument
It is caused by f <- match.call() in .redisCmd which takes the call from redisKeys
and can be solved (workaround) by storing the key in an R object
ak <- redisKeys("A*Sensor")
redisZRange(ak)
Is there a better solution for this problem? In the comments we see
# We use match.call here instead of, for example, as.list() to try to
# avoid making unnecessary copies of (potentially large) function arguments.
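The effect described above can be reproduced without Redis. The toy functions below are hypothetical and are not rredis code; they only illustrate that match.call() records arguments as unevaluated expressions, so a nested call arrives as a call object rather than as the string it would evaluate to.

```r
# Looks at the argument the way match.call() records it
arg_class_via_match_call <- function(key) {
  cl <- match.call()
  class(cl[["key"]])
}

# Looks at the argument after normal evaluation
arg_class_via_eval <- function(key) {
  class(key)
}

arg_class_via_match_call("AsomeSensor")           # a literal string
arg_class_via_match_call(paste0("Asome", "Sensor")) # a nested call
arg_class_via_eval(paste0("Asome", "Sensor"))       # evaluated first
```

Storing the result in a variable first, as in the workaround, means the recorded argument is a plain symbol rather than a nested call.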

Why doesn't rm(ls()) work and "list" is needed?

I just got started learning R with the "datacamp" site and I ran into a syntax misunderstanding at the beginning.
It says that rm(list = ls()) is a very useful command to clear everything from your workspace but I don't understand what list = is for.
a. They haven't yet taught me the meaning of = in R and I didn't find an explanation in the documentation. Is = like <-? What's the difference?
b. If the input of rm() can be a list of variables names, and the output of ls() is a list of var names, why can't I just use rm(ls())?
Passing arguments by position vs name
The = symbol plays a special role in naming arguments to a function call.
Consider two essentially identical functions:
f <- function(..., y=3) (2+sum(...))^y
g <- function(y=3, ...) (2+sum(...))^y
If y= is not named, the results are generally different:
f(y=5) # 32
g(y=5) # 32
f(5) # 343
g(5) # 32
rm is like f -- type ?rm to see -- so if you want to call rm(list = ls()), write it out in full.
Representing object names
In most of R, if you write f(g()), evaluation flows naturally:
g() is evaluated to 8 and substituted into f(g()) for f(8)
f(8) is evaluated to 1000
rm breaks this pattern in its unnamed ... arguments, which basically just exist for interactive use. Only manually typed variable names are allowed.† As a result, rm(ls()) won't run.
Hadley Wickham provides another nice example:
ggplot2 <- "plyr"
library(ggplot2) # loads ggplot2, not plyr!
† Okay, you can use the ... without manually typed names, like
do.call(library, as.list(ggplot2)) # loads plyr!
but don't mess with that unless you know what you're doing.
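The difference can be seen safely in a scratch environment, so that nothing in the real workspace is deleted. A minimal sketch; scratch is just an illustrative name:

```r
scratch <- new.env()
assign("a", 1, envir = scratch)
assign("b", 2, envir = scratch)
ls(envir = scratch)

# list= takes a character vector of names, which is exactly what ls() returns
rm(list = ls(envir = scratch), envir = scratch)
length(ls(envir = scratch))

# Passed positionally, ls() lands in ... as an unevaluated call and is rejected
msg <- tryCatch(rm(ls()), error = function(e) conditionMessage(e))
msg
```

The tryCatch() wrapper is there only to capture the error message instead of stopping the script.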

Vectorize Environment Access in R

So I have created an environment (which I am trying to use as a hashtable).
To clarify I'm accessing the values stored in the environment with this:
hash[["uniqueIDString"]] ## hash takes a uniqueID and returns a
## dataframe subset that is precomputed
I also have a function called func which returns some subset of the rows returned by hash. It works fine for single calls but it isn't vectorized so I can't use it in a transform which is kind of vital.
The following doesn't work:
df <- transform(df,FOO = func(hash[[ID]])$FOO)
It gives me an error about having the wrong number of arguments for the hash which I presume is because it's passing a vector of IDs to my environment and the environment doesn't know what to do.
EDIT: The exact error is:
Error in hash[[ID]] :
wrong arguments for subsetting an environment
EDIT: With Rob's suggestion I receive the following error:
Error in hash[ID] :
object of type 'environment' is not subsettable
EDIT: For clarification I'm attempting to use the hash lookup in the context of a transform where the values in the ID column are looked up in the hashtable and passed to func so that the output can become a new column.
I use environments as hash tables a lot. The way to retrieve values corresponding to multiple keys is to use mget:
hash <- new.env()
hash[['one']] <- 'this is one'
hash[['two']] <- 'this is two'
mget(c('one', 'two'), envir = hash)
which returns a list
$one
[1] "this is one"
$two
[1] "this is two"
If you need the output as a vector, use unlist:
unlist(mget(c('one', 'two'), envir = hash))
gives you
one two
"this is one" "this is two"
UPDATE If your IDs come from a data frame, you'd need to use unlist to convert that column to vector:
df <- data.frame(id=c('one', 'two'))
unlist(mget(unlist(df$id), envir = hash))
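To get from there to the new column asked about in the question, one option is to run func over the list that mget returns. The sketch below rests on assumptions: the stored data frames, the FOO column values, and the body of func are all made up, and func is assumed to yield exactly one FOO value per ID.

```r
# Hypothetical hash table: each ID maps to a precomputed data frame
hash <- new.env()
hash[["one"]] <- data.frame(FOO = c(10, 20, 30))
hash[["two"]] <- data.frame(FOO = c(40, 50, 60))

# Hypothetical func: keeps only the first row of its input
func <- function(d) d[1, , drop = FALSE]

df <- data.frame(ID = c("one", "two", "one"), stringsAsFactors = FALSE)

# Look up all IDs at once, then apply func to each result
looked_up <- mget(df$ID, envir = hash)
df$FOO <- unname(vapply(looked_up, function(d) func(d)$FOO, numeric(1)))

df
```

vapply() with numeric(1) will stop with an error if func ever returns more than one FOO value for an ID, which makes a mismatch visible early.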
