I am wondering if it is possible to set vector names by reference in R.
I often use data.table::fread to read text files, and then I clean up the variable names by wrapping setnames (which also works on a plain data.frame) and a string cleanup function similar to:
clean_var_name <- function(s) {
gsub("^_+|_+$","",gsub("(\\s|\\-|[[:punct:]])+", "_", tolower(s) ) )
}
so my function looks like:
clean_names <- function(x){
require(data.table)
if(is.data.frame(x)){setnames(x, names(x), clean_var_name(names(x)))} # this part works
else if(is.vector(x)){ do_something_here } # this is the question
}
I'm wondering if there is a way to include the case of vectors in the same function in a way that performs names(x) <- clean_var_name(names(x)) by reference.
v <- c(`thIs.Is.A.Terrible-Name`=1, `this One is TOO`=2)
dt <- data.table(t(v))
clean_names(dt)
dt
# this_is_a_terrible_name this_one_is_too
# 1: 1 4
# would like to be able to do same for clean_names(v)
I'm also open to explanations of why this is a bad idea (side effects, functional programming, etc.)
Use setattr function:
library(data.table)
x <- 1:10
address(x)
# [1] "0x713cfd0"
setattr(x,"names",letters[1:10])
address(x)
# [1] "0x713cfd0"
Related
I have a data.frame with columns:
names(data) = ("newid","Player.WR","data_col.WR","Trend.WR","Player.QB","data_col.QB","Trend.QB","Player.RB","data_col.RB","Trend.RB","Player.TE","data_col.TE","Trend.TE" )
However, I need to flip the first and second portions of each name at the period so it looks like this:
names(data) = ("newid", "WR.Player", "WR.data_col", "WR.Trend", "QB.Player", "QB.data_col", "QB.Trend", "RB.Player", "RB.data_col", "RB.Trend", "TE.Player", "TE.data_col", "TE.Trend")
My initial thought was to try to do a strsplit and then somehow do an lapply statement to reorder, but I wasn't sure how to make the lapply work.
Thanks!
With a vector of names v, you could also try:
v <- c("newid","Player.WR","data_col.WR","Trend.WR",
"Player.QB","data_col.QB","Trend.QB","Player.RB",
"data_col.RB","Trend.RB","Player.TE","data_col.TE","Trend.TE")
gsub(
'(.*)\\.(.*)',
'\\2\\.\\1',
v
)
Output:
[1] "newid" "WR.Player" "WR.data_col" "WR.Trend" "QB.Player" "QB.data_col" "QB.Trend" "RB.Player"
[9] "RB.data_col" "RB.Trend" "TE.Player" "TE.data_col" "TE.Trend"
And to directly assign it to names:
names(data) <- gsub('(.*)\\.(.*)', '\\2\\.\\1', v)
I would suggest next approach using a function to exchange position of values and lapply():
#Data
vec <- c("newid","Player.WR","data_col.WR","Trend.WR",
"Player.QB","data_col.QB","Trend.QB","Player.RB",
"data_col.RB","Trend.RB","Player.TE","data_col.TE","Trend.TE" )
#Split
L <- lapply(vec,strsplit,split='\\.')
#Format function
myfun <- function(x)
{
y <- x[[1]]
#if check
if(length(y)!=1)
{
z <- paste0(y[c(2,1)],collapse = '.')
} else
{
z <- y
}
return(z)
}
#Apply
L2 <- lapply(L,FUN = myfun)
#Bind
do.call(c,L2)
Output:
[1] "newid" "WR.Player" "WR.data_col" "WR.Trend" "QB.Player" "QB.data_col" "QB.Trend"
[8] "RB.Player" "RB.data_col" "RB.Trend" "TE.Player" "TE.data_col" "TE.Trend"
Last output can be saved in a new vector like vecnamesnew <- do.call(c,L2)
Arg0naut91's answer is quite concise, and I would recommend using Arg0naut91's approach. However, for the sake of providing a (somewhat) concise solution using strsplit and lapply with (perhaps) a bit more readability for those unfamiliar with gsub syntax, I submit the following:
names<-c("newid","Player.WR","data_col.WR","Trend.WR",
"Player.QB","data_col.QB","Trend.QB","Player.RB",
"data_col.RB","Trend.RB","Player.TE","data_col.TE","Trend.TE" )
newnames<-lapply(names,function(x) paste(rev(unlist(strsplit(x,split="\\."),use.names=FALSE)),collapse="."))
print(newnames)
which yields
[1] "newid" "WR.Player" "WR.data_col" "WR.Trend" "QB.Player" "QB.data_col" "QB.Trend"
[8] "RB.Player" "RB.data_col" "RB.Trend" "TE.Player" "TE.data_col" "TE.Trend"
as output.
For example, suppose I would like to be able to define a function that returned the name of the assignment variable concatenated with the first argument:
a <- add_str("b")
a
# "ab"
The function in the example above would look something like this:
add_str <- function(x) {
arg0 <- as.list(match.call())[[1]]
return(paste0(arg0, x))
}
but where the arg0 line of the function is replaced by a line that will get the name of the variable being assigned ("a") rather than the name of the function.
I've tried messing around with match.call and sys.call, but I can't get it to work. The idea here is that the assignment operator is being called on the variable and the function result, so that should be the parent call of the function call.
I think that it's not strictly possible, as other solutions explained, and the reasonable alternative is probably Yosi's answer.
However we can have fun with some ideas, starting simple and getting crazier gradually.
1 - define an infix operator that looks similar
`%<-add_str%` <- function(e1, e2) {
e2_ <- e2
e1_ <- as.character(substitute(e1))
eval.parent(substitute(e1 <- paste0(e1_,e2_)))
}
a %<-add_str% "b"
a
# "ab"
2 - Redefine := so that it makes available the name of the lhs to the rhs through a ..lhs() function
I think it's my favourite option :
`:=` <- function(lhs,rhs){
lhs_name <- as.character(substitute(lhs))
assign(lhs_name,eval(substitute(rhs)), envir = parent.frame())
lhs
}
..lhs <- function(){
eval.parent(quote(lhs_name),2)
}
add_str <- function(x){
res <- paste0(..lhs(),x)
res
}
a := add_str("b")
a
# [1] "ab"
There might be a way to redefine <- based on this, but I couldn't figure it out due to recursion issues.
3 - Use memory address dark magic to hunt lhs (if it exists)
This comes straight from: Get name of x when defining `(<-` operator
We'll need to change a bit the syntax and define the function fetch_name for this purpose, which is able to get the name of the rhs from a *<- function, where as.character(substitute(lhs)) would return "*tmp*".
fetch_name <- function(x,env = parent.frame(2)) {
all_addresses <- sapply(ls(env), pryr:::address2, env)
all_addresses <- all_addresses[names(all_addresses) != "*tmp*"]
all_addresses_short <- gsub("(^|<)[0x]*(.*?)(>|$)","\\2",all_addresses)
x_address <- tracemem(x)
untracemem(x)
x_address_short <- tolower(gsub("(^|<)[0x]*(.*?)(>|$)","\\2",x_address))
ind <- match(x_address_short, all_addresses_short)
x_name <- names(all_addresses)[ind]
x_name
}
`add_str<-` <- function(x,value){
x_name <- fetch_name(x)
paste0(x_name,value)
}
a <- NA
add_str(a) <- "b"
a
4- a variant of the latter, using .Last.value :
add_str <- function(value){
x_name <- fetch_name(.Last.value)
assign(x_name,paste0(x_name,value),envir = parent.frame())
paste0(x_name,value)
}
a <- NA;add_str("b")
a
# [1] "ab"
Operations don't need to be on the same line, but they need to follow each other.
5 - Again a variant, using a print method hack
Extremely dirty and convoluted, to please the tortured spirits and troll the others.
This is the only one that really gives the expected output, but it works only in interactive mode.
The trick is that instead of doing all the work in the first operation I also use the second (printing). So in the first step I return an object whose value is "b", but I also assigned a class "weird" to it and a printing method, the printing method then modifies the object's value, resets its class, and destroys itself.
add_str <- function(x){
class(x) <- "weird"
assign("print.weird", function(x) {
env <- parent.frame(2)
x_name <- fetch_name(x, env)
assign(x_name,paste0(x_name,unclass(x)),envir = env)
rm(print.weird,envir = env)
print(paste0(x_name,x))
},envir = parent.frame())
x
}
a <- add_str("b")
a
# [1] "ab"
(a <- add_str("b") will have the same effect as both lines above. print(a <- add_str("b")) would also have the same effect but would work in non interactive code, as well.
This is generally not possible because the operator <- is actually parsed to a call of the <- function:
rapply(as.list(quote(a <- add_str("b"))),
function(x) if (!is.symbol(x)) as.list(x) else x,
how = "list")
#[[1]]
#`<-`
#
#[[2]]
#a
#
#[[3]]
#[[3]][[1]]
#add_str
#
#[[3]][[2]]
#[1] "b"
Now, you can access earlier calls on the call stack by passing negative numbers to sys.call, e.g.,
foo <- function() {
inner <- sys.call()
outer <- sys.call(-1)
list(inner, outer)
}
print(foo())
#[[1]]
#foo()
#[[2]]
#print(foo())
However, help("sys.call") says this (emphasis mine):
Strictly, sys.parent and parent.frame refer to the context of the
parent interpreted function. So internal functions (which may or may
not set contexts and so may or may not appear on the call stack) may
not be counted, and S3 methods can also do surprising things.
<- is such an "internal function":
`<-`
#.Primitive("<-")
`<-`(x, foo())
x
#[[1]]
#foo()
#
#[[2]]
#NULL
As Roland pointed, the <- is outside of the scope of your function and could only be located looking at the stack of function calls, but this fail. So a possible solution could be to redefine the '<-' else than as a primitive or, better, to define something that does the same job and additional things too.
I don't know if the ideas behind following code can fit your needs, but you can define a "verbose assignation" :
`:=` <- function (var, value)
{
call = as.list(match.call())
message(sprintf("Assigning %s to %s.\n",deparse(call$value),deparse(call$var)))
eval(substitute(var <<- value))
return(invisible(value))
}
x := 1:10
# Assigning 1:10 to x.
x
# [1] 1 2 3 4 5 6 7 8 9 10
And it works in some other situation where the '<-' is not really an assignation :
y <- data.frame(c=1:3)
colnames(y) := "b"
# Assigning "b" to colnames(y).
y
# b
#1 1
#2 2
#3 3
z <- 1:4
dim(z) := c(2,2)
#Assigning c(2, 2) to dim(z).
z
# [,1] [,2]
#[1,] 1 3
#[2,] 2 4
>
I don't think the function has access to the variable it is being assigned to. It is outside of the function scope and you do not pass any pointer to it or specify it in any way. If you were to specify it as a parameter, you could do something like this:
add_str <- function(x, y) {
arg0 <-deparse(substitute(x))
return(paste0(arg0, y))
}
a <- 5
add_str(a, 'b')
#"ab"
I have created a function 'mywsobj' which takes one input & 2 output
user input: environment
output1: a data.frame name "wsobj"
(data of list of objects,their classes,their memory usage)
output to console
output2: a barplot of list of objects and their memory usage based.
So far all ok,
My question1: but How to save that "wsobj" data.frame from inside the function in that user-specified input environment or at least in the .GlobalEnv ?
I tried reading things like: use of <<, use of pos/parent.environment etc.
but things are so far beyond me.
My question2: Is it possible to specify project-specific/user-specfic environment in R/specially in Rstudio server ?
Though my code here may not matter still it is below:
# creating SOME data
dfAtoZ<- data.frame(LETTERS)
df1to1Cr <- data.frame(1:10000000)
vec1to1Cr <- as.vector(1:10000000)
mat1to1Cr <- as.matrix(1:10000000)
set.seed<-10
randvec<-runif(1000,min=-100,max=100)
# creating MY function
mywsobj<-function(myenvironmentName)
{#step1 creating vector of object names
wslist <- vector(length=length(ls(myenvironmentName)))
for(i in 1:length(ls(myenvironmentName)))
{wslist[i]<-ls(myenvironmentName)[i]}
# wslist<-cbind(unlist(ls()))
#step2 creating vector of object classes
wsclass <- vector(length=length(wslist))
wsmemKb <- vector(mode="numeric",length=length(wslist))
for(i in 1:length(wslist))
{wsclass[i]<-class(get(wslist[i]))
wsmemKb[i]<- object.size(get(wslist[i]))/1000*1024}
#step4 combining them in a data.frame
wobj<-data.frame(cbind(wslist,wsclass,wsmemKb))
# library(sqldf)
# sqldf("pragma table_info(wobj)") shows col 3(wsmem) still non-numeric
wobj[,3] <- as.numeric( as.character(wobj[,3]) )
# create data to return matrix of memory consumption
objmemsize <- rev(sort(sapply(ls(envir=myenvironmentName),
function (object.name)object.size(get(object.name))/1000)))
# draw bar plot
barplot(objmemsize,main="Memory usage by object in myenvironment",
ylab="KiloByte", xlab="Variable name",
col=heat.colors(length(objmemsize)))
# result <- sqldf("select * from wobj order by 1/wsmemKb")
return(wobj)
# return(data.frame(myenvironmentName,wobj))
# return(assign("myname",wobj,envir = .GlobalEnv))
# attach(wobj,pos=2,"wobj")
return(barplot)
}
# use of mywsobj function
mywsobj(.GlobalEnv)
# saving output of mywsobj function
output<-as.data.frame(mywsobj(.GlobalEnv))
Not sure if this is what you're after. But, you can assign values to that environment with $.
my_fun <- function(in.env) {
# you may want to check if input argument is an environment
# do your computations
set.seed(45)
x <- sample(10)
y <- runif(10)
in.env$val <- sum(x*y)
}
my_fun(my.env <- new.env())
ls(my.env)
[1] "val"
my.env$val
# [1] 22.30493
Alternatively, you can also use assign as follows:
my_fun <- function(in.env) {
# you may want to check if input argument is an environment
# do your computations
set.seed(45)
x <- sample(10)
y <- runif(10)
assign("val", sum(x*y), envir=in.env)
}
# assign to global environment
my_fun(globalenv())
> val
# [1] 22.30493
# assign to local environment, say, v
v <- new.env()
my_fun(v)
> v$val
# [1] 22.30493
I want to use information from a field and include it in a R function, e.g.:
data #name of the data.frame with only one raw
"(if(nclusters>0){OptmizationInputs[3,3]*beta[1]}else{0})" # this is the raw
If I want to use this information inside a function how could I do it?
Another example:
A=c('x^2')
B=function (x) A
B(2)
"x^2" # this is the return. I would like to have the return something like 2^2=4.
Use body<- and parse
A <- 'x^2'
B <- function(x) {}
body(B) <- parse(text = A)
B(3)
## [1] 9
There are more ideas here
Another option using plyr:
A <- 'x^2'
library(plyr)
body(B) <- as.quoted(A)[[1]]
> B(5)
[1] 25
A <- "x^2"; x <- 2
BB <- function(z){ print( as.expression(do.call("substitute",
list( parse(text=A)[[1]], list(x=eval(x) ) )))[[1]] );
cat( "is equal to ", eval(parse(text=A)))
}
BB(2)
#2^2
#is equal to 4
Managing expressions in R is very weird. substitute refuses to evaluate its first argument so you need to use do.call to allow the evaluation to occur before the substitution. Furthermore the printed representation of the expressions hides their underlying representation. Try removing the fairly cryptic (to my way of thinking) [[1]] after the as.expression(.) result.
In improving an rbind method, I'd like to extract the names of the objects passed to it so that I might generate unique IDs from those.
I've tried all.names(match.call()) but that just gives me:
[1] "rbind" "deparse.level" "..1" "..2"
Generic example:
rbind.test <- function(...) {
dots <- list(...)
all.names(match.call())
}
t1 <- t2 <- ""
class(t1) <- class(t2) <- "test"
> rbind(t1,t2)
[1] "rbind" "deparse.level" "..1" "..2"
Whereas I'd like to be able to retrieve c("t1","t2").
I'm aware that in general one cannot retrieve the names of objects passed to functions, but it seems like with ... it might be possible, as substitute(...) returns t1 in the above example.
I picked this one up from Bill Dunlap on the R Help List Serve:
rbind.test <- function(...) {
sapply(substitute(...()), as.character)
}
I think this gives you what you want.
Using the guidance here How to use R's ellipsis feature when writing your own function?
eg substitute(list(...))
and combining with with as.character
rbind.test <- function(...) {
.x <- as.list(substitute(list(...)))[-1]
as.character(.x)
}
you can also use
rbind.test <- function(...){as.character(match.call(expand.dots = F)$...)}