Hidden objects in list - r

I am creating a custom object for a package. I want it to be a list of two objects, with one of those elements 'hidden'.
For example:
l = list(data = data.frame(a = 1:3, b = 4:6), hidden = list(obj1 = 1, obj2 = 2))
When I interact with the list, I want to interact only with the data element; the other element should be reachable only when I ask for it explicitly.
So, if I typed l, I would see:
> l
a b
1 1 4
2 2 5
3 3 6
Which I can manage with a custom print method. But I also want to be able to do
> l[,1]
[1] 1 2 3
Which I don't think is possible with a custom print method.
I don't have any specific requirements for how the other element should be accessed, but something 'R friendly' I guess.
Is there a different class I should be using or creating a new class? Any advice would be appreciated.

You could indeed define a custom class for your object. Let
class(l) <- "myclass"
Then you can define class-specific methods for the functions of interest. For instance, in the case of l[, 1] we have
`[.myclass` <- function(x, ...) `[`(x[[1]], ...)
which takes this two-element list and then calls the usual [ function on its first element:
l[, 1]
# [1] 1 2 3
The same can be done with other functions, such as print. For a generic function fun, the pattern is:
fun.myclass <- function(x, ...) fun(x[[1]], ...)
And you can still always access the second object in the usual way,
l$hidden
# $obj1
# [1] 1
#
# $obj2
# [1] 2
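Putting the pieces of this answer together, a minimal sketch might look like the following. The constructor new_myclass() and the accessor hidden() are hypothetical names introduced only for illustration; they are not part of the original answer.

```r
# Constructor (hypothetical name): bundle the data and the hidden part.
new_myclass <- function(data, hidden) {
  structure(list(data = data, hidden = hidden), class = "myclass")
}

# Forward matrix-style indexing to the data element.
`[.myclass` <- function(x, ...) `[`(x[["data"]], ...)

# Print only the data element; return the object invisibly,
# as print methods usually do.
print.myclass <- function(x, ...) {
  print(x[["data"]], ...)
  invisible(x)
}

# Explicit accessor (hypothetical name) for the hidden element.
hidden <- function(x) x[["hidden"]]

l <- new_myclass(data.frame(a = 1:3, b = 4:6), list(obj1 = 1, obj2 = 2))
l[, 1]
l[2, "b"]
hidden(l)$obj2
```

Only [ and print are redirected here; other generics (length, names, summary and so on) still see the underlying two-element list unless they get methods of their own.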

I think it would be cleaner for you to use attributes:
l <- list(data = data.frame(a = 1:3, b = 4:6),
          hidden = list(obj1 = 1, obj2 = 2))
foo <- function(x){
  attr(x$data, "hidden") <- x$hidden
  x$data
}
l <- foo(l)
l
# a b
# 1 1 4
# 2 2 5
# 3 3 6
l[,1]
# [1] 1 2 3
attr(l,"hidden")
# $obj1
# [1] 1
#
# $obj2
# [1] 2
#
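The same idea can be sketched in one step with structure(), which returns its first argument with the given attributes attached. This is only an illustration of the attribute approach; the attribute name "hidden" is taken from the answer above.

```r
df <- data.frame(a = 1:3, b = 4:6)

# Attach the hidden part as an attribute of the data frame itself.
l <- structure(df, hidden = list(obj1 = 1, obj2 = 2))

is.data.frame(l)   # l is still an ordinary data frame
l[, 1]
attr(l, "hidden")$obj1
```

Custom attributes are not guaranteed to survive every transformation, so it is worth checking attributes(l) after steps such as merging or reshaping.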

Related

Difference between dataframe's $ and [] functions

How and why do a data frame's $ and [] functions differ when assigning values?
Can I tweak the abc.df[,"b"] = get("b") line to have the same effect as abc.df$b = get("b")?
abc.df = NULL
a = 1:10
abc.df = data.frame(a)
b_vector = 11:20
b_list = rep(list(c(1,2)),10)
sp_colmns1 = c("b_vector")
# This works :
abc.df$b_vector_method1 = get(sp_colmns1) # Method 1
abc.df[,"b_vector_method2"] = get(sp_colmns1) # Method 2
print(abc.df)
sp_colmns2 = c("b_list")
# Similarly :
# The same code as above, but does not work
# Only difference is b_list is a list
abc.df$b_list_method1 = get(sp_colmns2) # Method 1 (Works)
# TODO: Need to get the reason for & Solve the error on following line
# abc.df[,"b_list_method2"] = get(sp_colmns2) # Method 2 (Doesnt work)
print(abc.df)
You could add the list with any name "new" and change the column name in a second step with the string you saved somewhere else.
abc.df$new <- get(sp_colmns2)
names(abc.df)[which(names(abc.df) == "new")] <- "b_list_method2"
# > head(abc.df)
# a b_list_method2
# 1 1 1, 2
# 2 2 1, 2
# 3 3 1, 2
# 4 4 1, 2
# 5 5 1, 2
# 6 6 1, 2
After quite a lot of trial and error, this seems to work.
The solution turns out to be quite a simple one: use
list(get(sp_colmns2)) instead of get(sp_colmns2)
abc.df = NULL
a = 1:10
abc.df = data.frame(a)
b_vector = 11:20
b_list = rep(list(c(1,2)),10)
sp_colmns1 = c("b_vector")
# This works :
abc.df$b_vector_method1 = get(sp_colmns1) # Method 1
abc.df[,"b_vector_method2"] = get(sp_colmns1) # Method 2
print(abc.df)
sp_colmns2 = c("b_list")
# Similarly, for the list column:
# Only difference is b_list is a list
abc.df$b_list_method1 = get(sp_colmns2) # Method 1 (Works)
# Wrapping the value in list() makes the [ assignment work as well
abc.df[,"b_list_method2"] = list(get(sp_colmns2)) # Method 2 (Works now)
print(abc.df)
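A likely explanation for the difference: when the right-hand side of a [ assignment on a data frame is a list, each element of that list is treated as a separate column, so a ten-element list looks like ten columns of length two. Wrapping it in list() turns it into a single column of ten entries. $ and [[ always assign exactly one column, which is why they accept the list unchanged. A minimal sketch (the column name b_list_method3 is made up for this example):

```r
abc.df <- data.frame(a = 1:10)
b_list <- rep(list(c(1, 2)), 10)

# [[<- assigns exactly one column, so the list goes in unchanged.
abc.df[["b_list_method3"]] <- b_list

# [<- needs the extra list() so that b_list counts as a single column.
abc.df[, "b_list_method2"] <- list(b_list)

str(abc.df$b_list_method2[[1]])
```

Since the column name is a string in both cases, abc.df[[sp_name]] <- get(sp_colmns2) is a drop-in replacement for the $ form when the name is only known at run time.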

Conditional lapply

So I have a bunch of data frames in a list object. The frames are organised like this:
ID Category Value
2323 Friend 23.40
3434 Foe -4.00
And I got them into a list by following this topic. I can also run simple functions on them as shown in this topic.
Now I am trying to run a conditional function with lapply, and I'm running into trouble. In some tables the 'ID' column has a different name (say, 'recnum'), and I need to tell lapply to go through each data frame, check if there is a column named 'recnum', and change its name to 'ID', as in
colnr <- which(names(x) == "recnum")
if (length(colnr > 0)) {names(x)[colnr] <- "ID"}
But I'm running into trouble with local scope and who knows what. Any ideas?
Use the rename function from plyr; it renames by name, not position:
x <- data.frame(ID = 1:2,z=1:2)
y <- data.frame('recnum' = 1:2,z=3:4)
.list <- list(x,y)
library(plyr)
lapply(.list, rename, replace = c('recnum' = 'ID'))
[[1]]
ID z
1 1 1
2 2 2
[[2]]
ID z
1 1 3
2 2 4
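Your original code works fine once it is wrapped in a function that returns x (the condition is written here as length(colnr) > 0, which is presumably what was intended; length(colnr > 0) only gives the same result by coincidence):
foo <- function(x){
  colnr <- which(names(x) == "recnum")
  if (length(colnr) > 0) {names(x)[colnr] <- "ID"}
  x
}
.list <- list(x,y)
lapply(.list, foo)
Not sure what your problem was.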
If you look at the second part of mnel's answer, you can see that the function foo evaluates x as its last expression. Without that, if you try to change the names of the data.frames in your list directly from within the anonymous function passed to lapply, it will likely not work.
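To make the point about the last expression concrete, here is a small sketch comparing the two variants. The names no_return and with_return are made up for this illustration.

```r
x <- data.frame(ID = 1:2, z = 1:2)
y <- data.frame(recnum = 1:2, z = 3:4)

# Without a final x, the function returns the value of the last expression
# evaluated: NULL when the if is skipped, "ID" when the assignment runs.
no_return <- lapply(list(x, y), function(x) {
  colnr <- which(names(x) == "recnum")
  if (length(colnr) > 0) names(x)[colnr] <- "ID"
})

# Ending with x returns the (possibly renamed) data frame.
with_return <- lapply(list(x, y), function(x) {
  colnr <- which(names(x) == "recnum")
  if (length(colnr) > 0) names(x)[colnr] <- "ID"
  x
})

names(with_return[[2]])
```

In both cases the data frames in the calling environment are left untouched, because the function works on a copy; the renamed versions exist only in the list that lapply returns.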
Just as an alternative, you could use gsub and avoid loading an additional package (although plyr is a nice package):
xx <- list(data.frame("recnum" = 1:3, "recnum2" = 1:3),
           data.frame("ID" = 4:6, "hat" = 4:6))
lapply(xx, function(x){
  names(x) <- gsub("^recnum$", "ID", names(x))
  return(x)
})
# [[1]]
# ID recnum2
# 1 1 1
# 2 2 2
# 3 3 3
# [[2]]
# ID hat
# 1 4 4
# 2 5 5
# 3 6 6

Create a numeric vector with names in one statement?

I'm trying to set the default value for a function parameter to a named numeric. Is there a way to create one in a single statement? I checked ?numeric and ?vector but it doesn't seem so. Perhaps I can convert/coerce a matrix or data.frame and achieve the same result in one statement? To be clear, I'm trying to do the following in one shot:
test = c( 1 , 2 )
names( test ) = c( "A" , "B" )
The setNames() function is made for this purpose. As described in Advanced R and ?setNames:
test <- setNames(c(1, 2), c("A", "B"))
How about:
c(A = 1, B = 2)
A B
1 2
...as a side note, the structure function allows you to set ALL attributes, not just names:
structure(1:10, names=letters[1:10], foo="bar", class="myclass")
Which would produce
a b c d e f g h i j
1 2 3 4 5 6 7 8 9 10
attr(,"foo")
[1] "bar"
attr(,"class")
[1] "myclass"
The convention for naming vector elements is the same as with lists:
newfunc <- function(A=1, B=2) { body} # the parameters are an 'alist' with two items
If instead you wanted this to be a parameter that was a named vector (the sort of function that would handle arguments supplied by apply):
newfunc <- function(params =c(A=1, B=2) ) { body} # a vector with two elements
If instead you wanted this to be a parameter that was a named list:
newfunc <- function(params =list(A=1, B=2) ) { body}
# a single parameter (with two elements in a list structure)
magrittr offers a nice and clean solution.
library(magrittr)
result = c(1,2) %>% set_names(c("A", "B"))
print(result)
A B
1 2
You can also use it to transform data.frames into vectors.
df = data.frame(value=1:10, label=letters[1:10])
vec = extract2(df, 'value') %>% set_names(df$label)
vec
a b c d e f g h i j
1 2 3 4 5 6 7 8 9 10
df
value label
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
6 6 f
7 7 g
8 8 h
9 9 i
10 10 j
To expand upon @joran's answer (I couldn't get this to format correctly as a comment): If the named vector is assigned to a variable, the values of A and B are accessed via subsetting using the [ function. Use the names to subset the vector the same way you might use the index number to subset:
my_vector = c(A = 1, B = 2)
my_vector["A"] # subset by name
# A
# 1
my_vector[1] # subset by index
# A
# 1
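Coming back to the original goal, a named numeric as a default argument, the one-statement forms above can be used directly in the function signature. The function and parameter names below (weighted_sum, weights) are hypothetical and serve only as an illustration.

```r
# Default argument built inline with c().
weighted_sum <- function(x, weights = c(A = 1, B = 2)) {
  # Look up the elements of x by name, then weight and add them.
  sum(x[names(weights)] * weights)
}

# Same default built with setNames(), handy when the names are computed.
weighted_sum2 <- function(x, weights = setNames(c(1, 2), c("A", "B"))) {
  sum(x[names(weights)] * weights)
}

weighted_sum(c(A = 10, B = 20))
weighted_sum2(c(B = 20, A = 10))
```

Because the elements are matched by name, the order in which the caller supplies A and B does not matter.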

How to write an R function that evaluates an expression within a data-frame

Puzzle for the R cognoscenti: Say we have a data-frame:
df <- data.frame( a = 1:5, b = 1:5 )
I know we can do things like
with(df, a)
to get a vector of results.
But how do I write a function that takes an expression (such as a or a > 3) and does the same thing inside? I.e. I want to write a function fn that takes a data-frame and an expression as arguments and returns the result of evaluating the expression "within" the data-frame as an environment.
Never mind that this sounds contrived (I could just use with as above); this is just a simplified version of a more complex function I am writing. I tried several variants (using eval, with, envir, substitute, local, etc.) but none of them worked. For example, if I define fn like so:
fn <- function(dat, expr) {
eval(expr, envir = dat)
}
I get this error:
> fn( df, a )
Error in eval(expr, envir = dat) : object 'a' not found
Clearly I am missing something subtle about environments and evaluation. Is there a way to define such a function?
The lattice package does this sort of thing in a different way. See, e.g., lattice:::xyplot.formula.
fn <- function(dat, expr) {
  eval(substitute(expr), dat)
}
fn(df, a) # 1 2 3 4 5
fn(df, 2 * a + b) # 3 6 9 12 15
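One detail the short version leaves implicit is where names that are not columns of the data frame get looked up. A small variation, offered as a sketch rather than as the answer's own code, passes parent.frame() as the enclosure so that such names are resolved in the caller of fn. The variable threshold is made up for the example.

```r
fn <- function(dat, expr) {
  # substitute() captures the unevaluated argument; parent.frame() is used as
  # the enclosure so names not found in dat are looked up in the caller.
  eval(substitute(expr), dat, parent.frame())
}

df <- data.frame(a = 1:5, b = 1:5)
threshold <- 3
fn(df, a > threshold)
fn(df, a[b > threshold])
```

This mirrors what with() does for data frames, and it matters mostly when fn is called from inside another function whose local variables appear in the expression.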
That's because you're not passing an expression.
Try:
fn <- function(dat, expr) {
  mf <- match.call() # captures the call, so mf$expr is the unevaluated expression
  eval(mf$expr, envir = dat)
}
> df <- data.frame( a = 1:5, b = 1:5 )
> fn( df, a )
[1] 1 2 3 4 5
> fn( df, a+b )
[1] 2 4 6 8 10
A quick glance at the source code of functions using this (eg lm) can reveal a lot more interesting things about it.
A late entry, but the data.table approach and syntax would appear to be what you are after.
This is exactly how [.data.table works with the j, i and by arguments.
If you need it in the form fn(x,expr), then you can use the following
library(data.table)
DT <- data.table(a = 1:5, b = 2:6)
`[`(x=DT, j=a)
## [1] 1 2 3 4 5
`[`(x=DT, j=a * b)
## [1] 2 6 12 20 30
I think it is easier to use in its more native form:
DT[,a]
## [1] 1 2 3 4 5
and so on. In the background this is using substitute and eval.
?within might also be of interest.
df <- data.frame( a = 1:5, b = 1:5 )
within(df, cx <- a > 3)
a b cx
1 1 1 FALSE
2 2 2 FALSE
3 3 3 FALSE
4 4 4 TRUE
5 5 5 TRUE
