Succinctly assign names and values simultaneously - r

I find myself often writing the following two lines. Is there a succinct alternative?
newObj <- vals
names(newObj) <- nams
# This works, but is ugly and not necessarily preferred
'names<-'(newObj <- vals, nams)
I'm looking for something similar to this (which of course does not work):
newObj <- c(nams = vals)
Wrapping it up in a function is an option as well, but I am wondering if the functionality might already be present.
sample data
vals <- c(1, 2, 3)
nams <- c("A", "B", "C")

You want the setNames function:
# Your example data
vals <- 1:3
names <- LETTERS[1:3]
# Using setNames
newObj <- setNames(vals, names)
newObj
#A B C
#1 2 3
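As a quick check of the setNames approach, here is a minimal sketch using the sample data from the question. The object name `same` is only illustrative. It also shows why c(nams = vals) falls short: the literal word nams is used as a name prefix rather than the contents of nams.

```r
vals <- c(1, 2, 3)
nams <- c("A", "B", "C")

# setNames returns a named copy in a single expression
newObj <- setNames(vals, nams)

# Equivalent call to the replacement function, for comparison
same <- `names<-`(vals, nams)
identical(newObj, same)

# c(nams = vals) uses the literal word "nams" as a prefix,
# not the contents of the nams vector
names(c(nams = vals))

# The original vector is left without names
is.null(names(vals))
```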

The names<- method often (if not always) copies the object internally. setNames is simply a wrapper for names<-.
If you want to assign names and values succinctly in both code and memory, the setattr function, from either the bit or data.table package, will do this by reference (no copying),
e.g.
library(data.table) # or library(bit)
setattr(vals, 'names', names)
Perhaps slightly less succinct, but you could write yourself a simple wrapper:
name <- function(x, names){ setattr(x,'names', names)}
val <- 1:3
names <- LETTERS[1:3]
name(val, names)
# and it has worked!
val
## A B C
## 1 2 3
Note that if you assign to a new object, both the old and new object will have the names!

Related

How do I reference a function parameter inside a data.table with a column of the same name?

Here, I have a data.table foo
foo <- data.table(t = c(1,1,2,2,3), b = rnorm(5))
foo
t b
1: 1 0.07014277
2: 1 1.71144087
3: 2 -0.60290798
4: 2 -0.47216639
5: 3 -0.63537131
and a function, myfunc()
myfunc <- function(dt, t){
# Subset dt by t, then do stuff
dt <- dt[J(t = t), by = "t"]
# Complicated stuff here..
score <- mean(dt$b)
return(score)
}
myfunc() takes two parameters:
dt the data.table to operate on
t a value of t that can be used to subset dt (on the t column, of course)
My problem is, in my subset operation dt <- dt[J(t = t), by = "t"], I can't figure out how to make R reference the function variable t for the second t. I've tried variations of dt <- dt[J(t = get(t, -1)), by = "t"] with no luck.
I know I should probably just change my function variable name, but in reality they are quite verbose and specific and I'd rather not. Also note that, in reality, myfunc() is a complicated plotting function.
One possible option is this:
myfunc <- function(dt, t){
env <- environment()
dt <- dt[t==get('t',env)]
mean(dt$b)
}
Another approach: while perhaps not strictly a "solution" to your current problem, you may find it of interest. With data.table version >= 1.14.3, we can use the env parameter of DT[i, j, by, env, ...] to indicate the data.table column as "t" and the function parameter as t. Notice that this works on column t with function parameter t, even if dt contains columns named col and val.
myfunc <- function(dt, t){
dt <- dt[col==val, env=list(col="t", val=t)]
mean(dt$b)
}
In both cases, usage and output are as below:
Usage
myfunc(dt = foo, t = 3)
Output:
[1] 0.1292877
Input:
set.seed(123)
foo <- data.table(t = c(1,1,2,2,3), b = rnorm(5))
foo looks like this:
> foo
t b
1: 1 -0.56047565
2: 1 -0.23017749
3: 2 1.55870831
4: 2 0.07050839
5: 3 0.12928774
The ambiguity is not at the level of base-function-t versus column named "t". It's at the level of parameter named t and column named t. Here's a modified function that succeeds (at least if there has been a setkey(foo, "t") operation prior):
myfunc <- function(dt, d){
# Subset dt by t, then do stuff
dt1 <- dt[ t==d]
# Complicated stuff here..
score <- dt1[ , paste(b, collapse="_")]
return(score)
}
myfunc(foo, d=1)
#[1] "a_b"
Obviously I need to come up with an interior function that made sense for a character variable column. This would seem to solve the apparent problem that you are having with a column named "t". Just change the name of the parameter in your function's parameter list to something other than "t". The scoping and environments in data.table j-calls (the expressions evaluated as the second arguments) are different than in regular R.
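If renaming the parameter in the signature is not acceptable, one possible middle ground is to copy it into a differently named local variable before subsetting. The sketch below assumes the data.table package is installed and that the table has no column called t_val (a hypothetical name chosen only for this illustration).

```r
library(data.table)

set.seed(123)
foo <- data.table(t = c(1, 1, 2, 2, 3), b = rnorm(5))

myfunc <- function(dt, t) {
  # t_val is a hypothetical local alias; foo has no column of that name,
  # so inside the subset it resolves to the function argument,
  # while the bare t still refers to the column
  t_val <- t
  sub <- dt[t == t_val]
  mean(sub$b)
}

myfunc(dt = foo, t = 3)
```

The caller still passes t = 3, so the verbose parameter name survives; only the body of the function uses the alias.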

How can lapply work with addressing columns as unknown variables?

So, I have a list of strings named control_for. I have a data frame sampleTable with some of the columns named as strings from control_for list. And I have a third object dge_obj (DGElist object) where I want to append those columns. What I wanted to do - use lapply to loop through control_for list, and for each string, find a column in sampleTable with the same name, and then add that column (as a factor) to a DGElist object. For example, for doing it manually with just one string, it looks like this, and it works:
group <- as.factor(sampleTable[,3])
dge_obj$samples$group <- group
And I tried something like this:
lapply(control_for, function(x) {
x <- as.factor(sampleTable[, x])
dge_obj$samples$x <- x
})
Which doesn't work. I guess the problem is that R can't recognize addressing columns like this. Can someone help?
Here are two base R ways of doing it. The data set is the example of help("DGEList") and a mock up data.frame sampleTable.
Define a vector common_vars of the table's names in control_for. Then create the new columns.
library(edgeR)
sampleTable <- data.frame(a = 1:4, b = 5:8, no = letters[21:24])
control_for <- c("a", "b")
common_vars <- intersect(control_for, names(sampleTable))
1. for loop
for(x in common_vars){
y <- sampleTable[[x]]
dge_obj$samples[[x]] <- factor(y)
}
2. *apply loop.
# lapply keeps each column a factor; sapply would simplify the result to a character matrix
tmp <- lapply(sampleTable[common_vars], factor)
dge_obj$samples <- cbind(dge_obj$samples, tmp)
This code can be rewritten as a one-liner.
Data
set.seed(2021)
y <- matrix(rnbinom(10000,mu=5,size=2),ncol=4)
dge_obj <- DGEList(counts=y, group=rep(1:2,each=2))
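To see why the original lapply attempt fails without needing edgeR, here is a base R sketch in which a plain list (the hypothetical obj) stands in for the DGEList object. Two things are going on: `$x` looks for an element literally named "x", and an assignment made inside the anonymous function only changes a local copy.

```r
sampleTable <- data.frame(a = 1:4, b = 5:8, no = letters[21:24])
control_for <- c("a", "b")

# obj is a stand-in for the DGEList object (an assumption for illustration)
obj <- list(samples = data.frame(group = factor(c(1, 1, 2, 2))))

# Attempt in the style of the question: the global obj is never modified,
# and the column would have been called "x" anyway
invisible(lapply(control_for, function(x) {
  obj$samples$x <- as.factor(sampleTable[, x])
}))
names(obj$samples)

# Working variant: [[x]] uses the string stored in x,
# and the assignment happens in the calling environment
for (x in control_for) {
  obj$samples[[x]] <- factor(sampleTable[[x]])
}
names(obj$samples)
```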

How to assign to a subset of an R object with a name given as string

I have the name of a matrix as string and would like to assign to a column of that matrix.
A <- matrix(1:4,2)
v <- 10:11
name <- "A"
get(name)[,2] <- v
This does not work because the LHS is just a value (i.e. a vector) and has lost the meaning of "the second column of A".
eval(parse(text=paste0(name,'[,2]<- v')))
This does the job, but a lot of people discourage the use of such a structure. What is the recommended way to go about this?
EDIT:
Most comments on similar problems I have found discourage the use of object names that can only be passed as strings and instead promote the use of lists, i.e.
l <- list(A=matrix(1:4,2))
v <- 10:11
name <- "A"
l[[name]][,2] <- v
but this does not really answer my question.
For changing names of columns, you should work on a data.frame and not on a matrix:
A <- matrix(1:4,2)
v <- 10:11
name <- "A"
A <- as.data.frame(A)
v <- as.data.frame(v)
colnames(A)[2] <- name
A[,2] <- v
Is this what you were looking for?
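For the question as actually asked (assigning to a column of an object known only by its name), one common base R pattern is to fetch the object with get, modify the copy, and write it back with assign. This is a sketch rather than a recommendation over the list-based approach; tmp is just an illustrative temporary name.

```r
A <- matrix(1:4, 2)
v <- 10:11
name <- "A"

tmp <- get(name)   # copy of the object whose name is stored in `name`
tmp[, 2] <- v      # ordinary subset assignment on the copy
assign(name, tmp)  # write the modified copy back under the same name

A
```

This avoids eval(parse(...)), at the cost of one extra temporary object.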

Adding data frames into a list within a for loop

I have a for loop that generates a dataframe every time it loops through. I am trying to create a list of data frames but I cannot seem to figure out a good way to do this.
For example, with vectors I usually do something like this:
my_numbers <- c()
for (i in 1:4){
my_numbers <- c(my_numbers,i)
}
This will result in a vector c(1,2,3,4). I want to do something similar with data frames, but accessing the list of data frames is quite difficult when I use:
my_dataframes <- list(my_dataframes,DATAFRAME).
Help please. The main goal is just to create a list of dataframes that I can later on access dataframe by dataframe. Thank you.
I'm sure you've noticed that list does not do what you want it to do, nor should it. c also doesn't work in this case because it flattens data frames, even when recursive=FALSE.
You can use append. As in,
data_frame_list = list()
for( i in 1:5 ){
d = create_data_frame(i)
data_frame_list = append(data_frame_list, list(d))
}
Better still, you can assign directly to indexed elements, even if those elements don't exist yet:
data_frame_list = list()
for( i in 1:5 ){
data_frame_list[[i]] = create_data_frame(i)
}
This applies to vectors, too. But if you want to create a vector c(1,2,3,4) just use 1:4, or its underlying function seq.
Of course, lapply or the *lply functions from plyr are often better than looping depending on your application.
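As a sketch of the lapply alternative mentioned above: create_data_frame is a hypothetical stand-in for whatever builds the data frame in each iteration, and the names df1 to df5 are chosen only for illustration.

```r
# Hypothetical generator: returns a data frame with i rows
create_data_frame <- function(i) {
  data.frame(id = i, x = seq_len(i))
}

# lapply returns the list of data frames directly, no growing inside a loop
data_frame_list <- lapply(1:5, create_data_frame)
names(data_frame_list) <- paste0("df", 1:5)

# Access one data frame at a time by position or by name
data_frame_list[[2]]
data_frame_list$df4

# If a single combined data frame is needed later
combined <- do.call(rbind, data_frame_list)
nrow(combined)
```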
Continuing with your for loop method, here's a little example of creating and accessing.
> my_numbers <- vector('list', 4)
> for (i in 1:4) my_numbers[[i]] <- data.frame(x = seq(i))
And we can access the first column of each data frame with,
> sapply(my_numbers, "[", 1)
# $x
# [1] 1
#
# $x
# [1] 1 2
#
# $x
# [1] 1 2 3
#
# $x
# [1] 1 2 3 4
Other ways of accessing the data are my_numbers[[1]] for the first data set,
lapply(my_numbers, "[", 1,) to access the first row of each data frame, etc.
You can use operator [[ ]] for this purpose.
l <- list()
df1 <- data.frame(name = 'df1', a = 1:5 , b = letters[1:5])
df2 <- data.frame(name = 'df2', a = 6:10 , b = letters[6:10])
df3 <- data.frame(name = 'df3', a = 11:20 , b = letters[11:20])
df <- rbind(df1,df2,df3)
for(df_name in unique(df$name)){
l[[df_name]] <- df[df$name == df_name,]
}
In this example, there are three separate data frames; to store them in a list using a for loop, we first combine them into one.
Using the [[ operator, we can also name each data frame in the list as we store it.
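For this particular case, where one combined data frame is divided by the values of a column, base R's split gives the same named list in a single call. A minimal sketch using the same example data (the list name l2 is only illustrative):

```r
df1 <- data.frame(name = 'df1', a = 1:5,   b = letters[1:5])
df2 <- data.frame(name = 'df2', a = 6:10,  b = letters[6:10])
df3 <- data.frame(name = 'df3', a = 11:20, b = letters[11:20])
df <- rbind(df1, df2, df3)

# One list element per distinct value of df$name, named after that value
l2 <- split(df, df$name)

names(l2)
nrow(l2[["df3"]])
```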

apply treats numbers as characters

I couldn't find a solution for this problem online, as simple as it seems.
Here it is:
#Construct test dataframe
tf <- data.frame(1:3,4:6,c("A","A","A"))
#Try the apply function I'm trying to use
test <- apply(tf,2,function(x) if(is.numeric(x)) mean(x) else unique(x)[1])
#Look at the output--all columns treated as character columns...
test
#Look at the format of the original data--the first two columns are integers.
str(tf)
In general terms, I want to differentiate what function I apply over a row/column based on what type of data that row/column contains.
Here, I want a simple mean if the column is numeric and the first unique value if the column is a character column. As you can see, apply treats all columns as characters the way I've written this function.
Just write a specialised function and put it within sapply... don't use apply(dtf, 2, fun). Besides, your character ain't so characterish as you may think - run getOption("stringsAsFactors") and see for yourself.
sapply(tf, class)
X1.3 X4.6 c..A....A....A..
"integer" "integer" "factor"
sapply(tf, storage.mode)
X1.3 X4.6 c..A....A....A..
"integer" "integer" "integer"
EDIT
Or even better - use lapply:
fn <- function(x) {
if(is.numeric(x) && !is.factor(x)) {
mean(x)
} else if (is.character(x)) {
unique(x)[1]
} else if (is.factor(x)) {
as.character(x)[1]
}
}
dtf <- data.frame(a = 1:3, b = 4:6, c = rep("A", 3), stringsAsFactors = FALSE)
dtf2 <- data.frame(a = 1:3, b = 4:6, c = rep("A", 3), stringsAsFactors = TRUE)
as.data.frame(lapply(dtf, fn))
a b c
1 2 5 A
as.data.frame(lapply(dtf2, fn))
a b c
1 2 5 A
I find the numcolwise and catcolwise functions from the plyr package useful here, for a syntactically simple solution:
First let's name the columns, to avoid ugly column names when doing the aggregation:
tf <- data.frame(a = 1:3,b=4:6, d = c("A","A","A"))
Then you get your desired result with this one-liner:
> cbind(numcolwise(mean)(tf), catcolwise( function(z) unique(z)[1] )(tf))
a b d
1 2 5 A
Explanation: numcolwise(f) converts its argument (in this case f is the mean function) into a function that takes a data frame and applies f only to the numeric columns of the data frame. Similarly, catcolwise converts its function argument into a function that operates only on the categorical columns.
You want to use lapply() or sapply(), not apply(). A data.frame is a list under the hood, which apply will try to convert to a matrix before doing anything. Since at least one column in your data frame is character, every other column also gets coerced to character in forming that matrix.
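The coercion described above can be made visible with a short base R sketch. The column names a, b and d, and the helper name summarise_col, are chosen only for this illustration.

```r
tf <- data.frame(a = 1:3, b = 4:6, d = c("A", "A", "A"))

# apply() works on as.matrix(tf); one non-numeric column makes everything character
m <- as.matrix(tf)
typeof(m)

# lapply() visits each column as its own vector, so the types survive
summarise_col <- function(x) {
  if (is.numeric(x)) mean(x) else as.character(unique(x))[1]
}
result <- lapply(tf, summarise_col)
str(result)
```

Wrapping the result in as.data.frame(result) gives a one-row data frame if that shape is preferred.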
