I'm trying to convert a data.frame in R to mpfr format by multiplying it by an mpfr unit constant. As the code below shows, this works when applied to a single column (result variable 'mpfr_col'), but it fails with both approaches I tried for the whole data.frame. The error for each attempt is given in a comment.
library(Rmpfr)
prec <- 256
m1 <- mpfr(1,prec)
col_build <- 1:10
test_df <- data.frame(col_build, col_build, col_build)
mpfr_col <- m1*(col_build)
mpfr_df <- m1*test_df # (list) object cannot be coerced to type 'double'
for(colnum in 1:length(colnames(test_df))){
test_df[,colnum] <- m1*test_df[,colnum] # attempt to replicate an object of type 'S4'
}
Answer:
Use [[colnum]] to access the columns instead of [,colnum]:
for(colnum in seq_along(test_df)){ # visit every column index, not just the last one
test_df[[colnum]] <- m1*test_df[[colnum]]
}
(Note: the print method of data.frame will fail, but the 'mpfr-izing' works. You can print the result either by printing the columns individually or by using as_tibble(test_df) from the tibble package.)
Explanation
I think the original fails because the [,colnum] assignment doesn't coerce the value being assigned. Using [[ accesses a single element (i.e. a column) of the list (i.e. the data.frame).
See this bit of Hadley Wickham's Advanced R book:
[ selects sub-lists. It always returns a list; if you use it with a
single positive integer, it returns a list of length one. [[ selects
an element within a list. $ is a convenient shorthand: x$y is
equivalent to x[["y"]].
And the help from Extract.data.frame {base}:
When [ and [[ are used to add or replace a whole column, no coercion
takes place but value will be replicated (by calling the generic
function rep) to the right length if an exact number of repeats can be
used.
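A small base-R sketch of the [ versus [[ distinction quoted above, using a hypothetical two-column data frame `df` that is not from the question:

```r
# Hypothetical toy data frame, used only to illustrate extraction
df <- data.frame(x = 1:3, y = c(2.5, 3.5, 4.5))

sub_list <- df[1]    # [ selects a sub-list: a one-column data.frame
element  <- df[[1]]  # [[ selects the element itself: the integer vector 1:3
by_name  <- df$x     # $ is shorthand for df[["x"]]

print(class(sub_list))             # "data.frame"
print(class(element))              # "integer"
print(identical(element, by_name)) # TRUE
```

Note that df[, 1] with a single column index also returns a plain vector by default (drop = TRUE), so in the question the difference lies in the assignment forms [<- and [[<-, not in what is read out.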
When I perform:
a <- seq(1,1.5,0.1)
b <- c(1,1.1,1.4,1.5)
x <- rep(c(a,b),times=c(2,1))
Error in rep(c(a, b), c(2, 1)) : invalid 'times' argument
Why?
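When we concatenate (c) two vectors, the result is a single vector; here c(a, b) has length 10. rep() requires times to have either length 1 or the same length as x, so times = c(2, 1) is invalid. If the idea is to replicate a twice and b once, we place them in a list and use rep. The output will be a list, which can be unlisted to get a vector.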
unlist(rep(list(a,b), c(2,1)))
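To see the length rule concretely, the sketch below (base R only) confirms that the original call errors and that the list approach gives the same vector as concatenating explicit rep() calls:

```r
a <- seq(1, 1.5, 0.1)      # 6 values: 1.0, 1.1, ..., 1.5
b <- c(1, 1.1, 1.4, 1.5)   # 4 values

# times has length 2, but c(a, b) has length 10, so rep() raises an error
failed <- inherits(try(rep(c(a, b), times = c(2, 1)), silent = TRUE), "try-error")

# List approach from the answer: repeat whole vectors, then flatten
x_list <- unlist(rep(list(a, b), c(2, 1)))

# Equivalent explicit form: a twice, then b once
x_explicit <- c(rep(a, 2), rep(b, 1))

print(failed)                        # TRUE
print(length(x_list))                # 16
print(identical(x_list, x_explicit)) # TRUE
```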
The marked answer is already perfect; here is an alternative using mapply:
unlist(mapply(function(x,n)rep(x,n),list(a,b),c(2,1)))
I have a data frame in R where the majority of columns are values, but there is one character column. For each column excluding the character column I want to subset the values that are over a threshold and obtain the corresponding value in the character column.
I'm unable to find a built-in dataset that contains the pattern of data I want, so a dput of my data can be accessed here.
When I use subsetting, I get the output I'm expecting:
> df[abs(df$PA3) > 0.32,1]
[1] "SSI_01" "SSI_02" "SSI_04" "SSI_05" "SSI_06" "SSI_07" "SSI_08" "SSI_09"
When I try to iterate over the columns of the data frame using apply, I get a recursion error:
> apply(df[2:10], 2, function(x) df[abs(df[[x]])>0.32, 1])
Error in .subset2(x, i, exact = exact) :
recursive indexing failed at level 2
Any suggestions where I'm going wrong?
The reason your solution didn't work is that the x being passed to your user-defined function is actually a column of df. Therefore, you could get your solution working with a small modification (replacing df[[x]] with x):
apply(df[2:10], 2, function(x) df[abs(x)>0.32, 1])
You could use the ... argument to apply to pass an extra argument. In this case, you would want to pass the first column:
apply(df[2:10], 2, function(x, y) y[abs(x) > 0.32], y=df[,1])
Yet another variation:
apply(abs(df[-1]) > .32, 2, subset, x=df[[1]])
The cute trick here is to "curry" subset by specifying the x parameter. I was hoping I could do it with [ but that doesn't deal with named parameters in the typical way because it is a primitive function :..(
A quick and non-sophisticated solution might be:
sapply(2:10, function(x) df[abs(df[,x])>0.32, 1])
Try:
lapply(df[,2:10],function(x) df[abs(x)>0.32, 1])
Or using apply:
apply(df[2:10], 2, function(x) df[abs(x)>0.32, 1])
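Because the dput is not reproduced here, the sketch below uses a small hypothetical data frame (the columns item, PA1 and PA2 and their values are invented). It also shows one behavioural difference between the answers: lapply always returns a list, while apply simplifies to a matrix when every column happens to yield the same number of matches.

```r
# Hypothetical stand-in for the question's data
df <- data.frame(
  item = c("SSI_01", "SSI_02", "SSI_03", "SSI_04"),
  PA1  = c(0.50, -0.10, 0.40, 0.05),
  PA2  = c(-0.45, 0.33, 0.10, -0.20),
  stringsAsFactors = FALSE
)
threshold <- 0.32

# Each column arrives as a numeric vector x, so index with abs(x) directly
via_lapply <- lapply(df[-1], function(x) df$item[abs(x) > threshold])
via_apply  <- apply(df[-1], 2, function(x) df$item[abs(x) > threshold])

str(via_lapply)  # named list: PA1 -> SSI_01, SSI_03; PA2 -> SSI_01, SSI_02
print(via_apply) # 2 x 2 character matrix, because both columns matched twice
```

With real data, where columns usually match different numbers of rows, apply also falls back to a list, so lapply is the more predictable choice.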
Many thanks in advance for any advice or hints.
I'm working with data frames. The simplified code is as follows:
f<-function(name){
x<-tapply(name$a,list(name$b,name$c),sum)
y<-dataset[[deparse(substitute(name))]] # line 1)
#where dataset is an already existing list object with names the same as the
#function argument. I would like to avoid inputting two arguments.
z<-vector("list",n) #where n is also defined already
for (i in 1:n){z[[i]]<-x[y[[i]],i]} # line 2)
...
}
lapply(list_names,f)
The warning message is:
In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
and the output is incorrect. While debugging, I found the conflict may lie in lines 1) and 2). However, when I call f(name) directly it works perfectly and the output is correct. I guess the problem is in lapply; I searched for a while but couldn't pin it down. Any ideas? Many thanks!
The structure of the data
Thanks Joran. Checking again, I found the problem might not lie where I described. Here is the full code; you can copy-paste it to reproduce the error.
n<-4
name1<-data.frame(a=rep(0.1,20),b=rep(1:10,each=2),c=rep(1:n,each=5),
d=rep(c("a1","a2","a3","a4","a5","a6","a7","a8","a9","a91"),each=2))
name2<-data.frame(a=rep(0.2,20),b=rep(1:10,each=2),c=rep(1:n,each=5),
d=rep(c("a1","a2","a3","a4","a5","a6","a7","a8","a9","a91"),each=2))
name3<-data.frame(a=rep(0.3,20),b=rep(1:10,each=2),c=rep(1:n,each=5),
d=rep(c("a1","a2","a3","a4","a5","a6","a7","a8","a9","a91"),each=2))
#d is the name for the observations. d corresponds to b.
dataset<-vector("list",3)
names(dataset)<-c("name1","name2","name3")
dataset[[1]]<-list(c(1,2),c(1,2,3,4),c(1,2,3,4,5,10),c(4,5,8))
dataset[[2]]<-list(c(1,2,3,5),c(1,2),c(1,2,10),c(2,3,4,5,8,10))
dataset[[3]]<-list(c(3,5,8,10),c(1,2,5,7),c(1,2,3,4,5),c(2,3,4,6,9))
f<-function(name){
x<-tapply(name$a,list(name$b,name$c),sum)
rownames(x)<-sort(unique(name$d)) #the row names for
y<-dataset[[deparse(substitute(name))]]
z<-vector("list",n)
for (i in 1:n){
z[[i]]<-x[y[[i]],i]}
nn<-length(unique(unlist(sapply(z,names)))) # the number of names appeared
names_<-sort(unique(unlist(sapply(z,names)))) # the names appeared add to the matrix
# below
m<-matrix(,nrow=nn,ncol=n);rownames(m)<-names_
index<-vector("list",n)
for (i in 1:n){
index[[i]]<-match(names(z[[i]]),names_)
m[index[[i]],i]<-z[[i]]
}
return(m)
}
list_names<-vector("list",3)
list_names[[1]]<-name1;list_names[[2]]<-name2;list_names[[3]]<-name3
names(list_names)<-c("name1","name2","name3")
lapply(list_names,f)
f(name1)
lapply(list_names,f) fails, but f(name1) produces exactly the matrix I want. Thanks again.
Why it doesn't work
The issue is that the call stack doesn't look the same in both cases. Inside lapply, it looks like
[[1]]
lapply(list_names, f) # lapply(X = list_names, FUN = f)
[[2]]
FUN(X[[1L]], ...)
In the expression being evaluated, f is called FUN, and the expression supplied for its argument name is X[[1L]].
When you call f directly, the stack is simply
[[1]]
f(name1) # f(name = name1)
Usually this doesn't matter, but with substitute it does, because substitute returns the unevaluated expression supplied for the function argument, not its value. When you get to
y<-dataset[[deparse(substitute(name))]]
inside lapply it's looking for the element in dataset named X[[1L]], and there isn't one, so y is bound to NULL.
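A minimal sketch of the difference, using a hypothetical helper show_arg() that simply deparses whatever expression was supplied for its argument. The exact string seen inside lapply depends on the R version (the stack above shows X[[1L]]; newer versions may show something like X[[i]]), but it is never the caller's variable name.

```r
# Hypothetical helper: returns the expression passed as `arg`, as text
show_arg <- function(arg) deparse(substitute(arg))

name1 <- data.frame(a = 1:2)

direct     <- show_arg(name1)                            # called directly
via_lapply <- lapply(list(name1 = name1), show_arg)[[1]] # called by lapply

print(direct)      # "name1"
print(via_lapply)  # an expression such as "X[[i]]", not "name1"
```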
A way to get it to work
The simplest way to deal with this is probably to just have f operate on character strings and pass names(list_names) to lapply. This can be accomplished fairly easily by changing the beginning of f to
f<-function(name){
passed.name <- name
name <- list_names[[name]]
x<-tapply(name$a,list(name$b,name$c),sum)
rownames(x)<-sort(unique(name$d)) #the row names for
y<-dataset[[passed.name]]
# the rest of f...
and changing lapply(list_names, f) to lapply(names(list_names), f). This should give you what you want with nearly minimal modification, but you might also consider renaming some of your variables so the word name isn't used for so many different things: the function names, the argument of f, and all the various variables containing name.
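A compact sketch of that idea with hypothetical toy versions of list_names and dataset (the real f is much longer). Since lapply(names(list_names), f) returns an unnamed list, sapply(..., simplify = FALSE) is shown as one way to keep the names attached to the results.

```r
# Hypothetical toy inputs standing in for the question's objects
list_names <- list(name1 = data.frame(a = 1:3), name2 = data.frame(a = 4:6))
dataset    <- list(name1 = c(1, 3), name2 = 2)

# f takes a character name, so no substitute()/deparse() is needed
f <- function(name) {
  df   <- list_names[[name]]  # look up the data frame by name
  rows <- dataset[[name]]     # matching entry of dataset
  df$a[rows]
}

unnamed <- lapply(names(list_names), f)                   # list without names
named   <- sapply(names(list_names), f, simplify = FALSE) # names kept
```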
I'd like to know the reason why the following does not work on the matrix structure I have posted here (I've used the dput command).
When I try running:
apply(mymatrix, 2, sum)
I get:
Error in FUN(newX[, i], ...) : invalid 'type' (list) of argument
However, when I check to make sure it's a matrix I get the following:
is.matrix(mymatrix)
[1] TRUE
I realize that I can get around this problem by unlisting the data into a temp variable and then just recreating the matrix, but I'm curious why this is happening.
?is.matrix says:
'is.matrix' returns 'TRUE' if 'x' is a vector and has a '"dim"'
attribute of length 2 and 'FALSE' otherwise.
Your object is a list with a dim attribute. A list is a type of vector (even though it is not an atomic type, which is what most people think of as vectors), so is.matrix returns TRUE. For example:
> l <- as.list(1:10)
> dim(l) <- c(10,1)
> is.matrix(l)
[1] TRUE
To convert mymatrix to an atomic matrix, you need to do something like this:
mymatrix2 <- unlist(mymatrix, use.names=FALSE)
dim(mymatrix2) <- dim(mymatrix)
# now your apply call will work
apply(mymatrix2, 2, sum)
# but you should really use (if you're really just summing columns)
colSums(mymatrix2)
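A self-contained sketch of both points with a hypothetical six-element list-matrix l: apply() fails because each column it hands to sum() is a list, and unlist() plus dim<- restores an atomic matrix.

```r
# Hypothetical list-matrix: a list of numbers with a dim attribute
l <- as.list(c(1, 2, 3, 4, 5, 6))
dim(l) <- c(3, 2)

print(is.matrix(l))  # TRUE: a list is a (non-atomic) vector
print(typeof(l))     # "list"

# sum() rejects list arguments, so apply() over the columns errors
apply_failed <- inherits(try(apply(l, 2, sum), silent = TRUE), "try-error")

# Flatten to an atomic vector, then restore the original dimensions
m2 <- unlist(l, use.names = FALSE)
dim(m2) <- dim(l)

print(colSums(m2))   # 6 15
```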
The elements of your matrix are not numeric; they are lists. To see this, you can do:
apply(m,2, class) # here m is your matrix
So if you want the column sums, you have to coerce the elements to numeric first and then apply colSums, which is a faster equivalent of apply(x, 2, sum):
colSums(apply(m, 2, as.numeric)) # this will give you the sum you want.