as.matrix not preserving the data mode of an empty data.frame - r

I have found something odd today, I wanted to ask you if there was a logical reason for what I am seeing, or if you think this is a bug that should be reported to the R-devel team:
df <- data.frame(a = 1L:10L)
class(df$a)
# [1] "integer"
m <- as.matrix(df)
class(m[, "a"])
# [1] "integer"
No surprise so far: as.matrix preserves the data mode, here "integer". However, with an empty (no rows) data.frame:
df <- data.frame(a = integer(0))
class(df$a)
# [1] "integer"
m <- as.matrix(df)
class(m[, "a"])
# [1] "logical"
Any idea why the mode changes from "integer" to "logical" here? I am using R version 2.13.1.
Thank you.

This is because of this one line in as.matrix.data.frame:
if (any(dm == 0L)) return(array(NA, dim = dm, dimnames = dn))
Basically, if any dimensions are zero, you get an array "full" of NA. I say "full" because there aren't really any observations because one of the dimensions is zero.
The reason the class is logical is because that's the class of NA. There are special NA constants for the other atomic types, but they're not really necessary here. For example:
> class(NA)
[1] "logical"
> class(NA_integer_)
[1] "integer"
> class(NA_real_)
[1] "numeric"
> class(NA_complex_)
[1] "complex"
> class(NA_character_)
[1] "character"
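If you need the empty matrix to keep its integer mode, one workaround (a sketch, not from the original answer) is to reset the storage mode after the conversion:

```r
# Workaround sketch: as.matrix() on a zero-row data.frame returns a
# matrix filled from NA (logical on the asker's R version), so coerce
# the storage mode back afterwards. Dimensions and dimnames are kept.
df <- data.frame(a = integer(0))
m <- as.matrix(df)
storage.mode(m) <- "integer"
class(m[, "a"])
# [1] "integer"
```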

Related

VarSelLCM returns error but input variables seem to be factors and integer

I have this simple dataset
https://www.mediafire.com/file/ntmu0tvtpm73h2i/data.xlsx/file
library(readxl)
library(VarSelLCM)
data <- read_xlsx("C:/User/data.xlsx")
data$type <- as.factor(data$type)
data$T3 <-as.integer(data$T3)
data$T5 <-as.integer(data$T5)
data$T14 <-as.integer(data$T14)
data$T15 <-as.integer(data$T15)
data$T18 <-as.integer(data$T18)
data$T22 <-as.integer(data$T22)
When I run lapply(data, class)
I get
$`T3`
[1] "integer"
$T5
[1] "integer"
$T14
[1] "integer"
$T15
[1] "integer"
$T18
[1] "integer"
$T22
[1] "integer"
$type
[1] "factor"
So everything seems to be OK.
But when I run
res_with <- VarSelCluster(data, gvals = 1:4, nbcores = 1, crit.varsel = "BIC")
I get the Error
Error in VSLCMdataMixte(x) :
At least one variable is neither numeric, integer nor factor!
The problem perhaps could be that when I run
> print(typeof(data$type))
I get this:
[1] "integer"
So data$type seems to be integer, even though I converted it to a factor? And yet lapply shows it is indeed a factor.
But even if data$type really were integer, that should be fine for VarSelCluster, since it accepts integer variables. This is totally confusing.
How could I solve this?
VarSelCluster requires a plain data.frame, but read_xlsx() returns a tibble, so first you need to convert it back, e.g. with data <- as.data.frame(data) (setting class(data) <- "data.frame" also works). Note that typeof(data$type) returning "integer" is expected: factors are stored as integer codes, so typeof() is not the right check here; class() is.
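A minimal sketch of that fix (the column names here are stand-ins for the asker's data; the extra classes mimic what read_xlsx() returns):

```r
# A tibble carries classes c("tbl_df", "tbl", "data.frame") that
# VSLCMdataMixte() does not recognise; as.data.frame() strips them.
d <- data.frame(T3 = 1:3, type = factor(c("a", "b", "a")))
class(d) <- c("tbl_df", "tbl", "data.frame")  # mimic read_xlsx()'s return
d <- as.data.frame(d)                         # back to a bare data.frame
class(d)
# [1] "data.frame"
```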

`write.dbf` fails with an object of class `tbl_df`

I do a lot of my work with .dbf files, and also with dplyr. There's a bug in write.dbf() that prevents writing a tbl_df object to a .dbf file.
Unfortunately, the error message is poorly written and it's therefore difficult to figure out exactly what is happening.
Here's a MWE
library(dplyr)
library(foreign)
d <- data_frame( x = 1:4, y = rnorm(4) )
write.dbf(d, "test.dbf")
Error in write.dbf(d, "test.dbf") : unknown column type in data frame
The solution here is to force the class of d to a bare data.frame
class(d)
[1] "tbl_df" "tbl" "data.frame"
df <- as.data.frame(d)
class(df)
[1] "data.frame"
write.dbf(df, "test.dbf") # works
I've filed a bug report with the foreign people, but hopefully this post can save someone else some pain.
I'm not sure it's fair to assert a bug in foreign. Consider this:
library(dplyr)
df <- data.frame(x=1:10, y=11:20)
class(df)
# [1] "data.frame"
mode(df$x) # as expected
# [1] "numeric"
mode(df[,"x"]) # as expected
# [1] "numeric"
dp <- data_frame(x=1:10, y=11:20)
class(dp)
# [1] "tbl_df" "tbl" "data.frame"
mode(dp$x)
# [1] "numeric" # as expected
mode(dp[,"x"])
# [1] "list" # WTF?!
There are many, many functions in R that use, e.g., mode(my.data.frame[,"mycolumn"]) to test the mode of a column in a dataframe, but with a tbl_df object, the mode returned is "list".
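For code that has to work with both, a defensive option (a sketch) is to extract columns with [[ or $, which return the bare vector for plain data.frames and, as far as I know, for tibbles as well:

```r
# `[[` and `$` return the column vector itself, so mode() behaves the
# same whether the object is a bare data.frame or a tbl_df.
df <- data.frame(x = 1:10, y = 11:20)
mode(df[["x"]])
# [1] "numeric"
mode(df$x)
# [1] "numeric"
```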

How to avoid implicit character conversion when using apply on dataframe

When using apply on a data.frame, the arguments are (implicitly) converted to character. An example:
df <- data.frame(v=1:10, t=1:10)
df <- transform(df, t2 = as.POSIXlt(t, origin = "2013-08-13"))
class(df$t2[1])
## [1] "POSIXct" "POSIXt" (correct)
but:
apply(df, 1, function(y) class(y["t2"]))
## [1] "character" "character" "character" "character" "character" "character"
## [7] "character" "character" "character" "character"
Is there any way to avoid this conversion? Or do I always have to convert back through as.POSIXlt(y["t2"])?
edit
My df has 2 timestamps (say, t2 and t3) and some other fields (say, v1, v2). For each row with a given t2, I want to find the k (e.g. 3) rows with t3 closest to, but lower than, t2 (and the same v1), and return a statistic over v2 from those rows (e.g. an average). I wrote a function f(t2, v1, df) and just wanted to apply it to all rows using apply(df, 1, function(y) f(y["t2"], y["v1"], df)). Is there a better way to do such things in R?
Let's wrap up multiple comments into an explanation.

1. The use of apply converts a data.frame to a matrix. This means that the least restrictive class will be used; the least restrictive class in this case is character.
2. You're supplying 1 to apply's MARGIN argument. This applies by row and makes you even worse off, since you're now really mixing classes together. In this scenario you're using apply, which is designed for matrices and data.frames, on what amounts to a vector. This is not the right tool for the job.

In this case I'd use lapply or sapply, as rmk points out, to grab the classes of the single t2 column, as seen below:
Code:
df <- data.frame(v=1:10, t=1:10)
df <- transform(df, t2 = as.POSIXlt(t, origin = "2013-08-13"))
sapply(df[, "t2"], class)
lapply(df[, "t2"], class)
## [[1]]
## [1] "POSIXct" "POSIXt"
##
## [[2]]
## [1] "POSIXct" "POSIXt"
##
## [[3]]
## [1] "POSIXct" "POSIXt"
##
## .
## .
## .
##
## [[9]]
## [1] "POSIXct" "POSIXt"
##
## [[10]]
## [1] "POSIXct" "POSIXt"
In general, choose the member of the apply family that fits the job. Often I personally use lapply or a for loop to act on specific columns, or subset the columns I want using indexing ([, ]) and then proceed with apply. The answer to this problem really boils down to determining what you want to accomplish, asking whether apply is the most appropriate tool, and proceeding from there.
May I offer this blog post as an excellent tutorial on what the different apply family of functions do.
Try:
sapply(df, function(y) class(y["t2"]))
$v
[1] "integer"
$t
[1] "integer"
$t2
[1] "POSIXct" "POSIXt"
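If you do need to visit rows one at a time while keeping each column's class, a hedged alternative to apply(df, 1, ...) is to iterate over row indices and subset the columns directly:

```r
# Iterating over indices avoids the matrix conversion entirely, so the
# POSIXct column keeps its class inside the loop body.
df <- data.frame(v = 1:10, t = 1:10)
df <- transform(df, t2 = as.POSIXlt(t, origin = "2013-08-13"))
classes <- lapply(seq_len(nrow(df)), function(i) class(df$t2[i]))
classes[[1]]
# [1] "POSIXct" "POSIXt"
```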

p.adjust error: 'orderVector1'

I'm having a problem with the function p.adjust. I have a list containing 741 p-values and I want to use the p.adjust() function to correct for multiple testing (FDR testing). This is what I have so far:
> x <- as.vector(pvalues1)
> p.adjust(x, method="fdr", n=length(x))
But I get the following error
Error in order(p, decreasing = TRUE) :
unimplemented type 'list' in 'orderVector1'
Can anyone help me with this?
The problem is that your list containing the p-values is already a vector, just not a numeric one. What you wanted was a numeric vector; a list is just a generic vector:
> l <- list(A = runif(1), B = runif(1))
> l
$A
[1] 0.7053136
$B
[1] 0.7053284
> as.vector(l)
$A
[1] 0.7053136
$B
[1] 0.7053284
> is.vector(l)
[1] TRUE
One option is to unlist() the list, to produce a numeric vector:
> unlist(l)
A B
0.7053136 0.7053284
The benefit of that is that it preserves the names. An alternative is plain old as.numeric(), which loses the names but is otherwise the same as unlist():
> as.numeric(l)
[1] 0.7053136 0.7053284
For big vectors, you might not want to use the names in unlist(), so an alternative that will speed that version up is:
> unlist(l, use.names = FALSE)
[1] 0.7053136 0.7053284
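Putting it together for the original problem (a sketch; pvalues1 here is a small stand-in for the asker's list of 741 p-values):

```r
pvalues1 <- list(a = 0.01, b = 0.04, c = 0.20)  # stand-in data
x <- unlist(pvalues1)               # numeric vector, names preserved
adj <- p.adjust(x, method = "fdr")  # n defaults to length(x)
class(adj)
# [1] "numeric"
```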

How to read logical data from a file in R

I generated a file that contains a logical value, either TRUE or FALSE, on each line. Now I would like to read the logical data from the file into R. However, the data that are read in are of mode "character", not logical. I was wondering how to read the data from the file as logical values.
My R code is
cat(FALSE,"\n", file="1.txt", append=FALSE);
for (i in 2:5) cat(TRUE,"\n",file="1.txt", append=TRUE);
a=scan(file="1.txt", what="logical")
The output is:
> mode(a)
[1] "character"
> mode(a[1])
[1] "character"
> a[1]
[1] "FALSE"
I want a[1] to be logical value.
Thanks and regards!
Ah, now I get it. You have to read ?scan very carefully to see that what you've done is not what scan() wants for the what argument. I missed this the first time and then wondered why your code wasn't working. This is the key section:
what: the type of ‘what’ gives the type of data to be read. The
supported types are ‘logical’, ‘integer’, ‘numeric’,
‘complex’, ‘character’, ‘raw’ and ‘list’.
and the key phrase is type. So you need to pass an object of the correct type to argument what.
In your example:
> typeof("logical")
[1] "character"
So scan() reads in an object of type "character".
The solution is simply to use what = TRUE, or indeed anything that R considers a logical (see comments to this answer), instead
> typeof(TRUE)
[1] "logical"
> ## or
> typeof(logical())
[1] "logical"
## So now read in with what = TRUE
> a <- scan(file="1.txt", what = TRUE)
Read 5 items
> class(a)
[1] "logical"
> typeof(a)
[1] "logical"
read.table() is more logical in how you tell it what the data to be read in are. The equivalent call there would be:
> b <- read.table("1.txt", colClasses = "logical")[,]
> class(b)
[1] "logical"
> typeof(b)
[1] "logical"
HTH
Alternatively, convert after reading: a <- a == "TRUE".
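A related post-hoc fix (sketch): if the values were already read in as character, as.logical() converts the strings directly:

```r
# as.logical() maps "TRUE"/"FALSE" (and "T"/"F") to logical values.
a <- c("FALSE", "TRUE", "TRUE")  # character values as scan(what = "logical") returns them
a <- as.logical(a)
typeof(a)
# [1] "logical"
```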
