Can expand.grid create "numeric" instead of "list"?

I was expecting this to produce an object of mode "numeric":
R> mode(expand.grid(c(1,2),c(3,4)))
[1] "list"
Is there an easy way to make it "numeric"?

The values you are generating are numeric; expand.grid() simply returns them in a data frame, which is a list of columns (hence mode "list"). The code below shows how to turn the result into a matrix instead:
> x <- as.matrix(expand.grid(c(1,2), c(3,4)))
> x
     Var1 Var2
[1,]    1    3
[2,]    2    3
[3,]    1    4
[4,]    2    4
As you can see, the matrix is numeric and each of its columns is a numeric vector:
> str(x)
num [1:4, 1:2] 1 2 1 2 3 3 4 4
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:2] "Var1" "Var2"
> x[,1]
[1] 1 2 1 2
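To make the difference concrete, here is a minimal sketch that checks the class and mode before and after the conversion. The variable names g and m are only illustrative.

```r
# expand.grid() returns a data frame, i.e. a list of equal-length columns
g <- expand.grid(c(1, 2), c(3, 4))
class(g)          # "data.frame"
mode(g)           # "list"
mode(g$Var1)      # the individual columns are already numeric

# as.matrix() packs the columns into a single numeric matrix
m <- as.matrix(g)
mode(m)           # "numeric"
dim(m)            # 4 rows, 2 columns
```

If the data frame contained a non-numeric column, as.matrix() would return a character matrix instead, so this only works as intended when all inputs are numeric.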

Related

Bind_rows() error: "Argument 1 must have names" // Occurs after tidyverse update

After a global tidyverse update, I noticed a change of behaviour in my code, and after a lot of searching I am still unable to solve the issue. Basically, I need to convert a list of elements (including lists) to a data frame.
Here is a reprex:
Data
x <- list(
col1 = list("a", "b", "c", NA),
col2 = list(1, 2, 3, 4),
col3 = list("value1", "value2", "value1", c("value1", "value2")))
Expected behaviour and output (before tidyverse update):
x <- data.frame((sapply(x, c)))
x <- purrr::map_df(x, function(x) sapply(x, function(x) unlist(x))) %>% as.data.frame()
> x
# col1 col2 col3
# 1 a 1 value1
# 2 b 2 value2
# 3 c 3 value1
# 4 <NA> 4 value1, value2
> str(x)
# 'data.frame': 4 obs. of 3 variables:
# $ col1: chr "a" "b" "c" NA
# $ col2: num 1 2 3 4
# $ col3:List of 4
# ..$ : chr "value1"
# ..$ : chr "value2"
# ..$ : chr "value1"
# ..$ : chr "value1" "value2"
Problem encountered after update
x <- data.frame((sapply(x, c)))
x <- purrr::map_df(x, function(x) sapply(x, function(x) unlist(x)))
# Error: Argument 1 must have names.
# Run `rlang::last_error()` to see where the error occurred.
# In addition: Warning message:
# Outer names are only allowed for unnamed scalar atomic inputs
> rlang::last_error()
# <error/rlang_error>
# Argument 1 must have names.
# Backtrace:
# 1. purrr::map_df(x, function(x) sapply(x, function(x) unlist(x)))
# 2. dplyr::bind_rows(res, .id = .id)
# Run `rlang::last_trace()` to see the full context.
This error seems to be well known. I have explored many options, including the purrr::flatten_() family and others found on Stack Overflow, but was not able to solve it.
Thanks in advance for any help!
The first part of your attempt gives you a list for every column, irrespective of its length.
x <- data.frame((sapply(x, c)))
str(x)
#'data.frame': 4 obs. of 3 variables:
# $ col1:List of 4
# ..$ : chr "a"
# ..$ : chr "b"
# ..$ : chr "c"
# ..$ : logi NA
# $ col2:List of 4
# ..$ : num 1
# ..$ : num 2
# ..$ : num 3
# ..$ : num 4
# $ col3:List of 4
# ..$ : chr "value1"
# ..$ : chr "value2"
# ..$ : chr "value1"
# ..$ : chr "value1" "value2"
You can unlist the above for columns with only 1 element.
x[] <- lapply(x, function(p) if(max(lengths(p)) == 1) unlist(p) else p)
x
# col1 col2 col3
#1 a 1 value1
#2 b 2 value2
#3 c 3 value1
#4 <NA> 4 value1, value2
str(x)
#'data.frame': 4 obs. of 3 variables:
# $ col1: chr "a" "b" "c" NA
# $ col2: num 1 2 3 4
# $ col3:List of 4
# ..$ : chr "value1"
# ..$ : chr "value2"
# ..$ : chr "value1"
# ..$ : chr "value1" "value2"
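The same idea can be packaged as a small helper that works directly on the original list, without going through sapply(). This is only a sketch: simplify_cols is a hypothetical name, and it assumes every element of the input list has the same length.

```r
x <- list(
  col1 = list("a", "b", "c", NA),
  col2 = list(1, 2, 3, 4),
  col3 = list("value1", "value2", "value1", c("value1", "value2"))
)

# Hypothetical helper: flatten a column when every entry has length 1,
# otherwise keep it as a list column (I() protects it inside data.frame())
simplify_cols <- function(lst) {
  cols <- lapply(lst, function(p) {
    if (all(lengths(p) == 1)) unlist(p) else I(p)
  })
  as.data.frame(cols, stringsAsFactors = FALSE)
}

res <- simplify_cols(x)
str(res)
```

Note that the list column carries the extra class "AsIs" because of I(); that is harmless for most uses.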
One option utilizing dplyr, tibble and purrr could be:
library(dplyr); library(purrr); library(tibble)
imap_dfc(x, ~ tibble(!!.y := .x)) %>%
  mutate(across(where(~ all(lengths(.) == 1)), ~ unlist(.)))
col1 col2 col3
<chr> <dbl> <list>
1 a 1 <chr [1]>
2 b 2 <chr [1]>
3 c 3 <chr [1]>
4 <NA> 4 <chr [2]>
Not a tidyverse solution, but it seems to work:
library( rlist )
as.data.frame( rlist::list.cbind( x ) )
# col1 col2 col3
# 1 a 1 value1
# 2 b 2 value2
# 3 c 3 value1
# 4 NA 4 value1, value2

How to use tryCatch when the outcome is not distinct

I have written a function that returns a data frame with two variables. As a simple example, let's take:
test <- function(x) {y <- matrix( 5 , nrow= x , ncol = 2)
z<- data.frame(y)
return(z) }
I want to find out for which x values this function gives an error (in this example I think it is negative values, but I just want to convey the concept). So I try:
z <- rep(0)
testnumbers <- c(0,1,2,3,4,-1,5)
for (i in 1:length(testnumbers)) {
tempo <- tryCatch( testfun(testnumbers[i]) , error= function(e) return(0) )
if (tempo == 0 ) z[i] <- {testnumbers[i] next}
}
What is wrong with my approach, and how can I find where my function does not work?
If you're looking to run all of the testnumbers regardless of any of them failing, I suggest a slightly different tack.
Base R
This borrows from Rui's use of inherits which is more robust and unambiguous. It goes one step further by preserving not just which one had the error, but the actual error text as well:
testfun <- function(x) {
y <- matrix(5, nrow = x, ncol = 2)
z <- as.data.frame(y)
z
}
testnumbers <- c(0, 1, 2, 3, 4, -1, 5)
rets <- setNames(
lapply(testnumbers, function(n) tryCatch(testfun(n), error=function(e) e)),
testnumbers
)
sapply(rets, inherits, "error")
# 0 1 2 3 4 -1 5
# FALSE FALSE FALSE FALSE FALSE TRUE FALSE
Filter(function(a) inherits(a, "error"), rets)
# $`-1`
# <simpleError in matrix(5, nrow = x, ncol = 2): invalid 'nrow' value (< 0)>
(The setNames(lapply(...), ...) is because the inputs are numbers so sapply(..., simplify=F) did not preserve the names, something I thought was important.)
All of this falls in line with what some consider good practice: if you're doing one function to a lot of "things", then do it in a list, and therefore in one of the *apply functions.
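As a small extension of the base R approach, the failing inputs and their messages can be pulled out as plain vectors. This is a sketch that repeats the definitions above so it runs on its own; failed and msgs are illustrative names.

```r
testfun <- function(x) {
  y <- matrix(5, nrow = x, ncol = 2)
  as.data.frame(y)
}
testnumbers <- c(0, 1, 2, 3, 4, -1, 5)

# Run every input, keeping the condition object when a call fails
rets <- setNames(
  lapply(testnumbers, function(n) tryCatch(testfun(n), error = function(e) e)),
  testnumbers
)

# Logical flag per input: did it fail?
failed <- vapply(rets, inherits, logical(1), what = "error")

# Inputs that failed, and the message each one produced
testnumbers[failed]
msgs <- vapply(rets[failed], conditionMessage, character(1))
msgs
```

Using vapply() instead of sapply() guarantees the result type, which matters if the list of inputs can be empty.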
tidyverse
There is a function in purrr that formalizes this a little: safely, which returns a function wrapped around its argument. For instance:
library(purrr)
safely(testfun)
# function (...)
# capture_error(.f(...), otherwise, quiet)
# <environment: 0x0000000015151d90>
It is returning a function that can then be passed. A one-time call would look like one of the following:
safely(testfun)(0)
# $result
# [1] V1 V2
# <0 rows> (or 0-length row.names)
# $error
# NULL
testfun_safe <- safely(testfun)
testfun_safe(0)
# $result
# [1] V1 V2
# <0 rows> (or 0-length row.names)
# $error
# NULL
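To see roughly what such a wrapper does, here is a base R sketch of the same idea. It is not purrr's implementation: safely_base is a hypothetical name, and it ignores the otherwise and quiet arguments that appear in the printed function above.

```r
# Hypothetical base R analogue of a "safe" wrapper: the returned function
# always yields a list with a result slot and an error slot
safely_base <- function(f) {
  function(...) {
    tryCatch(
      list(result = f(...), error = NULL),
      error = function(e) list(result = NULL, error = e)
    )
  }
}

testfun <- function(x) as.data.frame(matrix(5, nrow = x, ncol = 2))
testfun_safe2 <- safely_base(testfun)

ok  <- testfun_safe2(2)    # result filled in, error is NULL
bad <- testfun_safe2(-1)   # result is NULL, error holds the condition
conditionMessage(bad$error)
```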
To use it here, you can do:
rets <- setNames(
lapply(testnumbers, safely(testfun)),
testnumbers
)
str(rets[5:6])
# List of 2
# $ 4 :List of 2
# ..$ result:'data.frame': 4 obs. of 2 variables:
# .. ..$ V1: num [1:4] 5 5 5 5
# .. ..$ V2: num [1:4] 5 5 5 5
# ..$ error : NULL
# $ -1:List of 2
# ..$ result: NULL
# ..$ error :List of 2
# .. ..$ message: chr "invalid 'nrow' value (< 0)"
# .. ..$ call : language matrix(5, nrow = x, ncol = 2)
# .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
Filter(Negate(is.null), sapply(rets, `[[`, "error"))
# $`-1`
# <simpleError in matrix(5, nrow = x, ncol = 2): invalid 'nrow' value (< 0)>
and to get to the results of all runs (including the errant one):
str(sapply(rets, `[[`, "result"))
# List of 7
# $ 0 :'data.frame': 0 obs. of 2 variables:
# ..$ V1: num(0)
# ..$ V2: num(0)
# $ 1 :'data.frame': 1 obs. of 2 variables:
# ..$ V1: num 5
# ..$ V2: num 5
# $ 2 :'data.frame': 2 obs. of 2 variables:
# ..$ V1: num [1:2] 5 5
# ..$ V2: num [1:2] 5 5
# $ 3 :'data.frame': 3 obs. of 2 variables:
# ..$ V1: num [1:3] 5 5 5
# ..$ V2: num [1:3] 5 5 5
# $ 4 :'data.frame': 4 obs. of 2 variables:
# ..$ V1: num [1:4] 5 5 5 5
# ..$ V2: num [1:4] 5 5 5 5
# $ -1: NULL
# $ 5 :'data.frame': 5 obs. of 2 variables:
# ..$ V1: num [1:5] 5 5 5 5 5
# ..$ V2: num [1:5] 5 5 5 5 5
or just the results without the failed run:
str(Filter(Negate(is.null), sapply(rets, `[[`, "result")))
# List of 6
# $ 0:'data.frame': 0 obs. of 2 variables:
# ..$ V1: num(0)
# ..$ V2: num(0)
# $ 1:'data.frame': 1 obs. of 2 variables:
# ..$ V1: num 5
# ..$ V2: num 5
# $ 2:'data.frame': 2 obs. of 2 variables:
# ..$ V1: num [1:2] 5 5
# ..$ V2: num [1:2] 5 5
# $ 3:'data.frame': 3 obs. of 2 variables:
# ..$ V1: num [1:3] 5 5 5
# ..$ V2: num [1:3] 5 5 5
# $ 4:'data.frame': 4 obs. of 2 variables:
# ..$ V1: num [1:4] 5 5 5 5
# ..$ V2: num [1:4] 5 5 5 5
# $ 5:'data.frame': 5 obs. of 2 variables:
# ..$ V1: num [1:5] 5 5 5 5 5
# ..$ V2: num [1:5] 5 5 5 5 5
You were actually quite close. I'm not sure which change did the trick in the end, but I:
Changed 1:length(testnumbers) to loop over testnumbers directly, as the index is unnecessary
Changed return(0) to return a character string
Wrapped your if in another if, as it kept failing when the length was larger than 1 or could not be assessed
With these changes you get the correct results. You could try changing the code bit by bit to see what was wrong.
test <- function(x) {y <- matrix( 5 , nrow = x , ncol = 2)
z<- data.frame(y)
return(z) }
errored <- numeric()
testnumbers <- c(0,1,2,3,4,-1,5)
for (i in testnumbers) {
tempo <- tryCatch(test(i), error = function(e) "error")
if (length(tempo) == 1) {
if (tempo == "error") errored <- c(errored, i)
}
}
errored
# [1] -1
You need tryCatch to return the error, not zero.
testfun <- function(x) {
y <- matrix(5, nrow = x, ncol = 2)
z <- as.data.frame(y)
z
}
testnumbers <- c(0, 1, 2, 3, 4, -1, 5)
z <- numeric(length(testnumbers))
for (i in seq_along(testnumbers)) {
tempo <- tryCatch(testfun(testnumbers[i]), error = function(e) e)
if (inherits(tempo, "error")) {
z[i] <- testnumbers[i]
}
}
z
#[1] 0 0 0 0 0 -1 0
Also,
In order to coerce a matrix to data.frame use as.data.frame.
I have removed the calls to return since the last value of a function is its return value.
rep(0) is the same as just 0, replaced by numeric(length(testnumbers)).
seq_along(testnumbers) is always better than 1:length(testnumbers). Try it with testnumbers of length zero and see what happens.
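The last point is easy to check with a zero-length vector. In this sketch, empty, n_colon and n_seq are illustrative names; the counters record how many times each loop body runs.

```r
empty <- numeric(0)

1:length(empty)    # 1:0 counts down, giving 1 0
seq_along(empty)   # integer(0)

# Count the iterations each form produces
n_colon <- 0
for (i in 1:length(empty)) n_colon <- n_colon + 1

n_seq <- 0
for (i in seq_along(empty)) n_seq <- n_seq + 1

c(colon = n_colon, seq_along = n_seq)
```

With 1:length() the body runs twice on an empty input (for i = 1 and i = 0), whereas seq_along() skips the loop entirely.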

Getting back original names from rpart.object

I have saved models which were created using the rpart package in R. I am trying to retrieve some information from these saved models, specifically from rpart.object. While the documentation (rpart doc) is helpful, there are a few things it is not clear about:
How do I find out which variables are categorical and which are numeric? Currently, what I do is refer to the 'index' column in the splits matrix. I've noticed that for numeric variables only, the entry is not an integer. Is there a cleaner way to do this?
The csplit matrix refers to the various values a categorical variable can take using integers, i.e. R maps the original names to integers. Is there a way to access this mapping? For example, if my original variable, say Country, can take any of the values France, Germany, Japan etc., the csplit matrix lets me know that a certain split is based on Country == 1, 2. Here, rpart has replaced references to France, Germany with 1, 2 respectively. How do I get the original names (France, Germany, Japan) back from the model file? Also, how do I know what the mapping between the names and the integers is?
Generally it is the terms component that would have that sort of information. See ?rpart::rpart.object.
fit <- rpart::rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
fit$terms # notice that the attribute dataClasses has the information
attr(fit$terms, "dataClasses")
#------------
Kyphosis Age Number Start
"factor" "numeric" "numeric" "numeric"
That example doesn't have a csplit node in its structure because none of the variables are factors. You could make one fairly easily:
> fit <- rpart::rpart(Kyphosis ~ Age + factor(findInterval(Number,c(0,4,6,Inf))) + Start, data = kyphosis)
> fit$csplit
[,1] [,2] [,3]
[1,] 1 1 3
[2,] 1 1 3
[3,] 3 1 3
[4,] 1 3 3
[5,] 3 1 3
[6,] 3 3 1
[7,] 3 1 3
[8,] 1 1 3
> attr(fit$terms, "dataClasses")
Kyphosis
"factor"
Age
"numeric"
factor(findInterval(Number, c(0, 4, 6, Inf)))
"factor"
Start
"numeric"
The integers are just the values of the factor variables so the "mapping" is just the same as it would be from as.numeric() to the levels() of a factor. If I were trying to construct a character matrix version of the fit$csplit-matrix that substituted the names of the levels in a factor variable, this would be one path to success:
> kyphosis$Numlev <- factor(findInterval(kyphosis$Number, c(0, 4, 6, Inf)), labels=c("low","med","high"))
> str(kyphosis)
'data.frame': 81 obs. of 5 variables:
$ Kyphosis: Factor w/ 2 levels "absent","present": 1 1 2 1 1 1 1 1 1 2 ...
$ Age : int 71 158 128 2 1 1 61 37 113 59 ...
$ Number : int 3 3 4 5 4 2 2 3 2 6 ...
$ Start : int 5 14 5 1 15 16 17 16 16 12 ...
$ Numlev : Factor w/ 3 levels "low","med","high": 1 1 2 2 2 1 1 1 1 3 ...
> fit <- rpart::rpart(Kyphosis ~ Age +Numlev + Start, data = kyphosis)
> Levels <- fit$csplit
> Levels[] <- levels(kyphosis$Numlev)[Levels]
> Levels
[,1] [,2] [,3]
[1,] "low" "low" "high"
[2,] "low" "low" "high"
[3,] "high" "low" "high"
[4,] "low" "high" "high"
[5,] "high" "low" "high"
[6,] "high" "high" "low"
[7,] "high" "low" "high"
[8,] "low" "low" "high"
Response to comment: If you only have the model then use str() to look at it. I see an "ordered" leaf in the example I created that has the factor labels stored in an attribute named "xlevels":
$ ordered : Named logi [1:3] FALSE FALSE FALSE
..- attr(*, "names")= chr [1:3] "Age" "Numlev" "Start"
- attr(*, "xlevels")=List of 1
..$ Numlev: chr [1:3] "low" "med" "high"
- attr(*, "ylevels")= chr [1:2] "absent" "present"
- attr(*, "class")= chr "rpart"
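Putting the two pieces together for a model that has been saved and reloaded, a sketch might look like the following. It assumes the model was stored with saveRDS(); the file path and the names dat, fit2, factor_vars and lev are illustrative only.

```r
library(rpart)

# Rebuild the example model with a labelled factor predictor
dat <- kyphosis
dat$Numlev <- factor(findInterval(dat$Number, c(0, 4, 6, Inf)),
                     labels = c("low", "med", "high"))
fit <- rpart(Kyphosis ~ Age + Numlev + Start, data = dat)

# Simulate a saved model: write it out and read it back without the data
path <- tempfile(fileext = ".rds")
saveRDS(fit, path)
fit2 <- readRDS(path)

# Which variables in the model are factors (this includes the response)?
classes <- attr(fit2$terms, "dataClasses")
factor_vars <- names(classes)[classes == "factor"]
factor_vars

# Original level names of the factor predictors, in integer-code order
lev <- attr(fit2, "xlevels")
lev$Numlev
```

The position of a name in lev$Numlev is its integer code, so lev$Numlev[2] is the label for code 2.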

apply() not working when checking column class in a data.frame

I have a dataframe. I want to inspect the class of each column.
x1 = rep(1:4, times=5)
x2 = factor(rep(letters[1:4], times=5))
xdat = data.frame(x1, x2)
> class(xdat)
[1] "data.frame"
> class(xdat$x1)
[1] "integer"
> class(xdat$x2)
[1] "factor"
However, imagine that I have many columns and therefore need to use apply() to help me do the trick. But it's not working.
apply(xdat, 2, class)
x1 x2
"character" "character"
Why can't I use apply() to see the data type of each column? What should I do instead?
Thanks!
You could use
sapply(xdat, class)
# x1 x2
# "integer" "factor"
Using apply() coerces the data frame to a matrix first, and a matrix can hold only a single type. If any column is non-numeric (here, the factor x2), every column becomes character. To see this, check
str(apply(xdat, 2, I))
#chr [1:20, 1:2] "1" "2" "3" "4" "1" "2" "3" "4" "1" ...
#- attr(*, "dimnames")=List of 2
# ..$ : NULL
# ..$ : chr [1:2] "x1" "x2"
Now, if we check lapply(), which processes each column separately, the original classes are preserved:
str(lapply(xdat, I))
#List of 2
#$ x1:Class 'AsIs' int [1:20] 1 2 3 4 1 2 3 4 1 2 ...
#$ x2: Factor w/ 4 levels "a","b","c","d": 1 2 3 4 1 2 3 4 1 2 ...
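The coercion can also be observed directly, together with a type-stable way of collecting the classes. This is a sketch; m and col_classes are illustrative names, and taking only the first class element is a simplification for columns that have several classes.

```r
x1 <- rep(1:4, times = 5)
x2 <- factor(rep(letters[1:4], times = 5))
xdat <- data.frame(x1, x2)

# apply() works on as.matrix(xdat); the factor column forces character
m <- as.matrix(xdat)
typeof(m)

# vapply() visits the columns one by one and always returns a character vector
col_classes <- vapply(xdat, function(col) class(col)[1], character(1))
col_classes
```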

How to convert matrix (or list) into data.frame

I've got a list with different types in it. They are arranged in matrix form:
tmp <- list('a', 1, 'b', 2, 'c', 3)
dim(tmp) <- c(2,3)
tmp
[,1] [,2] [,3]
[1,] "a" "b" "c"
[2,] 1 2 3
That's the form I get it out of another more complex function.
Now I want to transpose it and convert to a data.frame. So I do the following:
data <- as.data.frame(t(tmp))
data
V1 V2
1 a 1
2 b 2
3 c 3
This looks great. But it's got the wrong structure:
str(data)
'data.frame': 3 obs. of 2 variables:
$ V1:List of 3
..$ : chr "a"
..$ : chr "b"
..$ : chr "c"
$ V2:List of 3
..$ : num 1
..$ : num 2
..$ : num 3
So how do I get rid of the extra level of lists?
This should do the trick:
df <- data.frame(lapply(data.frame(t(tmp)), unlist), stringsAsFactors=FALSE)
str(df)
# 'data.frame': 3 obs. of 2 variables:
# $ X1: chr "a" "b" "c"
# $ X2: num 1 2 3
The inner data.frame() call converts the matrix into a two column data.frame, with one "character" column and one "numeric" column.**
lapply(..., unlist) strips away extra list() layer.
The outer data.frame() call converts the resulting list into the data.frame you're after.
** (OK, that intermediate "character" column is really of class "factor", but it ends up making no difference in the final result. If you like, you could force it to have class "character" by adding stringsAsFactors=FALSE to the inner data.frame() call as well, but I don't think neglecting to do so would ever make a difference...)
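An alternative sketch avoids the intermediate data frame by unlisting each row of the list-matrix directly. The names cols and df2 are illustrative, and the approach assumes each row of tmp holds values of a single type.

```r
tmp <- list("a", 1, "b", 2, "c", 3)
dim(tmp) <- c(2, 3)

# Each row of the list-matrix becomes one atomic column
cols <- lapply(seq_len(nrow(tmp)), function(i) unlist(tmp[i, ]))
names(cols) <- paste0("V", seq_along(cols))

df2 <- as.data.frame(cols, stringsAsFactors = FALSE)
str(df2)
```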
Or this :
as.data.frame(matrix(unlist(tmp),ncol=2,byrow=TRUE))
You can inspect the result:
str(as.data.frame(matrix(unlist(tmp),ncol=2,byrow=TRUE)))
'data.frame': 3 obs. of 2 variables:
$ V1: Factor w/ 3 levels "a","b","c": 1 2 3
$ V2: Factor w/ 3 levels "1","2","3": 1 2 3
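Because unlist() coerces everything to character, both columns lose their original types in this variant. One way to recover sensible types, sketched here with the illustrative name df3, is to pass each column through type.convert():

```r
tmp <- list("a", 1, "b", 2, "c", 3)
dim(tmp) <- c(2, 3)

# unlist() yields a character vector, so the matrix is character throughout
m <- matrix(unlist(tmp), ncol = 2, byrow = TRUE)
df3 <- as.data.frame(m, stringsAsFactors = FALSE)

# Re-infer each column's type; as.is = TRUE keeps strings as character
df3[] <- lapply(df3, type.convert, as.is = TRUE)
str(df3)
```

Note that the numeric column comes back as integer rather than double, since type.convert() picks the narrowest type that fits the text.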

Resources