Converting a character into a variable name - r

I need to convert a data frame into a matrix using the model.matrix function. The name of the original data frame is train, and the outcome variable of interest is called adequacy_ratio_total_percent. The below R code works.
X_train_matrix <- model.matrix(adequacy_ratio_total_percent ~ ., train)[, -1]
However, my outcome variable may change, and I hoped to make switching it easier with the code below, which does not work:
list_outcome <- c("adequacy_ratio_total_percent")
X_train_matrix <- model.matrix(list_outcome ~ ., train)[, -1]
Error in model.frame.default(object, data, xlev = xlev) :
variable lengths differ (found for 'adequacy_ratio_total_percent')
I also tried the following, which does not work either.
list_outcome <- c("adequacy_ratio_total_percent")
X_train_matrix <- model.matrix(train$list_outcome ~ ., train)[, -1]
Error in model.frame.default(object, data, xlev = xlev) :
invalid type (NULL) for variable 'train$list_outcome'
Or the following:
list_outcome <- c("adequacy_ratio_total_percent")
X_train_matrix <- model.matrix(list_outcome[1] ~ ., train)[, -1]
Error in model.frame.default(object, data, xlev = xlev) :
variable lengths differ (found for 'adequacy_ratio_total_percent')
How can I extract the variable name from list_outcome and apply it to the model.matrix function? Thank you in advance for any advice!

Here's an answer that uses the same idea as @user20650, but handles multiple possible outcomes:
data(mtcars)
list_outcomes = c("qsec", "mpg")
Xmats <- lapply(list_outcomes, function(l){
model.matrix(reformulate(".", response=l), data=mtcars)
})
lapply(Xmats, head)
#> [[1]]
#> (Intercept) mpg cyl disp hp drat wt vs am gear carb
#> Mazda RX4 1 21.0 6 160 110 3.90 2.620 0 1 4 4
#> Mazda RX4 Wag 1 21.0 6 160 110 3.90 2.875 0 1 4 4
#> Datsun 710 1 22.8 4 108 93 3.85 2.320 1 1 4 1
#> Hornet 4 Drive 1 21.4 6 258 110 3.08 3.215 1 0 3 1
#> Hornet Sportabout 1 18.7 8 360 175 3.15 3.440 0 0 3 2
#> Valiant 1 18.1 6 225 105 2.76 3.460 1 0 3 1
#>
#> [[2]]
#> (Intercept) cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 1 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 1 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 1 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 1 6 258 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 1 8 360 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 1 6 225 105 2.76 3.460 20.22 1 0 3 1
Created on 2022-06-28 by the reprex package (v2.0.1)
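For the single-outcome case in the question, the same idea reduces to building the formula from the string. A formula stores the symbol list_outcome itself rather than the text it holds, which is why the attempts above fail. The sketch below uses mtcars and mpg as stand-ins for train and adequacy_ratio_total_percent; as.formula() on a pasted string is shown as an equivalent alternative to reformulate().

```r
data(mtcars)
list_outcome <- "mpg"  # stand-in for "adequacy_ratio_total_percent"

# Build the formula `mpg ~ .` from the character string
f1 <- reformulate(".", response = list_outcome)
# Equivalent: parse a pasted string into a formula
f2 <- as.formula(paste(list_outcome, "~ ."))

# [, -1] drops the intercept column, as in the original code
X_train_matrix <- model.matrix(f1, data = mtcars)[, -1]
dim(X_train_matrix)
```

Either formula can be passed to model.matrix(); reformulate() avoids string pasting when the predictors are also supplied as a character vector.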

Related

tryCatch() in across() fails when the error comes from another column

I use across() and I want to put NA where the computation fails. I tried to use tryCatch() but can't make it work in my case, whereas there are situations where it works.
This works:
library(dplyr)
head(mtcars) %>%
mutate(
across(
all_of("drat"),
function(x) tryCatch(blabla, error = function(e) NA) # create an intentional error for the example
)
)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 NA 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160 110 NA 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108 93 NA 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 NA 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 NA 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 NA 3.460 20.22 1 0 3 1
But this doesn't:
library(dplyr)
head(mtcars) %>%
mutate(
across(
all_of("drat"),
function(x) tryCatch(x[which(mpg == 10000)], error = function(e) NA) # create an intentional error for the example
)
)
#> Error in `mutate()`:
#> ! Problem while computing `..1 = across(...)`.
#> Caused by error in `across()`:
#> ! Problem while computing column `drat`.
Created on 2022-07-07 by the reprex package (v2.0.1)
I thought tryCatch() was supposed to catch any error. Why doesn't it work in the second situation? How can I fix it?
Note: I need to use across() in my real situation (even if it's not truly needed in the examples).
The problem isn't tryCatch(): the code you run doesn't trigger an error in the first place. Basically you are running
foo <- function(x) tryCatch(x[which(mtcars$mpg == 10000)], error = function(e) NA)
foo(mtcars$drat)
# numeric(0)
Notice that no error is triggered; the expression simply returns numeric(0). The problem is that a function used inside mutate()/across() must return a vector of length 1 or of the same length as the data (6 rows here), and a zero-length result is neither. So the error happens after your tryCatch() code has run, when dplyr tries to assign the value back into the data frame. You will need to handle the case where no values are found separately. Perhaps:
head(mtcars) %>%
mutate(
across(
all_of("drat"),
function(x) {
matches <- mpg == 10000
if (any(matches)) x[which(matches)] else NA
}
)
)
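The same zero-length failure can be reproduced without referencing another column. The sketch below wraps the subset in a small helper, na_if_empty() (a hypothetical name, not a dplyr function), that turns an empty result into a single NA so mutate() receives a length-1 value. It assumes dplyr 1.0 or later for across() and all_of().

```r
library(dplyr)

# Hypothetical helper: return NA instead of a zero-length vector
na_if_empty <- function(value) {
  if (length(value) == 0) NA else value
}

res <- head(mtcars) %>%
  mutate(
    across(
      all_of("drat"),
      # no drat value exceeds 100, so the subset is numeric(0)
      function(x) na_if_empty(x[x > 100])
    )
  )
res$drat
```

A result whose length is neither 1 nor the number of rows would still fail, so the helper only covers the "nothing found" case.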
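It looks like you just need to reference mpg with x. (Note that inside across() here, x is the numeric drat column, so x$mpg itself fails with "$ operator is invalid for atomic vectors"; tryCatch() catches that error, which is why every drat value below becomes NA.)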
library(dplyr)
head(mtcars) %>%
mutate(
across(
all_of("drat"),
function(x) tryCatch(x[which(x$mpg == 10000)], error = function(e) NA) # create an intentional error for the example
)
)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 NA 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160 110 NA 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108 93 NA 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 NA 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 NA 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 NA 3.460 20.22 1 0 3 1

replace row names with defined vector in R

Is there a way to substitute the row names based on a predefined vector in R, something like:
rownames(GV) <- c(beta1='Age', beta10='Female Gender')
Alternatively, case_when() may be easier:
library(dplyr)
df <- data.frame(a = c(1, 2, 3))
rownames(df)
#> [1] "1" "2" "3"
rownames(df) <- case_when(rownames(df) == "1" ~ "one",
rownames(df) == "2" ~ "two",
TRUE ~ rownames(df))
rownames(df)
#> [1] "one" "two" "3"
For each condition you specify a new value, plus a value for all remaining cases (the TRUE ~ rownames(df) line); here the remaining rows simply keep their previous row names.
We could do the following:
rownames(mtcars)[which(rownames(mtcars) == "Datsun 710")] <- "My Rowname"
head(mtcars)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> My Rowname 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
If we want to rename more row names, we can use %in%, but, as @gss mentions in the comments, this comes with a caveat: no matter the order of the names in the character vector after %in%, the names are replaced in the order they appear in rownames(). Compare the following two calls:
mtcars <- datasets::mtcars # start again from the original row names
rownames(mtcars)[which(rownames(mtcars) %in% c("Datsun 710", "Mazda RX4 Wag"))] <- c("My Rowname1","My Rowname2")
head(mtcars)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> My Rowname1 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> My Rowname2 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Which has the same result as:
mtcars <- datasets::mtcars # start again from the original row names
rownames(mtcars)[which(rownames(mtcars) %in% c("Mazda RX4 Wag", "Datsun 710"))] <- c("My Rowname1","My Rowname2")
head(mtcars)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> My Rowname1 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> My Rowname2 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Created on 2021-12-21 by the reprex package (v2.0.1)
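If the replacements should follow the order of your own vector instead of the order in rownames(), match() returns positions in the order of its first argument. A minimal sketch, working on a fresh copy of mtcars and assuming every old name exists (match() returns NA otherwise, which the stopifnot() guards against):

```r
mt <- datasets::mtcars
old <- c("Datsun 710", "Mazda RX4 Wag")
new <- c("My Rowname1", "My Rowname2")

idx <- match(old, rownames(mt))  # positions, in the order of `old`
stopifnot(!anyNA(idx))           # every old name must be present
rownames(mt)[idx] <- new
head(rownames(mt), 3)
```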
If you want to rename all the rows, and you have a vector of the desired new names in order:
example <- head(mtcars, 3)
mynewnames <- c("First", "Second", "Third")
rownames(example) <- mynewnames
example
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> First 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Second 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Third 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
If you want to rename all the rows, and you have a named vector (not necessarily in the correct order):
example <- head(mtcars, 3)
mynewnames <- c("Datsun 710" = "Datsun", "Mazda RX4" = "Mazda", "Mazda RX4 Wag" = "Also Mazda")
rownames(example) <- mynewnames[rownames(example)]
example
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Also Mazda 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
If you want to rename only some rows, and have a named vector (an ordered vector makes no sense in this context):
example <- head(mtcars, 3)
mynewnames <- c("Mazda RX4" = "This Mazda", "Mazda RX4 Wag" = "That Mazda")
rownames(example)[rownames(example) %in% names(mynewnames)] <-
mynewnames[rownames(example)[rownames(example) %in% names(mynewnames)]]
example
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> This Mazda 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> That Mazda 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
This is a bit unwieldy; if you are only replacing one or two row names, then @TimTeaFan's first suggestion is probably easier.
The safest approach, and one that uses a predefined named vector as the OP wants, is to take the current row names, replace those that have an entry in the vector, and set the row names again. This does not fail on an incomplete vector: a name that cannot be replaced stays as it was.
The advantage of this solution is that it avoids the error below, which you get when the rename vector is incomplete:
Error in `.rowNamesDF<-`(x, value = value) :
missing values in 'row.names' are not allowed
solution
library(stringr) # used for str_replace_all()
df <- data.frame(
x = rep(1:5),
y = rep(11:15),
row.names = LETTERS[1:5]
)
df
# x y
# A 1 11
# B 2 12
# C 3 13
# D 4 14
# E 5 15
change <- c("A" = "a", "C" = "c")
row.names(df) <- str_replace_all(row.names(df), change)
df
# x y
# a 1 11
# B 2 12
# c 3 13
# D 4 14
# E 5 15
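One caveat: str_replace_all() treats the names of change as regular expressions and replaces substrings, so with row names like those in the question, str_replace_all(c("beta1", "beta10"), c(beta1 = "Age", beta10 = "Female Gender")) turns "beta10" into "Age0". Below is a base R sketch that matches whole names exactly instead; rename_rows() is a hypothetical helper and GV is made-up example data.

```r
# Made-up data with row names like those in the question
GV <- data.frame(estimate = c(0.5, 1.2, -0.3),
                 row.names = c("beta1", "beta10", "beta2"))
change <- c(beta1 = "Age", beta10 = "Female Gender")

# Hypothetical helper: replace exact matches only, keep the rest unchanged
rename_rows <- function(current, lookup) {
  idx <- match(current, names(lookup))
  ifelse(is.na(idx), current, lookup[idx])
}

rownames(GV) <- rename_rows(rownames(GV), change)
rownames(GV)
```

Like the stringr version, names without an entry in change are left as they were, so an incomplete vector does not produce missing row names.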

Fix subscript out of bounds error when adding column to df

I have a df with 20 columns of numerical data. I am trying to add an additional "Total" column containing the sum of each row, but I am getting a subscript out of bounds error. This is the code I'm using:
df[,"Total"]<-rowSums(df)
This is the error:
Error in `[<-`(`*tmp*`, , "Total", value = c(Acidovorax = 13, Acinetobacter = 48143, :
subscript out of bounds
That shouldn't happen with a data.frame, but it can happen with a matrix:
mt_mtx <- as.matrix(mtcars)
mtcars[,"Total"] <- rowSums(mtcars)
head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb Total
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 328.980
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 329.795
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 259.580
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 426.135
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 590.310
# Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 385.540
mt_mtx[,"Total"] <- rowSums(mt_mtx)
# Error in `[<-`(`*tmp*`, , "Total", value = c(`Mazda RX4` = 328.98, `Mazda RX4 Wag` = 329.795, :
# subscript out of bounds
The quick remedy is to convert your df back to a data.frame. If you weren't expecting this because you thought df was already a data frame, I suggest going back through your code to find what accidentally coerced it to a matrix.
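Both remedies, sketched with as.matrix(head(mtcars)) standing in for the accidentally coerced df: check the class first, then either keep the matrix and use cbind(), or convert back with as.data.frame().

```r
m <- as.matrix(head(mtcars))  # stand-in for a df that became a matrix
is.matrix(m)                  # TRUE: m[, "Total"] <- ... would fail here

# Option 1: keep the matrix and bind the new column on
m_total <- cbind(m, Total = rowSums(m))

# Option 2: convert back to a data frame, then assign as usual
df <- as.data.frame(m)
df[, "Total"] <- rowSums(df)
head(df$Total, 2)
```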

write r function to modify value in data frame

I have a set of variables, say Var1, Var2, ..., Varn. They all take three possible values: 0, 1, and 2. I want to replace every 2 with 1,
like so:
df$Var1[df$Var1 >= 1] <- 1
This does the job. But when I try to write a function to do this
MakeBinary <- function(varName, dfName){dfName$varName[dfName$varName >= 1] <- 1}
and use this function like:
MakeBinary(Var2, df)
I got an error message:
Error in `$<-.data.frame`(`*tmp*`, "varName", value = numeric(0)) :
  replacement has 0 rows, data has 512
I just want to know why I got this message. Thanks. My sample size is 512.
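The error comes from $: dfName$varName looks for a column literally called "varName", not the column whose name you passed. That column doesn't exist, so the subset is empty (numeric(0)), and assigning a zero-length column to a 512-row data frame fails. If you pass the column name as a string, use [[ instead of $. Also return the modified data frame, because the function works on a copy of df: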
MakeBinary <- function(varName, dfName){
dfName[[varName]][dfName[[varName]] >= 1] <- 1
dfName
}
df <- MakeBinary("Var2", df) # assign the result back to keep the change
Example with mtcars:
MakeBinary("carb", head(mtcars))
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 1
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 1
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 1
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Unquoted variable names can be passed as well, but they need to be converted to a string first:
MakeBinary <- function(varName, dfName){
varName <- deparse(substitute(varName))
dfName[[varName]][dfName[[varName]] >= 1] <- 1
dfName
}
df <- MakeBinary(Var2, df) # assign the result back to keep the change
Using a reproducible example with mtcars:
MakeBinary(carb, head(mtcars))
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 1
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 1
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 1
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
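Since the question covers a whole set of variables Var1 to Varn, the same recoding can be applied to several columns at once without calling a function per column. A sketch with made-up data; selecting columns with grep("^Var", ...) assumes they all share the Var prefix.

```r
# Made-up data standing in for Var1 ... Varn
df <- data.frame(Var1 = c(0, 1, 2), Var2 = c(2, 2, 0), Var3 = c(1, 0, 2))

vars <- grep("^Var", names(df), value = TRUE)  # all columns starting with "Var"
# replace() sets every value >= 1 to 1 in each selected column
df[vars] <- lapply(df[vars], function(v) replace(v, v >= 1, 1))
df
```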

Opposite function to add_rownames in dplyr

As an intermediate step I generate a data frame with one column as character strings and the rest are numbers. I'd like to convert it to a matrix, but first I have to convert that character column into row names and remove it from the data frame.
Is there a simple way to do this in dplyr? A function like to_rownames() that is the opposite of add_rownames()?
I saw a solution using a custom function, but it doesn't really fit the dplyr philosophy.
You can now use the tibble package:
tibble::column_to_rownames()
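A minimal sketch of that function on a small tibble with a fixed character column (the values are made up; the question used sample(letters, 4)). column_to_rownames() takes the column name as a string in its var argument and returns a plain data frame, which as.matrix() then converts:

```r
library(tibble)

df <- tibble(a = c("w", "x", "y", "z"), b = 1:4, c = 5:8)

# Move column `a` into the row names and drop it from the columns
m <- as.matrix(column_to_rownames(df, var = "a"))
m
```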
This provides NSE & standard eval functions:
library(dplyr)
df <- tibble(a = sample(letters, 4), b = c(1:4), c = c(5:8)) # tibble() replaces the deprecated data_frame()
reset_rownames <- function(df, col="rowname") {
stopifnot(is.data.frame(df))
col <- as.character(substitute(col))
reset_rownames_(df, col)
}
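reset_rownames_ <- function(df, col="rowname") {
stopifnot(is.data.frame(df))
df <- as.data.frame(df) # tibbles don't support row names
nm <- df[, col]
df <- df[, !(colnames(df) %in% col), drop = FALSE] # stay a data frame even if one column is left
rownames(df) <- nm
df
}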
m <- "rowname"
head(as.matrix(reset_rownames(add_rownames(mtcars), "rowname")))
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
head(as.matrix(reset_rownames_(add_rownames(mtcars), m)))
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Perhaps to_rownames() or set_rownames() makes more sense. ¯\_(ツ)_/¯ YMMV.
If you really need a matrix you can just save the character column to a separate variable, drop it, and then create the matrix
library(dplyr)
df <- tibble(a = sample(letters, 4), b = c(1:4), c = c(5:8))
row_labels <- df %>% pull(a) # a separate name avoids masking the built-in `letters`
a.matrix <- df %>% select(-a) %>% as.matrix
rownames(a.matrix) <- row_labels
Not sure what you are going to do after that, but this gets you as far as you asked for...
