R: Convert characters to numeric in data.frame with unknown column classes

There is a nice discussion of how to convert character columns to numeric in another SO post.
Maybe I missed something in that post, but what would one do if one does not know which columns are convertible (if any)?
Is it possible to check for convertibility?
In addition, I usually suppress factor conversion (I prefer character), so characters should stay characters (not factors).
df <- data.frame(a=as.character(c(NA, 1/3)), b=letters[1:2], c=c('1|2', '4|2'), d=as.character(3:4), stringsAsFactors = FALSE)
Then apply ... some function f ... to get:
str(f(df))
'data.frame': 2 obs. of 4 variables:
$ a: num NA 0.333
$ b: chr "a" "b"
$ c: chr "1|2" "4|2"
$ d: int 3 4
How can this be achieved for any data.frame that is not known beforehand?

You could do something like this (not very elegant though).
fun1 <- function(i) {
  if (!all(is.na(as.numeric(df[, i])))) {
    as.numeric(df[, i])
  } else {
    df[, i]
  }
}
df1 <- "names<-"(cbind.data.frame(lapply(seq_along(df), fun1),
                                  stringsAsFactors=FALSE), names(df))
> str(df1)
'data.frame': 2 obs. of 4 variables:
$ a: num NA 0.333
$ b: chr "a" "b"
$ c: chr "1|2" "4|2"
$ d: num 3 4
Or more generally:
convertiblesToNumeric <- function(x){
  x2 <- cbind.data.frame(lapply(seq_along(x), function(i) {
    if (!all(is.na(as.numeric(x[, i])))) {
      as.numeric(x[, i])
    } else {
      x[, i]
    }
  }), stringsAsFactors=FALSE)
  names(x2) <- names(x)
  return(x2)
}
df1 <- convertiblesToNumeric(df)
> str(df1)
'data.frame': 2 obs. of 4 variables:
$ a: num NA 0.333
$ b: chr "a" "b"
$ c: chr "1|2" "4|2"
$ d: num 3 4
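Base R's type.convert() already performs this kind of check: with as.is = TRUE it converts a character vector to logical, integer, numeric or complex only when every non-missing value parses, and otherwise leaves it as character. The sketch below applies it column by column to the df from the question and, for comparison, adds a stricter hand-written test. The names is_convertible and convert_strict are hypothetical helpers introduced here, not part of any package.

```r
df <- data.frame(a = as.character(c(NA, 1/3)), b = letters[1:2],
                 c = c("1|2", "4|2"), d = as.character(3:4),
                 stringsAsFactors = FALSE)

# Option 1: let type.convert() decide per column; as.is = TRUE keeps
# non-convertible columns as character instead of turning them into factors.
df_tc <- df
df_tc[] <- lapply(df, type.convert, as.is = TRUE)

# Option 2 (hypothetical helpers): a column counts as convertible only if
# as.numeric() introduces no NA beyond those already present.
is_convertible <- function(x) {
  converted <- suppressWarnings(as.numeric(x))
  !any(is.na(converted) & !is.na(x))
}
convert_strict <- function(data) {
  data[] <- lapply(data, function(col) {
    if (is.character(col) && is_convertible(col)) as.numeric(col) else col
  })
  data
}
df_strict <- convert_strict(df)

str(df_tc)      # a: num, b: chr, c: chr, d: int
str(df_strict)  # a: num, b: chr, c: chr, d: num
```

Unlike the !all(is.na(...)) test used above, both options leave a column such as c("1", "x") untouched rather than converting it to c(1, NA). Note that the strict helper treats an all-NA character column as convertible, which may or may not be what you want.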

Related

using base R to convert list to dataframe while setting its class type and column names

There is a list like this
d <- list(c(1, 2, 3),
c("20210111", "20220122", "20220302"),
c("father", "mather", "brother"),
c("hello", "world", "again"))
I want to convert it to a dataframe while setting its class type and column names. I have two vectors for it.
# class type
type <- c("integer", "Date", "factor", "character")
# column names
nm <- c("num", "date", "relation", "chara")
I tried to use rapply and the type.convert function to do this, but it did not work.
The result should be like this
# > d2
# num date relation chara
# 1 1 2021-01-11 father hello
# 2 2 2022-01-22 mather world
# 3 3 2022-03-02 brother again
# > str(d2)
# 'data.frame': 3 obs. of 4 variables:
# $ num : num 1 2 3
# $ date : Date, format: "2021-01-11" "2022-01-22" "2022-03-02"
# $ relation: Factor w/ 3 levels "brother","father",..: 2 3 1
# $ chara : chr "hello" "world" "again"
How can I use base R to convert the list to a dataframe while setting its class types and column names?
Assuming you want this to be a programmatic venture:
d2 <- setNames(data.frame(d), nm)
str(d2)
# 'data.frame': 3 obs. of 4 variables:
# $ num : num 1 2 3
# $ date : chr "20210111" "20220122" "20220302"
# $ relation: chr "father" "mather" "brother"
# $ chara : chr "hello" "world" "again"
From here,
isdate <- type == "Date"
funs <- mget(paste0("as.", type), inherits = TRUE)
str(funs) # list of functions
# List of 4
# $ as.integer :function (x, ...)
# $ as.Date :function (x, ...)
# $ as.factor :function (x)
# $ as.character:function (x, ...)
d2[isdate] <- Map(function(f, ...) f(...), funs[isdate], d2[isdate], list(format = "%Y%m%d"))
d2[!isdate] <- Map(function(f, ...) f(...), funs[!isdate], d2[!isdate])
str(d2)
# 'data.frame': 3 obs. of 4 variables:
# $ num : int 1 2 3
# $ date : Date, format: "2021-01-11" "2022-01-22" "2022-03-02"
# $ relation: Factor w/ 3 levels "brother","father",..: 2 3 1
# $ chara : chr "hello" "world" "again"
You could use structure(). The date strings are not in a format that as.Date() parses by default, so you need to convert them beforehand with d[[2]] <- as.Date(d[[2]], format='%Y%m%d').
r <- structure(Map(\(x, y) eval(parse(text=x)), sprintf('as.%s(y)', type), d),
               class="data.frame",
               row.names=c(NA, -3L),
               names=nm)
r
# num date relation chara
# 1 1 2021-01-11 father hello
# 2 2 2022-01-22 mather world
# 3 3 2022-03-02 brother again
str(r)
# 'data.frame': 3 obs. of 4 variables:
# $ num : int 1 2 3
# $ date : Date, format: "2021-01-11" ...
# $ relation: Factor w/ 3 levels "brother","father",..: 2 3 1
# $ chara : chr "hello" "world" "again"
Note: this requires R >= 4.1 for the \(x) lambda syntax.
Data:
d <- list(c(1, 2, 3),
c("20210111", "20220122", "20220302"),
c("father", "mather", "brother"),
c("hello", "world", "again"))
type <- c("integer", "Date", "factor", "character")
nm <- c("num", "date", "relation", "chara")
Here's a simple solution:
z <- as.data.frame(do.call(cbind, d))
colnames(z) <- nm
z$num <- as.integer(z$num)
z$date <- as.Date(z$date, format = "%Y%m%d")
z$relation <- as.factor(z$relation)
z$chara <- as.character(z$chara)
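The answers above look the conversion functions up by name (mget) or build calls from strings (eval(parse())). An alternative is an explicit lookup table of converters, which also gives the date format a natural home. This is only a sketch: converters and cols are hypothetical names, and the table covers just the four classes that appear in type.

```r
d <- list(c(1, 2, 3),
          c("20210111", "20220122", "20220302"),
          c("father", "mather", "brother"),
          c("hello", "world", "again"))
type <- c("integer", "Date", "factor", "character")
nm <- c("num", "date", "relation", "chara")

# Hypothetical lookup table: one converter per supported class name.
converters <- list(
  integer   = as.integer,
  Date      = function(x) as.Date(x, format = "%Y%m%d"),
  factor    = as.factor,
  character = as.character
)

# Apply the matching converter to each list element, then name the columns.
cols <- Map(function(col, ty) converters[[ty]](col), d, type)
names(cols) <- nm
d2 <- as.data.frame(cols, stringsAsFactors = FALSE)
str(d2)
```

A class name that is missing from converters makes converters[[ty]] return NULL, so the call fails loudly instead of silently skipping the column.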

Convert columns in a list from character to numeric

I have a list with two data sets and I would like to convert each of the columns from character to numeric.
[[1]]
b m
2 12194.0968074593 703.359790781974
[[2]]
b m
2 49.2080763267713 30.9186232579308
> str(tidy_linear_regression)
List of 2
$ :'data.frame': 1 obs. of 2 variables:
..$ b: chr "12194.0968074593"
..$ m: chr "703.359790781974"
$ :'data.frame': 1 obs. of 2 variables:
..$ b: chr "49.2080763267713"
..$ m: chr "30.9186232579308"
I cannot come up with code that still leaves me with a list at the end.
I tried the following code, and the result is always a data.frame:
tidy_linear_regression_new <-
  lapply(tidy_linear_regression,
         function(x) as.numeric(as.character(x)))
tidy_linear_regression_new <-
  sapply(tidy_linear_regression,
         as.character)
Because each data frame has multiple columns, you need an lapply inside an lapply:
tidy_linear_regression <- lapply(tidy_linear_regression, function(x) {
  x[] <- lapply(x, as.numeric)
  x
})
We may also use the tidyverse:
library(purrr)
library(dplyr)
tidy_linear_regression <- map(tidy_linear_regression, ~ .x %>%
  mutate(across(everything(), as.numeric)))
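To see why the nested lapply keeps the list-of-data-frames shape, here is a self-contained sketch with the input rebuilt from the str() output in the question. The name converted is introduced here for illustration. The key detail is the assignment into x[]: it replaces the columns in place, so x stays a data frame instead of becoming a plain list.

```r
# Input reconstructed from the str() output shown in the question.
tidy_linear_regression <- list(
  data.frame(b = "12194.0968074593", m = "703.359790781974",
             stringsAsFactors = FALSE),
  data.frame(b = "49.2080763267713", m = "30.9186232579308",
             stringsAsFactors = FALSE)
)

converted <- lapply(tidy_linear_regression, function(x) {
  x[] <- lapply(x, as.numeric)  # x[] keeps the data.frame attributes
  x
})

str(converted)
```

The outer lapply walks over the data frames, and the inner one walks over the columns of each data frame.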

How to operate with factors in data.frame?

I'm new to R, and I am trying to work with a data frame:
(screenshot of the data frame, not reproduced)
How can I get a numeric array from row 10?
ar <- df[10,1] didn't work.
You can use gsub to remove brackets. Please see the code below:
# Simulation
x <- factor(c("[1]", "[2,3]", "[4]", "[]"))
str(x)
# Factor w/ 4 levels "[]","[1]","[2,3]",..: 2 3 4 1
foobar <- lapply(x, function(x) {
  # remove brackets
  s <- gsub("\\[|\\]", "", as.character(x))
  as.numeric(unlist(strsplit(s, split = ",")))
})
str(foobar)
Output:
List of 4
$ : num 1
$ : num [1:2] 2 3
$ : num 4
$ : num(0)
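Applied to a single cell of a data frame, as in the question, the same idea can be wrapped in a small function. The data frame below is a stand-in, since the original data is only shown as a screenshot, and parse_bracketed is a hypothetical helper name.

```r
# Hypothetical data: one factor column holding bracketed number lists.
df <- data.frame(v = c("[1]", "[2,3]", "[4]", "[]"), stringsAsFactors = TRUE)

parse_bracketed <- function(x) {
  s <- gsub("\\[|\\]", "", as.character(x))     # drop "[" and "]"
  as.numeric(unlist(strsplit(s, split = ",")))  # split on commas, convert
}

ar <- parse_bracketed(df[2, 1])
ar
```

The as.character() call matters: calling as.numeric() directly on a factor would return the internal level codes rather than the values shown in the cell.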

how to assign each element of a list as arguments to a function in a loop in R?

I'm new to R. I'd like to get a number of statistics on the numeric columns (say, column C) of a data frame (dt) based on the combination of factor columns (say, columns A and B). First, I want the results by grouping both columns A and B, and then the same operations by A alone and by B alone. I've written a code that looks like the one below. I have a list of the factor combinations that I'd like to test (groupList) and then for each iteration of the loop I feed an element of that list as the argument to "by". However, as surely you can see, it doesn't work. R doesn't recognize the elements of the list as arguments to the function "by". Any ideas on how to make this work? Any pointer or suggestion is welcome and appreciated.
groupList <- list(".(A, B)", "A", "B")
for(i in 1:length(groupList)){
  output <- dt[, list(mean=mean(C),
                      sd=sd(C),
                      min=min(C),
                      median=median(C),
                      max=max(C)),
               by = groupList[i]]
  # Here insert code to save each output
}
I think the aggregate function can solve your problem. Say you have a data frame df that contains three columns A, B and C, given as:
df <- data.frame(A=rep(letters[1:3],3), B=rep(letters[4:6],each=3), C=1:9)
If you want to calculate the mean of C by factor A, try:
aggregate(C~A, data=df, FUN=mean)
By factor B, try:
aggregate(C~B, data=df, FUN=mean)
By factors A and B, try:
aggregate(C~A+B, data=df, FUN=mean)
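The three aggregate() calls above can also be driven from a list of grouping columns, which is closer to the loop in the question. A minimal sketch, assuming the same example df: describe and out are hypothetical names, and reformulate() builds the formula C ~ A + B (or C ~ A, C ~ B) from the character vector of grouping columns.

```r
df <- data.frame(A = rep(letters[1:3], 3),
                 B = rep(letters[4:6], each = 3),
                 C = 1:9)

# Each element names the columns to group by.
groupList <- list(c("A", "B"), "A", "B")

# Hypothetical helper returning all requested statistics at once.
describe <- function(v) {
  c(mean = mean(v), sd = sd(v), min = min(v), median = median(v), max = max(v))
}

out <- lapply(groupList, function(g) {
  f <- reformulate(g, response = "C")  # e.g. C ~ A + B
  aggregate(f, data = df, FUN = describe)
})

out[[2]]  # statistics of C grouped by A alone
```

Because describe returns a vector, each result stores the statistics in a matrix column named C; individual statistics are then available as, for example, out[[2]]$C[, "mean"]. Groups with a single observation get NA for sd.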
Your groupList can be restructured as a list of character vectors. Then you can either use lapply or the existing for loop with an added eval() to interpret the by= input properly:
library(data.table)
dt <- data.table(A=rep(1:2,each=5), B=rep(1:5,each=2), C=1:10)
groupList <- list(c("A", "B"), c("A"), c("B"))
lapply(
  groupList,
  function(x) {
    dt[, .(mean=mean(C), sd=sd(C)), by=x]
  }
)
out <- vector("list", 3)
for(i in 1:length(groupList)){
  out[[i]] <- dt[, .(mean=mean(C), sd=sd(C)), by=eval(groupList[[i]]) ]
}
str(out)
#List of 3
# $ :Classes ‘data.table’ and 'data.frame': 6 obs. of 4 variables:
# ..$ A : int [1:6] 1 1 1 2 2 2
# ..$ B : int [1:6] 1 2 3 3 4 5
# ..$ mean: num [1:6] 1.5 3.5 5 6 7.5 9.5
# ..$ sd : num [1:6] 0.707 0.707 NA NA 0.707 ...
# ..- attr(*, ".internal.selfref")=<externalptr>
# $ :Classes ‘data.table’ and 'data.frame': 2 obs. of 3 variables:
# ..$ A : int [1:2] 1 2
# ..$ mean: num [1:2] 3 8
# ..$ sd : num [1:2] 1.58 1.58
# ..- attr(*, ".internal.selfref")=<externalptr>
# $ :Classes ‘data.table’ and 'data.frame': 5 obs. of 3 variables:
# ..$ B : int [1:5] 1 2 3 4 5
# ..$ mean: num [1:5] 1.5 3.5 5.5 7.5 9.5
# ..$ sd : num [1:5] 0.707 0.707 0.707 0.707 0.707
For demonstration, I used the mtcars data set. Here is one way with the dplyr package.
library(dplyr)
# create a character vector with the names of the functions that you need
describe <- c("mean", "sd", "min", "median", "max")
# group by the variable gear
mtcars %>%
group_by(gear) %>%
summarise_at(vars(mpg), describe)
# group by the variable carb
mtcars %>%
group_by(carb) %>%
summarise_at(vars(mpg), describe)
# group by both gear and carb
mtcars %>%
group_by(gear, carb) %>%
summarise_at(vars(mpg), describe)

Add elements from a list to multiple other nested lists

I have a list of lists, as follows:
my_list = list(list(a=1,b=2),list(a=1,b=2),list(a=1,b=2))
I have a vector b_new, the length of which is exactly the same as length(my_list):
b_new = c(3,4,5)
I would like to overwrite the b-elements of my_list with the values in b_new sequentially, so the expected output is:
my_list = list(list(a=1,b=3),list(a=1,b=4),list(a=1,b=5))
I could obviously do this in a for loop:
for(i in 1:length(b_new))
{
  my_list[[i]]$b=b_new[i]
}
but I wonder if there is a way of doing this without a for loop, for example using mapply?
It's still a loop really, but the following will do it:
Map(`[<-`, my_list, "b", b_new)
# or more pleasantly named:
Map(replace, my_list, "b", b_new)
str(Map(`[<-`, my_list, "b", b_new))
#List of 3
# $ :List of 2
# ..$ a: num 1
# ..$ b: num 3
# $ :List of 2
# ..$ a: num 1
# ..$ b: num 4
# $ :List of 2
# ..$ a: num 1
# ..$ b: num 5
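Since the question asks about mapply specifically: Map() is a thin wrapper around mapply(..., SIMPLIFY = FALSE), so the same result can be written with an explicit function. A minimal sketch, where res and expected are names introduced here for illustration:

```r
my_list <- list(list(a = 1, b = 2), list(a = 1, b = 2), list(a = 1, b = 2))
b_new <- c(3, 4, 5)

# Pair each inner list with the matching value of b_new and overwrite b.
# SIMPLIFY = FALSE keeps the result as a list of lists.
res <- mapply(function(el, b) {
  el$b <- b
  el
}, my_list, b_new, SIMPLIFY = FALSE)

expected <- list(list(a = 1, b = 3), list(a = 1, b = 4), list(a = 1, b = 5))
identical(res, expected)
```

Neither Map() nor mapply() modifies my_list in place, so assign the result back (my_list <- res) if you want to replace the original.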
