How to operate with factors in data.frame? - r

I'm new to R and I'm trying to work with a data frame:
[screenshot]
How do I get a numeric array from row 10?
ar <- df[10,1] didn't work.

You can use gsub to remove brackets. Please see the code below:
# Simulation
x <- factor(c("[1]", "[2,3]", "[4]", "[]"))
str(x)
# Factor w/ 4 levels "[]","[1]","[2,3]",..: 2 3 4 1
foobar <- lapply(x, function(x) {
# remove brackets
s <- gsub("\\[|\\]", "", as.character(x))
as.numeric(unlist(strsplit(s, split = ",")))
})
str(foobar)
Output:
List of 4
$ : num 1
$ : num [1:2] 2 3
$ : num 4
$ : num(0)
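To pull the numbers out of a single cell, as the question asks, the same idea can be wrapped in a small helper. This is only a sketch: the data frame df, its column v and the function parse_cell are made up for illustration, since the original data is only visible in the screenshot. Note that the factor has to go through as.character() first, because as.numeric() on a factor returns the internal level codes.

```r
# Hypothetical data frame standing in for the one in the screenshot
df <- data.frame(v = factor(c("[1]", "[2,3]", "[4]", "[]")))

# Hypothetical helper: extract every number found in one cell
parse_cell <- function(cell) {
  s <- as.character(cell)                       # factor -> text, not level codes
  hits <- regmatches(s, gregexpr("-?[0-9.]+", s))[[1]]
  as.numeric(hits)                              # numeric(0) when the cell is "[]"
}

ar <- parse_cell(df[2, 1])
ar
```

With the real data, parse_cell(df[10, 1]) should return the numeric vector for row 10, assuming the cells look like the ones simulated here.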

Related

using base R to convert list to dataframe while setting its class type and column names

There is a list like this
d <- list(c(1, 2, 3),
c("20210111", "20220122", "20220302"),
c("father", "mather", "brother"),
c("hello", "world", "again"))
I want to convert it to a dataframe while setting its class type and column names. I have two vectors for it.
# class type
type <- c("integer", "Date", "factor", "character")
# column names
nm <- c("num", "date", "relation", "chara")
I tried to use the rapply and type.convert functions to do this, but it did not work.
The result should be like this
# > d2
# num date relation chara
# 1 1 2021-01-11 father hello
# 2 2 2022-01-22 mather world
# 3 3 2022-03-02 brother again
# > str(d2)
# 'data.frame': 3 obs. of 4 variables:
# $ num : num 1 2 3
# $ date : Date, format: "2021-01-11" "2022-01-22" "2022-03-02"
# $ relation: Factor w/ 3 levels "brother","father",..: 2 3 1
# $ chara : chr "hello" "world" "again"
How can I use base R to convert a list to a dataframe while setting its class types and column names?
Assuming you want this to be a programmatic venture:
d2 <- setNames(data.frame(d), nm)
str(d2)
# 'data.frame': 3 obs. of 4 variables:
# $ num : num 1 2 3
# $ date : chr "20210111" "20220122" "20220302"
# $ relation: chr "father" "mather" "brother"
# $ chara : chr "hello" "world" "again"
From here,
isdate <- type == "Date"
funs <- mget(paste0("as.", type), inherits = TRUE)
str(funs) # list of functions
# List of 4
# $ as.integer :function (x, ...)
# $ as.Date :function (x, ...)
# $ as.factor :function (x)
# $ as.character:function (x, ...)
d2[isdate] <- Map(function(f, ...) f(...), funs[isdate], d2[isdate], list(format = "%Y%m%d"))
d2[!isdate] <- Map(function(f, ...) f(...), funs[!isdate], d2[!isdate])
str(d2)
# 'data.frame': 3 obs. of 4 variables:
# $ num : int 1 2 3
# $ date : Date, format: "2021-01-11" "2022-01-22" "2022-03-02"
# $ relation: Factor w/ 3 levels "brother","father",..: 2 3 1
# $ chara : chr "hello" "world" "again"
You could use structure(). Currently your Date variable isn't very handy and you need to do d[[2]] <- as.Date(d[[2]], format='%Y%m%d') beforehand.
r <- structure(Map(\(x, y, z) eval(parse(text=x)), sprintf('as.%s(y)', type), d),
class="data.frame",
row.names=c(NA, -3L),
names=nm)
r
# num date relation chara
# 1 1 2021-01-11 father hello
# 2 2 2022-01-22 mather world
# 3 3 2022-03-02 brother again
str(r)
# 'data.frame': 3 obs. of 4 variables:
# $ num : int 1 2 3
# $ date : Date, format: "2021-01-11" ...
# $ relation: Factor w/ 3 levels "brother","father",..: 2 3 1
# $ chara : chr "hello" "world" "again"
Note: R >= 4.1 used.
Data:
d <- list(c(1, 2, 3),
c("20210111", "20220122", "20220302"),
c("father", "mather", "brother"),
c("hello", "world", "again"))
type <- c("integer", "Date", "factor", "character")
nm <- c("num", "date", "relation", "chara")
Here's a simple solution:
z <- as.data.frame(do.call(cbind, d))
colnames(z) <- nm
z$num <- as.integer(z$num)
z$date <- as.Date(z$date, format = "%Y%m%d")
z$relation <- as.factor(z$relation)
z$chara <- as.character(z$chara)
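Another base R variant, sketched here under the assumption that every date string is in %Y%m%d form, keeps an explicit lookup list of converter functions instead of looking them up by name. The name converters is made up for this example.

```r
d <- list(c(1, 2, 3),
          c("20210111", "20220122", "20220302"),
          c("father", "mather", "brother"),
          c("hello", "world", "again"))
type <- c("integer", "Date", "factor", "character")
nm <- c("num", "date", "relation", "chara")

# Hypothetical lookup table: one converter per supported class name
converters <- list(
  integer   = as.integer,
  Date      = function(x) as.Date(x, format = "%Y%m%d"),
  factor    = as.factor,
  character = as.character
)

# Apply the matching converter to each list element, name the columns,
# then build the data frame without turning strings into factors
cols <- Map(function(f, x) f(x), converters[type], d)
d2 <- as.data.frame(setNames(cols, nm), stringsAsFactors = FALSE)
str(d2)
```

Supporting another class then only needs one more entry in converters.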

Convert columns in a list from character to numeric

I have a list with two data sets and I would like to convert each of the columns from character to numeric.
[[1]]
b m
2 12194.0968074593 703.359790781974
[[2]]
b m
2 49.2080763267713 30.9186232579308
> str(tidy_linear_regression)
List of 2
$ :'data.frame': 1 obs. of 2 variables:
..$ b: chr "12194.0968074593"
..$ m: chr "703.359790781974"
$ :'data.frame': 1 obs. of 2 variables:
..$ b: chr "49.2080763267713"
..$ m: chr "30.9186232579308"
I cannot come up with code that returns a list.
I tried the following code and the result is always a data.frame:
tidy_linear_regression_new <-
lapply(tidy_linear_regression,
function(x) as.numeric(as.character(x)))
tidy_linear_regression_new<-
sapply(tidy_linear_regression,
as.character)
As you have multiple columns in the dataframe you need lapply inside a lapply -
tidy_linear_regression <- lapply(tidy_linear_regression, function(x) {
x[] <- lapply(x, as.numeric)
x
})
We may use tidyverse
library(purrr)
library(dplyr)
tidy_linear_regression <- map(tidy_linear_regression, ~ .x %>%
mutate(across(everything(), as.numeric)))
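The nested lapply approach can be checked on a reconstruction of the data from the question (the values are copied from the str() output, the rest is assumed). Assigning into x[] rather than x is what keeps each element a data.frame instead of a plain list.

```r
# Reconstruction of the list shown in the question
tidy_linear_regression <- list(
  data.frame(b = "12194.0968074593", m = "703.359790781974",
             stringsAsFactors = FALSE),
  data.frame(b = "49.2080763267713", m = "30.9186232579308",
             stringsAsFactors = FALSE)
)

converted <- lapply(tidy_linear_regression, function(x) {
  x[] <- lapply(x, as.numeric)  # x[] keeps the data.frame structure
  x
})
str(converted)
```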

R: Convert characters to numeric in data.frame with unknown column classes

There is a nice discussion of how to convert characters into numerics in another SO post.
Maybe I missed something in that post, but what would one do if one does not know which columns are convertible (if any)?
Is it possible to check for convertibility?
In addition, I usually suppress factor conversion (I like characters better), so characters should stay characters (not factors).
df <- data.frame(a=as.character(c(NA, 1/3)), b=letters[1:2], c=c('1|2', '4|2'), d=as.character(3:4), stringsAsFactors = FALSE)
Then apply ... some function f ... to get:
str(f(df))
'data.frame': 2 obs. of 4 variables:
$ a: num NA 0.333
$ b: chr "a" "b"
$ c: chr "1|2" "4|2"
$ d: int 3 4
How can this be achieved for any data.frame not known beforehand?
You could do something like this (not very elegant though).
fun1 <- function(i) {
if (!all(is.na(as.numeric(df[, i])))){
as.numeric(df[, i])
} else {
df[, i]
}
}
df1 <- "names<-"(cbind.data.frame(lapply(seq_along(df), fun1),
stringsAsFactors=FALSE), names(df))
> str(df1)
'data.frame': 2 obs. of 4 variables:
$ a: num NA 0.333
$ b: chr "a" "b"
$ c: chr "1|2" "4|2"
$ d: num 3 4
Or more generally:
convertiblesToNumeric <- function(x){
x2 <- cbind.data.frame(lapply(seq_along(x), function(i) {
if (!all(is.na(as.numeric(x[, i])))){
as.numeric(x[, i])
} else {
x[, i]
}
}), stringsAsFactors=FALSE)
names(x2) <- names(x)
return(x2)
}
df1 <- convertiblesToNumeric(df)
> str(df1)
'data.frame': 2 obs. of 4 variables:
$ a: num NA 0.333
$ b: chr "a" "b"
$ c: chr "1|2" "4|2"
$ d: num 3 4
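Base R's type.convert() may also be worth a try here. A minimal sketch on the question's data: with as.is = TRUE it leaves non-numeric columns as character, and it returns d as integer, as in the desired output. One difference from the functions above is that type.convert() only converts a column when every non-NA entry parses as a number, whereas the functions above convert as soon as at least one entry parses.

```r
df <- data.frame(a = as.character(c(NA, 1/3)),
                 b = letters[1:2],
                 c = c("1|2", "4|2"),
                 d = as.character(3:4),
                 stringsAsFactors = FALSE)

converted <- df
# Convert column by column; as.is = TRUE prevents character -> factor
converted[] <- lapply(df, type.convert, as.is = TRUE)
str(converted)
```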

how to assign each element of a list as arguments to a function in a loop in R?

I'm new to R. I'd like to get a number of statistics on the numeric columns (say, column C) of a data frame (dt) based on the combination of factor columns (say, columns A and B). First, I want the results by grouping both columns A and B, and then the same operations by A alone and by B alone. I've written a code that looks like the one below. I have a list of the factor combinations that I'd like to test (groupList) and then for each iteration of the loop I feed an element of that list as the argument to "by". However, as surely you can see, it doesn't work. R doesn't recognize the elements of the list as arguments to the function "by". Any ideas on how to make this work? Any pointer or suggestion is welcome and appreciated.
groupList <- list(".(A, B)", "A", "B")
for(i in 1:length(groupList)){
output <- dt[,list(mean=mean(C),
sd=sd(C),
min=min(C),
median=median(C),
max=max(C)),
by = groupList[i]]
# Here insert code to save each output
}
I guess the aggregate function can solve your problem. Say you have a data frame df containing three columns A, B and C, given as:
df <- data.frame(A=rep(letters[1:3],3), B=rep(letters[4:6],each=3), C=1:9)
If you want to calculate the mean of C by factor A, try:
aggregate(C~A, data=df, FUN=mean)
By factor B, try:
aggregate(C~B, data=df, FUN=mean)
By factors A and B, try:
aggregate(C~A+B, data=df, FUN=mean)
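To get all three groupings in one go, as the question asks, the formulas can be put in a list and looped over with lapply. This is a sketch using the example df from this answer; the names groupings and results are made up. The formula is passed as the first, unnamed argument, which works regardless of what that argument is called in a given R version.

```r
df <- data.frame(A = rep(letters[1:3], 3),
                 B = rep(letters[4:6], each = 3),
                 C = 1:9)

# One formula per grouping: A and B together, A alone, B alone
groupings <- list(C ~ A + B, C ~ A, C ~ B)

# One aggregated data frame per grouping
results <- lapply(groupings, function(f) aggregate(f, data = df, FUN = mean))
results[[2]]
```

Swapping FUN for a function that returns several statistics, e.g. function(v) c(mean = mean(v), sd = sd(v)), should give the other summaries in the same pass.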
Your groupList can be restructured as a list of character vectors. Then you can either use lapply or the existing for loop with an added eval() to interpret the by= input properly:
library(data.table)
dt <- data.table(A=rep(1:2,each=5), B=rep(1:5,each=2), C=1:10)
groupList <- list(c("A", "B"), c("A"), c("B"))
lapply(
groupList,
function(x) {
dt[, .(mean=mean(C), sd=sd(C)), by=x]
}
)
out <- vector("list", 3)
for(i in 1:length(groupList)){
out[[i]] <- dt[, .(mean=mean(C), sd=sd(C)), by=eval(groupList[[i]]) ]
}
str(out)
#List of 3
# $ :Classes ‘data.table’ and 'data.frame': 6 obs. of 4 variables:
# ..$ A : int [1:6] 1 1 1 2 2 2
# ..$ B : int [1:6] 1 2 3 3 4 5
# ..$ mean: num [1:6] 1.5 3.5 5 6 7.5 9.5
# ..$ sd : num [1:6] 0.707 0.707 NA NA 0.707 ...
# ..- attr(*, ".internal.selfref")=<externalptr>
# $ :Classes ‘data.table’ and 'data.frame': 2 obs. of 3 variables:
# ..$ A : int [1:2] 1 2
# ..$ mean: num [1:2] 3 8
# ..$ sd : num [1:2] 1.58 1.58
# ..- attr(*, ".internal.selfref")=<externalptr>
# $ :Classes ‘data.table’ and 'data.frame': 5 obs. of 3 variables:
# ..$ B : int [1:5] 1 2 3 4 5
# ..$ mean: num [1:5] 1.5 3.5 5.5 7.5 9.5
# ..$ sd : num [1:5] 0.707 0.707 0.707 0.707 0.707
For demonstration, I used the mtcars data set. Here is one way with the dplyr package.
library(dplyr)
# create a vector of functions that you need
describe <- c("mean", "sd", "min", "median", "max")
# group by the variable gear
mtcars %>%
group_by(gear) %>%
summarise_at(vars(mpg), describe)
# group by the variable carb
mtcars %>%
group_by(carb) %>%
summarise_at(vars(mpg), describe)
# group by both gear and carb
mtcars %>%
group_by(gear, carb) %>%
summarise_at(vars(mpg), describe)

Append a data frame to a list

I'm trying to figure out how to add a data.frame or data.table to the first position in a list.
Ideally, I want a list structured as follows:
List of 4
$ :'data.frame': 1 obs. of 3 variables:
..$ a: num 2
..$ b: num 1
..$ c: num 3
$ d: num 4
$ e: num 5
$ f: num 6
Note the data.frame is an object within the structure of the list.
The problem is that I need to add the data frame to the list after the list has been created, and the data frame has to be the first element in the list. I'd like to do this using something simple like append, but when I try:
append(list(1,2,3),data.frame(a=2,b=1,c=3),after=0)
I get a list structured:
str(append(list(1,2,3),data.frame(a=2,b=1,c=3),after=0))
List of 6
$ a: num 2
$ b: num 1
$ c: num 3
$ : num 1
$ : num 2
$ : num 3
It appears that R is coercing the data.frame into a list when I try to append. How do I prevent it from doing so? Or what alternative method might there be for constructing this list, inserting the data.frame into the list in position 1 after the list's initial creation?
The issue you are having is that to put a data frame anywhere into a list as a single list element, it must be wrapped with list(). Let's have a look.
df <- data.frame(1, 2, 3)
x <- as.list(1:3)
If we just wrap with c(), which is what append() is doing under the hood, we get
c(df)
# $X1
# [1] 1
#
# $X2
# [1] 2
#
# $X3
# [1] 3
But if we wrap it in list() we get the desired list element containing the data frame.
list(df)
# [[1]]
# X1 X2 X3
# 1 1 2 3
Therefore, since x is already a list, we will need to use the following construct.
c(list(df), x) ## or append(x, list(df), 0)
# [[1]]
# X1 X2 X3
# 1 1 2 3
#
# [[2]]
# [1] 1
#
# [[3]]
# [1] 2
#
# [[4]]
# [1] 3
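Applied to a list shaped like the one in the question (the element names d, e, f are taken from the desired str() output, the values are assumed), the same construct looks like this:

```r
df <- data.frame(a = 2, b = 1, c = 3)
x <- list(d = 4, e = 5, f = 6)

# Wrap the data frame in list() so it is inserted as one element
res <- append(x, list(df), after = 0)
str(res)
```

The first element of res is the whole data.frame, followed by d, e and f.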
