I have a list with two data sets and I would like to convert each of the columns from character to numeric.
[[1]]
b m
2 12194.0968074593 703.359790781974
[[2]]
b m
2 49.2080763267713 30.9186232579308
> str(tidy_linear_regression)
List of 2
$ :'data.frame': 1 obs. of 2 variables:
..$ b: chr "12194.0968074593"
..$ m: chr "703.359790781974"
$ :'data.frame': 1 obs. of 2 variables:
..$ b: chr "49.2080763267713"
..$ m: chr "30.9186232579308"
I cannot come up with code that gives me back a list of data frames.
I tried the following code, but neither attempt returns what I want:
tidy_linear_regression_new <-
lapply(tidy_linear_regression,
function(x) as.numeric(as.character(x)))
tidy_linear_regression_new<-
sapply(tidy_linear_regression,
as.character)
Since each data frame has multiple columns, you need an lapply inside an lapply:
tidy_linear_regression <- lapply(tidy_linear_regression, function(x) {
x[] <- lapply(x, as.numeric)
x
})
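A self-contained sketch of the nested-lapply approach, rebuilding the two one-row data frames from the `str()` output above. The assignment `x[] <- lapply(x, as.numeric)` replaces the columns while keeping `x` a data frame, which is why the outer `lapply()` returns a list of data frames rather than bare vectors. `stringsAsFactors = FALSE` is added as an assumption so the example behaves the same on R versions before 4.0, where `as.numeric()` on a factor would return level codes.

```r
# Rebuild the example list from the str() output in the question
tidy_linear_regression <- list(
  data.frame(b = "12194.0968074593", m = "703.359790781974", stringsAsFactors = FALSE),
  data.frame(b = "49.2080763267713", m = "30.9186232579308", stringsAsFactors = FALSE)
)

# Outer lapply walks the list; inner lapply converts every column.
# `x[] <-` fills the existing data frame, so its class and row names survive.
tidy_linear_regression <- lapply(tidy_linear_regression, function(x) {
  x[] <- lapply(x, as.numeric)
  x
})

str(tidy_linear_regression)
```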
We could also use the tidyverse:
library(purrr)
library(dplyr)
tidy_linear_regression <- map(tidy_linear_regression, ~ .x %>%
  mutate(across(everything(), as.numeric)))
The situation is the following: I have a list of dataframes, and for each dataframe I have a list of columns whose format I need to change. Setup:
df1 <- data.frame(a = c("2020-03-02", "2020-12-22", "2020-07-03"), b = c(4, 5, 6), c = c("2020-03-13", "2019-11-03", "2011-05-02"))
df2 <- data.frame(d = c(1, 2, 3), e = c("2020-05-21", "2014-08-31", "1999-01-21"), f = c(7, 8, 9))
datasets <- list("first" = df1, "second" = df2)
dates <- list("first" = c("a", "c"), "second" = c("e"))
One could do this by (1) looping over the list of data frames and (2) for each data frame, looping over the columns to change and reassigning them in place. Something like this:
for (i in names(datasets)) {
  for (j in dates[[i]]) {
    datasets[[i]][[j]] <- as.Date(datasets[[i]][[j]])
  }
}
This is ugly, so I wanted to try to do the same using purrr. I thought this would be a good idea:
library(purrr)
walk2(datasets, dates, ~ walk(.x[.y], ~ {.x <- as.Date(.x)}))
But the datasets remain unperturbed after this operation. Why?
Here is a solution that uses purrr and dplyr:
library(purrr)
library(dplyr)
datasets <- datasets %>%
imap(~{
.x %>%
mutate_at(vars(dates[[.y]]), as.Date)
})
str(datasets)
#List of 2
#$ first :'data.frame': 3 obs. of 3 variables:
# ..$ a: Date[1:3], format: "2020-03-02" "2020-12-22" "2020-07-03"
# ..$ b: num [1:3] 4 5 6
# ..$ c: Date[1:3], format: "2020-03-13" "2019-11-03" "2011-05-02"
#$ second:'data.frame': 3 obs. of 3 variables:
# ..$ d: num [1:3] 1 2 3
# ..$ e: Date[1:3], format: "2020-05-21" "2014-08-31" "1999-01-21"
# ..$ f: num [1:3] 7 8 9
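As for why `walk2()` leaves `datasets` untouched: R uses copy-on-modify semantics, so `.x <- as.Date(.x)` inside the lambda only rebinds a local copy, and `walk()`/`walk2()` return their input and ignore what the lambda returns. The fix is to return the modified data frames and assign the result, as the `imap()` answer does. Below is a base-R sketch of the same idea with `Map()`, using the question's setup. `set_to_zero()` is a hypothetical helper that only illustrates the copy semantics, and the sketch assumes `datasets` and `dates` list their elements in the same order, because `Map()` pairs them by position, not by name.

```r
df1 <- data.frame(a = c("2020-03-02", "2020-12-22", "2020-07-03"), b = c(4, 5, 6),
                  c = c("2020-03-13", "2019-11-03", "2011-05-02"), stringsAsFactors = FALSE)
df2 <- data.frame(d = c(1, 2, 3), e = c("2020-05-21", "2014-08-31", "1999-01-21"),
                  f = c(7, 8, 9), stringsAsFactors = FALSE)
datasets <- list("first" = df1, "second" = df2)
dates <- list("first" = c("a", "c"), "second" = c("e"))

# Hypothetical helper: assigning to an argument only changes the local copy
set_to_zero <- function(v) { v <- 0; invisible(NULL) }
y <- 1
set_to_zero(y)
print(y)  # still 1

# Return each modified data frame and assign the whole list back.
# Map() pairs datasets[[i]] with dates[[i]] and keeps the names "first"/"second".
datasets <- Map(function(df, cols) {
  df[cols] <- lapply(df[cols], as.Date)
  df
}, datasets, dates)

str(datasets)
```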
I'm new to R, and I'm trying to work with a data frame:
[screenshot]
How do I get a numeric array from row 10?
ar <- df[10,1] didn't work.
You can use gsub to remove the brackets. Please see the code below:
# Simulation
x <- factor(c("[1]", "[2,3]", "[4]", "[]"))
str(x)
# Factor w/ 4 levels "[]","[1]","[2,3]",..: 2 3 4 1
foobar <- lapply(x, function(el) {
  # remove the brackets
  s <- gsub("\\[|\\]", "", as.character(el))
  # split on commas and convert the pieces to numbers
  as.numeric(unlist(strsplit(s, split = ",")))
})
str(foobar)
Output:
List of 4
$ : num 1
$ : num [1:2] 2 3
$ : num 4
$ : num(0)
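An alternative sketch (a swapped-in technique, not the answer's code) extracts the numbers directly with `regmatches()` and `gregexpr()` instead of stripping brackets and splitting. It assumes the cells look like the simulated values above, i.e. comma-separated numbers inside brackets; the pattern `-?[0-9.]+` is a simple assumption and does not handle scientific notation.

```r
x <- factor(c("[1]", "[2,3]", "[4]", "[]"))
chars <- as.character(x)

# Find every run of digits (optionally signed, with a decimal point) in each cell
matches <- regmatches(chars, gregexpr("-?[0-9.]+", chars))

# Convert each cell's matches to numeric; cells with no match become numeric(0)
foobar2 <- lapply(matches, as.numeric)
str(foobar2)
```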
There is a nice discussion on SO of how to convert characters to numerics here.
Maybe I missed something in that post, but what should one do when it is not known which columns (if any) are "convertible"?
Is it possible to check for convertibility?
In addition, I usually suppress factor conversion (I prefer characters), so characters should stay characters, not factors.
df <- data.frame(a=as.character(c(NA, 1/3)), b=letters[1:2], c=c('1|2', '4|2'), d=as.character(3:4), stringsAsFactors = F)
Then apply some function f to get:
str(f(df))
'data.frame': 2 obs. of 4 variables:
$ a: num NA 0.333
$ b: chr "a" "b"
$ c: chr "1|2" "4|2"
$ d: int 3 4
How can this be achieved for an arbitrary data.frame not known beforehand?
You could do something like this (not very elegant though).
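fun1 <- function(i) {
  # as.numeric() warns "NAs introduced by coercion" on non-numeric strings
  num <- suppressWarnings(as.numeric(df[, i]))
  # keep the conversion if at least one value parsed as a number
  if (!all(is.na(num))) num else df[, i]
}
df1 <- "names<-"(cbind.data.frame(lapply(seq_along(df), fun1),
                                  stringsAsFactors = FALSE), names(df))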
> str(df1)
'data.frame': 2 obs. of 4 variables:
$ a: num NA 0.333
$ b: chr "a" "b"
$ c: chr "1|2" "4|2"
$ d: num 3 4
Or more generally:
convertiblesToNumeric <- function(x) {
  x2 <- cbind.data.frame(lapply(seq_along(x), function(i) {
    # try the conversion quietly; keep it only if some value parsed
    num <- suppressWarnings(as.numeric(x[, i]))
    if (!all(is.na(num))) num else x[, i]
  }), stringsAsFactors = FALSE)
  names(x2) <- names(x)
  x2
}
df1 <- convertiblesToNumeric(df)
> str(df1)
'data.frame': 2 obs. of 4 variables:
$ a: num NA 0.333
$ b: chr "a" "b"
$ c: chr "1|2" "4|2"
$ d: num 3 4
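Base R's `type.convert()` may be a simpler fit: with `as.is = TRUE` it turns each character column into logical, integer, double, or leaves it as character, depending on what every non-missing value parses as. Applied column by column to the example `df`, it should also give the integer `d` the question asks for. `convert_columns()` is a hypothetical wrapper name. One caveat: strings such as "T"/"F"/"TRUE" may be converted to logical, which may or may not be wanted.

```r
df <- data.frame(a = as.character(c(NA, 1/3)), b = letters[1:2],
                 c = c("1|2", "4|2"), d = as.character(3:4),
                 stringsAsFactors = FALSE)

# Hypothetical wrapper: type.convert() picks the narrowest type that fits all
# non-NA values; as.is = TRUE keeps non-convertible columns as character.
convert_columns <- function(x) {
  x[] <- lapply(x, type.convert, as.is = TRUE)
  x
}

str(convert_columns(df))
```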
I'm new to R. I'd like to compute a number of statistics on the numeric columns (say, column C) of a data frame (dt), grouped by combinations of the factor columns (say, columns A and B). First I want the results grouped by both A and B, and then the same operations by A alone and by B alone. My code looks like the one below: I have a list of the factor combinations I'd like to test (groupList), and on each iteration of the loop I pass one element of that list as the argument to "by". As you can surely see, it doesn't work: R doesn't recognize the elements of the list as valid arguments to "by". Any ideas on how to make this work? Any pointer or suggestion is welcome and appreciated.
groupList <- list(".(A, B)", "A", "B")
for (i in 1:length(groupList)) {
  output <- dt[, list(mean = mean(C),
                      sd = sd(C),
                      min = min(C),
                      median = median(C),
                      max = max(C)),
               by = groupList[i]]
  # Here insert code to save each output
}
I think the aggregate function can solve your problem. Say you have a data frame df with three columns A, B, and C, given as:
df<-data.frame(A=rep(letters[1:3],3),B=rep(letters[4:6],each=3),C=1:9)
If you want to calculate the mean of C by factor A, try:
aggregate(formula=C~A,data=df,FUN=mean)
By factor B, try:
aggregate(formula=C~B,data=df,FUN=mean)
By factors A and B, try:
aggregate(formula=C~A+B,data=df,FUN=mean)
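To cover all three groupings from the question in one go, a base-R sketch in the same spirit: collect the groupings as formulas (an assumption about how to translate `groupList`) and let one hypothetical function, `summ_stats()`, return every statistic. When `FUN` returns several values, `aggregate()` stores them as a matrix column named after the response, here `C`.

```r
df <- data.frame(A = rep(letters[1:3], 3), B = rep(letters[4:6], each = 3), C = 1:9)

# Hypothetical helper returning all statistics for one group
summ_stats <- function(v) c(mean = mean(v), sd = sd(v), min = min(v),
                            median = median(v), max = max(v))

# The groupings from the question, expressed as formulas
formulas <- list(C ~ A + B, C ~ A, C ~ B)

# One aggregate() call per grouping; results[[k]]$C is a matrix of statistics
results <- lapply(formulas, function(f) aggregate(f, data = df, FUN = summ_stats))
results[[2]]
```

With `C ~ A + B` every group in this toy data has a single row, so its `sd` is `NA`.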
Your groupList can be restructured as a list of character vectors. Then you can either use lapply or the existing for loop with an added eval() to interpret the by= input properly:
library(data.table)
set.seed(1)
dt <- data.table(A=rep(1:2,each=5), B=rep(1:5,each=2), C=1:10)
groupList <- list(c("A", "B"), c("A"), c("B"))
lapply(
groupList,
function(x) {
dt[, .(mean=mean(C), sd=sd(C)), by=x]
}
)
out <- vector("list", 3)
for(i in 1:length(groupList)){
out[[i]] <- dt[, .(mean=mean(C), sd=sd(C)), by=eval(groupList[[i]]) ]
}
str(out)
#List of 3
# $ :Classes ‘data.table’ and 'data.frame': 6 obs. of 4 variables:
# ..$ A : int [1:6] 1 1 1 2 2 2
# ..$ B : int [1:6] 1 2 3 3 4 5
# ..$ mean: num [1:6] 1.5 3.5 5 6 7.5 9.5
# ..$ sd : num [1:6] 0.707 0.707 NA NA 0.707 ...
# ..- attr(*, ".internal.selfref")=<externalptr>
# $ :Classes ‘data.table’ and 'data.frame': 2 obs. of 3 variables:
# ..$ A : int [1:2] 1 2
# ..$ mean: num [1:2] 3 8
# ..$ sd : num [1:2] 1.58 1.58
# ..- attr(*, ".internal.selfref")=<externalptr>
# $ :Classes ‘data.table’ and 'data.frame': 5 obs. of 3 variables:
# ..$ B : int [1:5] 1 2 3 4 5
# ..$ mean: num [1:5] 1.5 3.5 5.5 7.5 9.5
# ..$ sd : num [1:5] 0.707 0.707 0.707 0.707 0.707
For demonstration, I used the mtcars data set. Here is one way with the dplyr package.
library(dplyr)
# create a vector of functions that you need
describe <- c("mean", "sd", "min", "median", "max")
# group by the variable gear
mtcars %>%
group_by(gear) %>%
summarise_at(vars(mpg), describe)
# group by the variable carb
mtcars %>%
group_by(carb) %>%
summarise_at(vars(mpg), describe)
# group by both gear and carb
mtcars %>%
group_by(gear, carb) %>%
summarise_at(vars(mpg), describe)
Consider the simple example:
library(dplyr)
dat <- data.frame( a = 1, b = 2 )
attr(dat, "myattr") <- "xyz"
dat %>% mutate(c = 3) %>% str()
## 'data.frame': 1 obs. of 3 variables:
## $ a: num 1
## $ b: num 2
## $ c: num 3
So dplyr drops the attribute. Is it possible to force it not to drop it?
More generally: is it possible to force R not to drop attributes when an object's class changes?
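Whether a given dplyr version keeps such attributes may vary, so a version-independent workaround is to copy the custom attributes back after the transformation. The helper `restore_attrs()` below is hypothetical (not part of dplyr or base R), and base R's `transform()` is used only as a stand-in for `mutate()` so the sketch runs without dplyr. Adding a column with `$<-`, by contrast, modifies the existing data frame and keeps its attributes.

```r
dat <- data.frame(a = 1, b = 2)
attr(dat, "myattr") <- "xyz"

# Hypothetical helper: copy every non-structural attribute of `old` onto `new`
restore_attrs <- function(new, old) {
  structural <- c("names", "row.names", "class")
  for (a in setdiff(names(attributes(old)), structural)) {
    attr(new, a) <- attr(old, a)
  }
  new
}

# transform() stands in for mutate() here; any function returning a data frame works
dat2 <- restore_attrs(transform(dat, c = 3), dat)
str(dat2)

# Adding a column with $<- modifies dat itself, so its attributes are kept
dat$c <- 3
attr(dat, "myattr")
```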