I have a df where some columns are of type character, and I would like them to be integer values. I know I can change the type using as.integer:
df$i <- as.integer(df$i)
But I would like to have a loop change a bunch of columns instead of having to run the command multiple times. Here's my code so far:
cols_to_change = c(37:49, 53:61)
for(i in cols_to_change)
{
class(df[, i]) <- 'integer'
}
I'm getting an error that a list object can't be converted to type 'integer'. Where am I going wrong here? Is there an easier way to do this using one of the apply functions?
An easier way to do this would be to use dplyr::mutate_at (in dplyr 1.0.0 and later, mutate_at is superseded by mutate() combined with across(), but it still works):
df <- dplyr::mutate_at(df, c(37:49, 53:61), as.integer)
I think purrr::map or lapply offer fairly elegant solutions here (and just say no to for-loops in R if possible):
Let's make you a fake data frame with all character vectors:
> df <- data.frame(let1 = c('a', 'b', 'c'), num1 = c('1', '2', '3'),
let2 = c('d', 'e', 'f'), num2 = c('4', '5', '6'),
num3 = c('7', '8', '9'), let3 = c('g', 'h', 'i'),
stringsAsFactors = FALSE)
> str(df)
'data.frame': 3 obs. of 6 variables:
$ let1: chr "a" "b" "c"
$ num1: chr "1" "2" "3"
$ let2: chr "d" "e" "f"
$ num2: chr "4" "5" "6"
$ num3: chr "7" "8" "9"
$ let3: chr "g" "h" "i"
Then we want to change num1, num2, and num3 into integer vectors (columns 2, 4, and 5). For illustration, copy df to df2 and then use purrr::map. Here I refer to the columns by their column number, but you could also use the names.
> df2 <- df
> df2[, c(2,4,5)] <- purrr::map(df2[, c(2,4,5)], as.integer)
> str(df2)
'data.frame': 3 obs. of 6 variables:
$ let1: chr "a" "b" "c"
$ num1: int 1 2 3
$ let2: chr "d" "e" "f"
$ num2: int 4 5 6
$ num3: int 7 8 9
$ let3: chr "g" "h" "i"
If you don't want to load any other packages, lapply will work:
> df3 <- df
> df3[, c(2,4,5)] <- lapply(df3[, c(2,4,5)], as.integer)
> str(df3)
'data.frame': 3 obs. of 6 variables:
$ let1: chr "a" "b" "c"
$ num1: int 1 2 3
$ let2: chr "d" "e" "f"
$ num2: int 4 5 6
$ num3: int 7 8 9
$ let3: chr "g" "h" "i"
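For completeness, the loop from the question can also be made to work. A likely cause of the error (an assumption, since the question does not show how df was created) is that df[, i] returns a one-column data frame, which is a list, rather than a vector; this happens, for example, when df is a tibble. Extracting the column with [[ ]] always yields the underlying vector. A minimal sketch on hypothetical example data:

```r
# Hypothetical example data: every column starts out as character.
df <- data.frame(let1 = c("a", "b", "c"),
                 num1 = c("1", "2", "3"),
                 num2 = c("4", "5", "6"),
                 stringsAsFactors = FALSE)

cols_to_change <- c(2, 3)

for (i in cols_to_change) {
  # df[[i]] extracts column i as a plain vector, so as.integer() can convert it
  df[[i]] <- as.integer(df[[i]])
}

str(df)
```

Using as.integer() rather than assigning to class() also makes the intent explicit: the values are converted, not just relabelled.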
I have a dataframe such as
df
COL1 COL2
A "[Lasius_niger]"
B "[Canis_lupus,Feis_cattus]"
C "[Cattus_stigmatizans,Cattus_cattus]"
D "[Apis_mellifera]"
and in my code I pass each row of df$COL2 to a command that needs the content to be a list.
So I need to transform df$COL2 into a list column inside the dataframe.
I guess I should get something like this:
COL1 COL2
A "Lasius_niger"
B "Canis_lupus","Feis_cattus"
C "Cattus_stigmatizans","Cattus_cattus"
D "Apis_mellifera"
Does anyone have an idea?
Remove opening and closing square brackets using gsub and split string on comma.
df$COL2 <- strsplit(gsub('\\[|\\]', '', df$COL2), ',')
str(df)
#'data.frame': 4 obs. of 2 variables:
# $ COL1: chr "A" "B" "C" "D"
# $ COL2:List of 4
# ..$ : chr "Lasius_niger"
# ..$ : chr "Canis_lupus" "Feis_cattus"
# ..$ : chr "Cattus_stigmatizans" "Cattus_cattus"
# ..$ : chr "Apis_mellifera"
data
df <- structure(list(COL1 = c("A", "B", "C", "D"), COL2 = c("[Lasius_niger]",
"[Canis_lupus,Feis_cattus]", "[Cattus_stigmatizans,Cattus_cattus]",
"[Apis_mellifera]")), class = "data.frame", row.names = c(NA, -4L))
You can also use the function stri_extract_all_words in the stringi package as follows
df$COL2 <- stringi::stri_extract_all_words(df$COL2)
str(df)
#'data.frame': 4 obs. of 2 variables:
# $ COL1: chr "A" "B" "C" "D"
# $ COL2:List of 4
# ..$ : chr "Lasius_niger"
# ..$ : chr "Canis_lupus" "Feis_cattus"
# ..$ : chr "Cattus_stigmatizans" "Cattus_cattus"
# ..$ : chr "Apis_mellifera"
We can also use str_extract_all
library(stringr)
df$COL2 <- str_extract_all(df$COL2, "\\w+")
Or another option from qdapRegex
library(qdapRegex)
rm_square(df$COL2, extract = TRUE)
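Whichever of these approaches is used, the result is a list column, so each row holds a character vector of varying length. The following is a small sketch of how such a column can then be iterated row by row, using the strsplit approach on a shortened, hypothetical copy of the data:

```r
# Shortened, hypothetical copy of the example data
df <- data.frame(COL1 = c("A", "B"),
                 COL2 = c("[Lasius_niger]", "[Canis_lupus,Feis_cattus]"),
                 stringsAsFactors = FALSE)

# Strip the square brackets, then split each string on commas
df$COL2 <- strsplit(gsub("\\[|\\]", "", df$COL2), ",")

# Number of species per row
lengths(df$COL2)

# df$COL2[[i]] is the character vector stored in row i
for (i in seq_len(nrow(df))) {
  species <- df$COL2[[i]]
  cat(df$COL1[i], ":", paste(species, collapse = " & "), "\n")
}
```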
I want to erase all attributes from data and applied this solution. However neither one_entry() (the original) nor my one_entry2() will work and I don't see why.
one_entry2 <- function(x) {
attr(x, "label") <- NULL
attr(x, "labels") <- NULL
}
> lapply(df1, one_entry2)
$`id`
NULL
$V1
NULL
$V2
NULL
$V3
NULL
How can we do this?
Data:
df1 <- setNames(data.frame(matrix(1:12, 3, 4)),
c("id", paste0("V", 1:3)))
attr(df1$V1, "labels") <- LETTERS[1:4]
attr(df1$V1, "label") <- letters[1:4]
attr(df1$V2, "labels") <- LETTERS[1:4]
attr(df1$V2, "label") <- letters[1:4]
attr(df1$V3, "labels") <- LETTERS[1:4]
attr(df1$V3, "label") <- letters[1:4]
> str(df1)
'data.frame': 3 obs. of 4 variables:
$ id: int 1 2 3
$ V1: int 4 5 6
..- attr(*, "labels")= chr "A" "B" "C" "D"
..- attr(*, "label")= chr "a" "b" "c" "d"
$ V2: int 7 8 9
..- attr(*, "labels")= chr "A" "B" "C" "D"
..- attr(*, "label")= chr "a" "b" "c" "d"
$ V3: int 10 11 12
..- attr(*, "labels")= chr "A" "B" "C" "D"
..- attr(*, "label")= chr "a" "b" "c" "d"
To remove all attributes, how about this
df1[] <- lapply(df1, function(x) { attributes(x) <- NULL; x })
str(df1)
#'data.frame': 3 obs. of 4 variables:
# $ id: int 1 2 3
# $ V1: int 4 5 6
# $ V2: int 7 8 9
# $ V3: int 10 11 12
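As for why one_entry2() returns NULL: an R function returns the value of its last expression, and here that is the assignment attr(x, "labels") <- NULL, whose value is NULL. Returning x explicitly fixes it. A minimal sketch, where one_entry3 is a hypothetical name and the data is a shortened copy of the example:

```r
df1 <- setNames(data.frame(matrix(1:12, 3, 4)),
                c("id", paste0("V", 1:3)))
attr(df1$V1, "labels") <- LETTERS[1:4]
attr(df1$V1, "label") <- letters[1:4]

# Hypothetical corrected version of one_entry2()
one_entry3 <- function(x) {
  attr(x, "label") <- NULL
  attr(x, "labels") <- NULL
  x  # return the modified vector, not the value of the last assignment
}

# df1[] keeps the data.frame structure; plain lapply() would return a list
df1[] <- lapply(df1, one_entry3)
str(df1)
```

Unlike attributes(x) <- NULL, this removes only the two named attributes, so classes such as factor or Date on other columns are left alone.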
Simplifying @maurits-evers's answer a bit:
df1[] <- lapply(df1, as.vector)
str(df1)
#'data.frame': 3 obs. of 4 variables:
# $ id: int 1 2 3
# $ V1: int 4 5 6
# $ V2: int 7 8 9
# $ V3: int 10 11 12
The original answer is by Prof. Brian Ripley in this R-Help post.
In tidyverse world:
df1 <- df1 %>% mutate(across(everything(), as.vector))
With data.table
library(data.table)
# Assuming
# setDT(df1) # or
# df1 <- as.data.table(df1)
df1 <- df1[, lapply(.SD, as.vector)]
Provided all the columns are the same type (as in your example) you can do
df1[] = c(df1, recursive=TRUE)
The PKPDmisc package has a dplyr friendly way to do this:
library(PKPDmisc)
df %>% strip_attributes(c("label", "labels"))
The following is a simple solution (and will not convert a date class to a numeric):
df1 <- data.frame(df1)
For certain situations, a modified version of the answer by @maurits-evers may be useful.
Create a function to remove attributes.
remove_attributes <- function(x) {attributes(x) <- NULL; return(x)}
To remove attributes from one element in a list.
df1$V1 <- remove_attributes(df1$V1)
To remove attributes from all elements in a list.
df1 <- lapply(df1, remove_attributes)
After a previous post regarding coercion of variables into their appropriate format, I realized that the problem is due to unlist():ing, which appears to kill off the object class of variables.
Consider a nested list (myList) of the following structure
> str(myList)
List of 2
$ lst1:List of 3
..$ var1: chr [1:4] "A" "B" "C" "D"
..$ var2: num [1:4] 1 2 3 4
..$ var3: Date[1:4], format: "1999-01-01" "2000-01-01" "2001-01-01" "2002-01-01"
$ lst2:List of 3
..$ var1: chr [1:4] "Q" "W" "E" "R"
..$ var2: num [1:4] 11 22 33 44
..$ var3: Date[1:4], format: "1999-01-02" "2000-01-03" "2001-01-04" "2002-01-05"
which contains different object types (character, numeric and Date) at the lowest level. I've been using
myNewLst <- lapply(myList, function(x) unlist(x,recursive=FALSE))
result <- do.call("rbind", myNewLst)
to get the desired structure of my resulting matrix. However, this yields a coercion into character for all variables, as seen here:
> str(result)
chr [1:2, 1:12] "A" "Q" "B" "W" "C" "E" "D" "R" "1" "11" "2" "22" "3" "33" "4" "44" "10592" "10593" "10957" "10959" "11323" "11326" ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:2] "lst1" "lst2"
..$ : chr [1:12] "var11" "var12" "var13" "var14" ...
After reading a post on a similar issue, I've attempted to utilize do.call("c", x)
myNewLst <- lapply(myList, function(x) do.call("c", x))
result <- do.call("rbind", myNewLst)
Unfortunately, this also results in all variables being characters, just like my first attempt. So my question is: How do I unlist a nested list without losing the object class of my lower-level variables? Are there alternatives which will accomplish the desired result?
Reproducible code for myList:
myList <- list(
"lst1" = list(
"var1" = c("A","B","C","D"),
"var2" = c(1,2,3,4),
"var3" = c(as.Date('1999/01/01'),as.Date('2000/01/01'),as.Date('2001/01/01'),as.Date('2002/01/01'))
),
"lst2" = list(
"var1" = c("Q","W","E","R"),
"var2" = c(11,22,33,44),
"var3" = c(as.Date('1999/01/02'),as.Date('2000/01/03'),as.Date('2001/01/4'),as.Date('2002/01/05'))
)
)
You can use Reduce() or do.call() to combine all of them into one data frame. The code below should work:
Reduce(rbind,lapply(myList,data.frame,stringsAsFactors=F))
var1 var2 var3
1 A 1 1999-01-01
2 B 2 2000-01-01
3 C 3 2001-01-01
4 D 4 2002-01-01
5 Q 11 1999-01-02
6 W 22 2000-01-03
7 E 33 2001-01-04
8 R 44 2002-01-05
Also the class is maintained:
mapply(class,Reduce(rbind,lapply(myList,data.frame,stringsAsFactors=F)))
var1 var2 var3
"character" "numeric" "Date"
If your goal is to convert this list of lists into a single data frame, the following code should work:
result <- data.frame(var1 = unlist(lapply(myList, function(e) e[1]), use.names = FALSE),
var2 = unlist(lapply(myList, function(e) e[2]), use.names = FALSE),
var3 = as.Date(unlist(lapply(myList, function(e) e[3]), use.names = FALSE), origin = "1970-01-01"))
This gives:
> result
var1 var2 var3
1 A 1 1999-01-01
2 B 2 2000-01-01
3 C 3 2001-01-01
4 D 4 2002-01-01
5 Q 11 1999-01-02
6 W 22 2000-01-03
7 E 33 2001-01-04
8 R 44 2002-01-05
Of course, you could use a for-loop to make the code more succinct if there are multiple variables in each list.
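The first answer mentions do.call() as an alternative to Reduce() but only shows the latter. Here is a sketch of the do.call() variant on a shortened, hypothetical copy of myList. The classes survive because each inner list becomes a data frame first and rbind() then combines column by column, whereas unlist() coerces everything to one common type and drops attributes such as the Date class.

```r
# Shortened, hypothetical copy of myList
myList <- list(
  lst1 = list(var1 = c("A", "B"),
              var2 = c(1, 2),
              var3 = as.Date(c("1999-01-01", "2000-01-01"))),
  lst2 = list(var1 = c("Q", "W"),
              var2 = c(11, 22),
              var3 = as.Date(c("1999-01-02", "2000-01-03")))
)

# Turn each inner list into a data frame, then bind them all in one call
dfs <- lapply(myList, data.frame, stringsAsFactors = FALSE)
result <- do.call(rbind, dfs)
rownames(result) <- NULL  # drop the "lst1.1"-style row names

sapply(result, class)
```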
I have a nested named list in R and given a name, I want to check whether that's present in the names of that nested list.
For level 1 depth, given_name %in% names(list) works fine. But how do I search for names at different levels?
For example:
list(a = 1, b = 1, c = list(c_a = 2, c_b = 3)). How do I check whether c$c_a is in the list?
I. Creating Nested List
Your_list <- list(a=list(x=c(4,5)),b=list(c=list(y=c(8,99)),d=c("a","b")))
names(Your_list)
# [1] "a" "b"
names(unlist(Your_list))
# [1] "a.x1" "a.x2" "b.c.y1" "b.c.y2" "b.d1" "b.d2"
str(Your_list)
# List of 2
# $ a:List of 1
# ..$ x: num [1:2] 4 5
# $ b:List of 2
# ..$ c:List of 1
# .. ..$ y: num [1:2] 8 99
# ..$ d: chr [1:2] "a" "b"
II. Removing Nesting from the list
New_list <- unlist(Your_list)
New_list
# a.x1 a.x2 b.c.y1 b.c.y2 b.d1 b.d2
# "4" "5" "8" "99" "a" "b"
class(New_list)
# [1] "character"
str(New_list)
# Named chr [1:6] "4" "5" "8" "99" "a" "b"
# - attr(*, "names")= chr [1:6] "a.x1" "a.x2" "b.c.y1" "b.c.y2" ...
III. Converting it to list without nesting
New_list <- as.list(New_list)
New_list
# $a.x1
# [1] "4"
# $a.x2
# [1] "5"
# $b.c.y1
# [1] "8"
# $b.c.y2
# [1] "99"
# $b.d1
# [1] "a"
# $b.d2
# [1] "b"
class(New_list)
# [1] "list"
str(New_list)
# List of 6
# $ a.x1 : chr "4"
# $ a.x2 : chr "5"
# $ b.c.y1: chr "8"
# $ b.c.y2: chr "99"
# $ b.d1 : chr "a"
# $ b.d2 : chr "b"
IV. Accessing elements from Flat list New_list by names
New_list$a.x1
# [1] "4"
New_list$a.x2
# [1] "5"
New_list$b.d2
# [1] "b"
New_list$b.c.y2
# [1] "99"
Note: the class of the elements is not preserved in the flattened list, because unlist() coerces everything to a common type.
As you can see, all of them end up as character.
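If only the presence of a name matters, the list does not need to be flattened at all, which also avoids the coercion. Below is a sketch of a recursive check; has_nested_name is a hypothetical helper, not a base R function.

```r
Your_list <- list(a = list(x = c(4, 5)),
                  b = list(c = list(y = c(8, 99)), d = c("a", "b")))

# Hypothetical helper: TRUE if `name` occurs among the names of `x`
# or of any list nested inside it, at any depth.
has_nested_name <- function(x, name) {
  if (!is.list(x)) return(FALSE)          # atomic vectors are leaves
  if (name %in% names(x)) return(TRUE)    # found at this level
  any(vapply(x, has_nested_name, logical(1), name = name))
}

has_nested_name(Your_list, "y")  # nested two levels down
has_nested_name(Your_list, "z")  # not present anywhere
```

To test for a full path instead, the dotted names produced by flattening can be used, e.g. any(startsWith(names(unlist(Your_list)), "b.c.y")).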
This has all the signs of being something that's so trivially stupid that I'll regret asking it in a public forum, but I've now stumped a few people on it so c'est la vie.
I'm running the following block of code, and not getting the result that I expect:
zz <- list(a=list('a', 'b', 'c', 'd'), b=list('f', 'g', '2', '1'),
c=list('t', 'w', 'x', '6'))
padMat <- do.call('cbind', zz)
headMat <- matrix(c(colnames(padMat), rep('foo', ncol(padMat))), nrow=2, byrow=TRUE)
rbind(headMat, padMat)
I had expected:
a b c
foo foo foo
a f t
b g w
c 2 x
d 1 6
Instead I'm getting:
a b c
a f t
b g w
c 2 x
d 1 6
NULL NULL NULL
It appears that it's filling in the upper part of the rbind by row, and then adding a row of NULL values at the end.
A couple of notes:
This works AOK as long as headMat is a single row
To double check, I also got rid of the dimnames for padMat, this wasn't affecting things
Another thought was that it somehow had to do with the byrow=TRUE, but the same behavior happens if you take that out
padMat is a list (with a dim attribute), not what you usually think of as a matrix.
> padMat <- do.call('cbind', zz)
> str(padMat)
List of 12
$ : chr "a"
$ : chr "b"
$ : chr "c"
$ : chr "d"
$ : chr "f"
$ : chr "g"
$ : chr "2"
$ : chr "1"
$ : chr "t"
$ : chr "w"
$ : chr "x"
$ : chr "6"
- attr(*, "dim")= int [1:2] 4 3
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:3] "a" "b" "c"
I suspect you want something like:
> padMat <- do.call(cbind,lapply(zz,c,recursive=TRUE))
> str(padMat)
chr [1:4, 1:3] "a" "b" "c" "d" "f" "g" "2" "1" "t" "w" ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:3] "a" "b" "c"
The lesson here is, "str is your friend." :)
The problem appears to stem from the fact that padMat is a strange matrix. R reports that is a list of 12 with dimensions:
R> str(padMat)
List of 12
$ : chr "a"
$ : chr "b"
$ : chr "c"
$ : chr "d"
$ : chr "f"
$ : chr "g"
$ : chr "2"
$ : chr "1"
$ : chr "t"
$ : chr "w"
$ : chr "x"
$ : chr "6"
- attr(*, "dim")= int [1:2] 4 3
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:3] "a" "b" "c"
That appears to be the source of the problem, as recasting as a matrix works:
R> rbind(headMat, matrix(unlist(padMat), ncol = 3))
[,1] [,2] [,3]
[1,] "a" "b" "c"
[2,] "foo" "foo" "foo"
[3,] "a" "f" "t"
[4,] "b" "g" "w"
[5,] "c" "2" "x"
[6,] "d" "1" "6"
Others have correctly pointed out the fact that padMat had mode list, which if you look at the docs for rbind and cbind, is bad:
In the default method, all the vectors/matrices must be atomic (see vector) or lists.
That's why the do.call works, since the elements of zz are themselves lists. If you change the definition of zz to the following:
zz <- list(a=c('a', 'b', 'c', 'd'), b=c('f', 'g', '2', '1'),
c=c('t', 'w', 'x', '6'))
the code works as expected.
More insight can be had, I think, from this nugget also in the docs for rbind and cbind:
The type of a matrix result determined from the highest type of any of the inputs
in the hierarchy raw < logical < integer < real < complex < character < list .
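The effect of this hierarchy can be checked directly with typeof(). The sketch below reuses zz from the question and, as one possible fix, unlists each element before binding so that the result is an ordinary character matrix.

```r
zz <- list(a = list('a', 'b', 'c', 'd'), b = list('f', 'g', '2', '1'),
           c = list('t', 'w', 'x', '6'))

# Binding lists yields a matrix of type "list"
padMat_list <- do.call(cbind, zz)
typeof(padMat_list)

# integer < character in the hierarchy, so the result is character
typeof(rbind(1L, "a"))

# Unlist each element first to get a plain character matrix
padMat <- do.call(cbind, lapply(zz, unlist))
typeof(padMat)

headMat <- matrix(c(colnames(padMat), rep('foo', ncol(padMat))),
                  nrow = 2, byrow = TRUE)
out <- rbind(headMat, padMat)
out
```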