I'm trying to rename the columns of a matrix that has no names, using dplyr:
set.seed(1234)
v1 <- table(round(runif(50,0,10)))
v2 <- table(round(runif(50,0,10)))
library(dplyr)
bind_rows(v1,v2) %>%
t
[,1] [,2]
0 3 4
1 1 9
2 8 6
3 11 7
5 7 8
6 7 1
7 3 4
8 6 3
9 3 6
10 1 NA
4 NA 2
I usually use rename for that, in the form rename(new_name = old_name); however, since there is no old_name here, it doesn't work. I've tried:
rename("v1","v2")
rename(c("v1","v2"))
rename(v1=1, v2=2)
rename(v1=[,1],v2=[,v2])
rename(v1="[,1]",v2="[,v2]")
rename_(.dots = c("v1","v2"))
setNames(c("v1","v2"))
None of these works.
I know the base R way to do it (colnames(obj) <- c("v1","v2")), but I'm specifically looking for a dplyr way to do it.
One way is with magrittr::set_colnames:
library(dplyr)
bind_rows(v1,v2) %>%
t %>%
magrittr::set_colnames(c("new1", "new2"))
In order to use rename you need some kind of list-like object (such as a data frame or a tibble). So you can do one of two things: either convert to a tibble and use rename, or use colnames and leave the structure as is, i.e.
new_d <- bind_rows(v1,v2) %>%
t() %>%
as_tibble() %>%
rename('A' = 'V1', 'B' = 'V2')
#where
str(new_d)
#Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 11 obs. of 2 variables:
# $ A: int 3 1 8 11 7 7 3 6 3 1 ...
# $ B: int 4 9 6 7 8 1 4 3 6 NA ...
Or
new_d1 <- bind_rows(v1,v2) %>%
t() %>%
`colnames<-`(c('A', 'B'))
#where
str(new_d1)
# int [1:11, 1:2] 3 1 8 11 7 7 3 6 3 1 ...
# - attr(*, "dimnames")=List of 2
# ..$ : chr [1:11] "0" "1" "2" "3" ...
# ..$ : chr [1:2] "A" "B"
Related
I've got a data.frame which contains a character variable and multiple numeric variables, something like this:
sampleDF <- data.frame(a = c(1,2,3,"String"), b = c(1,2,3,4), c= c(5,6,7,8), stringsAsFactors = FALSE)
Which looks like this:
a b c
1 1 1 5
2 2 2 6
3 3 3 7
4 String 4 8
I'd like to transpose this data.frame and get it to look like this:
V1 V2 V3 V4
1 1 2 3 String
2 1 2 3 4
3 5 6 7 8
I tried
c<-t(sampleDF)
as well as
d<-transpose(sampleDF)
but both of these methods result in V1, V2 and V3 being of character type, despite containing only numeric values.
I know that this has already been asked multiple times. However, I haven't found a suitable answer for why, in this case, V1, V2 and V3 are also converted to character.
Is there any way to ensure that these columns stay numeric?
Thanks a lot, and apologies in advance for the somewhat duplicate nature of this question.
EDIT:
as.data.frame(t(sampleDF))
Does not solve the problem:
'data.frame': 3 obs. of 4 variables:
$ V1: Factor w/ 2 levels "1","5": 1 1 2
..- attr(*, "names")= chr "a" "b" "c"
$ V2: Factor w/ 2 levels "2","6": 1 1 2
..- attr(*, "names")= chr "a" "b" "c"
$ V3: Factor w/ 2 levels "3","7": 1 1 2
..- attr(*, "names")= chr "a" "b" "c"
$ V4: Factor w/ 3 levels "4","8","String": 3 1 2
..- attr(*, "names")= chr "a" "b" "c"
t() returns a matrix, which can hold only a single type, so every value gets coerced to character. After transposing, convert the columns back with type.convert:
out <- as.data.frame(t(sampleDF), stringsAsFactors = FALSE)
out[] <- lapply(out, type.convert, as.is = TRUE)
row.names(out) <- NULL
out
# V1 V2 V3 V4
#1 1 2 3 String
#2 1 2 3 4
#3 5 6 7 8
str(out)
#'data.frame': 3 obs. of 4 variables:
# $ V1: int 1 1 5
# $ V2: int 2 2 6
# $ V3: int 3 3 7
# $ V4: chr "String" "4" "8"
Or rbind the first column (its elements converted to their respective types) with the transposed remaining columns:
rbind(lapply(sampleDF[,1], type.convert, as.is = TRUE),
as.data.frame(t(sampleDF[2:3])))
NOTE: The first method would be more efficient
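If you want to check that claim on your own data, here is a quick sketch with the microbenchmark package (assuming it is installed; timings will depend on the size of the real data):
library(microbenchmark)
microbenchmark(
  type_convert_all = {
    out <- as.data.frame(t(sampleDF), stringsAsFactors = FALSE)
    out[] <- lapply(out, type.convert, as.is = TRUE)
  },
  rbind_first_col = rbind(lapply(sampleDF[, 1], type.convert, as.is = TRUE),
                          as.data.frame(t(sampleDF[2:3])))
)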
Or another approach would be to paste the values of each column together and then read the result back in with read.table:
read.table(text=paste(sapply(sampleDF, paste, collapse=" "),
collapse="\n"), header = FALSE, stringsAsFactors = FALSE)
# V1 V2 V3 V4
#1 1 2 3 String
#2 1 2 3 4
#3 5 6 7 8
Or we can convert the data.frame with data.matrix, which turns the character elements into NA, and then use is.na to find the positions of those NAs so they can be replaced with the original string values:
m1 <- data.matrix(sampleDF)
out <- as.data.frame(t(m1))
out[is.na(out)] <- sampleDF[is.na(m1)]
Or another option is type_convert from readr
library(dplyr)
library(readr)
sampleDF %>%
t %>%
as_data_frame %>%
type_convert
# A tibble: 3 x 4
# V1 V2 V3 V4
# <int> <int> <int> <chr>
#1 1 2 3 String
#2 1 2 3 4
#3 5 6 7 8
I have two data frames:
template - the data frame whose data types I will use.
df - the data frame whose data types I want to change, based on template.
Let's suppose I have the data frame below, which I am using as the template.
> template
id <- c(1,2,3,4)
a <- c(1,4,5,6)
b <- as.character(c(0,1,1,4))
c <- as.character(c(0,1,1,0))
d <- c(0,1,1,0)
template <- data.frame(id,a,b,c,d, stringsAsFactors = FALSE)
> str(template)
'data.frame': 4 obs. of 5 variables:
$ id: num 1 2 3 4
$ a : num 1 4 5 6
$ b : chr "0" "1" "1" "4"
$ c : chr "0" "1" "1" "0"
$ d : num 0 1 1 0
I am looking for the following:
The data types of df should be cast to exactly match those of template.
The output should have the same columns as template.
Note: columns that are in template but not in df should be added, filled with NA.
> df
id <- c(6,7,12,14,1,3,4,4)
a <- c(0,1,13,1,3,4,5,6)
b <- c(1,4,12,3,4,5,6,7)
c <- c(0,0,13,3,4,45,6,7)
e <- c(0,0,13,3,4,45,6,7)
df <- data.frame(id,a,b,c,e)
> str(df)
'data.frame': 8 obs. of 5 variables:
$ id: num 6 7 12 14 1 3 4 4
$ a : num 0 1 13 1 3 4 5 6
$ b : num 1 4 12 3 4 5 6 7
$ c : num 0 0 13 3 4 45 6 7
$ e : num 0 0 13 3 4 45 6 7
Desired output:
> output
id a b c d
1 6 0 1 0 NA
2 7 1 4 0 NA
3 12 13 12 13 NA
4 14 1 3 3 NA
5 1 3 4 4 NA
6 3 4 5 45 NA
7 4 5 6 6 NA
8 4 6 7 7 NA
> str(output)
'data.frame': 8 obs. of 5 variables:
$ id: num 6 7 12 14 1 3 4 4
$ a : num 0 1 13 1 3 4 5 6
$ b : chr "1" "4" "12" "3" ...
$ c : chr "0" "0" "13" "3" ...
$ d : logi NA NA NA NA NA NA ...
My attempts:
template <- fread("template.csv", header = TRUE, stringsAsFactors = FALSE)
n <- names(template)
template[,(n) := lapply(.SD,function(x) gsub("[^A-Za-z0-90 _/.-]","", as.character(x)))]
n <- names(df)
df[,(n) := lapply(.SD,function(x) gsub("[^A-Za-z0-90 _/.-]","", as.character(x)))]
output <- rbindlist(list(template,df),use.names = TRUE,fill = TRUE,idcol="template")
After this I write the output data frame with write.csv and then re-read it, hoping to recover the data types, but I keep messing up the types (roughly, the round-trip looks like the sketch below). Please suggest an appropriate way to deal with this.
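A sketch of that round-trip (the file name here is just an assumption); fread guesses the column types again from the text, so the template's types are lost:
write.csv(output, "output.csv", row.names = FALSE)
output <- fread("output.csv", header = TRUE, stringsAsFactors = FALSE)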
I'd do
res = data.frame(
lapply(setNames(,names(template)), function(x)
if (x %in% names(df)) as(df[[x]], class(template[[x]]))
else template[[x]][NA_integer_]
), stringsAsFactors = FALSE)
or with magrittr
library(magrittr)
setNames(, names(template)) %>%
lapply(. %>% {
if (. %in% names(df)) as(df[[.]], class(template[[.]]))
else template[[.]][NA_integer_]
}) %>% data.frame(stringsAsFactors = FALSE)
Verifying with str(res):
'data.frame': 8 obs. of 5 variables:
$ id: num 6 7 12 14 1 3 4 4
$ a : num 0 1 13 1 3 4 5 6
$ b : chr "1" "4" "12" "3" ...
$ c : chr "0" "0" "13" "3" ...
$ d : num NA NA NA NA NA NA NA NA
I'd suggest looking at the vetr package if you're going to be doing a lot of stuff like this. It has a good approach to templates for data frames and their columns.
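For example, a minimal sketch, assuming vetr is installed and that its alike() function is what you want for comparing an object's structure against a template:
library(vetr)
alike(template, df)  # TRUE, or a message describing how df's structure differs from template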
Here's some code that does what you want.
require(tidyverse)
new_types <-
map_df(template, class) %>%
t %>%
as.data.frame(stringsAsFactors = F) %>%
rownames_to_column %>%
setNames(c('col', 'type'))
new_data <- df %>%
gather(col, value) %>%
right_join(new_types, by='col') %>%
group_by(col) %>%
mutate(rownum = row_number()) %>%
ungroup %>%
complete(col, rownum=1:max(rownum)) %>%
group_by(col) %>%
summarize(val = list(value), type=first(type)) %>%
mutate(new_val = map2(val, type, ~as(.x, .y, strict = T))) %>%
select(col, new_val) %>%
spread(col, new_val) %>%
unnest
The main idea here is to use map2() from the purrr package to apply the as() function from base R. as() takes an object (e.g. a vector or a column from a data frame) and a character string describing a new type, and returns the coerced object. This is the core capability that you need.
My new_types data frame just lists the column names of the template and the (character-string) name of each column's type.
Except for the map2() line, everything else is mucky data wrangling that could probably be improved.
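For instance, the coercion step on its own looks roughly like this (a small sketch using template and df from the question):
library(purrr)
# coerce the shared columns of df to the classes of the matching template columns
shared <- intersect(names(template), names(df))
map2(df[shared], map_chr(template[shared], class), ~ as(.x, .y))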
Some key features:
right_join here is essential to keep only the columns you want.
the lines from mutate(rownum = row_number()) to complete(col, rownum=1:max(rownum)) are necessary only when template has columns that aren't in df; they make sure that those columns end up with the same number of NAs as the other columns have values.
I remember a comment on r-help in 2001 saying that drop = TRUE in [.data.frame was the worst design decision in R history.
dplyr corrects that and does not drop implicitly. When converting old code to dplyr style, this introduces some nasty bugs wherever d[, 1] or d[1] is assumed to be a vector.
My current workaround uses unlist, as shown below, to obtain a plain vector from the one-column result. Any better ideas?
library(dplyr)
d2 = data.frame(x = 1:5, y = (1:5) ^ 2)
str(d2[,1]) # implicit drop = TRUE
# int [1:5] 1 2 3 4 5
str(d2[,1, drop = FALSE])
# 'data.frame': 5 obs. of 1 variable:
# $ x: int 1 2 3 4 5
# With dplyr functions
d1 = data_frame(x = 1:5, y = x ^ 2)
str(d1[,1])
# Classes ‘tbl_df’ and 'data.frame': 5 obs. of 1 variable:
# $ x: int 1 2 3 4 5
str(unlist(d1[,1]))
# This ugly construct gives the same as str(d2[,1])
str(d1[,1][[1]])
You can just use the [[ extract function instead of [.
d1[[1]]
## [1] 1 2 3 4 5
If you use a lot of piping with dplyr, you may also want to use the convenience functions extract and extract2 from the magrittr package:
d1 %>% magrittr::extract(1) %>% str
## Classes ‘tbl_df’ and 'data.frame': 5 obs. of 1 variable:
## $ x: int 1 2 3 4 5
d1 %>% magrittr::extract2(1) %>% str
## int [1:5] 1 2 3 4 5
Or if extract is too verbose for you, you can just use [ directly in the pipe:
d1 %>% `[`(1) %>% str
## Classes ‘tbl_df’ and 'data.frame': 5 obs. of 1 variable:
## $ x: int 1 2 3 4 5
d1 %>% `[[`(1) %>% str
## int [1:5] 1 2 3 4 5
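As a side note beyond the original answer, newer versions of dplyr also provide pull(), which does the same extraction inside a pipe:
d1 %>% pull(x) %>% str
## int [1:5] 1 2 3 4 5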
I have the following code comparing na.omit and complete.cases:
> mydf
AA BB
1 2 2
2 NA 5
3 6 8
4 5 NA
5 9 6
6 NA 1
>
>
> na.omit(mydf)
AA BB
1 2 2
3 6 8
5 9 6
>
> mydf[complete.cases(mydf),]
AA BB
1 2 2
3 6 8
5 9 6
>
> str(na.omit(mydf))
'data.frame': 3 obs. of 2 variables:
$ AA: int 2 6 9
$ BB: int 2 8 6
- attr(*, "na.action")=Class 'omit' Named int [1:3] 2 4 6
.. ..- attr(*, "names")= chr [1:3] "2" "4" "6"
>
>
> str(mydf[complete.cases(mydf),])
'data.frame': 3 obs. of 2 variables:
$ AA: int 2 6 9
$ BB: int 2 8 6
>
> identical(na.omit(mydf), mydf[complete.cases(mydf),])
[1] FALSE
Are there any situations where one or the other should be preferred, or are they effectively the same?
It is true that na.omit and complete.cases are functionally the same when complete.cases is applied to all columns of your object (e.g. data.frame):
R> all.equal(na.omit(mydf),mydf[complete.cases(mydf),],check.attributes=F)
[1] TRUE
But I see two fundamental differences between these two functions (there may well be others). First, na.omit adds an na.action attribute to the object, providing information about how the data was modified with respect to missing values. A trivial use case for this would be something like:
foo <- function(data) {
  data <- na.omit(data)
  # the na.action attribute records which rows na.omit dropped
  n <- length(attr(data, "na.action"))
  message(sprintf("Note: %i rows removed due to missing values.", n))
  # do something with data
}
##
R> foo(mydf)
Note: 3 rows removed due to missing values.
where we provide the user with some relevant information. I'm sure a more creative person could find (and probably has found) better uses of the na.action attribute, but you get the point.
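The attribute itself simply records the indices of the dropped rows; for example, with mydf from the question:
R> attr(na.omit(mydf), "na.action")
2 4 6
2 4 6
attr(,"class")
[1] "omit"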
Second, complete.cases allows for partial manipulation of missing values, e.g.
R> mydf[complete.cases(mydf[,1]),]
AA BB
1 2 2
3 6 8
4 5 NA
5 9 6
Depending on what your variables represent, you may feel comfortable imputing values for column BB, but not for column AA, so using complete.cases like this allows you finer control.
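For instance, a small sketch of that workflow (the mean imputation here is purely illustrative, not part of the original answer):
keep <- mydf[complete.cases(mydf[, 1]), ]               # require AA to be observed
keep$BB[is.na(keep$BB)] <- mean(keep$BB, na.rm = TRUE)  # impute the remaining NA in BB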
I want to convert a list of integer matrices to numeric. I know that lapply is not friendly to internal structures, but is there an lapply-based solution?
mtList = list(matrix(sample(1:10),nrow=5),
matrix(sample(1:21),nrow=7))
str(mtList)
# This works, and I could wrap it in a for loop
mtList[[1]][] = as.numeric(mtList[[1]])
mtList[[2]][] = as.numeric(mtList[[2]])
str(mtList)
# But how to use lapply here? Note that the internal
# matrix structure is flattened
mtList1 = lapply(mtList, function(x) x[] <- as.numeric(x))
str(mtList1)
You have to return the value in your lapply workhorse function:
mtList1 = lapply(mtList,function(x) {
x[] = as.numeric(x)
x
})
str(mtList1)
# List of 2
# $ : num [1:5, 1:2] 1 7 6 3 9 10 5 2 8 4
# $ : num [1:7, 1:3] 21 3 15 14 6 4 18 17 9 8 ...
A marginally simpler alternative could be to coerce by multiplying by 1.0 (or arguably, more robustly, raising to the power of 1.0)...
mtList1 <- lapply( mtList , "*" , 1.0 )
str(mtList1)
#List of 2
# $ x: num [1:5, 1:2] 2 8 1 5 7 9 3 10 4 6
# $ x: num [1:7, 1:3] 17 20 9 11 2 18 1 4 12 10 ...
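The "power of 1" variant mentioned above would be the following small sketch; the ^ operator returns a double even when both operands are integer, whereas * only promotes here because the literal 1.0 is a double:
mtList2 <- lapply(mtList, "^", 1.0)
str(mtList2)  # both elements are now num matrices, with dims preserved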