I have the data frame below:
col1 <- c("A","B", "A")
col2 <- c("C","D","D")
col3 <- c("E","E","E")
col4 <- c("F","F","H")
x <- data.frame(col1,col2,col3,col4)
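The output of the above data frame is:
col1 col2 col3 col4
1 A C E F
2 B D E F
3 A D E H
I want to replace the characters with numbers, as below:
col1 col2 col3 col4
1 1 3 5 6
2 2 4 5 6
3 1 4 5 7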
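Here's a one-liner in base R that works with any number of columns and any names. Nothing is hard-coded, so it works with any x whose columns are factors (the data.frame() default before R 4.0; in R >= 4.0, create x with stringsAsFactors = TRUE):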
> setNames(data.frame(matrix(as.numeric(unlist(x)),ncol=ncol(x))),names(x))
col1 col2 col3 col4
1 1 3 5 6
2 2 4 5 6
3 1 4 5 7
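The one-liner works because unlist() on a data frame whose columns are all factors returns a single factor whose levels are the union of the column levels, so as.numeric() yields one shared set of codes. A minimal sketch of that intermediate step, assuming factor columns (stringsAsFactors = TRUE is set explicitly because it is no longer the default in R >= 4.0; codes, shared_levels and res are illustrative names):

```r
x <- data.frame(col1 = c("A", "B", "A"),
                col2 = c("C", "D", "D"),
                col3 = c("E", "E", "E"),
                col4 = c("F", "F", "H"),
                stringsAsFactors = TRUE)  # assumption: columns must be factors

# unlist() merges the per-column factors into one factor with shared levels
codes <- unlist(x)
shared_levels <- levels(codes)

# as.numeric() on a factor returns its level codes; reshape back column by column
res <- setNames(data.frame(matrix(as.numeric(codes), ncol = ncol(x))), names(x))
```

With character columns instead of factors, as.numeric() would return NA for every value, which is why the factor assumption matters.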
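Or with a magrittr pipe (like the one-liner above, this assumes the columns of x are factors):
library(magrittr)  # provides %>%
x <- x %>%
  unlist %>%
  as.numeric %>%
  matrix(ncol=4) %>%
  data.frame
names(x) <- paste0("col", 1:4)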
x
col1 col2 col3 col4
1 1 3 5 6
2 2 4 5 6
3 1 4 5 7
Here is a solution with base R:
x[] <- match(as.matrix(x), unique(c(as.matrix(x))))
# > x
# col1 col2 col3 col4
# 1 1 3 5 6
# 2 2 4 5 6
# 3 1 4 5 7
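Here is a shorter solution (it relies on the columns being factors, so in R >= 4.0 create x with stringsAsFactors = TRUE):
x[] <- as.integer(unlist(x))
data:
x <- data.frame(col1=c("A","B", "A"), col2=c("C","D","D"), col3=c("E","E","E"), col4=c("F","F","H"), stringsAsFactors = TRUE)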
We can use lapply from base R
x[] <- lapply(x, match, LETTERS)
x
# col1 col2 col3 col4
#1 1 3 5 6
#2 2 4 5 6
#3 1 4 5 8
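Matching against LETTERS gives each letter its position in the alphabet, which is why H becomes 8 here rather than 7. If consecutive codes are wanted instead, one option is to match against the values actually present in the data. A sketch under that assumption (lv is an illustrative name, not part of the answer above):

```r
x <- data.frame(col1 = c("A", "B", "A"),
                col2 = c("C", "D", "D"),
                col3 = c("E", "E", "E"),
                col4 = c("F", "F", "H"),
                stringsAsFactors = FALSE)

# Lookup vector: the sorted distinct values found anywhere in x
lv <- sort(unique(unlist(lapply(x, as.character))))

# Replace every column by the position of its values in the lookup vector
x[] <- lapply(x, match, lv)
```

Because the columns are passed through as.character() first, this version does not depend on whether x holds factors or character strings.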
I have a list with two dataframes, the first of which has two columns and the second of which has three.
dat.list<-list(dat1=data.frame(col1=c(1,2,3),
col2=c(10,20,30)),
dat2= data.frame(col1=c(5,6,7),
col2=c(30,40,50),
col3=c(7,8,9)))
# $dat1
# col1 col2
# 1 1 10
# 2 2 20
# 3 3 30
# $dat2
# col1 col2 col3
# 1 5 30 7
# 2 6 40 8
# 3 7 50 9
I am trying to create a new column in both dataframes using map(), mutate() and case_when(). I want this new column to be identical to col3 if the dataframe has more than two columns, and identical to col1 if it has two or fewer columns. I have tried to do this with the following code:
library(tidyverse)
dat.list %>% map(~ .x %>%
mutate(newcol=case_when(ncol(.)>2 ~ col3,
TRUE ~ col1),
))
However, this returns the following error: "object 'col3' not found". How can I get the desired output? Below is the exact output I am trying to achieve.
# $dat1
# col1 col2 newcol
# 1 1 10 1
# 2 2 20 2
# 3 3 30 3
# $dat2
# col1 col2 col3 newcol
# 1 5 30 7 7
# 2 6 40 8 8
# 3 7 50 9 9
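A plain if/else will do. case_when() evaluates every right-hand side, so col3 is looked up even for the data frame that lacks it; if/else evaluates only the selected branch: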
library(dplyr)
library(purrr)
dat.list %>% map(~ .x %>% mutate(newcol= if(ncol(.) > 2) col3 else col1))
#$dat1
# col1 col2 newcol
#1 1 10 1
#2 2 20 2
#3 3 30 3
#$dat2
# col1 col2 col3 newcol
#1 5 30 7 7
#2 6 40 8 8
#3 7 50 9 9
Base R using lapply :
lapply(dat.list, function(x) transform(x, newcol = if(ncol(x) > 2) col3 else col1))
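Both versions above decide by column count. A variant that tests for the column by name may be more robust when a data frame has three columns but none called col3. This is only a sketch of that idea in base R, and the name-based check is an assumption about what is wanted, not part of the original question:

```r
dat.list <- list(dat1 = data.frame(col1 = c(1, 2, 3),
                                   col2 = c(10, 20, 30)),
                 dat2 = data.frame(col1 = c(5, 6, 7),
                                   col2 = c(30, 40, 50),
                                   col3 = c(7, 8, 9)))

res <- lapply(dat.list, function(d) {
  # Only the chosen branch is evaluated, so d$col3 is never touched for dat1
  d$newcol <- if ("col3" %in% names(d)) d$col3 else d$col1
  d
})
```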
Following a question I came across today, I would like to know how I can use bind_rows function in a pipe while avoiding duplication and NA values. Consider I have the following simple tibble:
df <- tibble(
col1 = c(3, 4, 5),
col2 = c(5, 3, 1),
col3 = c(6, 4, 9),
col4 = c(9, 6, 5)
)
I would like to bind col1 & col2 row-wise with col3 & col4 so that I have a tibble with 2 columns and 6 observations, and in the end rename the columns to newcol1 and newcol2.
But when I use bind_rows I get the following output, with a lot of duplication and NA values.
df %>%
bind_rows(
select(., 1:2),
select(., 3:4)
)
# A tibble: 9 x 4
col1 col2 col3 col4
<dbl> <dbl> <dbl> <dbl>
1 3 5 6 9
2 4 3 4 6
3 5 1 9 5
4 3 5 NA NA
5 4 3 NA NA
6 5 1 NA NA
7 NA NA 6 9
8 NA NA 4 6
9 NA NA 9 5
# My desired output would be something like this:
f1 <- function(x) {
df <- x %>%
set_names(nm = rep(c("newcol1", "newcol2"), 2))
bind_rows(df[, c(1, 2)], df[, c(3, 4)])
}
f1(df)
# A tibble: 6 x 2
newcol1 newcol2
<dbl> <dbl>
1 3 5
2 4 3
3 5 1
4 6 9
5 4 6
6 9 5
I can get the desired output without a pipe, but I would like to know two things: first, how to use bind_rows in a pipe without getting NA values and duplicates; and second, whether I can use select inside bind_rows, as I remember Hadley Wickham once wrapping filter calls in bind_rows.
I would appreciate any explanation to this problem and thank you in advance.
Select the first two columns, then bind_rows to them a copy of col3 and col4 renamed to col1 and col2 with transmute:
df1 <- df %>%
select(col1, col2) %>%
bind_rows(
df %>%
transmute(col1 = col3, col2 = col4)
)
Results:
# A tibble: 6 x 2
col1 col2
<dbl> <dbl>
1 3 5
2 4 3
3 5 1
4 6 9
5 4 6
6 9 5
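The 9-row result in the question arises because the pipe passes df itself as the first argument of bind_rows, so the full table is bound together with both selections, and the differing column names are filled with NA. The underlying idea (give both halves the same names, then stack them) can also be sketched in base R without a pipe. Here a plain data.frame stands in for the tibble, and new_names and out are illustrative names:

```r
df <- data.frame(col1 = c(3, 4, 5),
                 col2 = c(5, 3, 1),
                 col3 = c(6, 4, 9),
                 col4 = c(9, 6, 5))

new_names <- c("newcol1", "newcol2")

# Rename each pair of columns to the same names, then stack the two halves
out <- rbind(setNames(df[1:2], new_names),
             setNames(df[3:4], new_names))
```

Since both halves share the same column names before stacking, no NA padding is introduced.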
I have a file named 'schema'. Based on the file, I need to rename the columns of other data frames. For example, 'Var1' of table A needs to be renamed to 'Col1'. Similarly, 'VarA' of table B needs to be renamed to 'ColA'. In short, all variables listed in the 'From' column of the schema need to be renamed to the matching value in the 'To' column.
Schema <- read.table(header = TRUE, text =
'Tables From To
A Var1 Col1
A Var2 Col2
A Var3 Col3
B VarA ColA
B VarB ColB
B VarC ColC
')
A <- data.frame(Var1 = 1:3,
Var2 = 2:4,
Var3 = 3:5)
B <- data.frame(VarA = 1:3,
VarB = 2:4,
VarC = 3:5)
We could use match:
lapply(list(A = A, B = B), function(i){
setNames(i, Schema$To[ match(names(i), Schema$From) ])
})
# $A
# Col1 Col2 Col3
# 1 1 2 3
# 2 2 3 4
# 3 3 4 5
#
# $B
# ColA ColB ColC
# 1 1 2 3
# 2 2 3 4
# 3 3 4 5
Or:
Anew <- setNames(A, Schema$To[ match(names(A), Schema$From) ])
Bnew <- setNames(B, Schema$To[ match(names(B), Schema$From) ])
Or list2env:
list2env(lapply(list(A = A, B = B), function(i){
setNames(i, Schema$To[ match(names(i), Schema$From) ])
}), envir = globalenv())
Edit: When a column has no match in Schema, keep its name as is:
list2env(lapply(list(A = A, B = B), function(i){
  # check if there is a match, if not keep name unchanged
  x <- as.character(Schema$To[ match(names(i), Schema$From) ])
  ix <- which(is.na(x))
  x[ ix ] <- names(i)[ ix ]
  # return with updated names
  setNames(i, x)
}), envir = globalenv())
The following code retrieves the table names (A and B) from Schema and does the renaming:
r <- Map(function(v) {
  r <- get(v)
  names(r)[names(r) %in% Schema$From] <- as.character(Schema$To[Schema$From %in% names(r)])
  assign(v,r)},
  as.character(unique(Schema$Tables)))
which gives
> r
$A
Col1 Col2 Col3
1 1 2 3
2 2 3 4
3 3 4 5
$B
ColA ColB ColC
1 1 2 3
2 2 3 4
3 3 4 5
If you don't want result as list, you can do something like
list2env(Map(function(v) {
r <- get(v)
names(r)[names(r) %in% Schema$From] <- as.character(Schema$To[Schema$From %in% names(r)])
assign(v,r)},
as.character(unique(Schema$Tables))),envir = .GlobalEnv)
or
for (v in as.character(unique(Schema$Tables))) {
r <- get(v)
names(r)[names(r) %in% Schema$From] <- as.character(Schema$To[Schema$From %in% names(r)])
assign(v,r)
}
After this, your objects A and B themselves carry the new names:
> A
Col1 Col2 Col3
1 1 2 3
2 2 3 4
3 3 4 5
> B
ColA ColB ColC
1 1 2 3
2 2 3 4
3 3 4 5
lut <- setNames(as.character(Schema$To), Schema$From)
setNames(A, lut[names(A)])
Col1 Col2 Col3
1 1 2 3
2 2 3 4
3 3 4 5
setNames(B, lut[names(B)])
ColA ColB ColC
1 1 2 3
2 2 3 4
3 3 4 5
I am looking for a way to copy a column "col1" x times and appending each of these copies with one of x strings from a character vector. Example:
df <- data.frame(col1 = c(1,2,3,4,5))
suffix <- c("a", "b", "c")
resulting in:
df_suffix <- data.frame(col1 = c(1,2,3,4,5), col1_a = c(1,2,3,4,5), col1_b = c(1,2,3,4,5), col1_c = c(1,2,3,4,5))
col1 col1_a col1_b col1_c
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
You can use paste() to create the new columns inside df, and assign them the values of the first column:
df[,paste(names(df), suffix, sep = "_")] <- df[,1]
# col1 col1_a col1_b col1_c
#1 1 1 1 1
#2 2 2 2 2
#3 3 3 3 3
#4 4 4 4 4
#5 5 5 5 5
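The paste(names(df), suffix, ...) call relies on df having exactly one column; with more columns, the column names would be recycled against suffix. A sketch of a variant that names the source column explicitly and adds the copies in a loop (the extra column other is an assumption added only to illustrate that case):

```r
df <- data.frame(col1 = c(1, 2, 3, 4, 5),
                 other = letters[1:5])
suffix <- c("a", "b", "c")

# Add one copy of col1 per suffix, named col1_<suffix>
for (s in suffix) {
  df[[paste("col1", s, sep = "_")]] <- df[["col1"]]
}
```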
I have a data.frame with several columns I'd like to join into one column in a new data.frame.
df1 <- data.frame(col1 = 1:3, col2 = 4:6, col3 = 7:9)
How would I create a new data.frame with a single column that's 1:9?
Since data.frames are essentially lists of columns, unlist(df1) will give you one large vector of all the values. Now you can simply construct a new data.frame from it:
data.frame(col = unlist(df1))
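Note that unlist() on a data frame returns a named vector (col11, col12, ...), and data.frame() turns those names into row names. If plain 1..9 row names are preferred, one option is to drop the names first. A minimal sketch, where long is an illustrative name:

```r
df1 <- data.frame(col1 = 1:3, col2 = 4:6, col3 = 7:9)

# use.names = FALSE drops the generated names, so the row names default to 1..9
long <- data.frame(col = unlist(df1, use.names = FALSE))
```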
In case you want an indicator too:
stack(df1)
# values ind
# 1 1 col1
# 2 2 col1
# 3 3 col1
# 4 4 col2
# 5 5 col2
# 6 6 col2
# 7 7 col3
# 8 8 col3
# 9 9 col3
Just to provide a complete set of ways to do that, here is the tidyr way (gather() still works, but current tidyr releases mark it as superseded by pivot_longer()).
library(tidyr)
gather(df1)
key value
1 col1 1
2 col1 2
3 col1 3
4 col2 4
5 col2 5
6 col2 6
7 col3 7
8 col3 8
9 col3 9
One more using c function:
data.frame(col11 = c(df1,recursive=TRUE))
col11
col11 1
col12 2
col13 3
col21 4
col22 5
col23 6
col31 7
col32 8
col33 9
You could try:
as.data.frame(as.vector(as.matrix(df1)))
# as.vector(as.matrix(df1))
#1 1
#2 2
#3 3
#4 4
#5 5
#6 6
#7 7
#8 8
#9 9
Another approach, just for using Reduce...
data.frame(Reduce(c, df1))