Create new column of character vectors - r

I'm trying to combine two columns of type "character" into a new column. That is,
ColA ColB ColC
"A" "1" c("A", "1")
"B" "2" c("B", "2")
"C" "3" c("C", "3")
I have tried:
df %>%
mutate(ColC = list(ColA, ColB))
and other variants, but none of them work. Does anyone know how to do this?

If a single string per row is enough, a simple paste does the job in this example (note that it gives "A 1", not the vector c("A", "1")):
df=data.frame(colA=c("A","B","C"), colB=c("1","2","3"))
df$ColC=paste(df$colA, df$colB)
df
colA colB ColC
1 A 1 A 1
2 B 2 B 2
3 C 3 C 3

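We can use rowwise. Inside mutate, refer to the columns by name; a bare `.` would be the whole piped data frame, not the current row.
library(tidyverse)
df %>%
rowwise() %>%
mutate(ColC = list(c(ColA, ColB))) %>%
ungroup()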
Or using pmap
df %>%
mutate(ColC = pmap(., ~ c(...)))
data
df <- structure(list(ColA = c("A", "B", "C"), ColB = 1:3),
class = "data.frame", row.names = c(NA, -3L))

If you do not want to use dplyr and a single string per row is enough: df$ColC <- apply(df[, c("ColA", "ColB")], 1, paste, collapse = " ").
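If what you need is the list column from the question (each ColC entry a character vector such as c("A", "1")), base R can build it without dplyr as well. A minimal sketch, assuming the df from the data above (ColB is integer there, so c() coerces it to character):

```r
# Example data, as in the answer above
df <- data.frame(ColA = c("A", "B", "C"), ColB = 1:3)

# mapply() walks both columns in parallel and calls c() on each pair;
# SIMPLIFY = FALSE keeps the result as a list, one vector per row
df$ColC <- mapply(c, df$ColA, df$ColB, SIMPLIFY = FALSE, USE.NAMES = FALSE)

str(df$ColC)
```

Assigning a list of length nrow(df) with $<- stores it as a list column. Map(c, df$ColA, df$ColB) would work too, but it names each element after the ColA value.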

Related

How to replace all values in a column with another value?

Suppose I have a data frame df with two columns:
id category
A 1
B 4
C 3
D 1
I want to replace the numbers in category with the following: 1 = "A", 2 = "B", 3 = "C", 4 = "D".
I.e. the output should be
id category
A A
B D
C C
D A
Does anyone know how to do this?
Here I propose three methods to achieve your goal.
Base R
If you have a named vector of values for conversion, you can use match to find where each value of category occurs in names(vec), and use those positions to pick the replacement values.
vec <- c("1" = "A", "2" = "B", "3" = "C", "4" = "D")
df$category <- vec[match(df$category, names(vec))]
dplyr
Use a case_when statement to match the values in category, and assign new strings to it.
library(dplyr)
df %>% mutate(category = case_when(category == 1 ~ "A",
category == 2 ~ "B",
category == 3 ~ "C",
category == 4 ~ "D",
TRUE ~ NA_character_))
left_join from dplyr
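Or, if you have a data frame with two columns specifying the values for conversion, you can left_join it. Here, the data frame for conversion is created from vec by enframe (from the tibble package), which gives the columns name and value. Because name is character, category is converted to character before the join, and value then replaces the old category:
df %>%
mutate(category = as.character(category)) %>%
left_join(tibble::enframe(vec), by = c("category" = "name")) %>%
select(id, category = value)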
Output
id category
1 A A
2 B D
3 C C
4 D A
Data
df <- structure(list(id = c("A", "B", "C", "D"), category = c(1, 4, 3, 1)),
row.names = c(NA, -4L), class = "data.frame")
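If the codes map directly to letters of the alphabet (1 = "A", 2 = "B", ...), a possible solution is to index into R's built-in LETTERS vector: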
library(tidyverse)
df %>%
mutate(category = LETTERS[category])
#> id category
#> 1 A A
#> 2 B D
#> 3 C C
#> 4 D A
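Another base R option, sketched here under the assumption that category holds the numeric codes 1 to 4 from the question, is factor() with explicit levels and labels. It does not rely on the codes being positions in the alphabet, and any code not listed in levels becomes NA (much like the TRUE ~ NA_character_ branch of case_when above):

```r
df <- data.frame(id = c("A", "B", "C", "D"), category = c(1, 4, 3, 1))

# levels: the codes that may appear; labels: what each code becomes (same order)
df$category <- as.character(factor(df$category,
                                   levels = 1:4,
                                   labels = c("A", "B", "C", "D")))
df
```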

How fill a dataframe from another one in R?

I want to fill df2 with information from df1.
df1 as below
ID Mutation
1 A
2 B
2 C
3 A
df2 as below
ID A B C
1
2
3
For example, if mutation A is found in ID 1, then I want it marked as "Y" in df2.
So the resulting df2 should be:
ID A B C
1  Y
2    Y Y
3  Y
I have hundreds of IDs and more than 20 mutations. How can I efficiently achieve this in R? Thanks!
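Using data.table you can try
library(data.table)
setDT(df)
df2 <- dcast(df, formula = ID ~ Mutation)
# every column except ID is a mutation; NA means it was not found for that ID
cols <- setdiff(names(df2), "ID")
df2[, (cols) := lapply(.SD, function(x) ifelse(is.na(x), "", "Y")), .SDcols = cols]
df2
#Output
   ID A B C
1:  1 Y
2:  2   Y Y
3:  3 Y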
Create a new column with value 'Y' and cast the data in wide format.
library(dplyr)
library(tidyr)
df %>%
mutate(value = 'Y') %>%
pivot_wider(names_from = Mutation, values_from = value, values_fill = '')
# ID A B C
# <int> <chr> <chr> <chr>
#1 1 "Y" "" ""
#2 2 "" "Y" "Y"
#3 3 "Y" "" ""
data
df <- structure(list(ID = c(1L, 2L, 2L, 3L), Mutation = c("A", "B",
"C", "A")), class = "data.frame", row.names = c(NA, -4L))

Count number of occurrences of two column cases

I have a dataframe:
ID col1 col2
1 LOY A
2 LOY B
3 LOY B
4 LOY B
5 LOY A
I want to count the number of occurrences of each unique combination of col1 and col2. The desired result is:
event count
loy-a 2
loy-b 3
How could I do that?
You can also try:
library(dplyr)
#Code
new <- df %>% group_by(event=tolower(paste0(col1,'-',col2))) %>%
summarise(count=n())
Output:
# A tibble: 2 x 2
event count
<chr> <int>
1 loy-a 2
2 loy-b 3
Some data used:
#Data
df <- structure(list(ID = 1:5, col1 = c("LOY", "LOY", "LOY", "LOY",
"LOY"), col2 = c("A", "B", "B", "B", "A")), class = "data.frame", row.names = c(NA,
-5L))
Here is an option where we convert the columns to lower case, count the combinations, and unite 'col1' and 'col2' into a single 'event' column:
library(dplyr)
library(tidyr)
df1 %>%
mutate(across(c(col1, col2), tolower)) %>%
count(col1, col2, name = "count") %>%
unite(event, col1, col2, sep='-')
Output:
# event count
#1 loy-a 2
#2 loy-b 3
This returns the OP's expected output.
Or using base R
with(df1, table(tolower(paste(col1, col2, sep='-'))))
data
df1 <- structure(list(ID = 1:5, col1 = c("LOY", "LOY", "LOY", "LOY",
"LOY"), col2 = c("A", "B", "B", "B", "A")),
class = "data.frame", row.names = c(NA,
-5L))
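If you want the base R result as a data frame with the event/count columns from the question rather than as a table, as.data.frame() can convert it. A sketch using df1 from the data above; naming the table() argument event sets the first column name, and responseName sets the second:

```r
df1 <- data.frame(ID = 1:5, col1 = rep("LOY", 5), col2 = c("A", "B", "B", "B", "A"))

# Build "loy-a", "loy-b", ... and count each distinct event
tab <- table(event = tolower(paste(df1$col1, df1$col2, sep = "-")))

res <- as.data.frame(tab, responseName = "count", stringsAsFactors = FALSE)
res
```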

dplyr mutate to replace specific values in a data frame

I have a data frame that consists of characters "a", "b", "x", "y".
df <- data.frame(v1 = c("a", "b", "x", "y"),
v2 = c("a", "b", "a", "y"))
Now I want to replace all values with the following scheme and also convert the whole data frame to numeric.
"a" -> 0
"b" -> 1
"x" -> 1
"y" -> 2
I know this must somehow be possible with mutate_all, but I cannot figure out how. This is what I tried:
df %>% mutate_all(replace("a", 1)) %>%
mutate_all(is.character, as.numeric)
One solution could be with case_when (funs() is deprecated in current dplyr, so a ~ lambda is used instead):
df %>%
mutate_all(~ case_when(. == "a" ~ 0,
. %in% c("b", "x") ~ 1,
. == "y" ~ 2,
TRUE ~ NA_real_))
# v1 v2
# 1 0 0
# 2 1 1
# 3 1 0
# 4 2 2
Create a named vector with mappings and then subset it using mutate_all
vec <- c(a = 0, b = 1, x = 1, y = 2)
library(dplyr)
df %>% mutate_all(~vec[.])
# v1 v2
#1 0 0
#2 1 1
#3 1 0
#4 2 2
In base R that would be just
df[] <- vec[unlist(df)]
data
df <- data.frame(v1 = c("a", "b", "x", "y"),
v2 = c("a", "b", "a", "y"), stringsAsFactors = FALSE)
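In dplyr 1.0.0 and later, mutate_all() is superseded by across(). A sketch of the named-vector approach with across(), assuming the vec and data above: unname() drops the lookup names that vec[...] would otherwise attach, and as.character() guards against factor columns, where vec[f] would index by the factor's integer codes instead of its labels.

```r
library(dplyr)

vec <- c(a = 0, b = 1, x = 1, y = 2)
df <- data.frame(v1 = c("a", "b", "x", "y"),
                 v2 = c("a", "b", "a", "y"), stringsAsFactors = FALSE)

# Look up every cell of every column in vec by name
out <- df %>% mutate(across(everything(), ~ unname(vec[as.character(.x)])))
out
```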

R paste0 2 columns if not NA

I would like to paste0 two columns if the element in one column is not NA. If the element in one of the columns is NA, then keep only the element of the other column.
structure(list(col1 = structure(1:3, .Label = c("A", "B", "C"),
class = "factor"), col2 = c(1, NA, 3)), .Names = c("col1", "col2"),
class = "data.frame",row.names = c(NA, -3L))
# col1 col2
# 1 A 1
# 2 B NA
# 3 C 3
Desired output:
structure(list(col1 = structure(1:3, .Label = c("A", "B", "C"),
class = "factor"),col2 = c(1, NA, 3), col3 = c("A|1", "B", "C|3")),
.Names = c("col1", "col2", "col3"), row.names = c(NA,-3L),
class = "data.frame")
# col1 col2 col3
#1 A 1 A|1
#2 B NA B
#3 C 3 C|3
You can also do it with regular expressions:
df$col3 <- sub("NA\\||\\|NA", "", with(df, paste0(col1, "|", col2)))
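That is, paste them the regular way and then replace any "NA|" or "|NA" with "". Note that | has to be escaped (written \\| in an R string) because an unescaped | means "OR" in a regular expression; that's why the odd-looking pattern NA\\||\\|NA actually means "NA|" OR "|NA". This assumes no value in either column contains the text "NA" itself.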
As @Roland says, this is easy using ifelse (just translate the mental logic into a series of nested ifelse statements):
x <- transform(x,col3=ifelse(is.na(col1),as.character(col2),
ifelse(is.na(col2),as.character(col1),
paste0(col1,"|",col2))))
Update: as.character is needed when a column is a factor; otherwise ifelse returns the factor's integer codes instead of its labels.
Try:
df$col1 = as.character(df$col1)
df$col3 = with(df, ifelse(is.na(col1),col2, ifelse(is.na(col2), col1, paste0(col1,'|',col2))))
df
col1 col2 col3
1 A 1 A|1
2 B NA B
3 C 3 C|3
You could also do:
library(stringr)
df$col3 <- apply(df, 1, function(x)
paste(str_trim(x[!is.na(x)]), collapse="|"))
df
# col1 col2 col3
#1 A 1 A|1
#2 B NA B
#3 C 3 C|3
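When, as in the example data, only col2 can be NA, a vectorized base R sketch avoids both apply() and regular expressions: build the "|col2" suffix only where col2 is present. This assumes col1 never contains NA; if it can, the nested ifelse approach above handles both columns.

```r
df <- data.frame(col1 = factor(c("A", "B", "C")), col2 = c(1, NA, 3))

# Suffix is "|<col2>" where col2 exists and "" where it is NA
suffix <- ifelse(is.na(df$col2), "", paste0("|", df$col2))
df$col3 <- paste0(df$col1, suffix)  # paste0() uses the factor labels
df
```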
