Combining two columns with character strings into a new column


Below I have two columns of data (columns 6 and 7) containing genus and species names. I would like to combine these two character columns into a new column holding the combined names.
I am quite new to R and the code below does not work! Thank you for the help, wonderful people of Stack Overflow!
#TRYING TO MIX GENUS & SPECIES COLUMN
accepted_genus <- merged_subsets_2[6]
accepted_species <- merged_subsets_2[7]
accepted_genus
accepted_species
merged_subsets_2%>%
bind_cols(accepted_genus, accepted_species)
merged_subsets_2

We can use str_c from stringr:
library(dplyr)
library(stringr)
df %>%
  mutate(Col3 = str_c(Col1, Col2))
Or with unite from tidyr:
library(tidyr)
df %>%
  unite(Col3, Col1, Col2, sep = "", remove = FALSE)

Please take a look at this if this doesn't answer your question.
df <- data.frame(Col1 = letters[1:2], Col2=LETTERS[1:2]) # Sample data
> df
Col1 Col2
1 a A
2 b B
df$Col3 <- paste0(df$Col1, df$Col2) # Without spacing
> df
Col1 Col2 Col3
1 a A aA
2 b B bB
df$Col3 <- paste(df$Col1, df$Col2)
> df
Col1 Col2 Col3
1 a A a A
2 b B b B
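Applied to the genus/species case from the question, the same paste approach could look like the sketch below. The data frame taxa and its contents are made up for illustration (the original merged_subsets_2 is not shown), so treat the names as assumptions. One thing worth knowing: paste turns a missing value into the literal text "NA", so the sketch blanks out rows where either part is missing.

```r
# Hypothetical stand-in for merged_subsets_2 (the real data is not shown)
taxa <- data.frame(
  accepted_genus   = c("Quercus", "Acer", NA),
  accepted_species = c("robur", "rubrum", "alba"),
  stringsAsFactors = FALSE
)

# Combine the two character columns, separated by a space
taxa$accepted_name <- paste(taxa$accepted_genus, taxa$accepted_species)

# paste() converts NA to the string "NA"; reset incomplete names to a real NA
incomplete <- is.na(taxa$accepted_genus) | is.na(taxa$accepted_species)
taxa$accepted_name[incomplete] <- NA

taxa
```

If the columns are only known by position, as in the question, taxa[[1]] and taxa[[2]] (double brackets) return the column vectors, whereas single brackets return one-column data frames.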

Related

separate a column by a string [special case where the string is not always present]

I have a data frame that looks like this
col1 <- c("test-1", "test-2","test")
col2 <- c(1,2,3)
df <- data.frame(col1,col2)
I would like to separate col1 so that my data looks like this
check1 check2 col2
test 1 1
test 2 2
test NA 3
A function call like this does not work:
separate(df, col1, c("check1,check2"),"-")
Any idea why?

Use fill = 'right' to fill in NAs where a value is missing and to suppress the warning that would otherwise be shown:
col1 <- c("test-1", "test-2","test")
col2 <- c(1,2,3)
df <- data.frame(col1,col2)
library(tidyverse)
df %>% separate(col1, into = c('checkA', 'checkB'), sep = '-', fill = 'right')
#> checkA checkB col2
#> 1 test 1 1
#> 2 test 2 2
#> 3 test <NA> 3
Created on 2021-06-01 by the reprex package (v2.0.0)

Regarding the OP's issue: it is a syntax problem. c("check1,check2") is a single string rather than a vector of two column names, so it should be
c("check1","check2")
separate(df, col1, c("check1","check2"),"-")
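To see the difference between the two spellings without tidyr, here is a small base R sketch. The variable names into_bad, into_good, parts, check1 and check2 are only illustrative; the split itself uses strsplit as a rough stand-in for what separate does, not its actual implementation.

```r
col1 <- c("test-1", "test-2", "test")
col2 <- c(1, 2, 3)

# One string containing a comma versus a vector of two names
into_bad  <- c("check1,check2")
into_good <- c("check1", "check2")
length(into_bad)   # 1
length(into_good)  # 2

# Split on a literal "-" and pick the pieces by position;
# indexing past the end of a vector yields NA, which covers "test"
parts  <- strsplit(col1, "-", fixed = TRUE)
check1 <- vapply(parts, function(p) p[1], character(1))
check2 <- vapply(parts, function(p) p[2], character(1))

data.frame(check1, check2, col2)
```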

Remove a number of character from string in a column

I have a data frame with a column of strings and I would like to remove the first three characters in each of the strings. As in the following example:
From this:
df <- tibble(col1 = c('01_A','02_B', '03_C'))
To this:
df <- tibble(col1 = c('A','B', 'C'))
I have been trying to use the dplyr transmute function but I can't really get it to work.
Any help would be super appreciated!

I think this will work:
library(dplyr)
library(stringr)
df %>%
mutate(col1 = str_remove(col1, "\\d+(_)"))
col1
1 A
2 B
3 C

We could also use substring from base R, since the OP asked for position-based extraction:
df$col1 <- substring(df$col1, 4)
df$col1
#[1] "A" "B" "C"

You can use sub like below
> df %>%
+ mutate(col1 = sub("^.{3}", "", col1))
# A tibble: 3 x 1
col1
<chr>
1 A
2 B
3 C
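The answers above fall into two groups: position-based (substring, "^.{3}") and pattern-based (digits followed by an underscore). They agree on the example data but not when the prefix length varies. The sketch below uses base R only; the extra value '100_D' is an assumption added to show the difference.

```r
x <- c("01_A", "02_B", "03_C", "100_D")  # "100_D" added for illustration

# Position-based: always drops exactly the first three characters
by_position <- substring(x, 4)

# Same idea expressed as a regular expression
by_position_regex <- sub("^.{3}", "", x)

# Pattern-based: drops leading digits plus the underscore, whatever their length
by_pattern <- sub("^\\d+_", "", x)

by_position        # "A" "B" "C" "_D"
by_position_regex  # "A" "B" "C" "_D"
by_pattern         # "A" "B" "C" "D"
```

Which one is right depends on whether the rule is really "three characters" or "everything up to the underscore".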

How to combine all unique values of a dataframe column into a string

I have created a dataframe that looks like this
data <- data.frame(col1,col2,col3)
>data
col1 col2 col3
1 a1 b1 c1
2 a1 b2 c2
3 a1 b3 c3
and would like to transform into
col1 col2 col3
1 a1 b1,b2,b3 c1,c2,c3
It seems that rbind is what I am looking for. But after reading the description, I still have no clue how to implement this.

Create example dataset:
df <- data.frame(
col1 = c("a1","a1","a1"),
col2 = c("b1","b2","b3"),
col3 = c("c1","c2","c3"),
stringsAsFactors = FALSE
)
Short version:
data.frame(lapply(df, function(x) paste(unique(x), collapse=",")))
With explanation and intermediate steps:
#create a custom function to list unique elements as comma separated
myfun <- function(x) {
paste(unique(x), collapse=",")
}
#apply our function to our dataframe's columns
temp <- lapply(df, myfun)
#temp is a list, turn it into a dataframe
result <- data.frame(temp)

Another option would be to use summarise_all
library(dplyr)
df %>% summarise_all(~ paste(unique(.), collapse = ","))
# col1 col2 col3
# 1 a1 b1,b2,b3 c1,c2,c3
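In the example, col1 holds a single value, so collapsing every column gives one row. If col1 had several distinct values and one row per value were wanted, a grouped variant might look like the following base R sketch. The second group "a2" and the helper name collapse_unique are assumptions made for illustration.

```r
# Example data with a second, made-up group "a2"
df <- data.frame(
  col1 = c("a1", "a1", "a2", "a2"),
  col2 = c("b1", "b2", "b3", "b3"),
  col3 = c("c1", "c2", "c3", "c4"),
  stringsAsFactors = FALSE
)

# Collapse the unique values of a vector into one comma-separated string
collapse_unique <- function(x) paste(unique(x), collapse = ",")

# Apply the helper to col2 and col3 within each value of col1
result <- aggregate(df[c("col2", "col3")], by = df["col1"], FUN = collapse_unique)
result
```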

Purify df1 by rows that have no duplicates in df2 based on several columns

I have two data frames, df1 and df2, each with several columns. My goal is to modify df1 such that it contains only rows that have duplicates in df2 based on several columns. Unfortunately, I only found ways to do it based on either one or all columns. Here is an example:
df1 <- data.frame(c(seq(1:5)),
c(letters[1:5]),
c(letters[22:26]))
colnames(df1) <- c("col1", "col2", "col3")
df2 <- data.frame(c(1, 20, 30, 4, 5),
c(letters[1:5]),
c(letters[15:19]))
colnames(df2) <- c("col1", "col2", "col3")
Now, I want to modify df1 such that it contains only rows that have duplicates in df2 based on col1 and col2. Thus, my goal is to get:
> df3
col1 col2 col3
1 1 a v
2 4 d y
3 5 e z

With merge in base R, you can do
merge(df1, df2[, 1:2])
col1 col2 col3
1 1 a v
2 4 d y
3 5 e z
You have to drop the final column of df2 (or keep only the ID columns). By default, only the rows whose IDs match in both data.frames are kept. Also, merge looks for the column names that are common to both data.frames (via intersect) and uses them for the merge operation, which is what we want here, so we don't even have to specify the "by" or "by.x" / "by.y" arguments.

Here is a join option with data.table
library(data.table)
setDT(df1)[df2[1:2], on = .(col1, col2), nomatch = 0]
# col1 col2 col3
#1: 1 a v
#2: 4 d y
#3: 5 e z

A base R solution could be
df1[with(df1,paste0(col1,"_",col2)) %in% with(df2,paste0(col1,"_",col2)),]
modified according to comments by @docendo discimus
Alternative solution by @docendo discimus:
cols <- c("col1", "col2"); df1[Reduce(`&`, Map(`==`, df1[cols], df2[cols])), ]

We can use semi_join from dplyr. df3 is the final output.
library(dplyr)
df3 <- df1 %>% semi_join(df2, by = c("col1", "col2"))
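One point that may help when choosing between these answers: the Reduce/Map variant compares the two data frames row by row at the same position, while the merge, join and paste/%in% variants match on key values wherever they occur. The sketch below shows the difference on a reversed copy of df2; df2_shuffled, positional and by_key are illustrative names, not part of the original answers.

```r
df1 <- data.frame(col1 = 1:5, col2 = letters[1:5], col3 = letters[22:26])
df2 <- data.frame(col1 = c(1, 20, 30, 4, 5), col2 = letters[1:5],
                  col3 = letters[15:19])

# Same rows as df2, in reverse order
df2_shuffled <- df2[5:1, ]
cols <- c("col1", "col2")

# Row-by-row comparison: depends on the row order of both data frames
positional <- df1[Reduce(`&`, Map(`==`, df1[cols], df2_shuffled[cols])), ]

# Key-based comparison: independent of row order
key1 <- paste0(df1$col1, "_", df1$col2)
key2 <- paste0(df2_shuffled$col1, "_", df2_shuffled$col2)
by_key <- df1[key1 %in% key2, ]

nrow(positional)  # 0
by_key$col1       # 1 4 5
```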

How to append two dataframe by overwriting the existing rows

I have a dataframe, say df. I extracted a 5% sample of rows from df into a new dataframe df1 in order to do a few manipulations on it. Now I need to append df1 to df, overwriting the corresponding existing rows, since df1 is a subset of df.
I tried to extract the rows of df that are not present in df1 using
df2 <- subset(df, !(rownames(df) %in% rownames(df1[])))
But this didn't work.
Can anyone help, please?

Save the filter and re-use it like so
set.seed(357)
xy <- data.frame(col1 = letters[1:5], col2 = runif(5))
col1 col2
1 a 0.10728121
2 b 0.05504568
3 c 0.27987766
4 d 0.22486212
5 e 0.65348521
your.condition <- xy$col1 %in% c("c", "d")
newxy1 <- xy[your.condition, ]
newxy1$col2 <- 1:2
xy[your.condition, "col2"] <- newxy1$col2
xy
col1 col2
1 a 0.10728121
2 b 0.05504568
3 c 1.00000000
4 d 2.00000000
5 e 0.65348521

You should always try to make a reproducible example so that it is easy for others to help you.
I have tried to do that with the help of the mtcars dataset:
#Copied mtcars data into df
df = mtcars
# sample 5 rows from df
df1 = df[sample(1:nrow(df), 5), ]
# did few manipulations in the dataset
df1 = df1 * 2
# overwrite the existing rows of df1 as it is a subset of df
df[rownames(df1), ] <- df1
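The row-name approach relies on df1 keeping the row names it had in df. If the data carries an explicit key column instead, a similar overwrite can be sketched with match. The column names id and value and the chosen rows are assumptions for illustration, and the rows are picked deterministically rather than sampled so the result is easy to check.

```r
# Hypothetical data with an explicit key column
df <- data.frame(id = c("a", "b", "c", "d", "e"),
                 value = c(10, 20, 30, 40, 50),
                 stringsAsFactors = FALSE)

# Take a subset and modify it
df1 <- df[df$id %in% c("b", "d"), ]
df1$value <- df1$value * 2

# Find where each subset row lives in df and overwrite those rows
idx <- match(df1$id, df$id)
df[idx, ] <- df1

df$value  # 10 40 30 80 50
```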
