How to split one column to multiple? [duplicate] - r

This question already has answers here:
Split column at delimiter in data frame [duplicate]
(6 answers)
Closed 3 years ago.
May I ask how to split the column of a data frame into multiple columns, for example:
ID value
10.A.S 1
11.A.S 2
12.A.S 3
10.A 4
11.A 5
12.A 6
I want to split the ID column based on the ".", and the expected result should be like:
ID NO. type treatment value
10.A.S 10 A S 1
11.A.S 11 A S 2
12.A.S 12 A S 3
10.A 10 A 4
11.A 11 A 5
12.A 12 A 6
Thank you very much.

One option is separate from tidyr. Its sep argument is interpreted as a regular expression by default. According to ?separate:
sep - If character, is interpreted as a regular expression. The default value is a regular expression that matches any sequence of non-alphanumeric values.
The . is a regex metacharacter that matches any character, so we either escape it (\\.) or place it in square brackets ([.]).
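library(dplyr)
library(tidyr)
# fill = "right" pads IDs with fewer than three pieces (e.g. 10.A) with NA, without a warning
df1 %>%
  separate(ID, into = c("NO.", "type", "treatment"),
           sep = "\\.", remove = FALSE, convert = TRUE, fill = "right")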
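For readers without tidyr, the same split can be sketched in base R. This is a minimal sketch rather than the answer's own method: df1 below is a reconstruction of the question's table (the original never defines it), and the names parts, padded, and result are illustrative.

```r
# Assumed reconstruction of the question's data; not given as code in the original.
df1 <- data.frame(
  ID = c("10.A.S", "11.A.S", "12.A.S", "10.A", "11.A", "12.A"),
  value = 1:6,
  stringsAsFactors = FALSE
)

# An unescaped "." is a regex matching every character, so only empty strings remain.
unescaped <- strsplit("10.A.S", ".")[[1]]

# fixed = TRUE treats "." as a literal dot, which is an alternative to "\\." or "[.]".
parts <- strsplit(df1$ID, ".", fixed = TRUE)

# Indexing past the end of a vector yields NA, which pads the two-piece IDs.
padded <- t(sapply(parts, function(p) p[1:3]))
colnames(padded) <- c("NO.", "type", "treatment")

result <- data.frame(ID = df1$ID, padded, value = df1$value,
                     check.names = FALSE, stringsAsFactors = FALSE)
result[["NO."]] <- as.integer(result[["NO."]])
print(result)
```

Padding on the right mirrors what fill = "right" does in separate: IDs such as 10.A get NA in the treatment column.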

Related

split the lines of a data frame into a variable number of lines based on a character in R [duplicate]

This question already has answers here:
Split delimited strings in a column and insert as new rows [duplicate]
(6 answers)
Split comma-separated strings in a column into separate rows
(6 answers)
Closed 10 months ago.
I have this df:
df = data.frame(ID = c(1,2,3),
A = c("h;d;c", "j;k", "k"))
And I want to get a new df with the rows split on the ";" character, like this:
ID A
1 1 h
2 1 d
3 1 c
4 2 j
5 2 k
6 3 k
I searched for other questions, but their answers require a fixed number of pieces per string. (Split data frame string column into multiple columns)
Thanks for the help!
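One way to get this shape in base R is to split each string and repeat the ID once per piece. The sketch below assumes that is the layout the asker wants; the names pieces and long are illustrative and not from the question.

```r
df <- data.frame(ID = c(1, 2, 3),
                 A = c("h;d;c", "j;k", "k"),
                 stringsAsFactors = FALSE)

# One character vector of pieces per row; the number of pieces may differ by row.
pieces <- strsplit(df$A, ";", fixed = TRUE)

# Repeat each ID as many times as its row has pieces, then flatten the pieces.
long <- data.frame(ID = rep(df$ID, lengths(pieces)),
                   A = unlist(pieces),
                   stringsAsFactors = FALSE)
print(long)
```

With tidyr loaded, separate_rows(df, A, sep = ";") should produce the same rows.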

How to calculate the number of elements in a string [duplicate]

This question already has answers here:
How to calculate the number of occurrence of a given character in each row of a column of strings?
(14 answers)
Closed 1 year ago.
I have many strings whose elements are separated by a hyphen (-):
string<-c("aaa","aaa-bbb","aaa-bbb-ccc","aaa-bbb-ccc-ddd")
I want to calculate the number of elements in each string. The expected vector is
[1] 1 2 3 4
Does this work:
sapply(strsplit(string, split = '-'), length)
[1] 1 2 3 4
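A slightly shorter variant of the same idea, as a sketch: lengths() returns the length of each list element directly, and fixed = TRUE makes the hyphen a literal separator. The name n_elements is illustrative.

```r
string <- c("aaa", "aaa-bbb", "aaa-bbb-ccc", "aaa-bbb-ccc-ddd")

# Split on the literal hyphen, then count the pieces in each element.
n_elements <- lengths(strsplit(string, "-", fixed = TRUE))
print(n_elements)
```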

Split column using regular expressions in R [duplicate]

This question already has answers here:
Split data frame string column into multiple columns
(16 answers)
Closed 2 years ago.
I'm trying to split a column in my data frame into two columns. The values in the column look like this:
column
user_author-5
creator-user-7
Desired result is this:
column number
user_author 5
creator-user 7
I do this:
df %>%
tidyr::extract(col = "column",
into = c("number"),
regex = "-(\\d+)$",
remove = FALSE
)
But i get this:
column number
user_author-5 5
creator-user-7 7
How can I split the column and remove the number from the first column at the same time? The text itself contains "-" characters, so I must use the regular expression "-(\d+)$" rather than a plain "-", which makes it a little unclear to me.
You can use extract like this:
tidyr::extract(df, column, c('column', 'number'), '(.*)-.*?(\\d+)')
# column number
#1 user_author 5
#2 creator-user 7
The regex captures the data in two groups. Because (.*) is greedy, the first group is everything up to the last '-', and the second group is the trailing number.
data
df <- structure(list(column = c("user_author-5", "creator-user-7")),
class = "data.frame", row.names = c(NA, -2L))
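The two capture groups can also be pulled apart in base R with sub() and backreferences. This is a sketch under the assumption that every value ends in a hyphen followed by digits; the names pattern and split_df are illustrative.

```r
df <- data.frame(column = c("user_author-5", "creator-user-7"),
                 stringsAsFactors = FALSE)

# Greedy (.*) takes everything up to the last "-"; the second group is the trailing digits.
pattern <- "^(.*)-([0-9]+)$"

split_df <- data.frame(
  column = sub(pattern, "\\1", df$column),              # text before the last hyphen
  number = as.integer(sub(pattern, "\\2", df$column)),  # trailing number
  stringsAsFactors = FALSE
)
print(split_df)
```

Values that do not match the pattern are returned unchanged by sub(), so as.integer() would give NA with a warning for them.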
Another approach you can try in this case:
library(dplyr)
library(stringr)
# str_extract() returns a character vector; str_extract_all() would create a list column
df2 <- df %>%
  mutate(column2 = str_extract(column, "(?<=-)\\d+$"))
# column column2
# 1 user_author-5 5
# 2 creator-user-7 7

how do I generate a new column in a tibble by combining 2 other columns? [duplicate]

This question already has answers here:
How to combine multiple character columns into a single column in an R data frame
(7 answers)
Closed 4 years ago.
I have a tibble with the variables Year and Quarter in separate columns in the format '2006' (Year) and '2' (Quarter). I want to merge these to create a new column called Year_Qtr in the format 2006-2 or something similar.
I have tried the unite function with the code:
SA_Long_Data %>%
unite(Year_Qtr, Year, Quarter)
This has created the variable I need but produced it as a table rather than a new column within the tibble.
Can anyone help?
unite has a remove parameter, which is TRUE by default; set it to FALSE to keep the original columns. You can also use the sep argument to get dashes instead of underscores.
library(tidyverse)
SA_Long_Data <- tibble::tibble(Year=1:3,Quarter=5:7)
SA_Long_Data %>% unite(Year_Qtr, Year, Quarter,remove = FALSE,sep="-")
# A tibble: 3 x 3
Year_Qtr Year Quarter
* <chr> <int> <int>
1 1-5 1 5
2 2-6 2 6
3 3-7 3 7
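The same column can be built in base R with paste(), which also makes the assignment explicit: a pipeline such as the one above only prints its result unless that result is assigned back to a name. The data values below are hypothetical, chosen to match the format described in the question.

```r
# Hypothetical example data in the question's format (Year, Quarter).
SA_Long_Data <- data.frame(Year = c(2006L, 2006L, 2007L),
                           Quarter = c(1L, 2L, 1L))

# paste() joins the two columns element-wise; assigning with $ adds the
# new column to the existing data frame and keeps Year and Quarter.
SA_Long_Data$Year_Qtr <- paste(SA_Long_Data$Year, SA_Long_Data$Quarter, sep = "-")
print(SA_Long_Data)
```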

R - count number of items in a piped list [duplicate]

This question already has answers here:
Count values separated by a comma in a character string
(5 answers)
How to calculate the number of occurrence of a given character in each row of a column of strings?
(14 answers)
Closed 6 years ago.
I have a column with a piped list of identifiers
Identifier
O75496|P62979|P62987|P0CG47|P0CG48|O00487|P25786
P28066|P60900|O14818|P20618|P40306
Q99436|P28062|P28065
P28062|P28065|P62191|P35998|P17980|P43686
How do I produce a column of the numbers of identifiers in each row?
Output to read something like this
Identifier Count
O75496|P62979|P62987|P0CG47|P0CG48|O00487|P25786 7
P28066|P60900|O14818|P20618|P40306 5
Q99436|P28062|P28065 3
P28062|P28065|P62191|P35998|P17980|P43686 6
Thanks in advance!
sapply(strsplit(df$Identifier, '[|]'), length)
To count only unique identifiers, just add the unique function:
sapply(strsplit(df$Identifier, '[|]'), function(i) length(unique(i)))
A base R option without splitting would be
df1$Count <- nchar(gsub("[^|]", "", df1$Identifier)) + 1L
df1$Count
#[1] 7 5 3 6
Or with gregexpr
# count the match positions; gregexpr returns -1 when a string contains no "|"
sapply(gregexpr("[|]", df1$Identifier),
       function(x) sum(x > 0) + 1)
#[1] 7 5 3 6
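To try these approaches end to end, here is a self-contained sketch that rebuilds the question's column and checks that splitting and separator counting agree. The data frame construction and the names count_split and count_nchar are assumptions added for illustration.

```r
# Reconstruction of the question's column; the original does not show how df1 is built.
df1 <- data.frame(
  Identifier = c("O75496|P62979|P62987|P0CG47|P0CG48|O00487|P25786",
                 "P28066|P60900|O14818|P20618|P40306",
                 "Q99436|P28062|P28065",
                 "P28062|P28065|P62191|P35998|P17980|P43686"),
  stringsAsFactors = FALSE
)

# Approach 1: split on the literal pipe and count the pieces.
count_split <- lengths(strsplit(df1$Identifier, "|", fixed = TRUE))

# Approach 2: delete everything except pipes, count them, and add one.
count_nchar <- nchar(gsub("[^|]", "", df1$Identifier)) + 1L

# Both approaches should agree before the column is attached.
stopifnot(identical(count_split, count_nchar))
df1$Count <- count_split
print(df1$Count)
```

The two approaches differ on empty strings: splitting gives 0 pieces, while counting separators gives 1, so it is worth deciding which is wanted if the column can contain empty values.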
