R: extract text after ")"

A simple question, but I can't find a solution. In R with dplyr, how do I extract the text after a ")" and then split it on "/"?
My data looks like this:
# A tibble: 3 x 2
id Group
<dbl> <chr>
1 1 (aa1) red/yellow
2 2 (bb1) blue/yellow
3 3 (cc1) green/orange
structure(list(id = c(1, 2, 3), group = c("(aa1) red/yellow",
"(bb1) blue/yellow", "(cc1) green/orange")), row.names = c(NA,
-3L), class = c("tbl_df", "tbl", "data.frame"))
And I would simply like:
It seems simple, but I am new to R and cannot figure this out. Thanks.

You can use regmatches in combination with regexpr.
library(dplyr)
df = data.frame(id = c(1,2,3), group = c("(aa1) red/yellow","(dd1) blue/yellow","(cc1) green/orange"))
df %>%
mutate(x1 = regmatches(group,regexpr("^\\(.{3}\\)",group)),
x2 = regmatches(group,regexpr("(?<= )\\w+(?=/)",group,perl = TRUE)),
x3 = regmatches(group,regexpr("(?<=/)\\w+$",group,perl = TRUE)))
The output is:
id group x1 x2 x3
1 1 (aa1) red/yellow (aa1) red yellow
2 2 (dd1) blue/yellow (dd1) blue yellow
3 3 (cc1) green/orange (cc1) green orange
If you are not familiar with regular expressions, you can read this; it is a helpful introduction to them.
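To see what each of the three patterns picks out, it can help to run them on a single string first. The sketch below uses only base R; first_match is a hypothetical helper name introduced here for illustration, not something from the answer above.

```r
# One value from the question's data
x <- "(aa1) red/yellow"

# Hypothetical helper: return the first match of a Perl-style pattern
first_match <- function(pattern, text) {
  regmatches(text, regexpr(pattern, text, perl = TRUE))
}

code  <- first_match("^\\(.{3}\\)", x)       # parenthesised prefix at the start
part1 <- first_match("(?<= )\\w+(?=/)", x)   # word between the space and "/"
part2 <- first_match("(?<=/)\\w+$", x)       # word after "/" up to the end

print(c(code, part1, part2))
```

Note that regmatches() returns only the elements that matched, so inside mutate() a row without a match would make the result shorter than the column, and the call would likely fail.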

First separate the values in group, separating them by whitespace \\s or /, then remove the parentheses in x1 using sub and 'recollecting' only the alphanumerical parts \\w+ in the replacement with backreference \\1:
library(tidyr)
library(dplyr)
df %>%
separate(., col = "group", into = paste0("x", 1:3), sep = "\\s|/") %>%
mutate(x1 = sub(".(\\w+).", "\\1", x1))
# A tibble: 3 x 4
id x1 x2 x3
<dbl> <chr> <chr> <chr>
1 1 aa1 red yellow
2 2 bb1 blue yellow
3 3 cc1 green orange
EDIT:
If your input data is more complex, as suggested in a comment, such as this:
df <- structure(list(id = c(1, 2, 3), group = c("(aa1) red bus/yellow",
"(bb1) blue/yellow", "(cc1) green/orange apple")), row.names = c(NA,
-3L), class = c("tbl_df", "tbl", "data.frame"))
then this will work:
df %>%
separate(., col = "group", into = paste0("x", 1:3), sep = "\\) |/") %>%
mutate(x1 = sub("^\\(", "", x1))
# A tibble: 3 x 4
id x1 x2 x3
<dbl> <chr> <chr> <chr>
1 1 aa1 red bus yellow
2 2 bb1 blue yellow
3 3 cc1 green orange apple
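One detail worth checking with this separator: because "\\) " consumes the closing parenthesis, the first piece arrives as "(aa1" rather than "(aa1)". A small base R check, assuming only that one input string, shows how a pattern that expects one character on each side differs from one that strips just the leading parenthesis:

```r
# First piece after splitting "(aa1) red bus/yellow" on "\\) |/"
piece <- "(aa1"

# Expects a character on both sides of the word, so the trailing "1"
# is consumed by the final "." and dropped from the captured group
loose <- sub(".(\\w+).", "\\1", piece)

# Removes only the leading "(" and leaves the rest untouched
anchored <- sub("^\\(", "", piece)

print(c(loose = loose, anchored = anchored))
```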

Related

Add more rows based on a grouping variable R

I'd like to add more rows to my dataset based on a grouping variable. Right now, my data has 2 rows but I would like 3 rows and the var app to be repeated for the third row.
This is what my data currently looks like:
my_data <- data.frame(app = c('a','b'), type = c('blue','red'), code = c(1:2), type_2 = c(NA, 'blue'), code_2 = c(NA, 3))
app type code type_2 code_2
a blue 1 NA NA
b red 2 blue 3
I would like the data to look like this:
app type code
a blue 1
b red 2
b blue 3
library(data.table)
setDT(my_data)
res <-
melt(
my_data,
id.vars = "app",
measure.vars = patterns(c("^type", "^code")),
value.name = c("type", "code")
)[!is.na(type), .(app, type, code)]
Using tidyverse
library(dplyr)
library(stringr)
library(tidyr)
my_data %>%
rename_at(vars(c(type, code)), ~ str_c(., "_1")) %>%
pivot_longer(cols = -app, names_to = c(".value", "grp"), names_sep = "_",
values_drop_na = TRUE) %>% select(-grp)
# A tibble: 3 x 3
# app type code
# <chr> <chr> <dbl>
#1 a blue 1
#2 b red 2
#3 b blue 3
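For comparison, the same reshaping can be sketched in base R by stacking the two type/code column pairs and dropping the rows that were empty. This is only an illustration under the assumption that there are exactly two pairs, as in the question; the object names first_pair and second_pair are made up here.

```r
my_data <- data.frame(app = c("a", "b"), type = c("blue", "red"), code = 1:2,
                      type_2 = c(NA, "blue"), code_2 = c(NA, 3),
                      stringsAsFactors = FALSE)

# First pair of columns, already named as wanted
first_pair <- my_data[, c("app", "type", "code")]

# Second pair, renamed so that rbind() can line the columns up
second_pair <- setNames(my_data[, c("app", "type_2", "code_2")],
                        c("app", "type", "code"))

# Stack both pairs and keep only rows that carry a type
res <- rbind(first_pair, second_pair)
res <- res[!is.na(res$type), ]
rownames(res) <- NULL
print(res)
```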

Splitting values in a column

Sorry, I'm new to R, but I've got some data that looks like the following:
I'd like to count the number of times each object is mentioned in the findings. So the result would look like this:
I've tried tidyverse and separate but can't seem to get the hang of it. Any help would be amazing, thanks in advance!
To recreate my data:
df <- data.frame(
col_1 = paste0("image", 1:5),
findings = c("rock|cat|sun", "cat", "cat|dog|fish|sun", "sun", "dog|cat")
)
You can use separate_rows() and then count().
library(tidyverse)
df %>%
separate_rows(findings) %>%
count(findings)
# # A tibble: 5 x 2
# findings n
# <chr> <int>
# 1 cat 4
# 2 dog 2
# 3 fish 1
# 4 rock 1
# 5 sun 3
Data
df <- structure(list(col_1 = c("image_1", "image_2", "image_3", "image_4",
"image_5"), findings = c("rock|cat|sun", "cat", "cat|dog|fish|sun",
"sun", "dog|cat")), class = "data.frame", row.names = c(NA, -5L))
In base R:
as.data.frame(table(unlist(strsplit(df$col_2, "|", fixed = TRUE))))
# Var1 Freq
# 1 cat 4
# 2 dog 2
# 3 fish 1
# 4 rock 1
# 5 sun 3
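The fixed = TRUE argument matters in the base R call above, because "|" is the alternation operator in a regular expression. A short sketch on a single made-up string, comparing three ways of writing the separator:

```r
x <- "cat|dog"

# Literal match on the pipe character
fixed_split <- strsplit(x, "|", fixed = TRUE)[[1]]

# Same result with the pipe escaped in a regular expression
escaped_split <- strsplit(x, "\\|")[[1]]

# Unescaped "|" is treated as a regex, so it does not split on the pipe as intended
regex_split <- strsplit(x, "|")[[1]]

print(fixed_split)
print(escaped_split)
print(length(regex_split))
```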
Reproducible data (please provide it in your next post):
df <- data.frame(
col_1 = paste0("image", 1:5),
col_2 = c("rock|cat|sun", "cat", "cat|dog|fish|sun", "sun", "dog|cat")
)
An option with cSplit
library(splitstackshape)
cSplit(df, 'col_2', 'long', sep="|")[, .N, col_2]
# col_2 N
#1: rock 1
#2: cat 4
#3: sun 3
#4: dog 2
#5: fish 1
data
df <- structure(list(col_1 = c("image1", "image2", "image3", "image4",
"image5"), col_2 = c("rock|cat|sun", "cat", "cat|dog|fish|sun",
"sun", "dog|cat")), class = "data.frame", row.names = c(NA, -5L
))
Using tidyverse:
df %>%
separate_rows(findings) %>%
group_by(findings) %>%
summarize(total_count_col=n())
First we convert the data into a long format using separate_rows, then group and count the number of rows with each finding.
Example:
df<-data.frame(col1=c(rep(letters[1:3],3),"d"),col2=c(rep("moose|cat|dog",9),"rock"), stringsAsFactors = FALSE)
df %>% separate_rows(col2) %>% group_by(col2) %>% summarize(total_count_col=n())
# A tibble: 4 x 2
col2 total_count_col
<chr> <int>
1 cat 9
2 dog 9
3 moose 9
4 rock 1

Transforming a dataframe by "multiplying" a column's elements by the names of the other columns [duplicate]

An example is below. How can I transform a dataframe df with column names to the form of df.transformed below?
> df <- data.frame("names" = c("y1", "y2"), "x1" = 1:2, "x2" = 4:5)
> df
names x1 x2
1 y1 1 4
2 y2 2 5
> df.transformed <- data.frame("y1x1" = 1, "y1x2" =4, "y2x1" = 2, "y2x2" = 5)
> df.transformed
y1x1 y1x2 y2x1 y2x2
1 1 4 2 5
Code
require(data.table); setDT(df)
dt = melt(df, id.vars = 'names')[, col := paste0(variable, names)]
out = dt$value; names(out) = dt$col
Result
> data.frame(t(out))
x1y1 x1y2 x2y1 x2y2
1 1 2 4 5
You can achieve this in base R. This should work for any data frame size. The idea is to combine Reduce with outer to build the data frame column names.
df <- data.frame("names" = c("y1", "y2"), "x1" = 1:2, "x2" = 4:5)
df_names <- outer(df[,1], names(df[,-1]), paste0)
df.transformed <- as.data.frame(matrix(,ncol = nrow(df)*ncol(df[,-1]), nrow = 0))
names(df.transformed) <- Reduce(`c`,t(df_names))
df.transformed[1,] <- Reduce(`c`,t(df[-1]))
Output
# y1x1 y1x2 y2x1 y2x2
# 1 1 4 2 5
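The same outer() idea can be written a little more compactly by flattening both the names and the values row by row. This is a sketch under the assumption that the first column holds the row labels and every other column is numeric; vals, nms and out are names chosen here for illustration.

```r
df <- data.frame(names = c("y1", "y2"), x1 = 1:2, x2 = 4:5)

# Values read row by row: y1's x1 and x2, then y2's x1 and x2
vals <- as.vector(t(as.matrix(df[-1])))

# Matching labels, built as row label followed by column name
nms <- as.vector(t(outer(df$names, names(df)[-1], paste0)))

# One-row data frame with one column per (row, column) pair
out <- as.data.frame(as.list(setNames(vals, nms)))
print(out)
```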
You can do this in one line with the new tidyr::pivot_wider. Setting multiple columns for values means names will get pasted together for assignment.
library(tidyr)
pivot_wider(df, names_from = names, values_from = c(x1, x2), names_sep = "")
#> # A tibble: 1 x 4
#> x1y1 x1y2 x2y1 x2y2
#> <int> <int> <int> <int>
#> 1 1 2 4 5
However, the column names ("x1", "x2") come first. If you need to swap the "x" and "y" components of the names, you can do regex replacement with dplyr::rename_all.
df %>%
pivot_wider(names_from = names, values_from = c(x1, x2), names_sep = "") %>%
dplyr::rename_all(gsub, pattern = "(x\\d+)(y\\d+)", replacement = "\\2\\1")
#> # A tibble: 1 x 4
#> y1x1 y2x1 y1x2 y2x2
#> <int> <int> <int> <int>
#> 1 1 2 4 5

R - How to replace values of a variable with the name of the variable in dplyr

library(tidyverse)
df = tribble(~a,~b,
0 ,1,
1 ,0)
Result I want:
a b
'' 'b'
'a' ''
How can I replace values with the variables' name?
If the columns are strictly binary and we need to replace 1 with the column name and 0 with a blank (""), we can use map2 to loop over the columns and their names. Adding 1 to each column turns its values into indices: 0 becomes index 1, which selects "", and 1 becomes index 2, which selects the column name.
library(tidyverse)
map2_df(df, colnames(df), ~ c('', .y)[.x +1])
# A tibble: 2 x 2
# a b
# <chr> <chr>
#1 "" b
#2 a ""
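The indexing trick does not depend on purrr. A minimal base R sketch of the same idea, assuming the columns contain only 0 and 1, uses Map() to pair each column with its name:

```r
df <- data.frame(a = c(0, 1), b = c(1, 0))

# For each column, 0 + 1 selects "" and 1 + 1 selects the column name
res <- as.data.frame(
  Map(function(col, nm) c("", nm)[col + 1], df, names(df)),
  stringsAsFactors = FALSE
)
print(res)
```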
You can also try the replace function from base R. Printing nwdf1 gives the final answer.
nwdf <- replace(df,df == 1,names(df))
nwdf1 <- replace(nwdf, nwdf==0, '')
Where df is :
structure(list(a = c(0, 1), b = c(1, 0)), .Names = c("a", "b"
), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-2L))
EDIT:
A more generic solution to the question above:
outputdf <- data.frame(sapply(names(df), function(x) ifelse(df[[x]] == 1, x, '')), stringsAsFactors = FALSE)
Output:
# A tibble: 2 x 2
a b
<chr> <chr>
1 b
2 a

Binding data frames from a list with different column types

Trying to figure out a way in purrr to bind rows over different elements of lists where the column types are not consistent. For example, my data looks a little like this...
d0 <- list(
data_frame(x1 = c(1, 2), x2 = c("a", "b")),
data_frame(x1 = c("P1"), x2 = c("c"))
)
d0
# [[1]]
# # A tibble: 2 x 2
# x1 x2
# <dbl> <chr>
# 1 1 a
# 2 2 b
#
# [[2]]
# # A tibble: 1 x 2
# x1 x2
# <chr> <chr>
# 1 P1 c
I can use a for loop and then map_df with bind_rows to get the output I want (map_df will not work if the columns are of different types)...
for(i in 1:length(d0)){
d0[[i]] <- mutate_if(d0[[i]], is.numeric, as.character)
}
map_df(d0, bind_rows)
# # A tibble: 3 x 2
# x1 x2
# <chr> <chr>
# 1 1 a
# 2 2 b
# 3 P1 c
but I think I am missing a trick somewhere that would allow me to avoid the for loop. My attempts along these lines...
d0 %>%
map(mutate_if(., is.numeric, as.character)) %>%
map_df(.,bind_rows)
# Error in UseMethod("tbl_vars") :
# no applicable method for 'tbl_vars' applied to an object of class "list"
... do not seem to work (still getting my head around purrr)
You can use rbindlist() from data.table in this case
data.table::rbindlist(d0) %>%
dplyr::as_data_frame()
# A tibble: 3 x 2
x1 x2
<chr> <chr>
1 1 a
2 2 b
3 P1 c
There may be circumstances where you will want to set the fill argument to TRUE, for example when the data frames do not all have the same columns.
Documentation reference:
If column i of input items do not all have the same type; e.g, a
data.table may be bound with a list or a column is factor while others
are character types, they are coerced to the highest type (SEXPTYPE).
How about this?
library(purrr)
map_df(lapply(d0, function(x) data.frame(lapply(x, as.character))), bind_rows)
Output is:
x1 x2
1 1 a
2 2 b
3 P1 c
Sample data:
d0 <- list(structure(list(x1 = c(1, 2), x2 = c("a", "b")), .Names = c("x1",
"x2"), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"
)), structure(list(x1 = "P1", x2 = "c"), .Names = c("x1", "x2"
), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame"
)))
With tidyverse, the option would be
library(tidyverse)
d0 %>%
map_df(~ .x %>%
mutate_if(is.numeric, as.character))
# A tibble: 3 x 2
# x1 x2
# <chr> <chr>
#1 1 a
#2 2 b
#3 P1 c
It's a good opportunity to use purrr::modify_depth :
library(purrr)
library(dplyr)
bind_rows(modify_depth(d0,2,as.character))
# # A tibble: 3 x 2
# x1 x2
# <chr> <chr>
# 1 1 a
# 2 2 b
# 3 P1 c
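If no extra packages are available, the same "convert everything to character, then bind" approach can be sketched in base R. The example rebuilds the list as plain data frames, which is an assumption made here so that it runs on a stock install; as_chr is an illustrative name.

```r
d0 <- list(
  data.frame(x1 = c(1, 2), x2 = c("a", "b"), stringsAsFactors = FALSE),
  data.frame(x1 = "P1", x2 = "c", stringsAsFactors = FALSE)
)

# Convert every column of every data frame to character,
# keeping each element a data frame via d[] <- ...
as_chr <- lapply(d0, function(d) {
  d[] <- lapply(d, as.character)
  d
})

# Bind the now type-consistent data frames row-wise
res <- do.call(rbind, as_chr)
print(res)
```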

Resources