Populate a column based on a pattern in another column

I have a data frame in which I am trying to populate a column based on whether a pattern exists in a string.
A B
E3Y12
E3Y45
E3Y56
c1234
c56534
c3456
I would like to check whether A contains the string 'E3Y' and populate column B with "This one" if it does, or "That one" if it doesn't.
I've tried dplyr's starts_with inside case_when() and ifelse() statements, to no avail, since it has to be part of a select function.

You can use str_detect() to check whether a string contains a certain pattern; wrapping it in ifelse() is then straightforward:
library(dplyr)
tibble(A = c(
  "E3Y12",
  "E3Y45",
  "E3Y56",
  "c1234",
  "c56534",
  "c3456")) %>%
  mutate(B = ifelse(stringr::str_detect(A, "E3Y"), "This one", "That one"))
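A note on why starts_with did not work in the question: it is a tidyselect helper that picks columns by name, so it cannot test the values inside a column. If the pattern must sit at the start of the string rather than anywhere in it, base R's startsWith() is one option. The sketch below uses base R only; the extra value "xE3Y9" is a made-up example added to show where "contains" and "starts with" differ.

```r
# Values from the question, plus one hypothetical value ("xE3Y9")
# where the pattern appears but not at the start.
A <- c("E3Y12", "E3Y45", "E3Y56", "c1234", "c56534", "c3456", "xE3Y9")

# "Contains" test: the pattern may appear anywhere in the string.
contains_pat <- grepl("E3Y", A, fixed = TRUE)

# "Starts with" test: the pattern must be a prefix.
starts_pat <- startsWith(A, "E3Y")

B_contains <- ifelse(contains_pat, "This one", "That one")
B_starts   <- ifelse(starts_pat, "This one", "That one")

data.frame(A, B_contains, B_starts)
```

The two columns agree on the original six values and differ only on the added one, so either test would fit the data shown in the question.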

Try:
library(dplyr)
df %>%
  mutate(B = ifelse(grepl("E3Y", A), "This one", "That one"))
The output is:
# A tibble: 6 × 2
A B
<chr> <chr>
1 E3Y12 This one
2 E3Y45 This one
3 E3Y56 This one
4 c1234 That one
5 c56534 That one
6 c3456 That one
Data used:
df <- structure(list(A = c("E3Y12", "E3Y45", "E3Y56", "c1234", "c56534",
"c3456"), B = c(NA, NA, NA, NA, NA, NA)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -6L))

Related

Grabs rows where second column is equal to value

I have a dataset which looks something like this:
print(animals_in_zoo)
// I only know the name of the first column, the second one is dynamic/based on a previously calculated variable
animals | dynamic_column_name
// What the data looks like
elefant x
turtle
monkey
giraffe x
swan
tiger x
What I want is to collect the rows in which the second column's value is equal to "x".
What I want to do is something like:
SELECT * from data where col2 == "x";
After that, I want to grab only the first column and create a string object like "elefant giraffe tiger", but that is the easy part.

You can reference that column by its index and use that to get the animals you want:
df1 <- structure(list(animal = c("elefant", "turtle", "monkey", "giraffe",
"swan", "tiger"), dynamic_column = c("x", NA, NA, "x", NA, "x"
)), row.names = c(NA, -6L), class = "data.frame")
df1[, 1][df1[, 2] == "x" & !is.na(df1[, 2])]
#> [1] "elefant" "giraffe" "tiger"

We could use filter with grepl which searches for a pattern 'x' in the string:
# the data frame
df <- read.table(header = TRUE, text =
'my_col
"elefant x"
turtle
monkey
"giraffe x"
swan
"tiger x"'
)
library(dplyr)
df %>%
filter(grepl('x', my_col))
my_col
1 elefant x
2 giraffe x
3 tiger x

Use [: the first argument refers to the rows. You want the rows where the second column is "x". The second argument is the column you need in the end, and you want the column named "animals":
dat[dat[2] == "x", "animals"]
#[1] "elefant" "giraffe" "tiger"
data
dat <- structure(list(animals = c("elefant", "turtle", "monkey", "giraffe",
"swan", "tiger"), V2 = c("x", "", "", "x", "", "x")), row.names = c(NA,
-6L), class = "data.frame")
# animals V2
# 1 elefant x
# 2 turtle
# 3 monkey
# 4 giraffe x
# 5 swan
# 6 tiger x

I guess you have a dataframe?
If so, something like df[df$col2 == 'x',] should work.

With base functions, you can do it like this:
# Option 1
your_dataframe[your_dataframe$col2 == "x", ]
# Option 2
your_dataframe[your_dataframe[,2] == "x", ]
With dplyr functions, you can do it like this:
library(dplyr)
your_dataframe %>%
filter(col2 == "x")
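The question also asks for the final string "elefant giraffe tiger" and says the second column's name is not known in advance. A minimal base R sketch of the whole path follows, assuming the empty cells are NA (as in the first answer's data). Note that a plain == comparison yields NA for missing cells, which produces NA rows when used as a row index; %in% returns FALSE instead.

```r
# Data as in the question; empty cells are assumed to be NA.
animals_in_zoo <- data.frame(
  animals = c("elefant", "turtle", "monkey", "giraffe", "swan", "tiger"),
  dynamic_column_name = c("x", NA, NA, "x", NA, "x"),
  stringsAsFactors = FALSE
)

# Look the second column up by position, since its name is dynamic.
dyn_col <- names(animals_in_zoo)[2]

# %in% gives FALSE (not NA) for missing values, so no NA rows sneak in.
keep <- animals_in_zoo[[dyn_col]] %in% "x"

# Take the first column for the matching rows and collapse to one string.
hits <- animals_in_zoo[keep, "animals"]
animal_string <- paste(hits, collapse = " ")
animal_string
#> [1] "elefant giraffe tiger"
```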

Rename columns of R dataframe with tidyselect and regular expression

I have a dataframe whose columns names are combinations of numbering and some complicated texts:
A1. Good day
A1a. Have a nice day
......
Z7d. Some other titles
Now I want to keep only "A1.", "A1a.", "Z7d.", removing both the preceding number and the trailing text. Any idea how to do this with tidyselect and regex?

You can use this regex -
names(df) <- sub('\\d+\\.\\s+([A-Za-z0-9]+).*', '\\1', names(df))
names(df)
#[1] "A1" "A1a" "Z7d"
The same regex can also be used in rename_with if you want a tidyverse answer.
library(dplyr)
df %>% rename_with(~sub('\\d+\\.\\s+([A-Za-z0-9]+).*', '\\1', .))
# A1 A1a Z7d
#1 0.5755992 0.4147519 -0.1474461
#2 0.1347792 -0.6277678 0.3263348
#3 1.6884930 1.3931306 0.8809109
#4 -0.4269351 -1.2922231 -0.3362182
#5 -2.0032113 0.2619571 0.4496466
data
df <- structure(list(`1. A1. Good day` = c(0.575599213383783, 0.134779160673435,
1.68849296209512, -0.426935114884432, -2.00321125417319), `2. A1a. Have a nice day` = c(0.414751904860513,
-0.627767775889949, 1.39313055331098, -1.29222310608057, 0.261957078465535
), `99. Z7d. Some other titles` = c(-0.147446140558093, 0.326334824433201,
0.880910933597998, -0.336218174873965, 0.449646567320979)),
class = "data.frame", row.names = c(NA, -5L))

We can use str_extract
library(stringr)
names(df) <- str_extract(names(df), "(?<=\\.\\s)[^.]+")
names(df)
[1] "A1" "A1a" "Z7d"
data
df <- structure(list(`1. A1. Good day` = c(0.575599213383783, 0.134779160673435,
1.68849296209512, -0.426935114884432, -2.00321125417319), `2. A1a. Have a nice day` = c(0.414751904860513,
-0.627767775889949, 1.39313055331098, -1.29222310608057, 0.261957078465535
), `99. Z7d. Some other titles` = c(-0.147446140558093, 0.326334824433201,
0.880910933597998, -0.336218174873965, 0.449646567320979)),
class = "data.frame", row.names = c(NA, -5L))
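Before assigning new names to a data frame, it can help to try the regex on the plain character vector of names first. The sketch below is one way to do that in base R, using an anchored variant of the pattern from the first answer (the anchors and the checks are additions, not part of either answer). sub() returns a string unchanged when the pattern does not match, so non-matching names are kept as they are, and the duplicate check guards against two columns collapsing to the same name.

```r
# Column names from the answers' example data.
old_names <- c("1. A1. Good day",
               "2. A1a. Have a nice day",
               "99. Z7d. Some other titles")

# Anchored pattern: leading number, dot, whitespace, the code, a dot, the rest.
pattern <- "^\\d+\\.\\s+([A-Za-z0-9]+)\\..*$"

# Only rewrite names that actually match; leave the others untouched.
matched   <- grepl(pattern, old_names)
new_names <- ifelse(matched, sub(pattern, "\\1", old_names), old_names)

# Renaming should never create duplicate column names.
stopifnot(anyDuplicated(new_names) == 0)

new_names
#> [1] "A1"  "A1a" "Z7d"
```

The resulting vector can then be assigned with names(df) <- new_names.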

Many string replacements in R

How can I do a string replacement on one column with multiple conditions?
I have the following data
strings <- as_tibble(c("string.a","string.a", "string.b", "string.c"))
# A tibble: 4 x 1
value
<chr>
1 string.a
2 string.a
3 string.b
4 string.c
and the following replacements
replacements <- c("alice", "bob", "joe")
conditions <- c(".a", ".b", ".c")
The resulting data would be
result <- as_tibble(c("string_alice", "string_alice", "string_bob", "string_joe"))
# A tibble: 4 x 1
value
<chr>
1 string_alice
2 string_alice
3 string_bob
4 string_joe
I have considered a mapping table of some sort, but it is not clear to me how to feed a mapping table to a string replace function.

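# Named lookup vector: names are the suffixes without the dot ("a", "b", "c")
nm = setNames(replacements, gsub("\\.", "", conditions))
sapply(strsplit(strings$value, "\\."), function(x){
  # x[1] is the stem, x[2] the suffix; join with "_" to match the desired result
  paste(c(x[1], nm[x[2]]), collapse = "_")
})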
Data
strings = structure(list(value = c("string.a", "string.a", "string.b",
"string.c")), row.names = c(NA, -4L), class = c("tbl_df", "tbl",
"data.frame"))

We can use gsubfn
library(gsubfn)
sub("\\.", "_", gsubfn("(\\w+)$", setNames(as.list(replacements),
sub("\\.", "", conditions)), strings$value))
#[1] "string_alice" "string_alice" "string_bob" "string_joe"
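The question mentions a mapping table. One way to sketch that idea in base R, without extra packages, is a named character vector keyed by the suffix, with each string split into a stem and a suffix before the lookup. The names map, suffix and stem are illustrative, and the input is a plain character vector rather than a tibble column.

```r
replacements <- c("alice", "bob", "joe")
conditions   <- c(".a", ".b", ".c")
strings      <- c("string.a", "string.a", "string.b", "string.c")

# Mapping table: suffix (including the dot) -> replacement text.
map <- setNames(paste0("_", replacements), conditions)

# Split each string into the part before the last dot and the suffix itself.
suffix <- sub("^.*(\\.[^.]+)$", "\\1", strings)
stem   <- sub("\\.[^.]+$", "", strings)

# Replace only suffixes found in the table; leave other strings unchanged.
result <- ifelse(suffix %in% names(map), paste0(stem, map[suffix]), strings)
result
#> [1] "string_alice" "string_alice" "string_bob"   "string_joe"
```

Strings whose suffix is not in the table pass through untouched, which makes it easy to spot unmapped values afterwards.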

Slicing inside a column

I've done slicing in R to separate text into columns before, but I'm having an issue when slicing inside a column. Let's say I have this data:
ZipCode <- c("90042 34.11332407100048 -118.19142869099971",
"90040 33.99649121800047 -118.15148940099971",
"90007 34.02833141800045 -118.28507659499968")
I want to extract just the zip code and place it in a different column. The long/lat will also need to go to another column.
Do I use grep?

We can use separate() from tidyr (part of the tidyverse):
library(tidyverse)
separate(dat, ZipCode, into = c('value', 'lat', 'lon'), sep= ' ')
# A tibble: 3 x 3
# value lat lon
#* <chr> <chr> <chr>
#1 90042 34.11332407100048 -118.19142869099971
#2 90040 33.99649121800047 -118.15148940099971
#3 90007 34.02833141800045 -118.28507659499968
data
dat <- structure(list(ZipCode = c("90042 34.11332407100048 -118.19142869099971",
"90040 33.99649121800047 -118.15148940099971", "90007 34.02833141800045 -118.28507659499968"
)), .Names = "ZipCode", class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -3L))
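A base R alternative, sketched under the assumption that every entry has exactly three space-separated fields: read.table() can parse the strings directly through its text argument. The column names zip, lat and lon are illustrative. Reading the zip code as character keeps any leading zeros, while the coordinates become numeric.

```r
ZipCode <- c("90042 34.11332407100048 -118.19142869099971",
             "90040 33.99649121800047 -118.15148940099971",
             "90007 34.02833141800045 -118.28507659499968")

# Each element of ZipCode is treated as one line of input.
parts <- read.table(
  text       = ZipCode,
  sep        = " ",
  col.names  = c("zip", "lat", "lon"),
  colClasses = c("character", "numeric", "numeric")
)

str(parts)
```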

Manipulate string in R

I'm looking to manipulate a set of strings in R.
The data I have:
Data Field
Mark Twain 5
I want it to instead be:
Data Field
Twain Mark 5
My idea was to first split the string into two columns and then concatenate. But I'm wondering if there is an easier way.

You can try this approach:
df <- data.frame(Data = c("Mark Twain"), Field = 5)
# sapply() returns a character vector, so Data stays an ordinary column rather than a list
df$Data <- sapply(strsplit(as.character(df$Data), " "), function(x) paste(rev(x), collapse = " "))
df
#         Data Field
# 1 Twain Mark     5
This works even if the data frame has more than one row.

We can use sub to do this:
df1$Data <- sub("(\\S+)\\s+(\\S+)", "\\2 \\1", df1$Data)
df1
# Data Field
#1 Twain Mark 5
data
df1 <- structure(list(Data = "Mark Twain", Field = 5L),
.Names = c("Data", "Field"), class = "data.frame",
row.names = c(NA, -1L))
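The two approaches agree on two-word names but behave differently on longer ones: the sub() pattern swaps only the first two words, while splitting and reversing flips the whole word order. A small sketch on a plain character vector shows the difference; "Jane Austen" and "Samuel Langhorne Clemens" are made-up inputs added for illustration.

```r
# "Mark Twain" is from the question; the other two names are hypothetical.
names_in <- c("Mark Twain", "Jane Austen", "Samuel Langhorne Clemens")

# Regex approach: swap the first two whitespace-separated words only.
swap_first_two <- sub("(\\S+)\\s+(\\S+)", "\\2 \\1", names_in)

# Split approach: reverse every word in each string.
reverse_all <- vapply(
  strsplit(names_in, "\\s+"),
  function(w) paste(rev(w), collapse = " "),
  character(1)
)

swap_first_two
#> [1] "Twain Mark"               "Austen Jane"              "Langhorne Samuel Clemens"
reverse_all
#> [1] "Twain Mark"               "Austen Jane"              "Clemens Langhorne Samuel"
```

Which behaviour is wanted depends on the data, so it is worth checking for entries with more than two words before picking one.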
