I've used slicing in R to separate text into columns before, but I'm having an issue when slicing inside a column. Let's say I have this data:
ZipCode <- c("90042 34.11332407100048 -118.19142869099971",
"90040 33.99649121800047 -118.15148940099971",
"90007 34.02833141800045 -118.28507659499968")
I want to extract just the zip code and place it in a different column. The lat/long values also need to go into their own columns.
Do I use grep?
We can use separate() from the tidyverse:
library(tidyverse)
separate(dat, ZipCode, into = c('value', 'lat', 'lon'), sep= ' ')
# A tibble: 3 x 3
# value lat lon
#* <chr> <chr> <chr>
#1 90042 34.11332407100048 -118.19142869099971
#2 90040 33.99649121800047 -118.15148940099971
#3 90007 34.02833141800045 -118.28507659499968
data
dat <- structure(list(ZipCode = c("90042 34.11332407100048 -118.19142869099971",
"90040 33.99649121800047 -118.15148940099971", "90007 34.02833141800045 -118.28507659499968"
)), .Names = "ZipCode", class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -3L))
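For anyone who prefers to stay in base R, the same split can be sketched with strsplit(). This is only an illustration, assuming every string holds exactly three space-separated fields; the names zip_raw, zip, lat and lon are made up for the example and do not come from the question.

```r
# Sample strings copied from the question
zip_raw <- c("90042 34.11332407100048 -118.19142869099971",
             "90040 33.99649121800047 -118.15148940099971",
             "90007 34.02833141800045 -118.28507659499968")

# Split each string on the space and stack the pieces into a 3-column matrix
# (assumption: exactly three fields per string)
parts <- do.call(rbind, strsplit(zip_raw, " ", fixed = TRUE))

# Keep the zip code as character so leading zeros survive;
# convert latitude and longitude to numeric
out <- data.frame(zip = parts[, 1],
                  lat = as.numeric(parts[, 2]),
                  lon = as.numeric(parts[, 3]),
                  stringsAsFactors = FALSE)
out
```

grep() only reports which elements match a pattern, so it is not needed here; splitting on the delimiter is enough.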
I have a DF where I am trying to populate a column based on whether a pattern in a string exists.
A B
E3Y12
E3Y45
E3Y56
c1234
c56534
c3456
I would like to check if A contains the string 'E3Y' and populate the column B with "This one" or if it doesn't contain that pattern "That one"
I've tried dplyr's starts_with() inside case_when() and ifelse() statements, to no avail, since it is a selection helper that has to be used inside select().
You can use str_detect() to check whether a string contains a certain pattern; after that, an ifelse() is straightforward:
library(dplyr)
tibble( A = c(
"E3Y12",
"E3Y45",
"E3Y56",
"c1234",
"c56534",
"c3456")) %>%
mutate(B = ifelse(stringr::str_detect(A, "E3Y"), "This one", "That one"))
Try:
library(dplyr)
df %>%
mutate(B = ifelse(grepl("E3Y", A), "This one", "That one"))
Output is:
# A tibble: 6 × 2
A B
<chr> <chr>
1 E3Y12 This one
2 E3Y45 This one
3 E3Y56 This one
4 c1234 That one
5 c56534 That one
6 c3456 That one
Data used:
df <- structure(list(A = c("E3Y12", "E3Y45", "E3Y56", "c1234", "c56534",
"c3456"), B = c(NA, NA, NA, NA, NA, NA)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -6L))
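Since the question was about a "starts with" test, here is a minimal base R sketch using startsWith(), which works on the values of a character vector (unlike dplyr's starts_with(), which selects columns by name). The data frame is rebuilt here only so the snippet runs on its own.

```r
# Same values as in the question
df <- data.frame(A = c("E3Y12", "E3Y45", "E3Y56", "c1234", "c56534", "c3456"),
                 stringsAsFactors = FALSE)

# startsWith() is base R; unlike grepl("E3Y", A) it only matches
# when the pattern sits at the very beginning of the string
df$B <- ifelse(startsWith(df$A, "E3Y"), "This one", "That one")
df
```

If the pattern may appear anywhere in the string, grepl() or str_detect() as shown in the answers above is the better fit.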
I have a tibble similar to:
tibble(
x = c("christmas", "christmas", "car", "dog"),
y = c("one","two","three", "four")
)
and then I have another tibble like:
tibble(
x = c("christmas", "dog")
)
Notice the two "christmas" rows in the first tibble.
I want to use the second tibble's column to keep only the matching rows of the first:
tibble(
x = c("christmas","christmas", "dog"),
y = c("one","two","four")
)
Try this base R solution using %in% and indexing:
#Code
df1[df1$x %in% df2$x,]
Output:
# A tibble: 3 x 2
x y
<chr> <chr>
1 christmas one
2 christmas two
3 dog four
Some data used:
#Data 1
df1 <- structure(list(x = c("christmas", "christmas", "car", "dog"),
y = c("one", "two", "three", "four")), row.names = c(NA,
-4L), class = c("tbl_df", "tbl", "data.frame"))
#Data 2
df2 <- structure(list(x = c("christmas", "dog")), row.names = c(NA,
-2L), class = c("tbl_df", "tbl", "data.frame"))
If you are comfortable with SQL terminology:
library(dplyr)
> df1 %>% inner_join(df2)
Joining, by = "x"
# A tibble: 3 x 2
x y
<chr> <chr>
1 christmas one
2 christmas two
3 dog four
>
Using base R
> merge(df1, df2)
x y
1 christmas one
2 christmas two
3 dog four
>
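Another option, sketched here as a suggestion rather than as part of the answers above, is dplyr's semi_join(). It is a filtering join: it keeps the rows of the first table that have a match in the second, adds no columns, and does not repeat rows of the first table even if the lookup table contains duplicate keys (inner_join() and merge() would).

```r
library(dplyr)

# Data rebuilt from the question so the snippet is self-contained
df1 <- tibble(x = c("christmas", "christmas", "car", "dog"),
              y = c("one", "two", "three", "four"))
df2 <- tibble(x = c("christmas", "dog"))

# Keep only the rows of df1 whose x appears in df2
res <- semi_join(df1, df2, by = "x")
res
```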
I would like to add a date column to a data frame. The date column needs to populate automatically for the full length of the column. See example below:
Data frame:
df = structure(list(Name = c("Joe", "Sanj", "Rob"),
Col1 = c(20, 60, 40),
Col2 = c(100, 233, 500)),
row.names = c(NA, -3L),
class = c("tbl_df", "tbl", "data.frame"))
You can add Sys.Date() (today's date) as a new column.
df$Date <- Sys.Date()
# A tibble: 3 x 4
# Name Col1 Col2 Date
# <chr> <dbl> <dbl> <date>
#1 Joe 20 100 2020-08-12
#2 Sanj 60 233 2020-08-12
#3 Rob 40 500 2020-08-12
The simplest way to do this is:
df$date <- as.Date("2020-08-12")
This assigns the date value "2020-08-12" to a new column in df called date. When you assign a length-1 vector to a new column of a data frame, R recycles it to the number of rows in the data frame (3 in this case). We wrap the date ("2020-08-12") in as.Date() so that the column class is "Date". If we do not do this, the class will be "character".
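The recycling and the class difference described above can be checked with a short sketch. The extra column date_chr is a hypothetical name added only for comparison.

```r
# Data frame rebuilt from the question
df <- data.frame(Name = c("Joe", "Sanj", "Rob"),
                 Col1 = c(20, 60, 40),
                 Col2 = c(100, 233, 500))

# A length-1 value is recycled to every row
df$date <- as.Date("2020-08-12")
df$date_chr <- "2020-08-12"   # same text, but left as a plain string

class(df$date)       # "Date"
class(df$date_chr)   # "character"

# Only the Date column supports date arithmetic
df$date + 7
```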
The main dataframe has a column "passings", which is its only nested variable. Each cell of that column holds a dataframe. The number of rows varies between these nested dataframes, but the columns are always the same: "date" and "title". What I need is to take the date from the row where title is "Закон прийнято" ("A passed law" in translation) and put it in the main dataframe as a new variable.
I'm a newbie in coding.
I will appreciate your help!
Here is one option. We loop over the 'passings' list column with map (judging by the image, each element is a two-column data.frame), filter the rows where 'title' is "Закон прийнято" (assuming exactly one such row per element), and pull the 'date' column to create a new 'date' column in the original dataset:
library(dplyr)
library(purrr)
df1 %>%
mutate(date = map_chr(passings, ~ .x %>%
filter(title == "Закон прийнято") %>%
pull(date)))
# id passed passings date
#1 54949 TRUE 2015-06-10, 2015-06-08, abcb, Закон прийнято 2015-06-08
#2 55009 TRUE 2015-06-10, 2015-09-08, bcb, Закон прийнято 2015-09-08
NOTE: It works as expected.
data
df1 <- structure(list(id = c(54949, 55009), passed = c(TRUE, TRUE),
passings = list(structure(list(date = c("2015-06-10", "2015-06-08"
), title = c("abcb", "Закон прийнято")), class = "data.frame", row.names = c(NA,
-2L)), structure(list(date = c("2015-06-10", "2015-09-08"
), title = c("bcb", "Закон прийнято")), class = "data.frame", row.names = c(NA,
-2L)))), row.names = c(NA, -2L), class = "data.frame")
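The map_chr() call relies on each nested data frame having exactly one matching row. As a hedged sketch of how that assumption could be relaxed, the base R version below returns NA when there is no match and the first date when there are several. The helper name get_pass_date and the third row (a bill with no passed law) are hypothetical, added only for illustration.

```r
# Small example data; the third element has no "Закон прийнято" row (hypothetical)
df1 <- data.frame(id = c(54949, 55009, 55010))
df1$passings <- list(
  data.frame(date = c("2015-06-10", "2015-06-08"),
             title = c("abcb", "Закон прийнято"), stringsAsFactors = FALSE),
  data.frame(date = c("2015-06-10", "2015-09-08"),
             title = c("bcb", "Закон прийнято"), stringsAsFactors = FALSE),
  data.frame(date = "2015-07-01", title = "bcb", stringsAsFactors = FALSE)
)

# Return the first matching date, or NA if the title never occurs
get_pass_date <- function(p) {
  hit <- p$date[p$title == "Закон прийнято"]
  if (length(hit) == 0) NA_character_ else hit[1]
}

# vapply() guarantees one character value per nested data frame
df1$date <- vapply(df1$passings, get_pass_date, character(1))
df1$date
```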
I'm looking to manipulate a set of strings in R.
The data I have:
Data Field
Mark Twain 5
I want it to instead be:
Data Field
Twain Mark 5
My idea was to first split the string into two columns and then concatenate. But I'm wondering if there is an easier way.
You can try this approach (sapply is used so that the result stays a character column rather than a list):
> df <- data.frame(Data=c("Mark Twain"), Field=5)
> df$Data <- sapply(strsplit(as.character(df$Data), " "), function(x) paste(rev(x), collapse=" "))
> df
Data Field
1 Twain Mark 5
This will work even if the number of rows in your data frame is > 1
We can use sub() to do this:
df1$Data <- sub("(\\S+)\\s+(\\S+)", "\\2 \\1", df1$Data)
df1
# Data Field
#1 Twain Mark 5
data
df1 <- structure(list(Data = "Mark Twain", Field = 5L),
.Names = c("Data", "Field"), class = "data.frame",
row.names = c(NA, -1L))
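The two approaches agree on two-word strings but differ once a value has more than two words: reversing flips every word, while the sub() pattern swaps only the first two. A small sketch on several rows makes this visible; the extra names and the columns reversed and swapped are made up for the example.

```r
# Hypothetical multi-row data, including one three-word value
df <- data.frame(Data = c("Mark Twain", "Jane Austen", "Samuel Langhorne Clemens"),
                 Field = c(5, 6, 7),
                 stringsAsFactors = FALSE)

# Approach 1: split on spaces and reverse all words
df$reversed <- vapply(strsplit(df$Data, " ", fixed = TRUE),
                      function(x) paste(rev(x), collapse = " "),
                      character(1))

# Approach 2: swap only the first two whitespace-separated words
df$swapped <- sub("(\\S+)\\s+(\\S+)", "\\2 \\1", df$Data)

df
```

Which behaviour is wanted depends on the data, so it is worth checking a few multi-word values before choosing.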