This seems like a very basic operation, but my searches are not finding a simple solution.
As an example of what I am trying to do, consider the following two data frames from a database.
First an ID table that assigns an index to a color name:
ColorID <- tibble(ID = c(1:4), Name = c("Red", "Green", "Blue", "Black"))
ColorID
# A tibble: 4 x 2
ID Name
<int> <chr>
1 1 Red
2 2 Green
3 3 Blue
4 4 Black
Next some table that points to these color indexes (instead of storing text strings):
Widgets <- tibble(Front = c(1,3,4,2,1,1), Back = c(4,4,3,3,1,2),
Top = c(4,3,2,1,2,3), Bottom = c(1,2,3,4,3,2))
Widgets
# A tibble: 6 x 4
Front Back Top Bottom
<dbl> <dbl> <dbl> <dbl>
1 1 4 4 1
2 3 4 3 2
3 4 3 2 3
4 2 3 1 4
5 1 1 2 3
6 1 2 3 2
Now I just want to join the two tables to substitute the index values with the actual color names, so what I want is:
Joined <- tibble(Front = c("Red", "Blue", "Black", "Green", "Red","Red"),
Back = c("Black", "Black", "Blue","Blue", "Red", "Green"),
Top = c("Black","Blue", "Green", "Red", "Green", "Blue"),
Bottom = c("Red", "Green", "Blue", "Black", "Blue","Green"))
Joined
# A tibble: 6 x 4
Front Back Top Bottom
<chr> <chr> <chr> <chr>
1 Red Black Black Red
2 Blue Black Blue Green
3 Black Blue Green Blue
4 Green Blue Red Black
5 Red Red Green Blue
6 Red Green Blue Green
I've tried many variations with no success. What I thought would work is something like:
J <- Widgets %>% inner_join(ColorID, by = c(. = "ID"))
I can tackle this column by column by using one variable at a time, e.g.
J <- Widgets %>% inner_join(ColorID, by = c("Front" = "ID"))
That doesn't replace "Front", though; it adds a new "Name" column instead. It seems like there has to be a simple solution to this. Thanks.
There is no need for join functions:
library(dplyr)
ColorID <- tibble(ID = c(1:4), Name = c("Red", "Green", "Blue", "Black"))
# reorder so that row number and ID are different
ColorID <- ColorID[c(2, 1, 4, 3), ]
Widgets <- tibble(Front = c(1,3,4,2,1,1), Back = c(4,4,3,3,1,2),
Top = c(4,3,2,1,2,3), Bottom = c(1,2,3,4,3,2))
check_id <- function(col){
ColorID$Name[match(col, ColorID$ID)]
}
Widgets %>%
mutate(across(everything(), check_id))
# A tibble: 6 x 4
Front Back Top Bottom
<chr> <chr> <chr> <chr>
1 Red Black Black Red
2 Blue Black Blue Green
3 Black Blue Green Blue
4 Green Blue Red Black
5 Red Red Green Blue
6 Red Green Blue Green
(Edited) With mutate and across, check_id is applied to every column of Widgets. Inside check_id, match() looks up each number in ColorID$ID and returns the position of the matching row in ColorID, and that position is then used to extract the corresponding entry of ColorID$Name.
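For anyone who wants to see the lookup step on its own, here is a minimal base R sketch of the same idea without dplyr. The lower-case names color_id, widgets and joined are just illustrative copies of the tables above, and the ID table is deliberately shuffled so that row position and ID differ.

```r
# Lookup table, shuffled so that row position != ID
color_id <- data.frame(ID = c(2L, 1L, 4L, 3L),
                       Name = c("Green", "Red", "Black", "Blue"),
                       stringsAsFactors = FALSE)

widgets <- data.frame(Front = c(1, 3, 4, 2, 1, 1), Back = c(4, 4, 3, 3, 1, 2),
                      Top = c(4, 3, 2, 1, 2, 3), Bottom = c(1, 2, 3, 4, 3, 2))

# match() returns the row position of each ID in the lookup table
positions <- match(c(1, 3, 4), color_id$ID)

# Replace every column by the names found at the matched positions;
# assigning into joined[] keeps the data frame structure
joined <- widgets
joined[] <- lapply(widgets, function(col) color_id$Name[match(col, color_id$ID)])
joined
```

An ID that is missing from the lookup table comes back as NA rather than dropping the row, which is one difference from an inner join.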
Does this work?
library(dplyr)
library(tidyr)
Widgets %>% pivot_longer(everything()) %>%
inner_join(ColorID, by = c('value' = 'ID')) %>% select(-value) %>%
pivot_wider(names_from = name, values_from = Name) %>% unnest(everything())
# A tibble: 6 x 4
Front Back Top Bottom
<chr> <chr> <chr> <chr>
1 Red Black Black Red
2 Blue Black Blue Green
3 Black Blue Green Blue
4 Green Blue Red Black
5 Red Red Green Blue
6 Red Green Blue Green
This question is related to another question I asked 14 days ago:
How to conditional subset a list in R based on range in another column
The difference here is that I need to subset rows instead of columns, and I cannot make that work.
I have imported more than 100 equal .xls files with 10 sheets each into a list in R. I am now trying to get the information out that I need. The data in the files are highly unstructured.
I have created some toy data to show what I want.
list3 <- list(data.frame(depth = c(NA,NA,NA,1,2,3,4,5),
col1 = c(NA,NA,"black",NA,"x",NA,NA,NA),
col2 = c(NA,NA,"blue",NA,NA,"x",NA,NA),
col3 = c(NA,NA,"white","x",NA,NA,NA,NA),
col4 = c(NA,NA,"grey",NA,NA,NA,"x",NA),
col5 = c(NA,NA,"yellow",NA,NA,NA,NA,"x")))
list4 <- list(data.frame(depth = c(NA,NA,NA,1,2,3,4,5),
col1 = c(NA,NA,"black",NA,NA,"x",NA,NA),
col2 = c(NA,NA,"blue",NA,NA,NA,"x",NA),
col3 = c(NA,NA,"white","x",NA,NA,NA,NA),
col4 = c(NA,NA,"grey",NA,"x",NA,NA,NA),
col5 = c(NA,NA,"yellow",NA,NA,NA,NA,"x")))
list5 <- list(data.frame(depth = c(NA,NA,NA,1,2,3,4,5),
col1 = c(NA,NA,"black",NA,"x","x",NA,NA),
col2 = c(NA,NA,"blue",NA,NA,NA,"x",NA),
col3 = c(NA,NA,"white","x",NA,NA,NA,NA),
col4 = c(NA,NA,"grey",NA,NA,NA,NA,NA),
col5 = c(NA,NA,"yellow",NA,NA,NA,NA,"x")))
my_list <- list(list3,list4,list5)
desired_result <- data.frame(depth = c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5),
color = c("white","black","blue","grey","yellow",
"white","grey","black","blue","yellow",
"white","black","black","blue","yellow"))
As I mentioned in my previous question, the data are highly unstructured, so I need a solution based on subsetting a range.
I need to iterate over my list. I have done that successfully with purrr::map so far, but I can't figure this one out.
I need to link each depth to the color found at that depth, across all my files. The result doesn't need to be a data frame; a vector for each depth is fine.
I am hoping for a purrr solution, but anything is gratefully accepted.
Additional requirement given in comments
Your my_list actually has no names, so try this syntax:
library(purrr)
library(dplyr)
library(tidyr)
library(janitor)
imap_dfr(my_list, ~(.x[[1]] %>% mutate(across(starts_with("col"), ~ifelse(. == "x", depth, .))) %>%
  select(-depth) %>% row_to_names(3) %>% ungroup() %>%
  pivot_longer(everything(), names_to = "color", values_to = "depth", values_drop_na = TRUE) %>%
  mutate(list_name = .y)))
# A tibble: 15 x 3
color depth list_name
<chr> <chr> <int>
1 white 1 1
2 black 2 1
3 blue 3 1
4 grey 4 1
5 yellow 5 1
6 white 1 2
7 grey 2 2
8 black 3 2
9 blue 4 2
10 yellow 5 2
11 white 1 3
12 black 2 3
13 black 3 3
14 blue 4 3
15 yellow 5 3
If the list has names, list_name will contain those names; otherwise it contains the index of each list element. Using imap_dfr is recommended. The assumption here is that the third row contains the color names.
Try this:
library(purrr)
library(dplyr)
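my_fun <- function(x) {
  # depth at which each col* column is marked "x" (assumes exactly one "x" per column)
  depth <- x %>%
    summarise(across(.cols = starts_with("col"), .fns = ~depth[which(. == "x")])) %>%
    as.numeric()
  # the color names are taken from the third row
  color <- select(x, starts_with("col"))[3, ] %>% as.character(.)
  data.frame(depth, color) %>% arrange(depth)
}
# apply my_fun to every data frame in the nested list and stack the results
map(my_list, function(l) do.call("rbind", map(l, my_fun))) %>% do.call("rbind", .)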
Output:
# depth color
# 1 1 white
# 2 2 black
# 3 3 blue
# 4 4 grey
# 5 5 yellow
# 6 1 white
# 7 2 grey
# 8 3 black
# 9 4 blue
# 10 5 yellow
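In case some sheets have a color column with more than one "x", or with none (as in list5 from the question), here is a hedged base R sketch that handles that. The function name extract_colors and its color_row argument are made up for illustration, and it assumes, as above, that the color names sit in row 3 and that the marker is exactly "x".

```r
# Toy sheet with the same layout as list5 in the question
sheet <- data.frame(depth = c(NA, NA, NA, 1, 2, 3, 4, 5),
                    col1 = c(NA, NA, "black", NA, "x", "x", NA, NA),
                    col2 = c(NA, NA, "blue", NA, NA, NA, "x", NA),
                    col3 = c(NA, NA, "white", "x", NA, NA, NA, NA),
                    col4 = c(NA, NA, "grey", NA, NA, NA, NA, NA),
                    col5 = c(NA, NA, "yellow", NA, NA, NA, NA, "x"),
                    stringsAsFactors = FALSE)

# Hypothetical helper: one output row per "x" mark, whatever the count per column
extract_colors <- function(df, color_row = 3) {
  cols <- grep("^col", names(df), value = TRUE)
  # color name of each col* column, read from the assumed header row
  colors <- vapply(df[cols], function(v) as.character(v[color_row]),
                   character(1), USE.NAMES = FALSE)
  # (row, column) positions of every "x"; NA cells are ignored by which()
  marks <- which(as.matrix(df[cols]) == "x", arr.ind = TRUE)
  out <- data.frame(depth = df$depth[marks[, 1]],
                    color = colors[marks[, 2]],
                    stringsAsFactors = FALSE)
  out <- out[order(out$depth), ]
  rownames(out) <- NULL
  out
}

# Same nesting as my_list: a list of lists of data frames
nested <- list(list(sheet))
res <- do.call(rbind, lapply(nested, function(l) do.call(rbind, lapply(l, extract_colors))))
res
```

A column without any "x" simply contributes no rows, and a column with two marks contributes two.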
This is not a duplicate of this question. Please read questions entirely before labeling duplicates.
I have a data.frame like so:
library(tidyverse)
tibble(
color = c("blue", "blue", "red", "green", "purple"),
shape = c("triangle", "square", "circle", "hexagon", "hexagon")
)
color shape
<chr> <chr>
1 blue triangle
2 blue square
3 red circle
4 green hexagon
5 purple hexagon
I'd like to add a group_id column like this:
color shape group_id
<chr> <chr> <dbl>
1 blue triangle 1
2 blue square 1
3 red circle 2
4 green hexagon 3
5 purple hexagon 3
The difficulty is that I want to group by unique values of color or shape. I suspect the solution might be to use list-columns, but I can't figure out how.
We can use duplicated in base R
df1$group_id <- cumsum(!Reduce(`|`, lapply(df1, duplicated)))
-output
df1
# A tibble: 5 x 3
# color shape group_id
# <chr> <chr> <int>
#1 blue triangle 1
#2 blue square 1
#3 red circle 2
#4 green hexagon 3
#5 purple hexagon 3
Or using tidyverse
library(dplyr)
library(purrr)
df1 %>%
mutate(group_id = map(., duplicated) %>%
reduce(`|`) %>%
`!` %>%
cumsum)
data
df1 <- structure(list(color = c("blue", "blue", "red", "green", "purple"
), shape = c("triangle", "square", "circle", "hexagon", "hexagon"
)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))
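One caveat on the cumsum/duplicated approach: it depends on row order. A row that shares a value with an earlier group is put into whatever group was opened most recently, so with rows blue/triangle, red/circle, blue/square it gives 1, 2, 2 rather than 1, 2, 1. If the rows are not sorted so that linked rows are adjacent, a small union-find over row indices is one way around that. The sketch below is base R only; group_by_any and find_root are names I made up, not functions from any package.

```r
# Hypothetical helper: rows end up in the same group if they share a value
# in ANY column, directly or through a chain of other rows
group_by_any <- function(df) {
  n <- nrow(df)
  parent <- seq_len(n)               # every row starts as its own root
  find_root <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }
  for (col in names(df)) {
    first <- match(df[[col]], df[[col]])   # first row holding the same value
    for (i in seq_len(n)) {
      a <- find_root(i)
      b <- find_root(first[i])
      # merge towards the smaller index so ids follow first appearance
      if (a != b) parent[max(a, b)] <- min(a, b)
    }
  }
  roots <- vapply(seq_len(n), find_root, integer(1))
  match(roots, unique(roots))
}

df1 <- data.frame(color = c("blue", "blue", "red", "green", "purple"),
                  shape = c("triangle", "square", "circle", "hexagon", "hexagon"),
                  stringsAsFactors = FALSE)
ids_sorted <- group_by_any(df1)

# Same kind of data, but the two "blue" rows are no longer adjacent
df_shuffled <- data.frame(color = c("blue", "red", "blue"),
                          shape = c("triangle", "circle", "square"),
                          stringsAsFactors = FALSE)
ids_shuffled <- group_by_any(df_shuffled)
ids_cumsum <- cumsum(!Reduce(`|`, lapply(df_shuffled, duplicated)))
```

On the data from the question both approaches agree; they only differ once linked rows are separated by unrelated ones.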
I have the following example dataframe "df" with the variable "Text" containing text:
df:
Text
1 I like blue shoes.
2 Black is great!
3 Pink and grey books.
4 I don't like grey trousers.
5 Yellow is my favorite colour
6 No more green!
7 Cars are red.
8 I have a pink bike
I use the following code to filter every case which contains at least one of the listed words, which works perfectly fine:
library(tidyverse)
library(igraph)
library(stringi)
library(stringr)
filter <- c("blue","green","yellow","red")
df2 <-
df %>%
filter(str_detect(tolower(df$Text), paste(filter, collapse = "|")))
df2:
Text
1 I like blue shoes.
5 Yellow is my favorite colour
6 No more green!
7 Cars are red.
As an additional condition, I now want to add the combination of "pink" and "grey": a row should be kept if it contains at least one of the words listed above OR both "pink" and "grey". The dataframe I want looks like this:
df2:
Text
1 I like blue shoes.
3 Pink and grey books.
5 Yellow is my favorite colour
6 No more green!
7 Cars are red.
Do you have any idea how I can get there?
Thanks in advance!
You can use the & operator to combine filter conditions (there is also the | operator for OR).
> f1
[1] "blue" "green" "yellow" "red"
> f2
[1] "pink" "grey"
> df
# A tibble: 4 x 2
Text1 Text2
<chr> <chr>
1 Yellow This
2 red That
3 Purple grey The
4 green pink other
> filter(df, str_detect(Text1, paste0(f1, collapse = "|")))
# A tibble: 2 x 2
Text1 Text2
<chr> <chr>
1 red That
2 green pink other
> filter(df,
str_detect(Text1, paste0(f1, collapse = "|")) &
str_detect(Text1, paste0(f2, collapse = "|")))
# A tibble: 1 x 2
Text1 Text2
<chr> <chr>
1 green pink other
Note that the second filter requires both conditions to hold.
EDIT ADDRESSING THE COMMENT
> filter(df,
str_detect(Text1, paste0(f1, collapse = "|")) |
(str_detect(Text1, "pink") & str_detect(Text1, "grey")))
You can still combine the & and | operators, using parentheses to get exactly the logical combination you want.
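To tie this back to the data frame from the question, here is a small base R sketch of the same logic using grepl instead of str_detect. The names any_of_words and all_of_words are only illustrative, and the text is lower-cased first, as in the question, so that "Yellow" and "Pink" still match.

```r
df <- data.frame(Text = c("I like blue shoes.", "Black is great!",
                          "Pink and grey books.", "I don't like grey trousers.",
                          "Yellow is my favorite colour", "No more green!",
                          "Cars are red.", "I have a pink bike"),
                 stringsAsFactors = FALSE)

any_of_words <- c("blue", "green", "yellow", "red")  # one of these is enough
all_of_words <- c("pink", "grey")                    # these must all be present

txt <- tolower(df$Text)

# TRUE where at least one of the single words occurs
has_any <- grepl(paste(any_of_words, collapse = "|"), txt)

# TRUE where every word of the combination occurs
has_all <- Reduce(`&`, lapply(all_of_words, function(w) grepl(w, txt, fixed = TRUE)))

df2 <- df[has_any | has_all, , drop = FALSE]
df2
```

Note that this matches substrings, not whole words, so a word such as "bored" would also count as containing "red".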