This is not a duplicate of this question. Please read questions entirely before labeling duplicates.
I have a data.frame like so:
library(tidyverse)
tibble(
color = c("blue", "blue", "red", "green", "purple"),
shape = c("triangle", "square", "circle", "hexagon", "hexagon")
)
color shape
<chr> <chr>
1 blue triangle
2 blue square
3 red circle
4 green hexagon
5 purple hexagon
I'd like to add a group_id column like this:
color shape group_id
<chr> <chr> <dbl>
1 blue triangle 1
2 blue square 1
3 red circle 2
4 green hexagon 3
5 purple hexagon 3
The difficulty is that I want to group by unique values of color or shape. I suspect the solution might be to use list-columns, but I can't figure out how.
We can use duplicated in base R
df1$group_id <- cumsum(!Reduce(`|`, lapply(df1, duplicated)))
Output:
df1
# A tibble: 5 x 3
# color shape group_id
# <chr> <chr> <int>
#1 blue triangle 1
#2 blue square 1
#3 red circle 2
#4 green hexagon 3
#5 purple hexagon 3
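To see why the one-liner works, it may help to unpack it step by step. The sketch below rebuilds the example data as a plain data.frame; the intermediate names (dup_color, dup_shape, starts_group) and the second data set df2 are made up for illustration. It also shows a caveat: cumsum() puts a repeated row into the most recent group, so the approach relies on related rows sitting next to each other.

```r
# Example data from the question, as a plain data.frame
df1 <- data.frame(
  color = c("blue", "blue", "red", "green", "purple"),
  shape = c("triangle", "square", "circle", "hexagon", "hexagon"),
  stringsAsFactors = FALSE
)

# TRUE where a value has already been seen in an earlier row
dup_color <- duplicated(df1$color)
dup_shape <- duplicated(df1$shape)

# A row starts a new group only if neither its color nor its shape is a repeat
starts_group <- !(dup_color | dup_shape)

# The running count of group starts gives the group id
group_id <- cumsum(starts_group)
group_id

# Caveat (hypothetical data): the third row shares its color with row 1,
# but it is assigned to the group of the row just before it
df2 <- data.frame(
  color = c("blue", "red", "blue"),
  shape = c("triangle", "circle", "square"),
  stringsAsFactors = FALSE
)
group_id2 <- cumsum(!Reduce(`|`, lapply(df2, duplicated)))
group_id2
```

If matching rows can be far apart in your data, sorting first or a connected-components approach would be needed instead; that goes beyond what this answer covers.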
Or using tidyverse
library(dplyr)
library(purrr)
df1 %>%
  mutate(group_id = map(., duplicated) %>%
           reduce(`|`) %>%
           `!` %>%
           cumsum)
data
df1 <- structure(list(color = c("blue", "blue", "red", "green", "purple"
), shape = c("triangle", "square", "circle", "hexagon", "hexagon"
)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))
Related
I need to generate a table counting new levels of a factor per site.
My code is like this
# Data creation
f = c("red", "green", "blue", "orange", "yellow")
f = factor(f)
d = data.frame(
site = 1:10,
color1= c(
"red", "red", "green", "green", "green",
"blue","green", "blue", "orange", "yellow"
),
color2= c(
"green", "green", "green", "blue","green",
"blue", "orange", "yellow","red", "red"
)
)
d$color1 = factor( d$color1 , levels = levels(f) )
d$color2 = factor( d$color2 , levels = levels(f) )
d
Printing d shows the data as a table.
I need to count how many new colors appear at each site. A color should be counted only the first time it appears, not when it is repeated at a later site. The result should be a table like this one.
The count of first-time colors per site is shown in this figure.
Is there a dplyr way to find this output?
You can do:
library(tidyverse)
d %>%
pivot_longer(cols = -site) %>%
mutate(newColors = duplicated(value)) %>%
group_by(site) %>%
mutate(newColors = sum(!newColors)) %>%
ungroup() %>%
pivot_wider()
which gives:
# A tibble: 10 x 4
site newColors color1 color2
<int> <int> <fct> <fct>
1 1 2 red green
2 2 0 red green
3 3 0 green green
4 4 1 green blue
5 5 0 green green
6 6 0 blue blue
7 7 1 green orange
8 8 1 blue yellow
9 9 0 orange red
10 10 0 yellow red
Note that this differs for row 9 where you have a 1, but both colors (orange and red) already appeared in previous rows.
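For comparison, the same count can be sketched in base R. This is only an illustration of the idea behind the pipeline above (flatten the colors in site order, flag first appearances, sum per site); it uses character columns instead of factors for brevity, and the names cols, vals and is_new are made up here.

```r
# Data from the question, with plain character columns
d <- data.frame(
  site = 1:10,
  color1 = c("red", "red", "green", "green", "green",
             "blue", "green", "blue", "orange", "yellow"),
  color2 = c("green", "green", "green", "blue", "green",
             "blue", "orange", "yellow", "red", "red"),
  stringsAsFactors = FALSE
)

# Flatten row by row: site 1 color1, site 1 color2, site 2 color1, ...
cols <- as.matrix(d[c("color1", "color2")])
vals <- as.vector(t(cols))

# TRUE the first time a color is seen, FALSE for every later repeat
is_new <- !duplicated(vals)

# Put the flags back into one row per site and count them
d$newColors <- rowSums(matrix(is_new, ncol = 2, byrow = TRUE))
d
```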
I would like to add new columns holding the count of each type within each color. My data frame looks like this:
# color type
# black chair
# black chair
# black sofa
# pink plate
# pink chair
# red sofa
# red plate
I am looking for something like:
# color chair sofa plate
# black 2 1 0
# pink 1 0 1
# red 0 1 1
I used table(df$color, df$type), but the result has no column named 'color'; the colors only appear as row names.
We may use table from base R
table(df)
Or with pivot_wider
library(tidyr)
pivot_wider(df, names_from = type, values_from = type,
values_fn = length, values_fill = 0)
# A tibble: 3 × 4
color chair sofa plate
<chr> <int> <int> <int>
1 black 2 1 0
2 pink 1 0 1
3 red 0 1 1
Or with dcast
library(data.table)
dcast(setDT(df), color ~ type, value.var = 'type', length, fill = 0)
data
df <- structure(list(color = c("black", "black", "black", "pink", "pink",
"red", "red"), type = c("chair", "chair", "sofa", "plate", "chair",
"sofa", "plate")), class = "data.frame", row.names = c(NA, -7L
))
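If you want to stay in base R and still get color as a real column, one possible sketch is to convert the table to a data frame and copy the row names into a column. The name counts is only illustrative; note that table() orders the type columns alphabetically (chair, plate, sofa) rather than by first appearance.

```r
df <- data.frame(
  color = c("black", "black", "black", "pink", "pink", "red", "red"),
  type  = c("chair", "chair", "sofa", "plate", "chair", "sofa", "plate"),
  stringsAsFactors = FALSE
)

# Cross-tabulate, then turn the 2-d table into a wide data frame
counts <- as.data.frame.matrix(table(df$color, df$type))

# Move the row names into a proper 'color' column
counts <- data.frame(color = rownames(counts), counts, row.names = NULL)
counts
```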
How can I return a value from a row that immediately follows one which satisfies a specific condition?
For example, let's say I have this dataset
ID Colour Result
1 red positive
2 blue positive
3 NA void
4 green negative
reproduced here:
structure(list(ID = c(1, 2, 3, 4),
Colour = c("red", "blue", NA, "green"),
Result = c("positive", "positive", "void", "negative")),
row.names = c(NA, -4L), class = c("data.frame"))
I want to be able to say, if Result == "void", then tell me what the value of Colour is in the row immediately afterwards.
So if I run such a function over this dataset, it should return green, as it is the colour that belongs to the row after the void row.
How can I do this?
Using which is natural:
dataset$Colour[1+which(dataset$Result == 'void')]
evaluates to "green".
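One thing to watch with this approach: if the last row is a "void" row, 1 + which(...) points past the end of the data and the result contains NA. A small sketch of a guarded version follows; the function name colour_after and its flag argument are hypothetical, and the data frame is the one from the question assigned to dataset.

```r
dataset <- data.frame(
  ID = c(1, 2, 3, 4),
  Colour = c("red", "blue", NA, "green"),
  Result = c("positive", "positive", "void", "negative"),
  stringsAsFactors = FALSE
)

# Return the Colour of each row that directly follows a row with Result == flag
colour_after <- function(data, flag = "void") {
  idx <- which(data$Result == flag) + 1
  idx <- idx[idx <= nrow(data)]  # drop positions beyond the last row
  data$Colour[idx]
}

colour_after(dataset)         # the colour after the void row
colour_after(dataset[1:3, ])  # void is the last row here, so nothing is returned
```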
How about fill from tidyr?
library(readr)
library(tidyr)
df <- read_table("ID Colour Result
1 red positive
2 blue positive
3 NA void
4 green negative")
df %>% fill(Colour, .direction = "up")
# A tibble: 4 x 3
ID Colour Result
<dbl> <chr> <chr>
1 1 red positive
2 2 blue positive
3 3 green void
4 4 green negative
Or if it needs to be based on the Result column, then lead from dplyr:
library(dplyr)
df %>%
  mutate(Colour = ifelse(
    Result == "void",
    lead(Colour),
    Colour))
# A tibble: 4 x 3
ID Colour Result
<dbl> <chr> <chr>
1 1 red positive
2 2 blue positive
3 3 green void
4 4 green negative
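Both versions above fill in the Colour column rather than returning a single value. If only the value itself is wanted, as the question asks, a possible dplyr sketch is to compute the next row's colour with lead() and then keep the "void" rows. The column name next_colour is made up for this example, and the data is rebuilt with tibble() so the block runs on its own.

```r
library(dplyr)

df <- tibble(
  ID = 1:4,
  Colour = c("red", "blue", NA, "green"),
  Result = c("positive", "positive", "void", "negative")
)

next_colour <- df %>%
  mutate(next_colour = lead(Colour)) %>%  # colour of the following row
  filter(Result == "void") %>%            # keep only the void rows
  pull(next_colour)                       # extract as a plain vector

next_colour
```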
This seems like a very basic operation, but my searches are not finding a simple solution.
As an example of what I am trying to do, consider the following two data frames from a database.
First an ID table that assigns an index to a color name:
ColorID <- tibble(ID = c(1:4), Name = c("Red", "Green", "Blue", "Black"))
ColorID
# A tibble: 4 x 2
ID Name
<int> <chr>
1 1 Red
2 2 Green
3 3 Blue
4 4 Black
Next some table that points to these color indexes (instead of storing text strings):
Widgets <- tibble(Front = c(1,3,4,2,1,1), Back = c(4,4,3,3,1,2),
Top = c(4,3,2,1,2,3), Bottom = c(1,2,3,4,3,2))
Widgets
# A tibble: 6 x 4
Front Back Top Bottom
<dbl> <dbl> <dbl> <dbl>
1 1 4 4 1
2 3 4 3 2
3 4 3 2 3
4 2 3 1 4
5 1 1 2 3
6 1 2 3 2
Now I just want to join the two tables to substitute the index values with the actual color names, so what I want is:
Joined <- tibble(Front = c("Red", "Blue", "Black", "Green", "Red","Red"),
Back = c("Black", "Black", "Blue","Blue", "Red", "Green"),
Top = c("Black","Blue", "Green", "Red", "Green", "Blue"),
Bottom = c("Red", "Green", "Blue", "Black", "Blue","Green"))
Joined
# A tibble: 6 x 4
Front Back Top Bottom
<chr> <chr> <chr> <chr>
1 Red Black Black Red
2 Blue Black Blue Green
3 Black Blue Green Blue
4 Green Blue Red Black
5 Red Red Green Blue
6 Red Green Blue Green
I've tried many iterations with no success. What I thought would work is something like:
J <- Widgets %>% inner_join(ColorID, by = c(. = "ID"))
I can tackle this column by column by using one variable at a time, e.g.
J <- Widgets %>% inner_join(ColorID, by = c("Front" = "ID"))
That doesn't replace "Front", though; it creates a new "Name" column instead. It seems like there has to be a simple solution to this. Thanks.
There is no need for join functions:
library(dplyr)
ColorID <- tibble(ID = c(1:4), Name = c("Red", "Green", "Blue", "Black"))
# reorder so that row number and ID are different
ColorID <- ColorID[c(2, 1, 4, 3), ]
Widgets <- tibble(Front = c(1,3,4,2,1,1), Back = c(4,4,3,3,1,2),
Top = c(4,3,2,1,2,3), Bottom = c(1,2,3,4,3,2))
check_id <- function(col){
ColorID$Name[match(col, ColorID$ID)]
}
Widgets %>%
mutate(across(everything(), check_id))
# A tibble: 6 x 4
Front Back Top Bottom
<chr> <chr> <chr> <chr>
1 Red Black Black Red
2 Blue Black Blue Green
3 Black Blue Green Blue
4 Green Blue Red Black
5 Red Red Green Blue
6 Red Green Blue Green
(Edited) mutate() with across() applies check_id to every column of Widgets. Inside check_id, match() looks up each number in the ColorID$ID column and returns the row of ColorID where it is found. That row index is then used to extract the corresponding name from ColorID$Name.
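To make the lookup concrete, here is a small base R sketch of what match() returns for a single column and how the same idea extends to the whole table without dplyr. The names pos and Widgets_named are only illustrative, and the tables are rebuilt as plain data frames (with ColorID already reordered as above).

```r
# Lookup table, deliberately not sorted by ID
ColorID <- data.frame(
  ID = c(2L, 1L, 4L, 3L),
  Name = c("Green", "Red", "Black", "Blue"),
  stringsAsFactors = FALSE
)

Widgets <- data.frame(
  Front = c(1, 3, 4, 2, 1, 1), Back = c(4, 4, 3, 3, 1, 2),
  Top = c(4, 3, 2, 1, 2, 3), Bottom = c(1, 2, 3, 4, 3, 2)
)

# For each value in Front, the row of ColorID whose ID equals it
pos <- match(Widgets$Front, ColorID$ID)
pos

# Indexing Name with those positions gives the colour names
ColorID$Name[pos]

# Same lookup for every column; assigning into [] keeps the data frame shape
Widgets_named <- Widgets
Widgets_named[] <- lapply(Widgets, function(col) ColorID$Name[match(col, ColorID$ID)])
Widgets_named
```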
Does this work?
library(dplyr)
library(tidyr)
Widgets %>%
  pivot_longer(everything()) %>%
  inner_join(ColorID, by = c('value' = 'ID')) %>%
  select(-value) %>%
  pivot_wider(names_from = name, values_from = Name) %>%
  unnest(everything())
# A tibble: 6 x 4
Front Back Top Bottom
<chr> <chr> <chr> <chr>
1 Red Black Black Red
2 Blue Black Blue Green
3 Black Blue Green Blue
4 Green Blue Red Black
5 Red Red Green Blue
6 Red Green Blue Green
I'm having trouble splitting a column into multiple rows, even when using the separate_rows function.
It gives me the following error:
Error: Can't subset columns that don't exist.
INPUT:
ID Colours Shapes
1 Red Triangle
1 Red Square
2 Green, Black Circle
2 Green, Black Triangle
3 Blue Square
3 Blue Oval
OUTPUT:
ID Colours Shapes
1 Red Triangle
1 Red Square
2 Green Circle
2 Green Triangle
2 Black Circle
2 Black Triangle
3 Blue Square
3 Blue Oval
I tried to use separate_rows with your data and I had no problems:
df <- data.frame(ID = c(1,1,2,2,3,3),
Colours = c("Red", "Red", "Green, Black", "Green, Black", "Blue", "Blue"),
Shapes = c("Triangle", "Square", "Circle", "Triangle", "Square", "Oval"))
library(tidyr)
df %>% separate_rows(Colours, sep = ", ")
#> # A tibble: 8 x 3
#> ID Colours Shapes
#> <dbl> <chr> <chr>
#> 1 1 Red Triangle
#> 2 1 Red Square
#> 3 2 Green Circle
#> 4 2 Black Circle
#> 5 2 Green Triangle
#> 6 2 Black Triangle
#> 7 3 Blue Square
#> 8 3 Blue Oval
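Since the question doesn't show the failing call, the cause of the error can only be guessed at; one thing worth checking is that the column name passed to separate_rows() matches the data exactly. As a cross-check that doesn't depend on tidyr, the same split can be sketched in base R with strsplit() and rep(). The names parts, n and out are made up for this example.

```r
df <- data.frame(
  ID = c(1, 1, 2, 2, 3, 3),
  Colours = c("Red", "Red", "Green, Black", "Green, Black", "Blue", "Blue"),
  Shapes = c("Triangle", "Square", "Circle", "Triangle", "Square", "Oval"),
  stringsAsFactors = FALSE
)

# Split each Colours entry on a comma plus optional whitespace
parts <- strsplit(df$Colours, ",\\s*")

# Number of pieces per original row
n <- lengths(parts)

# Repeat the other columns once per piece and flatten the pieces
out <- data.frame(
  ID = rep(df$ID, n),
  Colours = unlist(parts),
  Shapes = rep(df$Shapes, n),
  stringsAsFactors = FALSE
)
out
```

The rows come out in the same order as the separate_rows() result above (Green Circle, Black Circle, Green Triangle, Black Triangle for ID 2).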
Try this. You can use tidyverse functions to separate the rows at each comma, and the approach works for any number of comma-separated elements. First, reshape the data to long format with pivot_longer(), then split the values with separate_rows(). Because row ids are needed, reshape back to wide with pivot_wider() to get the expected layout. Finally, use fill() to complete the missing values and arrange() to get the desired order. Here is the code:
library(tidyverse)
#Code
newdf <- df %>%
  mutate(id = row_number()) %>%
  pivot_longer(-c(ID, id)) %>%
  separate_rows(value, sep = ',') %>%
  mutate(value = trimws(value)) %>%
  group_by(id, name) %>%
  mutate(id2 = row_number()) %>%
  pivot_wider(names_from = name, values_from = value) %>%
  fill(Shapes) %>%
  ungroup() %>%
  select(-c(id, id2)) %>%
  arrange(ID, Colours)
Output:
# A tibble: 8 x 3
ID Colours Shapes
<int> <chr> <chr>
1 1 Red Triangle
2 1 Red Square
3 2 Black Circle
4 2 Black Triangle
5 2 Green Circle
6 2 Green Triangle
7 3 Blue Square
8 3 Blue Oval