This seems like a very basic operation, but my searches are not finding a simple solution.
As an example of what I am trying to do, consider the following two data frames from a database.
First an ID table that assigns an index to a color name:
ColorID <- tibble(ID = c(1:4), Name = c("Red", "Green", "Blue", "Black"))
ColorID
# A tibble: 4 x 2
ID Name
<int> <chr>
1 1 Red
2 2 Green
3 3 Blue
4 4 Black
Next some table that points to these color indexes (instead of storing text strings):
Widgets <- tibble(Front = c(1,3,4,2,1,1), Back = c(4,4,3,3,1,2),
Top = c(4,3,2,1,2,3), Bottom = c(1,2,3,4,3,2))
Widgets
# A tibble: 6 x 4
Front Back Top Bottom
<dbl> <dbl> <dbl> <dbl>
1 1 4 4 1
2 3 4 3 2
3 4 3 2 3
4 2 3 1 4
5 1 1 2 3
6 1 2 3 2
Now I just want to join the two tables to substitute the index values with the actual color names, so what I want is:
Joined <- tibble(Front = c("Red", "Blue", "Black", "Green", "Red","Red"),
Back = c("Black", "Black", "Blue","Blue", "Red", "Green"),
Top = c("Black","Blue", "Green", "Red", "Green", "Blue"),
Bottom = c("Red", "Green", "Blue", "Black", "Blue","Green"))
Joined
# A tibble: 6 x 4
Front Back Top Bottom
<chr> <chr> <chr> <chr>
1 Red Black Black Red
2 Blue Black Blue Green
3 Black Blue Green Blue
4 Green Blue Red Black
5 Red Red Green Blue
6 Red Green Blue Green
I've tried many iterations with no success. What I thought would work is something like:
J <- Widgets %>% inner_join(ColorID, by = c(. = "ID"))
I can tackle this column by column by using one variable at a time, e.g.
J <- Widgets %>% inner_join(ColorID, by = c("Front" = "ID"))
This doesn't replace "Front", but instead creates a new "Name" column. It seems like there has to be a simple solution to this, though. Thanks.
There is no need for join functions:
library(dplyr)
ColorID <- tibble(ID = c(1:4), Name = c("Red", "Green", "Blue", "Black"))
# reorder so that row number and ID are different
ColorID <- ColorID[c(2, 1, 4, 3), ]
Widgets <- tibble(Front = c(1,3,4,2,1,1), Back = c(4,4,3,3,1,2),
Top = c(4,3,2,1,2,3), Bottom = c(1,2,3,4,3,2))
check_id <- function(col){
ColorID$Name[match(col, ColorID$ID)]
}
Widgets %>%
mutate(across(everything(), check_id))
# A tibble: 6 x 4
Front Back Top Bottom
<chr> <chr> <chr> <chr>
1 Red Black Black Red
2 Blue Black Blue Green
3 Black Blue Green Blue
4 Green Blue Red Black
5 Red Red Green Blue
6 Red Green Blue Green
(Edited) With dplyr and mutate, I match the numbers in each Widgets column against the ColorID$ID column. match() returns, for each number, the row position in the ColorID data frame, and I use those positions to extract the name from ColorID$Name.
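To make the lookup concrete, here is a minimal base R sketch of the same match() step applied to the Front column only. The variable names front, rows and front_names are hypothetical, introduced just for illustration.

```r
# ColorID in the reordered form used above, so row number and ID differ
ColorID <- data.frame(ID = c(2L, 1L, 4L, 3L),
                      Name = c("Green", "Red", "Black", "Blue"),
                      stringsAsFactors = FALSE)

# The Front column of Widgets
front <- c(1, 3, 4, 2, 1, 1)

# For each value in front, find the row of ColorID whose ID equals it
rows <- match(front, ColorID$ID)

# Index the Name column with those row positions
front_names <- ColorID$Name[rows]
front_names
```

Because match() searches the ID values rather than relying on row order, the lookup still works after ColorID has been reordered.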
Does this work:
library(dplyr)
library(tidyr)
Widgets %>% pivot_longer(everything()) %>%
inner_join(ColorID, by = c('value' = 'ID')) %>% select(-value) %>%
pivot_wider(names_from = name, values_from = Name) %>% unnest(everything())
# A tibble: 6 x 4
Front Back Top Bottom
<chr> <chr> <chr> <chr>
1 Red Black Black Red
2 Blue Black Blue Green
3 Black Blue Green Blue
4 Green Blue Red Black
5 Red Red Green Blue
6 Red Green Blue Green
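The pipeline above has no row identifier, so pivot_wider() produces list-columns that unnest() then has to expand. A possible variant, sketched here under the assumption that dplyr >= 1.0 and tidyr >= 1.0 are installed, adds an explicit row id before pivoting. The column name row and the object Joined2 are hypothetical.

```r
library(dplyr)
library(tidyr)

ColorID <- tibble(ID = 1:4, Name = c("Red", "Green", "Blue", "Black"))
Widgets <- tibble(Front = c(1, 3, 4, 2, 1, 1), Back = c(4, 4, 3, 3, 1, 2),
                  Top = c(4, 3, 2, 1, 2, 3), Bottom = c(1, 2, 3, 4, 3, 2))

Joined2 <- Widgets %>%
  mutate(row = row_number()) %>%               # remember the original row
  pivot_longer(-row) %>%                       # one line per (row, column)
  inner_join(ColorID, by = c("value" = "ID")) %>%
  select(-value) %>%
  pivot_wider(names_from = name, values_from = Name) %>%
  select(-row)                                 # drop the helper id again

Joined2
```

With the row id in place, each cell of the wide table is identified uniquely, so no unnest() step is needed.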
I need to generate a table counting new levels of a factor per site.
My code is like this
# Data creation
f = c("red", "green", "blue", "orange", "yellow")
f = factor(f)
d = data.frame(
site = 1:10,
color1= c(
"red", "red", "green", "green", "green",
"blue","green", "blue", "orange", "yellow"
),
color2= c(
"green", "green", "green", "blue","green",
"blue", "orange", "yellow","red", "red"
)
)
d$color1 = factor( d$color1 , levels = levels(f) )
d$color2 = factor( d$color2 , levels = levels(f) )
d
It shows me this table.
I need to count how many new colors appear at each site. A color counts only the first time it appears; later duplicates are not counted. The result should be a table like this one.
The count of non-duplicated colors per site is shown in this figure.
Is there a dplyr way to find this output?
You can do:
library(tidyverse)
d %>%
pivot_longer(cols = -site) %>%
mutate(newColors = duplicated(value)) %>%
group_by(site) %>%
mutate(newColors = sum(!newColors)) %>%
ungroup() %>%
pivot_wider()
which gives:
# A tibble: 10 x 4
site newColors color1 color2
<int> <int> <fct> <fct>
1 1 2 red green
2 2 0 red green
3 3 0 green green
4 4 1 green blue
5 5 0 green green
6 6 0 blue blue
7 7 1 green orange
8 8 1 blue yellow
9 9 0 orange red
10 10 0 yellow red
Note that this differs for row 9 where you have a 1, but both colors (orange and red) already appeared in previous rows.
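The counting logic can also be followed step by step in base R. This is only a sketch of the same idea (flag first appearances with duplicated(), then sum per site); the names values, sites, first_seen and newColors are hypothetical.

```r
color1 <- c("red", "red", "green", "green", "green",
            "blue", "green", "blue", "orange", "yellow")
color2 <- c("green", "green", "green", "blue", "green",
            "blue", "orange", "yellow", "red", "red")

# Interleave the two columns so values run in site order:
# site 1 color1, site 1 color2, site 2 color1, ...
values <- as.vector(rbind(color1, color2))
sites  <- rep(1:10, each = 2)

# TRUE the first time a color is seen, FALSE for every later occurrence
first_seen <- !duplicated(values)

# Number of first appearances per site
newColors <- as.integer(tapply(first_seen, sites, sum))
newColors
```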
I would like to add new columns holding the count of each type per color. My dataframe is like this:
# color type
# black chair
# black chair
# black sofa
# pink plate
# pink chair
# red sofa
# red plate
I am looking for something like:
# color chair sofa plate
# black 2 1 0
# pink 1 0 1
# red 0 1 1
I used table(df$color, df$type), but the result has no name for column 'color'
We may use table from base R
table(df)
Or with pivot_wider
library(tidyr)
pivot_wider(df, names_from = type, values_from = type,
values_fn = length, values_fill = 0)
# A tibble: 3 × 4
color chair sofa plate
<chr> <int> <int> <int>
1 black 2 1 0
2 pink 1 0 1
3 red 0 1 1
Or with dcast
library(data.table)
dcast(setDT(df), color ~ type, value.var = 'type', length, fill = 0)
data
df <- structure(list(color = c("black", "black", "black", "pink", "pink",
"red", "red"), type = c("chair", "chair", "sofa", "plate", "chair",
"sofa", "plate")), class = "data.frame", row.names = c(NA, -7L
))
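Since the question mentions that the table() result has no 'color' column, another option is to count first and then reshape. This is a sketch assuming dplyr and tidyr >= 1.1 are available; the object name wide is hypothetical.

```r
library(dplyr)
library(tidyr)

df <- data.frame(color = c("black", "black", "black", "pink", "pink", "red", "red"),
                 type  = c("chair", "chair", "sofa", "plate", "chair", "sofa", "plate"),
                 stringsAsFactors = FALSE)

wide <- df %>%
  count(color, type) %>%                        # one row per color/type pair, count in n
  pivot_wider(names_from = type, values_from = n,
              values_fill = 0L)                 # pairs that never occur become 0

wide
```

The result keeps color as an ordinary named column, which the two-way table() output does not.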
This is not a duplicate of this question. Please read questions entirely before labeling duplicates.
I have a data.frame like so:
library(tidyverse)
tibble(
color = c("blue", "blue", "red", "green", "purple"),
shape = c("triangle", "square", "circle", "hexagon", "hexagon")
)
color shape
<chr> <chr>
1 blue triangle
2 blue square
3 red circle
4 green hexagon
5 purple hexagon
I'd like to add a group_id column like this:
color shape group_id
<chr> <chr> <dbl>
1 blue triangle 1
2 blue square 1
3 red circle 2
4 green hexagon 3
5 purple hexagon 3
The difficulty is that I want to group by unique values of color or shape. I suspect the solution might be to use list-columns, but I can't figure out how.
We can use duplicated in base R
df1$group_id <- cumsum(!Reduce(`|`, lapply(df1, duplicated)))
-output
df1
# A tibble: 5 x 3
# color shape group_id
# <chr> <chr> <int>
#1 blue triangle 1
#2 blue square 1
#3 red circle 2
#4 green hexagon 3
#5 purple hexagon 3
Or using tidyverse
library(dplyr)
library(purrr)
df1 %>%
mutate(group_id = map(., duplicated) %>%
reduce(`|`) %>%
`!` %>%
cumsum)
data
df1 <- structure(list(color = c("blue", "blue", "red", "green", "purple"
), shape = c("triangle", "square", "circle", "hexagon", "hexagon"
)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))
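The one-liner can be unpacked into its steps, as in the base R sketch below. It also shows a caveat: the cumsum() approach assumes that rows belonging to the same group sit next to each other. The reordered data frame df2 and the intermediate names are hypothetical, made up for this illustration.

```r
df1 <- data.frame(color = c("blue", "blue", "red", "green", "purple"),
                  shape = c("triangle", "square", "circle", "hexagon", "hexagon"),
                  stringsAsFactors = FALSE)

dup_color <- duplicated(df1$color)   # TRUE where the color was seen before
dup_shape <- duplicated(df1$shape)   # TRUE where the shape was seen before

# A row starts a new group only if neither value has been seen before
starts_new_group <- !(dup_color | dup_shape)
group_id <- cumsum(starts_new_group)
group_id

# Caveat: with related rows not adjacent, the running count mislabels them.
# Rows here: blue triangle, red circle, blue square
df2 <- df1[c(1, 3, 2), ]
group_id2 <- cumsum(!Reduce(`|`, lapply(df2, duplicated)))
group_id2  # the second blue row lands in the red row's group
```

If the data cannot be sorted so that related rows are adjacent, a different technique (for example, connected components on a graph of shared values) would be needed.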
I'm having trouble splitting a column into multiple rows, even when using the separate_rows function.
It gives me the following error:
Error: Can't subset columns that don't exist.
INPUT:
ID Colours Shapes
1 Red Triangle
1 Red Square
2 Green, Black Circle
2 Green, Black Triangle
3 Blue Square
3 Blue Oval
OUTPUT:
ID Colours Shapes
1 Red Triangle
1 Red Square
2 Green Circle
2 Green Triangle
2 Black Circle
2 Black Triangle
3 Blue Square
3 Blue Oval
I tried to use separate_rows with your data and I had no problems:
df <- data.frame(ID = c(1,1,2,2,3,3),
Colours = c("Red", "Red", "Green, Black", "Green, Black", "Blue", "Blue"),
Shapes = c("Triangle", "Square", "Circle", "Triangle", "Square", "Oval"))
library(tidyr)
df %>% separate_rows(Colours, sep = ", ")
#> # A tibble: 8 x 3
#> ID Colours Shapes
#> <dbl> <chr> <chr>
#> 1 1 Red Triangle
#> 2 1 Red Square
#> 3 2 Green Circle
#> 4 2 Black Circle
#> 5 2 Green Triangle
#> 6 2 Black Triangle
#> 7 3 Blue Square
#> 8 3 Blue Oval
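For comparison, the same split can be sketched in base R with strsplit() and rep(). This is not how separate_rows() is implemented; it only illustrates the row-expansion idea. The names parts, n and out are hypothetical.

```r
df <- data.frame(ID = c(1, 1, 2, 2, 3, 3),
                 Colours = c("Red", "Red", "Green, Black", "Green, Black", "Blue", "Blue"),
                 Shapes = c("Triangle", "Square", "Circle", "Triangle", "Square", "Oval"),
                 stringsAsFactors = FALSE)

# Split each Colours entry on ", " into its individual colours
parts <- strsplit(df$Colours, ", ", fixed = TRUE)
n <- lengths(parts)                  # number of colours in each original row

# Repeat the other columns once per colour and flatten the list of colours
out <- data.frame(ID = rep(df$ID, n),
                  Colours = unlist(parts),
                  Shapes = rep(df$Shapes, n),
                  stringsAsFactors = FALSE)
out
```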
Try this. You can use tidyverse functions to separate rows by comma, and the approach works for any number of comma-separated elements. First add a row id and reshape the data to long format with pivot_longer(), then split the values with separate_rows(). Number the pieces within each original row and column so that pivot_wider() can reshape the data back to wide. Finally, use fill() to complete the missing values and arrange() to give the desired order. Here is the code:
library(tidyverse)
#Code
newdf <- df %>% mutate(id=row_number()) %>%
pivot_longer(-c(ID,id)) %>%
separate_rows(value,sep=',') %>%
mutate(value=trimws(value)) %>%
group_by(id,name) %>% mutate(id2=row_number()) %>%
pivot_wider(names_from = name,values_from=value) %>%
fill(Shapes) %>% ungroup() %>% select(-c(id,id2)) %>%
arrange(ID,Colours)
Output:
# A tibble: 8 x 3
ID Colours Shapes
<int> <chr> <chr>
1 1 Red Triangle
2 1 Red Square
3 2 Black Circle
4 2 Black Triangle
5 2 Green Circle
6 2 Green Triangle
7 3 Blue Square
8 3 Blue Oval
In R, I'm trying to take the first value of a character variable and use it to rename the same variable or even to assign a name to another new variable, but I haven't figured out how to do this.
Example:
PR <- data.frame("Variable1" = c("Red", "Blue", "Green", "Yellow"),
"Variable2" = seq(1:4))
PR
Variable1 Variable2
1 Red 1
2 Blue 2
3 Green 3
4 Yellow 4
I know one could just use "PR %>% rename(Red = Variable1)", but I want R to take this name from the variable directly. The outcome should be:
Red Variable2
1 Red 1
2 Blue 2
3 Green 3
4 Yellow 4
I've tried to use the "rename()" function from dplyr to do it, but it didn't work:
PR <- PR %>% rename(as.name(Variable1)[1] = Variable1)
Error: unexpected '=' in "PR <- PR %>% rename(as.name(Variable1)[1] ="
How could I do this using dplyr, or even in the context of creating a new variable with the "mutate()" command (for example if I want to create a new variable which name is the first value of "Variable1")?
Does this work:
> PR
Variable1 Variable2
1 Red 1
2 Blue 2
3 Green 3
4 Yellow 4
> name <- PR$Variable1[1]
> PR %>% rename(!!sym(name) := Variable1)
Red Variable2
1 Red 1
2 Blue 2
3 Green 3
4 Yellow 4
>
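You need to unquote the new name with !! and use := instead of =, because the left-hand side of = in a function call cannot be an expression.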
library(tidyverse)
PR <- data.frame("Variable1" = c("Red", "Blue", "Green", "Yellow"),
"Variable2" = seq(1:4))
#Note the sequence of commands
PR %>%
mutate(Variable3 = PR$Variable1[1]) %>%
rename(!!PR$Variable1[1] := Variable1)
# Red Variable2 Variable3
# 1 Red 1 Red
# 2 Blue 2 Red
# 3 Green 3 Red
# 4 Yellow 4 Red
We can use rename_at
library(dplyr)
PR %>%
rename_at(vars(Variable1), ~ PR$Variable1[1])
#  Red Variable2
#1 Red 1
#2 Blue 2
#3 Green 3
#4 Yellow 4
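Since rename_at() is superseded in dplyr 1.0 and later, two alternatives are sketched below, assuming dplyr >= 1.0. The names new_name, renamed and renamed2 are hypothetical.

```r
library(dplyr)

PR <- data.frame(Variable1 = c("Red", "Blue", "Green", "Yellow"),
                 Variable2 = 1:4,
                 stringsAsFactors = FALSE)

# Take the new column name from the first value of Variable1
new_name <- as.character(PR$Variable1[1])

# Option 1: rename_with() applies a function to the selected column names
renamed <- PR %>%
  rename_with(~ new_name, .cols = Variable1)

# Option 2: rename() with a named character vector of the form new = "old"
renamed2 <- PR %>%
  rename(all_of(setNames("Variable1", new_name)))

names(renamed)
names(renamed2)
```

Both avoid the !! and := syntax, which may be easier to read when the new name is already stored in a variable.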