Summation over multiple rows based on a condition in R

I have a dataset like the following:
name city number
A M 2
A N 3
A O 5
A P 7
B M 7
B N 8
B O 9
B P 2
For each name category, I want to sum the number values for cities M and N and store the result under a new city label. The same goes for the O and P values.
The dataset should look like the following:
name city number
A X 5
A Y 12
B X 15
B Y 11
I'm new to R programming. I have tried to use group_by and mutate but was not successful.

We could recode the values in the column 'city' to 'X' and 'Y', and then do a grouped sum:
library(dplyr)
df1 %>%
  group_by(name, city = case_when(city %in% c("M", "N") ~ "X",
                                  city %in% c("O", "P") ~ "Y")) %>%
  summarise(number = sum(number), .groups = 'drop')
-output
# A tibble: 4 × 3
name city number
<chr> <chr> <int>
1 A X 5
2 A Y 12
3 B X 15
4 B Y 11
data
df1 <- structure(list(name = c("A", "A", "A", "A", "B", "B", "B", "B"
), city = c("M", "N", "O", "P", "M", "N", "O", "P"), number = c(2L,
3L, 5L, 7L, 7L, 8L, 9L, 2L)), row.names = c(NA, -8L), class = "data.frame")
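The same recode-then-sum idea can also be sketched in base R. This is only an illustration, not part of the answer above: the lookup vector `city_group` and the data frame `recoded` are hypothetical names introduced here. Cities missing from the lookup would become NA.

```r
# Same data as df1 above, rebuilt so the sketch is self-contained
df1 <- data.frame(
  name = rep(c("A", "B"), each = 4),
  city = rep(c("M", "N", "O", "P"), times = 2),
  number = c(2L, 3L, 5L, 7L, 7L, 8L, 9L, 2L),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: each original city maps to its new label
city_group <- c(M = "X", N = "X", O = "Y", P = "Y")

# Recode the city column by name lookup
recoded <- df1
recoded$city <- unname(city_group[as.character(recoded$city)])

# Sum number within each name/city combination
res <- aggregate(number ~ name + city, data = recoded, FUN = sum)
res <- res[order(res$name, res$city), ]
rownames(res) <- NULL
print(res)
```

Keeping the mapping in a named vector means new cities can be added in one place without touching the aggregation step.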

Related

Creating new columns based on data in row separated by specific character in R

I've the following table
Owner Pet Housing_Type
A Cats;Dog;Rabbit 3
B Dog;Rabbit 2
C Cats 2
D Cats;Rabbit 3
E Cats;Fish 1
The code is as follows:
Data_Pets = structure(list(Owner = structure(1:5, .Label = c("A", "B", "C", "D",
"E"), class = "factor"), Pets = structure(c(2L, 5L, 1L,4L, 3L), .Label = c("Cats ",
"Cats;Dog;Rabbit", "Cats;Fish","Cats;Rabbit", "Dog;Rabbit"), class = "factor"),
House_Type = c(3L,2L, 2L, 3L, 1L)), class = "data.frame", row.names = c(NA, -5L))
Can anyone advise me how I can create new columns based on the data in Pet column by creating a new column for each animal separated by ; to look like the following table?
Owner Cats Dog Rabbit Fish Housing_Type
A Y Y Y N 3
B N Y Y N 2
C Y N N N 2
D Y N Y N 3
E Y N N Y 1
Thanks!
One approach is to define a helper function that matches for a specific animal, then bind the columns to the original frame.
Note that some wrangling is done to get rid of whitespace to identify the unique animals to query.
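library(dplyr)
df <- Data_Pets  # the question's data, under the name used below
# Returns "Y"/"N" per row for each animal name passed as `match`
f <- Vectorize(function(string, match) {
  ifelse(grepl(match, string), "Y", "N")
}, c("match"))
df %>%
  bind_cols(
    f(df$Pets, unique(unlist(strsplit(trimws(as.character(df$Pets)), ";"))))
  )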
Owner Pets House_Type Cats Dog Rabbit Fish
1 A Cats;Dog;Rabbit 3 Y Y Y N
2 B Dog;Rabbit 2 N Y Y N
3 C Cats 2 Y N N N
4 D Cats;Rabbit 3 Y N Y N
5 E Cats;Fish 1 Y N N Y
Or more generalized if you don't know for sure that the separator is ;, and whitespace is present, stringi is useful:
dplyr::bind_cols(
df,
f(df$Pets, unique(unlist(stringi::stri_extract_all_words(df$Pets))))
)
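Because grepl() does pattern matching, an animal name that is a substring of another would also be flagged. A minimal base R sketch that compares whole names instead is shown below; it assumes ";" is the separator, and the names `pet_list`, `animals`, `flags` and `result` are introduced here only for illustration.

```r
# Same content as Data_Pets in the question, with plain character columns
Data_Pets <- data.frame(
  Owner = c("A", "B", "C", "D", "E"),
  Pets = c("Cats;Dog;Rabbit", "Dog;Rabbit", "Cats ", "Cats;Rabbit", "Cats;Fish"),
  House_Type = c(3L, 2L, 2L, 3L, 1L),
  stringsAsFactors = FALSE
)

# Split each row on ";" and strip stray whitespace around the names
pet_list <- lapply(strsplit(as.character(Data_Pets$Pets), ";", fixed = TRUE), trimws)

# Unique animal names, in order of first appearance
animals <- unique(unlist(pet_list))

# One "Y"/"N" column per animal, using exact membership rather than regex
flags <- sapply(animals, function(a) {
  ifelse(vapply(pet_list, function(p) a %in% p, logical(1)), "Y", "N")
})

result <- cbind(Data_Pets["Owner"], flags, Data_Pets["House_Type"])
print(result)
```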
You can use separate_rows and pivot_wider from tidyr library:
library(tidyr)
library(dplyr)
Data_Pets %>%
separate_rows(Pets , sep = ";") %>%
mutate(Pets = trimws(Pets)) %>%
mutate(temp = row_number()) %>%
pivot_wider(names_from = Pets, values_from = temp) %>%
mutate(across(c(Cats:Fish), function(x) if_else(is.na(x), "N", "Y"))) %>%
dplyr::relocate(House_Type, .after = Fish)
which will generate:
# Owner Cats Dog Rabbit Fish House_Type
# <fct> <chr> <chr> <chr> <chr> <int>
# 1 A Y Y Y N 3
# 2 B N Y Y N 2
# 3 C Y N N N 2
# 4 D Y N Y N 3
# 5 E Y N N Y 1
Data:
Data_Pets = structure(list(Owner = structure(1:5, .Label = c("A", "B", "C", "D",
"E"), class = "factor"), Pets = structure(c(2L, 5L, 1L,4L, 3L), .Label = c("Cats ",
"Cats;Dog;Rabbit", "Cats;Fish","Cats;Rabbit", "Dog;Rabbit"), class = "factor"),
House_Type = c(3L,2L, 2L, 3L, 1L)), class = "data.frame", row.names = c(NA, -5L))

Rearrangement data using r

I would like to ask how can I rearrange my dataset that fulfils the following
Original :
Group Value_y Value_z
1 m a
1 n a
2 o b
2 p b
Intended:
Group Value_a Value_b
1 m n
2 o p
This involves separating Value_y according to Value_z and adding a new column according to the group number. I will potentially also need to average a separate column's values and add them as a new column in the same way.
Thank you!
In data.table we can use dcast :
library(data.table)
dcast(setDT(df), Group~rowid(Value_z), value.var = 'Value_y')
# Group 1 2
#1: 1 m n
#2: 2 o p
data
df <- structure(list(Group = c(1L, 1L, 2L, 2L), Value_y = c("m", "n",
"o", "p"), Value_z = c("a", "a", "b", "b")), class = "data.frame",
row.names = c(NA, -4L))
There is a dplyr solution. Define
Uneven = seq(1, dim(A)[1] - 1, by = 2)
Even = seq(2, dim(A)[1], by = 2)
with
A = data.frame(Group = c(1, 1, 2, 2), Value_y = c("m", "n", "o", "p"))
Then, you can use the pipe and some dplyr functionality to get
library(dplyr)  # provides the %>% pipe
A2 = A %>%
  dplyr::group_by(Group) %>%
  dplyr::mutate(Row_1 = Value_y[Uneven]) %>%
  dplyr::mutate(Row_2 = Value_y[Even]) %>%
  dplyr::select(-Value_y) %>%
  dplyr::slice(1)
and the output
> A2
# A tibble: 2 x 3
# Groups: Group [2]
Group Row_1 Row_2
<dbl> <fct> <fct>
1 1 m n
2 2 o p
Note that this solution presupposes exactly two rows per group. Because Uneven and Even are computed on the whole data frame, it does not generalize beyond this example without adjustment.
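A base R sketch of the same reshaping that does not rely on precomputed index vectors is given below. It numbers the rows within each group and then uses reshape(); the helper column `pos` and the result name `wide` are hypothetical names chosen for this illustration.

```r
df <- data.frame(
  Group = c(1L, 1L, 2L, 2L),
  Value_y = c("m", "n", "o", "p"),
  Value_z = c("a", "a", "b", "b"),
  stringsAsFactors = FALSE
)

# Position of each row within its group: 1, 2, 1, 2
df$pos <- ave(seq_along(df$Group), df$Group, FUN = seq_along)

# Spread Value_y into one column per within-group position
wide <- reshape(
  df[c("Group", "Value_y", "pos")],
  idvar = "Group",
  timevar = "pos",
  direction = "wide"
)
rownames(wide) <- NULL
print(wide)
```

Groups with fewer rows than the largest group would get NA in the extra columns, so uneven group sizes do not raise an error.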

How to swap row values in the same column of a data frame?

I have a data frame that looks like the following:
ID Loc
1 N
2 A
3 N
4 H
5 H
I would like to swap A and H in the column Loc while not touching rows that have values of N, such that I get:
ID Loc
1 N
2 H
3 N
4 A
5 A
This dataframe is the result of a pipe so I'm looking to see if it's possible to append this operation to the pipe.
You could try:
df$Loc <- chartr("AH", "HA", df$Loc)
df
ID Loc
1 1 N
2 2 H
3 3 N
4 4 A
5 5 A
We can try chaining together two calls to ifelse, for a base R option:
df <- data.frame(ID=c(1:5), Loc=c("N", "A", "N", "H", "H"), stringsAsFactors=FALSE)
df$Loc <- ifelse(df$Loc=="A", "H", ifelse(df$Loc=="H", "A", df$Loc))
df
ID Loc
1 1 N
2 2 H
3 3 N
4 4 A
5 5 A
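chartr() translates character by character, so it only suits single-character values. A small helper that swaps two whole values and can be called inside a pipe step might look like the sketch below; `swap_values` is a hypothetical name introduced here, not a function from any package.

```r
# Swap every occurrence of `a` with `b` (and vice versa), leaving other values alone
swap_values <- function(x, a, b) {
  x <- as.character(x)      # also handles factor input
  out <- x
  out[which(x == a)] <- b   # which() skips NA comparisons
  out[which(x == b)] <- a
  out
}

df <- data.frame(
  ID = 1:5,
  Loc = c("N", "A", "N", "H", "H"),
  stringsAsFactors = FALSE
)

df <- transform(df, Loc = swap_values(Loc, "A", "H"))
print(df)

# In a dplyr pipe the same helper could be appended as:
#   ... %>% mutate(Loc = swap_values(Loc, "A", "H"))
```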
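If you have a factor, you could simply swap the levels (shown here for A and N; use c("A", "H") and c("H", "A") to swap A and H as in the question):
l <- levels(df$Loc)
l[l %in% c("A", "N")] <- c("N", "A")
levels(df$Loc) <- l  # write the swapped labels back to the factor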
df
# ID Loc
# 1 1 A
# 2 2 N
# 3 3 A
# 4 4 H
# 5 5 H
Data:
df <- structure(list(ID = 1:5, Loc = structure(c(3L, 1L, 3L, 2L, 2L
), .Label = c("A", "H", "N"), class = "factor")), .Names = c("ID",
"Loc"), class = "data.frame", row.names = c(NA, -5L))

Distinct in dplyr does not work (sometimes)

I have the following data frame which I have obtained from a count. I have used dput to make the data frame available and then edited the data frame so there is a duplicate of A.
df <- structure(list(Procedure = structure(c(4L, 1L, 2L, 3L), .Label = c("A", "A", "C", "D", "-1"),
class = "factor"), n = c(10717L, 4412L, 2058L, 1480L)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L), .Names = c("Procedure", "n"))
print(df)
# A tibble: 4 x 2
Procedure n
<fct> <int>
1 D 10717
2 A 4412
3 A 2058
4 C 1480
Now I would like to take distinct on Procedure and only keep the first A.
df %>%
distinct(Procedure, .keep_all=TRUE)
# A tibble: 4 x 2
Procedure n
<fct> <int>
1 D 10717
2 A 4412
3 A 2058
4 C 1480
It does not work. Strange...
If we print the Procedure column, we can see that there are duplicated levels for A, which is problematic for the distinct function.
df$Procedure
[1] D A A C
Levels: A A C D -1
Warning message:
In print.factor(x) : duplicated level [2] in factor
One way to fix this is to rebuild the factor with the factor function, which drops the duplicated and unused levels. Another way is to convert the Procedure column to character.
df <- structure(list(Procedure = structure(c(4L, 1L, 2L, 3L), .Label = c("A", "A", "C", "D", "-1"),
class = "factor"), n = c(10717L, 4412L, 2058L, 1480L)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L), .Names = c("Procedure", "n"))
library(tidyverse)
df %>%
mutate(Procedure = factor(Procedure)) %>%
distinct(Procedure, .keep_all=TRUE)
# # A tibble: 3 x 2
# Procedure n
# <fct> <int>
# 1 D 10717
# 2 A 4412
# 3 C 1480
You have a duplicated value in the .Label argument: .Label = c("A", "A", "C", "D", "-1"). That is the issue. Also, your way of initializing a tibble seems very unusual (I do not know exactly what your goal is, but still).
Why not use:
df <- tibble(
Procedure = c("D", "A", "A", "C"),
n = c(10717L, 4412L, 2058L, 1480L)
)
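To check whether a factor carries duplicated levels before calling distinct, a quick base R test can help. The sketch below rebuilds a malformed factor like the one in the question by setting attributes directly; the names `x` and `fixed` are only illustrative.

```r
# Build a factor with a duplicated level by setting attributes directly
x <- c(4L, 1L, 2L, 3L)
attr(x, "levels") <- c("A", "A", "C", "D", "-1")
class(x) <- "factor"

# anyDuplicated() returns the index of the first duplicated level (0 if none)
dup_index <- anyDuplicated(levels(x))
print(dup_index)

# Rebuild the factor from its labels so equal labels share one level
fixed <- factor(as.character(x))
print(levels(fixed))

# Rows that would survive a "keep first occurrence" filter
keep <- !duplicated(fixed)
print(keep)
```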

Iterate through grouped rows to get different pair combinations

Having the following table:
read.table(text = "route origin dest seq
1 a b 1
1 b c 2
1 c d 3
1 d e 4
2 f g 1
2 g h 2
2 h i 3", header = TRUE)
I'm trying to find a way of going through each row, grouped by route, and iterate every potential combination of origin destination pairs, taking into account the seq variable and the route as mentioned.
The output should look something like this:
origin dest
a b
a c
a d
a e
b c
b d
(...) (...)
The idea behind this is that a train e.g route 1, goes from a to e. However, I want to list every single possibility of train pairs with that. I tried with igraph but unsuccessfully.
Any ideas with dplyr or so?
library(dplyr)
library(tidyr)
df %>%
mutate_if(is.factor, as.character) %>% #convert factor variable to character
group_by(route) %>%
expand(origin = paste(origin, seq, sep = "_"), dest = paste(dest, seq, sep = "_")) %>% #all possible combination of origin & destination grouped by route
rowwise() %>%
filter(strsplit(origin, split = "_")[[1]][1] != strsplit(dest, split = "_")[[1]][1] &
as.integer(strsplit(origin, split = "_")[[1]][2]) <= as.integer(strsplit(dest, split = "_")[[1]][2])) %>% #compare seq as numbers, not strings, so seq >= 10 sorts correctly
mutate(origin = gsub("_.*$", "", origin),
dest = gsub("_.*$", "", dest))
Output is:
route origin dest
1 1 a b
2 1 a c
3 1 a d
4 1 a e
5 1 b c
...
Sample data:
df <- structure(list(route = c(1L, 1L, 1L, 1L, 2L, 2L, 2L), origin = structure(1:7, .Label = c("a",
"b", "c", "d", "f", "g", "h"), class = "factor"), dest = structure(1:7, .Label = c("b",
"c", "d", "e", "g", "h", "i"), class = "factor"), seq = c(1L,
2L, 3L, 4L, 1L, 2L, 3L)), class = "data.frame", row.names = c(NA,
-7L))
# route origin dest seq
#1 1 a b 1
#2 1 b c 2
#3 1 c d 3
#4 1 d e 4
#5 2 f g 1
#6 2 g h 2
#7 2 h i 3
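An alternative base R sketch builds the ordered list of stops per route and takes all ordered pairs with combn(). It assumes the legs of a route are consecutive, i.e. each leg's dest is the next leg's origin; `pairs_for_route` is a hypothetical helper introduced for this illustration.

```r
df <- read.table(text = "route origin dest seq
1 a b 1
1 b c 2
1 c d 3
1 d e 4
2 f g 1
2 g h 2
2 h i 3", header = TRUE, stringsAsFactors = FALSE)

# All origin/destination pairs along one route, in travel order
pairs_for_route <- function(d) {
  d <- d[order(d$seq), ]
  # Assumption: legs are consecutive, so the stops are all origins plus the final dest
  stops <- c(d$origin, d$dest[nrow(d)])
  idx <- combn(length(stops), 2)  # each column is an (earlier, later) index pair
  data.frame(
    route = d$route[1],
    origin = stops[idx[1, ]],
    dest = stops[idx[2, ]],
    stringsAsFactors = FALSE
  )
}

result <- do.call(rbind, lapply(split(df, df$route), pairs_for_route))
rownames(result) <- NULL
print(result)
```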
