Having the following table:
read.table(text = "route origin dest seq
1 a b 1
1 b c 2
1 c d 3
1 d e 4
2 f g 1
2 g h 2
2 h i 3", header = TRUE)
I'm trying to find a way of going through each row, grouped by route, and iterate every potential combination of origin destination pairs, taking into account the seq variable and the route as mentioned.
The output should look something like this:
origin dest
a b
a c
a d
a e
b c
b d
(...) (...)
The idea behind this is that a train e.g route 1, goes from a to e. However, I want to list every single possibility of train pairs with that. I tried with igraph but unsuccessfully.
Any ideas with dplyr or so?
library(dplyr)
library(tidyr)
df %>%
mutate_if(is.factor, as.character) %>% #convert factor variable to character
group_by(route) %>%
expand(origin = paste(origin, seq, sep = "_"), dest = paste(dest, seq, sep = "_")) %>% #all possible combination of origin & destination grouped by route
rowwise() %>%
filter(strsplit(origin, split = "_")[[1]][1] != strsplit(dest, split = "_")[[1]][1] &
strsplit(origin, split = "_")[[1]][2] <= strsplit(dest, split = "_")[[1]][2]) %>%
mutate(origin = gsub("_.*$", "", origin),
dest = gsub("_.*$", "", dest))
Output is:
route origin dest
1 1 a b
2 1 a c
3 1 a d
4 1 a e
5 1 b c
...
Sample data:
df <- structure(list(route = c(1L, 1L, 1L, 1L, 2L, 2L, 2L), origin = structure(1:7, .Label = c("a",
"b", "c", "d", "f", "g", "h"), class = "factor"), dest = structure(1:7, .Label = c("b",
"c", "d", "e", "g", "h", "i"), class = "factor"), seq = c(1L,
2L, 3L, 4L, 1L, 2L, 3L)), class = "data.frame", row.names = c(NA,
-7L))
# route origin dest seq
#1 1 a b 1
#2 1 b c 2
#3 1 c d 3
#4 1 d e 4
#5 2 f g 1
#6 2 g h 2
#7 2 h i 3
Related
I was trying to do something kind of simple. My dataframe looks like this:
ID value
1 a
2 b
2 c
3 d
3 d
4 e
4 e
4 e
What I wanted to do is to filter groups with more than one row and where all the values in the value column are the same:
df %>% group_by(ID) %>% filter(n() > 1 & all(mysterious_condition))
So mysterious_condition is what I'm lacking. What I'm trying to achieve is this:
ID value
3 d
3 d
4 e
4 e
4 e
Any thoughts on how to accomplish this?
Thanks!
We may use n_distinct to check for the count of unique elements
library(dplyr)
df %>%
group_by(ID) %>%
filter(n() >1, n_distinct(value) == 1) %>%
ungroup
-output
# A tibble: 5 × 2
ID value
<int> <chr>
1 3 d
2 3 d
3 4 e
4 4 e
5 4 e
data
df <- structure(list(ID = c(1L, 2L, 2L, 3L, 3L, 4L, 4L, 4L), value = c("a",
"b", "c", "d", "d", "e", "e", "e")), class = "data.frame", row.names = c(NA,
-8L))
I've the following table
Owner
Pet
Housing_Type
A
Cats;Dog;Rabbit
3
B
Dog;Rabbit
2
C
Cats
2
D
Cats;Rabbit
3
E
Cats;Fish
1
The code is as follows:
Data_Pets = structure(list(Owner = structure(1:5, .Label = c("A", "B", "C", "D",
"E"), class = "factor"), Pets = structure(c(2L, 5L, 1L,4L, 3L), .Label = c("Cats ",
"Cats;Dog;Rabbit", "Cats;Fish","Cats;Rabbit", "Dog;Rabbit"), class = "factor"),
House_Type = c(3L,2L, 2L, 3L, 1L)), class = "data.frame", row.names = c(NA, -5L))
Can anyone advise me how I can create new columns based on the data in Pet column by creating a new column for each animal separated by ; to look like the following table?
Owner
Cats
Dog
Rabbit
Fish
Housing_Type
A
Y
Y
Y
N
3
B
N
Y
Y
N
2
C
N
Y
N
N
2
D
Y
N
Y
N
3
E
Y
N
N
Y
1
Thanks!
One approach is to define a helper function that matches for a specific animal, then bind the columns to the original frame.
Note that some wrangling is done to get rid of whitespace to identify the unique animals to query.
f <- Vectorize(function(string, match) {
ifelse(grepl(match, string), "Y", "N")
}, c("match"))
df %>%
bind_cols(
f(df$Pets, unique(unlist(strsplit(trimws(as.character(df$Pets)), ";"))))
)
Owner Pets House_Type Cats Dog Rabbit Fish
1 A Cats;Dog;Rabbit 3 Y Y Y N
2 B Dog;Rabbit 2 N Y Y N
3 C Cats 2 Y N N N
4 D Cats;Rabbit 3 Y N Y N
5 E Cats;Fish 1 Y N N Y
Or more generalized if you don't know for sure that the separator is ;, and whitespace is present, stringi is useful:
dplyr::bind_cols(
df,
f(df$Pets, unique(unlist(stringi::stri_extract_all_words(df$Pets))))
)
You can use separate_rows and pivot_wider from tidyr library:
library(tidyr)
library(dplyr)
Data_Pets %>%
separate_rows(Pets , sep = ";") %>%
mutate(Pets = trimws(Pets)) %>%
mutate(temp = row_number()) %>%
pivot_wider(names_from = Pets, values_from = temp) %>%
mutate(across(c(Cats:Fish), function(x) if_else(is.na(x), "N", "Y"))) %>%
dplyr::relocate(House_Type, .after = Fish)
which will generate:
# Owner Cats Dog Rabbit Fish House_Type
# <fct> <chr> <chr> <chr> <chr> <int>
# 1 A Y Y Y N 3
# 2 B N Y Y N 2
# 3 C Y N N N 2
# 4 D Y N Y N 3
# 5 E Y N N Y 1
Data:
Data_Pets = structure(list(Owner = structure(1:5, .Label = c("A", "B", "C", "D",
"E"), class = "factor"), Pets = structure(c(2L, 5L, 1L,4L, 3L), .Label = c("Cats ",
"Cats;Dog;Rabbit", "Cats;Fish","Cats;Rabbit", "Dog;Rabbit"), class = "factor"),
House_Type = c(3L,2L, 2L, 3L, 1L)), class = "data.frame", row.names = c(NA, -5L))
I have a column of numbers (index) in a dataframe like the below. I am attempting to check if these numbers are in ascending order by the value of 1. For example, group B and C do not ascend by 1. While I can check by sight, my dataframe is thousands of rows long, so I'd prefer to automate this. Does anyone have advice? Thank you!
group index
A 0
A 1
A 2
A 3
A 4
B 0
B 1
B 2
B 2
C 0
C 3
C 1
C 2
...
I think this works. diff calculates the difference between the two subsequent numbers, and then we can use all to see if all the differences are 1. dat2 is the final output.
library(dplyr)
dat2 <- dat %>%
group_by(group) %>%
summarize(Result = all(diff(index) == 1)) %>%
ungroup()
dat2
# # A tibble: 3 x 2
# group Result
# <chr> <lgl>
# 1 A TRUE
# 2 B FALSE
# 3 C FALSE
DATA
dat <- read.table(text = "group index
A 0
A 1
A 2
A 3
A 4
B 0
B 1
B 2
B 2
C 0
C 3
C 1
C 2",
header = TRUE, stringsAsFactors = FALSE)
Maybe aggregate could help
> aggregate(.~group,df1,function(v) all(diff(v)==1))
group index
1 A TRUE
2 B FALSE
3 C FALSE
We can do a group by group, get the difference between the current and previous value (shift) and check if all the differences are equal to 1.
library(data.table)
setDT(df1)[, .(Result = all((index - shift(index))[-1] == 1)), group]
# group Result
#1: A TRUE
#2: B FALSE
#3: C FALSE
data
df1 <- structure(list(group = c("A", "A", "A", "A", "A", "B", "B", "B",
"B", "C", "C", "C", "C"), index = c(0L, 1L, 2L, 3L, 4L, 0L, 1L,
2L, 2L, 0L, 3L, 1L, 2L)), class = "data.frame", row.names = c(NA,
-13L))
I have a data frame that looks like the following:
ID Loc
1 N
2 A
3 N
4 H
5 H
I would like to swap A and H in the column Loc while not touching rows that have values of N, such that I get:
ID Loc
1 N
2 H
3 N
4 A
5 A
This dataframe is the result of a pipe so I'm looking to see if it's possible to append this operation to the pipe.
You could try:
df$Loc <- chartr("AH", "HA", df$Loc)
df
ID Loc
1 1 N
2 2 H
3 3 N
4 4 A
5 5 A
We can try chaining together two calls to ifelse, for a base R option:
df <- data.frame(ID=c(1:5), Loc=c("N", "A", "N", "H", "H"), stringsAsFactors=FALSE)
df$Loc <- ifelse(df$Loc=="A", "H", ifelse(df$Loc=="H", "A", df$Loc))
df
ID Loc
1 1 N
2 2 H
3 3 N
4 4 A
5 5 A
If you have a factor, you could simply reverse those levels
l <- levels(df$Loc)
l[l %in% c("A", "N")] <- c("N", "A")
df
# ID Loc
# 1 1 A
# 2 2 N
# 3 3 A
# 4 4 H
# 5 5 H
Data:
df <- structure(list(ID = 1:5, Loc = structure(c(3L, 1L, 3L, 2L, 2L
), .Label = c("A", "H", "N"), class = "factor")), .Names = c("ID",
"Loc"), class = "data.frame", row.names = c(NA, -5L))
I got a data.frame that looks like the following one:
OBJECT ID TASK
1 A
1 C
1 D
1 E
2 A
2 B
2 C
2 D
2 F
Now I would like to count the unique successive combinations within the data.frame in order to get following result:
PREDECESSOR SUCCESSOR COUNT
A C 1
C D 2
D E 1
A B 1
B C 1
D F 1
I've already figured out to extract the successive values with the help of two for loops, but I'm failing the assignment and counting task within a new data.frame (or list).
aggregate(COUNT~.,
data.frame(PREDECESSOR = head(df1$TASK, -1),
SUCCESSOR = tail(df1$TASK, -1),
COUNT = 1),
length)
# PREDECESSOR SUCCESSOR COUNT
#1 E A 1
#2 A B 1
#3 A C 1
#4 B C 1
#5 C D 2
#6 D E 1
#7 D F 1
You could use a similar approach even if you want to first split by OBJECT.ID
temp = do.call(rbind, lapply(split(df1, df1$OBJECT.ID), function(X){
aggregate(COUNT~., data.frame(PREDECESSOR = head(X$TASK, -1),
SUCCESSOR = tail(X$TASK, -1),
COUNT = 1),
length)
}))
aggregate(COUNT~., temp, length)
# PREDECESSOR SUCCESSOR COUNT
#1 A C 1
#2 B C 1
#3 C D 2
#4 D E 1
#5 A B 1
#6 D F 1
DATA
df1 = structure(list(OBJECT.ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L), TASK = c("A", "C", "D", "E", "A", "B", "C", "D", "F")), .Names = c("OBJECT.ID",
"TASK"), class = "data.frame", row.names = c(NA, -9L))
Solution using data.table:
Code:
library(data.table)
setDT(df)
df[, TASK0 := shift(TASK), OBJECT]
df[!is.na(TASK0), .N, .(TASK, TASK0)][, .(
COUNT = sum(N)), .(PREDECESSOR = TASK0, SUCCESSOR = TASK)]
Result:
PREDECESSOR SUCCESSOR COUNT
1: A C 1
2: C D 2
3: D E 1
4: A B 1
5: B C 1
6: D F 1
Explanation:
setDT(df): turns data.frame into a data.table object
[, TASK0 := shift(TASK), OBJECT]: gets previous letter for each OBJECT
!is.na(TASK0): gets rid of first row for each OBJECT (they don't have PREDECESSOR)
.N, .(TASK, TASK0): counts occurences of TASK and TASK0 (previous letter combinations)
sum(N): sums counts
Data (df):
structure(list(OBJECT = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
TASK = c("A", "C", "D", "E", "A", "B", "C", "D", "F")), .Names = c("OBJECT",
"TASK"), row.names = c(NA, -9L), class = c("data.table", "data.frame"
))
Just to get the counts, you do it with the following two lines:
cc <- cbind(df$TASK,c(df$TASK[-1],"LAST"))
table(paste(cc[,1],cc[2],sep="-"))
The result is
A-B A-C B-C C-D D-E D-F E-A F-LAST
1 1 1 2 1 1 1 1