concatenate data based on a certain sequence - r

My data look like this, with the variable day ranging from 1 to 232. This is just a shortened version; the real data have over 20000000 rows.
day time
1 2
1 2
2 2
2 3
3 4
3 5
4 4
4 2
I also have a vector of 1000 values randomly sampled from the values of day (1-232), say
df=c(3,4,1,2,...,4,1,3)
I want to create a new dataset whose rows follow that sequence. That is, we first extract the rows with day=3, then append the rows with day=4, then extract the rows with day=1 and rbind them after that, and so on. For example, the result for the first 4 elements of the sequence should look like this:
day time
3 4
3 5
4 4
4 2
1 2
1 2
2 2
2 3

Base R method:
x <- structure(list(day = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L), time = c(2L,
2L, 2L, 3L, 4L, 5L, 4L, 2L)), class = "data.frame", row.names = c(NA,
-8L))
df <- c(3,4,1,2,4,1,3)
do.call("rbind.data.frame", lapply(df, function(i) subset(x, day == i)))
# day time
# 5 3 4
# 6 3 5
# 7 4 4
# 8 4 2
# 1 1 2
# 2 1 2
# 3 2 2
# 4 2 3
# 71 4 4
# 81 4 2
# 11 1 2
# 21 1 2
# 51 3 4
# 61 3 5
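do.call("rbind.data.frame", ...) goes through the usual data.frame construction, so in R versions before 4.0.0 (where stringsAsFactors defaulted to TRUE) character columns may be converted to factors. If your real data has any columns of type character, you will likely want to do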
do.call("rbind.data.frame", c(lapply(df, function(i) subset(x, day == i)), stringsAsFactors = FALSE))
Also, it could easily be replaced (without the risk of factors) with data.table::rbindlist or dplyr::bind_rows.
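With 20000000 rows, scanning the whole data frame once per element of the sequence (as subset inside lapply does) may be slow. A minimal base R sketch of an alternative, assuming the sample data from the question: compute the row indices of each day once with split, then look them up in sequence order. The names day_seq, rows_by_day and res are made up for illustration.

```r
# Sample data from the question
x <- data.frame(day  = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L),
                time = c(2L, 2L, 2L, 3L, 4L, 5L, 4L, 2L))
day_seq <- c(3, 4, 1, 2, 4, 1, 3)  # hypothetical name for the sampled sequence

# Row indices belonging to each day, computed once; list names are "1", "2", ...
rows_by_day <- split(seq_len(nrow(x)), x$day)

# Look up each requested day by name and concatenate the indices in sequence order
idx <- unlist(rows_by_day[as.character(day_seq)], use.names = FALSE)

# A single subsetting step replaces the repeated subset() + rbind
res <- x[idx, ]
```

Because the result is built with one indexing operation, no intermediate data frames are bound together; repeated rows simply get suffixed row names.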

If I understand correctly, you can do this in a pretty straightforward manner with a data.table join:
library(data.table)
df <- fread(text = "day time
1 2
1 2
2 2
2 3
3 4
3 5
4 4
4 2", header = TRUE)
seqs <- data.table(day = c(3,4,1,2,4,1,3))
df[seqs, on = "day"]
#> day time
#> 1: 3 4
#> 2: 3 5
#> 3: 4 4
#> 4: 4 2
#> 5: 1 2
#> 6: 1 2
#> 7: 2 2
#> 8: 2 3
#> 9: 4 4
#> 10: 4 2
#> 11: 1 2
#> 12: 1 2
#> 13: 3 4
#> 14: 3 5
Created on 2019-02-10 by the reprex package (v0.2.1)

Related

By group relative order

I have a data set that looks like this
ID Week
1 3
1 5
1 5
1 8
1 11
1 16
2 2
2 2
2 3
2 3
2 9
Now, what I would like to do is add another column to the DataFrame so that, for every ID, I mark each week's relative position. More precisely, I would like to mark the ID's earliest week (smallest number) as 1, the next week for that ID as 2, and so forth; if there are two observations of the same week, they get the same number.
So, in the above example I should get:
ID Week Order
1 3 1
1 5 2
1 5 2
1 8 3
1 11 4
1 16 5
2 2 1
2 2 1
2 3 2
2 3 2
2 9 3
How could I achieve this?
Thank you very much!
A base R option using ave + match
transform(
df,
Order = ave(Week,
ID,
FUN = function(x) match(x, sort(unique(x)))
)
)
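or ave + order (thanks to @IRTFM for the comment). Note that order returns a sorting permutation, not ranks, so tied weeks get distinct numbers (1 to 6 for ID 1 in the sample data); this variant matches the desired output only when Week is already sorted and unique within each ID.
transform(
df,
Order = ave(Week,
ID,
FUN = order
)
)
The ave + match version gives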
ID Week Order
1 1 3 1
2 1 5 2
3 1 5 2
4 1 8 3
5 1 11 4
6 1 16 5
7 2 2 1
8 2 2 1
9 2 3 2
10 2 3 2
11 2 9 3
A data.table option with frank
> setDT(df)[, Order := frank(Week, ties.method = "dense"), ID][]
ID Week Order
1: 1 3 1
2: 1 5 2
3: 1 5 2
4: 1 8 3
5: 1 11 4
6: 1 16 5
7: 2 2 1
8: 2 2 1
9: 2 3 2
10: 2 3 2
11: 2 9 3
Data
> dput(df)
structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L), Week = c(3L, 5L, 5L, 8L, 11L, 16L, 2L, 2L, 3L, 3L, 9L)), class = "data.frame", row.names =
c(NA,
-11L))
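To make the role of ties concrete, here is a small base R sketch on the same data. It computes the dense rank with the match idiom from this answer and, for comparison, what order returns per group. The names dense and perm are made up for illustration.

```r
df <- data.frame(ID   = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
                 Week = c(3L, 5L, 5L, 8L, 11L, 16L, 2L, 2L, 3L, 3L, 9L))

# Dense rank: position of each week among the sorted distinct weeks of its ID
dense <- ave(df$Week, df$ID, FUN = function(x) match(x, sort(unique(x))))

# order() gives the permutation that sorts each group, so ties are not shared
perm <- ave(df$Week, df$ID, FUN = order)

print(data.frame(df, dense, perm))
```

On this data the two columns differ from the first tied week onward, which is why the match version is the one that reproduces the requested Order column.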
You can use dense_rank in dplyr:
library(dplyr)
df %>% group_by(ID) %>% mutate(Order = dense_rank(Week)) %>% ungroup
# ID Week Order
# <int> <int> <int>
# 1 1 3 1
# 2 1 5 2
# 3 1 5 2
# 4 1 8 3
# 5 1 11 4
# 6 1 16 5
# 7 2 2 1
# 8 2 2 1
# 9 2 3 2
#10 2 3 2
#11 2 9 3

Order values within column according to values within different column by group in R

I have the following panel data set:
group i f r d
1 4 8 3 3
1 9 4 5 1
1 2 2 2 2
2 5 5 3 2
2 3 9 3 3
2 9 1 3 1
I want to reorder column i in this data frame according to values in column d for each group. So the highest value for group 1 in column i should correspond to the highest value in column d. In the end my data.frame should look like this:
group i f r d
1 9 8 3 3
1 2 4 5 1
1 4 2 2 2
2 5 5 3 2
2 9 9 3 3
2 3 1 3 1
Here is a dplyr solution.
First, group by group. Then get the permutation rearrangement of column d in a temporary new column, ord and use it to reorder i.
library(dplyr)
df1 %>%
group_by(group) %>%
mutate(ord = order(d),
i = i[ord]) %>%
ungroup() %>%
select(-ord)
## A tibble: 6 x 5
# group i f r d
# <int> <int> <int> <int> <int>
#1 1 9 8 3 3
#2 1 2 4 5 1
#3 1 4 2 2 2
#4 2 9 5 3 2
#5 2 5 9 3 3
#6 2 3 1 3 1
original (wrong)
You can achieve this using dplyr and rank:
library(dplyr)
df1 %>% group_by(group) %>%
mutate(i = i[rev(rank(d))])
Edit
This question is actually trickier than it first seems and the original answer I posted is incorrect. The correct solution orders by i before subsetting by the rank of d. This gives OP's desired output which my previous answer did not (not paying attention!)
df1 %>% group_by(group) %>%
mutate(i = i[order(i)][rank(d)])
# A tibble: 6 x 5
# Groups: group [2]
# group i f r d
# <int> <int> <int> <int> <int>
#1 1 9 8 3 3
#2 1 2 4 5 1
#3 1 4 2 2 2
#4 2 5 5 3 2
#5 2 9 9 3 3
#6 2 3 1 3 1
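The difference between the two readings of the question can be checked in base R. This is only a sketch, assuming d has no ties within a group (rank would otherwise return fractional values); by_group, matched and permuted are hypothetical names.

```r
df1 <- data.frame(group = c(1L, 1L, 1L, 2L, 2L, 2L),
                  i     = c(4L, 9L, 2L, 5L, 3L, 9L),
                  d     = c(3L, 1L, 2L, 2L, 3L, 1L))

# Apply f(i, d) within each group and put the pieces back in the original row order
by_group <- function(f) {
  pieces <- Map(f, split(df1$i, df1$group), split(df1$d, df1$group))
  unsplit(pieces, df1$group)
}

# The k-th smallest i is placed where d is k-th smallest (the asker's desired output)
matched <- by_group(function(i, d) sort(i)[rank(d)])

# i is read off in the order that sorts d (the other interpretation)
permuted <- by_group(function(i, d) i[order(d)])

print(data.frame(df1, matched, permuted))
```

The two results agree for group 1 and differ for group 2, which matches the two different outputs shown in the answers.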
There is some confusion regarding the expected output. Here I am showing a way to get both the versions of the output.
A base R using split and mapply
df$i <- c(mapply(function(x, y) sort(y)[x],
split(df$d, df$group), split(df$i, df$group)))
df
# group i f r d
#1 1 9 8 3 3
#2 1 2 4 5 1
#3 1 4 2 2 2
#4 2 5 5 3 2
#5 2 9 9 3 3
#6 2 3 1 3 1
Or another version
df$i <- c(mapply(function(x, y) y[order(x)],
split(df$d, df$group), split(df$i, df$group)))
df
# group i f r d
#1 1 9 8 3 3
#2 1 2 4 5 1
#3 1 4 2 2 2
#4 2 9 5 3 2
#5 2 5 9 3 3
#6 2 3 1 3 1
We can also use dplyr for this :
For 1st version
library(dplyr)
df %>%
group_by(group) %>%
mutate(i = sort(i)[d])
The 2nd version is already shown by @Rui using order
df %>%
group_by(group) %>%
mutate(i = i[order(d)])
An option with data.table
library(data.table)
setDT(df1)[, i := i[order(d)], group]
df1
# group i f r d
#1: 1 9 8 3 3
#2: 1 2 4 5 1
#3: 1 4 2 2 2
#4: 2 9 5 3 2
#5: 2 5 9 3 3
#6: 2 3 1 3 1
If we need the second version
setDT(df1)[, i := sort(i)[d], group]
data
df1 <- structure(list(group = c(1L, 1L, 1L, 2L, 2L, 2L), i = c(4L, 9L,
2L, 5L, 3L, 9L), f = c(8L, 4L, 2L, 5L, 9L, 1L), r = c(3L, 5L,
2L, 3L, 3L, 3L), d = c(3L, 1L, 2L, 2L, 3L, 1L)), class = "data.frame",
row.names = c(NA,
-6L))

Expand an R Column Values To Column Headers with Another Column's values

I'm trying to expand an R data table that looks like this:
a step_num duration
1 1 5
1 2 4
1 3 1
2 1 7
2 2 2
2 3 9
3 1 1
3 2 1
3 3 3
Into something that looks like this:
a | step_num | duration | 1_duration | 2_duration | 3_duration |
----------------------------------------------------------------
1 1 5 5 - -
1 2 4 - 4 -
1 3 1 - - 1
2 1 7 7 - -
2 2 2 - 2 -
2 3 9 - - 9
3 1 1 1 - -
3 2 1 - 1 -
3 3 3 - - 3
I'm wondering if there's an 'expand' function, so to speak, that would do this.
Thanks!
We can do this in base r.
cbind(df,
reshape(df, idvar = c("a","step_num"), timevar = "step_num", direction = "wide")[,-1])
#> a step_num duration duration.1 duration.2 duration.3
#> 1 1 1 5 5 NA NA
#> 2 1 2 4 NA 4 NA
#> 3 1 3 1 NA NA 1
#> 4 2 1 7 7 NA NA
#> 5 2 2 2 NA 2 NA
#> 6 2 3 9 NA NA 9
#> 7 3 1 1 1 NA NA
#> 8 3 2 1 NA 1 NA
#> 9 3 3 3 NA NA 3
Created on 2019-05-21 by the reprex package (v0.2.1)
Simple tidyverse solution:
library(tidyverse)
df %>%
mutate(step = step_num) %>%
spread(step, duration, fill = '-') %>%
rename_all( ~ gsub('(\\d+)', 'duration_\\1', .))
# a step_num duration_1 duration_2 duration_3
# 1 1 1 5 - -
# 2 1 2 - 4 -
# 3 1 3 - - 1
# 4 2 1 7 - -
# 5 2 2 - 2 -
# 6 2 3 - - 9
# 7 3 1 1 - -
# 8 3 2 - 1 -
# 9 3 3 - - 3
Or an option with dcast from data.table
library(data.table)
dcast(setDT(df), a + step_num ~
paste0("duration_", step_num), value.var = 'duration')
# a step_num duration_1 duration_2 duration_3
#1: 1 1 5 NA NA
#2: 1 2 NA 4 NA
#3: 1 3 NA NA 1
#4: 2 1 7 NA NA
#5: 2 2 NA 2 NA
#6: 2 3 NA NA 9
#7: 3 1 1 NA NA
#8: 3 2 NA 1 NA
#9: 3 3 NA NA 3
NOTE: It is better to have NA instead of - as NA is easily removable with is.na/complete.cases/na.omit and it wouldn't change the class of the column to character
data
df <- structure(list(a = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), step_num = c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), duration = c(5L, 4L, 1L, 7L,
2L, 9L, 1L, 1L, 3L)), class = "data.frame", row.names = c(NA,
-9L))
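As a rough illustration of the point about keeping NA, the same wide layout can be built in base R with a loop and ifelse, which leaves the new columns as integers. This is a sketch on the sample data, not a replacement for dcast; the name wide is made up.

```r
df <- data.frame(a        = rep(1:3, each = 3),
                 step_num = rep(1:3, times = 3),
                 duration = c(5L, 4L, 1L, 7L, 2L, 9L, 1L, 1L, 3L))

wide <- df
for (s in sort(unique(df$step_num))) {
  # Keep duration where the row belongs to step s, NA everywhere else
  wide[[paste0("duration_", s)]] <- ifelse(df$step_num == s, df$duration, NA_integer_)
}

# The new columns stay integer, so numeric summaries still work
colSums(wide[, c("duration_1", "duration_2", "duration_3")], na.rm = TRUE)
```

Filling with "-" instead would turn each new column into character, and a call like colSums would then fail.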
Here's an approach using dplyr and tidyr.
We take the original data and add on some columns by first adding a new column col which holds the column header we want, based on the step_num. Then we use tidyr::spread to put the durations into different columns depending on which col they go with. fill = "-" fills all the empty columns with dashes. Finally, we drop the a and step_num columns since they're already there in the original data and we don't want to have copies of them.
(Note, we needed step_num to still exist at the spread step, because we wanted to keep each row aligned with the original rows. Without step_num, the data would get spread into a wider, shorter format that would have misaligned rows.)
library(dplyr); library(tidyr)
df %>%
mutate(col = paste0(step_num, "_duration")) %>%
spread(col, duration, fill = "-") %>%
select(-a, -step_num) %>%
bind_cols(df, .) # Edit, per excellent suggestion from M-M
a step_num duration 1_duration 2_duration 3_duration
1 1 1 5 5 - -
2 1 2 4 - 4 -
3 1 3 1 - - 1
4 2 1 7 7 - -
5 2 2 2 - 2 -
6 2 3 9 - - 9
7 3 1 1 1 - -
8 3 2 1 - 1 -
9 3 3 3 - - 3

Create edgelist for all interactions from data.frame

I am trying to do network analysis in igraph but am having trouble transforming my dataset into an edge list (with weights), given the varying number of columns.
The data set (df1, much larger in reality) looks as follows. The first column is the main operator id; a main operator can also be a partner and vice versa, so the ids stay the same in the edge list. The challenge is that the number of partners varies (from 0 to 40) and every interaction has to be considered (not just "IdMain to IdPartnerX").
IdMain IdPartner1 IdPartner2 IdPartner3 IdPartner4 .....
1 4 3 7 6
2 3 1 NA NA
3 1 4 2 NA
4 9 6 3 NA
.
.
I already got the helpful tip to use reshape to do this, like:
data_melt <- reshape2::melt(data, id.vars = "IdMain")
edgelist <- data_melt[!is.na(data_melt$value), c("IdMain", "value")]
However, this only creates a 'directed' edgelist (from Main to Partners). What I need is something like below, where every interaction is recorded.
Id1 Id2
1 4
1 3
1 7
1 6
4 3
4 7
4 6
3 7
etc
Does anyone have a tip what the best way to go is? I also looked into the igraph library and couldn't find the function to do this.
There is no need for reshape(2), melting, etc. You just need to grab every combination of column pairs and then bind them together.
x <- read.table(text="IdMain IdPartner1 IdPartner2 IdPartner3 IdPartner4
1 4 3 7 6
2 3 1 NA NA
3 1 4 2 NA
4 9 6 3 NA", header=TRUE)
idx <- t(combn(seq_along(x), 2))
edgelist <- lapply(1:nrow(idx), function(i) x[, c(idx[i, 1], idx[i, 2])])
edgelist <- lapply(edgelist, setNames, c("ID1","ID2"))
edgelist <- do.call(rbind, edgelist)
edgelist <- edgelist[rowSums(is.na(edgelist))==0, ]
edgelist
# ID1 ID2
# 1 1 4
# 2 2 3
# 3 3 1
# 4 4 9
# 5 1 3
# 6 2 1
# 7 3 4
# 8 4 6
# 9 1 7
# 11 3 2
# 12 4 3
# 13 1 6
# 17 4 3
# 18 3 1
# 19 1 4
# 20 9 6
# 21 4 7
# 23 1 2
# 24 9 3
# 25 4 6
# 29 3 7 <--
# 31 4 2
# 32 6 3
# 33 3 6 <--
# 37 7 6 <--
Using the data below, you can achieve what looks to be your goal with apply and combn. This returns a list of matrices, each holding the pairwise combinations of the elements in one row of your data.frame.
myPairs <- apply(t(dat), 2, function(x) t(combn(x[!is.na(x)], 2)))
Note that the output of apply can be finicky and it is necessary here to have at least one row with an NA so that apply will return a list rather than a matrix.
If you want a data.frame at the end, use do.call and rbind to put the matrices together and then data.frame and setNames for the object coercion and to add names.
setNames(data.frame(do.call(rbind, myPairs)), c("Id1", "Id2"))
Id1 Id2
1 1 4
2 1 3
3 1 7
4 1 6
5 4 3
6 4 7
7 4 6
8 3 7
9 3 6
10 7 6
11 2 3
12 2 1
13 3 1
14 3 1
15 3 4
16 3 2
17 1 4
18 1 2
19 4 2
20 4 9
21 4 6
22 4 3
23 9 6
24 9 3
25 6 3
data
dat <-
structure(list(IdMain = 1:4, IdPartner1 = c(4L, 3L, 1L, 9L),
IdPartner2 = c(3L, 1L, 4L, 6L), IdPartner3 = c(7L, NA, 2L,
3L), IdPartner4 = c(6L, NA, NA, NA)), .Names = c("IdMain",
"IdPartner1", "IdPartner2", "IdPartner3", "IdPartner4"),
class = "data.frame", row.names = c(NA, -4L))
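Since the result type of apply depends on whether every row yields the same number of pairs, one way to sidestep that is to loop over rows with lapply, which always returns a list. The following is a sketch on the sample data; pairs_for_row and edges are hypothetical names.

```r
dat <- data.frame(IdMain     = 1:4,
                  IdPartner1 = c(4L, 3L, 1L, 9L),
                  IdPartner2 = c(3L, 1L, 4L, 6L),
                  IdPartner3 = c(7L, NA, 2L, 3L),
                  IdPartner4 = c(6L, NA, NA, NA))

# All unordered pairs among the non-missing ids of one row
pairs_for_row <- function(r) {
  ids <- unlist(dat[r, ], use.names = FALSE)
  ids <- ids[!is.na(ids)]
  if (length(ids) < 2) return(NULL)  # an operator without partners yields no pairs
  t(combn(ids, 2))
}

# lapply always returns a list, whatever the number of pairs per row
edges <- do.call(rbind, lapply(seq_len(nrow(dat)), pairs_for_row))
edges <- setNames(as.data.frame(edges), c("Id1", "Id2"))
```

The length check also covers rows with zero partners, which the question says can occur; without it, combn would interpret a single id n as the sequence 1 to n.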

Add sequence along blocks [duplicate]

This question already has answers here:
Create counter within consecutive runs of values
I would like to have a sequence along each Blocks as such:
Blocks MySeq
1 1
1 2
2 1
2 2
1 1
1 2
1 3
1 4
3 1
3 2
3 3
4 1
4 2
4 3
4 4
Based on this, I have tried
myDf %>% dplyr::mutate(MySeq= seq(1:length(unique(Blocks)),rle(Blocks)$"lengths")
However, the sequence is not resetting with each new block. See below:
Blocks MySeq
1 1
1 2
2 1
2 2
1 3
1 4
1 5
1 6
3 1
3 2
3 3
4 1
4 2
4 3
4 4
How can I make a new sequence from each individual Blocks?
Try this (lapply rather than sapply, so the result stays a list even when all runs have the same length):
unlist(lapply(rle(df1$Blocks)$lengths, seq_len))
We can use rleid from data.table by grouping the rleid of 'Blocks' and assign (:=) 'MySeq' as the sequence of rows.
library(data.table)
setDT(df1)[, MySeq := seq_len(.N) , .(rleid(Blocks))]
df1
# Blocks MySeq
# 1: 1 1
# 2: 1 2
# 3: 2 1
# 4: 2 2
# 5: 1 1
# 6: 1 2
# 7: 1 3
# 8: 1 4
# 9: 3 1
#10: 3 2
#11: 3 3
#12: 4 1
#13: 4 2
#14: 4 3
#15: 4 4
Or if we are using base R, then sequence of lengths will get the expected output
sequence(rle(df1$Blocks)$lengths)
#[1] 1 2 1 2 1 2 3 4 1 2 3 1 2 3 4
data
df1 <- structure(list(Blocks = c(1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 3L,
3L, 3L, 4L, 4L, 4L, 4L)), .Names = "Blocks", row.names = c(NA,
-15L), class = "data.frame")
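The idea of grouping by a run id can also be sketched in base R without rleid: a new run starts wherever the value changes, so a cumulative sum over the change points labels the runs, and ave then counts within each label. This assumes a non-empty Blocks vector; run_id is a made-up name.

```r
Blocks <- c(1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 4L, 4L, 4L, 4L)

# TRUE at the first element and wherever the value differs from the previous one;
# the cumulative sum then gives each consecutive run its own id
run_id <- cumsum(c(TRUE, diff(Blocks) != 0))

# Count 1, 2, 3, ... within each run
MySeq <- ave(seq_along(Blocks), run_id, FUN = seq_along)

# Same result as the rle-based one-liner
identical(MySeq, sequence(rle(Blocks)$lengths))
```

Grouping on run_id rather than on Blocks itself is what makes the counter restart when block 1 reappears after block 2.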
