R - dplyr map slice for repeat rows

R - dplyr map slice for repeat rows - r

I have trouble combining slice and map.
I am interested of doing something similar to this; which is, in my case, transforming a compact person-period file to a long (sequential) person-period one. However, because my file is too big, I need to split the data first.
My data look like this
group id var ep dur
1 A 1 a 1 20
2 A 1 b 2 10
3 A 1 a 3 5
4 A 2 b 1 5
5 A 2 b 2 10
6 A 2 b 3 15
7 B 1 a 1 20
8 B 1 a 2 10
9 B 1 a 3 10
10 B 2 c 1 20
11 B 2 c 2 5
12 B 2 c 3 10
What I need is simply this (answer from this)
library(dplyr)
dt %>% slice(rep(1:n(),.$dur))
However, I am interested in introducing a split(.$group).
How I am suppose to do so ?
dt %>% split(.$group) %>% map_df(slice(rep(1:n(),.$dur)))
Is not working for example.
My desired output is the same as dt %>% slice(rep(1:n(),.$dur))
which is
group id var ep dur
1 A 1 a 1 20
2 A 1 a 1 20
3 A 1 a 1 20
4 A 1 a 1 20
5 A 1 a 1 20
6 A 1 a 1 20
7 A 1 a 1 20
8 A 1 a 1 20
9 A 1 a 1 20
10 A 1 a 1 20
.....
But I need to split this operation because the file is too big.
data
dt = structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"),
id = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L,
2L, 2L), .Label = c("1", "2"), class = "factor"), var = structure(c(1L,
2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 3L, 3L, 3L), .Label = c("a",
"b", "c"), class = "factor"), ep = structure(c(1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("1", "2",
"3"), class = "factor"), dur = c(20, 10, 5, 5, 10, 15, 20,
10, 10, 20, 5, 10)), .Names = c("group", "id", "var", "ep",
"dur"), row.names = c(NA, -12L), class = "data.frame")

map takes two arguments: a vector/list in .x and a function in .f. It then applies .f on all elements in .x.
The function you are passing to map is not formatted correctly. Try this:
f <- function(x) x %>% slice(rep(1:n(), .$dur))
dt %>%
split(.$group) %>%
map_df(f)
You could also use it like this:
dt %>%
split(.$group) %>%
map_df(slice, rep(1:n(), dur))
This time you directly pass the slice function to map with additional parameters.

I'm not quite sure what your desired final output is, but you could use tidyr to nest the data that you want to repeat and a simple function to expand levels of your nested data, very similar to Tutuchan's answer.
expand_df <- function(df, repeats) {
df %>% slice(rep(1:n(), repeats))
}
dt %>%
tidyr::nest(var:ep) %>%
mutate(expanded = purrr::map2(data, dur, expand_df)) %>%
select(-data) %>%
tidyr::unnest()
Tutuchan's answer gives exactly the same output as your original approach - is that what you were looking for? I don't know if it will have any advantage over your original method.

Related

How to add column reporting sum of couple of subsequent rows

I have the following dataset
structure(list(Var1 = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L), .Label = c("0", "1"), class = "factor"), Var2 = structure(c(1L,
1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("congruent", "incongruent"
), class = "factor"), Var3 = structure(c(1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L), .Label = c("spoken", "written"), class = "factor"),
Freq = c(8L, 2L, 10L, 2L, 10L, 2L, 10L, 2L)), class = "data.frame", row.names = c(NA,
-8L))
I would like to add another column reporting sum of coupled subsequent rows. Thus the final result would look like this:
I have proceeded like this
Table = as.data.frame(table(data_1$unimodal,data_1$cong_cond, data_1$presentation_mode)) %>%
mutate(Var1 = factor(Var1, levels = c('0', '1')))
row = Table %>% #is.factor(Table$Var1)
summarise(across(where(is.numeric),
~ .[Var1 == '0'] + .[Var1 == '1'],
.names = "{.col}_sum"))
column = c(rbind(row$Freq_sum,rep(NA, 4)))
Table$column = column
But I am looking for the quickest way possible with no scripting separated codes. Here I have used the dplyr package, but if you might know possibly suggest some other ways with map(), for loop, and or the method you deem as the best, please just let me know.

This should do:
df$column <-
rep(colSums(matrix(df$Freq, 2)), each=2) * c(1, NA)

If you are fine with no NAs in the dataframe, you can
df %>%
group_by(Var2, Var3) %>%
mutate(column = sum(Freq))
# A tibble: 8 × 5
# Groups: Var2, Var3 [4]
Var1 Var2 Var3 Freq column
<fct> <fct> <fct> <int> <int>
1 0 congruent spoken 8 10
2 1 congruent spoken 2 10
3 0 incongruent spoken 10 12
4 1 incongruent spoken 2 12
5 0 congruent written 10 12
6 1 congruent written 2 12
7 0 incongruent written 10 12
8 1 incongruent written 2 12

subsetting data based with the condition of the current and previous entity in r

I have data with the status column. I want to subset my data to the condition of 'f' status, and previous condition of 'f' status.
to simplify:
df
id status time
1 n 1
1 n 2
1 f 3
1 n 4
2 f 1
2 n 2
3 n 1
3 n 2
3 f 3
3 f 4
my result should be:
id status time
1 n 2
1 f 3
2 f 1
3 n 2
3 f 3
3 f 4
How can I do this in R?

Here's a solution using dplyr -
df %>%
group_by(id) %>%
filter(status == "f" | lead(status) == "f") %>%
ungroup()
# A tibble: 6 x 3
id status time
<int> <fct> <int>
1 1 n 2
2 1 f 3
3 2 f 1
4 3 n 2
5 3 f 3
6 3 f 4
Data -
df <- structure(list(id = c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L),
status = structure(c(2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L,
1L), .Label = c("f", "n"), class = "factor"), time = c(1L,
2L, 3L, 4L, 1L, 2L, 1L, 2L, 3L, 4L)), .Names = c("id", "status",
"time"), class = "data.frame", row.names = c(NA, -10L))

How to calculate the amount of different 'combinations of classes' of a variable depending on a other variable?

CustomerID MarkrtungChannel OrderID
1 A 1
2 B 2
3 A 3
4 B 4
5 C 5
1 C 6
1 A 7
2 C 8
3 B 9
3 B 10
Hi, I want to know which combinations of marketing channels are used by how many customers .
How can I calculate this with R?
E.g. The combination of Marketing channels A and C is used by 1 customer (ID 1)
the combination of Marketing channels C and B is also used by 1 customer (ID 2)
And so on...

and here's a tidyverse way.
library(tidyverse)
data.df%>%
group_by(CustomerID)%>%
summarize(combo=paste0(sort(unique(MarkrtungChannel)),collapse=""))%>%
ungroup()%>%
group_by(combo)%>%
summarize(n.users=n())
counting the number of people using each combo at the end.

You can do it multiple ways. Here is data.table way:
# Here is your data
df<-structure(list(CustomerID = c(1L, 2L, 3L, 4L, 5L, 1L, 1L, 2L,
3L, 3L), MarkrtungChannel = structure(c(1L, 2L, 1L, 2L, 3L, 3L,
1L, 3L, 2L, 2L), .Label = c("A", "B", "C"), class = "factor"),
OrderID = 1:10), .Names = c("CustomerID", "MarkrtungChannel",
"OrderID"), class = "data.frame", row.names = c(NA, -10L))
df[]<-lapply(df[],as.character)
# Here is the combination field
library(data.table)
setDT(df)
df[,Combo:=.(list(unique(MarkrtungChannel))), by=CustomerID]
# Or (to get the combination counts)
df[,list(combo=(list(unique(MarkrtungChannel)))), by=CustomerID][,uniqueN(CustomerID),by=combo]

R - how to avoid repeating filter & row bind

Because I am working on a very large dataset, I need to slice my dataset by groups in order to pursue my computations.
I have a person-period (melt) dataset that looks like this
group id var time
1 A 1 a 1
2 A 1 b 2
3 A 1 a 3
4 A 2 b 1
5 A 2 b 2
6 A 2 b 3
7 B 1 a 1
8 B 1 a 2
9 B 1 a 3
10 B 2 c 1
11 B 2 c 2
12 B 2 c 3
I need to do this simple transformation
library(reshape2)
library(dplyr)
dt %>% dcast(group + id ~ time, value.var = 'var')
In order to get
group id 1 2 3
1 A 1 a b a
2 A 2 b b b
3 B 1 a a a
4 B 2 c c c
So far, so good.
However, because my database is too big, I need to do this separately for each different groups, such as
a = dt %>% filter(group == 'A') %>% dcast(group + id ~ time, value.var ='var')
b = dt %>% filter(group == 'B') %>% dcast(group + id ~ time, value.var = 'var')
bind_rows(a,b)
My problem is that I would like to avoid doing it by hand. I mean, having to store separately each groups, a = ..., b = ..., c = ..., and so on
Any idea how I could have a single pipe stream that would separate each group, compute the transformation and put it back together in a dataframe ?
dt = structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"),
id = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("1", "2"), class = "factor"), var = structure(c(1L,
2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 3L, 3L, 3L), .Label = c("a",
"b", "c"), class = "factor"), time = structure(c(1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("1",
"2", "3"), class = "factor")), .Names = c("group", "id",
"var", "time"), row.names = c(NA, -12L), class = "data.frame")

Package purrr can be useful for working with lists. First split the dataset by group and then use map_df to dcast each list but return everything in a single data.frame.
library(purrr)
dt %>%
split(.$group) %>%
map_df(~dcast(.x, group + id ~ time, value.var = "var"))
group id 1 2 3
1 A 1 a b a
2 A 2 b b b
3 B 1 a a a
4 B 2 c c c

lapply is your friend here:
do.call(rbind, lapply(unique(dt$Group), function(grp, dt){
dt %>% filter(Group == grp) %>% dcast(group + id ~ time, value.var = "var")
}, dt = dt))

Finding the max number of occurrences from the available result

I have a dataframe which looks like -
Id Result
A 1
B 2
C 1
B 1
C 1
A 2
B 1
B 2
C 1
A 1
B 2
Now I need to calculate how many 1's and 2's are there for each Id and then select the number whose frequency of occurrence is the greatest.
Id Result
A 1
B 2
C 1
How can I do that? I have tried using the table function in some way but not able to use it effectively. Any help would be appreciated.

Here you can use aggregate in one step:
df <- structure(list(Id = structure(c(1L, 2L, 3L, 2L, 3L, 1L, 2L, 2L,
3L, 1L, 2L), .Label = c("A", "B", "C"), class = "factor"),
Result = c(1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L)),
.Names = c("Id", "Result"), class = "data.frame", row.names = c(NA, -11L)
)
res <- aggregate(Result ~ Id, df, FUN=function(x){which.max(c(sum(x==1), sum(x==2)))})
res
Result:
Id Result
1 A 1
2 B 2
3 C 1

With data.table you can try (df is your data.frame):
require(data.table)
dt<-as.data.table(df)
dt[,list(times=.N),by=list(Id,Result)][,list(Result=Result[which.max(times)]),by=Id]
# Id Result
#1: A 1
#2: B 2
#3: C 1

Using dplyr, you can try
library(dplyr)
df %>% group_by(Id, Result) %>% summarize(n = n()) %>% group_by(Id) %>%
filter(n == max(n)) %>% summarize(Result = Result)
Id Result
1 A 1
2 B 2
3 C 1

An option using table and ave
subset(as.data.frame(table(df1)),ave(Freq, Id, FUN=max)==Freq, select=-3)
# Id Result
# 1 A 1
# 3 C 1
# 5 B 2

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

R - dplyr map slice for repeat rows - r

Related

How to add column reporting sum of couple of subsequent rows

subsetting data based with the condition of the current and previous entity in r

How to calculate the amount of different 'combinations of classes' of a variable depending on a other variable?

R - how to avoid repeating filter & row bind

Finding the max number of occurrences from the available result

Categories

Resources