Duplicating data frame rows by freq value in same data frame [duplicate] - r

This question already has answers here:
Repeat each row of data.frame the number of times specified in a column
(10 answers)
Closed 7 years ago.
I have a data frame with names by type and their frequencies. I'd like to expand this data frame so that the names are repeated according to their name-type frequency.
For example, this:
> df = data.frame(name=c('a','b','c'),type=c(0,1,2),freq=c(2,3,2))
name type freq
1 a 0 2
2 b 1 3
3 c 2 2
would become this:
> df_exp
name type
1 a 0
2 a 0
3 b 1
4 b 1
5 b 1
6 c 2
7 c 2
I'd appreciate any suggestions on an easy way to do this.

You can just use rep to "expand" your data.frame rows:
df[rep(sequence(nrow(df)), df$freq), c("name", "type")]
# name type
# 1 a 0
# 1.1 a 0
# 2 b 1
# 2.1 b 1
# 2.2 b 1
# 3 c 2
# 3.1 c 2
There is also a function, expandRows, in the splitstackshape package that does exactly this. By default its second argument names the column holding the counts; it can also accept a vector specifying how many times to replicate each row (with count.is.col = FALSE). For example:
expandRows(df, "freq")
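The rep-based indexing above keeps row names such as 1.1 and 2.1. As a minimal base R sketch (expand_by_freq is a hypothetical helper name, not part of any package), the same idea can be wrapped so that the count column is dropped and the row names are reset:

```r
# Hypothetical helper (not from any package): repeat each row of `data`
# as many times as given in the column named by `count_col`.
expand_by_freq <- function(data, count_col) {
  # Row indices, each repeated according to its count
  idx <- rep(seq_len(nrow(data)), data[[count_col]])
  # Keep every column except the count column
  out <- data[idx, setdiff(names(data), count_col), drop = FALSE]
  rownames(out) <- NULL  # replace 1, 1.1, 2, ... with 1..n
  out
}

df <- data.frame(name = c("a", "b", "c"), type = c(0, 1, 2), freq = c(2, 3, 2))
df_exp <- expand_by_freq(df, "freq")
```

Rows with a count of 0 are simply omitted, since rep repeats their index zero times.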

Related

Combining the rows of a dataframe where each row is a df itself [duplicate]

This question already has answers here:
Combine a list of data frames into one data frame by row
(10 answers)
Closed 1 year ago.
I have an object with each row being a dataframe or list itself like this:
[[1]]
1: a b c d
1 1 2 4
[[2]]
1: a b c d
4 3 6 2
[[3]]
1: a b c d
1 2 2 1
How can I transform this to a dataframe like below?
a b c d
1 1 2 4
4 3 6 2
1 2 2 1
We can use rbindlist
library(data.table)
rbindlist(lst1)
Or with rbind and do.call in base R
do.call(rbind, lst1)
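For reference, a small self-contained sketch of the base R route. The contents of lst1 here are an assumption reconstructed from the printed output in the question (a list of one-row data frames with identical column names):

```r
# Assumed input: a list of one-row data frames sharing the same columns
lst1 <- list(
  data.frame(a = 1, b = 1, c = 2, d = 4),
  data.frame(a = 4, b = 3, c = 6, d = 2),
  data.frame(a = 1, b = 2, c = 2, d = 1)
)

# do.call passes every list element to rbind as a separate argument,
# i.e. rbind(lst1[[1]], lst1[[2]], lst1[[3]])
combined <- do.call(rbind, lst1)
```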

How to give each instance its own row in a data frame? [duplicate]

This question already has answers here:
Repeat each row of data.frame the number of times specified in a column
(10 answers)
Closed 3 years ago.
How is it possible to transform this data frame so that the count is divided into separate observations?
df = data.frame(object = c("A","B", "A", "C"), count=c(1,2,3,2))
object count
1 A 1
2 B 2
3 A 3
4 C 2
So that the resulting data frame looks like this?
object observation
1 A 1
2 B 1
3 B 1
4 A 1
5 A 1
6 A 1
7 C 1
8 C 1
rep(df$object, df$count)
If you want the 2 columns:
df2 = data.frame(object = rep(df$object, df$count))
df2$count = 1
If you're already working with the tidyverse (otherwise it's overkill), you could also do:
library(tidyverse)
uncount(df, count) %>% mutate(observation = 1)
Using data.table:
library(data.table)
setDT(df)[rep(seq_along(count), count), .(object, count = 1L)]
object count
1: A 1
2: B 1
3: B 1
4: A 1
5: A 1
6: A 1
7: C 1
8: C 1

Repeat rows with a variable in r [duplicate]

This question already has answers here:
Repeat each row of data.frame the number of times specified in a column
(10 answers)
Closed 3 years ago.
I have a data.frame with n rows and I would like to repeat these rows according to the values of another variable.
This is an example of the data.frame:
df <- data.frame(a=1:3, b=letters[1:3])
df
a b
1 1 a
2 2 b
3 3 c
And this one is an example for a variable
df1 <- data.frame(x=1:3)
df1
x
1 1
2 2
3 3
In the next step I would like to repeat every row of df as many times as given by the corresponding value in df1,
so that it would look like this:
a b
1 1 a
2 2 b
3 2 b
4 3 c
5 3 c
6 3 c
If you have any idea how to solve this problem, I would be very thankful
You can simply repeat the row index:
df[rep(1:3,df1$x),]
# a b
#1 1 a
#2 2 b
#2.1 2 b
#3 3 c
#3.1 3 c
#3.2 3 c
or, without hard-coding the size 3:
df[rep(seq_along(df1$x),df1$x),]

Frequency of Characters in Strings as columns in data frame using R

I have a data frame initial of the following format
> head(initial)
Strings
1 A,A,B,C
2 A,B,C
3 A,A,A,A,A,B
4 A,A,B,C
5 A,B,C
6 A,A,A,A,A,B
and the data frame I want is final
> head(final)
Strings A B C
1 A,A,B,C 2 1 1
2 A,B,C 1 1 1
3 A,A,A,A,A,B 5 1 0
4 A,A,B,C 2 1 1
5 A,B,C 1 1 1
6 A,A,A,A,A,B 5 1 0
The following code generates both data frames with a large number of rows:
initial<-data.frame(Strings=rep(c("A,A,B,C","A,B,C","A,A,A,A,A,B"),100))
final<-data.frame(Strings=rep(c("A,A,B,C","A,B,C","A,A,A,A,A,B"),100),A=rep(c(2,1,5),100),B=rep(c(1,1,1),100),C=rep(c(1,1,0),100))
What is the fastest way I can achieve this? Any help will be greatly appreciated
We can use base R methods for this task. We split the 'Strings' column (strsplit(...)), set the names of the output list with the sequence of rows, stack to convert to data.frame with key/value columns, get the frequency with table, convert to 'data.frame' and cbind with the original dataset.
cbind(df1, as.data.frame.matrix(
  table(
    stack(
      setNames(
        strsplit(as.character(df1$Strings), ','), 1:nrow(df1))
    )[2:1])))
# Strings A B C D
#1 A,B,C,D 1 1 1 1
#2 A,B,B,D,D,D 1 2 0 3
#3 A,A,A,A,B,C,D,D 4 1 1 2
or we can use mtabulate after splitting the column.
library(qdapTools)
cbind(df1, mtabulate(strsplit(as.character(df1$Strings), ',')))
# Strings A B C D
#1 A,B,C,D 1 1 1 1
#2 A,B,B,D,D,D 1 2 0 3
#3 A,A,A,A,B,C,D,D 4 1 1 2
Update
For the new dataset 'initial', the second method works. If we need to use the first method with the correct order, convert to factor class with levels specified as the unique elements of 'ind'.
df1 <- stack(setNames(strsplit(as.character(initial$Strings), ','),
                      seq_len(nrow(initial))))
df1$ind <- factor(df1$ind, levels=unique(df1$ind))
cbind(initial, as.data.frame.matrix(table(df1[2:1])))
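An alternative base R sketch that avoids the row-ordering issue altogether: each string is counted against a fixed set of factor levels, so every row yields one count per letter, zeros included. The names parts, lev and counts are illustrative, and this has not been benchmarked against the methods above.

```r
initial <- data.frame(
  Strings = rep(c("A,A,B,C", "A,B,C", "A,A,A,A,A,B"), 100),
  stringsAsFactors = FALSE
)

# Split each string into its letters
parts <- strsplit(initial$Strings, ",", fixed = TRUE)

# All letters that occur anywhere, used as fixed factor levels
lev <- sort(unique(unlist(parts)))

# One count vector per row; vapply returns letters x rows, so transpose
counts <- t(vapply(
  parts,
  function(p) as.integer(table(factor(p, levels = lev))),
  integer(length(lev))
))
colnames(counts) <- lev

final <- cbind(initial, counts)
```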

R converting from short form to long form with counts in the short form [duplicate]

This question already has answers here:
Repeat each row of data.frame the number of times specified in a column
(10 answers)
Reshaping data.frame from wide to long format
(8 answers)
Closed 4 years ago.
I have a large table (~100M rows and 28 columns) in the below format:
ID A B C
1 2 0 1
2 0 1 0
3 0 1 2
4 1 0 0
The columns besides ID (which is unique) give the counts for each type (i.e. A, B, C). I would like to convert this to the long form below.
ID Type
1 A
1 A
1 C
2 B
3 B
3 C
3 C
4 A
I would also like to use a data.table (rather than a data frame) given the size of my data set. I checked the reshape2 package for converting between long and short form; however, it is not clear to me whether the melt function allows counts in the short form as above.
Any suggestions on how I can convert this in a fast and efficient way in R using reshape2 and/or data.table?
Update
You can try the following:
DT[, rep(names(.SD), .SD), by = ID]
# ID V1
# 1: 1 A
# 2: 1 A
# 3: 1 C
# 4: 2 B
# 5: 3 B
# 6: 3 C
# 7: 3 C
# 8: 4 A
Keeps the order you want too...
You can try the following. I've never used expandRows on what would become ~ 300 million rows, but it's basically rep, so it shouldn't be slow.
This uses melt + expandRows from my "splitstackshape" package. It works with data.frames or data.tables, so you might as well use data.table for the faster melting....
library(reshape2)
library(splitstackshape)
expandRows(melt(mydf, id.vars = "ID"), "value")
# The following rows have been dropped from the input:
#
# 2, 3, 5, 8, 10, 12
#
# ID variable
# 1 1 A
# 1.1 1 A
# 4 4 A
# 6 2 B
# 7 3 B
# 9 1 C
# 11 3 C
# 11.1 3 C
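For comparison, a base R sketch of the same expansion that keeps the rows grouped by ID, as in the requested output. The data frame mydf is reconstructed from the question and the names types, n and long are illustrative; this is only a sketch of the idea and has not been tried at the ~100M-row scale mentioned in the question.

```r
mydf <- data.frame(ID = 1:4, A = c(2, 0, 0, 1), B = c(0, 1, 1, 0), C = c(1, 0, 2, 0))

types <- setdiff(names(mydf), "ID")

# Counts flattened row by row: ID 1's A, B, C, then ID 2's A, B, C, ...
n <- as.vector(t(as.matrix(mydf[types])))

long <- data.frame(
  # Each ID appears once per type, then is repeated by the matching count
  ID   = rep(rep(mydf$ID, each = length(types)), n),
  # The type names cycle once per ID, repeated by the same counts
  Type = rep(rep(types, times = nrow(mydf)), n)
)
```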
