r create observations from frequency counts [duplicate]

This question already has answers here:
Repeat each row of data.frame the number of times specified in a column
(10 answers)
Closed yesterday.
I have frequency counts based on three variables, y, Col1 and Col2, as shown below:
Col1 Col2 y n
Good Poor 0 0
Good Poor 1 0
Good Rich 1 13
Good Rich 0 8
Bad Poor 0 8
Bad Poor 1 0
Bad Rich 1 15
Bad Rich 0 5
How do I expand this table so that the dataset has the number of rows indicated in column n for each combination of responses in Col1, Col2 and y?
For example, the dataset should have 13 rows of Col1=Good, Col2=Rich, y=1, 8 rows of Col1=Good, Col2=Rich, y=0, and so on.

You could use uncount:
tidyr::uncount(df,n)
Col1 Col2 y
1 Good Rich 1
2 Good Rich 1
3 Good Rich 1
4 Good Rich 1
5 Good Rich 1
6 Good Rich 1
7 Good Rich 1
8 Good Rich 1
9 Good Rich 1
: : : :
: : : :
The question is: why do you need this? You can still analyze the data in its aggregated form, with the counts in place. If each row had a count in the millions, expanding the data would be unwise.
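To illustrate the point about working with the counts directly, here is a minimal sketch. It assumes the question's table is stored in a data frame called df, the name used in the call above. Base R's xtabs() accepts the count column on the left-hand side of its formula, so a cross-tabulation needs no row expansion.

```r
# The question's frequency table; the name df is assumed from the answer above
df <- read.table(text = "Col1 Col2 y n
Good Poor 0 0
Good Poor 1 0
Good Rich 1 13
Good Rich 0 8
Bad Poor 0 8
Bad Poor 1 0
Bad Rich 1 15
Bad Rich 0 5", header = TRUE)

# Cross-tabulate Col2 by y, using n as the cell counts (no expansion needed)
tab <- xtabs(n ~ Col2 + y, data = df)
tab
```

Many modelling functions, glm() for instance, also take a weights argument, which may serve the same purpose depending on the analysis.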

Use rep to repeat the row names and subset with its result.
In the first example below I explicitly create an index i, in the second a one-liner solves the problem.
Also, in the first example the output duplicates (as asked for) rows and the row names show which rows are duplicates of which. In the second example by setting the row names to NULL they are recreated to become consecutive numbers starting at 1.
df1 <- "Col1 Col2 y n
Good Poor 0 0
Good Poor 1 0
Good Rich 1 13
Good Rich 0 8
Bad Poor 0 8
Bad Poor 1 0
Bad Rich 1 15
Bad Rich 0 5"
df1 <- read.table(text = df1, header = TRUE)
i <- rep(row.names(df1), df1$n)
df2 <- df1[i, ]
head(df2)
#> Col1 Col2 y n
#> 3 Good Rich 1 13
#> 3.1 Good Rich 1 13
#> 3.2 Good Rich 1 13
#> 3.3 Good Rich 1 13
#> 3.4 Good Rich 1 13
#> 3.5 Good Rich 1 13
df2 <- df1[rep(row.names(df1), df1$n), ]
row.names(df2) <- NULL
head(df2)
#> Col1 Col2 y n
#> 1 Good Rich 1 13
#> 2 Good Rich 1 13
#> 3 Good Rich 1 13
#> 4 Good Rich 1 13
#> 5 Good Rich 1 13
#> 6 Good Rich 1 13
Created on 2023-02-19 with reprex v2.0.2
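A variant of the approach above, sketched under the assumption that the count column should be dropped once the rows are expanded (as in the question's desired output). It indexes by row position with seq_len() rather than by row name and keeps only the three response columns. The names idx and df3 are illustrative.

```r
df1 <- read.table(text = "Col1 Col2 y n
Good Poor 0 0
Good Poor 1 0
Good Rich 1 13
Good Rich 0 8
Bad Poor 0 8
Bad Poor 1 0
Bad Rich 1 15
Bad Rich 0 5", header = TRUE)

# Repeat each row position n times; rows with n = 0 disappear
idx <- rep(seq_len(nrow(df1)), times = df1$n)

# Keep only the response columns and reset the row names
df3 <- df1[idx, c("Col1", "Col2", "y")]
row.names(df3) <- NULL

nrow(df3)
```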

Here is an alternative using the expandRows function from the splitstackshape package:
library(splitstackshape)
expandRows(df, "n")
Col1 Col2 y
3 Good Rich 1
3.1 Good Rich 1
3.2 Good Rich 1
3.3 Good Rich 1
3.4 Good Rich 1
3.5 Good Rich 1
3.6 Good Rich 1
3.7 Good Rich 1
3.8 Good Rich 1
....

Related

A "special" case of merging in R

I am a fairly inexperienced R user facing the following problem:
I would like to merge two data tables dt1 and dt2.
dt1 contains 1 variable entitled Assessment.
dt2 contains 2 variables entitled ID and Frequency.
Now, I would like to have also the Assessment observations in dt2.
For simplicity, consider this example:
library(dplyr)
library(data.table)
dt1 <- data.table(c("perfect", "perfect", "okay", "unsufficient", "good", "good", "okay", "perfect"))
colnames(dt1) <- "Assessment"
dt2 <- data.table(cbind(c(1,2,3,4,5,6),c(1,3,1,1,1,1)))
colnames(dt2) <- c("ID", "Frequency")
Hence, dt1 looks like this:
Assessment
perfect
perfect
okay
unsufficient
good
good
okay
perfect
dt2 looks like that:
ID Frequency
1 1
2 3
3 1
4 1
5 1
6 1
My aim would be to get something like:
ID Frequency Assessment
1 1 perfect
2 3 perfect;okay;unsufficient
3 1 good
4 1 good
5 1 okay
6 1 perfect
I have no idea how to get there and would appreciate any help very much! Thanks a lot!

library(tidyr) # provides uncount(); dplyr is already loaded in the question
dt1 %>%
  bind_cols(
    dt2 %>%
      uncount(Frequency)
  ) %>%
  group_by(ID) %>%
  summarise(Assessment = paste0(Assessment, collapse = ";"))
# A tibble: 6 x 2
ID Assessment
<dbl> <chr>
1 1 perfect
2 2 perfect;okay;unsufficient
3 3 good
4 4 good
5 5 okay
6 6 perfect
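The same idea can be sketched in base R, assuming (as the answer above does) that the rows of dt1 are already in the order implied by the frequencies in dt2. The check on the totals is an added safeguard, not part of the original answer, and the lower-case variable names are illustrative.

```r
# Data from the question, as plain vectors
assessment <- c("perfect", "perfect", "okay", "unsufficient",
                "good", "good", "okay", "perfect")
id <- c(1, 2, 3, 4, 5, 6)
frequency <- c(1, 3, 1, 1, 1, 1)

# The approach only makes sense if the frequencies account for every row
stopifnot(sum(frequency) == length(assessment))

# Repeat each ID by its frequency so it lines up with the assessments
expanded_id <- rep(id, times = frequency)

# Collapse the assessments within each ID
collapsed <- tapply(assessment, expanded_id, paste, collapse = ";")

result <- data.frame(
  ID = id,
  Frequency = frequency,
  Assessment = as.vector(collapsed[as.character(id)])
)
result
```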

If you trust the right order, as you say in OP, you can rep.int the IDs according to their frequencies.
dt2[dt1[, list(Assessment=toString(Assessment)), by=list(ID=with(dt2, rep.int(ID, Frequency)))], on=.(ID)]
# ID Frequency Assessment
# 1: 1 1 perfect
# 2: 2 3 perfect, okay, unsufficient
# 3: 3 1 good
# 4: 4 1 good
# 5: 5 1 okay
# 6: 6 1 perfect
or
dt2[dt1[, list(Assessment=list(Assessment)), by=list(ID=with(dt2, rep.int(ID, Frequency)))], on=.(ID)]
# ID Frequency Assessment
# 1: 1 1 perfect
# 2: 2 3 perfect,okay,unsufficient
# 3: 3 1 good
# 4: 4 1 good
# 5: 5 1 okay
# 6: 6 1 perfect
The difference is that in the second version Assessment is a list column.
Note: if dt2 doesn't contain anything else, there's no need to merge anymore and it simplifies to
dt1[, list(Assessment=toString(Assessment)), by=list(ID=with(dt2, rep.int(ID, Frequency)))]
# ID Assessment
# 1: 1 perfect
# 2: 2 perfect, okay, unsufficient
# 3: 3 good
# 4: 4 good
# 5: 5 okay
# 6: 6 perfect

How many times does the value for column B appear for a value in column A?

I am having the hardest time coming up with code that lets me match a topic (Column B) to a name (Column A) and create a frequency column for the number of times B has matched with A (that is, how many times both have appeared together). Col A and B are codes for longer names.
I thought maybe of using the count function from plyr but can't make it work. Maybe you can give me an idea of what I could use?
For example I have a table:
Col A Col B
1 38
1 6
1 38
2 38
2 7
2 7
2 8
2 7
The result that I am looking for is
Col A Col B freq
1 38 2
1 6 1
2 38 1
2 7 3
2 8 1
So the number 38 has appeared in "1" two times, 6 has appeared one time, and so on.
I have 600 rows of data and can't come up with anything useful or even close.
Thank you so much for your help!

Summarise and count using dplyr:
library(dplyr)
df2 <- df %>%
group_by(col1, col2) %>%
summarise(count = n()) %>%
ungroup()
returns:
col1 col2 count
<dbl> <dbl> <int>
1 1 6 1
2 1 38 2
3 2 7 3
4 2 8 1
5 2 38 1
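For comparison, a base R sketch of the same count. It assumes the columns are named col1 and col2 as in the answer above, and rebuilds the question's sample data. table() counts every combination, including those that never occur, so the zero rows are filtered out afterwards.

```r
# Sample data from the question, with the column names used in the answer
df <- data.frame(
  col1 = c(1, 1, 1, 2, 2, 2, 2, 2),
  col2 = c(38, 6, 38, 38, 7, 7, 8, 7)
)

# Count every (col1, col2) combination, then convert to long format
counts <- as.data.frame(table(col1 = df$col1, col2 = df$col2),
                        responseName = "freq")

# Drop combinations that never occur
counts <- counts[counts$freq > 0, ]
counts
```

Note that col1 and col2 come back as factors here, so convert them if numeric codes are needed later.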

R: Matching and repeating occurrence [duplicate]

This question already has answers here:
Complete dataframe with missing combinations of values
(2 answers)
Closed 2 years ago.
(Sample code below.) I have two data sets. One is a library of products; the other contains customer id, date, viewed product and one other detail. I want a merge that shows, for each id AND date, the whole library of products as well as where the match was. I have tried using full_join, merge, and right and left joins, but they do not repeat the rows. Below is a sample of what I am trying to achieve.
id=c(1,1,1,1,2,2)
date=c(1,1,2,2,1,3)
offer=c('a','x','y','x','y','a')
section=c('general','kitchen','general','general','general','kitchen')
t=data.frame(id,date,offer,section)
offer=c('a','x','y','z')
library=data.frame(offer)
######
t table
id date offer section
1 1 1 a general
2 1 1 x kitchen
3 1 2 y general
4 1 2 x general
5 2 1 y general
6 2 3 a kitchen
library table
offer
1 a
2 x
3 y
4 z
and I want to get this:
id date offer section
1 1 1 a general
2 1 1 x kitchen
3 1 1 y NA
4 1 1 z general
...
(there would have to be 6*4 observations)
I realize that because I match by offer it is not going to repeat the values like that, but what is another way to do it? Thanks a lot!!

You can use complete to get all combinations of library$offer for each id and date.
tidyr::complete(t, id, date, offer = library$offer)
# A tibble: 24 x 4
# id date offer section
# <dbl> <dbl> <chr> <chr>
# 1 1 1 a general
# 2 1 1 x kitchen
# 3 1 1 y NA
# 4 1 1 z NA
# 5 1 2 a NA
# 6 1 2 x general
# 7 1 2 y general
# 8 1 2 z NA
# 9 1 3 a NA
#10 1 3 x NA
# … with 14 more rows
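The output above includes id/date combinations that never occur in t (for example id 1 with date 3). If only the observed id/date pairs should be filled out, one possible base R sketch is below. The names offers, id_dates and grid are illustrative, and offers stands in for the question's library data frame. merge() with no shared columns returns the Cartesian product.

```r
# Data from the question
t <- data.frame(
  id = c(1, 1, 1, 1, 2, 2),
  date = c(1, 1, 2, 2, 1, 3),
  offer = c("a", "x", "y", "x", "y", "a"),
  section = c("general", "kitchen", "general", "general", "general", "kitchen")
)
offers <- data.frame(offer = c("a", "x", "y", "z"))

# Distinct id/date pairs that actually occur
id_dates <- unique(t[c("id", "date")])

# No common columns, so merge() builds every pair-by-offer combination
grid <- merge(id_dates, offers)

# Attach section where a match exists; unmatched rows get NA
result <- merge(grid, t, by = c("id", "date", "offer"), all.x = TRUE)
nrow(result)
```

With this sample data there are four distinct id/date pairs and four offers, so the result has 16 rows rather than 24.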

You can use tidyr and dplyr to get the data. The crossing() function creates all combinations of the variables you pass in; joining t back on then fills in the matches:
library(dplyr)
library(tidyr)
t %>%
select(id, date) %>%
{crossing(id=.$id, date=.$date, library)} %>%
left_join(t)

gather() per grouped variables in R for specific columns

I have a long data frame with the decisions of players who worked in groups.
I need to convert the data in such a way that each row (individual observation) contains all group members' decisions (so we can basically see whether they are interdependent).
Let's say the generating code is:
group_id <- c(rep(1, 3), rep(2, 3))
player_id <- c(rep(seq(1, 3), 2))
player_decision <- seq(10,60,10)
player_contribution <- seq(6,1,-1)
df <-
data.frame(group_id, player_id, player_decision, player_contribution)
So the initial data looks like:
group_id player_id player_decision player_contribution
1 1 1 10 6
2 1 2 20 5
3 1 3 30 4
4 2 1 40 3
5 2 2 50 2
6 2 3 60 1
But I need to convert it to wide format within each group, and only for some of these variables (in this example, specifically for player_contribution), in such a way that the rest of the data remains. So the head of the converted data would be:
data.frame(group_id=c(1,1),
player_id=c(1,2),
player_decision=c(10,20),
player_1_contribution=c(6,6),
player_2_contribution=c(5,5),
player_3_contribution=c(4,6)
)
group_id player_id player_decision player_1_contribution player_2_contribution player_3_contribution
1 1 1 10 6 5 4
2 1 2 20 6 5 6
I suspect I need to group_by in dplyr and then somehow gather per group but only for player_contribution (or a vector of variables). But I really have no clue how to approach it. Any hints would be welcome!

Here is a solution using tidyr and dplyr.
Make a data frame with one column per player's contribution. Then join this data frame back onto the columns of interest from the original data frame.
library(tidyr)
library(dplyr)
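# One row per group, one column per player's contribution
wide <- pivot_wider(df, id_cols = group_id,
                    names_from = player_id,
                    values_from = player_contribution,
                    names_prefix = "player_contribution_")
# Join the wide columns back onto the per-player rows
answer <- left_join(df[, c("group_id", "player_id", "player_decision")], wide, by = "group_id")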
answer
group_id player_id player_decision player_contribution_1 player_contribution_2 player_contribution_3
1 1 1 10 6 5 4
2 1 2 20 6 5 4
3 1 3 30 6 5 4
4 2 1 40 3 2 1
5 2 2 50 3 2 1
6 2 3 60 3 2 1
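The same two steps (widen the contributions per group, then join back) can also be sketched in base R with reshape() and merge(). This is an alternative, not the answer's method; note that reshape() names the new columns player_contribution.1, player_contribution.2 and so on, with a dot rather than an underscore.

```r
# Data from the question
group_id <- c(rep(1, 3), rep(2, 3))
player_id <- c(rep(seq(1, 3), 2))
player_decision <- seq(10, 60, 10)
player_contribution <- seq(6, 1, -1)
df <- data.frame(group_id, player_id, player_decision, player_contribution)

# One row per group, one contribution column per player
wide <- reshape(df[c("group_id", "player_id", "player_contribution")],
                idvar = "group_id", timevar = "player_id",
                direction = "wide")

# Join the wide columns back onto the per-player rows
answer <- merge(df[c("group_id", "player_id", "player_decision")],
                wide, by = "group_id")
answer
```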

Match dataframe rows according to two variables (Indexing)

I am essentially trying to get disorganized data into long form for linear modeling.
I have 2 data.frames "rec" and "book"
Each row in "book" needs to be pasted onto the end of several of the rows of "rec" according to two variables in the row: "MRN" and "COURSE" which match.
I have tried the following and variations thereon to no avail:
i=1
newlist=list()
colnames(newlist)=colnames(book)
for ( i in 1:dim(rec)[1]) {
mrn=as.numeric(as.vector(rec$MRN[i]));
course=as.character(rec$COURSE[i]);
get.vector<-as.vector(((as.numeric(as.vector(book$MRN))==mrn) & (as.character(book$COURSE)==course)))
newlist[i]<-book[get.vector,]
i=i+1;
}
I would welcome any suggestions on:
1) getting this to work
2) making it more elegant (or perhaps just less clumsy)
If I have been unclear in any way, I beg your pardon.
I do understand I haven't combined any data above; I think if I can generate a long-format data.frame I can combine them all on my own.

Sounds like you need to merge the two data-frames. Try this:
merge(rec, book, by = c('MRN', 'COURSE'))
and do read the help for merge (by doing ?merge at the R console) for more options on how to merge these.
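A minimal sketch of what that call does, using made-up rec and book data frames. Only the MRN and COURSE column names come from the question; the dose and site columns and all values are hypothetical. The all.x argument is one of the options described in ?merge.

```r
# Hypothetical data; only the MRN and COURSE names come from the question
rec <- data.frame(
  MRN = c(101, 101, 102, 102, 103),
  COURSE = c("A", "A", "A", "B", "A"),
  dose = c(1.0, 1.5, 2.0, 2.5, 3.0)
)
book <- data.frame(
  MRN = c(101, 102, 102),
  COURSE = c("A", "A", "B"),
  site = c("lung", "liver", "brain")
)

# Inner join: each book row is repeated for every matching rec row
inner <- merge(rec, book, by = c("MRN", "COURSE"))

# Left join: also keep rec rows with no match in book (site becomes NA)
left <- merge(rec, book, by = c("MRN", "COURSE"), all.x = TRUE)

nrow(inner)
nrow(left)
```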

I've created a simple example that may help you. In my case I wanted to paste the 'value' column from df1 onto each row of df2, according to the variables x1 and x2:
df1 <- read.table(textConnection("
x1 x2 value
1 2 12
1 3 56
2 1 35
2 2 68
"),header=T)
df2 <- read.table(textConnection("
test x1 x2
1 1 2
2 1 3
3 2 1
4 2 2
5 1 2
6 1 3
7 2 1
"),header=T)
library(sqldf)
sqldf("select df2.*, df1.value from df2 join df1 using(x1,x2)")
test x1 x2 value
1 1 1 2 12
2 2 1 3 56
3 3 2 1 35
4 4 2 2 68
5 5 1 2 12
6 6 1 3 56
7 7 2 1 35
