R: is it possible to get the output of table() using dcast? [duplicate]

This question already has answers here:
Faster ways to calculate frequencies and cast from long to wide
(4 answers)
Closed 4 years ago.
I have the following data frame:
id<-c(1,2,3,4,1,1,2,3,4,4,2,2)
period<-c("first","calib","valid","valid","calib","first","valid","valid","calib","first","calib","valid")
df<-data.frame(id,period)
typing
table(df)
results in
   period
id  calib first valid
  1     1     2     0
  2     2     0     2
  3     0     0     2
  4     1     1     1
Is there any way to get the same result using 'dcast' and save it as a new data frame?

Yes, there is a way:
library(reshape2)
dcast(df, id ~ period, length)
Using period as value column: use value.var to override.
  id calib first valid
1  1     1     2     0
2  2     2     0     2
3  3     0     0     2
4  4     1     1     1
You can also type just dcast(df, id ~ period); because the cast has to aggregate, dcast falls back to length by default. As far as I can see, you tried to find this out in your other question. An extended solution without dcast would look like this:
df <- data.frame(unclass(table(df)))
df$ID <- rownames(df)
df
  calib first valid ID
1     1     2     0  1
2     2     0     2  2
3     0     0     2  3
4     1     1     1  4
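The "Using period as value column" message printed by the dcast call above can be avoided by naming the value column yourself. The following is a minimal sketch, assuming reshape2 is installed; the variable name wide is only illustrative.

```r
library(reshape2)

id <- c(1, 2, 3, 4, 1, 1, 2, 3, 4, 4, 2, 2)
period <- c("first", "calib", "valid", "valid", "calib", "first",
            "valid", "valid", "calib", "first", "calib", "valid")
df <- data.frame(id, period)

# Name the value column explicitly so dcast does not have to guess it,
# and count the rows in each id/period cell with length().
wide <- dcast(df, id ~ period, fun.aggregate = length, value.var = "period")

# The result is an ordinary data frame with id as a regular column.
str(wide)
```

Since wide is a plain data frame, it can be saved or modified like any other.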

Related

How do you prepare longitudinal data for survival analysis with various specifications?

I have a question regarding longitudinal study analysis and work with R.
I have the following data format:
ID Visit Behaviour Distance_to_first_visit_in_month
1 0 1 0
1 1 1 6
1 2 1 12
1 3 1 50
2 0 3 0
2 1 3 8
2 2 3 16
2 3 3 25
2 4 3 40
2 5 3 60
3 0 1 0
3 1 1 6
3 2 1 12
3 3 3 24
3 4 3 30
3 5 3 55
I need the data in the following format:
ID Visit Behaviour Distance_to_first_visit_in_month Status
1 0 1 0 0
2 0 3 0 1
3 3 3 24 1
If a person has 1 every time until the end, he should only be censored because the study has finished. If a person has 3 for the first time, I need the Distance_to_first_visit_in_month at that visit, because that is where he gets status 1 in the Kaplan-Meier curve.
I tried to filter for the maximal Distance_to_first_visit_in_month and get the Behaviour. When I bring the data to the wide format it is easy to get those. But I can't get the Distance_to_first_visit_in_month when the person has 3 as Behaviour at the beginning, or in the other cases.
I have 300 IDs with sometimes 11 visits, so I can't prepare the data manually.
Do you have an idea?
Thank you in advance.
Best, Christina
As you don't explain how to aggregate your data into the second dataset, I can only show you how to get the IDs that match your conditions and how to implement the status variable. See this example:
library(dplyr)
# get IDs whose Behaviour is 1 at every visit
id_list1 <- lapply(df %>% split(.$ID), function(x) {
  if (all(x$Behaviour == 1)) {
    return(unique(x$ID))
  }
}) %>%
  unlist()
# get IDs with 3 as first value (Behaviour at Visit 0)
id_list3 <- lapply(df %>% split(.$ID), function(x) {
  if (x[x$Visit == 0, "Behaviour"] == 3) {
    return(unique(x$ID))
  }
}) %>%
  unlist()
# flag those IDs with Status 1 and keep the distance only for the others
df %>%
  mutate(Status = ifelse(ID %in% id_list3, 1, 0)) %>%
  mutate(new_dist = ifelse(!ID %in% id_list3, Distance_to_first_visit_in_month, NA))
Please note that id_list1 and id_list3 are named vectors. There are no duplicates; each element's name simply matches its value.
And by "at the beginning", do you mean Visit number 0? Otherwise you'll have to adjust the x$Visit == 0 condition.
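To get one row per ID, the same split and lapply pattern can be extended. The sketch below is only one possible reading of the question: it assumes that the event row is the first visit with Behaviour 3, and that an ID that never shows 3 is censored at its last recorded visit. That second assumption differs from the ID 1 row in the question's example table, so adjust it if the first visit is really wanted. The helper name pick_row is hypothetical.

```r
df <- data.frame(
  ID = c(rep(1, 4), rep(2, 6), rep(3, 6)),
  Visit = c(0:3, 0:5, 0:5),
  Behaviour = c(1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 1, 1, 1, 3, 3, 3),
  Distance_to_first_visit_in_month = c(0, 6, 12, 50, 0, 8, 16, 25, 40, 60,
                                       0, 6, 12, 24, 30, 55)
)

# Hypothetical helper: reduce one ID's visits to a single row.
pick_row <- function(x) {
  x <- x[order(x$Visit), ]
  event <- which(x$Behaviour == 3)
  if (length(event) > 0) {
    # Event observed: keep the first visit with Behaviour 3.
    row <- x[event[1], ]
    row$Status <- 1
  } else {
    # No event (assumption): censor at the last recorded visit.
    row <- x[nrow(x), ]
    row$Status <- 0
  }
  row
}

surv_df <- do.call(rbind, lapply(split(df, df$ID), pick_row))
rownames(surv_df) <- NULL
surv_df
```

With this layout, Distance_to_first_visit_in_month serves as the time variable and Status as the event indicator.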

how to change my dataframe based on value of a column [duplicate]

This question already has answers here:
Faster ways to calculate frequencies and cast from long to wide
(4 answers)
Closed 3 years ago.
There is a data frame with two columns, as below, and I want to change it into a data frame with three columns:
df <- data.frame(key=c('a','a','a','b','b'),value=c(1,2,2,1,3))
I have tried it in Python and that works, but in R I have no idea how to do it.
The expected output should look like this:
  1 2 3
a 1 2 0
b 1 0 1
library(data.table)
dcast(key~value, data=df, fun.aggregate=length)
# key 1 2 3
# 1 a 1 2 0
# 2 b 1 0 1
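If the key should stay in the row names, as in the expected output of the question, a base R route is also possible. This is a minimal sketch using table() and as.data.frame.matrix(); the variable name wide is only illustrative.

```r
df <- data.frame(key = c("a", "a", "a", "b", "b"), value = c(1, 2, 2, 1, 3))

# Cross-tabulate key against value, then convert the table to a data frame.
# as.data.frame.matrix() keeps the wide layout and the column names "1", "2", "3".
wide <- as.data.frame.matrix(table(df$key, df$value))
wide
```

Going through data.frame() instead would, by default, rewrite the numeric column names into syntactically valid ones.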

Build rowSums in dplyr based on columns containing pattern in their names [duplicate]

This question already has answers here:
Sum across multiple columns with dplyr
(8 answers)
R, create a new column in a data frame that applies a function of all the columns with similar names
(3 answers)
Closed 4 years ago.
My data frame looks something like this
USER OBSERVATION COUNT.1 COUNT.2 COUNT.3
A 1 0 1 1
A 2 1 1 2
A 3 3 0 0
With dplyr I want to build a column that sums the values of the count variables for each row, selecting the count variables based on their names.
USER OBSERVATION COUNT.1 COUNT.2 COUNT.3 SUM
A 1 0 1 1 2
A 2 1 1 2 4
A 3 3 0 0 3
How do I do that?
As you asked for a dplyr solution, you can do:
library(dplyr)
df %>%
mutate(SUM = rowSums(select(., starts_with("COUNT"))))
USER OBSERVATION COUNT.1 COUNT.2 COUNT.3 SUM
1 A 1 0 1 1 2
2 A 2 1 1 2 4
3 A 3 3 0 0 3
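The same row sum can also be written without the dot placeholder. This is a minimal sketch, assuming dplyr 1.0.0 or later (where across() is available); the data frame is rebuilt from the question's example.

```r
library(dplyr)

df <- data.frame(
  USER = c("A", "A", "A"),
  OBSERVATION = 1:3,
  COUNT.1 = c(0, 1, 3),
  COUNT.2 = c(1, 1, 0),
  COUNT.3 = c(1, 2, 0)
)

# across() selects the COUNT columns by name prefix inside mutate();
# rowSums() then adds them up row by row.
result <- df %>%
  mutate(SUM = rowSums(across(starts_with("COUNT"))))
result
```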

Duplicating data frame rows by freq value in same data frame [duplicate]

This question already has answers here:
Repeat each row of data.frame the number of times specified in a column
(10 answers)
Closed 7 years ago.
I have a data frame with names by type and their frequencies. I'd like to expand this data frame so that the names are repeated according to their name-type frequency.
For example, this:
> df = data.frame(name=c('a','b','c'),type=c(0,1,2),freq=c(2,3,2))
name type freq
1 a 0 2
2 b 1 3
3 c 2 2
would become this:
> df_exp
name type
1 a 0
2 a 0
3 b 1
4 b 1
5 b 1
6 c 2
7 c 2
Appreciate any suggestions on an easy way to do this.
You can just use rep to "expand" your data.frame rows:
df[rep(sequence(nrow(df)), df$freq), c("name", "type")]
# name type
# 1 a 0
# 1.1 a 0
# 2 b 1
# 2.1 b 1
# 2.2 b 1
# 3 c 2
# 3.1 c 2
And there's a function expandRows in the splitstackshape package that does exactly this. Besides the name of a count column, it can also accept a vector specifying how many times to replicate each row. With the freq column, for example:
library(splitstackshape)
expandRows(df, "freq")
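The rep approach above leaves row names such as 1.1 and 2.1. If the plain 1 to 7 numbering from the question is wanted, the row names can be reset afterwards. A minimal base R sketch:

```r
df <- data.frame(name = c("a", "b", "c"), type = c(0, 1, 2), freq = c(2, 3, 2))

# Repeat each row index freq times, then keep only the name and type columns.
df_exp <- df[rep(seq_len(nrow(df)), times = df$freq), c("name", "type")]

# Drop the 1.1, 2.1, ... row names so the rows are numbered consecutively.
rownames(df_exp) <- NULL
df_exp
```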

R saving the output of table() into a data frame

I have the following data frame:
id<-c(1,2,3,4,1,1,2,3,4,4,2,2)
period<-c("first","calib","valid","valid","calib","first","valid","valid","calib","first","calib","valid")
df<-data.frame(id,period)
typing
table(df)
results in
   period
id  calib first valid
  1     1     2     0
  2     2     0     2
  3     0     0     2
  4     1     1     1
However, if I save it as a data frame 'df'
df<-data.frame(table(df))
the format of 'df' would be like
   id period Freq
1   1  calib    1
2   2  calib    2
3   3  calib    0
4   4  calib    1
5   1  first    2
6   2  first    0
7   3  first    0
8   4  first    1
9   1  valid    0
10  2  valid    2
11  3  valid    2
12  4  valid    1
How can I avoid this, and how can I save the first output as it is into a data frame?
More importantly, is there any way to get the same result using 'dcast'?
Would this help?
> data.frame(unclass(table(df)))
  calib first valid
1     1     2     0
2     2     0     2
3     0     0     2
4     1     1     1
To elaborate just a little bit: I've changed the ids in the example data.frame so that they are not 1:4, in order to show that the ids are carried along into the table and are not a sequence of row counts.
id <- c(10,20,30,40,10,10,20,30,40,40,20,20)
period <- c("first","calib","valid","valid","calib","first","valid","valid","calib","first","calib","valid")
df <- data.frame(id,period)
You can create the new data.frame in one of two ways. rengis' answer is fine for two-column data frames that have the id column first. It won't work so well if your data frame has more than two columns, or if the columns are in a different order.
An alternative is to specify the columns and column order for your table:
df3 <- data.frame(unclass(table(df$id, df$period)))
The id column is included in the new data.frame as row.names(df3). To add it as a new column:
df3$id <- row.names(df3)
df3
   calib first valid id
10     1     2     0 10
20     2     0     2 20
30     0     0     2 30
40     1     1     1 40
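Row names are character strings, so an id column copied from them is character as well. If a numeric id in the first column is preferred, something along these lines should work. This is a minimal sketch that assumes every id is a number.

```r
id <- c(10, 20, 30, 40, 10, 10, 20, 30, 40, 40, 20, 20)
period <- c("first", "calib", "valid", "valid", "calib", "first",
            "valid", "valid", "calib", "first", "calib", "valid")
df <- data.frame(id, period)

df3 <- data.frame(unclass(table(df$id, df$period)))

# Convert the character row names back to numbers (assumes numeric ids)
# and bind them on as the first column.
df3 <- cbind(id = as.numeric(row.names(df3)), df3)

# The ids now live in their own column, so the row names can be reset.
row.names(df3) <- NULL
df3
```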
