Pivoting Nominal Data in R [duplicate] - r

This question already has answers here:
Faster ways to calculate frequencies and cast from long to wide
(4 answers)
Closed 4 years ago.
I have a data frame in R that I need to manipulate (pivot). At the simplest level the first few rows would look like the following:
Batch Unit Success InputGrouping
1 1 1 A
2 5 1 B
3 4 0 C
1 1 1 D
2 5 1 A
I would like to pivot this data so that the column names would be InputGrouping and the values would be 1 if it exists and 0 if not. Using above:
Batch Unit Success A B C D
1 1 1 1 0 0 1
2 5 1 1 1 0 0
3 4 0 0 0 1 0
I've looked at reshape/cast but can't figure out if this transformation is possible with the package. Any advice would be very much appreciated.

This is indeed possible using reshape2 with the function dcast().
Recreate your data:
dat <- read.table(header=TRUE, text="
Batch Unit Success InputGrouping
1 1 1 A
2 5 1 B
3 4 0 C
1 1 1 D
2 5 1 A")
Now recast the data:
library("reshape2")
dcast(Batch + Unit + Success ~ InputGrouping, data=dat, fun.aggregate = length)
The results:
Using InputGrouping as value column: use value.var to override.
Batch Unit Success A B C D
1 1 1 1 1 0 0 1
2 2 5 1 1 1 0 0
3 3 4 0 0 0 1 0

Here's a possible solution using the data.table package (df is the data frame from the question):
library(data.table)
setDT(df)[, as.list(table(InputGrouping)), by = .(Batch, Unit, Success)]
# Batch Unit Success A B C D
# 1: 1 1 1 1 0 0 1
# 2: 2 5 1 1 1 0 0
# 3: 3 4 0 0 0 1 0
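Both answers above count rows, so a Batch/Unit/Success combination that lists the same InputGrouping twice would show 2 rather than 1. The following is a minimal base R sketch, not part of either answer, that caps the result at 0/1. The duplicated last row and the helper names `key_cols`, `key_id`, `groups` and `flags` are assumptions added for illustration.

```r
# Question data plus one deliberately duplicated row (hypothetical) to show the cap.
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Batch Unit Success InputGrouping
1 1 1 A
2 5 1 B
3 4 0 C
1 1 1 D
2 5 1 A
2 5 1 A")

key_cols <- c("Batch", "Unit", "Success")
keys <- unique(dat[key_cols])                 # one row per output row

# Collapse the key columns into a single string so rows can be matched.
key_id <- function(d) do.call(paste, c(d[key_cols], sep = "|"))

groups <- sort(unique(dat$InputGrouping))     # "A" "B" "C" "D"

# For each group: 1 if the key occurs with that group at least once, else 0.
flags <- sapply(groups, function(g) {
  as.integer(key_id(keys) %in% key_id(dat[dat$InputGrouping == g, ]))
})

wide <- cbind(keys, flags)
wide
```

With reshape2, the same effect could probably be had by passing a function such as `function(x) as.integer(length(x) > 0)` as `fun.aggregate`, but that variant is untested here.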

Related

Conditionally delete individuals from longitudinal data [duplicate]

This question already has answers here:
Select groups which have at least one of a certain value
(3 answers)
Closed 1 year ago.
I have a longitudinal data set where I want to drop individuals (id) if they do not fulfill the criterion, indicated by criteria == 1, at any time point. To put it in context, we could say that criteria denotes whether the individual was living in the region of interest at a given time.
Using some toy data with a structure similar to mine:
id <- c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5)
time <- c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
event <- c(0,1,0,1,0,0,0,0,0,0,1,0,1,0,1)
criteria <- c(1,0,0,0,0,0, 0, 0, 0, 1, 1, 1,0,0,1)
df <- data.frame(cbind(id,time,event, criteria))
> df
id time event criteria
1 1 1 0 1
2 1 2 1 0
3 1 3 0 0
4 2 1 1 0
5 2 2 0 0
6 2 3 0 0
7 3 1 0 0
8 3 2 0 0
9 3 3 0 0
10 4 1 0 1
11 4 2 1 1
12 4 3 0 1
13 5 1 1 0
14 5 2 0 0
15 5 3 1 1
Removing every id that has criteria == 0 at all time points (time) would lead to an end result looking like this:
id time event criteria
1 1 1 0 1
2 1 2 1 0
3 1 3 0 0
4 4 1 0 1
5 4 2 1 1
6 4 3 0 1
7 5 1 1 0
8 5 2 0 0
9 5 3 1 1
I've been trying to achieve this by using dplyr::group_by(id) and then filtering on the criterion, but that does not give the result I want. I'd prefer a tidyverse solution! :D
Thanks!
library(dplyr)
df %>%
  group_by(id) %>%
  # keep every row of an id whose criteria equals 1 at least once
  filter(any(criteria == 1)) %>%
  ungroup()
If you are willing to look into data.table, which I recommend, it is as simple as this:
library(data.table)
setDT(df) # make it a data.table
df[ , .SD[ !all(criteria==0) ], by=id ]
See this page for a general introduction and an explanation of the .SD idiom:
https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html
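For comparison, the same group-wise filter can be sketched in base R with `ave()`, which applies a function within each group and returns a vector as long as the input. This is an illustrative alternative rather than either answerer's method; the names `keep` and `result` are made up for the sketch.

```r
# Toy data from the question.
id       <- c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5)
time     <- c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
event    <- c(0,1,0,1,0,0,0,0,0,0,1,0,1,0,1)
criteria <- c(1,0,0,0,0,0,0,0,0,1,1,1,0,0,1)
df <- data.frame(id, time, event, criteria)

# For every row, TRUE if its id has criteria == 1 at least once.
keep <- ave(df$criteria == 1, df$id, FUN = any)

result <- df[keep, ]
rownames(result) <- NULL   # renumber rows as in the desired output
result
```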

Create variables with ones and zeros from a variable options in r [duplicate]

This question already has answers here:
Generate a dummy-variable
(17 answers)
Closed 3 years ago.
I'm trying to create new variables from the values of a variable in my data frame. This is my initial data frame:
d1 <- data.frame("id" = c(1,1,2,2,3,4,5), "type" = c("A","B","C","C","A","B","C"))
id type
1 1 A
2 1 B
3 2 C
4 2 C
5 3 A
6 4 B
7 5 C
I would like to create new variables depending on the value of "type" for each id, producing this kind of data frame:
d2 <- data.frame("id" = c(1,1,2,2,3,4,5), "type" = c("A","B","C","C","A","B","C"),
"type.A" = c(1,0,0,0,1,0,0), "type.B" = c(0,1,0,0,0,1,0),
"type.C" = c(0,0,1,1,0,0,1))
id type type.A type.B type.C
1 1 A 1 0 0
2 1 B 0 1 0
3 2 C 0 0 1
4 2 C 0 0 1
5 3 A 1 0 0
6 4 B 0 1 0
7 5 C 0 0 1
The idea is to put 1 in the new variable (type.A in this case) if the "type" of a specific "id" equals A, and 0 otherwise. Since this is a common problem in big data analysis (I think), I would like to know if there is a function that solves it.
cbind(d1, setNames(data.frame(+sapply(unique(d1$type), function(x)
d1$type == x)), unique(d1$type)))
# id type A B C
#1 1 A 1 0 0
#2 1 B 0 1 0
#3 2 C 0 0 1
#4 2 C 0 0 1
#5 3 A 1 0 0
#6 4 B 0 1 0
#7 5 C 0 0 1
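Another base R route is `model.matrix()`, which expands a categorical variable into indicator columns. The sketch below is an alternative to the answer above, not a restatement of it. Dropping the intercept with `- 1` keeps one column per level, and the renaming step is only there to reproduce the `type.A` style names from the question.

```r
d1 <- data.frame(id = c(1, 1, 2, 2, 3, 4, 5),
                 type = c("A", "B", "C", "C", "A", "B", "C"))

# One 0/1 column per level of type; "- 1" removes the intercept so that
# no level is dropped as a reference category.
dummies <- model.matrix(~ type - 1, data = d1)

# model.matrix names the columns typeA, typeB, ...; insert the dot.
colnames(dummies) <- sub("^type", "type.", colnames(dummies))

d2 <- cbind(d1, dummies)
d2
```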

Converting data to longitudinal data

Hi, I am having difficulty converting my data into longitudinal (long) format using the reshape package. I would be grateful if anyone could help me, thank you!
Data is as follows:
m <- matrix(sample(c(0, 0:3), 100, replace = TRUE), 10)
ID<-c(1:10)
dim(ID)=c(10,1)
m<- cbind(ID,m)
d <- as.data.frame(m)
names(d)<-c('ID', 'litter1', 'litter2', 'litter3', 'litter4', 'litter5', 'litter6', 'litter7', 'litter8', 'litter9', 'litter10')
print(d)
ID litter1 litter2 litter3 litter4 litter5 litter6 litter7 litter8 litter9 litter10
1 0 0 0 3 1 0 2 0 0 3
2 0 2 1 2 0 0 0 2 0 0
3 1 0 1 2 0 3 3 3 2 0
4 2 1 2 3 0 2 3 3 1 0
5 0 1 2 0 0 0 3 3 1 0
6 2 1 2 0 3 3 0 0 0 0
7 0 1 0 3 0 0 1 2 2 0
8 0 1 3 3 2 1 3 2 3 0
9 0 2 0 2 2 3 2 0 0 3
10 2 2 2 2 1 3 0 3 0 0
I wish to convert the above data into longitudinal format with columns 'ID', 'litter category' (the category of the litter, i.e. 1-10) and 'litter number' (the number of pieces in each litter category):
ID littercategory litternumber
1 4 3
1 5 1
1 7 2
1 10 3
2 2 2
2 3 1
2 4 2
2 8 2
and so on.
Would really appreciate your help thank you!
You could do that as follows:
library(reshape2)
d = melt(d, id.vars=c("ID"))
colnames(d) = c('ID','littercategory','litternumber')
# remove the text in the littercategory column, keep only the number.
d$littercategory = gsub('litter','',d$littercategory)
# drop rows with zero pieces (note the comma: this subsets rows, not columns)
d = d[d$litternumber != 0, ]
Output:
ID littercategory litternumber
1 1 4
2 1 8
3 1 6
4 1 4
7 1 6
8 1 5
10 1 10
1 2 6
2 2 9
As you can see, only the ordering differs from the output you requested, but I'm sure you can fix that yourself. (If not, there are plenty of resources on how to do that.)
Hope this helps!
To get the desired output, melt your data and keep only the nonzero values.
library(data.table)
# convert to a data.table first so that data.table's own melt method is used
result <- melt(as.data.table(d), id.vars = "ID")[value != 0][order(ID)]
# To get exact structure modify result
result[, .(ID,
littercategory = sub("litter", "", variable),
litternumber = value)]
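The melt step can also be written out by hand in base R, which makes the reshaping explicit: every litter column is stacked into one long vector, and the ID and category labels are repeated to match. This is a minimal sketch using only the first two rows printed in the question (the original data are random); the intermediate names `vals` and `long` are assumptions.

```r
# First two rows of the printed example data.
vals <- rbind(c(0, 0, 0, 3, 1, 0, 2, 0, 0, 3),
              c(0, 2, 1, 2, 0, 0, 0, 2, 0, 0))
d <- data.frame(ID = 1:2, vals)
names(d) <- c("ID", paste0("litter", 1:10))

n_cat <- ncol(d) - 1

# unlist() stacks the litter columns one after another, so IDs cycle
# fastest and the category changes once per block of nrow(d) values.
long <- data.frame(
  ID             = rep(d$ID, times = n_cat),
  littercategory = rep(seq_len(n_cat), each = nrow(d)),
  litternumber   = unlist(d[-1], use.names = FALSE)
)

long <- long[long$litternumber != 0, ]               # drop empty categories
long <- long[order(long$ID, long$littercategory), ]  # order as requested
rownames(long) <- NULL
long
```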

How to write new column conditional on grouped rows in R?

I have a data frame where each Item has three categories (a, b, c) and a numeric Answer (either 0 or 1) is recorded for each category. I would like to create a new column contingent on the rows in the Answer column. This is what my data frame looks like:
Item <- rep(c(1:3), each=3)
Option <- rep(c('a','b','c'), times=3)
Answer <- c(1,1,0,1,0,1,1,1,1)
df <- data.frame(Item, Option, Answer)
Item Option Answer
1 1 a 1
2 1 b 1
3 1 c 0
4 2 a 1
5 2 b 0
6 2 c 1
7 3 a 1
8 3 b 1
9 3 c 1
What is needed: whenever all three Option rows (a, b, c) of an Item have Answer equal to 1, the New column should be 1 for every row of that Item. In any other case, New should be 0. The desired output should look like this:
Item Option Answer New
1 1 a 1 0
2 1 b 1 0
3 1 c 0 0
4 2 a 1 0
5 2 b 0 0
6 2 c 1 0
7 3 a 1 1
8 3 b 1 1
9 3 c 1 1
I tried to achieve this without using a loop, but I got stuck because I don't know how to make a new column contingent on a group of rows, not just a single one. I have tried this solution but it doesn't work if the rows are not grouped in pairs.
Do you have any suggestions? Thanks a bunch!
This should work:
library(dplyr)
df %>%
  group_by(Item) %>%
  # all() is TRUE only when every Answer in the Item is nonzero
  mutate(New = as.numeric(all(as.logical(Answer))))
Using data.table:
library(data.table)
DT <- data.table(Item, Option, Answer)
DT[, Index := as.numeric(all(as.logical(Answer))), by= Item]
DT
Item Option Answer Index
1: 1 a 1 0
2: 1 b 1 0
3: 1 c 0 0
4: 2 a 1 0
5: 2 b 0 0
6: 2 c 1 0
7: 3 a 1 1
8: 3 b 1 1
9: 3 c 1 1
Or using only base R
df$Index <- with(df, +(ave(!!Answer, Item, FUN = all)))
df$Index
#[1] 0 0 0 0 0 0 1 1 1
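The base R one-liner is compact, so here is the same idea split into two steps as a sketch: first compute one flag per Item with `tapply()`, then look that flag up for every row. It matches on the value of Item rather than on row position, so the rows of an Item do not need to be adjacent. The name `all_one` is an assumption for illustration.

```r
Item   <- rep(1:3, each = 3)
Option <- rep(c("a", "b", "c"), times = 3)
Answer <- c(1, 1, 0, 1, 0, 1, 1, 1, 1)
df <- data.frame(Item, Option, Answer)

# Step 1: one logical per Item, TRUE only if every Answer equals 1.
# The result is named by Item ("1", "2", "3").
all_one <- tapply(df$Answer == 1, df$Item, all)

# Step 2: look up each row's Item in that named vector.
df$New <- as.integer(all_one[as.character(df$Item)])
df
```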

convert long to wide format with two factors in R [duplicate]

This question already has answers here:
How to reshape data from long to wide format
(14 answers)
Closed 7 years ago.
I have the following data set:
sample.data <- data.frame(Step = c(1,2,3,4,1,2,1,2,3,1,1),
Case = c(1,1,1,1,2,2,3,3,3,4,5),
Decision = c("Referred","Referred","Referred","Approved","Referred","Declined","Referred","Referred","Declined","Approved","Declined"),
Reason = c("Docs","Slip","Docs","","Docs","","Slip","Docs","","",""))
sample.data
Step Case Decision Reason
1 1 1 Referred Docs
2 2 1 Referred Slip
3 3 1 Referred Docs
4 4 1 Approved
5 1 2 Referred Docs
6 2 2 Declined
7 1 3 Referred Slip
8 2 3 Referred Docs
9 3 3 Declined
10 1 4 Approved
11 1 5 Declined
Is it possible in R to translate this into a wide table format, with the decisions on the header, and the value of each cell being the count of the occurrence, for example:
Case Referred Approved Declined Docs Slip
1 3 1 0 2 1
2 1 0 1 1 0
3 2 0 1 1 1
4 0 1 0 0 0
5 0 0 1 0 0
library(reshape2)
df1 <- dcast(sample.data, Case~Decision+Reason)
names(df1)[2:5] <- c("Approved", "Declined", "Docs", "Slip")
df1$Referred <- df1$Docs + df1$Slip
df1
# Case Approved Declined Docs Slip Referred
# 1: 1 1 0 2 1 3
# 2: 2 0 1 1 0 1
# 3: 3 0 1 1 1 2
# 4: 4 1 0 0 0 0
# 5: 5 0 1 0 0 0
Using:
library(reshape2)
tmp <- melt(sample.data, id.var=c("Step", "Case"))
tmp <- tmp[tmp$value!="",]
dcast(tmp, Case ~ value, value.var="Case", length)
you get:
Case Approved Declined Docs Referred Slip
1: 1 1 0 2 3 1
2: 2 0 1 1 1 0
3: 3 0 1 1 2 1
4: 4 1 0 0 0 0
5: 5 0 1 0 0 0
Using the data.table-package, you can use the same melt and dcast functionality as with reshape2, but you don't need a temporary dataframe:
library(data.table)
dcast(melt(setDT(sample.data), id.var=c("Step", "Case"))[value!=""],
Case ~ value, value.var="Case", length)
which will give you the same result.
We can use gather/spread from tidyr
library(tidyr)
library(dplyr)
gather(sample.data, Var, Val, 3:4) %>%
group_by(Case, Val) %>%
summarise(n=n()) %>%
filter(Val!='') %>%
spread(Val, n, fill=0)
# Case Approved Declined Docs Referred Slip
# (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
#1 1 1 0 2 3 1
#2 2 0 1 1 1 0
#3 3 0 1 1 2 1
#4 4 1 0 0 0 0
#5 5 0 1 0 0 0
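The melt-then-count pattern used in the answers above can also be sketched in base R with `table()`: stack the Decision and Reason columns into one value column, drop the empty strings, and cross-tabulate Case against value. This is an illustrative alternative and not one of the posted answers; `long` and `wide` are assumed names.

```r
sample.data <- data.frame(
  Step = c(1, 2, 3, 4, 1, 2, 1, 2, 3, 1, 1),
  Case = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 5),
  Decision = c("Referred", "Referred", "Referred", "Approved", "Referred",
               "Declined", "Referred", "Referred", "Declined", "Approved",
               "Declined"),
  Reason = c("Docs", "Slip", "Docs", "", "Docs", "", "Slip", "Docs", "", "", ""),
  stringsAsFactors = FALSE
)

# Stack Decision and Reason into one column, repeating Case to match.
long <- data.frame(
  Case  = rep(sample.data$Case, times = 2),
  value = c(sample.data$Decision, sample.data$Reason)
)
long <- long[long$value != "", ]   # empty Reason entries carry no information

# Cross-tabulate and turn the contingency table into a plain data frame.
wide <- as.data.frame.matrix(table(long$Case, long$value))
wide <- cbind(Case = as.numeric(rownames(wide)), wide)
rownames(wide) <- NULL
wide
```

The columns come out in alphabetical order (Approved, Declined, Docs, Referred, Slip), the same order as in the reshape2 and tidyr results above.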
