I have a data frame where each Item has three categories (a, b, c), and a numeric Answer (either 0 or 1) is recorded for each category. I would like to create a new column whose value depends on a group of rows in the Answer column. This is what my data frame looks like:
Item <- rep(c(1:3), each=3)
Option <- rep(c('a','b','c'), times=3)
Answer <- c(1,1,0,1,0,1,1,1,1)
df <- data.frame(Item, Option, Answer)
Item Option Answer
1 1 a 1
2 1 b 1
3 1 c 0
4 2 a 1
5 2 b 0
6 2 c 1
7 3 a 1
8 3 b 1
9 3 c 1
What is needed: whenever the Answer for all three categories of an Item is 1, the New column should receive a 1 for every row of that Item. In any other case, the column should have a 0. The desired output should look like this:
Item Option Answer New
1 1 a 1 0
2 1 b 1 0
3 1 c 0 0
4 2 a 1 0
5 2 b 0 0
6 2 c 1 0
7 3 a 1 1
8 3 b 1 1
9 3 c 1 1
I tried to achieve this without using a loop, but I got stuck because I don't know how to make a new column contingent on a group of rows, not just a single one. I have tried this solution but it doesn't work if the rows are not grouped in pairs.
Do you have any suggestions? Thanks a bunch!
This should work:
library(dplyr)
# New is 1 only when every Answer within the Item is 1
df %>%
group_by(Item) %>%
mutate(New = as.numeric(all(as.logical(Answer))))
Using data.table:
library(data.table)
DT <- data.table(Item, Option, Answer)
DT[, Index := as.numeric(all(as.logical(Answer))), by= Item]
DT
Item Option Answer Index
1: 1 a 1 0
2: 1 b 1 0
3: 1 c 0 0
4: 2 a 1 0
5: 2 b 0 0
6: 2 c 1 0
7: 3 a 1 1
8: 3 b 1 1
9: 3 c 1 1
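Or using only base R:
# !!Answer coerces 0/1 to logical, ave() applies all() within each Item, and unary + converts back to 0/1
df$Index <- with(df, +(ave(!!Answer, Item, FUN = all)))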
df$Index
#[1] 0 0 0 0 0 0 1 1 1
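Because Answer only takes the values 0 and 1, the group-wise minimum gives the same flag as all(). The following is a minimal base R sketch of that alternative, rebuilding the data from the question and assuming the new column should be called New as in the desired output:

```r
# Rebuild the example data from the question
Item <- rep(1:3, each = 3)
Option <- rep(c("a", "b", "c"), times = 3)
Answer <- c(1, 1, 0, 1, 0, 1, 1, 1, 1)
df <- data.frame(Item, Option, Answer)

# The minimum of a 0/1 column within an Item is 1 only if every Answer is 1
df$New <- ave(df$Answer, df$Item, FUN = min)
df
```

This only works because Answer is strictly 0/1; for other codings the all() versions above are the safer choice.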
I'm trying to create new variables from the values of an existing variable in my data frame. This is my initial data frame:
d1 <- data.frame("id" = c(1,1,2,2,3,4,5), "type" = c("A","B","C","C","A","B","C"))
id type
1 1 A
2 1 B
3 2 C
4 2 C
5 3 A
6 4 B
7 5 C
I would like to create new variables depending on the value of "type" for each id, so that I get this kind of data frame:
d2 <- data.frame("id" = c(1,1,2,2,3,4,5), "type" = c("A","B","C","C","A","B","C"),
"type.A" = c(1,0,0,0,1,0,0), "type.B" = c(0,1,0,0,0,1,0),
"type.C" = c(0,0,1,1,0,0,1))
id type type.A type.B type.C
1 1 A 1 0 0
2 1 B 0 1 0
3 2 C 0 0 1
4 2 C 0 0 1
5 3 A 1 0 0
6 4 B 0 1 0
7 5 C 0 0 1
The idea is to give 1 in the new variable (type.A in this case) if the "type" of a specific "id" is equal to A, and 0 otherwise. Since this is (I think) a common problem in big data analysis, I would like to know if there is a function that solves it.
cbind(d1, setNames(data.frame(+sapply(unique(d1$type), function(x)
d1$type == x)), unique(d1$type)))
# id type A B C
#1 1 A 1 0 0
#2 1 B 0 1 0
#3 2 C 0 0 1
#4 2 C 0 0 1
#5 3 A 1 0 0
#6 4 B 0 1 0
#7 5 C 0 0 1
I'm trying to flag duplicate IDs in another column. I don't necessarily want to remove them yet, just create an indicator (0/1) of whether the IDs are unique or duplicates. In SQL, it would be like this:
UPDATE TABLE
JOIN (SELECT ID, COUNT(ID) AS count FROM TABLE GROUP BY ID) a
ON TABLE.ID = a.ID
SET Duplicate_Flag = 1
WHERE a.count > 1;
Is there a simple way to do this in R?
Any help would be greatly appreciated.
As an example of duplicated(), let's start with some values (numbers here, but strings would behave the same way):
x <- c(9, 1:5, 3:7, 0:8)
x
# 9 1 2 3 4 5 3 4 5 6 7 0 1 2 3 4 5 6 7 8
If you want to flag the second and later copies
as.numeric(duplicated(x))
# 0 0 0 0 0 0 1 1 1 0 0 0 1 1 1 1 1 1 1 0
If you want to flag all values that occur two or more times
as.numeric(x %in% x[duplicated(x)])
# 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0
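To mirror the SQL in the question more directly (count rows per ID, then flag counts above 1), the same idea can be applied to a data frame column. This is a sketch under assumptions: the data frame ids and the column names n and dup_flag are made up for illustration.

```r
# Hypothetical data: 102 appears twice, 104 three times
ids <- data.frame(ID = c(101, 102, 102, 103, 104, 104, 104))

# Number of rows sharing each ID, repeated on every row of that ID
ids$n <- ave(seq_along(ids$ID), ids$ID, FUN = length)

# 1 for every ID that occurs more than once, 0 for unique IDs
ids$dup_flag <- as.integer(ids$n > 1)
ids
```

This gives the same flag as x %in% x[duplicated(x)] while also keeping the per-ID count.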
I have an R data frame like this:
ID Event Out
A 0 0
A 1 1
A 1 1
A 0 0
A 1 2
B 1 3
B 0 0
C 1 4
C 1 4
C 1 4
I am trying to create the Out field, which is a sequential index that is conditional on Event (1 or not) and repeated within each run of events. The index needs to increment by 1 with every new run of Event = 1 within a group, carrying on in sequence from the previous group. Is there a plyr option for this? Thanks in advance.
One solution is shown below.
The approach:
The logic seems to be that Out is incremented whenever Event is 1 and either Event or ID has changed from the previous row. Out is 0 whenever Event is 0. The count carries on across group boundaries.
library(dplyr)
df %>% mutate(increment =
ifelse(Event != 0 & (ID != lag(ID) | Event != lag(Event)), 1, 0)) %>%
mutate(out_calculated = ifelse(Event == 0, 0, cumsum(increment))) %>%
select(-increment)
# ID Event Out out_calculated
# 1 A 0 0 0
# 2 A 1 1 1
# 3 A 1 1 1
# 4 A 0 0 0
# 5 A 1 2 2
# 6 B 1 3 3
# 7 B 0 0 0
# 8 C 1 4 4
# 9 C 1 4 4
# 10 C 1 4 4
Data
df <- read.table(text = "ID Event Out
A 0 0
A 1 1
A 1 1
A 0 0
A 1 2
B 1 3
B 0 0
C 1 4
C 1 4
C 1 4", header = TRUE, stringsAsFactors = FALSE)
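The increment logic from the approach above can also be sketched in base R. This version is an illustration, not the answer's own method: the helper names prev_id, prev_event and run_start are made up here, and the first row is treated explicitly as the start of a run so that a data set beginning with Event = 1 does not produce NA.

```r
df <- data.frame(ID = c("A", "A", "A", "A", "A", "B", "B", "C", "C", "C"),
                 Event = c(0, 1, 1, 0, 1, 1, 0, 1, 1, 1),
                 stringsAsFactors = FALSE)

# Values from the previous row (NA for the first row)
prev_id <- c(NA, head(df$ID, -1))
prev_event <- c(NA, head(df$Event, -1))

# A new run starts when Event is 1 and the row is the first one,
# the ID has changed, or the previous Event was not 1
run_start <- df$Event == 1 &
  (is.na(prev_id) | df$ID != prev_id | prev_event != 1)

# Running count of run starts, forced to 0 wherever Event is 0
df$out_calculated <- ifelse(df$Event == 1, cumsum(run_start), 0)
df
```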
A somewhat hacky solution using an alternative package, data.table. This solution should also be faster.
library(data.table)
setDT(dt) # assuming your data.frame is called dt
dt[, out_dt := frank(rleid(paste(Event, ID)) * Event, ties.method = "dense") - 1]
dt
ID Event Out out_dt
1: A 0 0 0
2: A 1 1 1
3: A 1 1 1
4: A 0 0 0
5: A 1 2 2
6: B 1 3 3
7: B 0 0 0
8: C 1 4 4
9: C 1 4 4
10: C 1 4 4
I have the following dataset
#dataset
id attributes value
1 a,b,c 1
2 c,d 0
3 b,e 1
I wish to make a pivot table out of it and assign binary values to the attributes (1 if an attribute exists for that id, otherwise 0). My ideal output would be the following:
#output
id a b c d e Value
1 1 1 1 0 0 1
2 0 0 1 1 0 0
3 0 1 0 0 1 1
Any tip is really appreciated.
We split the 'attributes' column by ',', get the frequency with mtabulate from qdapTools, and cbind the result with the first and third columns.
library(qdapTools)
cbind(df1[1], mtabulate(strsplit(df1$attributes, ",")), df1[3])
# id a b c d e value
#1 1 1 1 1 0 0 1
#2 2 0 0 1 1 0 0
#3 3 0 1 0 0 1 1
With base R:
attributes <- sort(unique(unlist(strsplit(as.character(df$attributes), split=','))))
cols <- as.data.frame(matrix(rep(0, nrow(df)*length(attributes)), ncol=length(attributes)))
names(cols) <- attributes
df <- cbind.data.frame(df, cols)
df <- as.data.frame(t(apply(df, 1, function(x) {
  present <- strsplit(x['attributes'], split=',')
  x[unlist(present)] <- 1
  x
})))[c('id', attributes, 'value')]
df
id a b c d e value
1 1 1 1 1 0 0 1
2 2 0 0 1 1 0 0
3 3 0 1 0 0 1 1
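A more compact base R variant of the same split-and-flag idea is sketched below. It assumes the data frame is called df1 as in the first answer, rebuilds it from the question's table, and uses the made-up names parts, lv and ind. Unlike the apply() version, it keeps id and value numeric.

```r
df1 <- data.frame(id = 1:3,
                  attributes = c("a,b,c", "c,d", "b,e"),
                  value = c(1, 0, 1),
                  stringsAsFactors = FALSE)

# One character vector of attributes per row
parts <- strsplit(df1$attributes, ",", fixed = TRUE)

# All attribute names that occur anywhere, in sorted order
lv <- sort(unique(unlist(parts)))

# For each row, 1 where the attribute is present and 0 otherwise
ind <- t(vapply(parts, function(p) as.integer(lv %in% p), integer(length(lv))))
colnames(ind) <- lv

out <- cbind(df1["id"], ind, df1["value"])
out
```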
I have a data frame in R that I need to manipulate (pivot). At the simplest level the first few rows would look like the following:
Batch Unit Success InputGrouping
1 1 1 A
2 5 1 B
3 4 0 C
1 1 1 D
2 5 1 A
I would like to pivot this data so that the column names would be InputGrouping and the values would be 1 if it exists and 0 if not. Using above:
Batch Unit Success A B C D
1 1 1 1 0 0 1
2 5 1 1 1 0 0
3 4 0 0 0 1 0
I've looked at reshape/cast but can't figure out if this transformation is possible with the package. Any advice would be very much appreciated.
This is indeed possible using reshape2 with the function dcast().
Recreate your data:
dat <- read.table(header=TRUE, text="
Batch Unit Success InputGrouping
1 1 1 A
2 5 1 B
3 4 0 C
1 1 1 D
2 5 1 A")
Now recast the data:
library("reshape2")
dcast(Batch + Unit + Success ~ InputGrouping, data=dat, fun.aggregate = length)
The results:
Using InputGrouping as value column: use value.var to override.
Batch Unit Success A B C D
1 1 1 1 1 0 0 1
2 2 5 1 1 1 0 0
3 3 4 0 0 0 1 0
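Here's a possible solution using the data.table package. InputGrouping is converted to a factor first so that table() returns a count for every level within each group, including the levels that are absent from it:
library(data.table)
setDT(df)[, InputGrouping := factor(InputGrouping)] # assuming your data.frame is called df
df[, as.list(table(InputGrouping)), by = .(Batch, Unit, Success)]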
# Batch Unit Success A B C D
# 1: 1 1 1 1 0 0 1
# 2: 2 5 1 1 1 0 0
# 3: 3 4 0 0 0 1 0
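Both answers above return counts, which coincide with the requested 0/1 values only while each InputGrouping occurs at most once per Batch/Unit/Success combination. A base R sketch that caps the counts at 1 could look like this; the names id_cols, keys, grp, counts, ind and wide are illustrative and not part of either answer.

```r
dat <- read.table(header = TRUE, text = "
Batch Unit Success InputGrouping
1 1 1 A
2 5 1 B
3 4 0 C
1 1 1 D
2 5 1 A")

id_cols <- c("Batch", "Unit", "Success")

# One row per distinct Batch/Unit/Success combination, in order of first appearance
keys <- unique(dat[id_cols])

# Index of the matching key row for every row of dat
grp <- match(do.call(paste, dat[id_cols]), do.call(paste, keys))

# Count occurrences, then reduce them to a 0/1 indicator
counts <- table(grp, dat$InputGrouping)
ind <- matrix(as.integer(counts > 0), nrow = nrow(counts),
              dimnames = list(NULL, colnames(counts)))

wide <- cbind(keys, ind)
wide
```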