Mutate over every possible combination of columns in R

I have a data frame of binary variables:
df <- data.frame(a = c(0, 1, 0, 1, 0), b = c(1, 1, 0, 0, 1), c = c(1, 0, 1, 1, 0))
And I'd like to create a column for each possible combination of my pre-existing columns:
library(tidyverse)
df %>%
  mutate(d = case_when(a == 1 & b == 1 & c == 1 ~ 1),
         e = case_when(a == 1 & b == 1 & c != 1 ~ 1),
         f = case_when(a == 1 & b != 1 & c == 1 ~ 1),
         g = case_when(a != 1 & b == 1 & c == 1 ~ 1))
But my real dataset has too many columns to do this without a function or loop. Is there an easy way to do this in R?

First note that do.call(paste0, df) will combine all of your columns into one string per row, however many columns there are:
do.call(paste0, df)
# [1] "011" "110" "001" "101" "010" "011"
Then you can use spread() from the tidyr package to give each combination its own column. Note that you have to add an extra row column so that spread() knows to keep the rows separate (instead of trying to combine them).
# I added a sixth row that copied the first to make the effect clear
df <- data.frame(a = c(0, 1, 0, 1, 0, 0), b = c(1, 1, 0, 0, 1, 1), c = c(1, 0, 1, 1, 0, 1))
# this assumes you want `type_` at the start of each new column,
# but you could use a different convention
df %>%
  mutate(type = paste0("type_", do.call(paste0, df)),
         value = 1,
         row = row_number()) %>%
  spread(type, value, fill = 0) %>%
  select(-row)
Result:
a b c type_001 type_010 type_011 type_101 type_110
1 0 0 1 1 0 0 0 0
2 0 1 0 0 1 0 0 0
3 0 1 1 0 0 1 0 0
4 0 1 1 0 0 1 0 0
5 1 0 1 0 0 0 1 0
6 1 1 0 0 0 0 0 1
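If you are on a recent tidyr, note that spread() has been superseded by pivot_wider(); a sketch of the same pipeline (my addition, assuming tidyr >= 1.0.0; column order may differ from the spread() output):
df %>%
  mutate(type = paste0("type_", do.call(paste0, df)),
         value = 1,
         row = row_number()) %>%
  pivot_wider(names_from = type, values_from = value, values_fill = 0) %>%  # replaces spread(..., fill = 0)
  select(-row)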

An alternative to David's answer above, though I recognize it's a little awkward:
df %>%
  unite(comb, a:c, remove = FALSE) %>%
  spread(key = comb, value = comb) %>%
  mutate_if(is.character, funs(if_else(is.na(.), 0, 1)))
#> a b c 0_0_1 0_1_0 0_1_1 1_0_1 1_1_0
#> 1 0 0 1 1 0 0 0 0
#> 2 0 1 0 0 1 0 0 0
#> 3 0 1 1 0 0 1 0 0
#> 4 1 0 1 0 0 0 1 0
#> 5 1 1 0 0 0 0 0 1
EDIT: funs() is being deprecated as of version 0.8.0 of dplyr, so the last line should be revised to:
mutate_if(is.character, list(~ if_else(is.na(.), 0, 1)))
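In current dplyr (>= 1.0.0), that same line is usually written with across() and where(); a sketch:
mutate(across(where(is.character), ~ if_else(is.na(.x), 0, 1)))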

Related

extracting unique combinations from a long list of binary variables

I have a dataframe containing a long list of binary variables. Each row represents a participant, and columns represent whether a participant made a certain choice (1) or not (0). For the sake of simplicity, let's say there are only four binary variables and six participants.
df <- data.frame(a = c(0,1,0,1,0,1),
b = c(1,1,1,1,0,1),
c = c(0,0,0,1,1,1),
d = c(1,1,0,0,0,0))
> df
# a b c d
# 1 0 1 0 1
# 2 1 1 0 1
# 3 0 1 0 0
# 4 1 1 1 0
# 5 0 0 1 0
# 6 1 1 1 0
In the dataframe, I want to create a list of columns that reflect each unique combination of variables in df (i.e., abc, abd, acd, bcd). Then, for each row, I want to add the value 1 if the row contains the particular combination corresponding to the column. So, if a participant scored 1 on "a", "b", and "c", and 0 on "d", they would have a score of 1 in the newly created column "abc", but 0 in the other columns. Ideally, it would look something like this.
>df_updated
# a b c d abc abd acd bcd
# 1 0 1 0 1 0 0 0 0
# 2 1 1 0 1 0 1 0 0
# 3 0 1 0 0 0 0 0 0
# 4 1 1 1 0 1 0 0 0
# 5 0 0 1 0 0 0 0 0
# 6 1 1 1 0 1 0 0 0
The ultimate goal is to have an idea of the frequency of each of the combinations, so I can order them from the most frequently chosen to the least frequently chosen. I've been thinking about this issue for days now, but couldn't find an appropriate answer. I would very much appreciate the help.
Something like this?
funCombn <- function(data){
  f <- function(x, data){
    data <- data[x]
    list(
      name = paste(x, collapse = ""),
      vec = apply(data, 1, function(x) +all(as.logical(x)))
    )
  }
  res <- combn(names(data), 3, f, simplify = FALSE, data = data)
  out <- do.call(cbind.data.frame, lapply(res, '[[', 'vec'))
  names(out) <- sapply(res, '[[', 'name')
  cbind(data, out)
}
funCombn(df)
# a b c d abc abd acd bcd
#1 0 1 0 1 0 0 0 0
#2 1 1 0 1 0 1 0 0
#3 0 1 0 0 0 0 0 0
#4 1 1 1 0 1 0 0 0
#5 0 0 1 0 0 0 0 0
#6 1 1 1 0 1 0 0 0
Base R option using combn :
n <- 3
cbind(df, do.call(cbind, combn(names(df), n, function(x) {
setNames(data.frame(as.integer(rowSums(df[x] == 1) == n)),
paste0(x, collapse = ''))
}, simplify = FALSE))) -> result
result
# a b c d abc abd acd bcd
#1 0 1 0 1 0 0 0 0
#2 1 1 0 1 0 1 0 0
#3 0 1 0 0 0 0 0 0
#4 1 1 1 0 1 0 0 0
#5 0 0 1 0 0 0 0 0
#6 1 1 1 0 1 0 0 0
combn() creates all combinations of column names, taking n at a time. For each of those combinations, assign 1 to the rows where all n columns are 1, and 0 otherwise.
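If you ever need every combination size rather than just triples, the same idea can be looped over sizes; a sketch (my own extension of the answer above):
all_combs <- do.call(c, lapply(2:ncol(df), function(k) {
  combn(names(df), k, function(x) {
    setNames(data.frame(as.integer(rowSums(df[x] == 1) == k)),
             paste0(x, collapse = ''))
  }, simplify = FALSE)
}))
result_all <- cbind(df, do.call(cbind, all_combs))  # one indicator column per combination of any size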
If you are just looking for a frequency of the combinations (and they don't need to be back in the original data), then you could use something like this:
df <- data.frame(a = c(0,1,0,1,0,1),
b = c(1,1,1,1,0,1),
c = c(0,0,0,1,1,1),
d = c(1,1,0,0,0,0))
n <- names(df)
out <- sapply(n, function(x)ifelse(df[[x]] == 1, x, ""))
combs <- apply(out, 1, paste, collapse="")
sort(table(combs))
# combs
# abd b bd c abc
# 1 1 1 1 2
Ok, so let's use your data, including one row without any 1's:
df <- data.frame(
a = c(0,1,0,1,0,1,0),
b = c(1,1,1,1,0,1,0),
c = c(0,0,0,1,1,1,0),
d = c(1,1,0,0,0,0,0)
)
Now I want to paste all column names together if they have a 1, and then make that a wide table (so that all have a column for a combination). Of course, I fill all resulting NAs with 0's.
df2 <- df %>%
dplyr::mutate(
combination = paste0(
ifelse(a == 1, "a", ""), # There is possibly a way to automate this as well using across()
ifelse(b == 1, "b", ""),
ifelse(c == 1, "c", ""),
ifelse(d == 1, "d", "")
),
combination = ifelse(
combination == "",
"nothing",
paste0("comb_", combination)
),
value = ifelse(
is.na(combination),
0,
1
),
i = dplyr::row_number()
) %>%
tidyr::pivot_wider(
names_from = combination,
values_from = value,
names_repair = "unique"
) %>%
replace(., is.na(.), 0) %>%
dplyr::select(-i)
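As the comment in the code hints, the per-column ifelse() chain can be generated automatically; a small base-R sketch (my own suggestion, assuming every column of df is 0/1):
combination <- apply(df, 1, function(r) paste0(names(df)[r == 1], collapse = ""))
combination <- ifelse(combination == "", "nothing", paste0("comb_", combination))
combination
# should give: "comb_bd" "comb_abd" "comb_b" "comb_abc" "comb_c" "comb_abc" "nothing"
This vector can replace the manual ifelse() block inside the mutate() above.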
Since you want to order the original df by frequency, you can create a summary of all combinations (excluding those without anything filled in). Then you make it a long table, arrange it by frequency, and pull out the combination names in that order.
comb_in_order <- df2 %>%
dplyr::select(
-tidyselect::any_of(
c(
names(df),
"nothing" # I think you want these last.
)
)
) %>%
dplyr::summarise(
dplyr::across(
.cols = tidyselect::everything(),
.fns = sum
)
) %>%
tidyr::pivot_longer(
cols = tidyselect::everything(),
names_to = "combination",
values_to = "frequency"
) %>%
dplyr::arrange(
dplyr::desc(frequency)
) %>%
dplyr::pull(combination)
The only thing left to do is to reconstruct the original df by arranging df2 by those combination columns (in frequency order) and then selecting the original columns back.
df2 %>%
dplyr::arrange(
across(
tidyselect::any_of(comb_in_order),
desc
)
) %>%
dplyr::select(
tidyselect::any_of(names(df))
)
This should work for all possible combinations.

recode using ifelse clause within groups

I'm trying to set up a column (called 'combined') to indicate the combined information of Owner and Head within each group (Group). There is only one Owner in each group, and Head is basically the first row of each group, the one with the minimum ID value.
This combined column should flag 1 if the ID is flagged as Owner, and the rest of the IDs within that group will then be 0 regardless of the information in Head. However, for groups that do not have any Owner among their IDs (i.e. Owner is all 0 within the group), the column should take the Head information instead. My data looks like this, and the last column (combined) is the desired outcome.
sample <- data.frame(Group = c("46005589", "46005589","46005590","46005591", "46005591","46005592","46005592","46005592", "46005593", "46005594"), ID= c("189199", "2957073", "272448", "1872092", "10374996", "1153514", "2771118","10281300", "2610301", "3564526"), Owner = c(0, 1, 1, 0, 0, 0, 1, 0, 1, 1), Head = c(1, 0, 0, 1, 0, 1, 0, 0, 1, 1), combined = c(0, 1, 1, 1, 0, 0, 1, 0, 1, 1))
> sample
Group ID Owner Head combined
1 46005589 189199 0 1 0
2 46005589 2957073 1 0 1
3 46005590 272448 1 0 1
4 46005591 1872092 0 1 1
5 46005591 10374996 0 0 0
6 46005592 1153514 0 1 0
7 46005592 2771118 1 0 1
8 46005592 10281300 0 0 0
9 46005593 2610301 1 1 1
10 46005594 3564526 1 1 1
I've tried a few dplyr and ifelse approaches, but they didn't give the output I wanted. How should I recode this column? Thanks.
I don't think this is the best way, but you could inspect which groups have Owner equal to 0 throughout (for example with rowSums or a grouped sum) and then target those IDs with %in%. Here is a possible solution:
library(dplyr)
sample %>%
  mutate_at(vars(ID, Group), funs(as.factor)) %>%
  mutate(Combined = if_else(Owner == 1, 1, 0),
         NewCombi = ifelse(ID == "1872092", Head, Combined))
This yields the following; NewCombi is our target column:
# Group ID Owner Head Combined NewCombi
#1 46005589 189199 0 1 0 0
#2 46005589 2957073 1 0 1 1
#3 46005590 272448 1 0 1 1
#4 46005591 1872092 0 1 0 1
#5 46005591 10374996 0 0 0 0
#6 46005592 1153514 0 1 0 0
#7 46005592 2771118 1 0 1 1
#8 46005592 10281300 0 0 0 0
#9 46005593 2610301 1 1 1 1
#10 46005594 3564526 1 1 1 1
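For completeness, a sketch of the "groups with no Owner at all" idea without hard-coding the ID (my own generalisation, using tapply() for the per-group sums rather than rowSums()):
no_owner_groups <- names(which(tapply(sample$Owner, sample$Group, sum) == 0))  # groups where Owner is all 0
sample %>%
  mutate(NewCombi = ifelse(Group %in% no_owner_groups, Head, Owner))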
The new combined column can be created in two steps in dplyr: first use filter(all(Owner == 0)) to keep only the groups that contain no Owner and create a column holding their Head information, then merge this column back into the original dataframe and add it to the Owner column to obtain the combined info.
library(dplyr)
sample2 <- sample %>%
  group_by(Group) %>%
  filter(all(Owner == 0)) %>%          # keep only groups whose IDs have no Owner at all
  mutate(Head_nullowner = ifelse(Head == 1, 1, 0))
# merge Head_nullowner back into the original dataframe by both Group and ID
sample <- merge(sample, sample2[c("Group", "ID", "Head_nullowner")],
                by = c("Group", "ID"), all.x = TRUE)
sample$Head_nullowner[is.na(sample$Head_nullowner)] <- 0
sample$OwnerHead_combined <- sample$Owner + sample$Head_nullowner
> sample
Group ID Owner Head combined Head_nullowner OwnerHead_combined
1 46005589 189199 0 1 0 0 0
2 46005589 2957073 1 0 1 0 1
3 46005590 272448 1 0 1 0 1
4 46005591 10374996 0 0 0 0 0
5 46005591 1872092 0 1 1 1 1
6 46005592 10281300 0 0 0 0 0
7 46005592 1153514 0 1 0 0 0
8 46005592 2771118 1 0 1 0 1
9 46005593 2610301 1 1 1 0 1
10 46005594 3564526 1 1 1 0 1
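A more compact alternative (my own sketch, applied to the original sample data, not taken from either answer above): group by Group and fall back to Head only when the group contains no Owner at all.
library(dplyr)
sample %>%
  group_by(Group) %>%
  mutate(combined2 = if (any(Owner == 1)) Owner else Head) %>%  # the condition is a single value per group
  ungroup()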

Error: expecting a string in R

I am running the following code:
mydataframe <- mydataframe %>%
  mutate(newVar1 = abs(as.numeric(CanBe1 == 0 & lead(var_id, default = 0) == (var_id + 1)) - 1)) %>%
  group_by(pt, item) %>%
  mutate(newVar2 = abs(as.numeric(CanBe2 == 0 & lag(var_id, default = 0) == (var_id - 1)) - 1),
         newVar2 = ifelse(lag(newVar1, default = 0) == 1, 1, newVar2))
but I get an error: "Error: expecting a string". What does it mean? Where exactly should there be a string?
Here are few examples of the data I have, and I expect:
pt item var_id CanBe1 CanBe2 newVar1 newVar2
1 9 2 0 0 0 1
1 9 3 0 0 0 0
1 9 4 1 0 0 0
1 9 5 0 0 1 0
1 9 7 0 0 0 1
1 9 8 1 0 1 0
1 9 10 0 1 0 1
1 9 11 0 0 0 0
1 9 12 1 0 1 0
1 9 2 1 0 0 1
The variables I am using are:
class(mydataframe$pt) = `factor` #even if I change this one to `character` the code doesn't work
class(mydataframe$item) = `character`
class(mydataframe$var_id) = `character`
class(mydataframe$CanBe1) = `numeric`
class(mydataframe$CanBe2) = `numeric`
var_id is currently a character string, which clashes with the numeric arithmetic (var_id + 1) and the numeric default = 0 passed to lead()/lag(). Reclassifying it as numeric ahead of time will fix it.
mydataframe <- mydataframe %>%
  mutate(var_id = as.numeric(var_id),  # switch from character to numeric
         newVar1 = abs(as.numeric(CanBe1 == 0 & lead(var_id, default = 0) == (var_id + 1)) - 1)) %>%
  group_by(pt, item) %>%
  mutate(newVar2 = abs(as.numeric(CanBe2 == 0 & lag(var_id, default = 0) == (var_id - 1)) - 1),
         newVar2 = ifelse(lag(newVar1, default = 0) == 1, 1, newVar2))
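The error most likely comes from lead()/lag(): their default argument must match the type of the vector, and here a numeric 0 is supplied while var_id is character. A tiny illustration (a sketch; the exact message varies by dplyr version):
library(dplyr)
lead(c("2", "3", "4"), default = 0)  # errors: numeric default, character vector
lead(c(2, 3, 4), default = 0)        # works: types match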

Create a dataframe from a dataframe

I'd like to create a dataframe from a dataframe that I created before. My first dataframe is:
Sample motif chromosome
1 CT-G.A 1
1 TA-C.C 1
1 TC-G.C 2
2 CG-A.T 2
2 CA-G.T 2
Then I want to create a dataframe like the one below, with a column for every motif-chromosome combination (96 motifs * 24 chromosomes in the real data):
Sample CT-G.A,chr1 TA-C.C,chr1 TC-G.C,chr1 CG-A.T,chr1 CA-G.T,chr1 CT-G.A,chr2 TA-C.C,chr2 TC-G.C,chr2 CG-A.T,chr2 CA-G.T,chr2
1 1 1 0 0 0 0 0 1 0 0
2 0 0 0 0 0 0 0 0 1 1
Here is a possible solution using dplyr and tidyr.
We add a column value that indicates if a chromosome is present, then complete the data.frame, making sure we have rows for each motif-chromosome-Sample combination, where missing combinations get a 0 in the value column. We create a key out of the motif and chromosome columns, and then discard those columns. Lastly, we reshape the data.frame from long to wide (see here) to get your desired format. Hope this helps!
df = read.table(text="Sample motif chromosome
1 CT-G.A 1
1 TA-C.C 1
1 TC-G.C 2
2 CG-A.T 2
2 CA-G.T 2
2 CA-G.T 2",header=T)
library(tidyr)
library(dplyr)
df %>%
  mutate(value = 1) %>%
  complete(motif, chromosome, Sample, fill = list(value = 0)) %>%
  mutate(key = paste0(motif, ',chr', chromosome)) %>%
  group_by(Sample, key) %>%
  summarize(value = sum(value)) %>%
  spread(key, value) %>%
  as.data.frame
Output:
Sample CA-G.T,chr1 CA-G.T,chr2 CG-A.T,chr1 CG-A.T,chr2 CT-G.A,chr1 CT-G.A,chr2 TA-C.C,chr1 TA-C.C,chr2 TC-G.C,chr1 TC-G.C,chr2
1 1 0 0 0 0 1 0 1 0 0 1
2 2 0 2 0 1 0 0 0 0 0 0
This seems to be a classic case of when you would want to use factors and ensure that the empty factor levels aren't dropped (which dcast and other functions might do unless explicitly told not to).
Using #Florian's sample data, you can try:
library(data.table)
cols <- c("motif", "chromosome")
setDT(df)[, (cols) := lapply(.SD, factor), .SDcols = cols][
, dcast(unique(.SD)[, value := 1L],
Sample ~ motif + chromosome, value.var = "value",
fill = 0L, drop = FALSE)]
# Sample CA-G.T_1 CA-G.T_2 CG-A.T_1 CG-A.T_2 CT-G.A_1 CT-G.A_2 TA-C.C_1 TA-C.C_2 TC-G.C_1 TC-G.C_2
# 1 1 0 0 0 0 1 0 1 0 0 1
# 2 2 0 1 0 1 0 0 0 0 0 0
I've moved "cols" and myfun() outside of the transformation to save some typing and make things look a little more tidy.
Using the "tidyverse", I'd take a slightly different approach from #Florian, perhaps something like:
library(tidyverse)
df %>%
  mutate_at(c("motif", "chromosome"), factor) %>%
  mutate(value = 1) %>%
  distinct() %>%
  mutate(key = interaction(motif, chromosome)) %>%
  select(-motif, -chromosome) %>%
  spread(key, value, fill = 0, drop = FALSE)
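The spread() call is superseded in current tidyr; a sketch of the same approach with pivot_wider() (my addition; names_expand needs tidyr >= 1.2.0 to keep the empty factor levels):
df %>%
  mutate(across(c(motif, chromosome), factor)) %>%
  mutate(value = 1) %>%
  distinct() %>%
  mutate(key = interaction(motif, chromosome)) %>%
  select(-motif, -chromosome) %>%
  pivot_wider(names_from = key, values_from = value,
              values_fill = 0, names_expand = TRUE)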
Benchmarks
Benchmarks for these approaches and #Florian's, on 10,000 rows and 20 resulting columns, can be found at this Gist (results plot omitted here).
This will work for you. I have used the tidyr and dplyr packages, but I preferred expand.grid() from base R to fill in the missing combinations, followed by unite() and finally spread().
df <- read.table(text = "Sample motif chromosome
1 CT-G.A 1
1 TA-C.C 1
1 TC-G.C 2
2 CG-A.T 2
2 CA-G.T 2", header = TRUE)
#add a column to represent presence of chromosome
df$val <- 1
library(tidyr)
library(dplyr)
#Complete missing rows
df_complete <- left_join(
expand.grid(Sample=unique(df$Sample), motif=unique(df$motif),
chromosome=unique(df$chromosome)),
df, by = c("Sample", "motif", "chromosome"), copy = TRUE)
#Additional rows should have val = 0
df_complete$val[is.na(df_complete$val)] <- 0
df_complete %>%
unite(motif, c("motif", "chromosome"), sep = ",chr" ) %>%
spread(motif, val)
#Result
Sample CA-G.T,chr1 CA-G.T,chr2 CG-A.T,chr1 CG-A.T,chr2 CT-G.A,chr1 CT-G.A,chr2 TA-C.C,chr1 TA-C.C,chr2 TC-G.C,chr1 TC-G.C,chr2
1 1 0 0 0 0 1 0 1 0 0 1
2 2 0 1 0 1 0 0 0 0 0 0

use a string character-location identity to create a new variable

So I have been able to achieve my desired output, but I am sure that string operations (looking at a character's position within each value) could make this code much more efficient.
Let's play with this data:
set.seed(123)
A <- 1:100
type.a <- rnorm(100, mean=5000, sd=1433)
type.b <- rnorm(100, mean=5000, sd=1425)
type.c <- rnorm(100, mean=5000, sd=1125)
type.d <- rnorm(100, mean=5000, sd=1233)
df1 <- data.frame(A, type.a, type.b, type.c, type.d)
Now we want to create a new variable for df1 that identifies whether a type (a:d) value begins with the digit 1. So I have used this code:
df1$Type_1 <- with(df1, ifelse((type.a < 2000 & type.a > 999) | (type.b < 2000 & type.b > 999) |
                               (type.c < 2000 & type.c > 999) | (type.d < 2000 & type.d > 999), 1, 0))
Or similarly, this:
df1$type_1 <- with(df1, ifelse(type.a < 2000 & type.a > 999, 1,
                        ifelse(type.b < 2000 & type.b > 999, 1,
                        ifelse(type.c < 2000 & type.c > 999, 1,
                        ifelse(type.d < 2000 & type.d > 999, 1, 0)))))
Now my question has two parts:
How can you use a string operation that looks at only the first digit of type(a:d) to test whether it equals our constraint (in this instance, 1)?
Secondly, I have more than four columns of data, so I don't think it is efficient to specify the column names each time. Can [, x:y] indexing be used?
The code would then be used to create 9 new columns (i.e. type_1, type_2, ..., type_9), as the first digit of type(a:d) ranges from 1 to 9.
We can use substr to extract the first character of a string. As there are four columns that start with type, we can use grep to get the numeric index of columns, we loop the columns with lapply, check whether the 1st character is equal to 1. If we want to know whether there is at least one value that meets the condition, we can wrap it with any. Using lapply returns a list output with a length of 1 for each list element. As we need a binary (0/1) instead of logical (FALSE/TRUE), we can wrap with + to coerce the logical to binary representation.
indx <- grep('^type', colnames(df1))
lapply(df1[indx], function(x) +(any(substr(x, 1, 1)==1)))
If we need a vector output
vapply(df1[indx], function(x) +(any(substr(x, 1, 1)==1)), 1L)
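For the second part of the question (one indicator column per leading digit), the same substr() idea can be extended; a base-R sketch (my own extension, assuming df1 contains only A and the type.a-type.d columns, giving a 0/1 flag per row as in the asker's code rather than the per-row counts shown in the next answer):
firsts <- sapply(df1[indx], function(x) substr(x, 1, 1))     # matrix of first digits
for (d in 1:9) {
  df1[[paste0("type_", d)]] <- +(rowSums(firsts == d) > 0)   # does any type column start with d?
}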
Great and elegant answer by #akrun. I was interested in the 2nd part of your question, specifically how you're going to use the first part to create the 9 new columns you mention. I don't know if I'm missing something, but instead of checking each time whether the first digit matches 1, 2, 3, etc., you can simply capture that first digit. Something like this:
library(dplyr)
library(tidyr)
set.seed(123)
A <- 1:100
type.a <- rnorm(100, mean=5000, sd=1433)
type.b <- rnorm(100, mean=5000, sd=1425)
type.c <- rnorm(100, mean=5000, sd=1125)
type.d <- rnorm(100, mean=5000, sd=1233)
df1 <- data.frame(A, type.a, type.b, type.c, type.d)
df1 %>%
  group_by(A) %>%
  mutate_each(funs(substr(., 1, 1))) %>%   # keep first digit
  ungroup %>%
  gather(variable, type, -A) %>%           # create combinations of rows and digits
  select(-variable) %>%
  mutate(type = paste0("type_", type),
         value = 1) %>%
  group_by(A, type) %>%
  summarise(value = sum(value)) %>%        # count how many times the row belongs to each type
  ungroup %>%
  spread(type, value, fill = 0) %>%        # create the new columns
  inner_join(df1, by = "A") %>%            # join back initial info
  select(A, starts_with("type."), starts_with("type_"))  # order columns
# A type.a type.b type.c type.d type_1 type_2 type_3 type_4 type_5 type_6 type_7 type_8 type_9
# 1 1 4196.838 3987.671 7473.662 4118.106 0 0 1 2 0 0 1 0 0
# 2 2 4670.156 5366.059 6476.465 4071.935 0 0 0 2 1 1 0 0 0
# 3 3 7233.629 4648.464 4701.712 3842.782 0 0 1 2 0 0 1 0 0
# 4 4 5101.039 4504.752 5611.093 3702.251 0 0 1 1 2 0 0 0 0
# 5 5 5185.269 3643.944 4533.868 4460.982 0 0 1 2 1 0 0 0 0
# 6 6 7457.688 4935.835 4464.222 5408.344 0 0 0 2 1 0 1 0 0
# 7 7 5660.493 3881.511 4112.822 2516.478 0 1 1 1 1 0 0 0 0
# 8 8 3187.167 2623.183 4331.056 5261.372 0 1 1 1 1 0 0 0 0
# 9 9 4015.740 4458.177 6857.271 6524.820 0 0 0 2 0 2 0 0 0
# 10 10 4361.366 6309.570 4939.218 7512.329 0 0 0 2 0 1 1 0 0
# .. .. ... ... ... ... ... ... ... ... ... ... ... ... ...
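Note that mutate_each() has since been deprecated; in current dplyr the first step of this pipe would be written with across() (a sketch, assuming dplyr >= 1.0.0):
df1 %>%
  group_by(A) %>%
  mutate(across(everything(), ~ substr(.x, 1, 1))) %>%  # the grouping column A is left untouched
  ungroup()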
Example when we have columns A and B in the beginning:
library(dplyr)
library(tidyr)
set.seed(123)
A <- 1:100
B <- 101:200
type.a <- rnorm(100, mean=5000, sd=1433)
type.b <- rnorm(100, mean=5000, sd=1425)
type.c <- rnorm(100, mean=5000, sd=1125)
type.d <- rnorm(100, mean=5000, sd=1233)
df1 <- data.frame(A,B, type.a, type.b, type.c, type.d)
# work by grouping on A and B
df1 %>%
group_by(A,B) %>%
mutate_each(funs(substr(.,1,1))) %>%
ungroup %>%
gather(variable, type, -c(A,B)) %>%
select(-variable) %>%
mutate(type = paste0("type_",type),
value = 1) %>%
group_by(A,B,type) %>%
summarise(value = sum(value)) %>%
ungroup %>%
spread(type, value, fill=0) %>%
inner_join(df1, by=c("A","B")) %>%
select(A,B, starts_with("type."), starts_with("type_"))
# A B type.a type.b type.c type.d type_1 type_2 type_3 type_4 type_5 type_6 type_7 type_8 type_9
# 1 1 101 4196.838 3987.671 7473.662 4118.106 0 0 1 2 0 0 1 0 0
# 2 2 102 4670.156 5366.059 6476.465 4071.935 0 0 0 2 1 1 0 0 0
# 3 3 103 7233.629 4648.464 4701.712 3842.782 0 0 1 2 0 0 1 0 0
# 4 4 104 5101.039 4504.752 5611.093 3702.251 0 0 1 1 2 0 0 0 0
# 5 5 105 5185.269 3643.944 4533.868 4460.982 0 0 1 2 1 0 0 0 0
# 6 6 106 7457.688 4935.835 4464.222 5408.344 0 0 0 2 1 0 1 0 0
# 7 7 107 5660.493 3881.511 4112.822 2516.478 0 1 1 1 1 0 0 0 0
# 8 8 108 3187.167 2623.183 4331.056 5261.372 0 1 1 1 1 0 0 0 0
# 9 9 109 4015.740 4458.177 6857.271 6524.820 0 0 0 2 0 2 0 0 0
# 10 10 110 4361.366 6309.570 4939.218 7512.329 0 0 0 2 0 1 1 0 0
# .. .. ... ... ... ... ... ... ... ... ... ... ... ... ... ...
However, in this case you should notice that you have one A value for each line. So, B isn't really needed in order to define your rows (in a unique way). Therefore, you can work exactly as before (when B wasn't there) and just join B to your result:
df1 %>%
select(-B) %>%
group_by(A) %>%
mutate_each(funs(substr(.,1,1))) %>%
ungroup %>%
gather(variable, type, -A) %>%
select(-variable) %>%
mutate(type = paste0("type_",type),
value = 1) %>%
group_by(A,type) %>%
summarise(value = sum(value)) %>% # count how many times the row belongs to each type
ungroup %>%
spread(type, value, fill=0) %>%
inner_join(df1, by="A") %>%
mutate(B=B) %>%
select(A,B, starts_with("type."), starts_with("type_"))
