Faster alternative to nested for loops in R

Faster alternative to nested for loops in R - r

Here is the scenario: I have a sample in which subjects are placed into any of three groups. Next, subjects from each group are grouped together, resulting in several "triplets" consisting of a subject from each group. I would like to count the number of times a subject from a given group (1, 2, or 3) is grouped with a subject i of a different original group.
Here is a simple code example:
data <- cbind(c(1:9), c(rep("Group 1", 3), rep("Group 2", 3), rep("Group 3", 3)))
data <- data.frame(data)
names(data) <- c("ID", "Group")
groups.of.3 <- data.frame(rbind(c(1,4,7),c(2,4,7),c(2,5,7),c(3,6,8),c(3,6,9)))
N <- nrow(data)
n1 <- nrow(data[data$Group == "Group 1", ])
n2 <- nrow(data[data$Group == "Group 2", ])
n3 <- nrow(data[data$Group == "Group 3", ])
# Check the number of times a subject from a group is grouped with a subject i
# from another group
M1 <- matrix(0, nrow = N, ncol = n1)
M2 <- matrix(0, nrow = N, ncol = n2)
M3 <- matrix(0, nrow = N, ncol = n3)
for (i in 1:N){
if (data$Group[i] != "Group 1"){
for (j in 1:n1){
M1[i,j] <- nrow(groups.of.3[groups.of.3[,1] == j &
(groups.of.3[,2] == i |
groups.of.3[,3] == i), ])
}
}
if (data$Group[i] != "Group 2"){
for (j in 1:n2){
M2[i,j] <- nrow(groups.of.3[groups.of.3[,2] == (n1 + j) &
(groups.of.3[,1] == i |
groups.of.3[,3] == i), ])
}
}
if (data$Group[i] != "Group 3"){
for (j in 1:n3){
M3[i,j] <- nrow(groups.of.3[groups.of.3[,3] == (n1 + n2 + j) &
(groups.of.3[,1] == i |
groups.of.3[,2] == i), ])
}
}
}
So I have 9 subjects, with three from each group. And then subjects from each group are subsequently grouped together (allowing for repetition of placement). This takes a lot longer with more subjects, and I am wondering if there is a faster alternative that avoids using for loops.
For instance, the matrix M1 consists of how many times subjects in Group 1 were subsequently grouped with other subjects from any other group:
M1
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 0 0 0
[3,] 0 0 0
[4,] 1 1 0
[5,] 0 1 0
[6,] 0 0 2
[7,] 1 2 0
[8,] 0 0 1
[9,] 0 0 1
So the 3 columns represent the three subjects from Group 1, and the rows represent all subjects - the entries are how many times each subject from Group 1 is grouped with any of the other subjects (e.g., according to groups.of.3, subject 3 appears in a group with subject 6 twice, and subject 1 with subject 7 once).
Thanks for any help!

Something like this?
library(tidyr)
library(dplyr)
data <- data %>%
mutate(ID = as.numeric(levels(ID))[ID])
tmp <- groups.of.3 %>%
add_rownames() %>%
gather("X", "Person", -rowname) %>%
inner_join(data, by = c("Person" = "ID"))
tmp %>%
inner_join(tmp, by = c("rowname")) %>%
filter(Group.x != Group.y) %>%
group_by(Person.x, Group.x, Group.y) %>%
summarise(N = n()) %>%
spread(key = Group.y, value = N, fill = 0)
Person.x Group.x Group 1 Group 2 Group 3
(dbl) (fctr) (dbl) (dbl) (dbl)
1 1 Group 1 0 1 1
2 2 Group 1 0 2 2
3 3 Group 1 0 2 2
4 4 Group 2 2 0 2
5 5 Group 2 1 0 1
6 6 Group 2 2 0 2
7 7 Group 3 3 3 0
8 8 Group 3 1 1 0
9 9 Group 3 1 1 0

For loops aren't inherently slow:
# coerce the fields in groups.of.3 to factor
for(i in 1:3)
groups.of.3[,i] <- as.factor(groups.of.3[,i],levels =data$ID)
M <- matrix(0, N, N)
out <- NULL
for(i in 1:(3-1))
for(j in (i+1):3)
M <- M + table(groups.of.3[,i],groups.of.3[,j])
M1 <- M[,as.integer(data$Group)==1]
M2 <- M[,as.integer(data$Group)==2]
M3 <- M[,as.integer(data$Group)==3]

I'll answer my own question, using a very slight modification of Thierry's answer:
library(tidyr)
library(dplyr)
data <- data %>%
mutate(ID = as.numeric(levels(ID))[ID])
tmp <- groups.of.3 %>%
add_rownames() %>%
gather("X", "Person", -rowname) %>%
inner_join(data, by = c("Person" = "ID"))
tmp %>%
inner_join(tmp, by = c("rowname")) %>%
filter(Group.x != Group.y) %>%
group_by(Person.x, Group.x, Person.y) %>%
summarise(N = n()) %>%
spread(key = Person.y, value = N, fill = 0)
This gives the following output, which includes M1, M2, and M3 from the previous for loop, adjoined together.
Source: local data frame [9 x 11]
Person.x Group.x 1 2 3 4 5 6 7 8 9
(dbl) (fctr) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
1 1 Group 1 0 0 0 1 0 0 1 0 0
2 2 Group 1 0 0 0 1 1 0 2 0 0
3 3 Group 1 0 0 0 0 0 2 0 1 1
4 4 Group 2 1 1 0 0 0 0 2 0 0
5 5 Group 2 0 1 0 0 0 0 1 0 0
6 6 Group 2 0 0 2 0 0 0 0 1 1
7 7 Group 3 1 2 0 2 1 0 0 0 0
8 8 Group 3 0 0 1 0 0 1 0 0 0
9 9 Group 3 0 0 1 0 0 1 0 0 0

Related

extracting unique combinations from a long list of binary variables

I have a dataframe containing a long list of binary variables. Each row represents a participant, and columns represent whether a participant made a certain choice (1) or not (0). For the sakes of simplicity, let's say there's only four binary variables and 6 participants.
df <- data.frame(a = c(0,1,0,1,0,1),
b = c(1,1,1,1,0,1),
c = c(0,0,0,1,1,1),
d = c(1,1,0,0,0,0))
>df
# a b c d
# 1 0 1 0 1
# 2 1 1 0 1
# 3 0 1 0 0
# 4 1 1 1 0
# 5 0 0 1 0
# 6 1 1 1 0
In the dataframe, I want to create a list of columns that reflect each unique combination of variables in df (i.e., abc, abd, bcd, cda). Then, for each row, I want to add value "1" if the row contains the particular combination corresponding to the column. So, if the participant scored 1 on "a", "b", and "c", and 0 on "d" he would have a score 1 in the newly created column "abc", but 0 in the other columns. Ideally, it would look something like this.
>df_updated
# a b c d abc abd bcd cda
# 1 0 1 0 1 0 0 0 0
# 2 1 1 0 1 0 1 0 0
# 3 0 1 0 0 0 0 0 0
# 4 1 1 1 0 1 0 0 0
# 5 0 0 1 0 0 0 0 0
# 6 1 1 1 0 0 0 0 0
The ultimate goal is to have an idea of the frequency of each of the combinations, so I can order them from the most frequently chosen to the least frequently chosen. I've been thinking about this issue for days now, but couldn't find an appropriate answer. I would very much appreciate the help.

Something like this?
funCombn <- function(data){
f <- function(x, data){
data <- data[x]
list(
name = paste(x, collapse = ""),
vec = apply(data, 1, function(x) +all(as.logical(x)))
)
}
res <- combn(names(df), 3, f, simplify = FALSE, data = df)
out <- do.call(cbind.data.frame, lapply(res, '[[', 'vec'))
names(out) <- sapply(res, '[[', 'name')
cbind(data, out)
}
funCombn(df)
# a b c d abc abd acd bcd
#1 0 1 0 1 0 0 0 0
#2 1 1 0 1 0 1 0 0
#3 0 1 0 0 0 0 0 0
#4 1 1 1 0 1 0 0 0
#5 0 0 1 0 0 0 0 0
#6 1 1 1 0 1 0 0 0

Base R option using combn :
n <- 3
cbind(df, do.call(cbind, combn(names(df), n, function(x) {
setNames(data.frame(as.integer(rowSums(df[x] == 1) == n)),
paste0(x, collapse = ''))
}, simplify = FALSE))) -> result
result
# a b c d abc abd acd bcd
#1 0 1 0 1 0 0 0 0
#2 1 1 0 1 0 1 0 0
#3 0 1 0 0 0 0 0 0
#4 1 1 1 0 1 0 0 0
#5 0 0 1 0 0 0 0 0
#6 1 1 1 0 1 0 0 0
Using combn create all combinations of column names taking n columns at a time. For each of those combinations assign 1 to those rows where all the 3 combinations are 1 or 0 otherwise.

If you are just looking for a frequency of the combinations (and they don't need to be back in the original data), then you could use something like this:
df <- data.frame(a = c(0,1,0,1,0,1),
b = c(1,1,1,1,0,1),
c = c(0,0,0,1,1,1),
d = c(1,1,0,0,0,0))
n <- names(df)
out <- sapply(n, function(x)ifelse(df[[x]] == 1, x, ""))
combs <- apply(out, 1, paste, collapse="")
sort(table(combs))
# combs
# abd b bd c abc
# 1 1 1 1 2

Ok, so let's use your data, including one row without any 1's:
df <- data.frame(
a = c(0,1,0,1,0,1,0),
b = c(1,1,1,1,0,1,0),
c = c(0,0,0,1,1,1,0),
d = c(1,1,0,0,0,0,0)
)
Now I want to paste all column names together if they have a 1, and then make that a wide table (so that all have a column for a combination). Of course, I fill all resulting NAs with 0's.
df2 <- df %>%
dplyr::mutate(
combination = paste0(
ifelse(a == 1, "a", ""), # There is possibly a way to automate this as well using across()
ifelse(b == 1, "b", ""),
ifelse(c == 1, "c", ""),
ifelse(d == 1, "d", "")
),
combination = ifelse(
combination == "",
"nothing",
paste0("comb_", combination)
),
value = ifelse(
is.na(combination),
0,
1
),
i = dplyr::row_number()
) %>%
tidyr::pivot_wider(
names_from = combination,
values_from = value,
names_repair = "unique"
) %>%
replace(., is.na(.), 0) %>%
dplyr::select(-i)
Since you want to order the original df by frequency, you can create a summary of all combinations (excluding those without anything filled in). Then you just make it a long table and pull the column for every combination (arranged by frequency) from the table.
comb_in_order <- df2 %>%
dplyr::select(
-tidyselect::any_of(
c(
names(df),
"nothing" # I think you want these last.
)
)
) %>%
dplyr::summarise(
dplyr::across(
.cols = tidyselect::everything(),
.fns = sum
)
) %>%
tidyr::pivot_longer(
cols = tidyselect::everything(),
names_to = "combination",
values_to = "frequency"
) %>%
dplyr::arrange(
dplyr::desc(frequency)
) %>%
dplyr::pull(combination)
The only thing to do then is to reconstruct the original df by these after arranging by the columns.
df2 %>%
dplyr::arrange(
across(
tidyselect::any_of(comb_in_order),
desc
)
) %>%
dplyr::select(
tidyselect::any_of(names(df))
)
This should work for all possible combinations.

Calculate number of time streak of categories change in a row in R

I have the following data frame in R:
Row number A B C D E F G H I J
1 1 1 0 0 1 0 0 1 1
2 1 0 0 0 1 0 0 1
3 1 0 0 0 1 0 0 1 1
I am trying to calculate the number of times the number changes between 1 and 0 excluding the Nulls
The result I am expecting is this
Row Number No of changes
---------- --------------
1 4
2 4
3 4
An explanation for row 1
In row 1, A has a null so we exclude that.
B and C have 1 which is our first set of values.
D and E have 0 which is our second set of values. Now Change = 1
F has our third set of values which is 1. Now Change = 1+1
G and H have 0 which is our third set of values. Now Change = 1+1+1
I and J have 1 which is our fourth set of values. Now Change = 1+1+1+1 =4

Here's a tidyverse approach.
I gather into longer format (from tidyr::pivot_longer), then add a helper column noting when we have a change from 0 to 1 or from 1 to 0, and then sum those by row.
library(tidyverse)
df %>%
# before tidyr 1.0, this would be gather(col, value, -1)
pivot_longer(-1, "col") %>%
group_by(Row.number) %>%
mutate(chg = value == 1 & lag(value) == 0 |
value == 0 & lag(value) == 1) %>%
summarize(no_chgs = sum(chg, na.rm = T))
# A tibble: 3 x 2
Row.number no_chgs
<int> <int>
1 1 4
2 2 4
3 3 4
Sample data:
df <- read.table(
header = T,
stringsAsFactors = F,
text = "'Row number' A B C D E F G H I J
1 NA 1 1 0 0 1 0 0 1 1
2 NA NA 1 0 0 0 1 0 0 1
3 NA 1 0 0 0 1 0 0 1 1")

Here's a data.table solution:
library(data.table)
dt <- as.data.table(df)
dt[,
no_change := max(rleid(na.omit(t(.SD)))) - 1,
by = RowNumber
]
dt
Alternatively, here's a base version:
apply(df[, -1],
1,
function(x) {
complete_case = complete.cases(x)
if (sum(complete_case) > 0) {
return(length(rle(x[complete_case])$lengths) - 1)
} else {
return (0)
}
}
)

dummy variable columns based on strings from other columns [duplicate]

This question already has answers here:
Dummy variables from a string variable
(7 answers)
Closed 3 years ago.
I have a database with patient id number and the treatment they recived. I would like to have a dummy column for every different INDIVIDUAL treatment (ie, as in did the patient recieve treatment A,B,C,D).
This is way simplified because I have over 20 treatments and thousands of patients, and I can't figure out a simple way to do so.
example <- data.frame(id_number = c(0, 1, 2, 3, 4),
treatment = c("A", "A+B+C+D", "C+B", "B+A", "C"))
I would like to have something like this:
desired_result <- data.frame(id_number = c(0, 1, 2, 3, 4),
treatment = c("A", "A+B+C+D", "C+B", "B+A","C"),
A=c(1,1,0,1,0),
B=c(0,1,1,1,0),
C=c(0,1,1,0,1),
D=c(0,1,0,0,0))

A base version:
example["A"] <- as.numeric(grepl("A", example[,"treatment"]))
example["B"] <- as.numeric(grepl("B", example[,"treatment"]))
example["C"] <- as.numeric(grepl("C", example[,"treatment"]))
example["D"] <- as.numeric(grepl("D", example[,"treatment"]))
example
id_number treatment A B C D
1 0 A 1 0 0 0
2 1 A+B+C+D 1 1 1 1
3 2 C+B 0 1 1 0
4 3 B+A 1 1 0 0
5 4 C 0 0 1 0
The grepl function tests the presence of each pattern in each row, and as.numeric changes the logical TRUE/FALSE to 1/0

One tidyverse possibility could be:
example %>%
mutate(treatment2 = strsplit(treatment, "+", fixed = TRUE)) %>%
unnest() %>%
spread(treatment2, treatment2) %>%
mutate_at(vars(-id_number, -treatment), ~ (!is.na(.)) * 1)
id_number treatment A B C D
1 0 A 1 0 0 0
2 1 A+B+C+D 1 1 1 1
3 2 C+B 0 1 1 0
4 3 B+A 1 1 0 0
5 4 C 0 0 1 0
Or:
example %>%
mutate(treatment2 = strsplit(treatment, "+", fixed = TRUE)) %>%
unnest() %>%
mutate(val = 1) %>%
spread(treatment2, val, fill = 0)

How to repeat a code in R fulfilling a condition across repeats

I need to repeat a code 24 times (for 24 different participants), making sure that overall, for each Scene2 in each Trial and Route, I have the same number of 1 and 0 across the columns Random of each participant (i.e., Part.1, Part.2, Part.3, etc.) when the Target is equal to 0.
Here is the code I am using:
Scene2 = rep(c(1:10), times=9)
myDF2 <- data.frame(Scene2)
myDF2$Target <- rep(0,10, each=9)
myDF2$Target[myDF2$Scene2==7] <- 1
myDF2$Trial <- rep(c(1:9),each=10)
myDF2$Route <- rep(LETTERS[1:6], each=10, length=nrow(myDF2))
library(plyr)
myDF3 <- myDF2 %>% group_by(Trial, Route) %>% mutate(Random = ifelse(myDF2$Target==0,sample(c(rep(0,5),rep(1,5))),1)) %>% as.data.frame()
I need to obtain something like this:
Scene2 Target Trial Route Part.1 Part.2 Part.3 Part.4 … Part.24 Tot.1 Tot.0
1 0 1 A 0 1 1 0 0 12 12
2 0 1 A 1 0 1 0 0 12 12
3 0 1 A 1 0 0 0 0 12 12
4 0 1 A 0 1 0 1 0 12 12
5 0 1 A 1 0 1 1 0 12 12
6 0 1 A 1 0 0 0 1 12 12
7 1 1 A 1 1 1 1 1 24 0
8 0 1 A 0 0 1 1 1 12 12
9 0 1 A 0 1 1 1 1 12 12
10 0 1 A 0 1 0 0 1 12 12
How to achieve this? Any suggestion would be very much appreciated.

Since there's some conditional logic here that needs to meet particular specifications, I think this is easier to do with a function.
Scene2 = rep(c(1:10), times=9)
myDF2 <- data.frame(Scene2)
myDF2$Target <- rep(0,10, each=9)
myDF2$Target[myDF2$Scene2==7] <- 1
myDF2$Trial <- rep(c(1:9),each=10)
myDF2$Route <- rep(LETTERS[1:6], each=10, length=nrow(myDF2))
library(tidyverse)
fill_random_columns <- function(df, reps) {
# Start a loop with a counter
for (i in 1:reps) {
# Create a vector of 1s and 0s for filling rows
bag <- c(rep(0, 12), rep(1, 12))
# Build up conditional data frame of 1s and 0s
row_vector <- as.data.frame(t(sapply(df$Target, function(v) {
if (v == 1) return(rep(1, reps))
else (return(sample(bag, reps)))
})))
}
# Create column names
colnames <- lapply(1:reps, function(i) {paste0("Part.", i)})
# Name columns and sum up rows
row_vector <- row_vector %>%
`colnames<-`(colnames) %>%
mutate(Total = rowSums(.))
# Attach to original data frame
df <- bind_cols(df, row_vector)
return(df)
}
myDF3 <- myDF2 %>%
group_by(Trial, Route) %>%
fill_random_columns(., 24)

use a string character-location identity to create a new variable

So I have been able to achieve my desired output, but I am sure that one can use a string to achieve a much more efficient code.
Let play with this data
set.seed(123)
A <- 1:100
type.a <- rnorm(100, mean=5000, sd=1433)
type.b <- rnorm(100, mean=5000, sd=1425)
type.c <- rnorm(100, mean=5000, sd=1125)
type.d <- rnorm(100, mean=5000, sd=1233)
df1 <- data.frame(A, type.a, type.b, type.c, type.d)
Now we want to create a new variable for df1 that will identity if a type(a:d) begun with number 1. So I have used this code:
df1$Type_1 <- with(df1, ifelse((type.a < 2000 & type.a > 999)|(type.b < 2000 & type.c > 999)|
(type.c < 2000 & type.c > 999)|(type.d < 2000 & type.d > 999), 1,0))
Or similiarly, this also:
df1$type_1 <- with(df1, ifelse(type.a < 2000 & type.a > 999, 1,
ifelse(type.b < 2000 & type.c > 999, 1,
ifelse(type.c < 2000 & type.c > 999, 1,
ifelse(type.d < 2000 & type.d > 999, 1,0)))))
Now my question form two parts
How can you use a string which will look at only the first digit of type(a:d) to test if it is equal to our constraint. (in this instance equal to 1)
Secondly, I have more than four columns of data. Thus I dont think it is efficient I specify column names each time. Can the use of [,x:y] be used?
The code then be used to create 9 new columns of data (ie. type_1 & type_2 ... type_9), as the first digit of our type(a:d) has a range of 1:9

We can use substr to extract the first character of a string. As there are four columns that start with type, we can use grep to get the numeric index of columns, we loop the columns with lapply, check whether the 1st character is equal to 1. If we want to know whether there is at least one value that meets the condition, we can wrap it with any. Using lapply returns a list output with a length of 1 for each list element. As we need a binary (0/1) instead of logical (FALSE/TRUE), we can wrap with + to coerce the logical to binary representation.
indx <- grep('^type', colnames(df1))
lapply(df1[indx], function(x) +(any(substr(x, 1, 1)==1)))
If we need a vector output
vapply(df1[indx], function(x) +(any(substr(x, 1, 1)==1)), 1L)

Great and elegant answer by #akrun. I was interested in the 2nd part of your question. Specifically about how you're going to use the first part to create the new 9 columns you mention. I don't know if I'm missing something, but instead of checking each time if the first element matches 1,2,3, etc. you can just simply capture that first element. Something like this:
library(dplyr)
library(tidyr)
set.seed(123)
A <- 1:100
type.a <- rnorm(100, mean=5000, sd=1433)
type.b <- rnorm(100, mean=5000, sd=1425)
type.c <- rnorm(100, mean=5000, sd=1125)
type.d <- rnorm(100, mean=5000, sd=1233)
df1 <- data.frame(A, type.a, type.b, type.c, type.d)
df1 %>%
group_by(A) %>%
mutate_each(funs(substr(.,1,1))) %>% # keep first digit
ungroup %>%
gather(variable, type, -A) %>% # create combinations of rows and digits
select(-variable) %>%
mutate(type = paste0("type_",type),
value = 1) %>%
group_by(A,type) %>%
summarise(value = sum(value)) %>% # count how many times the row belongs to each type
ungroup %>%
spread(type, value, fill=0) %>% # create the new columns
inner_join(df1, by="A") %>% # join back initial info
select(A, starts_with("type."), starts_with("type_")) # order columns
# A type.a type.b type.c type.d type_1 type_2 type_3 type_4 type_5 type_6 type_7 type_8 type_9
# 1 1 4196.838 3987.671 7473.662 4118.106 0 0 1 2 0 0 1 0 0
# 2 2 4670.156 5366.059 6476.465 4071.935 0 0 0 2 1 1 0 0 0
# 3 3 7233.629 4648.464 4701.712 3842.782 0 0 1 2 0 0 1 0 0
# 4 4 5101.039 4504.752 5611.093 3702.251 0 0 1 1 2 0 0 0 0
# 5 5 5185.269 3643.944 4533.868 4460.982 0 0 1 2 1 0 0 0 0
# 6 6 7457.688 4935.835 4464.222 5408.344 0 0 0 2 1 0 1 0 0
# 7 7 5660.493 3881.511 4112.822 2516.478 0 1 1 1 1 0 0 0 0
# 8 8 3187.167 2623.183 4331.056 5261.372 0 1 1 1 1 0 0 0 0
# 9 9 4015.740 4458.177 6857.271 6524.820 0 0 0 2 0 2 0 0 0
# 10 10 4361.366 6309.570 4939.218 7512.329 0 0 0 2 0 1 1 0 0
# .. .. ... ... ... ... ... ... ... ... ... ... ... ... ...
Example when we have column A and B in the beginning:
library(dplyr)
library(tidyr)
set.seed(123)
A <- 1:100
B <- 101:200
type.a <- rnorm(100, mean=5000, sd=1433)
type.b <- rnorm(100, mean=5000, sd=1425)
type.c <- rnorm(100, mean=5000, sd=1125)
type.d <- rnorm(100, mean=5000, sd=1233)
df1 <- data.frame(A,B, type.a, type.b, type.c, type.d)
# work by grouping on A and B
df1 %>%
group_by(A,B) %>%
mutate_each(funs(substr(.,1,1))) %>%
ungroup %>%
gather(variable, type, -c(A,B)) %>%
select(-variable) %>%
mutate(type = paste0("type_",type),
value = 1) %>%
group_by(A,B,type) %>%
summarise(value = sum(value)) %>%
ungroup %>%
spread(type, value, fill=0) %>%
inner_join(df1, by=c("A","B")) %>%
select(A,B, starts_with("type."), starts_with("type_"))
# A B type.a type.b type.c type.d type_1 type_2 type_3 type_4 type_5 type_6 type_7 type_8 type_9
# 1 1 101 4196.838 3987.671 7473.662 4118.106 0 0 1 2 0 0 1 0 0
# 2 2 102 4670.156 5366.059 6476.465 4071.935 0 0 0 2 1 1 0 0 0
# 3 3 103 7233.629 4648.464 4701.712 3842.782 0 0 1 2 0 0 1 0 0
# 4 4 104 5101.039 4504.752 5611.093 3702.251 0 0 1 1 2 0 0 0 0
# 5 5 105 5185.269 3643.944 4533.868 4460.982 0 0 1 2 1 0 0 0 0
# 6 6 106 7457.688 4935.835 4464.222 5408.344 0 0 0 2 1 0 1 0 0
# 7 7 107 5660.493 3881.511 4112.822 2516.478 0 1 1 1 1 0 0 0 0
# 8 8 108 3187.167 2623.183 4331.056 5261.372 0 1 1 1 1 0 0 0 0
# 9 9 109 4015.740 4458.177 6857.271 6524.820 0 0 0 2 0 2 0 0 0
# 10 10 110 4361.366 6309.570 4939.218 7512.329 0 0 0 2 0 1 1 0 0
# .. .. ... ... ... ... ... ... ... ... ... ... ... ... ... ...
However, in this case you should notice that you have one A value for each line. So, B isn't really needed in order to define your rows (in a unique way). Therefore, you can work exactly as before (when B wasn't there) and just join B to your result:
df1 %>%
select(-B) %>%
group_by(A) %>%
mutate_each(funs(substr(.,1,1))) %>%
ungroup %>%
gather(variable, type, -A) %>%
select(-variable) %>%
mutate(type = paste0("type_",type),
value = 1) %>%
group_by(A,type) %>%
summarise(value = sum(value)) %>% # count how many times the row belongs to each type
ungroup %>%
spread(type, value, fill=0) %>%
inner_join(df1, by="A") %>%
mutate(B=B) %>%
select(A,B, starts_with("type."), starts_with("type_"))

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Faster alternative to nested for loops in R - r

Related

extracting unique combinations from a long list of binary variables

Calculate number of time streak of categories change in a row in R

dummy variable columns based on strings from other columns [duplicate]

How to repeat a code in R fulfilling a condition across repeats

use a string character-location identity to create a new variable

Categories

Resources