Combine two sequences of data - r

I have two sequences of data (with five variables in each sequence) that I want to combine accordingly into one using this rubric:
variable in sequence 1    variable in sequence 2    variable in combined sequence
0                         0                         1
0                         1                         2
1                         0                         3
1                         1                         4
Here are some example data:
set.seed(145)
mm <- matrix(0, 5, 10)
df <- data.frame(apply(mm, c(1,2), function(x) sample(c(0,1),1)))
colnames(df) <- c("s1_1", "s1_2", "s1_3", "s1_4", "s1_5", "s2_1", "s2_2", "s2_3", "s2_4", "s2_5")
> df
s1_1 s1_2 s1_3 s1_4 s1_5 s2_1 s2_2 s2_3 s2_4 s2_5
1 1 0 0 0 0 0 1 1 0 0
2 1 1 1 0 1 1 0 0 0 0
3 1 1 0 0 0 1 1 0 1 1
4 0 0 1 0 1 1 0 1 0 1
5 0 1 0 0 1 0 0 1 1 0
Here s1_1 represents variable 1 in sequence 1, s2_1 represents variable 1 in sequence 2, and so on. For example, in row 1, s1_1 = 1 and s2_1 = 0, so variable 1 in the combined sequence would be coded as 3. How do I do this in R?

Here's a way -
return_value <- function(x, y) {
dplyr::case_when(x == 0 & y == 0 ~ 1,
x == 0 & y == 1 ~ 2,
x == 1 & y == 0 ~ 3,
x == 1 & y == 1 ~ 4)
}
sapply(split.default(df, sub('.*_', '', names(df))), function(x)
return_value(x[[1]], x[[2]]))
# 1 2 3 4 5
#[1,] 3 2 2 1 1
#[2,] 4 3 3 1 3
#[3,] 4 4 1 2 2
#[4,] 2 1 4 1 4
#[5,] 1 3 2 2 3
split.default splits the columns by variable number (the suffix after the underscore), so each piece holds the matching s1 and s2 columns; sapply then applies return_value to the two columns in each piece.
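Because the rubric is just a two-digit binary number plus one, the same coding can also be sketched with plain arithmetic. This is an alternative to the case_when approach above, not a replacement for it; it assumes every column contains only 0 and 1, and the c_ column names are made up for illustration. The data are copied from the printed data frame in the question.

```r
# Data copied from the printed data frame in the question
df <- data.frame(
  s1_1 = c(1, 1, 1, 0, 0), s1_2 = c(0, 1, 1, 0, 1), s1_3 = c(0, 1, 0, 1, 0),
  s1_4 = c(0, 0, 0, 0, 0), s1_5 = c(0, 1, 0, 1, 1),
  s2_1 = c(0, 1, 1, 1, 0), s2_2 = c(1, 0, 1, 0, 0), s2_3 = c(1, 0, 0, 1, 1),
  s2_4 = c(0, 0, 1, 0, 1), s2_5 = c(0, 0, 1, 1, 0)
)

# Pull out the two sequences as matrices with matching column order
s1 <- as.matrix(df[paste0("s1_", 1:5)])
s2 <- as.matrix(df[paste0("s2_", 1:5)])

# (0,0) -> 1, (0,1) -> 2, (1,0) -> 3, (1,1) -> 4
combined <- 2 * s1 + s2 + 1
colnames(combined) <- paste0("c_", 1:5)  # hypothetical names for the combined variables
combined
```

The first row comes out as 3 2 2 1 1, which agrees with the sapply result shown above.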

Related

Count number of pairs across elements in a list in R?

Similar questions have been asked about counting pairs, however none seem to be specifically useful for what I'm trying to do.
What I want is to count the number of pairs across multiple list elements and turn it into a matrix. For example, if I have a list like so:
myList <- list(
a = c(2,4,6),
b = c(1,2,3,4),
c = c(1,2,5,7),
d = c(1,2,4,5,8)
)
We can see that the pair 1:2 appears 3 times (once each in b, c, and d). The pair 1:3 appears only once, in b. The pair 1:4 appears 2 times (once each in b and d), and so on.
I would like to count the number of times a pair appears and then turn it into a symmetrical matrix. For example, my desired output would look something like the matrix I created manually (where each element of the matrix is the total count for that pair of values):
> myMatrix
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 0 3 1 2 2 0 1 1
[2,] 3 0 1 3 2 1 1 1
[3,] 1 1 0 1 0 0 0 0
[4,] 2 3 1 0 0 0 0 1
[5,] 2 2 0 0 0 0 1 1
[6,] 0 1 0 0 0 0 0 0
[7,] 1 1 0 0 1 0 0 0
[8,] 1 1 0 1 1 0 0 0
Any suggestions are greatly appreciated
Inspired by @akrun's answer, I think you can use a cross product to get this very quickly and simply:
out <- tcrossprod(table(stack(myList)))
diag(out) <- 0
# values
#values 1 2 3 4 5 6 7 8
# 1 0 3 1 2 2 0 1 1
# 2 3 0 1 3 2 1 1 1
# 3 1 1 0 1 0 0 0 0
# 4 2 3 1 0 1 1 0 1
# 5 2 2 0 1 0 0 1 1
# 6 0 1 0 1 0 0 0 0
# 7 1 1 0 0 1 0 0 0
# 8 1 1 0 1 1 0 0 0
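To see why the cross product works, it may help to look at the intermediate table. The sketch below assumes, as in the example, that no value is repeated within a single list element; the name inc is only used here for illustration.

```r
myList <- list(
  a = c(2, 4, 6),
  b = c(1, 2, 3, 4),
  c = c(1, 2, 5, 7),
  d = c(1, 2, 4, 5, 8)
)

# Incidence table: one row per value, one column per list element,
# with a 1 wherever the value occurs in that element
inc <- table(stack(myList))
inc

# Row i times row j sums, over list elements, the cases where both
# values occur together, which is the pair count
out <- tcrossprod(inc)
diag(out) <- 0      # a value paired with itself is not a pair
out["1", "2"]       # 3: values 1 and 2 occur together in b, c and d
```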
Original answer:
Use combn to get the combinations, as well as reversing each combination.
Then convert to a data.frame and table the results.
tab <- lapply(myList, \(x) combn(x, m=2, FUN=\(cm) rbind(cm, rev(cm)), simplify=FALSE))
tab <- data.frame(do.call(rbind, unlist(tab, recursive=FALSE)))
table(tab)
# X2
#X1 1 2 3 4 5 6 7 8
# 1 0 3 1 2 2 0 1 1
# 2 3 0 1 3 2 1 1 1
# 3 1 1 0 1 0 0 0 0
# 4 2 3 1 0 1 1 0 1
# 5 2 2 0 1 0 0 1 1
# 6 0 1 0 1 0 0 0 0
# 7 1 1 0 0 1 0 0 0
# 8 1 1 0 1 1 0 0 0
We could loop over the list, get the pairwise combinations with combn, stack them into a two-column dataset, convert the 'values' column to a factor with levels 1 to 8, get the frequency count (table), take the cross product (crossprod), convert the output to logical, Reduce the list elements by adding them elementwise, and finally set the diagonal elements to 0. (If needed, set the names attribute of the dimnames to NULL.)
out <- Reduce(`+`, lapply(myList, function(x)
crossprod(table(transform(stack(setNames(
combn(x,
2, simplify = FALSE), combn(x, 2, paste, collapse="_"))),
values = factor(values, levels = 1:8))[2:1]))> 0))
diag(out) <- 0
names(dimnames(out)) <- NULL
-output
> out
1 2 3 4 5 6 7 8
1 0 3 1 2 2 0 1 1
2 3 0 1 3 2 1 1 1
3 1 1 0 1 0 0 0 0
4 2 3 1 0 1 1 0 1
5 2 2 0 1 0 0 1 1
6 0 1 0 1 0 0 0 0
7 1 1 0 0 1 0 0 0
8 1 1 0 1 1 0 0 0
I thought of a solution based on @TarJae's answer. It is not an elegant one, but it was a fun challenge!
Libraries
library(tidyverse)
Code
map_df(myList,function(x) as_tibble(t(combn(x,2)))) %>%
count(V1,V2) %>%
{. -> temp_df} %>%
bind_rows(
temp_df %>%
rename(V2 = V1, V1 = V2)
) %>%
full_join(
expand_grid(V1 = 1:8,V2 = 1:8)
) %>%
replace_na(replace = list(n = 0)) %>%
arrange(V2,V1) %>%
pivot_wider(names_from = V1,values_from = n) %>%
as.matrix()
Output
V2 1 2 3 4 5 6 7 8
[1,] 1 0 3 1 2 2 0 1 1
[2,] 2 3 0 1 3 2 1 1 1
[3,] 3 1 1 0 1 0 0 0 0
[4,] 4 2 3 1 0 1 1 0 1
[5,] 5 2 2 0 1 0 0 1 1
[6,] 6 0 1 0 1 0 0 0 0
[7,] 7 1 1 0 0 1 0 0 0
[8,] 8 1 1 0 1 1 0 0 0
First, I turn the possible pair combinations of each vector in the list into a tibble; then I bind them into one tibble and count the combinations.
library(tidyverse)
a <- as_tibble(t(combn(myList[[1]],2)))
b <- as_tibble(t(combn(myList[[2]],2)))
c <- as_tibble(t(combn(myList[[3]],2)))
d <- as_tibble(t(combn(myList[[4]],2)))
bind_rows(a,b,c,d) %>%
count(V1, V2)
V1 V2 n
<dbl> <dbl> <int>
1 1 2 3
2 1 3 1
3 1 4 2
4 1 5 2
5 1 7 1
6 1 8 1
7 2 3 1
8 2 4 3
9 2 5 2
10 2 6 1
11 2 7 1
12 2 8 1
13 3 4 1
14 4 5 1
15 4 6 1
16 4 8 1
17 5 7 1
18 5 8 1
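The count table above lists each pair once, with the smaller value first. One possible way to turn such counts into the symmetric matrix the question asks for, sketched in base R, is to tabulate the pairs on a fixed set of levels and add the transpose. It assumes each vector is sorted and free of duplicates, as in the example; the names pairs, lv, tab and m are illustrative.

```r
myList <- list(
  a = c(2, 4, 6),
  b = c(1, 2, 3, 4),
  c = c(1, 2, 5, 7),
  d = c(1, 2, 4, 5, 8)
)

# All pairs from every list element, one pair per row
pairs <- do.call(rbind, lapply(myList, function(x) t(combn(x, 2))))

# Tabulate on fixed levels so values that never pair still get a row and column
lv <- 1:8
tab <- table(factor(pairs[, 1], levels = lv), factor(pairs[, 2], levels = lv))

# Pairs are stored as (smaller, larger), so adding the transpose mirrors them
m <- unclass(tab + t(tab))
m
```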

How to count number of columns that have a value by a grouping variable in R?

I have data like this:
repetition Ob1 Ob2 Ob3 Ob4
1 0 0 0 1
1 0 0 3 0
1 1 3 3 0
1 2 3 3 0
2 4 0 2 2
2 4 0 3 0
2 0 0 0 0
3 0 0 0 0
3 4 0 4 0
3 0 0 0 0
I want to count the number of columns per repetition that have a certain value e.g. 1.
So in this case repetition 1 should return a 2 because Ob1 and Ob4 have a value of 1. Everything else gets a 0 because there are no other repetitions with a 1.
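You can get the count using the dplyr package with the code below:
library(dplyr)
# count the cells equal to the target value (here 1) in each row;
# this equals the number of matching columns here because no column
# contains 1 more than once within a repetition
df$count <- rowSums(df[,2:5] == 1)
df %>% select(repetition, count) %>% group_by(repetition) %>% summarise(count = sum(count))
# A tibble: 3 x 2
repetition count
<int> <dbl>
1 1 2
2 2 0
3 3 0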
You can use by like this (here x is the data frame):
by(x[-1]==1, x$repetition, function(y) sum(colSums(y) > 0))
#INDICES: 1
#[1] 2
#------------------------------------------------------------
#INDICES: 2
#[1] 0
#------------------------------------------------------------
#INDICES: 3
#[1] 0
or to return a named vector
c(by(x[-1]==1, x$repetition, function(y) sum(colSums(y) > 0)))
#1 2 3
#2 0 0
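The same idea can be wrapped in a small helper so the target value becomes a parameter. This is only a sketch: count_cols_with is a hypothetical name, and it assumes the grouping column is called repetition and comes first, as in the question's data.

```r
# Data typed in from the question
x <- data.frame(
  repetition = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  Ob1 = c(0, 0, 1, 2, 4, 4, 0, 0, 4, 0),
  Ob2 = c(0, 0, 3, 3, 0, 0, 0, 0, 0, 0),
  Ob3 = c(0, 3, 3, 3, 2, 3, 0, 0, 4, 0),
  Ob4 = c(1, 0, 0, 0, 2, 0, 0, 0, 0, 0)
)

# Hypothetical helper: number of columns per repetition containing `value`
count_cols_with <- function(data, value) {
  # hits per column within each repetition
  hits <- rowsum((data[-1] == value) * 1L, data$repetition)
  # a column counts once, however many times the value occurs in it
  rowSums(hits > 0)
}

count_cols_with(x, 1)  # repetition 1: 2, repetitions 2 and 3: 0
count_cols_with(x, 3)  # repetition 1: 2, repetition 2: 1, repetition 3: 0
```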

How to remove duplicate values from different rows per unique identifier?

I'm just starting to use R. I have a dataset with unique identifiers (1958 patients) in the first column and 0s and 1s in columns 2-35.
For example:
Patient A: 0 1 0 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 NA NA
I want to change this to:
Patient A: 0 1 0 1 0 1
Thanks in advance.
We can use tapply, grouping the values by whether they change from one position to the next, i.e.
tapply(x[!is.na(x)], cumsum(c(TRUE, diff(x[!is.na(x)]) != 0)), FUN = unique)
#1 2 3 4 5 6
#0 1 0 1 0 1
Based on your example, it is not clear whether NAs can also occur in the middle, and how you would want to deal with that situation: either NA marks a boundary, so that 1 NA 1 keeps both 1's (option 1), or NA is dropped first, so that 1 NA 1 becomes 1 1 and the two 1's are combined into one (option 2).
That determines at which point to remove NA's in the code.
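The two ways of handling NA can be sketched with base R's rle, without extra packages. The function names are made up for illustration, and the example row is a shortened version of the one in the question. Base rle treats every NA as its own run, which is what makes the second variant work.

```r
# Drop NAs first, then collapse runs: 1 NA 1 becomes a single 1
runs_na_removed <- function(v) rle(v[!is.na(v)])$values

# Collapse runs first, then drop the NA runs: 1 NA 1 stays 1 1
runs_na_boundary <- function(v) {
  r <- rle(v)$values
  r[!is.na(r)]
}

x <- c(0, 1, 0, 1, 1, 1, 0, 0, 1, 1, NA, NA)  # shortened example row
runs_na_removed(x)                   # 0 1 0 1 0 1

y <- c(0, 1, NA, 1, 0)
runs_na_removed(y)                   # 0 1 0
runs_na_boundary(y)                  # 0 1 1 0
```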
You could use S4Vectors run length encoding, which would allow you to have more than just 0 and 1.
library(S4Vectors)
## create example data
set.seed(1)
x <- sample(c(0,1), (1958*34), replace=TRUE, prob=c(.4, .6))
x[sample(length(x), 200)] <- NA
x <- matrix(x, nrow=1958, ncol=34)
df <- data.frame(patient.id = paste0("P", seq_len(1958)), x, stringsAsFactors = FALSE)
## define function to remove NA values
# option 1
fun.NA.boundary <- function(x) {
a <- runValue(Rle(x))
a[!is.na(a)]
}
# option 2
fun.NA.remove <- function(x) runValue(Rle(x[!is.na(x)]))
## calculate results (x holds only the 34 value columns, so it is passed whole)
# option 1
reslist <- apply(x, 1, function(y) fun.NA.boundary(y))
# option 2
reslist <- apply(x, 1, function(y) fun.NA.remove(y))
names(reslist) <- df$patient.id
head(reslist)
#> $P1
#> [1] 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
#>
#> $P2
#> [1] 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
#>
#> $P3
#> [1] 0 1 0 1 0 1 0 1 0 1 0 1 0 1
#>
#> $P4
#> [1] 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
#>
#> $P5
#> [1] 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
#>
#> $P6
#> [1] 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0

Change the value of variables that occur 80% of the times in each row, R

In my data, I have 74 observations (rows) and 128 variables (columns), where each variable takes either 0 or 1 as its value. In R, I am trying to write code that finds, in each row, the variables whose value is 1, picks 80% of them, and changes their value from 1 to 0. I can calculate 80% of the number of 1s in each row, but I am not able to pick these variables in each row and change their value from 1 to 0.
data# data frame with 74 observations and 128 variables
row1 <- data[1,]
count1 <- length(which(data[1,] == 1)) # #number of 1 in row 1
print(count1)
perform <- 80/100*count1# 80% of count1
Below code works for one row:
test <- t(apply(data[1,], 1, function(x,n){
onesInX <- which(x==1)
# Randomly select 80% of 1 and change to 0
x[sample(onesInX, floor(length(onesInX)*.8))] <- 0
x
}))
If I specify all the rows, the code does not work:
test <- t(apply(data[1:74,], 1, function(x,n){
onesInX <- which(x==1)
# Randomly select 80% of 1 and change to 0
x[sample(onesInX, floor(length(onesInX)*.8))] <- 0
x
}))
Example of desired output:
original data frame
df
a b c d e f
1 1 1 1 1 1 1
2 1 0 1 1 0 1
3 1 1 1 0 1 1
When the code is applied to all three rows in df, the output should look like this (80% of the 1s replaced with 0):
a b c d e f
1 1 0 0 0 1 0
2 0 0 1 0 0 0
3 0 1 1 0 0 0
Thanks
Any suggestions
Thank you
Priya
A solution is to use apply row-wise and get the indices where the value is 1 using which. Afterwards, pick 80% of those indices using sample and replace the values there with 0.
t(apply(df, 1, function(x){
onesInX <- which(x==1)
# Randomly select 80% of 1 and change to 0
x[sample(onesInX, floor(length(onesInX)*.8))] <- 0
x
}))
# a b c d e f
# [1,] 0 0 0 1 0 0
# [2,] 0 0 0 1 0 0
# [3,] 0 0 1 0 0 1
# [4,] 0 1 0 0 0 0
# [5,] 0 1 0 0 0 0
# [6,] 1 0 0 0 0 0
# [7,] 0 0 0 0 0 1
# [8,] 0 0 1 0 0 0
# [9,] 0 0 1 0 1 0
# [10,] 0 0 0 0 0 1
Sample Data:
set.seed(1)
df <- data.frame(a = sample(c(0,1,1,1), 10, replace = TRUE),
b = sample(c(0,1,1,1), 10, replace = TRUE),
c = sample(c(0,1,1,1), 10, replace = TRUE),
d = sample(c(0,1,1,1), 10, replace = TRUE),
e = sample(c(0,1,1,1), 10, replace = TRUE),
f = sample(c(0,1,1,1), 10, replace = TRUE))
df
# a b c d e f
# 1 1 0 1 1 1 1
# 2 1 0 0 1 1 1
# 3 1 1 1 1 1 1
# 4 1 1 0 0 1 0
# 5 0 1 1 1 1 0
# 6 1 1 1 1 1 0
# 7 1 1 0 1 0 1
# 8 1 1 1 0 1 1
# 9 1 1 1 1 1 1
# 10 0 1 1 1 1 1
# Answer on OP's data (df1 is defined below)
t(apply(df1, 1, function(x){
onesInX <- which(x==1)
x[sample(onesInX, floor(length(onesInX)*.8))] <- 0
x
}))
# a b c d e f
# 1 1 1 0 0 0 0 <- .8*6 = 4.8 => 4 have been converted to 0
# 2 0 0 0 1 0 0 <- .8*4 = 3.2 => 3 have been converted to 0
# 3 0 1 0 0 0 0 <- .8*5 = 4.0 => 4 have been converted to 0
# Data from OP
df1 <- read.table(text="
a b c d e f
1 1 1 1 1 1 1
2 1 0 1 1 0 1
3 1 1 1 0 1 1",
header = TRUE)
df1
# a b c d e f
# 1 1 1 1 1 1 1 <- No of 1 = 6
# 2 1 0 1 1 0 1 <- No of 1 = 4
# 3 1 1 1 0 1 1 <- No of 1 = 5
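The rounding can be checked without relying on which cells happen to be sampled. The sketch below rebuilds the OP's three rows, wraps the same logic in a hypothetical helper zero_out, and compares the number of 1s before and after. It samples positions with sample.int rather than passing the indices to sample directly, which is a defensive choice of mine and not part of the answer above.

```r
# The OP's three rows
df1 <- data.frame(
  a = c(1, 1, 1), b = c(1, 0, 1), c = c(1, 1, 1),
  d = c(1, 1, 0), e = c(1, 0, 1), f = c(1, 1, 1)
)

# Hypothetical helper: set a fraction (rounded down) of the 1s in x to 0
zero_out <- function(x, frac = 0.8) {
  ones <- which(x == 1)
  k <- floor(length(ones) * frac)
  # sample positions within `ones`, then map back to column indices
  x[ones[sample.int(length(ones), k)]] <- 0
  x
}

set.seed(42)
res <- t(apply(df1, 1, zero_out))

# Number of 1s removed per row: floor(.8*6), floor(.8*4), floor(.8*5)
unname(rowSums(df1) - rowSums(res))  # 4 3 4
```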

rearrange a variable based on another variable

Data:
set.seed(25)
df<- data.frame(rank=round(rnorm(10)),category=round(runif(10)),v=round(rnorm(10)))
rank category v
1 0 0 1
2 -1 0 -1
3 -1 0 1
4 0 0 2
5 -2 0 -1
6 0 0 1
7 2 0 0
8 1 1 0
9 0 1 2
10 0 0 -2
I want the variable "v" to follow the same ranking as the variable "rank1" within each category. How can I create the desired variable "v1"?
Desired output:
df <- transform(df, rank1 = ave(v, category, FUN = function(x) rank(x, ties.method = "random")))
rank category v rank1 v1
1 0 0 1 6 -1
2 -1 0 -1 3 1
3 -1 0 1 7 -1
4 0 0 2 8 -2
5 -2 0 -1 2 1
6 0 0 1 5 0
7 2 0 0 4 1
8 1 1 0 1 2
9 0 1 2 2 0
10 0 0 -2 1 2
So I get the desired result:
set.seed(25)
df <- data.frame(rank=round(rnorm(10)), category=round(runif(10)), v=round(rnorm(10)))
df <- transform(df, rank1 = ave(v, category, FUN = function(x) rank(x, ties.method = "random")))
df$v1 <- NA
for (i in unique(df$category)) {
df$v1[df$category==i] <- sort(df$v[df$category==i], decreasing=TRUE)[df$rank1[df$category==i]]
}
The idea is to go through the categories and apply the order given by rank1 to the sorted part of the vector v.
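A small deterministic example may make the indexing step easier to follow. The data frame toy and the helper reorder_by_rank are made up for illustration; the ranks are fixed by hand here instead of being drawn with ties.method = "random".

```r
# Hypothetical helper: within each group g, place the r-th largest value of v
reorder_by_rank <- function(v, r, g) {
  out <- v
  for (k in unique(g)) {
    idx <- g == k
    # sort the group's values from largest to smallest, then pick by rank
    out[idx] <- sort(v[idx], decreasing = TRUE)[r[idx]]
  }
  out
}

toy <- data.frame(
  category = c(0, 0, 0, 1, 1),
  v        = c(1, -1, 2, 5, 3),
  rank1    = c(3, 1, 2, 2, 1)
)
toy$v1 <- reorder_by_rank(toy$v, toy$rank1, toy$category)
toy$v1  # -1 2 1 3 5
```

In category 0 the sorted values are 2, 1, -1, so ranks 3, 1, 2 pick -1, 2, 1; in category 1 the sorted values are 5, 3, so ranks 2, 1 pick 3, 5.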
