make data frame with binaries to sum to 1 - r

I have a data frame with only zeros and ones, e.g.
df <- data.frame(v1 = rbinom(100, 1, 0.5),
v2 = rbinom(100, 1, 0.2),
v3 = rbinom(100, 1, 0.4))
Now I want to modify this data set so that each row sums to 1.
So this
1 0 0
1 1 0
0 0 1
1 1 1
0 0 0
should become this:
1 0 0
0.5 0.5 0
0 0 1
0.33 0.33 0.33
0 0 0
edit: rows with all zeros should be left as is

As already pointed out by @lmo, the data.frame (or matrix) can be modified with
df <- df / rowSums(df)
In the case of rows containing only zeros this will lead to rows containing only NaN. Since these rows should be kept as they were, the easiest way is probably to correct for this afterwards with
df[is.na(df)] <- 0
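As a small check of the two steps above, here is a sketch that uses the five example rows from the question instead of random data. The name `res` is only illustrative.

```r
# The five example rows from the question
df <- data.frame(v1 = c(1, 1, 0, 1, 0),
                 v2 = c(0, 1, 0, 1, 0),
                 v3 = c(0, 0, 1, 1, 0))

# Divide every row by its sum; the all-zero row becomes NaN (0 / 0)
res <- df / rowSums(df)

# is.na() is TRUE for NaN as well, so this restores the all-zero row
res[is.na(res)] <- 0

res
```

Every row that contained at least one 1 now sums to 1, and the all-zero row is unchanged.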

Here is a quick method:
# create matrix
temp <- matrix(c(1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1), ncol = 3, byrow = TRUE)
temp / rowSums(temp)
This exploits the fact that matrices are stored column-wise, so that the element-by-element division by rowSums and the recycling are aligned.
In the case that all elements in a row are zero and you don't want NaN (the result of 0 / 0), another method, from @RHertel, is the following:
# save rowSum:
mySums <- rowSums(temp)
temp / ifelse(mySums != 0, mySums, 1)
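To illustrate the recycling argument, the sketch below compares the plain division with an explicit row-wise `sweep()`. The all-zero fourth row and the names `safe`, `by_recycling` and `by_sweep` are assumptions added for the illustration.

```r
# Example matrix with an all-zero last row (added for illustration)
temp <- matrix(c(1, 0, 0,
                 1, 1, 0,
                 0, 0, 1,
                 0, 0, 0), ncol = 3, byrow = TRUE)

sums <- rowSums(temp)
safe <- ifelse(sums != 0, sums, 1)  # divide all-zero rows by 1 instead of 0

# Relies on column-major storage: the divisor is recycled down each column
by_recycling <- temp / safe

# States the intent explicitly: divide along margin 1 (rows)
by_sweep <- sweep(temp, 1, safe, "/")

stopifnot(isTRUE(all.equal(by_recycling, by_sweep)))
by_recycling
```

The same shortcut does not carry over to columns: `temp / colSums(temp)` would recycle the divisor down the rows of each column, so column-wise scaling needs something like `sweep(temp, 2, colSums(temp), "/")`.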

Related

Merging 2 Vectors to 1 Vector that satisfies certain criteria

I have two vectors that can be written as follows:
aa <- c(0, 0, 0, 0, 1, 0, 0, 0)
bb <- c(0, 2, 0, 0, 3, 1, 1, 1)
I want to merge these vectors such that every element of bb after the position where aa takes the value 1 is set to zero. In this example the result should look like:
cc <- c(0, 2, 0, 0, 3, 0, 0, 0)
What is the fastest and most efficient way to do this in R?
We may do
library(dplyr)
ifelse(lag(cummax(aa), default = 0) == 0, bb, aa)
[1] 0 2 0 0 3 0 0 0
Or another way is
bb * !c(0, head(cummax(aa), -1))
[1] 0 2 0 0 3 0 0 0
Or another option
ind <- (which.max(aa) + 1):length(aa)
bb[ind] <- aa[ind]
> bb
[1] 0 2 0 0 3 0 0 0
This is maybe too much for this task. At least for me it is easier to follow:
library(dplyr)
cc <- tibble(aa,bb) %>%
group_by(id_group=lag(cumsum(aa==1), default = 0)) %>%
mutate(cc = ifelse(id_group == 0, coalesce(bb,aa), coalesce(aa,bb))) %>%
pull(cc)
output:
[1] 0 2 0 0 3 0 0 0
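For reuse, the `cummax` idea can be wrapped in a small base R function. This is a sketch; the name `zero_after_first_one` is hypothetical, and it assumes `aa` contains only 0 and 1.

```r
# Hypothetical helper: keep bb up to and including the first 1 in aa,
# and set everything after that position to 0
zero_after_first_one <- function(aa, bb) {
  # 1 at position i if aa was 1 at any earlier position, otherwise 0
  seen_before <- c(0, head(cummax(aa), -1))
  bb * (seen_before == 0)
}

aa <- c(0, 0, 0, 0, 1, 0, 0, 0)
bb <- c(0, 2, 0, 0, 3, 1, 1, 1)
cc <- zero_after_first_one(aa, bb)
cc
```

If `aa` contains no 1 at all, this version returns `bb` unchanged. The `which.max` variant behaves differently in that case: `which.max()` returns 1 for an all-zero vector, so everything from position 2 onward would be overwritten.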

Transform categorical attribute vector into similarity matrix

I need to transform a categorical attribute vector into a "same attribute matrix" using R.
For example I have a vector which reports gender of N people (male = 1, female = 0). I need to convert this vector into a NxN matrix named A (with people names on rows and columns), where each cell Aij has the value of 1 if two persons (i and j) have the same gender and 0 otherwise.
Here is an example with 3 persons, first male, second female, third male, which produce this vector:
c(1, 0, 1)
I want to transform it into this matrix:
A = matrix( c(1, 0, 1, 0, 1, 0, 1, 0, 1), nrow=3, ncol=3, byrow = TRUE)
As lmo said in a comment, it's impossible to know the structure of your dataset, so what follows is just an example to show how it could be done.
First, make up some data.
set.seed(3488) # make the results reproducible
x <- LETTERS[1:5]
y <- sample(0:1, 5, TRUE)
df <- data.frame(x, y)
Now tabulate it according to your needs
A <- outer(df$y, df$y, function(a, b) as.integer(a == b))
dimnames(A) <- list(df$x, df$x)
A
# A B C D E
#A 1 1 1 0 0
#B 1 1 1 0 0
#C 1 1 1 0 0
#D 0 0 0 1 1
#E 0 0 0 1 1
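The same `outer()` call can be checked against the three-person example from the question. In this sketch the person names are invented placeholders.

```r
# Gender vector from the question: male = 1, female = 0
gender <- c(1, 0, 1)
people <- c("p1", "p2", "p3")  # placeholder names, not from the question

# TRUE where two people share the same value; "* 1" turns TRUE/FALSE into 1/0
A <- outer(gender, gender, "==") * 1
dimnames(A) <- list(people, people)
A
```

Without the names, `A` matches the matrix the question asks for. The diagonal is always 1, because every person has the same attribute as themselves.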

Take value from list and use it as index

I have a matrix (A) like this (the row and column names are identification codes (IDs)):
    1   3   10  38  46
1   0   0.4 0   0   0
3   0   0   0   0   0
10  0   0   0.9 0.8 0
38  0   0   0   0   0
46  0   0.1 0   0   0
And another matrix (B) like this:
    a            b            c
1   2.676651e-04 4.404911e-06 9.604227e-06
3   6.073389e-10 3.273222e-05 3.360321e-04
10  4.156392e-08 1.269607e-06 7.509217e-06
38  4.200699e-08 3.227431e-02 8.286920e-11
46  9.352353e-05 3.318948e-20 8.694981e-06
I would like to get the indices of the elements of matrix A that are greater than 0, so I used this command:
temp <- apply(A,1, FUN=function(x) which(x>0))
It returned a list with the correct indices of the elements > 0.
After that, I would like to multiply the elements of matrix B selected by those indices. In particular, I would like to do something like this for each row:
1: 6.073389e-10*3.273222e-05*3.360321e-04
Here I used the information from matrix A (the first row has a value > 0 in its second column) as an index to pick the row of matrix B to use for the first row.
For the second row, the result is 0 because there are no elements > 0 in A[2,].
For the third row, I would like to obtain something like the first row, but summing the two products:
10: 4.156392e-08*1.269607e-06*7.509217e-06 +4.200699e-08*3.227431e-02*8.286920e-11
I have tried to unlist the list, but that gives a vector that loses the correspondence with the IDs.
A <- matrix(
  c(0, 0.4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.9, 0.8, 0, 0, 0, 0, 0, 0, 0, 0.1, 0, 0, 0),
  nrow = 5,
  ncol = 5,
  byrow = TRUE
)
B <- matrix(
  c(
    2.676651e-04, 4.404911e-06, 9.604227e-06,
    6.073389e-10, 3.273222e-05, 3.360321e-04,
    4.156392e-08, 1.269607e-06, 7.509217e-06,
    4.200699e-08, 3.227431e-02, 8.286920e-11,
    9.352353e-05, 3.318948e-20, 8.694981e-06
  ),
  nrow = 5,
  ncol = 3,
  byrow = TRUE
)
# row/column positions of all elements of A that are > 0
idx <- which(A > 0, arr.ind = TRUE)
result <- 0
for (i in 1:nrow(idx)) {
  cat(A[idx[i, 1], idx[i, 2]], sep = "\n")
  cat(B[idx[i, 2], ], sep = "\n")
  # weight the selected row of B by the value in A and accumulate
  result <- result + sum(A[idx[i, 1], idx[i, 2]] * B[idx[i, 2], ])
}
cat("result=")
cat(result)
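The loop above accumulates a single number weighted by the values in A. If the goal is instead what the question describes (one value per ID, namely the sum of the row products of B over the columns where A is positive), a sketch could look like this. The names `ids`, `row_prod` and `res` are illustrative, and the reading of the question is an assumption.

```r
ids <- c("1", "3", "10", "38", "46")

A <- matrix(
  c(0, 0.4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.9, 0.8, 0, 0, 0, 0, 0, 0, 0, 0.1, 0, 0, 0),
  nrow = 5, byrow = TRUE, dimnames = list(ids, ids)
)
B <- matrix(
  c(2.676651e-04, 4.404911e-06, 9.604227e-06,
    6.073389e-10, 3.273222e-05, 3.360321e-04,
    4.156392e-08, 1.269607e-06, 7.509217e-06,
    4.200699e-08, 3.227431e-02, 8.286920e-11,
    9.352353e-05, 3.318948e-20, 8.694981e-06),
  nrow = 5, byrow = TRUE, dimnames = list(ids, c("a", "b", "c"))
)

# Product of the three entries in each row of B, named by ID
row_prod <- apply(B, 1, prod)

# (A > 0) * 1 is a 0/1 indicator matrix; the matrix product adds up the
# row products of B for every column where A is positive
res <- drop(((A > 0) * 1) %*% row_prod)
res
```

Because the result keeps the row names of A, each value stays attached to its ID, and rows of A without any positive entry give 0.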

Merge binary columns and keep the one

I have two binary columns:
col1 col2
0 1
0 0
1 0
1 1
I would like to merge these columns so that the result is 1 if the value 1 exists in either or both columns. Example of output:
merged_col
1
0
1
1
The general merged I tried is this:
merge(df$col1, df$col2, all = TRUE)
Any idea how can I handle the values?
You can just treat them as logical values and use or...
df$col3 <- as.integer(df$col1|df$col2)
The code below should do what you need:
df <- data.frame(col1 = c(0, 0, 1, 1), col2 = c(1, 0, 0, 1))
df$merge_col <- ifelse(df$col1 == 1 | df$col2 == 1, 1, 0)
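Two further base R variants, shown as a sketch on the same example data. `pmax()` takes the element-wise maximum, and the `rowSums()` form extends to more than two columns. The column names `merged_col` and `merged_any` are illustrative.

```r
df <- data.frame(col1 = c(0, 0, 1, 1), col2 = c(1, 0, 0, 1))

# Element-wise maximum: 1 as soon as either column is 1
df$merged_col <- pmax(df$col1, df$col2)

# Same idea for any number of 0/1 columns: is the row sum positive?
df$merged_any <- as.integer(rowSums(df[c("col1", "col2")]) > 0)

df
```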

Recode a value in a vector based on surrounding values

I'm trying to programmatically change a variable from a 0 to a 1 if there are three 1s before and after a 0.
For example, if the number in a vector were 1, 1, 1, 0, 1, 1, and 1, then I want to change the 0 to a 1.
Here is the data in the vector dummy_code in the data.frame original_df:
original_df <- data.frame(dummy_code = c(1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1))
Here is how I'm trying to have the values be recoded:
desired_df <- data.frame(dummy_code = c(1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1))
I tried to use the function fill in the package tidyr, but this fills in missing values, so it won't work. If I were to recode the 0 values to be missing, then that would not work either, because it would simply code every NA as 1, when I would only want to code every NA surrounded by three 1s as 1.
Is there a way to do this in an efficient way programmatically?
An rle alternative, using the x from @G. Grothendieck's answer:
r <- rle(x)
Find indexes of runs of three 1:
i1 <- which(r$lengths == 3 & r$values == 1)
Check which of the "1 indexes" that surround a 0, and get the indexes of the 0 to be replaced:
i2 <- i1[which(diff(i1) == 2)] + 1
Replace relevant 0 with 1:
r$values[i2] <- 1
Reverse the rle operation on the updated runs:
inverse.rle(r)
# [1] 1 0 0 1 1 1 1 1 1 1 0 0 1
A similar solution based on data.table::rleid, slightly more compact and perhaps easier to read:
library(data.table)
d <- data.table(x)
Calculate length of each run:
d[ , n := .N, by = rleid(x)]
For "x" which are zero and where the preceding and subsequent runs of 1 are of length 3, set "x" to 1:
d[x == 0 & shift(n) == 3 & shift(n, type = "lead") == 3, x := 1]
d$x
# [1] 1 0 0 1 1 1 1 1 1 1 0 0 1
Here is a one-liner using rollapply from zoo:
library(zoo)
rollapply(c(0, 0, 0, x, 0, 0, 0), 7, function(x) if (all(x[-4] == 1)) 1 else x[4])
## [1] 1 0 0 1 1 1 1 1 1 1 0 0 1
Note: Input used was:
x <- c(1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1)
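For comparison, here is a dependency-free sketch of the same rule as an explicit loop. The function name `fill_isolated_zero` and its parameter `k` are hypothetical. It treats "three 1s before and after" as "the `k` values directly on each side are all 1", which matches the `rollapply` window; the `rle` and `data.table` versions instead require runs of exactly three.

```r
# Hypothetical helper: set a 0 to 1 when the k values immediately before
# and the k values immediately after it are all 1
fill_isolated_zero <- function(x, k = 3) {
  n <- length(x)
  out <- x
  for (i in seq_len(n)) {
    lo <- i - k
    hi <- i + k
    # && short-circuits, so the index ranges are only built when in bounds
    if (x[i] == 0 && lo >= 1 && hi <= n &&
        all(x[lo:(i - 1)] == 1) && all(x[(i + 1):hi] == 1)) {
      out[i] <- 1
    }
  }
  out
}

x <- c(1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1)
filled <- fill_isolated_zero(x)
filled
```

Conditions are always checked against the original `x`, so a filled value never triggers further fills in the same pass.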
