I have a matrix in R which is in the following format:
A B C D E
1 0 0 1 0
0 0 1 0 1
1 1 1 0 1
.
.
.
For each of the other columns, I need to compare its value in each row with the corresponding values in columns D and E. For example:
if (mat[1,1] == 1 && mat[1,4] == 1)
  vec[1] <- vec[1] + 1
else if (mat[1,1] == 1 && mat[1,5] == 1)
  vec[1] <- vec[1] - 1
Basically, vec will be positive if the number of 1s in column 4 is greater than the number of 1s in column 5 (counting only the rows where the current column has a 1).
For this I use an apply function, which processes the matrix row by row, and inside it I run a loop over the columns and apply the above conditions, as shown below:
outputv = vector(, ncol(mat))
A <- function(vec){
  for(i in 1:length(vec)-2)
  {
    if(vec[i] == 1 && vec[length(vec)-1] == 1)
      outputv[i] = outputv[i] + 1
    else if(vec[i] == 1 && vec[length(vec)-2] == 1)
      outputv[i] = outputv[i] - 1
  }
}
apply(mat, 1, A)
I do realize the loop isn't an efficient method, but even with this approach the values in outputv are all 0.
The expected outputv for the given input matrix would be
0 1 2
For the first column, 1 appears in the 1st and 3rd rows. In the 1st row the 4th column is 1, so subtract 1; in the 3rd row the 5th column is 1, so add 1.
Total = -1 + 1 = 0
On 0/1 values the logical operation & is equivalent to multiplication, so you can simplify your condition to column-wise multiplication followed by sums. (Incidentally, your outputv stays all zeros because the assignments inside A() modify only a local copy of the vector, and A() returns nothing that apply() could collect.)
> mat <- as.matrix(read.table(
+ text = "
+ A B C D E
+ 1 0 0 1 0
+ 0 0 1 0 1
+ 1 1 1 0 1", header = TRUE ) )
>
> outputv <- rep(0, ncol(mat)-2 ) # initialize vector with zeros
>
> for (n in 1:(ncol(mat)-2) ) # calculate outputv in loop
+ outputv[n] <- sum( mat[,n]*(-mat[,ncol(mat)-1] +mat[,ncol(mat)]) )
> outputv
[1] 0 1 2
> vec <- sum(outputv)
> vec
[1] 3
But the best (and fastest) solution may be this approach based on matrix multiplication:
- (mat[,ncol(mat)-1] %*% mat[,1:(ncol(mat)-2)] ) +
mat[,ncol(mat)] %*% mat[,1:(ncol(mat)-2)]
It gives the desired output:
A B C
[1,] 0 1 2
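A small further simplification (my addition, not in the original answer): subtract the two weighting columns first, so a single matrix product is enough.
# weight column E by +1 and column D by -1, then multiply once
(mat[, ncol(mat)] - mat[, ncol(mat) - 1]) %*% mat[, 1:(ncol(mat) - 2)]
##      A B C
## [1,] 0 1 2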
Related
Imagine I give you a vector like a = (8 - 2) - (7 - 1) which can be simplified as z = (8 - 2 - 7 + 1).
Now imagine I give you a vector consisting of nine 0s, b = c(0,0,0,0,0,0,0,0,0).
Can R turn a to the following vector desired_output = c(1,-1,0,0,0,0,-1,1,0)?
The logic
The numbers in a are locations of elements in b (e.g. 8 in a means the 8th element of b).
The logic is to assign either 1 or -1 to the elements of b indicated by a, based on their sign, and to leave all other elements of b at 0, so as to get the desired_output.
I don't entirely understand your problem setup — in R terms, a = (8 - 2) - (7 - 1) is an expression rather than a vector — but here's a start:
b <- rep(0,9)
a <- c(8, -2, -7, 1)
b[abs(a)] <- sign(a)
## [1] 1 -1 0 0 0 0 -1 1 0
We can use a for loop:
for(i in a){
  if(i > 0) b[i] <- 1
  else b[abs(i)] <- -1
}
Output
[1] 1 -1 0 0 0 0 -1 1 0
Data
a <- c(8, -2, -7, 1)
b <- c(0,0,0,0,0,0,0,0,0)
I have data about machines in the following form
Number of rows - 900k
Data
A B C D E F G H I J K L M N
---- -- --- ---- --- --- --- --- --- --- --- --- --- ---
1 1 1 1 1 1 1 1 1 1 0 1 1 0 0
2 0 0 0 0 1 1 1 0 1 1 0 0 1 0
3 0 0 0 0 0 0 0 1 1 1 1 1 0 0
1 indicates that the machine was active and 0 indicates that it was inactive.
I want my output to look like
A B C D E F G H I J K L M N
---- -- --- ---- --- --- --- --- --- --- --- --- --- ---
1 1 1 1 1 1 1 1 1 1 1 1 1 0 0
2 0 0 0 0 1 1 1 1 1 1 0 0 1 0
3 0 0 0 0 0 0 0 1 1 1 1 1 0 0
Basically, all I am trying to do is look for zeros in a particular row and, if a zero has ones on either side, replace the 0 with 1.
Example: in row 1 there is a zero in column J, but there is a 1 in both columns I and K, so I replace that 0 with 1 because it is surrounded by 1s.
The code I am using is this
for(i in 1:nrow(data)) {
  for(j in 2:13) {
    if(data[i,j]==0 && data[i,j-1]==1 && data[i,j+1]==1){
      data[i,j] = 1
    }
  }
}
Is there a way to reduce the computation time for this? It takes almost 30 minutes to run in R. Any help would be appreciated.
This is faster because it does not need to iterate over the rows:
for(j in 2:13) {
data[,j] = ifelse(data[,j-1] * data[,j+1]==1,1,data[,j])
}
Or, a little more optimized, without using ifelse:
for(j in 2:(ncol(data) - 1)) {
data[data[, j - 1] * data[, j + 1] == 1, j] <- 1
}
You could also use gsub to replace any instance of 101 with 111, using the following code (here df1 is the 0/1 data frame and numLetters <- ncol(df1)):
collapsed <- gsub('101', '111', apply(df1, 1, paste, collapse = ''))
data <- as_tibble(t(matrix(unlist(sapply(collapsed, strsplit, split = '')), nrow = numLetters)))
names(data) <- LETTERS[1:numLetters]
Here's a comparison of everyone's solutions:
library(data.table)
library(rbenchmark)
library(tidyverse)
set.seed(1)
numLetters <- 13
df <- as_tibble(matrix(round(runif(numLetters * 100)), ncol = numLetters))
names(df) <- LETTERS[1:numLetters]
benchmark(
'gsub' = {
data <- df
collapsed <- gsub('101', '111', apply(data, 1, paste, collapse = ''))
data <- as_tibble(t(matrix(unlist(sapply(collapsed, strsplit, split = '')), nrow = numLetters)))
names(data) <- LETTERS[1:numLetters]
},
'for_orig' = {
data <- df
for(i in 1:nrow(data)) {
for(j in 2:(ncol(data) - 1)) {
if(data[i, j] == 0 && data[i, j - 1] == 1 && data[i, j + 1] == 1) {
data[i, j] = 1
}
}
}
},
'for_norows' = {
data <- df
for(j in 2:(ncol(data) - 1)) {
data[, j] = ifelse(data[, j - 1] * data[, j + 1] == 1, 1, data[, j])
}
},
'vectorize' = {
data <- df
for(i in seq(ncol(data) - 2) + 1) {
condition <- data[, i - 1] == data[, i + 1] & data[, i - 1] == 1 & data[, i] == 0
data[which(condition), i] <- 1
}
},
'index' = {
data <- df
idx <- apply(data, 1, function(x) c(0, diff(x)))
data[which(idx == -1 & lead(idx == 1), arr.ind = TRUE)[, 2:1]] <- 1
},
replications = 100
)
The indexing solution (which has since been deleted) wins hands-down in terms of computational time for a 100-row by 13-column data frame.
        test replications elapsed relative user.self sys.self user.child sys.child
3 for_norows          100    1.19    7.438      1.19        0         NA        NA
2   for_orig          100    9.29   58.063      9.27        0         NA        NA
1       gsub          100    0.28    1.750      0.28        0         NA        NA
5      index          100    0.16    1.000      0.16        0         NA        NA
4  vectorize          100    0.87    5.438      0.87        0         NA        NA
Cut the time by using vectorized operations. Since you are doing the same thing for every row, this can be done with vectorized conditional statements.
for(i in seq(ncol(data) - 2) + 1){ #<== all but the first and last column
  # Find all rows whose neighbouring columns are equal (both 1) while the center column is 0
  condition <- data[, i - 1] == data[, i + 1] & data[, i - 1] == 1 & data[, i] == 0
  # Overwrite only the values for which the condition holds
  data[which(condition), i] <- 1
}
You can avoid loops altogether and use indexing to replace all the values at once:
nc <- ncol(df)
df[, 2:(nc - 1)][df[, 1:(nc - 2)] * df[, 3:nc] == 1] <- 1
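As a quick sanity check (my addition, not part of the answer), applying that indexing step to the question's three example rows reproduces the desired output. A sketch, assuming df holds the 0/1 data:
df <- as.data.frame(rbind(c(1,1,1,1,1,1,1,1,1,0,1,1,0,0),
                          c(0,0,0,0,1,1,1,0,1,1,0,0,1,0),
                          c(0,0,0,0,0,0,0,1,1,1,1,1,0,0)))
names(df) <- LETTERS[1:14]   # columns A to N
nc <- ncol(df)
df[, 2:(nc - 1)][df[, 1:(nc - 2)] * df[, 3:nc] == 1] <- 1
df
##   A B C D E F G H I J K L M N
## 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0
## 2 0 0 0 0 1 1 1 1 1 1 0 0 1 0
## 3 0 0 0 0 0 0 0 1 1 1 1 1 0 0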
Here's my problem; I couldn't solve it at all.
Suppose that we have the following code as follows:
## A data frame named a
a <- data.frame(A = c(0,0,1,1,1), B = c(1,0,1,0,0), C = c(0,0,1,1,0), D = c(0,0,1,1,0), E = c(0,1,1,0,1))
## 1st function: calculates all combinations of the colnames of a; the output is a character vector named items2
items2 <- c()
countI <- 1
while(countI <= ncol(a)){
  for(i in countI){
    countJ <- countI + 1
    while(countJ <= ncol(a)){
      for(j in countJ){
        items2 <- c(items2, paste(colnames(a[i]), colnames(a[j]), collapse = '', sep = ""))
      }
      countJ <- countJ + 1
    }
    countI <- countI + 1
  }
}
And here's the code I'm trying to fix (the output should be a numeric vector called count_1):
## 2nd function
colnames(a) <- NULL ## just for facilitating the calculation
count_1 <- numeric(ncol(a)*2)
countI <- 1
while(countI <= ncol(a)){
  for(i in countI){
    countJ <- countI + 1
    while(countJ <= ncol(a)){
      for(j in countJ){
        s <- a[, i]
        p <- a[, j]
        count_1[i*2] <- as.integer(s[i] == p[j] & s[i] == 1)
      }
      countJ <- countJ + 1
    }
    countI <- countI + 1
  }
}
But when I execute this code in the RStudio console, an unexpected result is returned:
count_1
[1] 0 0 0 0 0 1 0 1 0 0
However, I am expecting the following result:
count_1
[1] 1 2 2 2 1 1 1 1 2 1
You can visit the following URL, where you can find an image on Dropbox with a detailed explanation:
https://www.dropbox.com/s/5ylt8h8wx3zrvy7/IMAG1074.jpg?dl=0
I'll try to explain a little more. I posted the 1st function just to show exactly what I'm looking for; it's only an example.
What the second function should compute is, for each pair of columns (AB, AC, AD, AE, BC, BD, BE, CD, CE, and then DE, i.e. n!/(2!(n-2)!) combinations), the number of rows in which both columns of the pair equal 1: starting from counter = 0, the counter is incremented (counter = counter + 1) for every row where both columns of the pair are 1. For example, if I have the following data frame:
a =
A B C D E
0 1 0 0 0
0 0 0 0 1
1 1 1 1 1
1 0 0 1 0
1 0 1 0 1
Then I count the occurrences of the number 1 row by row for each pair of columns, starting with the first two columns. (Note that I put colnames(a) <- NULL just to facilitate the work and be more clear.)
### Example 1: #####################################################
So from here, for columns A and B (AB), I take:
s <- a[, i]
## s is equal to
## [1] 0 0 1 1 1
p <- a[, j]
## p is equal to
## [1] 1 0 1 0 0
Then I look for positions where the number 1 occurs in both vectors, i.e. a[, i] == 1 && a[, j] == 1 && a[, i] == a[, j]; for this example the count is [1] 1
### Example 2: #####################################################
From here, for columns A and D (AD), I take:
s <- a[, i]
## s is equal to
## [1] 0 0 1 1 1
p <- a[, j]
## p is equal to
## [1] 0 0 1 1 0
Then I look for positions where the number 1 occurs in both vectors, i.e. a[, i] == 1 && a[, j] == 1 && a[, i] == a[, j]; for this example the count is [1] 2
And so on,
I'll have a numeric vector named count_1 equal to:
[1] 1 2 2 2 1 1 1 1 2 1
where each entry of count_1 corresponds to one pair of columns (shown below with the column names for clarity):
AB AC AD AE BC BD BE CD CE DE
1 2 2 2 1 1 1 1 2 1
Not clear what you're after at all.
As to the first code chunk, that is some ugly R coding involving a whole bunch of unnecessary while/for loops.
You can get the same result items2 in a single line.
items2 <- sort(toupper(unlist(sapply(1:4, function(i)
  sapply(5:(i+1), function(j)
    paste(letters[i], letters[j], sep = ""))))));
items2;
# [1] "AB" "AC" "AD" "AE" "BC" "BD" "BE" "CD" "CE" "DE"
As to the second code chunk, please explain what you're trying to calculate. It's likely that these while/for loops are as unnecessary as in the first case.
Update
Note that this is based on a as defined at the beginning of your post. Your expected output is based on a different a, which you changed further down the post.
There is no need for for/while loops; both "functions" can each be written as a one-liner.
# Your sample dataframe a
a <- data.frame(A = c(0,0,1,1,1), B = c(1,0,1,0,0), C = c(0,0,1,1,0), D = c(0,0,1,1,0), E = c(0,1,1,0,1))
# Function 1
items2 <- toupper(unlist(sapply(1:(ncol(a) - 1), function(i) sapply(ncol(a):(i+1), function(j)
paste(letters[i], letters[j], sep = "")))));
# Function 2
count_1 <- unlist(sapply(1:(ncol(a) - 1), function(i) sapply(ncol(a):(i+1), function(j)
sum(a[, i] + a[, j] == 2))));
# Add names and sort
names(count_1) <- items2;
count_1 <- count_1[order(names(count_1))];
# Output
count_1;
#AB AC AD AE BC BD BE CD CE DE
# 1 2 2 2 1 1 1 2 1 1
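For what it's worth, base R's combn() can express the same pairwise count even more directly. A sketch (my addition, not the answerer's code), assuming the data frame a defined above:
pair_idx <- combn(ncol(a), 2)   # all column-index pairs, in AB, AC, ..., DE order
count_1_combn <- apply(pair_idx, 2, function(ix) sum(a[, ix[1]] == 1 & a[, ix[2]] == 1))
names(count_1_combn) <- combn(colnames(a), 2, paste, collapse = "")
count_1_combn
## AB AC AD AE BC BD BE CD CE DE
##  1  2  2  2  1  1  1  2  1  1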
I want to fit a Bradley-Terry model to many observers' rankings of three objects from within a larger set of objects.
My raw data looks like this:
obs1=c("A","C","D")
obs2=c("B","D","E")
obs3=c("C","B","E")
obs4=c("C","D","E")
obs5=c("C","E","D")
data=cbind(obs1,obs2,obs3,obs4,obs5)
obs1 obs2 obs3 obs4 obs5
1 A B C C C
2 C D B D E
3 D E E E D
but what I need as input to countsToBinomial{BradleyTerry2} and then to BTm{BradleyTerry2} is a table like the one below, which contains the number of times the column-name object was ranked before the row-name object by the five observers.
All the examples I found start from the counts as data, but since my number of observations is very high, I wonder if there is a way of automating this transformation.
A B C D E
A 0 0 0 0 0
B 0 0 1 0 0
C 1 0 0 0 0
D 1 1 3 0 1
E 0 2 3 2 0
Any ideas highly appreciated!
OK, here's an answer using nested for loops and match.
x <- unique(as.vector(data))
x <- sort(x)
cmatrix <- matrix(0,nrow = length(x), ncol = length(x))
colnames(cmatrix) <- x
row.names(cmatrix) <- x
This creates your output matrix as cmatrix with initial values all 0. Then we run the loops as follows...
count <- 0
for (i in 1:ncol(cmatrix)){
  for (j in 1:nrow(cmatrix)){
    for (k in 1:ncol(data)){
      if( is.na(match(colnames(cmatrix)[i], data[,k])) == FALSE ){
        if( is.na(match(row.names(cmatrix)[j], data[,k])) == FALSE ){
          if( match(colnames(cmatrix)[i], data[,k]) < match(row.names(cmatrix)[j], data[,k]) ){ count <- count + 1 }
        }
        cmatrix[j,i] <- cmatrix[j,i] + count
      }
      count <- 0
    }
  }
}
This will give you the required output table. This solution works for any number of objects, not just A to E.
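A shorter alternative (my own sketch, not the answer above) avoids the triple loop by walking each observer's ranking once and counting every ordered pair. It assumes the same data matrix and that each column lists the objects from first-ranked to last:
objs <- sort(unique(as.vector(data)))
cmatrix2 <- matrix(0, length(objs), length(objs), dimnames = list(objs, objs))
for (k in seq_len(ncol(data))) {
  ranking <- data[, k]
  for (pos in 1:(length(ranking) - 1)) {
    later <- ranking[(pos + 1):length(ranking)]
    # row = object ranked later, column = object ranked earlier
    cmatrix2[later, ranking[pos]] <- cmatrix2[later, ranking[pos]] + 1
  }
}
cmatrix2
##   A B C D E
## A 0 0 0 0 0
## B 0 0 1 0 0
## C 1 0 0 0 0
## D 1 1 3 0 1
## E 0 2 3 2 0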
I have computed a vector of the frequencies of different events, represented as fractions and sorted in descending order. I need to interface with a tool that requires positive integer percentages summing to exactly 100. I would like to generate the percentages so that they represent the input distribution as closely as possible; that is, the ratios among the percentages should match the ratios among the input fractions as well as they can, despite any non-linearities that come from cutting off a long tail.
I have a function that generates these percentages, but I don't think it is optimal or elegant. In particular, I would like to do more of the work in numeric space before resorting to "stupid integer tricks".
Here is an example frequency vector:
fractionals <- 1 / (2 ^ c(2, 5:6, 8, rep(9,358)))
And here is my function:
# Convert vector of fractions to integer percents summing to 100
percentize <- function(fractionals) {
# fractionals is sorted descending and adds up to 1
# drop elements that wouldn't round up to 1% vs. running total
pctOfCum <- fractionals / cumsum(fractionals)
fractionals <- fractionals[pctOfCum > 0.005]
# calculate initial percentages
percentages <- round((fractionals / sum(fractionals)) * 100)
# if sum of percentages exceeds 100, remove proportionally
i <- 1
while (sum(percentages) > 100) {
excess <- sum(percentages) - 100
if (i > length(percentages)) {
i <- 1
}
partialExcess <- max(1, round((excess * percentages[i]) / 100))
percentages[i] <- percentages[i] - min(partialExcess,
percentages[i] - 1)
i <- i + 1
}
# if sum of percentages shorts 100, add proportionally
i <- 1
while (sum(percentages) < 100) {
shortage <- 100 - sum(percentages)
if (i > length(percentages)) {
i <- 1
}
partialShortage <- max(1, round((shortage * percentages[i]) / 100))
percentages[i] <- percentages[i] + partialShortage
i <- i + 1
}
return(percentages)
}
Any ideas?
How about this? It rescales the values so that they should add up to 100, and if rounding leaves the sum at 99 it adds 1 to the largest frequency.
fractionals <- 1 / (2 ^ c(2, 5:6, 8, rep(9,358)))
pctOfCum <- fractionals / cumsum(fractionals)
fractionals <- fractionals[pctOfCum > 0.005]
bunnies <- as.integer(fractionals / sum(fractionals) * 100) + 1
bunnies[bunnies > 1] <- round(bunnies[bunnies > 1] * (100 -
sum(bunnies[bunnies == 1])) / sum(bunnies[bunnies > 1]))
if(sum(bunnies) < 100) bunnies[1] <- bunnies[1] + 1
> bunnies
[1] 45 6 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[26] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
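To do more of the work in numeric space, here is a hedged sketch of an alternative (my addition, not part of the answer above) based on largest-remainder (Hamilton) rounding: keep the question's tail cut, give every retained entry a floor of 1%, and hand out the remaining points by largest fractional part, so the result sums to exactly 100 without any repair loops. It assumes that no more than 100 entries survive the cut.
percentize_lr <- function(fractionals, total = 100) {
  # same tail cut as in the question: drop entries too small to matter
  pctOfCum <- fractionals / cumsum(fractionals)
  fractionals <- fractionals[pctOfCum > 0.005]
  n <- length(fractionals)
  # every kept entry gets 1%; apportion the remaining (total - n) points
  extra_target <- fractionals / sum(fractionals) * (total - n)
  extra <- floor(extra_target)
  leftover <- (total - n) - sum(extra)
  # give the leftover points to the entries with the largest fractional parts
  idx <- order(extra_target - extra, decreasing = TRUE)[seq_len(leftover)]
  extra[idx] <- extra[idx] + 1
  1 + extra
}
p <- percentize_lr(1 / (2 ^ c(2, 5:6, 8, rep(9, 358))))
sum(p)    # 100, with every retained entry >= 1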