R: Replace values in multiple columns based on a specified condition?

How can I replace the 2nd to 7th values in the first row, changing "N" to "Y"? The first value should stay "N".
SOC_023 SOC_040 SOC_044 SOC_055 SOC_079 SOC_089 SOC_090
1 N N N N N N N
2 N N N N N N Y
3 N N N N N Y N
My desired outcome is:
1 N Y Y Y Y Y Y
Many thanks,
A.

a <- read.table("a.txt", sep = '\t', header=TRUE, stringsAsFactors=FALSE)
a
SOC_023 SOC_040 SOC_044 SOC_055 SOC_079 SOC_089 SOC_090
1 N N N N N N N
2 N N N N N N Y
3 N N N N N Y N
a[1,2:7] <- "Y"
a
SOC_023 SOC_040 SOC_044 SOC_055 SOC_079 SOC_089 SOC_090
1 N Y Y Y Y Y Y
2 N N N N N N Y
3 N N N N N Y N
OK, it's a bit tricky but possible. We want to change N to Y only in rows where columns 2:7 contain nothing but N, so I add a helper column, new, to the data frame b (which holds the data shown above). The value is TRUE if the row has at least one Y in columns 2:7, and FALSE if it has only N. I use
b$new <- apply(b[,2:7], 1, function(x) any(x %in% c("Y")))
SOC_023 SOC_040 SOC_044 SOC_055 SOC_079 SOC_089 SOC_090 new
1 N N N N N N N FALSE
2 N N N N N N Y TRUE
3 N N N N N Y N TRUE
Then, wherever the column new is FALSE, we can put the value Y in columns 2:7:
b[,2:7][b$new==FALSE ,] <- "Y"
So we have the desired result:
SOC_023 SOC_040 SOC_044 SOC_055 SOC_079 SOC_089 SOC_090 new
1 N Y Y Y Y Y Y FALSE
2 N N N N N N Y TRUE
3 N N N N N Y N TRUE
In summary, in every row where columns 2:7 contain only N, those values are replaced with Y.
Of course we don't need the column new any more, so we can remove it with
b$new <- NULL
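The same replacement can also be sketched without a helper column. This is only an illustration: the data frame b below is a hand-typed copy of the example data, and only_n is a hypothetical name.

```r
# Hand-typed copy of the example data (an assumption, not read from a.txt)
b <- data.frame(
  SOC_023 = c("N", "N", "N"),
  SOC_040 = c("N", "N", "N"),
  SOC_044 = c("N", "N", "N"),
  SOC_055 = c("N", "N", "N"),
  SOC_079 = c("N", "N", "N"),
  SOC_089 = c("N", "N", "Y"),
  SOC_090 = c("N", "Y", "N"),
  stringsAsFactors = FALSE
)

# TRUE for rows that have no "Y" anywhere in columns 2:7
only_n <- rowSums(b[, 2:7] == "Y") == 0

# Overwrite columns 2:7 in those rows only; column 1 is left alone
b[only_n, 2:7] <- "Y"
b
```

Indexing rows and columns in a single `[` call avoids creating and then deleting the extra column.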
OK, now count the occurrences in each column and draw a barplot:
x <- apply(a, 2, table)
y <- do.call(rbind, x)
A simple barplot with base R:
z <- as.data.frame(t(y))
barplot(data.matrix(z[1:2,]), col=c("darkblue","red"),beside=TRUE)
The x-axis labels will spread out when you plot it yourself.
There is another way to get this plot, using the ggplot2 package, but I would have to rebuild the data file, which is a bit time consuming. Cheers!
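One caveat with apply(a, 2, table): a column that contains only one of the two values (such as SOC_023, which is all N) yields a table of length 1, so the per-column results no longer line up when they are bound together. A minimal sketch of one way around this, assuming the only possible values are "N" and "Y" and using a hand-typed three-column subset of the data:

```r
# Hand-typed subset of the example data (an assumption for illustration)
a <- data.frame(
  SOC_023 = c("N", "N", "N"),
  SOC_089 = c("N", "N", "Y"),
  SOC_090 = c("N", "Y", "N"),
  stringsAsFactors = FALSE
)

# Fixing the factor levels forces every column to report both counts,
# including a zero for a value that never occurs
counts <- sapply(a, function(col) table(factor(col, levels = c("N", "Y"))))
counts

# To draw the plot, uncomment the next line:
# barplot(counts, beside = TRUE, col = c("darkblue", "red"), legend.text = rownames(counts))
```

Here counts is a 2-row matrix (rows N and Y, one column per variable), which barplot can take directly.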
>dat
SOC_023 SOC_040 SOC_044 SOC_055 SOC_079 SOC_089 SOC_090
1 N Y Y Y Y Y Y
2 N N N N N N Y
3 N N N N N N N
4 N N N N N Y N
5 N Y N Y N N N
6 Y Y Y Y Y Y Y
dat$new <- apply(dat[,1:7], 1, function(x) all(x == "Y") || all(x == "N"))
result <- dat[dat$new!=TRUE, ]
result$new <- NULL
> result
SOC_023 SOC_040 SOC_044 SOC_055 SOC_079 SOC_089 SOC_090
1 N Y Y Y Y Y Y
2 N N N N N N Y
4 N N N N N Y N
5 N Y N Y N N N

Related

Tensorflow: Find greater than pairs and stack along axis

The problem I have using tensorflow is as follows:
For one tensor X with dims n X m
X = [[x11,x12...,x1m],[x21,x22...,x2m],...[xn1,xn2...,xnm]]
I want to get an n X m X m tensor which are n m X m matrices
Each m X m matrix is the result of:
tf.math.greater(tf.reshape(x,(-1,1)), x) where x is a row of X
In words, for every row k in X, I'm trying to get the pairs i,j where xki > xkj. This gives me a matrix, and then I want to stack those matrices along the first axis to get an n X m X m cube.
Example:
X = [[1,2],[4,3], [5,7]]
Result = [[[False, False],[True, False]],[[False, True],[False, False]], [[False, False],[True, False]]]
Result has shape 3 X 2 X 2
Reshaping each row is the same as reshaping all rows. Try this:
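import tensorflow as tf

def fun(X):
    n, m = X.shape
    X1 = tf.expand_dims(X, -1)      # shape (n, m, 1): each row as a column
    X2 = tf.reshape(X, (n, 1, m))   # shape (n, 1, m): each row as a row
    return tf.math.greater(X1, X2)  # broadcasts to shape (n, m, m)

X = tf.Variable([[1,2],[4,3], [5,7]])
print(fun(X))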
Output:
tf.Tensor(
[[[False False]
[ True False]]
[[False True]
[False False]]
[[False False]
[ True False]]], shape=(3, 2, 2), dtype=bool)

placing value between specific numbers in cycle

Let's say I have
x = 1,4,2
i = 2
j = 4
k = 3
So i = 2 and j = 4. The point is that I need to place k (3) between the numbers i and j in x, so the result would be x = 1,4,3,2. I need this to work in a loop, because the values of i, j and k keep changing, and so does the length of x each time a new k is placed in it. The new x after step one is
x = 1,4,3,2, and let's say the new values are:
i = 4
j = 3
k = 5
So the loop should again place 5 in x between 4 and 3, giving the final x = 1,4,5,3,2.
Is there a way I could do this?
If i is always the number directly before j, you could use the append function, e.g.:
x = c(1,4,2)
i = 4
k = 3
x <- append(x, k, match(i, x))
x
[1] 1 4 3 2
i = 4
k = 5
x <- append(x, k, match(i, x))
x
[1] 1 4 5 3 2
Putting this in a function:
insert <- function(x, k, i){
  append(x, k, match(i, x))
}
Note that you did not specify what should happen if there is more than one 4 in your vector, e.g. x <- c(1,4,2,4,2). Where exactly do you want to place the 3: after the first 4 or after the second?
You can try this function:
insert_after <- function(x, i, k) {
  ind <- match(i, x)
  new_inds <- sort(c(seq_along(x), ind))
  new_x <- x[new_inds]
  new_x[duplicated(new_inds)] <- k
  new_x
}
x = c(1,4,2)
x <- insert_after(x, 4, 3)
x
#[1] 1 4 3 2
x <- insert_after(x, 4, 5)
x
#[1] 1 4 5 3 2
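If i can occur more than once in x, one option is to use j as well and look for the place where i is immediately followed by j. The function below is a hypothetical sketch (insert_between is not from either answer); it inserts k after the first such pair and returns x unchanged when there is none.

```r
insert_between <- function(x, i, j, k) {
  # Positions p where x[p] == i and x[p + 1] == j
  pos <- which(head(x, -1) == i & tail(x, -1) == j)
  # No adjacent (i, j) pair found: return x as it is
  if (length(pos) == 0) return(x)
  # Insert k right after the first matching position
  append(x, k, after = pos[1])
}

x <- c(1, 4, 2)
x <- insert_between(x, i = 4, j = 2, k = 3)
x <- insert_between(x, i = 4, j = 3, k = 5)
x
```

With a repeated value, e.g. insert_between(c(4, 1, 4, 2), 4, 2, 3), the 3 goes after the second 4, because only that 4 is directly followed by 2.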

R - build a matrix from other matrices with linking information [duplicate]

I need to build a matrix from data that is stored in several other matrices that all have a pointer in their first column. This is how the original matrices might look (two matrices are shown, one after the other), with a-e being the pointers connecting the data from all the matrices and v-z being the data that is linked together. The arrow points to what I want my final matrix to look like.
a x x
b y y
c z z
d w w
e v v
e v v
d w w
c z z
b y y
a x x
----->
x x x x
y y y y
z z z z
w w w w
v v v v
I can't seem to write the right algorithm to do this; I am getting either "subscript out of bounds" errors or "replacement has length zero" errors. Here is what I have now, but it is not working.
for(i in 1:length(matlist)){
  tempmatrix = matlist[[i]] # list of matrices to be combined
  genMatrix[1,i] = tempmatrix[1,2]
  for(j in 2:length(tempmatrix[,1])){
    # the row index for the data that needs to be matched with an ECID
    index = which(indexv == tempmatrix[j,1])
    for(k in 1:length(tempmatrix[1,])){
      genMatrix[index,k+i] = tempmatrix[j,k]
    }
    # places the data in same row as the ecid
  }
}
print(genMatrix)
EDIT: I just want to clarify that my example only shows two matrices but in the list matlist there can be any number of matrices. I need to find a way of merging them without having to know how many matrices are in matlist at the time.
We can merge all the matrices in the list using Reduce and merge from base R.
as.matrix(read.table(text="a x x
b y y
c z z
d w w
e v v")) -> mat1
as.matrix(read.table(text="e v v
d w w
c z z
b y y
a x x")) -> mat2
as.matrix(read.table(text="e x z
d z w
c w v
b y x
a v y")) -> mat3
matlist <- list(mat1=mat1, mat2=mat2, mat3=mat3)
Reduce(function(m1, m2) merge(m1, m2, by = "V1", all.x = TRUE),
       matlist)[,-1]
#> V2.x V3.x V2.y V3.y V2 V3
#> 1 x x x x v y
#> 2 y y y y y x
#> 3 z z z z w v
#> 4 w w w w z w
#> 5 v v v v x z
Created on 2019-06-05 by the reprex package (v0.3.0)
Or we can append all the matrices together and then use tidyr to go from long to wide and get the desired output.
library(tidyr)
library(dplyr)
bind_rows(lapply(matlist, as.data.frame), .id = "mat") %>%
  gather(matkey, val, c("V2","V3")) %>%
  unite(matkeyt, mat, matkey, sep = ".") %>%
  spread(matkeyt, val) %>%
  select(-V1)
#> mat1.V2 mat1.V3 mat2.V2 mat2.V3 mat3.V2 mat3.V3
#> 1 x x x x v y
#> 2 y y y y y x
#> 3 z z z z w v
#> 4 w w w w z w
#> 5 v v v v x z
Created on 2019-06-06 by the reprex package (v0.3.0)
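For matrices that should stay matrices, the rows can also be lined up with match() and the data columns bound together with cbind(). This is a minimal sketch with small hand-typed matrices; it assumes every key in the first matrix also appears in the others (a missing key would give a row of NA), and keys and aligned are hypothetical names.

```r
# Small hand-typed example: first column is the key, the rest is data
mat1 <- matrix(c("a", "b", "c",
                 "x", "y", "z",
                 "x", "y", "z"), ncol = 3)
mat2 <- matrix(c("c", "b", "a",
                 "z", "y", "x",
                 "z", "y", "x"), ncol = 3)
matlist <- list(mat1, mat2)  # works for any number of matrices

# Use the key order of the first matrix as the reference
keys <- matlist[[1]][, 1]

# Reorder each matrix to that key order and drop its key column
aligned <- lapply(matlist, function(m) m[match(keys, m[, 1]), -1, drop = FALSE])

# Bind the data columns side by side
genMatrix <- do.call(cbind, aligned)
genMatrix
```

Because lapply and do.call work on the whole list, the number of matrices in matlist does not need to be known in advance.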

Randomise across columns for half a dataset

I have a data set for MMA bouts.
The structure currently is
Fighter 1, Fighter 2, Winner
x y x
x y x
x y x
x y x
x y x
My problem is that Fighter 1 = Winner so my model will be trained that fighter 1 always wins, which is a problem.
I need to be able to randomly swap Fighter 1 and Fighter 2 for half the data set in order to have the winner represented equally.
Ideally I would have this:
Fighter 1, Fighter 2, Winner
x y x
y x x
x y y
y x x
x y y
Is there a way to randomise across columns without messing up the order of the rows?
I'm assuming your xs and ys are arbitrary and just placeholders. I'll further assume that you need the Winner column to stay the same, you just need that the winner not always be in the first column.
Sample data:
set.seed(42)
x <- data.frame(
F1 = sample(letters, size = 5),
F2 = sample(LETTERS, size = 5),
stringsAsFactors = FALSE
)
x$W <- x$F1
x
# F1 F2 W
# 1 x N x
# 2 z S z
# 3 g D g
# 4 t P t
# 5 o W o
Choose some rows to change, randomly:
(ind <- sample(nrow(x), size = ceiling(nrow(x)/2)))
# [1] 3 5 4
This means that we expect rows 3-5 to change.
Now the random changes:
within(x, { tmp <- F1[ind]; F1[ind] = F2[ind]; F2[ind] = tmp; rm(tmp); })
# F1 F2 W
# 1 x N x
# 2 z S z
# 3 D g g
# 4 P t t
# 5 W o o
Rows 1-2 still show the F1 as the Winner, and rows 3-5 show F2 as the Winner.
I also found that this code worked
matches_clean[, c("fighter1", "fighter2")] <- lapply(matches_clean[, c("fighter1", "fighter2")], as.character)
changeInd <- !!((match(matches_clean$fighter1, levels(as.factor(matches_clean$fighter1))) -
match(matches_clean$fighter2, levels(as.factor(matches_clean$fighter2)))) %% 2)
matches_clean[changeInd, c("fighter1", "fighter2")] <- matches_clean[changeInd, c("fighter2", "fighter1")]
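For comparison, a random swap of exactly half the rows can be sketched in a single assignment. The data frame fights and its contents are made up for illustration; only the column names follow the code above.

```r
set.seed(1)

# Made-up bouts in which fighter1 is always the winner
fights <- data.frame(
  fighter1 = c("a", "b", "c", "d"),
  fighter2 = c("w", "x", "y", "z"),
  stringsAsFactors = FALSE
)
fights$winner <- fights$fighter1

# Pick half of the row indices at random
swap <- sample(nrow(fights), size = floor(nrow(fights) / 2))

# Swap the two fighter columns in those rows; winner and row order stay as they are
fights[swap, c("fighter1", "fighter2")] <- fights[swap, c("fighter2", "fighter1")]
fights
```

After the swap the winner appears in fighter2 for the sampled rows and in fighter1 for the rest.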

how to do triple summations in matrices

I have a triple summation expression like this:
sum(l from 1 to n) sum(i from 1 to m) sum(t from 1 to m) phil_z1_1[i] * phil_z1_1[t] * I(x[l] < min(y[i], y[t]))
I have done:
set.seed(1234567)
x <- rnorm(2900)
n <- length(x)
y <- rnorm(3000)*0.25
m <-length(y)
z1 <- runif(m,min=0,max=1)
z2 <- runif(m,min=0,max=1)
phil_z1_1 <- sqrt(12*(z1/z2))
For min(y[i],y[t]) I have done something like:
y_m<-matrix(rep(y,length(y)),ncol=length(y))
y_m_t<-t(y_m)
y_min<-pmin(y_m_t,y_m)
After expanding the two inner summations, for example with m=2 and n=3, I can write the original expression in matrix form as x*A*x',
where
x = [phil_z1_1[1] phil_z1_1[2]]
and A is a 2*2 matrix:
A = [sum(l from 1 to n) I(x[l]<=min(y[1],y[1])), sum(l from 1 to n) I(x[l]<=min(y[1],y[2]));
     sum(l from 1 to n) I(x[l]<=min(y[2],y[1])), sum(l from 1 to n) I(x[l]<=min(y[2],y[2]))]
Therefore,
x*A*x' = [phil_z1_1[1] phil_z1_1[2]] * A * [phil_z1_1[1] phil_z1_1[2]]'
Basically, I want to create an m*m matrix A in which each element is equal to the corresponding sum; for example, sum(l from 1 to n) I(x[l]<=min(y[1],y[1])) will be the element a11 of the matrix A I want to create.
I have tried to use
args <- expand.grid(l=1:n, i=1:m, t=1:m)
args <- subset(args, x[l] <= pmin(y[i],y[t])-z1[i]*z2[t])
args <- transform(args, result=phil_z1_1[i]*phil_z1_1[t])
sum(args[,"result"])
But R cannot run the above code, because the data set is too big (around 3,000 observations).
Can someone tell me how to solve this problem?
Thanks in advance!
Here is a matrix approach for your triple sum
set.seed(1234567)
n <- 10
x <- rnorm(n)
m <- 3000
y <- rnorm(m)/4
y_m <- pmin(matrix(rep(y,m), ncol=m, byrow=TRUE), y)
z1 <- runif(m,min=0,max=1)
z2 <- runif(m,min=0,max=1)
phi <- sqrt(12*(z1/z2))
phi_m <- phi %o% phi
f1 <- function(l) sum(phi_m * (x[l] < y_m))
sum(sapply(1:n, f1))
[1] 242034847337
It is not lightning fast, but much faster than the data.frame approach
f2 <- function(lrng) {
  args <- expand.grid(l=lrng, i=1:m, t=1:m)
  args <- subset(args, x[l] <= pmin(y[i],y[t]))
  args <- transform(args, result=phi[i]*phi[t])
  sum(args[,"result"])
}
sum(sapply(1:n, f2)) # 90 times slower
[1] 242034847337
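The matrix A from the question can also be built without looping over l. The count of x values below a threshold never decreases as the threshold grows, so the sum over l of I(x[l] < min(y[i], y[t])) equals the smaller of the two per-element counts. The sketch below uses deliberately small n and m so that it can be checked against a brute-force triple loop; cnt, fast and slow are hypothetical names.

```r
set.seed(1)
n <- 20
m <- 15
x <- rnorm(n)
y <- rnorm(m) / 4
z1 <- runif(m)
z2 <- runif(m)
phi <- sqrt(12 * (z1 / z2))

# cnt[i] = number of x values strictly below y[i]
cnt <- colSums(outer(x, y, "<"))

# A[i, t] = sum over l of I(x[l] < min(y[i], y[t])) = min(cnt[i], cnt[t])
A <- outer(cnt, cnt, pmin)

# Quadratic form phi * A * phi'
fast <- drop(t(phi) %*% A %*% phi)

# Brute-force triple loop for comparison
slow <- 0
for (l in 1:n) for (i in 1:m) for (t in 1:m) {
  slow <- slow + phi[i] * phi[t] * (x[l] < min(y[i], y[t]))
}

all.equal(fast, slow)
```

This needs one n-by-m comparison and one m-by-m matrix instead of n passes over an m-by-m matrix, which should help at the sizes in the question (n = 2900, m = 3000), although the actual speed-up has not been measured here.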
