Subtraction matrix in R

I want to compute the subtraction matrix: the matrix obtained by subtracting each row from every other row. My MWE is below, but it does not work as expected. The result should be a 36 x 3 matrix containing the difference for each pair of rows. Thanks
X <-
matrix(
data=
c(
5, 9, 20
, 6, 11, 2
, 4, 5, 20
, 6, 9, 46
, 5, 7, 1
, 3, 1, 12
)
, nrow = 6
, ncol = 3
, byrow=TRUE
)
XSub <-
matrix(data=NA, nrow=nrow(X)^2, ncol=ncol(X))
for(i in 1:nrow(X)){
for(j in 1:nrow(X)){
XSub[i+j-1, ] <- X[i, ]-X[j,]
}
}
XSub

I believe this is what you might want (avoids using a loop):
X <-
matrix(
data=
c(
5, 9, 20
, 6, 11, 2
, 4, 5, 20
, 6, 9, 46
, 5, 7, 1
, 3, 1, 12
)
, nrow = 6
, ncol = 3
, byrow=TRUE
)
comb <- expand.grid(x1=1:nrow(X), x2=1:nrow(X))
XSub <- X[comb$x1,] - X[comb$x2,]
rownames(XSub) <- paste(comb$x1, comb$x2, sep="-")
Results in the following:
> XSub
[,1] [,2] [,3]
1-1 0 0 0
2-1 1 2 -18
3-1 -1 -4 0
4-1 1 0 26
5-1 0 -2 -19
6-1 -2 -8 -8
1-2 -1 -2 18
2-2 0 0 0
3-2 -2 -6 18
4-2 0 -2 44
5-2 -1 -4 -1
6-2 -3 -10 10
1-3 1 4 0
2-3 2 6 -18
3-3 0 0 0
4-3 2 4 26
5-3 1 2 -19
6-3 -1 -4 -8
1-4 -1 0 -26
2-4 0 2 -44
3-4 -2 -4 -26
4-4 0 0 0
5-4 -1 -2 -45
6-4 -3 -8 -34
1-5 0 2 19
2-5 1 4 1
3-5 -1 -2 19
4-5 1 2 45
5-5 0 0 0
6-5 -2 -6 11
1-6 2 8 8
2-6 3 10 -10
3-6 1 4 8
4-6 3 8 34
5-6 2 6 -11
6-6 0 0 0
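For reference, the loop in the question fails because the index i+j-1 only takes the values 1 to 11, so rows get overwritten and rows 12 to 36 stay NA. Below is a minimal sketch of a corrected loop, assuming the same row ordering as the expand.grid approach above (x1 varying fastest). The names n, XSubLoop and XSubVec are introduced here for illustration only.

```r
X <- matrix(c(5, 9, 20,
              6, 11, 2,
              4, 5, 20,
              6, 9, 46,
              5, 7, 1,
              3, 1, 12),
            nrow = 6, ncol = 3, byrow = TRUE)

n <- nrow(X)
XSubLoop <- matrix(NA_real_, nrow = n^2, ncol = ncol(X))

# Each (i, j) pair gets its own row: block j holds X[i, ] - X[j, ] for i = 1..n
for (j in seq_len(n)) {
  for (i in seq_len(n)) {
    XSubLoop[(j - 1) * n + i, ] <- X[i, ] - X[j, ]
  }
}

# Vectorised version from the answer, for comparison
comb <- expand.grid(x1 = seq_len(n), x2 = seq_len(n))
XSubVec <- X[comb$x1, ] - X[comb$x2, ]

all.equal(XSubLoop, XSubVec)
```

The vectorised version is likely to be preferable for larger matrices; the loop is shown only to make the indexing explicit.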

Related

Conditional cumsum with reset when accumulating and subtracting at once

I have a data frame with three variables i.e. V1, V2 and V3.
ts<- c(-2, 4, 3,-5,-5,-7, -8, -2, -3, -5,-7, -8, -9, -2, 1, 2,4)
x<- c(6, 0, 0 ,1, 0, 2, 3,5,7,7,8,2,0, 0, 0 , 0, 0)
y<- c(0, 5, 8, 0, 0 , 0 , 0 , 0 , 0, 0, 0, 0, 0, 7, 9, 12, 0)
ve <- data.frame(V1 = ts, V2 = x, V3 =y)
I applied conditional cumsum with the code given below:
ve$yt<- cumsum(ifelse(ve$V1>0, ve$V2-(ve$V3), ve$V2))
This code does the job only until the running total goes negative. My desired output (DO) is different: whenever the running total in column yt would drop below zero, I want it reset to 0 and the accumulation restarted from there, as shown in the table below.
View(ve)
V1 V2 V3 yt DO
-2 6 0 6 6
4 0 5 1 1
3 0 8 -7 0
-5 1 0 -6 1
-5 0 0 -6 1
-7 2 0 -4 3
-8 3 0 -1 6
-2 5 0 4 11
-3 7 0 11 18
-5 7 0 18 25
-7 8 0 26 33
-8 2 0 28 35
-9 0 0 28 35
-2 0 7 28 35
1 0 9 19 26
2 0 12 7 14
4 0 0 7 14
I searched for similar problems but could not find an answer that solves mine. These are some of the questions I looked at:
Conditional cumsum with reset
resetting cumsum if value goes to negative in r
I would appreciate any help with this.
Here is one way you might do this:
ve$DO <- Reduce(function(x,y) pmax(x + y, 0), with(ve, V2-V3*(V1 > 0)), accumulate = TRUE)
ve
V1 V2 V3 DO
1 -2 6 0 6
2 4 0 5 1
3 3 0 8 0
4 -5 1 0 1
5 -5 0 0 1
6 -7 2 0 3
7 -8 3 0 6
8 -2 5 0 11
9 -3 7 0 18
10 -5 7 0 25
11 -7 8 0 33
12 -8 2 0 35
13 -9 0 0 35
14 -2 0 7 35
15 1 0 9 26
16 2 0 12 14
17 4 0 0 14
Equivalent using purrr/dplyr:
library(purrr)
library(dplyr)
ve %>%
mutate(DO = accumulate(V2-V3*(V1 > 0), .f = ~pmax(.x + .y, 0)))
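To make the reset logic explicit, here is a rough base R sketch of the same idea written as a plain loop. The names step, running and DO_loop are illustrative, not from the original answer. One difference to be aware of: this loop starts from 0 and so also clamps the first value at 0, whereas Reduce without an init value passes the first element through unchanged; with the data above the two agree.

```r
ts <- c(-2, 4, 3, -5, -5, -7, -8, -2, -3, -5, -7, -8, -9, -2, 1, 2, 4)
x  <- c(6, 0, 0, 1, 0, 2, 3, 5, 7, 7, 8, 2, 0, 0, 0, 0, 0)
y  <- c(0, 5, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 9, 12, 0)
ve <- data.frame(V1 = ts, V2 = x, V3 = y)

# Per-row increment: V3 is subtracted only where V1 is positive
step <- with(ve, V2 - V3 * (V1 > 0))

DO_loop <- numeric(nrow(ve))
running <- 0
for (k in seq_along(step)) {
  # Add the increment, then clamp at zero so the sum restarts from 0
  running <- max(running + step[k], 0)
  DO_loop[k] <- running
}

ve$DO <- DO_loop
ve
```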

In R: how to take the value from the (i+1)th row of one data frame and subtract it from every row in the (i+1)th column of a second data frame

Note that the actual dataset has thousands of columns and hundreds of rows, so I am looking for a way that does not require me to manually name either columns or rows.
With a dataset that has similar structure as follows:
subvalues <- c(1:10)
df <- data.frame(x = rpois(40,2), y = rpois(40,2), z = rpois(40,2), q = rpois(40,2), t = rpois(40,2))
Call the elements of subvalues SVa, SVb, SVc...
Call the rows of the data frame's columns Xa, Xb, Xc... Ya, Yb, Yc... etc.
What I am trying to build is a function that first takes the first element of subvalues (SVa) and subtracts it from every row in column x (Xa, Xb, Xc, etc.), then takes the second element (SVb) and subtracts it from every row in column y (Ya, Yb, Yc, etc.), and so on.
What I have so far is:
res <- numeric(length = length(x))
for (i in seq_along(x)) {
res[i] <- xpos - [**SVi+1**]
}
res
I need to figure out the 'SVi+1' part and how to properly write the loop within a loop.
Any help is much appreciated
The example dataset you provide won't work, because the length of subvalues must equal the number of columns in df.
After some modifications, here is an example. You don't need to extract the value from subvalues separately, as it's just a subtraction.
Note that I've copied df to tmp, so the data.frame can be modified without losing your initial data. Also, if the entire data.frame is numeric, consider using a matrix, which can save you time.
subvalues <- c(1:5) # Note here the length 5 for the 5 columns of df.
df <- data.frame(x = rpois(40,2), y = rpois(40,2), z = rpois(40,2), q = rpois(40,2), t = rpois(40,2))
tmp <- df
for(i in seq_along(subvalues)){
# print(subvalues[i])
tmp[,i] <- tmp[,i] - subvalues[i]
}
tmp[,i] returns column i of the data.frame as a vector, so you can subtract a value from that vector and save the result back in its original place.
Alternatively, you can use replicate to build a matrix with the same dimensions as df and do the subtraction afterwards, i.e.,
dfout <- df - t(replicate(nrow(df),subvalues))
such that
> dfout
x y z q t
1 0 1 -1 2 -4
2 0 0 0 -2 -1
3 1 1 -2 -2 -3
4 3 0 -2 -3 -2
5 0 0 0 -1 -1
6 3 1 -2 -2 -3
7 3 -2 0 -2 -5
8 1 0 -3 -3 -4
9 1 1 -2 -3 -2
10 -1 1 -2 -2 -4
11 0 0 -2 -2 -3
12 0 2 -3 -4 -2
13 2 0 -1 -4 -2
14 0 -1 1 -2 -4
15 2 -2 0 0 -4
16 1 -2 0 -2 -1
17 2 -1 -1 -2 -3
18 5 0 -1 -2 -2
19 0 0 0 2 -3
20 2 0 -1 -2 -1
21 3 2 -1 -1 -4
22 0 -1 -2 -2 -4
23 1 0 -2 -3 -1
24 -1 -1 3 -3 -3
25 0 0 -1 -1 -1
26 0 -1 -2 -2 -4
27 -1 0 -3 -3 -2
28 0 1 -1 -1 -2
29 3 -2 1 -4 -1
30 0 2 -1 0 -3
31 1 -1 2 -2 -2
32 1 1 0 -2 -4
33 1 -1 -2 -3 -5
34 0 -1 -1 -2 -1
35 2 0 -2 -2 -4
36 1 2 -3 -3 -3
37 2 2 0 -2 -5
38 -1 -1 -3 -4 -2
39 2 1 -1 -3 -4
40 1 3 -1 -3 -2
DATA
set.seed(1)
subvalues <- c(1:5) # Note here the length 5 for the 5 columns of df.
df <- data.frame(x = rpois(40,2), y = rpois(40,2), z = rpois(40,2), q = rpois(40,2), t = rpois(40,2))
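Another option worth considering is base R's sweep(), which subtracts a vector of per-column values without an explicit loop or a replicated matrix. This is a sketch only, assuming every column is numeric so that the data frame can be converted to a matrix; the names out_sweep and out_loop are introduced here for illustration.

```r
set.seed(1)
subvalues <- 1:5  # one value per column of df
df <- data.frame(x = rpois(40, 2), y = rpois(40, 2), z = rpois(40, 2),
                 q = rpois(40, 2), t = rpois(40, 2))

# MARGIN = 2 applies subvalues[i] to column i
out_sweep <- sweep(as.matrix(df), 2, subvalues, FUN = "-")

# Same result via the loop from the first answer, for comparison
out_loop <- df
for (i in seq_along(subvalues)) {
  out_loop[, i] <- out_loop[, i] - subvalues[i]
}

all(out_sweep == as.matrix(out_loop))
```

Since no column or row names are used, the same call should carry over to a wide dataset as long as length(subvalues) equals ncol(df).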

Consecutive values below threshold value

I have a vector of approximately 1000 years of PDSI values that range from 6 to -6. I'd like a quick way to find runs of four or more consecutive years with values <= -2. The result needs to be a vector of the same length as the original 1000 so that I can plot them together. The end product could even be a logical vector. Here's an example of what I have and what I'd like.
Original <- c(1,6,5,-2,-6,-4,-2,0,1,-2,-3,0)
New <- c(0,0,0,1,1,1,1,0,0,0,0,0) # expected output
You can try the following
x <- c(rep(-2, 5), rep(0, 3), rep(-2, 4), rep(0,6), rep(5, 2), rep(-2, 6), rep(-6, 2))
(r <- rle(x))
## Run Length Encoding
## lengths: int [1:7] 5 3 4 6 2 6 2
## values : num [1:7] -2 0 -2 0 5 -2 -6
(r$lengths[r$lengths > 3 & r$values== -2]) # length of each sequence
## [1] 5 4 6
To get the vector with only a sequence of "-2" you can try
r$values[r$values != -2] <- 0
rep(r$values, r$lengths)
## [1] -2 -2 -2 -2 -2 0 0 0 -2 -2 -2 -2 0 0 0 0 0 0 0 0 -2 -2 -2 -2 -2 -2 0 0
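The example above matches values exactly equal to -2. To handle the threshold from the question (values <= -2, runs of at least four), one possible adaptation is to run rle on the logical vector Original <= -2 instead. This is a sketch using the question's own example data; the name keep is introduced here for illustration.

```r
Original <- c(1, 6, 5, -2, -6, -4, -2, 0, 1, -2, -3, 0)

# Run-length encode the below-threshold indicator
r <- rle(Original <= -2)

# Keep only runs that are below threshold AND at least 4 long
keep <- r$values & r$lengths >= 4

# Expand back to the original length
New <- as.integer(rep(keep, r$lengths))
New
## [1] 0 0 0 1 1 1 1 0 0 0 0 0
```

Dropping as.integer() leaves a logical vector of the same length, which can be used directly for subsetting or plotting.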

Creating variables from a matrix in R

I'm trying to create a set of variables from a matrix. This is my code:
matrix<-cbind(paste("a",letters[1:11],sep=""),
paste("b",letters[1:11],sep=""),
paste("c",letters[1:11],sep=""),
paste("d",letters[1:11],sep=""),
paste("e",letters[1:11],sep=""),
paste("f",letters[1:11],sep=""),
paste("g",letters[1:11],sep=""),
paste("h",letters[1:11],sep=""),
paste("i",letters[1:11],sep=""),
paste("j",letters[1:11],sep=""),
paste("k",letters[1:11],sep=""))
So I've got a matrix with all the combinations of the letters: aa, ab, ac and so on.
What can I do if I want to create variables with the same names and assign a value to each?
For example:
aa<-0
ab<-0
and so on. Is there a way to do this automatically? Thanks
Consider this alternate strategy:
> m <- matrix(NA, 10, 10, dimnames=list(letters[1:10], letters[1:10]) )
> m[] <- outer(1:10, 1:10, FUN="-")
> m
a b c d e f g h i j
a 0 -1 -2 -3 -4 -5 -6 -7 -8 -9
b 1 0 -1 -2 -3 -4 -5 -6 -7 -8
c 2 1 0 -1 -2 -3 -4 -5 -6 -7
d 3 2 1 0 -1 -2 -3 -4 -5 -6
e 4 3 2 1 0 -1 -2 -3 -4 -5
f 5 4 3 2 1 0 -1 -2 -3 -4
g 6 5 4 3 2 1 0 -1 -2 -3
h 7 6 5 4 3 2 1 0 -1 -2
i 8 7 6 5 4 3 2 1 0 -1
j 9 8 7 6 5 4 3 2 1 0
Now you can access a single element with a letter pair:
m['d','f']
[1] -2
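If names such as aa, ab, ... are really needed, a named vector may be a reasonable middle ground between separate variables and a matrix. The sketch below assumes every entry starts at 0, as in the question; the names nm and vals are introduced here for illustration.

```r
# Build all two-letter combinations of the first 11 letters
nm <- as.vector(outer(letters[1:11], letters[1:11], paste0))

# One named slot per combination, all initialised to 0
vals <- setNames(numeric(length(nm)), nm)

vals["ab"]         # look up a single entry by name
vals["ab"] <- 5    # assign to it
length(vals)
```

This keeps all 121 values in one object, so they can be updated or iterated over together rather than as 121 separate variables in the workspace.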

Calculating change from baseline with data in long format

Here is a small reproducible example of my data:
> mydata <- structure(list(subject = c(1, 1, 1, 2, 2, 2), time = c(0, 1, 2, 0, 1, 2), measure = c(10, 12, 8, 7, 0, 0)), .Names = c("subject", "time", "measure"), row.names = c(NA, -6L), class = "data.frame")
> mydata
subject time measure
1 0 10
1 1 12
1 2 8
2 0 7
2 1 0
2 2 0
I would like to generate a new variable that is the "change from baseline". That is, I would like
subject time measure change
1 0 10 0
1 1 12 2
1 2 8 -2
2 0 7 0
2 1 0 -7
2 2 0 -7
Is there an easy way to do this, other than looping through all the records programmatically or reshaping to wide format first?
There are many possibilities. My favorites:
library(plyr)
ddply(mydata,.(subject),transform,change=measure-measure[1])
subject time measure change
1 1 0 10 0
2 1 1 12 2
3 1 2 8 -2
4 2 0 7 0
5 2 1 0 -7
6 2 2 0 -7
library(data.table)
myDT <- as.data.table(mydata)
myDT[,change:=measure-measure[1],by=subject]
print(myDT)
subject time measure change
1: 1 0 10 0
2: 1 1 12 2
3: 1 2 8 -2
4: 2 0 7 0
5: 2 1 0 -7
6: 2 2 0 -7
data.table is preferable if your dataset is large.
What about:
mydata$change <- do.call("c", with(mydata, lapply(split(measure, subject), function(x) x - x[1])))
Alternatively, you could also use the ave function:
with(mydata, ave(measure, subject, FUN=function(x) x - x[1]))
# [1] 0 2 -2 0 -7 -7
or
within(mydata, change <- ave(measure, subject, FUN=function(x) x - x[1]))
# subject time measure change
# 1 1 0 10 0
# 2 1 1 12 2
# 3 1 2 8 -2
# 4 2 0 7 0
# 5 2 1 0 -7
# 6 2 2 0 -7
You can also use tapply (this assumes the rows are already sorted by subject):
mydata$change <- unlist(tapply(mydata$measure, mydata$subject, function(x) x - x[1]), use.names = FALSE)
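The approaches above all take the first row of each subject as the baseline. If the row order cannot be relied on, one possible variant is to look up the baseline explicitly from the time == 0 row. This is a sketch only, assuming each subject has exactly one row with time == 0; the names base_rows and baseline are introduced here for illustration.

```r
mydata <- data.frame(subject = c(1, 1, 1, 2, 2, 2),
                     time    = c(0, 1, 2, 0, 1, 2),
                     measure = c(10, 12, 8, 7, 0, 0))

# Rows holding each subject's baseline measurement
base_rows <- mydata[mydata$time == 0, ]

# For every row, find the baseline of the matching subject
baseline <- base_rows$measure[match(mydata$subject, base_rows$subject)]

mydata$change <- mydata$measure - baseline
mydata
```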
