Retaining lagged value to compound towards end value - r

I would like to please ask for your help concerning the following issue.
In a table-like object where each row corresponds to an observation in time, I would like to obtain the value from the previous row for one particular variable (:= p0), multiply it with an element of another column (:= returnfactor) and write the result to the current row as an element of another column (:= p1).
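Illustrated via two pictures (not reproduced here), I want to go from a table in which only the first value of p0 is filled in to one in which p0 and p1 are filled in for every row.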
I have written
matrix <- cbind (
1:10,
1+rnorm(10, 0, 0.05),
NA,
NA
)
colnames(matrix) <- c("timeid", "returnfactor", "p0", "p1")
matrix[1, "p0"] <- 100
for (i in 1:10)
{
if (i==1)
{
matrix[i, "p1"] <- matrix[1, "p0"] * matrix[i, "returnfactor"]
}
else
{
matrix[i, "p0"] <- matrix[i-1, "p1"]
matrix[i, "p1"] <- matrix[i, "p0"] * matrix[i, "returnfactor"]
}
}
That is, I implemented what I would like to achieve using a loop. However, this loop is too slow. Obviously, I am new to R.
Could you please give me a hint on how to improve the speed using the capabilities R has to offer? I assume there is no need for a loop here, though I do not know how else to do it. In SAS, I used its row-by-row reading of data sets and the retain statement in a data step.
Yours sincerely,
Sinistrum

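We can indeed improve this. The key thing to notice is that the values of both p0 and p1 are essentially cumulative products of returnfactor. In particular, writing mat for the matrix from the question (with mat[1, "p0"] already set to 100), we have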
mat[, "p1"] <- mat[1, "p0"] * cumprod(mat[, "returnfactor"])
mat[-1, "p0"] <- head(mat[, "p1"], -1)
where head(mat[, "p1"], -1) just takes all the mat[, "p1"] except for its last element. This gives
# timeid returnfactor p0 p1
# [1,] 1 0.9903601 100.00000 99.03601
# [2,] 2 1.0788946 99.03601 106.84941
# [3,] 3 1.0298117 106.84941 110.03478
# [4,] 4 0.9413212 110.03478 103.57806
# [5,] 5 0.9922179 103.57806 102.77200
# [6,] 6 0.9040545 102.77200 92.91149
# [7,] 7 0.9902371 92.91149 92.00440
# [8,] 8 0.8703836 92.00440 80.07913
# [9,] 9 1.0657001 80.07913 85.34033
# [10,] 10 0.9682228 85.34033 82.62846
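As a quick sanity check, the loop from the question and the cumprod version can be compared on the same data. This is only a minimal sketch: the seed and the object names loop_res and vec_res are assumptions made for illustration.

```r
set.seed(42)  # arbitrary seed, chosen only for reproducibility
n <- 10
mat <- cbind(timeid = 1:n,
             returnfactor = 1 + rnorm(n, 0, 0.05),
             p0 = NA,
             p1 = NA)
mat[1, "p0"] <- 100

# Loop version, equivalent to the code in the question
loop_res <- mat
for (i in seq_len(n)) {
  if (i > 1) loop_res[i, "p0"] <- loop_res[i - 1, "p1"]
  loop_res[i, "p1"] <- loop_res[i, "p0"] * loop_res[i, "returnfactor"]
}

# Vectorised version using cumulative products
vec_res <- mat
vec_res[, "p1"] <- vec_res[1, "p0"] * cumprod(vec_res[, "returnfactor"])
vec_res[-1, "p0"] <- head(vec_res[, "p1"], -1)

# Both approaches should agree up to floating-point tolerance
all.equal(loop_res, vec_res)
```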

Related

Problems with the names of the elements of a list

I have created a list whose elements are themselves a list of matrices. I want to be able to extract the vectors of observations for each variable
p13 = 0.493;p43 = 0.325;p25 = 0.335;p35 = 0.574;p12 = 0.868
std_e2 = sqrt(1-p12^2)
std_e3 = sqrt(1-(p13^2+p43^2))
std_e5 = sqrt(1-(p25^2+p35^2+2*p25*p35*(p13*p12)))
set.seed(1234)
z1<-c(0,1)
z2<-c(0,1)
z3<-c(0,1)
z4<-c(0,1)
z5<-c(0,1)
s<-expand.grid(z1,z2,z3,z4,z5); s
s<-s[-1,];s
shift<-3
scenari<-s*shift;scenari
scenario_1<-scenari[1];scenario_1
genereting_fuction<-function(n){
sample<-list()
for (i in 1:nrow(scenario_1)){
X1=rnorm(n)+scenari[i,1]
X4=rnorm(n)+scenari[i,4]
X2=X1*p12+std_e2*rnorm(n)+scenari[i,2]
X3=X1*p13+X4*p43+std_e3*rnorm(n)+scenari[i,3]
X5=X2*p25+X3*p35+std_e5*rnorm(n)+scenari[i,5]
sample[[i]]=cbind(X1,X2,X3,X4,X5)
colnames(sample[[i]])<-c("X1","X2","X3","X4","X5")
}
sample
}
set.seed(123)
dati_fault<- lapply(rep(10, 100), genereting_fuction)
dati_fault[[1]]
[[1]]
X1 X2 X3 X4 X5
[1,] 2.505826 1.736593 1.0274581 -0.6038358 1.9967656
[2,] 4.127593 3.294344 2.8777777 1.2386725 3.0207723
[3,] 1.853050 1.312617 1.1875699 0.5994921 1.0471564
[4,] 4.481019 3.330629 2.1880050 -0.1087338 2.7331061
[5,] 3.916191 3.306036 0.7258404 -1.1388570 1.0293168
[6,] 3.335131 2.379439 1.2407679 0.3198553 1.6755424
[7,] 3.574675 3.769436 1.1084120 -1.0065481 2.0034434
[8,] 3.203620 2.842074 0.6550587 -0.8516120 -0.1433508
[9,] 2.552959 2.642094 2.5376430 2.0387860 3.5318055
[10,] 2.656474 1.607934 2.2760391 -1.3959822 1.0095796
I only want to save the elements of X1 in an object, and likewise for the other variables.
The code below gives you a list of matrices with one row per scenario and n columns.
genereting_fuction <- function(n, scenario, scenari){
# added argument because you assume global variable use
nr <- nrow(scenario)
sample <- vector("list", length = nr) # sample<-list()
# creating a list is better than expanding it each iteration
for (i in 1:nr){
X1=rnorm(n)+scenari[i,1]
X4=rnorm(n)+scenari[i,4]
X2=X1*p12+std_e2*rnorm(n)+scenari[i,2]
X3=X1*p13+X4*p43+std_e3*rnorm(n)+scenari[i,3]
X5=X2*p25+X3*p35+std_e5*rnorm(n)+scenari[i,5]
sample[[i]]=cbind(X1,X2,X3,X4,X5)
colnames(sample[[i]])<-c("X1","X2","X3","X4","X5")
}
sample
}
set.seed(123)
dati_fault<- lapply(rep(3, 2), function(x) genereting_fuction(x, scenario_1, scenari))
dati_fault
lapply(dati_fault, function(x) {
tmp <- lapply(x, function(y) y[,"X1"])
tmp <- do.call(rbind, tmp)
})
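The same extraction can be repeated for every variable at once. The following is a minimal sketch on a small toy list; make_matrix, toy and by_variable are hypothetical names, and the toy data only mimics the structure of dati_fault (a list of replications, each a list of matrices with named columns).

```r
set.seed(1)
vars <- c("X1", "X2", "X3")

# Hypothetical helper: one 4 x 3 matrix with named columns
make_matrix <- function() {
  matrix(rnorm(12), nrow = 4, dimnames = list(NULL, vars))
}

# Toy structure: 2 replications, each a list of 3 scenario matrices
toy <- replicate(2, replicate(3, make_matrix(), simplify = FALSE),
                 simplify = FALSE)

# For each variable, build one matrix per replication
# (rows = scenarios, columns = observations)
by_variable <- lapply(setNames(vars, vars), function(v) {
  lapply(toy, function(rep_i) {
    do.call(rbind, lapply(rep_i, function(m) m[, v]))
  })
})

dim(by_variable$X1[[1]])  # 3 scenarios by 4 observations
```

by_variable$X1 then holds everything related to X1, by_variable$X2 everything related to X2, and so on.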
If you want to assemble this list of matrices, for example with cbind, I suggest using a single large n rather than calling lapply over rep.
There is probably also an easier way to simulate this many scenarios, but it is hard to say without knowing the context of your code.
Finally, try to reproduce your issue with a minimal example; working with a list of 100 lists of 32 matrices of 5*10 is a bit messy!
Good luck !

Use a FOR loop to run function over each column in r

I have several columns of data in a matrix, and I want to use a for loop (specifically a for loop; please, no answers that don't involve a for loop) to run a function on each column.
x <- runif(10,0,10)
y <- runif(10,10,20)
z <- runif(10,20,30)
tab <- cbind(x,y,z)
x y z
[1,] 9.5262742 16.22999 21.93228
[2,] 5.8183264 14.53771 21.81774
[3,] 3.9509342 17.36694 22.46594
[4,] 3.0245614 19.46411 25.80411
[5,] 5.0284351 13.89636 21.61767
[6,] 3.0291715 17.50267 26.28110
[7,] 8.4727471 16.77365 27.60535
[8,] 3.3816903 15.23395 22.01265
[9,] 0.3182083 13.97575 29.25909
[10,] 2.6499290 16.71129 27.05160
for (i in 1:ncol(tab)){
print(mean(i))
}
I have almost no familiarity with R and have had trouble finding a solution that specifically uses a for loop to run a function and output a result per column.
Well, strictly using a for loop, I think this would do what you want!
x <- runif(10,0,10)
y <- runif(10,10,20)
z <- runif(10,20,30)
tab <- cbind(x,y,z)
for (i in 1:ncol(tab)){
print(mean(tab[, i]))
}
You need to index the matrix by using [row, column]. When you want to select all rows for a specific column (which is your case), just leave the row field empty. So that's why you have to use [, i], where i is your index.
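If the per-column results should be kept rather than just printed, the same loop can write into a preallocated vector. This is a small sketch; the name col_means and the seed are assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed for reproducibility
tab <- cbind(x = runif(10, 0, 10),
             y = runif(10, 10, 20),
             z = runif(10, 20, 30))

# Preallocate one slot per column and label it with the column name
col_means <- numeric(ncol(tab))
names(col_means) <- colnames(tab)

# tab[, i] selects all rows of column i
for (i in seq_len(ncol(tab))) {
  col_means[i] <- mean(tab[, i])
}

col_means
```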

How to iterate over a matrix using a function in R?

I have created a function to order a vector of length 2, using the following code
x = (c(6,2))
orders = function(x){
for(i in 1:(length(x)-1)){
if(x[i+1] < x[i]){
return(c(x[i+1], x[i]))} else{
(return(x))
}}}
orders(x)
I have been asked to use this function to process a dataset with 2 columns as follows. Iterate over the rows of the
data set, and if the element in the 2nd column of row i is less than the element in the first
column of row i, switch the order of the two entries in the row by making a suitable call to
the function you just wrote.
I've tried using the following code
set.seed(1128719)
data=matrix(rnorm(20),byrow=T,ncol=2)
df = for (i in 1:2) {
for(j in 1:10){
data = orders(c(x[i], x[j]))
return(data)
}
}
The output is null. I'm not quite sure where I'm going wrong.
Any suggestions?
I modified your code a bit but tried to keep the 'style' the same.
There is no need for a loop inside the function: for a vector of length 2,
i in 1:(length(x)-1) always evaluates to
i in 1:1, so i will only ever take the value 1.
orders = function(x){
# Since the function only works on vectors of length 2,
# it's good practice to raise an error right at the start
#
if (length(x) != 2) {
stop("x must be a vector of length 2")
}
if (x[2] < x[1]) {
return(c(x[2], x[1]))
} else {
return(x)
}
}
orders(c(6, 2))
set.seed(1128719)
data <- matrix(rnorm(20),byrow=T,ncol=2)
A for loop itself cannot usefully be assigned to a variable (it evaluates to NULL),
but we can use the loop to modify the matrix 'data'
in place:
for (row in 1:nrow(data)) {
data[row, ] <- orders(data[row,])
}
data
Edit:
This is the input:
[,1] [,2]
[1,] -0.04142965 0.2377140
[2,] -0.76237866 -0.8004284
[3,] 0.18700893 -0.6800310
[4,] 0.76499646 0.4430643
[5,] 0.09193440 -0.2592316
[6,] 1.17478053 -0.4044760
[7,] -1.62262500 0.1652850
[8,] -1.54848857 0.7475451
[9,] -0.05907252 -0.8324074
[10,] -1.11064318 -0.1148806
This is the output I get:
[,1] [,2]
[1,] -0.04142965 0.23771403
[2,] -0.80042842 -0.76237866
[3,] -0.68003104 0.18700893
[4,] 0.44306433 0.76499646
[5,] -0.25923164 0.09193440
[6,] -0.40447603 1.17478053
[7,] -1.62262500 0.16528496
[8,] -1.54848857 0.74754509
[9,] -0.83240742 -0.05907252
[10,] -1.11064318 -0.11488062
Here are two ways of ordering the rows of a 2-column matrix.
This is the test matrix posted in the question.
set.seed(1128719)
data <- matrix(rnorm(20), byrow = TRUE, ncol = 2)
1. With a function orders.
The function expects as input a 2 element vector. If they are out of order, return the vector with its elements reversed, else return the vector as is.
orders <- function(x){
stopifnot(length(x) == 2)
if(x[2] < x[1]){
x[2:1]
}else{
x
}
}
Test the function.
x <- c(6,2)
orders(x)
#[1] 2 6
Now with the matrix data.
df1 <- t(apply(data, 1, orders))
2. Vectorized code.
Create a logical index that is TRUE whenever the elements of a row are out of order, and reverse only those rows.
df2 <- data
inx <- data[,2] < data[,1]
df2[inx, ] <- data[inx, 2:1]
The results are the same.
identical(df1, df2)
#[1] TRUE
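A third option, not part of the answers above, is to take the element-wise minimum and maximum of the two columns with pmin and pmax. The sketch below checks it against a plain row-by-row loop; df3 and ref are names introduced here for illustration.

```r
set.seed(1128719)
data <- matrix(rnorm(20), byrow = TRUE, ncol = 2)

# Smaller value of each row goes first, larger value second
df3 <- cbind(pmin(data[, 1], data[, 2]),
             pmax(data[, 1], data[, 2]))

# Reference result from an explicit loop over the rows
ref <- data
for (row in seq_len(nrow(ref))) {
  if (ref[row, 2] < ref[row, 1]) {
    ref[row, ] <- ref[row, 2:1]
  }
}

all.equal(df3, ref)
```

Note that this shortcut only applies to the two-column case; it does not call orders, so it does not satisfy the exercise as stated.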

How to write linearly dependent column in a matrix in terms of linearly independent columns?

I have a large mxn matrix, and I have identified the linearly dependent columns. However, I want to know if there's a way in R to write the linearly dependent columns in terms of the linearly independent ones. Since it's a large matrix, it's not possible to do based on inspection.
Here's a toy example of the type of matrix I have.
> mat <- matrix(c(1,1,0,1,0,1,1,0,0,1,1,0,1,1,0,1,0,1,0,1), byrow=TRUE, ncol=5, nrow=4)
> mat
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 0 1 0
[2,] 1 1 0 0 1
[3,] 1 0 1 1 0
[4,] 1 0 1 0 1
Here it's obvious that x3 = x1-x2, x5=x1-x4. I want to know if there's an automated way to get that for a larger matrix.
Thanks!
I'm sure there is a better way, but I felt like playing around with this. I first check whether the input matrix has full column rank, to avoid unnecessary computation in that case. After that I start with the first two columns and check whether that submatrix has full column rank; if it does, I check the first three columns, and so on. Once we find a submatrix that does not have full column rank, I regress its last column on the previous ones, which tells us how to combine the earlier columns linearly to obtain the last one.
My function isn't very clean right now and could do some additional checking but at least it's a start.
mat <- matrix(c(1,1,0,1,0,1,1,0,0,1,1,0,1,1,0,1,0,1,0,1), byrow=TRUE, ncol=5, nrow=4)
linfinder <- function(mat){
# If the matrix is full rank then we're done
if(qr(mat)$rank == ncol(mat)){
print("Matrix is of full rank")
return(invisible(seq(ncol(mat))))
}
m <- ncol(mat)
# cols keeps track of which columns are linearly independent
cols <- 1
for(i in seq(2, m)){
ids <- c(cols, i)
mymat <- mat[, ids]
if(qr(mymat)$rank != length(ids)){
# Regress the column of interest on the previous
# columns to figure out the relationship
o <- lm(mat[,i] ~ mat[,cols] + 0)
# Construct the output message
start <- paste0("Column_", i, " = ")
# Which coefs are nonzero
nz <- !(abs(coef(o)) <= .Machine$double.eps^0.5)
tmp <- paste("Column", cols[nz], sep = "_")
vals <- paste(coef(o)[nz], tmp, sep = "*", collapse = " + ")
message <- paste0(start, vals)
print(message)
}else{
# If the matrix subset was of full rank
# then the newest column is linearly independent
# so add it to the cols list
cols <- ids
}
}
return(invisible(cols))
}
linfinder(mat)
which gives
> linfinder(mat)
[1] "Column_3 = 1*Column_1 + -1*Column_2"
[1] "Column_5 = 1*Column_1 + -1*Column_4"
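A more compact variant of the same idea can be sketched with a single pivoted QR decomposition instead of a loop over submatrices. This is an alternative under assumptions, not the method used in linfinder: it assumes the matrix is rank deficient (so there is at least one dependent column) and relies on base R's default qr() listing a set of linearly independent columns first in its pivot vector. The names decomp, indep, dep and coefs are introduced here for illustration.

```r
mat <- matrix(c(1, 1, 0, 1, 0,
                1, 1, 0, 0, 1,
                1, 0, 1, 1, 0,
                1, 0, 1, 0, 1), byrow = TRUE, ncol = 5)
colnames(mat) <- paste0("Column_", seq_len(ncol(mat)))

decomp <- qr(mat)

# The first `rank` pivot entries index a set of independent columns
indep <- sort(decomp$pivot[seq_len(decomp$rank)])
dep   <- setdiff(seq_len(ncol(mat)), indep)

# Solve mat[, indep] %*% coefs = mat[, dep]; each column of coefs
# holds the weights that reproduce one dependent column
coefs <- qr.solve(mat[, indep, drop = FALSE], mat[, dep, drop = FALSE])
coefs <- zapsmall(coefs)  # clean up values that are zero up to rounding
coefs
```

Each column of coefs can be read like the messages printed by linfinder: the entries are the multipliers of the independent columns.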

Identifying overlap zones in R raster package

Package:
raster
Data:
A rasterStack with 10 bands.
Each of the bands contains an image area surrounded by NAs
Bands are logical, i.e. "1" for image data and "0"/NA for surrounding area
The "image areas" of each band do not align completely with each other, though most have partial overlaps
Objective:
Write a fast function that can return either a rasterLayer or cell numbers for each "zone", for instance a pixel containing data only from bands 1 and 2 falls in zone 1, a pixel containing data only from bands 3 and 4 falls in zone 2, etc. If a rasterLayer is returned, I need to be able to match the zone value with band numbers later.
First attempt:
# Possible band combinations
values = integer(0)
for(i in 1:nlayers(myraster)){
combs = combn(1:nlayers(myraster), i)
for(j in 1:ncol(combs)){
values = c(values, list(combs[,j]))
}
}
# Define the zone finding function
find_zones = function(bands){
# The intersection of the bands of interest
a = subset(myraster, 1)
values(a) = TRUE
for(i in bands){
a = a & myraster[[i]]
}
# Union of the remaining bands
b = subset(myraster, 1)
values(b) = FALSE
for(i in seq(1:nlayers(myraster))[-bands]){
b = b | myraster[[i]]
}
#plot(a & !b)
cells = Which(a & !b, cells=TRUE)
return(cells)
}
# Applying the function
results = lapply(values, find_zones)
My current function takes a very long time to execute. Can you think of a better way? Note that I don't simply want to know how many bands have data at each pixel; I also need to know which bands. The purpose is to process the different areas differently afterwards.
Note also that the real-life scenario is a 3000 x 3000 or more raster with potentially more than 10 bands.
EDIT
Some sample data consisting of 10 offset image areas:
# Sample data
library(raster)
for(i in 1:10) {
start_line = i*10*1000
end_line = 1000000 - 800*1000 - start_line
offset = i * 10
data = c(rep(0,start_line), rep(c(rep(0,offset), rep(1,800), rep(0,200-offset)), 800), rep(0, end_line))
current_layer = raster(nrows=1000, ncols=1000)
values(current_layer) = data
if(i == 1) {
myraster = stack(current_layer)
} else {
myraster = addLayer(myraster, current_layer)
}
}
NAvalue(myraster) = 0 # You may not want to do this depending on your solution...
EDIT : Answer updated using Nick's trick and matrix multiplication.
You could try the following function, optimized using Nick's trick and matrix multiplication. The bottleneck is now filling up the stack with the separate layers, but I think the timings are quite OK. Memory usage is a bit lower, but given your data and the nature of R, I don't know if you can nibble off more without hurting performance badly.
> system.time(T1 <- FindBands(myraster,return.stack=T))
user system elapsed
6.32 2.17 8.48
> system.time(T2 <- FindBands(myraster,return.stack=F))
user system elapsed
1.58 0.02 1.59
> system.time(results <- lapply(values, find_zones))
Timing stopped at: 182.27 35.13 217.71
The function returns either a rasterStack with the different level combinations present in the plot (that's not all possible level combinations, so you have some gain there already), or a matrix with the level number and level names. This allows you to do something like :
levelnames <- attr(T2,"levels")[T2]
to get the level names for each cell point. As shown below, you can easily put that matrix inside a rasterLayer object.
The function :
FindBands <- function(x,return.stack=F){
dims <- dim(x)
Values <- getValues(x)
nn <- colnames(Values)
vec <- 2^((1:dims[3])-1)
#Get all combinations and the names
id <- unlist(
lapply(seq_len(dims[3]),function(k) combn(seq_len(dims[3]),k,simplify=F))
,recursive=F)
nameid <- sapply(id,function(i){
x <- sum(vec[i])
names(x) <- paste(i,collapse="-")
x
})
# Nick's approach
layers <- Values %*% vec
# Find out which levels we need
LayerLevels <- unique(sort(layers))
LayerNames <- c("No Layer",names(nameid[nameid %in% LayerLevels]))
if(return.stack){
myStack <- lapply(LayerLevels,function(i){
r <- raster(nr=dims[1],nc=dims[2])
r[] <- as.numeric(layers == i)
r
} )
myStack <- stack(myStack)
layerNames(myStack) <- LayerNames
return(myStack)
} else {
LayerNumber <- match(layers,LayerLevels)
LayerNumber <- matrix(LayerNumber,ncol=dims[2],byrow=T)
attr(LayerNumber,"levels") <- LayerNames
return(LayerNumber)
}
}
Proof of concept, using the data of RobertH :
r <- raster(nr=10, nc=10)
r[]=0
r[c(20:60,90:93)] <- 1
s <- list(r)
r[]=0
r[c(40:70,93:98)] <- 1
s <- c(s, r)
r[]=0
r[50:95] <- 1
s <- (c(s, r))
aRaster <- stack(s)
> X <- FindBands(aRaster,return.stack=T)
> plot(X)
> X <- FindBands(aRaster,return.stack=F)
> X
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 1 1 1 1 1 1 1 1 1
[2,] 1 1 1 1 1 1 1 1 1 2
[3,] 2 2 2 2 2 2 2 2 2 2
[4,] 2 2 2 2 2 2 2 2 2 4
[5,] 4 4 4 4 4 4 4 4 4 8
[6,] 8 8 8 8 8 8 8 8 8 8
[7,] 7 7 7 7 7 7 7 7 7 7
[8,] 5 5 5 5 5 5 5 5 5 5
[9,] 5 5 5 5 5 5 5 5 5 6
[10,] 6 6 8 7 7 3 3 3 1 1
attr(,"levels")
[1] "No Layer" "1" "2" "3" "1-2" "1-3" "2-3" "1-2-3"
> XX <- raster(ncol=10,nrow=10)
> XX[] <- X
> plot(XX)
I'm not familiar with raster, but from what I grasp from the above, you essentially have a 10*3000*3000 array, right?
If so, for each position in the raster (second and third indices, currow and curcol), you can calculate a unique identifier for its 'zone' by using binary: run i over the 'bands' (first index) and sum r[i,currow, curcol]*2^(i-1). Depending on the internal workings of raster, it should be possible to have a rather quick implementation of this.
This results in a new 'raster' of size 3000*3000 holding the unique identifiers of each position. Finding the unique values in there gives you back the zones that actually occur in your data, and reversing the binary logic should give you the bands that belong to a given zone.
Pardon me if my interpretation of raster is incorrect: then please ignore my musings. Either way not a complete solution.
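The binary idea described above can be sketched without the raster package by treating the data as a plain matrix with one row per cell and one column per band. The toy data and the names cells, weights, zone_id and decode are assumptions made for illustration only.

```r
# Hypothetical toy data: 5 cells (rows) x 3 bands (columns), entries 0/1
cells <- rbind(c(1, 1, 0),
               c(1, 1, 0),
               c(0, 1, 1),
               c(0, 0, 0),
               c(1, 0, 1))

# Band i contributes 2^(i - 1), so every band combination gets a unique sum
weights <- 2^(seq_len(ncol(cells)) - 1)
zone_id <- as.vector(cells %*% weights)

# Zones that actually occur in the data
zones <- sort(unique(zone_id))

# Reverse the binary logic: which bands make up a given zone id?
decode <- function(id, n_bands) {
  which((id %/% 2^(seq_len(n_bands) - 1)) %% 2 == 1)
}

zone_id
lapply(zones, decode, n_bands = ncol(cells))
```

With a real rasterStack, the cell-by-band matrix would come from something like getValues, as in the FindBands function above.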
How about this?
library(raster)
#setting up some data
r <- raster(nr=10, nc=10)
r[]=0
r[c(20:60,90:93)] <- 1
s <- list(r)
r[]=0
r[c(40:70,93:98)] <- 1
s <- c(s, r)
r[]=0
r[50:95] <- 1
s <- (c(s, r))
plot(stack(s))
# write a vectorized function that classifies the data
#
fun=function(x,y,z)cbind(x+y+z==0, x==1&y+z==0, y==1&x+z==0, z==1&x+y==0, x==0&y+z==2, y==0&x+z==2, z==0&x+y==2,x+y+z==3)
z <- overlay(s[[1]], s[[2]], s[[3]], fun=fun)
# equivalent to
#s <- stack(s)
#z <- overlay(s[[1]], s[[2]], s[[3]], fun=fun)
ln <- c("x+y+z==0", "x==1&y+z==0", "y==1&x+z==0", "z==1&x+y==0", "x==0&y+z==2", "y==0&x+z==2", "z==0&x+y==2", "x+y+z==3")
layerNames(z) <- ln
x11()
plot(z)
more generic:
s <- stack(s)
fun=function(x)as.numeric(paste(which(x==1), collapse=""))
x <- calc(s,fun)
This is not good when nlayers(s) has double digits (layers "1" and "2" together give the same code as layer "12"); in those cases you could use the function below (fun2) instead:
fun2=function(x)as.numeric(paste(c(9, x), collapse=""))
x2 <- calc(s,fun2)
unique(x)
# [1] 1 2 3 12 13 23 123
unique(x2)
# [1] 9000 9001 9010 9011 9100 9101 9110 9111
for the toy example only:
plot(x)
text(x)
p=rasterToPolygons(x)
plot(p, add=T)
I've written code for @Nick Sabbe's suggestion, which I think is very concise and relatively fast. This assumes that the input rasterStack already has logical 1 or 0 data:
# Set the channels to 2^i instead of 1
bands = nlayers(myraster)
a = stack()
for (i in 1:bands) {
a = addLayer(a, myraster[[i]] * 2^i)
}
coded = sum(a)
#plot(coded)
values = unique(coded)[-1]
remove(a, myraster)
# Function to retrieve which coded value means which channels
which_bands = function(value) {
single = numeric()
for (i in bands:1) {
if ((0 < value) & (value >= 2^i)) {
value = value - 2^i
single = c(single, i)
}
}
return(single)
}
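To see the decoding step in isolation, here is a small stand-alone sketch that does not need the raster package. It follows the 2^i coding used above but, as an assumption made for illustration, passes the number of bands as an argument (n_bands) instead of reading the global variable bands; encode_bands is a hypothetical helper.

```r
# Hypothetical helper: code for a set of bands under the 2^i convention
encode_bands <- function(band_ids) sum(2^band_ids)

# Same logic as which_bands above, with the band count as an argument
which_bands <- function(value, n_bands) {
  single <- numeric()
  for (i in n_bands:1) {
    if ((0 < value) & (value >= 2^i)) {
      value <- value - 2^i
      single <- c(single, i)
    }
  }
  single
}

code <- encode_bands(c(1, 3))   # 2^1 + 2^3
which_bands(code, n_bands = 3)  # bands are returned from highest to lowest
```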
