How to reconstruct diff data with leading NAs using diffinv in R?

I have a large data matrix (“trial.matrix”) similar to the one below.
[,1] [,2]
[1,] 3 NA
[2,] 5 NA
[3,] 7 NA
[4,] 9 10
[5,] 11 12
[6,] 13 14
My problem requires that I shuffle some rows of the difference version of this matrix and then reconstruct a matrix from the shuffled difference matrix. When I apply diff(trial.matrix) I get:
[,1] [,2]
[1,] 2 NA
[2,] 2 NA
[3,] 2 NA
[4,] 2 2
[5,] 2 2
To reconstruct the original matrix, I need to use cumsum() or diffinv(), e.g.:
new.df <- diffinv(diff(trial.matrix), xi = t(c(3, 10)))
but this gives:
[,1] [,2]
[1,] 3 10
[2,] 5 NA
[3,] 7 NA
[4,] 9 NA
[5,] 11 NA
[6,] 13 NA
Evidently, the starting value (“xi”) for column 2 has to be applied starting in row 3 (or 4?). The real matrix has many columns, some with leading NAs and some without, and I need to preserve the leading NAs in the reconstruction. I cannot find a straightforward way to reconstruct the columns that have NAs in the difference matrix.
(For each column I can construct two vectors, one containing the row index of the first non-NA entry and the other containing the first non-NA value, but I cannot see how to use them.)
Suggestions appreciated.

You can temporarily replace NAs by zeroes:
trial.matrix <- matrix(c(seq(3,13,by=2),rep(NA,3),10,12,14),ncol=2)
xi <- apply(trial.matrix,2,function(cl) cl[which(!is.na(cl))[1]])
z2 <- diff(trial.matrix)
# temporarily replace NAs in the second column by zeroes:
nas <- which(is.na(z2[,2]))
z2[nas,2] <- 0
new.df <- diffinv(z2,xi = t(xi))
# restore the NAs in the reconstructed matrix
new.df[nas,2] <- NA
# [,1] [,2]
# [1,] 3 NA
# [2,] 5 NA
# [3,] 7 NA
# [4,] 9 10
# [5,] 11 12
# [6,] 13 14
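
The answer above handles only column 2. Since the question mentions many columns, some with leading NAs and some without, here is a minimal sketch that applies the same zero-substitution idea to every column at once. The helper name `reconstruct_with_leading_nas` is hypothetical (not from the original answer), and the sketch assumes that NAs occur only as a leading run in each column; interior NAs would not be restored correctly.

```r
# Hypothetical helper (not from the original answer): invert a differenced
# matrix while keeping leading NAs, for any number of columns.
# Assumes NAs occur only as a leading run in each column.
reconstruct_with_leading_nas <- function(d, xi) {
  na_mask <- is.na(d)            # NA positions in the difference matrix
  d[na_mask] <- 0                # zeros keep the running sum at xi
  out <- diffinv(d, xi = matrix(xi, nrow = 1))
  # With leading NAs only, an NA in diff row i means original row i was NA;
  # the extra last row of `out` is never masked.
  out[rbind(na_mask, FALSE)] <- NA
  out
}

trial.matrix <- matrix(c(seq(3, 13, by = 2), rep(NA, 3), 10, 12, 14), ncol = 2)
# first non-NA value of each column, as in the answer above
xi <- apply(trial.matrix, 2, function(cl) cl[which(!is.na(cl))[1]])
rebuilt <- reconstruct_with_leading_nas(diff(trial.matrix), xi)
```

If rows of the difference matrix are shuffled before reconstruction, the rows containing NAs would presumably need to stay at the top for the leading-NA assumption to hold.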

Related

Remove duplicate rows based on one column's values, keeping the row whose entry in another column is maximum

I have the following matrix
> mat<-rbind(c(9,6),c(10,6),c(11,7),c(12,7),c(12,8),c(12,9),c(12,10),c(12,11),c(12,12),c(13,12))
> mat
[,1] [,2]
[1,] 9 6
[2,] 10 6
[3,] 11 7
[4,] 12 7
[5,] 12 8
[6,] 12 9
[7,] 12 10
[8,] 12 11
[9,] 12 12
[10,] 13 12
I would like to remove rows with duplicate values in the first column, keeping for each value the row whose entry in the second column is largest. For the example above, the desired outcome is
[,1] [,2]
[1,] 9 6
[2,] 10 6
[3,] 11 7
[4,] 12 12
[5,] 13 12
I tried with
> mat[!duplicated(mat[,1]),]
but I obtained
[,1] [,2]
[1,] 9 6
[2,] 10 6
[3,] 11 7
[4,] 12 7
[5,] 13 12
which differs from the desired outcome in entry [4,2]. Suggestions?
You can sort the matrix first, using ascending order for column 1 and descending order for column 2. Then the duplicated function will remove all but the maximum column 2 value for each column 1 value.
mat <- mat[order(mat[,1],-mat[,2]),]
mat[!duplicated(mat[,1]),]
[,1] [,2]
[1,] 9 6
[2,] 10 6
[3,] 11 7
[4,] 12 12
[5,] 13 12
Like Joseph's solution, but if you add row names first you can restore the original row order afterwards (which is the same as the sorted order in this case).
rownames(mat) <- 1:nrow(mat)
# sort by column 1 ascending, then column 2 descending
mat <- mat[order(mat[,1], -mat[,2]),]
mat <- mat[!duplicated(mat[,1]),]
mat[order(as.numeric(rownames(mat))),]
# [,1] [,2]
# 1 9 6
# 2 10 6
# 3 11 7
# 9 12 12
# 10 13 12
First sort, then keep only the first row for each duplicate:
mat <- mat[order(mat[,1], mat[,2]),]
mat[!duplicated(mat[,1]),]
EDIT: Sorry, I thought your desired result was the last output shown. Since you want the maximum value instead:
mat<-rbind(c(9,6),c(10,6),c(11,7),c(12,7),c(12,8),c(12,9),c(12,10),c(12,11),c(12,12),c(13,12))
#Reverse sort
mat <- mat[order(mat[,1], mat[,2], decreasing=TRUE),]
#Keep only the first row for each duplicate, this will give the largest values
mat <- mat[!duplicated(mat[,1]),]
#finally sort it
mat <- mat[order(mat[,1], mat[,2]),]
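
As an alternative to the sort-then-deduplicate answers above, the per-key maximum can also be computed directly with base R's `tapply`. This is only a sketch, assuming the matrix has exactly two columns as in the question; the names `max_by_key` and `result` are illustrative.

```r
mat <- rbind(c(9,6), c(10,6), c(11,7), c(12,7), c(12,8),
             c(12,9), c(12,10), c(12,11), c(12,12), c(13,12))

# maximum of column 2 within each distinct value of column 1
max_by_key <- tapply(mat[, 2], mat[, 1], max)

# rebuild a two-column matrix: key, then its maximum
result <- cbind(as.numeric(names(max_by_key)), as.vector(max_by_key))
```

Note that this returns the keys in sorted order rather than in their order of first appearance, and with more than two columns it would not carry the other columns of the winning row along.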

Length of longest stretch of NAs in a column of data-frame object

I want to write code that finds the length of the longest continuous stretch of NA values in a column of a data frame.
>> df
[,1] [,2]
[1,] 1 1
[2,] NA 1
[3,] 2 4
[4,] NA NA
[6,] 1 NA
[7,] NA 8
[8,] NA NA
[9,] NA 6
# e.g.
>> longestNAstrech(df[,1])
>> 3
>> longestNAstrech(df[,2])
>> 2
# How should longestNAstrech() be implemented?
Using base R we could create a function
longestNAstrech <- function(x) {
with(rle(is.na(x)), max(lengths[values]))
}
longestNAstrech(df[, 1])
#[1] 3
longestNAstrech(df[, 2])
#[1] 2
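
One caveat with the function above: if the vector contains no NA at all, `lengths[values]` is empty and `max()` returns `-Inf` with a warning. A minimal sketch of a variant that returns 0 in that case follows; the name `longest_na_run` is hypothetical, and the two test vectors are the columns from the question.

```r
# Variant of the rle() approach that returns 0 when there are no NAs.
longest_na_run <- function(x) {
  runs <- rle(is.na(x))                     # runs of NA / non-NA
  na_lengths <- runs$lengths[runs$values]   # keep only the NA runs
  if (length(na_lengths) == 0) 0L else max(na_lengths)
}

col1 <- c(1, NA, 2, NA, 1, NA, NA, NA)
col2 <- c(1, 1, 4, NA, NA, 8, NA, 6)
longest_na_run(col1)
longest_na_run(col2)
longest_na_run(1:3)
```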

Duplicating rows in R matrix

I have a small matrix, say
x <- matrix(1:10, nrow = 5) # values 1:10 across 5 rows and 2 columns
The result is
[,1] [,2]
[1,] 1 6
[2,] 2 7
[3,] 3 8
[4,] 4 9
[5,] 5 10
What I want to be able to do now is duplicate random rows in x; for example, producing
[,1] [,2]
[1,] 1 6
[2,] 2 7
[3,] 5 10
[4,] 4 9
[5,] 5 10
I believe the R function 'rep()' is the solution and also 'sample()', but I don't want to have to specify the size argument in sample(); i.e., I want an arbitrary number of rows to be duplicated each time.
Is there a simple way of accomplishing this using rep() and sample()?
We can use the sample function, drawing row indices with replacement. I've used set.seed for reproducibility; if you remove that line, the results will change from run to run.
set.seed(1848) # reproducibility
x[sample(x = nrow(x), size = nrow(x), replace = TRUE), ]
[,1] [,2]
[1,] 1 6
[2,] 2 7
[3,] 5 10
[4,] 1 6
[5,] 5 10
Another option is to sample one row number and overwrite that row with another randomly sampled row:
x[sample(1:nrow(x),1),] <- x[sample(1:nrow(x),1),]
x
# [,1] [,2]
#[1,] 5 10
#[2,] 2 7
#[3,] 3 8
#[4,] 4 9
#[5,] 5 10
Or, to duplicate up to 3 random rows:
x[sample(1:nrow(x),3),] <- x[sample(1:nrow(x),3),]
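
The question asks for an arbitrary number of duplicated rows rather than a fixed count such as 3. One possible sketch is to draw the count itself at random first. The variable names `k`, `targets`, `sources` and `y` are illustrative, and the seed value is arbitrary.

```r
set.seed(42)                       # arbitrary seed, only for reproducibility
x <- matrix(1:10, nrow = 5)

k <- sample(nrow(x) - 1, 1)        # random number of rows to overwrite (1 to 4)
targets <- sample(nrow(x), k)      # distinct rows that will be overwritten
sources <- sample(nrow(x), k, replace = TRUE)  # rows to copy from

y <- x
y[targets, ] <- x[sources, ]       # copy from the untouched original x
```

Because a target row may happen to be paired with itself as the source, this yields at most `k` changed rows, not exactly `k` duplicates.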

Simple way to assign the matrix

Is there another way to assign the matrix?
> x<-matrix(NA,nrow=3,ncol=4)
> x
[,1] [,2] [,3] [,4]
[1,] NA NA NA NA
[2,] NA NA NA NA
[3,] NA NA NA NA
One way is
x[2:10] <- 2:10
t(x)
[,1] [,2] [,3]
[1,] NA 2 3
[2,] 4 5 6
[3,] 7 8 9
[4,] 10 NA NA
I am asking, in general, how to assign values to part of a matrix (where the part is not itself a matrix, just some of its elements).
In the general case, where the matrix elements you wish to assign may not even be neighbors, you should use both indices with the [<- tools. E.g. (for a larger matrix than your example)
x[1:3,4]<-8:10
or
x[5,c(3,7,11)]<- c(5,3,1)
and so on. If there's a pattern to the replacement locations, you can write loops over the indices of interest.
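
For scattered elements that do not share a row or a column, R also accepts a two-column matrix of (row, column) pairs as the index, which avoids a loop. A minimal sketch with made-up positions and values:

```r
x <- matrix(NA, nrow = 3, ncol = 4)

# each row of idx is one (row, column) position in x
idx <- cbind(c(1, 3, 2), c(2, 1, 4))

# assigns x[1,2] <- 10, x[3,1] <- 20, x[2,4] <- 30
x[idx] <- c(10, 20, 30)
```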

Data frame with NA in R

I have a data frame in which some rows contain NA entries. I want to find the row and column index of every NA entry. I am currently doing this with nested loops, which takes too long. Is there a quicker way to do it? Thanks.
set.seed(123)
dfrm <- data.frame(a=sample(c(1:5, NA), 25, replace=TRUE), b=sample(c(letters,NA), 25, replace=TRUE))
which(is.na(dfrm), arr.ind=TRUE)
row col
[1,] 4 1
[2,] 5 1
[3,] 8 1
[4,] 11 1
[5,] 16 1
[6,] 20 1
[7,] 21 1
[8,] 24 1
[9,] 6 2
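
To make the result easier to read, the column indices can be mapped back to column names. The following sketch uses a small hand-made data frame (not the random one above) so that the NA positions are known in advance; `na_pos` and `na_cols` are illustrative names.

```r
dfrm <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))

# one row per NA entry, with columns "row" and "col"
na_pos <- which(is.na(dfrm), arr.ind = TRUE)

# translate the column indices into column names
na_cols <- names(dfrm)[na_pos[, "col"]]
```

Here the NAs sit at row 2 of column `a` and row 3 of column `b`, so `na_pos` has two rows and `na_cols` holds `"a"` and `"b"`.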
