Difference between rows in a data frame with NA - R

My sample data looks like this
DF
n a b c d
1 NA NA NA NA
2 1 2 3 4
3 5 6 7 8
4 9 NA 11 12
5 NA NA NA NA
6 4 5 6 NA
7 8 9 10 11
8 12 13 15 16
9 NA NA NA NA
I need to subtract row 2 from rows 3 and 4.
Similarly, I need to subtract row 6 from rows 7 and 8.
My real data set is huge. Is there a way of doing this automatically? It seems it could be done with a for loop, but as I am a beginner R user, my attempts were not successful.
Thank you for any help and tips.
UPDATE
I want to achieve something like this
DF2
rowN1 <- DF[3, ] - DF[2, ]
rowN2 <- DF[4, ] - DF[2, ]
rowN3 <- DF[7, ] - DF[6, ] # there is an NA in row 6, so the result should contain NA too
rowN4 <- DF[8, ] - DF[6, ]

Here's one idea
set.seed(1)
(m <- matrix(sample(c(1:9, NA), 60, replace = TRUE), ncol=5))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 3 7 3 8 8
# [2,] 4 4 4 2 7
# [3,] 6 8 1 8 5
# [4,] NA 5 4 5 9
# [5,] 3 8 9 9 5
# [6,] 9 NA 4 7 3
# [7,] NA 4 5 8 1
# [8,] 7 8 6 6 1
# [9,] 7 NA 5 6 4
# [10,] 1 3 2 8 6
# [11,] 3 7 9 1 7
# [12,] 2 2 7 5 5
idx <- seq(2, nrow(m)-2, 4)
do.call(rbind, lapply(idx, function(x) {
rbind(m[x+1, ]-m[x, ], m[x+2, ]-m[x, ])
}))
# [1,] 2 4 -3 6 -2
# [2,] NA 1 0 3 2
# [3,] NA NA 1 1 -2
# [4,] -2 NA 2 -1 -2
# [5,] 2 4 7 -7 1
# [6,] 1 -1 5 -3 -1
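The answer above assumes the blocks are exactly four rows apart. If that is not guaranteed in the real data, the reference row can be located from the data itself. A minimal sketch, assuming (as in the sample) that blocks are separated by all-NA rows and that the first row after each separator is the one to subtract; `vals`, `sep`, `grp` and `rows` are illustrative names:

```r
# Rebuild the question's sample data (column n is only a row counter)
DF <- data.frame(
  n = 1:9,
  a = c(NA, 1, 5, 9, NA, 4, 8, 12, NA),
  b = c(NA, 2, 6, NA, NA, 5, 9, 13, NA),
  c = c(NA, 3, 7, 11, NA, 6, 10, 15, NA),
  d = c(NA, 4, 8, 12, NA, NA, 11, 16, NA)
)

vals <- as.matrix(DF[, -1])            # numeric columns only, drop n
sep  <- rowSums(!is.na(vals)) == 0     # TRUE for all-NA separator rows
grp  <- cumsum(sep)                    # block id increases at each separator
rows <- split(which(!sep), grp[!sep])  # row numbers belonging to each block

DF2 <- do.call(rbind, lapply(rows, function(i) {
  # Subtract the block's first row from each later row in the block;
  # an NA in either operand propagates, as the question requires
  sweep(vals[i[-1], , drop = FALSE], 2, vals[i[1], ], "-")
}))
DF2
```

Because the blocks are found with `cumsum()` over the separator rows, blocks of different lengths are handled without changing any index arithmetic.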

Save the output of a while loop in a matrix or a data frame

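I want to save the output of a while loop in a matrix or a data frame, so that the results are arranged in an orderly way.
i <- 15 # starting year (age)
pon <- list()
while (i < 63) {
# vlookup() is not base R; it comes from an add-on package
# [[ ]] stores the whole cumprod() vector as element i; pon[i] <- ... keeps only its first value, with a warning
pon[[i]] <- cumprod(vlookup(i:62, Tabla_de_mortalidad_css, 4))
i <- i + 1
}
This is the code I am running.
I want the output to look something like this, for example: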
v1 v2 v3
1
2 1
3 2 1
4 3 2
. . .
. . .
. . .
v1, v2 and v3 are my variables.
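Because each `cumprod()` result is one element shorter than the previous one, the staggered layout above can be produced by preallocating a matrix and writing each result one row lower than the last. A minimal sketch: the lookup table is not shown, so `p` below is a made-up vector of survival probabilities standing in for the `vlookup(i:62, Tabla_de_mortalidad_css, 4)` values, and the `v1`, `v2`, ... column names are illustrative:

```r
# Hypothetical stand-in for column 4 of Tabla_de_mortalidad_css:
# one made-up survival probability per age from 15 to 62
ages <- 15:62
p <- setNames(seq(0.999, 0.95, length.out = length(ages)), ages)

# Preallocate: one row per age, one column per starting age
res <- matrix(NA_real_, nrow = length(ages), ncol = length(ages),
              dimnames = list(ages, paste0("v", seq_along(ages))))
i <- 15
while (i < 63) {
  k <- i - 14                                  # column for this start age
  # write the cumulative product starting at row k, leaving NA above it
  res[k:length(ages), k] <- cumprod(p[as.character(i:62)])
  i <- i + 1
}
res[1:4, 1:3]
```

Preallocating the matrix avoids growing an object inside the loop, and `as.data.frame(res)` turns it into a data frame if that is preferred.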
Do you need something like this?
n <- 1:63
mat <- cbind(1:63, sapply(1:3, function(x) c(rep(NA, x), head(n, -x))))
mat
# [,1] [,2] [,3] [,4]
# [1,] 1 NA NA NA
# [2,] 2 1 NA NA
# [3,] 3 2 1 NA
# [4,] 4 3 2 1
# [5,] 5 4 3 2
# [6,] 6 5 4 3
# [7,] 7 6 5 4
#...
#...
Using the tidyverse:
library(tidyverse)
n <- 1:10
Lag <- 1:3
df <- data.frame(n = n)
bind_cols(df, map_dfc(Lag, ~transmute(df, !!paste0("Lag", .x) := lag(n, n = .x))))
#> n Lag1 Lag2 Lag3
#> 1 1 NA NA NA
#> 2 2 1 NA NA
#> 3 3 2 1 NA
#> 4 4 3 2 1
#> 5 5 4 3 2
#> 6 6 5 4 3
#> 7 7 6 5 4
#> 8 8 7 6 5
#> 9 9 8 7 6
#> 10 10 9 8 7
Created on 2020-12-11 by the reprex package (v0.3.0)
We can do this easily with shift from data.table
library(data.table)
do.call(cbind, shift(1:10, n = 1:3))
Output:
# [,1] [,2] [,3]
# [1,] NA NA NA
# [2,] 1 NA NA
# [3,] 2 1 NA
# [4,] 3 2 1
# [5,] 4 3 2
# [6,] 5 4 3
# [7,] 6 5 4
# [8,] 7 6 5
# [9,] 8 7 6
#[10,] 9 8 7
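Base R also has `embed()`, which returns a matrix whose k-th column is the input shifted back by k - 1 positions. Padding the front of the vector with NAs gives the same layout as the lag-based answers, without extra packages. A minimal sketch (`max_lag` and `lagged` are illustrative names):

```r
n <- 1:10
max_lag <- 3

# Pad with max_lag NAs so the first rows get NA lags; embed() then puts
# n in column 1, n lagged by 1 in column 2, and so on
lagged <- embed(c(rep(NA, max_lag), n), max_lag + 1)
colnames(lagged) <- c("n", paste0("Lag", seq_len(max_lag)))
lagged
```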

How to force all data to end on the same row? [duplicate]

This question already has an answer here:
Move NA to the start of each column in a matrix
Closed 2 years ago.
I have a bunch of columns that all start on the same row, but I would rather they all end on the same row. Here is a simplified example:
A <- c(2,7,3,5,5,9,8,1,NA,NA)
B <- c(NA,5,2,1,6,4,6,7,NA,NA)
C <- c(NA,NA,NA,NA,3,6,7,1,5,6)
Start <- cbind(A,B,C)
Which gives:
A B C
[1,] 2 NA NA
[2,] 7 5 NA
[3,] 3 2 NA
[4,] 5 1 NA
[5,] 5 6 3
[6,] 9 4 6
[7,] 8 6 7
[8,] 1 7 1
[9,] NA NA 5
[10,] NA NA 6
But I want to manipulate this so it is output like this:
A B C
[1,] NA NA NA
[2,] NA NA NA
[3,] 2 NA NA
[4,] 7 5 NA
[5,] 3 2 3
[6,] 5 1 6
[7,] 5 6 7
[8,] 9 4 1
[9,] 8 6 5
[10,] 1 7 6
Couldn't really find a solution on this site. Thanks for any help.
You can try:
apply(Start, 2, function(x) rev(`length<-`(na.omit(rev(x)), nrow(Start))))
A B C
[1,] NA NA NA
[2,] NA NA NA
[3,] 2 NA NA
[4,] 7 5 NA
[5,] 3 2 3
[6,] 5 1 6
[7,] 5 6 7
[8,] 9 4 1
[9,] 8 6 5
[10,] 1 7 6
We can try apply + is.na
apply(Start,2,function(x) c(x[is.na(x)],x[!is.na(x)]))
or
apply(Start,2,function(x) do.call(c,rev(split(x,is.na(x)))))
such that
A B C
[1,] NA NA NA
[2,] NA NA NA
[3,] 2 NA NA
[4,] 7 5 NA
[5,] 3 2 3
[6,] 5 1 6
[7,] 5 6 7
[8,] 9 4 1
[9,] 8 6 5
[10,] 1 7 6
There is an na.last parameter in sort() for this, but note that sort() also reorders the non-NA values in ascending order:
A <- c(2,7,3,5,5,9,8,1,NA,NA)
B <- c(NA,5,2,1,6,4,6,7,NA,NA)
C <- c(NA,NA,NA,NA,3,6,7,1,5,6)
Start <- as.data.frame(cbind(A,B,C) ) # added "as.data.frame" here ..
do.call(cbind, lapply(Start, sort, na.last = FALSE))
Or, to put the NAs first while keeping the non-NA values in their original order:
do.call(cbind, lapply(Start, function(x) {
res <- sort(x, na.last = FALSE)
res[!is.na(res)] <- x[!is.na(x)]
res
}))
# A B C
# [1,] NA NA NA
# [2,] NA NA NA
# [3,] 2 NA NA
# [4,] 7 5 NA
# [5,] 3 2 3
# [6,] 5 1 6
# [7,] 5 6 7
# [8,] 9 4 1
# [9,] 8 6 5
#[10,] 1 7 6
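All of the answers above move every NA to the top. If a column can also contain NA gaps between observed values and those gaps should stay where they are, a different rule is needed: shift each column down only by its number of trailing NAs. A minimal sketch of that variant, with a hypothetical helper `align_bottom()`:

```r
A <- c(2,7,3,5,5,9,8,1,NA,NA)
B <- c(NA,5,2,1,6,4,6,7,NA,NA)
C <- c(NA,NA,NA,NA,3,6,7,1,5,6)
Start <- cbind(A, B, C)

# Shift a column down so its last non-NA value lands in the final row.
# NA gaps *between* observed values are kept in place.
align_bottom <- function(x) {
  if (all(is.na(x))) return(x)
  last <- max(which(!is.na(x)))   # position of the last observed value
  c(rep(NA, length(x) - last), x[seq_len(last)])
}

End <- apply(Start, 2, align_bottom)
End
```

For the example data this gives the same result as the other answers; the two rules differ only when a column has an NA in the middle, e.g. `align_bottom(c(1, NA, 2, NA))` returns `NA 1 NA 2`.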

Check for overlap in multiple columns across waves in R

I have a dataset of adolescents over 5 waves. In each wave they nominate up to 3 friends. I want to add variables that indicate whether each friend was nominated in the previous wave of data collection.
My data look like this sample:
student_id wave friend1_id friend2_id friend3_id
1 1 3 NA NA
2 1 5 2 3
3 1 2 4 5
4 1 1 6 NA
5 1 1 NA 6
6 1 5 NA 2
7 1 8 NA NA
8 1 NA 9 NA
9 1 8 7 NA
10 1 7 9 NA
1 2 4 NA NA
2 2 5 3 NA
3 2 NA NA 5
4 2 NA NA NA
5 2 6 NA NA
6 2 5 NA NA
7 2 10 1 3
8 2 9 NA NA
9 2 8 6 7
10 2 7 4 NA
So the wave 2 "consistency" variables should look like this (1 = the friend was also nominated by this student in the previous wave, 0 = not nominated in the previous wave, NA = no nomination in that slot in wave 2):
student_id wave friend1_consit friend2_consit friend3_consit
1 2 0 NA NA
2 2 1 1 NA
3 2 NA NA 1
4 2 NA NA NA
5 2 1 NA NA
6 2 1 NA NA
7 2 0 0 0
8 2 1 NA NA
9 2 1 0 1
10 2 1 0 NA
This Base-R answer returns a matrix with student_id as the rows and wave as the columns, showing whether each student was nominated by anyone in that wave:
votes_bywave <- split(df1[,3:5],df1$wave)
votes_bywave <- lapply(votes_bywave, function(x) unique(unlist(x)))
votes_bywave <- sapply(votes_bywave, function(x) unique(df1$student_id) %in% x )
> votes_bywave
1 2
[1,] TRUE TRUE
[2,] TRUE FALSE
[3,] TRUE TRUE
[4,] TRUE TRUE
[5,] TRUE TRUE
[6,] TRUE TRUE
[7,] TRUE TRUE
[8,] TRUE TRUE
[9,] TRUE TRUE
[10,] FALSE TRUE
Or, if you prefer to have the actual IDs listed, add this line at the end:
cbind(student_id = unique(df1$student_id), votes_bywave)
student_id 1 2
[1,] 1 1 1
[2,] 2 1 0
[3,] 3 1 1
[4,] 4 1 1
[5,] 5 1 1
[6,] 6 1 1
[7,] 7 1 1
[8,] 8 1 1
[9,] 9 1 1
[10,] 10 0 1
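The answer above flags whether each student was nominated by anyone in a wave. To get the per-friend variables asked for in the question, each friend can instead be compared with the same student's three nominations from the previous wave. A minimal base-R sketch, assuming the data are in `df1` with the column names shown in the question (`prev`, `in_prev` and the `prev1`-`prev3` columns are illustrative helpers). Note that it gives 0 for student 9's second friend, since friend 6 was not among that student's wave-1 nominations:

```r
df1 <- data.frame(
  student_id = rep(1:10, times = 2),
  wave       = rep(1:2, each = 10),
  friend1_id = c(3, 5, 2, 1, 1, 5, 8, NA, 8, 7,  4, 5, NA, NA, 6, 5, 10, 9, 8, 7),
  friend2_id = c(NA, 2, 4, 6, NA, NA, NA, 9, 7, 9,  NA, 3, NA, NA, NA, NA, 1, NA, 6, 4),
  friend3_id = c(NA, 3, 5, NA, 6, 2, NA, NA, NA, NA,  NA, NA, 5, NA, NA, NA, 3, NA, 7, NA)
)
fcols <- c("friend1_id", "friend2_id", "friend3_id")

# Copy each student's nominations and move them forward one wave, so they
# line up with the same student's row in the following wave
prev <- df1[, c("student_id", "wave", fcols)]
prev$wave <- prev$wave + 1L
names(prev)[3:5] <- paste0("prev", 1:3)
prev$in_prev <- TRUE

both <- merge(df1, prev, by = c("student_id", "wave"), all.x = TRUE)
prev_mat <- as.matrix(both[, paste0("prev", 1:3)])

for (f in fcols) {
  cur <- both[[f]]
  # cur is recycled down each column, so row r is compared with prev_mat[r, ]
  hit <- rowSums(prev_mat == cur, na.rm = TRUE) > 0
  # NA when there is no nomination now, or no previous-wave row to compare with
  both[[sub("_id", "_consit", f)]] <-
    ifelse(is.na(cur) | is.na(both$in_prev), NA, as.integer(hit))
}
both[both$wave == 2, c("student_id", "wave", sub("_id", "_consit", fcols))]
```

Wave 1 rows get NA throughout, because there is no earlier wave to compare with.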

Manipulating a list of matrices in R

I have a list of matrices, generated with the code below:
a<-c(0,5,0,1,5,1,5,4,6,7)
b<-c(3,1,0,2,4,2,5,5,7,8)
c<-c(5,9,0,1,3,2,5,6,2,7)
d<-c(6,5,0,1,3,4,5,6,7,1)
k<-data.frame(a,b,c,d)
k<-as.matrix(k)
#dimnames(k)<-list(cntry,cntry)
e<-c(0,5,2,2,1,2,3,6,9,2)
f<-c(2,0,4,1,1,3,4,5,1,4)
g<-c(3,3,0,2,0,9,3,2,1,9)
h<-c(6,1,1,1,5,7,8,8,0,2)
l<-data.frame(e,f,g,h)
l<-as.matrix(l)
#dimnames(l)<-list(cntry,cntry)
list<-list(k,l)
names(list)<-2010:2011
list
$`2010`
a b c d
[1,] 0 3 5 6
[2,] 5 1 9 5
[3,] 0 0 0 0
[4,] 1 2 1 1
[5,] 5 4 3 3
[6,] 1 2 2 4
[7,] 5 5 5 5
[8,] 4 5 6 6
[9,] 6 7 2 7
[10,] 7 8 7 1
$`2011`
e f g h
[1,] 0 2 3 6
[2,] 5 0 3 1
[3,] 2 4 0 1
[4,] 2 1 2 1
[5,] 1 1 0 5
[6,] 2 3 9 7
[7,] 3 4 3 8
[8,] 6 5 2 8
[9,] 9 1 1 0
[10,] 2 4 9 2
In each matrix I would like to delete the rows that contain a value smaller than 1. But when a row is deleted in one matrix, the same row should be deleted in all the other matrices too. For example, the first row of "2010" contains a value < 1, so the first row should be deleted in both 2010 and 2011. Then the third row of the first column is < 1, so the third row should be deleted everywhere, and so on...
The result should look like this:
$`2010`
a b c d
[4,] 1 2 1 1
[6,] 1 2 2 4
[7,] 5 5 5 5
[8,] 4 5 6 6
[10,] 7 8 7 1
$`2011`
e f g h
[4,] 2 1 2 1
[6,] 2 3 9 7
[7,] 3 4 3 8
[8,] 6 5 2 8
[10,] 2 4 9 2
We can use rowSums (this removes the rows with a value < 1 from each matrix separately):
lapply(list, function(x) x[!rowSums(x <1),])
If the same rows need to be removed from every matrix (i.e. a row is dropped everywhere if any matrix has a value < 1 in it):
ind <- Reduce(`&`, lapply(list, function(x) !rowSums(x < 1)))
lapply(list, function(x) x[ind,])
#$`2010`
# a b c d
#[1,] 1 2 1 1
#[2,] 1 2 2 4
#[3,] 5 5 5 5
#[4,] 4 5 6 6
#[5,] 7 8 7 1
#$`2011`
# e f g h
#[1,] 2 1 2 1
#[2,] 2 3 9 7
#[3,] 3 4 3 8
#[4,] 6 5 2 8
#[5,] 2 4 9 2
Update
Based on the OP's comment about removing rows with values greater than the standard deviation of each column:
lapply(list, function(x) {
# drop = FALSE keeps x a matrix even if only one row is left
for(i in seq_len(ncol(x))) x <- x[!rowSums(x > sd(x[,i])), , drop = FALSE]
x
})
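# get the union of the row indices with at least one element less than 1
removed <- Reduce(union, lapply(list, function(x) which(rowSums(x < 1) != 0)))
# guard against an empty index: x[-integer(0), ] would drop every row
lapply(list, function(x) if (length(removed)) x[-removed, , drop = FALSE] else x)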
$`2010`
a b c d
[1,] 1 2 1 1
[2,] 1 2 2 4
[3,] 5 5 5 5
[4,] 4 5 6 6
[5,] 7 8 7 1
$`2011`
e f g h
[1,] 2 1 2 1
[2,] 2 3 9 7
[3,] 3 4 3 8
[4,] 6 5 2 8
[5,] 2 4 9 2

Extend by adding rows to a matrix in R with the same pattern

I have a matrix that I want to extend following the same pattern. Note that it may need to be extended by any number of rows and columns, and it is not normally square:
04/06/2012 11/06/2012 18/06/2012 25/06/2012 02/07/2012
26/03/2012 10 11 12 13 14
02/04/2012 9 10 11 12 13
09/04/2012 8 9 10 11 12
16/04/2012 7 8 9 10 11
23/04/2012 6 7 8 9 10
30/04/2012 5 6 7 8 9
07/05/2012 4 5 6 7 8
14/05/2012 3 4 5 6 7
21/05/2012 2 3 4 5 6
28/05/2012 1 2 3 4 5
I.e. I want to extend it to something like this:
04/06/2012 11/06/2012 18/06/2012 25/06/2012 02/07/2012
26/03/2012 10 11 12 13 14
02/04/2012 9 10 11 12 13
09/04/2012 8 9 10 11 12
16/04/2012 7 8 9 10 11
23/04/2012 6 7 8 9 10
30/04/2012 5 6 7 8 9
07/05/2012 4 5 6 7 8
14/05/2012 3 4 5 6 7
21/05/2012 2 3 4 5 6
28/05/2012 1 2 3 4 5
04/06/2012 0 1 2 3 4
11/06/2012 NA 0 1 2 3
18/06/2012 NA NA 0 1 2
25/06/2012 NA NA NA 0 1
02/07/2012 NA NA NA NA 0
I'm sure there's a clever way to do this with Reduce or something, but this is what came to mind:
lengthOut <- 6 ## Set to one less than the number of columns you want to create
startAt <- 10 ## Set the maximum value of the FIRST column
vapply(c(0, sequence(lengthOut)), function(x) {
x <- (startAt + x):0 # Create a sequence in the normal manner
length(x) <- startAt + lengthOut + 1 # Extend the length of that sequence
x
}, numeric(startAt + lengthOut + 1)) # Specify what to return
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] 10 11 12 13 14 15 16
# [2,] 9 10 11 12 13 14 15
# [3,] 8 9 10 11 12 13 14
# [4,] 7 8 9 10 11 12 13
# [5,] 6 7 8 9 10 11 12
# [6,] 5 6 7 8 9 10 11
# [7,] 4 5 6 7 8 9 10
# [8,] 3 4 5 6 7 8 9
# [9,] 2 3 4 5 6 7 8
# [10,] 1 2 3 4 5 6 7
# [11,] 0 1 2 3 4 5 6
# [12,] NA 0 1 2 3 4 5
# [13,] NA NA 0 1 2 3 4
# [14,] NA NA NA 0 1 2 3
# [15,] NA NA NA NA 0 1 2
# [16,] NA NA NA NA NA 0 1
# [17,] NA NA NA NA NA NA 0
Here's another approach
library(dplyr) # for lead()
x <- 16:0
matrix(c(sapply(6:1, function(z) lead(x, z)), x), ncol=7)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#[1,] 10 11 12 13 14 15 16
#[2,] 9 10 11 12 13 14 15
#[3,] 8 9 10 11 12 13 14
#[4,] 7 8 9 10 11 12 13
#[5,] 6 7 8 9 10 11 12
#[6,] 5 6 7 8 9 10 11
#[7,] 4 5 6 7 8 9 10
#[8,] 3 4 5 6 7 8 9
#[9,] 2 3 4 5 6 7 8
#[10,] 1 2 3 4 5 6 7
#[11,] 0 1 2 3 4 5 6
#[12,] NA 0 1 2 3 4 5
#[13,] NA NA 0 1 2 3 4
#[14,] NA NA NA 0 1 2 3
#[15,] NA NA NA NA 0 1 2
#[16,] NA NA NA NA NA 0 1
#[17,] NA NA NA NA NA NA 0
Edit: forgot to mention that I used dplyr::lead
Not sure if this helps:
m1 <- matrix(rep(10:1,each=7)+0:6,ncol=7,byrow=T)
m2 <- matrix(NA,ncol=7,nrow=7)
indx <- 0:6+rep(c(0:-6),each=7)
m2[lower.tri(m2, diag=TRUE)] <- indx[indx>=0]
rbind(m1,t(m2))
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] 10 11 12 13 14 15 16
# [2,] 9 10 11 12 13 14 15
# [3,] 8 9 10 11 12 13 14
# [4,] 7 8 9 10 11 12 13
# [5,] 6 7 8 9 10 11 12
# [6,] 5 6 7 8 9 10 11
# [7,] 4 5 6 7 8 9 10
# [8,] 3 4 5 6 7 8 9
# [9,] 2 3 4 5 6 7 8
# [10,] 1 2 3 4 5 6 7
# [11,] 0 1 2 3 4 5 6
# [12,] NA 0 1 2 3 4 5
# [13,] NA NA 0 1 2 3 4
# [14,] NA NA NA 0 1 2 3
# [15,] NA NA NA NA 0 1 2
# [16,] NA NA NA NA NA 0 1
# [17,] NA NA NA NA NA NA 0
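The pattern can also be written down directly: the value in row i, column j is start + j - i, with NA wherever that would be negative. `outer()` builds such a matrix in one call, and weekly `Date` sequences can restore the date labels from the question. A minimal sketch with a hypothetical helper `extend_pattern()`:

```r
# value[i, j] = start_value + j - i, and NA below the zero diagonal
extend_pattern <- function(start_value, n_rows, n_cols) {
  m <- outer(seq_len(n_rows), seq_len(n_cols),
             function(i, j) start_value + j - i)
  m[m < 0] <- NA
  m
}

# Weekly row and column labels matching the example in the question
rows <- seq(as.Date("2012-03-26"), by = "week", length.out = 15)
cols <- seq(as.Date("2012-06-04"), by = "week", length.out = 5)
m <- extend_pattern(10, length(rows), length(cols))
dimnames(m) <- list(format(rows, "%d/%m/%Y"), format(cols, "%d/%m/%Y"))
m
```

`extend_pattern(10, 17, 7)` reproduces the 17 x 7 matrices shown in the answers above; any other size only changes the two counts.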
