Replace sequence of identical values of length > 2 - r

I have a sensor that measures a variable, and when there is no connection it always returns the last value seen instead of NA. So in my vector I would like to replace these identical values with an imputed value (for example with na.approx).
set.seed(3)
vec <- round(runif(20)*10)
#### [1] 2 8 4 3 6 6 1 3 6 6 5 5 5 6 9 8 1 7 9 3
But I only want to tag runs longer than 2 (3 or more identical numbers), because 2 identical numbers can appear naturally. (In the previous example the sequence to tag would be 5 5 5.)
I tried to do it with diff to tag my identical points (c(0, diff(vec) == 0)) but I don't know how to deal with the length == 2 condition...
EDIT
my expected output could be like this:
#### [1] 2 8 4 3 6 6 1 3 6 6 5 NA NA 6 9 8 1 7 9 3
(The second identical value of a sequence of 3 or more is very probably a wrong value too)
Thanks

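You can use the lag function from dplyr (base R's stats::lag only shifts the time attributes of a series, not the values of a plain vector). Note that this tags only the third and later values of a run:
> library(dplyr)
> set.seed(3)
> vec <- round(runif(20)*10)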
>
> vec
[1] 2 8 4 3 6 6 1 3 6 6 5 5 5 6 9 8 1 7 9 3
>
> vec[vec == lag(vec) & vec == lag(vec,2)] <- NA
>
> vec
[1] 2 8 4 3 6 6 1 3 6 6 5 5 NA 6 9 8 1 7 9 3
>

You can use rle to get the indices of the positions where NA should be assigned.
# For every run longer than 2, index everything from its second element to its end
vec[with(data = rle(vec),
         expr = unlist(sapply(which(lengths > 2), function(i)
             (sum(lengths[1:i]) - (lengths[i] - 2)):sum(lengths[1:i]))))] = NA
vec
#[1] 2 8 4 3 6 6 1 3 6 6 5 NA NA 6 9 8 1 7 9 3
As a function:
foo = function(X, length){
    # Runs longer than `length` are set to NA from their element number `length` onward
    replace(x = X,
            list = with(data = rle(X),
                        expr = unlist(sapply(which(lengths > length), function(i)
                            (sum(lengths[1:i]) - (lengths[i] - length)):sum(lengths[1:i])))),
            values = NA)
}
foo(X = vec, length = 2)
#[1] 2 8 4 3 6 6 1 3 6 6 5 NA NA 6 9 8 1 7 9 3
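A loop-free variant of the same rle idea, as a minimal sketch: the helper name tag_runs and its min_len parameter are illustrative assumptions, not part of the answers above. It builds, for every element, the length of the run it belongs to and its position inside that run, then sets everything after the first element of a long run to NA. Base R's approx is used here as a stand-in for zoo's na.approx to fill the gaps by linear interpolation.

```r
# Hypothetical helper: set all but the first value of each run of
# at least `min_len` identical values to NA.
tag_runs <- function(x, min_len = 3) {
  r <- rle(x)
  run_len <- rep(r$lengths, r$lengths)  # length of the run each element is in
  run_pos <- sequence(r$lengths)        # 1, 2, ... within each run
  x[run_len >= min_len & run_pos > 1] <- NA
  x
}

# The example vector from the question, typed in by hand
vec <- c(2, 8, 4, 3, 6, 6, 1, 3, 6, 6, 5, 5, 5, 6, 9, 8, 1, 7, 9, 3)
tagged <- tag_runs(vec)
tagged

# Linear interpolation over the NA positions, using only the known points;
# rule = 2 carries the nearest value outward if the vector starts or ends with NA.
ok <- !is.na(tagged)
filled <- approx(which(ok), tagged[ok], xout = seq_along(tagged), rule = 2)$y
filled
```

With the example vector this tags positions 12 and 13, which matches the expected output in the question.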

Related

Remove identical values if the same as previous in a time series

I have a time series:
df <- data.frame(t=1:10, x= c(5,7,8,9,5,5,5,5,4,3))
I want to remove values that are identical to the previous value to obtain:
x = c(5,7,8,9,5,4,3)
I tried:
df[unique(df$x),]
But this gives the incorrect answer.
You can do:
df[c(1, diff(df$x)) != 0, ]
    t x
1   1 5
2   2 7
3   3 8
4   4 9
5   5 5
9   9 4
10 10 3
With dplyr, you can do:
library(dplyr)
df %>%
  # the default differs from the first value, so the first row is always kept
  filter(x != lag(x, default = first(x) - 1))
t x
1 1 5
2 2 7
3 3 8
4 4 9
5 5 5
6 9 4
7 10 3
In base R, we can use head and tail
subset(df, c(TRUE, head(x, -1) != tail(x, -1)))
# t x
#1 1 5
#2 2 7
#3 3 8
#4 4 9
#5 5 5
#9 9 4
#10 10 3
Another base solution would be using rle.
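If you want to subset the data frame based on this criterion, you can use lengths; note that the cumulative sum of the run lengths points at the last row of each run, so the repeated 5 is kept at t = 8 rather than t = 5. If you only need the x column, extract the values from rle instead. See below: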
df[cumsum(rle(df$x)$lengths), ] # dataframe subset
# t x
# 1 1 5
# 2 2 7
# 3 3 8
# 4 4 9
# 8 8 5
# 9 9 4
# 10 10 3
rle(df$x)$values # vector of values
# [1] 5 7 8 9 5 4 3
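If the first row of each run is wanted instead (as in the other answers), the run starts can be derived from the same rle result. A minimal sketch; the names run_end and run_start are illustrative:

```r
df <- data.frame(t = 1:10, x = c(5, 7, 8, 9, 5, 5, 5, 5, 4, 3))

r <- rle(df$x)
run_end   <- cumsum(r$lengths)          # last row of each run
run_start <- run_end - r$lengths + 1L   # first row of each run

first_rows <- df[run_start, ]  # keeps t = 5 for the repeated 5
last_rows  <- df[run_end, ]    # keeps t = 8 for the repeated 5
```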
Or using data.table:
library(data.table)
setDT(df)[, rn := 1:.N, by = rleid(x)][rn == 1, .(t, x)]
# t x
# 1: 1 5
# 2: 2 7
# 3: 3 8
# 4: 4 9
# 5: 5 5
# 6: 9 4
# 7: 10 3
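Another option, using lag from dplyr and replace_na from tidyr:
library(dplyr)
library(tidyr) # provides replace_na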
df <- data.frame(t=1:10, x= c(5,7,8,9,5,5,5,5,4,3))
subsetVec <- df$x - lag(df$x) != 0
subsetVec <- replace_na(subsetVec, TRUE)
df[subsetVec,]

Compare 2 values of the same row of a matrix with the row and column index of another matrix in R

I have a matrix1 with 11217 rows and 2 columns, and a second matrix2 with 10 rows and 10 columns. I want to compare the values in each row of matrix1 with the indices of matrix2: whenever a row of matrix1 matches a (row, column) index of matrix2, the value at that index (initially 0) should be increased by 1.
c1 <- x[2:11218] #these values go from 1 to 10
#second column from index 3 to N
c2 <- x[3:11219] #these values also go from 1 to 10
#matrix with column c1 and c2
m1 <- as.matrix(cbind(c1 = c1, c2 = c2))
#empty matrix which will count the frequencies
m2 <- matrix(0, nrow = 10, ncol = 10)
#change row and column names of m2 to the numbers of 1 to 10
dimnames(m2) <-list(c(1:10), c(1:10))
#go through every row of the matrix m1 and look which rotation appears, add 1 to m2 if the rotation
#equals the corresponding index
r <- c(1:10)
c <- c(1:10)
for (i in 1:nrow(m1)) {
if(m1[i,1] == r & m1[i,2] == c)
m2[r,c]+1
}
No frequencies were calculated, and I don't understand why.
It appears that you are trying to replicate the behavior of table. I'd recommend just using it instead.
Simpler data (it appears you did not include variable x):
m1 <-
matrix(round(runif(20, 1,10))
, ncol = 2)
Then, use table. Here, I am setting the values of each column to be a factor to ensure that the right columns are generated:
table(factor(m1[,1], 1:10)
, factor(m1[,2], 1:10))
gives:
     1  2  3  4  5  6  7  8  9 10
 1   3  4  0  4  2  0  5  3  2  0
 2   3  7  9  7  4  5  3  4  5  2
 3   4  6  3 10  8  9  4  2  7  3
 4   5  2 14  3  7 13  8 11  3  3
 5   2 13  2  5  8  5  7  7  8  6
 6   1 10  7  4  5  6  8  5  8  5
 7   3  3  6  5  4  5  4  8  7  7
 8   5  5  8  7  6 10  5  4  3  4
 9   2  5  8  4  7  4  4  6  4  2
10   3  1  2  3  3  5  3  5  1  0
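As for why the original loop counts nothing: m1[i,1] == r compares one value against the whole vector 1:10, so the if condition is not a single TRUE or FALSE, and m2[r,c]+1 computes a sum without assigning it back. A minimal corrected sketch, using made-up sample data (the original x was not provided) and checking the result against table:

```r
set.seed(1)
# Made-up stand-in for the question's data: 200 pairs of values in 1..10
m1 <- cbind(c1 = sample(1:10, 200, replace = TRUE),
            c2 = sample(1:10, 200, replace = TRUE))

m2 <- matrix(0, nrow = 10, ncol = 10, dimnames = list(1:10, 1:10))
for (i in seq_len(nrow(m1))) {
  # use the two values of row i directly as row and column index
  m2[m1[i, 1], m1[i, 2]] <- m2[m1[i, 1], m1[i, 2]] + 1
}

# The same counts from table
tab <- table(factor(m1[, 1], 1:10), factor(m1[, 2], 1:10))
same_counts <- all(m2 == unclass(tab))
same_counts
```

The loop is only there to show the fix; table remains the shorter and faster option.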

repeat sequences from vector

Say I have a vector like so:
vector <- 1:9
#$ [1] 1 2 3 4 5 6 7 8 9
I now want to repeat every i to i+x sequence n times, like so for x=3, and n=2:
#$ [1] 1 2 3 1 2 3 4 5 6 4 5 6 7 8 9 7 8 9
I'm accomplishing this like so:
index <- NULL
x <- 3
n <- 2
for (i in 1:(length(vector)/x)) {
index <- c(index, rep(c(1:x + (i-1)*x), n))
}
#$ [1] 1 2 3 1 2 3 4 5 6 4 5 6 7 8 9 7 8 9
This works just fine, but I have a hunch there's got to be a better way (especially since usually, a for loop is not the answer).
Ps.: the use case for this is actually repeating rows in a dataframe, but just getting the index vector would be fine.
You can try to first split the vector, then use rep and unlist:
x <- 3 # this is the length of each subset sequence from i to i+x (see above)
n <- 2 # this is how many times you want to repeat each subset sequence
unlist(lapply(split(vector, rep(1:(length(vector)/x), each = x)), rep, n), use.names = FALSE)
# [1] 1 2 3 1 2 3 4 5 6 4 5 6 7 8 9 7 8 9
Or, you can try creating a matrix and converting it to a vector:
c(do.call(rbind, replicate(n, matrix(vector, nrow = x), FALSE)))
# [1] 1 2 3 1 2 3 4 5 6 4 5 6 7 8 9 7 8 9
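Since the use case is repeating rows of a data frame, the index vector can also be built directly. A minimal sketch: the blocks are the columns of an x-row matrix of row positions, and each column is selected n times. The data frame d is a made-up example, and the sketch assumes the number of rows is a multiple of x.

```r
x <- 3  # block length
n <- 2  # number of repetitions per block
d <- data.frame(id = 1:9, value = letters[1:9])  # made-up example data

blocks <- matrix(seq_len(nrow(d)), nrow = x)  # one block per column
index  <- c(blocks[, rep(seq_len(ncol(blocks)), each = n)])
index
# [1] 1 2 3 1 2 3 4 5 6 4 5 6 7 8 9 7 8 9

d_rep <- d[index, ]  # rows repeated block by block
```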

How to replace the NA values after merge two data.frame? [duplicate]

This question already has answers here:
Replacing NAs with latest non-NA value
(21 answers)
Closed 7 years ago.
I have two data.frame as the following:
> a <- data.frame(x=c(1,2,3,4,5,6,7,8), y=c(1,3,5,7,9,11,13,15))
> a
x y
1 1 1
2 2 3
3 3 5
4 4 7
5 5 9
6 6 11
7 7 13
8 8 15
> b <- data.frame(x=c(1,5,7), z=c(2, 4, 6))
> b
x z
1 1 2
2 5 4
3 7 6
Then I use "join" for two data.frames:
> c <- join(a, b, by="x", type="left")
> c
x y z
1 1 1 2
2 2 3 NA
3 3 5 NA
4 4 7 NA
5 5 9 4
6 6 11 NA
7 7 13 6
8 8 15 NA
My requirement is to replace the NAs in the z column with the last non-NA value before the current position. I want a result like this:
> c
x y z
1 1 1 2
2 2 3 2
3 3 5 2
4 4 7 2
5 5 9 4
6 6 11 4
7 7 13 6
8 8 15 6
This time (if your data is not too large) a loop is an elegant option, provided the first value of z is not NA:
for(i in which(is.na(c$z))){
c$z[i] = c$z[i-1]
}
gives:
> c
x y z
1 1 1 2
2 2 3 2
3 3 5 2
4 4 7 2
5 5 9 4
6 6 11 4
7 7 13 6
8 8 15 6
data:
library(plyr)
a <- data.frame(x=c(1,2,3,4,5,6,7,8), y=c(1,3,5,7,9,11,13,15))
b <- data.frame(x=c(1,5,7), z=c(2, 4, 6))
c <- join(a, b, by="x", type="left")
You might also want to check na.locf in the zoo package.
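A vectorised base R alternative, as a minimal sketch (fill_forward is a hypothetical helper name, not a function from plyr or zoo): for every position it finds the index of the most recent non-NA value with cummax and then indexes the vector once. Leading NAs, which have no earlier value to copy, are left as NA.

```r
# Hypothetical helper: carry the last non-NA value forward
fill_forward <- function(z) {
  idx <- seq_along(z)
  idx[is.na(z)] <- 0L       # NA positions contribute no index of their own
  idx <- cummax(idx)        # index of the last non-NA value so far, 0 if none
  idx[idx == 0L] <- NA      # nothing to carry forward yet
  z[idx]
}

z <- c(2, NA, NA, NA, 4, NA, 6, NA)  # the z column from the question
fill_forward(z)
# [1] 2 2 2 2 4 4 6 6
```

Applied to the joined data frame this would be c$z <- fill_forward(c$z).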

How do I split a vector into a list of vectors when a condition is met?

I would like to split a vector into a list of vectors. The resulting vectors will be of variable length, and I need the split to occur only when certain conditions are met.
Sample data:
set.seed(3)
x <- sample(0:9,100,repl=TRUE)
For example, in this case I would like to split the above vector x at each 0.
Currently I do this with my own function:
ConditionalSplit <- function(myvec, splitfun) {
  newlist <- list()
  # positions where the split condition holds
  splits <- which(splitfun(myvec))
  if (length(splits) == 0) return(list(myvec))
  if (splits[1] != 1) newlist[[1]] <- myvec[1:(splits[1]-1)]
  i <- 1
  imax <- length(splits)
  while (i < imax) {
    curstart <- splits[i]
    curend <- splits[i+1]
    if (curstart != curend - 1)
      newlist <- c(newlist, list(myvec[curstart:(curend-1)]))
    i <- i + 1
  }
  # the last chunk runs to the end of the input vector
  newlist <- c(newlist, list(myvec[splits[i]:length(myvec)]))
  return(newlist)
}
This function gives the output I'd like, but I'm certain there's a better way than mine.
> MySplit <- function(x) x == 0
> ConditionalSplit(x, MySplit)
[[1]]
[1] 1 8 3 3 6 6 1 2 5 6 5 5 5 5 8 8 1 7 8 2 2
[[2]]
[1] 0 1
[[3]]
[1] 0 2 7 5 9 5 7 3 3 1 4 2 3 8 2 5 2 2 7 1 5 4 2
...
The following line seems to work just fine:
split(x,cumsum(x==0))
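To see why this works, here is a small sketch on a made-up vector: x == 0 marks the split points, and cumsum turns those marks into a group id that increases by one at every 0, so each 0 starts a new group.

```r
x <- c(1, 8, 0, 1, 0, 2, 7)  # made-up example
grp <- cumsum(x == 0)
grp
# [1] 0 0 1 1 2 2 2

parts <- split(x, grp)
parts[["1"]]
# [1] 0 1
```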
Another solution is to use tapply. A good reason to use tapply instead of split is that it lets you perform other operations on the items in the list while you're splitting it.
For example, in this solution to the question:
> x <- sample(0:9,100,repl=TRUE)
> idx <- cumsum(x==0)
> splitList <- tapply(x, idx, function(y) {list(y)})
> splitList
$`0`
[1] 2 9 2
$`1`
[1] 0 5 5 3 8 4
$`2`
[1] 0 2 5 2 6 2 2
$`3`
[1] 0 8 1 7 5
$`4`
[1] 0 1 6 6 3 8 7 2 4 2 3 1
$`5`
[1] 0 6 8 9 9 1 1 2
$`6`
[1] 0 1 2 2 2 7 8 1 9 7 9 3 4 8 4 6 4 5 3 1
$`7`
[1] 0 2 7 8 5
$`8`
[1] 0 3 4 8 4 7 3
$`9`
[1] 0 8 4
$`10`
[1] 0 4 3 9 9 8 7 4 4 5 5 1 1 7 3 9 7 4 4 7 7 6 3 3
The function can be modified to divide each element by the number of elements in its list, using
list(y/length(y))
instead of
list(y)
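As a sketch of the same idea with a summarising function (on made-up data), tapply can return one number per group instead of the group itself:

```r
x <- c(1, 8, 0, 1, 0, 2, 7)  # made-up example
idx <- cumsum(x == 0)        # group id, increases at every 0

# one sum per group; the groups are (1, 8), (0, 1) and (0, 2, 7)
group_sums <- tapply(x, idx, sum)
group_sums
```

Here the three group sums are 9, 1 and 9, named by the group ids "0", "1" and "2".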
