R Help - na.approx similar to SuperTrend in Excel - r

Raw Data   na.approx   desired result
1          1           1
NA         3           4
5          5           5
6          6           6
7          7           7
NA         8           4
NA         9           7
10         10          10
13         11          13
14         12          14
By default, I believe na.approx in R interpolates each NA between two known values: one before and one after the NA (the result is shown in the "na.approx" column above). Is there a way to change this function so that it interpolates based on the next two known values? For example, the first NA would be filled in using 5 and 6, not 1 and 5.

I am not sure if there is an exact equivalent to what you want to do, but you can achieve similar results the following way:
> data <- c(1, NA, 5,6,7,NA,NA,10,13,14)
> ind <- which(is.na(data))
> sapply(rev(ind), function(i) data[i] <<- data[i + 1] - 1)
> data
[1] 1 4 5 6 7 8 9 10 13 14
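The snippet above subtracts a fixed step of 1, so it matches the "desired result" column only where the next two known values happen to differ by 1. A minimal sketch of extrapolating backward from the next two values, assuming a straight line through them; the helper name extrapolate_back is hypothetical and not part of zoo or base R:

```r
# Hypothetical helper: fill each NA from the two values that follow it,
# assuming a straight line through them (x[i] = 2 * x[i + 1] - x[i + 2]).
extrapolate_back <- function(x) {
  # Walk from the end so consecutive NAs can reuse values filled just before.
  for (i in rev(which(is.na(x)))) {
    has_two_after <- i + 2 <= length(x) &&
      !is.na(x[i + 1]) && !is.na(x[i + 2])
    if (has_two_after) {
      x[i] <- 2 * x[i + 1] - x[i + 2]
    }
  }
  x
}

data <- c(1, NA, 5, 6, 7, NA, NA, 10, 13, 14)
extrapolate_back(data)
# [1]  1  4  5  6  7  4  7 10 13 14
```

An NA with fewer than two usable values after it is left as NA in this sketch.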

Related

Sum two sequences in R to yield a third sequence [duplicate]

I have a sequence of numbers in R
A <- c(1,4,2,5,3,6)
I have a second sequence as follows
B <- c(0,6,12)
I would like to sum the elements of the two sequences such that I get the following:
final_output = c(1,4,2,5,3,6, 7,10,8,11,9,12,13,16,14,17,15,18)
I have tried A + B but am getting:
1 10 14 5 9 18
I am unable to get the answer. Could someone guide me?
This creates a list of sequences, and unlist() then merges all of the sequences together.
Does this work?
B<- c(0, 6,12)
A<-c(1,4,2,5,3,6)
unlist(lapply(B, function(x){x+A}))
[1] 1 4 2 5 3 6 7 10 8 11 9 12 13 16 14 17 15 18
vec <- c()
for(i in 1:length(B)){
vec <- c(vec, A + B[i])
}
Using outer -
c(outer(A, B, `+`))
#[1] 1 4 2 5 3 6 7 10 8 11 9 12 13 16 14 17 15 18
Another option using rowSums and expand.grid:
rowSums(expand.grid(A, B))
which gives:
[1] 1 4 2 5 3 6 7 10 8 11 9 12 13 16 14 17 15 18
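For context on why A + B gave only six numbers: R recycles the shorter vector element by element instead of pairing every element of A with every element of B. A small sketch of the difference, using the same A and B as the question (the names recycled and paired are only illustrative):

```r
A <- c(1, 4, 2, 5, 3, 6)
B <- c(0, 6, 12)

# Recycling: B is reused as 0, 6, 12, 0, 6, 12 and added position by position.
recycled <- A + B
recycled
# [1]  1 10 14  5  9 18

# Pairing every element of A with every element of B, done explicitly:
# repeat A once per element of B, and stretch each element of B over all of A.
paired <- rep(A, times = length(B)) + rep(B, each = length(A))
paired
# [1]  1  4  2  5  3  6  7 10  8 11  9 12 13 16 14 17 15 18
```

This should give the same vector as the outer() answer, just with the repetition written out.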

How to create a table with flexible columns based on variables control in R?

I want to create a table like:
1  1  6  6  10  10 ...
2  2  7  7  11  11 ...
3  3  8  8  12  12 ...
4  4  9  9  13  13 ...
5  5        14  14 ...
            15  15 ...
I want to use these variables to create the table:
n (the number of repeats), m (the total number of columns), and k (the start of each block, equal to the end of the previous block + 1; for example, 6 = 5 + 1 and 10 = 9 + 1), with a different number of rows in each block.
I know I can use something like:
rep(list(1:5,6:9,10:15), each = 2)
but how can I build the list list(1:5,6:9,10:15,...) from a general expression in n, m, and k?
I tried a loop, for (i in 1:m) etc., but could not work it out.
Finally, I want a single sequence by using unlist(): 1,2,3,4,5,6,1,2,3,4,5,6,...
Many thanks.
Maybe the code below can help
len <- c(5,4,6)
res <- unlist(unname(rep(split(1:sum(len),
                               findInterval(1:sum(len), cumsum(len) + 1)),
                         each = 2)))
which gives
> res
[1] 1 2 3 4 5 1 2 3 4 5 6 7 8 9 6 7 8 9 10 11 12 13 14 15 10 11 12 13 14 15
Probably, something like this would be helpful.
#Number of times to repeat
r <- 2
#Length of each sequence
len <- c(5, 4, 6)
#Get the end of the sequence
end <- cumsum(len)
#Calculate the start of each sequence
start <- c(1, end[-length(end)] + 1)
#Create a sequence of start and end and repeat it r times
Map(function(x, y) rep(seq(x, y), r), start, end)
#[[1]]
# [1] 1 2 3 4 5 1 2 3 4 5
#[[2]]
#[1] 6 7 8 9 6 7 8 9
#[[3]]
# [1] 10 11 12 13 14 15 10 11 12 13 14 15
You could unlist to get it as one vector.
unlist(Map(function(x, y) rep(seq(x, y), r), start, end))
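To turn this into the parameterized form the question asks for, the steps can be wrapped in one function. This is only a sketch: the function name repeat_blocks is made up for illustration, and it assumes every block length is a positive integer.

```r
# Hypothetical wrapper: len holds the block lengths, r the number of repeats.
# Assumes all entries of len are positive integers.
repeat_blocks <- function(len, r = 2) {
  end <- cumsum(len)        # last value of each block
  start <- end - len + 1    # first value of each block
  unlist(Map(function(s, e) rep(seq(s, e), times = r), start, end))
}

repeat_blocks(c(5, 4, 6), r = 2)
# [1]  1  2  3  4  5  1  2  3  4  5  6  7  8  9  6  7  8  9 10 11 12 13 14 15 10
# [26] 11 12 13 14 15
```

Here length(len) * r plays the role of the total column count m from the question, and start holds the k values.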

Use values from one vector to change values in another vector in R (for loop)

I need to change certain values in one vector depending on the values at the same positions in another vector. Below are my vectors:
r <- (1:20)
a <- c(54,54,54,54,55,55,50,0,0,0,0,0,0,1,1,1,1,1,56,57)
Basically, if a value in 'a' is greater than or equal to 0 and less than 20 (so any value in 'a' from 0 up to, but not including, 20), I want to subtract 1 from the value at the same position in 'r'. If the value in 'a' is 20 or greater, or less than 0, then the value in 'r' should stay the same. For example, the 8th value in 'a' is 0, which is greater than or equal to 0 and less than 20, so the 8th value in 'r' (which is 8) should have 1 subtracted from it (becoming 7). The first value in 'a', however, is 54, which is greater than 20, so the first value in 'r' stays the same. I assumed I needed a for loop for this and started writing one, but it's not doing what I need. This is what I have so far:
for(i in a){
if (i >= 0 && i < 20){
r[i] = r[i]-1
} else {
r[i] = r[i]
}
}
When I run this code it returns r as
[1] -4 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 NA NA NA NA NA NA NA NA NA NA NA
[32] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
How can I get it to return the correct result which should look like this:
[1] 1 2 3 4 5 6 7 7 8 9 10 11 12 13 14 15 16 17 19 20
Thank you!
Maybe this would help,
r <- (1:20)
a <- c(54,54,54,54,55,55,50,0,0,0,0,0,0,1,1,1,1,1,56,57)
r[a >= 0 & a < 20] <- r[a >= 0 & a < 20] - 1
# [1] 1 2 3 4 5 6 7 7 8 9 10 11 12 13 14 15 16 17 19 20
You don't need a loop here; the vectorized answer above does the whole update in one step.
We can loop through the sequence
for(i in seq_along(a)) if(a[i] >=0 && a[i] < 20) r[i] <- r[i] -1
r
#[1] 1 2 3 4 5 6 7 7 8 9 10 11 12 13 14 15 16 17 19 20
instead of the values of 'a', because r[54] doesn't exist and assigning to that element pads 'r' with NA
I guess a compact base R solution is to use ifelse
ifelse(a>=0 & a< 20,r-1,r)
which gives
> ifelse(a>=0 & a< 20,r-1,r)
[1] 1 2 3 4 5 6 7 7 8 9 10 11 12 13 14 15 16 17 19 20
You can do:
r - (a >= 0 & a < 20)
[1] 1 2 3 4 5 6 7 7 8 9 10 11 12 13 14 15 16 17 19 20
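To see where the NAs and the -4 in the question's output come from, here is a small sketch of what for (i in a) does with two of the values in 'a'. It reuses the question's r and is meant as an illustration, not a fix:

```r
r <- 1:20

# i takes the VALUES of a, so the first iteration uses i = 54.
# r[54] is out of range and reads as NA; assigning it grows r to length 54,
# with NA in positions 21 through 54.
r[54] <- r[54]
length(r)
# [1] 54

# The five 1s in a all hit r[1], so it is decremented five times.
for (k in 1:5) r[1] <- r[1] - 1
r[1]
# [1] -4
```

Looping over seq_along(a), or using a logical mask as in the answers above, avoids both effects because the index then always refers to a position in 'a'.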

r - Extract subsequences with specific time increments

I have a data frame df. It has several columns, two of them are dates and serial_day, corresponding to the date an observation was taken and MATLAB's serial day. I would like to restrict my time series such that the increment (in days) between two consecutive observations is 3 or 4 and separate such blocks by a NA row.
It is known that consecutive daily observations never occur and the case of 2 day separation followed by 2 day separation is rare, so it can be ignored.
In the example, increment is shown for convenience, but it is easily generated using the diff function. So, if the data frame is
   serial_day increment
1           4        NA
2           7         3
3          10         3
4          12         2
5          17         5
6          19         2
7          22         3
8          25         3
9          29         4
10         34         5
I would hope to get a new data frame as:
   serial_day increment
1           4        NA
2           7         3
3          10         3
4          NA        NA   ## Entire row of NAs
5          19        NA
6          22         3
7          25         3
8          29         4
9          NA        NA   ## Entire row of NAs
I can't figure out a way to do this without looping, which is a bad idea in R.
First you check in which rows the increment is not equal to 3 or 4. Then you'd replace these rows with a row of NAs:
inds <- which( df$increment > 4 | df$increment < 3 )
df[inds, ] <- rep(NA, ncol(df))
# serial_day increment
# 1 4 NA
# 2 7 3
# 3 10 3
# 4 NA NA
# 5 NA NA
# 6 NA NA
# 7 22 3
# 8 25 3
# 9 29 4
# 10 NA NA
This may result in multiple consecutive rows of NAs. In order to reduce these consecutive NA-rows to a single NA-row, you'd check where the NA-rows are located with which() and then see whether these locations are consecutive with diff() and remove these rows from df:
NArows <- which(rowSums(is.na(df)) == ncol(df)) # c(4, 5, 6, 10)
inds2 <- NArows[c(FALSE, diff(NArows) == 1)] # c(5, 6)
df <- df[-inds2, ]
# serial_day increment
# 1 4 NA
# 2 7 3
# 3 10 3
# 4 NA NA
# 7 22 3
# 8 25 3
# 9 29 4
# 10 NA NA
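The two steps above can be run end to end on the example data. The sketch below rebuilds the data frame from the serial_day values in the question and adds one guard that is my own assumption about a safe default: if there are no consecutive NA rows, df[-integer(0), ] would drop every row, so the removal is skipped in that case.

```r
# Rebuild the example; increment is generated with diff() as the question suggests.
df <- data.frame(serial_day = c(4, 7, 10, 12, 17, 19, 22, 25, 29, 34))
df$increment <- c(NA, diff(df$serial_day))

# Step 1: blank out rows whose increment is not 3 or 4.
bad <- which(df$increment > 4 | df$increment < 3)
df[bad, ] <- NA

# Step 2: collapse runs of all-NA rows to a single NA row.
na_rows <- which(rowSums(is.na(df)) == ncol(df))
dup <- na_rows[c(FALSE, diff(na_rows) == 1)]
if (length(dup) > 0) {
  # Guard: negative indexing with an empty vector would return zero rows.
  df <- df[-dup, ]
}

df$serial_day
# [1]  4  7 10 NA 22 25 29 NA
```

As with the answer above, the row for serial_day 19 is dropped, so this does not reproduce the question's desired output exactly.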

How to give a "/" in a column name to a dataframe in R?

I wish to use a "/" (forward slash) in a column name in a dataframe. Any idea how?
I tried the following, to no avail:
tmp1 <- data.frame("Cost/Day"=1:10,"Days"=11:20)
tmp1
   Cost.Day Days
1         1   11
2         2   12
3         3   13
4         4   14
5         5   15
6         6   16
7         7   17
8         8   18
9         9   19
10       10   20
I then tried this, and it worked:
tmp <- data.frame(1:10,11:20)
colnames(tmp) <- c("Cost/Day","Days")
tmp
   Cost/Day Days
1         1   11
2         2   12
3         3   13
4         4   14
5         5   15
6         6   16
7         7   17
8         8   18
9         9   19
10       10   20
I would prefer giving the name while constructing the dataframe itself. I tried escaping it but it still didn't work.
tmp2 <- data.frame("Cost\\/Day"=1:10,"Days"=11:20)
tmp2
You can use check.names=FALSE in data.frame(). By default it is TRUE, and when it is TRUE, the function make.names changes the column names, i.e.
make.names('Cost/Day')
#[1] "Cost.Day"
So, try
dat <- data.frame("Cost/Day"=1:10,"Days"=11:20, check.names=FALSE)
head(dat,2)
# Cost/Day Days
#1 1 11
#2 2 12
The specific lines in the data.frame function that change the column names are:
--------
if (check.names)
vnames <- make.names(vnames, unique = TRUE)
names(value) <- vnames
--------
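Once the column is named "Cost/Day", it is not a syntactic R name, so it needs quoting when you refer to it. A short sketch of the usual ways to do that, using a smaller data frame than the one above:

```r
dat <- data.frame("Cost/Day" = 1:3, "Days" = 11:13, check.names = FALSE)

names(dat)
# [1] "Cost/Day" "Days"

# Non-syntactic names need backticks with $, or a string with [[ ]].
dat$`Cost/Day`
# [1] 1 2 3
dat[["Cost/Day"]]
# [1] 1 2 3

# With the default check.names = TRUE the name is passed through make.names().
names(data.frame("Cost/Day" = 1:3))
# [1] "Cost.Day"
```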
