seq can only use a single value in the by parameter. Is there a way to vectorize by, i.e. to use multiple intervals?
Something like this:
seq(1, 10, by = c(1, 2))
would return c(1, 2, 4, 5, 7, 8, 10). Now, this is possible to do this with e.g. seq(1, 10, by = 1)[c(T, T, F)] because it's a simple case, but is there a way to make it generalizable to more complex sequences?
Some examples
seq(1, 100, by = 1:5)
#[1] 1 2 4 7 11 16 17 19 22 26 31...
seq(8, -5, by = c(3, 8))
#[1] 8 5 -3
this looks like a close base R solution?
ans <- Reduce(`+`, rep(1:5, 100), init = 1, accumulate = TRUE)
ans[1:(which.max(ans >= 100) - 1)]
[1] 1 2 4 7 11 16 17 19 22 26 31 32 34 37 41 46 47 49 52 56 61 62 64
[24] 67 71 76 77 79 82 86 91 92 94 97
you would have to inverse a part of it, if you want to calculate 'down'
ans <- Reduce(`+`, rep(c(-3, -8), 20), init = 8, accumulate = TRUE)
ans[1:(which.max(ans <= -5) - 1)]
[1] 8 5 -3
still, you would have tu 'guess' the number of repetitions needed (20 of 100 in the examples above) to create ans.
This doesn't seem possible Maƫl. I suppose it's easy enough to write one?
seq2 <- function(from, to, by) {
vals <- c(0, cumsum(rep(by, abs(ceiling((to - from) / sum(by))))))
if(from > to) return((from - vals)[(from - vals) >= to])
else (from + vals)[(from + vals) <= to]
}
Testing:
seq2(1, 10, by = c(1, 2))
#> [1] 1 2 4 5 7 8 10
seq2(1, 100, by = 1:5)
#> [1] 1 2 4 7 11 16 17 19 22 26 31 32 34 37 41 46 47 49 52 56 61 62 64 67 71
#> [26] 76 77 79 82 86 91 92 94 97
seq2(8, -5, by = c(3, 8))
#> [1] 8 5 -3
Created on 2022-12-23 with reprex v2.0.2
Related
We are looking to create a vector with the following sequence:
1,4,5,8,9,12,13,16,17,20,21,...
Start with 1, then skip 2 numbers, then add 2 numbers, then skip 2 numbers, etc., not going above 2000. We also need the inverse sequence 2,3,6,7,10,11,...
We may use recyling vector to filter the sequence
(1:21)[c(TRUE, FALSE, FALSE, TRUE)]
[1] 1 4 5 8 9 12 13 16 17 20 21
Here's an approach using rep and cumsum. Effectively, "add up alternating increments of 1 (successive #s) and 3 (skip two)."
cumsum(rep(c(1,3), 500))
and
cumsum(rep(c(3,1), 500)) - 1
Got this one myself - head(sort(c(seq(1, 2000, 4), seq(4, 2000, 4))), 20)
We can try like below
> (v <- seq(21))[v %% 4 %in% c(0, 1)]
[1] 1 4 5 8 9 12 13 16 17 20 21
You may arrange the data in a matrix and extract 1st and 4th column.
val <- 1:100
sort(c(matrix(val, ncol = 4, byrow = TRUE)[, c(1, 4)]))
# [1] 1 4 5 8 9 12 13 16 17 20 21 24 25 28 29 32 33
#[18] 36 37 40 41 44 45 48 49 52 53 56 57 60 61 64 65 68
#[35] 69 72 73 76 77 80 81 84 85 88 89 92 93 96 97 100
A tidyverse option.
library(purrr)
library(dplyr)
map_int(1:11, ~ case_when(. == 1 ~ as.integer(1),
. %% 2 == 0 ~ as.integer(.*2),
T ~ as.integer((.*2)-1)))
# [1] 1 4 5 8 9 12 13 16 17 20 21
I have a function like this
extract = function(x)
{
a = x$2007[6:18]
b = x$2007[30:42]
c = x$2007[54:66]
}
the subsetting needs to continue up to 744 in this way. I need to skip the first 6 data points, and then pull out every other 12 points into a new object or a list. Is there a more elegant way to do this with a for loop or apply?
Side note: if 2007 is truly a column name (you would have had to explicitly do this, R defaults to converting numbers to names starting with letters, see make.names("2007")), then x$"2007"[6:18] (etc) should work for column reference.
To generate that sequence of integers, let's try
nr <- 100
ind <- seq(6, nr, by = 12)
ind
# [1] 6 18 30 42 54 66 78 90
ind[ seq_along(ind) %% 2 == 1 ]
# [1] 6 30 54 78
ind[ seq_along(ind) %% 2 == 0 ]
# [1] 18 42 66 90
Map(seq, ind[ seq_along(ind) %% 2 == 1 ], ind[ seq_along(ind) %% 2 == 0 ])
# [[1]]
# [1] 6 7 8 9 10 11 12 13 14 15 16 17 18
# [[2]]
# [1] 30 31 32 33 34 35 36 37 38 39 40 41 42
# [[3]]
# [1] 54 55 56 57 58 59 60 61 62 63 64 65 66
# [[4]]
# [1] 78 79 80 81 82 83 84 85 86 87 88 89 90
So you can use this in your function to create a list of subsets:
nr <- nrow(x)
ind <- seq(6, nr, by = 12)
out <- lapply(Map(seq, ind[ seq_along(ind) %% 2 == 1 ], ind[ seq_along(ind) %% 2 == 0 ]),
function(i) x$"2007"[i])
we could use
split( x[7:744] , cut(7:744,seq(7,744,12)) )
I have a list which has multiple vectors (total 80) of various lengths. On the x-axis I want the names of these vectors. On the y-axis I want to plot the values corresponding to each vector. How can I do it in R?
One way to do this is to reshape the data using reshape2::melt or some other method. Please try and make a reproducible example. I think this is the gist of what you are after:
set.seed(4)
mylist <- list(a = sample(1:50, 10, T),
b = sample(25:40, 15, T),
c = sample(51:75, 20, T))
mylist
# $a
# [1] 30 1 15 14 41 14 37 46 48 4
#
# $b
# [1] 37 29 26 40 31 32 40 34 40 37 36 40 33 32 35
#
# $c
# [1] 71 63 72 63 64 65 56 72 67 63 75 62 66 60 51 74 57 65 55 73
library(ggplot2)
library(reshape2)
df <- melt(mylist)
head(df)
# value L1
# 1 30 a
# 2 1 a
# 3 15 a
# 4 14 a
# 5 41 a
# 6 14 a
ggplot(df, aes(x = factor(L1), y = value)) + geom_point()
Ok, first of all let me generate some sample data:
A_X01 <- c(34, 65, 23, 43, 22)
A_X02 <- c(2, 4, 7, 8, 3)
B_X01 <- c(24, 45, 94, 23, 54)
B_X02 <- c(4, 2, 4, 9, 1)
C_X01 <- c(34, 65, 876, 45, 87)
C_X02 <- c(123, 543, 86, 87, 34)
Var <- c(3, 5, 7, 2, 3)
DF <- data.frame(A_X01, A_X02, B_X01, B_X02, C_X01, C_X02, Var)
What I want to do is apply an equation to the concurrent columns of A and B for both X01 and X02, with a third column "Var" used in the equation.
So far I have been doing this the following way:
DF$D_X01 <- (DF$A_X01 + DF$B_X01) * DF$Var
DF$D_X02 <- (DF$A_X02 + DF$B_X02) * DF$Var
My desired output is as follows:
A_X01 A_X02 B_X01 B_X02 C_X01 C_X02 Var D_X01 D_X02
1 34 2 24 4 34 123 3 174 18
2 65 4 45 2 65 543 5 550 30
3 23 7 94 4 876 86 7 819 77
4 43 8 23 9 45 87 2 132 34
5 22 3 54 1 87 34 3 228 12
As you'll appreciate this is a lot of lines of code to do something fairly simple. Meaning at present my scripts are rather long (as I have multiple columns in the actual dataset)!
One of the apply functions must be the way to go but I can't seem to get my head around it for concurrent columns. I did think about using lapply but how would I get this to work for the two lists of columns and for the right columns to be added together?
I've looked around and can't seem to find a way to do this which must be a fairly common problem?
Thanks.
EDIT:
Original question was a bit confusing so have updated with a desired output and some extra conditions.
Try this
indx <- gsub("\\D", "", grep("A_X|B_X", names(DF), value = TRUE)) # Retrieving indexes
indx2 <- DF[grep("A_X|B_X", names(DF))] # Considering only the columns of interest
DF[paste0("D_X", unique(indx))] <-
sapply(unique(indx), function(x) rowSums(indx2[which(indx == x)])*DF$Var)
DF
# A_X01 A_X02 B_X01 B_X02 C_X01 C_X02 Var D_X01 D_X02
# 1 34 2 24 4 34 123 3 174 18
# 2 65 4 45 2 65 543 5 550 30
# 3 23 7 94 4 876 86 7 819 77
# 4 43 8 23 9 45 87 2 132 34
# 5 22 3 54 1 87 34 3 228 12
You may also try
indxA <- grep("^A", colnames(DF))
indxB <- grep("^B", colnames(DF))
f1 <- function(x,y,z) (x+y)*z
DF[sprintf('D_X%02d', indxA)] <- Map(f1 , DF[indxA], DF[indxB], list(DF$Var))
DF
# A_X01 A_X02 B_X01 B_X02 C_X01 C_X02 Var D_X01 D_X02
#1 34 2 24 4 34 123 3 174 18
#2 65 4 45 2 65 543 5 550 30
#3 23 7 94 4 876 86 7 819 77
#4 43 8 23 9 45 87 2 132 34
#5 22 3 54 1 87 34 3 228 12
Or you could use mapply
DF[sprintf('D_X%02d', indxA)] <- mapply(`+`, DF[indxA],DF[indxB])*DF$Var
I know that the filter() function in R calculate the moving average. I would like to know if exists a function that return me the moving variance or standard deviation, in order to show it in a plot side by side with the output of filter() function.
Consider the zoo package. For example filter() gives:
> filter(1:100, rep(1/3,3))
Time Series:
Start = 1
End = 100
Frequency = 1
[1] NA 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
[51] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
[76] 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 NA
whereas rollmean() in zoo gives:
> rollmean(1:100, k = 3, na.pad = TRUE)
[1] NA 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
[51] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
[76] 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 NA
which is the same (for a 3 point moving average in this example).
Whilst zoo doesn't have a rollsd() or rollvar() it does have rollapply(), which works like the apply() functions to apply any R function to the specified window.
> rollapply(1:100, width = 3, FUN = sd, na.pad = TRUE)
[1] NA 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[26] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[51] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[76] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 NA
Warning message:
In rollapply.zoo(zoo(data), ...) : na.pad argument is deprecated
or on something more interesting:
> rollapply(vec, width = 3, FUN = sd, na.pad = TRUE)
[1] NA 0.3655067 0.8472871 0.5660495 0.3491970 0.4732417 0.9236859
[8] 0.8075226 1.8725851 1.1930784 0.6329325 1.1412416 0.8430772 0.5808005
[15] 0.3838545 1.1738170 1.1655400 1.3241700 0.6876834 0.1534157 0.4858477
[22] 0.9843506 0.6002713 0.6897541 2.0619563 2.5675788 6.3522039 6.0066864
[29] 6.2618432 5.1704866 2.1360853 2.5602557 1.0408528 1.0316396 4.9441628
[36] 5.0319314 5.7589716 3.2425000 4.8788158 2.0847286 4.5199291 2.5323486
[43] 2.1987149 1.8393000 1.2278639 1.5998965 1.5341485 4.4287108 4.4159166
[50] 4.3224546 3.6959067 4.9826264 5.3134044 8.4084322 9.1249234 7.5506725
[57] 3.8499136 3.9680487 5.6362296 4.9124095 4.3452706 4.0227141 4.5867559
[64] 4.7350394 4.3203807 4.4506799 7.2759499 7.6536424 7.8487654 2.0905576
[71] 4.0056880 5.6209853 1.5551659 1.3615268 2.8469458 2.8323588 1.9848578
[78] 1.1201124 1.4248380 1.7802571 1.4281773 2.5481935 1.8554451 1.0925410
[85] 2.1823722 2.2788755 2.4205378 2.0733741 0.7462248 1.3873578 1.4265948
[92] 0.7212619 0.7425993 1.0696432 2.4520585 3.0555819 3.1000885 1.0945292
[99] 0.3726928 NA
Warning message:
In rollapply.zoo(zoo(data), ...) : na.pad argument is deprecated
You can get rid of the warning by using the fill = NA argument, as in
> rollapply(vec, width = 3, FUN = sd, fill = NA)
The TTR package has runSD among others:
> library(TTR)
> ls("package:TTR", pattern="run*")
[1] "runCor" "runCov" "runMAD" "runMax" "runMean"
[6] "runMedian" "runMin" "runSD" "runSum" "runVar"
runSD will be much faster than rollapply because it avoids making many R function calls. For example:
rzoo <- function(x,n) rollapplyr(x, n, sd, fill=NA)
rttr <- function(x,n) runSD(x, n)
library(rbenchmark)
set.seed(21)
x <- rnorm(1e4)
all.equal(rzoo(x,250), rttr(x,250))
# [1] TRUE
benchmark(rzoo(x,250), rttr(x,250))[,1:6]
# test replications elapsed relative user.self sys.self
# 2 rttr(x, 250) 100 0.58 1.000 0.58 0.00
# 1 rzoo(x, 250) 100 54.53 94.017 53.85 0.06
rollapply in the zoo package takes an arbitrary function. It's different from filter in that it excludes the NAs by default.
That being said, though, there's not much sense in loading a package for a function that's so simple to roll yourself (pun intended).
Here's one that's right aligned:
my.rollapply <- function(vec, width, FUN)
sapply(seq_along(vec),
function(i) if (i < width) NA else FUN(vec[i:(i-width+1)]))
set.seed(1)
vec <- sample(1:50, 50)
my.rollapply(vec, 3, sd)
[1] NA NA 7.094599 12.124356 16.522712 18.502252 18.193405 7.234178 8.144528
[10] 14.468356 12.489996 3.055050 20.808652 19.467922 18.009257 18.248288 15.695010 7.505553
[19] 10.066446 11.846237 17.156146 6.557439 5.291503 23.629078 22.590558 21.197484 22.810816
[28] 24.433583 19.502137 16.165808 11.503623 12.288206 9.539392 13.051181 13.527749 19.974984
[37] 19.756855 17.616280 19.347696 18.248288 15.176737 6.082763 10.000000 10.016653 4.509250
[46] 2.645751 1.527525 5.291503 10.598742 6.557439
# rollapply output for comparison
rollapply(vec, width=3, sd, fill=NA, align='right')
[1] NA NA 7.094599 12.124356 16.522712 18.502252 18.193405 7.234178 8.144528
[10] 14.468356 12.489996 3.055050 20.808652 19.467922 18.009257 18.248288 15.695010 7.505553
[19] 10.066446 11.846237 17.156146 6.557439 5.291503 23.629078 22.590558 21.197484 22.810816
[28] 24.433583 19.502137 16.165808 11.503623 12.288206 9.539392 13.051181 13.527749 19.974984
[37] 19.756855 17.616280 19.347696 18.248288 15.176737 6.082763 10.000000 10.016653 4.509250
[46] 2.645751 1.527525 5.291503 10.598742 6.557439
runner function in runner package applies any R function on running windows. With runner one can specify simple window by setting the length k or lag. Below moving sd as suggested by OOP on 4-elements windows.
library(runner)
set.seed(1)
x <- rnorm(20, sd = 1)
runner(x, sd, k = 4, na_pad = TRUE)
#[1] NA NA NA 1.1021597 0.9967429 1.1556947 0.9884053 0.6902835 0.7180483 0.4647160
#[11] 0.7454670 0.7489618 0.9449882 1.5821988 1.4459037 1.3889432 1.3954101 0.6193867 0.5296744 0.4266423
To apply running functions on date-windows one should specify idx. Length of idx should be the same length as x and should be of type Date or integer. Example below illustrates window of size k = 4 lagged by lag = 1. In parentheses index ranges for each window.
idx <- c(4, 6, 7, 13, 17, 18, 18, 21, 27, 31, 37, 42, 44, 47, 48)
runner::runner(x = 1:15,
k = 5,
lag = 1,
idx = idx,
f = function(x) mean(x))
# [1] NA 1.0 1.5 NA 4.0 4.5 4.5 6.0 NA 9.0 NA 11.0 12.0 12.5 13.5
More info in documentation and vignettes