merge xts objects with suffixes dynamically - r

The task is to dynamically merge multiple xts objects into one big object with suffixes, where the suffixes essentially act as names for the individual xts objects.
Sample Data:
library(xts)
a <- data.frame(alpha=1:10, beta=2:11)
xts1 <- xts(x=a, order.by=Sys.Date() - 1:10)
b <- data.frame(alpha=3:12, beta=4:13)
xts2 <- xts(x=b, order.by=Sys.Date() - 1:10)
c <- data.frame(alpha=5:14, beta=6:15)
xts3 <- xts(x=c, order.by=Sys.Date() - 1:10)
Static way of merging:
merge.zoo(xts1, xts2, xts3, suffixes=c("A", "B", "C"))
# output
alpha.A beta.A alpha.B beta.B alpha.C beta.C
2022-03-11 10 11 12 13 14 15
2022-03-12 9 10 11 12 13 14
2022-03-13 8 9 10 11 12 13
2022-03-14 7 8 9 10 11 12
2022-03-15 6 7 8 9 10 11
2022-03-16 5 6 7 8 9 10
2022-03-17 4 5 6 7 8 9
2022-03-18 3 4 5 6 7 8
2022-03-19 2 3 4 5 6 7
2022-03-20 1 2 3 4 5 6
I might have more than 3 xts objects; 3 is just an arbitrary number for this demonstration.
I've tried do.call, but my attempts failed both with and without suffixes, because I can't wrap the 3 xts objects into a data structure that do.call accepts as a list of separate arguments (in R terms, it should be a list of 3 items).
do.call Demo:
# do.call with xts objects as separate args and suffixes, works
do.call(merge.zoo, list(xts1, xts2, xts3, suffixes=c("A", "B", "C")))
# do.call with xts objects wrapped up as a list and suffixes, failed
# because list(xts.list, ...) nests the list: merge.zoo receives one list argument instead of three xts objects
xts.list <- list(xts1, xts2, xts3)
# check data type
class(xts.list[[1]]) # output: xts,zoo
class(xts.list[1]) # output: list
# do.call failed attempt
do.call(merge.zoo, list(xts.list, suffixes=c("A", "B", "C")))
# Error Message
Error in zoo(structure(x, dim = dim(x)), index(x), ...) :
“x” : attempt to define invalid zoo object
In other words, if I could unpack the list into a dynamic number of arguments, I'd be able to get this idea to work; however, I can't find a way to unpack arguments in R, or any other solution.
Disclaimer: The ultimate problem I am trying to solve is to be able to plot the time series data in the multi-panel view eventually; ggplot does not work with most of the packages I am using on a daily basis.
Disclaimer 2: merge.xts ignores suffixes (a bug); merge.zoo is the working alternative. For more information take a look here

We can pass everything in a single list, i.e.
library(zoo)
c(xts.list, list(suffixes=c("A", "B", "C")))
Now, use merge.zoo in do.call:
do.call(merge.zoo, c(xts.list, list(suffixes=c("A", "B", "C"))))
Output:
alpha.A beta.A alpha.B beta.B alpha.C beta.C
2022-03-11 10 11 12 13 14 15
2022-03-12 9 10 11 12 13 14
2022-03-13 8 9 10 11 12 13
2022-03-14 7 8 9 10 11 12
2022-03-15 6 7 8 9 10 11
2022-03-16 5 6 7 8 9 10
2022-03-17 4 5 6 7 8 9
2022-03-18 3 4 5 6 7 8
2022-03-19 2 3 4 5 6 7
2022-03-20 1 2 3 4 5 6
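For an arbitrary number of objects, the suffixes can also be generated from the length of the list instead of being typed out. A minimal sketch under that assumption: the four-object xts.list below is made-up sample data in the shape of the question, and using LETTERS limits this particular choice to 26 objects.

```r
library(xts)  # attaches zoo as well, which provides merge.zoo

# Made-up example data: four xts objects with identical column names
idx <- Sys.Date() - 1:10
xts.list <- lapply(0:3, function(k) {
  xts(cbind(alpha = 1:10 + 2 * k, beta = 2:11 + 2 * k), order.by = idx)
})

# One suffix per object, derived from the list length instead of typed out
sfx <- LETTERS[seq_along(xts.list)]

# c() splices the xts objects into '...' and appends the named suffixes argument
merged <- do.call(merge.zoo, c(xts.list, list(suffixes = sfx)))
colnames(merged)
```

For more than 26 objects, a vector such as paste0("s", seq_along(xts.list)) could serve as the suffixes instead.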
Note that the first argument to merge is the variadic component ..., which can take one or more xts objects, whereas all the other arguments are named. That is why the list we create has names only for those arguments, i.e. suffixes, while the xts objects stay unnamed. According to ?merge.xts:
merge(...,
all = TRUE,
fill = NA,
suffixes = NULL,
join = "outer",
retside = TRUE,
retclass = "xts",
tzone = NULL,
drop=NULL,
check.names=NULL)
Thus, when we want to append a named element (here suffixes) to an existing list such as xts.list, we wrap the named vector in a list and then concatenate with c(). It is similar to:
> c(list(1), list(a = 1, b = 2))
[[1]]
[1] 1
$a
[1] 1
$b
[1] 2
and not to the following, which creates a nested list:
> list(list(1), list(a = 1, b = 2))
[[1]]
[[1]][[1]]
[1] 1
[[2]]
[[2]]$a
[1] 1
[[2]]$b
[1] 2
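The difference between splicing with c() and nesting with list() can be seen with a tiny variadic function. This is a base-R sketch; count_args is a hypothetical helper used only for illustration.

```r
# Hypothetical variadic function that just reports how many arguments it got
count_args <- function(...) length(list(...))

xs <- list(1, 2, 3)

# c() splices: three unnamed elements plus one named element, so 4 arguments
do.call(count_args, c(xs, list(label = "x")))

# list() nests: xs arrives as a single argument, so only 2 arguments
do.call(count_args, list(xs, label = "x"))

# Only the appended element is named; the other three are matched to '...'
names(c(xs, list(label = "x")))
```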

The suffixes bug in merge.xts is now (finally) fixed in this commit. Thanks for the nudge to get this fixed!
library(xts)
idx <- Sys.Date() - 1:10
x1 <- xts(cbind(alpha = 1:10, beta = 2:11), idx)
x2 <- xts(cbind(alpha = 3:12, beta = 4:13), idx)
x3 <- xts(cbind(alpha = 5:14, beta = 6:15), idx)
suffixes <- LETTERS[1:3]
merge(x1, x2, x3, suffixes = suffixes)
## alpha.A beta.A alpha.B beta.B alpha.C beta.C
## 2022-05-13 10 11 12 13 14 15
## 2022-05-14 9 10 11 12 13 14
## 2022-05-15 8 9 10 11 12 13
## 2022-05-16 7 8 9 10 11 12
## 2022-05-17 6 7 8 9 10 11
## 2022-05-18 5 6 7 8 9 10
## 2022-05-19 4 5 6 7 8 9
## 2022-05-20 3 4 5 6 7 8
## 2022-05-21 2 3 4 5 6 7
## 2022-05-22 1 2 3 4 5 6

Related

New df in R pulling from large existing df

In R, I am trying to take a data frame I have called "gorilla" and create four new data frames based on the values of one column. The "gorilla" data frame has a column called "order", whose values are 1, 2, 3 or 4. I want to create a new data frame with the "1" rows only, another one with the "2" rows only, etc. What is the best way to do this?
If you do:
list2env(setNames(split(gorilla, gorilla$order), paste0("gorilla", 1:4)),
envir = globalenv())
Then you will have the 4 data frames in your workspace, called gorilla1, gorilla2, gorilla3 and gorilla4
For example, if we have this dataset:
set.seed(100)
gorilla <- data.frame(data = rnorm(10), order = sample(4, 10, TRUE))
gorilla
#> data order
#> 1 -0.50219235 3
#> 2 0.13153117 4
#> 3 -0.07891709 2
#> 4 0.88678481 1
#> 5 0.11697127 4
#> 6 0.31863009 3
#> 7 -0.58179068 3
#> 8 0.71453271 4
#> 9 -0.82525943 2
#> 10 -0.35986213 1
We can do:
list2env(setNames(split(gorilla, gorilla$order), paste0("gorilla", 1:4)),
envir = globalenv())
#> <environment: R_GlobalEnv>
And now we can see we have these objects available:
gorilla1
#> data order
#> 4 0.8867848 1
#> 10 -0.3598621 1
gorilla2
#> data order
#> 3 -0.07891709 2
#> 9 -0.82525943 2
gorilla3
#> data order
#> 1 -0.5021924 3
#> 6 0.3186301 3
#> 7 -0.5817907 3
gorilla4
#> data order
#> 2 0.1315312 4
#> 5 0.1169713 4
#> 8 0.7145327 4
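The call above hard-codes paste0("gorilla", 1:4), which assumes exactly four groups. A sketch that takes the names from split() itself, so it works for any number of order values; the data frame here is a deterministic stand-in (data = 1:10, with the order column copied from the output above), not the original random data.

```r
# Deterministic stand-in for the example data (same 'order' column as above)
gorilla <- data.frame(data = 1:10,
                      order = c(3, 4, 2, 1, 4, 3, 3, 4, 2, 1))

parts <- split(gorilla, gorilla$order)
# names(parts) are the distinct 'order' values as strings: "1", "2", "3", "4"
names(parts) <- paste0("gorilla", names(parts))

# Creates gorilla1, gorilla2, ... for however many groups exist
list2env(parts, envir = globalenv())
```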
Note though that it is probably best in most circumstances to keep the data frames in a list:
gorillas <- split(gorilla, gorilla$order)
That way, you can just access gorillas[[1]], gorillas[[2]], etc.
An option with dplyr's group_split:
library(dplyr)
gorillas <- gorilla %>%
group_split(order)

How to implement extract/separate functions (from dplyr and tidyr) to separate a column into multiple columns. based on arbitrary values?

I have a column:
Y = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
I would like to split into multiple columns, based on the positions of the column values. For instance, I would like:
Y1=c(1,2,3,4,5)
Y2=c(6,7,8,9,10)
Y3=c(11,12,13,14,15)
Y4=c(16,17,18,19,20)
Since I am working with a large time-series data set, the chunk length will be arbitrary, depending on the length of one time period.
You can use the base split to split this vector into vectors that are each 5 items long. You could also use a variable to store this interval length.
Using rep with each = 5, and creating a sequence programmatically, gets you a sequence of the numbers 1, 2, ... up to the length divided by 5 (in this case, 4), each 5 times consecutively. Then split returns a list of vectors.
It's worth noting that a variety of SO posts will recommend you store similar data in lists such as this, rather than creating multiple variables, so I'm leaving it in list form here.
Y <- 1:20
breaks <- rep(1:(length(Y) / 5), each = 5)
split(Y, breaks)
#> $`1`
#> [1] 1 2 3 4 5
#>
#> $`2`
#> [1] 6 7 8 9 10
#>
#> $`3`
#> [1] 11 12 13 14 15
#>
#> $`4`
#> [1] 16 17 18 19 20
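The rep(1:(length(Y) / 5), each = 5) grouping assumes length(Y) is a multiple of 5; otherwise the grouping vector is shorter than Y and no longer lines up with it. A sketch of an alternative using ceiling(), where the chunk length n is a variable and the last chunk is simply allowed to be shorter (the 22-element Y is made up for illustration):

```r
Y <- 1:22   # length deliberately not a multiple of the chunk length
n <- 5      # chunk length, e.g. the length of one time period

# ceiling(seq_along(Y) / n) is 1,1,1,1,1,2,...; the last chunk may be shorter
chunks <- split(Y, ceiling(seq_along(Y) / n))
lengths(chunks)
```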
Created on 2019-02-12 by the reprex package (v0.2.1)
Not a dplyr solution, but I believe the easiest way would involve using matrices.
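foo <- function(data, sep.in = 5) {
  # Fill a matrix column by column with sep.in values per column,
  # so each column holds one consecutive chunk of 'data'
  mat <- matrix(data, nrow = sep.in)
  # One data.frame column per chunk: V1, V2, ...
  as.data.frame(mat)
}
I have not tested it, but this function should create a data.frame that can be combined with an existing one (with sep.in rows) using cbind().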
We can use split (posting the code from my comment as an answer) to split the vector into a list of vectors.
lst <- split(Y, as.integer(gl(length(Y), 5, length(Y))))
lst
#$`1`
#[1] 1 2 3 4 5
#$`2`
#[1] 6 7 8 9 10
#$`3`
#[1] 11 12 13 14 15
#$`4`
#[1] 16 17 18 19 20
Here, gl creates a grouping index from its n, k and length parameters, where n is an integer giving the number of levels, k is an integer giving the number of replications, and length is an integer giving the length of the result.
In our case, we want to have 'k' as 5.
as.integer(gl(length(Y), 5, length(Y)))
#[1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4
If we want to have multiple objects in the global environment, use list2env
list2env(setNames(lst, paste0("Y", seq_along(lst))), envir = .GlobalEnv)
Y1
#[1] 1 2 3 4 5
Y2
#[1] 6 7 8 9 10
Y3
#[1] 11 12 13 14 15
Y4
#[1] 16 17 18 19 20
Or as the OP mentioned dplyr/tidyr in the question, we can use those packages as well
library(tidyverse)
tibble(Y) %>%
group_by(grp = (row_number()-1) %/% 5 + 1) %>%
summarise(Y = list(Y)) %>%
pull(Y)
#[[1]]
#[1] 1 2 3 4 5
#[[2]]
#[1] 6 7 8 9 10
#[[3]]
#[1] 11 12 13 14 15
#[[4]]
#[1] 16 17 18 19 20
data
Y <- 1:20

Moving down columns in data frames in R

Suppose I have the following data frame:
df<-data.frame(step1=c(1,2,3,4),step2=c(5,6,7,8),step3=c(9,10,11,12),step4=c(13,14,15,16))
step1 step2 step3 step4
1 1 5 9 13
2 2 6 10 14
3 3 7 11 15
4 4 8 12 16
and what I need is something like the following:
df2<-data.frame(col1=c(1,2,3,4,5,6,7,8,9,10,11,12),col2=c(5,6,7,8,9,10,11,12,13,14,15,16))
col1 col2
1 1 5
2 2 6
3 3 7
4 4 8
5 5 9
6 6 10
7 7 11
8 8 12
9 9 13
10 10 14
11 11 15
12 12 16
How can I do that? Consider that more steps may be included (for example, 20 steps).
Thanks!!
We can design a function to achieve this task; df_final is the final output. Note that bin is an argument that lets the user specify how many columns to stack together.
# A function to conduct data transformation
trans_fun <- function(df, bin = 3){
# Calculate the number of new columns
new_ncol <- (ncol(df) - bin) + 1
# Create a list to store all data frames
df_list <- lapply(1:new_ncol, function(num){
return(df[, num:(num + bin - 1)])
})
# Convert each data frame to a vector
dt_list2 <- lapply(df_list, unlist)
# Convert dt_list2 to data frame
df_final <- as.data.frame(dt_list2)
# Set the column and row names of df_final
colnames(df_final) <- paste0("col", 1:new_ncol)
rownames(df_final) <- 1:nrow(df_final)
return(df_final)
}
# Apply the trans_fun
df_final <- trans_fun(df)
df_final
col1 col2
1 1 5
2 2 6
3 3 7
4 4 8
5 5 9
6 6 10
7 7 11
8 8 12
9 9 13
10 10 14
11 11 15
12 12 16
Here is a method using dplyr and reshape2 - this assumes all of the columns are the same length.
library(dplyr)
library(reshape2)
Drop the last column from the dataframe
df[, 1:(ncol(df) - 1)] %>%
melt() %>%
dplyr::select(col1=value) -> col1
Drop the first column from the dataframe
df %>%
dplyr::select(-step1) %>%
melt() %>%
dplyr::select(col2=value) -> col2
Combine the dataframes
bind_cols(col1, col2)
This should do the work:
df2 <- data.frame(col1 = seq_len(nrow(df) * (ncol(df) - 1)))
df2$col1 <- c(df$step1, df$step2, df$step3)
df2$col2 <- c(df$step2, df$step3, df$step4)
Points to note:
The important thing in the first line of the code is creating a table with the right number of rows (here nrow(df) * (ncol(df) - 1) = 12)
Assigning to a column that does not exist creates a new column with that name
To delete a column in R, assign NULL to it, e.g. df2$col <- NULL
Are you not just looking to do:
df2 <- data.frame(col1 = unlist(df[,-ncol(df)]),
col2 = unlist(df[,-1]))
rownames(df2) <- NULL
df2
col1 col2
1 1 5
2 2 6
3 3 7
4 4 8
5 5 9
6 6 10
7 7 11
8 8 12
9 9 13
10 10 14
11 11 15
12 12 16
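The same idea can be wrapped in a function that handles any number of steps. In this sketch (shift_stack is a hypothetical name), use.names = FALSE stops unlist() from creating the names that otherwise end up as row names, so no rownames(df2) <- NULL step is needed; it is checked on a made-up 20-step data frame:

```r
# Hypothetical helper: stack all columns but the last into col1 and
# all columns but the first into col2
shift_stack <- function(df) {
  data.frame(col1 = unlist(df[-ncol(df)], use.names = FALSE),
             col2 = unlist(df[-1], use.names = FALSE))
}

# 20 steps of 4 rows each: step1 = 1:4, step2 = 5:8, ..., step20 = 77:80
df20 <- as.data.frame(matrix(1:80, nrow = 4))
names(df20) <- paste0("step", 1:20)

res <- shift_stack(df20)
head(res)
```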

(Using a custom function to) Sum above N rows in a datatable (dataframe) by groups

I need a function that sums the current row and the N rows above it (N+1 rows in total) in data frames (data tables), by groups.
An equivalent function for a vector would be something like the one below (please forgive me if it is inefficient):
Function1<-function(x,N){
y<-vector(length=length(x))
for (i in 1:length(x))
if (i<=N)
y[i]<-sum(x[1:i])
else if (i>N)
y[i]<-sum(x[(i-N):i])
return(y)}
Function1(c(1,2,3,4,5,6),3)
#[1] 1 3 6 10 14 18 # Sums previous (above) 4 values (rows)
I wanted to use this function with sapply, like below..
sapply(X=DF<-data.frame(A=c(1:10), B=2), FUN=Function1(N=3))
but couldn't, because I could not figure out how to set a default for x in my function. Thus, I built another function for data frames:
Function2<-function(x, N)
if(is.data.frame(x)) {
y<-data.frame()
for(j in 1:ncol(x))
for(i in 1:nrow(x))
if (i<=N) {
y[i,j]<-sum(x[1:i,j])
} else if (i>N) {
y[i,j]<-sum(x[(i-N):i,j])}
return(y)}
DF<-data.frame(A=c(1:10), B=2)
Function2(DF, 2)
# V1 V2
1 1 2
2 3 4
3 6 6
4 9 6
5 12 6
6 15 6
7 18 6
8 21 6
9 24 6
10 27 6
However, I still need to perform this by groups, for example on the following data frame with a character column:
DF<-data.frame(Name=rep(c("A","B"),each=5), A=c(1:10), B=2)
I would like to apply my function by the group column "Name", which would result in:
A 1 2
A 3 4
A 6 6
A 9 6
A 12 6
B 6 2
B 13 4
B 21 6
B 24 6
B 27 6
#Perform function2 separately for group A and B.
I was hoping to use function with the data.table package (by=Groups), but couldn't figure out how.
What would be the best way to do this?
(Also, it would be really nice if I could learn how to make my Function1 work with sapply.)
With data.table, we group by 'Name', loop through the columns of interest specified in .SDcols (here all the columns are of interest, so we don't specify it) and apply Function1 with N = 2:
library(data.table)
setDT(DF)[, lapply(.SD, Function1, 2), Name]
# Name A B
# 1: A 1 2
# 2: A 3 4
# 3: A 6 6
# 4: A 9 6
# 5: A 12 6
# 6: B 6 2
# 7: B 13 4
# 8: B 21 6
# 9: B 24 6
#10: B 27 6
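The answer above does not cover the last part of the question (using Function1 with sapply). In sapply, extra arguments after FUN are forwarded to FUN, so N can be passed by name. The sketch below also shows a base-R alternative for the grouped case using ave(), in case data.table is not available; Function1 is lightly tidied, and the column names A_sum and B_sum are made up for illustration.

```r
# Function1 from the question, lightly tidied (braces, seq_along, numeric result)
Function1 <- function(x, N) {
  y <- numeric(length(x))
  for (i in seq_along(x)) {
    # Sum the current value and up to N values above it
    y[i] <- if (i <= N) sum(x[1:i]) else sum(x[(i - N):i])
  }
  y
}

# sapply: extra named arguments after FUN are passed on to FUN through '...'
DF <- data.frame(A = 1:10, B = 2)
res <- sapply(DF, Function1, N = 2)

# Base R by group, without data.table: ave() applies FUN within each Name group
DFg <- data.frame(Name = rep(c("A", "B"), each = 5), A = 1:10, B = 2)
DFg$A_sum <- ave(DFg$A, DFg$Name, FUN = function(v) Function1(v, 2))
DFg$B_sum <- ave(DFg$B, DFg$Name, FUN = function(v) Function1(v, 2))
DFg
```

An anonymous function, sapply(DF, function(col) Function1(col, N = 2)), is an equivalent way to fix N.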

Create all possible combinations from two values for each element in a vector in R [duplicate]

This question already has answers here:
How to generate a matrix of combinations
I have been trying to create vectors where each element can take two different values present in two different vectors.
For example, if there are two vectors a and b, where a is c(6,2,9) and b is c(12,5,15), then the output should be the following 8 vectors:
6 2 9
6 2 15
6 5 9
6 5 15
12 2 9
12 2 15
12 5 9
12 5 15
The following piece of code works:
aa1 <- c(6,12)
aa2 <- c(2,5)
aa3 <- c(9,15)
for(a1 in 1:2)
for(a2 in 1:2)
for(a3 in 1:2)
{
v <- c(aa1[a1],aa2[a2],aa3[a3])
print(v)
}
But I was wondering if there was a simpler way to do this than writing nested for loops, since the number of loops grows linearly with the number of elements in the final vector.
expand.grid makes all combinations of whatever vectors you pass it, but in this case you first need to rearrange your vectors into a pair of first elements, a pair of second elements and a pair of third elements, so the ultimate call is:
expand.grid(c(6, 12), c(2, 5), c(9, 15))
A quick way to rearrange the vectors in base R is Map, the multivariate version of lapply, with c() as the function:
a <- c(6, 2, 9)
b <- c(12, 5, 15)
Map(c, a, b)
## [[1]]
## [1] 6 12
##
## [[2]]
## [1] 2 5
##
## [[3]]
## [1] 9 15
Conveniently expand.grid is happy with either individual vectors or a list of vectors, so we can just call:
expand.grid(Map(c, a, b))
## Var1 Var2 Var3
## 1 6 2 9
## 2 12 2 9
## 3 6 5 9
## 4 12 5 9
## 5 6 2 15
## 6 12 2 15
## 7 6 5 15
## 8 12 5 15
If Map is confusing, you can put a and b in a list and use purrr::transpose, which does the same thing: it flips a list of two elements of length three into a list of three elements of length two:
library(purrr)
list(a, b) %>% transpose() %>% expand.grid()
This returns the same result as expand.grid(Map(c, a, b)).
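Note that expand.grid varies its first argument fastest, so the rows come out in a different order from the one listed in the question. A sketch that reproduces the question's order by reversing the list before the call and the columns after it (rev() on a data frame reverses its columns):

```r
a <- c(6, 2, 9)
b <- c(12, 5, 15)

pairs <- Map(c, a, b)  # list(c(6, 12), c(2, 5), c(9, 15))

# expand.grid varies its first argument fastest; reversing the input list and
# then the output columns makes the first position vary slowest instead
combos <- rev(expand.grid(rev(pairs)))

# One combination per row, without row or column names
m <- unname(as.matrix(combos))
m
```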
I think what you're looking for is expand.grid.
a <- c(6,2,9)
b <- c(12,5,15)
expand.grid(a,b)
Var1 Var2
1 6 12
2 2 12
3 9 12
4 6 5
5 2 5
6 9 5
7 6 15
8 2 15
9 9 15
