Grouped multiplication of matrices with unequal dimensions - r

I am new to R and trying to determine how I can do the following:
I have 2 matrices; in each, every row is a date and each column is a number. The second matrix is much longer than the first. I want to create a function that will multiply the first row of the first matrix (say it's a January number) by the first 4 rows of the second matrix (which are all January numbers as well), so I'm looking for 4 results. Then I want to move to the second row of the first matrix (a February number) and multiply it by the 4 February numbers from the second matrix. Eventually, I am hoping to get to code that will multiply the first by the second wherever the month and year match.
First Matrix
Jan 2007 143.75
Feb 2007 140.93
Second Matrix
2007-01-05 12.14
2007-01-12 10.15
2007-01-19 10.40
2007-01-26 11.13
2007-02-02 10.08
2007-02-09 11.10
2007-02-16 10.02
2007-02-23 10.58

Assuming those are both matrices, and that the dates on the left are the row names, you can try something along these lines. Here we match the months of the row names of two matrices and use it to create a vector for the calculation.
idx <- match(format(as.Date(rownames(m2)), "%b"), sub(" .*", "", rownames(m1)))
m2 * m1[idx]
# [,1]
# 2007-01-05 1745.125
# 2007-01-12 1459.062
# 2007-01-19 1495.000
# 2007-01-26 1599.938
# 2007-02-02 1420.574
# 2007-02-09 1564.323
# 2007-02-16 1412.119
# 2007-02-23 1491.039
Data:
m1 <- structure(c(143.75, 140.93), .Dim = c(2L, 1L), .Dimnames = list(
c("Jan 2007", "Feb 2007"), NULL))
m2 <- structure(c(12.14, 10.15, 10.4, 11.13, 10.08, 11.1, 10.02, 10.58
), .Dim = c(8L, 1L), .Dimnames = list(c("2007-01-05", "2007-01-12",
"2007-01-19", "2007-01-26", "2007-02-02", "2007-02-09", "2007-02-16",
"2007-02-23"), NULL))
Note: You haven't given us much information in your post, like whether or not you are doing this for multiple years, whether the dates are the row names or columns, etc. If you are doing this for multiple years, then please post a more representative data example with desired result.
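If the data do span several years, one possible extension of the match() idea above is to build the lookup key from both month and year. This is only a sketch: it assumes the row names of m1 look like "Jan 2007", that the row names of m2 are ISO dates, and that the session locale produces English month abbreviations for "%b". The names key and res are made up for illustration.

```r
# Same example data as above, rebuilt with matrix() for brevity
m1 <- matrix(c(143.75, 140.93), ncol = 1,
             dimnames = list(c("Jan 2007", "Feb 2007"), NULL))
m2 <- matrix(c(12.14, 10.15, 10.4, 11.13, 10.08, 11.1, 10.02, 10.58), ncol = 1,
             dimnames = list(c("2007-01-05", "2007-01-12", "2007-01-19",
                               "2007-01-26", "2007-02-02", "2007-02-09",
                               "2007-02-16", "2007-02-23"), NULL))

# Month-year key for every row of m2 (assumes an English locale for "%b")
key <- format(as.Date(rownames(m2)), "%b %Y")

# Position of each key among the row names of m1; NA where nothing matches
idx <- match(key, rownames(m1))

# Multiply each m2 row by the m1 value for the same month and year
res <- m2 * m1[idx, 1]
res
```

Rows of m2 with no matching month-year in m1 come out as NA rather than raising an error, which may or may not be what you want.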

We can try
row.names(m2) <- format(as.Date(row.names(m2)), '%b %Y')
transform(merge(m1, m2, by = "row.names"), new = V1.x * V1.y)
# Row.names V1.x V1.y new
#1 Feb 2007 140.93 10.08 1420.574
#2 Feb 2007 140.93 11.10 1564.323
#3 Feb 2007 140.93 10.02 1412.119
#4 Feb 2007 140.93 10.58 1491.039
#5 Jan 2007 143.75 12.14 1745.125
#6 Jan 2007 143.75 10.15 1459.062
#7 Jan 2007 143.75 10.40 1495.000
#8 Jan 2007 143.75 11.13 1599.938
data
m1 <- structure(c(143.75, 140.93), .Dim = c(2L, 1L),
.Dimnames = list(
c("Jan 2007", "Feb 2007"), NULL))
m2 <- structure(c(12.14, 10.15, 10.4, 11.13, 10.08, 11.1,
10.02, 10.58), .Dim = c(8L, 1L),
.Dimnames = list(c("2007-01-05", "2007-01-12",
"2007-01-19", "2007-01-26", "2007-02-02", "2007-02-09",
"2007-02-16", "2007-02-23"), NULL))

Related

Join time series in R

I would like to create a time series with a monthly interval by extending an already existing time series.
I have "t1" time series:
structure(c(49.25, 49.25, 30, 99.25, 99.25, 100.5, 101,
91.25), .Dim = c(1L, 8L), .Dimnames = list(NULL, c("2021-03-31",
"2022-03-31", "2022-05-31", "2022-09-30", "2022-12-31", "2023-03-31",
"2023-05-31", "2023-09-30")), .Tsp = c(1, 1, 1), class = c("mts",
"ts", "matrix"))
I would like to extend the above series to include monthly observations. How can I do this?
The object in the question is in a strange form. It consists of 8 separate time series, one per column, with column names given by character dates. First extract the character dates and the values into a zoo object with yearmon time class; yearmon directly represents a year and month without a day. Ensure that it has frequency 12 and convert it to ts class, which has the effect of filling in the missing months with NA. Finally, extend it to the desired end date.
library(zoo)
z <- zoo(c(t1), as.yearmon(colnames(t1)), frequency = 12)
tt <- window(as.ts(z), end = c(2024, 11), extend = TRUE)
tt
giving this ts object:
         Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
2021                 49.25     NA     NA     NA     NA     NA     NA     NA     NA     NA
2022      NA     NA  49.25     NA  30.00     NA     NA     NA  99.25     NA     NA  99.25
2023      NA     NA 100.50     NA 101.00     NA     NA     NA  91.25     NA     NA     NA
2024      NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
Note that you can use View(as.zoo(tt)) to view tt and can use na.approx(tt, na.rm = FALSE, rule = 2) to fill in internal NAs with interpolated values and trailing NAs with the last non-NA value.
Note
The input is shown in the question as:
t1 <- structure(c(49.25, 49.25, 30, 99.25, 99.25, 100.5, 101, 91.25), .Dim = c(1L, 8L), .Dimnames = list(NULL, c("2021-03-31", "2022-03-31", "2022-05-31", "2022-09-30", "2022-12-31", "2023-03-31", "2023-05-31", "2023-09-30")), .Tsp = c(1, 1, 1), class = c("mts", "ts", "matrix"))
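For comparison, the same gap-filling idea can be sketched in base R without zoo. This is not the answer's method, only an illustration: the values and dates are copied from the question, while the names vals, dates, pos, monthly and tt_base are made up here. Each observation is placed at its month offset from the first date and every other month is left as NA.

```r
# Values and dates copied from the question's t1 object
vals  <- c(49.25, 49.25, 30, 99.25, 99.25, 100.5, 101, 91.25)
dates <- as.Date(c("2021-03-31", "2022-03-31", "2022-05-31", "2022-09-30",
                   "2022-12-31", "2023-03-31", "2023-05-31", "2023-09-30"))

yr <- as.integer(format(dates, "%Y"))
mo <- as.integer(format(dates, "%m"))

# 1-based position of each observation, counted in months from the first date
pos <- (yr - yr[1]) * 12 + (mo - mo[1]) + 1

# Monthly vector with NA for the months that have no observation
monthly <- rep(NA_real_, max(pos))
monthly[pos] <- vals

# Regular monthly series starting at the first observed month
tt_base <- ts(monthly, start = c(yr[1], mo[1]), frequency = 12)
tt_base
```

Extending the series past the last observation can then be done with window(tt_base, end = ..., extend = TRUE), as in the answer.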

Concat two different Data frames horizontally [duplicate]

I have two lists named h and g.
They each contain 244 dataframes and they look like the following:
h[[1]]
year avg hr sal
1 2010 0.300 31 2000
2 2011 0.290 30 4000
3 2012 0.275 14 600
4 2013 0.280 24 800
5 2014 0.295 18 1000
6 2015 0.330 26 7000
7 2016 0.315 40 9000
g[[1]]
year pos fld
1 2010 A 0.990
2 2011 B 0.995
3 2013 C 0.970
4 2014 B 0.980
5 2015 D 0.990
I want to cbind these two dataframes.
But as you can see, they have a different number of rows.
I want to combine these dataframes so that the rows with the same year will be combined in one row. And I want the empty spaces to be filled with NA.
The result I expect looks like this:
year avg hr sal pos fld
1 2010 0.300 31 2000 A 0.990
2 2011 0.290 30 4000 B 0.995
3 2012 0.275 14 600 NA NA
4 2013 0.280 24 800 C 0.970
5 2014 0.295 18 1000 B 0.980
6 2015 0.330 26 7000 D 0.990
7 2016 0.315 40 9000 NA NA
Also, I want to repeat this for all the 244 dataframes in each list, h and g.
I'd like to make a new list named final which contains the 244 combined dataframes.
How can I do this...?
All answers will be greatly appreciated :)
I think you should use merge instead:
merge(df1, df2, by = "year", all = TRUE)
For your data:
df1 = data.frame(matrix(0, 7, 4))
names(df1) = c("year", "avg", "hr", "sal")
df1$year = 2010:2016
df1$avg = c(.3, .29, .275, .280, .295, .33, .315)
df1$hr = c(31, 30, 14, 24, 18, 26, 40)
df1$sal = c(2000, 4000, 600, 800, 1000, 7000, 9000)
df2 = data.frame(matrix(0, 5, 3))
names(df2) = c("year", "pos", "fld")
df2$year = c(2010, 2011, 2013, 2014, 2015)
df2$pos = c('A', 'B', 'C', 'B', 'D')
df2$fld = c(.99,.995,.97,.98,.99)
cbind is meant to column-bind two data frames that are compatible in every sense, in particular with the same number of rows in the same order. What you are aiming for is an actual merge, where no rows from either data frame are discarded and missing values are filled with NA.
We can use Map with cbind.fill (from rowr) to cbind the corresponding 'data.frame' from 'h' and 'g'.
library(rowr)
Map(cbind.fill, h, g, MoreArgs = list(fill=NA))
Update
Based on the expected output shown, it seems the OP wanted a merge instead of a cbind:
f1 <- function(...) merge(..., all = TRUE, by = 'year')
Map(f1, h, g)
#[[1]]
# year avg hr sal pos fld
#1 2010 0.300 31 2000 A 0.990
#2 2011 0.290 30 4000 B 0.995
#3 2012 0.275 14 600 <NA> NA
#4 2013 0.280 24 800 C 0.970
#5 2014 0.295 18 1000 B 0.980
#6 2015 0.330 26 7000 D 0.990
#7 2016 0.315 40 9000 <NA> NA
Or, as @Colonel Beauvel mentioned, this can be made more compact:
Map(merge, h, g, by='year', all=TRUE)
data
h <- list(structure(list(year = 2010:2016, avg = c(0.3, 0.29, 0.275,
0.28, 0.295, 0.33, 0.315), hr = c(31L, 30L, 14L, 24L, 18L, 26L,
40L), sal = c(2000L, 4000L, 600L, 800L, 1000L, 7000L, 9000L)), .Names = c("year",
"avg", "hr", "sal"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7")))
g <- list(structure(list(year = c(2010L, 2011L, 2013L, 2014L, 2015L
), pos = c("A", "B", "C", "B", "D"), fld = c(0.99, 0.995, 0.97,
0.98, 0.99)), .Names = c("year", "pos", "fld"), class = "data.frame",
row.names = c("1",
"2", "3", "4", "5")))
Here is how you could do this with tidyverse tools:
library(tidyverse)
h <- list()
g <- list()
h[[1]] <- tribble(
~year, ~avg, ~hr, ~sal,
2010, 0.300, 31, 2000,
2011, 0.290, 30, 4000,
2012, 0.275, 14, 600,
2013, 0.280, 24, 800,
2014, 0.295, 18, 1000,
2015, 0.330, 26, 7000,
2016, 0.315, 40, 9000
)
g[[1]] <- tribble(
~year, ~pos, ~fld,
2010, "A", 0.990,
2011, "B", 0.995,
2013, "C", 0.970,
2014, "B", 0.980,
2015, "D", 0.990
)
map2(h, g, left_join, by = "year")
Which produces:
[[1]]
# A tibble: 7 x 6
year avg hr sal pos fld
<dbl> <dbl> <dbl> <dbl> <chr> <dbl>
1 2010 0.3 31 2000 A 0.99
2 2011 0.290 30 4000 B 0.995
3 2012 0.275 14 600 NA NA
4 2013 0.28 24 800 C 0.97
5 2014 0.295 18 1000 B 0.98
6 2015 0.33 26 7000 D 0.99
7 2016 0.315 40 9000 NA NA
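To tie the answers back to the question's request for a list named final, here is a minimal base-R sketch. It assumes h and g have the same length and that every pair shares a year column; only the first element of each list is reproduced here, and the length check is an extra safeguard that is not part of the original answers.

```r
# First elements of the two lists, as shown in the question
h <- list(data.frame(year = 2010:2016,
                     avg  = c(0.300, 0.290, 0.275, 0.280, 0.295, 0.330, 0.315),
                     hr   = c(31L, 30L, 14L, 24L, 18L, 26L, 40L),
                     sal  = c(2000L, 4000L, 600L, 800L, 1000L, 7000L, 9000L)))
g <- list(data.frame(year = c(2010L, 2011L, 2013L, 2014L, 2015L),
                     pos  = c("A", "B", "C", "B", "D"),
                     fld  = c(0.990, 0.995, 0.970, 0.980, 0.990)))

# Guard against silently recycling a shorter list
stopifnot(length(h) == length(g))

# Full outer join of each pair on year; years missing from one side get NA
final <- Map(function(a, b) merge(a, b, by = "year", all = TRUE), h, g)
final[[1]]
```

Unlike left_join, all = TRUE also keeps years that appear only in g; for the sample data the two give the same rows because every year in g is also in h.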

R: Convert dynamic date format to date class?

I have a data set that contains a simple column consisting of dates, like this:
Dates
1 2012/04/10
2 2012/03/30
3 2012/03/24
4 2012/03/25
5 2012/04/10
6 2012/04/14
7 2012/04/21
My desired output is this:
Dates DateName
1 2012/04/10 April 2012
2 2015/03/30 March 2015
3 2011/03/24 March 2011
4 2016/12/25 December 2016
5 2014/06/10 June 2014
6 2014/05/14 May 2014
7 2018/07/21 July 2018
To do this I used the following code:
dt$Dates <- as.Date(dt$Dates)
dt$DateName <- format(dt$Dates,"%B %Y")
Whilst this works fine, my new column comes out a character class. I wish for this to come out as a date class instead. This is because I cannot sort this column by calendar date. Rather, it sorts alphabetically.
Is there a way to class or re-class my new date format as some sort of date or calendar class?
(I'm not necessarily looking for a base-R solution).
(If possible, I would also highly prefer to keep my new format as is).
I have tried the following lines of code and more, but these only return errors.
dt$DateName <- format.Date(dt$Dates,"%B %Y")
dt$DateName <- format.POSIXlt(dt$Dates,"%B %Y")
dt$DateName <- format.difftime(dt$Dates,"%B %Y")
dt$DateName <- as.Date(dt$Dates, format ="%B %Y")
You can convert dates to yearmon class :
dt$month_year <- zoo::as.yearmon(dt$Dates, "%Y/%m/%d")
dt
# Dates month_year
#1 2012/04/10 Apr 2012
#2 2012/03/30 Mar 2012
#3 2012/03/24 Mar 2012
#4 2012/03/25 Mar 2012
#5 2012/04/10 Apr 2012
#6 2012/04/14 Apr 2012
#7 2012/04/21 Apr 2012
class(dt$month_year)
#[1] "yearmon"
You can then sort them
dt[order(dt$month_year), ]
# Dates month_year
#2 2012/03/30 Mar 2012
#3 2012/03/24 Mar 2012
#4 2012/03/25 Mar 2012
#1 2012/04/10 Apr 2012
#5 2012/04/10 Apr 2012
#6 2012/04/14 Apr 2012
#7 2012/04/21 Apr 2012
data
dt <- structure(list(Dates = structure(c(4L, 3L, 1L, 2L, 4L, 5L, 6L
), .Label = c("2012/03/24", "2012/03/25", "2012/03/30", "2012/04/10",
"2012/04/14", "2012/04/21"), class = "factor")), class = "data.frame",
row.names = c("1", "2", "3", "4", "5", "6", "7"))
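If keeping the full "April 2012" label matters more than having a date-like class, a base-R sketch is to keep the label as character and sort by a separate Date key truncated to the first of the month. This is an alternative to yearmon, not part of the answer above; the names d, month_start and dt_sorted are made up here, only three of the sample dates are used, and "%B" gives English month names only in an English locale.

```r
# A few of the sample dates from the question
dt <- data.frame(Dates = c("2012/04/10", "2012/03/30", "2012/03/24"))

# Parse the strings, then truncate each date to the first day of its month
d <- as.Date(as.character(dt$Dates), format = "%Y/%m/%d")
month_start <- as.Date(format(d, "%Y-%m-01"))

# Human-readable label (character, locale dependent)
dt$DateName <- format(d, "%B %Y")

# Sort by calendar month rather than alphabetically by label
dt_sorted <- dt[order(month_start), ]
dt_sorted
```

The label column stays character, so any later sort has to go through the Date key again; that is the trade-off compared with yearmon, which sorts correctly on its own.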

percent change over several years

I am trying to figure out the total percent change in each of the variables from 2011 to 2015. The following function gives me the percent change by year, but I am aiming for the overall percentage. How would one rewrite this in R?
pcchange=function(x,lag=1) c(diff(x,lag),rep(NA,lag))/x
> ssample
year 100 100F 100I 100X
1 2011 6632 6 472 11
2 2012 6783 6 513 11
3 2013 7346 7 672 6
4 2014 8017 9 682 10
5 2015 8996 3 815 11
> dput(ssample)
structure(list(year = c(2011, 2012, 2013, 2014, 2015), `100` = c(6632L,
6783L, 7346L, 8017L, 8996L), `100F` = c(6L, 6L, 7L, 9L, 3L),
`100I` = c(472L, 513L, 672L, 682L, 815L), `100X` = c(11L,
11L, 6L, 10L, 11L)), class = "data.frame", .Names = c("year",
"100", "100F", "100I", "100X"), row.names = c(NA, -5L))
Keeping it simple, divide the 2015 row by the 2011 row and subtract 1 to get the total change as a percentage:
(subset(ssample, year == 2015, -1) / subset(ssample, year == 2011, -1) - 1) * 100
Using the ROC function (from the TTR package, which quantmod loads) for a simple return calculation:
require(quantmod)
apply(ssample[,-1],2,function(x) ROC(x,type="discrete"))
# 100 100F 100I 100X
#[1,] NA NA NA NA
#[2,] 0.02276840 0.0000000 0.08686441 0.0000000
#[3,] 0.08300162 0.1666667 0.30994152 -0.4545455
#[4,] 0.09134223 0.2857143 0.01488095 0.6666667
#[5,] 0.12211550 -0.6666667 0.19501466 0.1000000
As you can see, the percentage change varies widely over the years. I suppose what you really require is the compound annual growth rate (CAGR), i.e. the average return over the period, defined as the geometric mean of the returns:
annual.growth.rate <- function(a,period_length,m = 1){
FinalValue <- tail(a,1)
InitialValue <- head(a,1)
cagr <- ((FinalValue/InitialValue)^(1/(period_length/m))) -1
return(cagr)
}
num_of_years <- tail(ssample$year,1)-head(ssample$year,1)
apply(ssample[,-1],2,function(x) annual.growth.rate(a=x,period_length = num_of_years ,m = 1))
# 100 100F 100I 100X
#0.07919825 -0.15910358 0.14631481 0.00000000
The packages xts,quantmod and PerformanceAnalytics come in very handy for time series analysis
Here is one possibility for the sample data:
totalChange <- sapply(ssample[ssample$year %in% range(ssample$year), -1],
function(x) pcchange(x))[1,]
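The overall change can also be written out directly as (last / first - 1) * 100 for each column. The sketch below uses the question's ssample data; the names first, last and total_pct are made up for illustration, and it assumes each year appears in exactly one row.

```r
# Sample data from the question; check.names = FALSE keeps names like "100F"
ssample <- data.frame(year   = 2011:2015,
                      `100`  = c(6632L, 6783L, 7346L, 8017L, 8996L),
                      `100F` = c(6L, 6L, 7L, 9L, 3L),
                      `100I` = c(472L, 513L, 672L, 682L, 815L),
                      `100X` = c(11L, 11L, 6L, 10L, 11L),
                      check.names = FALSE)

# Rows for the earliest and latest year, without the year column
first <- unlist(ssample[which.min(ssample$year), -1])
last  <- unlist(ssample[which.max(ssample$year), -1])

# Total percent change from the first to the last year, per variable
total_pct <- (last / first - 1) * 100
round(total_pct, 2)
```

Here 100F comes out as -50 (from 6 down to 3) and 100X as 0 (11 in both years).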

R - Adding numbers within a data frame cell together

I have a data frame in which the values are stored as characters. However, many values contain two numbers that need to be added together. Example:
2014 Q1 Sales 2014 Q2 Sales 2014 Q3 Sales 2014 Q4 Sales
Product 1 3+6 2+10 8 13+2
Product 2 6 4+0 <NA> 5
Product 3 <NA> 5+9 3+1 11
Is there a way to go through the whole data frame and replace all cells containing characters like "3+6" with new values equal to their sum? I assume this would involve coercing the characters to numeric or integers, but I don't know how that would be possible for values with the + sign in them. I would like the example data frame to end up looking like this:
2014 Q1 Sales 2014 Q2 Sales 2014 Q3 Sales 2014 Q4 Sales
Product 1 9 12 8 15
Product 2 6 4 <NA> 5
Product 3 <NA> 14 4 11
Here's an easier example:
dat <- data.frame(a=c("3+6", "10"), b=c("12", NA), c=c("3+4", "5+6"))
dat
## a b c
## 1 3+6 12 3+4
## 2 10 <NA> 5+6
apply(dat, 1:2, function(x) eval(parse(text=x)))
## a b c
## [1,] 9 12 7
## [2,] 10 NA 11
Using R itself to do the computation with eval and parse does the trick.
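One caveat with eval(parse()) is that it runs whatever code happens to be in a cell. If that is a concern, a base-R sketch that only splits on "+" and sums the pieces could look like the following. The helper name sum_cell is made up here, and it assumes cells contain only numbers separated by + signs, or NA.

```r
# Sum the "+"-separated numbers in a single cell; NA stays NA
sum_cell <- function(x) {
  if (is.na(x)) return(NA_real_)
  sum(as.numeric(strsplit(x, "+", fixed = TRUE)[[1]]))
}

dat <- data.frame(a = c("3+6", "10"), b = c("12", NA), c = c("3+4", "5+6"),
                  stringsAsFactors = FALSE)

# Apply the helper to every cell, column by column, keeping the data frame shape
dat[] <- lapply(dat, function(col) {
  vapply(as.character(col), sum_cell, numeric(1), USE.NAMES = FALSE)
})
dat
```

This also handles cells with more than two terms, such as "1+2+3", because every piece produced by strsplit is summed.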
Here is one option with gsubfn that avoids eval(parse()). We convert the 'data.frame' to a 'matrix' (as.matrix(dat)), match a run of digits ([0-9]+) captured as a group with parentheses, followed by a literal +, followed by a second captured run of digits, and replace the match with the sum of the two groups after converting each to numeric. Assigning the result back to dat[] keeps the same structure as 'dat'.
library(gsubfn)
dat[] <- as.numeric(gsubfn('([0-9]+)\\+([0-9]+)',
~as.numeric(x)+as.numeric(y), as.matrix(dat)))
dat
# 2014 Q1 Sales 2014 Q2 Sales 2014 Q3 Sales 2014 Q4 Sales
#Product 1 9 12 8 15
#Product 2 6 4 NA 5
#Product 3 NA 14 4 11
Or we can loop the columns with lapply and perform the replacement with gsubfn for each of the columns.
dat[] <- lapply(dat, function(x) as.numeric(gsubfn('([0-9]+)\\+([0-9]+)',
~as.numeric(x)+as.numeric(y), as.character(x))))
data
dat <- structure(list(`2014 Q1 Sales` = structure(c(1L, 2L, NA), .Label = c("3+6",
"6"), class = "factor"), `2014 Q2 Sales` = structure(1:3, .Label = c("2+10",
"4+0", "5+9"), class = "factor"), `2014 Q3 Sales` = structure(c(2L,
NA, 1L), .Label = c("3+1", "8"), class = "factor"), `2014 Q4 Sales` = structure(c(2L,
3L, 1L), .Label = c("11", "13+2", "5"), class = "factor")), .Names = c("2014 Q1 Sales",
"2014 Q2 Sales", "2014 Q3 Sales", "2014 Q4 Sales"), class = "data.frame", row.names = c("Product 1",
"Product 2", "Product 3"))
