R time series with missing dates [duplicate] - r

I have a vector of numbers and a corresponding vector of dates (monthly). Some months are missing so I would like to create a time series object that includes NA for the missing dates.
x = c(1,2,3,4)
dates = c('2000-01-01','2000-02-01','2000-04-01','2000-07-01')
Is there an easy way to get a time series object that goes from '2000-01-01' to '2000-07-01' and includes NAs for the missing dates?

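You can use the padr package to do that:
df <- data.frame(x = c(1,2,3,4),
                 dates = c('2000-01-01','2000-02-01','2000-04-01','2000-07-01'))
library(dplyr) # provides %>% and mutate()
library(padr)
df %>%
  mutate(dates = as.Date(dates)) %>% # pad() needs a Date column, not character
  pad()                              # inserts a row for every missing month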
pad applied on the interval: month
x dates
1 1 2000-01-01
2 2 2000-02-01
3 NA 2000-03-01
4 3 2000-04-01
5 NA 2000-05-01
6 NA 2000-06-01
7 4 2000-07-01
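The padr result is a data frame rather than a time series object. A minimal base R sketch of one way to get a ts object with NAs directly from the question's two vectors, assuming the dates are always the first of a month (the names all_months, padded and series are illustrative only):

```r
x <- c(1, 2, 3, 4)
dates <- as.Date(c("2000-01-01", "2000-02-01", "2000-04-01", "2000-07-01"))

# Full monthly sequence from the first to the last observed date
all_months <- seq(min(dates), max(dates), by = "month")

# match() returns NA for months with no observation, so x[...] yields NA there
padded <- x[match(all_months, dates)]

# Monthly ts object starting at the year and month of the first date
series <- ts(padded,
             start = c(as.integer(format(min(dates), "%Y")),
                       as.integer(format(min(dates), "%m"))),
             frequency = 12)
print(series)
```

The same padded vector could also be taken from the x column of the padr output above.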

Related

Is there a way to left_join two sequences with various dates so that both appear? [duplicate]

I am interested in combining two data frames with differing dates so that every date from both appears in the result, with NA in the first sequence's column where it has no entry for a date, and likewise for the second sequence.
library(lubridate)
library(dplyr)
dates <- seq(as.Date("2019-01-01"),floor_date(Sys.Date(), "month"),"months")-1
seq <- rep(c(1),length(dates))
df1 <- cbind.data.frame(dates,seq)
dates <- seq(as.Date("2019-01-01"),floor_date(Sys.Date(), "month"),"months")-2
seq <- rep(c(2),length(dates))
df2 <- cbind.data.frame(dates,seq)
DF <- left_join(df1,df2,by="dates")
What I get is that the third column of DF is all NAs. The desired output would be:
2018-12-31 1 NA
2018-12-30 NA 2
and so on ...
Thank you!!
You are probably looking for full_join, which keeps the rows from both data frames:
library(dplyr)
full_join(df1,df2, by="dates") %>% arrange(dates)
# dates seq.x seq.y
#1 2018-12-30 NA 2
#2 2018-12-31 1 NA
#3 2019-01-30 NA 2
#4 2019-01-31 1 NA
#5 2019-02-27 NA 2
#6 2019-02-28 1 NA
#...
#...
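For comparison, base R's merge() with all = TRUE gives the same kind of full outer join. This is a sketch only: it uses a fixed end date ("2019-03-01") instead of Sys.Date() so that the result is reproducible, which is an assumption and not part of the question.

```r
# Month starts shifted back by one and by two days, as in the question
dates1 <- seq(as.Date("2019-01-01"), as.Date("2019-03-01"), by = "month") - 1
dates2 <- seq(as.Date("2019-01-01"), as.Date("2019-03-01"), by = "month") - 2

df1 <- data.frame(dates = dates1, seq = 1)
df2 <- data.frame(dates = dates2, seq = 2)

# all = TRUE keeps unmatched rows from both sides and fills the gaps with NA;
# the clashing 'seq' columns are renamed seq.x and seq.y
merged <- merge(df1, df2, by = "dates", all = TRUE)
print(merged)
```

merge() sorts the result by the key column by default, so no separate ordering step is needed here.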

How can I reshape my data frame to be used with cor() in R? [duplicate]

I am fairly new to R and data wrangling.
I would like to calculate a correlation matrix using the cor()-function and for that I have to reshape my data frame. It consists of 1500 different articles (StockCodes) of an online store, with a sale history of 365 days per article (Quantity per Date per StockCode).
I need to reshape it in the form that the Dates become the new row names and the StockCodes become the new column names.
Consider this short example:
my_df <- data.frame(Stockcode = c("A","A", "B","B", "C", "C"), Quantity = c(1,5,3,2,1,4), Date = c("2010-12-01","2010-12-02","2010-12-01","2010-12-02","2010-12-01","2010-12-02") )
looking like this:
Stockcode Quantity Date
1 A 1 2010-12-01
2 A 5 2010-12-02
3 B 3 2010-12-01
4 B 2 2010-12-02
5 C 1 2010-12-01
6 C 4 2010-12-02
And I want it to be transformed into:
df_reshaped <- data.frame(A = c(1,5), B = c(3,2), C = c(1,4), row.names = c("2010-12-01","2010-12-02"))
looking like this:
A B C
2010-12-01 1 3 1
2010-12-02 5 2 4
I achieved this with a for-loop (and successfully calculated my correlation matrix), but the loop took "ages" to be executed (approx. 4 hours).
Is there a proper and faster way?
I would highly appreciate any help!
Here is a way using pivot_wider from tidyr (column_to_rownames comes from tibble):
library(dplyr)
library(tidyr)
library(tibble)
my_df %>%
  pivot_wider(names_from = Stockcode, values_from = Quantity) %>% ## pivot columns in wide format
  column_to_rownames(var = "Date") ## convert Date column to row names
# A B C
#2010-12-01 1 3 1
#2010-12-02 5 2 4
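A base R sketch of the same reshape, followed by the cor() call the question is ultimately after. It assumes there is at most one row per Date and Stockcode pair (otherwise sum would add the quantities together); the names wide and cor_mat are illustrative.

```r
my_df <- data.frame(
  Stockcode = c("A", "A", "B", "B", "C", "C"),
  Quantity  = c(1, 5, 3, 2, 1, 4),
  Date      = c("2010-12-01", "2010-12-02", "2010-12-01",
                "2010-12-02", "2010-12-01", "2010-12-02")
)

# tapply() cross-tabulates Quantity into a Date x Stockcode matrix in one
# vectorised step, with no explicit loop over articles
wide <- tapply(my_df$Quantity, list(my_df$Date, my_df$Stockcode), sum)

# cor() accepts the matrix directly and correlates the columns (StockCodes)
cor_mat <- cor(wide)
print(wide)
print(cor_mat)
```

With only two dates in this toy example every correlation is exactly 1 or -1; the real data with 365 dates would give more informative values.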

R data table rows subtraction [duplicate]

I have a data table with 117 objects (rows) and 51 variables (columns). I would like to subtract each row from the previous one and post the results in a new data table.
My data table is a time series of interest rates and I want to calculate the daily difference.
apply(dt, MARGIN = 2, diff)
would calculate, for each column, the difference between each element and the previous one.
Try:
a = data.frame(matrix(c(1,1,1,3,3,3,7,7,7), byrow = TRUE, nrow = 3))
apply(a,2,diff)
Let's say you have this as example data:
df <- data.frame(date = as.Date(c("2019-01-03", "2019-01-04", "2019-01-05", "2019-01-06")), value = c(3,5,7,6))
date value
1 2019-01-03 3
2 2019-01-04 5
3 2019-01-05 7
4 2019-01-06 6
Then using dplyr from tidyverse you can do this:
library(tidyverse)
df2 <- df %>%
mutate(difference = lag(value, n=1L) - value)
date value difference
1 2019-01-03 3 NA
2 2019-01-04 5 -2
3 2019-01-05 7 -2
4 2019-01-06 6 1
... you'll just need to decide what to do with that first NA in row index 1.
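Note the sign convention: lag(value) - value is the previous value minus the current one. If the daily change is wanted as current minus previous, base R's diff() gives that directly. A small sketch on the same values (the name change is illustrative):

```r
value <- c(3, 5, 7, 6)

# diff() returns value[i] - value[i - 1], so it is one element shorter;
# prepend NA to keep it aligned with the original rows
change <- c(NA, diff(value))
print(change)
# NA  2  2 -1
```

These are the same numbers as the difference column above with the sign flipped.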

Turn list with dates into data frame in R [duplicate]

I have trouble converting a list containing dates into a data.frame, as the dates are converted into numbers when using the unlist command.
The list I work on looks similar to this, just with way more data frames:
library(lubridate) # for days()
lst <- list(
  data.frame(
    date = as.POSIXct(Sys.time() + days(seq(0, 4))),
    value = c(4,5,1,7,9)),
  data.frame(
    date = as.POSIXct(Sys.time() + days(seq(5, 9))),
    value = c(3,3,5,1,7))
)
I am looking for a method to convert it into a single data.frame that looks like this:
date value
1 2017-07-24 14:30:18 4
2 2017-07-25 14:30:18 5
3 2017-07-26 14:30:18 1
4 2017-07-27 14:30:18 7
5 2017-07-28 14:30:18 9
6 2017-07-29 14:30:18 3
7 2017-07-30 14:30:18 3
8 2017-07-31 14:30:18 5
9 2017-08-01 14:30:18 1
10 2017-08-02 14:30:18 7
We can use bind_rows
library(dplyr)
bind_rows(lst)
Or with base R
do.call(rbind, lst)
Or using data.table
library(data.table)
rbindlist(lst)
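A small base R sketch of why row-binding works where unlist does not: unlist drops the POSIXct class and leaves plain numbers, while rbind keeps the column's class. It uses a fixed start time (an assumption, chosen to match the sample output) instead of Sys.time(), and plain second arithmetic instead of lubridate's days().

```r
start <- as.POSIXct("2017-07-24 14:30:18", tz = "UTC")
day <- 86400 # seconds in a day

lst <- list(
  data.frame(date = start + day * 0:4, value = c(4, 5, 1, 7, 9)),
  data.frame(date = start + day * 5:9, value = c(3, 3, 5, 1, 7))
)

# Row-binding the data frames preserves the POSIXct class of 'date'
combined <- do.call(rbind, lst)
print(class(combined$date))

# unlist() strips attributes, so the dates collapse to seconds since 1970
flattened <- unlist(lapply(lst, `[[`, "date"))
print(class(flattened))
```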

How to set dates outside a specific time period to NA in R

I have a large dataframe (x) in which one of my columns (Visit_Date) contains a number of dates.
Unfortunately, some data in this column refers to DoB and not Visit_Date. For example, the dates in this column should only range from 01/01/2015 to 01/03/2017, but I have dates such as 16/09/1964.
My question is, how can I set all dates prior to 01/01/2015 to NA?
Here is my simple solution.
library(lubridate)
x <- data.frame(
value = 1:5,
Visit_Date = c("01/01/2015","21/02/2015",
"01/03/2015","16/09/1964",
"01/09/2015")
)
x$Visit_Date <- dmy(x$Visit_Date)
index <- x$Visit_Date < dmy("01/01/2015")
x[index,"Visit_Date"] <- NA
x
# value Visit_Date
# 1 1 2015-01-01
# 2 2 2015-02-21
# 3 3 2015-03-01
# 4 4 <NA>
# 5 5 2015-09-01
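The question also mentions an upper limit of 01/03/2017. A base R sketch that blanks out dates on either side of the valid window, using made-up sample dates (the names visit, lower, upper and out_of_range are illustrative):

```r
visit <- as.Date(c("01/01/2015", "21/02/2015", "16/09/1964", "01/09/2017"),
                 format = "%d/%m/%Y")

lower <- as.Date("2015-01-01")
upper <- as.Date("2017-03-01")

# TRUE for dates before the window or after it
out_of_range <- visit < lower | visit > upper

# which() skips any NA in the condition, so already-missing dates are left alone
visit[which(out_of_range)] <- NA
print(visit)
```

The same condition can be used with the data frame from the answer above, for example on x$Visit_Date after it has been parsed with dmy().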
