subset data getSymbols quantmod - r

I want to subset the data, e.g. all of the previous year, and store the result as a new object.
mtdl <- na.omit(getSymbols("MTDL.JK", auto.assign = F, src = "yahoo", periodicity = "weekly"))
week.year.mtdl <- mtdl %>%
filter(DATE >= as.Date("2018-01-01") & DATE <= as.Date("2018-12-31"))

dplyr's filter() works on data frames, not on xts objects, so here are a few ways to go about this if you want to use dplyr.
1. Transform the xts object into a data.frame:
df_mtdl <- data.frame(date = index(mtdl), coredata(mtdl))
week.year.mtdl <- df_mtdl %>%
filter(date >= as.Date("2018-01-01") & date <= as.Date("2018-12-31"))
head(week.year.mtdl)
date MTDL.JK.Open MTDL.JK.High MTDL.JK.Low MTDL.JK.Close MTDL.JK.Volume MTDL.JK.Adjusted
1 2018-01-01 650 650 620 630 78200 609.6684
2 2018-01-08 630 650 610 610 291800 590.3138
3 2018-01-15 610 750 600 700 9390700 677.4093
4 2018-01-22 700 730 640 700 6816200 677.4093
5 2018-01-29 700 745 685 685 119900 662.8934
6 2018-02-05 695 715 630 635 1533000 614.5070
2. Use tidyquant. This returns a tibble instead of an xts object. Tidyquant is built on top of quantmod and a lot of other packages.
library(tidyquant)
tq_mtdl <- tq_get("MTDL.JK", complete_cases = TRUE, periodicity = "weekly")
week.year.mtdl <- tq_mtdl %>%
filter(date >= as.Date("2018-01-01") & date <= as.Date("2018-12-31"))
head(week.year.mtdl)
# A tibble: 6 x 7
date open high low close volume adjusted
<date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2018-01-04 645 645 620 625 137000 605.
2 2018-01-11 620 660 600 645 1460000 624.
3 2018-01-18 645 750 635 660 13683700 639.
4 2018-01-25 680 745 665 685 1359700 663.
5 2018-02-01 700 715 675 700 922200 677.
6 2018-02-08 695 695 630 690 673700 668.
Or use packages timetk (used as part of tidyquant) or tsbox to transform the data from xts to data.frame or tibble.

This returns the 2018 rows of an xts object:
mtdl["2018"]
All of these also work:
subset(mtdl, time(mtdl) >= "2018-01-01" & time(mtdl) <= "2018-12-31")
subset(mtdl, start = "2018-01-01", end = "2018-12-31")
window(mtdl, start = "2018-01-01", end = "2018-12-31")
dates <- seq(as.Date("2018-01-01"), as.Date("2018-12-31"), "day")
window(mtdl, dates)
mtdl[dates] # dates is from above
mtdl[ format(time(mtdl), "%Y") == 2018 ]
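The string-based subsetting shown above also accepts ranges. Below is a minimal sketch on a synthetic weekly series (the object `prices` and its values are made up so that the example runs without downloading anything); it assumes the xts package is installed.

```r
library(xts)

# Synthetic weekly series standing in for mtdl (values are made up)
dates  <- seq(as.Date("2017-11-06"), by = "week", length.out = 70)
prices <- xts(seq_along(dates), order.by = dates)

year_2018  <- prices["2018"]             # every row dated in 2018
first_half <- prices["2018-01/2018-06"]  # January through June 2018
from_oct   <- prices["2018-10/"]         # October 2018 to the end of the series

range(index(year_2018))
```

The `"from/to"` form follows ISO 8601 style strings; leaving one side empty means "from the start" or "to the end" of the series.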

Related

How can I sort by date a column that only has month and day with R?

My data frame looks a bit like this:
x freq
1-Apr 892
1-Aug 1221
1-Dec 923
1-Feb 880
1-Jan 889
...
And I can't seem to sort them in date order.
You could do:
df[order(as.Date(df$x, "%d-%b")), ]
x freq
5 1-Jan 889
4 1-Feb 880
1 1-Apr 92
2 1-Aug 1221
3 1-Dec 923
Data
df <- read.table(text = "
x freq
1-Apr 92
1-Aug 1221
1-Dec 923
1-Feb 880
1-Jan 889",
header = TRUE)
Attempts
Building on Alexlok's answer,
df %>%
mutate(x = as.Date(x, format = "%d-%b"))
x freq
2020-04-01 92
2020-08-01 1221
2020-12-01 923
2020-02-01 880
2020-01-01 889
However, you can see this adds in the year (e.g., 1-Jan is 2020-01-01).
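This post is helpful: reformatting to month-day drops the year, but the column changes from Date to character. Because the strings are zero-padded and start with the month, sorting them still gives date order.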
df %>%
mutate(x = format(as.Date(x, format = "%d-%b"), format = "%m-%d")) %>%
arrange(x)
x freq
01-01 889
02-01 880
04-01 92
08-01 1221
12-01 923
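Both approaches above go through as.Date, which fills in the current year and reads "%b" using the month names of the current locale. As an alternative sketch in base R, the month can be looked up in the built-in `month.abb` constant (English abbreviations) and the rows ordered by month and day directly. The names `parts`, `day`, `month` and `sorted` are made up for this example.

```r
df <- data.frame(
  x    = c("1-Apr", "1-Aug", "1-Dec", "1-Feb", "1-Jan"),
  freq = c(92, 1221, 923, 880, 889),
  stringsAsFactors = FALSE
)

# Split "day-Mon" into its two pieces
parts <- strsplit(df$x, "-", fixed = TRUE)
day   <- as.integer(vapply(parts, `[`, character(1), 1))
# Position of the abbreviation in month.abb is the month number
month <- match(vapply(parts, `[`, character(1), 2), month.abb)

# Order by month first, then by day within the month
sorted <- df[order(month, day), ]
sorted
```

This keeps the original labels in `x` untouched and never creates a year, at the cost of assuming English month abbreviations.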

Convert HMM /HHMM time column to timestamp in R

I am new here please be gentle ;)
I have two time columns in a dataframe in R that uses the HMM /HHMM format as a numeric. For example, 03:13 would be 313 and 14:14 would be 1414. An example would be sched_arr_time and sched_dep_time in the nycflights13 package.
I need to calculate the time difference in minutes. My SQL knowledge tells me I would substring this with a case when and then glue it back together as a time format somehow but I was hoping there is a more elegant way in R to deal with this?
Many thanks for your help!
This would explain the data:
library(nycflights13)
flights %>% select(sched_dep_time, sched_arr_time)
We can convert to time class with as.ITime after changing the format to HH:MM with str_pad and str_replace, and then take the difference using difftime
library(dplyr)
library(stringr)
library(data.table)
flights %>%
head %>%
select(sched_dep_time, sched_arr_time) %>%
mutate_all(~ str_pad(., width = 4, pad = "0") %>%
str_replace(., '^(..)', '\\1:') %>%
as.ITime) %>%
mutate(diff = difftime(sched_arr_time, sched_dep_time, unit = 'min'))
# A tibble: 6 x 3
# sched_dep_time sched_arr_time diff
# <ITime> <ITime> <drtn>
#1 05:15:00 08:19:00 184 mins
#2 05:29:00 08:30:00 181 mins
#3 05:40:00 08:50:00 190 mins
#4 05:45:00 10:22:00 277 mins
#5 06:00:00 08:37:00 157 mins
#6 05:58:00 07:28:00 90 mins
If we want to add a 'Date' as well, then we can paste the current date in front of the time and parse the result with ymd_hms from lubridate
library(lubridate)
flights %>%
head %>%
select(sched_dep_time, sched_arr_time) %>%
mutate_all(~ str_pad(., width = 4, pad = "0") %>%
str_replace("^(..)(..)", "\\1:\\2:00") %>%
str_c(Sys.Date(), ., sep=' ') %>%
ymd_hms) %>%
mutate(diff = difftime(sched_arr_time, sched_dep_time, unit = 'min'))
Here is another option using strptime
as_time <- function(x)
as.POSIXct(strptime(if_else(nchar(x) == 3, paste0("0", x), as.character(x)), "%H%M"))
flights %>%
select(sched_dep_time, sched_arr_time) %>%
mutate(diff_in_mins = difftime(as_time(sched_arr_time), as_time(sched_dep_time), units = "mins"))
## A tibble: 336,776 x 3
# sched_dep_time sched_arr_time diff_in_mins
# <int> <int> <drtn>
# 1 515 819 184 mins
# 2 529 830 181 mins
# 3 540 850 190 mins
# 4 545 1022 277 mins
# 5 600 837 157 mins
# 6 558 728 90 mins
# 7 600 854 174 mins
# 8 600 723 83 mins
# 9 600 846 166 mins
#10 600 745 105 mins
## … with 336,766 more rows
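Since the HMM/HHMM values are plain numbers, the difference can also be computed with integer arithmetic and no date-time classes at all. The sketch below is one possible approach, not part of the answers above; the helper name `hhmm_to_minutes` and the sample vectors are made up. The final `%%` step assumes that an arrival time earlier than the departure time means the flight lands the next day.

```r
# Hours are the part above the last two digits, minutes are the last two digits
hhmm_to_minutes <- function(x) (x %/% 100) * 60 + x %% 100

sched_dep_time <- c(515, 529, 545, 2359)
sched_arr_time <- c(819, 830, 1022, 345)

raw_diff <- hhmm_to_minutes(sched_arr_time) - hhmm_to_minutes(sched_dep_time)

# Wrap negative differences around midnight (assumption: arrival is next day)
diff_mins <- raw_diff %% (24 * 60)
diff_mins
```

The first three values match the 184, 181 and 277 minutes shown in the outputs above.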

How to find the % difference between values on 2 rows in a data frame in R

I want to find the difference between the current value and the previous value and display the table as a % difference between them.
The code to find the difference between the 2 consecutive rows was:
abcfin <- abcfin %>% mutate_if(is.numeric, list( ~ . - lag(.)))
The code that I have used to get the result is:
asdfg <- abcfin %>% mutate_if(is.numeric, list(ifelse(lag(.)!=0,(. - lag(.))*100/ lag(.)), 0))
However, I am getting the following error:
Error in -.Date(left, right) : can only subtract from "Date" objects
In addition: Warning message:
In matrix(if (is.null(value)) logical() else value, nrow = nr, dimnames = list(rn, :
data length [5974] is not a sub-multiple or multiple of the number of rows [543]
Kindly let me know the right code statement that I can use to obtain the required results.
Here's another way that might be easier to follow if you're new to R:
library(tidyverse)
date <- seq(as.Date("2015/1/1"), by = "month", length.out = 6)
var1 <- c(723, 983, 437, 732, 173, 537)
var2 <- c(753, 769, 352, 853, 143, 485)
df <- data.frame(date, var1, var2)
df <- df %>% mutate(var1_prev = lag(var1), var2_prev = lag(var2))
df <- df[-1,] #removes unnecessary first row
df <- df %>% mutate(var1_perdiff = (var1 - var1_prev)/var1_prev * 100,
var2_perdiff = (var2 - var2_prev)/var2_prev * 100)
as_tibble(df)
# A tibble: 5 x 7
#date var1 var2 var1_prev var2_prev var1_perdiff var2_perdiff
#<date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 2015-02-01 983 769 723 753 36.0 2.12
#2 2015-03-01 437 352 983 769 -55.5 -54.2
#3 2015-04-01 732 853 437 352 67.5 142.
#4 2015-05-01 173 143 732 853 -76.4 -83.2
#5 2015-06-01 537 485 173 143 210. 239.
data.table solution:
# set to data.table
library(data.table)
df <- setDT(df)
# your percentage function
perct.fun <- function(x){-100 + (x/shift(x,1,type = "lag"))*100}
# add the new variables
v <- c("var1","var2")
df[order(date), (paste0(v, "_diff")) := lapply(.SD, perct.fun), .SDcols=v]
date var1 var2 var1_diff var2_diff
1: 2015-01-01 723 753 NA NA
2: 2015-02-01 983 769 35.96127 2.124834
3: 2015-03-01 437 352 -55.54425 -54.226268
4: 2015-04-01 732 853 67.50572 142.329545
5: 2015-05-01 173 143 -76.36612 -83.235639
6: 2015-06-01 537 485 210.40462 239.160839
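The original attempt most likely failed because the ifelse() call inside list() has no `~`, so `.` is not the current column and the subtraction ends up touching the Date column. A minimal sketch of the same idea with across(), keeping the guard against a zero previous value, could look like this. It assumes dplyr 1.0.0 or later; the helper name `pct_change` is made up.

```r
library(dplyr)

df <- data.frame(
  date = seq(as.Date("2015-01-01"), by = "month", length.out = 6),
  var1 = c(723, 983, 437, 732, 173, 537),
  var2 = c(753, 769, 352, 853, 143, 485)
)

# Percent change against the previous row; 0 when the previous value is 0
pct_change <- function(x) {
  prev <- dplyr::lag(x)
  if_else(prev != 0, (x - prev) * 100 / prev, 0)
}

# Only numeric columns are transformed, so the date column is left alone
result <- df %>% mutate(across(where(is.numeric), pct_change))
result
```

The first row is NA because it has no previous value to compare against.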

R- create dataset by removing duplicates based on a condition - filter

I have a data frame where for each day, I have several prices.
I would like to modify my data frame with the following code :
newdf <- Data %>%
filter(
if (Data$Date == Data$Echeance) {
Data$Close == lag(Data$Close,1)
} else {
Data$Close == Data$Close
}
)
However, it is not giving me what I want, that is :
create a new data frame where the variable Close takes its normal value, unless the day of Date is equal to the day of Echeance. In this case, take the following Close value.
I added filter because I wanted to remove the duplicate dates, and keep only one date per day where Close satisfies the condition above.
There is no error message, it just doesn't give me the right database.
Here is a glimpse of my data:
Date Echeance Compens. Open Haut Bas Close
1 1998-03-27 00:00:00 1998-09-10 00:00:00 125. 828 828 820 820. 197
2 1998-03-27 00:00:00 1998-11-10 00:00:00 128. 847 847 842 842. 124
3 1998-03-27 00:00:00 1999-01-11 00:00:00 131. 858 858 858 858. 2
4 1998-03-30 00:00:00 1998-09-10 00:00:00 125. 821 821 820 820. 38
5 1998-03-30 00:00:00 1998-11-10 00:00:00 129. 843 843 843 843. 1
6 1998-03-30 00:00:00 1999-01-11 00:00:00 131. 860 860 860 860. 5
Thanks a lot in advance.
Sounds like a use case for ifelse, with dplyr:
library(dplyr)
Data %>%
mutate(Close = ifelse(Date==Echeance, lead(Close,1), Close))
Here is an example:
dat %>%
mutate(var_new = ifelse(date1==date2, lead(var,1), var))
# A tibble: 3 x 4
# date1 date2 var var_new
# <date> <date> <int> <int>
# 1 2018-03-27 2018-03-27 10 11
# 2 2018-03-28 2018-01-01 11 11
# 3 2018-03-29 2018-02-01 12 12
The function lead shifts the vector one position ahead, so each row receives the value from the following row (the last row gets NA). Also note that I created var_new just to show the difference, but you can overwrite var directly in mutate.
Data used:
dat <- tibble(date1 = seq(from=as.Date("2018-03-27"), to=as.Date("2018-03-29"), by="day"),
date2 = c(as.Date("2018-03-27"), as.Date("2018-01-01"), as.Date("2018-02-01")),
var = 10:12)
dat
# A tibble: 3 x 3
# date1 date2 var
# <date> <date> <int>
# 1 2018-03-27 2018-03-27 10
# 2 2018-03-28 2018-01-01 11
# 3 2018-03-29 2018-02-01 12
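The question also asks for a single row per Date. One way to sketch that, assuming the first row of each day is the one to keep, is to follow the mutate with distinct(). The data below is made up and only borrows the column names from the question.

```r
library(dplyr)

# Made-up data with two rows per day
dat <- tibble(
  Date     = as.Date(c("1998-03-27", "1998-03-27", "1998-03-30", "1998-03-30")),
  Echeance = as.Date(c("1998-03-27", "1998-11-10", "1998-09-10", "1998-11-10")),
  Close    = c(820, 842, 821, 843)
)

one_per_day <- dat %>%
  # Take the next row's Close when Date equals Echeance
  mutate(Close = if_else(Date == Echeance, lead(Close), Close)) %>%
  # Keep the first row for each Date (assumption about which row is wanted)
  distinct(Date, .keep_all = TRUE)

one_per_day
```

If a different row per day is wanted, arrange() the data before distinct() so that the preferred row comes first.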

How to calculate the sequential date diff in a dataframe and make it as another column for further analysis?

Please read my question carefully before marking it as a duplicate!
I am new to R and am trying to figure out how to calculate the sequential date difference, in weeks, between one row and the next, and to store it in a new column so that I can plot it.
There are a couple of answers here (Q1, Q2, Q3), but none specifically covers computing the difference sequentially between rows within one column, say from top to bottom.
Below is the example and the expected results:
Date Var1
2/6/2017 493
2/20/2017 558
3/6/2017 595
3/20/2017 636
4/6/2017 697
4/20/2017 566
5/5/2017 234
Expected
Date Var1 week
2/6/2017 493 0
2/20/2017 558 2
3/6/2017 595 4
3/20/2017 636 6
4/6/2017 697 8
4/20/2017 566 10
5/5/2017 234 12
You can use a similar approach to that in your first linked answer by saving the difftime result as a new column in your data frame.
# Set up data
df <- read.table(text = "Date Var1
2/6/2017 493
2/20/2017 558
3/6/2017 595
3/20/2017 636
4/6/2017 697
4/20/2017 566
5/5/2017 234", header = TRUE)
df$Date <- as.Date(as.character(df$Date), format = "%m/%d/%Y")
# Create exact week variable
df$week <- difftime(df$Date, df$Date[1], units = "weeks")
# Create rounded week variable
df$week2 <- floor(difftime(df$Date, df$Date[1], units = "weeks"))
df
# Date Var1 week week2
# 2017-02-06 493 0.000000 weeks 0 weeks
# 2017-02-20 558 2.000000 weeks 2 weeks
# 2017-03-06 595 4.000000 weeks 4 weeks
# 2017-03-20 636 6.000000 weeks 6 weeks
# 2017-04-06 697 8.428571 weeks 8 weeks
# 2017-04-20 566 10.428571 weeks 10 weeks
# 2017-05-05 234 12.571429 weeks 12 weeks
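For plotting, it can be handier to store plain numbers rather than difftime values. The sketch below uses base R only and also adds the row-to-row gap, which is one reading of "sequential" in the question. The column names `weeks_since_start` and `days_since_prev` are made up for this example.

```r
df <- data.frame(
  Date = as.Date(c("2/6/2017", "2/20/2017", "3/6/2017", "3/20/2017",
                   "4/6/2017", "4/20/2017", "5/5/2017"), format = "%m/%d/%Y"),
  Var1 = c(493, 558, 595, 636, 697, 566, 234)
)

# Elapsed weeks since the first row, as a plain numeric column
df$weeks_since_start <- as.numeric(difftime(df$Date, df$Date[1], units = "weeks"))

# Gap in days between each row and the one before it (0 for the first row)
df$days_since_prev <- c(0, as.numeric(diff(df$Date), units = "days"))

df
```

Summing `days_since_prev` and dividing by 7 gives the same value as the last entry of `weeks_since_start`.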
