Reading daily time series data in R using xts, error message

Dear Stackoverflow community,
I have been trying to read this set of daily stock market data into an xts object and have been getting different error messages, listed below.
The dataset contains 5030 observations, from 4/01/2000 to 22/07/2019.
I have checked for NAs in the dataset, and there are none.
I have tried changing the date format from dd/mm/yyyy to yyyy/mm/dd, but that doesn't seem to work.
I also checked whether the data can be read after converting it to quarterly frequency, and it can.
So I think there is a problem with the code that I am using to read the daily data.
The dataset is data_stock_returns, provided by the author of the SystemicR package, and I'm trying to recreate the results before I try my own dataset.
Below is the dataset and the code I tried.
I would really appreciate it if someone in the community could help with this problem.
Thank you
Date        SXXP      STJ       ISP       INGA       Index
4/01/2000   0         0         -0.0209   -0.0274    1
5/01/2000   0         -0.02484  -0.0020   -0.00854   2
6/01/2000   0         0.0995    -0.0212   -0.00689   3
7/01/2000   0         0.061     0.02303   0.01961    4
10/01/2000  -0.00147  -0.0456   -0.0172   0.00119    5
...         ...       ...       ...       ...        ...
22/07/2019  0         -0.0127   0.00124   0.0029756  5030
df_my_data <- read.csv(('C:/Users/s/Desktop/R/intro/data/data_stock_returns.csv'), sep = ";")
str(df_my_data)
'data.frame': 5030 obs. of 74 variables:
$ Index : int 1 2 3 4 5 6 7 8 9 10 ...
$ SXXP : num 0 0 0 0 0 ...
$ STJ : num 0 -0.0248 0.0995 0.0611 -0.0456 ...
$ ISP : num -0.021 -0.0021 -0.0212 0.023 -0.0173 ...
xts(df_my_data, order.by = as.Date(rownames(df_my_data$Date), "%d/%m/%Y"))
df_my_data$Date <- as.Date(df_my_data$Date)
I get the following two error messages:
Error in $<-.data.frame(*tmp*, Date, value = numeric(0)) : replacement has 0 rows, data has 5030
Error in xts(df_my_data, order.by = as.Date(rownames(df_my_data), "%d/%m/%Y")) :
'order.by' cannot contain 'NA', 'NaN', or 'Inf'
df_my_data$Date_xts <- as.xts(df_my_data[, -1], order.by = (df_my_data$Date))
I get another error message
Error in xts(x, order.by = order.by, frequency = frequency, ...) :
order.by requires an appropriate time-based object
library(SystemicR)
l_result<- f_CoVaR_Delta_CoVaR_i_q(data_stock_returns)

Note that questions to SO should show the data in reproducible form using dput, as discussed at the top of the r tag home page.
As this was not done, and since the .csv input was not shown, we will assume that the data shown is a data frame df as in the Note at the end. If that is not what you have, then you will need to fix the question. If that is what you have, then the problems with the code in the question are discussed in the following.
xts
Regarding converting df to an xts object we have these problems with the code in the question:
The use of row names. The data shown in the question does not have row names.
The code in the question passes the index to both the x and order.by arguments of xts in the first attempt. It should only be passed to the order.by argument. In the second attempt it has not converted the Date column to Date class.
The code would have worked with minor changes:
library(xts)
xts(df[-1], as.Date(df[[1]], "%d/%m/%Y")) # df in Note at the end
However, we can avoid picking df apart and instead use the whole object approach by reading it into a zoo object and then converting it to xts.
library(xts)
z <- read.zoo(df, format = "%d/%m/%Y") # df in Note at end
x <- as.xts(z)
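As a rough check of why the attempts in the question fail, the sketch below (base R only, using a hypothetical three-row stand-in for the data) compares parsing the row names with parsing the Date column. Row names of a data frame read by read.csv are just "1", "2", ..., so they cannot be parsed as dates, which is consistent with the 'order.by' cannot contain 'NA' error.

```r
# Hypothetical stand-in for the question's data (three rows only)
df <- data.frame(
  Date = c("4/01/2000", "5/01/2000", "10/01/2000"),
  SXXP = c(0, 0, -0.00147),
  stringsAsFactors = FALSE
)

# Row names are "1", "2", "3"; they do not match the format, so every value is NA
from_rownames <- as.Date(rownames(df), "%d/%m/%Y")

# Parsing the Date column with the matching format gives a proper Date vector
from_column <- as.Date(df$Date, "%d/%m/%Y")

print(from_rownames)
print(from_column)
```

Passing the unparsed character column itself to order.by fails for a different reason: it is not a time-based class at all, which matches the third error in the question.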
f_CoVaR_Delta_CoVaR_i_q
The help file for this function says its argument is a data frame, not an xts object. Using df from the Note at the end we have
library(SystemicR)
df2 <- transform(df, Date = as.Date(Date, "%d/%m/%Y"))
f_CoVaR_Delta_CoVaR_i_q(df2)
giving:
$CoVaR_i_q
[,1] [,2] [,3]
[1,] -0.0018355914 -0.002255029 0.0002579912
[2,] -0.0008255504 -0.001121190 -0.0011822728
$Delta_CoVaR_i_q
[1] -0.001010041 -0.001133839 0.001440264
Note
df <- structure(list(Date = c("4/01/2000", "5/01/2000", "6/01/2000",
"7/01/2000", "10/01/2000"), SXXP = c(0, 0, 0, 0, -0.00147), STJ = c(0,
-0.02484, 0.0995, 0.061, -0.0456), ISP = c(-0.0209, -0.002, -0.0212,
0.02303, -0.0172), INGA = c(-0.0274, -0.00854, -0.00689, 0.01961,
0.00119)), class = "data.frame", row.names = c(NA, -5L))
which looks like this:
> df
Date SXXP STJ ISP INGA
1 4/01/2000 0.00000 0.00000 -0.02090 -0.02740
2 5/01/2000 0.00000 -0.02484 -0.00200 -0.00854
3 6/01/2000 0.00000 0.09950 -0.02120 -0.00689
4 7/01/2000 0.00000 0.06100 0.02303 0.01961
5 10/01/2000 -0.00147 -0.04560 -0.01720 0.00119

Using your first two rows:
df <- data.frame(Date = c('4/01/2000', '5/01/2000'), SXXP=c(0,0), STJ=c(0,-0.02484), ISP=c(-0.0209,-0.0020), INGA=c(-0.0274, -0.00854))
df
Date SXXP STJ ISP INGA
1 4/01/2000 0 0.00000 -0.0209 -0.02740
2 5/01/2000 0 -0.02484 -0.0020 -0.00854
I imagine you'll want to do some further analysis and will want SXXP etc. as numeric:
library(xts)
ts_working <- xts(x = df[, 2:5], order.by=(as.POSIXlt(df$Date, format = '%d/%m/%Y')))
ts_working
SXXP STJ ISP INGA
2000-01-04 0 0.00000 -0.0209 -0.02740
2000-01-05 0 -0.02484 -0.0020 -0.00854
If you instead pass the whole data frame, xts(x = df, ...):
ts_working <- xts(x = df, order.by=(as.POSIXlt(df$Date, format = '%d/%m/%Y')))
ts_working
Date SXXP STJ ISP INGA
2000-01-04 "4/01/2000" "0" " 0.00000" "-0.0209" "-0.02740"
2000-01-05 "5/01/2000" "0" "-0.02484" "-0.0020" "-0.00854"
which is likely not what you want, so pass only the Date column to order.by and only the columns you want, df[, want_this:to_this_part], to x. You've already checked for embedded NA(s). as.POSIXlt is just one of the time classes that xts recognizes; there is nothing special about it here. And while the dates 'look' like row names, they're not:
str(ts_working)
An ‘xts’ object on 2000-01-04/2000-01-05 containing:
Data: num [1:2, 1:4] 0 0 0 -0.0248 -0.0209 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:4] "SXXP" "STJ" "ISP" "INGA"
Indexed by objects of class: [POSIXlt,POSIXt] TZ:
xts Attributes:
NULL
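The quoted values in the second output follow from how the data is stored: an xts object keeps its data in a matrix, and a matrix holds a single type. A small base R sketch of that coercion, using the same two rows (as.matrix here only stands in for what happens inside xts):

```r
df <- data.frame(
  Date = c("4/01/2000", "5/01/2000"),
  SXXP = c(0, 0),
  STJ  = c(0, -0.02484),
  ISP  = c(-0.0209, -0.0020),
  INGA = c(-0.0274, -0.00854)
)

# Including the Date column forces every cell to character
with_date <- as.matrix(df)

# Dropping it leaves a numeric matrix
numeric_only <- as.matrix(df[, 2:5])

print(typeof(with_date))
print(typeof(numeric_only))
```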

Related

Pivot_longer on all columns

I am using pivot_longer from tidyr to transform a data frame from wide to long. I wish to use all the columns and keep the row names in a column as well. The earlier melt function works perfectly on this call:
w1 <- reshape2::melt(w)
head(w1)
'data.frame': 900 obs. of 3 variables:
$ Var1 : Factor w/ 30 levels "muscle system process",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Var2 : Factor w/ 30 levels "muscle system process",..: 1 1 1 1 1 1 1 1 1 1 ...
$ value: num NA NA NA NA NA NA NA NA NA NA ...
But pivot_longer doesn't:
w %>% pivot_longer()
Error in UseMethod("pivot_longer") :
no applicable method for 'pivot_longer' applied to an object of class "c('matrix', 'array', 'double', 'numeric')"
Any suggestion is appreciated
Obviously some data would be helpful, but your problem lies in the fact that you are using pivot_longer() on an object of class matrix and not data.frame
library(tidyr)
# your error
mycars <- as.matrix(mtcars)
pivot_longer(mycars)
Error in UseMethod("pivot_longer") :
no applicable method for 'pivot_longer' applied to an object of class
"c('matrix', 'array', 'double', 'numeric')"
pivot_longer() will work on a data frame
> class(mycars)
[1] "matrix" "array"
> class(mtcars)
[1] "data.frame"
Remember to specify the cols argument, this was not required in reshape2::melt() (more info in the documentation). You want all the columns so cols = everything():
pivot_longer(mtcars, cols = everything())
(Disclaimer: Of course, mtcars is not the best dataset to convert to long format)
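To also keep the row names, as melt() does for a matrix, one option is to copy them into a column before pivoting. A minimal sketch, assuming a small made-up matrix w with row and column names; the column names Var1, Var2 and value are chosen here only to mirror the melt() output:

```r
library(tidyr)

# Made-up 3 x 3 matrix standing in for w
w <- matrix(1:9, nrow = 3,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# pivot_longer() needs a data frame, so convert and keep the row names as a column
w_df <- data.frame(Var1 = rownames(w), w, check.names = FALSE)

# Pivot every column except the one holding the row names
long <- pivot_longer(w_df, cols = -Var1, names_to = "Var2", values_to = "value")
print(long)
```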

How to group daily data into months in a dataframe using dplyr

I have a dataframe containing daily counts of the number of group members seen present. I want to get a monthly mean of the number of group members seen (produced in a data frame). I've been trying to use dplyr as it is much simpler than creating a new data frame and filling it using a for loop. I'm very new to coding and would like to be able to do this for multiple groups. My dataframe looks like this:
'data.frame': 148 obs. of 7 variables:
$ Date : Date, format: "2013-05-01" "2013-05-02" ...
$ Group : chr "WK" "WK" "WK" "WK" ...
$ Session : Factor w/ 12 levels "AM","AM1","AM2",..: 9 1 9 9 1 9 9 1 1 1 ...
$ Group.Members.Seen : num 7 6 8 9 9 6 8 9 4 9 ...
$ Roving.Males : num NA NA NA NA NA NA NA NA NA NA ...
$ Undyed.Group.Members.Seen: num NA NA NA NA NA NA NA NA NA NA ...
$ Non.group.Other : num NA NA NA NA NA NA NA NA NA NA ...
I don't have an observation for every day, and sometimes have multiple observations for a day. In this particular instance, there is only data in the Group.Members.Seen column; however, in other datasets I do have numbers in the Roving.Males, Undyed.Group.Members.Seen, and Non.group.Other columns.
For this particular dataset, I only want to work with the Date and Group.Members.Seen columns, as I only have data in those columns. I've used select to select those columns, then tried to use mutate, group_by, and summarise to get what I want. However, I think the problem is with the dates. I have also tried aggregate, but I don't think that is the best approach.
test <- WK.2013 %>%
select(Date, Group.Members.Seen) %>%
mutate(mo = Date(format="%m"), mean.num.members = mean(Group.Members.Seen)) %>%
group_by(Date(format="%m")) %>%
summarise(mean = mean(Group.Members.Seen))
Error message is saying it cannot find the function "Date", which is probably the beginning of a long string of problems with that code.
You can try the lubridate package, which rounds dates to the month, year or other units.
library(lubridate)
mydate <- today()
> floor_date(today(),unit = "month")
[1] "2019-07-01"
> floor_date(mydate,unit = "month")
[1] "2019-07-01"
> round_date(mydate,unit = "month")
[1] "2019-08-01"
It's hard to say for sure if this will work without seeing the actual data but could you try the apply.monthly function from the xts package?
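Putting the month idea into the dplyr pipeline from the question could look like the following. This is a sketch on made-up data that reuses the question's column names; it uses base format() to build a year-month key, though lubridate::floor_date(Date, "month") would serve the same purpose.

```r
library(dplyr)

# Made-up observations: some days repeated, some days missing
WK <- data.frame(
  Date = as.Date(c("2013-05-01", "2013-05-02", "2013-05-02",
                   "2013-06-03", "2013-06-10")),
  Group.Members.Seen = c(7, 6, 8, 9, 4)
)

monthly <- WK %>%
  select(Date, Group.Members.Seen) %>%
  mutate(month = format(Date, "%Y-%m")) %>%   # year-month key, e.g. "2013-05"
  group_by(month) %>%
  summarise(mean.members = mean(Group.Members.Seen))

print(monthly)
```

Keeping the year in the key avoids mixing, say, May 2013 with May 2014 when the data spans several years.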

Convert delimited string to numeric vector in dataframe

This is such a basic question, I'm embarrassed to ask.
Let's say I have a dataframe full of columns which contain data of the following form:
test <-"3000,9843,9291,2161,3458,2347,22925,55836,2890,2824,2848,2805,2808,2775,2760,2706,2727,2688,2727,2658,2654,2588"
I want to convert this to a numeric vector, which I have done like so:
test <- as.numeric(unlist(strsplit(test, split=",")))
I now want to convert a large dataframe containing a column full of this data into a numeric vector equivalent:
mutate(data,
converted = as.numeric(unlist(strsplit(badColumn, split=","))),
)
This doesn't work because presumably it's converting the entire column into a numeric vector and then replacing a single row with that value:
Error in mutate_impl(.data, dots) : Column converted must be
length 20 (the number of rows) or one, not 1274
How do I do this?
Here's some sample data that reproduces your error:
data <- data.frame(a = 1:3,
badColumn = c("10,20,30,40,50", "1,2,3,4,5,6", "9,8,7,6,5,4,3"),
stringsAsFactors = FALSE)
Here's the error:
library(tidyverse)
mutate(data, converted = as.numeric(unlist(strsplit(badColumn, split=","))))
# Error in mutate_impl(.data, dots) :
# Column `converted` must be length 3 (the number of rows) or one, not 18
A straightforward way would be to just use strsplit on the entire column, and lapply ... as.numeric to convert the resulting list values from character vectors to numeric vectors.
x <- mutate(data, converted = lapply(strsplit(badColumn, ",", TRUE), as.numeric))
str(x)
# 'data.frame': 3 obs. of 3 variables:
# $ a : int 1 2 3
# $ badColumn: chr "10,20,30,40,50" "1,2,3,4,5,6" "9,8,7,6,5,4,3"
# $ converted:List of 3
# ..$ : num 10 20 30 40 50
# ..$ : num 1 2 3 4 5 6
# ..$ : num 9 8 7 6 5 4 3
This might help:
library(purrr)
mutate(data, converted = map(badColumn, function(txt) as.numeric(unlist(strsplit(txt, split = ",")))))
What you get is a list column which contains the numeric vectors.
Base R
A=c(as.numeric(strsplit(test,',')[[1]]))
A
[1] 3000 9843 9291 2161 3458 2347 22925 55836 2890 2824 2848 2805 2808 2775 2760 2706 2727 2688 2727 2658 2654 2588
df$NEw2=lapply(df$NEw, function(x) c(as.numeric(strsplit(x,',')[[1]])))
# with dplyr: split every row, not just the first one
df%>%mutate(NEw2=lapply(strsplit(NEw,','), as.numeric))
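Once the column holds numeric vectors, per-row summaries are a matter of applying a function over the list. A short base R sketch using the sample data from the earlier answer (the n and total column names are just for illustration):

```r
data <- data.frame(
  a = 1:3,
  badColumn = c("10,20,30,40,50", "1,2,3,4,5,6", "9,8,7,6,5,4,3"),
  stringsAsFactors = FALSE
)

# One numeric vector per row, stored as a list column
data$converted <- lapply(strsplit(data$badColumn, ",", fixed = TRUE), as.numeric)

# Per-row summaries computed over the list column
data$n     <- lengths(data$converted)
data$total <- vapply(data$converted, sum, numeric(1))

print(data[, c("a", "n", "total")])
```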

R- using dygraph with csv

The following is my ex.csv data as read into R:
Date pr pa
1 2015-01-01 6497985 4833118
2 2015-02-01 88289 4305786
3 2015-03-01 0 1149480
4 2015-04-01 0 16706470
5 2015-05-01 0 7025197
6 2015-06-01 0 6752085
Also, here is the raw data:
Date,pr,pa
2015/1/1,6497985,4833118
2015/2/1,88289,4305786
2015/3/1,0,1149480
2015/4/1,0,16706470
2015/5/1,0,7025197
2015/6/1,0,6752085
how can I use R package dygraph with this data?
> str(ex)
'data.frame': 6 obs. of 3 variables:
$ Date: Factor w/ 6 levels "2015/1/1","2015/2/1",..: 1 2 3 4 5 6
$ pr : int 6497985 88289 0 0 0 0
$ pa : int 4833118 4305786 1149480 16706470 7025197 6752085
> dygraph(ex)
Error in dygraph(ex) : Unsupported type passed to argument 'data'.
Please help me. I'd appreciate it a lot.
Here are the steps to get it done: First, you need to convert your strings to a Date that is understandable for R. Then convert your data to an xts time series (required by dygraphs). Then plot it with dygraphs.
library(dygraphs)
library(xts)
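data <- read.csv("test.csv")
data$Date <- as.Date(data$Date) # convert to Date; "2015/1/1" matches one of as.Date's default formats
time_series <- xts(data[, -1], order.by = data$Date) # numeric columns as data, dates as the index
dygraph(time_series) # now plot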

colClasses date and time read.csv

I have some data of the form:
date,time,val1,val2
20090503,0:05:12,107.25,1
20090503,0:05:17,108.25,20
20090503,0:07:45,110.25,5
20090503,0:07:56,106.25,5
that comes from a csv file. I am relatively new to R, so I tried
data <-read.csv("sample.csv", header = TRUE, sep = ",")
and using POSIXlt, as well as POSIXct in the colClasses argument, but I can't seem to create one column or 'variable' out of my date and time data. I want to do so in order to choose arbitrary timeframes over which to calculate running statistics such as max, min, mean (and then boxplots, etc.).
I also thought that I might convert it to a time series and get around it that way,
dataTS <-ts(data)
but have not yet been able to use the start, end, and frequency to my advantage. Thanks for your help.
You can't do this upon reading the data in to R using the colClasses argument because the data span two "columns" in the CSV file. Instead, load the data and process the date and time columns into a single POSIXlt variable:
dat <- read.csv(textConnection("date,time,val1,val2
20090503,0:05:12,107.25,1
20090503,0:05:17,108.25,20
20090503,0:07:45,110.25,5
20090503,0:07:56,106.25,5"))
dat <- within(dat, Datetime <- as.POSIXlt(paste(date, time),
format = "%Y%m%d %H:%M:%S"))
[I presume the date is in year, month, day order; if not, use "%Y%d%m %H:%M:%S".]
Which gives:
> head(dat)
date time val1 val2 Datetime
1 20090503 0:05:12 107.25 1 2009-05-03 00:05:12
2 20090503 0:05:17 108.25 20 2009-05-03 00:05:17
3 20090503 0:07:45 110.25 5 2009-05-03 00:07:45
4 20090503 0:07:56 106.25 5 2009-05-03 00:07:56
> str(dat)
'data.frame': 4 obs. of 5 variables:
$ date : int 20090503 20090503 20090503 20090503
$ time : Factor w/ 4 levels "0:05:12","0:05:17",..: 1 2 3 4
$ val1 : num 107 108 110 106
$ val2 : int 1 20 5 5
$ Datetime: POSIXlt, format: "2009-05-03 00:05:12" "2009-05-03 00:05:17" ...
You can now delete date and time if you wish:
> dat <- dat[, -(1:2)]
> head(dat)
val1 val2 Datetime
1 107.25 1 2009-05-03 00:05:12
2 108.25 20 2009-05-03 00:05:17
3 110.25 5 2009-05-03 00:07:45
4 106.25 5 2009-05-03 00:07:56
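With a single date-time column in place, picking an arbitrary time window and summarising it is ordinary subsetting. The sketch below uses POSIXct rather than POSIXlt (an assumption on my part that it is the more convenient class inside a data frame) and fixes the time zone to UTC so the result does not depend on the local setting; the window bounds are arbitrary.

```r
dat <- read.csv(text = "date,time,val1,val2
20090503,0:05:12,107.25,1
20090503,0:05:17,108.25,20
20090503,0:07:45,110.25,5
20090503,0:07:56,106.25,5")

# Combine the two columns into one date-time column
dat$Datetime <- as.POSIXct(paste(dat$date, dat$time),
                           format = "%Y%m%d %H:%M:%S", tz = "UTC")

# Arbitrary one-minute window: 00:05:00 (inclusive) to 00:06:00 (exclusive)
start <- as.POSIXct("2009-05-03 00:05:00", tz = "UTC")
end   <- as.POSIXct("2009-05-03 00:06:00", tz = "UTC")
window <- dat[dat$Datetime >= start & dat$Datetime < end, ]

# Summary statistics of val1 within the window
stats <- c(min = min(window$val1), mean = mean(window$val1), max = max(window$val1))
print(stats)
```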
