R: using dygraphs with CSV data

The following is my ex.csv data after reading it into R:
Date pr pa
1 2015-01-01 6497985 4833118
2 2015-02-01 88289 4305786
3 2015-03-01 0 1149480
4 2015-04-01 0 16706470
5 2015-05-01 0 7025197
6 2015-06-01 0 6752085
And here is the raw data:
Date,pr,pa
2015/1/1,6497985,4833118
2015/2/1,88289,4305786
2015/3/1,0,1149480
2015/4/1,0,16706470
2015/5/1,0,7025197
2015/6/1,0,6752085
How can I use the R package dygraphs with this data?
> str(ex)
'data.frame': 6 obs. of 3 variables:
$ Date: Factor w/ 6 levels "2015/1/1","2015/2/1",..: 1 2 3 4 5 6
$ pr : int 6497985 88289 0 0 0 0
$ pa : int 4833118 4305786 1149480 16706470 7025197 6752085
> dygraph(ex)
Error in dygraph(ex) : Unsupported type passed to argument 'data'.
Please help; I would appreciate it a lot.

Here are the steps to get it done: first, convert the date strings to a Date class that R understands. Then convert the data to an xts time series (which dygraphs expects), keeping only the numeric columns as the series data. Then plot it with dygraphs.
library(dygraphs)
library(xts)
data <- read.csv("ex.csv")
data$Date <- as.Date(data$Date, format = "%Y/%m/%d") # convert to Date
time_series <- xts(data[, c("pr", "pa")], order.by = data$Date) # make xts from the numeric columns only
dygraph(time_series) # now plot
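For a reproducible check, here is a minimal sketch that reads the raw CSV from the question inline instead of from a file. The `text =` argument and the object names `csv_text`, `ex`, and `ex_xts` are illustrative choices, not part of the original answer. It assumes the dates are in year/month/day order and stops early if any date fails to parse.

```r
library(xts)
library(dygraphs)

# Inline copy of the raw CSV shown in the question, so no file is needed
csv_text <- "Date,pr,pa
2015/1/1,6497985,4833118
2015/2/1,88289,4305786
2015/3/1,0,1149480
2015/4/1,0,16706470
2015/5/1,0,7025197
2015/6/1,0,6752085"

ex <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Parse the dates with an explicit format (assumed year/month/day)
ex$Date <- as.Date(ex$Date, format = "%Y/%m/%d")
stopifnot(!anyNA(ex$Date))

# Only the numeric columns go into the series; the dates become the index
ex_xts <- xts(ex[, c("pr", "pa")], order.by = ex$Date)

# Build the interactive plot
g <- dygraph(ex_xts)
g
```

Keeping the Date column out of the series data matters because xts stores its data as a matrix, so a non-numeric column would turn every value into text.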

Related

reading daily time series data in R using xts, error message

Dear Stackoverflow community,
I have been trying to read a set of daily stock market data into an xts object and have been getting the different error messages listed below.
The dataset contains 5030 observations, from 4/01/2000 to 22/07/2019.
I have checked for NAs in the dataset, and there are none.
I have tried changing the date format of the dataset from dd/mm/yyyy to yyyy/mm/dd, but that does not seem to work.
I also checked whether the data can be read after converting it to quarterly, and it can.
So I think there is a problem with the code that I am using to read the daily data.
The dataset is data_stock_returns from the author of the SystemicR package, and I am trying to recreate the results before I try my own dataset.
Below is the dataset and the code I tried.
Would really appreciate it if someone in the community could help out with this problem.
Thank You
Date        SXXP      STJ       ISP       INGA       Index
4/01/2000   0         0         -0.0209   -0.0274    1
5/01/2000   0         -0.02484  -0.0020   -0.00854   2
6/01/2000   0         0.0995    -0.0212   -0.00689   3
7/01/2000   0         0.061     0.02303   0.01961    4
10/01/2000  -0.00147  -0.0456   -0.0172   0.00119    5
...         ...       ...       ...       ...        ...
22/07/2019  0         -0.0127   0.00124   0.0029756  5030
df_my_data <- read.csv(('C:/Users/s/Desktop/R/intro/data/data_stock_returns.csv'), sep = ";")
str(df_my_data)
'data.frame': 5030 obs. of 74 variables:
$ Index : int 1 2 3 4 5 6 7 8 9 10 ...
$ SXXP : num 0 0 0 0 0 ...
$ STJ : num 0 -0.0248 0.0995 0.0611 -0.0456 ...
$ ISP : num -0.021 -0.0021 -0.0212 0.023 -0.0173 ...
xts(df_my_data, order.by = as.Date(rownames(df_my_data$Date), "%d/%m/%Y"))
df_my_data$Date <- as.Date(df_my_data$Date)
I get the following two error messages:
Error in $<-.data.frame(*tmp*, Date, value = numeric(0)) : replacement has 0 rows, data has 5030
Error in xts(df_my_data, order.by = as.Date(rownames(df_my_data), "%d/%m/%Y")) :
'order.by' cannot contain 'NA', 'NaN', or 'Inf'
df_my_data$Date_xts <- as.xts(df_my_data[, -1], order.by = (df_my_data$Date))
With this I get another error message:
Error in xts(x, order.by = order.by, frequency = frequency, ...) :
order.by requires an appropriate time-based object
library(SystemicR)
l_result<- f_CoVaR_Delta_CoVaR_i_q(data_stock_returns)
Note that questions to SO should show the data in reproducible form using dput, as discussed at the top of the r tag home page. As this was not done, and since the .csv input was not shown, we will assume that the data shown is a data frame df as in the Note at the end. If that is not what you have, then you will need to fix the question. If it is, the problems with the code in the question are discussed below.
xts
Regarding converting df to an xts object we have these problems with the code in the question:
The use of row names. The data shown in the question does not have row names.
In the first attempt, the code in the question passes the index to both the x and order.by arguments of xts. It should only be passed to the order.by argument. In the second attempt, the Date column has not been converted to Date class.
The code would have worked with minor changes:
library(xts)
xts(df[-1], as.Date(df[[1]], "%d/%m/%Y")) # df in Note at end
However, we can avoid picking df apart and instead use the whole-object approach by reading it into a zoo object and then converting that to xts.
library(xts)
z <- read.zoo(df, format = "%d/%m/%Y") # df in Note at end
x <- as.xts(z)
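The "'order.by' cannot contain 'NA'" error in the question is what xts reports when the values passed as dates do not parse, so it can help to check the parsed dates before building the series. The following base R sketch uses a few of the date strings shown in the question; the object names `dates_chr`, `parsed`, and `wrong` are illustrative only.

```r
# A few of the dd/mm/yyyy strings shown in the question
dates_chr <- c("4/01/2000", "5/01/2000", "10/01/2000", "22/07/2019")

# Parse with the format that matches the data: day/month/year
parsed <- as.Date(dates_chr, format = "%d/%m/%Y")

# Fail early, with a clear message, if anything did not parse
stopifnot(!anyNA(parsed))

# A mismatched format (month/day/year) cannot parse "22/07/2019",
# because 22 is not a valid month, so an NA appears in the result
wrong <- as.Date(dates_chr, format = "%m/%d/%Y")
anyNA(wrong)
```

A check like `anyNA()` on the parsed dates is cheap, and it points at the date format rather than at xts when something goes wrong.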
f_CoVaR_Delta_CoVaR_i_q
The help file for this function says its argument is a data frame, not an xts object. Using df from the Note at the end we have
library(SystemicR)
df2 <- transform(df, Date = as.Date(Date, "%d/%m/%Y"))
f_CoVaR_Delta_CoVaR_i_q(df2)
giving:
$CoVaR_i_q
[,1] [,2] [,3]
[1,] -0.0018355914 -0.002255029 0.0002579912
[2,] -0.0008255504 -0.001121190 -0.0011822728
$Delta_CoVaR_i_q
[1] -0.001010041 -0.001133839 0.001440264
Note
df <- structure(list(Date = c("4/01/2000", "5/01/2000", "6/01/2000",
"7/01/2000", "10/01/2000"), SXXP = c(0, 0, 0, 0, -0.00147), STJ = c(0,
-0.02484, 0.0995, 0.061, -0.0456), ISP = c(-0.0209, -0.002, -0.0212,
0.02303, -0.0172), INGA = c(-0.0274, -0.00854, -0.00689, 0.01961,
0.00119)), class = "data.frame", row.names = c(NA, -5L))
which looks like this:
> df
Date SXXP STJ ISP INGA
1 4/01/2000 0.00000 0.00000 -0.02090 -0.02740
2 5/01/2000 0.00000 -0.02484 -0.00200 -0.00854
3 6/01/2000 0.00000 0.09950 -0.02120 -0.00689
4 7/01/2000 0.00000 0.06100 0.02303 0.01961
5 10/01/2000 -0.00147 -0.04560 -0.01720 0.00119
Using your first two rows:
df <- data.frame(Date = c('4/01/2000', '5/01/2000'), SXXP=c(0,0), STJ=c(0,-0.02484), ISP=c(-0.0209,-0.0020), INGA=c(-0.0274, -0.00854))
df
Date SXXP STJ ISP INGA
1 4/01/2000 0 0.00000 -0.0209 -0.02740
2 5/01/2000 0 -0.02484 -0.0020 -0.00854
I imagine you'll want to do some further analysis, so you will want SXXP etc. to stay numeric:
ts_working <- xts(x = df[, 2:5], order.by=(as.POSIXlt(df$Date, format = '%d/%m/%Y')))
ts_working
SXXP STJ ISP INGA
2000-01-04 0 0.00000 -0.0209 -0.02740
2000-01-05 0 -0.02484 -0.0020 -0.00854
If you instead pass the whole data frame, xts(x = df, ...), you get:
ts_working <- xts(x = df, order.by=(as.POSIXlt(df$Date, format = '%d/%m/%Y')))
ts_working
Date SXXP STJ ISP INGA
2000-01-04 "4/01/2000" "0" " 0.00000" "-0.0209" "-0.02740"
2000-01-05 "5/01/2000" "0" "-0.02484" "-0.0020" "-0.00854"
which is likely not what you want, because the character Date column forces every value to character. So pass the Date column to order.by and only the numeric columns, df[, from_col:to_col], as x. You've already checked for embedded NAs. as.POSIXlt is just one of the time classes xts recognizes; there is nothing special about it here. And while the dates in the output look like row names, they are not:
str(ts_working)
An ‘xts’ object on 2000-01-04/2000-01-05 containing:
Data: num [1:2, 1:4] 0 0 0 -0.0248 -0.0209 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:4] "SXXP" "STJ" "ISP" "INGA"
Indexed by objects of class: [POSIXlt,POSIXt] TZ:
xts Attributes:
NULL

Invalid trim argument in plotting counts over dates in R

I am trying to apply the answer to my prior question on plotting with dates in the x axis to the COVID data in the New York Times but I get an error message:
require(RCurl)
require(foreign)
require(tidyverse)
counties = read.csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv", sep =",",header = T)
Philadelphia <- counties[counties$county=="Philadelphia",]
Philadelphia <- droplevels(Philadelphia)
rownames(Philadelphia) <- NULL
with(as.data.frame(Philadelphia),plot(date,cases,xaxt="n"))
axis.POSIXct(1,at=Philadelphia$date,
labels=format(Philadelphia$date,"%y-%m-%d"),
las=2, cex.axis=0.8)
# Error in format.default(structure(as.character(x), names = names(x), dim = dim(x), :
# invalid 'trim' argument
The structure of the data already seems to include a date column:
> str(Philadelphia)
'data.frame': 21 obs. of 6 variables:
$ date : Factor w/ 21 levels "2020-03-10","2020-03-11",..: 1 2 3 4 5 6 7 8 9 10 ...
$ county: Factor w/ 1 level "Philadelphia": 1 1 1 1 1 1 1 1 1 1 ...
$ state : Factor w/ 1 level "Pennsylvania": 1 1 1 1 1 1 1 1 1 1 ...
$ fips : int 42101 42101 42101 42101 42101 42101 42101 42101 42101 42101 ...
$ cases : int 1 1 1 3 4 8 8 10 17 33 ...
$ deaths: int 0 0 0 0 0 0 0 0 0 0 ...
I tried changing the axis call to
axis.Date(1,Philadelphia$date, at=Philadelphia$date,
labels=format(Philadelphia$date,"%y-%m-%d"),
las=2, cex.axis=0.8)
without success.
I wonder if it has to do with the strange horizontal lines in the plot (as opposed to points):
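The 'invalid trim argument' error comes from format(). Because date is a factor rather than a Date, format() does not treat "%y-%m-%d" as a date format; the string is matched positionally to trim, the second argument of format.default, which must be logical. Plotting against a factor is also why the plot shows horizontal lines (one box plot per level) rather than points.
I'm not entirely sure what you're doing here, but I would convert date to a Date object before plotting the data. I believe you'll also want to use %Y instead of %y.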
library(dplyr)
counties = read.csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv", sep =",",header = T)
Philadelphia <- counties[counties$county=="Philadelphia",] %>%
mutate(date = as.POSIXct(date, format = '%Y-%m-%d'))
with(Philadelphia, plot(date,cases))
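To keep the custom rotated date labels from the question, one option is to convert the column to Date and then call axis.Date. This is a base R sketch on a small stand-in data frame: the case counts are the first ten values from the str() output above, while the ten consecutive dates starting at 2020-03-10 and the name `philly` are assumptions made for illustration.

```r
# Stand-in for the Philadelphia subset; dates are assumed to be consecutive
philly <- data.frame(
  date  = factor(format(as.Date("2020-03-10") + 0:9)),  # factor, as in the question
  cases = c(1, 1, 1, 3, 4, 8, 8, 10, 17, 33)
)

# Convert the factor to Date via character before plotting
philly$date <- as.Date(as.character(philly$date), format = "%Y-%m-%d")

# Points are drawn now, because the x variable is a Date and not a factor
plot(philly$date, philly$cases, xaxt = "n", xlab = "", ylab = "cases")

# Custom axis: one tick per day, two-digit year, labels rotated
axis.Date(1, at = philly$date, format = "%y-%m-%d", las = 2, cex.axis = 0.8)
```

With a real Date column, the format string is interpreted as a date format, so the 'trim' error does not arise.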

How to merge two data frames with non overlapping dates?

I have a data set with the following variables:
steps: Number of steps taken in a 5-minute interval
date: The date on which the measurement was taken in YYYY-MM-DD format
interval: Identifier for the 5-minute interval in which measurement was taken (288 intervals per day)
The main data set:
> head(activityData, 3)
steps date interval
1 1.7169811 2012-10-01 0
2 0.3396226 2012-10-01 5
3 0.1320755 2012-10-01 10
> str(activityData)
'data.frame': 17568 obs. of 3 variables:
$ steps : num 1.717 0.3396 0.1321 0.1509 0.0755 ...
$ date : chr "2012-10-01" "2012-10-01" "2012-10-01" "2012-10-01" ...
$ interval: num 0 5 10 15 20 25 30 35 40 45 ...
The data set has a range of two months.
I had to divide it into weekdays and weekend days. I did it with the following code:
> dataAs.xtsWeekday <- dataAs.xts[.indexwday(dataAs.xts) %in% 1:5]
> dataAs.xtsWeekend <- dataAs.xts[.indexwday(dataAs.xts) %in% c(0, 6)]
After doing this I had to make some calculations, which I could not get to work, so I decided to export the two subsets to files and read them back in.
After importing the data again, I made the calculations I wanted and tried to merge the two data sets, but did not succeed.
First data set:
> head(weekdays, 3)
X steps date interval daytype
1 1 37.3826 2012-10-01 0 weekday
2 2 37.3826 2012-10-01 5 weekday
3 3 37.3826 2012-10-01 10 weekday
> str(weekdays)
'data.frame': 12960 obs. of 5 variables:
$ X : int 1 2 3 4 5 6 7 8 9 10 ...
$ steps : num 37.4 37.4 37.4 37.4 37.4 ...
$ date : chr "2012-10-01" "2012-10-01" "2012-10-01" "2012-10-01" ...
$ interval: int 0 5 10 15 20 25 30 35 40 45 ...
$ daytype : chr "weekday" "weekday" "weekday" "weekday" ...
Second data set:
> head(weekend, 3)
X steps date interval daytype
1 1 0 2012-10-06 0 weekend
2 2 0 2012-10-06 5 weekend
3 3 0 2012-10-06 10 weekend
> str(weekend)
'data.frame': 4608 obs. of 5 variables:
$ X : int 1 2 3 4 5 6 7 8 9 10 ...
$ steps : num 0 0 0 0 0 0 0 0 0 0 ...
$ date : chr "2012-10-06" "2012-10-06" "2012-10-06" "2012-10-06" ...
$ interval: int 0 5 10 15 20 25 30 35 40 45 ...
$ daytype : chr "weekend" "weekend" "weekend" "weekend" ...
Now I would like to merge the 2 data sets (weekdays, weekends) by date, but the problem is that I don't have any common dates or anything else common.
The final data set should have 4 columns and 17568 observations.
The columns should be:
steps: Number of steps taken in a 5-minute interval
date: The date on which the measurement was taken in YYYY-MM-DD format
interval: Identifier for the 5-minute interval in which measurement was taken
daytype: weekends days or normal weekdays.
I tried with:
merge
join(plyr)
union
Everywhere I looked all the data sets had a common ID or a common column in both data sets, not like in my case.
I also looked here and at many other questions, but I did not understand much, and they had nothing in common with my data set.
The other option I thought about was to add a column called "ID" to the original data set and redo everything that I did so far, which I'll have to do if I don't find a way around this problem.
I would like some advice on how to proceed or what to try next.
Since you mentioned that your final data set should have 17568 (=4608+12960) observations/rows, I assume you want to stack the two data.frames over each other (and possibly order them by date afterwards). This is done by using rbind().
finaldata <- rbind(weekdays, weekend)
If you want to remove column X:
finaldata$X <- NULL
To convert your date column to actual dates:
finaldata$date <- as.Date(finaldata$date, format="%Y-%m-%d")
To order the whole data by date:
finaldata <- finaldata[order(finaldata$date),]
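The stacking steps above can be sketched on two tiny stand-in data frames. The names `weekday_part`, `weekend_part`, and `combined`, and the few rows used, are made up for illustration. The last lines also cross-check the daytype column by deriving it directly from the date, which suggests the split and re-merge could be avoided altogether.

```r
# Tiny stand-ins for the two imported data sets
weekday_part <- data.frame(
  steps = c(37.3826, 37.3826), date = c("2012-10-01", "2012-10-02"),
  interval = c(0L, 5L), daytype = "weekday", stringsAsFactors = FALSE
)
weekend_part <- data.frame(
  steps = c(0, 0), date = c("2012-10-06", "2012-10-07"),
  interval = c(0L, 5L), daytype = "weekend", stringsAsFactors = FALSE
)

# Stack the rows; no common key is needed because the columns match
combined <- rbind(weekday_part, weekend_part)
combined$date <- as.Date(combined$date, format = "%Y-%m-%d")
combined <- combined[order(combined$date, combined$interval), ]

# Cross-check: derive the day type from the date itself
# (POSIXlt wday is 0 for Sunday through 6 for Saturday)
wday <- as.POSIXlt(combined$date)$wday
derived <- ifelse(wday %in% c(0, 6), "weekend", "weekday")
stopifnot(identical(derived, combined$daytype))
```

Using the numeric `wday` field avoids depending on locale-specific day names.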

Histogram of Weekdays by Year R

I have a .csv file that I have loaded into R using the following basic command:
lace <- read.csv("lace for R.csv")
It pulls in my data just fine. Here is the str of the data:
str(lace)
'data.frame': 2054 obs. of 20 variables:
$ Admission.Day : Factor w/ 872 levels "1/1/2013","1/10/2011",..: 231 238 238 50 59 64 64 64 67 67 ...
$ Year : int 2010 2010 2010 2011 2011 2011 2011 2011 2011 2011 ...
$ Month : int 12 12 12 1 1 1 1 1 1 1 ...
$ Day : int 28 30 30 3 4 6 6 6 7 7 ...
$ DayOfWeekNumber : int 3 5 5 2 3 5 5 5 6 6 ...
$ Day.of.Week : Factor w/ 7 levels "Friday","Monday",..: 6 5 5 2 6 5 5 5 1 1 ...
What I am trying to do is create three (3) different histograms and then plot them all together on one. I want to create a histogram for each year, where the x axis or labels will be the days of the week starting with Sunday and ending on Saturday.
Firstly how would I go about creating a histogram out of Factors, which the days of the week are in?
Secondly how do I create a histogram for the days of the week for a given year?
I have tried using the following post here but cannot get it working. I use the Admission.Day as the variable and get an error message:
dat <- as.Date(lace$Admission.Day)
Error in charToDate(x) : character string is not in a standard unambiguous format
Thank you,
Expanding on the comment above: the problem seems to be with importing dates, rather than making the histogram. Assuming there is an Excel workbook "lace for R.xlsx", with a sheet "lace":
## Not tested...
library(XLConnect)
myData <- "lace for R.xlsx" # NOTE: need path also...
wb <- loadWorkbook(myData)
lace <- readWorksheet(wb, sheet="lace")
lace$Admission.Day <- as.Date(lace$Admission.Day)
should provide dates that work with all the R date functions. Also, the lubridate package provides a number of functions that are more intuitive to use than format(...).
Then, as an example:
library(lubridate) # for year(...) and wday(...)
library(ggplot2)
# random dates around Jun 1, across 5 years...
set.seed(123)
lace <- data.frame(date=as.Date(rnorm(1000,sd=50)+365*(0:4),origin="2008/6/1"))
lace$year <- factor(year(lace$date))
lace$dow <- wday(lace$date, label=T)
# This creates the per-year weekday counts...
# (geom_bar counts a discrete x; current ggplot2 versions reject a discrete x in geom_histogram)
ggplot(lace) +
  geom_bar(aes(x=dow, fill=year)) + # fill color by year
  facet_grid(~year) + # facet by year
  theme(axis.text.x=element_text(angle=90)) # to rotate weekday names...
This produces one panel of weekday counts per year (figure not shown).
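If the data stays in the CSV, the as.Date error from the question can probably be avoided by giving an explicit format. The sketch below is base R only and assumes Admission.Day is in month/day/year order, which the question does not confirm; the three 2010 and 2011 dates are rebuilt from the Year, Month, and Day columns in the str() output, and the names `adm`, `adm_date`, `day_names`, and `counts` are illustrative.

```r
# A few admission days as a factor, assumed to be month/day/year
adm <- factor(c("12/28/2010", "12/30/2010", "1/3/2011", "1/1/2013"))

# Convert via character with an explicit format, then fail early on NAs
adm_date <- as.Date(as.character(adm), format = "%m/%d/%Y")
stopifnot(!anyNA(adm_date))

# Weekday as an ordered factor running Sunday to Saturday
# (POSIXlt wday is 0 for Sunday, so add 1 to index the names)
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
dow <- factor(day_names[as.POSIXlt(adm_date)$wday + 1], levels = day_names)

# Counts per year and weekday, drawn as grouped bars
counts <- table(year = format(adm_date, "%Y"), dow = dow)
barplot(counts, beside = TRUE, legend.text = TRUE, las = 2)
```

Setting the factor levels explicitly is what keeps the bars in Sunday to Saturday order instead of alphabetical order.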

colClasses date and time read.csv

I have some data of the form:
date,time,val1,val2
20090503,0:05:12,107.25,1
20090503,0:05:17,108.25,20
20090503,0:07:45,110.25,5
20090503,0:07:56,106.25,5
that comes from a csv file. I am relatively new to R, so I tried
data <-read.csv("sample.csv", header = TRUE, sep = ",")
and using POSIXlt, as well as POSIXct, in the colClasses argument, but I can't seem to create one column or 'variable' out of my date and time data. I want to do so in order to choose arbitrary timeframes over which to calculate running statistics such as max, min, mean (and then boxplots, etc.).
I also thought that I might convert it to a time series and get around it that way,
dataTS <-ts(data)
but have not yet been able to use the start, end, and frequency arguments to my advantage. Thanks for your help.
You can't do this upon reading the data in to R using the colClasses argument because the data span two "columns" in the CSV file. Instead, load the data and process the date and time columns into a single POSIXlt variable:
dat <- read.csv(textConnection("date,time,val1,val2
20090503,0:05:12,107.25,1
20090503,0:05:17,108.25,20
20090503,0:07:45,110.25,5
20090503,0:07:56,106.25,5"))
dat <- within(dat, Datetime <- as.POSIXlt(paste(date, time),
format = "%Y%m%d %H:%M:%S"))
(I presume the order is year, month, day; if not, use "%Y%d%m %H:%M:%S".)
Which gives:
> head(dat)
date time val1 val2 Datetime
1 20090503 0:05:12 107.25 1 2009-05-03 00:05:12
2 20090503 0:05:17 108.25 20 2009-05-03 00:05:17
3 20090503 0:07:45 110.25 5 2009-05-03 00:07:45
4 20090503 0:07:56 106.25 5 2009-05-03 00:07:56
> str(dat)
'data.frame': 4 obs. of 5 variables:
$ date : int 20090503 20090503 20090503 20090503
$ time : Factor w/ 4 levels "0:05:12","0:05:17",..: 1 2 3 4
$ val1 : num 107 108 110 106
$ val2 : int 1 20 5 5
$ Datetime: POSIXlt, format: "2009-05-03 00:05:12" "2009-05-03 00:05:17" ...
You can now delete date and time if you wish:
> dat <- dat[, -(1:2)]
> head(dat)
val1 val2 Datetime
1 107.25 1 2009-05-03 00:05:12
2 108.25 20 2009-05-03 00:05:17
3 110.25 5 2009-05-03 00:07:45
4 106.25 5 2009-05-03 00:07:56
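Once there is a single date-time column, statistics over a chosen time frame can be computed by cutting the times into buckets. This is a base R sketch and not part of the original answer: it stores the combined column as POSIXct rather than POSIXlt, fixes the time zone to UTC, and uses 2-minute buckets, all of which are assumptions made for illustration, as are the names `datetime`, `bucket`, and `stats`.

```r
dat <- read.csv(text = "date,time,val1,val2
20090503,0:05:12,107.25,1
20090503,0:05:17,108.25,20
20090503,0:07:45,110.25,5
20090503,0:07:56,106.25,5", stringsAsFactors = FALSE)

# Combine date and time into one POSIXct column (time zone assumed to be UTC)
dat$datetime <- as.POSIXct(paste(dat$date, dat$time),
                           format = "%Y%m%d %H:%M:%S", tz = "UTC")
stopifnot(!anyNA(dat$datetime))

# Assign each row to a 2-minute bucket; other widths such as "1 hour" also work
dat$bucket <- cut(dat$datetime, breaks = "2 min")

# Maximum of val1 per bucket; swap max for min or mean as needed
stats <- aggregate(val1 ~ bucket, data = dat, FUN = max)
stats
```

The bucket width passed to `breaks` is the only thing that needs to change to look at a different time frame.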

Resources