I am new to R and I am trying to use the hansard package.
Is there any way I could export the results of a query as JSON rather than as a tibble?
library(hansard)
library(tibble)
#example query
z <- mp_vote_record(172, "aye", start_date = "2017-01-01", end_date = "2017-05-03")
print(z)
Giving the output:
# A tibble: 38 x 5
about title uin date_value date_datatype
<chr> <chr> <chr> <dttm> <chr>
1 722300 Early Parl~ CD:20~ 2017-04-19 00:00:00 POSIXct
2 714865 Pension Sc~ CD:20~ 2017-03-29 00:00:00 POSIXct
3 714866 Pension Sc~ CD:20~ 2017-03-29 00:00:00 POSIXct
4 714868 Pension Sc~ CD:20~ 2017-03-29 00:00:00 POSIXct
5 713962 Bus Servic~ CD:20~ 2017-03-27 00:00:00 POSIXct
6 713963 Bus Servic~ CD:20~ 2017-03-27 00:00:00 POSIXct
7 714005 Bus Servic~ CD:20~ 2017-03-27 00:00:00 POSIXct
8 710264 Reproducti~ CD:20~ 2017-03-13 00:00:00 POSIXct
9 708770 Children a~ CD:20~ 2017-03-07 00:00:00 POSIXct
10 708773 Children a~ CD:20~ 2017-03-07 00:00:00 POSIXct
# ... with 28 more rows
You can convert the tibble to JSON with the jsonlite package. An example using the built-in data set iris:
library(dplyr)
library(jsonlite)
mydata <- as_tibble(iris)
toJSON(mydata)
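If the goal is a JSON file on disk rather than a string in the console, something along these lines may help. It is a minimal sketch: the first rows of iris stand in for a query result, and the file name is a hypothetical example.

```r
library(jsonlite)

# Stand-in for a query result; any data frame or tibble is handled the same way.
mydata <- head(iris, 3)

# One JSON object per row, indented for readability.
json_txt <- toJSON(mydata, dataframe = "rows", pretty = TRUE)

# Write straight to disk (hypothetical file name in the session's temp dir).
out_file <- file.path(tempdir(), "query_result.json")
write_json(mydata, out_file, pretty = TRUE)

# Read the file back to check the round trip.
back <- fromJSON(out_file)
```

Note that factor columns such as Species come back as plain character vectors after the round trip.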
I am trying to separate the events from streamflow data. I have hourly data. I have run the code
dailyMQ <- data.frame(Date=seq(from=as.Date("01.01.2000", format="%d.%m.%Y"),
to=as.Date("01.01.2004", format="%d.%m.%Y"), by="days"),
discharge=rbeta(1462,2,20)*100)
for daily data. I am now trying to do the same for hourly data, but I am getting errors.
Could anyone suggest how to write the code for hourly data?
Thanks
The Date class only has day resolution, so a Date sequence can't be stepped by hours.
You could use POSIXct datetime format:
HourlyMQ <- data.frame(
Date = seq(from = as.POSIXct("01.01.2019", format = "%d.%m.%Y"),
to = as.POSIXct("11.12.2019", format = "%d.%m.%Y"), by = "hours"),
discharge = rbeta(8257, 2, 20))
HourlyMQ
#> Date discharge
#> 1 2019-01-01 00:00:00 0.2452214482
#> 2 2019-01-01 01:00:00 0.0620291334
#> 3 2019-01-01 02:00:00 0.0608788870
#> 4 2019-01-01 03:00:00 0.0697449808
#> 5 2019-01-01 04:00:00 0.0302780135
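A variation that avoids hard-coding the number of rows is sketched below. It assumes UTC is acceptable for the index (an assumption, not something stated in the question) and uses a one-day range to keep the example small.

```r
# Build the hourly index first, with an explicit time zone so the result
# does not depend on the session's local daylight-saving rules.
hours <- seq(from = as.POSIXct("2019-01-01 00:00", tz = "UTC"),
             to   = as.POSIXct("2019-01-02 00:00", tz = "UTC"),
             by   = "hour")

# Size the random discharge vector from the index instead of typing a count.
HourlyMQ <- data.frame(Date = hours,
                       discharge = rbeta(length(hours), 2, 20) * 100)

nrow(HourlyMQ)  # both endpoints are included
```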
Let's say I have a csv file. For example, this one, https://www.misoenergy.org/planning/generator-interconnection/GI_Queue/gi-interactive-queue/#
If I do
miso_queue <- read_csv_arrow("GI Interactive Queue.csv", as_data_frame = FALSE, timestamp_parsers = "%m/%d/%Y")
miso_queue %>% collect()
# A tibble: 3,343 x 24
`Project #` `Request Status` `Queue Date` `Withdrawn Date` `Done Date` `Appl In Service ~` `Transmission ~` County State
<chr> <chr> <dttm> <dttm> <dttm> <dttm> <chr> <chr> <chr>
1 E002 Done 2013-09-12 20:00:00 NA 2003-12-12 19:00:00 NA Entergy Point~ LA
2 E291 Done 2012-05-14 20:00:00 NA 2013-10-21 20:00:00 2015-12-31 19:00:00 Entergy NA TX
3 G001 Withdrawn 1995-11-07 19:00:00 NA NA NA American Transm~ Brown~ WI
4 G002 Done 1998-11-30 19:00:00 NA NA NA LG&E and KU Ser~ Trimb~ KY
It seems like it's assuming the file is in GMT and then converts the GMT representation of the date to my local time zone (Eastern).
I can do Sys.setenv(TZ="GMT") before I load the file and then that avoids the offset issue.
Sys.setenv(TZ="GMT")
miso_queue <- read_csv_arrow("GI Interactive Queue.csv", as_data_frame = FALSE, timestamp_parsers = "%m/%d/%Y")
miso_queue %>% collect()
# A tibble: 3,343 x 24
`Project #` `Request Status` `Queue Date` `Withdrawn Date` `Done Date` `Appl In Service ~` `Transmission ~` County State
<chr> <chr> <dttm> <dttm> <dttm> <dttm> <chr> <chr> <chr>
1 E002 Done 2013-09-13 00:00:00 NA 2003-12-13 00:00:00 NA Entergy Point~ LA
2 E291 Done 2012-05-15 00:00:00 NA 2013-10-22 00:00:00 2016-01-01 00:00:00 Entergy NA TX
3 G001 Withdrawn 1995-11-08 00:00:00 NA NA NA American Transm~ Brown~ WI
4 G002 Done 1998-12-01 00:00:00 NA NA NA LG&E and KU Ser~ Trimb~ KY
While setting my session tz to GMT isn't really too onerous, I'm wondering if there's a way to have it either assume the file is the same as my local time zone and just keep it that way or if it wants to assume it's GMT in the file then just keep it in GMT regardless of my local timezone.
You wrote: "It seems like it's assuming the file is in GMT and then converts the GMT representation of the date to my local time zone (Eastern)."
Actually, the timezone conversion you are seeing just happens when you print. You can see this if you save the data frame to a variable and print it before and after you change your current timezone:
miso_queue <- read_csv_arrow("GI Interactive Queue.csv", as_data_frame = FALSE, timestamp_parsers = "%m/%d/%Y")
test <- miso_queue %>% collect()
Sys.setenv(TZ="US/Pacific")
test[,"Queue Date"]
# # A tibble: 3,343 × 1
# `Queue Date`
# <dttm>
# 1 2013-09-12 17:00:00
# 2 2012-05-14 17:00:00
# 3 1995-11-07 16:00:00
# 4 1998-11-30 16:00:00
# 5 1998-11-30 16:00:00
# 6 1998-11-30 16:00:00
# 7 1999-02-14 16:00:00
# 8 1999-02-14 16:00:00
# 9 1999-07-29 17:00:00
# 10 1999-08-12 17:00:00
# # … with 3,333 more rows
Sys.setenv(TZ="GMT")
test[,"Queue Date"]
# # A tibble: 3,343 × 1
# `Queue Date`
# <dttm>
# 1 2013-09-13 00:00:00
# 2 2012-05-15 00:00:00
# 3 1995-11-08 00:00:00
# 4 1998-12-01 00:00:00
# 5 1998-12-01 00:00:00
# 6 1998-12-01 00:00:00
# 7 1999-02-15 00:00:00
# 8 1999-02-15 00:00:00
# 9 1999-07-30 00:00:00
# 10 1999-08-13 00:00:00
# # … with 3,333 more rows
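The same effect can be reproduced without Arrow. In this small sketch a single hand-made POSIXct value stands in for one entry of the Queue Date column; only the time zone used for formatting changes, never the stored instant.

```r
# One stored instant: midnight UTC on 13 September 2013.
x <- as.POSIXct("2013-09-13 00:00:00", tz = "UTC")

fmt <- "%Y-%m-%d %H:%M:%S"

# Format the same value for two different display time zones.
in_utc     <- format(x, fmt, tz = "UTC")
in_pacific <- format(x, fmt, tz = "America/Los_Angeles")

in_utc      # "2013-09-13 00:00:00"
in_pacific  # "2013-09-12 17:00:00"
```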
However, in the example you showed there is no time data, so you might be better off reading that column as a date instead of a timestamp. Unfortunately, I think Arrow currently only lets you parse a column as a date if you provide the schema for the whole table. One alternative would be to convert the date columns after reading.
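Converting after reading could look roughly like this. The tibble below is a hand-made stand-in for the collected table, on the assumption that the timestamps represent midnight UTC as the output above suggests.

```r
library(dplyr)

# Stand-in for the result of collect(): timestamps at midnight UTC.
queue <- tibble(
  `Project #`  = c("E002", "E291"),
  `Queue Date` = as.POSIXct(c("2013-09-13", "2012-05-15"), tz = "UTC")
)

# Convert to Date, naming the time zone explicitly so the calendar day
# does not depend on the session's TZ setting.
queue <- queue %>%
  mutate(`Queue Date` = as.Date(`Queue Date`, tz = "UTC"))
```

Once the column is a Date there is no time of day left to shift when printing.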
I have a tibble with a date and return column, that looks as follows:
> head(return_series)
# A tibble: 6 x 2
date return
<chr> <dbl>
1 2002-01 0.0292
2 2002-02 0.0439
3 2002-03 0.0240
4 2002-04 0.00585
5 2002-05 -0.0169
6 2002-06 -0.0686
I first add the day to the date column with the following code:
library(zoo)  # provides as.yearmon()
return_series$date <- as.Date(as.yearmon(return_series$date))
# A tibble: 6 x 2
date return
<date> <dbl>
1 2002-01-01 0.0292
2 2002-02-01 0.0439
3 2002-03-01 0.0240
4 2002-04-01 0.00585
5 2002-05-01 -0.0169
6 2002-06-01 -0.0686
My goal is to convert the return_series tibble to xts data to use it for further analysis with the PerformanceAnalytics package. But when I use the command as.xts I receive the following error:
Error in as.POSIXlt.character(x, tz, ...) :
character string is not in a standard unambiguous format
How can I convert the tibble to xts, or is there another way to work with the PerformanceAnalytics package without converting to xts?
Thank you very much for your help!
You need to follow the xts documentation more closely:
> tb <- as_tibble(data.frame(date=as.Date("2002-01-01") + (0:5)*30,
+ return=rnorm(6)))
> tb
# A tibble: 6 × 2
date return
<date> <dbl>
1 2002-01-01 0.223
2 2002-01-31 -0.352
3 2002-03-02 0.149
4 2002-04-01 1.42
5 2002-05-01 -1.04
6 2002-05-31 0.507
>
> x <- xts(tb[,-1], order.by=as.POSIXct(tb[[1]]))
> x
return
2001-12-31 18:00:00 0.222619
2002-01-30 18:00:00 -0.352288
2002-03-01 18:00:00 0.149319
2002-03-31 18:00:00 1.421967
2002-04-30 19:00:00 -1.035087
2002-05-30 19:00:00 0.507046
>
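An xts object needs a time-based index, supplied through order.by. POSIXct works, as above, but converting a Date to POSIXct places each date at midnight UTC, which is why the index prints shifted into the previous evening in the local time zone. Passing the Date column directly to order.by avoids this: xts, like the closely related zoo, accepts a Date index.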
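For monthly data like that in the question, keeping the index as Date may be the simpler route. The sketch below rebuilds the first three rows of return_series by hand, so the data frame is an illustration rather than the asker's actual object.

```r
library(xts)  # also attaches zoo, which provides as.yearmon() and index()

# First three rows of the question's data, typed in by hand.
return_series <- data.frame(date   = c("2002-01", "2002-02", "2002-03"),
                            return = c(0.0292, 0.0439, 0.0240))

# "2002-01" -> yearmon -> first day of that month as a Date.
dates <- as.Date(as.yearmon(return_series$date))

# Pass only the numeric column as data and the dates as the index.
x <- xts(as.matrix(return_series["return"]), order.by = dates)

class(index(x))  # "Date"
```

The key point is that the date column goes into order.by and is left out of the data itself; leaving it in is a common way to end up with conversion errors like the one in the question.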
I have a dataset with a date-time vector (format is m/d/y h:m) that looks like this:
june2018_2$datetime
[1] "6/1/2018 1:00" "6/1/2018 2:00" "6/1/2018 3:00" "6/1/2018 4:00"
And I have 61 other variables that are all numeric (with some already missing values indicated with 'NA'). My date time vector is missing some hourly slots and I want to make the date-time vector full and fill in missing spots in the other 61 variables with 'NA'. I tried to use what's already out there but I can't seem to find some code or function that works for what I'm specifically working with. Any tips?
If your datetime column is not already POSIXct, convert it first with mutate. Then complete can fill in the missing hourly rows; the other columns in those new rows will be NA.
library(tidyverse)
df %>%
mutate(datetime = as.POSIXct(datetime, format = "%m/%d/%Y %H:%M")) %>%
complete(datetime = seq(from = first(datetime), to = last(datetime), by = "hours"))
For example, if you have test data:
set.seed(123)
df <- data.frame(
datetime = c("6/1/2018 1:00", "6/1/2018 3:00", "6/1/2018 5:00", "6/1/2018 9:00"),
var1 = sample(10,4)
)
The output would be:
# A tibble: 9 x 2
datetime var1
<dttm> <int>
1 2018-06-01 01:00:00 3
2 2018-06-01 02:00:00 NA
3 2018-06-01 03:00:00 10
4 2018-06-01 04:00:00 NA
5 2018-06-01 05:00:00 2
6 2018-06-01 06:00:00 NA
7 2018-06-01 07:00:00 NA
8 2018-06-01 08:00:00 NA
9 2018-06-01 09:00:00 8
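If the rows are not guaranteed to be in time order, first() and last() may not be the earliest and latest timestamps. A variation using min() and max() is sketched here, with made-up unsorted data and an explicit UTC time zone as assumptions.

```r
library(dplyr)
library(tidyr)

# Made-up, deliberately unsorted example data.
df <- data.frame(
  datetime = c("6/1/2018 5:00", "6/1/2018 1:00", "6/1/2018 3:00"),
  var1     = c(2, 3, 10)
)

filled <- df %>%
  mutate(datetime = as.POSIXct(datetime, format = "%m/%d/%Y %H:%M", tz = "UTC")) %>%
  # min()/max() give the true range regardless of row order.
  complete(datetime = seq(min(datetime), max(datetime), by = "hour"))

nrow(filled)             # 5 hourly rows from 01:00 to 05:00
sum(is.na(filled$var1))  # 2 rows were added and filled with NA
```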
My data frame has timestamps with and without seconds, and inconsistent leading zeros on months and hours, i.e. sometimes 01 and sometimes 1.
library(tidyverse)
df <- tibble(cust=c('A','A','B','B'), timestamp=c('5/31/2016 1:03:12', '05/25/2016 01:06',
'6/16/2016 01:03', '12/30/2015 23:04:25'))
cust timestamp
A 5/31/2016 1:03:12
A 05/25/2016 01:06
B 6/16/2016 01:03
B 12/30/2015 23:04:25
How to extract hours into a separate column? The desired output:
cust timestamp hours
A 5/31/2016 1:03:12 1
A 05/25/2016 01:06 1
B 6/16/2016 9:03 9
B 12/30/2015 23:04:25 23
I prefer the answer with tidyverse and mutate, but my attempt fails to extract hours correctly:
df %>% mutate(hours=strptime(timestamp, '%H') %>% as.character() )
# A tibble: 4 × 3
cust timestamp hours
<chr> <chr> <chr>
1 A 5/31/2016 1:03:12 2016-10-31 05:00:00
2 A 05/25/2016 01:06 2016-10-31 05:00:00
3 B 6/16/2016 01:03 2016-10-31 06:00:00
4 B 12/30/2015 23:04:25 2016-10-31 12:00:00
Try this:
library(lubridate)
df <- data.frame(cust=c('A','A','B','B'), timestamp=c('5/31/2016 1:03:12', '05/25/2016 01:06',
'6/16/2016 09:03', '12/30/2015 23:04:25'))
df %>% mutate(hours=hour(strptime(timestamp, '%m/%d/%Y %H:%M')) %>% as.character() )
cust timestamp hours
1 A 5/31/2016 1:03:12 1
2 A 05/25/2016 01:06 1
3 B 6/16/2016 09:03 9
4 B 12/30/2015 23:04:25 23
Here is a solution that appends :00 for the seconds when they are missing, then converts to a date-time using lubridate and extracts the hours using format. Note that if you don't want the :00:00 at the end of the hours, you can drop it from the output format passed to format:
df %>%
mutate(
cleanTime = ifelse(grepl(":[0-9][0-9]:", timestamp)
, timestamp
, paste0(timestamp, ":00")) %>% mdy_hms
, hour = format(cleanTime, "%H:00:00")
)
returns:
cust timestamp cleanTime hour
<chr> <chr> <dttm> <chr>
1 A 5/31/2016 1:03:12 2016-05-31 01:03:12 01:00:00
2 A 05/25/2016 01:06 2016-05-25 01:06:00 01:00:00
3 B 6/16/2016 01:03 2016-06-16 01:03:00 01:00:00
4 B 12/30/2015 23:04:25 2015-12-30 23:04:25 23:00:00
Your timestamp is a character string (<chr>), so you need to parse it into a date-time (with as.POSIXct or strptime, for example) before you can extract the hours. as.Date is not suitable here because it discards the time of day.
The mixed formats are the main obstacle. Leading zeros are optional when strptime reads %m, %d and %H, so months and hours with a single digit parse as they are; only the missing seconds need handling. Either append :00 to the timestamps that lack seconds (with grepl() and paste0(), for example) and parse everything with as.POSIXct(df$timestamp, format = '%m/%d/%Y %H:%M:%S'), or parse with '%m/%d/%Y %H:%M' and let any trailing seconds be ignored. Then use format(x, '%H') or lubridate::hour() to extract the hours.
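As an alternative to manual string manipulation, lubridate's parse_date_time() can be given several candidate orders and will try each of them. The sketch below uses the timestamps from the answer above; the choice of UTC is an assumption made only to keep the result independent of the local time zone.

```r
library(dplyr)
library(lubridate)

df <- data.frame(
  cust      = c("A", "A", "B", "B"),
  timestamp = c("5/31/2016 1:03:12", "05/25/2016 01:06",
                "6/16/2016 09:03", "12/30/2015 23:04:25")
)

df <- df %>%
  mutate(
    # Try month/day/year with seconds first, then without seconds.
    parsed = parse_date_time(timestamp, orders = c("mdy HMS", "mdy HM"), tz = "UTC"),
    # hour() returns the hour of day as an integer.
    hours  = hour(parsed)
  )

df$hours
```

Because no padding or appending is done, the original timestamp column is left untouched.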