Changing time zone in Highcharts in R

I am plotting a chart with the highcharter package. You can see that the timestamps start from June 29th, but when I plot the data, the graph starts from June 28 at 18:30. How do I change this time zone?
> head(d)
timestamps x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12
47948 2017-06-29 00:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 48.5 1210.87
47949 2017-06-29 00:01:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 49.2 1213.91
47950 2017-06-29 00:02:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 49.0 1213.59
47951 2017-06-29 00:03:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 50.0 1214.28
47952 2017-06-29 00:04:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 50.0 1212.13
47953 2017-06-29 00:05:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 49.8 1216.06
library(highcharter)
highchart() %>%
  hc_title(text = "A nice chart") %>%
  hc_add_series_times_values(d$timestamps, d$x12, name = "x12")
Any help is appreciated. Thank you.

This is how I managed to deactivate UTC in highcharter:
hcGopts <- getOption("highcharter.global")
hcGopts$useUTC <- FALSE
options(highcharter.global = hcGopts)
The global options are not exposed through a dedicated hc_*() helper, which is why they are set through the package option above. From JavaScript, the equivalent would be:
Highcharts.setOptions({
global: {
useUTC: false
}
});
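Putting it together, a minimal sketch (assuming d is the data frame from the question); set the option before building the widget so it is in effect when the chart is rendered:
library(highcharter)

# Disable UTC so the chart displays timestamps in the browser's local time
hcGopts <- getOption("highcharter.global")
hcGopts$useUTC <- FALSE
options(highcharter.global = hcGopts)

highchart() %>%
  hc_title(text = "A nice chart") %>%
  hc_add_series_times_values(d$timestamps, d$x12, name = "x12")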

Aggregating hourly data into monthly in R while omitting NAs [duplicate]

I have some data gathered from a weather buoy:
station longitude latitude time wd wspd gst wvht dpd apd mwd bar
42001 -89.658 25.888 1975-08-13T22:00:00Z 23 4.1 NaN NaN NaN NaN NaN 1017.4
42001 -89.658 25.888 1975-08-13T23:00:00Z 59 3.1 NaN NaN NaN NaN NaN 1017.3
42001 -89.658 25.888 1975-08-14T00:00:00Z 30 5.2 NaN NaN NaN NaN NaN 1017.4
42001 -89.658 25.888 1975-08-14T01:00:00Z 70 2 NaN NaN NaN NaN NaN 1017.8
42001 -89.658 25.888 1975-08-14T02:00:00Z 87 5.7 NaN NaN NaN NaN NaN 1018.2
42001 -89.658 25.888 1975-08-14T03:00:00Z 105 5.6 NaN NaN NaN NaN NaN 1018.6
42001 -89.658 25.888 1975-08-14T04:00:00Z 116 5.8 NaN NaN NaN NaN NaN 1018.7
42001 -89.658 25.888 1975-08-14T05:00:00Z 116 5 NaN NaN NaN NaN NaN 1018.5
42001 -89.658 25.888 1975-08-14T06:00:00Z 123 4.5 NaN NaN NaN NaN NaN 1018.1
42001 -89.658 25.888 1975-08-14T07:00:00Z 137 4.1 NaN NaN NaN NaN NaN 1017.9
42001 -89.658 25.888 1975-08-14T08:00:00Z 151 3.6 NaN NaN NaN NaN NaN 1017.7
42001 -89.658 25.888 1975-08-14T09:00:00Z 153 3.5 NaN NaN NaN NaN NaN 1017.6
42001 -89.658 25.888 1975-08-14T10:00:00Z 180 3.5 NaN NaN NaN NaN NaN 1017.7
42001 -89.658 25.888 1975-08-14T11:00:00Z 189 2.8 NaN NaN NaN NaN NaN 1018
42001 -89.658 25.888 1975-08-14T12:00:00Z 183 1.7 NaN NaN NaN NaN NaN 1018.3
42001 -89.658 25.888 1975-08-14T13:00:00Z 172 0.7 NaN NaN NaN NaN NaN 1018.8
42001 -89.658 25.888 2001-11-18T11:00:00Z 38 7.3 8.8 1.1 6.67 4.51 69 1021
42001 -89.658 25.888 2001-11-18T12:00:00Z 29 7.9 9.3 1.01 5.88 4.42 57 1021.4
42001 -89.658 25.888 2001-11-18T13:00:00Z 29 7.4 8.3 1.02 7.14 4.42 65 1022.1
42001 -89.658 25.888 2001-11-18T14:00:00Z 23 8 9.5 0.97 5.56 4.48 55 1022.6
42001 -89.658 25.888 2001-11-18T15:00:00Z 16 7.6 8.9 1 6.67 4.5 64 1023.2
42001 -89.658 25.888 2001-11-18T16:00:00Z 26 8.9 10.2 0.94 4.17 4.49 29 1023.1
42001 -89.658 25.888 2001-11-18T17:00:00Z 26 8.5 10.2 0.98 4.55 4.48 36 1022.7
42001 -89.658 25.888 2001-11-18T18:00:00Z 17 7.8 9.1 1.07 4.76 4.56 30 1021.9
42001 -89.658 25.888 2001-11-18T19:00:00Z 24 8.1 9.1 1.07 4.55 4.6 29 1021
42001 -89.658 25.888 2001-11-18T20:00:00Z 18 8.3 11.1 1.21 6.25 4.6 69 1020
42001 -89.658 25.888 2001-11-18T21:00:00Z 30 8 9.4 1.2 6.67 4.72 77 1019.8
42001 -89.658 25.888 2001-11-18T22:00:00Z 39 8.2 9.6 1.32 6.67 4.8 76 1019.8
42001 -89.658 25.888 2001-11-18T23:00:00Z 32 8.5 9.6 1.21 6.67 4.63 71 1019.7
42001 -89.658 25.888 2001-11-19T00:00:00Z 38 8.9 10.3 1.28 6.25 4.6 72 1019.8
42001 -89.658 25.888 2001-11-19T01:00:00Z 48 8.3 9.6 1.26 6.67 4.53 71 1020.2
42001 -89.658 25.888 2001-11-19T02:00:00Z 54 10.1 11.6 1.28 6.67 4.59 65 1021.1
42001 -89.658 25.888 2001-11-19T03:00:00Z 60 3 4.7 1.29 5.88 4.58 72 1021.5
42001 -89.658 25.888 2001-11-19T04:00:00Z 77 0.8 1.7 1.25 6.67 4.92 63 1021.2
42001 -89.658 25.888 2001-11-19T05:00:00Z 153 2.1 3 1.21 6.67 4.91 64 1021
42001 -89.658 25.888 2001-11-19T06:00:00Z 20 2.2 5.5 1.18 6.25 4.92 65 1020.6
42001 -89.658 25.888 2001-11-19T07:00:00Z 158 6.2 9.7 1.31 6.67 5.22 67 1020.3
42001 -89.658 25.888 2001-11-19T08:00:00Z 162 7.4 9 1.26 6.67 5.42 73 1020.1
42001 -89.658 25.888 2001-11-19T09:00:00Z 218 4.8 6.2 1.2 7.69 4.98 65 1019.9
How could I create a data frame from aggregating the data (using the mean) on a monthly basis while leaving out the NaN values? The start of the data has numerous rows with NaN, but for several years there are values in those rows.
I've tried:
DF2 <- transform(buoy1, time = substring(time, 1, 7))
aggregate(as.numeric(wd) ~ time, DF2[-1,], mean, na.rm = TRUE)
Which generates:
401 2010-09 109.20556
402 2010-10 107.42473
403 2010-11 130.67222
404 2010-12 135.75000
405 2011-01 156.11306
406 2011-02 123.33931
407 2011-03 137.29744
408 2011-04 119.85139
409 2011-05 148.65276
410 2011-06 104.74722
411 2011-07 88.16393
412 2011-09 106.60229
413 2011-10 93.32527
414 2011-11 149.52712
415 2011-12 123.09005
416 2012-01 145.38731
417 2012-02 115.40288
418 2012-03 127.44415
419 2012-04 133.02503
420 2012-05 122.34683
421 2012-06 146.95265
422 2012-07 133.58199
423 2012-08 149.08356
Is there a more efficient way to aggregate across all the columns at once?
Something like
DF2[,5:20] <- sapply(DF2[,5:20], as.numeric, na.rm=TRUE)
monthAvg <- aggregate(DF2[, 5:20], cut(time, "month"),mean)
But then I get:
Error in cut.default(time, "month") : 'x' must be numeric
Here is a base R solution:
d <- within(buoy1[-1:-3], time <- format(as.POSIXct(time), "%Y-%m"))
aggregate(. ~ time, d, mean, na.rm = TRUE, na.action = NULL)
# "." means anything other than the RHS, which is `time` in this case
Output
time wd wspd gst wvht dpd apd mwd bar
1 1975-08 118.37500 3.806250 NaN NaN NaN NaN NaN 1018.000
2 2001-11 58.04348 6.882609 8.452174 1.157391 6.186957 4.690435 61.04348 1021.043
You could create a new column with the year and month information and take the mean of multiple columns using across().
library(dplyr)

result <- df %>%
  group_by(time = format(as.POSIXct(time), '%Y-%m')) %>%
  summarise(across(gst:bar, ~ mean(.x, na.rm = TRUE)))
result

Get a text file into a usable data frame in R

I am trying to read a .txt file from the internet and get it into a usable form in R. It seems like it should be easy, but I am struggling:
Data is from Berkeley Earth:
b_earth_url <- 'http://berkeleyearth.lbl.gov/auto/Global/Land_and_Ocean_complete.txt'
I have tried the following:
read.table(b_earth_url, sep = '\t', comment.char = '%', row.names = NULL)
or:
b_earth_data <- readLines(b_earth_url)[!grepl('%', readLines(b_earth_url))]
data.frame(b_earth_data, stringsAsFactors = F)
I have tried a few other options, but can't get past a data frame with a single variable containing a fixed-width character vector. I have tried extract(), separate(), and strsplit(), and can't get any of them to work; I don't think I know how to specify a fixed-width separator for sep =.
The separator is whitespace (blanks), not tabs. read.table's default sep = "" already matches any run of whitespace, so simply dropping the sep argument works:
out <- read.table(b_earth_url, comment.char = '%')
head(out)
# V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
# 1 1850 1 -0.781 0.382 NaN NaN NaN NaN NaN NaN NaN NaN
# 2 1850 2 -0.260 0.432 NaN NaN NaN NaN NaN NaN NaN NaN
# 3 1850 3 -0.399 0.348 NaN NaN NaN NaN NaN NaN NaN NaN
# 4 1850 4 -0.696 0.296 NaN NaN NaN NaN NaN NaN NaN NaN
# 5 1850 5 -0.690 0.320 NaN NaN NaN NaN NaN NaN NaN NaN
# 6 1850 6 -0.392 0.228 -0.529 0.147 NaN NaN NaN NaN NaN NaN
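Since the %-prefixed header block is skipped, the columns come back as V1 through V12. A hedged sketch for attaching readable names; the names below are illustrative guesses, so check them against the file's own % comment header:
# Illustrative names only -- verify against the "%" header block in the file
names(out) <- c("year", "month",
                "monthly_anomaly", "monthly_unc",
                "annual_anomaly", "annual_unc",
                "five_year_anomaly", "five_year_unc",
                "ten_year_anomaly", "ten_year_unc",
                "twenty_year_anomaly", "twenty_year_unc")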

Scrape Wikipedia Using Python, Beautiful Soup

I have some challenges with a wiki table and hope someone who has done this before can give me advice. From the wikitable mw-collapsible table I need to get the data into a pandas data frame (the code below does not work); I am not sure how to get this going. This initial attempt to pull the data raises ValueError: Length of values does not match length of index. I will appreciate your help!
import urllib.request
url = "https://en.wikipedia.org/wiki/2020_coronavirus_pandemic_in_South_Africa"
page = urllib.request.urlopen(url)
from bs4 import BeautifulSoup
soup = BeautifulSoup(page, "lxml")
# use the 'find_all' function to bring back all instances of the 'table' tag in the HTML and store in 'all_tables' variable
all_tables=soup.find_all("table")
all_tables
right_table=soup.find('table', class_='wikitable mw-collapsible')
right_table
A=[]
B=[]
C=[]
D=[]
E=[]
F=[]
G=[]
H=[]
I=[]
J=[]
K=[]
L=[]
M=[]
N=[]
O=[]
P=[]
Q=[]
U=[]
for row in right_table.findAll('tr'):
    cells = row.findAll('td')
    if len(cells) == 17:
        A.append(cells[0].find(text=True))
        B.append(cells[1].find(text=True))
        C.append(cells[2].find(text=True))
        D.append(cells[3].find(text=True))
        E.append(cells[4].find(text=True))
        F.append(cells[5].find(text=True))
        G.append(cells[6].find(text=True))
        H.append(cells[7].find(text=True))
        I.append(cells[8].find(text=True))
        J.append(cells[9].find(text=True))
        K.append(cells[10].find(text=True))
        L.append(cells[11].find(text=True))
        M.append(cells[12].find(text=True))
        N.append(cells[13].find(text=True))
        P.append(cells[14].find(text=True))
        Q.append(cells[15].find(text=True))
        U.append(cells[16].find(text=True))
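# Note: O is initialised above but never appended inside the loop
# (cells[14] goes to P), so O stays empty and the assignment
# df['TOTAL'] = O below is what raises the ValueError.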
import pandas as pd
df=pd.DataFrame(A,columns=['DATE'])
df['EC']=B
df['FS']=C
df['GAU']=D
df['KJN']=F
df['LIM']=G
df['MPU']=H
df['NW']=I
df['NC']=J
df['WC']=K
df['NEW']=L
df['TOTAL']=M
df['NEW']=N
df['TOTAL']=O
df['REC']=P
df['TESTED']=Q
df['REF']=U
df
Awful lot of work to get into a dataframe when pandas has the read_html() function to do precisely that (it actually uses BeautifulSoup under the hood). read_html() returns a list of dataframes (i.e., one per <table> tag in the HTML); it's just a matter of pulling out the one you want.
import pandas as pd
url = "https://en.wikipedia.org/wiki/2020_coronavirus_pandemic_in_South_Africa"
dfs = pd.read_html(url)
df = dfs[3]
Output:
print (df.to_string())
Date EC FS GP KZN LP MP NW NC WC Confirmed Deaths Rec Tested Ref
Date EC FS GP KZN LP MP NW NC WC New Total New Total Rec Tested Ref
0 2020-03-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.0 NaN NaN NaN 181 [22]
1 2020-03-05 NaN NaN NaN 1.0 NaN NaN NaN NaN NaN 1.0 1.0 NaN NaN NaN NaN [2]
2 2020-03-06 NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.0 1.0 NaN NaN NaN NaN NaN
3 2020-03-07 NaN NaN 1.0 NaN NaN NaN NaN NaN NaN 1.0 2.0 NaN NaN NaN NaN [11]
4 2020-03-08 NaN NaN NaN 1.0 NaN NaN NaN NaN NaN 1.0 3.0 NaN NaN NaN NaN [23]
5 2020-03-09 NaN NaN NaN 4.0 NaN NaN NaN NaN NaN 4.0 7.0 NaN NaN NaN NaN [24]
6 2020-03-10 NaN NaN 2.0 1.0 NaN NaN NaN NaN NaN 3.0 10.0 NaN NaN NaN 239 [25]
7 2020-03-11 NaN NaN 2.0 NaN NaN NaN NaN NaN 1.0 3.0 13.0 NaN NaN NaN 645 [12][26]
8 2020-03-12 NaN 0.0 1.0 1.0 NaN 1.0 NaN NaN NaN 3.0 16.0 NaN NaN NaN 848 [27][28][29]
9 2020-03-13 NaN NaN 4.0 2.0 NaN NaN NaN NaN 2.0 8.0 24.0 NaN NaN NaN 924 [30][31]
10 2020-03-14 NaN NaN 7.0 1.0 NaN NaN NaN NaN 6.0 14.0 38.0 NaN NaN NaN 1017 [32][33]
11 2020-03-15 NaN NaN 7.0 1.0 NaN NaN NaN NaN 5.0 13.0 51.0 NaN NaN NaN 1476 [34][3][35]
12 2020-03-16 NaN NaN 7.0 NaN 1.0 1.0 NaN NaN 2.0 11.0 62.0 NaN NaN NaN 2405 [17][36]
13 2020-03-17 NaN NaN 14.0 4.0 NaN NaN NaN NaN 5.0 23.0 85.0 NaN NaN NaN 2911 [18][37]
14 2020-03-18 NaN NaN 16.0 3.0 NaN 2.0 NaN NaN 10.0 31.0 116.0 NaN NaN NaN 3070 [38][19][39]
15 2020-03-19 NaN NaN 15.0 3.0 NaN 1.0 NaN NaN 15.0 34.0 150.0 NaN NaN NaN 4832 [40][41][42]
16 2020-03-20 NaN 7.0 33.0 1.0 NaN NaN NaN NaN 11.0 52.0 202.0 NaN NaN 2 6438 [43][44]
17 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
18 Cases 0.0 7.0 109.0 24.0 1.0 5.0 0.0 0.0 56.0 NaN NaN NaN including local transmission including local transmission including local transmission including local transmission
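Note that read_html gives this table a two-level column header (visible as the two header rows in the printout above). A small sketch, assuming flat single-level names are wanted, for collapsing the MultiIndex; as an aside, read_html's match argument (e.g. match='Confirmed') can also pre-filter the returned tables instead of hard-coding dfs[3]:
# Collapse the two-level header into single strings like "Confirmed New"
df.columns = [' '.join(col).strip() if isinstance(col, tuple) else col
              for col in df.columns]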

How to concatenate datetime instance to date in pandas?

I have a dataset which has a timestamp. I cannot feed the raw timestamps into a regression model, so I want to collapse each timestamp to its date and group the rows which fall on the same date. How do I go about doing that?
Example data set
print(processed_df.head())
date day isWeekend distance time
15 2016-07-06 14:43:53.923 Tuesday False 0.000 239.254
17 2016-07-07 09:24:53.928 Wednesday False 0.000 219.191
18 2016-07-07 09:33:02.291 Wednesday False 0.000 218.987
37 2016-07-14 22:03:23.355 Wednesday False 0.636 205.000
46 2016-07-14 23:51:49.696 Wednesday False 0.103 843.000
Now I would like the date to be the index, with all rows for the same date combined into a single row by adding up the distance and time.
My attempt on same.
print(new_df.groupby('date').mean().head())
distance time
date
2016-07-06 14:43:53.923 0.0 239.254
2016-07-07 09:24:53.928 0.0 219.191
2016-07-07 09:33:02.291 0.0 218.987
2016-07-07 11:28:26.920 0.0 519.016
2016-07-08 11:59:02.044 0.0 398.971
This failed because each date value still carries its time component, so every row ends up in its own group.
Desired output
distance time
date
2016-07-06 0.0 239.254
2016-07-07 0.0 957.194
2016-07-08 0.0 398.971
I think you need groupby by dt.date:
#cast if dtype is not datetime
df.date = pd.to_datetime(df.date)
print (df.groupby([df.date.dt.date])[['distance', 'time']].mean())
distance time
date
2016-07-06 0.0000 239.254
2016-07-07 0.0000 219.089
2016-07-14 0.3695 524.000
Another solution uses resample, but then the all-NaN rows (days with no observations) need removing with dropna:
print (df.set_index('date').resample('D')[['distance', 'time']].mean())
distance time
date
2016-07-06 0.0000 239.254
2016-07-07 0.0000 219.089
2016-07-08 NaN NaN
2016-07-09 NaN NaN
2016-07-10 NaN NaN
2016-07-11 NaN NaN
2016-07-12 NaN NaN
2016-07-13 NaN NaN
2016-07-14 0.3695 524.000
print (df.set_index('date').resample('D')[['distance', 'time']].mean().dropna())
distance time
date
2016-07-06 0.0000 239.254
2016-07-07 0.0000 219.089
2016-07-14 0.3695 524.000
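One caveat: the desired output above shows 957.194 for 2016-07-07, which is the sum of that day's times (219.191 + 218.987 + 519.016), not the mean. If summing is really what's wanted, the same groupby works with sum():
print (df.groupby([df.date.dt.date])[['distance', 'time']].sum())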

Time Series parsing - PI Data

I have a very large time series data set in the following format.
"Tag.1","1/22/2015 11:59:54 PM","570.29895",
"Tag.1","1/22/2015 11:59:56 PM","570.29895",
"Tag.1","1/22/2015 11:59:58 PM","570.29895",
"Tag.1","1/23/2015 12:00:00 AM","649.67133",
"Tag.2","1/22/2015 12:00:02 AM","1.21",
"Tag.2","1/22/2015 12:00:04 AM","1.21",
"Tag.2","1/22/2015 12:00:06 AM","1.21",
"Tag.2","1/22/2015 12:00:08 AM","1.21",
"Tag.2","1/22/2015 12:00:10 AM","1.21",
"Tag.2","1/22/2015 12:00:12 AM","1.21",
I would like to separate this out into a data frame with a common column for the time stamp and one column each for the tags.
Date.Time, Tag.1, Tag.2, Tag.3...
1/22/2015 11:59:54 PM,570.29895,
Any suggestions would be appreciated!
Maybe something like this:
cast(df, V2 ~ V1, mean, value = 'V3')
V2 Tag.1 Tag.2
1 1/22/2015 11:59:54 PM 570.2989 NaN
2 1/22/2015 11:59:56 PM 570.2989 NaN
3 1/22/2015 11:59:58 PM 570.2989 NaN
4 1/22/2015 12:00:02 AM NaN 1.21
5 1/22/2015 12:00:04 AM NaN 1.21
6 1/22/2015 12:00:06 AM NaN 1.21
7 1/22/2015 12:00:08 AM NaN 1.21
8 1/22/2015 12:00:10 AM NaN 1.21
9 1/22/2015 12:00:12 AM NaN 1.21
10 1/23/2015 12:00:00 AM 649.6713 NaN
cast() is part of the reshape package.
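reshape has long been superseded; a hedged modern equivalent with tidyr, assuming the unnamed columns read in as V1 = tag, V2 = timestamp, and V3 = value as above:
library(dplyr)
library(tidyr)

df %>%
  mutate(V3 = as.numeric(V3)) %>%    # the values arrive quoted, so coerce
  pivot_wider(names_from = V1, values_from = V3,
              values_fn = mean)      # mean, as in the cast() call above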
