I have bike data that looks like this - the dimensions of the data frame are large.
> dim(All_2014)
[1] 994367 10
> head(All_2014)
X bikeid end.station.id start.station.id diff.time stoptime starttime
1 1 16379 285 356 338387 2014-01-02 15:22:28 2014-01-06 13:22:15
2 2 16379 361 146 47631 2014-01-09 22:45:34 2014-01-10 11:59:25
3 3 16379 268 327 5089 2014-01-10 12:35:22 2014-01-10 14:00:11
4 4 16379 398 324 715924 2014-01-22 14:34:55 2014-01-30 21:26:59
5 5 15611 536 445 716031 2014-01-02 15:30:44 2014-01-10 22:24:35
6 6 15611 348 433 68544 2014-01-12 14:03:01 2014-01-13 09:05:25
midtime Hour Day
1 2014-01-04 14:22:21 14 Saturday
2 2014-01-10 05:22:29 5 Friday
3 2014-01-10 13:17:46 13 Friday
4 2014-01-26 18:00:57 18 Sunday
5 2014-01-06 18:57:39 18 Monday
6 2014-01-12 23:34:13 23 Sunday
My aim is to create a heat map using ggplot2 (or another package if it is better suited) that looks like this one, where day of the week is on the y-axis and hour is on the x-axis (the hour does not have to be in AM/PM; it can remain on the 24-hour scale):
The fill of each box is a percentage: the number of rides taken within a given hour divided by the total rides on that day of the week. I have managed to get this far with the data, but would like to know the easiest way to compute these percentages and then how to create a heat map from them.
Using dplyr to do the calculations, and ggplot2 to do the chart:
library(dplyr)
library(ggplot2)
## First simulate some data
rider_num <- 1:10000
days <- factor(c("Sun", "Mon", "Tues", "Wed", "Thur", "Fri", "Sat"),
levels = rev(c("Sun", "Mon", "Tues", "Wed", "Thur", "Fri", "Sat")),
ordered = TRUE)
day <- sample(days, 10000, TRUE,
c(0.3, 0.5, 0.8, 0.8, 0.6, 0.5, 0.2))
hour <- round(rbeta(10000, 1, 2, 6) * 23)
df <- data.frame(rider_num, hour, day)
## Use dplyr functions to summarize on days and hours to get the
## percentage of riders per hour each day:
df2 <- df %>%
group_by(day, hour) %>%
summarise(n=n()) %>%
mutate(percent_of_riders=n/sum(n)*100)
## Plot using ggplot and geom_tile, tweaking colours and theme elements
## to your liking:
ggplot(df2, aes(hour, day)) +
geom_tile(aes(fill = percent_of_riders), colour = "white") +
scale_fill_distiller(palette = "YlGnBu", direction = 1) +
scale_x_continuous(breaks = 0:23) + # hour is numeric, so use a continuous scale
theme_minimal() +
theme(legend.position = "bottom", legend.key.width = unit(2, "cm"),
panel.grid = element_blank()) +
coord_equal()
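As a cross-check on the dplyr step above, the same per-day percentages can be computed in base R with table() and prop.table(). This is only a sketch on simulated data: the `rides` data frame and its values are made up for illustration, with `Day` and `Hour` standing in for the columns of the same name in All_2014.

```r
# Simulated stand-in for All_2014 (values are illustrative only)
set.seed(42)
rides <- data.frame(
  Day  = sample(c("Monday", "Tuesday", "Wednesday"), 500, replace = TRUE),
  Hour = sample(0:23, 500, replace = TRUE)
)

# Count rides per day (rows) and hour (columns)
counts <- table(rides$Day, rides$Hour)

# margin = 1 divides each row by its row total, i.e. by the rides on that day
pct <- prop.table(counts, margin = 1) * 100

# Long format, ready for geom_tile() or levelplot()
pct_long <- as.data.frame(pct, responseName = "percent_of_riders")
names(pct_long)[1:2] <- c("Day", "Hour")
pct_long$Hour <- as.integer(as.character(pct_long$Hour))
```

Because table() keeps every day/hour combination, cells with no rides appear as 0 rather than being dropped.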
Using @andyteucher's df2, here's a lattice approach:
library(lattice)
library(RColorBrewer)
levelplot(percent_of_riders~hour+day, df2,
aspect='iso', xlab='', ylab='', border='white',
col.regions=colorRampPalette(brewer.pal(9, 'YlGnBu')),
at=seq(0, 12, length=100), # specify breaks for the colour ramp
scales=list(alternating=FALSE, tck=1:0, x=list(at=0:23)))
One simple way to replace missing data (e.g. Sunday at midnight) with zero is to pass an xtabs object to levelplot instead:
levelplot(xtabs(percent_of_riders ~ hour+day, df2), aspect='iso', xlab='', ylab='',
col.regions=colorRampPalette(brewer.pal(9, 'YlGnBu')),
at=seq(0, 12, length=100),
scales=list(alternating=FALSE, tck=1:0),
border='white')
You can also use d3heatmap for interactivity:
library(d3heatmap)
xt <- xtabs(percent_of_riders~day+hour, df2)
d3heatmap(xt[7:1, ], colors='YlGnBu', dendrogram = "none")
I have a facet plot that I need to place a rectangle in or highlight 3 specific facets. Facets 5, 6, and 10. See Below:
I found some code referring to "geom_rect" that seems like it may work but it won't show up, also doesn't give me any error message. Here is the code:
weekly_TS_PDF<- ggplot(TS_stack, aes(x= TS_log, y = TS_depth, color= sentiment)) +
scale_y_reverse(limits= c(16,2), breaks= seq(16,2)) +
geom_rect(data = data.frame(Week = 5), aes(xmin = -65, xmax = -55, ymin = 1, ymax = 16), alpha = .3, fill="grey", inherit.aes = F) +
geom_point() + facet_grid(.~ Week) + geom_hline(data = week_avg_15E, aes(yintercept = x), linetype = "solid") +
ylab("Target Depth (m)") + xlab("Mean Target Strength (dB)") + ggtitle("Mean TS by Depth by Week (12 hour resolution)") +
guides(color=guide_legend("Year"))
Reprex data:
X TS_depth Group.1 x TS_log Date_time AMPM Week sentiment
1 1 9.593093 2020-12-01 18:00:00 5.390264e-07 -62.68390 2020-12-01 18:00:00 PM 5 Year 1
2 2 9.550032 2020-12-02 06:00:00 4.022841e-07 -63.95467 2020-12-02 06:00:00 AM 6 Year 1
3 3 9.677069 2020-12-02 18:00:00 6.277191e-07 -62.02235 2020-12-02 18:00:00 PM 7 Year 1
4 4 9.679256 2020-12-03 06:00:00 3.501608e-07 -64.55732 2020-12-03 06:00:00 AM 8 Year 1
5 5 9.606380 2020-12-03 18:00:00 6.698625e-07 -61.74014 2020-12-03 18:00:00 PM 9 Year 1
6 6 9.548408 2020-12-04 06:00:00 4.464622e-07 -63.50215 2020-12-04 06:00:00 AM 10 Year 1
I just need to highlight or put a rectangle in facets 5,6, and 10. Any help is appreciated.
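One possible approach, sketched here on made-up data rather than the real TS_stack: give geom_rect() a small data frame with one row per facet to highlight, and use infinite bounds so the rectangle always spans the whole panel. A likely reason the rectangle above does not appear is that ymin = 1 lies outside the limits c(16, 2) set in scale_y_reverse(), so the row is dropped as out of bounds; that is an assumption, not something verified against the original data.

```r
library(ggplot2)

# Toy stand-in for TS_stack (values are made up for illustration)
set.seed(1)
TS_stack <- data.frame(
  Week     = rep(1:10, each = 5),
  TS_log   = runif(50, -65, -55),
  TS_depth = runif(50, 2, 16)
)

# One row per facet to highlight; the column name must match the facet variable
highlight <- data.frame(Week = c(5, 6, 10))

p <- ggplot(TS_stack, aes(x = TS_log, y = TS_depth)) +
  # Infinite bounds fill the panel and are not removed by the scale limits
  geom_rect(data = highlight,
            aes(xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = Inf),
            fill = "grey", alpha = 0.3, inherit.aes = FALSE) +
  geom_point() +
  scale_y_reverse(limits = c(16, 2), breaks = seq(16, 2)) +
  facet_grid(. ~ Week)
```

Printing `p` should shade only the panels for weeks 5, 6 and 10; drawing the rectangle before geom_point() keeps the points on top.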
There exist several functions in R in libraries xts and zoo, which try to aggregate financial OHLC(V) data from lower to higher granularities, as well as the newcomer tibbletime::to_period, which performs the same task for a tibble. All of them, however, suffer from the same inefficiency: When aggregating by, let us say, one hour, they take the round times as the start and end points of the intervals, i.e. boundaries would be 8 AM, 9 AM, 10 AM,... If I have data with 15 min candles, how can I aggregate OHLC(V), so that it is aggregated by 1 H intervals, not by the round times?
Time <- seq(from = as.POSIXct("2018-12-28 12:00:00"), to = as.POSIXct("2019-01-02 13:30:00"), by = 900)
Price_Data <- tibble::tibble(Time = Time,
Open = 100 + rnorm(n = length(Time)),
High = 100 + rnorm(n = length(Time)),
Low = 100 + rnorm(n = length(Time)),
Close = 100 + rnorm(n = length(Time)),
Volume = rpois(n = length(Time), lambda = 5000))
tail(Price_Data)
1 2019-01-02 12:15:00 99.7 5074
2 2019-01-02 12:30:00 99.9 4925
3 2019-01-02 12:45:00 101. 5070
4 2019-01-02 13:00:00 98.6 4919
5 2019-01-02 13:15:00 98.6 4925
6 2019-01-02 13:30:00 99.5 5046
How can I aggregate the above tibble to 30M, 1H, 2H and 4H, so that the groups will be of the desired length? For example, the last group when aggregating by 1H would take the 4 candles from 12:45:00 to 13:30:00, the last 2H group would start at 11:45:00, ...
I have tried
purrr::map(c("30 M","1 H","2 H","4 H"), function(Period) Price_Data %>%
na.omit() %>% tibbletime::tbl_time(., index = Time) %>%
tibbletime::collapse_by(Period, side = "end", clean = T) %>%
dplyr::group_by(Time) %>%
dplyr::mutate(Open = dplyr::first(Open),
High = max(High),
Low = min(Low),
Close = dplyr::last(Close),
Volume = sum(Volume)) %>%
dplyr::slice(dplyr::n()) %>% dplyr::ungroup())
with various combinations of parameters, but nothing produces the desired result. Also, grouping by the number of candles in specific interval does not help, as real world data has gaps.
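A minimal base R sketch of one way to do this, assuming the intervals should be anchored at the last timestamp in the data and counted backwards from it. The function name `aggregate_ohlcv` and the toy data are hypothetical; grouping is done on elapsed time rather than on the number of candles, so gaps in the data do not shift the boundaries.

```r
# Hypothetical helper: aggregate OHLCV candles into windows of period_secs
# seconds, counted backwards from the most recent timestamp.
aggregate_ohlcv <- function(data, period_secs) {
  data <- data[order(data$Time), ]
  end_time <- max(data$Time)
  # Bucket 0 is the most recent window, bucket 1 the one before it, and so on
  age_secs <- as.numeric(difftime(end_time, data$Time, units = "secs"))
  bucket <- floor(age_secs / period_secs)
  pieces <- lapply(split(data, bucket), function(g) {
    data.frame(
      Time   = max(g$Time),        # label: last available candle in the window
      Open   = g$Open[1],
      High   = max(g$High),
      Low    = min(g$Low),
      Close  = g$Close[nrow(g)],
      Volume = sum(g$Volume)
    )
  })
  out <- do.call(rbind, pieces)
  out <- out[order(out$Time), ]
  rownames(out) <- NULL
  out
}

# Toy 15-minute candles (made-up values), aggregated to 1 hour
candles <- data.frame(
  Time   = seq(as.POSIXct("2019-01-02 12:00:00", tz = "UTC"), by = 900, length.out = 7),
  Open   = 1:7, High = 11:17, Low = 1:7, Close = 21:27,
  Volume = rep(10, 7)
)
hourly <- aggregate_ohlcv(candles, period_secs = 3600)
```

With this toy input the last hourly row covers the four candles from 12:45:00 to 13:30:00; passing 1800, 7200 or 14400 as `period_secs` gives the 30M, 2H and 4H versions.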
I am new to ggplot and am trying to plot two lines with it. But my x-axis looks very weird, and now I want to remove it. Here is my code.
ggplot(BJ11, aes(Date, mean,group=1)) +
geom_line(aes(color = "stateair daily values")) +
geom_line(data = bjvalue2,
aes(color = "CNEMC values"))
Here are my data:
> head(BJ11)
Date min max mean
1 2015-01-01 6 154 54.58333
2 2015-01-02 12 157 63.54167
3 2015-01-03 147 322 209.25000
4 2015-01-04 106 360 201.16667
5 2015-01-05 9 186 90.87500
6 2015-01-06 10 121 43.16667
> head(bjvalue2)
Date mean
1 2015-01-01 43
2 2015-01-02 52
3 2015-01-03 150
4 2015-01-04 176
5 2015-01-05 92
6 2015-01-06 40
What should I do to remove both the thick black axis above "Date" and the x-axis?
ggplot(BJ11, aes(Date, mean, group=1))+
geom_line(aes(color = "stateair daily values"))+
geom_line(data = bjvalue2, aes(color = "CNEMC values"))+
theme(axis.title.x = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
axis.line.x = element_blank())
Another option would be to fix your x-axis instead of removing it. I know it is not your question, but ggplot is very good at handling date axes, so I'm wondering whether your dates are stored as characters. If they are, you can convert them with e.g. ymd() from library(lubridate).
I'm guessing your group=1 could be omitted for simplicity.
Also, in your last line I'd personally prefer defining the x and y axis inside the aes to make sure R handles the data the way you want.
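A sketch of that date fix, assuming the Date columns are character strings in year-month-day form. The rows are copied from the head() output in the question, and base R's as.Date() is used here in place of ymd() to avoid the extra dependency.

```r
library(ggplot2)

# First rows of the question's data, with Date stored as character
BJ11 <- data.frame(
  Date = c("2015-01-01", "2015-01-02", "2015-01-03",
           "2015-01-04", "2015-01-05", "2015-01-06"),
  mean = c(54.58333, 63.54167, 209.25000, 201.16667, 90.87500, 43.16667),
  stringsAsFactors = FALSE
)
bjvalue2 <- data.frame(
  Date = BJ11$Date,
  mean = c(43, 52, 150, 176, 92, 40),
  stringsAsFactors = FALSE
)

# Convert to Date so ggplot2 uses a continuous date axis
BJ11$Date <- as.Date(BJ11$Date)
bjvalue2$Date <- as.Date(bjvalue2$Date)

p <- ggplot(BJ11, aes(x = Date, y = mean)) +
  geom_line(aes(color = "stateair daily values")) +
  geom_line(data = bjvalue2, aes(color = "CNEMC values")) +
  scale_x_date(date_breaks = "1 day", date_labels = "%b %d")
```

With a real Date column the axis gets a handful of readable breaks instead of one label per row, and group = 1 is no longer needed.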
I am trying to replace the x axis of a histogram with its month, the data looks similar to:
library(tidyverse)
library(lubridate)
library(okcupiddata) # the example data
df <- profiles %>% as_tibble() %>%
select(last_online) %>%
mutate(month = month(last_online, label = TRUE, abbr = FALSE),
day = yday(last_online))
# A tibble: 59,946 x 3
last_online month day
<dttm> <dbl> <dbl>
1 2012-06-28 20:30:00 June 180
2 2012-06-29 21:41:00 June 181
3 2012-06-27 09:10:00 June 179
4 2012-06-28 14:22:00 June 180
5 2012-06-27 21:26:00 June 179
Now I want to create a histogram with the days of the year:
df %>%
ggplot(aes(x = day, fill = ..count..)) +
geom_histogram(bins = 365) +
scale_y_log10()
I want to replace the day axis with its assigned month variable. I tried to use scale_x_discrete(labels = month), but this just deletes the axis.
I assume I need to perform a larger transformation or programming, but I hope there is already a function that can quickly be applied.
I ultimately want to create a radial plot (adding + coord_polar()) with the month as a break, similar to this:
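One way this could be done, sketched on simulated data instead of okcupiddata: keep the numeric day-of-year on the x-axis and place the breaks on the first day of each month with scale_x_continuous(). The year 2012 is assumed from the sample output above, and the name `month_starts` is only illustrative.

```r
library(ggplot2)

# Simulated stand-in for the day-of-year column (values are made up)
set.seed(7)
df <- data.frame(day = sample(1:366, 5000, replace = TRUE))

# Day-of-year on which each month starts, assuming the year 2012
first_days   <- seq(as.Date("2012-01-01"), by = "month", length.out = 12)
month_starts <- as.integer(format(first_days, "%j"))

p <- ggplot(df, aes(x = day)) +
  geom_histogram(binwidth = 1) +
  # day is numeric, so a continuous scale carries the month labels
  scale_x_continuous(breaks = month_starts, labels = month.abb)
```

Adding `+ coord_polar()` to `p` should then put the month labels around the circle for the radial version.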
I have 2 dataframes like this
Timestamp <- c("2018-01-25 01:03:46","2018-01-25 10:09:36","2018-01-25 11:28:47")
Type <- c("Measurement","Measurement","Measurement")
Comment <- c("Positive","Negative","Positive")
df1 <- data.frame(Timestamp,Type,Comment)
df1$Timestamp <- as.POSIXct(df1$Timestamp,format="%Y-%m-%d %H:%M:%S")
Timestamp <- c("2018-01-25 00:03:46","2018-01-25 01:03:46","2018-01-25 08:28:47","2018-01-25 09:09:36","2018-01-25 10:28:47",
"2018-01-25 11:03:46","2018-01-25 12:09:36","2018-01-25 14:28:47","2018-01-25 17:09:36","2018-01-25 18:28:47")
Type <- c("Measurement","Measurement","Measurement","Measurement","Measurement",
"Measurement","Measurement","Measurement","Measurement","Measurement")
Length <- c(35,39,38,33,29,31,34,36,33,37)
df2 <- data.frame(Timestamp,Type,Length)
df2$Timestamp <- as.POSIXct(df2$Timestamp,format="%Y-%m-%d %H:%M:%S")
I am trying to plot df2 with the "Comment" column of df1 overlaid on the plot.
When I do a merge of df1 & df2 like this
library(dplyr)
df_join <- left_join(df2, df1, by=c("Timestamp","Type")) # all.x is a merge() argument, not needed here
I get
Timestamp Type Length Comment
2018-01-25 00:03:46 Measurement 35 <NA>
2018-01-25 01:03:46 Measurement 39 Positive
2018-01-25 08:28:47 Measurement 38 <NA>
2018-01-25 09:09:36 Measurement 33 <NA>
2018-01-25 10:28:47 Measurement 29 <NA>
2018-01-25 11:03:46 Measurement 31 <NA>
2018-01-25 12:09:36 Measurement 34 <NA>
2018-01-25 14:28:47 Measurement 36 <NA>
2018-01-25 17:09:36 Measurement 33 <NA>
2018-01-25 18:28:47 Measurement 37 <NA>
And so when I plot it, only the "positive" comment gets displayed in the plot.
library(ggplot2)
library(ggrepel)
library(scales)
ggplot(data = df_join,aes(x=Timestamp,y=Length)) +
geom_line(aes(y = Length, colour = "Length"),size = 0.5) +
geom_point(alpha=0.6, position=position_jitter(w=0.05, h=0.0), size=1) +
geom_text_repel(aes(Timestamp, Length, label = Comment)) +
scale_x_datetime(breaks = date_breaks("2 hours"),labels = date_format("%m/%d %H:%M")) +
theme(legend.position="none")
I would like to get the other comments (with their timestamps) in df1 in the same plot of df2. Is there a better way of doing it than merging the 2 dataframes?
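One possible alternative to merging, shown as a sketch: keep the two data frames separate and give the label layer its own data = df1. Since most df1 timestamps have no matching row in df2, a y position is needed for each comment; here it is taken by linear interpolation of Length with approx(), which is an assumption about where the labels should sit. The tz = "UTC" argument is also added only to make the example reproducible.

```r
library(ggplot2)

df1 <- data.frame(
  Timestamp = as.POSIXct(c("2018-01-25 01:03:46", "2018-01-25 10:09:36",
                           "2018-01-25 11:28:47"), tz = "UTC"),
  Comment = c("Positive", "Negative", "Positive")
)
df2 <- data.frame(
  Timestamp = as.POSIXct(c("2018-01-25 00:03:46", "2018-01-25 01:03:46",
                           "2018-01-25 08:28:47", "2018-01-25 09:09:36",
                           "2018-01-25 10:28:47", "2018-01-25 11:03:46",
                           "2018-01-25 12:09:36", "2018-01-25 14:28:47",
                           "2018-01-25 17:09:36", "2018-01-25 18:28:47"),
                         tz = "UTC"),
  Length = c(35, 39, 38, 33, 29, 31, 34, 36, 33, 37)
)

# Assumption: put each comment on the line, at the interpolated Length
df1$Length <- approx(x = as.numeric(df2$Timestamp), y = df2$Length,
                     xout = as.numeric(df1$Timestamp))$y

p <- ggplot(df2, aes(x = Timestamp, y = Length)) +
  geom_line() +
  geom_point(alpha = 0.6) +
  # The label layer uses df1, so every comment appears at its own timestamp
  geom_text(data = df1, aes(label = Comment), vjust = -1)
```

geom_text() can be swapped for geom_text_repel() from ggrepel, as in the original plot, if the labels overlap.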