ggplot2 Reorder Data that is in minutes:seconds Format - r

The first five entries (out of twenty) of my dataset:
>head(data)
Name SDC
<chr> <Period>
1 Feuerman 1M 37S
2 Solis 1M 52S
3 Osborne 1M 47S
4 Frizzell 1M 58S
5 Moran 1M 59S
Also:
> dput(head(data))
structure(list(Name = c("Feuerman", "Solis", "Osborne", "Frizzell",
"Moran", "Seth"), Deadlift = c(320, 250, 340, 250, 250, 200),
Medicine_Ball = c(11.6, 8.8, 12.5, 9.2, 9.7, 9.1), HRP = c(46,
39, 36, 33, 42, 31), SDC = new("Period", .Data = c(37, 52,
47, 58, 59, 15), year = c(0, 0, 0, 0, 0, 0), month = c(0,
0, 0, 0, 0, 0), day = c(0, 0, 0, 0, 0, 0), hour = c(0, 0,
0, 0, 0, 0), minute = c(1, 1, 1, 1, 1, 2)), Leg_Tuck = c(20,
13, 4, 10, 13, 13), Run = new("Period", .Data = c(48, 59,
10, 53, 0, 29), year = c(0, 0, 0, 0, 0, 0), month = c(0,
0, 0, 0, 0, 0), day = c(0, 0, 0, 0, 0, 0), hour = c(0, 0,
0, 0, 0, 0), minute = c(13, 12, 17, 16, 0, 16)), Total = c(570,
508, 513, 470, 410, 452), Pass_Fail = structure(c(1L, 1L,
2L, 1L, 2L, 1L), .Label = c("Pass", "Fail"), class = "factor"),
Date = structure(c(18522, 18522, 18522, 18522, 18522, 18522
), class = "Date")), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
As you can see, SDC is in minutes:seconds format. I achieved this using ms(data$SDC) to change the column type. I am trying to create a plot using geom_col that orders SDC from lowest to highest times. I am facing two problems:
When using the reorder command, the times are not properly reordered (see plot below).
The axes are being formatted by hour:minute:second but I want it to be formatted in only minute:second format (also see plot below).
Here is my code to generate the plot:
ggplot(data=data,
aes(x=reorder(Name, -SDC), y=SDC, fill=Pass_Fail)) +
scale_y_time(limits=c(0,200)) +
scale_fill_manual(values=c('#00BFC4', '#F8766D')) +
labs(x='Soldier', y='Sprint Drag Carry Time', fill='Passed/Failed ACFT', title='Sprint Drag Carry Scores') +
geom_col() +
geom_text(size=3, aes(label = SDC), hjust=-0.04) +
coord_flip() +
theme_classic()
It produces the following plot:
As you can see, the reordering is incorrect and the axes are not formatted the way I want them to be. Thanks in advance for your help.

I think reorder() has trouble working with Period objects. Instead, we can sort the rows by SDC and set the factor levels of Name in that order, which puts the bars in increasing order.
To show only minutes and seconds on the y-axis, we can pass a custom labelling function to scale_y_time().
library(tidyverse)
library(lubridate) # for minute() and second()
data %>%
arrange(SDC) %>%
mutate(Name = factor(Name, levels = unique(Name))) %>%
ggplot() + aes(x=Name, y=SDC, fill=Pass_Fail) +
scale_y_time(limits=c(0,200),
labels = function(x) sprintf('%02.0f:%02.0f', minute(x), second(x))) +
scale_fill_manual(values=c('#00BFC4', '#F8766D')) +
labs(x='Soldier', y='Sprint Drag Carry Time',
fill='Passed/Failed ACFT', title='Sprint Drag Carry Scores') +
geom_col() +
geom_text(size=3, aes(label = SDC), hjust=-0.04) +
coord_flip() +
theme_classic()
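As an alternative to sorting the rows first, the sketch below converts the Period values to plain seconds and lets reorder() work on that number. It rebuilds the six Name/SDC values from the dput() output in the question; the names SDC_sec, Name_ordered and SDC_label are my own choices for illustration, not part of the original answer.

```r
library(lubridate)

# The six Name/SDC values from the dput() output in the question
name <- c("Feuerman", "Solis", "Osborne", "Frizzell", "Moran", "Seth")
sdc  <- ms(c("1:37", "1:52", "1:47", "1:58", "1:59", "2:15"))  # Period

# Hypothetical helper column: total seconds as an ordinary number
data2 <- data.frame(Name = name, SDC_sec = period_to_seconds(sdc))

# reorder() on a plain numeric vector sorts the levels by total time
data2$Name_ordered <- reorder(data2$Name, data2$SDC_sec)

# Minute:second labels built from the numeric seconds
data2$SDC_label <- sprintf("%d:%02d",
                           as.integer(data2$SDC_sec %/% 60),
                           as.integer(data2$SDC_sec %% 60))

levels(data2$Name_ordered)
data2$SDC_label
```

In a plot, Name_ordered could then go on the x aesthetic, SDC_sec on y, and SDC_label in geom_text(); the y-axis would then be an ordinary continuous scale.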

Related

How to change time-order on x-axis?

I have a question to this specific code:
ggplot(MAGS)+
geom_col(aes(x = Photo.time@hour + Photo.time@minute/60, y = Number.of.Animals), lwd = 1) + ylab("total amount") +
scale_x_continuous(breaks = seq(0,24,4), name = "time", labels = c( "0:00", "4:00", "8:00", "12:00", "16:00", "20:00", "23:59")) +
theme_bw() + theme_classic() +
scale_y_continuous(breaks = seq(0,10,2), name = "total amount", labels = c( "0","2","4","6","8","10"))
With this code I created the attached plot. This plot is okay, but I think it would look better if the x axis started at 12:00pm, had 00:00am in the middle, and ended at 11:59am, much like the attached plot but flipped. The data set comes from a nocturnal animal with high activity around midnight, so it would be better to have 00:00 in the center of the x axis.
I tried several ways of rearranging the x axis but I always ended up with a mess, and I can't figure out where my mistake is.
Thank you very much for helping :)
I'd suggest making a helper column that is ordered the way you want. In this case I have added 24 hours to the first 12 hours of the day, so that hours 0 to 12 appear as hours 24 to 36, and then adjusted the labels accordingly.
df1 <- data.frame(x = seq(0, 24 - 1/60, 1/60), y = 1:1440)
df1$x_order = df1$x + ifelse(df1$x < 12, 24, 0)
ggplot(df1, aes(x_order, y)) +
geom_col() +
scale_x_continuous(breaks = 12 + seq(0,24,4), name = "time",
labels = c("12:00", "16:00", "20:00", "00:00am", "4:00", "8:00", "12:00"))
EDIT - Based on the sample data you added in a comment, I've made some fake data that shows the overall pattern you have in your full data:
structure(list(Photo.time = new("Period",
.Data = c(51, 52, 54, 55, 56, 58, 0, 58, 56, 57),
year = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
month = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
day = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
hour = c(1, 2, 3, 4, 5, 6, 7, 12, 15, 20),
minute = c(48, 48, 48, 49, 49, 49, 58, 49, 0, 0)),
Number.of.Animals = c(10L, 8L, 6L, 5L, 2L, 1L, 0L, 2L, 6L, 10L)),
class = "data.frame", row.names = c(NA, 10L)) |>
ggplot() +
geom_col(aes(x = Photo.time@hour + Photo.time@minute/60, y = Number.of.Animals), lwd = 1) + ylab("total amount") +
scale_x_continuous(breaks = seq(0,24,4), name = "time", labels = c( "0:00", "4:00", "8:00", "12:00", "16:00", "20:00", "23:59")) +
theme_bw() + theme_classic()
I had trouble manipulating your time data, so I converted to decimal hours and applied my adjustment from above:
library(dplyr); library(lubridate); library(ggplot2)
structure(list(Photo.time = new("Period", .Data = c(51, 52, 54, 55, 56, 58, 0, 58, 56, 57), year = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), month = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), day = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), hour = c(1, 2, 3, 4, 5, 6, 7, 12, 15, 20), minute = c(48, 15, 48, 30, 49, 49, 58, 49, 0, 0)), Number.of.Animals = c(10L, 8L, 6L, 5L, 2L, 1L, 0L, 2L, 6L, 10L)), class = "data.frame", row.names = c(NA, 10L)) |>
mutate(time_hr_dec = as.numeric(Photo.time)/(60*60),
time_hr_dec2 = time_hr_dec + ifelse(time_hr_dec < 12, 24, 0)) |>
ggplot() +
geom_col(aes(x = time_hr_dec2, y = Number.of.Animals), lwd = 1) + ylab("total amount") +
scale_x_continuous(breaks = 12 + seq(0,24,4), name = "time",
labels = c("12:00", "16:00", "20:00", "00:00am", "4:00", "8:00", "12:00")) +
theme_bw() + theme_classic()
Yay, look! It has the expected shape and labels.
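The shift and the break labels can also be generated rather than typed by hand. This is a small base R sketch of the same idea; shift_hours() and hour_labels() are hypothetical helper names, and the labels use a uniform HH:00 format rather than the exact strings in the answer.

```r
# Move the first half of the day past midnight so that 00:00 sits mid-axis
shift_hours <- function(h) h + ifelse(h < 12, 24, 0)

# Turn shifted break positions back into clock labels (hours modulo 24)
hour_labels <- function(b) sprintf("%02d:00", as.integer(b %% 24))

# Same break positions as in the answer: 12, 16, ..., 36
breaks <- 12 + seq(0, 24, 4)

shift_hours(c(1.8, 12.5, 20))
hour_labels(breaks)
```

These could be plugged in as scale_x_continuous(breaks = breaks, labels = hour_labels), with shift_hours() applied to the decimal-hour column before plotting.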

Format numbers on axes and tooltips in ggplotly

I have used sprintf and formatC to round a double value to two decimal places. However, when I use the result in ggplot and ggplotly, my visuals misbehave.
Dput:
structure(list(Date = structure(c(18328, 18329, 18330, 18331,
18332, 18333), class = "Date"), State = c("Louisiana", "Louisiana",
"Louisiana", "Louisiana", "Louisiana", "Louisiana"), variablename1 = c(0,
0, 1, 1, 6, 14), variablename2 = c(5, 5, 5, 11, 37, 37), death = c(0,
0, 0, 0, 0, 0), variablename3 = c(5, 5, 6, 12, 43, 51), variablename4 = c(0,
0, 0, 0, 0, 0), variablename5 = c(0, 0, 0, 0, 0, 0), variablename6 = c(0,
0, 0, 6, 26, 0), variablename7 = c(0, 0, 1, 0, 5, 8), variablename8 = c(0,
5, 1, 6, 31, 8), Percent = c(0, 0, 16.6666666666667, 8.33333333333333,
13.953488372093, 27.4509803921569)), row.names = c(NA, -6L), groups = structure(list(
State = "Louisiana", .rows = list(1:6)), row.names = c(NA,
-1L), class = c("tbl_df", "tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
Edit: While hunting for tricks, I found the 'non-short' way of doing things more efficient. Just use round(variable, 2) to round a value to two decimal places and use that. For now.
This can be easily achieved via scales::percent or scales::percent_format. As an example I made a line plot of Percent by Date where I use scales::percent_format to format y-axis labels and scales::percent to format the percent value in the tooltip:
library(ggplot2)
library(plotly)
p <- df %>%
ungroup() %>%
ggplot(aes(Date, Percent, color = State, group = State,
text = sprintf("State: %s<br>Date: %s<br>Percent: %s",
State, Date, scales::percent(Percent, scale = 1, accuracy = .01)))) +
geom_line() +
scale_y_continuous(labels = scales::percent_format(scale = 1, accuracy = .01))
ggplotly(p, tooltip = 'text')
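To see what the formatter produces on its own, here is a short check using the Percent values from the dput() output in the question, alongside a base R sprintf() call for comparison. The variable names are mine; scale = 1 is used because the values are already percentages rather than fractions.

```r
library(scales)

# The Percent column from the dput() output in the question
pct <- c(0, 0, 16.6666666666667, 8.33333333333333,
         13.953488372093, 27.4509803921569)

# accuracy = 0.01 rounds to two decimal places
with_scales <- percent(pct, scale = 1, accuracy = 0.01)

# Base R formatting for comparison
with_sprintf <- sprintf("%.2f%%", pct)

with_scales
with_sprintf
```

Either way the result is a character vector, which is why it belongs in labels or tooltip text rather than in the y aesthetic itself.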

Error in if (is.double(data$x) && !has_groups(data) && any(data$x != data$x[1L])) { : missing value where TRUE/FALSE needed

I'm trying to use ggplot to create a boxplot with four categories of suspension data on the x axis (low, lowish, highish, high) and farms on the y axis.
I think I have broken the suspension column into four groups, but ggplot is upset with me. Here is the error:
```
Error in if (is.double(data$x) && !has_groups(data) && any(data$x != data$x[1L])) { : missing value where TRUE/FALSE needed
```
Here is my code:
```{r}
# To break suspension_rate_total_pct data into groups for clearer visualization, I found the min, and max
merged_data$suspension_rate_total_pct <-
as.numeric(merged_data$suspension_rate_total_pct)
max(merged_data$suspension_rate_total_pct, na.rm=TRUE)
min(merged_data$suspension_rate_total_pct, na.rm=TRUE)
low_suspension <- merged_data$suspension_rate_total_pct > 0 & merged_data$suspension_rate_total_pct < 0.5
low_ish_suspension <- merged_data$suspension_rate_total_pct > 0.5 & merged_data$suspension_rate_total_pct < 1
high_ish_suspension <- merged_data$suspension_rate_total_pct > 1 & merged_data$suspension_rate_total_pct < 1.5
high_suspension <- merged_data$suspension_rate_total_pct > 1.5 & merged_data$suspension_rate_total_pct < 2
ggplot(merged_data, aes(x = suspension_rate_total_pct , y = farms_pct)) +
geom_boxplot()
```
Here is the Data:
merged_data <- structure(list(schid = c("1030642", "1030766", "1030774", "1030840",
"1130103", "1230150"), enrollment = c(159, 333, 352, 430, 102,
193), farms = c(132, 116, 348, 406, 68, 130), foster = c(2, 0,
1, 8, 1, 4), homeless = c(14, 0, 8, 4, 1, 4), migrant = c(0,
0, 0, 0, 0, 0), ell = c(18, 12, 114, 45, 7, 4), suspension_rate_total = c(NA,
20, 0, 0, 95, 5), suspension_violent = c(NA, 9, 0, 0, 20, 2),
suspension_violent_no_injury = c(NA, 6, 0, 0, 47, 1), suspension_weapon = c(NA,
0, 0, 0, 8, 0), suspension_drug = c(NA, 0, 0, 0, 9, 1), suspension_defiance = c(NA,
1, 0, 0, 9, 1), suspension_other = c(NA, 4, 0, 0, 2, 0),
farms_pct = c(0.830188679245283, 0.348348348348348, 0.988636363636364,
0.944186046511628, 0.666666666666667, 0.673575129533679),
foster_pct = c(0.0125786163522013, 0, 0.00284090909090909,
0.0186046511627907, 0.00980392156862745, 0.0207253886010363
), migrant_pct = c(0, 0, 0, 0, 0, 0), ell_pct = c(0.113207547169811,
0.036036036036036, 0.323863636363636, 0.104651162790698,
0.0686274509803922, 0.0207253886010363), homeless_pct = c(0.0880503144654088,
0, 0.0227272727272727, 0.00930232558139535, 0.00980392156862745,
0.0207253886010363), suspension_rate_total_pct = c(NA, 2,
1, 1, 2, 2)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
If you can, please help me appease ggplot so that it will give me a beautiful visualization. Currently, this feels like a one-sided, emotional rollercoaster of a relationship.
Just a short answer; I am sure you can figure out the rest yourself (otherwise, post a follow-up question).
Since the data you provided has NAs in the first row of several columns, I can only demonstrate the principle: I use merged_data$homeless as the grouping input for the boxplots, while the data (y value) is still farms.
library(dplyr); library(ggplot2)
# first we create our groups of low, middle & high amount of homeless
merged_data2<- merged_data %>% mutate(homelessgroup= ifelse(homeless < 4, "low",
ifelse(homeless <= 8, "middle",
ifelse(homeless > 8, "high",NA ))))
## then we plot the data using ggplot
ggplot(merged_data2,aes(y=farms,fill=homelessgroup))+geom_boxplot()
I think you can just use cut() with your data to partition into 4 groups. Then you can use that variable with the plot
merged_data <- transform(merged_data,
group = cut(
suspension_rate_total_pct,
c(0, .5, 1, 1.5, 2),
include.lowest = TRUE,
labels = c("low", "lowish", "highish", "high")))
ggplot(merged_data, aes(x = group , y = farms_pct)) +
geom_boxplot()
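To see what cut() does with the sample values, and one plausible reason for the original error, here is a small base R check. The condition quoted in the error message compares every x value with the first one, and in the sample data the first suspension_rate_total_pct is NA. That this NA is what trips ggplot is my reading of the message, not something confirmed in the thread.

```r
# suspension_rate_total_pct from the sample data in the question
x <- c(NA, 2, 1, 1, 2, 2)

# The comparison from the error message evaluates to NA when x[1] is NA,
# and an NA condition inside if () is an error in R
any(x != x[1L])

# cut() bins the values into labelled groups; NA stays NA
group <- cut(x,
             breaks = c(0, 0.5, 1, 1.5, 2),
             include.lowest = TRUE,
             labels = c("low", "lowish", "highish", "high"))
group
table(group, useNA = "ifany")
```

With only these six rows, just the "lowish" and "high" groups are populated, so the full data set is needed to see all four boxes.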

Plotting with ggplot and chron - label change to %H:%M:%S

I am working with some chron data of the class 'times' and am plotting it in a scatter plot. However, I want the labeling to be in %H:%M:%S format for the variable on the x axis Tim.V. Simply adding scale_x_continuous(labels = "%H:%M:%S") to the code below does not seem to do the trick. I don't need to convert the data in any way - just the format of the label on the x axis. Any insight on how to do this? It seems like it should be simple.
doeplotnet <- ggplot(division, aes(x =Tim.V, y = Age)) + geom_point() + scale_x_reverse()
Sample Data (Age is numeric and Tim.V is 'times')
Age Tim.V
40 00:33:08
36 00:59:27
29 01:05:33
52 00:49:14
49 01:08:00
44 00:30:45
You can also use lubridate::ymd_hms to convert to a datetime with a dummy date, and plot that with ggplot2:
library(tidyverse); library(lubridate)
mydata3 <- mydata2 %>%
mutate(time3 = lubridate::ymd_hms(paste(
"2000-01-01", hour(time2), minute(time2), second(time2))))
ggplot(mydata3, aes(x=time3, y=pending, color=server, group=tradedate)) +
geom_point() +
facet_wrap(~ tradedate)
Sample data used:
mydata2 <-
tibble(time2 = new(
"Period",
.Data = c(23, 23, 42, 42, 24, 24, 42, 42),
year = c(0, 0, 0, 0, 0, 0, 0, 0),
month = c(0, 0, 0, 0, 0, 0, 0, 0),
day = c(0,
0, 0, 0, 0, 0, 0, 0),
hour = c(14, 14, 14, 14, 14, 14, 14, 14),
minute = c(5, 5, 5, 5, 6, 6, 6, 6)
),
pending = runif(8),
server = "server1",
tradedate = rep(ymd(c(20190101, 20190102)), 4)
)
This works well:
library(chron)
library(ggplot2)
# sample data from the question
division <- read.table(text= "Age Tim.V
40 00:33:08
36 00:59:27
29 01:05:33
52 00:49:14
49 01:08:00
44 00:30:45", stringsAsFactors=TRUE, header = TRUE)
# convert to chron 'times', then build evenly spaced breaks and matching labels
division$Tim.V <- times(division$Tim.V)
breaks2 <- seq(min(division$Tim.V), max(division$Tim.V), length.out = 5)
labels2 <- times(breaks2)
doeplotnet <- ggplot(division, aes(x = as.numeric(Tim.V), y = Age)) + geom_point() +
scale_x_reverse(labels = labels2, breaks = breaks2)
doeplotnet
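Since chron stores 'times' values as fractions of a day, another option is to keep the axis numeric and pass a labelling function instead of precomputed labels. The sketch below is base R only; fmt_hms() is a hypothetical helper, not a chron or ggplot2 function.

```r
# Format a fraction of a day as HH:MM:SS
fmt_hms <- function(x) {
  s <- as.integer(round(x * 86400))   # whole seconds since midnight
  sprintf("%02d:%02d:%02d", s %/% 3600L, (s %% 3600L) %/% 60L, s %% 60L)
}

# 00:33:08 is 1988 seconds and 01:08:00 is 4080 seconds after midnight
fmt_hms(c(1988, 4080) / 86400)
```

Assuming x is mapped as as.numeric(Tim.V), this could be used as scale_x_reverse(labels = fmt_hms), leaving ggplot2 to choose the break positions.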

ggplot2: how do I add a second plot line

Ok,
[R3.4.2 + ggplot2]
Using the data example listed below, how do I add a second data plot? I tried this example which I found on this site;
library(ggplot2)
# This is part of the original code
rpt<-read.csv(file="rpt.csv",header=T)
rpt1<-read.csv(file="rpt1.csv",header=T)
# code starts here
ggplot(rpt,aes(JulianDate,w)) + geom_line(aes(color="First line")) +
geom_line(data=rpt1, aes(color="Second line")) + labs(color="Legend text")
The first plot has x=rpt$JulianDate, y=rpt1$w; and the second plot has x1=rpt1$JDay and y2=rpt1$wolf.
The data (use dget(_) to read it):
structure(list(
JDay = c(57023, 57024, 57027, 57028, 57029, 57031, 57032, 57035, 57037),
Obs = c(1, 1, 1, 1, 1, 1, 1, 1, 1),
w = c(71, 105, 64, 44, 45, 38, 66, 49, 28),
WStd = c(0, 0, 0, 0, 0, 0, 0, 0, 0),
wolf = c(91.59, 135.45, 82.56, 56.76, 58.05, 49.02, 85.14, 63.21, 36.12),
Adj = c(0, 0, 0, 0, 0, 0, 0, 0, 0)),
.Names = c("JDay", "Obs", "w", "WStd", "wolf", "Adj"),
class = "data.frame",
row.names = c(NA, -9L))
In your comment, you say that rpt and rpt1 have the same data. Therefore, I think this is what you are asking for
library(ggplot2)
ggplot(rpt, aes(x=JDay)) +
geom_line(aes(y=w, color="First line")) +
geom_line(aes(y=wolf, color="Second line")) +
labs(color="Legend text")
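A different route to the same plot, swapped in here as an alternative rather than taken from the answer, is to stack the two columns into long format and map the series name to color. The sketch uses the JDay, w and wolf values from the question; the column names series and value are chosen here for illustration.

```r
# JDay, w and wolf columns from the dput() output in the question
rpt <- data.frame(
  JDay = c(57023, 57024, 57027, 57028, 57029, 57031, 57032, 57035, 57037),
  w    = c(71, 105, 64, 44, 45, 38, 66, 49, 28),
  wolf = c(91.59, 135.45, 82.56, 56.76, 58.05, 49.02, 85.14, 63.21, 36.12)
)

# Stack the two measurement columns on top of each other
rpt_long <- data.frame(
  JDay   = rep(rpt$JDay, times = 2),
  series = rep(c("w", "wolf"), each = nrow(rpt)),
  value  = c(rpt$w, rpt$wolf)
)

# Build the plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(rpt_long, aes(x = JDay, y = value, color = series)) +
    geom_line() +
    labs(color = "Legend text")
}
```

With this layout a third or fourth line only needs more stacked rows, not another geom_line() call.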
