How to make ggplot2 exclude zero values in this plot? (R)

I have the following R code running in RStudio, with its output shown below:
library(dplyr)
library(ggplot2)
library(stringr)
df2 %>%
ggplot(aes(
x = JANUARY,
y = value,
fill = JANUARY,
group = year
)) +
geom_col(
position = position_dodge(.65),
width = .5
) +
geom_text(aes(
y = value + max(value) * .03,
label = round(value * 100) %>% str_c('%')
),
position = position_dodge(.65)
) +
geom_text(aes(
y = y_pos,
label = str_remove(year, 'X')
),
color = 'white',
angle = 90,
fontface = 'bold',
position = position_dodge(0.65)
) +
scale_y_continuous(
breaks = seq(0, .9, .1),
labels = function(x) round(x * 100) %>% str_c('%')
) +
scale_fill_manual(values = c(
rgb(47, 85, 151, maxColorValue = 255),
rgb(84, 130, 53, maxColorValue = 255),
rgb(244, 177, 131, maxColorValue = 255),
rgb(112, 48, 160, maxColorValue = 255),
rgb(90, 48, 100, maxColorValue = 255)
)) +
theme(
plot.title = element_text(hjust = .5),
panel.background = element_blank(),
panel.grid.major.y = element_line(color = rgb(.9, .9, .9)),
axis.ticks = element_blank(),
legend.position = 'none'
) +
xlab('') +
ylab('') +
ggtitle('Month of JANUARY (as at 01 January)')
Output is:
As you can see, the value "0%" under "D-Final" is causing the labels inside the bars to disappear below the x-axis.
I want to remove the "0%" and get the labels back into position inside the bars. How can I modify my code to achieve this?
Data (df2):
JANUARY year value y_pos
<fct> <chr> <dbl> <dbl>
1 D-150 X2016 0.26 0.12
2 D-90 X2016 0.49 0.21
3 D-60 X2016 0.63 0.265
4 D-30 X2016 0.73 0.325
5 D-Final X2016 0.81 0
6 D-150 X2017 0.28 0.12
7 D-90 X2017 0.5 0.21
8 D-60 X2017 0.64 0.265
9 D-30 X2017 0.77 0.325
10 D-Final X2017 0.82 0
11 D-150 X2018 0.33 0.12
12 D-90 X2018 0.51 0.21
13 D-60 X2018 0.62 0.265
14 D-30 X2018 0.77 0.325
15 D-Final X2018 0.78 0
16 D-150 X2019 0.24 0.12
17 D-90 X2019 0.42 0.21
18 D-60 X2019 0.53 0.265
19 D-30 X2019 0.65 0.325
20 D-Final X2019 0 0

It's not really about the 0%, at least at this point. The position of the labels is predefined by y_pos, so you can simply change it yourself, e.g.,
df2$y_pos[df2$JANUARY == "D-Final"] <- 0.4
To remove the 0% bar, the first line could be replaced by
df2 %>% filter(value > 0.01) %>%
This gives
Apparently y_pos was defined with
df2 %>% group_by(JANUARY) %>% mutate(y_pos = min(value) / 2)
Hence, to avoid this issue in this case (since the other values within each group are similar), you may instead use
df2 %>% group_by(JANUARY) %>% mutate(y_pos = max(value) / 2)

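In the first line, you can filter df2 to keep only the non-zero values using df2 %>% filter(value != 0). Note that this removes the 0% bar, but the remaining D-Final rows still have y_pos = 0 unless y_pos is recomputed after filtering.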
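Combining the two answers above: if the zero bar is dropped before y_pos is computed, the original min(value) / 2 rule already puts the D-Final labels back inside the bars. A minimal base R sketch, using ave() in place of group_by() + mutate() so it runs without dplyr, and using only the D-60 and D-Final rows copied from the question's data:

```r
# Subset of df2 from the question: the D-60 and D-Final rows only
df2 <- data.frame(
  JANUARY = rep(c("D-60", "D-Final"), each = 4),
  year    = rep(c("X2016", "X2017", "X2018", "X2019"), times = 2),
  value   = c(0.63, 0.64, 0.62, 0.53, 0.81, 0.82, 0.78, 0)
)

# 1. Drop the zero bar first, so it cannot pull the group minimum down to 0
df2 <- df2[df2$value > 0, ]

# 2. Recompute the label position per JANUARY group: half of the shortest remaining bar
df2$y_pos <- ave(df2$value, df2$JANUARY, FUN = function(v) min(v) / 2)

df2
```

In the dplyr pipeline, the same idea is to place filter(value > 0) before group_by(JANUARY) %>% mutate(y_pos = min(value) / 2). The D-60 labels stay at 0.265, as in the original data, while the D-Final labels move up to 0.39.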

Related

Proper x axis scale with years only

I have a grid with two plots, each consisting of two time series of mean values: one (df5) is computed in R, the other (mmzep) is not (I received this dataset already calculated).
library(dplyr)
library(lubridate)
df5 <- data.frame(df$Date, df$Price)
colnames(df5)<- c("date","price")
df5$date <- as.Date(df5$date,"%Y/%m/%d")
df5$price<- as.numeric(gsub(",",".",df5$price))
colnames(mmzep)<- c("date","Mar","Apr")
Then I created two other data frames from df5 (I tried to combine them into a single data frame, but I was not able to):
meanM <- df5 %>%
mutate(Month = month(date), Year = year(date)) %>%
filter(month(df5$date) %in% 3 & year(df5$date) %in% 2010:2019) %>%
group_by(Year, Month) %>%
summarise_all(list(mean=mean, sd=sd), na.rm=TRUE) %>%
na.omit()
Year Month date_mean price_mean date_sd price_sd
<dbl> <dbl> <date> <dbl> <dbl> <dbl>
1 2010 3 2010-03-23 1082. 5.48 685.
2 2012 3 2012-03-27 858. 2.74 333.
3 2015 3 2015-03-16 603. 8.86 411.
4 2017 3 2017-03-15 674. 9.65 512.
5 2018 3 2018-03-16 318. 9.09 202.
6 2019 3 2019-03-14 840. 9.42 329.
meanA <- df5 %>%
mutate(Month = month(date), Year = year(date)) %>%
filter(month(df5$date) %in% 4 & year(df5$date) %in% 2010:2019) %>%
group_by(Year, Month) %>%
summarise_all(list(mean=mean, sd=sd), na.rm=TRUE) %>%
na.omit()
Year Month date_mean price_mean date_sd price_sd
<dbl> <dbl> <date> <dbl> <dbl> <dbl>
1 2010 4 2010-04-18 361. 9.00 334.
2 2011 4 2011-04-14 527. 8.36 312.
3 2012 4 2012-04-15 726. 8.80 435.
4 2013 4 2013-04-16 872. 8.50 521.
5 2014 4 2014-04-09 668. 5.34 354.
6 2015 4 2015-04-15 689. 8.80 436.
7 2017 4 2017-04-15 806. 8.80 531.
8 2018 4 2018-04-15 727. 8.80 291.
9 2019 4 2019-04-15 600. 8.94 690.
#mmzep
date Mar Apr
<dbl> <dbl> <dbl>
1 2010 793. 540
2 2011 650 378.
3 2012 813. 612.
4 2013 755. 717
5 2014 432. 634
6 2015 474. 782.
7 2016 590 743.
8 2017 544. 628
9 2018 249. 781
10 2019 547. 393
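On the point about not being able to build a single data frame: here is a hedged base R sketch that summarises March and April together with aggregate() by Year and Month, then spreads the months into columns, similar to the date / Mar / Apr layout of mmzep. The df5 rows below are made up, since the real prices are not in the question, and only the mean is computed:

```r
# Made-up daily prices standing in for df5
df5 <- data.frame(
  date  = as.Date(c("2010-03-01", "2010-03-15", "2010-04-01", "2010-04-20", "2011-04-10")),
  price = c(1000, 1100, 300, 400, 500)
)
df5$Year  <- as.integer(format(df5$date, "%Y"))
df5$Month <- as.integer(format(df5$date, "%m"))

# Keep March and April of 2010-2019, then average price per Year x Month in one pass
sub   <- df5[df5$Month %in% 3:4 & df5$Year %in% 2010:2019, ]
means <- aggregate(price ~ Year + Month, data = sub, FUN = mean)

# Spread the months into columns (price.3 = March, price.4 = April)
wide <- reshape(means, idvar = "Year", timevar = "Month", direction = "wide")
wide
```

In the dplyr pipeline, the equivalent change would be filter(Month %in% 3:4, Year %in% 2010:2019) before group_by(Year, Month). Years with no March data get NA in price.3.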
I plot the data frames:
g5 = ggplot() +
geom_point(data=meanM, aes(x = (Year), y = (price_mean)),size = 3, colour="gray40") +
geom_point(data=mmzep, aes(x= (date), y=(Mar)), size =3, colour = "red") +
geom_line(data=meanM, aes(group = 1, x = (Year), y = (price_mean)), colour="gray40") +
geom_line(data=mmzep, aes(x = (date), y = (Mar)), colour="red") +
stat_smooth(data=meanM,aes(group = 1, x = (Year), y = (price_mean)),
method = "lm", size = 1, se = FALSE, formula = y ~ x,
colour = "black") +
stat_smooth(data=mmzep, aes(x = (date), y = (Mar)),
method = "lm", size = 1, se = FALSE, formula = y ~ x,
colour = "red3") +
scale_y_continuous(expand = c(0, 0), limits = c(0, 1500)) +
theme(panel.background = element_rect(fill = 'white', colour = 'black'),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.ticks.length = unit(-0.25, "lines"),
plot.margin = margin(0.5, 0.5, 0.5, 0.5, "cm"),
axis.text.x = element_text(margin = margin(t = 0.25, unit = "cm")),
axis.text.y = element_text(margin = margin(r = 0.25, unit = "cm"))) +
labs(y = expression(March),
x = NULL) +
theme(axis.text.x = element_text(size=10),
axis.title = element_text(size=10))
I create g6 in the same way and then arrange both plots in a grid, which gives this:
(plot image not included)
As you can see, the x axis is not correct. I tried scale_x_date(breaks="year", labels=date_format("%Y")), scale_x_discrete(labels=c("2010","2011","2012","2013","2014","2015","2016","2017","2018","2019")), and scale_x_continuous in different ways.
I also tried mmzep$date <- as.Date(mmzep$date, "%Y"), but R needs a day (and, in my case, a month as well?), so I tried mmzep$date <- as.Date(paste("01", mmzep$date, sep="/"), "%d/%m/%Y"), but R replaces the years with NA. I think the error is in the way R sees the dates in mmzep, but I don't understand how to make R recognize them correctly.
Does anyone have any suggestions? Thanks in advance!
There are a few ways to do this. In your data, your year values are stored as type double. This tells ggplot that you have a continuous variable. If you want to leave your data as is, then the solution is
+ scale_x_continuous(breaks = seq(2010, 2020, 2))
# or something else that expressly lists the years you want to see on the axis.
You cannot use scale_x_date without your year data being converted to a date. You can do that with, for example
meanM$Year <- as.Date(paste(meanM$Year, "01", "01", sep = "/"))
Then you can use
+ scale_x_date(date_labels = "%Y")
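To see why the question's as.Date() attempt returned NA, and why the year needs both a month and a day pasted onto it, here is a small base R check (the years vector is a hypothetical stand-in for meanM$Year or mmzep$date):

```r
years <- c(2010, 2011, 2019)

# The question's attempt: "01/2010" has a day and a year but no month,
# so the "%d/%m/%Y" format cannot match and every element becomes NA
bad <- as.Date(paste("01", years, sep = "/"), "%d/%m/%Y")

# Supplying both a month and a day (1 January here) gives a valid Date for each year
good <- as.Date(paste(years, "01", "01", sep = "-"))

bad
format(good, "%Y")
```

Applied to the question, mmzep$date <- as.Date(paste(mmzep$date, "01", "01", sep = "-")) would give a Date column that scale_x_date() accepts.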
Or you can convert your years into discrete data with factor. You cannot use scale_x_discrete on a continuous variable.
meanM$Year <- factor(meanM$Year)
And then use
+ scale_x_discrete()
Try this approach, tested on meanM only, since we do not have the mmzep data. The issue is that, because you are using multiple geoms, the layers add odd labels to the axis. Converting all x-axis variables to factors can alleviate the issue. For mmzep with aes(x = (date), ...), also be careful to format the date as a year, e.g. aes(x = factor(format(date, '%Y'))), so that all labels fit on the axis (this applies if date is a Date; since mmzep$date is already a numeric year, factor(date) is enough). Here is the code:
#Code
library(ggplot2)
ggplot() +
geom_point(data=meanM, aes(x = factor(Year), y = (price_mean)),size = 3, colour="gray40") +
geom_line(data=meanM, aes(group = 1, x = factor(Year), y = (price_mean)), colour="gray40") +
stat_smooth(data=meanM,aes(group = 1, x = factor(Year), y = (price_mean)),
method = "lm", size = 1, se = FALSE, formula = y ~ x,
colour = "black") +
scale_y_continuous(expand = c(0, 0), limits = c(0, 1500)) +
theme(panel.background = element_rect(fill = 'white', colour = 'black'),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.ticks.length = unit(-0.25, "lines"),
plot.margin = margin(0.5, 0.5, 0.5, 0.5, "cm"),
axis.text.x = element_text(margin = margin(t = 0.25, unit = "cm")),
axis.text.y = element_text(margin = margin(r = 0.25, unit = "cm"))) +
labs(y = expression(March),
x = NULL) +
theme(axis.text.x = element_text(size=10),
axis.title = element_text(size=10))
Output:
Some data used:
#Data
meanM <- structure(list(Year = c(2010L, 2012L, 2015L, 2017L, 2018L, 2019L
), Month = c(3L, 3L, 3L, 3L, 3L, 3L), date_mean = c("23/03/2010",
"27/03/2012", "16/03/2015", "15/03/2017", "16/03/2018", "14/03/2019"
), price_mean = c(1082L, 858L, 603L, 674L, 318L, 840L), date_sd = c(5.48,
2.74, 8.86, 9.65, 9.09, 9.42), price_sd = c(685L, 333L, 411L,
512L, 202L, 329L), Year2 = structure(1:6, .Label = c("2010",
"2012", "2015", "2017", "2018", "2019"), class = "factor")), row.names = c(NA,
-6L), class = "data.frame")
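One side effect of the factor approach worth knowing: a discrete axis places the years in evenly spaced slots, so the gaps in meanM (2011, 2013, 2014 and 2016 are missing) disappear, and, as far as I know, ggplot2 maps discrete x values to these integer slots before stat_smooth fits, so the trend line is fitted against slot number rather than calendar year. A base R sketch of the difference, using the Year and price_mean values from the data above:

```r
# Year and price_mean copied from the meanM data above
year  <- c(2010, 2012, 2015, 2017, 2018, 2019)
price <- c(1082, 858, 603, 674, 318, 840)

# On a discrete axis each level gets the next integer slot: 2010 -> 1, 2012 -> 2, ...
pos <- as.integer(factor(year))
pos

# Slope per calendar year versus slope per axis slot
slope_per_year <- unname(coef(lm(price ~ year))[2])
slope_per_slot <- unname(coef(lm(price ~ pos))[2])
c(slope_per_year, slope_per_slot)
```

If the spacing or the slope matters, keeping Year numeric and using scale_x_continuous(breaks = ...) avoids this.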

Ordering a subset of columns by date (R)

I have a data frame in which some of the columns are not in the correct order (their names are dates). See:
data1989 <- data.frame("date_fire" = c("1987-02-01", "1987-07-03", "1988-01-01"),
"Foresttype" = c("oak", "pine", "oak"),
"meanSolarRad" = c(500, 550, 450),
"meanRainfall" = c(600, 300, 450),
"meanTemp" = c(14, 15, 12),
"1988.01.01" = c(0.5, 0.589, 0.66),
"1986.06.03" = c(0.56, 0.447, 0.75),
"1986.10.19" = c(0.8, NA, 0.83),
"1988.01.19" = c(0.75, 0.65,0.75),
"1986.06.19" = c(0.1, 0.55,0.811),
"1987.10.19" = c(0.15, 0.12, 0.780),
"1988.01.19" = c(0.2, 0.22,0.32),
"1986.06.19" = c(0.18, 0.21,0.23),
"1987.10.19" = c(0.21, 0.24, 0.250),
check.names = FALSE,
stringsAsFactors = FALSE)
> data1989
date_fire Foresttype meanSolarRad meanRainfall meanTemp 1988.01.01 1986.06.03 1986.10.19 1988.01.19 1986.06.19 1987.10.19 1988.01.19 1986.06.19 1987.10.19
1 1987-02-01 oak 500 600 14 0.500 0.560 0.80 0.75 0.100 0.15 0.20 0.18 0.21
2 1987-07-03 pine 550 300 15 0.589 0.447 NA 0.65 0.550 0.12 0.22 0.21 0.24
3 1988-01-01 oak 450 450 12 0.660 0.750 0.83 0.75 0.811 0.78 0.32 0.23 0.25
I would like to order the columns by increasing date, and keep the first 5 columns the same. Keep in mind that in my original dataset I have 30 initial columns to be kept the same.
As commented, try to avoid wide-format data whose column names contain data values such as dates, category values, or other indicators. Instead, use long-format (tidy) data, where ordering is much easier, as are aggregation, merging, plotting, and modeling.
Specifically, consider using reshape to melt the dates into a single field (here called quarter) alongside value. The quarter column can then be ordered easily:
# RESHAPE WIDE TO LONG
long_data1989 <- reshape(data1989, varying = names(data1989)[6:ncol(data1989)],
times = names(data1989)[6:ncol(data1989)],
v.names = "value", timevar = "quarter", ids = NULL,
new.row.names = 1:1E4, direction = "long")
# ORDER DATES AND RESET row.names
long_data1989 <- `row.names<-`(with(long_data1989, long_data1989[order(date_fire, quarter),]),
NULL)
long_data1989
If you want to use dplyr, here is an alternative. Note that each column name has to be unique; your data frame had some duplicates, which are commented out below.
library(dplyr)
data1989 <- data.frame("date_fire" = c("1987-02-01", "1987-07-03", "1988-01-01"),
"Foresttype" = c("oak", "pine", "oak"),
"meanSolarRad" = c(500, 550, 450),
"meanRainfall" = c(600, 300, 450),
"meanTemp" = c(14, 15, 12),
"1988.01.01" = c(0.5, 0.589, 0.66),
"1986.06.03" = c(0.56, 0.447, 0.75),
"1986.10.19" = c(0.8, NA, 0.83),
"1988.01.19" = c(0.75, 0.65,0.75),
"1986.06.19" = c(0.1, 0.55,0.811),
"1987.10.19" = c(0.15, 0.12, 0.780),
# "1988.01.19" = c(0.2, 0.22,0.32),
# "1986.06.19" = c(0.18, 0.21,0.23),
# "1987.10.19" = c(0.21, 0.24, 0.250),
check.names = FALSE,
stringsAsFactors = FALSE)
# Sort date column names. replace 6 with first date column
sorted_colnames = sort(names(data1989)[6:ncol(data1989)])
# Sort columns. Replace 5 with last non-date column
data1989 %>%
select(1:5, all_of(sorted_colnames))
We can convert the date column names to Date class, order them, and then use the result as a column index:
i1 <- grep('^\\d{4}\\.\\d{2}\\.\\d{2}$', names(data1989))
data1989[c(seq_len(i1[1]-1), order(as.Date(names(data1989)[i1], "%Y.%m.%d")) + i1[1]-1)]
# date_fire Foresttype meanSolarRad meanRainfall meanTemp 1986.06.03 1986.06.19 1986.06.19.1 1986.10.19 1987.10.19
#1 1987-02-01 oak 500 600 14 0.560 0.100 0.18 0.80 0.15
#2 1987-07-03 pine 550 300 15 0.447 0.550 0.21 NA 0.12
#3 1988-01-01 oak 450 450 12 0.750 0.811 0.23 0.83 0.78
# 1987.10.19.1 1988.01.01 1988.01.19 1988.01.19.1
#1 0.21 0.500 0.75 0.20
#2 0.24 0.589 0.65 0.22
#3 0.25 0.660 0.75 0.32
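If the number of leading columns is known in advance (30 in the asker's real data), the same order(as.Date(...)) idea can be wrapped in a small helper. This is a sketch: order_date_columns and n_keep are hypothetical names, it assumes every column after the first n_keep has a "%Y.%m.%d" date as its name, and the example data is a made-up miniature of data1989:

```r
# Keep the first n_keep columns in place and sort the remaining (date-named) columns
order_date_columns <- function(df, n_keep, date_format = "%Y.%m.%d") {
  keep     <- seq_len(n_keep)
  date_idx <- setdiff(seq_along(df), keep)
  dates    <- as.Date(names(df)[date_idx], date_format)
  if (anyNA(dates)) stop("some trailing column names are not dates in format ", date_format)
  df[c(keep, date_idx[order(dates)])]
}

# Made-up example: 2 leading columns, 3 date columns out of order
x <- data.frame(id = 1:2, type = c("oak", "pine"),
                "1988.01.01" = c(0.5, 0.6), "1986.06.03" = c(0.1, 0.2),
                "1987.10.19" = c(0.3, 0.4), check.names = FALSE)
out <- order_date_columns(x, n_keep = 2)
names(out)
```

For the question's data this would be order_date_columns(data1989, n_keep = 5); as in the answer above, duplicated column names get a ".1" suffix when the columns are selected.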
Base R solution (similar to Parfait's answer):
# Reshape dataframe wide --> long:
df_long <-
reshape(data1989,
direction = "long",
varying = which(!(is.na(as.Date(names(data1989), "%Y.%m.%d")))),
idvar = which(is.na(as.Date(names(data1989), "%Y.%m.%d"))),
v.names = "value",
times = na.omit(as.Date(names(data1989), "%Y.%m.%d")),
timevar = "date_surveyed",
new.row.names = 1:(nrow(data1989)*length(na.omit(as.Date(names(data1989),
"%Y.%m.%d")))))
# Order the data frame and reset the index:
ordered_df_long <- data.frame(df_long[with(df_long, order(date_fire, date_surveyed)),],
row.names = NULL)

How to fix my geom_text labels so that they fit inside the bars of my column chart?

I have the following R code running in RStudio:
library(dplyr)
library(tidyr)
library(forcats)
library(stringr)
library(ggplot2)
df2<-df1 %>%
gather(year, value, X2016:X2019) %>%
mutate(Mth = Mth %>% fct_rev() %>% fct_relevel('January')) %>%
group_by(Mth) %>%
mutate(y_pos = min(value) / 2)
df2$Mth <- as.character(df2$Mth)
df2$Mth <- factor(df2$Mth, levels=unique(df2$Mth))
df2$AsAt2 = factor(df2$AsAt, levels=c("D-150", "D-120", "D-90", "D-60", "D-30"))
g1<-df2 %>% filter(value!=0)%>%
ggplot(aes(
x = Mth,
y = value,
fill = Mth,
group = year
)) +
geom_col(
position = position_dodge(.65),
width = .5
) +
geom_text(aes(
y = value + max(value) * .03,
label = round(value * 100) %>% str_c('%')
),
position = position_dodge(.65), size=3.5
) +
geom_text(aes(
y = y_pos,
label = str_remove(year, 'X')
),
color = 'white',
angle = 90,
fontface = 'bold',
position = position_dodge(0.65), size=3.5
) +
scale_y_continuous(
breaks = seq(0, .9, .1),
labels = function(x) round(x * 100) %>% str_c('%')
) +
scale_fill_manual(values = c(
rgb(47, 85, 151, maxColorValue = 255),
rgb(255, 51, 51, maxColorValue = 255),
rgb(84, 130, 53, maxColorValue = 255),
rgb(244, 177, 131, maxColorValue = 255),
rgb(112, 48, 160, maxColorValue = 255)
)) +
theme(
plot.title = element_text(hjust = .5),
panel.background = element_blank(),
panel.grid.major.y = element_line(color = rgb(.9, .9, .9)),
axis.ticks = element_blank(),
legend.position = 'none'
) +
xlab('') +
ylab('') +
ggtitle('January to May')
g1 + facet_grid(rows = vars(AsAt2))
The output is shown below:
My issue is that the year labels inside the bars disappear in some cases. I have tried reducing the font size to 3.5, but the issue persists.
Is there a way to ensure that the labels fit inside all the bars?
Note: I am also adding an extract of the tibble df2.
>df2
# A tibble: 100 x 6
# Groups: Mth [5]
Mth AsAt year value y_pos AsAt2
<fct> <chr> <chr> <dbl> <dbl> <fct>
1 January D-150 X2016 0.26 0.12 D-150
2 February D-150 X2016 0.25 0 D-150
3 March D-150 X2016 0.27 0 D-150
4 April D-150 X2016 0.290 0 D-150
5 May D-150 X2016 0.27 0 D-150
6 January D-120 X2016 0.38 0.12 D-120
7 February D-120 X2016 0.25 0 D-120
8 March D-120 X2016 0.36 0 D-120
9 April D-120 X2016 0.35 0 D-120
10 May D-120 X2016 0.31 0 D-120
I will answer my own question here. I experimented with the following code:
df2<-df1 %>%
gather(year, value, X2016:X2019) %>%
mutate(Mth = Mth %>% fct_rev() %>% fct_relevel('January')) %>%
group_by(Mth) %>%
mutate(y_pos = min(value) / 2)
and replaced the last line with the following:
mutate(y_pos = max(value) / 6.0)
I arrived at 6.0 by trial and error, starting from 3.0.
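A less trial-and-error alternative, offered as a sketch: the extract shows y_pos = 0 for February, which means min(value) for that month is 0 (a zero bar that filter(value != 0) removes later), so those labels are drawn at the axis. Computing y_pos from the non-zero values only, per month and per AsAt panel, keeps each label at half the shortest visible bar in its panel. Base R ave() is used here; the data is a made-up extract in which the zeros are illustrative:

```r
# Made-up extract shaped like df2 (zeros stand for the bars that are filtered out)
df2 <- data.frame(
  Mth   = rep(c("January", "February"), each = 4),
  AsAt  = rep(c("D-150", "D-120"), times = 4),
  year  = rep(c("X2016", "X2019"), each = 2, times = 2),
  value = c(0.26, 0.38, 0, 0.30, 0.25, 0.25, 0.20, 0)
)

# Half of the smallest non-zero bar; 0 if a group has no visible bar at all
half_min_nonzero <- function(v) {
  nz <- v[v > 0]
  if (length(nz) == 0) 0 else min(nz) / 2
}

# One y_pos per Mth x AsAt combination, i.e. per month within each facet row
df2$y_pos <- ave(df2$value, df2$Mth, df2$AsAt, FUN = half_min_nonzero)
df2
```

In the original pipeline, the corresponding change would be group_by(Mth, AsAt) %>% mutate(y_pos = min(value[value > 0]) / 2), assuming every group has at least one non-zero value.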

ggplot: error bars do not appear when dodging

I'm having a horrible time getting error bars to plot correctly. Is something involving the dodging (position_dodge) causing trouble?
Data:
mean mean_b se se.1 seb seb.1 ID
1 0.52 0.20 0.137 0.137 0.015 0.015 1
2 0.17 0.20 0.062 0.062 0.016 0.016 2
3 0.46 0.60 0.078 0.078 0.006 0.006 3
4 0.34 0.11 0.134 0.134 0.005 0.005 4
5 0.22 0.10 0.066 0.066 0.004 0.004 5
6 0.62 0.14 0.083 0.083 0.003 0.003 6
7 0.11 0.29 0.133 0.133 0.065 0.065 7
8 0.51 0.44 0.113 0.113 0.026 0.026 8
9 0.41 0.50 0.082 0.082 0.009 0.009 9
# grab data for data A
df_m <- data[ , c(7, 1, 3, 4)]
df_m$comp <- "Initial Occupancy"
names(df_m) <- c("ID", "avg", "lower", "upper", "comp")
# grab data for data B
df_f <- data[ , c(7, 2, 5, 6)]
df_f$comp <- "Equilibrium Occupancy"
names(df_f) <- c("ID", "avg", "lower", "upper", "comp")
# bind the data together
df <- rbind(df_m, df_f)
# plot
ggplot(data = df, aes(x = ID, y = avg, ymin = lower, ymax = upper, colour = comp)) +
geom_point(position = position_dodge(width = 0.4)) +
geom_errorbar(position = position_dodge(width = 0.4), width = .3) +
coord_flip() +
scale_colour_manual(values = c("blue", "red")) +
theme_bw() +
theme(panel.grid.major.y = element_line(colour = "grey", linetype = "dashed"),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank())
data=read.csv()
# grab data for males
df_m <- data[ , c(12, 1, 3)]
df_m$comp <- "Initial Occupancy"
names(df_m) <- c("ID", "avg", "se", "comp")
df_m
# grab data for females
df_f <- data[ , c(12, 2, 5)]
df_f$comp <- "Equilibrium Occupancy"
names(df_f) <- c("ID", "avg", "se", "comp")
df_f
# bind the data together
df <- rbind(df_m, df_f)
# plot
ggplot(data = df, aes(x = ID, y = avg, ymin = avg-se, ymax = avg+se, colour = comp)) +
geom_point(position = position_dodge(width = 0.4),pch=21) +
geom_errorbar( position = position_dodge(width = 0.4), width = .3) +
coord_flip() +
scale_colour_manual(values = c("blue", "red")) +
#theme_classic()
theme_bw() +
theme(panel.grid.major.y = element_line(colour = "grey", linetype = "dashed"),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank())
Thank you jlhoward!
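Why the first version failed: there, lower and upper are mapped to the se and se.1 columns themselves. Those two columns hold the same number (0.137 and 0.137 for ID 1), so each error bar has zero length and sits at the SE value instead of around the mean (0.52 for ID 1). The working version fixes this with ymin = avg - se and ymax = avg + se. A base R sketch of the same idea that precomputes the bounds; make_series is a hypothetical helper and only the first three rows of the question's data are used:

```r
# First three rows of the question's data (mean, mean_b, se, seb columns only)
data <- data.frame(
  ID     = 1:3,
  mean   = c(0.52, 0.17, 0.46),
  mean_b = c(0.20, 0.20, 0.60),
  se     = c(0.137, 0.062, 0.078),
  seb    = c(0.015, 0.016, 0.006)
)

# One series per call: bounds are mean +/- SE, not the SE columns themselves
make_series <- function(avg, se, label) {
  data.frame(ID = data$ID, avg = avg, lower = avg - se, upper = avg + se, comp = label)
}

df <- rbind(
  make_series(data$mean,   data$se,  "Initial Occupancy"),
  make_series(data$mean_b, data$seb, "Equilibrium Occupancy")
)
df
```

This df can be passed to the first ggplot call unchanged, since it already maps ymin = lower and ymax = upper.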

ggplot chart, x=date, y= value, value

I have this test data:
date cpu_user cpu_id test1 test2 test3 test4
1 1386716402 U U U U U 31
2 1386716702 0 0.06 99.95 0.02 91.93 29
3 1386717002 0.01 0.04 99.97 0.03 19.46 29
4 1386717302 0.01 0.05 99.96 0.04 92.54 29
5 1386717602 0 0.04 99.97 0.04 U 29
6 1386717902 0 0.05 99.96 0.02 99.86 29
I want, for example, a freqpoly chart with date on the x axis and the other columns (cpu_user, cpu_id, ...) on the y axis. Does anyone have an idea?
Thanks and best Regards!
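library(reshape2)  # melt()
library(ggplot2)
# readClipboard() is Windows-only; read.table(text = "...") works on any platform
d <- read.table(text=readClipboard(), header=TRUE, stringsAsFactors = TRUE,
na.strings = 'U')
df <- melt(d, id.vars='date')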
ggplot(aes(x=date, y=value), data = df) +
geom_bar(aes(fill = variable), stat = 'identity', position = 'dodge')
or
ggplot(aes(x=factor(date), y=value), data = df) +
geom_bar(stat = 'identity', position = 'dodge') +
facet_grid(variable~., scales = 'free_y', drop = F) +
theme(axis.text.x = element_text(angle = 45, vjust = 1.1, hjust = 1.05))
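The date values look like Unix epoch seconds (1386716402 is 10 December 2013), so converting them to date-times gives a readable x axis. A hedged base R sketch, assuming UTC, that also reshapes to long format without reshape2; only the first three rows and four columns of the test data are used:

```r
# First rows of the test data; "U" marks missing values
d <- read.table(text = "
date cpu_user cpu_id test1
1386716402 U U U
1386716702 0 0.06 99.95
1386717002 0.01 0.04 99.97
", header = TRUE, na.strings = "U")

# Assumption: date holds seconds since 1970-01-01 (UTC)
d$date <- as.POSIXct(d$date, origin = "1970-01-01", tz = "UTC")

# Wide -> long: one row per (date, variable) pair, columns stacked in order
long <- data.frame(
  date     = rep(d$date, times = ncol(d) - 1),
  variable = rep(names(d)[-1], each = nrow(d)),
  value    = unlist(d[-1], use.names = FALSE)
)
long
```

From here, ggplot(long, aes(x = date, y = value, colour = variable)) + geom_line() draws one line per variable over time.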

Resources