Hi, I have this input file:
http://www.mediafire.com/file/a4yda7zmwvpd9zi/data.xlsx/file
The MIN and MAX columns are of type Date in the xlsx file, and after as.Date their class is Date (the underlying type is double, as it should be). But when I run the following code:
library(ggplot2)
library(readxl)
out <- read_xlsx("C:/data.xlsx")
out
out$MIN <- as.Date(out$MIN)
out$MAX <- as.Date(out$MAX)
class(out$MIN)
#out$MIN <- as.Date(out$MIN, format = "%d/%m/%Y")
library(dplyr)
out %>%
group_by(SEX) %>%
tidyr::gather(key, value, -SEX) %>%
ggplot(aes(SEX, value)) +
geom_line(aes(color = SEX)) +
coord_flip() +
scale_y_date(breaks = c(out$MIN, out$MAX)) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
out
I keep getting the error "date_trans works with objects of class Date only".
I tried different formats, and even changed the original Excel data types to different date formats, but I keep getting the same error.
You could try:
out %>%
tidyr::gather(key, value, -one_of(c("ID", "SEX"))) %>%
ggplot(aes(ID, value, group = ID)) +
geom_line(aes(color = SEX)) +
coord_flip() +
scale_y_date(breaks = seq(min(out$MIN), max(out$MAX), by = "year")) +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
That said, I doubt such a chart is very useful, and you will have to decide how to set the date breaks (I used a yearly sequence from the minimum to the maximum); that's up to you.
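A likely reason the original call fails (an assumption, since the spreadsheet is not reproduced here) is that gather(key, value, -SEX) stacks the numeric ID column together with the Date columns, so value is no longer of class Date by the time scale_y_date sees it. The sketch below uses a made-up stand-in for the data and plain c() to show the same loss of class; excluding ID, as in the code above, keeps only Date columns in the stack.

```r
# Hypothetical stand-in for the spreadsheet; only the column names follow the question.
out <- data.frame(
  ID  = c(1, 2),
  SEX = c("F", "M"),
  MIN = as.Date(c("2001-05-01", "2003-02-15")),
  MAX = as.Date(c("2010-07-31", "2012-11-30"))
)

# Stacking a numeric column with Date columns loses the Date class ...
mixed <- c(out$ID, out$MIN, out$MAX)
class(mixed)       # "numeric"

# ... while stacking only the Date columns keeps it.
dates_only <- c(out$MIN, out$MAX)
class(dates_only)  # "Date"
```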
With the following example, I get a plot where the areas are not stacked. I would like to stack them. This should be a partial stack, intensity starting at 0.5, then reaching 0.8 where stacked, then reaching 0.3 at the end.
I assume that the position argument does not work because the start and end dates are not the same.
Am I missing an argument that could solve this issue? Or maybe another geom?
Do I have to split the data into days to get the desired output? If so, how can I achieve that?
Thanks in advance,
# Library
library(tidyverse)
library(lubridate)
# Data
df <- tibble(date_debut = as_date(c("2022-09-28", "2022-10-05")),
intensity = c(0.5, 0.3),
duration = days(c(14, 10)),
type = (c("a", "b")))
# Adjustment
df <- df %>%
mutate(date_fin = date_debut + duration) %>%
pivot_longer(cols = c(date_debut, date_fin),
names_to = "date_type",
values_to = "date")
# Plot
df %>%
ggplot(aes(x = date, y = intensity, fill = type))+
geom_area(position = "stack")
This is a tough data wrangling problem. The area plots only stack where the points in the two series have the same x values. The following will achieve that, though it's quite a profligate approach.
df %>%
mutate(interval = interval(date_debut, date_debut + duration)) %>%
group_by(type) %>%
summarize(time = seq(as.POSIXct(min(df$date_debut)),
as.POSIXct(max(df$date_debut + df$duration)), by = 'min'),
intensity = ifelse(time %within% interval, intensity, 0)) %>%
ggplot(aes(x = time, y = intensity, fill = type)) +
geom_area(position = position_stack())
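To make the idea concrete without plotting, here is a minimal base R sketch of the same principle on a daily rather than per-minute grid (the daily resolution is my assumption, chosen to keep the output small): every series gets a value at every x position, with 0 outside its own interval, so the stack has something to add up everywhere.

```r
# The two intervals from the question.
start     <- as.Date(c("2022-09-28", "2022-10-05"))
end       <- start + c(14, 10)
intensity <- c(0.5, 0.3)
type      <- c("a", "b")

# One shared daily grid covering both intervals.
grid <- seq(min(start), max(end), by = "day")

# For each series: its intensity inside its own interval, 0 outside.
stacked <- do.call(rbind, lapply(seq_along(type), function(i) {
  data.frame(
    date      = grid,
    type      = type[i],
    intensity = ifelse(grid >= start[i] & grid <= end[i], intensity[i], 0)
  )
}))

# Height of the stack on a day covered by both intervals (about 0.8).
overlap_total <- sum(stacked$intensity[stacked$date == as.Date("2022-10-06")])
overlap_total
```

Passing stacked to ggplot with aes(x = date, y = intensity, fill = type) and geom_area() should then stack as intended, because both series share the same dates.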
Allan Cameron's answer inspired me to look further into complete.
The proposed answer solved my question, so I accepted it. However, it is more complex than needed.
I solved it this way:
# Adjustment
df <- df %>%
mutate(date_fin = date_debut + duration) %>%
group_by(type) %>%
complete(date_debut = seq(min(date_debut), max(date_fin), by = "1 day")) %>%
fill(intensity) %>%
select(date_debut, intensity, type)
ggplot(df, aes(x = date_debut, y = intensity, fill = type)) +
geom_area()+
scale_x_date(date_labels = "%d",
date_breaks = "1 day")
To avoid the weird empty space, it is fine for me to use geom_col (the question was about geom_area, so no worries).
ggplot(df, aes(x = date_debut, y = intensity, fill = type, colour = type)) +
geom_col(width = 0.95)+
scale_x_date(date_labels = "%d",
date_breaks = "1 day")
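For anyone unsure what complete() and fill() contribute here, a small sketch on a made-up two-row tibble (the column names day and last_day are hypothetical, not from the data above): complete() adds one row per missing day within each group, and fill() carries the last known intensity down into those new rows.

```r
library(dplyr)
library(tidyr)

# Made-up data: series "a" runs for 3 days, series "b" for 2 days.
toy <- tibble(
  type      = c("a", "b"),
  day       = as.Date(c("2022-09-28", "2022-09-29")),
  last_day  = day + c(2, 1),
  intensity = c(0.5, 0.3)
)

filled <- toy %>%
  group_by(type) %>%
  # one row per day between the first and last day of each series
  complete(day = seq(min(day), max(last_day), by = "1 day")) %>%
  # new rows start with NA intensity; carry the known value downwards
  fill(intensity) %>%
  ungroup() %>%
  select(type, day, intensity)

filled
```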
I am trying to replicate code from the web page https://fgeerolf.com/data/oecd/ULC_QUA.html#labour_income_share_(real_ulc)_(total_economy), but I get errors saying that a function and an object cannot be found.
The code given is:
ULC_QUA %>%
filter(MEASURE == "IXOBTE",
SUBJECT == "ULQBBU99",
LOCATION == "DEU") %>%
left_join(ULC_QUA_var$SECTOR, by = c("SECTOR" = "id")) %>%
rename(SECTOR_desc = label) %>%
year_to_date %>%
arrange(SECTOR_desc) %>%
ggplot() +
geom_line(aes(x = date, y = obsValue, color = SECTOR_desc, linetype = SECTOR_desc)) +
scale_color_manual(values = viridis(9)[1:8]) +
theme_minimal() +
scale_x_date(breaks = seq(1920, 2025, 2) %>% paste0("-01-01") %>% as.Date,
labels = date_format("%y")) +
theme(legend.position = c(0.8, 0.3),
legend.title = element_blank()) +
scale_y_continuous(breaks = seq(0, 200, 10)) +
ylab("Labour Income Share (Real ULC) (Total Economy)") + xlab("")
And I got
Error in year_to_date(.) : could not find function "year_to_date"
Eventually I want to generate the following plot:
First of all, I think I need to read the original data from the source, but I don't know where it is located or how to import it. Is there any way I can replicate the plot without further information?
Any help would be much appreciated!
I have a ggplot with facets and colors. The colors are mapped to "ID" and the facet columns to "Type". Each ID always belongs to the same Type, but the number of IDs differs between Types. I would like the colors to restart in each facet column so that the colors within a column differ more strongly.
ggplot(data = plt_cont_em, aes(x = Jahr, y = Konz)) +
geom_point(aes(color=factor(ID))) +
facet_grid(Schadstoff_ID ~ Type, scales = "free_y")
Now it looks like:
I understand that I have to introduce a dummy variable for the color. But is there an easy way of numbering the IDs within each Type, starting from 1 in each Type?
Since the data is confidential, I created dummy data that shows the same problem.
ID<-c()
Type<-c()
Jahr<-c()
Schadstoff<-c()
Monat<-c()
Konz<-c()
for (i in 1:25){
#i = ID
#t = Type
t<-sample(c("A","B","C"),1)
for (j in 1:5){
#j = Schadstoff
if(runif(1)<0.75){
for(k in 2015:2020){
#k = Jahr
for(l in 1:12){
#l = Monat
if(runif(1)<0.9){
ID<-c( ID,i)
Type<-c( Type,t)
Jahr<-c( Jahr,k)
Schadstoff<-c( Schadstoff,j)
Monat<-c( Monat,l)
Konz<-c( Konz,runif(1))
}
}
}
}
}
}
tmp<- data.frame(ID,Type, Jahr, Schadstoff, Monat, Konz)
tmp<-tmp %>% group_by( Type) %>% mutate( Color=row_number())
p<-ggplot(data = tmp, aes(x = Jahr, y = Konz)) +
geom_point(aes(color=factor(Color)), size=0.8) +
facet_grid(Schadstoff ~ Type, scales = "free") +
theme_light() + theme(axis.text.x = element_text(angle = 45, hjust = 1))
p
The problem remains: the grouping doesn't work as intended, and Color is unique for each row.
Using dplyr you can group_by Type and create a new column with the dense_rank of the ID inside each group:
plt_cont_em %>%
group_by(Type) %>%
mutate(Type_ID = dense_rank(ID)) %>%
ggplot() +
...
This will 'rank' each ID from smallest to biggest inside the group, keeping records with the same ID with the same value.
You will probably then want to hide the legend, as it will make little sense.
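To see why dense_rank() behaves differently from the row_number() attempt in the question, here is a minimal sketch on a made-up six-row data frame (the values are purely illustrative):

```r
library(dplyr)

# Made-up data: two IDs in Type A, two IDs in Type B, with repeated rows.
toy <- data.frame(
  Type = c("A", "A", "A", "B", "B", "B"),
  ID   = c(3, 3, 7, 12, 12, 20)
)

ranked <- toy %>%
  group_by(Type) %>%
  mutate(
    by_row  = row_number(),   # restarts per Type, but differs for every row
    by_rank = dense_rank(ID)  # restarts per Type, identical for equal IDs
  ) %>%
  ungroup()

ranked$by_row   # 1 2 3 1 2 3
ranked$by_rank  # 1 1 2 1 1 2
```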
library(dplyr)
library(ggplot2)
# Using provided random data
tmp <- tmp %>%
group_by(Type) %>%
mutate(Color = dense_rank(ID))
ggplot(data = tmp, aes(x = Jahr, y = Konz)) +
geom_point(aes(color = factor(Color)), size = 0.8) +
facet_grid(Schadstoff ~ Type, scales = "free") +
theme_light() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Created on 2020-04-02 by the reprex package (v0.3.0)
I'd like to make a plot with every day labeled on the x axis.
Here is my data
my_data <- read.table(text="day value
11/15/19 0.23633
11/16/19 0.28485
11/17/19 0.63127
11/18/19 0.15434
11/19/19 0.47964
11/20/19 0.65967
11/21/19 0.48741
11/22/19 0.84541
11/23/19 0.10123
11/24/19 0.78169
11/25/19 0.23189
11/26/19 0.86665
11/27/19 0.55184
11/28/19 0.81410
11/29/19 0.25821
11/30/19 0.23576
12/1/19 0.46397
12/2/19 0.55764
12/3/19 0.95645
12/4/19 0.63954
12/5/19 0.76766
12/7/19 0.74505
12/8/19 0.65515
12/9/19 0.58222
12/10/19 0.17294", header=TRUE, stringsAsFactors=FALSE)
Here is my code
my_data %>%
ggplot(aes(day, value)) +
geom_line() +
scale_x_continuous(breaks = seq(1, nrow(my_data)),
labels = my_data$day)
It gives me this error: Error in as.Date.numeric(value) : 'origin' must be supplied
I'd like to make it so that every day is represented on the x axis and by default it only does a few of the days that are included in this range of data.
Try using scale_x_date instead of scale_x_continuous, after converting day to a Date with mdy() from lubridate:
library(lubridate)
my_data %>%
ggplot(aes(x = mdy(day), value)) +
geom_line() +
scale_x_date(date_breaks = "1 day")+
theme(axis.text.x = element_text(angle = 45, hjust = 1))
You can use lubridate to convert the data to proper date format:
library(lubridate)
my_data %>%
mutate(day = mdy(day)) %>%
ggplot(aes(day, value)) +
geom_line() +
scale_x_date(date_breaks = "1 day") +
theme(axis.text.x = element_text(angle=90, vjust=0.5))
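If you would rather not load lubridate, base R's as.Date() with an explicit format should do the same conversion. A minimal sketch on three of the values from the question:

```r
# Month/day/two-digit-year strings, as in the question's data.
day <- c("11/30/19", "12/1/19", "12/2/19")

# %m = month, %d = day, %y = two-digit year
parsed <- as.Date(day, format = "%m/%d/%y")

class(parsed)   # "Date"
format(parsed)  # "2019-11-30" "2019-12-01" "2019-12-02"
```

Once the column is a Date, scale_x_date(date_breaks = "1 day") can place a break on every day.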
I am looking to create a cycle plot of hours within months, hopefully something like the plot below. The plot should show the mean temperature for each month as a horizontal line and, within each month, the temperature fluctuations across a typical day of that month. I tried monthplot(), but it doesn't seem to work:
library(nycflights13)
tempdata <- weather %>% group_by(hour)
monthplot(tempdata, labels = NULL, ylab = "temp")
It keeps saying "argument is not numeric or logical: returning NA", but I am not sure where the code is going wrong.
Hope that this ggplot2 solution will work:
library(nycflights13)
library(ggplot2)
library(dplyr)
# Prepare data
tempdata <- weather %>%
group_by(month, day) %>%
summarise(temp = mean(temp, na.rm = TRUE))
meanMonth <- tempdata %>%
group_by(month) %>%
summarise(temp = mean(temp, na.rm = TRUE))
# Plot using ggplot2
ggplot(tempdata, aes(day, temp)) +
geom_hline(data = meanMonth, aes(yintercept = temp)) +
geom_line() +
facet_grid(~ month, switch = "x") +
labs(x = "Month",
y = "Temperature") +
theme_classic() +
theme(axis.ticks.x = element_blank(),
axis.text.x = element_blank(),
axis.line.x = element_blank())
temp has a missing value which causes an error. You also need to set the times and phase arguments.
library(nycflights13)
# Find mean value : this automatically removes the observation with missing data that was causing an error
tempdata <- aggregate(temp ~ month + day, data=weather, mean)
with(tempdata, monthplot(temp, times=day , phase=month, ylab = "temp"))
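A small illustration of why the formula interface to aggregate() sidesteps the missing value: its default na.action is na.omit, so incomplete rows are dropped before the mean is taken. The toy data frame is made up for illustration.

```r
# Made-up data with one missing temperature in month 2.
toy <- data.frame(
  month = c(1, 1, 2, 2),
  temp  = c(10, 14, NA, 20)
)

# Calling mean() directly on the column is poisoned by the NA ...
mean(toy$temp)   # NA

# ... but the formula interface drops the incomplete row first.
monthly <- aggregate(temp ~ month, data = toy, FUN = mean)
monthly
#   month temp
# 1     1   12
# 2     2   20
```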