How do I correctly connect data points ggplot - r

I am making a stratigraphic plot but somehow, my data points don't connect correctly.
The purpose of this plot is that the values on the x-axis are connected so you get an overview of the change in d18O throughout time (age, ma).
I've used the following script:
library(readxl)
R_pliocene_tot <- read_excel("Desktop/R_d18o.xlsx")
View(R_pliocene_tot)
install.packages("analogue")
install.packages("gridExtra")
library(tidyverse)
R_pliocene_Rtot <- R_pliocene_tot %>%
gather(key=param, value=value, -age_ma)
R_pliocene_Rtot
R_pliocene_Rtot %>%
ggplot(aes(x=value, y=age_ma)) +
geom_path() +
geom_point() +
facet_wrap(~param, scales = "free_x") +
scale_y_reverse() +
labs(x = NULL, y = "Age (ma)")
which leads to the following figure:
Something is wrong with the geom_path function, I guess, but I can't figure out what it is.

Although the comment seems to solve the problem, I don't think the question itself was answered, so here is a short introduction to how geom_path in ggplot2 works.
library(dplyr)
library(ggplot2)
# This dataset contains two groups, with random y values and x running from 1 to 20.
# The param column is only there to mimic the param variable in the question.
df <- tibble(x = rep(seq(1, 20, by = 1), 2),
y = runif(40, min = 1, max = 100),
group = c(rep("group 1", 20), rep("group 2", 20)),
param = rep("a param", 40))
df %>%
ggplot(aes(x = x, y = y)) +
# geom_path has a group aesthetic that tells it which data points
# belong to which path.
# Points in the same group are connected to each other.
# Colour is mapped to group here only to make the paths easier to tell apart.
geom_path(aes(group = group, color = group)) +
geom_point() +
facet_wrap(~param, scales = "free_x") +
scale_y_reverse() +
labs(x = NULL, y = "Age (ma)")
Since group = 1 works well for your data, I assume all of your data points belong to one group and you simply want one line connecting them all. If you take my example data above and draw it with group = 1, you get two lines similar to the previous plot, except that the end point of group 1 is now connected to the starting point of group 2.
All data points are now on a single path, and the order in which they are connected is the order in which they appear in the data. (I kept the colour only to make this easier to see.)
df %>%
ggplot(aes(x = x, y = y)) +
geom_path(aes(group = 1, color = group)) +
geom_point() +
facet_wrap(~param, scales = "free_x") +
scale_y_reverse() +
labs(x = NULL, y = "Age (ma)")
I hope this gives you a better understanding of ggplot2::geom_path.
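Because geom_path connects points in row order, one way to get a clean stratigraphic line is to sort the data by age before plotting. The following is a minimal sketch under that assumption; the strat data frame and its d18o column are hypothetical stand-ins for the question's data, which was not posted.

```r
library(ggplot2)
library(dplyr)

# Hypothetical stratigraphic data, deliberately stored out of age order
strat <- tibble(
  age_ma = c(3.0, 1.0, 2.5, 1.5, 2.0),
  d18o   = c(3.4, 2.9, 3.3, 3.0, 3.2)
)

# geom_path() joins rows in the order they appear, so sort by age first
strat_sorted <- strat %>% arrange(age_ma)

p <- ggplot(strat_sorted, aes(x = d18o, y = age_ma)) +
  geom_path(aes(group = 1)) +  # one path through all points
  geom_point() +
  scale_y_reverse() +
  labs(x = "d18O", y = "Age (ma)")
p
```

Without the arrange() step the path would jump back and forth between ages, which is one possible cause of the zig-zag lines described in the question.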

Related

How do I add data labels to a ggplot histogram with a log(x) axis?

I am wondering how to add data labels to a ggplot showing the true value of the data points when the x-axis is in log scale.
I have this data:
date <- c("4/3/2021", "4/7/2021","4/10/2021","4/12/2021","4/13/2021","4/13/2021")
amount <- c(105.00, 96.32, 89.00, 80.84, 121.82, 159.38)
address <- c("A","B","C","D","E","F")
df <- data.frame(date, amount, address)
And I plot it in ggplot2:
plot <- ggplot(df, aes(x = log(amount))) +
geom_histogram(binwidth = 1)
plot + theme_minimal() + geom_text(label = amount)
... but I get the error
"Error: geom_text requires the following missing aesthetics: y"
I have 2 questions as a result:
Why am I getting this error with geom_histogram? Shouldn't it assume to use count as the y value?
Will this successfully show the true values of the data points from the 'amount' column despite the plot's log scale x-axis?
Perhaps like this?
ggplot(df, aes(x = log(amount), y = ..count.., label = ..count..)) +
geom_histogram(binwidth = 1) +
stat_bin(geom = "text", binwidth = 1, vjust = -0.5) +
theme_minimal()
ggplot2 layers do not (at least in any situations I can think of) take the summary calculations of other layers, so I think the simplest thing would be to replicate the calculation using stat_bin(geom = "text"...
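In ggplot2 3.3.0 and later, the ..count.. notation can be written as after_stat(count). A sketch of the same stat_bin approach in that style, using only the amount column from the question, could look like this:

```r
library(ggplot2)

df <- data.frame(amount = c(105.00, 96.32, 89.00, 80.84, 121.82, 159.38))

p <- ggplot(df, aes(x = log(amount))) +
  geom_histogram(binwidth = 1) +
  # Repeat the binning in a text layer and use its computed count as y and label
  stat_bin(
    aes(y = after_stat(count), label = after_stat(count)),
    geom = "text", binwidth = 1, vjust = -0.5
  ) +
  theme_minimal()
p

# Counts per rounded log value, as a quick check on what the labels summarise
counts <- table(round(log(df$amount)))
```

The text layer has to use the same binwidth as the histogram layer, otherwise the labels and bars are computed from different bins.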
Or perhaps simpler, you could pre-calculate the numbers:
library(dplyr)
df %>%
count(log_amt = round(log(amount))) %>%
ggplot(aes(log_amt, n, label = n)) +
geom_col(width = 1) +
geom_text(vjust = -0.5)
EDIT -- to show buckets without the log transform we could use:
df %>%
count(log_amt = round(log(amount))) %>%
ggplot(aes(log_amt, n, label = n)) +
geom_col(width = 0.5) +
geom_text(vjust = -0.5) +
# log() is the natural log, so back-transform the labels with exp() rather than 10^
scale_x_continuous(labels = ~scales::comma(exp(.)),
minor_breaks = NULL)

ggplot geom_col: making certain axis count integers rather than summing

I am currently making a hate crime case study. For my plot I am using one zip code as my y-axis and plotting how many crimes occurred and which group is being targeted on the x-axis using geom_col. The problem is that my y-axis is adding the zip codes together rather than counting how many times the zip code shows up. Here is what my dataset looks like:
structure(list(ID = 1:5, CRIME_TYPE = c("VANDALISM", "ASSAULT", "VANDALISM", "ASSAULT",
"OTHER"), BIAS_MOTIVATION_GROUP = c("ANTI-BLACK ",
"ANTI-BLACK ", "ANTI-FEMALE HOMOSEXUAL (LESBIAN) ",
"ANTI-MENTAL DISABILITY ", "ANTI-JEWISH "),
ZIP_CODE = c(40291L, 40219L, 40243L, 40212L, 40222L
)), row.names = c(NA, 5L), class = "data.frame")
Here is my code:
library(ggplot2)
df <- read.csv(file = "LMPD_OP_BIAS.csv", header = T)
library(tidyverse)
hate_crime <- df %>%
filter(ZIP_CODE == "40245")
hate_crime_plot <- hate_crime %>%
ggplot(., aes(x = BIAS_MOTIVATION_GROUP, y = ZIP_CODE, fill =
BIAS_MOTIVATION_GROUP)) +
geom_col() + labs(x = "BIAS_MOTIVATION_GROUP", fill = "BIAS_MOTIVATION_GROUP") +
theme_minimal() +
theme(axis.text.x=element_text (angle =45, hjust =1))
print(hate_crime_plot)
hate_crime_ploter <- hate_crime %>%
ggplot(., aes(x = UOR_DESC, y = ZIP_CODE, fill =
UOR_DESC)) +
geom_col() + labs(x = "UOR_DESC", fill = "UOR_DESC") +
theme_minimal() +
theme(axis.text.x=element_text (angle =45, hjust =1))
print(hate_crime_ploter)
For full data visit here: visit site to download data set
Alright, I think you've got a couple of issues here. What's happening in your code is that you're asking ggplot to make a bar plot with a categorical variable (BIAS_MOTIVATION_GROUP and UOR_DESC) on the x-axis and a continuous variable (ZIP_CODE) on the y-axis. Since there is more than one row per x-y combination, ggplot adds the y values together by x value, which is what you'd expect out of a bar plot. Long story short, I wonder if what you actually want is a histogram here. Your dataset (hate_crime) only has one value of ZIP_CODE, so I'm not sure what plotting ZIP on the y-axis is supposed to visualize. A histogram would look like this:
hate_crime %>%
ggplot(., aes(x = UOR_DESC, fill = UOR_DESC)) +
geom_histogram(stat = "count") +
labs(x = "UOR_DESC", fill = "UOR_DESC") +
theme_minimal() +
theme(axis.text.x=element_text (angle =45, hjust =1))
If, instead, you're trying to visualize how often each ZIP code shows up in each category, you'd have to approach things differently. Perhaps you're looking for something like this?
df %>%
ggplot(aes(x = UOR_DESC, fill = factor(ZIP_CODE))) +
geom_histogram(stat = "count") +
theme(axis.text.x=element_text (angle =45, hjust =1))
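If you would rather keep geom_col, another option is to count the rows per category first and plot that count on the y-axis. This is only a sketch: the crimes data frame below is a made-up stand-in for the filtered hate_crime data, and n_crimes is a column name chosen for illustration.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the data after filtering to one ZIP code
crimes <- data.frame(
  BIAS_MOTIVATION_GROUP = c("ANTI-BLACK", "ANTI-BLACK", "ANTI-JEWISH"),
  ZIP_CODE = c(40245L, 40245L, 40245L)
)

# One row per group, with the number of incidents in n_crimes
bias_counts <- crimes %>%
  count(BIAS_MOTIVATION_GROUP, name = "n_crimes")

p <- ggplot(bias_counts,
            aes(x = BIAS_MOTIVATION_GROUP, y = n_crimes,
                fill = BIAS_MOTIVATION_GROUP)) +
  geom_col() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
p
```

Here the y-axis shows how often each group occurs, instead of the sum of the ZIP code values.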

Show only data labels for every N day and highlight a line for a specific variable in R ggplot

I'm trying to figure out two problems in R ggplot:
Show only data labels for every N day/data point
Highlight (make the line bigger and/or dotted) for a specific variable
My code is below:
ggplot(data = sales,
aes(x = dates, y = volume, colour = Country, size = ifelse(Country=="US", 1, 0.5), group = Country)) +
geom_line() +
geom_point() +
geom_text(data = sales, aes(label=volume), size=3, vjust = -0.5)
I can't find a way to space out the data labels. Currently they are shown for every data point on every day, which makes the plot very hard to read.
As for #2, unfortunately, setting the size with ifelse doesn't work: the 'US' line becomes huge and I can't change its size no matter what I specify as the first value in ifelse.
Would appreciate any help!
As no data was provided the solution is probably not perfect, but nonetheless shows you the general approach. Try this:
sales_plot <- sales %>%
# Create label
# e.g. assuming dates are in Date-Format labels are "only" created for even days
mutate(label = ifelse(lubridate::day(dates) %% 2 == 0, volume, ""))
ggplot(data = sales_plot,
# To adjust the size: Simply set labels. The actual size is set in scale_size_manual
aes(x = dates, y = volume, colour = Country, size = ifelse(Country == "US", "US", "other"), group = Country)) +
geom_line() +
geom_point() +
geom_text(aes(label = label), size = 3, vjust = -0.5) +
# Set the size according the labels
scale_size_manual(values = c(US = 2, other = .9))
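If the labels should appear on every Nth data point rather than on even calendar days, the label column can be built from the row position within each country instead. A minimal sketch, assuming a hypothetical sales data frame with the column names used in the question:

```r
library(dplyr)
library(ggplot2)

# Hypothetical data: 9 days for each of two countries
sales <- data.frame(
  dates   = rep(seq(as.Date("2021-01-01"), by = "day", length.out = 9), 2),
  Country = rep(c("US", "other"), each = 9),
  volume  = c(10:18, 5:13)
)

n <- 3  # label every 3rd point

sales_lab <- sales %>%
  group_by(Country) %>%
  arrange(dates, .by_group = TRUE) %>%
  # Keep the label on rows 1, 4, 7, ... of each country; blank it elsewhere
  mutate(label = ifelse(row_number() %% n == 1, as.character(volume), "")) %>%
  ungroup()

p <- ggplot(sales_lab, aes(x = dates, y = volume, colour = Country, group = Country)) +
  geom_line() +
  geom_point() +
  geom_text(aes(label = label), size = 3, vjust = -0.5)
p
```

Changing n controls how sparse the labels are, independent of how the dates fall in the calendar.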

How to graph two sets of data with lines and two *different* point symbols with *distinguishable* data point symbols in legend?

I have been trying to plot a graph of two sets of data with different point symbols and connecting lines with different colors using the R package ggplot2, but for the life of me, I have not been able to get the legend to correctly distinguish between the two curves by showing the associated data point symbol for each curve.
I can get the legend to show different line colors. But I have not been able to make the legend to show different data point symbols for each set of data.
The following code:
df <- data.frame( thrd_cnt=c(1,2,4,8,16),
runtime4=c(53,38,31,41,54),
runtime8=c(54,35,31,35,44))
library("ggplot2")
print(
ggplot(data = df, aes(x = thrd_cnt)) +
geom_line(aes(y = runtime4, color = "4 cores")) +
geom_point(aes(y = runtime4, color = "4 cores"), fill = "white",
size = 3, shape = 21) +
geom_line(aes(y = runtime8, color = "8 cores")) +
geom_point(aes(y = runtime8, color = "8 cores"), fill = "white",
size = 3, shape = 23) +
xlab("Number of Threads") +
ylab(substitute(paste("Execution Time, ", italic(milisec)))) +
scale_x_continuous(breaks=c(1,2,4,8,16)) +
theme(legend.position = c(0.3, 0.8)) +
labs(color="# cores")
)
## save a pdf and a png
ggsave("runtime.pdf", width=5, height=3.5)
ggsave("runtime.png", width=5, height=3.5)
outputs this graph:
[Figure: plot produced by the code above]
But the data point symbols in the legend are not distinguishable. The legend shows the same symbol for both graphs (which is formed of both data point symbols on top of each other).
One possible solution is to define the number of threads as a factor, then I might be able to get the data point symbols on the legend right, but still I don't know how to do that.
Any help would be appreciated.
As noted, you need to gather the data into a long format so you can map the cores variable to colour and shape. To keep the same choices of shape and fill as in your original plot, use scale_shape_manual to set the shape corresponding to each level of cores. Note that you need to set the name for both the colour and shape legends in labs() to ensure they coincide and don't produce two legends. I also used mutate so that the levels of cores don't confusingly include the word runtime.
df <- data.frame( thrd_cnt=c(1,2,4,8,16),
runtime4=c(53,38,31,41,54),
runtime8=c(54,35,31,35,44))
library(tidyverse)
ggplot(
data = df %>%
gather(cores, runtime, runtime4, runtime8) %>%
mutate(cores = str_c(str_extract(cores, "\\d"), " cores")),
mapping = aes(x = thrd_cnt, y = runtime, colour = cores)
) +
geom_line() +
geom_point(aes(shape = cores), size = 3, fill = "white") +
scale_x_continuous(breaks = c(1, 2, 4, 8, 16)) +
scale_shape_manual(values = c("4 cores" = 21, "8 cores" = 23)) +
theme(legend.position = c(0.3, 0.8)) +
labs(
x = "Number of Threads",
y = "Execution Time (millisec)",
colour = "# cores",
shape = "# cores"
)
Created on 2018-04-10 by the reprex package (v0.2.0).
Mapping shape as well as colour works too, and if you're doing more with df, it might make sense to convert it to long, 'tidy' format and keep it that way.
library("ggplot2")
library("tidyr")  # provides gather() and re-exports %>%
df <- data.frame( thrd_cnt=c(1,2,4,8,16),
runtime4=c(53,38,31,41,54),
runtime8=c(54,35,31,35,44))
df <- df %>% gather("runtime", "millisec", 2:3)
ggplot(data = df, aes(x = thrd_cnt, y = millisec, color = runtime, shape =
runtime)) + geom_line() + geom_point()
After gathering into a "long" formatted data frame, you pass colour and shape (pch) as aesthetics:
library(tidyverse)
df <- data.frame( thrd_cnt=c(1,2,4,8,16),
runtime4=c(53,38,31,41,54),
runtime8=c(54,35,31,35,44))
df %>% gather(key=run, value=time, -thrd_cnt) %>%
ggplot(aes(thrd_cnt, time, pch=run, colour=run)) + geom_line() + geom_point()
(Notice how brief the code is, compared to the original post)
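In tidyr 1.0.0 and later, pivot_longer() is the recommended replacement for gather(). A sketch of the same reshaping with it, using the data from the question (the names cores and runtime for the new columns are my own choice):

```r
library(tidyr)
library(ggplot2)

df <- data.frame(thrd_cnt = c(1, 2, 4, 8, 16),
                 runtime4 = c(53, 38, 31, 41, 54),
                 runtime8 = c(54, 35, 31, 35, 44))

# Reshape to long format; names_prefix strips "runtime" so cores is "4" or "8"
df_long <- pivot_longer(
  df,
  cols = c(runtime4, runtime8),
  names_to = "cores",
  names_prefix = "runtime",
  values_to = "runtime"
)

p <- ggplot(df_long, aes(x = thrd_cnt, y = runtime, colour = cores, shape = cores)) +
  geom_line() +
  geom_point(size = 3) +
  labs(colour = "# cores", shape = "# cores")
p
```

Giving the colour and shape legends the same title lets ggplot2 merge them into a single legend with a distinct symbol per curve.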

R: prevent break in line showing time series data using ggplot geom_line

Using ggplot2 I want to draw a line that changes colour after a certain date. I expected this to be simple, but I get a break in the line at the point where the colour changes. Initially I thought this was a problem with group (as per this question; this other question also looked relevant but wasn't quite what I needed). Having messed around with the group aesthetic for 30 minutes I can't fix it, so if anybody can point out the obvious mistake...
Code:
require(ggplot2)
set.seed(1111)
mydf <- data.frame(mydate = seq(as.Date('2013-01-01'), by = 'day', length.out = 10),
y = runif(10, 100, 200))
mydf$cond <- ifelse(mydf$mydate > '2013-01-05', "red", "blue")
ggplot(mydf, aes(x = mydate, y = y, colour = cond)) +
geom_line() +
scale_colour_identity(mydf$cond) +
theme()
If you set group=1, then 1 will be used as the group value for all data points, and the line will join up.
ggplot(mydf, aes(x = mydate, y = y, colour = cond, group=1)) +
geom_line() +
scale_colour_identity(mydf$cond) +
theme()
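To see why the line breaks, it can help to look at the groups ggplot2 assigns internally. The sketch below uses layer_data() to compare the two plots; scale_colour_identity() is called without arguments here, which is a simplification of the code above.

```r
library(ggplot2)

set.seed(1111)
mydf <- data.frame(mydate = seq(as.Date("2013-01-01"), by = "day", length.out = 10),
                   y = runif(10, 100, 200))
mydf$cond <- ifelse(mydf$mydate > as.Date("2013-01-05"), "red", "blue")

# Without an explicit group, the discrete colour variable defines the groups
p_split <- ggplot(mydf, aes(x = mydate, y = y, colour = cond)) +
  geom_line() +
  scale_colour_identity()

# With group = 1, every row belongs to the same group
p_joined <- ggplot(mydf, aes(x = mydate, y = y, colour = cond, group = 1)) +
  geom_line() +
  scale_colour_identity()

groups_split  <- unique(layer_data(p_split)$group)
groups_joined <- unique(layer_data(p_joined)$group)
```

In the first plot each colour forms its own group and therefore its own line, so nothing connects the last blue point to the first red one. In the second plot there is a single group, so the line is continuous.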
