I have a point plot with two different points for each category, and I want to draw a line segment joining the two points on each row.
items %>%
group_by(category) %>%
summarise(med_buy_price = mean(buy_value, na.rm = TRUE),
med_sell_price = mean(sell_value, na.rm = TRUE)) %>%
pivot_longer(cols = c("med_buy_price", "med_sell_price"),
names_to = "measure",
values_to = "value") %>%
ggplot(aes(x = value, y = category)) +
geom_point(aes(color = measure), size = 3)
To draw a line segment you need a start point and an end point for each segment, so you can keep the data in wide format; pivot_longer is not needed.
Then add a separate geom_point for the buy value and the sell value, plus a geom_segment joining the two points.
This code will work:
library(ggplot2)
library(dplyr)
library(tibble)
library(tidyr)
items <- tribble(
~category, ~buy_value, ~sell_value,
"Wallpaper", 2000, 5200,
"Usables", 500, 12500,
"Umbrellas", 200, 1800
)
items %>%
group_by(category) %>%
summarise(med_buy_price = mean(buy_value, na.rm = TRUE),
med_sell_price = mean(sell_value, na.rm = TRUE)) %>%
ggplot() +
geom_point(aes(x = med_buy_price, y = category), size = 3, color = "red")+
geom_point(aes(x = med_sell_price, y = category), size = 3, color = "green")+
geom_segment(aes(x = med_buy_price, xend = med_sell_price, y = category, yend = category))
If you do not insist on using geom_point, you could try geom_errorbar, which simplifies things a little:
items %>%
group_by(category) %>%
summarise(med_buy_price = mean(buy_value, na.rm = TRUE),
med_sell_price = mean(sell_value, na.rm = TRUE)) %>%
ggplot(aes(xmin=med_buy_price,xmax=med_sell_price, y = category)) +
geom_errorbar(width=0.1)
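If you would rather keep the long format from the question (and with it the colour legend for measure), one possible alternative is to let geom_line connect the two points of each category through the group aesthetic. This is only a minimal sketch and not part of the answers above; it reuses the example items data and, as an assumption, pivots the raw buy_value and sell_value columns directly because each category has a single row. The names items_long and p_long are made up for the illustration.

```r
library(ggplot2)
library(tidyr)
library(tibble)

# Same example data as in the answer above
items <- tribble(
  ~category, ~buy_value, ~sell_value,
  "Wallpaper", 2000, 5200,
  "Usables", 500, 12500,
  "Umbrellas", 200, 1800
)

# Long format: one row per category and measure
items_long <- pivot_longer(items,
                           cols = c(buy_value, sell_value),
                           names_to = "measure",
                           values_to = "value")

# group = category makes geom_line join the two points within each category
p_long <- ggplot(items_long, aes(x = value, y = category)) +
  geom_line(aes(group = category)) +
  geom_point(aes(color = measure), size = 3)
p_long
```

Drawing the line before the points keeps the coloured points on top of the connecting segment.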
I want to make an interactive scatterplot in which:
I can highlight individual points,
a tooltip shows me the id, and
I can search for a specific id with a selectize widget.
I tried for some time with plotly and ended up with this code:
library(tidyverse)
library(plotly)
set.seed(1)
dat <- tibble(id = LETTERS[1:10],
trt = factor(rep(0:1, 5)),
x = rnorm(10),
y = x + rnorm(10, sd = 0.2)) %>%
highlight_key(~id)
dat %>%
{ggplot(., aes(x = x, y = y, group = id, color = trt)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed")} %>%
ggplotly(tooltip = c("id")) %>%
highlight(on = "plotly_hover", selectize = TRUE)
It took me a very long time to understand that the order of the geoms seems to be important:
## no color, geom order reversed
## selectize.js widget is completely missing
dat %>%
{ggplot(., aes(x = x, y = y, group = id)) +
geom_hline(yintercept = 0, linetype = "dashed") +
geom_point()} %>%
ggplotly(tooltip = c("id")) %>%
highlight(on = "plotly_hover", selectize = TRUE)
## color by trt, geom order reversed
## selectize.js widget only works for data where trt = 0
dat %>%
{ggplot(., aes(x = x, y = y, group = id, color = trt)) +
geom_hline(yintercept = 0, linetype = "dashed") +
geom_point()} %>%
ggplotly(tooltip = c("id")) %>%
highlight(on = "plotly_hover", selectize = TRUE)
Can somebody explain this strange behavior? What if I would like to reverse the order of the geoms, i.e. have the hline plotted behind the points?
I'm using ggplot2 to draw a histogram for two weight variables in my dataframe. The dataframe has two columns, a column with the case name, caso, and a value column named peso. I have 3000 cases for each, and when I put the histograms side by side with the facet_wrap option they show correctly:
df |>
pivot_longer(cols = c(peso,peso2), names_to = "caso", values_to = "peso") |>
ggplot(aes(x = peso, colour= caso, fill = caso))+
geom_histogram(alpha = 0.4) +
facet_wrap(~caso)
But when I try to overlay the two histograms in the same panel, the first one seems to have double the number of cases, and the histograms are unequal in size:
df |>
pivot_longer(cols = c(peso,peso2), names_to = "caso", values_to = "peso") |>
ggplot(aes(x = peso, colour= caso, fill = caso))+
geom_histogram(alpha = 0.4)
I don't know what I'm doing wrong. Any advice? Thanks in advance!
Juan
The default of geom_histogram is to stack multiple series. The "identity" position scheme should fix this:
library(tidyverse)
df <- data.frame(peso = rnorm(1000, 250, 10),
peso2 = rnorm(1000, 260, 10))
df %>%
pivot_longer(everything()) %>%
ggplot(aes(x = value, fill = name)) +
geom_histogram(position = "identity", alpha = 0.5)
# geom_histogram(position = position_identity(), alpha = 0.5) # alternate syntax
In the help page ?geom_histogram, the "Usage" section shows position = "stack" as the default. FWIW, geom_freqpoly defaults to "identity".
geom_histogram(
mapping = NULL,
data = NULL,
stat = "bin",
position = "stack", #### HERE
...,
binwidth = NULL,
bins = NULL,
na.rm = FALSE,
orientation = NA,
show.legend = NA,
inherit.aes = TRUE
)
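To see what stacking does to the bar heights, one option is to compare the data ggplot2 computes for each layer. The sketch below is an illustration under assumptions: the simulated df_long, the choice of bins = 30, and the object names are all made up here; layer_data() is the ggplot2 helper that returns the computed layer data.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for the pivoted data: 1000 values per series
df_long <- data.frame(
  name  = rep(c("peso", "peso2"), each = 1000),
  value = c(rnorm(1000, 250, 10), rnorm(1000, 260, 10))
)

p_stack <- ggplot(df_long, aes(x = value, fill = name)) +
  geom_histogram(bins = 30)                        # default position = "stack"
p_identity <- ggplot(df_long, aes(x = value, fill = name)) +
  geom_histogram(bins = 30, position = "identity") # bars drawn from zero

bars_stack    <- layer_data(p_stack)
bars_identity <- layer_data(p_identity)

# The binned counts per series are the same in both plots ...
tapply(bars_stack$count, bars_stack$group, sum)
tapply(bars_identity$count, bars_identity$group, sum)

# ... but with stacking one series sits on top of the other,
# so the tallest bar top (ymax) is higher than with "identity"
max(bars_stack$ymax)
max(bars_identity$ymax)
```

With "identity" each bar's ymax equals its own count, which is why the two overlaid histograms end up with comparable heights.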
I'm having some trouble with my code. I'm a complete beginner in R, so I would appreciate some help. I have a dataframe and I need to make a histogram and then highlight some points, but I cannot work out how to find those points in my dataset. Here is an example of what I have.
x <- c("a","b","c","d","f","g","h","i","j","k")
y <- c(197421,77506,130474,18365,30470,22518,70183,15378,29747,11148)
z <- data.frame(x,y)
hist(z$y)
For example, how can I find where the "a" and "h" values are placed in the histogram? And in a barplot? I tried the points function, but I cannot find the coordinates. Please let me know how I could do that. Thanks in advance.
Here is a way with dplyr and ggplot2. The approach is to cut the y variable into bins and then use summarise to create the counts and the labels.
library(dplyr)
library(ggplot2)
z %>%
mutate(bins = cut(y, seq(0, 200000, 50000))) %>%
group_by(bins) %>%
summarise(xes = paste0(x, collapse = ", "),
count = n()) %>%
ggplot() +
geom_bar(aes(x = bins, y = count), stat = "identity", color = "black", fill = "grey") +
geom_text(aes(x = bins, y = count + 0.5, label = xes)) +
xlab("y")
Here is a more complicated way that makes a plot that looks more like what hist() produces.
library(tidyr)    # for separate()
library(stringr)  # for str_remove()
z2 <- z %>%
mutate(bins = cut(y, seq(0, 200000, 50000))) %>%
group_by(bins) %>%
summarise(xes = paste0(x, collapse = ", "),
count = n()) %>%
separate(bins, into = c("start", "end"), sep = ",") %>%
mutate(across(start:end, ~as.numeric(str_remove(., "\\(|\\]"))))
ggplot() +
geom_histogram(data = z, aes(x = y), breaks = seq(0, 200000, 50000),
color = "black", fill = "grey") +
geom_text(data = z2, aes(x = (start + end) / 2, y = count + 0.5, label = xes))
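For completeness, here is a rough base-R sketch of the same idea, in case you want to stay with hist() and barplot(). It is not part of the answer above: it relies on hist() returning its break points and on barplot() returning the bar midpoints, and the names h, bar_mids and sel are made up for the illustration.

```r
x <- c("a","b","c","d","f","g","h","i","j","k")
y <- c(197421,77506,130474,18365,30470,22518,70183,15378,29747,11148)
z <- data.frame(x, y)

# Histogram: compute the bins without plotting, then assign each value to a bin
h <- hist(z$y, plot = FALSE)
z$bin <- cut(z$y, breaks = h$breaks, include.lowest = TRUE)

sel <- z$x %in% c("a", "h")
z[sel, ]                      # shows which bin "a" and "h" fall into

# Mark their positions on the histogram's x axis
hist(z$y)
abline(v = z$y[sel], col = "red", lty = 2)

# Barplot: barplot() returns the x coordinates of the bar midpoints
bar_mids <- barplot(z$y, names.arg = z$x)
points(bar_mids[sel], z$y[sel], col = "red", pch = 19)
```

In the histogram the x axis is the value itself, so a vertical line at y is enough; in the barplot the x positions are only known from the value that barplot() returns.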
I've spent the past few days looking through so many forums and sites, so I hope you can help.
You can find the data I've been using here, as well as the three model predictions.
I'm predicting subjective well-being (i.e. positive affect, negative affect, and life satisfaction) from last night's person-centered sleep satisfaction. I came up with three models that I now want to plot next to each other. The problem is that facet_wrap puts the models next to each other alphabetically and not how I want them (positive affect, negative affect, and life satisfaction).
You can view my current graph here
This is my code to get the graph going:
library("afex")
library("tidyverse")
library("tidylog")
theme_set(theme_bw(base_size = 15))
library("sjPlot")
d3 <- read.csv("d3.csv")
d3 <- d3 %>%
group_by(ID) %>%
mutate(SD_person_centred = sleepDur - mean(sleepDur, na.rm = TRUE)) %>%
mutate(sleep_satisfaction_person_centred = Sleep_quality_open - mean(Sleep_quality_open, na.rm = TRUE)) %>%
mutate(MS_person_centred = mid_sleep_modified - mean(mid_sleep_modified, na.rm = TRUE)) %>%
mutate(MS_person_freeday_centred = abs(mid_sleep_modified -
mean(mid_sleep_modified[Routine_work_day_open == "No"], na.rm = TRUE))) %>%
mutate(MS_person_mctq_centred = abs(mid_sleep_modified - MCTQ_MSF_number)) %>%
mutate(sleep_onset_person_centred = Sleep_Onset_open - mean(Sleep_Onset_open, na.rm = TRUE)) %>%
mutate(sleep_efficiency_person_centred = SleepEfficiency_act - mean(SleepEfficiency_act, na.rm = TRUE)) %>%
ungroup
m_p_sls_1 <- readRDS("m_p_sls_1.rds")
m_n_sls_1 <- readRDS("m_n_sls_1.rds")
m_s_sls_1 <- readRDS("m_s_sls_1.rds")
tmp <- get_model_data(m_p_sls_1$full_model, type = "pred", terms = "sleep_satisfaction_person_centred")
tmp$DV <- "positive_affect"
tmp2 <- get_model_data(m_n_sls_1$full_model, type = "pred", terms = "sleep_satisfaction_person_centred")
tmp2$DV <- "negative_affect"
tmp3 <- get_model_data(m_s_sls_1$full_model, type = "pred", terms = "sleep_satisfaction_person_centred")
tmp3$DV <- "life_satisfaction"
tmp <- bind_rows(tmp, tmp2, tmp3)
tmp
tmp$DV
Here I change tmp$DV into a factor as this was the solution I found online. However, this did not change anything:
tmp$DV <- factor(tmp$DV, levels=c("positive_affect","negative_affect","life_satisfaction"))
levels(tmp$DV)
This is my code for the graph:
variable_names <- list(
"positive_affect" = "positive affect" ,
"negative_affect" = "negative affect",
"life_satisfaction" = "life satisfaction"
)
variable_labeller <- function(variable,value){
return(variable_names[value])
}
d3 %>%
pivot_longer(cols="positive_affect":"life_satisfaction", names_to = "DV", values_to = "Score") %>%
ggplot(aes(x = sleep_satisfaction_person_centred, y = Score)) +
geom_ribbon(data = tmp, aes(x = x, ymin = conf.low, ymax = conf.high, y = predicted),
fill = "lightgrey") +
geom_line(data = tmp, aes(x = x, y = predicted, group = 1)) +
geom_point(alpha = 0.2) +
facet_wrap(~DV, scales = "free_y",labeller=variable_labeller) +
labs(y = "Score",
x = "Sleep satisfaction person centered")
When I give the factor of tmp$DV a different name, i.e. tmp$facet and add this to my code, I do get the right order, but the scales are not free on the y-axis anymore. Please have a look here.
tmp$facet <- factor(tmp$DV, levels=c("positive_affect", "negative_affect", "life_satisfaction"))
d3 %>%
pivot_longer(cols="positive_affect":"life_satisfaction", names_to = "DV", values_to = "Score") %>%
ggplot(aes(x = sleep_satisfaction_person_centred, y = Score)) +
geom_ribbon(data = tmp, aes(x = x, ymin = conf.low, ymax = conf.high, y = predicted),
fill = "lightgrey") +
geom_line(data = tmp, aes(x = x, y = predicted, group = 1)) +
geom_point(alpha = 0.2) +
facet_wrap(~facet, scales = "free_y",labeller=variable_labeller) +
labs(y = "Score",
x = "Sleep satisfaction person centered")
When I change names_to in the pivot_longer call (the first line of the pipeline) to "facet", I get the same graph as the one before.
Sorry for the long post, but I tried to be as clear as possible. Please let me know if I wasn't.
I'd appreciate any kind of hints. Thanks a lot for your time.
All the best,
Anita
Just got the answer from my colleague Henrik Singmann, in case anybody was wondering:
d3 %>%
pivot_longer(cols="positive_affect":"life_satisfaction", names_to = "DV", values_to = "Score") %>%
mutate(DV = factor(DV, levels=c("positive_affect","negative_affect","life_satisfaction"))) %>%
ggplot(aes(x = sleep_satisfaction_person_centred, y = Score)) +
geom_ribbon(data = tmp, aes(x = x, ymin = conf.low, ymax = conf.high, y = predicted),
fill = "lightgrey") +
geom_line(data = tmp, aes(x = x, y = predicted, group = 1)) +
geom_point(alpha = 0.2) +
facet_wrap(~DV, scales = "free_y",labeller=variable_labeller) +
labs(y = "Score",
x = "Sleep satisfaction person centered")
So the factor needs to be defined in d3 before being handed over to ggplot.
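A stripped-down illustration of the same point, using made-up data rather than d3 and the model predictions: the faceting variable is turned into a factor with the desired level order in every data frame that is passed to a layer. As far as I understand it, if only the predictions carry the factor while the points still have a character column, the panels fall back to alphabetical order, which matches what was seen above. The names obs, pred, dv_levels and p_facets are hypothetical.

```r
library(ggplot2)

dv_levels <- c("positive_affect", "negative_affect", "life_satisfaction")

# Made-up stand-ins for the pivoted d3 (observations) and tmp (predictions)
obs <- data.frame(
  DV    = rep(dv_levels, each = 4),
  x     = rep(1:4, times = 3),
  Score = c(3, 4, 4, 5,  2, 2, 1, 1,  50, 55, 60, 70)
)
pred <- data.frame(
  DV        = rep(dv_levels, each = 2),
  x         = rep(c(1, 4), times = 3),
  predicted = c(3, 5,  2, 1,  50, 70)
)

# Same factor levels, in the desired panel order, in both data frames
obs$DV  <- factor(obs$DV,  levels = dv_levels)
pred$DV <- factor(pred$DV, levels = dv_levels)

p_facets <- ggplot(obs, aes(x = x, y = Score)) +
  geom_line(data = pred, aes(y = predicted)) +
  geom_point(alpha = 0.5) +
  facet_wrap(~DV, scales = "free_y")
p_facets
```

Because both layers share the column DV, scales = "free_y" still applies per panel, unlike the attempt with a separate facet column that existed in only one of the data frames.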
library(dplyr)
library(tidyr)
library(ggplot2)
library(scales)
# Generate data
df_durations = data.frame(x = as.character(1:100)) %>%
mutate(linesize = runif(n = n()),
linesize = linesize / sum(linesize),
linesize = linesize / min(linesize)) %>%
mutate(start_x1 = as.Date(sample(as.Date('2018-01-01'):as.Date('2018-04-01'), size = n(), replace = T), origin='1970-01-01'),
end_x1 = start_x1 + 20 + sample(-5:5, size = n(), replace = T),
start_x2 = end_x1 + sample(-5:10, size = n(), replace = T),
end_x2 = start_x2 + 15 + sample(-5:5, size = n(), replace = T),
start_x3 = end_x2 + sample(-5:10, size = n(), replace = T),
end_x3 = start_x1 + 30 + sample(-10:10, size = n(), replace = T)) %>%
arrange(start_x1) %>%
mutate(x = factor(x, levels = x, ordered = T),
fontsize = round(runif(n(), 5, 12)),
colour = sample(c('black', 'red', 'blue'), n(), replace = T),
location = cumsum(fontsize))
# Pivot data for plotting
df_durations_long = df_durations %>%
gather(key, value, contains('start'), contains('end')) %>%
separate(key, c('id', 'activity'), '_') %>%
spread(id, value)
# Plot
df_durations_long %>%
ggplot(aes(x=location)) +
geom_linerange(aes(ymin=start, ymax=end, colour=activity, size=linesize), alpha=.5) +
scale_y_date(date_labels = '%b-%Y', date_breaks = '2 month') +
scale_x_continuous(breaks = df_durations$location, labels = df_durations$x) +
theme(axis.text.y = element_text(colour = df_durations$colour,
size = df_durations$fontsize)) +
coord_flip()
The code above generates the following chart.
As you can see, we have both overlapping and non-overlapping horizontal bars (intended).
However, there is also overlap between vertically stacked bars (not intended).
What I am looking for is a way to introduce an equal amount of spacing between the bars while keeping the relative widths of all the bars intact.
I tried working with scale_size_identity, but I haven't managed to get the formatting right.