ggplot2 how to get a monochromatic scale - r

I have a dataframe with plenty of columns, the first one is datetimes and the rest are a path of stock prices. I intend to plot them all at once so I can see what the overall simulation does. To do this, I used:
y %>%
melt(id.vars = "hora",
variable.name = "series" ) %>%
ggplot(aes(x = hora, y = value))+
geom_line(aes(color = series))+
xlab("")+
ylab("")+
theme(legend.position = "none")
Where y is my dataframe, "hora" is the datetime, "series" is each column of the original dataframe and "value" would be the stock price. This allowed me to plot all the stock prices at once.
You can replicate a similar dataset with:
library(tidyverse)
library(lubridate)
library(reshape2)
price.path <- function(p, t, meanr, sdr){
score <- c(0,rt(t, 3))
retorno <- meanr + sdr*score
prices <- p*exp(cumsum(retorno))
}
start_date <- today() + hms("9:30:00")
end_date <- today() + hms("16:00:00")
n <- interval(start_date,end_date)/minutes(5)-1
f <- start_date + 5*minutes(0:n)
p <- map(seq(1,100),
~price.path(
p = 20,
t = n,
meanr = 0.00005,
sdr = 0.0002))
y <- map_dfc(p, `[`)
y <- cbind(hora = f, y)
This worked as intended, but now I want to make this more pleasing to the eye. Since no particular path is relevant, I tried to give all lines the same color, but this broke the graph:
y %>%
melt(id.vars = "hora",
variable.name = "series" ) %>%
ggplot(aes(x = hora, y = value))+
geom_line(color = "red")+
xlab("")+
ylab("")+
theme(legend.position = "none")
(Any explanation as to why this happened is welcome.)
I figured the best approach would be to use a monochromatic color scale (e.g. colors ranging from a light blue to a darker blue), assigning one color on that scale to each individual path. But this was way harder than I expected: scale_color_gradient() didn't work because of this error:
Error: Discrete value supplied to continuous scale
I found some color palettes that worked, like viridis, but they did not mix well with my desired graph.
I found that scale_color_hue() gave something kind of similar to what I wanted, but it changes the hue while fixing the lightness, whereas I want to keep the hue while changing the lightness.
Also, scale_color_grey() worked pretty well too, but I want my graph to have a little more color. It would be nice if I could add some parameter to this function so it adds a color.
Any kind of help would be appreciated. If the way I tried to plot all the columns in the dataframe is not efficient, I would also like to know. Thanks in advance.
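On the two points raised in the question, a minimal sketch may help. When color is a fixed value rather than mapped to series, ggplot2 no longer has a discrete variable to split the rows by, so geom_line() joins every observation into one zigzagging path; adding group = series keeps the paths apart. For a light-to-dark scale, one option is to interpolate a palette with colorRampPalette() and hand it to scale_color_manual(). The toy data frame `long` and the two endpoint colors below are assumptions standing in for the melted `y`.

```r
library(ggplot2)

# Hypothetical toy data standing in for the melted `y`: 20 random-walk paths.
set.seed(1)
n_paths <- 20
n_steps <- 78
long <- data.frame(
  hora   = rep(seq_len(n_steps), times = n_paths),
  series = factor(rep(seq_len(n_paths), each = n_steps)),
  value  = as.vector(replicate(
    n_paths, 20 * exp(cumsum(rnorm(n_steps, 5e-5, 2e-4)))
  ))
)

# Same color for every path: `group` tells geom_line() where one path ends.
p_single <- ggplot(long, aes(x = hora, y = value, group = series)) +
  geom_line(color = "red", alpha = 0.4)

# One shade per path, interpolated from light blue to dark blue.
blues <- colorRampPalette(c("lightblue", "darkblue"))(nlevels(long$series))
p_mono <- ggplot(long, aes(x = hora, y = value, color = series)) +
  geom_line() +
  scale_color_manual(values = blues) +
  theme(legend.position = "none")
```

Printing p_single or p_mono draws the plot. The shades are assigned in factor-level order, so which path gets which shade is arbitrary here.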

Related

How to insert color code for two geom_step functions in the same grid

I am currently working in a comparison between two inventory levels and I want to plot two step graphs in the same grid with a color code. This is my code.
Intento1<-data.frame(Fecha, NivelI)
Intento2<-data.frame(Fecha, Nivel2)
#Printing the step graphs in one grid
ggplot()+geom_step(Intento1, mapping=aes(x=Fecha, y=NivelI))+geom_step(Intento2, mapping=aes(x=Fecha, y=Nivel2))
It works fine, plotting both graphs in the same grid. I could also add a different color to each graph, but I couldn't add the little colored labels (the legend) that normally appear at the right. All support is appreciated.
For example, with dummy data:
library(data.table) # for data.table()
dummy <- data.table(
Fecha = seq(as.Date("2020/1/1"), as.Date("2020/1/31"), "day")
)
dummy$NivelI = runif(31, 0, 10)
dummy$Nivel2 = runif(31, 0, 10)
Melting to long format (as reshape2::melt does) and plotting as below will work.
dummy %>%
melt(id.vars = "Fecha") %>%
ggplot(aes(Fecha, value, group = variable, color = variable)) +
geom_step() + guides(color = guide_legend(title = "aaa"))
In your case, to get data shaped like dummy, if Fecha, NivelI and Nivel2 are vectors, just try
df <- data.frame(
Fecha,
NivelI,
Nivel2
)
then
df %>%
melt(id.vars = "Fecha") %>%
ggplot(aes(Fecha, value, group = variable, color = variable)) +
geom_step() + guides(color = guide_legend(title = "aaa"))
where "aaa" will be your legend name.
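If the two data frames should stay separate, a legend can also be obtained by mapping color to a constant label inside aes() in each layer. The sketch below assumes made-up values for Fecha, NivelI and Nivel2, and the two colors are arbitrary choices.

```r
library(ggplot2)

# Hypothetical stand-ins for the question's vectors.
set.seed(1)
Fecha    <- seq(as.Date("2020-01-01"), as.Date("2020-01-31"), by = "day")
Intento1 <- data.frame(Fecha, NivelI = runif(31, 0, 10))
Intento2 <- data.frame(Fecha, Nivel2 = runif(31, 0, 10))

# A constant string inside aes() creates a legend entry for each layer.
p <- ggplot() +
  geom_step(data = Intento1, aes(x = Fecha, y = NivelI, color = "NivelI")) +
  geom_step(data = Intento2, aes(x = Fecha, y = Nivel2, color = "Nivel2")) +
  scale_color_manual(
    name   = "aaa",
    values = c(NivelI = "steelblue", Nivel2 = "firebrick")
  ) +
  labs(y = "value")
```

The melt approach scales better to many series; this variant is mainly convenient when there are only two or three layers.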

How to make funnel chart with bars in R ggplot2?

I want to make a funnel chart in R with ggplot2 as following:
https://chartio.com/assets/c15a30/tutorials/charts/funnel-charts/c7cd4465bc714689646515692b6dbe7c74ae7550a265cd2d6a530f1f34d68ae1/funnel-chart-example.png
My code looks like this, but I don't know how to do the light blue fills between the bars (maybe with polygon?).
library(ggplot2)
library(reshape2) # for melt()
library(dplyr)
# get data
dat <- read.table(text=
"steps numbers rate
clicks 332835 100.000000
signup 157697 47.379933
cart 29866 8.973215
buys 17012 5.111241",
header = T)
barWidth <- 0.9
# add spacing, melt, sort
total <- subset(dat, rate==100)$numbers
dat$padding <- (total - dat$numbers) / 2
molten <- melt(dat[, -3], id.var='steps')
molten <- molten[order(molten$variable, decreasing = T), ]
molten$steps <- factor(molten$steps, levels = rev(dat$steps))
ggplot(molten, aes(x=steps)) +
geom_bar(aes(y = value, fill = variable),
stat='identity', position='stack') +
geom_text(data=dat,
aes(y=total/2, label= paste(round(rate), '%')),
color='white') +
scale_fill_manual(values = c('grey40', NA) ) +
coord_flip() +
theme(legend.position = 'none') +
labs(x='steps', y='volume')
I needed the same but hadn't found one, so I created a function to do so. It might need some improvements, but it is working well. The example below shows only numbers, but you can also add texts.
x <- c(86307,
34494,
28127,
17796,
12488,
11233
)
source("https://gist.github.com/jjesusfilho/fd14b58becab4924befef5be239c6011")
gg_funnel(x, color = viridisLite::plasma(6))
This should be just a comment, since you explicitly asked for a ggplot solution, which this is not - I posted it as an answer purely for reasons of code formatting.
You could consider plotly, which has a funnel type. Something like
library(plotly)
dat %>% mutate(steps=factor(steps, unique(steps)),
rate=sprintf("%.2f%%", rate)) %>%
plot_ly(
type = "funnel",
y = ~steps,
text= ~rate,
x = ~numbers)
could get you started; I do not really grasp the padding you have in your data, so this might not be exactly what you want.
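The polygon idea from the question can be sketched in ggplot2 itself: draw each bar as a centered rectangle on a numeric axis, then fill the gap between consecutive bars with a four-cornered geom_polygon(). This is only one possible layout; the bar half-height (`half`) and the colors are assumptions, and the rate labels are left out.

```r
library(ggplot2)

dat <- data.frame(
  steps   = c("clicks", "signup", "cart", "buys"),
  numbers = c(332835, 157697, 29866, 17012)
)
n    <- nrow(dat)
half <- 0.3                    # assumed half-height of each bar
dat$ypos <- rev(seq_len(n))    # first step is drawn at the top

# Bars centered on x = 0.
bars <- data.frame(
  xmin = -dat$numbers / 2, xmax = dat$numbers / 2,
  ymin = dat$ypos - half,  ymax = dat$ypos + half
)

# One trapezoid per gap: lower edge of bar k down to upper edge of bar k + 1.
traps <- do.call(rbind, lapply(seq_len(n - 1), function(k) {
  data.frame(
    id = k,
    x  = c(-dat$numbers[k], dat$numbers[k],
           dat$numbers[k + 1], -dat$numbers[k + 1]) / 2,
    y  = c(rep(dat$ypos[k] - half, 2), rep(dat$ypos[k + 1] + half, 2))
  )
}))

p <- ggplot() +
  geom_polygon(data = traps, aes(x = x, y = y, group = id),
               fill = "lightblue") +
  geom_rect(data = bars,
            aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
            fill = "grey40") +
  scale_y_continuous(breaks = dat$ypos, labels = dat$steps) +
  labs(x = "volume", y = "steps")
```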

R ggplot2 Specify separate color gradients by group

I'm trying to make separate color gradients for grouped data that is displayed on the same scatterplot. I've included sample data below. User is unique user IDs, task is unique task IDs, days_completion is the time in days when the task was completed, task_group is the group indicator that the tasks are grouped into, and task_order is the order in which the tasks were made available for users to complete. Each row represents the time that the user completed a specific task. The task_order may not logically follow this organization as it was randomly generated, but it should suffice for demonstration.
The resulting plot would have days_completion on the x axis and user on the y axis; each point from geom_point would represent the time in days at which the user completed their task. The task groups would each have their own color in a gradient of dark to light by task_order. For example, task group 1 would be dark red at task order == 1 and light red at task order == 7.
Sample code is below:
library(dplyr)
library(forcats)
library(ggplot2)
test_data <- tibble(user = rep(seq(1:50), 10) %>%
as_factor(),
task = sample(1:10, 500, replace = TRUE) %>%
as_factor(),
days_completion = sample(1:500, 500, replace = FALSE),
task_group = sample(1:3, 500, replace = TRUE) %>%
as_factor(),
task_order = sample(1:7, 500, replace = TRUE, prob = c(rep(.25,3),.2,.2,.1,.1)) %>%
as_factor()) %>%
arrange(days_completion)
#Sample plotting approach; does not work
test_plot <- test_data %>%
ggplot(aes(x = days_completion, y = user, color = task)) +
geom_point() +
#This seems to be what I need, but I can't figure out how to specify multiple gradients by task_group
scale_color_gradient()
I know I could manually order the factors and map colors with hex codes, but I'd like something that can scale and avoid the manual process. Also, if anyone has any suggestions for how to display this plot other than a scatterplot, I'm open to suggestions. The main idea is to detect patterns in completion time in trends displayed by the color. The trends may not show due to it being randomly generated data, but that's okay.
My coworker found a solution in another post that requires an additional package called ggnewscale. I still don't know if this can be done only with ggplot2, but this works. I'm still open to alternative plotting suggestions though. The purpose is to detect any trends in day of completion across and within users. Across users is where I expect to see more of a trend, but within could be informative too.
How merge two different scale color gradient with ggplot
library(ggnewscale)
dat1 <- test_data %>% filter(task_group == 1)
dat2 <- test_data %>% filter(task_group == 2)
dat3 <- test_data %>% filter(task_group == 3)
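# task_order is a factor in the sample data; gradient scales need numeric input
ggplot(mapping = aes(x = days_completion, y = user)) +
geom_point(data = dat1, aes(color = as.integer(as.character(task_order)))) +
scale_color_gradientn(name = "group 1", colors = c('#99000d', '#fee5d9')) +
new_scale_color() +
geom_point(data = dat2, aes(color = as.integer(as.character(task_order)))) +
scale_color_gradientn(name = "group 2", colors = c('#084594', '#4292c6')) +
new_scale_color() +
geom_point(data = dat3, aes(color = as.integer(as.character(task_order)))) +
scale_color_gradientn(name = "group 3", colors = c('#238b45'))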
You can generate your own color scale with colorRampPalette() (a base R function from grDevices; RColorBrewer colors can serve as its input) and pass it to scale_color_manual:
library(RColorBrewer)
colo <- colorRampPalette(c("darkred", "orangered"))(10)
library(ggplot2)
ggplot(test_data, aes(x = days_completion, y = user))+
geom_point(aes(color = task))+
scale_color_manual(values = colo)
As for a representation other than a scatterplot, it is difficult to propose something else. It will depend on your original data and the question you are trying to answer. Do you need to see the pattern per user, or are your 50 users just replicates of your experiment? In that case, maybe geom_density could be helpful. Otherwise, you could take a look at the stat_contour function.
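A ggplot2-only way to get one dark-to-light ramp per group, without ordering colors by hand, is to combine group and order into a single factor and build a named palette for scale_color_manual(). The sketch below uses a simplified base R version of the sample data; the light endpoint colors for groups 2 and 3 and the dark green are assumptions, not values from the thread.

```r
library(ggplot2)

# Simplified stand-in for test_data.
set.seed(1)
test_data <- data.frame(
  user            = factor(rep(1:50, 10)),
  days_completion = sample(1:500),
  task_group      = factor(sample(1:3, 500, replace = TRUE), levels = 1:3),
  task_order      = factor(sample(1:7, 500, replace = TRUE), levels = 1:7)
)

# Dark and light endpoint per task_group (assumed colors).
ramps <- list(
  "1" = c("#99000d", "#fee5d9"),
  "2" = c("#084594", "#c6dbef"),
  "3" = c("#005a32", "#c7e9c0")
)

# Interpolate one color per task_order within each group, named "group.order".
orders <- levels(test_data$task_order)
pal <- unlist(lapply(names(ramps), function(g) {
  cols <- colorRampPalette(ramps[[g]])(length(orders))
  setNames(cols, paste(g, orders, sep = "."))
}))

# interaction() builds levels of the same "group.order" form.
test_data$group_order <- interaction(test_data$task_group, test_data$task_order)

p <- ggplot(test_data,
            aes(x = days_completion, y = user, color = group_order)) +
  geom_point() +
  scale_color_manual(values = pal)
```

Adding a group only requires one more entry in `ramps`. The price is a single long discrete legend instead of three separate color bars.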

How to plot the mean of a single factor in a barplot with

I'm having trouble creating a figure with ggplot2.
In this plot, I'm using geom_bar to plot three factors: for each "time" and "dose" I'm plotting two bars (two genotypes).
This is my code so far (I actually changed some settings, but I'm presenting just what is needed):
ggplot(data=data, aes(x=interaction(dose,time), y=b, fill=factor(genotype)))+
geom_bar(stat="identity", position="dodge")+
scale_fill_grey(start=0.3, end=0.6, name="Genotype")
Question: I want to add the mean of each time as points, placed in the middle of the bars for that time. How can I proceed?
I tried to add these points using geom_dotplot and geom_point, but did not succeed.
library(dplyr)
# name the summary column so it can be referenced in the plot
time_data <- data %>% group_by(time) %>% summarize(mean_b = mean(b))
data <- inner_join(data, time_data, by = "time")
This gives you data with the means attached. Now make the plot:
ggplot(data=data, aes(x=interaction(dose,time), y=b, fill=factor(genotype)))+
geom_bar(stat="identity", position="dodge")+
scale_fill_grey(start=0.3, end=0.6, name="Genotype")+
geom_text(aes(y = mean_b, label = round(mean_b, 1)), vjust = 0)
You might need to fiddle around with the hjust and vjust arguments in the geom_text call, and maybe the aes one too; I didn't run the code, so I don't know.
It generally helps if you can give a reproducible example. Here, I made some of my own data.
library(ggplot2)
library(dplyr) # for %>%, group_by() and summarise()
sampleData <-
data.frame(
dose = 1:3
, time = rep(1:3, each = 3)
, genotype = rep(c("AA","aa"), each = 9)
, b = rnorm(18, 20, 5)
)
You need to calculate the means somewhere, and I chose to do that on the fly. Note that, instead of using points, I used a line to show that the mean is for all of those values. I also sorted somewhat differently, and used facet_wrap to cluster things together. Points would be a fair bit harder to place, particularly when using position_dodge, but you could likely modify this code to accomplish that.
ggplot(
sampleData
, aes(x = dose
, y = b
, fill = genotype)
) +
geom_bar(position = "dodge", stat = "identity") +
geom_hline(data =
sampleData %>%
group_by(time) %>%
summarise(meanB = mean(b)
, dose = NA, genotype = NA)
, aes(yintercept = meanB)
, col = "black"
) +
facet_wrap(~time)
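If points are preferred over a line, one way to adapt the faceted plot is to compute one mean per time and place it at the middle dose value of each panel. The sketch below reuses the made-up sampleData; the fixed seed and the helper data frame timeMeans are assumptions for illustration.

```r
library(ggplot2)

set.seed(42)
sampleData <- data.frame(
  dose     = 1:3,
  time     = rep(1:3, each = 3),
  genotype = rep(c("AA", "aa"), each = 9),
  b        = rnorm(18, 20, 5)
)

# One mean per time, positioned at the central dose of each facet.
timeMeans <- aggregate(b ~ time, data = sampleData, FUN = mean)
names(timeMeans)[2] <- "meanB"
timeMeans$dose <- mean(unique(sampleData$dose))

p <- ggplot(sampleData, aes(x = dose, y = b)) +
  geom_col(aes(fill = genotype), position = "dodge") +
  geom_point(data = timeMeans, aes(y = meanB), size = 3) +
  facet_wrap(~time)
```

Because fill is mapped only in the bar layer, the point layer is not dodged and sits on the center of the panel.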

jitter geom_line()

Is there a way to jitter the lines in geom_line()? I know it kind of defeats the purpose of this plot, but if you have a plot with few lines and would like them all to show, it could be handy. Or maybe there is some other solution to this visibility problem.
Please see below for code:
A <- c(1,2,3,5,1)
B <- c(3,4,1,2,3)
id <- 1:5
df <- data.frame(id, A, B)
# install.packages("reshape2")
require(reshape2) # for melt
dfm <- melt(df, id=c("id"))
# install.packages("ggplot2")
require(ggplot2)
p1 <- ggplot(data = dfm, aes(x = variable, y = value, group = id,
color = as.factor(id))) + geom_line() +
labs(x = "id # 1 is hardly visible as it is covered by id # 5") +
scale_colour_manual(values = c('red','blue', 'green', 'yellow', 'black'))
p2 <- ggplot(subset(dfm, id != 5), aes(x = variable, y = value,
group = id, color = as.factor(id))) + geom_line() +
labs(x = "id # 5 removed, id # 1 is visible") +
scale_colour_manual(values = c('red','blue', 'green', 'yellow', 'black'))
# install.packages("gridExtra")
require(gridExtra) # for grid.arrange
grid.arrange(p1, p2)
You can try
geom_line(position=position_jitter(w=0.02, h=0))
and see if that works well.
If you just want to prevent two lines from overlapping exactly, there is now a better way: position_dodge(), which "adjusts position by dodging overlaps to the side". This is nicer than adding jitter to any line, even when it's not needed.
Avoid ggplot2 lines overlapping exactly using position_dodge()
Code example:
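library(reshape2) # for melt
library(ggplot2)
df <- data.frame(x=1:10, y=1:10, z=1:10)
df.m <- melt(df, id.vars = "x")
# the + must end the line, otherwise R evaluates ggplot(...) on its own
ggplot(df.m, aes(x=x, y=value, group=variable, colour=variable)) +
geom_line(position=position_dodge(width=0.2))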
Thanks to position_dodge(), we can now see that there are two lines in the plot, which just happen to coincide exactly.
I tend to use different linestyles, so that, say, a solid blue line "peeks through" a dashed red line on top of it.
Then again, it does depend on what you want to impart to the reader. Keep in mind first and foremost that data should be points and theory lines unless this makes things cluttered. Unless the y and x values are identical, it'll be easier to see the points. (or you could apply the existing jitter function to the x-values)
Next, if you just want to show which runs are in the "bundle" and which are outliers, overlap doesn't matter because it's very unlikely that two outliers will be near-equal.
If you want to show a bunch of near-equal runs, you may prefer (which is to say, your readers will understand better) to plot the deltas against a mean rather than the actual values.
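The last suggestion, plotting deltas against a mean, could look roughly like the sketch below. The three near-equal runs are made up, and ave() is used to get the mean across runs at each x.

```r
library(ggplot2)

# Hypothetical near-equal runs: the same trend plus small noise.
set.seed(1)
runs <- data.frame(
  x   = rep(1:10, times = 3),
  run = rep(c("a", "b", "c"), each = 10)
)
runs$value <- runs$x + rnorm(nrow(runs), sd = 0.05)

# Deviation of each run from the mean of all runs at the same x.
runs$delta <- runs$value - ave(runs$value, runs$x)

p <- ggplot(runs, aes(x = x, y = delta, colour = run)) +
  geom_line() +
  geom_hline(yintercept = 0, linetype = "dashed")
```

On the raw scale the three lines would sit almost on top of each other; on the delta scale the y axis spans only the noise, so they separate.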
I would like to suggest a solution to a different problem than described, in which the Y axis is a factor, so position_dodge does nothing.
code:
library(tidyverse)
time_raw <- tibble(year=1900:1909,
person_A=c(rep("Rome",2),rep("Jerusalem",8)),
person_B=c(rep("Jerusalem",5),rep("Rome",5)))
achievements <- tribble(~year,~who,~what,
1900,"person_A","born",
1900,"person_B","born",
1909,"person_A","died",
1909,"person_B","died",
1905,"person_A","super star",
1905,"person_B","super star")
SCALE=0.5
jitter_locations <- time_raw %>%
pivot_longer(-year,names_to="who",values_to="place") %>%
distinct(place)%>%
filter(!is.na(place)) %>%
mutate(y_place=seq_along(place))
jitter_lines <- time_raw %>%
pivot_longer(-year,names_to="who",values_to="place") %>%
distinct(who) %>%
mutate(y_jitter=scale(seq_along(who))*0.015)
data_for_plot <- time_raw %>%
pivot_longer(-year,names_to="who",values_to="place") %>%
filter(!is.na(place)) %>%
left_join(achievements) %>%
left_join(jitter_locations) %>%
left_join(jitter_lines)
data_for_plot %>%
ggplot(aes(x=year,y=y_place+y_jitter,color=who,group=who))+
geom_line(size=2)+
geom_hline(aes(yintercept=y_place),size=50,alpha=0.1)+
geom_point(data = . %>% filter(!is.na(what)),size=5)+
geom_label(aes(label=what),size=3,nudge_y = -0.025)+
theme_bw()+
coord_cartesian(ylim = c(min(jitter_locations$y_place)-0.5*SCALE,
max(jitter_locations$y_place)+0.5*SCALE))+
scale_y_continuous(breaks =
min(jitter_locations$y_place):max(jitter_locations$y_place),
labels = jitter_locations$place)+
scale_x_continuous(breaks =
min(data_for_plot$year):max(data_for_plot$year))+
ylab("Place")
