I have this code:
library(ggplot2)
x <- seq(-600, 600, length.out=10000)
dat1 <- data.frame(x=x, SD=400, val = (1/(1+10^(-x/400))))
dat2 <- data.frame(x=x, SD=200, val = (1/(1+10^(-x/200))))
dat3 <- data.frame(x=x, SD=600, val = (1/(1+10^(-x/600))))
dat <- rbind(dat1, dat2, dat3)
ggplot(data=dat, aes(x=x, y=val, colour=SD)) + geom_line(aes(group=SD))
What I expected is to have 3 curves and I do. However the legend shows that there are 6 curves - for SD 100, 200, 300, 400, 500, 600 instead of only 200, 400, 600. Why is that and how do I fix this?
The legend is not indicating the presence of 6 curves. You've mapped the continuous variable SD to the aesthetic colour, which results in a continuous colour scale, i.e. a gradient. If you want only the three values in the legend, try wrapping SD in factor:
ggplot(data=dat, aes(x=x, y=val, colour=factor(SD))) + geom_line(aes(group=SD))
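To see why the legend changes, it can help to look at the column itself. The following is a minimal sketch in base R that rebuilds the same data on a coarser grid (the grid size and the helper name make_curve are assumptions made only for this illustration):

```r
# Rebuild the data from the question on a coarser grid
# (assumption: the grid size does not matter for the legend)
x <- seq(-600, 600, length.out = 100)

# make_curve is a hypothetical helper, not part of the original code
make_curve <- function(sd) {
  data.frame(x = x, SD = sd, val = 1 / (1 + 10^(-x / sd)))
}
dat <- rbind(make_curve(400), make_curve(200), make_curve(600))

# SD is numeric, so ggplot2 treats it as continuous and draws a gradient legend
is.numeric(dat$SD)

# Only three distinct values exist; the extra legend entries are gradient breaks
sort(unique(dat$SD))

# factor() turns SD into a discrete variable with exactly three levels
levels(factor(dat$SD))
```

The legend breaks at 100, 300 and 500 are just tick marks on the gradient; no data carries those values.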
I have 3 lines in a plot, and I want to move each one 1 unit on the X axis and 100 units on the Y axis to create a 3D effect, as in the example below. So far I have only been able to draw the lines. I tried the position_nudge() function, but it didn't have the effect I expected: it changed the scale of the axes, but not the position of the lines relative to each other.
Plus: If the plot with the frames looks like a cube, that would be a great thing.
Example:
MWE:
library(ggplot2)
Group <- c("A", "B", "C")
Time <- 0:10
DF <- expand.grid(Time = Time,
Group = Group)
DF$Y <- c(rep(1,5), 100, rep(1,5),
rep(1,5), 500, rep(1,5),
rep(1,5), 1000, rep(1,5))
ggplot(data = DF,
aes(x = Time,
y = Y,
color = Group)) +
geom_line(position = position_nudge(y = 100, x=1)) +
theme_bw()
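position_nudge() applies the same offset to every line, which is why only the axis range appears to change. One possible workaround, sketched here under the assumption that each group should be shifted by a multiple of the offset (A by 0, B by 1, C by 2 steps), is to compute the shifted coordinates in the data before plotting. The column names Time_shifted and Y_shifted are hypothetical, and this sketch does not attempt the cube-like frame:

```r
# Same data as in the MWE
Group <- c("A", "B", "C")
Time <- 0:10
DF <- expand.grid(Time = Time, Group = Group)
DF$Y <- c(rep(1, 5), 100, rep(1, 5),
          rep(1, 5), 500, rep(1, 5),
          rep(1, 5), 1000, rep(1, 5))

# Offset index per group: 0 for A, 1 for B, 2 for C
step <- as.integer(factor(DF$Group)) - 1

# Shift each group by 1 unit in x and 100 units in y per step
# (Time_shifted and Y_shifted are hypothetical column names)
DF$Time_shifted <- DF$Time + 1 * step
DF$Y_shifted <- DF$Y + 100 * step

# Plot the shifted columns if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(DF, aes(x = Time_shifted, y = Y_shifted, color = Group)) +
    geom_line() +
    theme_bw()
  print(p)
}
```

Note that the axis values then refer to the shifted coordinates, so the tick labels no longer match the raw Time and Y values for groups B and C.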
I would like a column in the data frame df with a value of either TRUE or FALSE, depending on whether the point falls below the diagonal line. The plot is just to illustrate the concept. I am quite stuck on this, so any help is appreciated.
# Test data
df <- data.frame(
x = sample(1:100, 100, replace=FALSE),
y = sample(1:100, 100, replace=FALSE))
library(ggplot2)
g <- ggplot(data=df, aes(x=x, y=y))
g + geom_point() +
geom_abline(intercept = 25, slope = 1)
You can try something like this (expanding on the suggestion from @Bas). The value of the line at each x is saved in the variable ye, which you can compare against y to get a logical too. You can also change the definition of ye. In this case I added the y values as labels so that you can see which points fall below the line:
library(ggplot2)
library(dplyr)
#Data
df <- data.frame(
x = sample(1:100, 100, replace=FALSE),
y = sample(1:100, 100, replace=FALSE))
#Create equation
df %>% mutate(ye=25+1*x,lab=ifelse(y<ye,y,NA)) -> df2
#Plot
ggplot(data=df2, aes(x=x, y=y,label=lab))+
geom_point() +
geom_abline(intercept = 25, slope = 1)+
geom_text(nudge_y = 2,check_overlap = T,size=2)
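If only the TRUE/FALSE column is needed, a minimal base R sketch could look like the following. The column name below and the seed are assumptions added for illustration; the intercept and slope are the ones passed to geom_abline:

```r
set.seed(42)  # assumption: a seed is added only to make the example reproducible
df <- data.frame(
  x = sample(1:100, 100, replace = FALSE),
  y = sample(1:100, 100, replace = FALSE))

# Line from the plot: y = 25 + 1 * x
intercept <- 25
slope <- 1

# TRUE when the point lies strictly below the line, FALSE otherwise
# (below is a hypothetical column name)
df$below <- df$y < intercept + slope * df$x

head(df)
table(df$below)
```

Points exactly on the line are counted as not below; use <= instead of < if they should count.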
I have already found numerous questions about this, but somehow they did not really help me. I do not understand how to change the binwidth in a density histogram in ggplot2 so that the probabilities sum up to 1. It seems like it only works if the binwidth is exactly 1.
Here is an example:
library(ggplot2)
library(gridExtra)  # provides grid.arrange()
set.seed(1)
df = data.frame("data" = runif(1000, min=0, max=100))
a = ggplot(data = df, aes(x = data))+
geom_histogram(aes(y=..density..),colour="black", fill = "white",
breaks=seq(0, 100, by = 50))
b = ggplot(data = df, aes(x = data))+
geom_histogram(aes(y =..density..),
breaks=seq(0, 100, by = 30),
col="black",
fill="white")
c = ggplot(data = df, aes(x = data))+
geom_histogram(aes(y =..density..),
breaks=seq(0, 100, by = 10),
col="black",
fill="white")
d = ggplot(data = df, aes(x = data))+
geom_histogram(aes(y =..density..),
breaks=seq(0, 100, by = 1),
col="black",
fill="white")
grid.arrange(a,b,c,d, ncol= 2)
If you look at the probability axis, you can see that the first three graphs must be wrong. These are not the right histograms, as the bins do not sum up to 1. The y-axis does not even change significantly between histograms a, b, c and d. I also tried replacing the "breaks" argument with the "binwidth" argument, but the result is even worse.
I would also like to know how to compute the probabilities of the single bins of a histogram, to prove whether or not they sum up to 1.
Thanks for any help.
Simulate some data:
library(ggplot2)
library(dplyr)
set.seed(1)
df = data.frame("data" = runif(1000, min=0, max=100))
The first plot you can get is:
# y axis has the density estimate values
ggplot(data = df, aes(x = data))+
geom_histogram(aes(y=..density..),colour="black", fill = "white",
breaks=seq(0, 100, by = 50))
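This plot has density values on the y axis. The height of each bar is its count divided by (total count × bin width), so it is the areas of the bars, not their heights, that sum to 1. With a bin width of 50, each bar is about 0.01 high and 0.01 × 50 × 2 = 1. The heights are on the same scale as a density curve, which you can see in this version where the density plot is overlaid: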
# y axis has the density estimate values and the density plot
ggplot(data = df, aes(x = data))+
geom_histogram(aes(y=..density..),colour="black", fill = "white",
breaks=seq(0, 100, by = 50)) +
geom_density(aes(data), col="red")
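A way to interpret this is that the height of the red line is a probability per unit of x, not a probability: the area under the curve is 1. For data spread over a range of 100, the density stays near 0.01 whatever bin width you choose, which is why the y axis barely changes between your plots. Bar heights equal bar probabilities only when the bin width is exactly 1.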
You can get what you want with this:
# y axis has the probabilities of each bar (bar counts / all counts)
ggplot(data = df, aes(x = data))+
geom_histogram(aes(y=..count../sum(..count..)),colour="black", fill = "white",
breaks=seq(0, 100, by = 50))
Another way to do the above while keeping the data (for future use, or just to check that the probabilities sum to 1) is this:
# assign the breaks
breaks = cut(df$data, seq(0, 100, by = 50))
# count observations in each bar and probability of each bar
df %>%
mutate(Breaks = breaks) %>%
count(Breaks) %>%
mutate(Prc = n/sum(n))
# # A tibble: 2 x 3
# Breaks n Prc
# <fctr> <int> <dbl>
# 1 (0,50] 520 0.52
# 2 (50,100] 480 0.48
# plot the above
df %>%
mutate(Breaks = breaks) %>%
count(Breaks) %>%
mutate(Prc = n/sum(n)) %>%
ggplot(aes(Breaks, Prc)) + geom_col()
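To check the sums numerically without plotting, one option is base R's hist() with plot = FALSE. This is a sketch rather than what ggplot2 does internally, but the density it returns follows the same definition, count / (n × bin width):

```r
set.seed(1)
df <- data.frame(data = runif(1000, min = 0, max = 100))

# Bin without plotting; the breaks must cover the whole data range
h <- hist(df$data, breaks = seq(0, 100, by = 10), plot = FALSE)

# Density heights: count / (n * bin width); these do not sum to 1
sum(h$density)

# Bar areas (height * width) do sum to 1
sum(h$density * diff(h$breaks))

# Per-bin probabilities (count / n) also sum to 1
prob <- h$counts / sum(h$counts)
sum(prob)
```

Note that seq(0, 100, by = 30) ends at 90, so those breaks do not cover values above 90; hist() refuses such breaks, which is why the sketch uses a step of 10.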
In this example of a hexbin plot, the legend on the right has 10 levels/classes/breaks. Does anyone know how to change the number of levels? Say I want to change it to 5 or something.
library(hexbin)
x=rnorm(1000, mean = 50, sd = 1)
y=rnorm(1000, mean = 30, sd = 0.5)
df <- data.frame(x,y)
#plot(df)
hb <- hexbin(x=df$x, df$y)
#hb <- hexbin(x=df$x, df$y,xbins=30)
#plot(hb)
gplot.hexbin(hb)
Like this?
gplot.hexbin(hb,colorcut=5)
And here's approximately the same thing using ggplot.
library(ggplot2)
ggplot(df, aes(x,y))+
geom_hex(aes(fill=cut(..value..,breaks=pretty(..value..,n=5))),bins=15)+
scale_fill_manual("Count",values=grey((5:0)/6))
Using ggplot2, I want to create a histogram where anything above X is grouped into the final bin. For example, if most of my distribution was between 100 and 200, and I wanted to bin by 10, I would want anything above 200 to be binned in "200+".
# create some fake data
id <- sample(1:100000, 10000, rep=T)
visits <- sample(1:1200,10000, rep=T)
#merge to create a dataframe
df <- data.frame(cbind(id,visits))
#plot the data
hist <- ggplot(df, aes(x=visits)) + geom_histogram(binwidth=50)
How can I limit the X axis while still representing the data beyond that limit?
Perhaps you're looking for the breaks argument for geom_histogram:
# create some fake data
id <- sample(1:100000, 10000, rep=T)
visits <- sample(1:1200,10000, rep=T)
#merge to create a dataframe
df <- data.frame(cbind(id,visits))
#plot the data
require(ggplot2)
ggplot(df, aes(x=visits)) +
geom_histogram(breaks=c(seq(0, 200, by=10), max(visits)), position = "identity") +
coord_cartesian(xlim=c(0,210))
This would look like this (with the caveats that the fake data looks pretty bad here and the axes need to be adjusted as well to match the breaks):
Edit:
Maybe someone else can weigh in here:
# create breaks and labels
brks <- c(seq(0, 200, by=10), max(visits))
lbls <- c(as.character(seq(0, 190, by=10)), "200+", "")
# true
length(brks)==length(lbls)
# hmmm
ggplot(df, aes(x=visits)) +
geom_histogram(breaks=brks, position = "identity") +
coord_cartesian(xlim=c(0,220)) +
scale_x_continuous(labels=lbls)
The plot errors with:
Error in scale_labels.continuous(scale) :
Breaks and labels are different lengths
This looks like a previously reported issue, but that one was fixed 8 months ago.
If you want to fudge it a little to get around the issues of bin labelling then just subset your data and create the binned values in a new sacrificial data-frame:
id <- sample(1:100000, 10000, rep=T)
visits <- sample(1:1200,10000, rep=T)
#merge to create a dataframe
df <- data.frame(cbind(id,visits))
#create sacrificial data frame
dfsac <- df
dfsac$visits[dfsac$visits > 200 ] <- 200
Then use the breaks and labels arguments of scale_x_continuous to define your bin labels easily:
ggplot(data=dfsac, aes(x=visits)) +
geom_histogram(breaks=c(seq(0, 200, by=10)),
col="black",
fill="red") +
labs(x="Visits", y="Count")+
scale_x_continuous(limits=c(0, 200), breaks=c(seq(0, 200, by=10)), labels=c(seq(0,190, by=10), "200+"))
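A different route, sketched here as an assumption rather than as part of the answers above, is to bin the values yourself with cut() and an open-ended last break, then plot the resulting categories as bars. Unlike the capped histogram, this gives "200+" its own bin instead of merging it into the 190 to 200 bin. The label format and the seed are choices made only for this illustration:

```r
set.seed(1)  # assumption: seed added for reproducibility
visits <- sample(1:1200, 10000, replace = TRUE)

# Bins of width 10 from 0 to 200, plus one open-ended bin for 200 and above
brks <- c(seq(0, 200, by = 10), Inf)
lbls <- c(paste0(seq(0, 190, by = 10), "-", seq(10, 200, by = 10)), "200+")

# right = FALSE makes the bins [0,10), [10,20), ..., [200, Inf)
bin <- cut(visits, breaks = brks, labels = lbls, right = FALSE)
counts <- table(bin)
counts

# Draw the binned counts as bars if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data.frame(bin = bin), aes(x = bin)) +
    geom_bar() +
    labs(x = "Visits", y = "Count")
  print(p)
}
```

Because the x axis is then categorical, every bin is drawn with the same width, including the open-ended one.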