geom_hline with multiple points and facet_wrap - r

i am trying to plot horizontal lines at specific points of my data. The idea is that i would like a horizontal line from the first value of equivalent iterations(i.e 0) at y intercept for each of my axis; SA, VLA, HLA. My question will become clearer with data.
iterations subsets equivalent_iterations axis ratio1 ratio2
0 0 0 SA 0.023569024 0.019690577
0 0 0 SA 0.023255814 0.019830028
0 0 0 VLA 0.025362319 0.020348837
0 0 0 HLA 0.022116904 0.021472393
2 2 4 SA 0.029411765 0.024911032
2 2 4 SA 0.024604569 0.022838499
2 2 4 VLA 0.026070764 0.022727273
2 2 4 HLA 0.027833002 0.027888446
4 15 60 SA 0.019746121 0.014403292
4 15 60 SA 0.018691589 0.015538291
4 15 60 VLA 0.021538462 0.01686747
4 15 60 HLA 0.017052375 0.017326733
16 5 80 SA 0.019021739 0.015021459
16 5 80 SA 0.020527859 0.015384615
16 5 80 VLA 0.023217247 0.017283951
16 5 80 HLA 0.017391304 0.016298021
and this is my plot using ggplot
ggplot(df)+
aes(x = equivalent_iterations, y = ratio1, color = equivalent_iterations)+
geom_point() +
facet_wrap(~axis) +
expand_limits(x = 0, y = 0)
What i want is for each axis SA, VLA, HLA (i.e. each facet_wrap) a horizontal line from the first point (which is at 0 equivalent iterations) at the y intercept (which is given by the ratio1 in column 5 in the first 4 values). Any help will be greatly appreciated. Thank you in advance

You can treat it like any other geom_*. Just create a new column with the value of ratio1 at which you want to plot the horizontal line. I do this by sub setting the the data by those where iterations = 0 (note SA has 2 of these) and joining the ratio1 column onto the original dataframe. This column can then be passed to the aesthetics call in geom_hline().
library(tidyverse)
df %>%
left_join(df %>%
filter(iterations == 0) %>%
select(axis, intercept = ratio1)) %>%
ggplot(aes(x = equivalent_iterations, y = ratio1,
color = equivalent_iterations)) +
geom_point() +
geom_hline(aes(yintercept = intercept)) +
facet_wrap(~axis) +
expand_limits(x = 0, y = 0)

Related

How to get predicted values from geom_smooth graph in R? [duplicate]

Is there a way to extract the values of the fitted line returned from stat_smooth?
The code I am using looks like this:
p <- ggplot(df1, aes(x=Days, y= Qty,group=Category,color=Category))
p <- p + stat_smooth(method=glm, fullrange=TRUE)+ geom_point())
This new r user would greatly appreciate any guidance.
Riffing off of #James example
p <- qplot(hp,wt,data=mtcars) + stat_smooth()
You can use the intermediate stages of the ggplot building process to pull out the plotted data. The results of ggplot_build is a list, one component of which is data which is a list of dataframes which contain the computed values to be plotted. In this case, the list is two dataframes since the original qplot creates one for points and the stat_smooth creates a smoothed one.
> ggplot_build(p)$data[[2]]
geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
x y ymin ymax se PANEL group
1 52.00000 1.993594 1.149150 2.838038 0.4111133 1 1
2 55.58228 2.039986 1.303264 2.776709 0.3586695 1 1
3 59.16456 2.087067 1.443076 2.731058 0.3135236 1 1
4 62.74684 2.134889 1.567662 2.702115 0.2761514 1 1
5 66.32911 2.183533 1.677017 2.690049 0.2465948 1 1
6 69.91139 2.232867 1.771739 2.693995 0.2244980 1 1
7 73.49367 2.282897 1.853241 2.712552 0.2091756 1 1
8 77.07595 2.333626 1.923599 2.743652 0.1996193 1 1
9 80.65823 2.385059 1.985378 2.784740 0.1945828 1 1
10 84.24051 2.437200 2.041282 2.833117 0.1927505 1 1
11 87.82278 2.490053 2.093808 2.886297 0.1929096 1 1
12 91.40506 2.543622 2.145018 2.942225 0.1940582 1 1
13 94.98734 2.597911 2.196466 2.999355 0.1954412 1 1
14 98.56962 2.652852 2.249260 3.056444 0.1964867 1 1
15 102.15190 2.708104 2.303465 3.112744 0.1969967 1 1
16 105.73418 2.764156 2.357927 3.170385 0.1977705 1 1
17 109.31646 2.821771 2.414230 3.229311 0.1984091 1 1
18 112.89873 2.888224 2.478136 3.298312 0.1996493 1 1
19 116.48101 2.968745 2.531045 3.406444 0.2130917 1 1
20 120.06329 3.049545 2.552102 3.546987 0.2421773 1 1
21 123.64557 3.115893 2.573577 3.658208 0.2640235 1 1
22 127.22785 3.156368 2.601664 3.711072 0.2700548 1 1
23 130.81013 3.175495 2.625951 3.725039 0.2675429 1 1
24 134.39241 3.181411 2.645191 3.717631 0.2610560 1 1
25 137.97468 3.182252 2.658993 3.705511 0.2547460 1 1
26 141.55696 3.186155 2.670350 3.701961 0.2511175 1 1
27 145.13924 3.201258 2.687208 3.715308 0.2502626 1 1
28 148.72152 3.235698 2.721744 3.749652 0.2502159 1 1
29 152.30380 3.291766 2.782767 3.800765 0.2478037 1 1
30 155.88608 3.353259 2.857911 3.848607 0.2411575 1 1
31 159.46835 3.418409 2.938257 3.898561 0.2337596 1 1
32 163.05063 3.487074 3.017321 3.956828 0.2286972 1 1
33 166.63291 3.559111 3.092367 4.025855 0.2272319 1 1
34 170.21519 3.634377 3.165426 4.103328 0.2283065 1 1
35 173.79747 3.712729 3.242093 4.183364 0.2291263 1 1
36 177.37975 3.813399 3.347232 4.279565 0.2269509 1 1
37 180.96203 3.910849 3.447572 4.374127 0.2255441 1 1
38 184.54430 3.977051 3.517784 4.436318 0.2235917 1 1
39 188.12658 4.037302 3.583959 4.490645 0.2207076 1 1
40 191.70886 4.091635 3.645111 4.538160 0.2173882 1 1
41 195.29114 4.140082 3.700184 4.579981 0.2141624 1 1
42 198.87342 4.182676 3.748159 4.617192 0.2115424 1 1
43 202.45570 4.219447 3.788162 4.650732 0.2099688 1 1
44 206.03797 4.250429 3.819579 4.681280 0.2097573 1 1
45 209.62025 4.275654 3.842137 4.709171 0.2110556 1 1
46 213.20253 4.295154 3.855951 4.734357 0.2138238 1 1
47 216.78481 4.308961 3.861497 4.756425 0.2178456 1 1
48 220.36709 4.317108 3.859541 4.774675 0.2227644 1 1
49 223.94937 4.319626 3.851025 4.788227 0.2281358 1 1
50 227.53165 4.316548 3.836964 4.796132 0.2334829 1 1
51 231.11392 4.308435 3.818728 4.798143 0.2384117 1 1
52 234.69620 4.302276 3.802201 4.802351 0.2434590 1 1
53 238.27848 4.297902 3.787395 4.808409 0.2485379 1 1
54 241.86076 4.292303 3.772103 4.812503 0.2532567 1 1
55 245.44304 4.282505 3.754087 4.810923 0.2572576 1 1
56 249.02532 4.269040 3.733184 4.804896 0.2608786 1 1
57 252.60759 4.253361 3.710042 4.796680 0.2645121 1 1
58 256.18987 4.235474 3.684476 4.786473 0.2682509 1 1
59 259.77215 4.215385 3.656265 4.774504 0.2722044 1 1
60 263.35443 4.193098 3.625161 4.761036 0.2764974 1 1
61 266.93671 4.168621 3.590884 4.746357 0.2812681 1 1
62 270.51899 4.141957 3.553134 4.730781 0.2866658 1 1
63 274.10127 4.113114 3.511593 4.714635 0.2928472 1 1
64 277.68354 4.082096 3.465939 4.698253 0.2999729 1 1
65 281.26582 4.048910 3.415849 4.681971 0.3082025 1 1
66 284.84810 4.013560 3.361010 4.666109 0.3176905 1 1
67 288.43038 3.976052 3.301132 4.650972 0.3285813 1 1
68 292.01266 3.936392 3.235952 4.636833 0.3410058 1 1
69 295.59494 3.894586 3.165240 4.623932 0.3550782 1 1
70 299.17722 3.850639 3.088806 4.612473 0.3708948 1 1
71 302.75949 3.804557 3.006494 4.602619 0.3885326 1 1
72 306.34177 3.756345 2.918191 4.594499 0.4080510 1 1
73 309.92405 3.706009 2.823813 4.588205 0.4294926 1 1
74 313.50633 3.653554 2.723308 4.583801 0.4528856 1 1
75 317.08861 3.598987 2.616650 4.581325 0.4782460 1 1
76 320.67089 3.542313 2.503829 4.580796 0.5055805 1 1
77 324.25316 3.483536 2.384853 4.582220 0.5348886 1 1
78 327.83544 3.422664 2.259739 4.585589 0.5661643 1 1
79 331.41772 3.359701 2.128512 4.590891 0.5993985 1 1
80 335.00000 3.294654 1.991200 4.598107 0.6345798 1 1
Knowing a priori where the one you want is in the list isn't easy, but if nothing else you can look at the column names.
It is still better to do the smoothing outside the ggplot call, though.
EDIT:
It turns out replicating what ggplot2 does to make the loess is not as straightforward as I thought, but this will work. I copied it out of some internal functions in ggplot2.
model <- loess(wt ~ hp, data=mtcars)
xrange <- range(mtcars$hp)
xseq <- seq(from=xrange[1], to=xrange[2], length=80)
pred <- predict(model, newdata = data.frame(hp = xseq), se=TRUE)
y = pred$fit
ci <- pred$se.fit * qt(0.95 / 2 + .5, pred$df)
ymin = y - ci
ymax = y + ci
loess.DF <- data.frame(x = xseq, y, ymin, ymax, se = pred$se.fit)
ggplot(mtcars, aes(x=hp, y=wt)) +
geom_point() +
geom_smooth(aes_auto(loess.DF), data=loess.DF, stat="identity")
That gives a plot that looks identical to
ggplot(mtcars, aes(x=hp, y=wt)) +
geom_point() +
geom_smooth()
(which is the expanded form of the original p).
stat_smooth does produce output that you can use elsewhere, and with a slightly hacky way, you can put it into a variable in the global environment.
You enclose the output variable in .. on either side to use it. So if you add an aes in the stat_smooth call and use the global assign, <<-, to assign the output to a varible in the global environment you can get the the fitted values, or others - see below.
qplot(hp,wt,data=mtcars) + stat_smooth(aes(outfit=fit<<-..y..))
fit
[1] 1.993594 2.039986 2.087067 2.134889 2.183533 2.232867 2.282897 2.333626
[9] 2.385059 2.437200 2.490053 2.543622 2.597911 2.652852 2.708104 2.764156
[17] 2.821771 2.888224 2.968745 3.049545 3.115893 3.156368 3.175495 3.181411
[25] 3.182252 3.186155 3.201258 3.235698 3.291766 3.353259 3.418409 3.487074
[33] 3.559111 3.634377 3.712729 3.813399 3.910849 3.977051 4.037302 4.091635
[41] 4.140082 4.182676 4.219447 4.250429 4.275654 4.295154 4.308961 4.317108
[49] 4.319626 4.316548 4.308435 4.302276 4.297902 4.292303 4.282505 4.269040
[57] 4.253361 4.235474 4.215385 4.193098 4.168621 4.141957 4.113114 4.082096
[65] 4.048910 4.013560 3.976052 3.936392 3.894586 3.850639 3.804557 3.756345
[73] 3.706009 3.653554 3.598987 3.542313 3.483536 3.422664 3.359701 3.294654
The outputs you can obtain are:
y, predicted value
ymin, lower pointwise confidence interval around
the mean
ymax, upper pointwise confidence interval around the mean
se, standard error
Note that by default it predicts on 80 data points, which may not be aligned with your original data.
A more general approach could be to simply use the predict() function to predict any range of values that are interesting.
# define the model
model <- loess(wt ~ hp, data = mtcars)
# predict fitted values for each observation in the original dataset
modelFit <- data.frame(predict(model, se = TRUE))
# define data frame for ggplot
df <- data.frame(cbind(hp = mtcars$hp
, wt = mtcars$wt
, fit = modelFit$fit
, upperBound = modelFit$fit + 2 * modelFit$se.fit
, lowerBound = modelFit$fit - 2 * modelFit$se.fit
))
# build the plot using the fitted values from the predict() function
# geom_linerange() and the second geom_point() in the code are built using the values from the predict() function
# for comparison ggplot's geom_smooth() is also shown
g <- ggplot(df, aes(hp, wt))
g <- g + geom_point()
g <- g + geom_linerange(aes(ymin = lowerBound, ymax = upperBound))
g <- g + geom_point(aes(hp, fit, size = 1))
g <- g + geom_smooth(method = "loess")
g
# Predict any range of values and include the standard error in the output
predict(model, newdata = 100:300, se = TRUE)
If you want to bring in the power of the tidyverse, you can use the "broom" library to add the predicted values from the loess function to your original dataset. This is building on #phillyooo's solution.
library(tidyverse)
library(broom)
# original graph with smoother
ggplot(data=mtcars, aes(hp,wt)) +
stat_smooth(method = "loess", span = 0.75)
# Create model that will do the same thing as under the hood in ggplot2
model <- loess(wt ~ hp, data = mtcars, span = 0.75)
# Add predicted values from model to original dataset using broom library
mtcars2 <- augment(model, mtcars)
# Plot both lines
ggplot(data=mtcars2, aes(hp,wt)) +
geom_line(aes(hp, .fitted), color = "red") +
stat_smooth(method = "loess", span = 0.75)
Save the graph object and use ggplot_build() or layer_data() to obtain the elements/estimates for the layers. e.g.
pp<-ggplot(mtcars, aes(x=hp, y=wt)) + geom_point() + geom_smooth();
ggplot_build(pp)

Grouped barplot side by side

I'm trying to plot the table below using a grouped barplot with ggplot2.
How do I plot it in a way such that the scheduled audits and noofemails are plotted sided by side based on each day?
Email Type Sent Month Sent Day Scheduled Audits Noofemails
27 A 1 30 7 581
29 A 1 31 0 9
1 A 2 1 2 8
26 B 1 29 1048 25312
28 B 1 30 23 170
30 B 1 31 18 109
2 B 2 1 6 93
3 B 2 2 9 86
4 B 2 4 3 21
ggplot(joined, aes(x=`Sent Day`, y=`Scheduled Audits`, fill = Noofemails )) +
geom_bar(stat="identity", position = position_dodge()) +
scale_x_continuous(breaks = c(1:29)) +
ggtitle("Number of emails sent in February") +
theme_classic()
Does not achieve the plot I hope to see.
Using this data format, so slightly new column names, no more back-ticks. read_table(text = "") is a nice way to share little datasets on Stack
joined <- read.table(text =
"ID Email_Type Sent_Month Sent_Day Scheduled_Audits Noofemails
27 A 1 30 7 581
29 A 1 31 0 9
1 A 2 1 2 8
26 B 1 29 1048 25312
28 B 1 30 23 170
30 B 1 31 18 109
2 B 2 1 6 93
3 B 2 2 9 86
4 B 2 4 3 21",
header = TRUE)
This is why ggplot2 really likes long data instead of wide data. Because it needs column names to create the aesthetics.
So you can use the function tidyr::gather() to rearrange the two columns of interest into one with labels and one with values. This increase the number of rows in the data frame, so thats why its called long.
long <- tidyr::gather(joined,"key", "value", Scheduled_Audits, Noofemails)
ggplot(long, aes(Sent_Day, value, fill = key)) +
geom_col(position = "dodge")
Alternatively you can use the melt() function from the reshape package. See example below.
library("ggplot2")
library(reshape2)
joined2 <- melt(joined[,c("Sent_Day", "Noofemails", "Scheduled_Audits")], id="Sent_Day")
ggplot(joined2, aes(x=`Sent_Day`, y= value, group = variable, fill= variable)) +
geom_bar(stat="identity", position = position_dodge()) +
scale_x_continuous(breaks = c(1:29)) +
ggtitle("Number of emails sent in February") +
theme_classic()

geom_bar removed 3 rows with missing values

I'm trying to create a histogram using ggplot2 in R.
This is the code I'm using:
library(tidyverse)
dat_male$explicit_truncated <- trunc(dat_male$explicit_mean)
means2 <- aggregate(dat_male$IAT_D, by=list(dat_male$explicit_truncated,dat_male$id), mean, na.rm=TRUE)
colnames(means2) <- c("explicit", "id", "IAT_D")
sd2 <- aggregate(dat_male$IAT_D, by=list(dat_male$explicit_truncated,dat_male$id), sd, na.rm=TRUE)
length2 <- aggregate(dat_male$IAT_D, by=list(dat_male$explicit_truncated,dat_male$id), length)
se2 <- sd2$x / sqrt(length$x)
means2$lo <- means2$IAT_D - 1.6*se2
means2$hi <- means2$IAT_D + 1.6*se2
ggplot(data = means2, aes(x = factor(explicit), y = IAT_D, fill = factor(id))) +
geom_bar(stat = "identity", position = position_dodge()) +
geom_errorbar(aes(ymin=lo,ymax=hi, width=.2), position=position_dodge(0.9), data=means2) +
xlab("Explicit attitude score") +
ylab("D-score")
For some reason I get the following warning message:
Removed 3 rows containing missing values (geom_bar).
And I get the following histogram:
I really have no clue what is going on.
Please let me know if you need to see anything else of my code, I'm never really sure what to include.
dat_male is a dataset that looks like this (I have only included the variables that I mentioned in this question, as the dataset contains 68 variables):
id explicit_mean IAT_D explicit_truncated
5 1 3.1250 0.366158652 3
6 1 3.3125 0.373590066 3
9 1 3.6250 0.208096230 3
11 1 3.1250 0.661983618 3
15 1 2.3125 0.348246184 2
19 1 3.7500 0.562406383 3
28 1 2.5625 -0.292888526 2
35 1 4.3750 0.560039531 4
36 1 3.8125 -0.117455439 3
37 1 3.1250 0.074375196 3
46 1 2.5625 0.488265849 2
47 1 4.2500 -0.131005579 4
53 1 2.0625 0.193040876 2
55 1 2.6875 0.875420303 2
62 1 3.8750 0.579146056 3
63 1 3.3125 0.666095380 3
66 1 2.8125 0.115607820 2
68 1 4.3750 0.259929946 4
80 1 3.0000 0.502709149 3
means2 is a dataset I have used to calculate means, and that looks like this:
explicit id IAT_D lo hi
1 0 0 NaN NaN NaN
2 2 0 0.23501191 0.1091807 0.3608431
3 3 0 0.31478389 0.2311406 0.3984272
4 4 0 -0.24296625 -0.3241166 -0.1618159
5 1 1 -0.04010111 NA NA
6 2 1 0.21939286 0.1109138 0.3278719
7 3 1 0.29097806 0.1973051 0.3846511
8 4 1 0.22965463 0.1209229 0.3383864
Now that I see it front of me, it probably has something to do with the NaN's?
From your dataset it seems like everything is alright.
The errors that you get are an indication that your data.frame has empty values (i.e. NaN and NA).
I actually got two warning messages:
Warning messages:
1: Removed 1 rows containing missing values
(geom_bar).
2: Removed 2 rows containing missing values
(geom_errorbar).
Regarding the plot, because you don't have any zero values under explicit, you don't see it in the graph. Similarly, because you have NAs under lo and hi for one in explicit, you don't get the corresponding error bar.
Dataset:
means2 <- read.table(text = " explicit id IAT_D lo hi
1 0 0 NaN NaN NaN
2 2 0 0.23501191 0.1091807 0.3608431
3 3 0 0.31478389 0.2311406 0.3984272
4 4 0 -0.24296625 -0.3241166 -0.1618159
5 1 1 -0.04010111 NA NA
6 2 1 0.21939286 0.1109138 0.3278719
7 3 1 0.29097806 0.1973051 0.3846511
8 4 1 0.22965463 0.1209229 0.3383864",
header = TRUE)
plot:
means2 %>%
ggplot(aes(x = factor(explicit), y = IAT_D, fill = factor(id))) +
geom_bar(stat = "identity", position = position_dodge()) +
geom_errorbar(aes(ymin=lo,ymax=hi, width=.2),
position=position_dodge(0.9)) +
xlab("Explicit attitude score") +
ylab("D-score")

Plot Issues - Start always in (0,0)

I am working with a huge data set where all columns look something like this:
0
10
12
30
10
0
20
30
0
40
50
10
0
The idea is to make a simple plot in R where every time it reads a 0 the plot will begin in (0,0).
Do you have any idea of how I can do this?
Thanks in advance,
J
UPDATE:
I am a new user so I can't post any images!
Here's an example of the column I want to plot:
0
10
20
12
5
6
9
0
20
24
40
14
0
20
59
50
12
0
20
23
49
45
23
12
(...)
Image a line plot.
Instead of plotting a long line with all the values I want to plot several shorter lines with the first line plotting (0,10,20,12,5,6,9), the second line plotting (0,20,24,40,14) etc...
I would add an additional column specifying which subdataset your are:
Value Group
0 1
1 1
5 1
0 2
Etc.
You can then plot the subgroups using e.g. ggplot2:
ggplot(yourdata, aes(x = xcoor, y = Value, color = Group)) +
geom_line()
Which will draw the lines with different colors. Or using plot using something like:
split_dat = with(yourdata, split(Value, Group))
plot(split_dat[[1]])
for(i in 2:length(split_dat)) {
lines(split_dat[[i]])
}

How can I use stat_bin2d with pre-binned data?

I want to generate a stat_bin2d() plot but for pre-binned data;
i.e. Rather than raw points
x y
5 3
13 4
13 14
16 12
15 13
I instead have the data pre-binned with the corner points, in this case.
x y freq
0 0 1
0 10 0
10 0 1
10 10 3
I believe it might have something to do with the data param of stat_bin2d but i can't find any doco on this.
You can use geom_bin2d() (with an "identity" stat), or just directly draw rectangles.
dat <- data.frame(x=c(0,0,10,10), y=c(0,10,0,10), freq=c(1,0,1,3))
ggplot(dat) +
geom_bin2d(aes(xmin=x, ymin=y, xmax=x+10, ymax=y+10, fill=freq), stat="identity")
ggplot(dat) +
geom_rect(aes(xmin=x, ymin=y, xmax=x+10, ymax=y+10, fill=freq))

Resources