Overlaying ggplot data layers

I am trying to overlay two datasets of different lengths in ggplot.
Dataset 1: data frame r, where m is the date and V2 is a value in the range -1 to +1:
> r
m V2
19991221 1
19910703 -0.396825397
19850326 0.916666667
19890328 -0.473053892
19610912 -0.75
20021106 -0.991525424
19940324 -1
19840522 -0.502145923
19780718 1
19811222 -0.447154472
19781017 0
19761108 -0.971014493
19791006 1
19891219 0.818181818
19851217 0.970149254
19980818 0.808219178
19940816 -0.985185185
19790814 -0.966666667
19990203 -0.882352941
19831220 1
19830114 -1
19980204 -0.991489362
19941115 -0.966101695
19860520 -0.986206897
19761019 -0.666666667
19900207 -0.983870968
19731010 0
19821221 -0.833333333
19770517 1
19800205 0.662337662
19760329 -0.545454545
19810224 -0.957446809
20000628 -0.989473684
19911105 -0.988571429
19960924 -0.483870968
19880816 1
19860923 1
20030506 -1
20031209 -1
19950201 -0.974025974
19790206 1
19811117 -0.989304813
19950822 -1
19860212 0.808219178
19730821 -0.463203463
19991221 1
19910703 -0.396825397
19850326 0.916666667
19890328 -0.473053892
19610912 -0.75
20021106 -0.991525424
19940324 -1
19840522 -0.502145923
19780718 1
19811222 -0.447154472
19781017 0
19761108 -0.971014493
19791006 1
19891219 0.818181818
19851217 0.970149254
19980818 0.808219178
19940816 -0.985185185
19790814 -0.966666667
19990203 -0.882352941
19831220 1
19830114 -1
19980204 -0.991489362
19941115 -0.966101695
19860520 -0.986206897
19761019 -0.666666667
19900207 -0.983870968
19731010 0
19821221 -0.833333333
19770517 1
19800205 0.662337662
19760329 -0.545454545
19810224 -0.957446809
20000628 -0.989473684
19911105 -0.988571429
19960924 -0.483870968
19880816 1
19860923 1
20030506 -1
20031209 -1
19950201 -0.974025974
19790206 1
19811117 -0.989304813
19950822 -1
19860212 0.808219178
19730821 -0.463203463
19991221 1
19910703 -0.396825397
19850326 0.916666667
19890328 -0.473053892
19610912 -0.75
20021106 -0.991525424
19940324 -1
19840522 -0.502145923
19780718 1
19811222 -0.447154472
19781017 0
19761108 -0.971014493
19791006 1
19891219 0.818181818
19851217 0.970149254
19980818 0.808219178
19940816 -0.985185185
19790814 -0.966666667
19990203 -0.882352941
19831220 1
19830114 -1
19980204 -0.991489362
19941115 -0.966101695
19860520 -0.986206897
19761019 -0.666666667
19900207 -0.983870968
19731010 0
19821221 -0.833333333
19770517 1
19800205 0.662337662
19760329 -0.545454545
19810224 -0.957446809
20000628 -0.989473684
19911105 -0.988571429
19960924 -0.483870968
19880816 1
19860923 1
20030506 -1
20031209 -1
19950201 -0.974025974
19790206 1
19811117 -0.989304813
19950822 -1
19860212 0.808219178
19730821 -0.463203463
Use these lines to generate r:
m<-gsub("-", "/", as.Date(as.character(fileloc$V1), "%Y%m%d"))
r<-cbind(m, fileloc[2])
colnames(r)
r
Dataset 2: the following data set, which defines the recession periods in the US:
library(quantmod)
getSymbols("USREC",src="FRED")
getSymbols("UNRATE", src="FRED")
unrate.df <- data.frame(date= index(UNRATE),UNRATE$UNRATE)
start <- index(USREC[which(diff(USREC$USREC)==1)])
end <- index(USREC[which(diff(USREC$USREC)==-1)-1])
recession.df <- data.frame(start=start, end=end[-1])
recession.df <- subset(recession.df, start >= min(unrate.df$date))
The resulting recession.df
> recession.df
start end
1 1948-12-01 1949-10-01
2 1953-08-01 1954-05-01
3 1957-09-01 1958-04-01
.....
11 2008-01-01 2009-06-01
Plotting:
I can generate separate scatter plots with the following:
ggplot(r, aes(V2, r$m, colour=V2))+
geom_point()+xlab(label='Tone Score')+ylab(label='Dates')
and timeseries with shaded region for recession with:
ggplot()+
geom_line(data=unrate.df, aes(x=date, y=UNRATE)) +
geom_rect(data=recession.df,
aes(xmin=start,xmax=end, ymin=0,ymax=max(unrate.df$UNRATE)),
fill="red", alpha=0.2)
How do I merge these plots so that the scatter plot is overlaid on the time series?

Since you did not provide the full dataset for the question, I have generated some random data for dates between 1950/08/21 and 2017/12/21:
set.seed(123)
r <- data.frame(m = seq.Date(as.Date("2017/12/21"), as.Date("1950/08/21"),
length.out = 135),
V2 = rnorm(n = 135, mean = 0, sd = 0.5))
You can overlay multiple layers within a ggplot by supplying different data and aes arguments to each of the geom_ functions you call.
ggplot() +
geom_point(data = r, aes(x = m, y = V2, colour=V2))+
geom_line(data=unrate.df, aes(x=date, y=UNRATE)) +
geom_rect(data=recession.df,
aes(xmin=start, xmax=end, ymin=0, ymax=max(unrate.df$UNRATE)),
fill="red", alpha=0.2) +
xlab(label='Dates')+ylab(label='Tone Score')
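For a version that runs without downloading anything from FRED, here is a minimal sketch of the same layering idea on made-up data. The data frames line_df, points_df and bands_df and their columns are hypothetical stand-ins for unrate.df, r and recession.df. Note that the date column has to be of class Date in every layer; in the question, m is a character vector after gsub(), so it would probably need as.Date() first.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for the time series (unrate.df in the question)
line_df <- data.frame(
  date = seq(as.Date("2000-01-01"), by = "month", length.out = 60),
  rate = 5 + cumsum(rnorm(60, sd = 0.2))
)
# Hypothetical stand-in for the scatter data (r in the question), dates as Date
points_df <- data.frame(
  date  = sample(line_df$date, 15),
  score = runif(15, -1, 1)
)
# Hypothetical stand-in for the shaded periods (recession.df in the question)
bands_df <- data.frame(
  start = as.Date(c("2001-03-01", "2003-06-01")),
  end   = as.Date(c("2001-11-01", "2004-01-01"))
)

# Each geom gets its own data and aes; layers are drawn in the order added
p <- ggplot() +
  geom_rect(data = bands_df, aes(xmin = start, xmax = end),
            ymin = -Inf, ymax = Inf, fill = "red", alpha = 0.2) +
  geom_line(data = line_df, aes(x = date, y = rate)) +
  geom_point(data = points_df, aes(x = date, y = score, colour = score)) +
  labs(x = "Date", y = "Value")
p
```

Using ymin = -Inf and ymax = Inf lets the shaded bands span the whole panel regardless of which layer has the larger y range.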

Related

Comparing Multiple lm() Results within ggplot2 [duplicate]

Is there a way to extract the values of the fitted line returned from stat_smooth?
The code I am using looks like this:
p <- ggplot(df1, aes(x=Days, y= Qty,group=Category,color=Category))
p <- p + stat_smooth(method=glm, fullrange=TRUE) + geom_point()
This new R user would greatly appreciate any guidance.

Riffing off of @James's example
p <- qplot(hp,wt,data=mtcars) + stat_smooth()
You can use the intermediate stages of the ggplot building process to pull out the plotted data. The result of ggplot_build is a list, one component of which is data, itself a list of data frames containing the computed values to be plotted. In this case the list holds two data frames, since the original qplot creates one for the points and stat_smooth creates one for the smoothed line.
> ggplot_build(p)$data[[2]]
geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
x y ymin ymax se PANEL group
1 52.00000 1.993594 1.149150 2.838038 0.4111133 1 1
2 55.58228 2.039986 1.303264 2.776709 0.3586695 1 1
3 59.16456 2.087067 1.443076 2.731058 0.3135236 1 1
4 62.74684 2.134889 1.567662 2.702115 0.2761514 1 1
5 66.32911 2.183533 1.677017 2.690049 0.2465948 1 1
6 69.91139 2.232867 1.771739 2.693995 0.2244980 1 1
7 73.49367 2.282897 1.853241 2.712552 0.2091756 1 1
8 77.07595 2.333626 1.923599 2.743652 0.1996193 1 1
9 80.65823 2.385059 1.985378 2.784740 0.1945828 1 1
10 84.24051 2.437200 2.041282 2.833117 0.1927505 1 1
11 87.82278 2.490053 2.093808 2.886297 0.1929096 1 1
12 91.40506 2.543622 2.145018 2.942225 0.1940582 1 1
13 94.98734 2.597911 2.196466 2.999355 0.1954412 1 1
14 98.56962 2.652852 2.249260 3.056444 0.1964867 1 1
15 102.15190 2.708104 2.303465 3.112744 0.1969967 1 1
16 105.73418 2.764156 2.357927 3.170385 0.1977705 1 1
17 109.31646 2.821771 2.414230 3.229311 0.1984091 1 1
18 112.89873 2.888224 2.478136 3.298312 0.1996493 1 1
19 116.48101 2.968745 2.531045 3.406444 0.2130917 1 1
20 120.06329 3.049545 2.552102 3.546987 0.2421773 1 1
21 123.64557 3.115893 2.573577 3.658208 0.2640235 1 1
22 127.22785 3.156368 2.601664 3.711072 0.2700548 1 1
23 130.81013 3.175495 2.625951 3.725039 0.2675429 1 1
24 134.39241 3.181411 2.645191 3.717631 0.2610560 1 1
25 137.97468 3.182252 2.658993 3.705511 0.2547460 1 1
26 141.55696 3.186155 2.670350 3.701961 0.2511175 1 1
27 145.13924 3.201258 2.687208 3.715308 0.2502626 1 1
28 148.72152 3.235698 2.721744 3.749652 0.2502159 1 1
29 152.30380 3.291766 2.782767 3.800765 0.2478037 1 1
30 155.88608 3.353259 2.857911 3.848607 0.2411575 1 1
31 159.46835 3.418409 2.938257 3.898561 0.2337596 1 1
32 163.05063 3.487074 3.017321 3.956828 0.2286972 1 1
33 166.63291 3.559111 3.092367 4.025855 0.2272319 1 1
34 170.21519 3.634377 3.165426 4.103328 0.2283065 1 1
35 173.79747 3.712729 3.242093 4.183364 0.2291263 1 1
36 177.37975 3.813399 3.347232 4.279565 0.2269509 1 1
37 180.96203 3.910849 3.447572 4.374127 0.2255441 1 1
38 184.54430 3.977051 3.517784 4.436318 0.2235917 1 1
39 188.12658 4.037302 3.583959 4.490645 0.2207076 1 1
40 191.70886 4.091635 3.645111 4.538160 0.2173882 1 1
41 195.29114 4.140082 3.700184 4.579981 0.2141624 1 1
42 198.87342 4.182676 3.748159 4.617192 0.2115424 1 1
43 202.45570 4.219447 3.788162 4.650732 0.2099688 1 1
44 206.03797 4.250429 3.819579 4.681280 0.2097573 1 1
45 209.62025 4.275654 3.842137 4.709171 0.2110556 1 1
46 213.20253 4.295154 3.855951 4.734357 0.2138238 1 1
47 216.78481 4.308961 3.861497 4.756425 0.2178456 1 1
48 220.36709 4.317108 3.859541 4.774675 0.2227644 1 1
49 223.94937 4.319626 3.851025 4.788227 0.2281358 1 1
50 227.53165 4.316548 3.836964 4.796132 0.2334829 1 1
51 231.11392 4.308435 3.818728 4.798143 0.2384117 1 1
52 234.69620 4.302276 3.802201 4.802351 0.2434590 1 1
53 238.27848 4.297902 3.787395 4.808409 0.2485379 1 1
54 241.86076 4.292303 3.772103 4.812503 0.2532567 1 1
55 245.44304 4.282505 3.754087 4.810923 0.2572576 1 1
56 249.02532 4.269040 3.733184 4.804896 0.2608786 1 1
57 252.60759 4.253361 3.710042 4.796680 0.2645121 1 1
58 256.18987 4.235474 3.684476 4.786473 0.2682509 1 1
59 259.77215 4.215385 3.656265 4.774504 0.2722044 1 1
60 263.35443 4.193098 3.625161 4.761036 0.2764974 1 1
61 266.93671 4.168621 3.590884 4.746357 0.2812681 1 1
62 270.51899 4.141957 3.553134 4.730781 0.2866658 1 1
63 274.10127 4.113114 3.511593 4.714635 0.2928472 1 1
64 277.68354 4.082096 3.465939 4.698253 0.2999729 1 1
65 281.26582 4.048910 3.415849 4.681971 0.3082025 1 1
66 284.84810 4.013560 3.361010 4.666109 0.3176905 1 1
67 288.43038 3.976052 3.301132 4.650972 0.3285813 1 1
68 292.01266 3.936392 3.235952 4.636833 0.3410058 1 1
69 295.59494 3.894586 3.165240 4.623932 0.3550782 1 1
70 299.17722 3.850639 3.088806 4.612473 0.3708948 1 1
71 302.75949 3.804557 3.006494 4.602619 0.3885326 1 1
72 306.34177 3.756345 2.918191 4.594499 0.4080510 1 1
73 309.92405 3.706009 2.823813 4.588205 0.4294926 1 1
74 313.50633 3.653554 2.723308 4.583801 0.4528856 1 1
75 317.08861 3.598987 2.616650 4.581325 0.4782460 1 1
76 320.67089 3.542313 2.503829 4.580796 0.5055805 1 1
77 324.25316 3.483536 2.384853 4.582220 0.5348886 1 1
78 327.83544 3.422664 2.259739 4.585589 0.5661643 1 1
79 331.41772 3.359701 2.128512 4.590891 0.5993985 1 1
80 335.00000 3.294654 1.991200 4.598107 0.6345798 1 1
Knowing a priori where the one you want is in the list isn't easy, but if nothing else you can look at the column names.
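As a rough sketch of the "look at the column names" idea, one could pick the smoothed layer by checking which data frame carries the confidence-band columns. The names has_band and smooth_df are made up for this illustration, and the check assumes the smoother is the only layer with ymin and ymax.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(hp, wt)) +
  geom_point() +
  stat_smooth(method = "loess", formula = y ~ x)

built <- ggplot_build(p)

# One data frame per layer; flag the ones that have a confidence band
has_band <- vapply(
  built$data,
  function(d) all(c("ymin", "ymax") %in% names(d)),
  logical(1)
)

# Take the first layer that looks like a smoother
smooth_df <- built$data[[which(has_band)[1]]]
head(smooth_df[, c("x", "y", "ymin", "ymax")])
```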
It is still better to do the smoothing outside the ggplot call, though.
EDIT:
It turns out replicating what ggplot2 does to make the loess is not as straightforward as I thought, but this will work. I copied it out of some internal functions in ggplot2.
model <- loess(wt ~ hp, data=mtcars)
xrange <- range(mtcars$hp)
xseq <- seq(from=xrange[1], to=xrange[2], length=80)
pred <- predict(model, newdata = data.frame(hp = xseq), se=TRUE)
y = pred$fit
ci <- pred$se.fit * qt(0.95 / 2 + .5, pred$df)
ymin = y - ci
ymax = y + ci
loess.DF <- data.frame(x = xseq, y, ymin, ymax, se = pred$se.fit)
ggplot(mtcars, aes(x=hp, y=wt)) +
geom_point() +
geom_smooth(aes(x = x, y = y, ymin = ymin, ymax = ymax), data=loess.DF, stat="identity")
That gives a plot that looks identical to
ggplot(mtcars, aes(x=hp, y=wt)) +
geom_point() +
geom_smooth()
(which is the expanded form of the original p).

stat_smooth does produce output that you can use elsewhere, and in a slightly hacky way you can put it into a variable in the global environment.
You enclose the output variable in .. on either side to use it. So if you add an aes in the stat_smooth call and use the global assignment operator, <<-, to assign the output to a variable in the global environment, you can get the fitted values, or others (see below).
qplot(hp,wt,data=mtcars) + stat_smooth(aes(outfit=fit<<-..y..))
fit
[1] 1.993594 2.039986 2.087067 2.134889 2.183533 2.232867 2.282897 2.333626
[9] 2.385059 2.437200 2.490053 2.543622 2.597911 2.652852 2.708104 2.764156
[17] 2.821771 2.888224 2.968745 3.049545 3.115893 3.156368 3.175495 3.181411
[25] 3.182252 3.186155 3.201258 3.235698 3.291766 3.353259 3.418409 3.487074
[33] 3.559111 3.634377 3.712729 3.813399 3.910849 3.977051 4.037302 4.091635
[41] 4.140082 4.182676 4.219447 4.250429 4.275654 4.295154 4.308961 4.317108
[49] 4.319626 4.316548 4.308435 4.302276 4.297902 4.292303 4.282505 4.269040
[57] 4.253361 4.235474 4.215385 4.193098 4.168621 4.141957 4.113114 4.082096
[65] 4.048910 4.013560 3.976052 3.936392 3.894586 3.850639 3.804557 3.756345
[73] 3.706009 3.653554 3.598987 3.542313 3.483536 3.422664 3.359701 3.294654
The outputs you can obtain are:
y, predicted value
ymin, lower pointwise confidence interval around the mean
ymax, upper pointwise confidence interval around the mean
se, standard error
Note that by default it predicts on 80 data points, which may not be aligned with your original data.
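To illustrate the 80-point default, here is a small sketch that reads the smoothed layer back with layer_data(), changes the grid size through the n argument of stat_smooth, and, as an alternative, predicts at the original x values with predict(). The object names are made up for the example, and the loess settings are assumed to be the defaults.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(hp, wt)) + geom_point()

# Layer 2 is the smoother; by default it is evaluated on 80 x values
default_fit <- layer_data(
  p + stat_smooth(method = "loess", formula = y ~ x), 2
)

# The n argument changes how many x values the smoother is evaluated at
coarse_fit <- layer_data(
  p + stat_smooth(method = "loess", formula = y ~ x, n = 20), 2
)

# To get fitted values aligned with the original rows, fit the model
# yourself and predict at the observed hp values
model <- loess(wt ~ hp, data = mtcars)
at_data <- predict(model, newdata = data.frame(hp = mtcars$hp))

c(nrow(default_fit), nrow(coarse_fit), length(at_data))
```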

A more general approach could be to simply use the predict() function to predict any range of values that are interesting.
# define the model
model <- loess(wt ~ hp, data = mtcars)
# predict fitted values for each observation in the original dataset
modelFit <- data.frame(predict(model, se = TRUE))
# define data frame for ggplot
df <- data.frame(cbind(hp = mtcars$hp
, wt = mtcars$wt
, fit = modelFit$fit
, upperBound = modelFit$fit + 2 * modelFit$se.fit
, lowerBound = modelFit$fit - 2 * modelFit$se.fit
))
# build the plot using the fitted values from the predict() function
# geom_linerange() and the second geom_point() in the code are built using the values from the predict() function
# for comparison ggplot's geom_smooth() is also shown
g <- ggplot(df, aes(hp, wt))
g <- g + geom_point()
g <- g + geom_linerange(aes(ymin = lowerBound, ymax = upperBound))
g <- g + geom_point(aes(hp, fit, size = 1))
g <- g + geom_smooth(method = "loess")
g
# Predict any range of values and include the standard error in the output
predict(model, newdata = 100:300, se = TRUE)

If you want to bring in the power of the tidyverse, you can use the "broom" library to add the predicted values from the loess function to your original dataset. This builds on @phillyooo's solution.
library(tidyverse)
library(broom)
# original graph with smoother
ggplot(data=mtcars, aes(hp,wt)) +
stat_smooth(method = "loess", span = 0.75)
# Create model that will do the same thing as under the hood in ggplot2
model <- loess(wt ~ hp, data = mtcars, span = 0.75)
# Add predicted values from model to original dataset using broom library
mtcars2 <- augment(model, mtcars)
# Plot both lines
ggplot(data=mtcars2, aes(hp,wt)) +
geom_line(aes(hp, .fitted), color = "red") +
stat_smooth(method = "loess", span = 0.75)

Save the graph object and use ggplot_build() or layer_data() to obtain the elements/estimates for the layers. e.g.
pp<-ggplot(mtcars, aes(x=hp, y=wt)) + geom_point() + geom_smooth();
ggplot_build(pp)

#NOT DUPLICATED!!# How can I get kernel density value from geom_density output in ggplot in R? [duplicate]

I would like to know exactly what geom_density() is doing, so that I can justify the graph, and whether there is any way of extracting the function or points it generates for each of the curves being plotted.
Thanks

Typing get("compute_group", ggplot2::StatDensity) (or, formerly, get("calculate", ggplot2:::StatDensity)) will get you the algorithm used to calculate the density. (At root, it's a call to density() with kernel="gaussian" the default.)
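As a hedged check of the "it's a call to density()" claim, the sketch below compares the y values in the geom_density layer with a direct density() call on mtcars$wt. It assumes the ggplot2 defaults of bw = "nrd0", a Gaussian kernel, 512 evaluation points and an evaluation range equal to the data range; the exact internals may differ between ggplot2 versions, so the comparison uses a tolerance.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt)) + geom_density()
gg <- ggplot_build(p)$data[[1]]

# Direct call, mirroring the assumed ggplot2 defaults
d <- density(
  mtcars$wt,
  bw = "nrd0", kernel = "gaussian", n = 512,
  from = min(mtcars$wt), to = max(mtcars$wt)
)

# TRUE if the two curves agree up to numerical tolerance
all.equal(gg$y, d$y, tolerance = 1e-6)
```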
The points used in the plot are invisibly returned by print.ggplot(), so you can access them like this:
library(ggplot2)
library(ggplot2movies) # the movies data set now lives in this package
m <- ggplot(movies, aes(x = rating))
m <- m + geom_density()
p <- print(m)
head(p$data[[1]], 3)
# y x density scaled count PANEL group ymin ymax
# 1 0.0073761 1.0000 0.0073761 0.025917 433.63 1 1 0 0.0073761
# 2 0.0076527 1.0176 0.0076527 0.026888 449.88 1 1 0 0.0076527
# 3 0.0078726 1.0352 0.0078726 0.027661 462.81 1 1 0 0.0078726
## Just to show that those are the points you are after,
## extract and use them to create a lattice xyplot
library(gridExtra)
library(lattice)
mm <- xyplot(y ~x, data=p$data[[1]], type="l")

As suggested in other answers, you can access the ggplot points using print.ggplot(). However, print()-ing also draws the ggplot object, which may not be desired.
You can extract the ggplot object data without printing the plot by using ggplot_build():
library(ggplot2)
library(ggplot2movies)
m <- ggplot(movies, aes(x = rating))
m <- m + geom_density()
p <- ggplot_build(m) # <---- INSTEAD OF `p <- print(m)`
head(p$data[[1]], 3)
# y x density scaled count n PANEL group ymin
# 1 0.007376115 1.000000 0.007376115 0.02591684 433.6271 58788 1 -1 0
# 2 0.007652653 1.017613 0.007652653 0.02688849 449.8842 58788 1 -1 0
# 3 0.007872571 1.035225 0.007872571 0.02766120 462.8127 58788 1 -1 0
# Just to show that those are the points you are after, extract and use them
# to create a lattice xyplot
library(lattice)
m2 <- xyplot(y ~x, data=p$data[[1]], type="l")
library(gridExtra)
grid.arrange(m, m2, nrow=1)

Generate 3D surface plot in R

I have a data frame with four variables A, B, C and D. I need a 3D surface plot with x = A, y = B, z = C. The plot should contain three surfaces, one for each value of D (D takes the values 0, 1 and -1).
I tried subsetting the data frame into three data frames by the value of D and adding a surface for each to plot_ly, but it doesn't seem to work and I get a blank graph. I don't know if I am using the right plot function.
Below is my data frame d1
A B C D
734.5 2.28125 3.363312755 0
738 2.53125 3.395864326 0
727.25 2.484375 3.41183431 1
737 2.421875 3.380499188 1
727.25 2.3828125 3.39538442 1
933.25 4.6875 3.148660474 1
932.75 4.671875 3.155840809 1
934 4.671875 3.165391107 1
920.75 4.671875 3.194808475 1
913.25 4.671875 3.22907393 1
896.75 4.671875 3.287157844 1
880 4.671875 3.341203642 -1
866.75 4.59375 3.388017143 -1
714.5 3.296875 3.572828317 -1
730.75 3.296875 3.535364241 -1
734.75 3.296875 3.526142314 -1
713.25 3.7734375 3.653888449 -1
711.75 3.8203125 3.665152882 -1
711.75 3.8125 3.65967422 -1
714 3.796875 3.630867839 0
754.25 3.796875 3.560165628 0
715.25 3.78125 3.650415301 0
Below is my R code
library(plotly)
pd1 <- subset(d1, (D == 1))
nd1 <- subset(d1, (D == -1))
zd1 <- subset(d1, (D == 0))
p <- plot_ly(showscale = FALSE) %>%
add_surface(x= pd1$A,y = pd1$B, z = pd1$C)%>%
add_surface(x= nd1$A,y =nd1$B, z = nd1$C)%>%
add_surface(x= zd1$A,y =zd1$B, z = zd1$C)

I think you have to pass a numeric matrix as argument to add_surface.
pd1_ma <- as.matrix(pd1)
nd1_ma <- as.matrix(nd1)
zd1_ma <- as.matrix(zd1)
p <- plot_ly(showscale = FALSE) %>%
add_surface(z = ~pd1_ma) %>%
add_surface(z = ~nd1_ma, opacity = 0.98) %>%
add_surface(z = ~zd1_ma, opacity = 0.98)
p
That was working for me.
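For context, add_surface expects z as a matrix of heights laid out on a grid, with rows corresponding to y and columns to x. The following is only a sketch of how long-format data could be reshaped into such a matrix when the (A, B) pairs form a complete grid; the data frame grid is invented for the illustration, and the scattered points in the question would likely need interpolation onto a grid first.

```r
# Hypothetical gridded data in long format: every A paired with every B
grid <- expand.grid(A = 1:3, B = 1:4)
grid$C <- with(grid, A + 10 * B)

# Reshape to a matrix: rows follow B (y), columns follow A (x)
z <- with(grid, tapply(C, list(B, A), mean))
z

# Draw it only if plotly is installed
if (requireNamespace("plotly", quietly = TRUE)) {
  fig <- plotly::plot_ly(
    x = sort(unique(grid$A)),
    y = sort(unique(grid$B)),
    z = z,
    type = "surface"
  )
  print(fig)
}
```

One such matrix per value of D would then give one surface per group.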

ggplot2 missing data when plotting histogram with custom x axis limits

I am trying to plot six histograms (2 columns of data (calories, sodium) x 3 types (beef, meat, poultry)) with these data and I want to give them the same scale for the x and y axes. I'm using scale_x_continuous to limit the x axis, which, according to various sources, removes data that won't appear on the plot. Here is my code:
#src.table is the data frame containing my data
histogram <- function(df, dataset, n_bins, label) {
ggplot(df, aes(x=df[[dataset]])) +
geom_histogram(color="darkblue", fill="lightblue", bins = n_bins) + xlab(label)
}
src2_12.beef <- src2_12.table[src2_12.table$Type == "Beef",]
src2_12.meat <- src2_12.table[src2_12.table$Type == "Meat",]
src2_12.poultry <- src2_12.table[src2_12.table$Type == "Poultry",]
src2_12.calories_scale <- lims(x = c(min(src2_12.table$Calories), max(src2_12.table$Calories)), y = c(0, 6))
src2_12.sodium_scale <- lims(x = c(min(src2_12.table$Sodium), max(src2_12.table$Sodium)), y = c(0, 6))
#src2_12.calories_scale <- lims()
#src2_12.sodium_scale <- lims()
src2_12.plots <- list(
histogram(src2_12.beef, "Calories", 10, "Calories-Beef") + src2_12.calories_scale,
histogram(src2_12.meat, "Calories", 10, "Calories-Meat") + src2_12.calories_scale,
histogram(src2_12.poultry, "Calories", 10, "Calories-Poultry") + src2_12.calories_scale,
histogram(src2_12.beef, "Sodium", 10, "Sodium-Beef") + src2_12.sodium_scale,
histogram(src2_12.meat, "Sodium", 10, "Sodium-Meat") + src2_12.sodium_scale,
histogram(src2_12.poultry, "Sodium", 10, "Sodium-Poultry") + src2_12.sodium_scale
)
multiplot(plotlist = src2_12.plots, cols = 2, layout = matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, byrow = TRUE))
Here is the output:
vs. what the data are supposed to look like:
I can't understand why some data points are missing, given that the limits I set are already the min and the max of the data.

You probably want to use coord_cartesian instead of lims. Unexpected things can happen when you're fiddling around with the limits on histograms, because a fair bit of fiddly transformations have to happen to get from your raw data to the actual histogram.
Let's peer under the hood for one example:
p <- ggplot(src2_12.beef,aes(x = Calories)) +
geom_histogram(bins = 10)
p1 <- ggplot(src2_12.beef,aes(x = Calories)) +
geom_histogram(bins = 10) +
lims(x = c(86,195))
a <- ggplot_build(p)
b <- ggplot_build(p1)
>a$data[[1]][,1:5]
y count x xmin xmax
1 1 1 114.1111 109.7222 118.5000
2 0 0 122.8889 118.5000 127.2778
3 3 3 131.6667 127.2778 136.0556
4 2 2 140.4444 136.0556 144.8333
5 5 5 149.2222 144.8333 153.6111
6 2 2 158.0000 153.6111 162.3889
7 0 0 166.7778 162.3889 171.1667
8 2 2 175.5556 171.1667 179.9444
9 3 3 184.3333 179.9444 188.7222
10 2 2 193.1111 188.7222 197.5000
> b$data[[1]][,1:5]
y count x xmin xmax
1 0 0 NA NA 90.83333
2 0 0 96.88889 90.83333 102.94444
3 1 1 109.00000 102.94444 115.05556
4 0 0 121.11111 115.05556 127.16667
5 4 4 133.22222 127.16667 139.27778
6 4 4 145.33333 139.27778 151.38889
7 4 4 157.44444 151.38889 163.50000
8 1 1 169.55556 163.50000 175.61111
9 4 4 181.66667 175.61111 187.72222
10 2 2 193.77778 187.72222 NA
>
So now you're wondering, how the heck did that happen, right?
Well, when you tell ggplot that you want 10 bins and the x limits go from 86 to 195, the histogram algorithm tries to create ten bins that span that actual range. That's why it's trying to create bins down below 100 even though there's no data there.
And then further oddities can happen because the bars may extend past the nominal data range (the xmin and xmax values), since the bar widths will generally encompass a little above and a little below your actual data at the high and low ends.
coord_cartesian will adjust the x limits after all this processing has happened, so it bypasses all these little quirks.
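Here is a small sketch of the difference on mtcars, with limits chosen arbitrarily so that some observations fall outside them. Scale limits (xlim or lims) are applied before binning, so out-of-range values are dropped from the counts, whereas coord_cartesian only zooms the finished plot. This shows the general mechanism rather than the exact edge-bar effect in the question.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = hp)) + geom_histogram(bins = 10)

# Scale limits: values outside 100..300 become NA and are removed before binning
clipped <- base + xlim(100, 300)

# Coordinate limits: binning uses all the data, then the view is zoomed
zoomed <- base + coord_cartesian(xlim = c(100, 300))

# Total number of observations that ended up in bins
n_base    <- sum(layer_data(base)$count)
n_clipped <- sum(layer_data(clipped)$count)
n_zoomed  <- sum(layer_data(zoomed)$count)
c(base = n_base, clipped = n_clipped, zoomed = n_zoomed)
```

The clipped version will emit warnings about removed rows; that is the expected symptom of the scale-limit approach.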
