how to get the stat details in ggplot2? [duplicate] - r

I have made a plot using ggplot2 geom_histogram from a data frame. See the sample data below; the histogram itself is shown in the linked question "Need to label each geom_vline with the factors using a nested ddply function and facet wrap".
I now need to make a data frame that contains the summarized data used to generate the ggplot above.
Sector2 Family Year Length
BUN Acroporidae 2010 332.1300496
BUN Poritidae 2011 141.1467966
BUN Acroporidae 2012 127.479
BUN Acroporidae 2013 142.5940556
MUR Faviidae 2010 304.0405
MUR Faviidae 2011 423.152
MUR Pocilloporidae 2012 576.0295
MUR Poritidae 2013 123.8936667
NTH Faviidae 2010 60.494
NTH Faviidae 2011 27.427
NTH Pocilloporidae 2012 270.475
NTH Poritidae 2013 363.4635

To get the values that are actually plotted, call ggplot_build() with your plot as the argument.
library(ggplot2)
p <- ggplot(mtcars,aes(mpg))+geom_histogram()+
facet_wrap(~cyl)+geom_vline(data=data.frame(x=c(20,30)),aes(xintercept=x))
pg <- ggplot_build(p)
This returns a list, one element of which is named data. That element holds one data frame per layer with the values used in the plot; for a histogram, for example, it contains the y values (the same as count). If you use facets, the PANEL column shows which facet each row belongs to. If the plot has more than one geom_, data contains a data frame for each: in my example there is one data frame for the histogram and another for the vlines.
head(pg$data[[1]])
y count x ndensity ncount density PANEL group ymin ymax
1 0 0 9.791667 0 0 0 1 1 0 0
2 0 0 10.575000 0 0 0 1 1 0 0
3 0 0 11.358333 0 0 0 1 1 0 0
4 0 0 12.141667 0 0 0 1 1 0 0
5 0 0 12.925000 0 0 0 1 1 0 0
6 0 0 13.708333 0 0 0 1 1 0 0
xmin xmax
1 9.40000 10.18333
2 10.18333 10.96667
3 10.96667 11.75000
4 11.75000 12.53333
5 12.53333 13.31667
6 13.31667 14.10000
head(pg$data[[2]])
xintercept PANEL group xend x
1 20 1 1 20 20
2 30 1 1 30 30
3 20 2 2 20 20
4 30 2 2 30 30
5 20 3 3 20 20
6 30 3 3 30 30

layer_data() is designed precisely for this:
layer_data(p, 1)
It will give you the data of the first layer, same as ggplot_build(p)$data[[1]].
At the time of writing, its source code was precisely:
function (plot, i = 1L) ggplot_build(plot)$data[[i]]
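As a minimal sketch of how the per-layer data could be turned into the summary data frame the question asks for, the following uses layer_data() on the same mtcars plot. The bins = 10 setting and the names hist_data, vline_data, hist_summary and panel_counts are illustrative choices, not part of the original answers.

```r
library(ggplot2)

# Same plot as in the answers, with an explicit (arbitrary) number of bins
p <- ggplot(mtcars, aes(mpg)) +
  geom_histogram(bins = 10) +
  facet_wrap(~cyl) +
  geom_vline(data = data.frame(x = c(20, 30)), aes(xintercept = x))

# Layer 1 is the histogram, layer 2 holds the vertical lines
hist_data  <- layer_data(p, 1)
vline_data <- layer_data(p, 2)

# Keep only the columns that describe each bar
hist_summary <- hist_data[, c("PANEL", "xmin", "xmax", "count")]

# Total number of observations drawn in each facet
panel_counts <- tapply(hist_data$count, hist_data$PANEL, sum)
```

Because PANEL identifies the facet, the same pattern should carry over to a plot faceted by Sector2 or Family in the original data.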

While the other answers get you close, if you are looking for the actual data that was passed to ggplot(), you can use:
ggplot_build(p)$plot$data
require(tidyverse)
p <- ggplot(mtcars,aes(mpg))+geom_histogram()+
facet_wrap(~cyl)+geom_vline(data=data.frame(x=c(20,30)),aes(xintercept=x))
pg <- ggplot_build(p)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
pg$plot$data
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#> Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
#> Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#> Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
#> Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
#> Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
#> Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
#> Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
#> Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
#> Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
#> Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
#> Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
#> Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#> Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
#> Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
#> Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
#> Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
#> AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
#> Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
#> Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
#> Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
#> Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
#> Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
#> Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
#> Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
#> Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
#> Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
Created on 2019-03-04 by the reprex package (v0.2.1)
While that isn't useful for an unmodified data frame, if you are piping through a series of mutate() or summarize() calls before you get to ggplot(), this can be useful after the fact to show the data that reached the plot.
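A small sketch of that situation, assuming dplyr is available: the data is summarised in a pipe and never stored, then recovered from the plot object. It reads the plot's data element directly (p$data), which is what ggplot_build(p)$plot$data also returned in the ggplot2 versions these answers were written for. The column name mean_mpg and the variable plot_input are made up for the illustration.

```r
library(dplyr)
library(ggplot2)

# Summarise inside the pipe; the intermediate data frame is never saved
p <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg)) %>%
  ggplot(aes(factor(cyl), mean_mpg)) +
  geom_col()

# Recover the summarised data from the plot object after the fact
plot_input <- p$data
```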

Load the purrr package and write my_plot %>% pluck("data")

Related

How do I speed up this specific for loop?

I've looked at other threads and tried to apply their solutions to my code, but have had no luck.
CDR3_post_challenge_unique_clonecount$participant_per_cdr3aa <- as.numeric(CDR3_post_challenge_unique_clonecount$cdr3aa)
participant_list <- unique(CDR3_post_challenge_unique_clonecount$cdr3aa)
for (c in participant_list)
{
CDR3_post_challenge_unique_clonecount$participant_per_cdr3aa[CDR3_post_challenge_unique_clonecount$cdr3aa == c] <- length(unique(CDR3_post_challenge_unique_clonecount$PartID[CDR3_post_challenge_unique_clonecount$cdr3aa == c]))
}
Here is a bit of the dataframe:
cdr3aa clonecount PartID
CAAGRAARGGSVPHWFDPF 1 S-1
CAALADSGSQTDAFDIA 1 S-1
CAFHAAYGSQHGLDVW 1 S-1
CAGGLAWLVDDW 1 S-1
CAGRWFFPW 1 S-1
CAGVKNGRGMDVW 1 S-1
I think you can replace the for loop with
CDR3_post_challenge_unique_clonecount$per3 <-
as.integer(
ave(CDR3_post_challenge_unique_clonecount$PartID,
CDR3_post_challenge_unique_clonecount$cdr3aa,
FUN = function(z) length(unique(z)))
)
I'll demonstrate with mtcars, using the following analogs:
mtcars --> CDR3_post_challenge_unique_clonecount
cyl --> cdr3aa, the categorical variable in which we want to count PartID
drat --> PartID, the thing we want to count (uniquely) within each cdr3aa
mtcars$drat_per_cyl <- ave(mtcars$drat, mtcars$cyl, FUN = function(z) length(unique(z)))
mtcars
# mpg cyl disp hp drat wt qsec vs am gear carb drat_per_cyl
# Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 5
# Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 5
# Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 10
# Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 5
# Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 11
# Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 5
# Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 11
# Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 10
# Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 10
# Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 5
# Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 5
# Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 11
# Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 11
# Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 11
# Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 11
# Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 11
# Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 11
# Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 10
# Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 10
# Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 10
# Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 10
# Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 11
# AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 11
# Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 11
# Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 11
# Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 10
# Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 10
# Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 10
# Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 11
# Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 5
# Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 11
# Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 10
Notes:
ave is a little brain-dead in that the class of the return value is always the same as the class of the first argument. This means that one cannot count unique "character" values and expect to get an integer; the count is instead returned as a string. It's because of this that I wrap ave in as.integer(.).
ave returns a vector the same length as the input, with values corresponding 1-for-1 (meaning the order is relevant and preserved). In my example of mtcars, this means that it is effectively doing something like this:
ind4 <- which(mtcars$cyl == 4L)
ind4
# [1] 3 8 9 18 19 20 21 26 27 28 32
length(unique(mtcars$drat[ind4]))
# [1] 10
ind6 <- which(mtcars$cyl == 6L)
ind6
# [1] 1 2 4 6 10 11 30
length(unique(mtcars$drat[ind6]))
# [1] 5
### ...
but it will place the value 10 in the ind4 positions of the result. For example, because of my ind6, the result will start with
c(5, 5, .., 5, .., 5, .., .., .., 5, 5, .., .....)
Because of ind4, it will contain
c(.., .., 10, .., .., .., .., 10, 10, .....)
(And same for cyl==8L.)
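To check the replacement on data shaped like the question's, here is a small sketch that runs both the loop and the ave() version and compares them. The data frame clones, its rows, and the extra participant S-2 are invented for the illustration.

```r
# Toy data in the shape of the question's data frame (values are made up)
clones <- data.frame(
  cdr3aa = c("CAGRWFFPW", "CAGRWFFPW", "CAGRWFFPW", "CAGGLAWLVDDW", "CAGGLAWLVDDW"),
  PartID = c("S-1", "S-2", "S-1", "S-1", "S-1"),
  stringsAsFactors = FALSE
)

# Vectorised version: count distinct participants within each cdr3aa
clones$participant_per_cdr3aa <- as.integer(
  ave(clones$PartID, clones$cdr3aa, FUN = function(z) length(unique(z)))
)

# Loop version, kept only to confirm both give the same answer
loop_result <- integer(nrow(clones))
for (seq_id in unique(clones$cdr3aa)) {
  rows <- clones$cdr3aa == seq_id
  loop_result[rows] <- length(unique(clones$PartID[rows]))
}

stopifnot(identical(loop_result, clones$participant_per_cdr3aa))
```

The loop rescans the whole column once per unique sequence, whereas ave() splits the data once, which is presumably where the speed-up comes from on a large data frame.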

mutate a variable with curly-curly [duplicate]

I've used curly-curly with group_by and summarise as described in the rlang announcement. But I can't get it to work when mutating a variable in place. What's the best way to do this currently with dplyr?
Say I want to supply an unquoted column name and have it mutated, here's a toy example function that doesn't work:
my_fun <- function(dat, var_name){
dat %>%
mutate({{var_name}} = 1)
}
my_fun(mtcars, cyl)
What should that mutate line be to change any column in mtcars to be a constant?
You need to use the := operator (instead of =) if you want to use curly-curly to specify a name on the left-hand side of an assignment in mutate:
my_fun <- function(dat, var_name){
dat %>%
mutate({{var_name}} := 1)
}
Which allows:
my_fun(mtcars, cyl)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 21.0 1 160.0 110 3.90 2.620 16.46 0 1 4 4
#> 2 21.0 1 160.0 110 3.90 2.875 17.02 0 1 4 4
#> 3 22.8 1 108.0 93 3.85 2.320 18.61 1 1 4 1
#> 4 21.4 1 258.0 110 3.08 3.215 19.44 1 0 3 1
#> 5 18.7 1 360.0 175 3.15 3.440 17.02 0 0 3 2
#> 6 18.1 1 225.0 105 2.76 3.460 20.22 1 0 3 1
#> 7 14.3 1 360.0 245 3.21 3.570 15.84 0 0 3 4
#> 8 24.4 1 146.7 62 3.69 3.190 20.00 1 0 4 2
#> 9 22.8 1 140.8 95 3.92 3.150 22.90 1 0 4 2
#> 10 19.2 1 167.6 123 3.92 3.440 18.30 1 0 4 4
#> 11 17.8 1 167.6 123 3.92 3.440 18.90 1 0 4 4
#> 12 16.4 1 275.8 180 3.07 4.070 17.40 0 0 3 3
#> 13 17.3 1 275.8 180 3.07 3.730 17.60 0 0 3 3
#> 14 15.2 1 275.8 180 3.07 3.780 18.00 0 0 3 3
#> 15 10.4 1 472.0 205 2.93 5.250 17.98 0 0 3 4
#> 16 10.4 1 460.0 215 3.00 5.424 17.82 0 0 3 4
#> 17 14.7 1 440.0 230 3.23 5.345 17.42 0 0 3 4
#> 18 32.4 1 78.7 66 4.08 2.200 19.47 1 1 4 1
#> 19 30.4 1 75.7 52 4.93 1.615 18.52 1 1 4 2
#> 20 33.9 1 71.1 65 4.22 1.835 19.90 1 1 4 1
#> 21 21.5 1 120.1 97 3.70 2.465 20.01 1 0 3 1
#> 22 15.5 1 318.0 150 2.76 3.520 16.87 0 0 3 2
#> 23 15.2 1 304.0 150 3.15 3.435 17.30 0 0 3 2
#> 24 13.3 1 350.0 245 3.73 3.840 15.41 0 0 3 4
#> 25 19.2 1 400.0 175 3.08 3.845 17.05 0 0 3 2
#> 26 27.3 1 79.0 66 4.08 1.935 18.90 1 1 4 1
#> 27 26.0 1 120.3 91 4.43 2.140 16.70 0 1 5 2
#> 28 30.4 1 95.1 113 3.77 1.513 16.90 1 1 5 2
#> 29 15.8 1 351.0 264 4.22 3.170 14.50 0 1 5 4
#> 30 19.7 1 145.0 175 3.62 2.770 15.50 0 1 5 6
#> 31 15.0 1 301.0 335 3.54 3.570 14.60 0 1 5 8
#> 32 21.4 1 121.0 109 4.11 2.780 18.60 1 1 4 2
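The same pattern extends beyond constants. As a hedged sketch, assuming a reasonably recent dplyr and rlang, curly-curly can appear on both sides of :=, and a glue-style string on the left-hand side can derive a new column name from the supplied one. The function names double_column and add_doubled are hypothetical.

```r
library(dplyr)

# Overwrite the supplied column with a function of itself
double_column <- function(dat, var_name) {
  dat %>%
    mutate({{ var_name }} := {{ var_name }} * 2)
}

# Build a new column whose name is derived from the supplied one
add_doubled <- function(dat, var_name) {
  dat %>%
    mutate("{{ var_name }}_doubled" := {{ var_name }} * 2)
}

doubled  <- double_column(mtcars, cyl)  # cyl is replaced by 2 * cyl
with_new <- add_doubled(mtcars, cyl)    # cyl is kept, cyl_doubled is added
```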

How to search for string within variable labels of a data frame, and return a vector with all these variables in R

I have a data frame with a large amount of math and science related items, and I want all math related variables removed.
The variable names follow no consistent naming scheme for either math or science, so it's hard to search and select based on variable name. However, the variable labels describe what each variable represents. I essentially want all variables with labels that contain the word "math" removed. I tried the following code:
library(dplyr)
library(Hmisc)
# Sample data frame:
M <- c(1, 2)
S <- c(3, 4)
old_df <- data.frame(M, S)
label(old_df$M) <- "My Mathematics Variable"
label(old_df$S) <- "My Science Variable"
#dplyr syntax:
new_df <- old_df %>% select( -contains(hmisc::label(.) == "MATH" ) )
using the Hmisc::label()-function to retrieve a vector with labels.
Sample code of the label()-function:
> label(old_df)
M S
"My Mathematics Variable" "My Science Variable"
> str(label(old_df))
Named chr [1:2] "My Mathematics Variable" "My Science Variable"
- attr(*, "names")= chr [1:2] "M" "S"
I need a way to search through the label items and find the string "math" within them. I tried coercing to a matrix and a data frame, but I still can't figure out how to search and retrieve the variable names. Any suggestions that will get this to work are welcome.
You mean something like this? (UPDATED to more closely map grepl to your example.)
library(Hmisc)
library(dplyr)
Hmisc::label(mtcars$mpg) <- "Miles per Gallon" # grepl WILL catch this
Hmisc::label(mtcars$hp) <- "Not here" # nope
Hmisc::label(mtcars$qsec) <- "MILES all caps here" # nope, unless you set ignore.case = TRUE
Hmisc::label(mtcars$drat) <- "later in the label Miles is here" # yepp
# Case-sensitive match, as in the output below; adding ignore.case = TRUE to grepl() would also drop qsec
mtcars %>% select_if(.predicate = !(grepl("Miles", Hmisc::label(.))))
#> cyl disp hp wt qsec vs am gear carb
#> Mazda RX4 6 160.0 110 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 6 160.0 110 2.875 17.02 0 1 4 4
#> Datsun 710 4 108.0 93 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 6 258.0 110 3.215 19.44 1 0 3 1
#> Hornet Sportabout 8 360.0 175 3.440 17.02 0 0 3 2
#> Valiant 6 225.0 105 3.460 20.22 1 0 3 1
#> Duster 360 8 360.0 245 3.570 15.84 0 0 3 4
#> Merc 240D 4 146.7 62 3.190 20.00 1 0 4 2
#> Merc 230 4 140.8 95 3.150 22.90 1 0 4 2
#> Merc 280 6 167.6 123 3.440 18.30 1 0 4 4
#> Merc 280C 6 167.6 123 3.440 18.90 1 0 4 4
#> Merc 450SE 8 275.8 180 4.070 17.40 0 0 3 3
#> Merc 450SL 8 275.8 180 3.730 17.60 0 0 3 3
#> Merc 450SLC 8 275.8 180 3.780 18.00 0 0 3 3
#> Cadillac Fleetwood 8 472.0 205 5.250 17.98 0 0 3 4
#> Lincoln Continental 8 460.0 215 5.424 17.82 0 0 3 4
#> Chrysler Imperial 8 440.0 230 5.345 17.42 0 0 3 4
#> Fiat 128 4 78.7 66 2.200 19.47 1 1 4 1
#> Honda Civic 4 75.7 52 1.615 18.52 1 1 4 2
#> Toyota Corolla 4 71.1 65 1.835 19.90 1 1 4 1
#> Toyota Corona 4 120.1 97 2.465 20.01 1 0 3 1
#> Dodge Challenger 8 318.0 150 3.520 16.87 0 0 3 2
#> AMC Javelin 8 304.0 150 3.435 17.30 0 0 3 2
#> Camaro Z28 8 350.0 245 3.840 15.41 0 0 3 4
#> Pontiac Firebird 8 400.0 175 3.845 17.05 0 0 3 2
#> Fiat X1-9 4 79.0 66 1.935 18.90 1 1 4 1
#> Porsche 914-2 4 120.3 91 2.140 16.70 0 1 5 2
#> Lotus Europa 4 95.1 113 1.513 16.90 1 1 5 2
#> Ford Pantera L 8 351.0 264 3.170 14.50 0 1 5 4
#> Ferrari Dino 6 145.0 175 2.770 15.50 0 1 5 6
#> Maserati Bora 8 301.0 335 3.570 14.60 0 1 5 8
#> Volvo 142E 4 121.0 109 2.780 18.60 1 1 4 2
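Applied to the sample data frame from the question, the same idea can be sketched without dplyr, assuming Hmisc is installed. The names var_labels and math_vars are illustrative; the match here is case-insensitive so that "math" finds "Mathematics".

```r
library(Hmisc)

# Sample data frame from the question
old_df <- data.frame(M = c(1, 2), S = c(3, 4))
label(old_df$M) <- "My Mathematics Variable"
label(old_df$S) <- "My Science Variable"

# Named character vector: names are column names, values are labels
var_labels <- label(old_df)

# Names of the variables whose label mentions "math" in any case
math_vars <- names(var_labels)[grepl("math", var_labels, ignore.case = TRUE)]

# Drop those variables, keeping the result a data frame
new_df <- old_df[, !(names(old_df) %in% math_vars), drop = FALSE]
```

Keeping math_vars as a separate vector gives the list of variable names the question asks for, in addition to the filtered data frame.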
