Restricting the x being counted in a histogram

library(alr4)
par(mfrow = c(2,2))
ggplot(walleye, aes(x= age)) + geom_histogram() + facet_grid(~age)
I would like to create 4 histograms from the data set walleye. Each histogram should show the length of the walleye, and each of the four histograms should count only the fish of one age. I would like to restrict the ages to 1 through 4. How can I do that with ggplot?

If I understand what you are trying to do correctly, this should help:
library(alr4)
library(ggplot2)
ggplot(subset(walleye, age<5), aes(x=length)) + geom_histogram() + facet_grid(~age)
This way you are only plotting the subset of the data where age is 1-4, and you are actually plotting histograms of length.
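The same idea can be tried without installing alr4. The sketch below is only an illustration: the data frame fish and its values are made up as a stand-in for walleye, and the bin count is an arbitrary choice. It filters first, then facets, and checks that exactly four panels are drawn.

```r
library(ggplot2)

# Hypothetical stand-in for walleye: ages 1-8 with made-up lengths
set.seed(1)
fish <- data.frame(
  age    = sample(1:8, 400, replace = TRUE),
  length = rnorm(400, mean = 300, sd = 50)
)

# Keep only ages 1 to 4 before the data reaches ggplot
young <- subset(fish, age <= 4)

# One histogram of length per age, arranged in a 2 x 2 grid
p <- ggplot(young, aes(x = length)) +
  geom_histogram(bins = 20) +
  facet_wrap(~ age, ncol = 2)

# layer_data() returns the computed bins; PANEL identifies the facet
n_panels <- length(unique(layer_data(p)$PANEL))
print(n_panels)
```

facet_wrap(~ age, ncol = 2) is used here only to get the 2 x 2 layout that par(mfrow = c(2,2)) was aiming for; par() settings do not affect ggplot2 output.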

You could try this too (adding another line of code on top of your code):
library(alr4)
library(ggplot2)
p <- ggplot(walleye, aes(x= age)) + geom_histogram() + facet_grid(~age)
p %+% subset(walleye, age %in% 1:4)

Related

Filtering data into subplots using ggplot2

I have a dataset of three variables: year, age group and result. There are 9 different age groups. What I am trying to do is create a 3x3 plot with 9 subplots, each showing a geom_line() plot for one age group. So far I have ended up with a 3x3 plot where every subplot shows the results for all groups, which was my step 1. However, I can't find a way to do the last bit and restrict each subplot to its own group. Help is appreciated.
Here is my code so far:
# n is a list of unique values aka age groups in the Data1
n <- unique(Data1$age)
# Preparation for plotting all subplots
plot_lst <- vector("list", length = length(n))
for (i in 1:length(n)) {
g <- ggplot(Data1, aes(x=year, y=data, color=age)) +
geom_line()
plot_lst[[i]] <- g
}
# Plotting all subplots into one
cowplot::plot_grid(plotlist = plot_lst, ncol = 3)
I know it is currently missing the filtering which I mainly tried to do within the geom_line() but I wasn't able to find a working solution.
Found a solution with:
ggplot(Data1, aes(year, data)) +
geom_line() +
geom_point() +
facet_wrap(~ age)
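For anyone who still needs a list of separate plots, the loop in the question can be made to work by filtering Data1 inside each iteration. This is a minimal sketch with a made-up Data1 (the values and group names are hypothetical), not the original data.

```r
library(ggplot2)

# Hypothetical stand-in for Data1: 9 age groups, 10 years each
Data1 <- expand.grid(year = 2000:2009,
                     age = paste0("group", 1:9),
                     stringsAsFactors = FALSE)
Data1$data <- seq_len(nrow(Data1))

n <- unique(Data1$age)

# Build one plot per age group, each from its own subset of the data
plot_lst <- lapply(n, function(a) {
  ggplot(Data1[Data1$age == a, ], aes(x = year, y = data)) +
    geom_line() +
    ggtitle(a)
})

# Each plot should now contain only the 10 rows of its group
rows_per_plot <- vapply(plot_lst, function(p) nrow(layer_data(p)), integer(1))
print(rows_per_plot)
```

The resulting plot_lst can be passed to cowplot::plot_grid(plotlist = plot_lst, ncol = 3) as in the question. For this use case facet_wrap(~ age) is shorter and keeps the axes aligned.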

Plot point on ggplot2 smoothing regression on vline intersection

I want to create a (time-series) plot out of 40 million data points in order to show two regression lines with two specific events on each of it (first occurrence of an optimum in time-series).
Currently, I draw the regression lines and add a geom_vline to it to indicate the event.
As I want to be independent from colours in the plot, it would be beneficial if I could just plot the marker geom_vline as a point on the regression line.
Do you have any idea how to solve this using ggplot2?
My current approach is this here (replaced data points with test data):
library(ggplot2)
# Generate data
m1 <- "method 1"
m2 <- "method 2"
data1 <- data.frame(Time=seq(100), Value=sample(1000, size=100), Type=rep(as.factor(m1), 100))
data2 <- data.frame(Time=seq(100), Value=sample(1000, size=100), Type=rep(as.factor(m2), 100))
df <- rbind(data1, data2)
rm(data1, data2)
# Calculate first minima for each Type
m1_intercept <- df[which(df$Type == m1), ][which.min(df[which(df$Type == m1), ]$Value),]
m2_intercept <- df[which(df$Type == m2), ][which.min(df[which(df$Type == m2), ]$Value),]
# Plot regression and vertical lines
p1 <- ggplot(df, aes(x=Time, y=Value, group=Type, colour=Type), linetype=Type) +
geom_smooth(se=F) +
geom_vline(aes(xintercept=m1_intercept$Time, linetype=m1_intercept$Type)) +
geom_vline(aes(xintercept=m2_intercept$Time, linetype=m2_intercept$Type)) +
scale_linetype_manual(name="", values=c("dotted", "dashed")) +
guides(colour=guide_legend(title="Regression"), linetype=guide_legend(title="First occurrence of optimum")) +
theme(legend.position="bottom")
ggsave("regression.png", plot=p1, height=5, width=7)
which generates this plot:
My desired plot would be something like this:
So my questions are
Does it make sense to indicate a minimum value on a regression line? The point's y-axis position would in fact be wrong, but it would still indicate the time point.
If yes, how can I achieve such a behaviour?
If no, what would you think could be better?
Thank you very much in advance!
Robin

If you first run your ggplot() call with only geom_smooth(), you can access plotted values through ggplot_build(), which we then can use to plot points on the two fitted lines. Example:
# Create initial plot
p1<-ggplot(df, aes(x=Time, y=Value, colour=Type)) +
geom_smooth(se=F)
# Now we can access the fitted values
smooths <- ggplot_build(p1)$data[[1]]
smooths_1 <- smooths[smooths$group==1,] # First group (method 1)
smooths_2 <- smooths[smooths$group==2,] # Second group (method 2)
# Then we find the closest plotted values to the minima
smooth_1_x <- smooths_1$x[which.min(abs(smooths_1$x - m1_intercept$Time))]
smooth_2_x <- smooths_2$x[which.min(abs(smooths_2$x - m2_intercept$Time))]
# Subset the previously defined datasets for respective closest values
point_data1 <- smooths_1[smooths_1$x==smooth_1_x,]
point_data2 <- smooths_2[smooths_2$x==smooth_2_x,]
Now we use point_data1 and point_data2 to place the points on your plot:
ggplot(df, aes(x=Time, y=Value, colour=Type)) +
geom_smooth(se=F) +
geom_point(data=point_data1, aes(x=x, y=y), colour = "red",size = 5) +
geom_point(data=point_data2, aes(x=x, y=y), colour = "red", size = 5)
To reproduce this plot, you can use set.seed(42) for your data generation step.
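A different route, shown here only as a sketch, is to fit the smoother yourself and predict at the exact time of the minimum instead of looking up the nearest plotted value. It assumes the curve is a loess fit with default settings, which is what geom_smooth() chooses automatically for fewer than 1000 observations; with 40 million points geom_smooth() would pick another method, so the fit would have to be swapped accordingly. Only one series is used to keep it short.

```r
# Test data in the same shape as one "method" from the question
set.seed(42)
df <- data.frame(Time = seq(100), Value = sample(1000, size = 100))

# Row holding the first occurrence of the minimum
min_row <- df[which.min(df$Value), ]

# Loess fit with default span and degree (assumed to mirror geom_smooth())
fit <- loess(Value ~ Time, data = df)

# Height of the fitted curve at the time of the minimum
y_on_curve <- predict(fit, newdata = data.frame(Time = min_row$Time))

# One-row data frame that could be handed to geom_point()
point_data <- data.frame(x = min_row$Time, y = unname(y_on_curve))
print(point_data)
```

point_data has the same x and y column names as the ggplot_build() result, so it can be used in geom_point(data = point_data, aes(x = x, y = y)) in the same way.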

Removing Empty Facet Categories

I am having trouble making my faceted plot display only the facets that contain data, instead of also showing empty facets.
The following code:
p<- ggplot(spad.data, aes(x=Day, y=Mean.Spad, color=Inoc))+
geom_point()
p + facet_grid(N ~ X.CO2.)
Gives the following graphic:
I have played around with it for a while but can't seem to figure out a solution.
Dataframe viewable here: https://docs.google.com/spreadsheets/d/11ZiDVRAp6qDcOsCkHM9zdKCsiaztApttJIg1TOyIypo/edit?usp=sharing
Reproducible Example viewable here: https://docs.google.com/document/d/1eTp0HCgZ4KX0Qavgd2mTGETeQAForETFWdIzechTphY/edit?usp=sharing

Your issue lies in the missing observations for your x and y variables. Missing values there do not influence which facets are created; that depends only on the levels of the faceting variables present in the data. Here is an illustration using sample data:
#generate some data
nobs=100
set.seed(123)
dat <- data.frame(G1=sample(LETTERS[1:3],nobs, T),
G2 = sample(LETTERS[1:3], nobs, T),
x=rnorm(nobs),
y=rnorm(nobs))
#introduce some missings in one group
dat$x[dat$G1=="C"] <- NA
#attempt to plot
p1 <- ggplot(dat, aes(x=x,y=y)) + facet_grid(G1~G2) + geom_point()
p1 #facets are generated according to the present levels of the grouping factors
#possible solution: remove the missing data before plotting
p2 <- ggplot(dat[complete.cases(dat),], aes(x=x, y=y)) + facet_grid(G1 ~G2) + geom_point()
p2
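To see which facets will disappear before plotting, the faceting variables can be tabulated with and without the incomplete rows. This is a small base R check on the same sample data as above, meant only as a sketch.

```r
# Same sample data as in the illustration above
nobs <- 100
set.seed(123)
dat <- data.frame(G1 = sample(LETTERS[1:3], nobs, TRUE),
                  G2 = sample(LETTERS[1:3], nobs, TRUE),
                  x  = rnorm(nobs),
                  y  = rnorm(nobs))
dat$x[dat$G1 == "C"] <- NA

# Rows that have no missing values in any column
complete <- dat[complete.cases(dat), ]

# Facet cells before and after dropping incomplete rows
print(table(dat$G1, dat$G2))
print(table(as.character(complete$G1), as.character(complete$G2)))

# Levels of G1 that lose all of their rows
dropped <- setdiff(unique(as.character(dat$G1)),
                   unique(as.character(complete$G1)))
print(dropped)
```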

Get data associated to ggplot + stat_ecdf()

I like the stat_ecdf() feature of the ggplot2 package, which I find quite useful for exploring a data series. However, it is only visual, and I wonder whether it is possible, and if so how, to get the associated table.
Please have a look at the following reproducible example:
p <- ggplot(iris, aes_string(x = "Sepal.Length")) + stat_ecdf() # building of the cumulated chart
p
attributes(p) # chart attributes
p$data # data is iris dataset, not the serie used for displaying the chart

As @krfurlong showed me in this question, the layer_data() function in ggplot2 can get you exactly what you're looking for without the need to recreate the data.
p <- ggplot(iris, aes_string(x = "Sepal.Length")) + stat_ecdf()
p.data <- layer_data(p)
The first column in p.data, "y", contains the ecdf values. "x" is the Sepal.Length values on the x-axis in your plot.
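As a quick sanity check, the extracted values can be compared with base R's ecdf(). The sketch below selects the columns by name rather than by position, since the column order may differ between ggplot2 versions, and drops the infinite padding rows that stat_ecdf() adds at both ends by default.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length)) + stat_ecdf()

# Pull the computed layer data and keep only the x and y columns
ecdf_tab <- layer_data(p)[, c("x", "y")]

# Drop the -Inf / Inf padding rows
ecdf_tab <- ecdf_tab[is.finite(ecdf_tab$x), ]

# Reference values from the base R empirical CDF
reference <- ecdf(iris$Sepal.Length)(ecdf_tab$x)

# Largest absolute difference between the two
max_diff <- max(abs(ecdf_tab$y - reference))
print(max_diff)
```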

We can recreate the data:
#Recreate ecdf data
dat_ecdf <-
data.frame(x=unique(iris$Sepal.Length),
y=ecdf(iris$Sepal.Length)(unique(iris$Sepal.Length))*length(iris$Sepal.Length))
#rescale y to 0,1 range
dat_ecdf$y <-
scale(dat_ecdf$y,center=min(dat_ecdf$y),scale=diff(range(dat_ecdf$y)))
Below 2 plots should look the same:
#plot using new data
ggplot(dat_ecdf,aes(x,y)) +
geom_step() +
xlim(4,8)
#plot with built-in stat_ecdf
ggplot(iris, aes_string(x = "Sepal.Length")) +
stat_ecdf() +
xlim(4,8)

Coloring density plot in ggplot2

When I use the following code to generate a density plot:
require(ggplot2)
set.seed(seed=10)
n <- 10000
s.data <- data.frame(score = rnorm(n,500,100),
gender = sample(c("Male","Female","No Response"),size=n,replace=T,prob=c(.4,.55,.05)),
major = sample(c("A","B","C","D"),size=n,replace=T,prob=c(.02,.25,.05,.68)))
ggplot(s.data, aes(major,..density..,fill=major,group=1)) +
geom_histogram() + facet_wrap(~ gender)
I cannot distinguish between categories of "major" by color.
What I want to get is density plot similar to this frequency plot in the sense of colors and legend:
ggplot(s.data, aes(major,fill=major)) +
geom_histogram() + facet_wrap(~ gender)
This question is following my question (here) which is already answered here.

You can still use the frequency plot and give each facet its own y-axis with the facet_wrap() argument scales="free_y":
ggplot(s.data, aes(major,..count..,fill=major)) +
geom_histogram() + facet_wrap(~ gender, scales="free_y")
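If the goal is the share of each major within each gender rather than raw counts, one option is to compute those shares before plotting. This is a sketch in base R under the assumption that "density" in the question means the within-facet proportion; the names prop_df and share are introduced here for illustration only.

```r
# Same simulated data as in the question (score omitted, it is not needed)
set.seed(10)
n <- 10000
s.data <- data.frame(
  gender = sample(c("Male", "Female", "No Response"), size = n,
                  replace = TRUE, prob = c(.4, .55, .05)),
  major  = sample(c("A", "B", "C", "D"), size = n,
                  replace = TRUE, prob = c(.02, .25, .05, .68))
)

# Counts of each major within each gender
tab <- table(s.data$gender, s.data$major)

# Convert counts to proportions so that every gender sums to 1
prop <- prop.table(tab, margin = 1)

# Long format, one row per gender/major pair, ready for plotting
prop_df <- as.data.frame(prop, responseName = "share")
names(prop_df)[1:2] <- c("gender", "major")
print(prop_df)
```

prop_df could then be drawn with geom_col(aes(major, share, fill = major)) and facet_wrap(~ gender), which keeps one colour per major and a legend.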
