R histogram with multiple populations


I'm interested in creating a histogram in R that contains two (or more) populations stacked on top of each other. That is, I don't want two histograms sharing the same graph, but a single bar made up of two or more colors.
I found the image below; this is what I want to accomplish.
Any ideas?

That is actually the annoying default in ggplot2:
library(ggplot2)
ggplot(iris, aes(x=Sepal.Length, fill=Species)) +
geom_histogram()

Here is another option without using ggplot:
# plot the entire data set (everything)
hist(everything, breaks=1:10, col="Red")
# then everything except one sub group (1 in this case)
hist(everything[everything!=1], breaks=1:10, col="Blue", add=TRUE)
# then everything except two sub groups (1 & 2 in this case)
# note: use the vectorized & here; && only compares single values
hist(everything[everything!=1 & everything!=2], breaks=1:10, col="Green", add=TRUE)
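A self-contained sketch of the overlay approach above. It assumes that group membership is stored in a separate vector (the hypothetical `grp`) rather than encoded in the values themselves, and the data are simulated purely for illustration:

```r
set.seed(1)
# Hypothetical data: `value` is what gets binned, `grp` says which
# population each value belongs to
value <- c(rnorm(200, 4, 1), rnorm(150, 5, 1), rnorm(100, 6, 1))
grp <- rep(1:3, times = c(200, 150, 100))

# hist() stops with an error if values fall outside the breaks
brk <- 1:10
keep <- value >= min(brk) & value <= max(brk)
value <- value[keep]
grp <- grp[keep]

# Draw the largest set first, then overplot ever smaller subsets
h_all <- hist(value, breaks = brk, col = "red", main = "Overlaid subsets")
h_23 <- hist(value[grp != 1], breaks = brk, col = "blue", add = TRUE)
h_3 <- hist(value[grp != 1 & grp != 2], breaks = brk, col = "green", add = TRUE)
```

Because every later histogram is a subset of the earlier one, the visible red part of each bar is group 1, the blue part is group 2, and the green part is group 3.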

# 1) Define the breaks to use on your Histogram
xrange = seq(-3,3,0.1)
# 2) Have your vectors ready
v1 = rnorm(n=300,mean=1.1,sd=1.5)
v2 = rnorm(n=350,mean=1.3,sd=1.5)
v3 = rnorm(n=380,mean=1.2,sd=1.9)
# 3) subset your vectors to be inside xrange
v1 = subset(v1,v1<=max(xrange) & v1>=min(xrange))
v2 = subset(v2,v2<=max(xrange) & v2>=min(xrange))
v3 = subset(v3,v3<=max(xrange) & v3>=min(xrange))
# 4) Now, use hist to compute the counts per interval
hv1 = hist(v1,breaks=xrange,plot=F)$counts
hv2 = hist(v2,breaks=xrange,plot=F)$counts
hv3 = hist(v3,breaks=xrange,plot=F)$counts
# 5) Finally, Generate a Frequency BarPlot that is equivalent to a Stacked histogram
maintitle = "Stacked Histogram Example using Barplot"
barplot(rbind(hv1,hv2,hv3),col=2:4,names.arg=xrange[-1],space=0,las=1,main=maintitle)
# 6) You can also generate a proportion barplot (each group's share of the count in every bin)
Total = hv1 + hv2 + hv3
barplot(rbind(hv1/Total,hv2/Total,hv3/Total),col=2:4,names.arg=xrange[-1],space=0,las=1)
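The same count matrix can also be built in one step from long-format data. This is only a sketch: the data frame `dat`, its column names, and the wider bin width are assumptions made for illustration, and `cut()` plus `table()` stand in for the repeated `hist()` calls above:

```r
set.seed(42)
xrange <- seq(-3, 3, 0.5)

# Hypothetical long-format data: one row per observation
dat <- data.frame(
  value = c(rnorm(300, 1.1, 1.5), rnorm(350, 1.3, 1.5)),
  group = rep(c("v1", "v2"), times = c(300, 350))
)
dat <- dat[dat$value >= min(xrange) & dat$value <= max(xrange), ]

# cut() assigns each value to a bin; table() counts per group and bin
bins <- cut(dat$value, breaks = xrange, include.lowest = TRUE)
counts <- table(dat$group, bins)
barplot(counts, col = 2:3, space = 0, las = 2, legend.text = TRUE,
        main = "Stacked counts")

# Share of each group within a bin; empty bins give 0/0 = NaN, set to 0
share <- prop.table(counts, margin = 2)
share[is.nan(share)] <- 0
barplot(share, col = 2:3, space = 0, las = 2)
```

With `include.lowest = TRUE`, `cut()` uses the same right-closed intervals as the default `hist()`, so the counts should agree with the `hist(..., plot = FALSE)$counts` approach.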

Related

Select data and name when pointing it chart with ggplotly

I built everything in ggplot, and it all worked well. Now I need the chart to show data when I hover over a data point: in this example, the model (to identify the point) and disp and wt (the data on the axes).
To do this I mapped shape to the model data (always the same shape; I do not actually want different shapes) and asked ggplot not to show shape in the legend. Then I converted the chart to plotly. I succeeded in showing the data when I hover over the circles, but now the legend shows colors and shapes separated by a comma...
I did not want to rebuild it from scratch in plotly, as I have no experience with plotly and this is part of a much larger shiny project, where the chart automatically adjusts the axis scales and adds trend lines, among other things (left out here for simplicity) that I do not know how to do in plotly.
Many thanks in advance. I have tried a million ways over the past couple of days without success.
library(ggplot2)
library(plotly) # for ggplotly()
# choose mtcars data and add rowname as column as I want to link it to shapes in ggplot
data1 <- mtcars
data1$model <- rownames(mtcars)
# I turn cyl data to character as when charting it showed (Error: Continuous value supplied to discrete scale)
data1$cyl <- as.character(data1$cyl)
# linking colors with cylinders and shapes with models
ccolor <- c("#E57373","purple","green")
cylin <- c(6,4,8)
# I actually do not want shapes to be different, only want to show data of model when I point the data point.
models <- data1$model
sshapes <- rep(16,length(models))
# I am going to chart, do not want legend to show shape
graff <- ggplot(data1,aes(x=disp, y=wt,shape=model,col=cyl)) +
geom_point(size = 1) +
ylab ("eje y") + xlab('eje x') +
scale_color_manual(values= ccolor, breaks= cylin)+
scale_shape_manual(values = sshapes, breaks = models)+
guides(shape='none') # do not want shapes to show in legend
graff
# chart is fine, but when converting to ggplotly, I am having trouble with the legend
graffPP <- ggplotly(graff)
graffPP
The legend is not the same as it was in ggplot.
I succeeded in showing the model and the axis data when I hover over a data point in the chart, but now I am having problems with the legend.

To the best of my knowledge there is no easy out-of-the-box solution to achieve your desired result.
Using pure plotly you could achieve your result by assigning legendgroups, which to my knowledge is not possible through ggplotly directly. However, you could assign the legend groups manually by manipulating the plotly object returned by ggplotly.
Adapting my answer on this post to your case, you could achieve your desired result like so:
library(plotly)
p <- ggplot(data1, aes(x = disp, y = wt, shape = model, col = cyl)) +
geom_point(size = 1) +
ylab("eje y") +
xlab("eje x") +
scale_color_manual(values = ccolor, breaks = cylin) +
scale_shape_manual(values = sshapes, breaks = models) +
guides(shape = "none")
gp <- ggplotly(p = p)
# Get the names of the legend entries
df <- data.frame(id = seq_along(gp$x$data), legend_entries = unlist(lapply(gp$x$data, `[[`, "name")))
# Extract the group identifier, i.e. the number of cylinders from the legend entries
df$legend_group <- gsub("^\\((\\d+).*?\\)", "\\1", df$legend_entries)
# Add an indicator for the first entry per group
df$is_first <- !duplicated(df$legend_group)
for (i in df$id) {
  # Is the layer the first entry of the group?
  is_first <- df$is_first[[i]]
  # Assign the group identifier to the name and legendgroup arguments
  gp$x$data[[i]]$name <- df$legend_group[[i]]
  gp$x$data[[i]]$legendgroup <- gp$x$data[[i]]$name
  # Show the legend only for the first layer of the group
  if (!is_first) gp$x$data[[i]]$showlegend <- FALSE
}
gp
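The grouping logic can be checked on its own, without building a plot. The legend entries below are hypothetical examples of the "(colour,shape)" names that ggplotly produced in this case; the regular expression and the `!duplicated()` step are the ones used in the answer:

```r
# Hypothetical legend entries in the "(cyl,model)" form
legend_entries <- c("(4,Datsun 710)", "(6,Mazda RX4)", "(6,Mazda RX4 Wag)",
                    "(8,Duster 360)", "(4,Fiat 128)")

# Keep only the leading number, i.e. the number of cylinders
legend_group <- gsub("^\\((\\d+).*?\\)", "\\1", legend_entries)

# TRUE only for the first trace of each group; only these keep a legend entry
is_first <- !duplicated(legend_group)

print(data.frame(legend_entries, legend_group, is_first))
```

Every trace whose `is_first` is FALSE gets `showlegend <- FALSE` in the loop above, so each cylinder count appears once in the legend.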

Combining variables 2 x-variables to one

I would like to combine two variables into a single line, since they were not measured over overlapping time periods, and use different colours along that one main line to show which is which. Any suggestions for changing the following script?
plot(dat$days,dat$wc_10_1,
main="Rollesbroich-1, 0.1 m",
xlab="Days",
ylab=expression( "water content (cm"^3 / "cm)"^3),
type="l",
col="blue",
pch=16)
lines(dat$days,dat$m_wc_10_1, col="red",pch=16, type="l")

This is probably most easily done using ggplot2. Below is a reproducible example of what I think you are trying to achieve, assuming that both variables are continuous.
dat = data.frame(days=1:10,
wc_10_1=rnorm(10),
m_wc_10_1=rnorm(10,5,5))
# ggplot2 ----
library(ggplot2)
ggplot(data = dat, aes(x=days, y=wc_10_1, colour=m_wc_10_1)) +
geom_line() +
labs(
x="Days",
y=expression(cm^{3} / cm)
)
Unsure how to supply a continuous scale to lines using base plot, but I'm sure it's possible. A workaround would be to use type="b":
# base ----
my_colour = "blue"
x2_norm = dat$m_wc_10_1 - min(dat$m_wc_10_1)
x2_norm = x2_norm/max(x2_norm)
my_colour_scale = scales::alpha(my_colour, x2_norm)
plot(x=dat$days,
     y=dat$wc_10_1,
     col=my_colour_scale,
     type="b",
     pch=16)
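One possible way to get a continuously coloured line in base graphics is to draw it piece by piece with segments(). This is a sketch rather than the answer's method: the blue-to-red palette and the choice to colour each piece by the average of the second variable at its two end points are assumptions:

```r
set.seed(1)
dat <- data.frame(days = 1:10,
                  wc_10_1 = rnorm(10),
                  m_wc_10_1 = rnorm(10, 5, 5))
n <- nrow(dat)

# 100-step palette (an arbitrary choice)
pal <- colorRampPalette(c("blue", "red"))(100)

# Value used to colour each of the n - 1 line pieces
seg_val <- (dat$m_wc_10_1[-1] + dat$m_wc_10_1[-n]) / 2
idx <- cut(seg_val, breaks = 100, labels = FALSE)

# Set up empty axes, then draw one coloured segment per pair of points
plot(dat$days, dat$wc_10_1, type = "n", xlab = "Days", ylab = "wc_10_1")
segments(x0 = dat$days[-n], y0 = dat$wc_10_1[-n],
         x1 = dat$days[-1], y1 = dat$wc_10_1[-1],
         col = pal[idx], lwd = 2)
```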

How to plot deviation from mean

In R I have created a simple one-column matrix holding a list of numbers with a set mean and a given standard deviation.
rnorm2 <- function(n,mean,sd) { mean+sd*scale(rnorm(n)) }
r <- rnorm2(100,4,1)
I now would like to plot how these numbers differ from the mean. I can do this in Excel as shown below:
But I would like to use ggplot2 to create the graph in R. In the Excel graph I have cheated by using a line graph, but it would be better if I could do this as columns. I have tried using a scatter plot, but I can't work out how to turn it into deviations from the mean.

Perhaps you want:
rnorm2 <- function(n,mean,sd) { mean+sd*scale(rnorm(n)) }
set.seed(101)
r <- rnorm2(100,4,1)
x <- seq_along(r) ## sets up a vector from 1 to length(r)
par(las=1,bty="l") ## cosmetic preferences
plot(x, r, col = "green", pch=16) ## draws the points
## if you don't want points at all, use
## plot(x, r, type="n")
## to set up the axes without drawing anything inside them
segments(x0=x, y0=4, x1=x, y1=r, col="green") ## connects them to the mean line
abline(h=4)
If you were plotting around 0 you could do this automatically with type="h":
plot(x,r-4,type="h", col="green")
To do this in ggplot2:
library("ggplot2")
theme_set(theme_bw()) ## my cosmetic preferences
ggplot(data.frame(x,r))+
geom_segment(aes(x=x,xend=x,y=mean(r),yend=r),colour="green")+
geom_hline(yintercept=mean(r))
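If columns are preferred, a base R barplot of the deviations is another option. This is a sketch under two assumptions: the one-column matrix returned by scale() is flattened with as.numeric() (otherwise barplot() would treat it as a single stacked bar), and the green/red colouring by sign is purely cosmetic:

```r
set.seed(101)
# Same construction as rnorm2(100, 4, 1), flattened to a plain vector
r <- 4 + 1 * as.numeric(scale(rnorm(100)))

# Deviation of each value from the mean
dev <- r - mean(r)

# One column per value, anchored at zero (the mean)
barplot(dev, col = ifelse(dev >= 0, "green", "red"), border = NA,
        xlab = "Index", ylab = "Deviation from mean")
abline(h = 0)
```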

Ben's answer using ggplot2 works great, but if you don't want to manually adjust the line width, you could do this:
library(ggplot2)
# Half of Ben's data
rnorm2 <- function(n,mean,sd) { mean+sd*scale(rnorm(n)) }
set.seed(101)
r <- rnorm2(50,4,1)
x <- seq_along(r) ## sets up a vector from 1 to length(r)
# New variable for the difference between each value and the mean
value <- r - mean(r)
ggplot(data.frame(x, value)) +
# geom_bar anchors each bar at zero (which is the mean minus the mean)
geom_bar(aes(x, value), stat = "identity"
, position = "dodge", fill = "green") +
# but you can change the y-axis labels with a function, to add the mean back on
scale_y_continuous(labels = function(x) {x + mean(r)})

In base R it's quite simple; just do
plot(r, col = "green", type = "l")
abline(4, 0)
You also tagged ggplot2, so in that case it will be a bit more complicated, because ggplot requires creating a data frame and then melting it.
library(ggplot2)
library(reshape2)
df <- melt(data.frame(x = 1:100, mean = 4, r = r), 1)
ggplot(df, aes(x, value, color = variable)) +
geom_line()

How to convert a bar histogram into a line histogram in R

I've seen many examples of a density plot, but the density plot's y-axis is the probability density. What I am looking for is a line plot (like a density plot) whose y-axis contains counts (like a histogram).
I can do this in Excel, where I manually make the bins and the frequencies, make a bar histogram, and then change the chart type to a line, but I can't find anything similar in R.
I've checked out both base and ggplot2, yet can't seem to find an answer. I understand that histograms are meant to be bars, but I think representing them as a continuous line makes more visual sense.

Using default R graphics (i.e. without installing ggplot) you can do the following, which might also make what the density function does a bit clearer:
# Generate some data
data=rnorm(1000)
# Get the density estimate
dens=density(data)
# Plot y-values scaled by number of observations against x values
plot(dens$x,length(data)*dens$y,type="l",xlab="Value",ylab="Count estimate")

This is an old question, but I thought it might be helpful to post a solution that specifically addresses your question.
In ggplot2, you can plot a histogram and display the count with bars using:
# value is a placeholder for the numeric column of data that you want to bin
ggplot(data, aes(x = value)) +
geom_histogram()
You can also plot a histogram and display the count with lines using a frequency polygon:
ggplot(data, aes(x = value)) +
geom_freqpoly()
For more info, see the ggplot2 reference.
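A frequency polygon can also be drawn in base R by computing the bins with hist() and plotting the counts against the bin midpoints. A minimal sketch with simulated data (the number of breaks is an arbitrary choice, and hist() treats it as a suggestion only):

```r
set.seed(1)
x <- rnorm(1000)

# Compute the bins without drawing the bars
h <- hist(x, breaks = 30, plot = FALSE)

# Line through the bin midpoints, with counts on the y-axis
plot(h$mids, h$counts, type = "l", xlab = "Value", ylab = "Count")
```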

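To adapt the example on the ?stat_density help page:
library(ggplot2)
library(ggplot2movies) # the movies data set lives here since ggplot2 2.0
m <- ggplot(movies, aes(x = rating))
# Standard density plot.
m + geom_density()
# Density plot with y-axis scaled to counts
# (older ggplot2 versions spell this aes(y = ..count..)).
m + geom_density(aes(y = after_stat(count)))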

Although this is old, I thought the following might be useful.
Let's say you have a data set of 10,000 points, and you believe they belong to a certain distribution, and you would like to plot the histogram of the actual data and the line of the probability density of the ideal distribution on top of it.
noise <- 2
#
# the noise is tagged onto the end using runif
# just to demo issues w/real data and fitting
# the subtraction causes the data to have some
# negative values, which must be addressed in
# the fit later on
#
noisylognorm <- rlnorm(10000,
meanlog = 0.25,
sdlog = 1) +
(noise * runif(10000) - noise / 10)
#
# using package fitdistrplus
#
library(fitdistrplus)
# subset is used to remove the negative values
# as the lognormal distribution needs positive only
#
fitlnorm <- fitdist(subset(noisylognorm,
noisylognorm > 0),
"lnorm")
fitlnorm_density <- density(rlnorm(10000,
meanlog = fitlnorm$estimate[1],
sdlog = fitlnorm$estimate[2]))
hist(subset(noisylognorm,
noisylognorm < 25),
breaks = seq(-1, 25, 0.5),
col = "lightblue",
xlim = c(0, 25),
xlab = "value",
ylab = "frequency",
main = paste0("Log Normal Distribution\n",
"noise = ", noise))
lines(fitlnorm_density$x,
10000 * fitlnorm_density$y * 0.5,
type="l",
col = "red")
Note the * 0.5 in the lines function. It accounts for the width of the hist() bars: with breaks every 0.5, the expected count in a bin is the number of points times the density times the bin width.
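The relationship between counts and density can be checked directly on a histogram object. The sketch below uses simulated normal data and a bin width of 0.5 (both assumptions for illustration) and overlays a kernel density estimate scaled to counts:

```r
set.seed(1)
n <- 10000
x <- rnorm(n)
binwidth <- 0.5
brk <- seq(-6, 6, by = binwidth)

# Histogram on the count scale
h <- hist(x, breaks = brk, col = "lightblue", main = "Counts with scaled density")

# Kernel density estimate, rescaled: count = n * bin width * density
d <- density(x)
lines(d$x, n * binwidth * d$y, col = "red")

# hist() stores both scales, so the conversion can be verified
rescaled <- h$density * n * binwidth
```

Here `rescaled` reproduces `h$counts`, which is why the density curve needs both the number of points and the bin width to sit on top of the bars.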

There is a very simple and fast way for count data.
First let's generate some dummy count data:
my.count.data = rpois(n = 10000, lambda = 3)
And then the plotting command (assuming you have called library(magrittr)):
my.count.data %>% table %>% plot

R parallel coordinate plot with fixed scale on X-axis, no matter how large the plot becomes

I am trying to build a parallel coordinate diagram in R for showing the difference in ranking in different age groups. And I want to have a fixed scale on the Y axis for showing the values.
Here is a PC plot :
The goal is to see the slopes of the lines really well. So if I have value 1 that is bound with the value 1000, I want to see the line going aaall the way down steeply.
In R so far, if I have values that are too big, my plot is all squished so everything fits and it's hard to visualize anything.
My code for drawing the parallel coordinate plot is the following so far:
pc_18_34 <- read.table("parCoordData_18_24_25_34.csv", header=FALSE, sep="\t")
#name columns of data frame
colnames(pc_18_34) = c("18-25","25-34")
#build the parallel coordinate plot
# doc : http://docs.ggplot2.org/current/geom_path.html
group <- rep(c("Top 10", "Top 10-29", "Top 30-49"), each = 18)
df <- data.frame(id = seq_along(group), group, pc_18_34[,1], pc_18_34[,2])
colnames(df)[3] = "18-25"
colnames(df)[4] = "25-34"
library(ggplot2)
library(reshape2) # for melt
dfm <- melt(df, id.vars = c("id", "group"))
colnames(dfm)[3] = "AgeGroup"
colnames(dfm)[4] = "ArtistRank"
# sort after renaming the columns, and keep the sorted result
dfm <- dfm[order(dfm$group,dfm$ArtistRank,decreasing=TRUE),]
ggplot(dfm, aes(x=AgeGroup, y=ArtistRank, group = id, colour = group)) +
geom_path(alpha = 0.5, size=1) +
geom_path(aes(color=group)) +
ggtitle("Tops across age groups")
I have looked into how to change the scales in ggplot, using libraries like scales, but when I add a scale layer the diagram doesn't even show up anymore.
Any thoughts on how to use a fixed scale (say, a difference of 1 in rank shown as 5px in the plot), even if it means that the plot is very tall?
Thaanks !! :)

You can set the panel height to an absolute size based on the number of axis breaks. Note that the device won't scale automatically, so you'll have to adjust it manually for your plot to fit well.
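library(ggplot2)
library(gtable)
library(grid) # for unit(), grid.newpage() and grid.draw()
p <- ggplot(Loblolly, aes(height, factor(age))) +
geom_point()
gb <- ggplot_build(p)
gt <- ggplot_gtable(gb)
# number of y-axis breaks; this reads ggplot2 internals, and the path
# below may differ between ggplot2 versions
n <- length(gb$panel$ranges[[1]]$y.major_source)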
# locate the panel in the gtable layout
panel <- gt$layout$t[grepl("panel", gt$layout$name)]
# assign new height to the panels, based on the number of breaks
gt$heights[panel] <- list(unit(n*25,"pt"))
grid.newpage()
grid.draw(gt)
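A fixed scale can also be obtained by sizing the output device from the data. The sketch below swaps in base graphics and a pdf() device instead of ggplot2; the ranks, the margins, and the 5 points per rank are all made-up values for illustration:

```r
# Hypothetical ranks of four artists in two age groups
ranks_a <- c(1, 5, 20, 40)
ranks_b <- c(40, 2, 25, 1)

pt_per_rank <- 5            # 1 rank = 5 points (1 point = 1/72 inch)
mai <- c(1, 1, 0.5, 0.5)    # margins in inches: bottom, left, top, right

# Height of the plotting region needed for the fixed scale
rank_span <- diff(range(c(ranks_a, ranks_b)))
plot_h <- rank_span * pt_per_rank / 72

# Device height = plotting region + top and bottom margins
out <- tempfile(fileext = ".pdf")
pdf(out, width = 5, height = plot_h + mai[1] + mai[3])
par(mai = mai, yaxs = "i")  # yaxs = "i": axis spans exactly the data range
matplot(x = 1:2, y = rbind(ranks_a, ranks_b), type = "l", lty = 1,
        ylim = rev(range(c(ranks_a, ranks_b))), xaxt = "n",
        xlab = "Age group", ylab = "Rank")
axis(1, at = 1:2, labels = c("18-25", "25-34"))
pin_h <- par("pin")[2]      # actual height of the plotting region in inches
dev.off()
```

The plot grows taller as the rank range grows, so the slope of a line always corresponds to the same rank difference per unit of height.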
