I need to compute the cumulative frequency, expressed as a percentage, of a continuous variable within each level of a factor.
For example:
data <- data.frame(n = sample(1:12),
                   d = seq(10, 120, by = 10),
                   Site = rep(c("FirstSite", "SecondSite"), 6),
                   Plot = rep(c("Plot1", "Plot1", "Plot2", "Plot2"), 3)
)
data <- with(data, data[order(Site,Plot),])
data <- transform(data, G = ((pi * (d/2)^2) * n) / 10000)
data
n d Site Plot G
1 7 10 FirstSite Plot1 0.05497787
5 9 50 FirstSite Plot1 1.76714587
9 12 90 FirstSite Plot1 7.63407015
3 10 30 FirstSite Plot2 0.70685835
7 5 70 FirstSite Plot2 1.92422550
11 1 110 FirstSite Plot2 0.95033178
2 3 20 SecondSite Plot1 0.09424778
6 8 60 SecondSite Plot1 2.26194671
10 6 100 SecondSite Plot1 4.71238898
4 4 40 SecondSite Plot2 0.50265482
8 2 80 SecondSite Plot2 1.00530965
12 11 120 SecondSite Plot2 12.44070691
I need the cumulative frequency of column G by the factors Plot and Site, in order to draw a geom_step plot (ggplot2) of G against d for each plot and site.
I have managed to compute the cumulative sum of G by factor with:
data.ss <- by(data[, "G"], data[,c("Plot", "Site")], function(x) cumsum(x))
# Gtot
(data.ss.tot <- sapply(data.ss, max))
[1] 9.456194 3.581416 7.068583 13.948671
Now I need to express each Plot G in the range [0..1] where 1 is Gtot for each Plot. I imagine I should divide G by its Plot Gtot, then apply a new cumsum to it. How to do it?
Please note that I have to plot this cumulative frequency against d not G itself, so it is not a proper ecdf.
Thank you.
I usually use ddply (from plyr) and transform to do this type of thing:
library(plyr)
library(ggplot2)
data <- ddply(data, c('Site', 'Plot'), transform, Gsum = cumsum(G), Gtot = sum(G))
qplot(x = d, y = Gsum/Gtot, facets = Plot ~ Site, geom = 'step', data = data)
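For anyone without plyr, here is a minimal base R sketch of the same idea using ave(). It assumes the rows are ordered by d within each Site/Plot group, and it hard-codes n to the values printed in the question (the original used sample(), so this is only a reconstruction). The column name Gfrac is made up for illustration.

```r
# n fixed to the values shown in the question's printed output (an assumption;
# the original drew them with sample(1:12))
data <- data.frame(n = c(7, 3, 10, 4, 9, 8, 5, 2, 12, 6, 1, 11),
                   d = seq(10, 120, by = 10),
                   Site = rep(c("FirstSite", "SecondSite"), 6),
                   Plot = rep(c("Plot1", "Plot1", "Plot2", "Plot2"), 3))
data$G <- (pi * (data$d / 2)^2 * data$n) / 10000

# The cumulative sum depends on row order, so sort by d within each group first
data <- data[order(data$Site, data$Plot, data$d), ]

# Cumulative share of each group's total G, scaled to the range (0, 1]
data$Gfrac <- ave(data$G, data$Site, data$Plot,
                  FUN = function(g) cumsum(g) / sum(g))
print(data)
```

Gfrac can then be mapped to y in place of Gsum/Gtot, with d on the x axis.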
I'm trying to understand the default behavior of ggplot2::facet_wrap(), in terms of how the panel layout is decided as the number of facets increases.
I've read the ?facet_wrap help file, and also googled this topic with limited success. In one SO post, facet_wrap() was said to "return a symmetrical matrix of plots", but I did not find anything that explained what exactly the default behavior would be.
So next I made a series of plots which had increasing numbers of facets (code shown further down).
The pattern in the image makes it seem like facet_wrap() tries to "make a square"...
Questions
Is that correct? Does facet_wrap try to render the facet panels so that, in totality, they are most like a square in terms of the number of elements in the rows and columns?
If not, what is it actually doing? Do graphical parameters factor in?
Code that made the plot
# load libraries
library(ggplot2)
library(ggpubr)
# plotting function
facetPlots <- function(facets, groups = 8){
  # sample data
  df <- data.frame(Group = sample(LETTERS[1:groups], 1000, replace = TRUE),
                   Value = sample(1:10000, 1000, replace = TRUE),
                   Facet = factor(sample(1:facets, 1000, replace = TRUE)))
  # get means
  df <- aggregate(list(Value = df$Value),
                  list(Group = df$Group, Facet = df$Facet), mean)
  # plot
  p1 <- ggplot(df, aes(x = Group, y = Value, fill = Group)) +
    geom_bar(stat = "identity", show.legend = FALSE) +
    facet_wrap(. ~ Facet) +
    theme_bw() +
    theme(strip.text.x = element_text(size = 6,
                                      margin = margin(.1, 0, .1, 0, "cm")),
          axis.text.x = element_blank(),
          axis.ticks = element_blank(),
          axis.title.x = element_blank(),
          axis.text.y = element_blank(),
          axis.title.y = element_blank(),
          plot.margin = unit(c(3, 3, 3, 3), "pt"))
  p1
}
# apply function to list
plot_list <- lapply(c(1:25), facetPlots)
# unify into single plot
plot <- ggpubr::ggarrange(plotlist = plot_list)
Here is how the default number of rows and columns is calculated:
ncol <- ceiling(sqrt(n))
nrow <- ceiling(n/ncol)
Apparently, facet_wrap tends to prefer wider grids, since "most displays are roughly rectangular" (according to the documentation). Hence, the number of columns would be greater than or equal to the number of rows.
For your example:
n <- c(1:25)
ncol <- ceiling(sqrt(n))
nrow <- ceiling(n/ncol)
data.frame(n, ncol, nrow)
Here are the computed numbers of rows/cols:
# n ncol nrow
# 1 1 1
# 2 2 1
# 3 2 2
# 4 2 2
# 5 3 2
# 6 3 2
# 7 3 3
# 8 3 3
# 9 3 3
# 10 4 3
# 11 4 3
# 12 4 3
# 13 4 4
# 14 4 4
# 15 4 4
# 16 4 4
# 17 5 4
# 18 5 4
# 19 5 4
# 20 5 4
# 21 5 5
# 22 5 5
# 23 5 5
# 24 5 5
# 25 5 5
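As a quick sanity check on the formula quoted in this answer, the sketch below wraps it in a helper and verifies the two properties claimed: the grid always has room for every panel, and it is never taller than it is wide. The function name wrap_layout is made up for illustration and is not part of ggplot2.

```r
# Hypothetical helper implementing the formula given in the answer
wrap_layout <- function(n) {
  ncol <- ceiling(sqrt(n))
  nrow <- ceiling(n / ncol)
  data.frame(n = n, ncol = ncol, nrow = nrow)
}

layout <- wrap_layout(1:25)

# Every panel fits, and columns are never fewer than rows
stopifnot(all(layout$ncol * layout$nrow >= layout$n))
stopifnot(all(layout$ncol >= layout$nrow))

# Number of empty cells left over in each grid
layout$empty <- layout$ncol * layout$nrow - layout$n
print(layout[layout$n %in% c(5, 10, 17), ])
```

If ggplot2 is installed, ggplot2::wrap_dims(n) should report the dimensions the package itself picks, which may differ from this formula for some n depending on the version.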
I would like to display a histogram showing the distribution of school grades.
The dataframe looks like:
> print(xls)
# A tibble: 103 x 2
X__1 X__2
<dbl> <chr>
1 3 w
2 1 m
3 2 m
4 1 m
5 1 w
6 0 m
7 3 m
8 1 w
9 0 m
10 5 m
I create the histogram with:
hist(xls$X__1, main='Notenverteilung', xlab='Note (0 = keine Beurteilung)', ylab='Anzahl')
It looks like:
Why are there spaces between 1,2,3 but not between 0 & 1?
Thanks, BR Bernd
Use ggplot2 for that, and your bars will be aligned
library(ggplot2)
ggplot(xls, aes(x = X__1)) + geom_histogram(binwidth = 1)
You can try
barplot(table(xls$X__1))
or try
h <- hist(xls$X__1, xaxt = "n", breaks = seq(min(xls$X__1), max(xls$X__1)))
axis(side=1, at=h$mids, labels=seq(min(xls$X__1), max(xls$X__1))[-1])
and using ggplot
ggplot(xls, aes(X__1)) +
geom_histogram(binwidth = 1, color=2) +
scale_x_continuous(breaks = seq(min(xls$X__1), max(xls$X__1)))
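On the "why": a likely explanation is that hist() picked 0.5-wide bins for this data, and its bins are right-closed with the lowest value included in the first bin. Then 0 falls in [0, 0.5] and 1 in (0.5, 1], which are adjacent, while 2 falls in (1.5, 2] with an empty (1, 1.5] before it. For the same reason, integer breaks such as seq(min, max) put 0 and 1 into one bin. A minimal sketch that centers each bin on an integer instead, using made-up grades in place of xls$X__1:

```r
# Hypothetical grades standing in for xls$X__1 (0 = no assessment)
grades <- c(3, 1, 2, 1, 1, 0, 3, 1, 0, 5)

# Bins centered on each integer: (-0.5, 0.5], (0.5, 1.5], ...
breaks <- seq(min(grades) - 0.5, max(grades) + 0.5, by = 1)
h <- hist(grades, breaks = breaks, plot = FALSE)

# Each bin now holds exactly one grade value, so the counts match table()
counts_by_grade <- setNames(h$counts, h$mids)
print(counts_by_grade)
```

Calling plot(h) afterwards draws the histogram with evenly spaced, touching bars.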
I want to draw a heatmap in which the size of the units on the x (and y) axis varies. Here is some example code:
users = rep(1:3,3)
Inst = c(rep("A",3),rep("B",3),rep("C",3))
dens = rnorm(9)
n_inst = c(3,3,3,2,2,2,1,1,1)
df <- data.frame( users, Inst, dens, n_inst )
users Inst dens n_inst
1 1 A 1.2521487 3
2 2 A -0.1013088 3
3 3 A 1.5770535 3
4 1 B 1.1093957 2
5 2 B 1.1059166 2
6 3 B 0.6884662 2
7 1 C -0.3864710 1
8 2 C -1.0216373 1
9 3 C 0.4500778 1
z <- ggplot(df, aes(Inst, users)) + geom_tile(aes(fill = dens))
z + scale_x_discrete(breaks = n_inst)
This draws a heatmap, but all units of Inst have the same size. I want A to be three times the width of C, and B twice the width of C. In other words, n_inst should give the width of each unit.
I tried scale_x_discrete, but that doesn't do it.
Thank you in advance.
You can try this:
ggplot(df, aes(Inst, users)) + geom_tile(aes(fill = dens, width=n_inst))
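On a discrete axis the categories sit one unit apart, so tiles wider than 1 may overlap their neighbours. One possible workaround (a sketch, not the answer's method) is to place the tiles on a continuous axis, with centers computed from the cumulative widths. The names w, centers and x are made up for illustration, and dens is fixed to the values printed in the question.

```r
# Data from the question, with dens fixed to the printed values
df <- data.frame(users = rep(1:3, 3),
                 Inst = rep(c("A", "B", "C"), each = 3),
                 dens = c(1.2521487, -0.1013088, 1.5770535,
                          1.1093957, 1.1059166, 0.6884662,
                          -0.3864710, -1.0216373, 0.4500778),
                 n_inst = rep(3:1, each = 3))

# One width per institution, in the order they should appear on the axis
w <- df$n_inst[!duplicated(df$Inst)]
names(w) <- unique(as.character(df$Inst))

# Tile centers: right edge of each block minus half its width
centers <- cumsum(w) - w / 2
df$x <- unname(centers[as.character(df$Inst)])
print(centers)

# Draw only if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = x, y = users, fill = dens, width = n_inst)) +
    geom_tile() +
    scale_x_continuous(breaks = unname(centers), labels = names(centers))
}
```

With these centers, A spans 0 to 3, B spans 3 to 5 and C spans 5 to 6, so the tiles touch without overlapping.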
I am trying to adjust the colour scale of a geom_tile plot.
A short version of my data (in data.frame format) is:
mydat <-
Sc K n minC
A 2 1 NA
A 2 2 37.453023
A 2 3 23.768316
A 2 4 17.628376
A 3 1 NA
A 3 2 12.693124
A 3 3 8.884226
A 3 4 7.436250
A 10 1 2.128121
A 10 2 2.116539
A 10 3 2.737923
A 10 4 3.509773
A 20 1 1.104592
A 20 2 1.840195
A 20 3 2.717198
A 20 4 3.616501
B 2 1 NA
B 2 2 25.090085
B 2 3 15.924186
B 2 4 11.811022
B 3 1 NA
B 3 2 8.827183
B 3 3 6.179484
B 3 4 5.175331
B 10 1 2.096934
B 10 2 2.064984
B 10 3 2.662373
B 10 4 3.407246
B 20 1 1.096871
B 20 2 1.802418
B 20 3 2.649153
B 20 4 3.517776
My code to prepare the data to plot is the following:
mydat$Sc <- factor(mydat$Sc, levels =c("A", "B"))
mydat$K <- factor(mydat$K, levels =c("2", "3","10","20"))
mydat.m <- melt(mydat, id.vars=c("Sc","K","n"), measure.vars=c("minC"))
I want to plot with geom_tile the value of minC with K and n as axis and different facets for Sc with the following:
mydat.m.p <- ggplot(mydat.m, aes(x=n, y=K))
mydat.m.p +
geom_tile(data=mydat.m, aes(fill=value)) +
scale_fill_gradient(low="palegreen", high="lightcoral") +
facet_wrap(~ Sc, ncol=2)
This gives me a plot for each Sc factor. However, the colour scale does not reflect what I want to portray, because a few high values make all the low values look the same.
I want to adjust to a relevant scale in 4 breaks, i.e., 1-2, 2-3, 3-5, >5.
Looking at other questions there was a suggestion to use the cut function and scale fill manual as:
mydat.m$value1 <- cut(mydat.m$value, breaks = c(1:5, Inf), right = FALSE)
Then use the following in geom_tile:
scale_fill_manual(breaks = c("[1,2)", "[2,3)", "[3,5)", "[5,Inf)"),
values = c("darkgreen", "palegreen", "lightcoral", "red"))
However, I am not sure how this can be applied to a data.frame with other factors and in long format.
You're almost there. Simply use cut before melting:
mydat$minC.cut <- cut(mydat$minC, breaks = c(1:3, 5, Inf), right = FALSE)
mydat.cut <- melt(mydat, id.vars=c("Sc", "K", "n"), measure.vars=c("minC.cut"))
Now, you don't need to specify breaks since we took care of that already.
ggplot(mydat.cut, aes(x=n, y=K)) +
geom_tile(aes(fill=value)) +
facet_wrap(~ Sc, ncol=2) +
scale_fill_manual(values = c("darkgreen", "palegreen", "lightcoral", "red"))
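To see what cut() produces with these breaks, here is a small base R check on a few minC values taken from the question (plus an NA). It shows the four interval labels that scale_fill_manual() will map to the four colours, in level order.

```r
# A handful of minC values from the question, including a missing one
minC <- c(1.104592, 2.116539, 3.509773, 8.884226, 37.453023, NA)

# Left-closed bins: [1,2), [2,3), [3,5), [5,Inf)
bins <- cut(minC, breaks = c(1:3, 5, Inf), right = FALSE)

print(levels(bins))
print(data.frame(minC, bins))
```

NA values stay NA after cut(), so those tiles are not assigned one of the four colours.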
I am a newbie to R and stuck on this problem.
I have a matrix of integer values and I want to plot it as a matrix in which the size of each point corresponds to the value of the integer, so the larger the value of a cell, the larger the point. Finally, I want to connect the largest values of each column with a line.
m <- matrix(sample(1:15,15),nrow=3,ncol=5)
dimnames(m)<-list(c("r1","r2","r3"),c("c1","c2","c3","c4","c5"))
> m
c1 c2 c3 c4 c5
r1 2 4 8 7 5
r2 1 9 6 13 3
r3 12 14 15 10 11
For example, here my plot should contain 15 points, with the x-axis showing c1, c2, c3, c4, c5 and the y-axis r1, r2, r3. Finally, m(r3,c1), m(r3,c2), m(r3,c3), m(r2,c4) and m(r3,c5) should be connected.
I tried using matplot:
matplot(my[,-1],my[,1],type='p',pch=1)
but it doesn't produce what I want.
UPDATE:
I have a very sparse matrix, so there are some columns with only zero values. In that case the line should pass through only one of them. Sven's solution produces this:
UPDATE2
Thomas result:
Here's an approach with reshape2 and ggplot2:
# The matrix (m):
#    c1 c2 c3 c4 c5
# r1  8  6  5  2 15
# r2 12  9 10 13 14
# r3  1  7  4 11  3
# transform data
library(reshape2)
dat <- melt(m, varnames = c("y", "x"))
dat <- transform(dat, max = ave(value, x, FUN = function(x)
replace(integer(length(x)), which.max(x), 1L)))
# create plot
library(ggplot2)
ggplot(dat, aes(x = x, y = y)) +
geom_point(aes(size = value)) +
geom_line(data = subset(dat, as.logical(max)), aes(group = 1))
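The ave() call above flags, within each column, the single row that holds the maximum. A base R sketch of the same lookup on the matrix printed in the question, mainly to show that which.max() returns only the first position when there are ties (for example, an all-zero column):

```r
# Matrix as printed in the question (filled column by column)
m <- matrix(c(2, 1, 12, 4, 9, 14, 8, 6, 15, 7, 13, 10, 5, 3, 11),
            nrow = 3,
            dimnames = list(c("r1", "r2", "r3"),
                            c("c1", "c2", "c3", "c4", "c5")))

# Row index of the largest value in each column
max_row <- apply(m, 2, which.max)
print(rownames(m)[max_row])

# With ties, which.max() picks the first position only
zero_col_pick <- which.max(c(0, 0, 0))
print(zero_col_pick)
```

So for a sparse matrix, a column of zeros contributes exactly one point to the line, at the first row.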
Here's a base graphics solution:
# reproducible data
set.seed(1)
m <- matrix(sample(1:15,15),nrow=3,ncol=5)
dimnames(m)<-list(c("r1","r2","r3"),c("c1","c2","c3","c4","c5"))
The plot:
e <- expand.grid(1:nrow(m), 1:ncol(m))
plot(e[,2], e[,1], cex=sqrt(m), xlim=c(0,6), ylim=c(0,4), pch=21, bg='black')
lines(1:ncol(m), apply(m,2,which.max), lwd=2)
The result: