Change x-variable bin counts of a histogram - r

I have plots that are .25 ha and I need my data to be displayed as 1 ha. I'm trying to make the following graph but multiplying the counts by 4 (so I have a full hectare instead of a quarter). However, all posts seem to deal with changing axis titles, values, etc., but I need to change the actual histogram frequency counts.
[Figure: histogram of x-variable in size classes, plotted by factor variable]
Original code:
ggplot(liveTrees, aes(diam1DBH)) +
  geom_histogram(binwidth = 10) +
  facet_wrap(~site) +
  ggtitle("Stems/0.25ha by Size Class") +
  ylab("Stems/0.25ha") +
  xlab("Diameter Class")
liveTrees = my data
diam1DBH = diameter (numeric, continuous)
site = plot location (factor)
What I've tried:
for (i in 1:length(unique(liveTrees$site))) {
test<-hist(liveTrees[liveTrees$site== unique(liveTrees$site)[i], "diam1DBH"], plot = F)
b <- barchart(test$counts*4, width = 10, xlim=c(0,350), cex.axis = 0.85)
axis(side = 1, at = "b", cex.axis = 0.85)
}
But I keep getting
Error in axis(side = 1, at = "b", cex.axis = 0.85) : no locations are finite
In addition: Warning message:
In axis(side = 1, at = "b", cex.axis = 0.85) : NAs introduced by coercion
So, with this I can get the counts, but the numbers aren't right and they're not in a useful format.
My data is a data.frame, example: data example
What I need is the sum of each diameter class, each bin frequency amount, multiplied by 4. I've been trying to do this but can't get it to work, any help is appreciated!

If you multiply the frequencies by 4, the values will change but the graphs will still look the same. So there are two options: change only the axis value labels, or, more simply, add the data four times. For example:
ggplot(rbind(data, data,data,data), aes(variable_X)) + geom_histogram(binwidth =10)
This way the data is multiplied, and no new data.frame is made that could confuse analysis later on.
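Another possibility, sketched here on the assumption that ggplot2 3.3.0 or later is installed, is to scale the computed bin counts inside the aesthetic mapping with after_stat(count) * 4, which leaves the data untouched. The data frame below is made up for illustration; only the column names diam1DBH and site come from the question.

```r
library(ggplot2)

# Hypothetical stand-in for liveTrees (values are made up)
set.seed(1)
liveTrees <- data.frame(
  diam1DBH = runif(200, min = 0, max = 350),
  site     = factor(sample(c("A", "B"), 200, replace = TRUE))
)

# after_stat(count) is the per-bin count computed by the histogram stat;
# multiplying by 4 converts stems per 0.25 ha to stems per ha
p <- ggplot(liveTrees, aes(diam1DBH)) +
  geom_histogram(aes(y = after_stat(count) * 4), binwidth = 10) +
  facet_wrap(~site) +
  ggtitle("Stems/ha by Size Class") +
  ylab("Stems/ha") +
  xlab("Diameter Class")

print(p)
```

Because the scaling happens at plot time, the bar heights change while liveTrees itself keeps one row per tree.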

Related

how to add labels above the bar of "barplot" graphics?

I asked a question before, but now I would like to know how do I put the labels above the bars.
Previous post: how to create a frequency histogram with predefined non-uniform intervals?
dataframe <- c (1,1.2,40,1000,36.66,400.55,100,99,2,1500,333.45,25,125.66,141,5,87,123.2,61,93,85,40,205,208.9)
Update
Following the guidance of the colleague I am updating the question.
I have a data set and I would like to calculate how often its values fall within pre-defined ranges, for example: 0-50, 50-150, 150-500, 500-2000.
In the previous post (how to create a frequency histogram with predefined non-uniform intervals?) I managed to do this, but I don't know how to add the labels above the bars. I tried:
barplot(data, labels = labels), but it didn't work.
I used barplot because the post recommended it, but a ggplot solution would be fine too.
Based on the answer to your first question, here is one way to add a text() element to your Base R plot, that serves as a label for each one of your bars (assuming you want to double-up the information that is already on the x axis).
data <- c(1,1.2,40,1000,36.66,400.55,100,99,2,1500,333.45,25,125.66,141,5,87,123.2,61,93,85,40,205,208.9)
# Cut your data into categories using your breaks
data <- cut(data,
            breaks = c(0, 50, 150, 500, 2000),
            labels = c('0-50', '50-150', '150-500', '500-2000'))
# Make a data table (i.e. a frequency count)
data <- table(data)
# Plot with `barplot`, making enough space for the labels
p <- barplot(data, ylim = c(0, max(data) + 1))
# Add the labels with some offset to be above the bar
text(x = p, y = data + 0.5, labels = names(data))
If it is the y values that you are after, you can change what you pass to the labels argument:
p <- barplot(data, ylim = c(0, max(data) + 1))
text(x = p, y = data + 0.5, labels = data)
Created on 2020-12-11 by the reprex package (v0.3.0)
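Since the question also asks about ggplot, here is a minimal sketch of the same idea with ggplot2 (assumed to be installed). The variable names values, bins and freq are chosen for this example only; the data and breaks are the ones from the question.

```r
library(ggplot2)

values <- c(1, 1.2, 40, 1000, 36.66, 400.55, 100, 99, 2, 1500, 333.45, 25,
            125.66, 141, 5, 87, 123.2, 61, 93, 85, 40, 205, 208.9)

# Bin the values with the same breaks and labels as the base R answer
bins <- cut(values,
            breaks = c(0, 50, 150, 500, 2000),
            labels = c("0-50", "50-150", "150-500", "500-2000"))

# Frequency table as a data frame with columns `bins` and `Freq`
freq <- as.data.frame(table(bins))

# geom_text() draws the count just above each bar (vjust = -0.5);
# expand_limits() leaves room so the top label is not clipped
p <- ggplot(freq, aes(x = bins, y = Freq)) +
  geom_col() +
  geom_text(aes(label = Freq), vjust = -0.5) +
  expand_limits(y = max(freq$Freq) + 1)

print(p)
```

Mapping label to bins instead of Freq would print the interval names above the bars rather than the counts.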

Reduce the amount of y axis tick values in R (pirateplot)

Since I updated R to version 3.5.2, my pirateplots (yarrr library) show more y-axis ticks than before. Previously they only showed ticks at whole-number values (1, 2, 3, etc.; image 1), but now they also show ticks at .5 values (1, 1.5, 2, 2.5, 3, etc.; image 2). This is despite the fact that the data and scripts are exactly the same as before.
Do you know how I can remove the .5 value ticks and make the plots look like they looked before?
I think the yaxt.y argument is what you're looking for as it allows you to override the default y axis construction.
Here's a plot using default arguments
pirateplot(weight ~ Diet, data = ChickWeight)
Original plot
Now with the yaxt.y argument
pirateplot(weight ~ Diet, data = ChickWeight, yaxt.y = seq(0, 400, 25))
Second version
You can also specify the grid line widths with gl.lwd:
pirateplot(weight ~ Diet, data = ChickWeight, yaxt.y = seq(0, 400, 25), gl.lwd = c(.5, 1.5), gl.col = "black")
Third version
Hope this helps!
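For the whole-number ticks asked about in the question, the tick positions can be computed from the data range and passed to yaxt.y. This is only a sketch: the scores data frame is made up, and the pirateplot call is skipped when yarrr is not installed.

```r
# Hypothetical data on a small scale (values are made up)
set.seed(42)
scores <- data.frame(
  group = rep(c("a", "b", "c"), each = 20),
  score = runif(60, min = 1, max = 5)
)

# Whole-number tick positions that span the observed range
whole_ticks <- seq(floor(min(scores$score)),
                   ceiling(max(scores$score)),
                   by = 1)

# yaxt.y is the argument used in the answer above
if (requireNamespace("yarrr", quietly = TRUE)) {
  yarrr::pirateplot(score ~ group, data = scores, yaxt.y = whole_ticks)
}
```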

Twosided Barplot in R with different data

I was wondering if it's possible to get a two-sided barplot (e.g. Two sided bar plot ordered by date) that shows Data A above the axis and Data B below it for each X value.
Data A would be, for example, the age of a person and Data B the size (height) of the same person. The problem, and the main difference from the example above: A and B obviously have totally different units and y limits.
Example:
X = c("Anna","Manuel","Laura","Jeanne") # Name of the Person
A = c(12,18,22,10) # Age in years
B = c(112,186,165,120) # Size in cm
Any ideas how to solve this? I don't mind a horizontal or a vertical solution.
Thank you very much!
Here's code that gets you a solid draft of what I think you want using barplot from base R. I'm just making one series negative for the plotting, then manually setting the labels in axis to reference the original (positive) values. You have to make a choice about how to scale the two series so the comparison is still informative. I did that here by dividing height in cm by 10, which produces a range similar to the range for years.
# plot the first series, but manually set the range of the y-axis to set up the
# plotting of the other series. Set axes = FALSE so you can get the y-axis
# with labels you want in a later step.
barplot(A, ylim = c(-25, 25), axes = FALSE)
# plot the second series, making whatever transformations you need as you go. Use
# add = TRUE to add it to the first plot; use names.arg to get X as labels; and
# repeat axes = FALSE so you don't get an axis here, either.
barplot(-B/10, add = TRUE, names.arg = X, axes = FALSE)
# add a line for the x-axis if you want one
abline(h = 0)
# now add a y-axis with labels that makes sense. I set lwd = 0 so you just
# get the labels, no line.
axis(2, lwd = 0, tick = FALSE, at = seq(-20,20,5),
labels = c(rev(seq(0,200,50)), seq(5,20,5)), las = 2)
# now add y-axis labels
mtext("age (years)", 2, line = 3, at = 12.5)
mtext("height (cm)", 2, line = 3, at = -12.5)
Result with par(mai = c(0.5, 1, 0.25, 0.25)):
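The divisor of 10 above was picked by eye. As a sketch of one alternative, the scaling factor can be derived from the data so that the tallest bar on each side has the same length; the names k, lim, top_at and bot_val are introduced here for illustration only.

```r
X <- c("Anna", "Manuel", "Laura", "Jeanne")  # name of the person
A <- c(12, 18, 22, 10)                       # age in years
B <- c(112, 186, 165, 120)                   # height in cm

# Factor that maps max(B) onto max(A), so both sides span the same distance
k   <- max(B) / max(A)
lim <- max(A) * 1.1   # a little headroom above and below

barplot(A, ylim = c(-lim, lim), axes = FALSE)
barplot(-B / k, add = TRUE, names.arg = X, axes = FALSE)
abline(h = 0)

# Tick labels in the original units of each series
top_at  <- pretty(c(0, max(A)))
bot_val <- pretty(c(0, max(B)))
bot_val <- bot_val[bot_val > 0]   # zero is already labelled by the top axis
axis(2, at = top_at, labels = top_at, las = 2)
axis(2, at = -bot_val / k, labels = bot_val, las = 2)

mtext("age (years)", 2, line = 3, at = lim / 2)
mtext("height (cm)", 2, line = 3, at = -lim / 2)
```

Equalising the maxima is only one possible choice; any scaling changes how the two series compare visually, so it is worth stating which one was used.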

Group the variable according to its value and get a histogram

I'm trying to group the variable according to its values and get a histogram.
For example, this is my data:
r <-c(1,899,1,2525,763,3,2,2,1863,695,9,4,2876,1173,1156,5098,3,3876,1,1,
3023,76336,13,003,9898,1,10,843,10546,617,1375,1,1,5679,1,21,1,13,6,28,1,14088,682)
I want to group r by its value, like: 1-5, 5-10, 10-100, 100-500 and more than 500. Then I want a histogram whose x axis shows these intervals (1-5, 5-10, 10-100, 100-500 and more than 500). How can I do that?
If I use the ggplot2 package with the following code:
ggplot(data=r, aes(x=r))+geom_histogram(breaks = c(1, 5, 10, 100, 500,2000,Inf))
it doesn't work, and R says "missing value where TRUE/FALSE needed". Also, how can I make the bins all the same width?
In base R
r <-c(1,899,1,2525,763,3,2,2,1863,695,9,4,2876,1173,1156,5098,3,3876,1,1,5,
3023,76336,13,003,9898,1,10,843,10546,617,1375,1,1,5679,1,21,1,13,6,28,1,14088,682)
cut.vals <- cut(r, breaks = c(1, 5, 10, 100, 500, Inf), right = FALSE)
xy <- data.frame(r, cut = cut.vals)
barplot(table(xy$cut))
Note that I added the xy variable to ease in comparing how values were grouped. You can directly put cut.vals into the barplot(table()).
To use ggplot2, you can pre-calculate all the bins and plot
ggplot(xy, aes(x = cut)) +
  theme_bw() +
  geom_bar() +
  scale_x_discrete(drop = FALSE)
The parameter most commonly used to control bin size in geom_histogram is binwidth, which is constant for all bins, so unequal intervals are easier to handle by pre-computing the bins as above.
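If the axis should show labels such as 1-5 instead of the default [1,5) notation, cut() accepts a labels argument. The sketch below reuses the vector and breaks from the answer; bin_labels is a name chosen for this example. Note that with right = FALSE each interval includes its lower bound, so the value 500 would fall in the last bin.

```r
r <- c(1,899,1,2525,763,3,2,2,1863,695,9,4,2876,1173,1156,5098,3,3876,1,1,5,
       3023,76336,13,003,9898,1,10,843,10546,617,1375,1,1,5679,1,21,1,13,6,28,1,14088,682)

breaks     <- c(1, 5, 10, 100, 500, Inf)
bin_labels <- c("1-5", "5-10", "10-100", "100-500", "500+")

# right = FALSE gives left-closed intervals: [1,5), [5,10), ..., [500,Inf)
binned <- cut(r, breaks = breaks, labels = bin_labels, right = FALSE)

# One count per interval; every bar has the same width in the bar plot
counts <- table(binned)
print(counts)
barplot(counts, xlab = "r (interval)", ylab = "Frequency")
```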

2 factor histogram analysis

I've looked for a long time for an answer to this problem and I haven't been able to find an answer.
Here is the problem: I have a data frame with the following variables: flow rate 1 (CH_SONAR), flow rate 2 (CH_SONAR_2T), density (CH_DENSITY), and the percent difference between the two flow rates (per_diff). I've created a 5 level factor for flow rate 1 and another 5 level factor for density.
f.factor <- cut(p.pipeline$CH_SONAR_2T, 5, labels = c('Very Low','Low', 'Medium', 'High', 'Very High'))
d.factor <- cut(p.pipeline$CH_DENSITY, 5, labels = c('Water', 'Very Sparce', 'Sparce', 'Dense', 'Very Dense'))
I've plotted both using ggplot2 using each factor as the fill variable:
qplot(per_diff, data = p.pipeline, geom = "histogram", binwidth = 1, xlim = c(-5, 15), fill = f.factor)
qplot(per_diff, data = p.pipeline, geom = "histogram", binwidth = 1, xlim = c(-5, 15), fill = d.factor)
Now I would like to create a histogram with ggplot that lets me see the relationship between flow rate and density (Water and Very Low, Very Sparce and Low, Sparce and Low, etc. for all 25 possible combinations). I've tried creating new factors, binding d.factor and f.factor to the data frame, binding the two factors together etc. and no results, do you guys have any idea how to approach this?
I've tried including the histograms I produced but I don't think I have enough reputation to do it.
Thanks for all your help!
You can use fill=interaction(f.factor, d.factor). Combinations that don't appear in the legend, such as 'Low.Very Sparce', indicate that no observation belongs to both of these categories.
If you want the colors of adjacent levels to stand out more, one thing you can do is generate the colors with rainbow, then swap every other color with its opposite on the wheel.
col <- rainbow(length(levels(interaction(f.factor, d.factor))), v=.75, s=.5)
col.index <- ifelse(seq(col) %% 2,
seq(col),
(seq(ceiling(length(col)/2), length.out=length(col)) %% length(col)) + 1)
mixed <- col[col.index]
qplot(per_diff, data = p.pipeline,
geom = "histogram", binwidth = 1, xlim = c(-5, 15),
fill = interaction(f.factor, d.factor)) + scale_fill_manual(values=mixed)
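To see what interaction() produces, here is a small self-contained sketch with two made-up two-level factors (flow and dens are stand-ins for f.factor and d.factor, not the real data).

```r
# Toy factors standing in for f.factor and d.factor (values are made up)
flow <- factor(c("Low", "Low", "High", "High", "Low"),
               levels = c("Low", "High"))
dens <- factor(c("Water", "Dense", "Water", "Water", "Water"),
               levels = c("Water", "Dense"))

# One combined factor with a level for every pair of levels
combo <- interaction(flow, dens)
print(levels(combo))
print(table(combo))

# Drop the combinations that never occur in the data
combo_used <- droplevels(combo)
print(levels(combo_used))
```

With two five-level factors, as in the question, the combined factor has 25 levels, and table() shows which of them are actually populated.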
