2 factor histogram analysis - r

I've searched for a long time without finding an answer to this problem.
Here is the problem: I have a data frame with the following variables: flow rate 1 (CH_SONAR), flow rate 2 (CH_SONAR_2T), density (CH_DENSITY), and the percent difference between the two flow rates (per_diff). I've created a 5-level factor for flow rate and another 5-level factor for density.
f.factor <- cut(p.pipeline$CH_SONAR_2T, 5, labels = c('Very Low','Low', 'Medium', 'High', 'Very High'))
d.factor <- cut(p.pipeline$CH_DENSITY, 5, labels = c('Water', 'Very Sparce', 'Sparce', 'Dense', 'Very Dense'))
I've plotted both using ggplot2 using each factor as the fill variable:
qplot(per_diff, data = p.pipeline, geom = "histogram", binwidth = 1, xlim = c(-5, 15), fill = f.factor)
qplot(per_diff, data = p.pipeline, geom = "histogram", binwidth = 1, xlim = c(-5, 15), fill = d.factor)
Now I would like to create a histogram with ggplot that lets me see the relationship between flow rate and density (Water and Very Low, Very Sparce and Low, Sparce and Low, etc., for all 25 possible combinations). I've tried creating new factors, binding d.factor and f.factor to the data frame, and binding the two factors together, all without results. Do you have any idea how to approach this?
I've tried including the histograms I produced but I don't think I have enough reputation to do it.
Thanks for all your help!

You can use fill=interaction(f.factor, d.factor). Combinations that don't appear in the legend, such as 'Low.Very Sparce', indicate that no observation belongs to both of these categories.
If you want the colors of adjacent levels to stand out more, one thing you can do is generate the colors with rainbow, then swap every other color with its opposite on the wheel.
col <- rainbow(length(levels(interaction(f.factor, d.factor))), v=.75, s=.5)
col.index <- ifelse(seq(col) %% 2,
seq(col),
(seq(ceiling(length(col)/2), length.out=length(col)) %% length(col)) + 1)
mixed <- col[col.index]
qplot(per_diff, data = p.pipeline,
geom = "histogram", binwidth = 1, xlim = c(-5, 15),
fill = interaction(f.factor, d.factor)) + scale_fill_manual(values=mixed)
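To see what interaction() produces before plotting, here is a minimal sketch on simulated data. The vectors flow and dens (and their ranges) are made up for illustration; only the factor labels come from the question.

```r
# Simulated stand-ins for CH_SONAR_2T and CH_DENSITY (assumed, not the real data)
set.seed(1)
flow <- runif(200, 0, 100)
dens <- runif(200, 1, 2)

f.factor <- cut(flow, 5, labels = c('Very Low', 'Low', 'Medium', 'High', 'Very High'))
d.factor <- cut(dens, 5, labels = c('Water', 'Very Sparce', 'Sparce', 'Dense', 'Very Dense'))

# interaction() builds one factor whose levels are all pairings of the inputs
combo <- interaction(f.factor, d.factor)
n_all <- nlevels(combo)        # 5 x 5 possible combinations

# droplevels() keeps only the combinations that actually occur in the data
observed <- droplevels(combo)
n_observed <- nlevels(observed)

head(levels(combo))
table(observed)
```

Comparing n_all with n_observed shows how many of the possible combinations are empty, which is why some of them never show up in the legend.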

Related

Formatting histograms in R

I'm trying to fit a Variance-Gamma distribution to empirical data of 1-minute logarithmic returns. To visualize the results, I plotted 2 histograms together: empirical and theoretical.
(a is the vector of empirical data)
SP_hist <- hist(a,
col = "lightblue",
freq = FALSE,
breaks = seq(min(a), max(a), length.out = 141),
border = "white",
main = "",
xlab = "Value",
xlim = c(-0.001, 0.001))
hist(VG_sim_rescaled,
freq = FALSE,
breaks = seq(min(VG_sim_rescaled), max(VG_sim_rescaled), length.out = 141),
xlab = "Value",
main = "",
col = "orange",
add = TRUE)
(empirical histogram-blue, theoretical histogram-orange)
However, after plotting the 2 histograms together, I started wondering about 2 things:
In both histograms I set freq = FALSE, so I expected the y-axis to be in the range (0, 1). In the actual picture, values on the y-axis exceed 3,000. How can that happen, and how do I fix it?
I need to change the bucket size (the width of the buckets) and the density per unit length of the x-axis. How can I do that?
Thank you for your help.
freq=FALSE means that the area of the entire histogram is normalized to one. As your x-axis has a very small range (about 10^(-4)), the y-values must be quite large to achieve an area (= x times y) of one.
One reliable way to fix the number of bins is to pass a vector of break points to the breaks parameter. breaks also accepts a single number, but hist treats it only as a suggestion and chooses "pretty" break points, so the actual number of bins may differ. Thus try the following:
bins <- 6 # number of cells
# length.out guarantees exactly bins + 1 break points ending at max(x)
breaks <- seq(min(x), max(x), length.out = bins + 1)
hist(x, freq=FALSE, breaks=breaks)
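The point about the area being normalized to one can be checked numerically. The following is a small sketch on simulated data (the vector x and its scale are assumptions chosen to mimic a very narrow x-range, not the asker's returns):

```r
set.seed(42)
x <- rnorm(1000, mean = 0, sd = 1e-4)   # simulated values on a very narrow range

bins <- 6
breaks <- seq(min(x), max(x), length.out = bins + 1)
h <- hist(x, breaks = breaks, plot = FALSE)

# area = sum over bins of (density * bin width)
area <- sum(h$density * diff(h$breaks))
peak <- max(h$density)

area
peak
```

The area comes out as one, while the tallest bar is far above one, because each bar's height is its share of the data divided by a very small bin width.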

Change x-variable bin counts of a histogram

I have plots that are .25 ha and I need my data to be displayed as 1 ha. I'm trying to make the following graph but multiplying the counts by 4 (so I have a full hectare instead of a quarter). However, all posts seem to deal with changing axis titles, values, etc., but I need to change the actual histogram frequency counts.
(Figure: histogram of the x-variable in size classes, plotted by factor variable)
Original code:
ggplot(liveTrees, aes(diam1DBH)) +
  geom_histogram(binwidth = 10) +
  facet_wrap(~site) +
  ggtitle("Stems/0.25ha by Size Class") +
  ylab("Stems/0.25ha") +
  xlab("Diameter Class")
liveTrees = my data
diam1DBH = diameter (numeric, continuous)
site = plot location (factor)
What I've tried:
for (i in 1:length(unique(liveTrees$site))) {
test<-hist(liveTrees[liveTrees$site== unique(liveTrees$site)[i], "diam1DBH"], plot = F)
b <- barchart(test$counts*4, width = 10, xlim=c(0,350), cex.axis = 0.85)
axis(side = 1, at = "b", cex.axis = 0.85)
}
But I keep getting
Error in axis(side = 1, at = "b", cex.axis = 0.85) : no locations are finite
In addition: Warning message:
In axis(side = 1, at = "b", cex.axis = 0.85) : NAs introduced by coercion
So, with this I can get the counts, but the numbers aren't right and they're not in a useful format.
My data is a data.frame, example: data example
What I need is the sum of each diameter class, each bin frequency amount, multiplied by 4. I've been trying to do this but can't get it to work, any help is appreciated!
Multiplying the frequencies by 4 changes the values on the y-axis but not the shape of the graph. So there are two options: change the axis value labels, or, more simply, add the data 4 times. For example:
ggplot(rbind(data, data,data,data), aes(variable_X)) + geom_histogram(binwidth =10)
This way the data is multiplied, and no new data.frame is made that could confuse analysis later on.
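Another possibility, not from the answer above, is to scale the computed bin counts inside the plot itself. This is a sketch assuming ggplot2 3.3.0 or later (for after_stat()) and a simulated data frame trees standing in for liveTrees:

```r
library(ggplot2)

# Simulated stand-in for liveTrees (assumed columns: diam1DBH, site)
set.seed(1)
trees <- data.frame(
  diam1DBH = runif(120, 5, 150),
  site = factor(rep(c("A", "B", "C"), each = 40))
)

# after_stat(count) is the per-bin count computed by the histogram stat;
# multiplying by 4 scales each 0.25 ha count to a per-hectare value
p <- ggplot(trees, aes(diam1DBH)) +
  geom_histogram(aes(y = after_stat(count) * 4), binwidth = 10) +
  facet_wrap(~site) +
  ylab("Stems/ha") +
  xlab("Diameter Class")

# Inspect the computed bars: y should be 4 times the raw bin count
bars <- layer_data(p, 1)
head(bars[, c("x", "count", "y")])
```

This leaves the data untouched and keeps the scaling factor visible in the plotting code, at the cost of depending on a newer ggplot2 version.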

Twosided Barplot in R with different data

I was wondering if it's possible to make a two-sided barplot (e.g. Two sided bar plot ordered by date) that shows Data A above the axis and Data B below it for each X value.
Data A would be for example the age of a person and Data B the size of the same person. The problem with this and the main difference to the examples above: A and B have obviously totally different units/ylims.
Example:
X = c("Anna","Manuel","Laura","Jeanne") # Name of the Person
A = c(12,18,22,10) # Age in years
B = c(112,186,165,120) # Size in cm
Any ideas how to solve this? I don't mind a horizontal or a vertical solution.
Thank you very much!
Here's code that gets you a solid draft of what I think you want using barplot from base R. I'm just making one series negative for the plotting, then manually setting the labels in axis to reference the original (positive) values. You have to make a choice about how to scale the two series so the comparison is still informative. I did that here by dividing height in cm by 10, which produces a range similar to the range for years.
# plot the first series, but manually set the range of the y-axis to set up the
# plotting of the other series. Set axes = FALSE so you can get the y-axis
# with labels you want in a later step.
barplot(A, ylim = c(-25, 25), axes = FALSE)
# plot the second series, making whatever transformations you need as you go. Use
# add = TRUE to add it to the first plot; use names.arg to get X as labels; and
# repeat axes = FALSE so you don't get an axis here, either.
barplot(-B/10, add = TRUE, names.arg = X, axes = FALSE)
# add a line for the x-axis if you want one
abline(h = 0)
# now add a y-axis with labels that makes sense. I set lwd = 0 so you just
# get the labels, no line.
axis(2, lwd = 0, tick = FALSE, at = seq(-20,20,5),
labels = c(rev(seq(0,200,50)), seq(5,20,5)), las = 2)
# now add y-axis labels
mtext("age (years)", 2, line = 3, at = 12.5)
mtext("height (cm)", 2, line = 3, at = -12.5)
Result with par(mai = c(0.5, 1, 0.25, 0.25)):

Adding color to circular data based on group membership

I'm trying to add color to specific points in my circular data based on group membership (I have two groups: one of individuals with a certain medical condition and another of healthy controls). I've converted their data from degrees to radians and plotted it, but I haven't managed to selectively change the color of the points based on my factor variable.
Note that I've loaded library(circular), which doesn't allow me to use ggplot. Here's the syntax I've been working with:
plot(bcirc, stack=FALSE, bins=60, shrink= 1, col=w$dx, axes=FALSE, xlab ="Basal sCORT", ylab = "Basal sAA")
As you can see, I specified the factor variable (which has two levels) in the col argument, but everything is still drawn in one color. Any suggestions?
Seems plot.circular does not like to assign multiple colours. Here's one potential work-around:
library(circular)
## simulate circular data
bcirc1 <- rvonmises(100, circular(90, units = "degrees"), 10, control.circular=list(units="degrees"))
bcirc2 <- rvonmises(100, circular(0, units = "degrees"), 10, control.circular=list(units="degrees"))
bcirc <- c(bcirc1, bcirc2)
dx <- c(rep(1,100),rep(2,100))
## start with blank plot, then add group-specific points
plot(bcirc, stack=FALSE, bins=60, shrink= 1, col=NA,
axes=FALSE, xlab ="Basal sCORT", ylab = "Basal sAA")
points(bcirc[dx==1], col=rgb(1,0,0,0.1), cex=2) # note: a loop would be cleaner if dealing with >2 levels
points(bcirc[dx==2], col=rgb(0,0,1,0.1), cex=2)
Inspired by Paul Regular's example, here is a version using the same data where one condition is plotted stacking inwards and the other is plotted stacking outwards.
library(circular)
## simulate circular data
bcirc1 <- rvonmises(100, circular(90, units = 'degrees'), 10, control.circular=list(units="degrees"))
bcirc2 <- rvonmises(100, circular(0, units = 'degrees'), 10, control.circular=list(units="degrees"))
bcirc <- data.frame(condition = c(
rep(1,length(bcirc1)),
rep(2,length(bcirc2)) ),
angles = c(bcirc1,
bcirc2) )
## start with blank plot, then add group-specific points
dev.new(); par(mai = c(1, 1, 0.1, 0.1))
plot(circular(subset(bcirc, condition == 1)$angles, units = 'degrees'),
     stack = TRUE, bins = 60, shrink = 1, col = 1, sep = 0.005,
     tcl.text = -0.073,  # text outside
     axes = TRUE, xlab = "Basal sCORT", ylab = "Basal sAA")
par(new = TRUE)
plot(circular(subset(bcirc, condition == 2)$angles, units = 'degrees'),
     stack = TRUE, bins = 60, shrink = 1.05, col = 2,
     sep = -0.005, axes = FALSE)  # inner circle, no axes, stacks inwards

Line graph plot in R with a line for a single data series changing color, i.e. 1 line, 2 colors

I would like to make a simple line plot like this:
things <- c(1, 3, 6, 4, 9)
plot(things, type="o", col="blue", axes=FALSE, ann=FALSE)
axis(1, at=1:5, lab=c("Mon","Tue","Wed","Thu","Fri"))
axis(2, las=1)
box()
(Image)
BUT with the single line changing color at a certain data point: in this case, say, blue for Monday-Wednesday and red for Wednesday-Friday. I.e., from data point 1 to 3 the line is blue, and from 3 to 5 it is red.
I know I can just split the data series into two, and plot them separately and the image will join them, but the real data I am using is from a large complex data frame, and I need to make the plot from dozens of them, so having one quick little code to do it without manipulating the actual data would be a big time-saver.
One line, two colors, that's it!
Thanks!
Maybe I'm misunderstanding what you need here, but it seems to me that you can do it easily enough in ggplot2.
library(ggplot2)
dd <- data.frame(days = c("Mon","Tue","Wed","Thu","Fri"),
things = c(1, 3, 6, 4, 9))
# set the levels of the factor so that 'days' sorts properly
dd$days <- factor(dd$days, levels = c('Mon','Tue','Wed','Thu','Fri'))
# which days do we want to highlight?
days.highlight <- dd$days[4:5]
dd$highlight <- ifelse(dd$days %in% days.highlight, "red", "black")
ggplot(dd, aes(x = days, y = things, colour = highlight, group = 1)) +
  geom_line() +
  geom_point() +
  # use the colour names stored in 'highlight' directly, without a legend
  scale_colour_identity()
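If you would rather stay in base graphics, one possible approach is to draw the line segment by segment with segments(), giving each segment its own color. This is only a sketch; change_at (the index where the color switches) is a name introduced here for illustration.

```r
things <- c(1, 3, 6, 4, 9)
n <- length(things)
change_at <- 3   # assumed: the line turns red from data point 3 onwards

# one colour per segment: segment i joins point i to point i + 1
seg_col <- ifelse(seq_len(n - 1) < change_at, "blue", "red")
# one colour per point; the change point itself stays blue
pt_col <- ifelse(seq_len(n) <= change_at, "blue", "red")

# empty plot first, then segments and points on top
plot(things, type = "n", axes = FALSE, ann = FALSE)
segments(x0 = seq_len(n - 1), y0 = things[-n],
         x1 = seq_len(n)[-1], y1 = things[-1], col = seg_col)
points(things, col = pt_col)
axis(1, at = 1:5, labels = c("Mon", "Tue", "Wed", "Thu", "Fri"))
axis(2, las = 1)
box()
```

The data vector is used as-is, so the same few lines can be wrapped in a function and applied to each series without splitting it first.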
