Calculate area of cross section for varying height - r

I'm trying to figure out how to calculate the area of a river cross section.
For the cross section I have the depth at every 25 cm over the 5 m wide river.
x_profile <- seq(0, 500, 25)
y_profile = c(50, 73, 64, 59, 60, 64, 82, 78, 79, 76, 72, 68, 63, 65, 62, 61, 56, 50, 44, 39, 25)
If anyone have some suggestions of how this could be done in r it's highly appreciated.

We can use the sf package to create a polygon showing the cross-section and then calculate the area. Notice that to create a polygon, it is necessary to provide three more points as c(0, 0), c(500, 0), and c(0, 0) when creating the matrix m.
x_profile <- seq(0, 500, 25)
y_profile <- c(50, 73, 64, 59, 60, 64, 82, 78, 79, 76, 72,
68, 63, 65, 62, 61, 56, 50, 44, 39, 25)
library(sf)
# Create matrix with coordinates
m <- matrix(c(0, x_profile, 500, 0, 0, -y_profile, 0, 0),
byrow = FALSE, ncol = 2)
# Create a polygon
poly <- st_polygon(list(m))
# View the polygon
plot(poly)
# Calcualte the area
st_area(poly)
31312.5

Related

Plot a plane in R: PCA

I'm working with a PCA problem where I have 3 variables and I reduce them to 2 by doing PCA. I've already plot all the points in 3D using scatter3D. My question is, how can I plot the plane determined by two vectors (the first two eigenvectors of the sampled covariance matrix) in R?
This is what I have so far
library(plot3D)
X <- matrix(c(55, 75, 110,
47, 69, 108,
42, 71, 110,
48, 74, 114,
47, 75, 114,
52, 73, 104,
49, 72, 106,
44, 67, 107,
52, 73, 108,
45, 73, 111,
50, 80, 117,
50, 71, 110,
48, 75, 114,
51, 73, 106,
44, 66, 102,
42, 71, 112,
50, 68, 107,
48, 70, 108,
51, 72, 108,
52, 73, 109,
49, 72, 112,
49, 73, 108,
46, 70, 105,
39, 66, 100,
50, 76, 108,
52, 71, 108,
56, 75, 108,
53, 70, 112,
53, 72, 110,
49, 74, 113,
51, 72, 109,
55, 74, 110,
56, 75, 110,
62, 79, 118,
58, 77, 115,
50, 71, 105,
52, 67, 104,
52, 73, 107,
56, 73, 106,
55, 78, 118,
53, 68, 103), ncol = 3,nrow = 41,byrow = TRUE)
S <- cov(X)
Gamma <- eigen(S)$vectors
scatter3D(X[,1], X[,2], X[,3], pch = 18, bty = "u", colkey = FALSE,
main ="bty= 'u'", col.panel ="gray", expand =0.4,
col.grid = "white",ticktype = "detailed",
phi = 25,theta = 45)
pc <- scale(X,center=TRUE,scale=FALSE) %*% Gamma[,c(1,2)]
Now I would like to plot the plane using scatter3D
Perhaps this will do. Using the iris data. It uses scatter3d in package car which can add a regression surface to a 3d plot:
library(car)
data(iris)
iris.pr <- prcomp(iris[, 1:3], scale.=TRUE)
# Draw 3d plot with surface and color points by species
scatter3d(PC3~PC1+PC2, iris.pr$x, point.col=c(rep(2, 50), rep(3, 50), rep(4, 50)))
This plots a regression surface predicting PC3 from PC1 and PC2. By definition the correlation between any two principal components is zero so the surface should be PC3=0 for any values of PC1 and PC2, but I don't see a way to produce exactly that surface. It is pretty close though.

update valence shifter in sentimentr package in r

I am trying to remove certain rows from the lexicon::hash_valence_shifters in the sentimentr package. Specifically, i want to keep only rows:
c( 1 , 2 , 3 , 6 , 7 , 13, 14 , 16 , 19, 24, 25 , 26, 27 , 28, 36, 38, 39, 41, 42, 43, 45, 46, 53, 54, 55, 56, 57, 59, 60, 65, 70, 71, 73, 74, 79, 84, 85, 87, 88, 89, 94, 95, 96, 97, 98, 99, 100, 102, 103, 104, 105, 106, 107, 114, 115, 119, 120, 123, 124, 125, 126, 127, 128, 129, 135, 136, 138)
I have tried the below approach:
vsm = lexicon::hash_valence_shifters[c, ]
vsm[ , y := as.numeric(y)]
vsm = sentimentr::as_key(vsm, comparison = NULL, sentiment = FALSE)
sentimentr::is_key(vsm)
vsn = sentimentr::update_valence_shifter_table(vsm, drop = c(dropvalue$x), x= lexicon::hash_valence_shifters, sentiment = FALSE, comparison = TRUE )
However, when I am calculating the sentiment using the updated valence shifter table "vsn", it is giving the sentiment as 0.
Can someone please let me know how to just keep specific rows of the valence shifter table ?
Thanks!

Simulate data from a Gompertz curve in R

I have a set of data that I have collected which consists of a time series, where each y-value is found by taking the mean of 30 samples of grape cluster weight.
I want to simulate more data from this, with the same number of x and y values, so that I can carry out some Bayesian analysis to find the posterior distribution of the data.
I have the data, and I know that the growth follows a Gompertz curve with formula:
[y = a*exp(-exp(-(x-x0)/b))], with a = 88.8, b = 11.7, and x0 = 15.1.
The data I have is
x = c(0, 28, 36, 42, 50, 58, 63, 71, 79, 85, 92, 99, 106, 112)
y = c(0, 15, 35, 55, 62, 74, 80, 96, 127, 120, 146, 160, 177, 165).
Any help would be appreciated thank you
*Will edit when more information is given**
I am a little confused by your question. I have compiled what you have written into R. Please elaborate for me so that I can help you:
gompertz <- function(x, x0, a, b){
a*exp(-exp(-(x-x0)/b))
}
y = c(0, 15, 35, 55, 62, 74, 80, 96, 127, 120, 146, 160, 177, 165) # means of 30 samples of grape cluster weights?
x = c(0, 28, 36, 42, 50, 58, 63, 71, 79, 85, 92, 99, 106, 112) # ?
#??
gompertz(x, x0 = 15.1, a = 88.8, b = 11.7)
gompertz(y, x0 = 15.1, a = 88.8, b = 11.7)

Reproducing the style of a histogram plot in R

I am attempting to use the following dataset to reproduce a histogram in R:
ages <- c(26, 31, 35, 37, 43, 43, 43, 44, 45, 47, 48, 48, 49, 50, 51, 51, 51, 51, 52, 54, 54, 54, 54,
55, 55, 55, 56, 57, 57, 57, 58, 58, 58, 58, 59, 59, 62, 62, 63, 64, 65, 65, 65, 66, 66, 67,
67, 72, 86)
I would like to get a histogram that looks as close as possible to this:
However, I am having three problems:
Ia m unable to get my frequency count on the y-axis to reach 18
I haven't been able to get the squiggly break symbol on the x-axis
My breaks don't seem to be properly setting to the vector I entered in my code
I read over ?hist and thought the first two issues could be accomplished by setting xlim and ylim, but that doesn't seem to be working.
I'm at a loss for the third issue since I thought it could be accomplished by including breaks = c(25.5, 34.5, 43.5, 52.5, 61.5, 70.5, 79.5, 88.5).
Here's my code so far:
hist(ages, breaks = c(25.5, 34.5, 43.5, 52.5, 61.5, 70.5, 79.5, 88.5),
freq=TRUE, col = "lightblue", xlim = c(25.5, 88.5), ylim = c(0,18),
xlab = "Age", ylab = "Frequency")
Followed by my corresponding histogram:
Any bump in the right direction is appreciated.
1. Reaching 18.
It appears that in your data you have at most 17 numbers in the category between 52.5 and 61.5. And that is even with open interval on both sides:
ages <- c(26, 31, 35, 37, 43, 43, 43, 44, 45, 47, 48, 48, 49, 50, 51, 51, 51,
51, 52, 54, 54, 54, 54, 55, 55, 55, 56, 57, 57, 57, 58, 58, 58, 58,
59, 59, 62, 62, 63, 64, 65, 65, 65, 66, 66, 67, 67, 72, 86
)
sum(ages >= 52.5 & ages <= 61.5)
[1] 17
So your histogram only reflects that.
2. Break symbol.
For that you might be interested in THIS SO ANSWER
3. Breaks.
If you read help(hist) you will see that breaks specify the points at which the groups are formed:
... * a vector giving the breakpoints between histogram cells
So you breaks work as intended. The problem you have is with showing the same numbers on x-axis. Here ANOTHER SO ANSWER might help you.
Example
Here is how you could go about reproducing the plot.
library(plotrix) # for the break on x axis
library(shape) # for styled arrow heads
# manually select axis ticks and colors
xticks <- c(25.5, 34.5, 43.5, 52.5, 61.5, 70.5, 79.5, 88.5)
yticks <- seq(2, 18, 2)
bgcolor <- "#F2ECE4" # color for the background
barcolor <- "#95CEEF" # color for the histogram bars
# top level parameters - background color and font type
par(bg=bgcolor, family="serif")
# establish a new plotting window with a coordinate system
plot.new()
plot.window(xlim=c(23, 90), ylim=c(0, 20), yaxs="i")
# add horizontal background lines
abline(h=yticks, col="darkgrey")
# add a histogram using our selected break points
hist(ages, breaks=xticks, freq=TRUE, col=barcolor, xaxt='n', yaxt='n', add=TRUE)
# L-shaped bounding box for the plot
box(bty="L")
# add x and y axis
axis(side=1, at=xticks)
axis(side=2, at=yticks, labels=NA, las=1, tcl=0.5) # for inward ticks
axis(side=2, at=yticks, las=1)
axis.break(1, 23, style="zigzag", bgcol=bgcolor, brw=0.05, pos=0)
# add labels
mtext("Age", 1, line=2.5, cex=1.2)
mtext("Frequency", 2, line=2.5, cex=1.2)
# add arrows
u <- par("usr")
Arrows(88, 0, u[2], 0, code = 2, xpd = TRUE, arr.length=0.25)
Arrows(u[1], 18, u[1], u[4], code = 2, xpd = TRUE, arr.length=0.25)
And the picture:

In ggplot2, what do the end of the boxplot lines represent?

I can't find a description of what the end points of the lines of a boxplot represent.
For example, here are point values above and below where the lines end.
(I realize that the top and bottom of the box are 25th and 75th percentile, and the centerline is the 50th). I assume, as there are points above and below the lines that they do not represent the max/min values.
The "dots" at the end of the boxplot represent outliers. There are a number of different rules for determining if a point is an outlier, but the method that R and ggplot use is the "1.5 rule". If a data point is:
less than Q1 - 1.5*IQR
greater than Q3 + 1.5*IQR
then that point is classed as an "outlier". The whiskers are defined as:
upper whisker = min(max(x), Q_3 + 1.5 * IQR)
lower whisker = max(min(x), Q_1 – 1.5 * IQR)
where IQR = Q_3 – Q_1, the box length. So the upper whisker is located at the smaller of the maximum x value and Q_3 + 1.5 IQR,
whereas the lower whisker is located at the larger of the smallest x value and Q_1 – 1.5 IQR.
Additional information
See the wikipedia boxplot page for alternative outlier rules.
There are actually a variety of ways of calculating quantiles. Have a look at `?quantile for the description of the nine different methods.
Example
Consider the following example
> set.seed(1)
> x = rlnorm(20, 1/2)#skewed data
> par(mfrow=c(1,3))
> boxplot(x, range=1.7, main="range=1.7")
> boxplot(x, range=1.5, main="range=1.5")#default
> boxplot(x, range=0, main="range=0")#The same as range="Very big number"
This gives the following plot:
As we decrease range from 1.7 to 1.5 we reduce the length of the whisker. However, range=0 is a special case - it's equivalent to "range=infinity"
I think ggplot using the standard defaults, the same as boxplot: "the whiskers extend to the most extreme data point which is no more than [1.5] times the length of the box away from the box"
See: boxplot.stats
P1IMSA Tutorial 8 - Understanding Box and Whisker Plots video offers a visual step-by-step explanation of (Tukey) box and whisker plots.
At 4m 23s I explain the meaning of the whisker ends and its relationship to the 1.5*IQR.
Although the chart shown in the video was rendered using D3.js rather than R, its explanations jibe with the R implementations of boxplots mentioned.
As highlighted by #TemplateRex in a comment, ggplot doesn't draw the whiskers at the upper/lower quartile plus/minus 1.5 times the IQR. It actually draws them at max(x[x < Q3 + 1.5 * IQR]) and min(x[x > Q1 + 1.5 * IQR]). For example, here is a plot drawn using geom_boxplot where I've added a dashed line at the value Q1 - 1.5*IQR:
Q1 = 52
Q3 = 65
Q1 - 1.5 * IQR = 52 - 13*1.5 = 32.5 (dashed line)
Lower whisker = min(x[x > Q1 + 1.5 * IQR]) = 35 (where x is the data used to create the boxplot, outlier is at x = 27).
MWE
Note this isn't the exact code I used to produce the image above but it gets the point over.
library("mosaic") # For favstats()
df <- c(54, 41, 55, 66, 71, 50, 65, 54, 72, 46, 36, 64, 49, 64, 73,
52, 53, 66, 49, 64, 44, 56, 49, 54, 61, 55, 52, 64, 60, 54, 59,
67, 58, 51, 63, 55, 67, 68, 54, 53, 58, 26, 53, 56, 61, 51, 51,
50, 51, 68, 60, 67, 66, 51, 60, 52, 79, 62, 55, 74, 62, 59, 35,
67, 58, 74, 48, 53, 40, 62, 67, 57, 68, 56, 75, 55, 41, 50, 73,
57, 62, 61, 48, 60, 64, 53, 53, 66, 58, 51, 68, 69, 69, 58, 54,
57, 65, 78, 70, 52, 59, 52, 65, 70, 53, 57, 72, 47, 50, 70, 41,
64, 59, 58, 65, 57, 60, 70, 46, 40, 76, 60, 64, 51, 38, 67, 57,
64, 51)
df <- as.data.frame(df)
Q1 <- favstats(df)$Q1
Q3 <- favstats(df)$Q3
IQR <- Q3 - Q1
lowerlim <- Q1 - 1.5*IQR
upperlim <- Q3 + 1.5* IQR
boxplot_Tukey_lower <- min(df[df > lowerlim])
boxplot_Tukey_upper <- max(df[df < upperlim])
ggplot(df, aes(x = "", y = df)) +
stat_boxplot(geom ='errorbar', width = 0.5) +
geom_boxplot() +
geom_hline(yintercept = lowerlim, linetype = "dashed") +
geom_hline(yintercept = upperlim, linetype = "dashed")

Resources