Related
I'm trying to calculate the latitudinal ranges for species (ASVs) along a transect but cannot get my head around how to do this.
Basically I want the maximum latitude minus the minimum latitude where a species is present, i.e. abundance does not equal 0. For examples Species 1 is present for the first time at -35 deg S and present for the last time at -40 deg S, it's latitudinal range would be 5 degrees. Thanks!
My data looks like this:
> dput(test[1:30, c(1:5)])
structure(list(Station_neat = c("001_DCM", "001_SA", "003_DCM",
"003_SA", "005_DCM", "005_SA", "007_DCM", "007_SA", "009_DCM",
"009_SA", "011_DCM", "011_SA", "013_DCM", "013_SA", "015_DCM",
"015_SA", "017_DCM", "017_SA", "019_DCM", "019_SA", "021_DCM",
"021_SA", "023_DCM", "023_SA", "025_DCM", "025_SA", "027_DCM",
"027_SA", "029_DCM", "029_SA"), Lat = c(-29.997, -29.997, -30.9975,
-30.9975, -31.9995, -31.9995, -32.99816667, -32.99816667, -34.00016667,
-34.00016667, -34.9995, -34.9995, -36.00083333, -36.00083333,
-36.9985, -36.9985, -38.00016667, -38.00016667, -38.99833333,
-38.99833333, -39.999, -39.999, -40.99783333, -40.99783333, -42.0005,
-42.0005, -42.99633333, -42.99633333, -43.9975, -43.9975), asv_3 = c(80,
0, 65, 0, 41, 0, 50, 0, 44, 0, 53, 0, 59, 0, 38, 0, 43, 0, 25,
0, 29, 51, 35, 22, 133, 35, 159, 83, 965, 414), asv_4 = c(766,
694, 286, 791, 421, 1202, 382, 431, 484, 684, 431, 529, 454,
722, 621, 370, 472, 439, 394, 243, 414, 518, 297, 300, 574, 396,
395, 1359, 1113, 541), asv_5 = c(1314, 2812, 729, 2874, 915,
3720, 1226, 2046, 940, 1783, 1220, 2627, 986, 3195, 1514, 566,
590, 1603, 325, 667, 748, 932, 616, 339, 1167, 1088, 988, 2333,
1563, 2146)), row.names = c(NA, 30L), class = "data.frame")
Edit: ASVs (e.g. asv_4) are my species. I have about 600 of these.
Edit 2: Scatterplot with mean latitudinal range and latitude (see comment):
Maximum latitude minus the minimum latitude where a species is present, i.e. abundance does not equal 0
A base solution:
lapply(test[grepl("asv", names(test))], \(x) diff(range(test$Lat[x > 0])))
# $asv_3
# [1] 14.0005
#
# $asv_4
# [1] 14.0005
#
# $asv_5
# [1] 14.0005
Its dplyr equivalent:
library(dplyr)
test %>%
summarise(across(starts_with("asv"), ~ diff(range(Lat[.x > 0]))))
# asv_3 asv_4 asv_5
# 1 14.0005 14.0005 14.0005
I have a dataset consist of 4 variables: CR, EN, LC and VU:
View first few values of my dateset
CR = c(2, 9, 10, 14, 24, 27, 29, 30, 34, 43, 50, 74, 86, 105, 140, 155, 200, …)
EN = c(24, 52, 86, 110, 144, 154, 206, 242, 300, 302, 366, 403, 422, 427, 427, 434, 448, …)
LC = c(447, 476, 543, 580, 647, 685, 745, 763, 819, 821, 863, 904, 908, 926, 934, 951, 968, …)
VU = c(75, 96, 97, 217, 297, 498, 511, 551, 560, 564, 570, 575, 609, 673, 681, 700, 755,...)
I want to create a histogram of a group of these variables in a plot by R that shows the normal distribution and density, a plot similar to the one below...
Could you please help me?
Here are the distributions, a clear-cut use of geom_density.
But first, to address "grouping", we need to pivot/reshape the data so that ggplot2 can automatically handle grouping. This will result in a column with a character (or factor) for each of the "CR", "EN", "LC", or "VU", and another column with the particular value. When pivoting, there is typically one or more columns that are preserved (an id, an x-value, a time/date, or something similar), but we don't have any data that would suggest something to preserve.
longdat <- tidyr::pivot_longer(dat, everything())
longdat
# # A tibble: 68 × 2
# name value
# <chr> <dbl>
# 1 CR 2
# 2 EN 24
# 3 LC 447
# 4 VU 75
# 5 CR 9
# 6 EN 52
# 7 LC 476
# 8 VU 96
# 9 CR 10
# 10 EN 86
# # … with 58 more rows
# # ℹ Use `print(n = ...)` to see more rows
ggplot(longdat, aes(x = value, group = name, fill = name)) +
geom_density(alpha = 0.2)
tidyr::pivot_longer works, one can also use melt from either reshape2 or data.table:
longdat <- reshape2::melt(dat, c())
## names are 'variable' and 'value' instead of 'name' and 'value'
Data
dat <- structure(list(CR = c(2, 9, 10, 14, 24, 27, 29, 30, 34, 43, 50, 74, 86, 105, 140, 155, 200), EN = c(24, 52, 86, 110, 144, 154, 206, 242, 300, 302, 366, 403, 422, 427, 427, 434, 448), LC = c(447, 476, 543, 580, 647, 685, 745, 763, 819, 821, 863, 904, 908, 926, 934, 951, 968), VU = c(75, 96, 97, 217, 297, 498, 511, 551, 560, 564, 570, 575, 609, 673, 681, 700, 755)), class = "data.frame", row.names = c(NA, -17L))
Since I just started getting familiar with forecasting, so I stumbled upon the example here based on which I have a few questions:
How can I forecast for the next 5 years?
What are the red and blue shaded areas around the forecast lines and what's the interpretation?
Why is there a break between the forecast lines and the historical lines?
What forecasting model does geom_forecast use?
lungDeaths data:
structure(c(2134, 1863, 1877, 1877, 1492, 1249, 1280, 1131, 1209,
1492, 1621, 1846, 2103, 2137, 2153, 1833, 1403, 1288, 1186, 1133,
1053, 1347, 1545, 2066, 2020, 2750, 2283, 1479, 1189, 1160, 1113,
970, 999, 1208, 1467, 2059, 2240, 1634, 1722, 1801, 1246, 1162,
1087, 1013, 959, 1179, 1229, 1655, 2019, 2284, 1942, 1423, 1340,
1187, 1098, 1004, 970, 1140, 1110, 1812, 2263, 1820, 1846, 1531,
1215, 1075, 1056, 975, 940, 1081, 1294, 1341, 901, 689, 827,
677, 522, 406, 441, 393, 387, 582, 578, 666, 830, 752, 785, 664,
467, 438, 421, 412, 343, 440, 531, 771, 767, 1141, 896, 532,
447, 420, 376, 330, 357, 445, 546, 764, 862, 660, 663, 643, 502,
392, 411, 348, 387, 385, 411, 638, 796, 853, 737, 546, 530, 446,
431, 362, 387, 430, 425, 679, 821, 785, 727, 612, 478, 429, 405,
379, 393, 411, 487, 574), .Dim = c(72L, 2L), .Dimnames = list(
NULL, c("mdeaths", "fdeaths")), .Tsp = c(1974, 1979.91666666667,
12), class = c("mts", "ts", "matrix"))
Code:
library(forecast)
# Data
lungDeaths = cbind(mdeaths, fdeaths)
# Plot
autoplot(lungDeaths) + geom_forecast()
Output:
To remove the gap you can use showgap:
If showgap=FALSE, the gap between the historical observations and the
forecasts is removed.
Code:
library(forecast)
autoplot(lungDeaths) +
geom_forecast(showgap = FALSE)
Output:
To forecast 5 years you can use h to set the number of forecasts:
autoplot(lungDeaths) +
geom_forecast(h = 60, showgap = FALSE)
Output:
To remove the confidence intervals use PI:
If FALSE, confidence intervals will not be plotted, giving only the
forecast line.
library(forecast)
autoplot(lungDeaths) +
geom_forecast(h = 60, showgap = FALSE, PI = FALSE)
Output:
I am trying to cut a dendrogram into three classes using the rect.hclust function, but when I export the graph, it cuts off the borders at the bottom of the graph. In addition, I would like to increase the weight of the borders, but I am not sure how to do this, as the lwd argument doesn't seem to exist for this function. What can I do to fix these parameters?
Data:
cluster <- data.frame(plot=c(1,
2,
3,
4,
5,
6,
7,
8,
9,
10,
11,
12,
13,
14,
15,
16,
17,
18,
19,
20),
meas1 = c(443,
836,
903,
684,
94,
125,
733,
846,
625,
234,
437,
775,
269,
774,
17,
502,
80,
51,
523,
229),
meas2 = c(735,
574,
793,
261,
961,
136,
404,
138,
45,
935,
698,
675,
594,
497,
152,
153,
30,
667,
547,
745),
meas3 = c(23,
526,
36,
93,
708,
970,
399,
111,
456,
439,
569,
503,
337,
213,
399,
850,
614,
491,
28,
452))
Code for hierarchical clustering and dendrogram generation:
#Generate distance matrix
dist_mat <- dist(cluster, method = 'euclidean')
#Hierarchical clustering
hclust_avg <- hclust(dist_mat, method = 'average')
#Cut into 3 classes
cut_avg <- cutree(hclust_avg, k = 3)
#Plot dendrogram
plot(hclust_avg, xlab = "", ylab = "Euclidean Distance", sub = "",
main = "Cluster Dendrogram", lwd = 2)
#Generate borders around each group
rect.hclust(hclust_avg, k=3, border = 2:6)
Exporting the graph looks as follows:
I need the red-green-blue borders to be thicker (higher border weights), and I also need them to not be cut off at the bottom of the graph.
I bypassed the problem of the missing lower edge of the rectangle by setting the figure margins to zero before drawing the rectangle.
The line width of the rectangles can be set by setting par(lwd), e.g. par(lwd=4), as in the example below:
cluster <- data.frame(plot=1:20,
meas1 = c(443,
836,
903,
684,
94,
125,
733,
846,
625,
234,
437,
775,
269,
774,
17,
502,
80,
51,
523,
229),
meas2 = c(735,
574,
793,
261,
961,
136,
404,
138,
45,
935,
698,
675,
594,
497,
152,
153,
30,
667,
547,
745),
meas3 = c(23,
526,
36,
93,
708,
970,
399,
111,
456,
439,
569,
503,
337,
213,
399,
850,
614,
491,
28,
452))
#Generate distance matrix
dist_mat <- dist(cluster, method = 'euclidean')
#Hierarchical clustering
hclust_avg <- hclust(dist_mat, method = 'average')
#Cut into 3 classes
cut_avg <- cutree(hclust_avg, k = 3)
pars <- par()
#Plot dendrogram
plot(hclust_avg, xlab = "", ylab = "Euclidean Distance", sub = "",
main = "Cluster Dendrogram", lwd = 2)
par(lwd=4, mar=c(0,0,0,0))
#Generate borders around each group
rect.hclust(hclust_avg, k=3, border = 2:6)
# reset par
par(lwd=pars$lwd, mar=pars$mar)
Created on 2020-06-30 by the reprex package (v0.3.0)
I have a dataset with y-axis = diversity indices and x-axis = depth. I am looking at how diversity changes with depth (increases/decreases). It is informative to visualize these changes over depth (so transforming isn't helpful), however it is difficult with the disparity between number of samples for different depths (more samples at shallower versus deeper depths. With the following code:
breaks_depth=c(0,50,100,150,250,350,450,500,1200)
ggplot(data=df, aes(x=Depth, y=Diversity)) +
geom_line()+
scale_y_continuous(breaks=seq(0,1400,200), limits=c(0,1400))+
scale_x_continuous(breaks=breaks_depth, limits=c(0,1200))
I get the following plot:
I would like to get a plot such that the distance between 500m and 1200m depth is smaller and the distance between the shallower depths (0-150m) is greater. Is this possible? I have tried expand and different break and limit variations. The dput() of this dataset can be found here. The rownames are the sample IDs and the columns I am using for the plot are: y-axis=invsimpson_rd, and x-axis=Depth_rd. TIA.
****EDIT*****
Winner code and plot modified from Calum's answer below.
ggplot(data=a_div, aes(x=Depth_rd, y=invsimpson_rd)) +
geom_line()+
scale_y_continuous(breaks=seq(0,1400,200), limits=c(0,1400))+
scale_x_continuous(trans="log10",breaks = c(0,
15,25,50,100,150,200,250,300,350,400,450, seq(600, 1200, by = 200)))
Here's an example with the built in economics dataset. You can see that you can specify the breaks however you want as per usual, but the "sqrt" transformation shifts the actual plotted values to have more space near the beginning of the series. You can use other built in transformations or define your own as well.
EDIT: updated with example data and some comparison of common different trans options.
library(tidyverse)
tbl <- structure(list(yval = c(742, 494, 919, 625, 124, 788, 583, 213, 715, 363, 15, 313, 472, 559, 314, 494, 388, 735, 242, 153, 884, 504, 267, 454, 325, 305, 746, 628, 549, 345, 327, 230, 271, 486, 971, 979, 857, 779, 394, 903, 585, 238, 702, 850, 611, 710, 694, 674, 1133, 468, 784, 634, 234, 61, 325, 505, 693, 1019, 766, 435, 407, 772, 925, 877, 187, 290, 782, 674, 1263, 1156, 935, 499, 791, 797, 537, 308, 761, 744, 674, 764, 560, 805, 540, 427, 711), xval = c(80, 350, 750, 100, 20, 200, 350, 50, 110, 20, 200, 350, 60, 100, 20, 40, 60, 100, 20, 40, 350, 50, 20, 40, 50, 30, 40, 260, 1000, 200, 200, 200, 500, 50, 350, 360, 380, 250, 60, 190, 40, 70, 70, 40, 40, 70, 180, 180, 440, 370, 130, 1200, 20, 20, 30, 80, 120, 200, 220, 120, 40, 80, 350, 750, 20, 80, 200, 320, 500, 220, 160, 80, 140, 350, 100, 40, 350, 100, 200, 340, 60, 40, 100, 60, 40)), .Names = c("yval", "xval"), row.names = c(NA, -85L), class = c("tbl_df", "tbl", "data.frame"))
ggplot(tbl) +
geom_line(aes(x = xval, y = yval)) +
scale_x_continuous(trans = "sqrt", breaks = c(0,50,100,150,250,350,450,500,1200))
ggplot(tbl) +
geom_line(aes(x = xval, y = yval)) +
scale_x_continuous(trans = "log10", breaks = c(0,50,100,150,250,350,450,500,1200))
Created on 2018-04-27 by the reprex package (v0.2.0).