How can I delineate multiple watersheds in R using the streamstats package?

There is an R package in development called streamstats that I would like to use. It delineates a watershed (within the USA) for a latitude/longitude point along a body of water and returns watershed characteristics such as drainage area and the proportions of various land covers. What I would like to do is extract a few watershed characteristics of interest for each row of a data frame of several lat/long positions.
I can get the package to do what I want for a single point:
devtools::install_github("markwh/streamstats")
library(streamstats)
setTimeout(120)
dat1 <- data.frame(matrix(ncol = 3, nrow = 3))
x <- c("state","lat","long")
colnames(dat1) <- x
dat1$state <- c("NJ","NY","VA")
dat1$lat <- c(40.99194,42.02458,38.04235)
dat1$long <- c(-74.28000,-75.11928,-79.88144)
test_dat <- dat1[1,]
ws1 <- delineateWatershed(xlocation = test_dat$long, ylocation = test_dat$lat, crs = 4326,
                          includeparameters = "true", includeflowtypes = "true")
chars1 <- computeChars(workspaceID = ws1$workspaceID, rcode = test_dat$state)
chars1$parameters
However, I would like to run delineateWatershed for several locations at once (i.e., all 3 locations in dat1) and combine the chars1$parameters output variables DRNAREA, FOREST, LC11DEV, and LC11IMP into a single data frame. Maybe this could be achieved with a for loop?
The ideal output would look like this:
state lat long DRNAREA FOREST LC11DEV LC11IMP
1 NJ 40.99194 -74.28000 160 66.2 26.20 5.50
2 NY 42.02458 -75.11928 457 89.3 2.52 0.18
3 VA 38.04235 -79.88144 158 NA 4.63 0.20

I would put what you have in a function, then use purrr::pmap_df() to loop through each row in dat1 and bind all the results together.
library(dplyr)
library(purrr)
library(tidyr)
library(streamstats)
setTimeout(120)
dat1 <- data.frame(matrix(ncol = 3, nrow = 2))
colnames(dat1) <- c("state", "lat", "long")
dat1$state <- c("NJ", "NY")
dat1$lat <- c(40.99194, 42.02458)
dat1$long <- c(-74.28000, -75.11928)
dat1
#> state lat long
#> 1 NJ 40.99194 -74.28000
#> 2 NY 42.02458 -75.11928
Define a function for catchment delineation:
catchment_delineation <- function(rcode_in, lat_y, long_x) {
  print(paste0("Processing for lat = ", lat_y, " and long = ", long_x))
  ws <- delineateWatershed(xlocation = long_x, ylocation = lat_y, crs = 4326,
                           includeparameters = "true", includeflowtypes = "true")
  ws_properties <- computeChars(workspaceID = ws$workspaceID, rcode = rcode_in)
  # keep only what we need
  ws_properties_df <- ws_properties$parameters %>%
    filter(code %in% c("DRNAREA", "FOREST", "LC11DEV", "LC11IMP")) %>%
    mutate(ID = ws$workspaceID,
           state = rcode_in,
           long = long_x,
           lat = lat_y)
  return(ws_properties_df)
}
Apply the function to each row in the dat1 data frame:
catchment_df <- pmap_df(dat1, ~ catchment_delineation(..1, ..2, ..3))
#> https://streamstats.usgs.gov/streamstatsservices/watershed.geojson?rcode=NJ&xlocation=-74.28&ylocation=40.99194&includeparameters=true&includeflowtypes=true&includefeatures=true&crs=4326https://streamstats.usgs.gov/streamstatsservices/parameters.json?rcode=NJ&workspaceID=NJ20210923064141811000&includeparameters=truehttps://streamstats.usgs.gov/streamstatsservices/watershed.geojson?rcode=NY&xlocation=-75.11928&ylocation=42.02458&includeparameters=true&includeflowtypes=true&includefeatures=true&crs=4326https://streamstats.usgs.gov/streamstatsservices/parameters.json?rcode=NY&workspaceID=NY20210923064248530000&includeparameters=true
catchment_df
#> ID name
#> 1 NJ20210923064141811000 Drainage Area
#> 2 NJ20210923064141811000 Percent Forest
#> 3 NJ20210923064141811000 Percent Developed from NLCD2011
#> 4 NJ20210923064141811000 Percent_Impervious_NLCD2011
#> 5 NY20210923064248530000 Drainage Area
#> 6 NY20210923064248530000 Percent Forest
#> 7 NY20210923064248530000 Percent Developed from NLCD2011
#> 8 NY20210923064248530000 Percent_Impervious_NLCD2011
#> description
#> 1 Area that drains to a point on a stream
#> 2 Percentage of area covered by forest
#> 3 Percentage of developed (urban) land from NLCD 2011 classes 21-24
#> 4 Average percentage of impervious area determined from NLCD 2011 impervious dataset
#> 5 Area that drains to a point on a stream
#> 6 Percentage of area covered by forest
#> 7 Percentage of developed (urban) land from NLCD 2011 classes 21-24
#> 8 Average percentage of impervious area determined from NLCD 2011 impervious dataset
#> code unit value state long lat
#> 1 DRNAREA square miles 160.00 NJ -74.28000 40.99194
#> 2 FOREST percent 66.20 NJ -74.28000 40.99194
#> 3 LC11DEV percent 26.20 NJ -74.28000 40.99194
#> 4 LC11IMP percent 5.50 NJ -74.28000 40.99194
#> 5 DRNAREA square miles 457.00 NY -75.11928 42.02458
#> 6 FOREST percent 89.30 NY -75.11928 42.02458
#> 7 LC11DEV percent 2.52 NY -75.11928 42.02458
#> 8 LC11IMP percent 0.18 NY -75.11928 42.02458
Reshape the result to the desired format:
catchment_reshape <- catchment_df %>%
select(state, long, lat, code, value) %>%
pivot_wider(names_from = code,
values_from = value)
catchment_reshape
#> # A tibble: 2 x 7
#> state long lat DRNAREA FOREST LC11DEV LC11IMP
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 NJ -74.3 41.0 160 66.2 26.2 5.5
#> 2 NY -75.1 42.0 457 89.3 2.52 0.18
Created on 2021-09-22 by the reprex package (v2.0.1)
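One possible addition: if the StreamStats API occasionally times out or errors for a point, you could wrap the function with purrr::possibly() so a failed location is skipped instead of aborting the whole run. A sketch (not tested against the live service):
# failed calls return NULL, which pmap_df()/bind_rows() simply drops
safe_delineation <- possibly(catchment_delineation, otherwise = NULL)
catchment_df <- pmap_df(dat1, ~ safe_delineation(..1, ..2, ..3))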

Since you mentioned the use of a for loop, I thought why not make a solution out of it.
Here is your data:
library(dplyr)
library(purrr)
library(tidyr)
library(streamstats)
setTimeout(120)
dat1 <- data.frame(matrix(ncol = 3, nrow = 2))
colnames(dat1) <- c("state", "lat", "long")
dat1$state <- c("NJ", "NY")
dat1$lat <- c(40.99194, 42.02458)
dat1$long <- c(-74.28000, -75.11928)
dat1
Create an empty list to store the watershed characteristics:
water_shed <- list()
Loop through dat1 and delineate the watershed for each longitude/latitude pair:
for (i in 1:nrow(dat1)) {
  water_shed[[i]] <- delineateWatershed(xlocation = dat1$long[i], ylocation = dat1$lat[i], crs = 4326,
                                        includeparameters = "true", includeflowtypes = "true")
}
Now create a list to store the watershed properties:
ws_properties <- list()
Loop through water_shed, returning the parameters for each location:
for (i in seq_along(water_shed)) {
  ws_properties[[i]] <- computeChars(workspaceID = water_shed[[i]]$workspaceID, rcode = dat1$state[i])
}
Finally, create a data frame for the desired outputs, then fill in the properties for each location by looping through the list of watershed properties:
# data frame to hold the results (lat/long and parameter values are doubles, not integers):
ws_properties_df <- data.frame(state = character(), long = numeric(), lat = numeric(),
                               DRNAREA = numeric(), FOREST = numeric(), LC11DEV = numeric(), LC11IMP = numeric(),
                               stringsAsFactors = FALSE)
# append properties for each location
for (i in seq_along(ws_properties)) {
  ws_properties_df[i, ] <- ws_properties[[i]]$parameters %>%
    filter(code %in% c("DRNAREA", "FOREST", "LC11DEV", "LC11IMP")) %>%
    mutate(state = dat1$state[i],
           long = dat1$long[i],
           lat = dat1$lat[i]) %>%
    select(state, long, lat, code, value) %>%
    pivot_wider(names_from = code,
                values_from = value)
}
Desired output: a data frame with one row per location, in the format requested in the question.

Related

R: Comparing Subgroups From Different Datasets

I am working with the R programming language.
I have the following dataset that contains the heights (cm) and weights (kg) of people from Canada. I split weight into 100 quantile bins and, within each bin, calculated the min/max weight, the count, and the average height:
library(dplyr)
library(gtools)
set.seed(123)
canada = data.frame(height = rnorm(10000,150,10), weight = rnorm(10000,90, 10))
Part_1 = canada %>%
mutate(quants = quantcut(weight, 100),
rank = as.numeric(quants)) %>%
group_by(quants) %>%
mutate(min = min(weight), max = max(weight), count = n(), avg_height = mean(height))
Part_1 = Part_1 %>% distinct(rank, .keep_all = TRUE)
> Part_1
# A tibble: 100 x 8
# Groups: quants [100]
height weight quants rank min max count avg_height
<dbl> <dbl> <fct> <dbl> <dbl> <dbl> <int> <dbl>
1 144. 114. (110.2,113.9] 99 110. 114. 100 150.
2 148. 88.3 (88.12,88.38] 44 88.1 88.4 100 149.
3 166. 99.3 (99.1,99.52] 83 99.1 99.5 100 152.
4 151. 84.3 (84.14,84.44] 29 84.1 84.4 100 150.
For example, I see that there are 100 people in the weight range of 110.2-113.9 kg and the average height of these people is 150 cm.
Now, suppose I have a similar dataset for people from the USA:
set.seed(124)
usa = data.frame(height = rnorm(10000,150,10), weight = rnorm(10000,90, 10))
My question: based on the weight ranges I calculated using the Canada dataset, I want to find out how many people from the USA fall within each Canadian range and what the average height of those Americans is.
For example:
In the Canada dataset, I saw that there are 100 people in the weight range of 110.2-113.9 kg and the average height of these people is 150 cm.
How many Americans are in the weight range of 110.2-113.9 kg, and what is the average height of these Americans?
I know that I can do this manually for each rank:
americans_in_canadian_rank99 = usa %>%
filter(weight > 110.2 & weight < 113.9) %>%
group_by() %>%
summarize(count = n(), avg_height = mean(height))
americans_in_canadian_rank44 = usa %>%
filter(weight > 88.1 & weight < 88.4) %>%
group_by() %>%
summarize(count = n(), avg_height = mean(height))
In the end, I am looking for a desired output like this:
# number of rows should be = number of unique ranks
canadian_rank min_weight max_weight canadian_count canadian_avg_height american_count american_avg_height
1 99 110.2 113.9 100 150 116 150
2 44 88.1 88.4 100 149 154 150
Can someone please help me figure out a better way to do this?
Thanks!
Note: updated to match the desired output format combining the two data sets.
This can be done in a straightforward manner using the non-equijoin functionality of data.table.
library(data.table)
library(gtools)
set.seed(123)
canada = data.table(height = rnorm(10000,150,10), weight = rnorm(10000,90, 10))
set.seed(124)
usa = data.table(height = rnorm(10000,150,10), weight = rnorm(10000,90, 10))
## You can also use data.table to generate your Part_1 summary table
Part_1 <- canada[, .(min = min(weight),
                     max = max(weight),
                     count = .N,
                     avg_height = mean(height)), keyby = .(quants = quantcut(weight, 100))]
Part_1[, rank := as.numeric(quants)]

## Join using a non-equi join to combine data sets
usa[Part_1, on = .(weight >= min,
                   weight < max)
    ## On the join result, compute the same summary stats by quants & rank
][, .(usa_count = .N,
      usa_avg_height = mean(height)), keyby = .(rank,
                                                quants,
                                                ## whenever we do a non-equijoin, the foreign key values, in this case min/max,
                                                ## overwrite the local keys. Since we used weight twice, the Canadian min/max
                                                ## will show up in the join result table as weight and weight.1
                                                min_weight = weight,
                                                max_weight = weight.1,
                                                ## To keep both sets of results distinct, we can rename columns in our "by" statement
                                                canadian_count = count,
                                                canadian_avg_height = avg_height)]
Gives results as follows:
rank quants min_weight max_weight canadian_count canadian_avg_height usa_count usa_avg_height
1: 1 [55.11,66.71] 55.11266 66.69011 100 149.2101 114 149.8116
2: 2 (66.71,69.48] 66.70575 69.46055 100 149.0639 119 148.6486
3: 3 (69.48,71.15] 69.48011 71.13895 100 150.5331 94 148.4336
4: 4 (71.15,72.44] 71.14747 72.43042 100 150.4779 104 149.8926
Another option would be to assign the usa result columns directly back to your Part_1 summary table in place.
## This is a two-part nested join
Part_1[
  ## Start by creating a result that matches Part_1 ranks to all usa data
  Part_1[usa, on = .(min <= weight,
                     max > weight)
  ## Compute aggregated results on the join table result
  ][, .(usa_count = .N,
        usa_avg_height = mean(height)), by = .(rank)],
  ## Finally, assign results back to the Part_1 summary table joined by rank
  c("usa_count",
    "usa_avg_height") := .(usa_count,
                           usa_avg_height), on = .(rank)]
Gives the following
quants min max count avg_height rank usa_count usa_avg_height
1: [55.11,66.71] 55.11266 66.69011 100 149.2101 1 114 149.8116
2: (66.71,69.48] 66.70575 69.46055 100 149.0639 2 119 148.6486
3: (69.48,71.15] 69.48011 71.13895 100 150.5331 3 94 148.4336
4: (71.15,72.44] 71.14747 72.43042 100 150.4779 4 104 149.8926
With data.table you can do this:
library(data.table)
library(stringr)
dt1 <- as.data.table(usa)
# duplicate weight into min/max columns so foverlaps can treat each point as a zero-width interval
dt1 <- dt1[, c("min", "max") := weight]
# split the quants factor labels into numeric min/max bounds
dt2 <- as.data.table(Part_1 %>% select("quants", "rank"))
dt2 <- cbind(dt2[, .(rank)],
             setDT(tstrsplit(str_sub(dt2$quants, 2, -2), ",", fixed = TRUE, names = c("min", "max"))))
dt2 <- dt2[, lapply(.SD, as.numeric)]
setkey(dt2, min, max)
# overlap join: assign each US weight the Canadian rank whose interval contains it
dt1 <- dt1[, rank := dt2$rank[foverlaps(dt1, dt2, by.x = c("min", "max"), by.y = c("min", "max"), which = TRUE)$yid]] %>%
  select(-c("min", "max"))
EDIT
Totally missed the last part. But if you wish to do that, it should be relatively straightforward from the last point (you could use dplyr for that if you wish):
dt3 <- rbind(canada %>%
               mutate(quants = quantcut(weight, 100),
                      rank = as.numeric(quants),
                      country = "Canada") %>%
               as.data.table(),
             copy(dt1)[, country := "USA"], fill = TRUE)
dt3 <- dt3[, .(count = .N, avg_height = mean(height)), by = c("rank", "country")] %>%
  dcast(rank ~ country, value.var = c("count", "avg_height")) %>%
  merge(dt2 %>% rename("min_weight" = "min", "max_weight" = "max"), by = c("rank"), all.x = TRUE)
EDIT 2
Alternatively, you could do something similar using the cut function, without needing data.table at all:
rank_breaks <- Part_1 %>%
  mutate(breaks = sub(",.*", "", str_sub(quants, 2)) %>% as.numeric()) %>%
  arrange(rank) %>%
  pull(breaks)
# Here I change minimum and maximum of groups 1 and 100 to -Inf and Inf respectively.
# If you do not wish to do so, you can disregard it and run `rank_breaks <- c(rank_breaks, max(canada$weight))` instead
rank_breaks[1] <- -Inf
rank_breaks <- c(rank_breaks, Inf)
usa <- usa %>%
  mutate(rank = cut(weight, breaks = rank_breaks, labels = c(1:100)))
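From there, the American counts and average heights per Canadian rank can be summarised with dplyr (a sketch using the rank column created above):
usa %>%
  group_by(rank) %>%
  summarize(american_count = n(), american_avg_height = mean(height))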
You can use fuzzyjoin for this.
library(fuzzyjoin)
# take percentile ranges and join US data
us_by_canadian_quantiles <- Part_1 |>
ungroup() |>
distinct(rank, min, max, height_avg_can = avg_height) |>
fuzzy_full_join(usa, by = c(min = "weight", max = "weight"), match_fun = c(`<`, `>=`))
# get count and height average per bin
us_by_canadian_quantiles |>
group_by(rank) |>
summarize(n_us = n(),
height_avg_us = mean(height),
height_avg_can = first(height_avg_can)
)
#> # A tibble: 101 × 4
#> rank n_us height_avg_us height_avg_can
#> <dbl> <int> <dbl> <dbl>
#> 1 1 114 150. 149.
#> 2 2 119 149. 149.
#> 3 3 94 148. 151.
#> 4 4 104 150. 150.
#> 5 5 115 152. 150.
#> 6 6 88 150. 149.
#> 7 7 86 150. 150.
#> 8 8 86 150. 151.
#> 9 9 102 151. 151.
#> 10 10 81 152. 150.
#> # … with 91 more rows
Note that there are a number of cases in the US frame which fall outside of the Canadian percentile ranges. They are grouped together here with rank being NA, but you could also add ranks 0 and 101 if you wanted to distinguish them.
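If you do want to distinguish them, a minimal sketch (assuming the joined frame keeps the US weight column, as fuzzy_full_join does here) could recode the NA ranks like so:
us_by_canadian_quantiles <- us_by_canadian_quantiles |>
  mutate(rank = case_when(
    !is.na(rank) ~ rank,
    weight <= min(Part_1$min) ~ 0,    # below the lowest Canadian bin
    weight > max(Part_1$max) ~ 101    # above the highest Canadian bin
  ))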
I should note that fuzzyjoin tends to be much slower than data.table. But since you have already gotten a data.table solution, this might be more to your liking.

Calculating the distance from all locations to the first location, by group

I have GPS locations from several seabird tracks, each starting from colony x. Therefore the individual tracks all have similar first locations. For each track, I would like to calculate the beeline distance between each GPS location and either (a) a specified location that represents the location of colony x, or (b) the first GPS point of a given track which represents the location of colony x. For (b), I would look to use the first location of each new track ID (track_id).
I have looked for appropriate functions in geosphere, sp, raster, adehabitatLT, move, ... and just cannot seem to find what I am looking for.
I can calculate the distance between successive GPS points, but that is not what I need.
library(dplyr)
library(geosphere)
df %>%
  group_by(ID) %>%
  mutate(lat_prev = lag(Lat, 1), lon_prev = lag(Lon, 1)) %>%
  mutate(dist = distVincentyEllipsoid(matrix(c(lon_prev, lat_prev), ncol = 2), # or use distHaversine
                                      matrix(c(Lon, Lat), ncol = 2)))
#example data:
df <- data.frame(Lon = c(-96.8, -96.60861, -96.86875, -96.14351, -92.82518, -90.86053, -90.14208, -84.64081, -83.7, -82, -80, -88.52732, -94.46049,-94.30, -88.60, -80.50, -81.70, -83.90, -84.60, -90.10, -90.80, -92.70, -96.10, -96.55, -96.50, -96.00),
Lat = c(25.38657, 25.90644, 26.57339, 27.63348, 29.03572, 28.16380, 28.21235, 26.71302, 25.12554, 24.50031, 24.89052, 30.16034, 29.34550, 29.34550, 30.16034, 24.89052, 24.50031, 25.12554, 26.71302, 28.21235, 28.16380, 29.03572, 27.63348, 26.57339, 25.80000, 25.30000),
ID = c(rep("ID1", 13), rep("ID2", 13)))
Grateful for any pointers.
You were pretty close. The key is that you want to calculate the distance from the first observation in each track. Therefore you need to first add the observation order within each track (easy to do with dplyr::row_number()). Then, for the distance calculation, make the reference observation always the first one by subsetting with order == 1.
library(tidyverse)
library(geosphere)
df <- data.frame(Lon = c(-96.8, -96.60861, -96.86875, -96.14351, -92.82518, -90.86053, -90.14208, -84.64081, -83.7, -82, -80, -88.52732, -94.46049,-94.30, -88.60, -80.50, -81.70, -83.90, -84.60, -90.10, -90.80, -92.70, -96.10, -96.55, -96.50, -96.00),
Lat = c(25.38657, 25.90644, 26.57339, 27.63348, 29.03572, 28.16380, 28.21235, 26.71302, 25.12554, 24.50031, 24.89052, 30.16034, 29.34550, 29.34550, 30.16034, 24.89052, 24.50031, 25.12554, 26.71302, 28.21235, 28.16380, 29.03572, 27.63348, 26.57339, 25.80000, 25.30000),
ID = c(rep("ID1", 13), rep("ID2", 13)))
df %>%
  group_by(ID) %>%
  mutate(order = row_number()) %>%
  mutate(dist = distVincentyEllipsoid(matrix(c(Lon[order == 1], Lat[order == 1]), ncol = 2),
                                      matrix(c(Lon, Lat), ncol = 2)))
#> # A tibble: 26 x 5
#> # Groups: ID [2]
#> Lon Lat ID order dist
#> <dbl> <dbl> <chr> <int> <dbl>
#> 1 -96.8 25.4 ID1 1 0
#> 2 -96.6 25.9 ID1 2 60714.
#> 3 -96.9 26.6 ID1 3 131665.
#> 4 -96.1 27.6 ID1 4 257404.
#> 5 -92.8 29.0 ID1 5 564320.
#> 6 -90.9 28.2 ID1 6 665898.
#> 7 -90.1 28.2 ID1 7 732131.
#> 8 -84.6 26.7 ID1 8 1225193.
#> 9 -83.7 25.1 ID1 9 1319482.
#> 10 -82 24.5 ID1 10 1497199.
#> # ... with 16 more rows
Created on 2022-01-09 by the reprex package (v2.0.1)
This also seems to work (sent to me by a friend) - very similar to Dan's suggestion above, but slightly different
library(geosphere)
library(dplyr)
df %>%
group_by(ID) %>%
mutate(Dist_to_col = distHaversine(c(Lon[1], Lat[1]),cbind(Lon,Lat)))
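For option (a) in the question, i.e. the distance to a fixed, known colony location rather than to the first point of each track, a similar sketch (the colony coordinates below are placeholders) would be:
library(geosphere)
library(dplyr)
colony_lon <- -96.8      # placeholder longitude of colony x
colony_lat <- 25.38657   # placeholder latitude of colony x
df %>%
  group_by(ID) %>%
  mutate(dist_to_colony = distHaversine(c(colony_lon, colony_lat), cbind(Lon, Lat)))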

Translating a `for loop` into purrr using the `map` function in R

I need to download weather data from NASA’s POWER (Prediction Of Worldwide Energy Resource) project. The nasapower package was developed for POWER data retrieval in R. I need to download data for many locations (lat/long coordinates). To do this, I tried a simple loop with three locations as a reproducible example.
library(nasapower)
data1 <- read.csv(text = "
location,long,lat
loc1, -56.547, -14.2427
loc2, -57.547, -15.2427
loc3, -58.547, -16.2427")
all.weather <- data.frame()
for (i in seq_len(nrow(data1))) {
  weather.data <- get_power(community = "AG",
                            lonlat = c(data1$long[i], data1$lat[i]),
                            dates = c("2015-01-01", "2015-01-10"),
                            temporal_average = "DAILY",
                            pars = c("T2M_MAX"))
  all.weather <- rbind(all.weather, weather.data)
}
This works perfectly. The problem is that I am trying to mimic this using purrr::map, since I want an alternative within the tidyverse. This is what I did, but it does not work:
library(dplyr)
library(purrr)
all.weather <- data1 %>%
group_by(location) %>%
map(get_power(community = "AG",
lonlat = c(long, lat),
dates = c("2015-01-01", "2015-01-10"),
temporal_average = "DAILY",
site_elevation = NULL,
pars = c("T2M_MAX")))
I got the following error:
Error in isFALSE(length(lonlat != 2)) : object 'long' not found
Any hint on how to run this using purrr?
To make your code work, use purrr::pmap instead of map, like so:
map is for one-argument functions, map2 for two-argument functions, and pmap is the most general, allowing functions with more than two arguments.
pmap will loop over the rows of your df. As your df has 3 columns, 3 arguments are passed to the function, even if the first argument, location, is not used. To make this work and to make use of the column names, you have to specify the function and its argument names via function(location, long, lat).
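As a minimal, self-contained sketch of how pmap matches data-frame columns to argument names (toy data and a toy function, nothing to do with the NASA call):
library(purrr)
toy <- data.frame(location = c("a", "b"), long = c(1, 2), lat = c(10, 20))
# each row becomes one call: f(location = ..., long = ..., lat = ...)
pmap(toy, function(location, long, lat) paste0(location, ": ", long, "/", lat))
#> expected: a list containing "a: 1/10" and "b: 2/20"
The full solution for your case then looks like this: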
library(nasapower)
data1 <- read.csv(text = "
location,long,lat
loc1, -56.547, -14.2427
loc2, -57.547, -15.2427
loc3, -58.547, -16.2427")
library(dplyr)
library(purrr)
all.weather <- data1 %>%
  pmap(function(location, long, lat) get_power(community = "AG",
                                               lonlat = c(long, lat),
                                               dates = c("2015-01-01", "2015-01-10"),
                                               temporal_average = "DAILY",
                                               site_elevation = NULL,
                                               pars = c("T2M_MAX"))) %>%
  # Name list with locations
  setNames(data1$location) %>%
  # Add location names as identifiers
  bind_rows(.id = "location")
head(all.weather)
#> NASA/POWER SRB/FLASHFlux/MERRA2/GEOS 5.12.4 (FP-IT) 0.5 x 0.5 Degree Daily Averaged Data
#> Dates (month/day/year): 01/01/2015 through 01/10/2015
#> Location: Latitude -14.2427 Longitude -56.547
#> Elevation from MERRA-2: Average for 1/2x1/2 degree lat/lon region = 379.25 meters Site = na
#> Climate zone: na (reference Briggs et al: http://www.energycodes.gov)
#> Value for missing model data cannot be computed or out of model availability range: NA
#>
#> Parameters:
#> T2M_MAX MERRA2 1/2x1/2 Maximum Temperature at 2 Meters (C)
#>
#> # A tibble: 6 x 9
#> location LON LAT YEAR MM DD DOY YYYYMMDD T2M_MAX
#> <chr> <dbl> <dbl> <dbl> <int> <int> <int> <date> <dbl>
#> 1 loc1 -56.5 -14.2 2015 1 1 1 2015-01-01 29.9
#> 2 loc1 -56.5 -14.2 2015 1 2 2 2015-01-02 30.1
#> 3 loc1 -56.5 -14.2 2015 1 3 3 2015-01-03 27.3
#> 4 loc1 -56.5 -14.2 2015 1 4 4 2015-01-04 28.7
#> 5 loc1 -56.5 -14.2 2015 1 5 5 2015-01-05 30
#> 6 loc1 -56.5 -14.2 2015 1 6 6 2015-01-06 28.7

pie chart for selected combobox item

I want a chart like this (image omitted).
I plot a pie chart in a dashboard, but I want to plot the pie chart for the item selected in a combobox, using plotly.
My data:
State=c ('USA', 'Belgium', 'France','Russia')
totalcases= c(553, 226, 742,370)
totalrecovered=c(12,22,78,21)
totaldeath=c(48,24,12,22)
DTF = data.frame(State,totalcases,totalrecovered,totaldeath)
My code to plot one pie-chart:
library(plotly)
library(RColorBrewer)
library(dplyr)
labels = c("unrecovered", "death", "recovered")
USA = filter(DTF, DTF$State == "USA")
USA = c(USA$totalcases, USA$totaldeath, USA$totalrecovered)
p1 = plot_ly(labels = ~labels,
             values = ~USA, type = 'pie',
             marker = list(colors = brewer.pal(7, "Spectral")))
p1
Thanks.
The problem is: your dataset is a total mess. ;) Try this:
library(plotly)
library(RColorBrewer)
library(dplyr)
library(tidyr)
State=c ('USA', 'Belgium', 'France','Russia')
totalcases= c(553, 226, 742,370)
totalrecovered=c(12,22,78,21)
totaldeath=c(48,24,12,22)
DTF = data.frame(State,totalcases,totalrecovered,totaldeath)
dtf_long <- DTF %>%
pivot_longer(-State, names_to = "labels") %>%
mutate(labels = gsub("total", "", labels),
labels = ifelse(labels == "cases", "unrecovered", labels))
dtf_long
#> # A tibble: 12 x 3
#> State labels value
#> <fct> <chr> <dbl>
#> 1 USA unrecovered 553
#> 2 USA recovered 12
#> 3 USA death 48
#> 4 Belgium unrecovered 226
#> 5 Belgium recovered 22
#> 6 Belgium death 24
#> 7 France unrecovered 742
#> 8 France recovered 78
#> 9 France death 12
#> 10 Russia unrecovered 370
#> 11 Russia recovered 21
#> 12 Russia death 22
usa <- filter(dtf_long, State == "USA")
p1 <- usa %>%
plot_ly(labels = ~labels,
values = ~value, type = 'pie',
marker = list(colors = brewer.pal(7, "Spectral")))
p1
Created on 2020-04-04 by the reprex package (v0.3.0)
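To drive the chart from a combobox, a minimal shiny sketch built on the long-format dtf_long above could look like this (the input/output IDs are made up for illustration):
library(shiny)
library(plotly)
library(dplyr)

ui <- fluidPage(
  # the "combobox": a dropdown of the available states
  selectInput("state", "Choose a state:", choices = unique(dtf_long$State)),
  plotlyOutput("pie")
)

server <- function(input, output, session) {
  output$pie <- renderPlotly({
    dtf_long %>%
      filter(State == input$state) %>%
      plot_ly(labels = ~labels, values = ~value, type = "pie",
              marker = list(colors = RColorBrewer::brewer.pal(7, "Spectral")))
  })
}

shinyApp(ui, server)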

How to scrape text from an HTML body

I've never scraped. Would it be straightforward to scrape only the text in the main, big gray box from the link below (starting with the header SRUS43 KMSR 271039 and ending with .END)? My end goal is to have three tidy columns of data from all that text: the five-digit codes, the values in inches, and the basin elevation descriptions, so any pointers on processing the text format are welcome too.
https://www.nohrsc.noaa.gov/shef_archive/index.html?rfc=cnrfc&product=swe&year=2019&month=3&day=27&hour=6
Thank you for any help.
Reading in the text is fairly easy (see @DiceBoyT's answer). Cleaning up the format into three columns is a bit more involved. The code below could use some clean-up (especially the regex), but it gets the job done:
library(tidyverse)
library(rvest)
text <- read_html("https://www.nohrsc.noaa.gov/shef_archive/index.html?rfc=cnrfc&product=swe&year=2019&month=3&day=27&hour=6") %>%
html_node(".notes") %>%
html_text()
df <- tibble(txt = read_lines(text))
df %>%
  mutate(
    row = row_number(),
    with_code = str_extract(txt, "^[A-z0-9]{5}\\s+\\d+(\\.)?\\d"),
    wo_code = str_extract(txt, "^:?\\s+\\d+(\\.)?\\d") %>% str_extract("[:digit:]+\\.?[:digit:]"),
    basin_desc = if_else(!is.na(with_code), lag(txt, 1), NA_character_) %>% str_sub(start = 2)
  ) %>%
  separate(with_code, c("code", "val"), sep = "\\s+") %>%
  mutate(
    combined_val = case_when(
      !is.na(val) ~ val,
      !is.na(wo_code) ~ wo_code,
      TRUE ~ NA_character_
    ) %>% as.numeric
  ) %>%
  filter(!is.na(combined_val)) %>%
  mutate(
    code = zoo::na.locf(code),
    basin_desc = zoo::na.locf(basin_desc)
  ) %>%
  select(
    code, combined_val, basin_desc
  )
#> # A tibble: 643 x 3
#> code combined_val basin_desc
#> <chr> <dbl> <chr>
#> 1 ACSC1 0 San Antonio Ck - Sunol
#> 2 ADLC1 0 Arroyo De La Laguna
#> 3 ADOC1 0 Santa Ana R - Prado Dam
#> 4 AHOC1 0 Arroyo Honda nr San Jose
#> 5 AKYC1 41 SF American nr Kyburz
#> 6 AKYC1 3.2 SF American nr Kyburz
#> 7 AKYC1 42.2 SF American nr Kyburz
#> 8 ALQC1 0 Alamo Canal nr Pleasanton
#> 9 ALRC1 0 Alamitos Ck - Almaden Res
#> 10 ANDC1 0 Coyote Ck - Anderson Res
#> # ... with 633 more rows
Created on 2019-03-27 by the reprex package (v0.2.1)
This is pretty straightforward to scrape with rvest:
library(rvest)
text <- read_html("https://www.nohrsc.noaa.gov/shef_archive/index.html?rfc=cnrfc&product=swe&year=2019&month=3&day=27&hour=6") %>%
html_node(".notes") %>%
html_text()
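To check that the right block was captured, you can peek at the start of the scraped text, for example:
# show the first few hundred characters of the captured block
cat(substr(text, 1, 300))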
