I have a pie chart in a dashboard, but I want to plot the pie chart for the item selected in a combobox, using plotly.
My data:
State=c ('USA', 'Belgium', 'France','Russia')
totalcases= c(553, 226, 742,370)
totalrecovered=c(12,22,78,21)
totaldeath=c(48,24,12,22)
DTF = data.frame(State,totalcases,totalrecovered,totaldeath)
My code to plot one pie-chart:
labels=c("unrecovered","death","recovered")
USA=filter(DTF,State=="USA" )
USA=c(USA$totalcases,USA$totaldeath,USA$totalrecovered)
p1= plot_ly(labels = ~labels,
values = ~USA, type = 'pie',
marker = list(colors = brewer.pal(7,"Spectral")))
p1
Thanks.
The problem is that your dataset is in wide format, which makes it awkward to plot. Reshape it to long format first. Try this:
library(plotly)
library(RColorBrewer)
library(dplyr)
library(tidyr)
State=c ('USA', 'Belgium', 'France','Russia')
totalcases= c(553, 226, 742,370)
totalrecovered=c(12,22,78,21)
totaldeath=c(48,24,12,22)
DTF = data.frame(State,totalcases,totalrecovered,totaldeath)
dtf_long <- DTF %>%
pivot_longer(-State, names_to = "labels") %>%
mutate(labels = gsub("total", "", labels),
labels = ifelse(labels == "cases", "unrecovered", labels))
dtf_long
#> # A tibble: 12 x 3
#> State labels value
#> <fct> <chr> <dbl>
#> 1 USA unrecovered 553
#> 2 USA recovered 12
#> 3 USA death 48
#> 4 Belgium unrecovered 226
#> 5 Belgium recovered 22
#> 6 Belgium death 24
#> 7 France unrecovered 742
#> 8 France recovered 78
#> 9 France death 12
#> 10 Russia unrecovered 370
#> 11 Russia recovered 21
#> 12 Russia death 22
usa <- filter(dtf_long, State == "USA")
p1 <- usa %>%
plot_ly(labels = ~labels,
values = ~value, type = 'pie',
marker = list(colors = brewer.pal(7, "Spectral")))
p1
Created on 2020-04-04 by the reprex package (v0.3.0)
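To draw the chart for whichever state is picked in the combobox, the hard-coded "USA" filter can be turned into a small function. The following is only a sketch in base R: `pie_values_for` is a hypothetical helper name, and it assumes exactly one row per state, as in the sample data.

```r
# Same data as in the question
DTF <- data.frame(
  State          = c("USA", "Belgium", "France", "Russia"),
  totalcases     = c(553, 226, 742, 370),
  totalrecovered = c(12, 22, 78, 21),
  totaldeath     = c(48, 24, 12, 22),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns the labels and values for one selected state
pie_values_for <- function(data, selected_state) {
  row <- data[data$State == selected_state, ]
  stopifnot(nrow(row) == 1)  # assumes exactly one row per state
  data.frame(
    labels = c("unrecovered", "recovered", "death"),
    value  = c(row$totalcases, row$totalrecovered, row$totaldeath),
    stringsAsFactors = FALSE
  )
}

belgium <- pie_values_for(DTF, "Belgium")
belgium
```

In a Shiny dashboard, the second argument would presumably come from the combobox, for example `pie_values_for(DTF, input$state)` inside `renderPlotly()`, where `input$state` is an assumed input id. The result can then be passed to `plot_ly(labels = ~labels, values = ~value, type = 'pie')` as above.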
Let's say that you have a list in R named Weather which is structured like this:
- Winter
- Europe
- Rainfall
- Temperature
- Asia
- Rainfall
- Temperature
- Africa
- Rainfall
- Temperature
- Summer
- Europe
- Rainfall
- Temperature
- Asia
- Rainfall
- Temperature
- Africa
- Rainfall
- Temperature
Each of the objects stored in Rainfall or Temperature is a data frame which contains two variables: Date and Temp in case of Temperature and Date and Rain in the case of Rainfall.
I want to turn this list of lists into a data frame which has the following variables: Date, Temp, Rain, Continent, Season.
I don't understand how I can automatically add the ID variables to the lower levels of the list, pulling the values from the names of the upper levels. For now I have to do it manually like this:
Weather$Winter$Europe$Rainfall$Continent <- "Europe"
Weather$Winter$Europe$Temperature$Continent <- "Europe"
Weather$Winter$Asia$Rainfall$Continent <- "Asia"
Weather$Winter$Asia$Temperature$Continent <- "Asia"
Weather$Winter$Africa$Rainfall$Continent <- "Africa"
Weather$Winter$Africa$Temperature$Continent <- "Africa"
Weather$Summer$Europe$Rainfall$Continent <- "Europe"
Weather$Summer$Europe$Temperature$Continent <- "Europe"
Weather$Summer$Asia$Rainfall$Continent <- "Asia"
Weather$Summer$Asia$Temperature$Continent <- "Asia"
Weather$Summer$Africa$Rainfall$Continent <- "Africa"
Weather$Summer$Africa$Temperature$Continent <- "Africa"
Then I use map_depth and reduce to join the lists. After that, I have to repeat this process to manually add the Season variable, and only then am I able to reduce all the data frames into a single one.
I am looking for a method that would allow me to automatically create the ID variables (Continent, Season), which are actually pulled from list names, while reducing the list.
You can use this as a starting point:
library(tidyverse)
data <- list(
Winter = list(
Europe = list(
Rainfall = data.frame(Date = c("2022-06-01", "2022-06-02", "2022-06-03"), Rain = c(20, 10, 15)),
Temperature = data.frame(Date = c("2022-06-01", "2022-06-02", "2022-06-03"), Temp = c(0, 0.5, 0.8))
),
Asia = list(
Rainfall = data.frame(Date = c("2022-06-01", "2022-06-02", "2022-06-03"), Rain = c(30, 35, 34)),
Temperature = data.frame(Date = c("2022-06-01", "2022-06-02", "2022-06-03"), Temp = c(0, 0, 0.1))
)
)
)
data
#> $Winter
#> $Winter$Europe
#> $Winter$Europe$Rainfall
#> Date Rain
#> 1 2022-06-01 20
#> 2 2022-06-02 10
#> 3 2022-06-03 15
#>
#> $Winter$Europe$Temperature
#> Date Temp
#> 1 2022-06-01 0.0
#> 2 2022-06-02 0.5
#> 3 2022-06-03 0.8
#>
#>
#> $Winter$Asia
#> $Winter$Asia$Rainfall
#> Date Rain
#> 1 2022-06-01 30
#> 2 2022-06-02 35
#> 3 2022-06-03 34
#>
#> $Winter$Asia$Temperature
#> Date Temp
#> 1 2022-06-01 0.0
#> 2 2022-06-02 0.0
#> 3 2022-06-03 0.1
data %>%
rapply(as.list) %>%
enframe() %>%
separate(name, into = c("Season", "Continent", "table", "name"), sep = "[.]") %>%
mutate(name = name %>% str_remove("[0-9]+$")) %>%
select(-table) %>%
pivot_wider(values_fn = list) %>%
unnest(Date) %>%
unnest(Rain) %>%
unnest(Temp)
#> # A tibble: 108 × 5
#> Season Continent Date Rain Temp
#> <chr> <chr> <chr> <chr> <chr>
#> 1 Winter Europe 2022-06-01 20 0
#> 2 Winter Europe 2022-06-01 20 0.5
#> 3 Winter Europe 2022-06-01 20 0.8
#> 4 Winter Europe 2022-06-01 10 0
#> 5 Winter Europe 2022-06-01 10 0.5
#> 6 Winter Europe 2022-06-01 10 0.8
#> 7 Winter Europe 2022-06-01 15 0
#> 8 Winter Europe 2022-06-01 15 0.5
#> 9 Winter Europe 2022-06-01 15 0.8
#> 10 Winter Europe 2022-06-02 20 0
#> # … with 98 more rows
Created on 2022-06-28 by the reprex package (v2.0.0)
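Note that the three separate unnest() calls above cross every Date with every Rain and Temp value, which is why the result has 108 rows instead of 6. An alternative sketch, assuming purrr and dplyr are available, walks the list with imap(), which hands each element's name to the function, and joins Rainfall and Temperature on Date. The variable names `weather` and `weather_df` are chosen here for illustration.

```r
library(purrr)
library(dplyr)

# Same toy data as above
weather <- list(
  Winter = list(
    Europe = list(
      Rainfall = data.frame(Date = c("2022-06-01", "2022-06-02", "2022-06-03"), Rain = c(20, 10, 15)),
      Temperature = data.frame(Date = c("2022-06-01", "2022-06-02", "2022-06-03"), Temp = c(0, 0.5, 0.8))
    ),
    Asia = list(
      Rainfall = data.frame(Date = c("2022-06-01", "2022-06-02", "2022-06-03"), Rain = c(30, 35, 34)),
      Temperature = data.frame(Date = c("2022-06-01", "2022-06-02", "2022-06-03"), Temp = c(0, 0, 0.1))
    )
  )
)

# Outer imap(): one element per season; inner imap(): one element per continent
weather_df <- bind_rows(imap(weather, function(continents, season) {
  bind_rows(imap(continents, function(tables, continent) {
    # Join Rainfall and Temperature on Date, then tag rows with the list names
    reduce(tables, function(x, y) full_join(x, y, by = "Date")) %>%
      mutate(Continent = continent, Season = season)
  }))
}))

weather_df
```

This should give one row per Date, Continent and Season, with Rain and Temp side by side.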
There is an R package in development that I would like to use called streamstats. What it does is delineate a watershed (within the USA) for a latitude & longitude point along a body of water and provides watershed characteristics such as drainage area and proportions of various land covers. What I would like to do is extract some watershed characteristics of interest from a data frame of several lat & long positions.
I can get the package to do what I want for one point
devtools::install_github("markwh/streamstats")
library(streamstats)
setTimeout(120)
dat1 <- data.frame(matrix(ncol = 3, nrow = 3))
x <- c("state","lat","long")
colnames(dat1) <- x
dat1$state <- c("NJ","NY","VA")
dat1$lat <- c(40.99194,42.02458,38.04235)
dat1$long <- c(-74.28000,-75.11928,-79.88144)
test_dat <- dat1[1,]
ws1 <- delineateWatershed(xlocation = test_dat$long, ylocation = test_dat$lat, crs = 4326,
includeparameters = "true", includeflowtypes = "true")
chars1 <- computeChars(workspaceID = ws1$workspaceID, rcode = "MA")
chars1$parameters
However what I would like is to be able to give the delineateWatershed function several watersheds at once (i.e., all 3 locations found in dat1) and combine the chars1$parameters output variables DRNAREA,FOREST,LC11DEV, and LC11IMP into a data frame. Maybe this could be achieved with a for loop?
The ideal output would look like this
state lat long DRNAREA FOREST LC11DEV LC11IMP
1 NJ 40.99194 -74.28000 160 66.2 26.20 5.50
2 NY 42.02458 -75.11928 457 89.3 2.52 0.18
3 VA 38.04235 -79.88144 158 NA 4.63 0.20
I would put what you have in a function, then use purrr::pmap_df() to loop through each row in dat1 and bind all the results together.
library(dplyr)
library(purrr)
library(tidyr)
library(streamstats)
setTimeout(120)
dat1 <- data.frame(matrix(ncol = 3, nrow = 2))
colnames(dat1) <- c("state", "lat", "long")
dat1$state <- c("NJ", "NY")
dat1$lat <- c(40.99194, 42.02458)
dat1$long <- c(-74.28000, -75.11928)
dat1
#> state lat long
#> 1 NJ 40.99194 -74.28000
#> 2 NY 42.02458 -75.11928
Define a function for catchment delineation
catchment_delineation <- function(rcode_in, lat_y, long_x) {
print(paste0("Processing for lat = ", lat_y, " and long = ", long_x))
ws <- delineateWatershed(xlocation = long_x, ylocation = lat_y, crs = 4326,
includeparameters = "true", includeflowtypes = "true")
ws_properties <- computeChars(workspaceID = ws$workspaceID, rcode = rcode_in)
# keep only what we need
ws_properties_df <- ws_properties$parameters %>%
filter(code %in% c("DRNAREA", "FOREST", "LC11DEV", "LC11IMP")) %>%
mutate(ID = ws$workspaceID,
state = rcode_in,
long = long_x,
lat = lat_y)
return(ws_properties_df)
}
Apply the function to each row in dat1 data frame
catchment_df <- pmap_df(dat1, ~ catchment_delineation(..1, ..2, ..3))
#> https://streamstats.usgs.gov/streamstatsservices/watershed.geojson?rcode=NJ&xlocation=-74.28&ylocation=40.99194&includeparameters=true&includeflowtypes=true&includefeatures=true&crs=4326
#> https://streamstats.usgs.gov/streamstatsservices/parameters.json?rcode=NJ&workspaceID=NJ20210923064141811000&includeparameters=true
#> https://streamstats.usgs.gov/streamstatsservices/watershed.geojson?rcode=NY&xlocation=-75.11928&ylocation=42.02458&includeparameters=true&includeflowtypes=true&includefeatures=true&crs=4326
#> https://streamstats.usgs.gov/streamstatsservices/parameters.json?rcode=NY&workspaceID=NY20210923064248530000&includeparameters=true
catchment_df
#> ID name
#> 1 NJ20210923064141811000 Drainage Area
#> 2 NJ20210923064141811000 Percent Forest
#> 3 NJ20210923064141811000 Percent Developed from NLCD2011
#> 4 NJ20210923064141811000 Percent_Impervious_NLCD2011
#> 5 NY20210923064248530000 Drainage Area
#> 6 NY20210923064248530000 Percent Forest
#> 7 NY20210923064248530000 Percent Developed from NLCD2011
#> 8 NY20210923064248530000 Percent_Impervious_NLCD2011
#> description
#> 1 Area that drains to a point on a stream
#> 2 Percentage of area covered by forest
#> 3 Percentage of developed (urban) land from NLCD 2011 classes 21-24
#> 4 Average percentage of impervious area determined from NLCD 2011 impervious dataset
#> 5 Area that drains to a point on a stream
#> 6 Percentage of area covered by forest
#> 7 Percentage of developed (urban) land from NLCD 2011 classes 21-24
#> 8 Average percentage of impervious area determined from NLCD 2011 impervious dataset
#> code unit value state long lat
#> 1 DRNAREA square miles 160.00 NJ -74.28000 40.99194
#> 2 FOREST percent 66.20 NJ -74.28000 40.99194
#> 3 LC11DEV percent 26.20 NJ -74.28000 40.99194
#> 4 LC11IMP percent 5.50 NJ -74.28000 40.99194
#> 5 DRNAREA square miles 457.00 NY -75.11928 42.02458
#> 6 FOREST percent 89.30 NY -75.11928 42.02458
#> 7 LC11DEV percent 2.52 NY -75.11928 42.02458
#> 8 LC11IMP percent 0.18 NY -75.11928 42.02458
Reshape the result to desired format
catchment_reshape <- catchment_df %>%
select(state, long, lat, code, value) %>%
pivot_wider(names_from = code,
values_from = value)
catchment_reshape
#> # A tibble: 2 x 7
#> state long lat DRNAREA FOREST LC11DEV LC11IMP
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 NJ -74.3 41.0 160 66.2 26.2 5.5
#> 2 NY -75.1 42.0 457 89.3 2.52 0.18
Created on 2021-09-22 by the reprex package (v2.0.1)
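The row-wise pattern can be tried offline before calling the StreamStats service. The sketch below uses base R's Map() and do.call(rbind, ...) as a rough analog of pmap_df(). `fake_delineation` is a hypothetical stand-in that makes no web request and returns a placeholder value, not a real watershed characteristic.

```r
dat1 <- data.frame(
  state = c("NJ", "NY"),
  lat   = c(40.99194, 42.02458),
  long  = c(-74.28000, -75.11928),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for catchment_delineation(): same arguments,
# but no web request and only a placeholder result
fake_delineation <- function(rcode_in, lat_y, long_x) {
  data.frame(state = rcode_in, lat = lat_y, long = long_x,
             code = "PLACEHOLDER", value = NA_real_,
             stringsAsFactors = FALSE)
}

# Call the function once per row, then stack the per-row data frames
per_row <- Map(fake_delineation, dat1$state, dat1$lat, dat1$long)
result  <- do.call(rbind, per_row)
rownames(result) <- NULL

result
```

Once the stacking works as expected, the stand-in can be swapped for the real function.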
Since you mentioned the use of a for loop, I thought I would write a solution with one.
Here is your data:
library(dplyr)
library(purrr)
library(tidyr)
library(streamstats)
setTimeout(120)
dat1 <- data.frame(matrix(ncol = 3, nrow = 2))
colnames(dat1) <- c("state", "lat", "long")
dat1$state <- c("NJ", "NY")
dat1$lat <- c(40.99194, 42.02458)
dat1$long <- c(-74.28000, -75.11928)
dat1
Create an empty list to store the watershed characteristics:
water_shed <- list()
Loop through the dat1 and return the properties for each respective longitude and latitude:
for(i in 1:nrow(dat1)){
water_shed[[i]] <-
delineateWatershed(xlocation = dat1$long[i], ylocation = dat1$lat[i], crs = 4326,
includeparameters = "true", includeflowtypes = "true")
}
Now create a list to store the watershed properties:
ws_properties <- list()
Loop through the water_shed returning the parameters of each location:
for(i in 1:length(water_shed)){
ws_properties[[i]] <- computeChars(workspaceID = water_shed[[i]][[1]], rcode = dat1$state[i])
}
Finally, create a dataframe for your desired outputs then append the properties for each location looping through the list of watershed properties:
# empty data frame with the desired columns
ws_properties_df <- data.frame(state=character(),long=numeric(), lat=numeric(),
DRNAREA = numeric(), FOREST = numeric(), LC11DEV = numeric(), LC11IMP = numeric(),
stringsAsFactors=FALSE)
# append properties for each location
for(i in 1:length(ws_properties)){
ws_properties_df[i,] <- ws_properties[[i]]$parameters %>%
filter(code %in% c("DRNAREA", "FOREST", "LC11DEV", "LC11IMP")) %>%
mutate(state = dat1$state[i],
long = dat1$long[i],
lat = dat1$lat[i]) %>%
select(state, long, lat, code, value) %>%
pivot_wider(names_from = code,
values_from = value)
}
Desired Output:
df <- read.csv('https://raw.githubusercontent.com/ulklc/covid19-timeseries/master/countryReport/raw/rawReport.csv',
stringsAsFactors = FALSE)
df8 <- read.csv('https://raw.githubusercontent.com/hirenvadher954/Worldometers-Scraping/master/countries.csv',
stringsAsFactors = FALSE)
install.packages("tidyverse")
library(tidyverse)
df %>%
left_join(df8, by = c("countryName" = "country_name")) %>%
mutate(population = as.numeric(str_remove_all(population, ","))) %>%
group_by(countryName) %>%
group_by(countryName) %>%
unique() %>%
summarize(population = sum(population, na.rm = TRUE),
confirmed = sum(confirmed, na.rm = TRUE),
recovered = sum(recovered, na.rm = TRUE),
death = sum(death, na.rm = TRUE),
death_prop = paste0(as.character(death), "/", as.character(population))
)
This code calculates the death/population rate.
I want to find the 10 countries with the highest death/population rate.
The confirmed and recovered columns should not be included.
10 x 6
countryName population confirmed recovered death death_prop
<chr> <dbl> <int> <int> <int> <chr>
1 Afghanistan 4749258212 141652 16505 3796 3796/4749258212
2 Albania 351091234 37233 22518 1501 1501/351091234
3 Algeria 5349827368 206413 88323 20812 20812/5349827368
4 Andorra 9411324 38518 18054 2015 2015/9411324
5 Angola 4009685184 1620 435 115 115/4009685184
6 Anguilla 1814018 161 92 0 0/1814018
7 Antigua and Barbuda 11947338 1230 514 128 128/11947338
8 Argentina 5513884428 232975 66155 10740 10740/5513884428
9 Armenia 361515646 121702 46955 1626 1626/361515646
10 Aruba 13025452 5194 3135 91 91/13025452
The data above is only an example; the values are not correct.
The data is cumulative, meaning each day's values already include all the values up to the previous day. So take only the max value of each column per country and then calculate death_prop.
library(dplyr)
library(stringr)  # for str_remove_all()
df %>%
left_join(df8, by = c("countryName" = "country_name")) %>%
mutate(population = as.numeric(str_remove_all(population, ","))) %>%
group_by(countryName) %>%
summarise_at(vars(population:death), max, na.rm = TRUE) %>%
mutate(death_prop = death/population * 100) %>%
arrange(desc(death_prop))
# A tibble: 215 x 5
# countryName population year death death_prop
# <chr> <dbl> <dbl> <int> <dbl>
# 1 San Marino 33860 2019 42 0.124
# 2 Belgium 11589623 2020 9312 0.0803
# 3 Andorra 77142 2019 51 0.0661
# 4 Spain 46754778 2020 28752 0.0615
# 5 Italy 60461826 2020 32877 0.0544
# 6 United Kingdom 67886011 2020 36914 0.0544
# 7 France 65273511 2020 28432 0.0436
# 8 Sweden 10099265 2020 4029 0.0399
# 9 Sint Maarten 42388 2019 15 0.0354
#10 Netherlands 17134872 2020 5830 0.0340
# … with 205 more rows
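To see why max is the right summary for cumulative counts, here is a small sketch in base R. The numbers are made up for two hypothetical countries, and `toy`, `summed` and `latest` are names chosen for illustration.

```r
# Made-up cumulative death counts for two hypothetical countries
toy <- data.frame(
  countryName = c("A", "A", "A", "B", "B", "B"),
  day         = c(1, 2, 3, 1, 2, 3),
  death       = c(1, 3, 6, 0, 2, 2),
  population  = c(1000, 1000, 1000, 500, 500, 500),
  stringsAsFactors = FALSE
)

# Summing a cumulative series counts the earlier days again
summed <- aggregate(death ~ countryName, data = toy, FUN = sum)

# The maximum (latest) value is the actual total per country
latest <- aggregate(cbind(death, population) ~ countryName, data = toy, FUN = max)
latest$death_prop <- latest$death / latest$population * 100

# Highest rate first
latest <- latest[order(-latest$death_prop), ]
latest
```

Country A has 6 deaths in total, but summing its cumulative series gives 10. The same overcounting applies to population when it is repeated on every row.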
Here I have data that looks like this:
# Data
df <- data.frame("Hospital" = c("Buge Hospital", "Buge Hospital", "Greta Hospital", "Greta Hospital",
"Makor Hospital", "Makor Hospital"),
"Period" = c("Jul-18","Aug-18", "Jul-19","Aug-19", "Jul-20","Aug-20"),
"Medical admissions" = c(12,56,0,40,5,56),
"Surgical admissions" = c(10,2,0,50,20,56),
"Inpatient admissions" = c(9,5,6,0,60,96))
Now this data has a column called Period, which holds monthly data for different years: 2018, 2019 and 2020.
If I plot this data, here is how it looks:
library(ggplot2)
library(reshape2)  # provides melt()
# Melt data into long format
df2 <- melt(data = df,
id.vars = c("Hospital","Period"),
measure.vars = names(df[3:5]))
# Stacked barplot
ggplot( df2, aes(x = Period, y = value, fill = variable, group = variable)) +
geom_bar(stat = "identity") +
theme(legend.position = "none") +
ggtitle(unique(df2$Hospital))+
scale_x_date(date_labels = "%Y")+
labs(x = "Month", y = "Number of People", fill = "Type")
It plots well, but the x axis is not in ascending order. I have tried the scale_x_date function, but the plot is still the same. What I want is for the months of 2018 to come first, followed by the months of 2019 and then 2020. That is, the x axis should be in ascending order by year, like this:
Aug-18, Jul-18, Aug-19, Jul-19, Aug-20, Jul-20.
To solve your issue, you need to convert your Period in a date format.
For example, you can use parse_date function from lubridate package:
library(lubridate)
library(tidyr)
library(dplyr)
df %>% mutate(Date = parse_date(as.character(Period), format = "%b-%y")) %>%
pivot_longer(cols = Medical.admissions:Inpatient.admissions, names_to = "Var", values_to = "Val")
# A tibble: 18 x 5
Hospital Period Date Var Val
<fct> <fct> <date> <chr> <dbl>
1 Buge Hospital Jul-18 2018-07-01 Medical.admissions 12
2 Buge Hospital Jul-18 2018-07-01 Surgical.admissions 10
3 Buge Hospital Jul-18 2018-07-01 Inpatient.admissions 9
4 Buge Hospital Aug-18 2018-08-01 Medical.admissions 56
5 Buge Hospital Aug-18 2018-08-01 Surgical.admissions 2
6 Buge Hospital Aug-18 2018-08-01 Inpatient.admissions 5
7 Greta Hospital Jul-19 2019-07-01 Medical.admissions 0
8 Greta Hospital Jul-19 2019-07-01 Surgical.admissions 0
9 Greta Hospital Jul-19 2019-07-01 Inpatient.admissions 6
10 Greta Hospital Aug-19 2019-08-01 Medical.admissions 40
11 Greta Hospital Aug-19 2019-08-01 Surgical.admissions 50
12 Greta Hospital Aug-19 2019-08-01 Inpatient.admissions 0
13 Makor Hospital Jul-20 2020-07-01 Medical.admissions 5
14 Makor Hospital Jul-20 2020-07-01 Surgical.admissions 20
15 Makor Hospital Jul-20 2020-07-01 Inpatient.admissions 60
16 Makor Hospital Aug-20 2020-08-01 Medical.admissions 56
17 Makor Hospital Aug-20 2020-08-01 Surgical.admissions 56
18 Makor Hospital Aug-20 2020-08-01 Inpatient.admissions 96
So, then, you can use scale_x_date to set appropriate labeling option on your x axis:
library(lubridate)
library(tidyr)
library(dplyr)
library(ggplot2)
df %>% mutate(Date = parse_date(as.character(Period), format = "%b-%y")) %>%
pivot_longer(cols = Medical.admissions:Inpatient.admissions, names_to = "Var", values_to = "Val") %>%
ggplot(aes(x = Date, y = Val, fill= Var, group = Var))+
geom_col()+
scale_x_date(date_breaks = "month", date_labels = "%b %Y")+
labs(x = "Month", y = "Number of People", fill = "Type")+
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Does this answer your question?
EDIT: Using lubridate v1.7.8
On lubridate version 1.7.8, parse_date no longer exists. You will have to replace it with parse_date_time, as follows:
library(lubridate)
library(dplyr)
df %>% mutate(Date = as_date(parse_date_time(as.character(Period), orders = "my"))) %>% ....
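If you would rather keep the original "Jul-18" style labels on a discrete axis, another option is to turn Period into a factor whose levels are sorted chronologically. This is a sketch in base R that assumes English month abbreviations and years in the 2000s. The input is shuffled on purpose, and `period_ordered` is a name chosen for illustration.

```r
# Period labels, deliberately out of order
period <- c("Aug-20", "Jul-18", "Aug-19", "Jul-20", "Aug-18", "Jul-19")

# Split "Jul-18" into month abbreviation and two-digit year
parts     <- strsplit(period, "-", fixed = TRUE)
month_num <- match(vapply(parts, `[`, character(1), 1), month.abb)
year_num  <- 2000 + as.integer(vapply(parts, `[`, character(1), 2))

# First day of each month as a Date, used only for sorting
dates <- as.Date(ISOdate(year_num, month_num, 1))

# Factor levels follow the chronological order of the dates
period_ordered <- factor(period, levels = unique(period[order(dates)]))
levels(period_ordered)
```

Mapping `x = period_ordered` in ggplot() then places the bars in chronological order without needing scale_x_date.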
I have a pretty basic df in which I have calculated the rank-change of values between two timestamps:
value rank_A rank_B group
1 A 1 1 A
2 B 2 3 A
3 C 3 2 B
4 D 4 4 B
5 E 5 8 A
6 F 6 5 C
7 G 7 6 C
8 H 8 7 A
What makes it a bit tricky (for me) is plotting the values on the Y-axis.
ggplot(df_alluvial, aes(y = value, axis1 = rank_A, axis2 = rank_B))+
geom_alluvium(aes(fill = group), width = 1/12)+
...
As of now, I can plot the rank-change and the groups successfully, but they are not linked to my value-names - there are no axis names and I don't know how to add them.
In the end it should look similar to this:
https://www.reddit.com/r/GraphicalExcellence/comments/4imh5f/alluvial_diagram_population_size_and_rank_of_uk/
Thanks for your advice!
Your update made the question more clear to me.
The y parameter should be a numerical value, and the data should be in 'long' format. I'm not sure how to change your data to fulfill these requirements. Therefore, I create some new data in this example. I have tried to make the data similar to the data in the plot that you have linked to.
Labels and stratum refer to the city-names. You can use geom_text to label the strata.
# Load libraries
library(tidyverse)
library(ggalluvial)
# Create some data
df_alluvial <- tibble(
city = rep(c("London", "Birmingham", "Manchester"), 4),
year = rep(c(1901, 1911, 1921, 1931), each = 3),
size = c(0, 10, 100, 10, 15, 100, 15, 20, 100, 30, 25, 100))
# Notice the data is in long-format
df_alluvial
#> # A tibble: 12 x 3
#> city year size
#> <chr> <dbl> <dbl>
#> 1 London 1901 0
#> 2 Birmingham 1901 10
#> 3 Manchester 1901 100
#> 4 London 1911 10
#> 5 Birmingham 1911 15
#> 6 Manchester 1911 100
#> 7 London 1921 15
#> 8 Birmingham 1921 20
#> 9 Manchester 1921 100
#> 10 London 1931 30
#> 11 Birmingham 1931 25
#> 12 Manchester 1931 100
ggplot(df_alluvial,
aes(x = as.factor(year), stratum = city, alluvium = city,
y = size,
fill = city, label = city))+
geom_stratum(alpha = .5)+
geom_alluvium()+
geom_text(stat = "stratum", size = 3)
If you want to sort the cities based on their size, you can add decreasing = TRUE to all layers in the plot.
ggplot(df_alluvial,
aes(x = as.factor(year), stratum = city, alluvium = city,
y = size,
fill = city, label = city))+
geom_stratum(alpha = .5, decreasing = TRUE)+
geom_alluvium(decreasing = TRUE)+
geom_text(stat = "stratum", size = 3, decreasing = TRUE)
Created on 2019-11-08 by the reprex package (v0.3.0)
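As for getting the original rank data into long format, one possible sketch uses tidyr::pivot_longer(). The data frame is retyped from the question; `df_ranks`, `df_long` and the new column names `time` and `rank` are chosen here for illustration. Whether rank is a sensible y value for the alluvial plot is a separate question.

```r
library(tidyr)

# Data from the question
df_ranks <- data.frame(
  value  = c("A", "B", "C", "D", "E", "F", "G", "H"),
  rank_A = c(1, 2, 3, 4, 5, 6, 7, 8),
  rank_B = c(1, 3, 2, 4, 8, 5, 6, 7),
  group  = c("A", "A", "B", "B", "A", "C", "C", "A"),
  stringsAsFactors = FALSE
)

# One row per value and timestamp: "rank_A"/"rank_B" become time = "A"/"B"
df_long <- pivot_longer(
  df_ranks,
  cols         = c(rank_A, rank_B),
  names_to     = "time",
  names_prefix = "rank_",
  values_to    = "rank"
)

df_long
```

The result has the same shape as the example data above: `time` plays the role of year and `value` the role of city.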