Linking Shiny Reactive inputs and input updates - r

I have a dataset of baby names that are ranked by popularity for each year.
What I currently have: a simple shiny app that filters on year based on a slider and a select button that identifies which column is the rank column to use (which also creates a color highlight). This will be two datasets in actuality, one for gender marked as M or gender marked as F, but I've left it simple for the example here.
What I would like to do: update this to be reactive to the values of a slider, which then updates a select option for which file column to sort and highlight.
The current approach works for simplicity, but the focus selector for the year obviously throws an error if it's a value that no longer exists in the slider selected range.
I've dug around and tried a few approaches, but I just haven't been able to get the reactivity portion to work successfully. I'm sure I'm missing something elementary but hitting a wall. Thank you for any input.
Example:
library(shiny)
library(tidyverse)
library(DT)
#Fake Data
dat <- structure(list(Name = c("Bill", "Sean", "Kirby", "Philbert",
"Bob", "Lucius", "Fry", "Tyron", "Lionel", "Alister", "Newt",
"Craig", "A-Aron", "Bill", "Sean", "Kirby", "Philbert", "Bob",
"Lucius", "Fry", "Tyron", "Lionel", "Alister", "Newt", "Craig",
"A-Aron", "Bill", "Sean", "Kirby", "Philbert", "Bob", "Lucius",
"Fry", "Tyron", "Lionel", "Alister", "Newt", "Craig", "A-Aron"
), rank = c(8L, 1L, 2L, 3L, 4L, 6L, 5L, 9L, 7L, 25L, 10L, 35L,
99L, 4L, 1L, 3L, 2L, 5L, 6L, 7L, 11L, 5L, 12L, 8L, 9L, 10L, 4L,
2L, 3L, 10L, 8L, 11L, 5L, 6L, 12L, 7L, 13L, 9L, 1L), year = c(2008L,
2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L,
2008L, 2008L, 2008L, 2009L, 2009L, 2009L, 2009L, 2009L, 2009L,
2009L, 2009L, 2009L, 2009L, 2009L, 2009L, 2009L, 2010L, 2010L,
2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L,
2010L, 2010L)), class = "data.frame", row.names = c(NA, -39L))
#Get years
years <- unique(dat$year)
ui <- fluidPage(
titlePanel("Top Ten Male Baby Names"),
sliderInput("range",
label = "Choose year range",
min = min(as.numeric(years)),
max = max(as.numeric(years)),
sep = "",
value = c(max(as.numeric(years))-1,max(as.numeric(years)))
),
selectInput("year",
label = "Choose year for rank",
choices = as.numeric(years),
selected = max(as.numeric(years))
)
,
mainPanel(
dataTableOutput("DataTable")
)
)
server <- function(input, output) {
output$DataTable <- renderDataTable({
dat1 <- dat %>%
filter((year >= input$range[1] & year <= input$range[2]) ) %>%
pivot_wider(id_cols = Name,
values_from = rank,
names_from = year) %>%
filter(.[colnames(.) == as.character(input$year)] <11) %>%
arrange(.[colnames(.)== as.character(input$year)])
datatable(dat1,
options = list(ordering=F,
lengthChange = F,
pageLength = -1)) %>%
formatStyle(input$year,
backgroundColor = "lightgreen"
)
})
}
shinyApp(ui, server)

You could set up an observeEvent to watch for changes to the sliderInput, then update the selection if the selected year falls outside the slider range.
Note: you need to add the session parameter to the server function.
Also, since output$DataTable is filtered by both the range and the year, I've added a validate statement in case the user manually chooses a year that is not in the current range.
server <- function(input, output, session) {
# Observe for a change to slider input
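observeEvent(input$range, {
# selectInput returns a character value, so convert before comparing
sel = as.numeric(input$year)
# input$range is c(lower, upper), so compare against the bounds;
# %in% would only match the two endpoints
# update selection if original selected year is not in range
if(sel < input$range[1] || sel > input$range[2]) {
sel = min(input$range)
updateSelectInput(session, "year", selected = sel)
}
})
output$DataTable <- renderDataTable({
yr <- as.numeric(input$year)
validate(need(yr >= input$range[1] && yr <= input$range[2], "Current selection not in range"))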
dat1 <- dat %>%
filter((year >= input$range[1] & year <= input$range[2]) ) %>%
pivot_wider(id_cols = Name,
values_from = rank,
names_from = year) %>%
filter(.[colnames(.) == as.character(input$year)] <11) %>%
arrange(.[colnames(.)== as.character(input$year)])
datatable(dat1,
options = list(ordering=F,
lengthChange = F,
pageLength = -1)) %>%
formatStyle(input$year,
backgroundColor = "lightgreen"
)
})
}
shinyApp(ui, server)
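A variation on the same idea, sketched under the assumption that you would rather restrict the dropdown to the years inside the slider range than validate afterwards: recompute the choices whenever the slider moves. The helpers years_in_range() and pick_year() are hypothetical names introduced here for illustration; updateSelectInput() and its choices / selected arguments are part of shiny. The sketch also assumes at least one data year always falls inside the slider range.

```r
# Hypothetical helper: the years that fall inside the slider's c(lower, upper) value
years_in_range <- function(years, rng) {
  sort(years[years >= rng[1] & years <= rng[2]])
}

# Hypothetical helper: keep the current selection if it is still valid,
# otherwise fall back to the most recent year in range
pick_year <- function(current, valid) {
  current <- suppressWarnings(as.numeric(current))
  if (length(current) == 1 && !is.na(current) && current %in% valid) {
    current
  } else {
    max(valid)
  }
}

# `years` is the global vector from the question (unique(dat$year))
server <- function(input, output, session) {
  shiny::observeEvent(input$range, {
    valid <- years_in_range(years, input$range)
    shiny::updateSelectInput(session, "year",
                             choices = valid,
                             selected = pick_year(input$year, valid))
  })
  # output$DataTable <- renderDataTable({ ... }) as in the answer above
}
```

With the choices kept in sync like this, the user cannot pick a year outside the range, so the validate() call becomes a safety net rather than the main guard.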

Related

How to plot Unequal Interval on x axis of a time series data?

I am using plotrix package to visualize changes in the data using colors. The data is available here.
I am using below code for plotting the data.
library(plotrix)
my_colors1=c("red", "green","blue")
a<-read.csv("DataSt.csv")
x<-a$Year
y<-a$TP
clplot(x, y, main="",lwd=5,labels=y,levels=c(37,964,4377),col=my_colors1, showcuts=T, bty="n",xlab="Year", ylab = "numbers", axes=F)
axis(1, at = a$Year, las=2)
axis(2, at = seq(0, 4400, by = 100), las=2)
I am getting the above chart
I want to reduce the axis space between the year 1975 and 1989. Please help me to get unequal interval at the x axis.
It's a bit dangerous to do this, given that the viewer might not realize the spacing among the x-axis values is inconsistent. Nevertheless, the following example shows a possible solution that treats the x-values as factor levels. The problem is that the plotting function only allows numeric values. I therefore set up the plot with factors, and then use their numeric codes to draw interpolated values in between with segments:
a <- structure(list(Year = c(2021L, 2020L, 2019L, 2018L, 2017L, 2016L,
2015L, 2014L, 2013L, 2012L, 2011L, 2010L, 2009L, 2008L, 2007L,
2006L, 2005L, 2004L, 2003L, 2002L, 2001L, 2000L, 1999L, 1998L,
1997L, 1996L, 1995L, 1994L, 1993L, 1992L, 1991L, 1990L, 1989L,
1975L), TP = c(785L, 848L, 1067L, 1079L, 1263L, 678L, 1204L,
542L, 661L, 387L, 3534L, 4377L, 964L, 244L, 237L, 145L, 86L,
37L, 39L, 23L, 14L, 11L, 7L, 9L, 6L, 3L, 7L, 7L, 6L, 1L, 1L,
1L, 2L, 1L)), class = "data.frame", row.names = c(NA, -34L))
a$Year <- factor(a$Year)
a <- a[order(a$Year),]
head(a)
my_colors1=c("red", "green","blue")
plot(TP ~ Year, a, col = NA, border = NA, las = 2)
for(i in 2:nrow(a)){
b <- as.data.frame(approx(x = as.numeric(a$Year[(i-1):i]), y = a$TP[(i-1):i], n = 100))
b$col <- my_colors1[as.numeric(cut(b$y, breaks = c(-Inf,37,964,4377,Inf)))]
segments(x0 = b$x[-nrow(b)], x1 = b$x[-1], y0 = b$y[-nrow(b)], y1 = b$y[-1], col = b$col[-1])
}
abline(h = c(37,964), lty = 2)
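The core of the factor trick is that each year is drawn at its position in the sorted list of years rather than at its numeric value. A minimal sketch of just that step, using made-up toy values (yr and tp below are not the question's data) and plain base graphics instead of clplot:

```r
# Toy data (made up for illustration), with a gap between 1975 and 1989
yr <- c(1975, 1989, 1990, 1991, 1992)
tp <- c(1, 2, 1, 1, 1)

# Equally spaced positions 1..n, in year order
pos <- match(yr, sort(unique(yr)))

# Plot against the positions and suppress the default x axis
plot(pos, tp, type = "b", xaxt = "n", xlab = "Year", ylab = "numbers")

# Label each position with the year it stands for
axis(1, at = pos, labels = yr, las = 2)
```

Because 1975 and 1989 end up one unit apart, the gap disappears from the axis, which is exactly why the caveat about misleading the viewer applies.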

Numeric year but month as character. How to change months into numeric?

So imagine I have a dataset where the column "date" contains years 2011-2017 and months for each year, however months are written out in letters. For example:
date: 11-Jan
I would like to make the months numeric so I get:
date: 11-01
Any suggestions on how I can tackle this problem?
Kind regards!
Make your input proper dates, parse them, then format them.
x <- c("11-Jan", "12-Feb")
Sys.setlocale("LC_TIME", "C") #parsing of months depends on locale
format(
as.Date(paste0(x, "-1"), format = "%y-%b-%d"),
"%y-%m"
)
#[1] "11-01" "12-02"
See help("strptime") for details on format strings.
Assuming your data is like:
df1 <- structure(list(day_mon = c("16-Dec", "18-Nov", "12-Oct", "8-Oct",
"15-May", "29-Jun", "22-Feb", "25-May", "23-Jan", "24-Oct", "23-May",
"27-Sep", "9-Apr", "28-Oct", "18-Jan", "8-Apr", "7-Jan", "13-Dec",
"28-Nov", "24-May"), year = c(2012L, 2014L, 2011L, 2015L, 2015L,
2015L, 2011L, 2015L, 2012L, 2015L, 2011L, 2012L, 2014L, 2012L,
2013L, 2011L, 2017L, 2016L, 2014L, 2014L)),
row.names = c(
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L,
13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L), class = "data.frame")
You can:
# Format the month and day: mon_day_fmt => character vector ("MM-DD")
df1$mon_day_fmt <- paste(
  # month abbreviation -> two-digit month number
  sprintf("%02d", match(gsub("\\d+\\-(\\w+)", "\\1", df1$day_mon), month.abb)),
  # day -> two-digit day number
  sprintf("%02d", as.integer(gsub("^(\\d+)\\-\\w+$", "\\1", df1$day_mon))),
  sep = "-"
)
# Create a date vector: date => Date Vector
df1$date <- as.Date(
paste(
df1$year,
df1$mon_day_fmt,
sep = "-"
)
)
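If the values really are day-month strings with a separate year column, as assumed in df1 above, a shorter route is to let as.Date() parse the pieces directly. This is only a sketch on a small made-up data frame (df_demo is a hypothetical name), and like the first answer it depends on the locale for the month abbreviations:

```r
# Toy data in the same shape as df1 (values made up for illustration)
df_demo <- data.frame(
  day_mon = c("16-Dec", "8-Oct", "9-Apr"),
  year = c(2012L, 2015L, 2014L)
)

# Parsing of month abbreviations depends on the locale
invisible(Sys.setlocale("LC_TIME", "C"))

# Build "year-day-month" strings and parse them in one step
df_demo$date <- as.Date(
  paste(df_demo$year, df_demo$day_mon, sep = "-"),
  format = "%Y-%d-%b"
)

# Numeric year-month-day representation
df_demo$date_fmt <- format(df_demo$date, "%Y-%m-%d")
```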

How can I calculate the distance of a state within a cluster from the center of the cluster and visualize it on a graph?

I have a sample of 28 states. I want to plot them in one cluster, identify the center, and then calculate the distance of every state from the center, per year.
My input file resembles the following: first column: Country; second column: Year (from 2008 to 2017); third column: PI (index).
Question 1: I am getting the error:
Error in eval(e, x, parent.frame()) : object 'mydata.year' not found when I run: table_2008 = subset(table1, mydata.year ==2008)
Question 2: Which code is best suited to calculate the distance of a state from the center of the cluster?
Initially, I designed four aggregate indices to analyse progress in EU member states (economic, political, social, and institutional). I then clustered the state 6 and visualized them as per the image below (k.means and ggplot). I was calculating the distance between clusters, and the distance between states within a cluster by using withinss and totss.
However, I decided to analyse the states from another perspective. Instead of redesigning an index, I decided to use the Prosperity Index (an aggregate index covering the four issue areas: economic, political, social, and institutional). I therefore ended up with one variable. What I wish to do is treat the EU member states as one cluster (so I have one cluster of 28 states for every year from 2008 to 2017) and calculate the distance of every state to the center. I need a specific numeric value for every state for every year.
Lastly, I want to be able to visualize this in a graph similar to the one in the image attached. I will have ten graphs, one for every year (2008 to 2017). Each graph will contain all 28 states with a center.
My code is:
mydata = read.csv("C:/Users/TA/Desktop/R4./PI4.csv",sep = ",", header=TRUE)
mydata$Country
mydata$Category
mydata$PI
data_cluster = data.frame(mydata$Country,mydata$Category,mydata$PI)
write.csv(data_cluster,"C:/Users/TA/Desktop/R4./OutputPI.csv", row.names = FALSE)
table1 = data_cluster
#plot(uk_line[,4])
table1 = na.omit(table1)
within_results = ts(,start = c(2008), end = c(2017), frequency = 1)
within_resultsbetweenss = ts(,start = c(2008), end = c(2017), frequency = 1)
within_results_withinss = matrix(data= NA, nrow = 10, ncol = 4)
#nrow = years, ncols = number of clusters
#colnames(mydata, c("Country","Year"))
#YEAR 2008
#SELECTING A GIVEN YEAR (subset of rows such that year = 2008)
table_2008 = subset(table1, mydata.year ==2008)
table_2008
data2008_clus = table_2008[,3:ncol(table_2008)]
#NAMING THE ROWS USING THE COUNTRY NAMES
rownames(data2008_clus) = table_2008$mydata.Country
data2008_clus
plot(table_2008)
wss <- (nrow(data2008_clus)-1)*sum(apply(data2008_clus,2,var))
for (i in 2:15) wss[i] <- sum(kmeans(data2008_clus,
centers=i)$withinss)
plot(1:15, wss, type="b", xlab="Number of Clusters",
ylab="Within groups sum of squares")
# Compute k-means with k = 1
fit1=kmeans(x = data2008_clus,centers = 1)
fit1$cluster
fviz_cluster(fit1,data = data2008_clus)
fit1$withinss
fit1$totss
fit1$betweenss
table_2008$cluster = factor(fit1$cluster)
centers=as.data.frame(fit1$centers)
table_2008
within_results[1] = fit1$totss
within_resultsbetweenss[1] = fit1$betweenss
within_results_withinss[1,] = fit1$withinss
within_results_withinss[1,] = fit1$withinss
plot(within_results)
plot(within_resultsbetweenss)
plot(within_results_withinss)
# Print the results
print(km.res)
table_2008
As for my data set,
mydata_struct = structure(list(
Year = c(2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L,
2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L,
2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L,
2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L,
2009L),
Country = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L,
23L, 24L, 25L, 26L, 27L, 28L, 1L),
.Label = c("Austria", "Belgium", "Bulgaria", "Croatia", "Cyprus",
"Czechia", "Denmark", "Estonia", "Finland", "France", "Germany",
"Greece", "Hungary", "Ireland", "Italy", "Latvia", "Lithuania",
"Luxembourg", "Malta", "Netherlands", "Poland", "Portugal",
"Romania", "Slovakia", "Slovenia", "Spain", "Sweden",
"United Kingdom"), class = "factor"),
Prosperity.Index = c(79.4, 76.1, 62, 65.1, 69.9, 70.9, 83.2, 73.5,
81.2, 75.9, 79.9, 66, 66.7, 78.9, 69.6, 67.7, 66.6, 79.9, 73.4,
81.2, 66.9, 71, 62.6, 68.2, 72.7, 72.6, 82.8, 78, 79.4)),
row.names = c(NA, 29L), class = "data.frame")
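A possible starting point, offered as a sketch rather than a definitive answer. With a single variable and a single cluster, the k-means center is simply the mean of that variable, so the distance of each state to the center in a given year is the absolute difference between its index and that year's mean. The toy data frame below is made up, with column names following mydata_struct above. On Question 1, the error is consistent with data_cluster having been built from mydata$Country, mydata$Category and mydata$PI: data.frame() names those columns mydata.Country, mydata.Category and mydata.PI, so no column called mydata.year exists to subset on.

```r
# Toy data (values made up); column names follow mydata_struct
toy <- data.frame(
  Year = c(2008L, 2008L, 2008L, 2009L, 2009L),
  Country = c("Austria", "Belgium", "Bulgaria", "Austria", "Belgium"),
  Prosperity.Index = c(79, 76, 61, 80, 76)
)

# Center of the single cluster in each year = that year's mean index
toy$center <- ave(toy$Prosperity.Index, toy$Year, FUN = mean)

# Distance of every state to its year's center (one variable, so absolute difference)
toy$dist_to_center <- abs(toy$Prosperity.Index - toy$center)

# Selecting one year: refer to the column by the name it has in the data frame
toy_2008 <- subset(toy, Year == 2008)
```

For the plots, one panel per year with the states' index values and a horizontal line at center would show the same information; the choice of plotting package is left open here.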

How to use a string function argument to subset a data frame within that function

I would like to write a reusable function whose input parameters are the data frame and the names of the columns I want to subset it by.
The function is defined as below:
funct <- function(df, colnames){
df_subset = df[ , colnames]
return(df_subset) }
flights_subset <- funct(flights, c("MONTH","YEAR") ) #1st arg is a df, 2nd arg is a character vector of column names
To give more clarity, I have given code to create the input data frame 'flights' and the data frame 'flights_subset' that I expect the function to return.
flights <- structure(list(YEAR = c(2011L, 2011L, 2011L, 2011L, 2011L), MONTH = c(1L,
1L, 1L, 1L, 1L), DAYOFMONTH = 1:5, DAYOFWEEK = c(6L, 7L, 1L,
2L, 3L), DEPTIME = c(1400L, 1401L, 1352L, 1403L, 1405L)), .Names = c("YEAR",
"MONTH", "DAYOFMONTH", "DAYOFWEEK", "DEPTIME"), row.names = 5424:5428, class = "data.frame")
flights_subset <- structure(list(MONTH = c(1L, 1L, 1L, 1L, 1L), YEAR = c(2011L,
2011L, 2011L, 2011L, 2011L)), .Names = c("MONTH", "YEAR"), class = "data.frame", row.names = 5424:5428)
Your version is fine, you just need to fix the typo in function. Also, there's no need for a return statement.
funct <- function(df, colnames){
df[ , colnames]
}
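or you can use dplyr, passing the character vector of names through all_of() (select_() is deprecated in current dplyr)
library(dplyr)
funct <- function(df, colnames){
df %>% select(all_of(colnames))
}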
flights_subset <- funct(flights, c("MONTH", "YEAR"))
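One caveat with the base R version: when only a single column name is passed, df[ , colnames] returns a plain vector rather than a data frame. A small sketch of a variant that always returns a data frame, using a made-up flights_demo (a hypothetical stand-in for flights):

```r
# Toy stand-in for the flights data (values made up for illustration)
flights_demo <- data.frame(
  YEAR = c(2011L, 2011L),
  MONTH = c(1L, 1L),
  DEPTIME = c(1400L, 1401L)
)

# drop = FALSE keeps the result a data frame even for one column
funct <- function(df, colnames) {
  df[, colnames, drop = FALSE]
}

one_col <- funct(flights_demo, "MONTH")              # still a data frame
two_cols <- funct(flights_demo, c("MONTH", "YEAR"))  # columns in the requested order
```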

Group by and conditionally count

I am still learning data management in R. I know I am really close, but can't get the precise syntax. I have looked at
count a variable by using a condition in R
and
Conditional count and group by in R
but can't quite translate to my work. I am trying to get a count of dist.km that equal 0 by ST. Eventually I will want to add columns with counts of various distance ranges, but should be able to get it after getting this. The final table should have all states and a count of 0s. Here is a 20 row sample.
structure(list(ST = structure(c(12L, 15L, 13L, 10L, 15L, 16L,
11L, 12L, 8L, 14L, 10L, 14L, 6L, 11L, 5L, 5L, 15L, 1L, 6L, 4L
), .Label = c("CT", "DE", "FL", "GA", "MA", "MD", "ME", "NC",
"NH", "NJ", "NY", "PA", "RI", "SC", "VA", "VT", "WV"), class = "factor"),
Rfips = c(42107L, 51760L, 44001L, 34001L, 51061L, 50023L,
36029L, 42101L, 37019L, 45079L, 34029L, 45055L, 24003L, 36027L,
25009L, 25009L, 51760L, 9003L, 24027L, 1111L), zip = c(17972L,
23226L, 2806L, 8330L, 20118L, 5681L, 14072L, 19115L, 28451L,
29206L, 8741L, 29020L, 20776L, 12545L, 1922L, 1938L, 23226L,
6089L, 21042L, 36278L), Year = c(2010L, 2005L, 2010L, 2008L,
2007L, 2006L, 2005L, 2008L, 2009L, 2008L, 2010L, 2006L, 2007L,
2008L, 2011L, 2011L, 2008L, 2005L, 2008L, 2009L), dist.km = c(0,
42.4689368078209, 28.1123394088972, 36.8547005648639, 0,
49.7276501081775, 0, 30.1937156926235, 0, 0, 31.5643658415831,
0, 0, 0, 0, 0, 138.854136893762, 0, 79.4320981205195, 47.1692144550079
)), .Names = c("ST", "Rfips", "zip", "Year", "dist.km"), row.names = c(132931L,
105670L, 123332L, 21361L, 51576L, 3520L, 47367L, 99962L, 18289L,
126153L, 19321L, 83224L, 6041L, 46117L, 49294L, 48951L, 109350L,
64465L, 80164L, 22687L), class = "data.frame")
Here are a couple chunks of code I have tried.
state= DDcomplete %>%
group_by(ST) %>%
summarize(zero = sum(DDcomplete$dist.km==0, na.rm = TRUE))
state= aggregate(dist.km ~ ST, function(x) sum(dist.km==0, data=DDcomplete))
state = (DDcomplete[DDcomplete$dist.km==0,], .(ST), function(x) nrow(x))
If you want to add it as a column you can do:
DDcomplete %>% group_by(ST) %>% mutate(count = sum(dist.km == 0))
Or if you just want the counts per state:
DDcomplete %>% group_by(ST) %>% summarise(count = sum(dist.km == 0))
Actually, you were very close to the solution. Your code
state= DDcomplete %>%
group_by(ST) %>%
summarize(zero = sum(DDcomplete$dist.km==0, na.rm = TRUE))
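is almost correct. You need to remove the DDcomplete$ from within the call to sum: within dplyr chains you can access variables directly, and dist.km then refers to the current group's values, whereas DDcomplete$dist.km always refers to the whole column, so every state would get the same overall count.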
Also note that by using summarise, you will condense your data frame to 1 row per group with only the grouping column(s) and whatever you computed inside the summarise. If you just want to add a column with the counts, you can use mutate as I did in my answer.
If you're only interested in positive counts, you could also use dplyr's count function together with filter to first subset the data:
filter(DDcomplete, dist.km == 0) %>% count(ST)
I hope I'm not missing something, but it sounds like you just want table after doing some subsetting:
table(df[df$dist.km == 0, "ST"])
#
# CT DE FL GA MA MD ME NC NH NJ NY PA RI SC VA VT WV
# 1 0 0 0 2 1 0 1 0 0 2 1 0 2 1 0 0
Other approaches might be:
## dplyr, since you seem to be using it
library(dplyr)
df %>%
filter(dist.km == 0) %>%
group_by(ST) %>%
summarise(n())
## aggregate, since you tried that too
aggregate(dist.km ~ ST, df, function(x) sum(x == 0))
## data.table
library(data.table)
as.data.table(df)[dist.km == 0, .N, by = ST]
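For the follow-up mentioned in the question (counts for several distance ranges per state), one base R possibility is to bin the distances with cut() and cross-tabulate. This is a sketch on made-up data; the band edges and labels below are arbitrary assumptions, and distances are assumed to be non-negative so the first band contains only exact zeros:

```r
# Toy data (made up); same column names as the question's sample
demo <- data.frame(
  ST = c("PA", "VA", "PA", "NJ", "VA"),
  dist.km = c(0, 42.5, 30.2, 0, 0)
)

# Bin distances: (-Inf, 0], (0, 25], (25, 50], (50, Inf]
demo$band <- cut(
  demo$dist.km,
  breaks = c(-Inf, 0, 25, 50, Inf),
  labels = c("0", "0-25", "25-50", "50+")
)

# One row per state, one column per distance band
counts <- table(demo$ST, demo$band)
```

as.data.frame.matrix(counts) turns the result into a regular data frame with one column per band, if that is the shape needed downstream.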
