How to use loop to generate the data in a table in Shiny? - r

I started learning Shiny a few days ago, and I have been stuck on this problem for a long time.
I need to generate a two-column table whose values are calculated from the input (I can then use this table to draw a scatter plot with ggplot()).
To keep the code readable, I want to use a for loop instead of potentially hundreds of lines of highly repetitive code. Otherwise it would look like (input$meansy1 - 1)^2, (input$meansy1 - 2)^2, ..., (input$meansy1 - 100)^2.
I don't know why the result of the loop can't be used correctly in data.frame().
This is part of the code:
shinyUI(fluidPage(
numericInput("y1", "y1:", sample(1:100,1), min = 1, max = 100),
tableOutput("tb")
))
shinyServer(function(input, output,session) {
list <-c()
for (i in 1:100) {
local({
list[[i]] <-reactive(((input$y1)-i)^2)}
)}
dt = data.frame(y_roof = 1:100, B=list)
output$tb <- renderTable({
dt
})
})

When developing a feature for a Shiny app, it makes sense to look at the underlying operation separately from the Shiny context. That way you can figure out whether you have a Shiny-specific issue or not.
Let's look at the operation you want to do first: subtracting each of the values 1 to 100 from x and squaring the result.
You can do this in base R, like this:
x <- 1
dt1 <- data.frame(y_roof = 1:100)
(x - dt1$y_roof)^2
#> [1] 0 1 4 9 16 25 36 49 64 81 100 121 144 169 196
#> [16] 225 256 289 324 361 400 441 484 529 576 625 676 729 784 841
#> [31] 900 961 1024 1089 1156 1225 1296 1369 1444 1521 1600 1681 1764 1849 1936
#> [46] 2025 2116 2209 2304 2401 2500 2601 2704 2809 2916 3025 3136 3249 3364 3481
#> [61] 3600 3721 3844 3969 4096 4225 4356 4489 4624 4761 4900 5041 5184 5329 5476
#> [76] 5625 5776 5929 6084 6241 6400 6561 6724 6889 7056 7225 7396 7569 7744 7921
#> [91] 8100 8281 8464 8649 8836 9025 9216 9409 9604 9801
To store the results in a dataframe change the last line to:
dt1$col2 <- (x - dt1$y_roof)^2
head(dt1)
#> y_roof col2
#> 1 1 0
#> 2 2 1
#> 3 3 4
#> 4 4 9
#> 5 5 16
#> 6 6 25
Doing the same in the tidyverse would look like this:
library(dplyr)
dt2 <-
data.frame(y_roof = 1:100) %>%
mutate(col2 = (x - y_roof)^2)
head(dt2)
#> y_roof col2
#> 1 1 0
#> 2 2 1
#> 3 3 4
#> 4 4 9
#> 5 5 16
#> 6 6 25
Now we can work this into the shiny app:
library(shiny)
library(dplyr)
ui <-
shinyUI(fluidPage(
numericInput("y1", "y1:", sample(1:100, 1), min = 1, max = 100),
tableOutput("tb")
))
server <-
shinyServer(function(input, output, session) {
output$tb <- renderTable({
data.frame(y_roof = 1:100) %>%
mutate(col2 = (input$y1 - y_roof) ^ 2)
})
})
shinyApp(ui, server, options = list(launch.browser = TRUE))
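As for why the original loop did not work: each reactive() call creates a reactive expression (an object that has to be called inside a reactive context), not a number, and an assignment made inside local() does not modify the list variable outside of it. A minimal sketch of the "test it outside Shiny first" idea is to move the calculation into a plain function. The name squared_diffs and its n argument are assumptions made for this illustration, not part of the original answer.

```r
# Hypothetical helper: builds the two-column table for a single input value.
# It has no Shiny dependency, so it can be checked at the console.
squared_diffs <- function(x, n = 100) {
  y_roof <- seq_len(n)                      # 1, 2, ..., n
  data.frame(y_roof = y_roof,
             col2   = (x - y_roof)^2)       # vectorised, no loop needed
}

# Quick check outside the app
head(squared_diffs(1), 3)
```

Inside the app, the body of renderTable() could then be reduced to squared_diffs(input$y1), which keeps the reactive part down to one line.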

Related

str_match based on vector with count issue

I haven't got a reprex, but my data are stored in a CSV file:
https://transcode.geo.data.gouv.fr/services/5e2a1fbefa4268bc25628f27/feature-types/drac:site?format=CSV&projection=WGS84
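library(readr)
library(dplyr)   # %>%, group_by(), summarise()
library(tidyr)   # drop_na()
library(stringr) # str_match()
bzh_sites <- read_csv("site.csv")
I want to count rows based on which pattern the NATURE column matches: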
pattern<-c("allée|aqueduc|architecture|atelier|bas|carrière|caveau|chapelle|château|chemin|cimetière|coffre|dépôt|dolmen|eau|église|enceinte|enclos|éperon|espace|exploitation|fanum|ferme|funéraire|groupe|habitat|maison|manoir|menhir|monastère|motte|nécropole|occupation|organisation|parcellaire|pêcherie|prieuré|production|rue|sépulture|stèle|thermes|traitement|tumulus|villa")
test2 <- bzh_sites %>%
drop_na(NATURE) %>%
group_by(NATURE = str_match( NATURE, pattern )) %>%
summarise(n = n())
gives me :
NATURE n
1 allée 176
2 aqueduc 73
3 architecture 68
4 atelier 200
And another test on the same data (NATURE), with a shorter pattern,
pattern <- c("allée|aqueduc|architecture|atelier")
test2 <- bzh_sites %>%
drop_na(NATURE) %>%
group_by(NATURE = str_match( NATURE, pattern )) %>%
summarise(n = n())
gives me :
NATURE n
1 allée 178
2 aqueduc 74
3 architecture 79
4 atelier 248
I have no idea why the counts differ.
I tried to find out where the discrepancy comes from for the first group, i.e. "allée". This is what I found:
library(stringr)
pattern1<-c("allée|aqueduc|architecture|atelier|bas|carrière|caveau|chapelle|château|chemin|cimetière|coffre|dépôt|dolmen|eau|église|enceinte|enclos|éperon|espace|exploitation|fanum|ferme|funéraire|groupe|habitat|maison|manoir|menhir|monastère|motte|nécropole|occupation|organisation|parcellaire|pêcherie|prieuré|production|rue|sépulture|stèle|thermes|traitement|tumulus|villa")
#Get indices where 'allée' is found using pattern1
ind1 <- which(str_match(bzh_sites$NATURE, pattern1 )[, 1] == 'allée')
pattern2 <- c("allée|aqueduc|architecture|atelier")
#Get indices where 'allée' is found using pattern2
ind2 <- which(str_match(bzh_sites$NATURE, pattern2)[, 1] == 'allée')
#Indices which are present in ind2 but absent in ind1
setdiff(ind2, ind1)
#[1] 3093 10400
#Get corresponding text
temp <- bzh_sites$NATURE[setdiff(ind2, ind1)]
temp
#[1] "dolmen allée couverte" "coffre funéraire allée couverte"
Here is what happens when we use pattern1 and pattern2 on temp:
str_match(temp, pattern1)
# [,1]
#[1,] "dolmen"
#[2,] "coffre"
str_match(temp, pattern2)
# [,1]
#[1,] "allée"
#[2,] "allée"
As we can see, with pattern1 these values are classified in another group ("dolmen" and "coffre"), because those words occur earlier in the string than "allée". Hence the mismatch.
A similar explanation can be given for the mismatches in the other groups.
str_match only returns the first match. To get all the matches of the pattern, we can use str_match_all:
table(unlist(str_match_all(bzh_sites$NATURE, pattern1)))
# allée aqueduc architecture atelier bas
# 178 76 79 252 62
# carrière caveau chapelle château chemin
# 46 35 226 205 350
# cimetière coffre dépôt dolmen eau
# 275 155 450 542 114
# église enceinte enclos éperon espace
# 360 655 338 114 102
#exploitation fanum ferme funéraire groupe
# 1856 38 196 1256 295
# habitat maison manoir menhir monastère
# 1154 65 161 1036 31
# motte nécropole occupation organisation parcellaire
# 566 312 5152 50 492
# pêcherie prieuré production rue sépulture
# 69 66 334 44 152
# stèle thermes traitement tumulus villa
# 651 50 119 1232 225
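The first-match behaviour can be reproduced without the full dataset. The sketch below reuses the two strings found above. The shortened alternations short_pat and long_pat are assumptions chosen to keep the example small, not the original patterns.

```r
library(stringr)

temp <- c("dolmen allée couverte", "coffre funéraire allée couverte")

# Illustrative patterns (shortened versions of pattern2 and pattern1)
short_pat <- "allée|aqueduc|architecture|atelier"
long_pat  <- "allée|coffre|dolmen|funéraire"

# str_match() reports only the leftmost match in each string
first_short <- str_match(temp, short_pat)[, 1]
first_long  <- str_match(temp, long_pat)[, 1]

# str_match_all() reports every match, so nothing is hidden by an earlier word
all_long <- unlist(str_match_all(temp, long_pat))

first_short
first_long
table(all_long)
```

With the longer pattern, "allée" is no longer the leftmost match in either string, so a count based on str_match() moves both rows to other groups, while the count based on str_match_all() still sees both occurrences.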

Creating data continuously using rnorm until an outlier occurs in R

Sorry for the confusing title, but I wasn't sure how to describe what I am trying to do. My objective is to create a dataset of 1000 observations, each of which is the length of a run. I have created a phase1 dataset, from which a set of control limits is produced. What I am trying to do now is create a phase2 dataset, most likely using rnorm. I want a repeat loop that keeps generating values for the phase2 dataset until one of those values falls outside the control limits produced from the phase1 dataset. For example, if I had 3.0 and -3.0 as control limits, the phase2 dataset would accumulate observations until, say, observation 398, where the value happens to be 3.45, which stops the creation of data. My objective is then to record the number 398.
Furthermore, I then want to loop back to the phase1 dataset / control limits portion, create a new set of control limits, and run another phase2, until I have 1000 run lengths recorded. The code I have for the phase1 / control limits part works fine and looks like this:
nphase1=50
nphase2=1000
varcount=1
meanshift= 0
sigmashift= 1
##### phase1 dataset/ control limits #####
phase1 <- matrix(rnorm(nphase1*varcount, 0, 1), nrow = nphase1, ncol=varcount)
mean_var <- apply(phase1, 2, mean)
std_var <- apply(phase1, 2, sd)
df_var <- data.frame(mean_var, std_var)
Upper_SPC_Limit_Method1 <- with(df_var, mean_var + 3 * std_var)
Lower_SPC_Limit_Method1 <- with(df_var, mean_var - 3 * std_var)
df_control_limits<- data.frame(Upper_SPC_Limit_Method1, Lower_SPC_Limit_Method1)
I have previously written this code in SAS, where it looks like this. It might be a better reference for what I am trying to achieve than my attempt to explain it.
%macro phase2_dataset (n=,varcount=, meanshift=, sigmashift=, nphase1=,simID=,);
%do z=1 %to &n;
%phase1_dataset (n=&nphase1, varcount=&varcount);
data phase2; set control_limits n=lastobs;
call streaminit(0);
do until (phase2_var1<Lower_SPC_limit_method1_var1 or
phase2_var1>Upper_SPC_limit_method1_var1);
phase2_var1 = rand("normal", &meanshift, &sigmashift);
output;
end;
run;
ods exclude all;
proc means data=phase2;
var phase2_var1;
ods output summary=x;
run;
ods select all;
data run_length; set x;
keep Phase2_var1_n;
run;
proc append base= QA.Phase2_dataset&simID data=Run_length force; run;
%end;
%mend;
I have also been looking into using a while loop in place of the repeat loop.
I'm new to R, so any ideas you are able to throw my way are greatly appreciated. Thanks!
Using a while loop indeed seems to be the way to go. Here's what I think you're looking for:
set.seed(10) #Making results reproducible
replicate(100, { #100 is easier to display here
phase1 <- matrix(rnorm(nphase1*varcount, 0, 1), nrow = nphase1, ncol=varcount)
mean_var <- colMeans(phase1) #Slightly better than apply
std_var <- apply(phase1, 2, sd)
df_var <- data.frame(mean_var, std_var)
Upper_SPC_Limit_Method1 <- with(df_var, mean_var + 3 * std_var)
Lower_SPC_Limit_Method1 <- with(df_var, mean_var - 3 * std_var)
df_control_limits<- data.frame(Upper_SPC_Limit_Method1, Lower_SPC_Limit_Method1)
#Phase 2
x <- 0
count <- 0
while(x > Lower_SPC_Limit_Method1 && x < Upper_SPC_Limit_Method1) {
x <- rnorm(1, meanshift, sigmashift) #Same as rnorm(1) for meanshift = 0, sigmashift = 1
count <- count + 1
}
count
})
The result is:
[1] 225 91 97 118 304 275 550 58 115 6 218 63 176 100 308 844 90 2758
[19] 161 311 1462 717 2446 74 175 91 331 210 118 1517 420 32 39 201 350 89
[37] 64 385 212 4 72 730 151 7 1159 65 36 333 97 306 531 1502 26 18
[55] 67 329 75 532 64 427 39 352 283 483 19 9 2 1018 137 160 223 98
[73] 15 182 98 41 25 1136 405 474 1025 1331 159 70 84 129 233 2 41 66
[91] 1 23 8 325 10 455 363 351 108 3
If performance becomes a problem, one improvement worth exploring is to generate many numbers with rnorm() at a time, find the first one that falls outside the limits, and draw another batch only if none of them does.
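The batching idea could look roughly like the sketch below. The function name run_length and the chunk argument are assumptions made for this illustration. Because a whole batch is drawn at once, it consumes random numbers differently from the one-at-a-time loop, so the run lengths for a given seed will not match the output shown above.

```r
# Hypothetical helper: number of draws needed until one falls outside
# [lower, upper], generating `chunk` values per call to rnorm().
run_length <- function(lower, upper, mean = 0, sd = 1, chunk = 1000L) {
  offset <- 0L
  repeat {
    x <- rnorm(chunk, mean, sd)
    out <- which(x < lower | x > upper)   # positions outside the limits
    if (length(out) > 0L) {
      return(offset + out[1L])            # first out-of-control observation
    }
    offset <- offset + chunk              # none found: draw another batch
  }
}

set.seed(10)
run_length(lower = -3, upper = 3)
```

Inside replicate(), the while loop could then be replaced by a single call such as run_length(Lower_SPC_Limit_Method1, Upper_SPC_Limit_Method1, meanshift, sigmashift).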

how to properly use checkboxInput on Shiny - R code

I have a working app that I would like to enhance with a checkboxInput.
1> Here is a sample of the data:
     StudentID StudentGender GradeName    TermName MeasurementScaleName TestPercentile GoalRITScore1 GoalRITScore2 GoalRITScore3 GoalRITScore4
1         1374             M         3   Fall 2009              Reading             32           188           181           179            NA
50        1297             F         8   Fall 2009       Language Usage             48           224           214           209           228
101       1608             F         8   Fall 2009          Mathematics             40           225           210           211           244
1500      1286             M         1   Fall 2011       Language Usage             NA           218           225           238           221
2345      1196             F         8   Fall 2012       Language Usage             78           230           227           239           223
5498      1376             F         3 Spring 2010              Reading             24           188           194           185            NA
8954       486             M         2 Spring 2014              Reading              2           146           152           174            NA
9000       577             F         2 Spring 2014              Reading             71           196           189           207            NA
     GoalRITScore5 GoalRITScore6
1               NA            NA
50              NA            NA
101            233           227
1500            NA            NA
2345            NA            NA
5498            NA            NA
8954            NA            NA
9000            NA            NA
2> Here is part of the working script.
Shiny UI
library(shiny)
shinyUI(navbarPage("MAP results",
tabPanel("Summaries",
sidebarLayout(
sidebarPanel(
selectInput("testname",
"Select the test to visualize",
levels(mapdata$MeasurementScaleName)),
selectInput("termname",
"Select the term the test was taken",
levels(mapdata$TermName)),
selectInput("ritorpercent",
"Display RIT scores or percentiles",
choices = c("RIT Scores", "Percentiles")),
checkboxInput("gender", "Display Gender differences")
),
mainPanel(
plotOutput("mapgraph")
)
)
),
tabPanel("Growth visualizations")
)
)
And part of the Server.R script.
Server.R
library(shiny)
library(dplyr)
library(tidyr)
library(ggplot2)
mapdata <- read.csv("MAP data raw.csv")
shinyServer(function(input, output) {
output$mapgraph <- renderPlot({
graph1RIT <- reactive (mapdata %>%
filter(TermName == input$termname, MeasurementScaleName == input$testname) %>%
group_by(GradeName) %>%
summarise(meanPer = mean(TestPercentile)))
ggplot(graph1RIT(), aes(as.factor(GradeName), meanPer, fill = as.factor(GradeName))) +
geom_bar(stat="identity") +
#coord_cartesian(ylim = c(150, 250)) +
labs(x = "Grade Level", y = "Mean RIT Percentile") +
guides(fill = FALSE)
})
})
Now I want to use my checkboxInput("gender") to draw the same bar graph, but split by gender. I thought I could just add this to server.R:
if(input$gender) {
graph3RIT <- reactive (mapdata %>%
filter(TermName == input$termname, MeasurementScaleName == input$testname) %>%
group_by(GradeName, StudentGender) %>%
summarise(meanPer = mean(TestPercentile)))
ggplot(graph3RIT(), aes(as.factor(GradeName), meanPer, fill = as.factor(StudentGender))) +
geom_bar(stat="identity", position = "dodge") +
labs(x = "Grade Level", y = "Mean RIT Percentile")
}
But if I do that, the first graph doesn't show up anymore. I've tried to look on the showmeshiny website for a similar situation, but none of the examples I could find had their code available.
Any guidance on how I could use that checkbox to change the graph?
Franky found the answer on his own. He wrote in the comments:
OK ... thanks NicE. I did figure it out in the meantime. All I needed to do was to put my code in between else {}. Then it worked nicely. – Franky Feb 10 '15 at 9:43
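A likely reason the else branch matters, assuming the if block had been added after the first ggplot() call inside renderPlot(): a block of R code returns the value of its last expression, and an if without else yields NULL when the condition is FALSE. The sketch below shows the effect with plain functions. The two strings are hypothetical stand-ins for the two ggplot() calls.

```r
# Stand-in for the failing version: the overall plot is built first, but its
# value is thrown away because the `if` is the last expression.
plot_without_else <- function(gender) {
  "overall plot"
  if (gender) "gender plot"    # NULL when gender is FALSE
}

# Stand-in for the fix: exactly one branch supplies the return value.
plot_with_else <- function(gender) {
  if (gender) {
    "gender plot"
  } else {
    "overall plot"
  }
}

is.null(plot_without_else(FALSE))
plot_with_else(FALSE)
plot_with_else(TRUE)
```

In the app, this corresponds to wrapping the gender-split ggplot() call in if (input$gender) { ... } and the original ggplot() call in else { ... }, so that renderPlot() always receives a plot.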

Binning a dataframe with equal frequency of samples

I have binned my data using the cut function
breaks<-seq(0, 250, by=5)
data<-split(df2, cut(df2$val, breaks))
My split dataframe looks like
... ...
$`(15,20]`
val ks_Result c
15 60 237
18 70 247
... ...
$`(20,25]`
val ks_Result c
21 20 317
24 10 140
... ...
My bins look like
> table(data)
data
(0,5] (5,10] (10,15] (15,20] (20,25] (25,30] (30,35]
0 0 0 7 128 2748 2307
(35,40] (40,45] (45,50] (50,55] (55,60] (60,65] (65,70]
1404 11472 1064 536 7389 1008 1714
(70,75] (75,80] (80,85] (85,90] (90,95] (95,100] (100,105]
2047 700 329 1107 399 376 323
(105,110] (110,115] (115,120] (120,125] (125,130] (130,135] (135,140]
314 79 1008 77 474 158 381
(140,145] (145,150] (150,155] (155,160] (160,165] (165,170] (170,175]
89 660 15 1090 109 824 247
(175,180] (180,185] (185,190] (190,195] (195,200] (200,205] (205,210]
1226 139 531 174 1041 107 257
(210,215] (215,220] (220,225] (225,230] (230,235] (235,240] (240,245]
72 671 98 212 70 95 25
(245,250]
494
When I take the mean of the bin counts, I get an average of ~900 samples per bin
> mean(table(data))
[1] 915.9
I want to tell R to make irregular bins in such a way that each bin contains about 900 samples (e.g. (0, 27] = 900, (27, 28.5] = 900, and so on). I found something similar here, but it deals with only one variable, not the whole dataframe.
I also tried the Hmisc package, but unfortunately the resulting bins don't contain equal frequencies:
library(Hmisc)
data<-split(df2, cut2(df2$val, g=30, oneval=TRUE))
data<-split(df2, cut2(df2$val, m=1000, oneval=TRUE))
Assuming you want 50 equal-sized buckets (based on your seq statement), you can use something like:
df <- data.frame(var=runif(500, 0, 100)) # make data
cut.vec <- cut(
df$var,
breaks=quantile(df$var, 0:50/50), # breaks along 1/50 quantiles
include.lowest=TRUE
)
df.split <- split(df, cut.vec)
Hmisc::cut2 has this option built in as well.
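To aim for a target bin size rather than a fixed number of bins, the quantile approach can be driven by the number of rows. This is only a sketch on simulated data: the data frame, the column name val, and the variables target and n_bins are assumptions made for the illustration.

```r
set.seed(42)
df <- data.frame(val = runif(9000, 0, 250))   # simulated stand-in for df2

target <- 900                                  # desired samples per bin
n_bins <- max(1, round(nrow(df) / target))     # here: 10 bins

# Breaks at equally spaced quantiles give (near) equal counts per bin
breaks <- quantile(df$val, probs = seq(0, 1, length.out = n_bins + 1))
bins   <- cut(df$val, breaks = breaks, include.lowest = TRUE)

counts   <- table(bins)
df_split <- split(df, bins)                    # whole data frame, split by bin

counts
```

One caveat: if val contains many tied values, some quantiles can coincide, in which case cut() stops with an error about non-unique breaks and exactly equal counts may not be achievable.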
This can be done with the function provided here by Joris Meys:
EqualFreq2 <- function(x,n){
nx <- length(x)
nrepl <- floor(nx/n)
nplus <- sample(1:n,nx - nrepl*n)
nrep <- rep(nrepl,n)
nrep[nplus] <- nrepl+1
x[order(x)] <- rep(seq.int(n),nrep)
x
}
data<-split(df2, EqualFreq2(df2$val, 25))

How to divide a set of overlapping ranges into non-overlapping ranges? but in R

Let's say we have two datasets:
assays:
BHID<-c(127,127,127,127,128)
FROM<-c(950,959,960,961,955)
TO<-c(958,960,961,966,969)
Cu<-c(0.3,0.9,2.5,1.2,0.5)
assays<-data.frame(BHID,FROM,TO,Cu)
and litho:
BHID<-c(125,127,127,127)
FROM<-c(940,949,960,962)
TO<-c(949,960,961,969)
ROCK<-c(1,1,2,3)
litho<-data.frame(BHID,FROM,TO,ROCK)
and I want to join the two sets. The result after running the algorithm would be:
BHID FROM TO CU ROCK
125 940 970 - 1
127 949 950 - 1
127 950 958 0.3 1
127 958 959 - 1
127 959 960 0.9 1
127 960 961 2.5 2
127 961 962 1.2 -
127 962 966 1.2 3
127 966 969 - 3
128 955 962 0.5 -
Use merge
merge(assays, litho, all = TRUE)
In essence, all = TRUE is the equivalent of a SQL FULL OUTER JOIN. I haven't specified any columns because, by default, merge joins on the columns that share a name.
Tough one, but the code seems to work. The idea is to first expand each row into many, each representing a one-unit step from FROM to TO. After merging, identify contiguous rows and collapse them again. This is obviously not a very efficient approach, so it may or may not work if your real data has very large FROM and TO ranges.
library(plyr)
ASSAYS <- adply(assays, 1, with, {
SEQ <- seq(FROM, TO)
data.frame(BHID,
FROM = head(SEQ, -1),
TO = tail(SEQ, -1),
Cu)
})
LITHO <- adply(litho, 1, with, {
SEQ <- seq(FROM, TO)
data.frame(BHID,
FROM = head(SEQ, -1),
TO = tail(SEQ, -1),
ROCK)
})
not.as.previous <- function(x) {
x1 <- head(x, -1)
x2 <- tail(x, -1)
c(TRUE, !is.na(x1) & !is.na(x2) & x1 != x2 |
is.na(x1) & !is.na(x2) |
!is.na(x1) & is.na(x2))
}
MERGED <- merge(ASSAYS, LITHO, all = TRUE)
MERGED <- transform(MERGED,
gp.id = cumsum(not.as.previous(BHID) |
not.as.previous(Cu) |
not.as.previous(ROCK)))
merged <- ddply(MERGED, "gp.id", function(x) {
out <- head(x, 1)
out$TO <- tail(x$TO, 1)
out
})
merged
# BHID FROM TO Cu ROCK gp.id
# 1 125 940 949 NA 1 1
# 2 127 949 950 NA 1 2
# 3 127 950 958 0.3 1 3
# 4 127 958 959 NA 1 4
# 5 127 959 960 0.9 1 5
# 6 127 960 961 2.5 2 6
# 7 127 961 962 1.2 NA 7
# 8 127 962 966 1.2 3 8
# 9 127 966 969 NA 3 9
# 10 128 955 969 0.5 NA 10
Note that the first row is not exactly the same as in your expected output, but I think mine makes more sense.
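If the ranges are too large to expand unit by unit, an alternative is to cut each BHID at the union of all FROM and TO values and look up which assay and litho record covers each piece. The following base R sketch is not the method used in the answer above. The helper names lookup and split_hole are assumptions, and it takes the first covering record when several overlap within one table.

```r
assays <- data.frame(BHID = c(127, 127, 127, 127, 128),
                     FROM = c(950, 959, 960, 961, 955),
                     TO   = c(958, 960, 961, 966, 969),
                     Cu   = c(0.3, 0.9, 2.5, 1.2, 0.5))
litho <- data.frame(BHID = c(125, 127, 127, 127),
                    FROM = c(940, 949, 960, 962),
                    TO   = c(949, 960, 961, 969),
                    ROCK = c(1, 1, 2, 3))

# Value of column `col` in `tab` for the record covering [from, to], else NA
lookup <- function(tab, col, from, to) {
  hit <- which(tab$FROM <= from & tab$TO >= to)
  if (length(hit) > 0) tab[[col]][hit[1]] else NA_real_
}

# Split one drill hole at every FROM/TO boundary found in either table
split_hole <- function(id) {
  a <- assays[assays$BHID == id, ]
  l <- litho[litho$BHID == id, ]
  cuts <- sort(unique(c(a$FROM, a$TO, l$FROM, l$TO)))
  from <- head(cuts, -1)
  to   <- tail(cuts, -1)
  data.frame(
    BHID = id, FROM = from, TO = to,
    Cu   = mapply(lookup, from = from, to = to,
                  MoreArgs = list(tab = a, col = "Cu")),
    ROCK = mapply(lookup, from = from, to = to,
                  MoreArgs = list(tab = l, col = "ROCK"))
  )
}

ids    <- sort(unique(c(assays$BHID, litho$BHID)))
pieces <- do.call(rbind, lapply(ids, split_hole))
# Drop gaps that neither table covers
pieces <- pieces[!(is.na(pieces$Cu) & is.na(pieces$ROCK)), ]
rownames(pieces) <- NULL
pieces
```

Unlike the plyr version, this sketch does not merge neighbouring pieces that end up with identical Cu and ROCK values, so a collapsing step would still be needed if the source tables contain such adjacent intervals.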

Resources