I'm learning Shiny and I am trying to write an app to explore Simpson's paradox graphically. I built the plot in R first, with customizable options set at the beginning. The plot shows data that are positively correlated but that can be factored into groups which are negatively correlated. The options at the beginning include how many groups to have, how they're spaced, whether to show the grouped/ungrouped regression lines, etc. See the working simpson.R script here.
The intent of the app is to get the user to play with these options to make the groupings more or less obvious and to highlight (or not) the negative correlations.
The complete code for my app files is on GitHub. I don't have sufficient reputation to post more than two links, so please find ui.R and server.R there in the repo.
The error I can't seem to get past is non-numeric argument to binary operator apparently somewhere in this line:
anchors_x <- rep(seq(2, 2 + (input$number_of_groups - 1) * input$separation,
                     by = input$separation), input$points_per_group)
and also
p <- ggplot(data = dfserver(), aes(x, y))
Specific question:
Why isn't R treating these variables as numeric?
More generally:
Was it smart of me to build the plot in a plain R script first? Or would a more typical workflow be just to build the thing in Shiny from the beginning? If so, how does one debug/build R inside of Shiny if one can't get into the environment to poke around at the objects? Or is there a way I don't know to access objects that live inside ui.R and server.R?
input$number_of_groups and friends are strings, so arithmetic on them fails.
Try
ng = as.integer(input$number_of_groups)
sep = as.integer(input$separation)
ppg = as.integer(input$points_per_group)
anchors_x <- rep(seq(2, 2 + (ng - 1) * sep, by = sep), ppg)
and similarly further below.
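To see why the conversion matters, here is a minimal sketch that runs outside Shiny. It assumes the inputs arrive as character strings (as they do from widgets such as selectInput or textInput); the `input` list below is a hypothetical stand-in for Shiny's input object, and the values are made up.

```r
# Hypothetical stand-in for Shiny's `input`; the values arrive as strings.
input <- list(number_of_groups = "3", separation = "2", points_per_group = "4")

# Arithmetic on a string fails with the error from the question.
msg <- tryCatch(
  (input$number_of_groups - 1) * input$separation,
  error = function(e) conditionMessage(e)
)
print(msg)

# Convert once, then compute with numbers.
ng  <- as.integer(input$number_of_groups)
sep <- as.integer(input$separation)
ppg <- as.integer(input$points_per_group)
anchors_x <- rep(seq(2, 2 + (ng - 1) * sep, by = sep), ppg)
print(anchors_x)
```

If the separation can be fractional, as.numeric would be the safer conversion, since as.integer truncates.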
I am super new to coding with R; I'm taking it as part of a bachelor's degree program. I am super stuck on something I feel should be basic, but I cannot get my code to work and I am not sure why. The prompt is:
"In this problem we will be using the mpg data set, to get access to the data set you need to load the tidyverse library.
Complete the following steps:
Create a histogram for the cty column with 10 bins"
and for my code I have:
library(tidyverse)
print(mpg)
df <- mpg[ , c("city")]
histo <- ggplot(data = df, aes(x=median)) + geom_histogram(bins=10)
print(histo)
The first print was just to make sure the data loaded correctly, which it did. I am not sure about the second print function, the histo one. I've gotten various error messages or bugs, so I've just been moving stuff around and trying different commands to get it to work. I'm following the steps previously outlined in our reading, but I cannot seem to get this to work. Any help would be appreciated.
I have tried removing the print(histo) call and just leaving the ggplot, but that gives me a blank white box instead of a plot, or no plot is printed.
I'm brand new to programming and am picking up RStudio as a stats tool.
I have a dataset which includes multiple questionnaires divided by weeks, and I'm trying to organize the data into meaningful chunks.
Right now this is what my code looks like:
w1a=table(qwest1,talm1)
w2a=table(qwest2,talm2)
w3a=table(quest3,talm3)
Where quest and talm are the names of the variable and the number denotes the week.
Is there a way to compress all those lines into one line of code so that I could make w1a,w2a,w3a... each their own object with the corresponding questionnaire added in?
Thank you for your help, I'm very new to coding and I don't know the etiquette or all the vocabulary.
This might do what you wanted (but not what you asked for):
tbl_list <- mapply(table, list(qwest1, qwest2, quest3),
                   list(talm1, talm2, talm3),
                   SIMPLIFY = FALSE)  # always return a list, never a matrix
names(tbl_list) <- c('w1a', 'w2a', 'w3a')
You are committing a fairly typical new-R-user error in creating multiple similarly named and structured objects but not putting them in a list. This is my effort at pushing you in that direction. Could also have been done via:
qwest_lst <- list(qwest1, qwest2, quest3)
talm_lst <- list(talm1, talm2, talm3)
tbl_lst <- mapply(table, qwest_lst, talm_lst, SIMPLIFY = FALSE)
names(tbl_lst) <- paste0('w', 1:3, 'a')
There are other ways to programmatically access objects with character vectors using get or mget.
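As a rough sketch of the mget route, the following builds the same list of tables from object names. The data values are made up for illustration; the variable names follow the question, and Map is used here as a shorthand for mapply with SIMPLIFY = FALSE.

```r
# Made-up questionnaire data; the variable names follow the question.
qwest1 <- c("a", "b", "a"); talm1 <- c("x", "x", "y")
qwest2 <- c("a", "a", "b"); talm2 <- c("y", "x", "y")
quest3 <- c("b", "b", "a"); talm3 <- c("x", "y", "y")

# mget() fetches several objects by name and returns them as a named list.
q_lst <- mget(c("qwest1", "qwest2", "quest3"))
t_lst <- mget(paste0("talm", 1:3))

# Map() always returns a list: one table per week.
tbl_lst <- Map(table, q_lst, t_lst)
names(tbl_lst) <- paste0("w", 1:3, "a")

print(tbl_lst$w1a)
```

Each week's table is then available as tbl_lst$w1a, tbl_lst$w2a, and so on, and the whole set can be processed with lapply.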
This is probably a very silly question, but how can I check if a function written by myself will work or not?
I'm writing a not very simple function involving many other functions and loops and was wondering if there are any ways to check for errors/bugs, or simply just check if the function will work. Do I just create a simple fake data frame and test on it?
As suggested by other users in the comments, I have added the part of the function that I have written. So basically I have a data frame with good and bad data, and bad data are marked with flags. I want to write a function that allows me to produce plots as usual (with the flag points) when the user sets flag.option to 1, and remove the flag points from the plot when the user sets flag.option to 0.
AIR.plot <- function(mydata, flag.option) {
  if (flag.option == 1) {
    par(mfrow = c(2, 1))  # two stacked panels
    conc <- tapply(mydata$CO2, format(mydata$date, "%Y-%m-%d %T"), mean)
    dates <- seq(mydata$date[1], mydata$date[nrow(mydata)], length.out = nrow(conc))
    tryCatch(
      plot(dates, conc,
           type = "p",
           col = "blue",
           xlab = "day",
           ylab = "CO2"),
      error = function(e) plot.new()  # fall back to an empty panel
    )
    barplot(mydata$lines, horiz = TRUE, col = c("red", "blue")) # this is just a small bar plot on the bottom that specifies which sample-taking line (red or blue) is providing the samples
  } else if (flag.option == 0) {
    # I haven't figured out how to write this part yet but essentially I want to remove all
    # of the rows with flags on
  }
}
Thanks in advance, I'm not an experienced R user yet so please help me.
Before we (meaning, at my workplace) release any code to our production environment we run through a series of testing procedures to make sure our code behaves the way we want it to. It usually involves several people with different perspectives on the code.
Ideally, such verification should start before you write any code. Some questions you should be able to answer are:
What should the code do?
What inputs should it accept? (including type, ranges, etc)
What should the output look like?
How will it handle missing values?
How will it handle NULL values?
How will it handle zero-length values?
If you prepare a list of requirements and write your documentation before you begin writing any code, the probability of success goes up pretty quickly. Naturally, as you begin writing your code, you may find that your requirements need to be adjusted, or the function arguments need to be modified. That's okay, but document those changes when they happen.
While you are writing your function, use a package like assertthat or checkmate to write as many argument checks as you need in your code. Some of the best, most reliable code where I work consists of about 100 lines of argument checks and 3-4 lines of what the code actually is intended to do. It may seem like overkill, but you prevent a lot of problems from bad inputs that you never intended for users to provide.
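A minimal sketch of such argument checks, using base R's stopifnot as a stand-in for assertthat or checkmate so that it runs without extra packages. The function name drop_flagged and the logical `flag` column are assumptions made for illustration; they are not from the question.

```r
# Hypothetical helper: remove flagged rows when flag.option is 0.
# The `flag` column name is an assumption (TRUE marks bad data).
drop_flagged <- function(mydata, flag.option) {
  # Argument checks come first, before any real work.
  stopifnot(
    is.data.frame(mydata),
    "flag" %in% names(mydata),
    is.logical(mydata$flag),
    !anyNA(mydata$flag),
    is.numeric(flag.option),
    length(flag.option) == 1,
    flag.option %in% c(0, 1)
  )
  # The actual work is one line.
  if (flag.option == 1) mydata else mydata[!mydata$flag, , drop = FALSE]
}

demo <- data.frame(CO2 = c(400, 999, 410), flag = c(FALSE, TRUE, FALSE))
print(drop_flagged(demo, 0))
print(drop_flagged(demo, 1))
```

Here the checks outnumber the working lines, which matches the ratio described above; a bad call such as drop_flagged(demo, 2) stops immediately instead of silently returning something unexpected.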
When you've finished writing your function, you should at this point have a list of requirements and clearly documented expectations of your arguments. This is where you make use of the testthat package.
Write tests that verify all of the requirements you wrote are met.
Write tests that verify you cannot put in unintended inputs and still get results.
Write tests that verify you get the output you intended on your test data.
Write tests that test any edge cases you can think of.
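The four kinds of test listed above might look like the sketch below. It uses plain stopifnot checks so that it runs without extra packages; with testthat, the same checks would be wrapped in test_that() and written with expect_equal() and expect_error(). The function mean_co2 and the helper fails are hypothetical, chosen only for illustration.

```r
# Hypothetical function under test: mean CO2, ignoring missing values.
mean_co2 <- function(x) {
  stopifnot(is.numeric(x))
  if (length(x) == 0) return(NA_real_)
  mean(x, na.rm = TRUE)
}

# Tiny helper: TRUE if evaluating `expr` raises an error.
fails <- function(expr) inherits(try(expr, silent = TRUE), "try-error")

# 1. Requirement: the result is a single numeric value.
stopifnot(is.numeric(mean_co2(c(400, 410))), length(mean_co2(c(400, 410))) == 1)
# 2. Unintended input is rejected.
stopifnot(fails(mean_co2("400")))
# 3. Expected output on test data.
stopifnot(isTRUE(all.equal(mean_co2(c(400, 410, 420)), 410)))
# 4. Edge cases: missing values and zero-length input.
stopifnot(isTRUE(all.equal(mean_co2(c(400, NA, 420)), 410)))
stopifnot(is.na(mean_co2(numeric(0))))
cat("all checks passed\n")
```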
It can take a long time to write all of these tests, but once it is done, any further development is easier to check since anything that violates your existing requirements should fail the test.
That being said, I'm really bad at following this process in my own work. I have the tendency to write code, then document what I did. But the best code I've written has been where I've planned it out conceptually, wrote my documentation, coded, and then tested against my documentation.
As @antoine-sac pointed out in the links, some things cannot be checked programmatically; for example, whether your function terminates.
Looking at it pragmatically, have a look at the packages assertthat and testthat. assertthat will help you insert checks of results "in between", testthat is for writing proper tests. Yes, the usual way of writing tests is creating a small test example including test data.
I have created a plot in R using googleVis, specifically gvisMotionChart, plotting a number of variables.
I am primarily using the line graph and it is all good when I view the graph with all variables. However, when I select some of the individual variables, it zooms in such that some of the plot for this variable is no longer on the graph. I know it should zoom in just to view this variable and can exclude other variables (which is a good feature), but it zooms in too much, so that the variable I am after is not entirely on the graph.
This doesn't happen with all variables, and I can get around it by also selecting other variables either side of the one which I want to view, but it would be good if I could fix this. Has anyone come across a similar problem before and know a way around it?
Thanks in advance
EDIT: I have an example of this using the data Batting from the Lahman package. (I know nothing about baseball, so the analysis probably doesn't make sense; in fact, looking at the results, it almost certainly doesn't, but it illustrates my point.) If you run the following code:
library(Lahman)
library(googleVis)  # provides gvisMotionChart
recent <- subset(Batting, yearID > 2000)
homeruns <- aggregate(HR ~ stint + yearID, data = recent, FUN = sum)
avgHR <- mean(homeruns$HR)
homeruns$HR <- homeruns$HR - avgHR
m <- gvisMotionChart(data = homeruns, idvar = "stint", timevar = "yearID")
plot(m)
Then select the line graph and subset on number 2: the top part of the graph is cut off.
It seems to be Google's bug. I could even reproduce this same error in their "Visualization Playground" (https://code.google.com/apis/ajax/playground/?type=visualization#motion_chart) making part of the data negative.
I've already reported the issue as a bug: https://code.google.com/p/google-visualization-api-issues/issues/detail?id=1479
May the force be with them!
I just had the same problem w/ a Sankey plot. I resolved it by deleting entries with value==0. However, I just tried to reproduce your example and could not reproduce your bug, so perhaps this has already been solved?
I have a question about Reference Classes. My question is in the context of an R package I am developing, rCharts. It uses reference classes to create interactive plots from R.
Creating a plot involves a series of calls. Here is an example, where a scatterplot is created at first and then a line plot gets added.
p1 <- rPlot(mpg ~ cyl, data = mtcars, type = 'point')
p1$layer(copy_layer = T, type = 'line')
Now, since a Reference Class is like a closure, I was wondering if it was possible to log the calls made. The idea is that if I can log the sequence of calls made, then I can automagically insert the source code used to create a visualization, along with the html.
I was trying to see if I could make use of sys.function or match.call, but am not getting anywhere. If someone can point me to how I can approach this, it would be much appreciated.
As @hadley stated:
calls <<- c(calls, list(match.call()))
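In context, that line sits inside a method of the reference class and appends to a list field. A minimal sketch, assuming a toy class named LoggedPlot with a `calls` field; this is not rCharts' actual class, and the layer method here does nothing except record its own call.

```r
library(methods)

# Toy reference class (hypothetical, not rCharts): every call to
# layer() is appended to the `calls` field.
LoggedPlot <- setRefClass("LoggedPlot",
  fields = list(calls = "list"),
  methods = list(
    layer = function(...) {
      # Record the call exactly as the user wrote it.
      calls <<- c(calls, list(match.call()))
      invisible(.self)
    }
  )
)

p1 <- LoggedPlot$new()
p1$layer(type = "point")
p1$layer(copy_layer = TRUE, type = "line")

# Turn the logged calls back into source text.
for (cl in p1$calls) cat(paste(deparse(cl), collapse = " "), "\n")
```

The deparsed calls could then be written next to the generated HTML to show the code that produced the plot.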
Glad that looks to have worked. Let's get this closed. :)