ggplot iterate several columns - r

lapply(7:12, function(x) ggplot(mydf)+geom_histogram(aes(mydf[,x])))
gives the error: Error in `[.data.frame`(mydf, , x) : undefined columns selected.
I have used several SO questions (e.g. this) as guidance, but can't figure out my error.

The code below works with the mtcars dataset. Just replace mtcars with mydf.
library(ggplot2)
lapply(1:3,function(i) {
ggplot(data.frame(x=mtcars[,i]))+
geom_histogram(aes(x=x))+
ggtitle(names(mtcars)[i])
})
Notice how the reference to i (the column index) was moved from the mapping argument (the call to aes(...)), to the data argument.
Your problem is actually quite subtle. ggplot evaluates the arguments to aes(...) first in the context of your data, i.e. it looks for column names in mydf. If that fails it jumps to the global environment. It does not look in the function's environment. See this post for another example of this behavior and some discussion.
The bottom line is that it is a really bad idea to use external variables in a call to aes(...). However, the data=... argument does not suffer from this. If you must refer to a column number, etc., do it in the call to ggplot(data=...).
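A minimal sketch of the same idea wrapped in a reusable function. The helper name plot_columns and the bounds check are assumptions added for illustration and are not part of the answer above; mtcars stands in for mydf.

```r
library(ggplot2)

# Hypothetical helper: build one histogram per column index.
# The column is pulled out in the data argument, not inside aes().
plot_columns <- function(df, idx) {
  # Guard against indices that do not exist in df
  stopifnot(all(idx %in% seq_along(df)))
  plots <- lapply(idx, function(i) {
    ggplot(data.frame(x = df[[i]]), aes(x = x)) +
      geom_histogram(bins = 30) +
      ggtitle(names(df)[i])
  })
  # Name each plot after the column it shows
  names(plots) <- names(df)[idx]
  plots
}

plots <- plot_columns(mtcars, 1:3)
```

Calling print(plots[[1]]) draws the first histogram. The bounds check is there because indexing past the last column of a data frame (for example 7:12 on a data frame with fewer than 12 columns) is one way to get "undefined columns selected".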

Related

How to use apply() with my function

bmi<-function(x,y){
(x)/((y/100)^2)
}
bmi(70,177) works,
but with apply() it doesn't work:
apply(Student,1:2,bmi(Student$weight,Student$height))
Error in match.fun(FUN) :
'bmi(Student$weight, Student$height)' is not a function, character or symbol
It's a bit unclear what the goal is. If it's just to get an answer, then the comments do answer it. If on the other hand, the goal is to understand what you are doing wrong, then read on. I'd say the first error going from left to right is passing the whole dataframe. I would have only passed the 'height' and 'weight' columns.
The next error, again going from left to right, is the use of 1:2 as the second argument to apply. You obviously want to do this "by rows", which means you should use only 1, i.e. the first dimension of the dataframe.
And the third error is using a function call rather than the function name. A function call with arguments in parentheses doesn't work when an R function (meaning apply in this case) is expecting a function name or an anonymous function, as illustrated in comments.
The fourth error is not assigning the value to a column in your dataframe. So this probably would have succeeded in making the desired extra column via the apply method. But, as noted in comments, this is not the most efficient method:
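# apply() passes each row as a single vector, so unpack it into bmi's two arguments
Student$bmi_val <- apply(Student[ ,c("weight", "height")], 1, function(r) bmi(r[[1]], r[[2]]))
# didn't want my column name to be the same as the function name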
The apply function was actually designed to work with matrices and arrays, so for many purposes it is ill-suited when used with dataframes. In this case, where all the arguments to the bmi function are numeric and you can control the order of the columns in the first argument to match the x and y positions, it's arguably an acceptable strategy, but not the most R-ish method. When working with dates or factor variables, you should definitely avoid apply.
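For comparison, a small sketch of the row-wise apply call next to the vectorized call that the "not the most efficient" remark points at. The Student data frame here is made-up illustration data, not the asker's.

```r
bmi <- function(x, y) {
  x / ((y / 100)^2)
}

# Made-up example data (weight in kg, height in cm)
Student <- data.frame(weight = c(70, 55, 82), height = c(177, 160, 185))

# Vectorized: arithmetic in R already works element-wise on whole columns
Student$bmi_vec <- bmi(Student$weight, Student$height)

# Row-wise with apply: MARGIN = 1, and an anonymous function unpacks each row
Student$bmi_apply <- apply(
  Student[, c("weight", "height")], 1,
  function(r) bmi(r[["weight"]], r[["height"]])
)
```

Both columns hold the same values; the vectorized form avoids the conversion to a matrix that apply performs internally.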

Why is one_of() called that?

Why is dplyr::one_of() called that? All the other select_helpers names make sense to me, so I'm wondering if there's an aspect of one_of() that I don't understand.
My understanding of one_of() is that it just lets you select variables using a character vector of their names instead of putting their names into the select() call, but then you get all of the variables whose names are in the vector, not just one of them. Is that wrong, and if it's correct, where does the name one_of() come from?
one_of allows for guessing or subset-matching
Let's say I know in general my column names will come from c("mpg","cyl","garbage") but I don't know which columns will be present because of interactivity/reactivity
mtcars %>% select(one_of(c("mpg","cyl","garbage")))
evaluates but provides a message
Warning message:
Unknown variables: `garbage`
In contrast
mtcars %>% select(mpg, cyl, garbage)
does not evaluate and gives the error
Error in overscope_eval_next(overscope, expr) :
object 'garbage' not found
The way I think about it is that select() eventually evaluates to a logical vector. So if you use starts_with it goes through the variables in the dataframe and asks whether the variable name starts with the right set of characters. one_of does the same thing but asks whether the variable name is one of the names listed in the character vector. But as they say, naming things is hard!
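The "is the name one of these?" reading can be sketched in base R. This only illustrates the mental model described above; it is not how dplyr implements the helper.

```r
wanted <- c("mpg", "cyl", "garbage")

# One logical value per column: is this column name one of the wanted names?
keep <- names(mtcars) %in% wanted

# Keep only the matching columns
selected <- mtcars[, keep, drop = FALSE]

# Names that were asked for but are not in the data
missing_names <- setdiff(wanted, names(mtcars))
```

Here selected has the columns mpg and cyl, and missing_names holds "garbage", which corresponds to the name reported in the warning.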
The reason for its name seems to be that it allows you to look for, at least, one of the variables that are contained in the vector.
For example:
select(flights, dep, arr_delay, sched_dep_time) won't work because the variable "dep" does not exist. It will produce no result.
select(flights, one_of(c("dep", "arr_delay", "sched_dep_time"))) will work, even though the variable "dep" does not exist. In this case, "arr_delay" and "sched_dep_time" will be shown.
The helper should be read as: at least one_of() the variables will be shown :)

R shiny ERROR: object of type 'closure' is not subsettable

I have a shiny app and when I run it I get an error saying that an object of type ‘closure’ is not subsettable. What is that and how can I fix it?
Note: I wrote this question as this comes up a lot, and the possible dupes are either not shiny related or so specific that it is not obvious that the answers are broadly applicable.
See also this question which covers this error in a non-Shiny context.
How to fix this:
This is a very common error in shiny apps. This most typically appears when you create an object such as a list, data.frame or vector using the reactive() function – that is, your object reacts to some kind of input. If you do this, when you refer to your object afterwards, you must include parentheses.
For example, let’s say you make a reactive data.frame like so:
MyDF<-reactive({ code that makes a data.frame with a column called “X” })
If you then wish to refer to the data.frame and you call it MyDF or MyDF$X you will get the error. Instead it should be MyDF() or MyDF()$X. You need to use this naming convention with any object you create using reactive().
Why this happens:
When you make a reactive object, such as a data.frame, using reactive() it is tempting to think of it as just like any other non-reactive data.frame and write your code accordingly. However, what you have created is not really a data.frame. Rather, what you have made is instructions, in the form of a function, which tell shiny how to make the data.frame when it is needed. When you wish to actually use this function to get the data.frame you have to use the parentheses, just like you would with any other function in R. If you forget to use the parentheses, R thinks you are trying to use part of a function and gives you the error. Try typing:
plot$x
at the command line and you will get the same error.
You may not see this error right when your app starts. Reactive objects have what is called “lazy” evaluation. They are not evaluated until they are needed for some output. So if your data.frame is only used to make a plot, the data.frame will not exist until the user sees the plot for the first time. If when the app starts up the user is required to click a button or change tabs to see the plot, the code for the data.frame will not be evaluated until that happens. Once that happens, then and only then will shiny use the current values of the inputs to run the function that constructs the data.frame needed to make the plot. If you have forgotten to use the parentheses, this is when shiny will give you the error. Note that if the inputs change, but the user is not looking at the plot, the function that makes the data.frame will not be re-run until the user looks at the plot again.
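The same error can be reproduced without shiny, which may make the cause easier to see. In this sketch make_df is a hypothetical plain function standing in for a reactive; a real reactive() needs a running shiny session.

```r
# A plain function that builds a data.frame, standing in for a reactive
make_df <- function() data.frame(X = 1:3)

# Forgetting the parentheses: R tries to subset the function itself
msg <- tryCatch(make_df$X, error = function(e) conditionMessage(e))

# Calling the function first, then subsetting the result, works
ok <- make_df()$X
```

Here msg holds the "object of type 'closure' is not subsettable" message, while ok holds the column values.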

Working with "..." input in R function

I am putting together an R function that takes some undefined input through the ... argument described in the docs as:
"..." the special variable length argument ***
The idea is that the user will enter a number of column names here, each belonging to a dataset also specified by the user. These columns will then be cross-tabulated in comparison to the dependent variable by tapply. The function is to return a table (independent variable x independent variable).
Thus, I tried:
plotter=function(dataset, dependent_variable, ...)
{
indi_variables=list(...); # making a list of the ... input as described in the docs
result=with (dataset, tapply(dependent_variable, indi_variables, mean)); # this fails
}
I figured this should work as tapply can take a list as input.
But it does not in this case ('Error in tapply...arguments must have same length') and I think it is because indi_variables is a list of strings.
If I input the contents of the list by hand and leave out the quotation marks, everything works just fine.
However, if the user feeds the function the column names as non-strings, R will interpret them as variable names; and I cannot figure out how to transform the list indi_variables in the right way, unsuccessfully trying things like this:
indi_variables=lapply(indi_variables, as.factor)
So I am wondering
What causes the error described above? Is my interpretation correct?
How would one go about transforming the list created through ... in the right way?
Is there an overall better way of doing this, in the input or the implementation of tapply?
Any help is much appreciated!
Thanks to Joran's helpful reading, I have come up with these improvements that make things work out...
indi_variables=substitute(list(...));
result=with (dataset, tapply(dependent_variable, eval(indi_variables, dataset), FUN=mean));
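Put together as a complete function, the two lines above might look like the sketch below. Evaluating dependent_variable inside the dataset as well (so it can be given as a bare column name) is an assumption added here, not something the original lines do; mtcars is used only as example data.

```r
plotter <- function(dataset, dependent_variable, ...) {
  # Capture the unevaluated column names as a call: list(col1, col2, ...)
  indi_variables <- substitute(list(...))
  # Evaluate the captured expressions with the dataset's columns in scope
  groups <- eval(indi_variables, dataset, parent.frame())
  # Assumption: the dependent variable is also looked up in the dataset
  dep <- eval(substitute(dependent_variable), dataset, parent.frame())
  tapply(dep, groups, FUN = mean)
}

# Mean mpg cross-tabulated by cyl and gear
res <- plotter(mtcars, mpg, cyl, gear)
```

Because the column names are captured with substitute() before they are evaluated, the user can type them without quotes and R does not look for global variables called cyl or gear.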

R: Error in .Primitive, non-numeric argument to binary operator

I did some reading on similar SO questions, but couldn't figure out how to resolve my error.
I have written the following string of code:
points[paste0(score.avail,"_pts")] <-
Map('*', points[score.avail], mget(paste0(score.avail,'_m')) )
Essentially, I have a list of columns in the 'points' data frame, defined by 'score.avail'. I am multiplying each of the columns by a respective constant, defined as the paste0(score.avail, '_m') expression. It appends new fields based on the multiplication, given by paste0(score.avail, "_pts") expression.
I have used this function before in a similar setup with no issues. However, I am now getting the following error:
Error in .Primitive("*")(dots[[1L]][[1L]], dots[[2L]][[1L]]) :
non-numeric argument to binary operator
I'm pretty sure R is telling me that one of the fields I'm trying to multiply is not numeric. However, I have checked all my fields, and they are numeric. I have even tried running a line as.numeric(score.avail) but that doesn't help. I also ran the following to remove NA's in the fields (before the Map function above).
for(col in score.avail){
points[is.na(get(col)) & (data.source == "average" |
data.source == "averageWeighted"), (col) := 0]}
The thing that stumps me is that this expression has worked with no issues before.
Update
I did some more digging by separating out each component of my original function. I'm getting odd output when running points[score.avail]. Previously when I ran this, it would return just the columns for all of my rows. Now, however, I'm getting none of the rows in my original data frame -- rather, it is imputing the column names in the 'score.avail' list as rows and filling in NA's everywhere (this is clearly the source of my problem).
I think this is because the object I'm pointing to is a data.table with key variables set. Previously with this function, I had been pointing to a data frame.
Off to try a few more things.
Another Update
I was able to solve my problem by copying the 'points' object using as.data.frame(). However, I will leave the question open to see if anyone knows how to reset the data table key vars so that the function I specified above will work.
I was able to solve my problem by copying the 'points' object using as.data.frame(). Apparently classifying the object as a data.table was causing my headaches.
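A possible way to keep the data.table, sketched under assumptions: as far as I understand, a character vector in the first position of a data.table's [ ] is treated as a lookup against the key rather than as a column selection, which would explain the rows of NA's. Selecting columns in the second position avoids that. The data and the multipliers list below are made up for illustration; the question's constants lived in separate objects fetched with mget().

```r
library(data.table)

# Made-up stand-in for the 'points' table, with a key set
points <- data.table(id = c("a", "b"), pass = c(1, 2), rush = c(3, 4), key = "id")
score.avail <- c("pass", "rush")

# Hypothetical multipliers, one per score column
multipliers <- list(pass_m = 2, rush_m = 10)

# Column selection by name goes in j, with 'with = FALSE'
cols <- points[, score.avail, with = FALSE]

# Multiply each score column by its constant and append the results by reference
points[, paste0(score.avail, "_pts") :=
         Map(`*`, .SD, multipliers[paste0(score.avail, "_m")]),
       .SDcols = score.avail]
```

With this form the key can stay in place, because the score columns are selected through .SDcols rather than through the first argument.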
