How to use summarise function in JuliaDB? - julia

I am following the tutorial published at https://github.com/JuliaComputing/JuliaDB.jl/blob/master/docs/src/tutorial.md
a) While executing:
summarize(mean ∘ skipmissing, flights, :Dest, select = (:Cancelled, :Diverted))
getting:
Error: UndefVarError: mean not defined
b) Also tried:
summarize(mean, dropna(flights), select = :dep_delay)
getting:
Error: UndefVarError: dropna not defined
Please help me in resolving the issue!

To use mean you must first load Statistics (or StatsBase) with using.
The second error occurs because dropna should be dropmissing. The second call also uses the wrong column name: :dep_delay should be :DepDelay.
The lines that work are:
using Statistics
summarize(mean ∘ skipmissing, flights, :Dest, select = (:Cancelled, :Diverted))
summarize(mean, dropmissing(flights), select = :DepDelay)

Related

call a variable name from a dataframe within a function in R

I can't figure out a simple problem: how to refer to a data frame column whose name is passed to a function as a string.
I have a function defined as:
box_lambda = function(Valuename,data1){
data1[,Valuename]=ifelse(data1[,Valuename]==0,0.000001,data1[,Valuename])
b= boxcox(get(Valuename) ~ Age.Group+Sex , data = data1)
lambda <- b$x[which.max(b$y)]
return(lambda)
}
But this doesn't work as I get error:
Error in eval(f): 'list' object cannot be coerced to type 'double'
I tried
data1[[Valuename]]=ifelse(data1[Valuename]]==0,0.000001,data1[[Valuename]])
Any help is appreciated!
First, you dropped a bracket that is needed to address a column:
data1[[Valuename]]
You can also use a series of other approaches from [here][1] and from [here][2]. For instance, you can use:
library(dplyr)
data1 %>%
filter(!!as.name(Valuename) == 0)
So, finally, you can use:
data1[[Valuename]][data1[[Valuename]]==0] <-0.000001
This script will replace 0 with epsilon and leave the other values.
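A minimal base-R sketch of the difference between the two bracket forms, using a made-up data frame (`df` and the column name `value` are illustrative and not taken from the question). Single brackets with a name keep a one-column data frame, which is a list, while double brackets extract the column itself:

```r
# Hypothetical data; only the column-access behaviour matters here.
df <- data.frame(value = c(0, 2, 0, 5), Sex = c("F", "M", "F", "M"))
Valuename <- "value"

# `[` with a single name returns a one-column data frame (a list) ...
class(df[Valuename])        # "data.frame"
# ... while `[[` returns the column itself as a numeric vector.
class(df[[Valuename]])      # "numeric"

# Replace exact zeros with a small positive number, leaving other values alone.
df[[Valuename]][df[[Valuename]] == 0] <- 0.000001
df[[Valuename]]
```

Because `[[` yields a plain vector, comparisons and arithmetic on it behave as they would on any numeric vector.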
[1]: https://stackoverflow.com/a/74173690/5043424
[2]: https://stackoverflow.com/a/48219802/5043424

Why can't I pass an anonymous function to dplyr's filter without getting errors about vector size?

Coming from base R, I expected the following to work as a way to pass anonymous functions
library(tidyverse)
starwars %>%
select(height) %>%
filter(function(x) x > 100)
It does not. It reports the following:
Error in `filter()`:
! Problem while computing `..1 = function(x) x > 100`.
Caused by error in `vec_size()`:
! `x` must be a vector, not a function.
Run `rlang::last_error()` to see where the error occurred.
I suspect that I'm making a fundamental error, but I see no comparable examples in the documentation. Where is my misunderstanding of how dplyr handles such cases?

how to combine curve() function with do.call() function? [duplicate]

Is it possible to have the software ignore unused arguments that are supplied when a module is run?
For example, I have a module multiply(a,b), which returns the product of a and b. I will receive an error if I call the module like so:
multiply(a=20,b=30,c=10)
Returning an error on this just seems a bit unnecessary, since the required inputs a and b have been specified. Is it possible to avoid this bad behaviour?
An easy solution would be just to stop specifying c, but that doesn't answer why R behaves like this. Is there another way to solve this?
Change the definition of multiply to take additional unknown arguments:
multiply <- function(a, b, ...) {
# Original code
}
The R.utils package has a function called doCall which is like do.call, but it does not return an error if unused arguments are passed.
multiply <- function(a, b) a * b
# these will fail
multiply(a = 20, b = 30, c = 10)
# Error in multiply(a = 20, b = 30, c = 10) : unused argument (c = 10)
do.call(multiply, list(a = 20, b = 30, c = 10))
# Error in (function (a, b) : unused argument (c = 10)
# R.utils::doCall will work
R.utils::doCall(multiply, args = list(a = 20, b = 30, c = 10))
# [1] 600
# it also does not require the arguments to be passed as a list
R.utils::doCall(multiply, a = 20, b = 30, c = 10)
# [1] 600
One approach (which I can't imagine is good programming practice) is to add the ... which is traditionally used to pass arguments specified in one function to another.
> multiply <- function(a,b) a*b
> multiply(a = 2,b = 4,c = 8)
Error in multiply(a = 2, b = 4, c = 8) : unused argument(s) (c = 8)
> multiply2 <- function(a,b,...) a*b
> multiply2(a = 2,b = 4,c = 8)
[1] 8
You can read more about how ... is intended to be used here.
You could use dots: ... in your function definition.
myfun <- function(a, b, ...){
cat(a,b)
}
myfun(a=4,b=7,hello=3)
# 4 7
I had the same problem as you. I had a long list of arguments, most of which were irrelevant, and I didn't want to hard-code them. This is what I came up with:
library(magrittr)
do_func_ignore_things <- function(data, what){
acceptable_args <- data[names(data) %in% (formals(what) %>% names)]
do.call(what, acceptable_args %>% as.list)
}
do_func_ignore_things(c(n = 3, hello = 12, mean = -10), "rnorm")
# -9.230675 -10.503509 -10.927077
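The same idea can be sketched in base R without magrittr. The helper name `call_with_known_args` is hypothetical. It assumes every argument is named, and it leaves the list untouched when the target function already accepts `...`:

```r
# Hypothetical helper: keep only the arguments that `f` declares, then call it.
call_with_known_args <- function(f, args) {
  f <- match.fun(f)                 # accept a function or its name as a string
  known <- names(formals(f))
  if (!("..." %in% known)) {        # a function with `...` accepts anything
    args <- args[names(args) %in% known]
  }
  do.call(f, as.list(args))
}

multiply <- function(a, b) a * b
call_with_known_args(multiply, list(a = 20, b = 30, c = 10))
# [1] 600
```

Note that `formals()` returns NULL for primitive functions such as `sum`, so this sketch would drop every argument for them; it is meant for ordinary closures only.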
Since there are already a number of answers directly addressing the question, and R is often used by technically skilled non-programmers, let me quickly outline why the error exists, and advise against suppression workarounds.
The number of parameters is an important part of a function's definition. If the number of arguments does not match, that's a good indication of a mismatch between the caller's intent and what the function is about to do. For this reason, passing an unexpected argument is an error in many programming languages, including Java, Python, and Haskell (in statically compiled languages it is caught at compile time). Indeed, stricter type checking in many of these languages will also cause errors if types mismatch.
As a program grows in size, and code ages, it becomes harder to spot whether mismatches of this kind are intended or are genuine bugs. This is why "clean code" (simple-to-read code with no errors or warnings) is a standard that professional programmers often work towards.
Accordingly, I recommend reworking the code to remove the unnecessary parameter. It will be simpler to understand and debug for yourself and others in the future.
Of course I understand that R users are often working on small scripts with limited lifespans, and the usual tradeoffs of large software engineering projects don't always apply. Maybe for your quick script, that will only be used for a week, it made sense to just suppress the error. However, it is widely observed (and I have seen in my own experience) that what code endures is rarely obvious at the time of writing. If you are pursuing open science, and publishing your code and data, it is especially helpful for that code to be useful in the future, to others, so they can reproduce your results.
A similar error is also thrown when using the select() function from the dplyr package and having loaded the MASS package too.
Minimal sample to reproduce:
library("dplyr")
library("MASS")
iris %>% select(Species)
will throw:
Error in select(., Species) : unused argument (Species)
To circumvent use:
library("dplyr")
library("MASS")
iris %>% dplyr::select(Species)
EXPLANATION:
Loading dplyr defines a select function. Loading MASS afterwards masks it with MASS's own select function. When select is then called without a package prefix, MASS::select() is executed, which expects different arguments.
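The mechanism can be imitated without either package. In this sketch, `pkg_a` and `pkg_b` are made-up stand-ins attached to the search path (they are not real packages, and their `select` functions are invented for illustration); whichever is attached last is found first:

```r
# Two hypothetical "packages", each providing a function named `select`.
pkg_a <- list(select = function(data, col) data[col])
pkg_b <- list(select = function(obj) "pkg_b version")

attach(pkg_a, name = "pkg_a", warn.conflicts = FALSE)
attach(pkg_b, name = "pkg_b", warn.conflicts = FALSE)  # attached last, found first

# Both definitions are on the search path, most recently attached first.
where <- find("select")

# The unqualified call reaches pkg_b's version, which rejects the extra argument.
msg <- tryCatch(select(iris, "Species"), error = function(e) conditionMessage(e))

# Naming the source explicitly still reaches the intended function,
# much as the dplyr:: prefix does for the real packages.
ok <- get("select", pos = "pkg_a")(iris, "Species")

# Clean up the search path.
detach("pkg_b")
detach("pkg_a")
```

With real packages, `conflicts()` lists the names that are defined in more than one place on the search path.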
R has a function prod() which does multiplication really well. The example the asker gave works with prod() without returning an error, although all three values are multiplied:
prod(a=20,b=30,c=10)
# 6000
In any case, an error highlighted is an opportunity to rectify it, so not a bad behaviour.

SWI-Prolog YALL conflict with dicts

SWI-Prolog version 8.0.3 for x64-win64, using yall for lambdas. (use_module(library(yall)).)
Trying to access a value in a dict, within a lambda, causes an error.
I think this is less a problem with yall, and more a problem with dicts in...let's call them "goal-as-value"s, because I'm not sure of the correct term. (For example, X = (Y = 1).) An example representative of my actual problem would be ?- L = [S]>>(S=a{x:_},S.x = 10)., but I'll give a simpler example to start.
Consider:
?- L = (S=a{x:_},S.x = 10).
ERROR: Arguments are not sufficiently instantiated
ERROR: In:
ERROR: [11] throw(error(instantiation_error,_11412))
ERROR: [8] '<meta-call>'(user:(...,...)) <foreign>
ERROR: [7] <user>
ERROR:
ERROR: Note: some frames are missing due to last-call optimization.
ERROR: Re-run your program in debug mode (:- debug.) to get more detail.
when I would instead expect something like the following:
?- L = (S=a{x:_},S.x = 10).
L = (S=a{x:_14168}, S.x=10).
Going back to lambdas, note that my intent can be accomplished, with e.g.
?- L = [S]>>(S=a{x:_},(.(S,x,10))).
L = [S]>>(S=a{x:_8692}, '.'(S, x, 10)).
It's just kinda horrible.
(Calling this lambda yields S = a{x:10}, as expected.)
This seems like a bug in SWI-Prolog, or at least an undocumented limitation. Have I missed something, or should I file a bug report?
As your L = (S=a{x:_},S.x = 10) query shows, the error has nothing to do with library(yall) but with dict semantics. When using functional notation, as in S.x = 10, SWI-Prolog performs eager evaluation of S.x during query compilation, i.e. before the S=a{x:_} goal is proved. Hence the instantiation error. As you found, avoiding functional notation by switching to the '.'(S, x, 10) goal solves the problem, because it then becomes the second goal proved in the conjunction.

Julia: How to do a by with columns defined as CategoricalArrays.NullableCategoricalArray{String,1,Int32}?

I have been struggling with a DataFrame loaded from Feather.jl when I try to do a by:
using Feather,DataFrames, DataFramesMeta, CategoricalArrays
a = Feather.read("some_file.feather")
# the below fails
aaa = by(a, :some_col, df -> sum(df[:some_val]))
It gives this error:
MethodError: Cannot convert an object of type String to an object of type CategoricalArrays.CategoricalValue{String,Int32}
The type information is as follows:
typeof(a)
# DataFrames.DataFrame
typeof(a[:some_col])
# CategoricalArrays.NullableCategoricalArray{String,1,Int32}
typeof(a[:some_val])
# NullableArrays.NullableArray{Float64,1}
The documentation for CategoricalArrays doesn't say much about working with DataFrames (nor should it, I guess).
However, when I replace the column with a test value, the by works:
a[:some_col] = ["Testing" for i in 1:nrow(a)]
#this works
by(a,:some_col, df -> sum(df[:some_val]))
So it must be something wrong with CategoricalArrays, but I can't figure out how to do this simple summary. Please help.
