Operator | for choosing the data to load - r

I would like to use if to load data from CSV files that I choose at the beginning of my script.
I use this statement:
if(which_data == "data1") {tbl <- read.csv("aaa.csv")}
but I would like to add the OR operator | so that the same data is loaded for either of two different values of which_data.
The statement should look like:
if(which_data == "data1" | "data2") {tbl <- read.csv("aaa.csv")}
but the problem is that this operator can be used only for numeric, logical or complex types. What else can I do?

Test if your variable is "in" one of the values:
if(which_data %in% c("data1", "data2")) {tbl <- read.csv("aaa.csv")}
Note that | may not do what you expect with numeric types either:
> 3 == 2|3
[1] TRUE
> 3 == 2|1
[1] TRUE
Because == binds more tightly than |, it's testing (3==2) or (1), and in R any non-zero number such as 1 counts as TRUE, so the expression 3==2|1 is TRUE.
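To see the difference side by side, here is a small sketch with a hypothetical value for which_data. It only compares the conditions and does not read any file.

```r
which_data <- "data2"  # hypothetical value set at the top of the script

# Each side of | must be a complete comparison
explicit <- which_data == "data1" | which_data == "data2"

# %in% tests membership in a vector of allowed names
member <- which_data %in% c("data1", "data2")

# == binds more tightly than |, so this is (3 == 2) | 1
precedence <- 3 == 2 | 1

print(c(explicit = explicit, member = member, precedence = precedence))
```

All three values come out TRUE, but only the first two test what the question intended.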

Related

Combining logical functions in R

I'm running several tests on a given object x. For a single test (a test being a function that returns TRUE or FALSE when applied to an object) this is quite easy, as you can do lapply(x, test). For example:
# This returns a list containing TRUE
lapply('a', is.character)
However, I would like to create a function pass_tests, which would be able to combine multiple tests, i.e. that it could run something like this:
pass_tests('a', is.character | is.numeric)
So it should accept several test functions combined in one argument and combine their results when they are applied to an object x. In this case, it would return whether 'a' is character OR numeric, which would be TRUE. The following line should return FALSE:
pass_tests('a', is.character & is.numeric)
The idea is that it could be flexible for different combinations, e.g.:
pass_tests(x, test1 & (test2 | test3))
Any idea if functions can be logically combined this way?
Another option would be to use the magrittr pipe
library(magrittr) # or dplyr
"a" %>% {is.character(.) & is.numeric(.)}
#FALSE
"a" %>% {is.character(.) | is.numeric(.)}
#TRUE
1 %>% {is.finite(.) & (is.character(.) | is.numeric(.))}
#TRUE
Edit: the same idea wrapped in a function that takes the test as a string
pass_test <- function(x, expr) {
x %>% {eval(parse(text = expr))}
}
pass_test(1, "is.finite(.) & (is.character(.) | is.numeric(.))")
#TRUE
The argument expr can be a string or an expression as in expression(is.finite(.) & (is.character(.) | is.numeric(.))).
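If you would rather not pass the test as a string, a base R sketch along the same lines is to capture the expression unevaluated and evaluate it with . bound to x. The name pass_tests comes from the question; the body below is only one possible way to write it, not an established API.

```r
# Sketch: capture the expression unevaluated, then evaluate it with `.` bound to x.
# Function names such as is.numeric are looked up in the caller's environment.
pass_tests <- function(x, expr) {
  eval(substitute(expr), list(. = x), parent.frame())
}

r1 <- pass_tests("a", is.character(.) | is.numeric(.))
r2 <- pass_tests("a", is.character(.) & is.numeric(.))
r3 <- pass_tests(1, is.finite(.) & (is.character(.) | is.numeric(.)))
print(c(r1, r2, r3))
```

Because the expression is ordinary R code, & and | keep their usual precedence and parentheses work as expected.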
Here's another way to do it by creating infix operators.
`%and%` <- function(lhs, rhs) {
function(...) lhs(...) & rhs(...)
}
`%or%` <- function(lhs, rhs) {
function(...) lhs(...) | rhs(...)
}
(is.character %and% is.numeric)('a')
#> [1] FALSE
(is.character %or% is.numeric)('a')
#> [1] TRUE
These can be chained together. However, a chain will not follow the normal AND/OR precedence: user-defined %...% operators all share one precedence level, so it is evaluated left to right.
(is.double %and% is.numeric %and% is.finite)(12)
#> [1] TRUE
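Since the chain is evaluated left to right, parentheses are the way to get the usual "AND before OR" grouping. A small sketch, repeating the two operator definitions so it runs on its own:

```r
`%and%` <- function(lhs, rhs) function(...) lhs(...) & rhs(...)
`%or%`  <- function(lhs, rhs) function(...) lhs(...) | rhs(...)

# No parentheses: read as (is.character %or% is.numeric) %and% is.finite
left_to_right <- (is.character %or% is.numeric %and% is.finite)("a")

# Explicit grouping: is.character %or% (is.numeric %and% is.finite)
grouped <- (is.character %or% (is.numeric %and% is.finite))("a")

print(c(left_to_right = left_to_right, grouped = grouped))
```

For "a" the ungrouped chain gives FALSE, while the grouped version gives TRUE, which matches what is.character(x) | is.numeric(x) & is.finite(x) would return.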

OR Operator as function switch

I feel a bit embarrassed to ask that rather simple question but I'm searching for a couple of hours now and can't get my head around.
I'm trying to build a switch for my function:
output <- "both"
if (output== "both" | "partone")
{cat("partone")}
if (output=="both" | "parttwo")
{cat("parttwo")}
This should produce partone and parttwo, whereas output <- "partone" should produce just partone.
How could this work?
Use something like this.
if (output %in% c("both","partone"))
{cat("partone")}
if (output %in% c("both","parttwo"))
{cat("parttwo")}
It will produce your desired output.
If we evaluate the logical condition on its own, it throws an error:
output== "both" | "partone"
Error in output == "both" | "partone" : operations are possible
only for numeric, logical or complex types
As we need to check for either 'both' or 'partone', use the %in% on a vector of string elements
output %in% c('both', 'partone')
#[1] TRUE
Now, create a function for reusability
f1 <- function(out, vec) {
if(out %in% vec) cat(setdiff(vec, 'both'), '\n')
}
output <- 'both'
f1(output, c('both', 'partone'))
#partone
f1(output, c('both', 'parttwo'))
#parttwo
output <- 'partone'
f1(output, c('both', 'partone'))
#partone
f1(output, c('both', 'parttwo'))
#(prints nothing, because 'partone' is not in the vector)
This condition is wrong, because | is applied to the bare string "partone" instead of to a second comparison:
if (output== "both" | "partone")
{cat("partone")}
You can write like this:
if (output == "both" || output == "partone")
{cat("partone")}
Or like this:
if (output %in% c("both", "partone"))
{cat("partone")}
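If the switch is going to live inside a function anyway, the two checks can be collected in one place. This is only a sketch; show_parts is a made-up name, and it returns the selected parts instead of printing them so the result can be reused.

```r
# Hypothetical helper: returns the names of the parts selected by `output`
show_parts <- function(output) {
  parts <- character(0)
  if (output %in% c("both", "partone")) parts <- c(parts, "partone")
  if (output %in% c("both", "parttwo")) parts <- c(parts, "parttwo")
  parts
}

both_parts <- show_parts("both")
one_part   <- show_parts("partone")
print(both_parts)
print(one_part)
```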

Using lapply to list percentage of null variables in every column in R

I was given a large csv that is 115 columns across and 1000 rows. The columns have a variety of data, some is character-based, some is integer, etc. However, the data has a LOT of null variables of varying types (NA, -999, NULL, etc.).
What I want to do is write a script that will generate a LIST of columns where over 30% of the data in the column is a NULL of some type.
To do this, I wrote a script to give me the null percentage (as decimal) for one column. This script works fine for me.
length(which(indata$ObservationYear == "" | is.na(indata$ObservationYear) |
indata$ObservationYear == "NA" | indata$ObservationYear == "-999" |
indata$ObservationYear == "0"))/nrow(indata)
I want to write a script to do this for all columns. I believe I need to use the lapply function.
I attempted to do this here, however, I can't seem to get this script to work at all:
Null_Counter <- lapply(indata, 2, length(x),
length(which(indata == "" | is.na(indata) | indata == "NA" | indata == "-999" | indata == "0")))
names(indata(which(0.3>=Null_Counter / nrow(indata))))
I get the following errors:
Error in match.fun(FUN) : '2' is not a function, character or symbol
and:
Error: could not find function "indata"
Ideally, what I want it to give me is a vector LIST of all column names where the percentage of all null variables (NA, -999, 0, NULL) is over 30%.
Can anyone help?
I believe you want to use apply rather than lapply, which applies a function to each element of a list.
Try this:
Null_Counter <- apply(indata, 2, function(x) length(which(x == "" | is.na(x) | x == "NA" | x == "-999" | x == "0"))/length(x))
Null_Name <- colnames(indata)[Null_Counter >= 0.3]
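One thing to watch: apply() converts the data frame to a matrix first, and with mixed column types the numbers may be turned into space-padded strings, so a comparison like x == "0" can miss values. A column-wise sketch with sapply() avoids that conversion. The toy indata below is made up for illustration, and the set of "null" markers is the one from the question.

```r
# Toy stand-in for the real data (hypothetical values)
indata <- data.frame(
  a = c(1, -999, 0, 250),
  b = c("x", "", "NA", "y"),
  c = c(10, 20, 30, NA),
  stringsAsFactors = FALSE
)

# Share of "null-like" entries per column; %in% compares numbers as strings
null_share <- sapply(indata, function(x) {
  mean(is.na(x) | x %in% c("", "NA", "-999", "0"))
})

# Columns where more than 30% of the entries are null-like
flagged <- names(null_share)[null_share > 0.3]
print(null_share)
print(flagged)
```

Here columns a and b have half of their entries flagged and c has a quarter, so only a and b are listed.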
Here's a different way to do this in data.table:
#first, make a reproducible example:
library(data.table)
#make it so that all columns have ~30% "NA" as you define it
dt <- as.data.table(replicate(
  115, sample(c(1:100, "", NA, "NA", -999, 0), size = 1000, replace = TRUE,
              prob = c(rep(.007, 100), rep(.06, 5)))))
Now, figure out which are troublesome:
x<-as.matrix(dt[,lapply(.SD,function(x){
mean(is.na(x) | x %in% c("","NA","-999","0"))})])
colnames(x)[x>.3]
There's probably a more concise way of doing this, but it's eluding me.
If you're trying to drop those columns, this could be adjusted:
dt[, !colnames(x)[x > .3], with = FALSE]

if function and length of the logical vector

I have a dataframe where the dates are given as hydrological years (October to September). To change this I am trying to use an if statement:
if(cet$month== 10|cet$month==11|cet$month==12)
cet$year <- substr(as.character(cet[,2]),1,4) else
cet$year <- substr(as.character(cet[,2]),6,9)
but I get an error:
the condition has length > 1 and only the first element will be used
Reading the "if" help file I realized that the condition has to be a length-one logical vector. Is there no way of using an "or" with an "if"? All I want is to apply that expression if the month is October, November or December.
ifelse is the vectorised version. You can also use %in% to reduce the number of statements.
cet$year <- ifelse(cet$month%in%(10:12), substr(as.character(cet[,2]),1,4), substr(as.character(cet[,2]),6,9))
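Here is the same idea on a tiny made-up data frame, so it can be run as is. The column name hydro_year and the "1990-1991" format are assumptions based on the substr() positions in the question.

```r
# Hypothetical data: month number plus a hydrological year such as "1990-1991"
cet <- data.frame(
  month = c(9, 10, 12, 1),
  hydro_year = c("1989-1990", "1990-1991", "1990-1991", "1990-1991"),
  stringsAsFactors = FALSE
)

# October to December belong to the first year, all other months to the second
cet$year <- ifelse(cet$month %in% 10:12,
                   substr(cet$hydro_year, 1, 4),
                   substr(cet$hydro_year, 6, 9))
print(cet)
```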
Ok, here's a reproducible example that should help to clarify things:
# generate some vector
x <- c(1,2,4,4,5,5,6,6,6)
# have a check using OR, return values
x[x == 2 | x == 1]
## or return TRUE / FALSE
(x == 2 | x == 1)
or check ?ifelse
EDIT: Note that for characters you need to use "", like x == "yourchars" | x == "someotherchars"
Here's also a simple reference on how to work with operators: QuickR
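The OR operator meant for if() is the double pipe:
| => || in the if()
Note that || expects length-one operands, so it suits a single value rather than a whole column such as cet$month.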
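A short sketch of the distinction, using a made-up vector of months: | works element by element, and the result has to be reduced to a single TRUE or FALSE (here with any()) before it can go into if().

```r
month <- c(9, 10, 12, 1)  # hypothetical vector of months

# Element-wise OR gives one TRUE/FALSE per element
is_autumn <- month == 10 | month == 11 | month == 12

# if() needs a single value, so reduce the vector first
if (any(is_autumn)) {
  msg <- "at least one month is in October to December"
} else {
  msg <- "no month is in October to December"
}
print(is_autumn)
print(msg)
```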

how do i pass parameters to subset()?

I am building a gui which lets me select a subset of a data.frame by clicking on the various factor names.
After having received user input, how do i pass it to the subset function?
e.g.: I have a dataframe df with factors MORNING and EVENING in column timeofday and RECEIVE and SEND in column optype. From the GUI I know that the user wants a subset containing only RECEIVE operations, so I have the following strings as well:
RequestedFactor1 which equals "optype"
RequestedRelationship1 equals "=="
RequestedValue1 which equals "RECEIVE"
What can i do to those strings to pass them to subset, so that I will receive the same output as if i had called subset(df,optype=="RECEIVE") ?
TIA
For this you can use an eval-parse construct, but again I warn that this is actually tricky business. Please read the help files about these two very carefully. So in your case this becomes:
subset(df,eval(parse(text=paste(RF1,RR1,RV1))))
An example to illustrate some tricky parts:
> RF1 <- "optype"
> RR1 <- "=="
> RV1 <- "\"RECEIVE\""
> optype <- c("RECEIVE","Not")
> ifelse(eval(parse(text=paste(RF1,RR1,RV1))),1,0)
[1] 1 0
Mind the escaped quote marks (\"). They are necessary because you want to test against the string "RECEIVE", not against an object named RECEIVE. Alternatively you can do:
> RF1 <- "optype"
> RR1 <- "=="
> RV1 <- "Text"
> optype <- c("RECEIVE","Not")
> Text <- "RECEIVE"
> ifelse(eval(parse(text=paste(RF1,RR1,RV1))),1,0)
[1] 1 0
The comparison operators in R are actually special functions, so you can use do.call to run them. There is no need for eval and parse, or for the potential headaches that come with them. For example:
rf1 <- 'Species'
rr1 <- '=='
rv1 <- 'setosa'
subset(iris, do.call(rr1, list( get(rf1), rv1 ) ) )
You need to "get" the variable so that you have the variable value rather than the name, the rest can be the character strings.
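Since the strings come from a GUI, it may be worth restricting the operator to a known set before calling it. The following is only a sketch: filter_by is a made-up helper name, and it indexes the column with [[ instead of using get() inside subset().

```r
# Hypothetical helper: filter a data frame by column name, operator and value
filter_by <- function(data, column, op, value) {
  # Only allow the comparison operators we expect from the GUI
  op <- match.arg(op, c("==", "!=", "<", "<=", ">", ">="))
  keep <- do.call(op, list(data[[column]], value))
  # Drop rows where the comparison is FALSE or NA
  data[!is.na(keep) & keep, , drop = FALSE]
}

res <- filter_by(iris, "Species", "==", "setosa")
print(nrow(res))
```

With the built-in iris data this keeps the 50 setosa rows, and an operator outside the allowed set stops with an error instead of being executed.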
