Problem applying a function to a list in R

I have created a function that converts "YYYYQQ" to integer YYYYMMDD. The function works well with individual values in a list but not on the whole list. I am unable to understand the warning message.
GetProperDate <- function(x) {
x <- as.character(x)
q<-substr(x, 5, 6)
y<-substr(x, 1,4) %>% as.numeric()
if(q=="Q1"){
x <- as.integer(paste0(y,"03","31"))
}
if(q=="Q2"){
x <- as.integer(paste0(y,"06","30"))
}
if(q=="Q3"){
x <- as.integer(paste0(y,"09","30"))
}
if(q=="Q4"){
x <- as.integer(paste0(y,"12","31"))
}
return(x)
}
> GetProperDate("2019Q1")
[1] 20190331
> GetProperDate("2019Q2")
[1] 20190630
> GetProperDate("2019Q3")
[1] 20190930
> GetProperDate("2019Q4")
[1] 20191231
> date.list<-c("2019Q1","2019Q2","2019Q3","2019Q4")
> date.list.converted<- date.list %>% GetProperDate()
Warning messages:
1: In if (q == "Q1") { :
the condition has length > 1 and only the first element will be used
2: In if (q == "Q2") { :
the condition has length > 1 and only the first element will be used
3: In if (q == "Q3") { :
the condition has length > 1 and only the first element will be used
4: In if (q == "Q4") { :
the condition has length > 1 and only the first element will be used
> date.list.converted
[1] 20190331 20190331 20190331 20190331
>
As shown above I am getting a warning message and the output is not as expected.

The issue is that your function GetProperDate is not vectorised. if expects a single logical value, not a vector. You could switch to ifelse, which is vectorised, and rewrite your function.
Apart from that, you can also use as.yearqtr from zoo, which handles quarterly dates, and get the last date of each quarter by using frac = 1.
as.Date(zoo::as.yearqtr(date.list), frac = 1)
#[1] "2019-03-31" "2019-06-30" "2019-09-30" "2019-12-31"
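The ifelse suggestion above could be sketched roughly as follows. The name GetProperDateVec is hypothetical, and treating any unrecognised quarter label as Q4 is an assumption made only to keep the example short.

```r
# Hypothetical vectorised variant of the asker's function, using nested ifelse().
GetProperDateVec <- function(x) {
  x <- as.character(x)
  q <- substr(x, 5, 6)   # quarter label, e.g. "Q1"
  y <- substr(x, 1, 4)   # four-digit year
  # ifelse() works element by element, so q may have any length.
  # Assumption: anything that is not Q1, Q2 or Q3 is treated as Q4.
  suffix <- ifelse(q == "Q1", "0331",
            ifelse(q == "Q2", "0630",
            ifelse(q == "Q3", "0930", "1231")))
  as.integer(paste0(y, suffix))
}

date.list <- c("2019Q1", "2019Q2", "2019Q3", "2019Q4")
GetProperDateVec(date.list)
# [1] 20190331 20190630 20190930 20191231
```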

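When you pass a vector to the function, q == "Q1" yields a logical vector, but if expects a single TRUE or FALSE. R therefore uses only the first element, which is why you get the warning "the condition has length > 1 and only the first element will be used" (since R 4.2.0 this is an error rather than a warning). To apply the function element by element, try this: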
date.list<-c("2019Q1","2019Q2","2019Q3","2019Q4")
date.list.converted <- sapply(date.list, function(s) GetProperDate(s))
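Note that sapply over a character vector returns a named vector. If plain integers are wanted, vapply with USE.NAMES = FALSE is one option. In the sketch below, quarter_end is a hypothetical scalar stand-in for the asker's GetProperDate, included only so that the example runs on its own.

```r
# Hypothetical scalar helper standing in for GetProperDate.
quarter_end <- function(s) {
  s <- as.character(s)
  suffix <- switch(substr(s, 5, 6),
                   Q1 = "0331", Q2 = "0630", Q3 = "0930", Q4 = "1231",
                   stop("unrecognised quarter in: ", s))
  as.integer(paste0(substr(s, 1, 4), suffix))
}

date.list <- c("2019Q1", "2019Q2", "2019Q3", "2019Q4")

# vapply() checks that every call returns exactly one integer
# and, with USE.NAMES = FALSE, drops the names that sapply() would add.
vapply(date.list, quarter_end, integer(1), USE.NAMES = FALSE)
# [1] 20190331 20190630 20190930 20191231
```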

Try this:
library(tidyverse)
GetProperDate <- function(x) {
x <- as.character(x)
q <- substr(x, 5, 6)
y <- substr(x, 1,4) %>%
as.numeric()
x <- case_when(
q=="Q1" ~ as.integer(paste0(y,"03","31")),
q =="Q2" ~ as.integer(paste0(y,"06","30")),
q == "Q3" ~ as.integer(paste0(y,"09","30")),
TRUE ~ as.integer(paste0(y,"12","31")))
return(x)
}
date.list<-c("2019Q1","2019Q2","2019Q3","2019Q4")
> GetProperDate(date.list)
[1] 20190331 20190630 20190930 20191231

Related

Create indicator variables within a list

I have a list containing sequences of numbers. I want to create a list that indicates all non-zero elements up to the first element that matches a defined limit. I also want to create a list that indicates all non-zero elements after the first element to match the defined limit.
I prefer a base R solution. Presumably the solution will use lapply, but I have not been able to come up with a simple solution.
Below is a minimally reproducible example in which the limit is 2:
my.limit <- 2
my.samples <- list(0,c(1,2),0,c(0,1,1),0,0,0,0,0,c(1,1,2,2,3,4),c(0,1,2),0,c(0,0,1,1,2,2,3))
Here are the two desired lists:
within.limit <- list(0,c(1,1),0,c(0,1,1),0,0,0,0,0,c(1,1,1,0,0,0),c(0,1,1),0,c(0,0,1,1,1,0,0))
outside.limit <- list(0,c(0,0),0,c(0,0,0),0,0,0,0,0,c(0,0,0,1,1,1),c(0,0,0),0,c(0,0,0,0,0,1,1))
We can use match with the nomatch argument set to a very large number (it must be greater than the length of any vector in the list). Inf cannot be used here because nomatch is coerced to integer.
within.limit1 <- lapply(my.samples, function(x)
+(x > 0 & seq_along(x) <= match(my.limit, x, nomatch = 1000)))
outside.limit1 <- lapply(my.samples, function(x)
+(seq_along(x) > match(my.limit, x, nomatch = 1000)))
Checking that the output matches the expected lists:
all(mapply(function(x, y) all(x == y), within.limit, within.limit1))
#[1] TRUE
all(mapply(function(x, y) all(x == y), outside.limit, outside.limit1))
#[1] TRUE
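One way to avoid choosing an arbitrary large number for nomatch is to use length(x) + 1, which always lies just past the end of the vector. This is only a sketch: the helper name first_hit is hypothetical, and the sample list is a shortened version of the one in the question.

```r
my.limit <- 2
my.samples <- list(0, c(1, 2), 0, c(0, 1, 1),
                   c(1, 1, 2, 2, 3, 4), c(0, 0, 1, 1, 2, 2, 3))

# Position of the first element equal to the limit,
# or one past the end of x when the limit never occurs.
first_hit <- function(x, limit) match(limit, x, nomatch = length(x) + 1L)

within.limit2 <- lapply(my.samples, function(x)
  +(x > 0 & seq_along(x) <= first_hit(x, my.limit)))
outside.limit2 <- lapply(my.samples, function(x)
  +(seq_along(x) > first_hit(x, my.limit)))

within.limit2[[5]]
# [1] 1 1 1 0 0 0
outside.limit2[[5]]
# [1] 0 0 0 1 1 1
```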
I would do
within.limit <- lapply(my.samples, function(x)
+(x!=0 & (x<my.limit | cumsum(x == my.limit)==1)))
outside.limit <- lapply(my.samples, function(x)
+(x!=0 & (x>my.limit | cumsum(x == my.limit)>1)))
foo <- function(samples, limit, within = TRUE) {
`%cp%` <- if (within) `<=` else `>`
lapply(samples, function(x) pmin(x, seq_along(x) %cp% match(limit, x, nomatch = 1e8)))
}
> all.equal(foo(my.samples, my.limit, FALSE), outside.limit)
# [1] TRUE
> all.equal(foo(my.samples, my.limit, TRUE), within.limit)
# [1] TRUE
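We can use findInterval with left.open = TRUE, which counts the elements strictly below my.limit (this assumes each vector is sorted in non-decreasing order)
lapply(my.samples, function(x)
+(x > 0 & seq_along(x) <= findInterval(my.limit, x, left.open = TRUE) + 1))
and
lapply(my.samples, function(x) +(seq_along(x) > findInterval(my.limit, x, left.open = TRUE) + 1))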

Returning absent values without inducing integer (0)

I want to identify which values in one vector are present in another vector. Sometimes, in my application, none of the values of the first vector are present; in such cases I would like NA. My current approach returns integer(0) when this occurs:
l <- 1:3
m <- 2:5
n <- 4:6
l[l %in% m]
[1] 2 3
l[l %in% n]
integer(0)
This post discusses how to capture integer(0) using length, but is there a way to avoid integer(0) in the first place, and do this operation in just one step? Answers to the previous question suggest that any could be used but I fail to see how that would work in this example.
You could catch the integer(0) with a custom function:
l <- 1:3
m <- 2:5
n <- 4:6
returnsafe <- function(a, b) {
result <- a[a %in% b]
if(is.integer(result) && length(result) == 0L) {
return(NA)
} else {
return(result)
}
}
> returnsafe(l, n)
[1] NA
You can do:
l[match(l, n)]
[1] NA NA NA
Or:
any(l[match(l, n)])
[1] NA
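As a quick check of how these approaches behave when some values do match, the sketch below wraps the length test in a small helper. The name na_if_empty is hypothetical. It also shows that indexing with match returns positions in the second vector, so it may not give the matched values when a match exists.

```r
l <- 1:3
m <- 2:5
n <- 4:6

# Hypothetical helper: return NA for any zero-length result, otherwise the result itself.
na_if_empty <- function(x) if (length(x) == 0L) NA else x

na_if_empty(l[l %in% m])
# [1] 2 3
na_if_empty(l[l %in% n])
# [1] NA

# For comparison: match(l, m) is NA 1 2, so this indexes l by positions in m.
l[match(l, m)]
# [1] NA  1  2
```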

Creating an object of a custom class and assigning methods to it

I am trying to create an object of class "weeknumber", which would have the following format: "2019-W05"
Additionally, I need to be able to use this object with the + and - operators, similar to how "Date" variables behave in base R. For instance:
"2019-W05" + 1 = "2019-W06"
"2019-W01" - 1 = "2018-W52"
"2019-W03" - "2019-W01" = 2
I managed to partially achieve my goal. This is what I got so far:
weeknum <- function(date){
# Function that creates weeknumber object from a date
weeknumber <- paste(isoyear(date), formatC(isoweek(date), width = 2, format = "d", flag = "0"), sep = "-W")
class(weeknumber) <- c("weeknumber", class(weeknumber))
weeknumber
}
week2date <- function(weeknumber, weekday = 4) {
# Wrapper around ISOweek2date function from the 'ISOweek' package
ISOweek2date(paste(weeknumber, weekday, sep = "-"))
}
"+.weeknumber" <- function(x, ...) {
# Creating a method for addition
x <- week2date(x) + sum(...)*7
weeknum(x)
}
"-.weeknumber" <- function(x, ...) {
# Creating a method for subtraction
x <- week2date(x) - sum(...)*7
weeknum(x)
}
What works:
> x <- weeknum("2019-01-01")
> x
[1] "2019-W01"
attr(,"class")
[1] "weeknumber" "character"
> x + 1
[1] "2019-W02"
attr(,"class")
[1] "weeknumber" "character"
> x - 1
[1] "2018-W52"
attr(,"class")
[1] "weeknumber" "character"
Works as expected! The only annoying thing is that calling the variable also
prints out the attributes. Any way to hide them in the default print out?
What doesn't work:
> 1 + x
Error: all(is.na(weekdate) | stringr::str_detect(weekdate, kPattern)) is not TRUE
> y <- weeknum("2019-03-01")
> y - x
Error in as.POSIXlt.default(x) :
do not know how to convert 'x' to class “POSIXlt”
Any help appreciated!
Edit:
Figured out a solution how to make 1 + x (where x is a weeknumber) work. Not very elegant but does the job.
"+.weeknumber" <- function(...) {
# Creating a method for addition
vector <- c(...)
week_index <- which(unlist(lapply(list(...), function(x) class(x)[1]))=="weeknumber")
week <- vector[week_index]
other_values <- sum(as.numeric(c(...)[-week_index]))
x <- week2date(week) + other_values*7
weeknum(x)
}
> x <- weeknum("2019-01-01")
> x
[1] "2019-W01"
> 5 + x + 1 + 2 - 1
[1] "2019-W08"
For the first part: Define a custom print-method for your class:
print.weeknumber <- function(x,...)
{
attributes(x) <- NULL
print(x)
}
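For the part that does not work (1 + x and y - x), one possible direction is to write the methods with two arguments, e1 and e2, and check which side carries the class. The sketch below is not the asker's code: it swaps the ISOweek and lubridate calls for base R, the helper names iso_to_date and date_to_week are hypothetical, and it assumes that format() supports %G and %V on output on your platform. Unary minus is not handled.

```r
# Hypothetical helper: "YYYY-Www" -> Date of the given ISO weekday (default Thursday).
iso_to_date <- function(w, weekday = 4L) {
  w <- as.character(unclass(w))
  year <- as.integer(substr(w, 1, 4))
  week <- as.integer(substr(w, 7, 8))
  jan4 <- as.Date(paste0(year, "-01-04"))   # 4 January is always in ISO week 1
  monday_w1 <- jan4 - (as.integer(format(jan4, "%u")) - 1L)
  monday_w1 + (week - 1L) * 7L + (weekday - 1L)
}

# Hypothetical helper: Date (or date string) -> weeknumber object.
date_to_week <- function(d) {
  structure(format(as.Date(d), "%G-W%V"), class = c("weeknumber", "character"))
}

"+.weeknumber" <- function(e1, e2) {
  # Either operand may be the weeknumber; the other is a number of weeks.
  if (inherits(e1, "weeknumber")) {
    date_to_week(iso_to_date(e1) + 7 * e2)
  } else {
    date_to_week(iso_to_date(e2) + 7 * e1)
  }
}

"-.weeknumber" <- function(e1, e2) {
  if (inherits(e1, "weeknumber") && inherits(e2, "weeknumber")) {
    # Difference between two weeknumbers, in weeks.
    as.numeric(iso_to_date(e1) - iso_to_date(e2)) / 7
  } else if (inherits(e1, "weeknumber")) {
    date_to_week(iso_to_date(e1) - 7 * e2)
  } else {
    stop("cannot subtract a weeknumber from a number")
  }
}

x <- date_to_week("2019-01-01")
y <- date_to_week("2019-03-01")
unclass(1 + x)   # "2019-W02"
unclass(x - 1)   # "2018-W52"
y - x            # 8
```

Because R dispatches arithmetic operators on either operand, a two-argument method sees 1 + x with e1 = 1 and e2 = x, which is why the class check is needed.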

Logical not correctly evaluated?

I am wondering why my first if statement returns an error when my input data is an object of class numeric.
I have clearly stated that the first if statement should only run if the data class is "data.frame", but when the data class is numeric, this first if statement returns an error. Am I missing anything here?
Update:
I have changed instances of & to && but when data is a data.frame, the function doesn't produce any output? For example, run: standard(mtcars)
standard <- function(data){
if(class(data) == "data.frame" && ncol(data) > 1){
data[paste0(names(data), ".s")] <- scale(data)
data
}
if(class(data) == "data.frame" && ncol(data) == 1){
colnames(data) <- paste0(names(data), ".s")
data <- scale(data)
data
}
if(class(data) != "data.frame"){
d <- as.data.frame(data)
colnames(d) <- paste0("Var", ncol(d), ".s")
data <- scale(d)
data
}
}
###### EXAMPLES: #######
standard(mtcars[,2]) ##Problem: `Error in if(class(data) == "data.frame" & ncol(data) > 1)`
standard(mtcars["wt"]) ## OK
standard(mtcars) ## after UPDATE, doesn't give any output
am I missing anything here?
& evaluates both operands, while && short-circuits and does not evaluate the second operand when the first already determines the result
FALSE && stop("boh")
#R> [1] FALSE
TRUE && stop("boh")
#R> Error: boh
FALSE & stop("boh")
#R> Error: boh
See help("Logic")
& and && indicate logical AND and | and || indicate logical OR. The shorter form performs elementwise comparisons in much the same way as arithmetic operators. The longer form evaluates left to right examining only the first element of each vector. Evaluation proceeds only until the result is determined.
After your edits
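You do not get any result because the function returns the value of its last expression, which here is the final if. When that condition is FALSE, the result is an invisible NULL. Either call return or use if ... else. See help("function") and help("if"). Here is a small example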
f1 <- function(x){
if(x < 0){
x <- -1
x
}
if(x > 0){
x <- 1
x
}
}
f1(-1) # prints nothing: the last if is not entered, so an invisible NULL is returned
f2 <- function(x){
if(x < 0){
x <- -1
x
}
else if(x > 0){
x <- 1
x
}
}
f2(-1)
#R> [1] -1
f3 <- function(x){
if(x < 0){
x <- -1
return(x)
}
if(x > 0){
x <- 1
return(x)
}
}
f3(-1)
#R> [1] -1
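Applied to the function in the question, the if ... else if ... else pattern could look roughly like this. It is a sketch, not a definitive rewrite: it uses is.data.frame in place of class(data) == "data.frame", and it assumes that any data frame without more than one column should take the single-column branch.

```r
standard <- function(data) {
  if (is.data.frame(data) && ncol(data) > 1) {
    # Several columns: append scaled copies with a ".s" suffix.
    data[paste0(names(data), ".s")] <- scale(data)
    data
  } else if (is.data.frame(data)) {
    # Single-column data frame: rename, then scale (returns a matrix).
    colnames(data) <- paste0(names(data), ".s")
    scale(data)
  } else {
    # Anything else (e.g. a numeric vector): convert first.
    d <- as.data.frame(data)
    colnames(d) <- paste0("Var", ncol(d), ".s")
    scale(d)
  }
}

ncol(standard(mtcars))             # 22
colnames(standard(mtcars["wt"]))   # "wt.s"
colnames(standard(mtcars[, 2]))    # "Var1.s"
```

Because the three branches form one if ... else chain, the value of whichever branch runs is the value of the function, so no explicit return is needed.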
tl;dr you should use && rather than & when doing flow-control, because & always evaluates its second argument, while && short-circuits if the first argument is false. If the argument isn't a data frame (or matrix) then ncol(x) doesn't make sense: see e.g. this question for more information.
Go ahead and unpack it with a simple example.
x <- 1:5
The first part is fine:
class(x) ## "integer"
class(x)=="data.frame" ## FALSE
(although note that you have to be careful, because class(x) might be a vector with more than one element: inherits(x,"data.frame") is safer).
The second part causes the problem:
ncol(x) ## NULL (uh-oh)
ncol(x)>1 ## logical(0) (uh-oh)
Put them together:
class(x)=="data.frame" & ncol(x)>1 ## logical(0)
What does this do?
if (logical(0)) print("hello")
Error in if (logical(0)) print("hello") : argument is of length zero
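The note above about inherits being safer can be illustrated with an object that has more than one class. The class name my_tbl is hypothetical and stands in for any class that extends data.frame.

```r
# An object whose class vector has two elements (hypothetical class "my_tbl").
x <- structure(list(a = 1:3), class = c("my_tbl", "data.frame"))

class(x) == "data.frame"    # FALSE TRUE: length 2, not usable as an if condition
inherits(x, "data.frame")   # TRUE: always a single logical value
is.data.frame(x)            # TRUE

# With && the second operand is only evaluated when the first is TRUE,
# so ncol() is never called on a plain vector here.
v <- 1:5
is.data.frame(v) && ncol(v) > 1   # FALSE
```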

Why is my unlist result shorter than my list?

Why does unlisting my list data structure result in a different length? The length of the list is 13951. The length of the unlisted result is 13654. There are no NULL's or NA's.
> class(price)
[1] "list"
> head(price)
$`570`
[1] 0
$`440`
[1] 0
$`730`
[1] 1499
$`304930`
[1] 0
$`550`
[1] 1999
$`230410`
[1] 0
length(names(price)) # 13951
length(price) # 13951
length(unlist(price)) # 13654
> sum(is.na(price))
[1] 0
> sum(is.null(price))
[1] 0
How do I ensure the unlist length is the same as the list length?
-- ATTEMPTED SOLUTION BELOW:
> out <- do.call(c, lapply(price, (function(x) {
+ if (is.null(x)) {NA} else { x }
+ })))
> length(out) #2
[1] 13654
> table(sapply(price, class))
numeric
13951
One thing is to count elements in the list. Another is to count their lengths...
a.list <- list(a= 1, b= NULL, c = list())
length(a.list) # 3
sum(lengths(a.list)) # 1 {as suggested by Nicola}
# same as: sum(sapply(a.list, length)) # 1
I assume you have named list elements that are NULL (or have length 0). When you unlist, those are dropped.
length(unlist(a.list)) #1
If you want to extract something from all named elements (replacing a NULL with a NA) you could do as follows.
out <- do.call(c, lapply(a.list, (function(x) {
if (is.null(x) || length(x) == 0) {NA} else { x }
})))
length(out) #3
This assumes you have no 2-level lists.
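To find which elements cause the mismatch, lengths can be used to locate the zero-length entries before unlisting. The data below is made up for illustration (the real price list is not shown in full), and the sketch assumes that every non-empty element has length 1.

```r
# Made-up stand-in for the asker's list; `440` is an empty numeric vector.
price <- list(`570` = 0, `440` = numeric(0), `730` = 1499)

empty <- lengths(price) == 0
names(price)[empty]
# [1] "440"

# Replace the empty elements with NA so that unlist() keeps one value per name.
price[empty] <- NA_real_
flat <- unlist(price)
length(flat) == length(price)
# [1] TRUE
```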

Resources