R unwantedly converting character to date instead of numeric

I'm passing arguments from a shell script to an R script:
library(rLandsat) #used later in the script
args<-commandArgs(trailingOnly=TRUE)
max_date = Sys.Date()-as.numeric(args[1])
min_date = Sys.Date()-(as.numeric(args[1])+as.numeric(args[2]))
path<-as.numeric(as.character(args[4]))
row<-as.numeric(as.character(args[5]))
cloud<-as.numeric(as.character(args[6]))
foldername<-as.character(args[7])
for(i in args){
print(typeof(i))
}
print(args)
print(c(max_date,min_date,path,row,cloud,foldername))
for(i in c(max_date,min_date,path,row,cloud,foldername)){
print(typeof(i))
}
and R is for some reason converting the arguments to some kind of date that is still a character. Here is the output from the script. args[3] is used later, but I should probably check that too. I know each argument is already a character, but I got the same values using only as.numeric() for path, row and cloud. The first two values are returned correctly ("2016-12-31" and "2016-01-01"), but for the others I would like the value of the original argument returned. I will check out list() instead of c().
[1] "character"
[1] "character"
[1] "character"
[1] "character"
[1] "character"
[1] "character"
[1] "1048" "365" "Yellowstone" "38" "29"
[6] "20" "2016"
[1] "2016-12-31" "2016-01-01" "1970-02-08" "1970-01-30" "1970-01-21"
[6] "1975-07-10"
[1] "character"
[1] "character"
[1] "character"
[1] "character"
[1] "character"
[1] "character"

c() was causing the error. See MichaelChirico's comment

You really should look into the lubridate package for operations involving dates: https://github.com/rstudio/cheatsheets/raw/master/lubridate.pdf
As a rule of thumb, when you combine values of different classes, they are coerced to a single common type. This happens frequently with c(), because all elements of the atomic vector it builds must share one type. In your case, mixing dates, numerics and a character in one c() call gives the mixed-up results you see.
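A minimal sketch of the list() approach mentioned in the question. The variable names follow the question, but the literal values are assumptions standing in for the parsed command-line arguments. Unlike c(), list() keeps each element's class:

```r
# Hypothetical stand-ins for the values parsed from commandArgs()
max_date   <- as.Date("2016-12-31")
path       <- 38
foldername <- "2016"

# list() stores each value unchanged, so nothing is coerced
vals <- list(max_date = max_date, path = path, foldername = foldername)
classes <- sapply(vals, class)
print(classes)

# An atomic vector holds a single type: a number plus a string becomes character
mixed <- c(38, "2016")
print(typeof(mixed))
```

Printing the list (or looping over it) then shows the date as a date and the numbers as numbers.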

Related

VarSelLCM returns error but input variables seem to be factors and integer

I have this simple dataset
https://www.mediafire.com/file/ntmu0tvtpm73h2i/data.xlsx/file
library(readxl)
library(VarSelLCM)
data <- read_xlsx("C:/User/data.xlsx")
data$type <- as.factor(data$type)
data$T3 <-as.integer(data$T3)
data$T5 <-as.integer(data$T5)
data$T14 <-as.integer(data$T14)
data$T15 <-as.integer(data$T15)
data$T18 <-as.integer(data$T18)
data$T22 <-as.integer(data$T22)
When I run lapply(data, class)
I get
$`T3`
[1] "integer"
$T5
[1] "integer"
$T14
[1] "integer"
$T15
[1] "integer"
$T18
[1] "integer"
$T22
[1] "integer"
$type
[1] "factor"
So everything seems to be OK.
But when I run
res_with <- VarSelCluster(data, gvals = 1:4, nbcores = 1, crit.varsel = "BIC")
I get the Error
Error in VSLCMdataMixte(x) :
At least one variable is neither numeric, integer nor factor!
The problem perhaps could be that when I run
> print(typeof(data$type))
I get this:
[1] "integer"
So it seems that data$type is an integer, even though I converted it to a factor? And it is indeed a factor, as shown by lapply.
But even if data$type were an integer, that should be fine for VarSelCluster, because it accepts integer variables. This is totally confusing.
How could I solve this?
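VarSelCluster requires the data as a plain data.frame, but read_xlsx() returns a tibble, so first convert it with data <- as.data.frame(data) (or set class(data) = "data.frame"). The "integer" reported by typeof(data$type) is not the problem: a factor is stored internally as integer codes, so typeof() returns "integer" while class() returns "factor".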
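The typeof() result that confused the asker can be reproduced in base R alone. This is a small sketch with a made-up factor, not the asker's data:

```r
# A made-up factor for illustration
f <- as.factor(c("a", "b", "a"))

f_class <- class(f)   # the S3 class attribute
f_type  <- typeof(f)  # the internal storage type of the level codes

print(f_class)
print(f_type)
print(is.factor(f))
```

So class() and is.factor() are the checks that matter here; typeof() only describes how the factor is stored.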

Loop over dates in R

I want to loop over a series of dates in R. Here's some sample code:
myDates <- seq.Date(as.Date("2020-01-01"), as.Date("2020-01-03"), by = "day")
myDates[1]
class(myDates[1])
This creates a vector of dates, and I confirm this by printing and checking the class of the first element.
However, when I run this loop:
for (myDate in myDates) print(myDate)
I get this output:
[1] 18262
[1] 18263
[1] 18264
Having checked out this question I've got some workarounds to solve my immediate issue, but can anyone explain to me why this happens, and if there's a simple way to iterate directly over a vector of dates?
The reason has been explained by @r2evans in the comments on your post. There are a couple of ways to work around the issue, e.g.,
> d <- Map(print,myDates)
[1] "2020-01-01"
[1] "2020-01-02"
[1] "2020-01-03"
or
> for (myDate in as.character(myDates)) print(myDate)
[1] "2020-01-01"
[1] "2020-01-02"
[1] "2020-01-03"
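Two further ways to iterate while keeping the Date class, sketched here as alternatives to the workarounds above (they are not taken from the original answer): loop over positions and subset with [, or loop over as.list() of the vector.

```r
myDates <- seq.Date(as.Date("2020-01-01"), as.Date("2020-01-03"), by = "day")

# Option 1: loop over positions; subsetting with [ keeps the Date class
by_index <- character(0)
for (i in seq_along(myDates)) {
  by_index <- c(by_index, class(myDates[i]))
}

# Option 2: loop over a list of one-element Date vectors
by_list <- character(0)
for (d in as.list(myDates)) {
  by_list <- c(by_list, class(d))
}

print(by_index)
print(by_list)
```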

R: What are dates in a dates vector: dates or numeric values? (difference between x[i] and i)

Could anyone explain please why in the first loop each element of my dates vector is a date while in the second each element of my dates vector is numeric?
Thank you!
x <- as.Date(c("2018-01-01", "2018-01-02", "2018-01-02", "2018-05-06"))
class(x)
# Loop 1 - each element is a Date:
for (i in seq_along(x)) print(class(x[i]))
# Loop 2 - each element is numeric:
for (i in x) print(class(i))
The elements are Dates; the first loop is correct.
Unfortunately, R does not handle the second loop style consistently. I believe the issue is that the for (i in x) syntax bypasses the Date methods for accessors like [, which it can do because S3 classes in R are very thin and do not stop you from going around their intended interfaces. This can be confusing because something like for (i in 1:4) print(i) works directly, since numeric is a base vector type. Date is an S3 class on top of a numeric vector, so the loop sees the underlying numeric values. To see the numeric objects that are printed in the second loop, you can run this:
x <- as.Date(c("2018-01-01", "2018-01-02", "2018-01-02", "2018-05-06"))
for (i in x) print(i)
#> [1] 17532
#> [1] 17533
#> [1] 17533
#> [1] 17657
which is giving you the same thing as the unclassed version of the Date vector. These numbers are the days since the beginning of Unix time, which you can also see below if you convert them back to Date with that origin.
unclass(x)
#> [1] 17532 17533 17533 17657
as.Date(unclass(x), "1970-01-01")
#> [1] "2018-01-01" "2018-01-02" "2018-01-02" "2018-05-06"
So I would stick to using the proper accessors for any S3 vector types as you do in the first loop.
When you run:
for (i in seq_along(x)) print(class(x[i]))
Here i is an index into x, and x[i] subsets the Date vector with [, which keeps the Date class. So each time you get the class of one element of x.
However, when you run:
for (i in x) print(class(i))
You're asking for the class of each underlying value. From ?Date:
Dates are represented as the number of days since 1970-01-01
This is why you get numeric as the class.
Moreover, if you use print() in each loop, you get dates in the first and numbers in the second:
for (i in seq_along(x)) print(x[i])
[1] "2018-01-01"
[1] "2018-01-02"
[1] "2018-01-02"
[1] "2018-05-06"
and
for (i in x) print(i)
[1] 17532
[1] 17533
[1] 17533
[1] 17657
Lastly, if you want to check R's logic, you can do something like this:
x[1] - as.Date("1970-01-01")
This takes the first element of x ("2018-01-01") and subtracts "1970-01-01", the origin date. The output is:
Time difference of 17532 days
If you look at ?'for', you'll see that for (var in seq) is only defined when seq is "An expression evaluating to a vector", and is.vector(x) is FALSE here. So the documentation says (maybe not so clearly) that the behavior in this case is undefined, which is why the result is unexpected.
As joran mentions, as.vector(x) returns a numeric vector, same as unclass(x) mentioned by Calum You.
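If you do want to keep the for (i in x) form, one option is to rebuild the Date inside the loop. This is a minimal sketch that assumes the standard 1970-01-01 origin described above:

```r
x <- as.Date(c("2018-01-01", "2018-01-02", "2018-01-02", "2018-05-06"))

# A Date vector carries a class attribute, so is.vector() is FALSE
print(is.vector(x))

# Inside for (i in x), i is a bare number; convert it back explicitly
restored <- character(0)
for (i in x) {
  d <- as.Date(i, origin = "1970-01-01")
  restored <- c(restored, format(d))
}
print(restored)
```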

How to avoid implicit character conversion when using apply on dataframe

When using apply on a data.frame, the arguments are (implicitly) converted to character. An example:
df <- data.frame(v=1:10, t=1:10)
df <- transform(df, t2 = as.POSIXlt(t, origin = "2013-08-13"))
class(df$t2[1])
## [1] "POSIXct" "POSIXt" (correct)
but:
apply(df, 1, function(y) class(y["t2"]))
## [1] "character" "character" "character" "character" "character" "character"
## [7] "character" "character" "character" "character"
Is there any way to avoid this conversion? Or do I always have to convert back through as.POSIXlt(y["t2"])?
Edit:
My df has 2 timestamps (say, t2 and t3) and some other fields (say, v1, v2). For each row with a given t2, I want to find k (e.g. 3) rows with t3 closest to, but lower than, t2 (and the same v1), and return a statistic over v2 from these rows (e.g. an average). I wrote a function f(t2, v1, df) and just wanted to apply it to all rows using apply(df, 1, function(y) f(y["t2"], y["v1"], df)). Is there any better way to do such things in R?
Let's wrap up multiple comments into an explanation.
The use of apply converts a data.frame to a matrix. This means that the least restrictive class will be used. The least restrictive in this case is character.
You're supplying 1 to apply's MARGIN argument. This applies by row and makes you even worse off, as you're really mixing classes together now. In this scenario you're using apply, which is designed for matrices and data.frames, on a vector. This is not the right tool for the job.
In this case I'd use lapply or sapply, as rmk points out, to grab the classes of the single t2 column, as seen below:
Code:
df <- data.frame(v=1:10, t=1:10)
df <- transform(df, t2 = as.POSIXlt(t, origin = "2013-08-13"))
sapply(df[, "t2"], class)
lapply(df[, "t2"], class)
## [[1]]
## [1] "POSIXct" "POSIXt"
##
## [[2]]
## [1] "POSIXct" "POSIXt"
##
## [[3]]
## [1] "POSIXct" "POSIXt"
##
## .
## .
## .
##
## [[9]]
## [1] "POSIXct" "POSIXt"
##
## [[10]]
## [1] "POSIXct" "POSIXt"
In general, you choose the member of the apply family that fits the job. Often I personally use lapply or a for loop to act on specific columns, or subset the columns I want using indexing ([, ]) and then proceed with apply. The answer to this problem really boils down to determining what you want to accomplish, asking whether apply is the most appropriate tool, and proceeding from there.
May I offer this blog post as an excellent tutorial on what the different functions in the apply family do.
Try:
sapply(df, class)
$v
[1] "integer"
$t
[1] "integer"
$t2
[1] "POSIXct" "POSIXt"
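To make the matrix conversion concrete, here is a small sketch on a three-row version of the example data. It uses as.POSIXct with an explicit tz, which is an assumption made for reproducibility and not part of the original code. It shows that as.matrix() yields character, while looping over row indices and subsetting the column keeps the class:

```r
df <- data.frame(v = 1:3, t = 1:3)
df$t2 <- as.POSIXct(df$t, origin = "2013-08-13", tz = "UTC")

# apply() first calls as.matrix(), which turns every cell into a string here
matrix_type <- typeof(as.matrix(df))
print(matrix_type)

# Looping over row indices and subsetting the column keeps the class
row_classes <- lapply(seq_len(nrow(df)), function(i) class(df$t2[i]))
print(row_classes[[1]])
```

A row-wise function such as the f(t2, v1, df) from the question could be called the same way, with df$t2[i] and df$v1[i] as arguments.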

How to read logical data from a file in R

I generated a file that contains a logical value, either "TRUE" or "FALSE", on each line. Now I would like to read the logical data from the file into R. However, the data that are read in have mode "character", not logical. How can I read the data from the file as logical values?
My R code is
cat(FALSE,"\n", file="1.txt", append=FALSE);
for (i in 2:5) cat(TRUE,"\n",file="1.txt", append=TRUE);
a=scan(file="1.txt", what="logical")
The output is:
> mode(a)
[1] "character"
> mode(a[1])
[1] "character"
> a[1]
[1] "FALSE"
I want a[1] to be a logical value.
Thanks and regards!
Ah, now I get it. You have to read ?scan very carefully to see that what you've done is not what scan() wants for the what argument. I missed this first time and then wondered why your code wasn't working. This is the key section:
what: the type of ‘what’ gives the type of data to be read. The
supported types are ‘logical’, ‘integer’, ‘numeric’,
‘complex’, ‘character’, ‘raw’ and ‘list’.
and the key word is type. So you need to pass an object of the correct type to the what argument.
In your example:
> typeof("logical")
[1] "character"
So scan() reads in an object of type "character".
The solution is simply to use what = TRUE instead, or indeed anything that R considers a logical (see comments to this answer):
> typeof(TRUE)
[1] "logical"
> ## or
> typeof(logical())
[1] "logical"
## So now read in with what = TRUE
> a <- scan(file="1.txt", what = TRUE)
Read 5 items
> class(a)
[1] "logical"
> typeof(a)
[1] "logical"
read.table() is more logical in how you tell it what the data to be read in are. The equivalent call there would be:
> b <- read.table("1.txt", colClasses = "logical")[,]
> class(b)
[1] "logical"
> typeof(b)
[1] "logical"
HTH
Use a == 'TRUE' -> a to convert the character vector to logical after reading it.
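Both approaches can be tried in a self-contained sketch that writes its own temporary file instead of 1.txt (the file contents here are made up for illustration):

```r
tmp <- tempfile(fileext = ".txt")
writeLines(c("FALSE", "TRUE", "TRUE"), tmp)

# what = logical() passes an object of type logical, not the string "logical"
a <- scan(tmp, what = logical(), quiet = TRUE)

# Alternative: read as character, then convert afterwards
b <- as.logical(scan(tmp, what = character(), quiet = TRUE))

print(a)
print(typeof(a))
unlink(tmp)
```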
