How to avoid implicit character conversion when using apply on dataframe - r

When using apply on a data.frame, the arguments are (implicitly) converted to character. An example:
df <- data.frame(v=1:10, t=1:10)
df <- transform(df, t2 = as.POSIXlt(t, origin = "2013-08-13"))
class(df$t2[1])
## [1] "POSIXct" "POSIXt" (correct)
but:
apply(df, 1, function(y) class(y["t2"]))
## [1] "character" "character" "character" "character" "character" "character"
## [7] "character" "character" "character" "character"
Is there any way to avoid this conversion? Or do I always have to convert back through as.POSIXlt(y["t2"])?
edit
My df has 2 timestamps (say, t2 and t3) and some other fields (say, v1, v2). For each row with a given t2, I want to find k (e.g. 3) rows with t3 closest to, but lower than, t2 (and the same v1), and return a statistic over v2 from these rows (e.g. an average). I wrote a function f(t2, v1, df) and just wanted to apply it to all rows using apply(df, 1, function(y) f(y["t2"], y["v1"], df)). Is there any better way to do such things in R?

Let's wrap up multiple comments into an explanation.
The use of apply converts a data.frame to a matrix. A matrix can
hold only one type, so the least restrictive class that can represent
every column is used. In this case that is character.
You're supplying 1 to apply's MARGIN argument. This applies
by row and makes you even worse off as you're really mixing classes
together now. In this scenario you're using apply designed for matrices
and data.frames on a vector. This is not the right tool for the job.
In this case I'd use lapply or sapply, as rmk points out, to grab the classes of
the single t2 column as seen below:
Code:
df <- data.frame(v=1:10, t=1:10)
df <- transform(df, t2 = as.POSIXlt(t, origin = "2013-08-13"))
sapply(df[, "t2"], class)
lapply(df[, "t2"], class)
## [[1]]
## [1] "POSIXct" "POSIXt"
##
## [[2]]
## [1] "POSIXct" "POSIXt"
##
## [[3]]
## [1] "POSIXct" "POSIXt"
##
## .
## .
## .
##
## [[9]]
## [1] "POSIXct" "POSIXt"
##
## [[10]]
## [1] "POSIXct" "POSIXt"
In general you choose the member of the apply family that fits the job. Often I personally use lapply or a for loop to act on specific columns, or subset the columns I want using indexing ([, ]) and then proceed with apply. The answer to this problem really boils down to determining what you want to accomplish, asking whether apply is the most appropriate tool, and proceeding from there.
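The coercion described above can be seen directly by calling as.matrix() on the data frame, which is the conversion apply relies on for a data.frame. The sketch below is only an illustration: it builds the column with as.POSIXct and tz = "UTC" (assumptions made here for reproducibility, not part of the question) and then iterates over row indices so that each element keeps its class.

```r
df <- data.frame(v = 1:10, t = 1:10)
# Assumption: POSIXct with an explicit time zone, for reproducible results
df$t2 <- as.POSIXct(df$t, origin = "2013-08-13", tz = "UTC")

# The matrix conversion turns every column into character
m <- as.matrix(df)
typeof(m)

# Iterating over row indices and indexing the column keeps the class
row_classes <- lapply(seq_len(nrow(df)), function(i) class(df$t2[i]))
row_classes[[1]]
```

Indexing by row number avoids the matrix conversion entirely, at the cost of writing the column names out explicitly.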
May I offer this blog post as an excellent tutorial on what the different functions in the apply family do.

Try:
sapply(df, function(y) class(y["t2"]))
$v
[1] "integer"
$t
[1] "integer"
$t2
[1] "POSIXct" "POSIXt"
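For the row-wise task described in the question's edit, one possible approach is to loop over row indices with vapply instead of apply, so the timestamps are never converted to character. This is a rough sketch under stated assumptions: the sample data, the helper name mean_prev_k, and the choice of mean as the statistic are all hypothetical and only follow the column names (t2, t3, v1, v2) mentioned in the question.

```r
# Hypothetical data: two timestamp columns, a grouping key and a value
df <- data.frame(
  v1 = c("a", "a", "a", "a", "b"),
  v2 = c(1, 2, 3, 4, 5),
  t3 = as.POSIXct(c(10, 20, 30, 40, 50), origin = "2013-08-13", tz = "UTC"),
  t2 = as.POSIXct(c(45, 35, 25, 15, 60), origin = "2013-08-13", tz = "UTC")
)

# Hypothetical helper: mean of v2 over the k rows whose t3 is closest to,
# but strictly lower than, t2, within the same v1 group
mean_prev_k <- function(t2, v1, df, k = 3) {
  cand <- df[df$v1 == v1 & df$t3 < t2, ]
  if (nrow(cand) == 0) return(NA_real_)
  cand <- cand[order(cand$t3, decreasing = TRUE), ]  # latest t3 first
  mean(head(cand$v2, k))
}

# Index-based iteration keeps t2 as POSIXct inside the helper
res <- vapply(seq_len(nrow(df)),
              function(i) mean_prev_k(df$t2[i], df$v1[i], df),
              numeric(1))
res
```

For large data frames this scans the whole table once per row, so a join-based solution may scale better; the sketch is meant only to show that no character round trip is needed.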

Related

R unwantedly converting character to date instead of numeric

I'm passing arguments from a shell script to an R script:
library(rLandsat) #used later in the script
args<-commandArgs(trailingOnly=TRUE)
max_date = Sys.Date()-as.numeric(args[1])
min_date = Sys.Date()-(as.numeric(args[1])+as.numeric(args[2]))
path<-as.numeric(as.character(args[4]))
row<-as.numeric(as.character(args[5]))
cloud<-as.numeric(as.character(args[6]))
foldername<-as.character(args[7])
for(i in args){
print(typeof(i))
}
print(args)
print(c(max_date,min_date,path,row,cloud,foldername))
for(i in c(max_date,min_date,path,row,cloud,foldername)){
print(typeof(i))
}
R is for some reason converting the arguments to some type of date that is still a character. Here is the output from the script. args[3] is used later, but I should probably check that too. I know each arg is already a character, but I got the same values using only as.numeric() for path, row and cloud. The first two values are returned correctly ("2016-12-31" "2016-01-01"), but for the others I would like the original argument values returned. I will check out list() instead of c().
[1] "character"
[1] "character"
[1] "character"
[1] "character"
[1] "character"
[1] "character"
[1] "1048" "365" "Yellowstone" "38" "29"
[6] "20" "2016"
[1] "2016-12-31" "2016-01-01" "1970-02-08" "1970-01-30" "1970-01-21"
[6] "1975-07-10"
[1] "character"
[1] "character"
[1] "character"
[1] "character"
[1] "character"
[1] "character"
c() was causing the error. See MichaelChirico's comment
You really should look into the lubridate package for operations involving dates: https://github.com/rstudio/cheatsheets/raw/master/lubridate.pdf
As a rule of thumb, when you combine data of different classes they are coerced to a single common type. This happens frequently with c(). In your case, mixing dates and numerics gives you mixed-up results.
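The coercion rule can be illustrated with a small sketch. The values below are made up for illustration (max_date and path mirror names from the question). Here the numeric comes first, so the default c() method runs and the Date class is dropped; when a Date comes first, c() uses the Date method instead, which is consistent with the numbers in the question being printed as dates. A list keeps each element's class.

```r
max_date <- as.Date("2016-12-31")
path <- 38

# c() returns an atomic vector of one type: the Date becomes a plain number
flat <- c(path, max_date)
class(flat)

# list() stores each element unchanged
vals <- list(max_date = max_date, path = path)
class(vals$max_date)
class(vals$path)
```

Printing the elements of vals one by one (or building a data.frame with one column per value) avoids the unwanted conversion.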

Date format changes

I am preparing a table header like cols<- c("Metrics",as.Date(Sys.Date()-8,origin="1899-12-30"),as.Date(Sys.Date()-1,origin="1899-12-30")), and I am not getting the expected output. Can anyone please help?
Output : "Metrics" "17927" "17934"
cols<- c("Metrics",as.Date(Sys.Date()-8,origin="1899-12-30"),as.Date(Sys.Date()-1,origin="1899-12-30"))
Expected Output:
"Metrics" "2019-01-31" "2019-02-07"
1) character output If you are looking for a character vector as the result then convert the Date class components to character. Also note that the as.Date shown in the question is not needed since Sys.Date() and offsets from it are already of Date class. Further note that if Sys.Date() were called twice right at midnight it is possible that the two calls might occur on different days. To avoid this possibility we create a today variable so that it only has to be called once.
today <- Sys.Date()
cols <- c("Metrics", as.character(today-8), as.character(today-1))
cols
## [1] "Metrics" "2019-01-31" "2019-02-07"
1a) This could be made even shorter like this.
cols <- c("Metrics", as.character(Sys.Date() - c(8, 1)))
cols
## [1] "Metrics" "2019-01-31" "2019-02-07"
2) list output Alternately if what you want is a list with one character component and two Date components then:
today <- Sys.Date()
L <- list("Metrics", today - 8, today - 1)
L
giving:
[[1]]
[1] "Metrics"
[[2]]
[1] "2019-01-31"
[[3]]
[1] "2019-02-07"
If we already had L and wanted a character vector then we could further convert it like this:
sapply(L, as.character)
## [1] "Metrics" "2019-01-31" "2019-02-07"

How do you make a vector of character strings from a vector of objects?

I want to generate a vector of character strings from a vector of objects.
Bob <- c(1,2,3,4)
Anne <- c(3,5,7,1)
Tim <- c(4,2,1,1)
People <- c(Bob, Anne, Tim)
Now what I want is:
> Names
[1] "Bob" "Anne" "Tim"
I know you can do this individually with
> deparse(substitute(Bob))
[1] "Bob"
So I tried to do
Names <- lapply(People,function(k){deparse(substitute(k))})
but this did not give the expected results. It produced a long list where each element is:
[[1]]
[1] "X[[i]]"
[[2]]
[1] "X[[i]]"
...
I'm sure this is a fairly easy task, but I can't get it working. Thanks!
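One thing worth noting is that c(Bob, Anne, Tim) concatenates the three vectors into a single numeric vector of length 12, so the variable names are already gone before lapply runs. A possible workaround, sketched here as an assumption rather than a definitive answer, is to keep the vectors in a named list and read the names back with names().

```r
Bob  <- c(1, 2, 3, 4)
Anne <- c(3, 5, 7, 1)
Tim  <- c(4, 2, 1, 1)

# c() flattens the inputs; no trace of the variable names remains
length(c(Bob, Anne, Tim))

# A named list keeps each vector intact together with its name
People <- list(Bob = Bob, Anne = Anne, Tim = Tim)
Names <- names(People)
Names
```

If the vectors already exist in the workspace, mget(c("Bob", "Anne", "Tim")) builds the same named list from their names.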

Remove quotes from paste() output in R while preserving the class

I have a dataframe "c1" with one column as "region".
sum(is.na(c1$region))
[1] 2
class(c1$region)
[1] "factor"
However, when I use paste()
f1<-paste("c1","$","region",sep="")
> f1
[1] "c1$region"
> sum(is.na(f1))
[1] 0
I tried as.name(f1) and as.symbol(f1). Both convert f1 to the "name" class. noquote(f1) converts the char[1] element to the "noquote" class.
> f2<-as.name(f1)
> f2
`c1$region`
> sum(is.na(f2))
[1] 0
Warning message:
In is.na(f2) : is.na() applied to non-(list or vector) of type 'symbol'
> class(f2)
[1] "name"
I want to retain the class of c1$region while being able to use it in queries such as sum(is.na(f2)). Please help.
I'm not 100% sure I understand what you are trying to do, but maybe this will help:
c1 <- data.frame(region=c(letters[1:3], NA), stringsAsFactors=TRUE) # factors are no longer the default since R 4.0
clust <- 1
variable <- "region"
f1 <- get(paste0("c", clust))[[variable]] # <--- key step
class(f1)
# [1] "factor"
sum(is.na(f1))
# [1] 1
In the key step, we use get to fetch the correct cluster data frame using its name as a character vector, and then we use [[, which unlike $, allows us to use a character variable to specify which column we want.
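A variation on the same idea, offered only as a sketch, is to keep the cluster data frames in a named list and index it with [[ instead of calling get. The list name clusters is hypothetical; clust and variable follow the answer above.

```r
# Hypothetical container: one data frame per cluster, named "c1", "c2", ...
clusters <- list(
  c1 = data.frame(region = factor(c("a", "b", "c", NA)))
)

clust <- 1
variable <- "region"

# [[ accepts character variables for both the list element and the column
f1 <- clusters[[paste0("c", clust)]][[variable]]
class(f1)
sum(is.na(f1))
```

This keeps the lookup explicit and avoids searching the environment by name, but it requires the data frames to be collected in a list first.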

Why doesn't setDT have any effect in this case?

Consider the following code
library(data.table) # 1.9.2
x <- data.frame(letters[1:2])
setDT(x)
class(x)
## [1] "data.table" "data.frame"
Which is an expected result. Now if I run
x <- letters[1:2]
setDT(data.frame(x))
class(x)
## [1] "character"
The class of x remained unchanged for some reason.
One possibility is that setDT changes only the classes of objects in the global environment, so I've tried
x <- data.frame(letters[1:2])
ftest <- function(x) setDT(x)
ftest(x)
class(x)
##[1] "data.table" "data.frame"
It seems that setDT doesn't care much about the environment of an object in order to change its class.
So what's causing the above behaviour? Is it just a bug, or is there some common sense behind it?
setDT changes the data.frame and returns it invisibly. Since you don't save this data.frame, it is lost. All you need to do is somehow save the data.frame, so that the data.table is also saved. E.g.
setDT(y <- data.frame(x))
class(y)
## [1] "data.table" "data.frame"
or
z <- setDT(data.frame(x))
class(z)
## [1] "data.table" "data.frame"
