xts convert data frame to character - r

I have a csv file and extract data using
banknifty <- as.xts(read.zoo("banknifty.csv",sep=",",tz="" ,header=T))
read.zoo() reads the data with numeric values, but as soon as I apply as.xts(), the numeric values get converted to characters.
# banknifty[1,] gives
2008-01-01 09.34:00 "10" "12" "13"
I want as.xts() to return an object with numeric values.
How to avoid this problem?

You're confused about the nature of xts/zoo objects. They are matrices with an ordered index attribute, therefore you cannot mix types in xts/zoo objects like you can in a data.frame.
The reason your object is being converted to character is that some of the values in your file are not numeric. This is also why you got the "NAs introduced by coercion" warning when you tried hd1's solution.
So the answer to your question is, "fix your CSV file", but we can't help you fix it unless you show us the file's contents.
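One way to locate the offending values is to read the file as a plain data frame and check which columns failed to parse as numbers. This is only a sketch: the CSV text, the column names and the stray "n/a" entry below are made up for illustration, since the real file was not shown.

```r
# Hypothetical file contents; "n/a" stands in for whatever non-numeric
# value is hiding in the real CSV.
csv_text <- "Date,Open,High,Low
2008-01-01 09:34:00,10,12,13
2008-01-01 09:35:00,11,n/a,14"

raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Which value columns were read as numeric? (first column is the timestamp)
is_num <- sapply(raw[-1], is.numeric)
print(is_num)

# Rows of the suspicious column that cannot be converted to a number
bad <- suppressWarnings(is.na(as.numeric(raw$High))) & !is.na(raw$High)
print(raw[bad, ])
```

Once the bad entries are corrected (or declared via the na.strings argument of the reader), every value column should parse as numeric and the matrix will no longer be coerced to character.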

I just ran into a similar problem. In my case, the issue was that the as.xts() function tries to convert the date column along with the numeric columns. Because R does not consider dates to be numeric values, it automatically converts the entire data frame to character. I'm assuming that happens in your example as well (you can check this using your .csv-file).
Something like this should help:
data.in <- read.csv("banknifty.csv",sep=",",header=T)
data.in[,1] <- format(as.Date(data.in[,1]), format="%Y-%m-%d", tz="GMT", usetz=TRUE) #change tz to whatever applies
data.in[,1] <- as.POSIXct(data.in[,1], "GMT")
data.ts <- xts(data.in[,c(2,3,4,5)], order.by = data.in[,1])
(Note that data.ts <- xts(data.in, order.by = data.in[,1]) would replicate the unwanted conversion. Also, apologies that this is probably not the cleanest / most concise code, I'm still learning.)

Use as.numeric and your code will be:
> data.in <- as.xts(read.zoo("banknifty.csv",sep=",",tz="" ,header=T))
> sapply(c(1:4), function(n) { data.in[,n] <- as.numeric(data.in[,n]) }, simplify=TRUE )
[,1] [,2] [,3] [,4]
[1,] 6032.25 6040.50 6032.17 6036.29
[2,] 6036.29 6036.29 6020.00 6025.05
[3,] 6025.05 6026.00 6020.10 6023.12
[4,] 6023.12 6034.45 6022.73 6034.45
[5,] 6034.45 6034.45 6030.00 6030.00
[6,] 6030.00 6038.00 6028.25 6038.00
> data.in
V2 V3 V4 V5
2007-01-02 10:00:00 6032.25 6040.50 6032.17 6036.29
2007-01-02 10:05:00 6036.29 6036.29 6020.00 6025.05
2007-01-02 10:10:00 6025.05 6026.00 6020.10 6023.12
2007-01-02 10:15:00 6023.12 6034.45 6022.73 6034.45
2007-01-02 10:20:00 6034.45 6034.45 6030.00 6030.00
2007-01-02 10:25:00 6030.00 6038.00 6028.25 6038.00
>

> sapply(c(1:4), function(n) { data.in[,n] <- as.numeric(data.in[,n]) }, simplify=TRUE )
This command does not change data.in. The data is still in the same format, with quotes:
> data.in
V2 V3 V4 V5
2007-01-02 10:00:00 "6032.25" "6040.50" "6032.17" "6036.29"
2007-01-02 10:05:00 "6036.29" "6036.29" "6020.00" "6025.05"
2007-01-02 10:10:00 "6025.05" "6026.00" "6020.10" "6023.12"
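The sapply() call leaves data.in untouched because the assignment inside the anonymous function only modifies a local copy. A minimal sketch of the difference, using a plain character matrix as a stand-in (an xts object is a matrix with an index attribute, so the same idea should carry over, but that is an assumption and not tested on the original data):

```r
# Stand-in for the character-valued object; values taken from the output above
m <- matrix(c("6032.25", "6036.29", "6040.50", "6036.29"), nrow = 2,
            dimnames = list(NULL, c("V2", "V3")))

# The assignment happens on a copy local to the function, so m is unchanged
invisible(sapply(1:2, function(n) { m[, n] <- as.numeric(m[, n]) }))
still_character <- is.character(m)

# Changing the storage mode converts the values in place and keeps
# the dimensions and column names
storage.mode(m) <- "numeric"
print(m)
```

Note that this only helps when every value really is a number stored as text; any non-numeric entry becomes NA.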

Related

Merging many lists of different XTS objects in R

I have 3 lists of large XTS objects: "SMA", "L" and "Marubozu". Here is a quick look at them:
> names(Marubozu)
[1] "TSLA" "AAPL" "NTES" "GOOGL" "ASML" "GOOG" "NFLX" "ADBE" "AMZN" "MSFT" "ADI" "FB"
> names(SMA)
[1] "TSLA" "AAPL" "NTES" "GOOGL" "ASML" "GOOG" "NFLX" "ADBE" "AMZN" "MSFT" "ADI" "FB"
> names(L)
[1] "TSLA" "AAPL" "NTES" "GOOGL" "ASML" "GOOG" "NFLX" "ADBE" "AMZN" "MSFT" "ADI" "FB"
> head(Marubozu$AAPL, n = 2)
WhiteMarubozu BlackMarubozu
2000-01-03 FALSE FALSE
2000-01-04 FALSE FALSE
> head(SMA$AAPL, n = 2)
UpTrend NoTrend DownTrend Trend
2000-01-03 NA NA NA NA
2000-01-04 NA NA NA NA
> head(L$AAPL, n =2)
AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted
2000-01-03 0.936384 1.004464 0.907924 0.999442 535796800 0.856887
2000-01-04 0.966518 0.987723 0.903460 0.915179 512377600 0.784643
I want to merge the corresponding XTS objects in those lists to create one big list. For example, the output for New_List$AAPL would be:
AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted WhiteMarubozu BlackMarubozu UpTrend NoTrend DownTrend Trend
2000-01-03 0.936384 1.004464 0.907924 0.999442 535796800 0.856887 0 0 NA NA NA NA
2000-01-04 0.966518 0.987723 0.903460 0.915179 512377600 0.784643 0 0 NA NA NA NA
I tried to create a list of lists and merge it, but it didn't work. Here is what I tried:
#That works for a single ticker AAPL
full <- merge.xts(L$AAPL, Marubozu$AAPL, SMA$AAPL)
#This doesn't work
out3 <- Map(function(x) {full$x <- merge.xts(lista[[1]]$x, lista[[2]]$x)}, lista)
I guess it is just some simple 2-lines thing but can't really find the solution, thanks for any responses!
We can do this with Map. Since the lists of xts objects have the same tickers in the same order, pass the lists to Map directly instead of creating a list of lists.
library(xts)
out <- Map(merge.xts, L, Marubozu, SMA)
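Map pairs list elements by position, not by name, so the call above relies on the tickers being in the same order in every list. If that is not guaranteed, one option is to reorder the other lists by the names of the first one. The sketch below uses small plain matrices and cbind as stand-ins for the xts objects and merge.xts; the values are made up.

```r
# Toy stand-ins for the lists in the question (plain matrices, not xts)
L        <- list(AAPL = cbind(Close = c(1, 2)), TSLA = cbind(Close = c(3, 4)))
Marubozu <- list(TSLA = cbind(White = c(0, 1)), AAPL = cbind(White = c(1, 0)))

# Reorder the second list by the names of the first, then combine pairwise
out <- Map(cbind, L, Marubozu[names(L)])
print(out$AAPL)
```

With the real data the same indexing would apply, e.g. Map(merge.xts, L, Marubozu[names(L)], SMA[names(L)]).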
Here's a small function u() that binds the xts-index to an xts object and converts to 'data.frame'.
u <- function(x) cbind.data.frame(index=index(x), unclass(x))
To test it, we create some data using sample_matrix which comes with xts. We split first two and last two columns into two separate xts objects with same index.
library(xts)
data(sample_matrix)
sample.xts <- as.xts(sample_matrix, descr='my new xts object')
S1 <- as.xts(sample_matrix[,1:2]) ##
S2 <- as.xts(sample_matrix[,3:4])
Now we may easily apply merge and create a new xts object out of it.
res <- merge(u(S1), u(S2)) |>
(\(x) xts(x[-1], x$index, descr='my new xts object'))()
class(res)
# [1] "xts" "zoo"
stopifnot(all.equal(res, sample.xts)) ## proof

Trouble coercing data frame to look like an object from the package "DOSE"; my as.numeric() object looks different

I'm having some trouble coercing a data frame into the same shape as data provided by the R package DOSE, so that I can run the GO over-representation test from the package clusterProfiler.
The clusterProfiler vignette uses geneList from the DOSE package; its class is numeric, but it somehow links a gene expression value with a gene ID.
You can see the data by doing the following:
source("https://bioconductor.org/biocLite.R")
biocLite("DOSE")
data(geneList, package="DOSE")
class(geneList)
[1] "numeric"
dput(head(geneList))
structure(c(4.57261268231107, 4.51459371540294, 4.41821798112707,
4.14407518193211, 3.87625800905113, 3.67785700608222), .Names = c("4312",
"8318", "10874", "55143", "55388", "991"))
I have a data.frame with two columns:
dput(df)
structure(list(Gene_symbol = c(5339L, 1778L, 79026L, 5591L, 23224L,
23195L), HAP1_pc = c(170, 253, 221.5, 231, 163.5, 172)), .Names = c("Gene_symbol",
"HAP1_pc"), class = "data.frame", row.names = c(NA, -6L))
When I try to coerce this data frame into a numeric class it gives me an error:
df2 <- as.numeric(df)
Error: (list) object cannot be coerced to type 'double'
I searched Stack Overflow for this error and found this suggestion:
as.numeric(df[[1]])
[1] 5339 1778 79026 5591 23224 23195
Of course this only returns column one (gene id).
By that logic, if I replace 1 with 2 it should return column 2 (gene expression):
as.numeric(df[[2]])
[1] 170.0 253.0 221.5 231.0 163.5 172.0
Indeed it did.
What confuses me is when I try specifying a range it returns the second row of column one:
as.numeric(df[[1:2]])
[1] 1778
At the bottom of this post someone said the above solution only works for one column (which appears to be the case); however, their suggestion for multiple columns (below) does not work as it returns class matrix:
apply(df, 2 , as.numeric)
Gene_symbol HAP1_pc
[1,] 5339 170.0
[2,] 1778 253.0
[3,] 79026 221.5
[4,] 5591 231.0
[5,] 23224 163.5
[6,] 23195 172.0
I tried converting this matrix to numeric but this doesn't work either.
foo <- apply(df, 2 , as.numeric)
bar <- as.numeric(foo)
bar
[1] 5339.0 1778.0 79026.0 5591.0 23224.0 23195.0 170.0 253.0 221.5 231.0 163.5 172.0
Comparing the dput() output of geneList with that of my data, the first apparent difference is that mine is a list, so I tried unlist() before as.numeric():
as.numeric(unlist(df))
[1] 5339.0 1778.0 79026.0 5591.0 23224.0 23195.0 170.0 253.0 221.5 231.0 163.5 172.0
Same result as converting the matrix to numeric above.
The next apparent difference between the two dput() outputs is that my data has .Names holding the column headers, while geneList has one entry in .Names for every value (the gene IDs from column 1). This is likely where the problem lies; however, I'm not sure what to do about it.
How can one do this with base R or with dplyr? Your help would be greatly appreciated.
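Judging from the dput() output, geneList is a named numeric vector: the values are the expression measurements and the names are the gene IDs. A minimal base R sketch of building the same structure from the data frame above; the name gene_list is made up here, and sorting in decreasing order is an assumption based on how head(geneList) looks:

```r
df <- data.frame(Gene_symbol = c(5339L, 1778L, 79026L, 5591L, 23224L, 23195L),
                 HAP1_pc = c(170, 253, 221.5, 231, 163.5, 172))

# Values come from one column, names from the other
# (the integer IDs are stored as character names)
gene_list <- setNames(df$HAP1_pc, df$Gene_symbol)

# The values in head(geneList) are decreasing, so sort the same way
gene_list <- sort(gene_list, decreasing = TRUE)

print(class(gene_list))
print(head(gene_list, 3))
```

Whether clusterProfiler needs anything beyond this shape is not something the example checks.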

Saving dates in a matrix ("origin must be supplied") with r

I am writing my bachelor thesis and I don't have much experience with R so far.
My problem is that my dates, which I created with this command:
t<-strptime(x, "%d.%m.%Y %H.%M")
don't work anymore when I save them in a matrix with the other information on those specific dates.
I am a bit confused because it works just fine when I don't put them in a matrix like this t[1:10]
But that happens as soon as I try to save them in a matrix
matrix1<-matrix(c(t,v2,v3,v4),nrow=length(v2))
Error in as.POSIXct.numeric(X[[i]], ...) : 'origin' must be supplied
(The original message was in German; this is its English equivalent.)
Any ideas what I have to do to fix it? I am a bit frustrated :)
Roland is right. You can't have POSIXlt objects in a matrix. What you can do is save the dates as numeric timestamps in the matrix and convert them back to dates when you access them.
Converting to numeric timestamp:
>date<- as.numeric(as.POSIXct("2014-02-16 2:13:46 UTC",origin="01-01-1970"))
>date
[1] 1392545626
Then save those timestamps in the matrix as before. To convert one back to a date, call as.POSIXct() on the number, supplying an origin, without wrapping it in as.numeric().
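A small round trip may make this clearer. The sketch fixes the time zone to UTC so the numbers are reproducible; the second matrix column is a made-up measurement.

```r
ts  <- as.POSIXct("2014-02-16 02:13:46", tz = "UTC")
num <- as.numeric(ts)                 # seconds since 1970-01-01 00:00:00 UTC

# A numeric matrix can hold the timestamp next to other numbers
m <- matrix(c(num, 1.5), nrow = 1)    # 1.5 is a made-up value

# Converting back requires an origin, because a bare number has no epoch
back <- as.POSIXct(m[1, 1], origin = "1970-01-01", tz = "UTC")
print(format(back, "%Y-%m-%d %H:%M:%S"))
```

Omitting origin in the last call is what triggers the "'origin' must be supplied" error on the R versions in use when this was asked.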
t (terrible name by the way, easily confused with the t function) is a POSIXlt object, which internally is a list. First you should check, what c(t,v2,v3,v4) returns (I don't know how v2 etc are defined).
Then we can look into the documentation in help("matrix"):
data
an optional data vector (including a list or expression vector). Non-atomic classed R objects are coerced by as.vector and all attributes discarded.
The important bit is "all attributes discarded". This is what you get if you discard the attributes (which include the class attribute) of a POSIXlt object:
x <- strptime(c("2016-05-09 12:00:00", "2016-05-09 13:00:00"), format = "%Y-%m-%d %H:%M:%S")
attributes(x) <- NULL
print(x)
# [[1]]
# [1] 0 0
#
# [[2]]
# [1] 0 0
#
# [[3]]
# [1] 12 13
#
# [[4]]
# [1] 9 9
#
# [[5]]
# [1] 4 4
#
# [[6]]
# [1] 116 116
#
# [[7]]
# [1] 1 1
#
# [[8]]
# [1] 129 129
#
# [[9]]
# [1] 1 1
#
# [[10]]
# [1] "CEST" "CEST"
#
# [[11]]
# [1] NA NA
A matrix can't contain POSIXlt objects (or any objects, i.e., anything with an explicit class).
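If the goal is just to keep the dates next to the other values, a data frame is one alternative, since each column keeps its own class. A minimal sketch, with made-up timestamps and a made-up v2 because the original vectors were not shown:

```r
# Same format string as in the question; tz fixed for reproducibility
t_parsed <- strptime(c("09.05.2016 12.00", "09.05.2016 13.00"),
                     format = "%d.%m.%Y %H.%M", tz = "UTC")
v2 <- c(1.5, 2.5)   # made-up measurements

# POSIXct is an atomic vector (one number per time), which makes it
# a better fit for a column than the list-based POSIXlt
d <- data.frame(time = as.POSIXct(t_parsed), v2 = v2)
print(sapply(d, function(col) class(col)[1]))
```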

Getting mysterious NA's when trying to parse date

I have not had experience with using dates in R. I have read all of the docs but I still can't figure out why I am getting this error. I am trying to take a vector of strings and convert that into a vector of dates, using some specified format. I have tried both using for loops and converting each date individually, or using vector functions like sapply, but neither is working. Here is the code using for loops:
dates = rawData[,ind] # get vector of date strings
print("single date example")
print(as.Date(dates[1]))
dDates = rep(1,length(dates)) # initialize vector of dates
class(dDates)="Date"
for (i in 1:length(dates)){
dDates[i]=as.Date(dates[i])
}
print(dDates[1:10])
EDIT: info on "dates" variables
[1] "dates"
V16 V17 V18 V19 V36
[1,] "2014-01-16" "2014-01-30" "2014-01-16" "2014-01-17" "1999-03-16 12:00"
[2,] "2014-01-04" "2014-01-18" "2014-01-04" "2014-01-08" "1998-09-04 12:00"
[3,] "2014-03-05" "2014-03-19" "2014-03-05" "2014-03-07" "1996-09-30 05:00"
[4,] "2014-01-21" "2014-02-04" "2014-01-22" "2014-01-24" "1995-08-21 12:00"
[5,] "2014-01-07" "2014-01-21" "2014-01-07" "2014-01-09" "1994-04-07 12:00"
[1] "class(dates)"
[1] "matrix"
[1] "class(dates[1,1])"
[1] "character"
[1] "dim(dates)"
[1] 56557 8
The result I am getting is as follows:
[1] "single date example"
[1] "2014-01-16"
Error in charToDate(x) :
character string is not in a standard unambiguous format
So basically, when I try to parse a single element of the date strings into a date, it works fine. But when I try to parse the dates in a loop, it breaks. How could this be so?
The reason why I am using a loop instead of sapply is because that was returning an even stranger result. When I try to run:
dDates = sapply(dDates, function(x) as.Date(x, format = "%Y-%m-%d"))
I am getting the following output:
2014-01-16 2014-01-04 2014-03-05 2014-01-21 2014-01-07 2014-01-02 2014-01-08
NA NA NA NA NA NA NA
2014-02-22 2014-01-09 2014-02-22
NA NA NA
Which is very strange. As you can see, since my format was correct, it was able to parse out the dates. But for some reason, it is also giving a time value of NA (or at least that is what I think the NA means). Maybe this is happening because some of my date strings have times, while others don't. But the thing is I left the time out of the format because I don't care about time.
Does anyone know why this is happening or how to fix it? I can't find anywhere online where you can "set" the time value of a date object easily -- I just can't seem to get rid of that NA. And somehow even a for loop doesn't work! Either way, the output is strange and I am not getting the expected results, even though my format is correct. Very frustrating that a simple thing like parsing a vector of dates is so much more difficult than in MATLAB or Java.
Any help please?
EDIT: when I try simply
dDates = as.Date(dates,format="%m/%d/%Y")
I get the output
"dDates[1:10]"
[1] NA NA NA NA NA NA NA NA NA NA
still those mysterious NA's. I am also getting an error
Error in as.Date.default(value) :
do not know how to convert 'value' to class “Date”
Using a subset of your data,
v <- c("2014-01-16", "2014-01-30", "2014-01-16", "2014-01-17", "1999-03-16 12:00")
these statements are equivalent, since your format is the default one:
as.Date(v)
[1] "2014-01-16" "2014-01-30" "2014-01-16" "2014-01-17" "1999-03-16"
as.Date(v, format = "%Y-%m-%d")
[1] "2014-01-16" "2014-01-30" "2014-01-16" "2014-01-17" "1999-03-16"
If you would like to format the output of your date, use format:
format(as.Date(v), format = "%m/%d/%Y")
[1] "01/16/2014" "01/30/2014" "01/16/2014" "01/17/2014" "03/16/1999"
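The NAs in the last edit of the question are what as.Date() returns when the format argument does not match the strings. A short sketch of that, plus converting one column of a character matrix without a loop; the matrix below is a cut-down, hypothetical version of the one in the question:

```r
v <- c("2014-01-16", "1999-03-16 12:00")

# Format describes month/day/year with slashes, but the strings are
# year-month-day with dashes, so every element becomes NA
mismatch <- as.Date(v, format = "%m/%d/%Y")

# Matching format; the trailing time is ignored
ok <- as.Date(v, format = "%Y-%m-%d")

# as.Date() is vectorized, so a whole matrix column converts in one call
dates <- matrix(c("2014-01-16", "2014-01-04", "2014-01-30", "2014-01-18"),
                ncol = 2, dimnames = list(NULL, c("V16", "V17")))
d16 <- as.Date(dates[, "V16"], format = "%Y-%m-%d")
print(d16)
```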

Character to date with as.Date

I have a vector (length=1704) of character like this:
[1] "1871_01" "1871_02" "1871_03" "1871_04" "1871_05" "1871_06" "1871_07" "1871_08" "1871_09" "1871_10" "1871_11" "1871_12"
[13] "1872_01" "1872_02" "1872_03" ...
.
.
.
[1681] "2011_01" "2011_02" "2011_03" "2011_04" "2011_05" "2011_06" "2011_07" "2011_08" "2011_09" "2011_10" "2011_11" "2011_12"
[1693] "2012_01" "2012_02" "2012_03" "2012_04" "2012_05" "2012_06" "2012_07" "2012_08" "2012_09" "2012_10" "2012_11" "2012_12"
I want to convert this vector into a vector of dates.
For that I use:
as.Date(vector, format="%Y_%m")
But it returns "NA"
I tried for one value:
b <- "1871_01"
as.Date(b, format="%Y_%m")
[1] NA
strptime(b, "%Y_%m")
[1] NA
I don't understand why it doesn't work...
Does anyone have a clue?
If you do regular work in year+month format, the zoo package can come in handy since it treats yearmon as a first class citizen (and is compatible with Date objects/functions):
library(zoo)
my.ym <- as.yearmon("1871_01", format="%Y_%m")
print(my.ym)
## [1] "Jan 1871"
str(my.ym)
## Class 'yearmon' num 1871
my.date <- as.Date(my.ym)
print(my.date)
## [1] "1871-01-01"
str(my.date)
## Date[1:1], format: "1871-01-01"
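The original calls return NA because a year and a month alone do not identify a day, and as.Date() needs a complete date. If base R is preferred, a common workaround is to append a day before parsing. The sketch below assumes the first of the month is acceptable:

```r
b <- c("1871_01", "2012_12")

# Append "_01" so each string describes a full date, then parse it
d <- as.Date(paste0(b, "_01"), format = "%Y_%m_%d")
print(d)
```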
