Export a simple R dataframe to txt, tsv or csv - r

I am trying to do something that seems obvious, but I cannot find a way to solve it. I have a dataframe in R downloaded from the web, and I need to save its data. Here is how I download it:
library(tseries)
library(zoo)
ts <- get.hist.quote(instrument = "DJIA",
                     start = "2008-07-01", end = "2017-03-05",
                     quote = "Close", provider = "yahoo", origin = "1970-01-01",
                     compression = "d", retclass = "zoo")
Then, returns object "ts" with a two columns table; the first of dates (with no header as R prefers) and the other with the "Close" value of DJIA
> ts
Close
2008-07-01 11382.26
2008-07-02 11215.51
2008-07-03 11288.53
2008-07-07 11231.96
.
.
.
2016-03-03 16943.90
2016-03-04 17006.77
I need to export this data to txt or a similar format and import it again later (I will be processing health information on a machine with no internet access). But when I try to save it, the date column with no header is missing, and a "number of row" column is added instead. I do apologize if the question is obvious, but I have no other way to solve it.

The date column has no header because the dates are stored as the row names / index rather than as an ordinary column, so write.csv has to be told to write the row names out. Try:
write.csv(ts, file = "ts.csv", row.names = TRUE)
EDIT
Strangely, this doesn't work with an object of class "zoo".
According to ?write.table:
write.table prints its required argument x (after converting it to a
data frame if it is not one nor a matrix) to a file or connection.
Apparently this conversion fails somehow. However, this works:
write.csv(data.frame(ts), file = "ts.csv", row.names = TRUE)
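If you then need to read that CSV back in offline and recreate the zoo object, something along these lines should work (a minimal sketch, assuming the file written above):
library(zoo)
# read the CSV back, using the first (unnamed) column as row names
df <- read.csv("ts.csv", row.names = 1)
# rebuild the zoo series, turning the row names back into a Date index
ts <- zoo(df$Close, order.by = as.Date(rownames(df)))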

The ts object is a zoo object (not a two-column table). In this case the zoo object is internally represented by a one-column matrix of data and an "index" attribute holding the dates.
1) save/load If the only thing you want to do with the output file is to read it back into R later, then there is no reason to require text; any format will do. In particular you could do this:
save(ts, file = "ts.Rda")
Now in a later session:
library(zoo)
load("ts.Rda")
1a) This would also work; it produces an R source file that, when sourced, reconstructs the zoo object:
dump("ts", "ts.R")
and in a later session:
library(zoo)
source("ts.R")
2) write.zoo/read.zoo This will give a text file:
write.zoo(ts, "ts.dat")
and it can be read back in another session using:
library(zoo)
ts <- cbind(read.zoo("ts.dat", header = TRUE))

Related

When writing a DateTime value using openxlsx, an "x" is written followed by the DateTime value in the next row instead of just the DateTime value

I would like to write DateTime values to an Excel sheet using openxlsx. When I try to do this, instead of just the DateTime value, I get a lowercase "x" on one row, followed by the DateTime in the subsequent row. This occurs whether I use write.xlsx or writeData. I also tried converting the DateTime using as.POSIXlt or as.POSIXct, and converting the date with and without a timezone specified, and I get the same result.
The UTC DateTime values are coming from a PerkinElmer microplate reader file.
Below is a code snippet that gives me this result. Any advice or help is appreciated, Thanks!
library(openxlsx)
library(lubridate)
date <- as_datetime("2022-04-07T22:15:08+0000", tz = "America/Los_Angeles")
options(openxlsx.datetimeFormat = "yyyy-mm-dd hh:mm:ss")
write.xlsx(date,"test.xlsx",overwrite = TRUE)
The documentation of write.xlsx says in section Arguments that x is (my emphasis)
A data.frame or a (named) list of objects that can be handled by writeData() or writeDataTable() to write to file.
So apparently an atomic vector is first coerced to a data.frame, and since the argument holding the data is named x, that also becomes its column header.
This also happens when writing a named list, date_list <- list(date = date): a workbook with a sheet named date is created, and the data in it has the column header x.
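A simple workaround (not part of the original answer, just a sketch) is to wrap the value in a data.frame yourself, so that you, rather than the coercion, choose the column name:
library(openxlsx)
library(lubridate)
date <- as_datetime("2022-04-07T22:15:08+0000", tz = "America/Los_Angeles")
options(openxlsx.datetimeFormat = "yyyy-mm-dd hh:mm:ss")
# wrapping the POSIXct value in a data.frame gives the column an explicit
# name ("DateTime" here) instead of the default "x"
write.xlsx(data.frame(DateTime = date), "test.xlsx", overwrite = TRUE)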

Date formatted cell in xlsx files to R

I have an Excel file which has date information in some cells.
I read this file into R with the following commands:
library(xlsx)
data.files <- list.files(pattern = "\\.xlsx$")
data <- lapply(data.files, function(x) read.xlsx(x, sheetIndex = 9, header = TRUE))
Everything is correct except the cells with dates! Instead of the date information in those cells, I always get a number such as 42948.
Does anybody know how I could fix this?
As you can see, after importing your files, dates are represented as numeric values (here 42948). They are actually the internal representation of the date information in Excel. Those values are the ones that R presents instead of the “real” dates.
You can get those dates in R with as.Date(42948 - 25569, origin = "1970-01-01")
Notice that you can also use a vector containing the internal representation of the dates, so this should also work
vect <- c(42948, 42949, 42950)
as.Date(vect - 25569, origin = "1970-01-01")
PS: To convert an Excel datetime column, see this (p.31)
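Equivalently, assuming the file comes from Excel on Windows (whose serial dates count from 1899-12-30), you can let as.Date apply the offset for you:
# 25569 is the number of days between 1899-12-30 (Excel's day zero on
# Windows) and 1970-01-01 (R's origin), so both forms give the same dates
as.Date(42948, origin = "1899-12-30")
vect <- c(42948, 42949, 42950)
as.Date(vect, origin = "1899-12-30")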

filtering while downloading a dataset R

There is a large dataset that I need to download over the web using R, and I would like to learn how to filter it down to the dates I need while downloading. Right now, I have it set up to download and unzip, and then I create another data set with a filter. The file is a ";"-delimited text file.
There is a Date column with format 1/1/2009, and I only need to select two dates, 3/1/2009 and 3/2/2009. How can I do that in R?
When I import it, R sets the column as a factor. Since I only need those two dates and there is no need to do a between, I just want to select the two factor levels and call it a day.
Thanks!
I don't think you can filter while downloading. To select only these dates you can use the subset function:
# do not convert strings to factors
d.all <- read.csv(file, ..., stringsAsFactors = FALSE, sep = ';')
# the Date column is called DATE:
d.filter <- subset(d.all, DATE %in% c("3/1/2009", "3/2/2009"))
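If you later need real date comparisons rather than exact string matches, converting the column first is a small extra step (a sketch, assuming the same DATE column and the m/d/Y format from the question):
# parse the m/d/Y strings into Date objects, then filter on a date range
d.all$DATE <- as.Date(d.all$DATE, format = "%m/%d/%Y")
d.filter <- subset(d.all, DATE >= as.Date("2009-03-01") & DATE <= as.Date("2009-03-02"))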

How to prevent write.csv from changing POSIXct, dates and times class back to character/factors?

I have a .csv file with one field each for datetime, date and time.
Originally they are all character fields and I have converted them accordingly.
At the end of my code, if I do:
str(data)
I will get
datetime: POSIXct
date: Date
time: Class 'times' atomic [1:2820392] (....) attr(*, "format")= chr "h:m:s"
Now, I am very happy with this and I want to create a .csv file, so this is what I have:
write.csv(data, file = "data.csv", row.names = FALSE)
I have also tried
write.table(data, "data.csv", sep = ",", row.names = FALSE)
And I get the same result with both: all my conversions get lost when writing the new .csv, and everything is back to being a character.
I suspect I am missing some argument in the write function, but I have been searching all afternoon and I can't find out what. Can someone please help?
If you want to preserve all of the time information so it can be read in again, this recipe should work:
dat <- data.frame(time=as.POSIXlt("2013-04-25 09:00 BST"), quantity=1)
dat2 <- dat
dat2$time <- format(dat2$time, usetz=TRUE)
write.csv(dat2, "time.csv", row.names=FALSE)
It gives the following CSV file:
"time","quantity"
"2013-04-25 09:00:00 BST",1
in which the timezone information is presented explicitly; if you apply write.csv to the original dat, the formatting is lost.
According to ?write.table:
Any columns in a data frame which are lists or have a class (e.g.
dates) will be converted by the appropriate 'as.character' method:
such columns are unquoted by default.
Simply put, you can only write text/characters to text files. Use save if you want to preserve the binary R representation of your object(s).
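For a single object, saveRDS/readRDS is a convenient form of that advice (a sketch, not part of the original answer):
# saveRDS keeps the POSIXct/Date/times classes exactly as they are in memory
saveRDS(data, file = "data.rds")
# later, possibly in another session:
data <- readRDS("data.rds")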
If you are willing to add dplyr and lubridate as dependencies you can generate the CSV with dates in ISO8601 (so you don't lose any information) like this:
#install.packages("tidyverse")
#install.packages("dplyr")
library(dplyr)
library(lubridate, warn.conflicts = FALSE)
dat <- data.frame(time=as.POSIXlt("2013-04-25 09:00 BST"), quantity=1) # example data
write.csv(mutate(dat, time=format(time, "%FT%H:%M:%S%z")), file="test.csv", row.names=FALSE)
That will generate a CSV file with the following content:
"time","quantity"
"2013-04-25T09:00:00+0200",1
As you can see the CSV contain the date in ISO8601 with the timezone information so no information is lost.
If you want to read that CSV back, you can:
df2 <- read.csv("test.csv") %>% mutate(time=ymd_hms(time))

Using R to create and merge zoo object time series from csv files

I have a large set of csv files in a single directory. These files contain two columns, Date and Price. The filename, <filename>.csv, contains the unique identifier of the data series. I understand that missing values in merged data series can be handled when these time series are zoo objects. I also understand that, by using na.locf(merge()), I can fill in the missing values with the most recent observations.
I want to automate the process of:
1) loading the columnar Date and Price data from the *.csv files into R dataframes,
2) establishing each distinct time series within the merged zoo "portfolio of time series" object with an identity equal to its <filename>,
3) merging these zoo time series using MergedData <- na.locf(merge( )).
The ultimate goal, of course, is to use the fPortfolio package.
I've used the following statement to create a data frame of Date,Price pairs. The problem with this approach is that I lose the <filename> identifier of the time series data from the files.
result <- lapply(files, function(x) read.csv(x))
I understand that I can write code to generate the R statements required to do all these steps instance by instance. I'm wondering if there is some approach that wouldn't require me to do that. It's hard for me to believe that others haven't wanted to perform this same task.
Try this:
z <- read.zoo(files, header = TRUE, sep = ",")
z <- na.locf(z)
I have assumed a header line and data lines like 2000-01-31,23.40. Use whatever read.zoo arguments are necessary to accommodate whatever format you have.
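For example, if the Date column inside the files is written as m/d/Y rather than ISO dates, a format argument can be supplied (a sketch; the file pattern and directory are assumptions):
library(zoo)
# collect the CSV paths, then read and merge them in one call;
# format describes how the Date column is written inside the files
files <- list.files(pattern = "\\.csv$", full.names = TRUE)
z <- read.zoo(files, header = TRUE, sep = ",", format = "%m/%d/%Y")
z <- na.locf(z)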
You can get better formatting using sapply (it keeps the file names); here I will keep lapply.
Assuming that all your files are in the same directory, you can use list.files; it is very handy for such a workflow.
I would use read.zoo to get zoo objects directly (and avoid coercing them later).
For example:
zoo.objs <- lapply(list.files(path = MY_FILES_DIRECTORY,
                              pattern = '^zoo_.*\\.csv$', ## I look for csv files
                                                          ## whose names start with zoo_
                              full.names = TRUE),         ## to get full paths: path + filename
                   read.zoo)
Now I use list.files again to name my result:
names(zoo.objs) <- list.files(path = MY_FILES_DIRECTORY,
                              pattern = '^zoo_.*\\.csv$')
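To finish the workflow described in the question, the named list can then be merged and forward-filled (a sketch; do.call splices the list into merge, and the list names should carry over as column names):
# merge all series into one multi-column zoo object and fill gaps
# with the most recent observation
MergedData <- na.locf(do.call(merge, zoo.objs))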
