The highfrequency package is designed to transform .txt and .csv files from the NYSE TAQ and WRDS TAQ, respectively, into .RData files of xts objects, which can then be easily manipulated through the package.
The problem is that I have limited access to the WRDS database, which only lets me download tick data from the CRSP (Center for Research in Security Prices) database, not the TAQ (Trades and Quotes) database. So my data look like this: the downloadable file contains tick data for the REIT index from 2014-01-01 to 2014-01-05. I manually changed the ticker header to PRICE, as proposed by Kris Boudt, one of the package's main authors.
The code that I use is the following:
from = "2014-03-01"
to = "2014-04-31"
datasource = "C:/Users/aris/Desktop/raw_data"
datadestination = "C:/Users/aris/Desktop/xts_data"
convert(from = from, to = to, datasource = datasource, datadestination = datadestination,
        trades = TRUE, quotes = FALSE, ticker = "REIT", dir = FALSE, extension = "csv",
        header = TRUE, tradecolnames = NULL, quotecolnames = NULL,
        format = "%Y%m%d %H:%M:%S", onefile = TRUE)
I suspect that the problem lies in the line format = "%Y%m%d %H:%M:%S", since in the .csv file the date and the time are comma-separated. I tried putting a comma between %d and %H, like this: format = "%Y%m%d,%H:%M:%S", but to no avail.
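As a quick sanity check outside of convert (this is just an illustration, not from the original post), base R's strptime can confirm whether the comma-separated stamp even parses with a comma in the format string:

```r
# A combined DATE,TIME stamp as it appears in the .csv
strptime("20140102,9:30:00", format = "%Y%m%d,%H:%M:%S")
# The same stamp with a space, matching the original format string
strptime("20140102 9:30:00", format = "%Y%m%d %H:%M:%S")
```

If both calls return a valid POSIXlt timestamp, the format string is unlikely to be the culprit.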
The error reads
Error in `$<-.data.frame`(`*tmp*`, "COND", value = numeric(0)) :
replacement has 0 rows, data has 1048575
All suggestions are welcome.
Thanks to Joshua Ulrich I was able to gain some additional intuition and solve the problem(s). There is actually no need to manipulate the .csv file itself and add extra columns. Instead of setting tradecolnames = NULL, you tell the function which columns your file contains by setting tradecolnames = c("DATE","TIME","PRICE"). The problem with the non-existent directories is fixed by setting dir = TRUE. The final code looks like this:
from = "2014-03-01"
to = "2014-04-31"
datasource = "C:/Users/aris/Desktop/raw_data"
datadestination = "C:/Users/aris/Desktop/xts_data"
convert(from, to, datasource, datadestination, trades = TRUE, quotes = FALSE,
        ticker = "REIT", dir = TRUE, extension = "csv", header = TRUE,
        tradecolnames = c("DATE", "TIME", "PRICE"),
        format = "%Y%m%d %H:%M:%S", onefile = TRUE)
The highfrequency::convert function calls highfrequency:::makeXtsTrades, which expects the following columns in your text file: DATE,TIME,PRICE,SIZE,SYMBOL,EX,COND,CORR,G127.
I added empty columns to your text file, and did not get the error in your question. The edited text file looks like:
DATE,TIME,PRICE,SIZE,SYMBOL,EX,COND,CORR,G127
20140102,9:30:00,1123.77,,,,,,
20140102,9:30:01,1122.81,,,,,,
20140102,9:30:02,1122.77,,,,,,
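For reference, the padding could also be done in R rather than by hand; a minimal sketch, assuming the original three-column file is named REIT_trades.csv (that file name is hypothetical):

```r
# Read the three-column file (DATE, TIME, PRICE) and pad the remaining
# columns that highfrequency:::makeXtsTrades expects
trades <- read.csv("REIT_trades.csv", header = TRUE)
for (col in c("SIZE", "SYMBOL", "EX", "COND", "CORR", "G127")) {
  trades[[col]] <- NA
}
# na = "" writes the padded columns as empty fields
write.csv(trades, "REIT_trades.csv", row.names = FALSE, na = "")
```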
I got another error though.
Error in gzfile(file, "wb") : cannot open the connection
In addition: Warning message:
In gzfile(file, "wb") :
cannot open compressed file '/home/josh/Desktop/z_xts/2014-01-02/REIT_trades.RData', probable reason 'No such file or directory'
So it looks like the convert function expects all the daily output directories to exist before you run it. The function runs and creates the output after I create those directories.
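Assuming that is the case, the daily output directories can be created up front with something like this (the path and date range are taken from the error message and sample data; adjust as needed):

```r
datadestination <- "/home/josh/Desktop/z_xts"
# One output directory per calendar day in the sample range
days <- format(seq(as.Date("2014-01-02"), as.Date("2014-01-05"), by = "day"))
for (d in days) {
  dir.create(file.path(datadestination, d), recursive = TRUE, showWarnings = FALSE)
}
```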
I have plenty of zip files and I want to load only the ones that meet a name condition. For example, unzip any file that has a name like "Query Transaction History_20221122". I was able to achieve that with the script below:
zip_files <- list.files(path = "C:/Users/Guest 1/Downloads",
                        pattern = ".*Query Transaction History_20221122.*zip",
                        full.names = TRUE)
Now I want to extract to the specified folder with the code below, using the plyr package:
ldply(.data = zip_files, .fun = unzip, exdir = my_dir)
and it extracts fine to the specified folder with no issue.
The problem now is that the name of the zip file is alphanumeric: it comes with a name plus a date formatted as a number. Please see the sample below:
Query Transaction History_20221122
Since it's something I will keep doing on a daily basis, I want to write code that periodically changes the numeric part of the zip file name.
I tried using glue from the glue package; see the sample below:
checks <- format(Sys.Date(), "%Y%m%d")
zip_files <- list.files(path = "C:/Users/Guest 1/Downloads",
                        pattern = glue(".*Query Transaction History_{checks}.*zip",
                                       full.names = TRUE))
It ran fine, but when I tried to extract the file using the second script
ldply(.data = zip_files, .fun = unzip, exdir = my_dir)
it then returned the error below
In addition: Warning message:
In FUN(X[[i]], ...) : error 1 in extracting from zip file
Kindly assist
Thank you
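One likely cause, judging from the snippet above: full.names = TRUE has slipped inside the glue() call, so list.files() never receives it and returns bare file names, which unzip() then cannot locate. A sketch of the corrected call:

```r
library(glue)

checks <- format(Sys.Date(), "%Y%m%d")

# full.names belongs to list.files(), not glue()
zip_files <- list.files(path = "C:/Users/Guest 1/Downloads",
                        pattern = glue(".*Query Transaction History_{checks}.*zip"),
                        full.names = TRUE)
```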
I have a question; hopefully somebody can help me.
Every day, I create a dataframe (df) which I save in my working directory. The dataframe is saved as a '.csv' file whose name is the date on which the file was created, like this:
setwd("~/Documents/daily_data/data_August2021")
library(writexl)
write_xlsx(as.data.frame(df), paste0(Sys.Date(), '.csv'))  # create the file with the current date
Then, my intention is to compare this dataframe with the one I created the day before. So, I do this:
#Compare with previous day
yesterday <- Sys.Date()-1
library(readxl)
df_yesterday <- read.csv(file.path("~/Documents/daily_data/data_August2021", yesterday, ".csv"),
                         header = TRUE, sep = ",", stringsAsFactors = TRUE)
Consequently, I get this error:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file '/Users/Rafa/Documents/daily_data/data_August2021/2021-08-19/.csv': No such file or directory
Basically, it seems my code adds an extra slash ("/") between the name of the file and the file extension.
Any help with this problem?? Thanks a lot!!
The issue is that file.path inserts the / before the .csv:
file.path("~/Documents/daily_data/data_August2021",yesterday, ".csv")
[1] "~/Documents/daily_data/data_August2021/2021-08-20/.csv"
^
We could use paste0:
file.path("~/Documents/daily_data/data_August2021",paste0(yesterday, ".csv"))
[1] "~/Documents/daily_data/data_August2021/2021-08-20.csv"
Or another option is to change fsep to blank (""):
file.path("~/Documents/daily_data/data_August2021/",yesterday, ".csv", fsep = "")
[1] "~/Documents/daily_data/data_August2021/2021-08-20.csv"
I am new to coding and very new to this forum, so I hope my request makes sense.
I am trying to select images listed in a .csv file and copy them to a new folder. The pictures and the .csv file are both in the folder GRA04. The .csv file contains only one column with the picture names.
I used the following code:
# set working directory
setwd("E:/2019/GRA04")
# create and identify a new folder in R
targetdir <- dir.create("GRA04_age")
# find the files you want to copy
filestocopy <- read.csv("age.csv", header = FALSE)  # read csv as data table (one column, each row a file name)
filestocopy_v <- readLines(filestocopy)  # convert data table to character vector
filestocopy_v  # show the character vector
# copy the files to the new folder
file.copy(filestocopy_v, targetdir, recursive = TRUE)
When reaching the line
filestocopy_v <- readLines(filestocopy)
I get this error message:
Error in readLines(filestocopy) : 'con' is not a connection
I looked online for solutions with no luck. I ran this code before (or else something similar... didn't back it up...) and it worked fine, so I am not sure what is happening...
Thanks!
Out of interest, would the following now do what you're trying to achieve?
filestocopy_v <- filestocopy[[1]]
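Putting it together, a fuller sketch of the corrected workflow (assuming age.csv holds one file name per row with no header) might look like:

```r
setwd("E:/2019/GRA04")

# dir.create() returns TRUE/FALSE, not the path, so keep the path separately
dir.create("GRA04_age")
targetdir <- "GRA04_age"

# read.csv() returns a data frame; take its first column as a character vector
filestocopy <- read.csv("age.csv", header = FALSE, stringsAsFactors = FALSE)
filestocopy_v <- filestocopy[[1]]

file.copy(filestocopy_v, targetdir)
```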
I am brand new to R and I am trying to run some existing code that should clean up an input .csv then save the cleaned data to a different location as a .RData file. This code has run fine for the previous owner.
The code seems to be pulling the .csv and cleaning it just fine. It also looks like the save runs (there are no errors), but there is no output in the specified location. I thought maybe R was having a hard time finding the location, but it pulls the input data okay and the destination is just a subfolder.
After a full day of extensive Googling, I can't find anything related to a save just not working.
Example code below:
save(data, file = "C:\\Users\\my_name\\Documents\\Project\\Data.RData", sep="")
Hard to believe you don't see any errors - unless something has switched errors off:
> data = 1:10
> save(data, file="output.RData", sep="")
Error in FUN(X[[i]], ...) : invalid first argument
It's a misleading error: the problem is the third argument, which doesn't do anything. Remove it and it works:
> save(data, file="output.RData")
>
sep is used as an argument when writing CSV files, to separate columns. save writes binary data, which doesn't have rows and columns.
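As a quick round-trip check (a minimal illustration, not from the original answer):

```r
data <- 1:10
save(data, file = "output.RData")  # no sep argument

rm(data)
load("output.RData")  # restores the object under its original name
data                  # back to 1:10
```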
I am trying to read a csv file from aws datalake using R.
I used the below code to read the data, unfortunately I am getting an error
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
  no lines available in input
I am using the below code,
aws.signature::use_credentials()
c <- get_object("s3://datalake-1/x-data/")
cobj <- rawToChar(c)
con <- textConnection(cobj)
data <- read.csv(con)
close(con)
data
It looks like the file is not present at the address/URI provided. I was unable to reproduce this error, so maybe double-check your CSV's location.
Apart from that, I'd also put the read statement inside tryCatch, as referenced in an already existing answer linked here.
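A minimal sketch of that tryCatch pattern, assuming the aws.s3 package and a concrete object key (the file name here is a placeholder, not from the original question):

```r
library(aws.s3)

aws.signature::use_credentials()

obj <- tryCatch(
  get_object("s3://datalake-1/x-data/file.csv"),  # placeholder key: point at an actual file, not a prefix
  error = function(e) {
    message("Could not fetch object: ", conditionMessage(e))
    NULL
  }
)

if (!is.null(obj)) {
  data <- read.csv(text = rawToChar(obj))
}
```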