Converting a CSV file to xts in R

I have a CSV file containing data where the first column is a Unix timestamp. How can I convert it to an xts object directly? Currently I am trying to read the file and convert it using as.xts, but I get error messages every way I try.
An example of the code I used:
Data <- read.zoo("data.csv", index.column = 1, origin = "01/01/1970",
                 sep = ",", header = TRUE, FUN = as.POSIXct)
as.xts(Data)
The first two lines of the CSV:
1366930371 143.7 0.25275
1366930368 143.7 0.02664867

There could be several things wrong. First, the two lines of your "csv" shown above are tab-separated, not comma-separated. Second, you specify header = TRUE, but those lines do not contain a header. Third, origin= is in the wrong format: it should be yyyy-mm-dd.
This works:
library(xts)
# comma-separated sample with no header; the first column is a Unix timestamp
Lines <- "1366978862,133.08,0.48180896
1366978862,133.08,0.5"
tc <- textConnection(Lines)
Data <- read.zoo(tc, sep = ",",
                 FUN = function(i) as.POSIXct(i, origin = "1970-01-01"))
close(tc)
Data <- as.xts(Data)
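Applied to the original file, a minimal sketch along the same lines (assuming data.csv is tab- or whitespace-separated with no header, as the sample lines suggest):
library(xts)

# sep = "" accepts tab- or space-separated fields; header = FALSE because the
# sample lines contain no column names; the first column is the Unix timestamp
Data <- read.zoo("data.csv", sep = "", header = FALSE,
                 FUN = function(i) as.POSIXct(i, origin = "1970-01-01"))
Data <- as.xts(Data)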

Related

NA introduced by coercion

I have a notepad txt file inflation.txt that looks something like this:
1950-1 0.0084490544865279
1950-2 −0.0050487986543660
1950-3 0.0038461526886055
1950-4 0.0214293914558992
1951-1 0.0232839389540449
1951-2 0.0299121323429455
1951-3 0.0379293285389640
1951-4 0.0212773984472849
From a previous stackoverflow post, I learned how to import this file into R:
data <- read.table("inflation.txt", sep = "", header = FALSE,
                   na.strings = "", stringsAsFactors = FALSE, encoding = "UTF-8")
However, this code reads the values in as character. When I try to convert the second column to numeric, all negative values are replaced with NA:
b=as.numeric(data$V2)
Warning message:
In base::as.numeric(x) : NAs introduced by coercion
> head(b)
[1] 0.008449054 NA 0.003846153 0.021429391 0.023283939 0.029912132
Can someone please show me what I am doing wrong? Is it possible to save the inflation.txt file as a data.frame?
I would read the file using whitespace as the separator, then split the first column into separate year and quarter columns in R:
data <- read.table("inflation.txt", sep = " ", header = FALSE,
                   na.strings = "", stringsAsFactors = FALSE, encoding = "UTF-8")
names(data) <- c("ym", "vals")
data$year    <- as.numeric(sub("-.*$", "", data$ym))
data$quarter <- as.numeric(sub("^\\d+-", "", data$ym))
data <- data[, c("year", "quarter", "vals")]
The issue is that the "−" in your data is not the plain ASCII minus sign "-" that as.numeric expects (it is a Unicode minus/dash character), hence the column is read as character.
You have two options.
Open the file in any text editor and find-and-replace every "−" with "-"; then read.table will work directly:
data <- read.table("inflation.txt")
If you can't change the data in the original file, replace the character with sub() after reading the data into R:
data$V2 <- as.numeric(sub('−', '-', data$V2, fixed = TRUE))
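Putting both steps together, a minimal sketch (assuming the stray character is the Unicode minus sign, written as "\u2212" below):
# read the two whitespace-separated columns; V2 comes in as character
data <- read.table("inflation.txt", na.strings = "", stringsAsFactors = FALSE,
                   encoding = "UTF-8")

# swap the Unicode minus (assumed to be U+2212) for an ASCII "-" and convert
data$V2 <- as.numeric(sub("\u2212", "-", data$V2, fixed = TRUE))
head(data$V2)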

Reading data from a text file and combining it with a date in R

I downloaded data from the internet and want to extract it into a data frame. You can find the data in the following filtered data set link: http://www.esrl.noaa.gov/gmd/dv/data/index.php?category=Ozone&type=Balloon . At the bottom of the page, from the nine filtered data sets, you can choose any station, say Suva, Fiji (SUV).
I have written the following code to create a data frame that includes the launch date for each file.
setwd("C:/Users/")
path = "~C:/Users/"
files <- lapply(list.files(pattern = '\\.l100'), readLines)
test.sample<-do.call(rbind, lapply(files, function(lines){
data.frame(datetime = as.POSIXct(sub('^.*Launch Date : ', '', lines[grep('Launch Date :', lines)])),
# and the data, read in as text
read.table(text = lines[(grep('Sonde Total', lines) + 1):length(lines)]))
}))
The files are from an FTP server. The file pattern doesn't look familiar to me; I also tried it with .txt, but it didn't work. Can you please tweak the above code, or suggest any other code, to get a data frame?
Thank you in advance.
I think the problem is that the search string "Launch Date :" does not match what is in the files (at least the one I checked).
This should work:
lines <- "Launch Date : 11 June 1991"
lubridate::dmy(sub('^.*Launch Date.*: ', '', lines[grep('Launch Date', lines)]))
The code would probably be easier to debug if you broke the problem down into steps rather than writing it as a single expression; a sketch along those lines follows.
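For instance, here is a step-by-step sketch that adapts the lapply() approach from the question and uses the relaxed "Launch Date" pattern; the helper name read_one is mine, and the '\\.l100' file pattern and "Sonde Total" marker line are assumptions carried over from the question:
library(lubridate)

# hypothetical helper: read one file and return its data plus launch date
read_one <- function(f) {
  lines <- readLines(f)

  # step 1: find the launch-date line and parse it with the relaxed pattern
  ld_line <- lines[grep("Launch Date", lines)][1]
  launch  <- dmy(sub("^.*Launch Date.*: *", "", ld_line))

  # step 2: read the data block that follows the "Sonde Total" header line
  dat <- read.table(text = lines[(grep("Sonde Total", lines) + 1):length(lines)])

  # step 3: attach the launch date and return
  dat$launch.date <- launch
  dat
}

files <- list.files(pattern = "\\.l100")
test.sample <- do.call(rbind, lapply(files, read_one))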
I took the following approach:
# download each file from the FTP listing into a temp directory and read it
td <- tempdir()
setwd(td)

ftp <- 'ftp://ftp.cmdl.noaa.gov/ozwv/Ozonesonde/Suva,%20Fiji/100%20Meter%20Average%20Files/'

# get the directory listing and split it into individual file names
files <- RCurl::getURL(ftp, dirlistonly = TRUE)
files <- strsplit(files, "\n")
files <- unlist(files)

dat <- list()
for (i in seq_along(files)) {
  download.file(paste0(ftp, files[i]), 'data.txt')

  # the data block starts after 17 header lines
  df <- read.delim('data.txt', sep = "", skip = 17)

  # the launch date sits in the file header; keep the part after the colon
  ld <- as.character(read.delim('data.txt')[9, ])
  ld <- strsplit(ld, ":")[[1]][2]
  df$launch.date <- stringr::str_trim(ld)

  dat[[i]] <- df; rm(df)
}
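If you then want a single data frame rather than a list of per-file data frames, one option (assuming all files share the same columns) is:
# combine the per-file data frames into one
result <- do.call(rbind, dat)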

Combining files from a list based on date

I have a list of files that are all named similarly: "FlightTrackDATE.txt", where the date is expressed as YYYYMMDD. I get all the file names with list.files(), but this returns every file in the folder (only flight track files are in this folder). What I would like to do is create a new file that combines all the files from the last 90 days (or three months, whichever is easier) and ignores the rest.
You can try this:
# date from which you want to consolidate (replace with the required date)
fromDate <- as.Date("2015-12-23")

for (filename in list.files()) {
  # extract the date from the filename using substr (characters 12-19)
  filenameDate <- as.Date(substr(filename, 12, 19), format = "%Y%m%d")

  # read and consolidate if the file date is on or after fromDate
  if ((filenameDate - fromDate) >= 0) {
    if (!exists('consolidated')) {
      # create the consolidated data from the first matching file
      consolidated <- read.table(filename, header = TRUE)
    } else {
      data <- read.table(filename, header = TRUE)
      # row-bind to consolidate
      consolidated <- rbind(consolidated, data)
    }
  }
}
OUTPUT:
I have three sample files:
FlightTrack20151224.txt
FlightTrack20151223.txt
FlightTrack20151222.txt
Sample data:
Name Speed
AA101 23
Consolidated data:
Name Speed
1 AA102 24
2 AA101 23
Note:
1. Create fromDate by subtracting from the current date, or use a fixed date as above.
2. Remember to clean up the existing consolidated object if you run the script again; otherwise rows will be duplicated.
3. Save consolidated to a file. :)
A short sketch of these three points follows.
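A minimal sketch of those points (FlightTrackLast90Days.txt is just an example output name):
# note 1: consolidate the last 90 days relative to today
fromDate <- Sys.Date() - 90

# note 2: remove any consolidated object left over from a previous run,
# otherwise rbind() would duplicate rows
if (exists("consolidated")) rm(consolidated)

# (run the consolidation loop shown above here)

# note 3: save the consolidated data to a file
if (exists("consolidated")) {
  write.table(consolidated, "FlightTrackLast90Days.txt", row.names = FALSE)
}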
Consider an lapply() solution that does not need list.files(), since you know the directory and file-name structure ahead of time:
path = "C:/path/to/txt/files"
# LIST OF ALL LAST 90 DATES IN YYYYMMDD FORMAT
dates <- lapply(0:90, function(x) format(Sys.Date()-x, "%Y%m%d"))
# IMPORT ALL FILES INTO A LIST OF DATAFRAMES
dfList <- lapply(paste0(path, "FlightTrack", dates, ".txt"),
function(x) if (file.exists(x)) {read.table(x)})
# COMBINE EACH DATA FRAME INTO ONE
df <- do.call(rbind, dfList)
# OUTPUT FINAL FILE TO TXT
write.table(df, paste0(path, "FlightTrack90Days.txt"), sep = ",", row.names = FALSE)

Read CSV from a specific row

I have daily data starting from 1980 in a CSV file, but I want to read the data only from 1985 onwards, because the other dataset in another file starts in 1985. How can I skip reading the data before 1985 in R?
I think you want to take a look at ?read.csv to see all the options.
It's a bit hard to give an exact answer without seeing a sample of your data.
If your data doesn't have a header and you know which line the 1985 data starts on, you can just use something like...
impordata <- read.csv(file, skip = 1825)
...to skip the first 1825 lines.
Otherwise you can always just subset the data after you've imported it if you have a year variable in your data.
impordata <- read.csv("skiplines.csv")
impordata <- subset(impordata,year>=1985)
If you don't know where the 1985 data starts, you can use grep to find the first instance of 1985 in your file's date variable and then only keep from that line onwards:
impordata <- read.csv("skiplines.csv")
impordata <- impordata[min(grep(1985,impordata$date)):nrow(impordata),]
Here are a few alternatives. (You may wish to convert the first column to "Date" class afterwards and possibly convert the entire thing to a zoo object or other time series class object.)
# create test data
fn <- tempfile()
dd <- seq(as.Date("1980-01-01"), as.Date("1989-12-31"), by = "day")
DF <- data.frame(Date = dd, Value = seq_along(dd))
write.table(DF, file = fn, row.names = FALSE)
read.table + subset
# if file is small enough to fit in memory try this:
DF2 <- read.table(fn, header = TRUE, as.is = TRUE)
DF2 <- subset(DF2, Date >= "1985-01-01")
read.zoo
# or this which produces a zoo object and also automatically converts the
# Date column to Date class. Note that all columns other than the Date column
# should be numeric for it to be representable as a zoo object.
library(zoo)
z <- read.zoo(fn, header = TRUE)
zw <- window(z, start = "1985-01-01")
If your data is not in the same format as the example you will need to use additional arguments to read.zoo.
multiple read.table's
# if the data is very large read 1st row (DF.row1) and 1st column (DF.Date)
# and use those to set col.names= and skip=
DF.row1 <- read.table(fn, header = TRUE, nrow = 1)
nc <- ncol(DF.row1)
DF.Date <- read.table(fn, header = TRUE, as.is = TRUE,
                      colClasses = c(NA, rep("NULL", nc - 1)))
n1985 <- which.max(DF.Date$Date >= "1985-01-01")
DF3 <- read.table(fn, col.names = names(DF.row1), skip = n1985, as.is = TRUE)
sqldf
# this is probably the easiest if data set is large.
library(sqldf)
DF4 <- read.csv.sql(fn, sql = 'select * from file where Date >= "1985-01-01"')
A data.table method, which offers good speed and memory performance:
library(data.table)
fread(file, skip = 1825)
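Note that skipping rows this way also skips the header line, so fread() falls back to default column names (V1, V2, ...). If you know the names, you can re-attach them via col.names (the names below are just placeholders):
library(data.table)

# skip the header plus the pre-1985 rows, then supply column names explicitly
DT <- fread(file, skip = 1825, col.names = c("Date", "Value"))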

Read xts from CSV file in R

I'm trying to read time series from CSV file and save them as xts to be able to process them with quantmod. The problem is that numeric values are not parsed.
CSV file:
name;amount;datetime
test1;3;2010-09-23 19:00:00.057
test2;9;2010-09-23 19:00:00.073
R code:
library(xts)
ColClasses <- c("character", "numeric", "character")
Data <- read.zoo("c:\\dat\\test2.csv", index.column = 3, sep = ";",
                 header = TRUE, FUN = as.POSIXct, colClasses = ColClasses)
as.xts(Data)
Result:
name amount
2010-09-23 19:00:00 "test1" "3"
2010-09-23 19:00:00 "test2" "9"
As you can see, the amount column contains character data, but it is expected to be numeric. What's wrong with my code?
The internal data structure of both zoo and xts is a matrix, so you cannot mix data types.
Just read in the data with read.table:
Data <- read.table("file.csv", sep=";", header=TRUE, colClasses=ColClasses)
I notice your data have subseconds, so you may be interested in xts::align.time. This code will take Data and create one object with a column for each "name" by seconds.
NewData <- do.call(merge, lapply(split(Data, Data$name), function(x) {
  align.time(xts(x[, "amount"], as.POSIXct(x[, "datetime"])), n = 1)
}))
If you want to create objects test1 and test2 in your global environment, you can do something like:
lapply(split(Data, Data$name), function(x) {
  assign(x[, "name"], xts(x[, "amount"], as.POSIXct(x[, "datetime"])), envir = .GlobalEnv)
})
You cannot mix numeric and character data in a zoo or xts object. However, if the name column is not itself time series data but is instead intended to distinguish between multiple series (one for test1, one for test2, etc.), then you can split on column 1 using split = 1, as shown in the following code. Be sure to set digits.secs, or you won't see the sub-seconds on output (although they will be there in any case):
options(digits.secs = 3)
z <- read.zoo("myfile.csv", sep = ";", split = 1, index = 3, header = TRUE, tz = "")
x <- as.xts(z)
