I have an issue that I'm running into on Windows. Suppose that I have the following data stored in a text file dat.csv:
timestamp, demand
2011-05-27 15:50:04, 38874
2016-03-27 01:30:03, 25107
This data is originally from a much larger csv file detailing the energy market in the UK. I attempt to create an xts time series object from this file as follows:
> library(xts)
> dat <- read.csv('dat.csv', sep=',', header=T, stringsAsFactors=F)
> dat.xts <- xts(dat[, 2],
+                order.by = strptime(dat$timestamp, format="%Y-%m-%d %H:%M:%S"))
However, when I attempt to view the resulting xts object, this is what happens:
> dat.xts
                     [,1]
2011-05-27 15:50:04 38874
<NA>                25107
As you can see, while the index for the first line was parsed correctly, that for the second line has resulted in an NA.
Interestingly, the same code appears to run correctly on Ubuntu 16.04. I suspect it has something to do with how time zones work on Windows, but I'm not entirely certain about that. Can somebody explain how I can avoid this problem when running on Windows?
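One possible workaround, assuming the NA comes from the UK clock change (on 2016-03-27 the clocks jump forward from 01:00 to 02:00, so 01:30 does not exist as a local time), is to parse the timestamps in a fixed time zone such as UTC. A minimal sketch, using as.POSIXct with the same format string:

library(xts)

# read the file as before
dat <- read.csv('dat.csv', sep = ',', header = TRUE, stringsAsFactors = FALSE)

# parse in UTC so daylight-saving gaps in the local zone cannot yield NA indexes
idx <- as.POSIXct(dat$timestamp, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
dat.xts <- xts(dat[, 2], order.by = idx)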
I have data in Excel with 13 columns: a year column and 12 month columns. I want to run a time series analysis but find it difficult to plot the data. How can I convert this file so that I can analyse the data?
My Excel file
I tried plotting but get an error. The error reads "cannot plot more than 10 series".
Say your data is df.
Using base R,
rownames(df) <- df$Year
df$Year <- NULL
ts(as.vector(t(as.matrix(df))),
   start=..., end=..., frequency=12)
Adjust start and end depending on your data.
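As an illustration, a minimal sketch with made-up data (the years, values, and start are placeholders):

# hypothetical wide layout: one row per year, columns Jan..Dec
df <- data.frame(Year = 2018:2020,
                 matrix(rnorm(36, mean = 100), nrow = 3,
                        dimnames = list(NULL, month.abb)))

rownames(df) <- df$Year
df$Year <- NULL

# flatten row by row (year by year, Jan..Dec) into one monthly series
y <- ts(as.vector(t(as.matrix(df))), start = c(2018, 1), frequency = 12)
plot(y)  # a single series, so the "cannot plot more than 10 series" error goes away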
I am very new to R. I watched a YouTube video to do various time series analyses, but it downloaded its data from Yahoo, whereas my data is in an Excel .csv file. I wanted to follow the same analysis with my own data. I spent two days finding out that the date must be in US style. Now I am stuck again on a basic step, loading the data so it can be analysed; this seems to be the biggest hurdle with R. Please can someone give me some guidance on why the commands shown below do not produce returns for the complete column set? I tried the zoo format, but it didn't work; then I tried xts and it worked partially. I suspect the original import from Excel is the major problem. Can I get some guidance please?
> AllPrices <- as.zoo(AllPrices)
> head(AllPrices)
Index1 Index2 Index3 Index4 Index5 Index6 Index7 Index8 Index9 Index10
> AllRets <- dailyReturn(AllPrices)
Error in NextMethod("[<-") : incorrect number of subscripts on matrix
> AllPrices<- as.xts(AllPrices)
> AllRets <- dailyReturn(AllPrices)
> head(AllRets)
daily.returns
2012-11-06 0.000000e+00
2012-11-07 -2.220249e-02
2012-11-08 1.379504e-05
2012-11-09 2.781961e-04
2012-11-12 -2.411128e-03
2012-11-13 7.932869e-03
Try to load your data using the readr package.
library(readr)
Then, look at the documentation by running ?read_csv in the console.
I recommend reading in your data this way and specifying the column types: for instance, if your first column is the date, read it in as character ("c"), and if your other columns are numeric, use "n".
library(xts)
data <- read_csv('YOUR_DATA.csv', col_types = "cnnnnn")  # date in the left column, then 5 numeric columns
data$Dates <- as.Date(data$Dates, format = "%Y-%m-%d")   # convert the date column to Date class (replace "Dates" with your date column's name and adjust the format if needed)
data <- as.data.frame(data)                              # turn the result into a data frame
data <- xts(data[, -1], order.by = data[, 1])            # build the xts: everything but the date column, ordered by the date column
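If dailyReturn() only gives you returns for a single column, one option, sketched here on the assumption that the quantmod package is installed and data is the xts object built above, is to apply it column by column and merge the results:

library(quantmod)  # provides dailyReturn()

# compute daily returns for each column separately, then merge into one xts object
AllRets <- do.call(merge, lapply(seq_len(ncol(data)),
                                 function(i) dailyReturn(data[, i])))
colnames(AllRets) <- colnames(data)  # reuse the original column names
head(AllRets)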
All of the dates that I've manipulated in an Execute R Script module in Azure Machine Learning write out as blank in the output: the date columns exist, but there are no values in them.
The source variables containing date information that I'm reading into the data frame have two different date formats, as follows:
usage$Date1 = c('8/6/2015', '8/20/2015', '7/9/2015')
usage$Date2 = c('4/16/2015 0:00', '7/1/2015 0:00', '7/1/2015 0:00')
I inspected the log file in AML, and AML can't find the local time zone.
The log file warnings specifically:
[ModuleOutput] 1: In strptime(x, format, tz = tz) :
[ModuleOutput]   unable to identify current timezone 'C':
[ModuleOutput]   please set environment variable 'TZ'
[ModuleOutput] 2: In strptime(x, format, tz = tz) : unknown timezone 'localtime'
I referred to another answer regarding setting the default time zone for strptime here:
unknown timezone name in R strptime/as.POSIXct
I changed my code to explicitly set the TZ environment variable:
Sys.setenv(TZ='GMT')
####Data frame usage cleanup, format and labeling
usage<-as.data.frame(usage)
usage$Date1<-as.character(usage$Date1)
usage$Date1<-as.POSIXct(usage$Date1, "%m/%d/%Y",tz="GMT")
usage$Date1<-format(usage$Date1, "%m/%d/%Y")
usage$Date1<-as.Date(usage$Date1, "%m/%d/%Y")
usage<-as.data.frame(usage)
usage$Date2<- as.POSIXct(usage$Date2, "%m/%d/%Y",tz="GMT")
usage$Date2<- format(usage$Date2,"%m/%d/%Y")
usage$Date2<-as.Date(usage$Date2, "%m/%d/%Y")
usage<-as.data.frame(usage)
The problem persists: Azure ML still does not write these variables out, and the columns come out blank.
(This code works in RStudio, where I presume the local time zone is taken from the system.)
From reading two blog posts on this problem, it seems that Azure ML doesn't support some date-time formats:
http://blogs.msdn.com/b/andreasderuiter/archive/2015/02/03/troubleshooting-error-1000-rpackage-library-exception-failed-to-convert-robject-to-dataset-when-running-r-scripts-in-azure-ml.aspx
http://www.mikelanzetta.com/2015/01/data-cleaning-with-azureml-and-r-dates/
So I tried to convert to POSIXct before sending it to the output stream, which I've done as follows:
tenantusage$Date1 = as.POSIXct(tenantusage$Date1 , "%m/%d/%Y",tz = "EST5EDT");
tenantusage$Date2 = as.POSIXct(tenantusage$Date2 , "%m/%d/%Y",tz = "EST5EDT");
But I encounter the same problem: the information in these variables is not written to the output, and the Date1 and Date2 columns are blank.
Please advise!
thanks
Hi SingingData and SochiX,
Sorry to hear about this source of frustration! I find that the following variation on SingingData's code sample works for me (tested in a CRAN 3.1.0 module):
usage <- data.frame(list(Date1 = c('8/6/2015', '8/20/2015', '7/9/2015'),
Date2 = c('4/16/2015 0:00', '7/1/2015 0:00', '7/1/2015 0:00')))
usage$Date1 <- as.POSIXlt(usage$Date1, "%m/%d/%Y",tz="GMT")
usage$Date2 <- as.POSIXlt(usage$Date2, "%m/%d/%Y",tz="GMT")
usage$Date1 <- format(usage$Date1, "%m/%d/%Y")
usage$Date2 <- format(usage$Date2,"%m/%d/%Y")
usage$Date1 <- as.Date(usage$Date1, "%m/%d/%Y")
usage$Date2 <- as.Date(usage$Date2, "%m/%d/%Y")
maml.mapOutputPort("usage");
I've used as.POSIXlt() instead of as.POSIXct(). I hope that this helps unblock your work in R.
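A quick way to confirm what will be sent to the output port is to check the column classes just before the maml.mapOutputPort() call; a small sketch:

str(usage)            # Date1 and Date2 should both show class "Date" at this point
sapply(usage, class)  # the same check, one class per column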
I am working with data from CSV files that will all look the same, so I am hoping to come up with code that can easily be applied to all of them.
However, sadly enough I am failing at step one :-(.
The CSV files have the date and time saved in one column, so when I import them with read.csv that column gets read as character. How can I most easily convert it into a date that I can then use for plotting and analysis?
Here is what I tried:
Load the data; this saves the date and time as character under mydata$Date.Time (e.g. 1/1/15 0:00):
mydata<-read.csv(file.choose(), stringsAsFactors = FALSE,
strip.white = TRUE,
na.strings = c("NA",""), skip=16,
header=TRUE)
Separate Date.Time into Date and Time:
new <- do.call( rbind , strsplit( as.character( mydata$Date.Time ) , " " ) )
Add these two back to the data frame mydata:
mydata <- cbind(mydata, Date = new[, 1], Time = new[, 2])
Convert Date into Date format via as.Date:
mydata$Date <- as.Date(new[, 1], format = "%m/%d/%y")  # format matching the 1/1/15 example
This works fine for the date; however, I am stuck with the time. I tried this:
mydata$Time <- format(as.POSIXct(new[,2], format="%H:%M"))
this gives me the following error:
Error in as.POSIXlt.character(x, tz, ...) :
character string is not in a standard unambiguous format
I wonder if there is a smarter way of doing this. Reading in dates and times seems to be one of the fundamental tasks that I would like to understand. Is there a way for R to recognize the date and time directly from the CSV? Or is it generally smarter to generate a time vector on its own, and if so, how would I do that?
Thanks so much for your help.
Sandra
If you want to use time only, consider using the chron package:
library(chron)
mytime <- times("21:19:37")
or in your case
times(new[,2])
assuming that it is a character vector.
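Alternatively, if you want the full date and time rather than the time only, parsing the combined column directly may be simpler. A minimal sketch, assuming the "1/1/15 0:00" format from the question and that UTC is acceptable:

# parse the combined "1/1/15 0:00" strings in one step
mydata$DateTime <- as.POSIXct(mydata$Date.Time,
                              format = "%m/%d/%y %H:%M", tz = "UTC")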
I tried the chron approach but it wouldn't work for me :-(.
So what I ended up doing is just creating a time vector for the period that I am loading the data in for:
date <-seq(as.POSIXct("2015/1/1 00:00"), as.POSIXct("2015/1/31 23:00"), "hours")
and then adding it back to the df.
Not what I wanted but it will work until I find the ultimate solution :-)
OK. I've tried several forums and threads, but I couldn't find this. I imported my data into R using this:
teste <- read.zoo("bitcoin2.csv", header=TRUE, sep=",", format = "%m/%d/%Y")
This worked fine: my xyplot gave me the right plot. So I tried to convert it to ts in order to use strucchange and other outlier/breakpoint packages.
aba <- as.ts(zoo(teste$Weighted_Price))
When I did this, the time index seems to have been lost. The plot still has the same shape, but the x-axis doesn't look like a regular time series plot.
Anyway, I've tried strucchange. After loading it, I ran this simple test:
test<-breakpoints(teste$Weighted_Price~1)
But R returned:
Error in my.RSS.table[as.character(i), 3:4] <- c(pot.index[opt], break.RSS[opt]) :
replacement has length zero
I presume my mistake is that the coercion from zoo to ts was not correct. Any help would be great.
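For what it's worth, a minimal sketch of building the ts explicitly instead of relying on as.ts(), using made-up daily data and placeholder start/frequency values that you would set from your own index:

library(zoo)
library(strucchange)

# hypothetical stand-in for teste$Weighted_Price: a daily zoo series
z <- zoo(cumsum(rnorm(200)),
         seq(as.Date("2017-01-01"), by = "day", length.out = 200))

# as.ts() on a Date-indexed zoo keeps the raw numeric index (days since 1970-01-01),
# which is likely why the x-axis no longer looks like dates; building the ts
# explicitly with a chosen start and frequency keeps a sensible time scale
aba <- ts(coredata(z), start = c(2017, 1), frequency = 365)

bp <- breakpoints(aba ~ 1)  # constant-mean model, as in the question
summary(bp)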