Error converting data to Date in R

I have a problem converting a character vector to dates using as.Date.
The data is shown below.
> new3<-read.csv("Total Load - Day Ahead _ Actual.csv",stringsAsFactors=F)
> colnames(new3)<- c("Date","Hour","Dayahead","Actual")
> str(new3)
'data.frame': 35044 obs. of 4 variables:
$ Date : chr "01-01-2015" "01-01-2015" "01-01-2015" "01-01-2015" ...
$ Hour : chr "0:00" "0:15" "0:30" "0:45" ...
$ Dayahead: chr "42955" "42412" "41901" "41355" ...
$ Actual : int 42425 42021 42068 41874 41230 40810 40461 40160 39958 ...
Then I tried as.Date:
new3$Date<-as.Date(new3$Date,"%d/%m/%Y")
The order of d, m, Y is right, but after running this the Date column shows NA, as below:
> str(new3)
'data.frame': 35044 obs. of 4 variables:
$ Date : Date, format: NA NA NA NA ...
$ Hour : chr "0:00" "0:15" "0:30" "0:45" ...
$ Dayahead: chr "42955" "42412" "41901" "41355" ...
$ Actual : int 42425 42021 42068 41874 41230 40810 40461 40160 39958 ...
I don't know what to do to fix it.
Can anyone help me out here? Thank you

This step is the problem:
new3$Date<-as.Date(new3$Date,"%d/%m/%Y")
Try this instead:
new3$Date<-as.Date(new3$Date,"%d-%m-%Y")
The separator in your dates is - and not /, so the format string has to use - as well.
I'd also suggest looking into the lubridate package. It offers easy ways to convert character strings to dates.
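To make the difference concrete, here is a minimal sketch using two sample values copied from the str() output in the question. The variable names dates, wrong and right are only for illustration.

```r
# Sample values copied from the str() output in the question
dates <- c("01-01-2015", "01-01-2015")

# The format string must use the same separator as the data
wrong <- as.Date(dates, format = "%d/%m/%Y")  # "/" does not match "-"
right <- as.Date(dates, format = "%d-%m-%Y")  # "-" matches the data

print(wrong)
print(right)
```

When the format does not match the input, as.Date returns NA instead of raising an error, which is why the mistake is easy to miss.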

Related

R converts character variables into integer variables automatically

I have a data file that contains several character variables that consist only of numbers. They need to remain character variables because some of them start with a 0, and when they are converted to integer/numeric the leading zeros are cut off. For some strange reason, when I use fwrite to save my data file as a CSV and then open it again with fread, the character variables that consisted only of numbers are suddenly integer variables. How can I keep R from doing this?
> str(Dataset_Master)
Classes ‘data.table’ and 'data.frame': 12178669 obs. of 4 variables:
$ Date_of_goods_arrival_at_the_customer: int 20160527 20160527 20160527...
$ Sales_document : chr "0505399186" "0505435949"...
$ Warehouse : chr "8150" "8150" "8150" "8150" ...
$ Sold_to_country : chr "DE" "DE" "DE" "DE" ...
- attr(*, ".internal.selfref")=<externalptr>
> ##Save document
> fwrite(Dataset_Master, "Dataset_Master_3.csv")
> ##Load data
> Dataset_Master <- fread("Dataset_Master_3.csv")
|--------------------------------------------------|
|==================================================|
> str(Dataset_Master)
Classes ‘data.table’ and 'data.frame': 12178669 obs. of 4 variables:
$ Date_of_goods_arrival_at_the_customer: int 20160527 20160527 20160527...
$ Sales_document : int 505399186 505435949 505435949...
$ Warehouse : int 8150 8150 8150 8150 8150 8150...
$ Sold_to_country : chr "DE" "DE" "DE" "DE" ...
- attr(*, ".internal.selfref")=<externalptr>
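One possible approach, offered as a sketch and not as a confirmed fix from this thread: fread guesses column types from the file contents, so you can tell it which columns must stay character through its colClasses argument. The small table dt and the temporary file path below are stand-ins for Dataset_Master and its CSV.

```r
library(data.table)

# Small stand-in for Dataset_Master (values taken from the str() output above)
dt <- data.table(
  Sales_document  = c("0505399186", "0505435949"),
  Warehouse       = c("8150", "8150"),
  Sold_to_country = c("DE", "DE")
)

tmp <- tempfile(fileext = ".csv")  # temporary path used only for this sketch
fwrite(dt, tmp)

# Name the columns that must stay character when the file is read back
dt2 <- fread(tmp, colClasses = c(Sales_document = "character",
                                 Warehouse      = "character"))
str(dt2)
```

Recent data.table versions also have a keepLeadingZeros argument in fread, but that would presumably only help for Sales_document; a column like Warehouse with no leading zeros would still need colClasses to stay character.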

What does the error "wrong sign in 'by' argument" mean when using the seq function in R?

I have a dataframe called forecast.df:
> str(forecast.df)
Classes ‘data.table’ and 'data.frame': 1027 obs. of 9 variables:
$ group : chr "IT" "IT" "IT" "IT" ...
$ Name : chr "name1" "name1" "name2" "name2" ...
$ position: chr "Specialist" "Specialist" "Analyst" "Analyst" ...
$ job : chr "job1" "job2" "job3" "job4" ...
$ dept : chr "IT" "FIN" "FIN" "P&C" ...
$ bucket : chr "Apr-18" "Apr-18" "Apr-18" "Apr-18" ...
$ start : Date, format: "2018-01-02" "2018-01-02" "2018-01-15" "2018-01-22" ...
$ end : Date, format: "2018-04-06" "2018-01-26" "2018-04-20" "2018-04-06" ...
$ hours : int 149 8 109 123 44 124 125 142 70 75 ...
- attr(*, ".internal.selfref")=<externalptr>
Instead of a start and end date, I am trying to transform it so that each row has a single date, so a job that takes 3 days has three rows associated with it (needed for the visualization we are doing).
The code I am using is this:
tidyForecast.df <- setDT(forecast.df)[ , list(group = group
, name = Name
, position = position
, job = job
, dept = dept
, bucket = bucket
, hours = hours
, date = seq(start
, end
, by = "day"))
, by = 1:nrow(forecast.df)]
And the error I am getting when I use this is:
Error in seq.int(0, to0 - from, by) : wrong sign in 'by' argument
I have never encountered this error before, and I used this same format earlier in the code and it worked, so maybe it's something nuanced?
Found what was going wrong: there was a single instance in the 1027 observations where the start date was after the end date. This is why it worked in the past but stopped working when I used it for new data. With by = "day" the sequence steps forward, but the difference between the two dates was negative, so the sign of 'by' did not match the direction from start to end.
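A quick way to find such rows before calling seq is to compare the two date columns directly. This is a minimal sketch on a hypothetical two-row stand-in for forecast.df; the data frame forecast and its values are made up for illustration.

```r
# Hypothetical stand-in for forecast.df; the second row has start after end
forecast <- data.frame(
  job   = c("job1", "job2"),
  start = as.Date(c("2018-01-02", "2018-01-26")),
  end   = as.Date(c("2018-01-04", "2018-01-02"))
)

# Rows that would make seq(start, end, by = "day") fail
bad <- forecast[forecast$start > forecast$end, ]
print(bad)

# Calling seq() on such a row; tryCatch captures the error message, if any
res <- tryCatch(
  seq(forecast$start[2], forecast$end[2], by = "day"),
  error = function(e) conditionMessage(e)
)
print(res)
```

Whether to drop the offending rows or swap their start and end dates depends on what the data is supposed to mean, so the sketch only reports them.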

Plotting time (HMS) with ggplot2

I'm trying to plot running sessions. I want to make a ggplot with:
x = distance (2.2KM, 5KM, 10KM, 12.8KM, Ziel)
y = time (HMS)
I have the following data:
'data.frame': 16333 obs. of 6 variables:
$ Numéro : chr "6526" "5427" "6528" "6529" ...
$ X2.2km : chr "00:10:47.4" "00:08:58.2" "00:11:10.4" "00:09:27.3" ...
$ X5km : chr "00:26:05.0" "00:21:46.1" "00:27:13.5" "00:22:35.3" ...
$ X10km : chr "00:56:30.1" "00:45:59.3" "00:58:53.1" "00:47:51.7" ...
$ X12.8km : chr "01:14:24.7" "00:59:50.7" "01:17:35.0" "01:01:42.6" ...
$ Zielzeit: chr "01:37:40.0" "01:16:38.1" "01:41:53.0" "01:19:02.5" ...
The next step is to use the melt function from the reshape2 package, and then lubridate:
xx<-melt(xx,id="Numéro")
####Using lubridate ####
xx$value<-hms(xx$value)
My problem is that when I try to plot a simple graphic, I receive the following message:
> ggplot(xx,aes(variable,value))+geom_point()
Error in x < range[1] : cannot compare Period to Duration:
coerce with 'as.numeric' first.
> ggplot(xx,aes(variable,value))+geom_line()
Error in x < range[1] : cannot compare Period to Duration:
coerce with 'as.numeric' first.
DATASET
xx <- read.table(header=TRUE, text="
Numéro variable value
1 6526 X2.2km 10M 47.4S
2 5427 X2.2km 8M 58.2S
3 6528 X2.2km 11M 10.4S
4 6529 X2.2km 9M 27.3S
5 6530 X2.2km 8M 29.3S")
Thanks for any kind of contribution.
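The error message hints that ggplot2 cannot scale a lubridate Period directly. One possible workaround, sketched here under that assumption, is to convert the Period to plain seconds with lubridate's period_to_seconds and plot the numeric column instead. The column name Numero (without the accent) and the new column seconds are illustrative choices.

```r
library(lubridate)

# Long-format sample in the shape produced by melt(); values from the question
xx <- data.frame(
  Numero   = c("6526", "5427", "6528"),
  variable = c("X2.2km", "X2.2km", "X2.2km"),
  value    = c("00:10:47.4", "00:08:58.2", "00:11:10.4"),
  stringsAsFactors = FALSE
)

# Parse the strings to a Period, then convert to a plain number of seconds
xx$seconds <- period_to_seconds(hms(xx$value))
print(xx)

# With ggplot2 loaded, the numeric column can then be plotted, for example:
# ggplot(xx, aes(variable, seconds)) + geom_point()
```

The axis will then be labelled in seconds, so formatting it back to hours, minutes and seconds would be a separate step.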

How do I combine two data frames with different row lengths?

I have two data sets:
str(a)
'data.frame': 525930 obs. of 3 variables:
$ reg_code : int 11542359 10077860 10050401 10988998 11465162 10933454 11170863 11291673 12086780 10248250 ...
$ begin_date: Date, format: "2008-10-01" "1994-06-01" ...
$ pair_id : chr "115423591" "100778601" "100504011" "109889981" ...
str(b)
'data.frame': 618655 obs. of 3 variables:
$ reg_code: int 10077860 10050401 10988998 11465162 10933454 11170863 11291673 10248250 10998100 10837319 ...
$ end_date: Date, format: "2006-03-09" "2000-11-16" ...
$ pair_id : chr "100778601" "100504011" "109889981" "114651621" ...
I tried to merge them:
abc<-merge(x=df1,y=df2,by="id")
but it is throwing an error:
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 15930, 28655, 1
This might seem silly, but just to confirm, are you trying to merge based on "pair_id"? It looks like you're using "id" for the by argument.
If you're simply trying to add one to the other and they have the same columns, you can use rbind().
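If the goal is indeed a join on "pair_id", a minimal sketch could look like the following. The two small data frames reuse values from the str() output above as stand-ins for a and b; the names inner and full are only for illustration.

```r
# Small stand-ins for a and b, using values from the str() output above
a <- data.frame(
  reg_code   = c(11542359L, 10077860L),
  begin_date = as.Date(c("2008-10-01", "1994-06-01")),
  pair_id    = c("115423591", "100778601"),
  stringsAsFactors = FALSE
)
b <- data.frame(
  reg_code = c(10077860L, 10050401L),
  end_date = as.Date(c("2006-03-09", "2000-11-16")),
  pair_id  = c("100778601", "100504011"),
  stringsAsFactors = FALSE
)

# Inner join: only pair_id values present in both data frames
inner <- merge(a, b, by = "pair_id")

# Full join: keep unmatched rows from both sides, filling gaps with NA
full <- merge(a, b, by = "pair_id", all = TRUE)
print(full)
```

Because reg_code exists in both inputs but is not part of the key here, merge keeps both copies with .x and .y suffixes. Unlike rbind, merge does not need the two data frames to have the same number of rows.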

read.zoo works but then as.xts fails with "currently unsupported data type"

I've a csv file of daily bars, with just two lines:
"datestamp","Open","High","Low","Close","Volume"
"2012-07-02",79.862,79.9795,79.313,79.509,48455
(That file was an xts that was converted to a data.frame then passed on to write.csv)
I load it with this:
z=read.zoo(file='tmp.csv',sep=',',header=T,format = "%Y-%m-%d")
And it is fine as print(z) shows:
Open High Low Close Volume
2012-07-02 79.862 79.9795 79.313 79.509 48455
But then as.xts(z) gives: Error in coredata.xts(x) : currently unsupported data type
Here is the str(z) output:
‘zoo’ series from 2012-07-02 to 2012-07-02
Data:List of 5
$ : num 79.9
$ : num 80
$ : num 79.3
$ : num 79.5
$ : int 48455
- attr(*, "dim")= int [1:2] 1 5
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:5] "Open" "High" "Low" "Close" ...
Index: Date[1:1], format: "2012-07-02"
I've so far confirmed it is not that 4 columns are num and one column is int, as I still get the error even after removing the Volume column. But, then, what could that error message be talking about?
As Sebastian pointed out in the comments, the problem is in the single row. Specifically the coredata is a list when read.zoo reads a single row, but something else (a matrix?) when there are 2+ rows.
I replaced the call to read.zoo with the following, and it works fine whether 1 or 2+ rows:
d=read.table(fname,sep=',',header=T)
x=as.xts(subset(d,select=-datestamp),order.by=as.Date(d$datestamp))
