I'm trying to use a Bayesian VAR, but I can't even get my data into the right shape. I download the series from https://sdw.ecb.europa.eu/, but since many of them are quarterly I have trouble merging my variables: I'm unable to convert a string such as "2020-Q1" to a date with as.Date.
I used the sub function to get, for example, "2020-1" and then tried as.Date(x, format = "%Y-%q"), but it doesn't work, so I'm stuck.
textData <- "yearQuarter,Amount
2019-Q1,1000
2019-Q2,2000
2019-Q3,3000"
df <- read.csv(text=textData,header = TRUE,stringsAsFactors = FALSE)
as.Date(df$yearQuarter,format="%Y-%q")
...which produces:
> as.Date(df$yearQuarter,format="%Y-%q")
[1] NA NA NA
Thank you for your help!
library(lubridate)
d = yq("2020-Q1")
d
# [1] "2020-01-01"
year(d)
# [1] 2020
quarter(d)
# [1] 1
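If adding lubridate is not an option, two alternatives (my own suggestions, not part of the answer above) are zoo's as.yearqtr() and a little base-R string surgery:
library(zoo)

# zoo understands the %q conversion that as.Date() lacks
as.Date(as.yearqtr("2020-Q1", format = "%Y-Q%q"))
# [1] "2020-01-01"

# base R only: map the quarter number to the first month of that quarter
# (assumes the exact "YYYY-Qq" layout with a single-digit quarter)
x <- "2020-Q1"
as.Date(sprintf("%s-%02d-01",
                substr(x, 1, 4),
                (as.integer(substr(x, 7, 7)) - 1) * 3 + 1))
# [1] "2020-01-01"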
Related
I am trying to convert a vector of the following form:
data$Time[1:10]
[1] 0:00.00 0:00.01 0:00.02 0:00.03 0:00.04 0:00.05 0:00.06 0:00.07 0:00.08 0:00.09
573394 Levels: 0:00.00 0:00.01 0:00.02 0:00.03 0:00.04 0:00.05 0:00.06 0:00.07 0:00.08 0:00.09 0:00.10 0:00.11 0:00.12 0:00.13 0:00.14 ... 9:59.99
Notice that this is a factor:
class(data$Time)
[1] "factor"
I've tried the following
hms(data$Time[1:10])
[1] "0S" "1S" "2S" "3S" "4S" "5S" "6S" "7S" "8S" "9S"
It treats the hundredths of a second as whole seconds! The same thing happens with
period_to_seconds(hms(data$Time[1:10]))
[1] 0 1 2 3 4 5 6 7 8 9
I need to be able to extract the time (with the required accuracy) so that I can subtract values and calculate periods. Note that these files will extend to a few hours, so a solution that works for HH:MM:SS.00 would be appreciated.
Another approach, which only works if your data is exclusively either H:M:S or M:S, is the following:
Test <- c('03:5.05', '1:03.05.05')
tmp <- strptime(as.character(Test),"%H:%M:%OS")
tmp
[1] NA NA
tmp <- strptime(as.character(Test),"%M:%OS")
tmp
[1] "2016-04-30 00:03:05.05 CDT" "2016-04-30 00:01:03.05 CDT
(The hours had to be removed)
## set option to use digits for seconds
options(digits.secs = 2)
## convert your factor to a string and then to Posix format
tmp <- strptime(as.character(data$Time),'%H:%M:%OS')
## convert it to a numeric (unit seconds)
as.numeric(strftime(tmp,'%OS'))+60*as.numeric(strftime(tmp,'%M'))+60*60*as.numeric(strftime(tmp,'%H'))
There is an ms() function in the lubridate package that reads only the minutes and seconds.
Test <- c('0:00.02', '9:59.99')
library(lubridate)
library(magrittr)  # for the %>% pipe, in case it is not already attached
Test %>% ms() %>% period_to_seconds()
[1] 0.02 599.99
Based on Jorg's answer, I think I was able to solve my problem. The files I am working with extend over a few hours (with each point representing 0.01 s), so I split the vector (data$Time) and applied the M:S conversion to the first 360,000 points and the H:M:S conversion to what follows:
options(digits.secs = 2)
tmp1 <- strptime(as.character(data$Time[1:360000]),"%M:%OS")
tmp2 <- strptime(as.character(data$Time[-(1:360000)]),"%H:%M:%OS")
tmp1_numeric <- as.numeric(strftime(tmp1, '%OS')) + 60*as.numeric(strftime(tmp1, '%M')) + 60*60*as.numeric(strftime(tmp1, '%H'))
tmp2_numeric <- as.numeric(strftime(tmp2, '%OS')) + 60*as.numeric(strftime(tmp2, '%M')) + 60*60*as.numeric(strftime(tmp2, '%H'))
tmp_numeric <- c(tmp1_numeric, tmp2_numeric)
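A more general alternative (my own sketch, assuming every string is either M:S.cs or H:M:S.cs and nothing else) is to count the colons instead of hard-coding the split point at 360,000 rows:
library(lubridate)

times <- c("0:00.02", "9:59.99", "1:03:05.05")   # hypothetical mix of M:S and H:M:S strings

# two colons means an hours field is present
has_hours <- lengths(regmatches(times, gregexpr(":", times))) == 2

secs <- numeric(length(times))
secs[!has_hours] <- period_to_seconds(ms(times[!has_hours]))
secs[has_hours]  <- period_to_seconds(hms(times[has_hours]))
secs
# [1]    0.02  599.99 3785.05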
I have a raster stack/brick in R containing 84 layers and I am trying to name them according to year and month, from 199911 to 200610 (November 1999 to October 2006). However, for some reason R keeps adding an "X" to the beginning of any names I give my layers.
Does anyone know why this is happening and how to fix it? Here are some of the ways I've tried:
# Import raster brick
rast <- brick("rast.tif")
names(rast)[1:3]
[1] "MonthlyRainfall.1" "MonthlyRainfall.2" "MonthlyRainfall.3"
## Method 1
names(rast) <- paste0(rep(1999:2006, each=12), 1:12)[11:94]
names(rast)[1:3]
[1] "X199911" "X199912" "X20001"
## Method 2
# Create a vector of dates
dates <- format(seq(as.Date('1999/11/1'), as.Date('2006/10/1'), by='month'), '%Y%m')
dates[1:3]
[1] "199911" "199912" "200001"
# Set names
rast <- setNames(rast, dates)
names(rast)[1:3]
[1] "X199911" "X199912" "X200001"
## Method 3
names(rast) <- paste0("", dates)
names(rast)[1:3]
[1] "X199911" "X199912" "X200001"
## Method 4
substr(names(rast), 2, 7)[1:3]
[1] "199911" "199912" "200001"
names(rast) <- substr(names(rast), 2, 7)
names(rast)[1:3]
[1] "X199911" "X199912" "X200001"
To some extent I have been able to work around the problem by adding "X" to the beginning of some of my other data, but now it's reached the point where I can't do that any more. Any help would be greatly appreciated!
Syntactically valid names in R cannot begin with a numeral, so a character ("X") is prepended to the layer names to satisfy that restriction.
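A minimal illustration with make.names(), which is, as far as I can tell (an assumption, not something stated in the answer), what the raster package uses to validate layer names; names that already start with a letter pass through untouched:
make.names("199911")
# [1] "X199911"
make.names("rain_199911")   # hypothetical letter-prefixed name
# [1] "rain_199911"
So one workaround is to name the layers with a deliberate prefix such as "rain_" and keep the bare "199911"-style strings in a separate vector, or strip the prefix whenever the bare strings are needed, e.g. sub("^X", "", names(rast)).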
I have data (df) in this format. I need to convert the timestamp (tweetCreatedAt) into a date/time object so that I can manipulate the data further.
tweetCreatedAt comment_text
1 2014-05-17T00:00:49.000Z #truthout: India Elects Hard-Right Hindu
2 2014-05-17T00:00:49.000Z Narendra Modi is welcome to visit US !
Any help?
I have tried the following
df[,1] <- lapply(df[,1],function(x) as.POSIXct(x, '%Y-%m-%dT%H:%M:%S'))
But now I'm getting the dates only and not the actual time.
Not sure if this is the problem, but it's a possible one.
As I've mentioned in my comment, the elements of a column could be values, or lists, due to the process that generated this dataset.
Check this example:
# simplified example
dt = read.table(text = "tweetCreatedAt comment_text
1 2014-05-17T00:00:49.000Z #truthout
2 2014-05-19T00:00:49.000Z Narendra", header=T)
dt$tweetCreatedAt = as.character(dt$tweetCreatedAt)
# data set looks like
dt
# tweetCreatedAt comment_text
# 1 2014-05-17T00:00:49.000Z #truthout
# 2 2014-05-19T00:00:49.000Z Narendra
as.POSIXct(dt$tweetCreatedAt, format='%Y-%m-%dT%H:%M:%S')
# [1] "2014-05-17 00:00:49 BST" "2014-05-19 00:00:49 BST"
# let's manually change this element to a list
dt$tweetCreatedAt[2] = list(c("2014-05-19T00:00:49.000Z","2014-05-20T00:00:49.000Z"))
# data set now looks like this
dt
# tweetCreatedAt comment_text
# 1 2014-05-17T00:00:49.000Z #truthout
# 2 2014-05-19T00:00:49.000Z, 2014-05-20T00:00:49.000Z Narendra
as.POSIXct(dt$tweetCreatedAt, format='%Y-%m-%dT%H:%M:%S')
# Error in as.POSIXct.default(dt$tweetCreatedAt, format = "%Y-%m-%dT%H:%M:%S") :
# do not know how to convert 'dt$tweetCreatedAt' to class “POSIXct”
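One more thing worth checking, an observation of my own rather than part of the answer above: in the question's call the format string is passed positionally, so as.POSIXct() interprets it as the tz argument and falls back on its default formats, which would explain getting the date with no time. Naming the argument, and using %OS for the fractional seconds, avoids that:
x <- "2014-05-17T00:00:49.000Z"
as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC")
# [1] "2014-05-17 00:00:49 UTC"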
I have a CSV file of 1000 daily prices
They are in this format:
1 1.6
2 2.5
3 0.2
4 ..
5 ..
6
7 ..
.
.
1700 1.3
The index runs from 1 to 1700, but I need to specify a begin date and an end date: the start period is, let's say, 25 January 2009, and the last (1700th) value corresponds to 14 May 2013.
So far I've gotten this close:
> dseries <- ts(dseries[,1], start = ??time??, freq = 30)
How do I go about this? Thanks.
UPDATE:
I managed to create a separate object with dates as suggested in the answers and plotted it, but the y-axis looks weird, as shown in the screenshot.
Something like this?
as.Date("25-01-2009",format="%d-%m-%Y") + (seq(1:1700)-1)
A better way, thanks to @AnandaMahto:
seq(as.Date("2009-01-25"), by="1 day", length.out=1700)
Plotting:
df <- data.frame(
  myDate  = seq(as.Date("2009-01-25"), by = "1 day", length.out = 1700),
  myPrice = runif(1700)
)
plot(df)
R stores Date-classed objects as the number of days since "1970-01-01", but the as.Date.numeric function needs an offset ('origin'), which can be any starting date:
rDate <- as.Date.numeric(dseries[,1], origin="2009-01-24")
Testing:
> rDate <- as.Date.numeric(1:10, origin="2009-01-24")
> rDate
[1] "2009-01-25" "2009-01-26" "2009-01-27" "2009-01-28" "2009-01-29"
[6] "2009-01-30" "2009-01-31" "2009-02-01" "2009-02-02" "2009-02-03"
You didn't need to add the .numeric suffix, since R would automatically dispatch to that method if you used the generic, as.Date, with an integer argument. I just put it in because as.Date.numeric has different arguments than as.Date.character.
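If the goal is a date-indexed series to plot and subset, rather than just a vector of dates, a zoo object is one option. This is my own sketch, with runif() standing in for the real prices in dseries[, 1]:
library(zoo)

dates  <- seq(as.Date("2009-01-25"), by = "1 day", length.out = 1700)
prices <- runif(1700)               # hypothetical stand-in for dseries[, 1]

z <- zoo(prices, order.by = dates)  # prices indexed by calendar date
head(z, 3)
plot(z, xlab = "Date", ylab = "Price")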
How can I read the following vector c of strings into a list of tables? What is the shortest way: read.table, strsplit? E.g. I can't see how to read the table in c[4:6] in one command.
require(car)
m<-matrix(rnorm(16),4,4,byrow=T)
a<-Anova(lm(m~1),type=3,idata=data.frame(treatment=factor(1:4)),idesign=~treatment)
c<-capture.output(summary(a,multivariate=F))
c
This returns lines 4:6
c[4:6]
Now, if you wanted to parse this, I would do it in two steps: first read the column values from rows 5:6 (the header row in c[4] has multi-word column names such as "num Df", so it cannot be read in the same pass), and then add the names back.
> vals <- read.table(text=c[5:6])
> txt <- " \t SS\t num Df\t Error SS\t den Df\t F\t Pr(>F)"
> names(vals) <- names(read.delim(text=txt))
> vals
X SS num.Df Error.SS den.Df F Pr..F.
1 (Intercept) 0.57613392 1 0.4219563 3 4.09616 0.13614
2 treatment 1.85936442 3 8.2899759 9 0.67287 0.58996
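If typing the names by hand is acceptable (they simply mirror the printed header in c[4]; the name vector below is my own), the two steps collapse into one:
vals <- setNames(read.table(text = c[5:6]),
                 c("term", "SS", "num.Df", "Error.SS", "den.Df", "F", "Pr..F."))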
EDIT --
You could look at the source code of the summary function and calculate the required quantities yourself:
getAnywhere(summary.Anova.mlm)
The original idea seems not to work, so inspect the summary object directly:
c2 <- summary(a)
# find out what 'properties' the summary object has
# turns out, it is just the Anova object
class(c2) <- "list"
names(c2)
This returns
[1] "SSP" "SSPE" "P" "df" "error.df"
[6] "terms" "repeated" "type" "test" "idata"
[11] "idesign" "icontrasts" "imatrix" "singular"
and we can access them:
c2$SSP
c2$SSPE
As a side note, it is not a good idea to use the name of R's built-in c() function as a variable name.