Subset xts object using variables for start and end periods - r

I have an xts object called 'usagexts' with dates from 01 Oct 15 to 31 Mar 18. I want to create 3 subsets of this object for the periods 01 Oct 15 to 31 Mar 16, 01 Oct 16 to 31 Mar 17 and 01 Oct 17 to 31 Mar 18 without hardcoding the dates, as these will change as time goes on.
The object structure is like so :
dateperiod,usageval
2015-10-01,21542
2015-10-02,21572
2015-10-03,21342
...
...
2018-03-31,20942
I have another data frame called 'periodvalues' like so :-
startdate,enddate, periodtext
2015-10-01,2016-03-31,1510_1603
2016-10-01,2017-03-31,1610_1703
2017-10-01,2018-03-31,1710_1803
I want to be able to create 3 xts objects like so :-
usagexts_1510_1603 -> xts object containing usage details for relevant period
usagexts_1610_1703 -> xts object containing usage details for relevant period
usagexts_1710_1803 -> xts object containing usage details for relevant period
I only got as far as creating a list of size 3 containing the periodtext from the above data frame. I was trying to somehow specify the start and end period for the xts object using the "objectname fromdate/todate" structure through variables but it didn't work - something like so :
usagexts_1610_1703 <- usagexts[var1/var2]
The LHS came from the list and the variables on the RHS came from variable definitions made beforehand.
Expected results should be like so :
usagexts_1510_1603 <- usagexts["2015-10-01/2016-03-31"]
usagexts_1610_1703 <- usagexts["2016-10-01/2017-03-31"]
usagexts_1710_1803 <- usagexts["2017-10-01/2018-03-31"]
Any assistance on that shall be highly valued.
Best regards
Deepak

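usagexts[var1/var2] does not work because R evaluates var1/var2 as a division before xts ever sees it; xts needs a single "from/to" string. If var1 and var2 are variables holding the start and end dates, the filter string can be built using paste: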
usagexts[paste(var1, var2, sep="/")]
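To produce all three subsets without hardcoding dates, the same paste idea can be applied to every row of periodvalues. The following is a minimal sketch, not a definitive solution: the usagexts data is a made-up stand-in, and collecting the results in a named list (rather than creating separate variables) is a design choice of this example.

```r
library(xts)

# Hypothetical stand-in for the question's data: one value per day
dates <- seq(as.Date("2015-10-01"), as.Date("2018-03-31"), by = "day")
usagexts <- xts(seq_along(dates), order.by = dates)

# Same layout as the 'periodvalues' data frame in the question
periodvalues <- data.frame(
  startdate  = c("2015-10-01", "2016-10-01", "2017-10-01"),
  enddate    = c("2016-03-31", "2017-03-31", "2018-03-31"),
  periodtext = c("1510_1603", "1610_1703", "1710_1803"),
  stringsAsFactors = FALSE
)

# Build one "from/to" range string per row, then subset with each string
ranges <- paste(periodvalues$startdate, periodvalues$enddate, sep = "/")
subsets <- lapply(ranges, function(r) usagexts[r])
names(subsets) <- paste0("usagexts_", periodvalues$periodtext)
```

Each period is then available as, for example, subsets[["usagexts_1610_1703"]]. A named list keeps the results together and avoids generating variable names dynamically.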

Error: object not found in R. Headers not naming from .csv file

I am new to R and I keep getting inconsistent results with trying to display a column of data from a csv. I am able to import the csv into R without issue, but I can't call out the individual columns.
Here's my code:
setwd('mypath')
cdata <- read.csv(file="cendata.csv",header=TRUE, sep=",")
cdata
This prints out the following:
year pop
1 2010 2,775,332
2 2011 2,814,384
3 2012 2,853,375
4 2013 2,897,640
5 2014 2,936,879
6 2015 2,981,835
7 2016 3,041,868
8 2017 3,101,042
9 2018 3,153,550
10 2019 3,205,958
When I try to plot the following, the columns cannot be found.
plot(pop,year)
Error: object 'pop' not found
I even checked if the column names existed, and only data shows up.
ls()
[1] "data"
I can manually enter the data and label them "pop" and "year" but that kind of defeats the point of importing the csv.
Is there a way to label each header as an object?
year and pop are not independent objects; they are columns of the data frame you imported, so you need to refer to them through it. You may also need to remove the "," from the numbers to turn them into numeric values before plotting. Try:
cdata$pop <- as.numeric(gsub(',', '', cdata$pop))
plot(cdata$year, cdata$pop)
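A self-contained sketch of the same fix follows. It assumes the pop values are quoted in the CSV (otherwise the embedded commas would split the fields); the inline csv_text is a hypothetical stand-in for cendata.csv.

```r
# Hypothetical two-row stand-in for cendata.csv
csv_text <- 'year,pop
2010,"2,775,332"
2011,"2,814,384"'

cdata <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# pop is read as text because of the thousands separators
cdata$pop <- as.numeric(gsub(",", "", cdata$pop))

# with() evaluates the expression inside the data frame,
# so the columns can be named directly
with(cdata, plot(year, pop))
```

with() gives the "use the column names directly" feel the question asks for without copying the columns into separate objects.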

r how to convert possible character variables

I am new to R and reading 19 variables (Import_1 to Import_19) from a CSV file x <- (as.data.frame(final_data[,c(15:33)]))
When I summarized one variable I got following display (possibly character variable)
Import_1
EXTREMELY IMPORTANT-10:177
09 :176
08 : 89
07 : 45
06 : 15
05 : 6
04 : 3
03 : 3
02 : 3
NOT AT ALL IMPORTANT-01 : 2
Now I need to convert these 19 variables into numeric 1-10 values, so that I can do regression. Let me know how can I do that.
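You can convert variables to numeric using the functions as.numeric, as.double and as.integer. Note that these return NA for labels that contain text, such as "EXTREMELY IMPORTANT-10", and that calling them directly on a factor returns the internal level codes rather than the values, so convert to character and strip the text first. See this Introduction to R DataTypes to get you started.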
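As a rough illustration of that conversion, the sketch below strips everything except digits before converting. The helper name to_score and the sample values are made up for this example; it assumes every label contains exactly one number, as in the summary shown in the question.

```r
# Hypothetical sample shaped like the Import_1 column in the question
import_1 <- factor(c("EXTREMELY IMPORTANT-10", "09", "08", "NOT AT ALL IMPORTANT-01"))

# Keep only the digits of each label, then convert to integer
to_score <- function(x) as.integer(gsub("[^0-9]", "", as.character(x)))

to_score(import_1)

# Apply the same conversion to every column of a data frame
x <- data.frame(Import_1 = import_1,
                Import_2 = factor(c("07", "06", "05", "04")))
x[] <- lapply(x, to_score)
```

Assigning to x[] keeps x a data frame while replacing each column with its numeric version.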

Monthly operations time series with apply.monthly in R

The problem is to use apply.monthly or any other similar function to do monthly operations with a dataset. The data I have looks like the following:
> minidata[1:10,]
date Month Year TMIN
1 1948-01-01 Jan 1948 1.1
2 1948-01-02 Jan 1948 7.2
3 1948-01-03 Jan 1948 5.0
4 1948-01-04 Jan 1948 9.4
5 1948-01-05 Jan 1948 4.4
> tail(minidata)
date Month Year TMIN
54 1948-02-23 Feb 1948 2.8
55 1948-02-24 Feb 1948 -0.6
56 1948-02-25 Feb 1948 1.7
57 1948-02-26 Feb 1948 2.8
58 1948-02-27 Feb 1948 4.4
59 1948-02-28 Feb 1948 3.3
Task: use my own function to produce the monthly mean:
mymean <- function(date){
  for (j in 1:days_in_month(date)){
    avg = (1/days_in_month(date)) *
      sum(minidata$TMIN[1:days_in_month(date)])
  }
  return(avg)
}
The result must be the same as the R function in the xts package:
dat.xts <- xts(x= minidata$TMIN,order.by = minidata$date)
> apply.monthly(dat.xts,mean)
[,1]
1948-01-31 2.312903
1948-02-28 2.082143
My function outputs the correct values:
> mymean(minidata$date[1])
Jan
2.312903
> mymean(dat.xts[1])
Jan
2.312903
I wouldn't mind if apply.monthly generated a new column with the means, but I have to use my own function! (This is an example, in reality my function is a lot harder).
I tried:
> apply.monthly(dat.xts,function(dat.xts) mymean(dat.xts))
Error in coredata.xts(x) : currently unsupported data type
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Thanks!
Update: days_in_month can be found in the lubridate package. It calculates the number of days in a given month
Your function is the issue, not apply.monthly. I don't know where the days_in_month function is defined, but it probably doesn't work with xts objects. I assume it expects a date-time class.
And your mymean function references an object that isn't passed to it, which is not good practice because it makes R search for minidata.
Your function should expect an xts object containing a month of data and operate only on that data, not some object outside the function scope. For example:
mymean <- function(Data) {
days <- days_in_month(index(Data)[1])
avg <- (1/days) * sum(Data$Close)
return(avg)
}
require(xts)
data(sample_matrix)
x <- as.xts(sample_matrix)
apply.monthly(x, mymean)
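Adapted to a single-column series like the one in the question, the same pattern might look like the sketch below. The temperature values are randomly generated stand-ins, and days_in_month is taken from lubridate, as mentioned in the question's update.

```r
library(xts)

# Hypothetical daily series covering two complete months
dates <- seq(as.Date("1948-01-01"), as.Date("1948-02-29"), by = "day")
set.seed(1)
dat.xts <- xts(round(rnorm(length(dates), mean = 3, sd = 3), 1), order.by = dates)

# Receives one month of data as an xts object and uses only that data
mymean <- function(Data) {
  days <- lubridate::days_in_month(index(Data)[1])
  as.numeric(sum(coredata(Data)) / days)
}

res <- apply.monthly(dat.xts, mymean)
```

Because both months are complete, res matches apply.monthly(dat.xts, mean); for a partial month the two would differ, since this function divides by the calendar length of the month rather than by the number of observations.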
To perform operations within groups of a data frame, you can use the dplyr package. For instance, to get the average TMIN within each group:
library(dplyr)
summarize(group_by(minidata, Month), mean = mean(TMIN))
This is often written as:
minidata %>% group_by(Month) %>%
summarize(mean = mean(TMIN))
Your function works only on data frames; an xts object is different and will not work the way you want, which is why you are getting errors.
Besides that, you do not want to do this with a loop. It is going to take much longer than many other ways of doing it.
David's answer (use dplyr::group_by and dplyr::summarize) is the best way to handle this. You can use a custom function in the summarize if that is what the problem is. Just define your function and use it there.
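A custom function inside summarize could look like the following sketch. The function my_stat and the four-row data frame are hypothetical stand-ins for the real function and data; grouping by year as well as month is an assumption that only matters once the data spans several years.

```r
library(dplyr)

# Hypothetical miniature version of 'minidata'
minidata <- data.frame(
  date = as.Date(c("1948-01-01", "1948-01-02", "1948-02-01", "1948-02-02")),
  TMIN = c(1.1, 7.2, 2.8, -0.6)
)
minidata$Year  <- format(minidata$date, "%Y")
minidata$Month <- format(minidata$date, "%m")

# Stand-in for the user's own, more complicated function
my_stat <- function(x) sum(x) / length(x)

result <- minidata %>%
  group_by(Year, Month) %>%
  summarize(stat = my_stat(TMIN), .groups = "drop")
```

Any function that takes a vector and returns a single value can be used in place of my_stat.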

Having trouble with R's time series objects

I have a column of 84 monthly expenditures from 1/2004 - 12/2010, which in Excel looks like...
12247815.55
11812697.14
13741176.13
21372260.37
27412419.28
42447077.96
55563235.3
45130678.8
54579583.53
43406197.32
34318334.64
25321371.4
...(74 more entries)
I am trying to run an stl() from the forecast package on this series, and so I load the data:
d <- ts(read.csv("deseason_vVectForTS.csv",
header = TRUE),
start=c(2004,1),
end=c(2010,12),
frequency = 12)
(If I do header=FALSE it will absorb the first entry - 122...- as the header for the second column, and name the first column's header 'X')
But instead of my environment being populated with a Time Series Object from 2004 to 2011 (as it has said before) it simply says ts[1:84, 1].
Probably related is the fact that,
fit <- stl(d)
throws
Error in stl(d) : only univariate series are allowed.
despite the fact that
head(d)
[1] 12247816 11812697 13741176 21372260 27412419 42447078
and
d
Jan Feb Mar Apr May Jun Jul Aug Sep Oct
2004 12247816 11812697 13741176 21372260 27412419 42447078 55563235 45130679 54579584 43406197
("years 2005-2010 look exactly the same, and all rows have columns for Jan-Dec; it just doesn't fit on here neatly - just trying to show the object has taken the ts labeling structure.")
What am I doing wrong? As far as I know this is the same way I have been building my time series objects in the past...
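read.csv returns a data frame. Even if it has only one column it is still a data frame, and ts() turns it into a one-column matrix (hence ts[1:84, 1]), which stl() rejects. To pass a vector instead, extract the column: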
d <- ts(read.csv("deseason_vVectForTS.csv",
header = TRUE)[,1],
start=c(2004,1),
end=c(2010,12),
frequency = 12)
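The difference can be checked with a small self-contained sketch. The file and its values are generated here as a stand-in for deseason_vVectForTS.csv. Note that stl() also requires an s.window argument; "periodic" is used below only as an example choice.

```r
# Hypothetical one-column CSV with 84 monthly values
csv_path <- tempfile(fileext = ".csv")
set.seed(42)
values <- 3e7 + 2e7 * sin(2 * pi * (1:84) / 12) + rnorm(84, sd = 1e6)
write.csv(data.frame(expenditure = values), csv_path, row.names = FALSE)

# Extracting the column with [, 1] yields a univariate series
d <- ts(read.csv(csv_path, header = TRUE)[, 1],
        start = c(2004, 1), frequency = 12)
is.matrix(d)   # FALSE, so stl() accepts it

fit <- stl(d, s.window = "periodic")
```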
Also, please check your facts. stl is in the stats package, not the forecast package. This is easily checked by using help(stl).

Bootstrapping: Error in statistic(data, original, ...) : unused argument(s) (original)

I have a database of position estimates, and want to calculate monthly kernel utilization distributions. I can do this using the adehabitat package in R, but I would like to estimate 95% confidence intervals for these values using bootstrapping that samples from the database.
Today I've been experimenting with the boot package, but I am still fairly new to R and am needing some more expert help!
The main error message I'm getting is:
Error in statistic(data, original, ...) : unused argument(s) (original)
Here is a look at the file I've been using:
head(all)
Num Hourbin COA_Lat COA_Lon POINT_X POINT_Y month year id
1 07/10/2010 15:00 48.56225 -53.89144 729339.9 5383461 October 2010 29912
2 07/10/2010 16:00 48.56254 -53.89121 729355.7 5383495 October 2010 29912
4 07/10/2010 18:00 48.56225 -53.89144 729339.7 5383461 October 2010 29912
5 07/10/2010 19:00 48.56225 -53.89144 729339.9 5383461 October 2010 29912
6 07/10/2010 20:00 48.56225 -53.89144 729339.8 5383461 October 2010 29912
7 07/10/2010 21:00 48.56225 -53.89144 729339.9 5383461 October 2010 29912
With columns 5 and 6 being the X and Y positions respectively. I subset this dataset for different months (ie getting files named "oct","nov",etc). I have tried setting up the kernelUD function in the adehabitat package to be a function that I can call up for bootstrapping, but have had no luck so far.
kUDoct<-function(i) kernel.area(oct[,5:6],oct[,10],kern="bivnorm",unin=c("m"),unout=c("km2"))
bootoct<-boot(oct,kUDoct,R=1000)
Error in statistic(data, original, ...) : unused argument(s) (original)
Any help would be greatly appreciated!
M
Well, one problem is that you aren't using the boot function as the documentation directs you to. From ?boot we see that the second argument, statistic, is:
A function which when applied to data returns a vector containing the
statistic(s) of interest. When sim = "parametric", the first argument
to statistic must be the data. For each replicate a simulated dataset
returned by ran.gen will be passed. In all other cases statistic must
take at least two arguments. The first argument passed will always be
the original data. The second will be a vector of indices, frequencies
or weights which define the bootstrap sample.
Note that this means your function should be defined to take at least two arguments. Yours accepts only one (and then ignores it completely, oddly enough).
The idea is that you pass in your original data and a vector of indices. Then you calculate your statistic of interest by subsetting your original data using those indices, which will constitute a "bootstrap sample".
So instead of this:
kUDoct<-function(i) kernel.area(oct[,5:6],oct[,10],kern="bivnorm",unin=c("m"),unout=c("km2"))
bootoct<-boot(oct,kUDoct,R=1000)
You'd probably want to do something more like this:
kUDoct<-function(dat,ind) kernel.area(dat[ind,5:6],dat[ind,10],kern="bivnorm",unin=c("m"),unout=c("km2"))
bootoct<-boot(oct,kUDoct,R=1000)
But I can't diagnose any other errors you may get, as your example isn't entirely reproducible.
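The two-argument pattern can be tried on a reproducible example first. The sketch below deliberately swaps in a much simpler statistic (the mean X and Y position) for kernel.area, and uses randomly generated positions, so it only illustrates how boot passes the data and the indices; it is not the kernel utilization calculation itself.

```r
library(boot)

# Hypothetical position data with the same column names as the question
set.seed(123)
pos <- data.frame(POINT_X = rnorm(50, mean = 729340, sd = 10),
                  POINT_Y = rnorm(50, mean = 5383460, sd = 15))

# First argument: the original data; second: indices of the bootstrap sample
mean_xy <- function(dat, ind) colMeans(dat[ind, c("POINT_X", "POINT_Y")])

b <- boot(pos, mean_xy, R = 200)

# Percentile 95% interval for the first statistic (mean POINT_X)
ci <- boot.ci(b, type = "perc", index = 1)
```

b$t0 holds the statistic on the original data and b$t holds one row per bootstrap replicate; the index argument of boot.ci selects which element of the statistic to build the interval for.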
