plot same data two different ways, get different results (lattice xyplot) - r

I am trying to produce a scatter plot of some data. I do so in two different ways, as shown in the code below (most of the code just arranges the data; the only graphing part is at the bottom). One method refers directly to the variables in the workspace; the other first merges the data into an xts object and then refers to the columns by index.
The resulting scatter plots are different, even though I have checked that the source data is the same in both cases.
Why are these plots different? Thanks in advance.
# Get data
# =============
library('quantmod')
# Set monthly time interval
StartPeriod = paste0("1980-01")
EndPeriod = paste0("2014-07")
DateString = paste0(StartPeriod,"/", EndPeriod)
# CPI (monthly)
getSymbols("CPIAUCSL", src="FRED")
# QoQ growth, Annualized
CPIAUCSL = ((CPIAUCSL/lag(CPIAUCSL))^4-1)*100
CPIAUCSL = CPIAUCSL[DateString]
# Oil prices (monthly)
getSymbols(c("MCOILWTICO"), src="FRED")
# QoQ growth, annualized
MCOILWTICO = ((MCOILWTICO/lag(MCOILWTICO))^4-1)*100
MCOILWTICO = MCOILWTICO[DateString]
# Produce plots
# ===============
library('lattice')
# Method 1, direct reference
xyplot(CPIAUCSL~lag(MCOILWTICO,1), ylim=c(-5,6),
ylab="CPI",
xlab="Oil Price, 1 month lag",
main="Method 1: Inflation vs. Lagged Oil Price",
grid=TRUE)
# Method 2, refer to column indices of xts object
basket = merge(CPIAUCSL, MCOILWTICO)
xyplot(basket[ ,1] ~ lag(basket[ ,2],1), ylim=c(-5, 6),
ylab="CPI",
xlab="Oil Price, 1 month lag",
main="Method 2: Inflation vs. Lagged Oil Price",
grid=TRUE)
# Double check data fed into plots is the same
View(merge(CPIAUCSL, lag(MCOILWTICO,1)))
View(merge(basket[ ,1], lag(basket[ ,2],1))) # yes, matches

Method 1 is incorrect: it pairs points that are 6 years apart. For instance, CPIAUCSL[3] is the observation for 1980-03-01, while lag(MCOILWTICO,1)[3] corresponds to 1986-03-01, yet the scatterplot pairs them, because the values are matched by position rather than by date. In contrast, basket[ ,1][3] and basket[ ,2][3] both belong to 1980-03-01.
(Your double check did not reveal the problem because it uses merge, which aligns the two series by date; Method 1 does not.)
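The difference between pairing by position and pairing by date can be sketched with base R alone. This is only an illustration under assumptions: the values are synthetic, plain data frames stand in for the xts objects, and the names cpi, oil, positional and aligned are made up for the example.

```r
# Two synthetic monthly series that start at different dates
cpi <- data.frame(date = seq(as.Date("1980-01-01"), by = "month", length.out = 6),
                  cpi  = c(1.1, 1.2, 1.3, 1.4, 1.5, 1.6))
oil <- data.frame(date = seq(as.Date("1980-03-01"), by = "month", length.out = 4),
                  oil  = c(10, 20, 30, 40))

# Pairing by position (what happens when the time index is ignored):
# the i-th CPI value meets the i-th oil value, whatever their dates are
n <- min(nrow(cpi), nrow(oil))
positional <- data.frame(cpi_date = cpi$date[seq_len(n)],
                         oil_date = oil$date[seq_len(n)])

# Pairing by date: merge() aligns rows on the shared date column
# and fills the months without oil data with NA
aligned <- merge(cpi, oil, by = "date", all = TRUE)

print(positional)  # the dates in each row differ by two months
print(aligned)     # one row per date, oil is NA for the first two months
```

In the positional table every row combines observations from different months, which is the same kind of mismatch described above for Method 1.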

Related

Traminer: Mean time barplot with number of observations

Because I am still new to TraMineR, my problem may seem trivial to most of you. I'm working on mean time plots with my data and would like to display on the bars the mean time spent in the different states. Is there a command for this in TraMineR?
The option to add bar labels on the mean time plot was implemented in TraMineR v 2.2-3. It is available through the arguments bar.labels, cex.barlab, and offset.barlab of the plot method for the outcome of seqmeant. These arguments can be passed as ... arguments to seqmtplot. In this latter case, when groups are specified, bar.labels should be a matrix with the labels for each group in columns.
I show, using the actcal data, how to display the mean times over the bars. The group here is sex, but it can of course be your clusters.
library(TraMineR)
data(actcal)
## We use only a sample of 300 cases
set.seed(1)
actcal <- actcal[sample(nrow(actcal),300),]
actcal.lab <- c("> 37 hours", "19-36 hours", "1-18 hours", "no work")
actcal.seq <- seqdef(actcal,13:24,labels=actcal.lab)
group <- factor(actcal$sex)
blab <- NULL
for (i in 1:length(levels(group))){
blab <- cbind(blab,seqmeant(actcal.seq[group==levels(group)[i],]))
}
seqmtplot(actcal.seq, group=group,
bar.labels = round(blab,digits=2), cex.barlab=1.2)

Black Scholes surface vs Time to Maturity and Strike Price in R

I've just discovered the "wireframe" function in R for plotting 3D surface graphs.
I wish to use it to plot the Black & Scholes call option price against two sequences of data: time to maturity and strike price. Here is my script so far:
S=100 #My stock Price
K=90 #Initial Strike Price
T=1 #Initial Time to Maturity (1 year here)
RF=0.03 #Risk free rate
SIGMA=0.2 #Volatility
d1=(log(S/K) + (RF + 0.5*SIGMA^2)*T)/SIGMA*sqrt(T) #initial d(1)
d2=d1-SIGMA*sqrt(T) #initial d(2)
Then I tried to prepare a grid for my surface/3D plot:
K=seq(80,120,1)
T=seq(0,1,0.1)
table=expand.grid(K,T)
Last step, I add a new column variable for computing my Call price according to every single combination:
table$CALL= S*pnorm(d1) - K*exp(-RF*T)*pnorm(d2)
names(table)= c("K","T","CALL")
Finally the surface/3D plot:
wireframe(CALL ~ K * T, scales = list( arrows = FALSE),aspect = c(1, .6),data=table,
drape=T,shade=T)
It plots an apparently plausible graph (according to my finance studies), but it looks a bit like a staircase ("scale-step") graph. As I'm a newbie with the "wireframe" function, I don't know whether I used all the input data properly. I'd love an opinion from someone who has already plotted the B&S formula in a 3D plot. I'm interested because I'd like to do the same for the Greeks and implied volatility in the future.
Thanks in advance
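A few things in the script above may explain the stepped look. d1 and d2 are computed once, from the scalar K=90 and T=1, before the grid exists. The CALL column is then built from the standalone vectors K and T, which are recycled rather than matched to the rows of the grid. The d1 line also divides by SIGMA and then multiplies by sqrt(T), because of operator precedence. Finally, T has been reassigned to a numeric vector, so drape=T and shade=T no longer pass TRUE. The following is only a sketch of one way to evaluate the formula row by row; the names bs_call, tau and grid are made up for the example, and starting the maturities at 0.1 instead of 0 is an assumption made to avoid dividing by zero.

```r
# Black-Scholes call price, vectorized over strike K and time to maturity tau
bs_call <- function(S, K, tau, r, sigma) {
  d1 <- (log(S / K) + (r + 0.5 * sigma^2) * tau) / (sigma * sqrt(tau))
  d2 <- d1 - sigma * sqrt(tau)
  S * pnorm(d1) - K * exp(-r * tau) * pnorm(d2)
}

# Every combination of strike and maturity (tau = 0 left out: assumption)
grid <- expand.grid(K = seq(80, 120, 1), T = seq(0.1, 1, 0.1))

# Price each row from that row's own K and T
grid$CALL <- bs_call(S = 100, K = grid$K, tau = grid$T, r = 0.03, sigma = 0.2)

head(grid)
```

With lattice loaded, the surface could then be drawn with wireframe(CALL ~ K * T, data = grid, drape = TRUE, scales = list(arrows = FALSE)), spelling out TRUE so that the reassigned T cannot interfere.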

X axis in DateTime format in R script/plot

I am trying to build a forecast plot in R, but despite trying many solutions I am unable to show dates on the X axis.
My data is in the form:
Datetime(MM/DD/YYY) ConsumedSpace
01-01-2015 2488
02-01-2015 7484
03-01-2015 4747
Below is the forecast script I am using:
library(forecast)
library(calibrate)
# group searches by date
dataset <- aggregate(ConsumedSpace ~ Date, data = dataset, FUN= sum)
# create a time series based on day of week
ts <- ts(dataset$ConsumedSpace, frequency=6)
# pull out the seasonal, trend, and irregular components from the time series (train the forecast model)
decom <- stl(ts, s.window = "periodic")
#predict the next 7 days of searches
Pred <- forecast(decom)
# plot the forecast model
plot(Pred)
#text(Pred,ts ,labels = dataset$ConsumedSpace)
In the output, the X axis shows period numbers rather than dates.
Any help is highly appreciated.
Try passing the dates explicitly to your plot: plot(x=Date, ...)
If that does not work, try:
timeline<-seq(from=your.first.date, to=your.last.date, by="week")
plot(x=...,y=..., xlab=NA, xaxt="n") # no x axis
axis.Date(1, at=(timeline), format=F, labels=TRUE) # Special axis
Edit :
Sorry for my first solution, which does not fit your time series. The problem is that a ts object holds no dates, only an index defined by "start" and "frequency". Here, your problem comes from your use of "frequency", which is supposed to specify the number of observations per unit of time, i.e. 4 for quarterly data, 12 for monthly data, and so on. Here your unit of time is the week, with 6 open days, which is why the axis of your graph shows the index of the weeks. For a more readable axis you can try this:
dmin<-as.Date("2015-01-01") # Starting date
# Dummy data
ConsumedSpace=rep(c(5488, 7484, 4747, 4900, 4747, 6548, 6548, 7400, 6300, 8484, 5161, 6161),2)
ts<-ts(ConsumedSpace, frequency=6)
decom <- stl(ts, s.window = "periodic")
Pred <- forecast(decom)
plot(Pred, xlab=NA, xaxt="n") # Plot with no axis
ticks<-seq(from=dmin, to= dmin+(length(time(Pred))-1)*7, by = 7) # Tick sequence: one date label per week
axis(1, at=time(Pred), labels=ticks) # axis with week labels at the week indices
You have to use an interval of 7 days for the week labels because of the closed day.
It's ugly but it works. There is surely a better way: look closely at your ts() call to specify that the data are daily, and adapt your forecasting function accordingly.
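The mapping from the week index of a ts object to calendar dates can be sketched without the forecast package. This is an illustration under assumptions: the data are dummy values, the series is assumed to start on 2015-01-01, and the names x, weeks and week_dates are made up for the example.

```r
dmin <- as.Date("2015-01-01")  # assumed first day of the series

# Dummy data: 4 weeks of 6 open days each
x <- ts(rep(c(5488, 7484, 4747, 4900, 4747, 6548), 4), frequency = 6)

# With frequency = 6 the time index counts weeks: 1, 2, 3, ...
weeks <- seq(start(x)[1], end(x)[1])

# Week k starts 7 calendar days after week k - 1
week_dates <- dmin + (weeks - 1) * 7

plot(x, xaxt = "n", xlab = "")                         # plot without the default x axis
axis(1, at = weeks, labels = format(week_dates, "%d-%m-%Y"))  # one date label per week
```

Placing one label at each whole week index avoids having to match a label to every single observation.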

How to apply splom() function in order to create multiple correlation pairwise plots?

I have already asked a similar question on how to create a figure that combines several pairwise correlation plots.
It was suggested that I use the splom() function, but I do not know how to apply it to my data. I have seen examples of splom(), but due to my limited programming skills I am not able to apply it.
I have 24 time series, belonging to 4 independent groups (4 pairwise correlation plots).
4 Groups:
1) Frequency = 1 Min. , with belonging time series: AAPL_1m, MSFT_1m, INTC_1m, FB_1m, MU_1m, IBM_1m.
2) Frequency = 2 Min. , with belonging time series: AAPL_2m, MSFT_2m, INTC_2m, FB_2m, MU_2m, IBM_2m.
3) Frequency = 5 Min. , with belonging time series: AAPL_5m, MSFT_5m, INTC_5m, FB_5m, MU_5m, IBM_5m.
4) Frequency = 10 Min. , with belonging time series: AAPL_10m, MSFT_10m, INTC_10m, FB_10m, MU_10m, IBM_10m.
Each pairwise plot should show the correlation between the time series in each group.
To create each individual pairwise plot I used the following calls:
pairs(cbind(AAPL_1m, MSFT_1m, INTC_1m, FB_1m, MU_1m, IBM_1m),main="Frequency=1 Min.",font.labels = 2, col="blue",pch=16, cex=0.8, cex.axis=1.5,las=1)
pairs(cbind(AAPL_2m, MSFT_2m, INTC_2m, FB_2m, MU_2m, IBM_2m),main="Frequency = 2 Min.",font.labels = 2, col="blue",pch=16, cex=0.8, cex.axis=1.5,las=1)
pairs(cbind(AAPL_5m, MSFT_5m, INTC_5m, FB_5m, MU_5m, IBM_5m),main="Frequency = 5 Min.",font.labels = 2, col="blue",pch=16, cex=0.8, cex.axis=1.5,las=1)
pairs(cbind(AAPL_10m, MSFT_10m, INTC_10m, FB_10m, MU_10m, IBM_10m),main="Frequency = 10 Min.",font.labels = 2, col="blue",pch=16, cex=0.8, cex.axis=1.5,las=1)
If anyone could suggest how to apply the splom() function to create the figure described above, it would be greatly appreciated.
Also, if there is another, more suitable function that can combine four individual pairwise plots (pairs()) in one single figure, I am eager to use it.
Some demo data would have been nice to have, but let's generate some first, here for just three variables:
AAPL_1m<-rnorm(1000)
MSFT_1m<-rnorm(1000)
INTC_1m<-rnorm(1000)
AAPL_2m<-rnorm(1000)
MSFT_2m<-rnorm(1000)
INTC_2m<-rnorm(1000)
For splom() to work you need to generate a grouping variable. Here there are 1000 observations from the 1m group and another 1000 observations from the 2m group, so the grouping variable is simply a vector of 1000 "1m" values followed by 1000 "2m" values:
group<-c(rep("1m", 1000), rep("2m", 1000))
In your case the grouping variable might be generated as follows:
group<-c(rep("1m", length(AAPL_1m)), rep("2m", length(AAPL_2m)))
Once you have the grouping variable, you might want to bind everything into a single data frame as follows:
dat<-data.frame(AAPL=c(AAPL_1m, AAPL_2m), MSFT=c(MSFT_1m, MSFT_2m), INTC=c(INTC_1m, INTC_2m), group=group)
Once you have a single data frame with the grouping variable giving the groups of observations, you can plot the scatterplot matrices:
library(lattice)
# Three first columns of the data plotted conditional on the grouping
splom(~dat[,1:3]|group)
This produces one scatterplot matrix per group.
It would need to be generalized to your four batches of data, but that should be straightforward (just generate the grouping for four batches and bind all four batches together). The splom() function also has many more arguments that you can use, e.g., to make the plot prettier.
JTT gave an accurate explanation of how splom() should be applied to this problem. The following code extends JTT's code to the full problem.
group<-c(rep("Frequency = 1 Min.", length(AAPL_1m)),
rep("Frequency = 2 Min.", length(AAPL_2m)),
rep("Frequency = 5 Min.", length(AAPL_5m)),
rep("Frequency = 10 Min.", length(AAPL_10m)))
dat<-data.frame(AAPL=c(AAPL_1m, AAPL_2m, AAPL_5m, AAPL_10m),
MSFT=c(MSFT_1m, MSFT_2m, MSFT_5m, MSFT_10m),
INTC=c(INTC_1m, INTC_2m, INTC_5m, INTC_10m),
FB=c(FB_1m, FB_2m, FB_5m, FB_10m),
MU=c(MU_1m, MU_2m, MU_5m, MU_10m),
IBM=c(IBM_1m, IBM_2m, IBM_5m, IBM_10m),
group=group)
splom(~dat[,1:6]|group)
The code produces one scatterplot matrix per frequency group.
Still, some improvements are needed:
the x and y axes and labels should be placed outside (as shown in the question)
the order of the pairwise plots should be changed (left top corner should be "Frequency = 1", right top corner should be "Frequency = 1"...)
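One possible way to control the panel order, offered as a sketch rather than as part of the answers above: lattice orders panels by the levels of the conditioning factor, and the as.table argument makes it fill the layout from the top-left corner. The data here are synthetic, only two groups and three series are used, and the names dat, freq_levels and p are made up for the example.

```r
library(lattice)

set.seed(1)
freq_levels <- c("Frequency = 1 Min.", "Frequency = 2 Min.")

# Synthetic stand-in for the stacked series: 100 rows per group
dat <- data.frame(AAPL = rnorm(200), MSFT = rnorm(200), INTC = rnorm(200),
                  group = rep(freq_levels, each = 100))

# An explicit level order fixes the panel order (the default is alphabetical)
dat$group <- factor(dat$group, levels = freq_levels)

# as.table = TRUE lays panels out left to right, top to bottom
p <- splom(~ dat[, 1:3] | group, data = dat, as.table = TRUE)
print(p)
```

With four groups, listing the four labels in freq_levels in the wanted order should place "Frequency = 1 Min." in the top-left panel.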

1-D conditional slice from a 2-D probability density function in R using np package

Consider the example included in the np package for R
(page 21 of the package vignette).
npcdens returns a conditional density object and can plot the 2-D pdf and 2-D cdf, as shown. Can I somehow extract the 1-D information (pdf / cdf) from the object, for example as a vector, if I fix one of the two variables? I am new to R and was not able to work out the format of the object.
Thanks for the help.
-Egon.
Here is the code as requested:
require(np)
data("Italy")
attach(Italy)
bw <- npcdensbw(formula=gdp~ordered(year), tol=.1, ftol=.1)
fhat <- npcdens(bws=bw)
summary(fhat)
npplot(bws=bw)
npplot(bws=bw, cdf=TRUE)
detach(Italy)
The fhat object contains all the needed info plus a whole lot more. To see what all is in there, do a str( fhat ) to see the structure.
I believe the values you are interested in are xeval, yeval, and condens (PDF density).
There are lots of ways to get at the values but I tend to like data frames. I'd pop the three vectors in a single data frame:
denDf <- cbind( year=as.character( fhat$xeval[,1] ), fhat$yeval, fhat$condens )
## had to do a dance around the year variable because it's a factor
Then I'd select the values I want with subset():
subset( denDf, year==1951 & gdp > 8 & gdp < 8.2)
Since gdp is a floating point value, it is very hard to select with the == operator, hence the range.
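The floating point issue can be reproduced in base R with a toy data frame. This is a minimal sketch; the values and the names df, exact and near are made up for the example and are unrelated to the Italy data.

```r
# 0.1 + 0.2 is not stored as exactly 0.3 in binary floating point
df <- data.frame(year = c(1951, 1951), gdp = c(0.1 + 0.2, 0.5))

exact <- subset(df, gdp == 0.3)              # exact comparison finds no row
near  <- subset(df, abs(gdp - 0.3) < 1e-8)   # tolerance-based comparison finds it

nrow(exact)
nrow(near)
```

Selecting with a small tolerance, or with a range as in the subset() call above, avoids depending on an exact binary representation.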
The method suggested by JD Long will only extract the density for data points in the existing training set. If you want the density at other points (conditioning or conditional variables) you will need to use the predict() function. The following code extracts and plots the 1-D density distribution conditioned on year == 1999, a value not contained in the original data set.
First construct a data frame with the same components as the Italy data set, with gdp regularly spaced and with "1999" as an ordered factor.
yr1999<- rep("1999", 100)
gdpVals <-seq(1,35, length.out=100)
nD1999 <- data.frame(year = ordered(yr1999), gdp = gdpVals)
Next use the predict function to extract the densities.
gdpDens1999 <-predict(fhat,newdata = nD1999)
The following code plots the density.
plot(gdpVals, gdpDens1999, type='l', col='red', xlab='gdp', ylab = 'p(gdp|yr = 1999)')
