rCharts - interactive matrix plot - r

I have a horizontally oriented matrix, with time along x and the stocks' returns along y. I'd like to plot it with rCharts to make it interactive, but I can't find how to do it anywhere...
The matrix looks like this:
# 5 stocks x 10 time points = 50 values
matTest <- as.data.frame(matrix(rnorm(50, 0, 1), nrow = 5, ncol = 10))
colnames(matTest) <- c('t0','t1','t2','t3','t4','t5','t6','t7','t8','t9')
rownames(matTest) <- c('stock1','stock2','stock3', 'stock4','stock5')
Do you know how I can do that?
Thank you very much

If you need an interactive table, you can use this code on your original data.
library(DT)
datatable(matTest, options = list(pageLength = 9))
If you want an interactive time-series plot, first convert your data to long format (one row per stock and time point):
# build the columns directly so time_series stays numeric (cbind() would coerce everything to character)
df <- data.frame(time_series = as.vector(t(matTest)),
                 time = rep(0:(ncol(matTest) - 1), times = nrow(matTest)),
                 stock = rep(rownames(matTest), each = ncol(matTest)))
df
time_series time stock
1 -0.813688587253615 0 1
2 -0.457763419325742 1 1
3 0.0756429812511287 2 1
4 2.18700453503453 3 1
5 1.00659661717065 4 1
6 -2.16436341755656 5 1
7 -0.0829999360152501 6 1
8 -0.491237208736282 7 1
9 0.351591891565934 8 1
10 0.138073915553248 9 1
11 0.276431050047784 0 2
12 -0.88208290628419 1 2
13 0.421498167781597 2 2
...
Now use rCharts to plot your time series:
library("rCharts")
xPlot(time_series~time, group="stock",data=df,type="line-dotted")
From there you can adjust the plot's parameters to get the look you want.

Related

In R, how do I plot a Weibull probability density function for right censored data using package fitdistrplus?

I am trying to overlay a Weibull probability density function (PDF) for right-censored data on a histogram of the data using package fitdistrplus, but have been unable to do so.
df is a data.frame containing a dummy data set in the format of my larger data set.
> library(fitdistrplus)
> left <- c(rep(1,12),rep(2,5),rep(3,3),rep(4,2),rep(5,1))
> right <- c(rep(1,12),rep(2,5),rep(3,3),rep(4,2),NA)
> (df <- data.frame(left,right))
left right
1 1 1
2 1 1
3 1 1
4 1 1
5 1 1
6 1 1
7 1 1
8 1 1
9 1 1
10 1 1
11 1 1
12 1 1
13 2 2
14 2 2
15 2 2
16 2 2
17 2 2
18 3 3
19 3 3
20 3 3
21 4 4
22 4 4
23 5 NA
> fitcen <- fitdistrplus::fitdistcens(df,"weibull")
> fitdistrplus::plotdistcens(df,
+ distr = "weibull",
+ para = list(fitcen$estimate[1],fitcen$estimate[2]))
Executing the script above generates the figures below.
I have successfully overlaid a Weibull PDF over a histogram of a non-censored version of this dummy data set (below). The top-left figure in the image below is very nearly my target except that it shows the PDF of non-censored rather than right censored data.
> noncens <- fitdistrplus::fitdist(left, distr = "weibull")
> plot(noncens)
Edit: as I read more, I see that this is likely a multi-step process and more complex than a single answer.
Partial answer: extract the shape and scale values from fitcen and use the function dweibull() to plot the PDF. The histogram of the values is not included here.
> shape <- fitcen$estimate[1]
> scale <- fitcen$estimate[2]
> curve(dweibull(x, shape=shape, scale=scale), from=0, to=10)
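For the missing histogram step, here is a minimal sketch that swaps fitdistcens() for a hand-written maximum-likelihood fit with base R's optim(), so the censored likelihood is visible: exact values contribute dweibull(), the right-censored value contributes the survival probability 1 - pweibull(). The names negll, fit and exact are illustrative, and binning the censored observation at its censoring time in the histogram is an assumption. The estimates should be close to fitcen$estimate, but they are not guaranteed to match to the last digit.

```r
# Dummy data from the question: left == right is an exact observation,
# right = NA means the value is right-censored at `left`
left  <- c(rep(1, 12), rep(2, 5), rep(3, 3), rep(4, 2), 5)
right <- c(rep(1, 12), rep(2, 5), rep(3, 3), rep(4, 2), NA)
exact <- !is.na(right)

# Negative log-likelihood: density for exact values, survival for censored ones.
# Parameters are on the log scale so shape and scale stay positive.
negll <- function(logpar) {
  shape <- exp(logpar[1])
  scale <- exp(logpar[2])
  -(sum(dweibull(left[exact], shape, scale, log = TRUE)) +
      sum(pweibull(left[!exact], shape, scale, lower.tail = FALSE, log.p = TRUE)))
}
fit   <- optim(c(0, 0), negll)
shape <- exp(fit$par[1])
scale <- exp(fit$par[2])

# Histogram on the density scale with the fitted Weibull PDF drawn on top.
# Assumption: the censored value is binned at its censoring time (5).
hist(left, breaks = 0:6, freq = FALSE, main = "Right-censored Weibull fit")
curve(dweibull(x, shape = shape, scale = scale), from = 0, to = 6, add = TRUE)
```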

Plotting multiple bar plots on same y-axis but each on separate x-axis in ggplot2 for count data

I have some count variables for which I want to make bar plots on the same y-axis, but I have no grouping variable. Something like the following plot:
library(ggplot2)
B <- 25
iter_M1
[1] 5 13 14 11 7 8 10 14 10 5 7 13 10 12 4 5 9 6 5 12 8 8 7 11 9
max_M1 <- max(iter_M1)
count_M1 <- integer(max_M1)
for(i in 1:max_M1)
{
for(j in 1:B)
{
if(iter_M1[j] == i)
count_M1[i] = count_M1[i] +1
}
}
count_M1
[1] 0 0 0 1 4 1 3 3 2 3 2 2 2 2
df <- data.frame(x = 1:max_M1, y = count_M1)
p_M1 <-ggplot(data=df, aes(x=x, y=y)) + geom_bar(stat="identity")
p_M1
This results in a plot like this
and another similar variable
iter_M2
[1] 3 1 3 2 6 3 4 4 3 7 4 2 2 3 4 3 4 4 1 3 7 3 2 4 2
max_M2 <- max( iter_M2)
count_M2 <- integer(max_M2)
for(i in 1:max_M2)
{
for(j in 1:B)
{
if(iter_M2[j] == i)
count_M2[i] = count_M2[i] +1
}
}
count_M2
[1] 2 5 8 7 0 1 2
df1 <- data.frame(x1 = 1:max_M2, y1 = count_M2)
p_M2 <- ggplot(data = df1, aes(x = x1, y = y1)) +
  geom_bar(stat = "identity")
p_M2
which results in a second plot as
and more variables like these. How can I plot this data side by side? Also, the way I have generated the data, there is no common y-axis across the x-axes. Are there any suggestions for generating such a plot, or for putting the data set in another format to achieve the required plot?
As suggested in the comments, making a factor (class) is the easiest way, allowing you to facet the plot.
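A minimal sketch of that approach, assuming ggplot2 is installed: it stacks both count vectors into one long data frame with a factor column and uses facet_wrap(), whose panels share the y-axis by default. The names to_counts, df_all and variable are made up for illustration, and tabulate() stands in for the nested counting loops in the question.

```r
library(ggplot2)

# Raw values from the question
iter_M1 <- c(5, 13, 14, 11, 7, 8, 10, 14, 10, 5, 7, 13, 10, 12, 4, 5, 9, 6, 5, 12, 8, 8, 7, 11, 9)
iter_M2 <- c(3, 1, 3, 2, 6, 3, 4, 4, 3, 7, 4, 2, 2, 3, 4, 3, 4, 4, 1, 3, 7, 3, 2, 4, 2)

# tabulate() counts how often each integer 1..max(v) occurs,
# which is what the nested loops compute
to_counts <- function(v, label) {
  n <- tabulate(v)
  data.frame(x = seq_along(n), y = n, variable = label)
}

# One long data frame with a factor to facet on
df_all <- rbind(to_counts(iter_M1, "M1"), to_counts(iter_M2, "M2"))
df_all$variable <- factor(df_all$variable)

# One panel per variable: shared y-axis (the default), free x-axis per panel
p <- ggplot(df_all, aes(x = x, y = y)) +
  geom_col() +
  facet_wrap(~ variable, nrow = 1, scales = "free_x")
p
```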
But you seem to explicitly want just the same y-axis. This is achievable with the scale limits: for example, generate a vector of limits based on the maximum count and then use it in each of your plots.
ylimits <- c(0, max(c(count_M1, count_M2)))
p_M1 + ylim(ylimits)
p_M2 + ylim(ylimits)
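To actually draw the two plots side by side with those shared limits, one option that needs no extra package is a grid viewport layout (grid ships with R and ggplot2 builds on it). This is a sketch using the count vectors printed in the question; geom_col() is used as shorthand for geom_bar(stat = "identity").

```r
library(ggplot2)
library(grid)

# Counts printed in the question (count_M1 and count_M2)
count_M1 <- c(0, 0, 0, 1, 4, 1, 3, 3, 2, 3, 2, 2, 2, 2)
count_M2 <- c(2, 5, 8, 7, 0, 1, 2)

# Shared y-axis limits from the largest count in either variable
ylimits <- c(0, max(c(count_M1, count_M2)))

p_M1 <- ggplot(data.frame(x = seq_along(count_M1), y = count_M1), aes(x, y)) +
  geom_col() + ylim(ylimits)
p_M2 <- ggplot(data.frame(x = seq_along(count_M2), y = count_M2), aes(x, y)) +
  geom_col() + ylim(ylimits)

# Split the page into a 1 x 2 layout and draw one plot in each cell
grid.newpage()
pushViewport(viewport(layout = grid.layout(1, 2)))
print(p_M1, vp = viewport(layout.pos.row = 1, layout.pos.col = 1))
print(p_M2, vp = viewport(layout.pos.row = 1, layout.pos.col = 2))
```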

How to get member of clusters from R's hclust/heatmap.2

I have the following code that performs hierarchical clustering and plots the result as a heatmap.
library(gplots)
set.seed(538)
# generate data
y <- matrix(rnorm(50), 10, 5, dimnames=list(paste("g", 1:10, sep=""), paste("t", 1:5, sep="")))
# the actual data is much larger than the above
# perform hierarchical clustering and plot heatmap
test <- heatmap.2(y)
which plots this:
What I want to do is get the members of each cluster at every level of the hierarchy in the plot,
yielding:
Clust 1: g3-g2-g4
Clust 2: g2-g4
Clust 3: g4-g7
etc
Cluster last: g1-g2-g3-g4-g5-g6-g7-g8-g9-g10
Is there a way to do it?
I did have the answer, after all! @zkurtz identified the problem: the data I was using were different from the data you were using. I added a set.seed(538) statement to your code to stabilize the data.
Use the following code to create a matrix of cluster membership for the row dendrogram (column k gives each gene's cluster when the tree is cut into k groups):
cutree(as.hclust(test$rowDendrogram), 1:dim(y)[1])
This will give you:
1 2 3 4 5 6 7 8 9 10
g1 1 1 1 1 1 1 1 1 1 1
g2 1 2 2 2 2 2 2 2 2 2
g3 1 2 2 3 3 3 3 3 3 3
g4 1 2 2 2 2 2 2 2 2 4
g5 1 1 1 1 1 1 1 4 4 5
g6 1 2 3 4 4 4 4 5 5 6
g7 1 2 2 2 2 5 5 6 6 7
g8 1 2 3 4 5 6 6 7 7 8
g9 1 2 3 4 4 4 7 8 8 9
g10 1 2 3 4 5 6 6 7 9 10
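If you want the members of each merge in the dendrogram (the "Clust 1 ... Cluster last" list in the question) rather than the cut levels, the merge matrix of an hclust object encodes exactly that. A minimal sketch, assuming heatmap.2's default row clustering (dist() followed by hclust()) so that gplots is not needed; it should contain the same clusters, though the step numbering and the order of names within a cluster may differ from the dendrogram heatmap.2 draws. The names hc, members and parts are illustrative.

```r
set.seed(538)
y <- matrix(rnorm(50), 10, 5,
            dimnames = list(paste("g", 1:10, sep = ""), paste("t", 1:5, sep = "")))

# Assumption: equivalent to heatmap.2's default row clustering
hc <- hclust(dist(y))

# hc$merge has one row per merge step: negative entries are single observations,
# positive entries refer to the cluster formed at that earlier step
members <- vector("list", nrow(hc$merge))
for (i in seq_len(nrow(hc$merge))) {
  parts <- lapply(hc$merge[i, ], function(j) if (j < 0) hc$labels[-j] else members[[j]])
  members[[i]] <- unlist(parts)
  cat("Clust", i, ":", paste(members[[i]], collapse = "-"), "\n")
}
```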
This solution requires computing the cluster structure using a different package:
# Generate data
y = matrix(rnorm(50), 10, 5, dimnames=list(paste("g", 1:10, sep=""), paste("t", 1:5, sep="")))
# The new package:
library(nnclust)
# Create the links between all pairs of points with
# squared euclidean distance less than threshold
links = nncluster(y, threshold = 2, fill = 1, give.up =1)
# Assign a cluster number to each point
clusters=clusterMember(links, outlier = FALSE)
# Display the points that are "alone" in their own cluster:
nas = which(is.na(clusters))
print(rownames(y)[nas])
# For each cluster (with at least two points), display the included points
for(i in 1:max(clusters, na.rm = TRUE)) print(rownames(y)[which(clusters == i)])
Obviously you would want to wrap this in a function of some kind to make it more user-friendly. In particular, this gives the clusters at only one level of the dendrogram; to get the clusters at other levels, you would have to vary the threshold parameter.

Why doesn't qplot plot lines in multiple series for this data file?

It's my first day learning R and ggplot. I've followed some tutorials and would like plots like the ones generated by the following command:
qplot(age, circumference, data = Orange, geom = c("point", "line"), colour = Tree)
It looks like the figure on this page:
http://www.r-bloggers.com/quick-introduction-to-ggplot2/
I created a small test data file by hand, which looks like this:
site temp humidity
1 1 1 3
2 1 2 4.5
3 1 12 8
4 1 14 10
5 2 1 5
6 2 3 9
7 2 4 6
8 2 8 7
but when I try to read and plot it with:
test <- read.table('test.data')
qplot(temp, humidity, data = test, color=site, geom = c("point", "line"))
the lines on the plot aren't separate series, but link together:
http://imgur.com/weRaX
What am I doing wrong?
Thanks.
You need to tell ggplot2 how to group the data into separate lines. It's not a mind reader! ;)
dat <- read.table(text = " site temp humidity
1 1 1 3
2 1 2 4.5
3 1 12 8
4 1 14 10
5 2 1 5
6 2 3 9
7 2 4 6
8 2 8 7",sep = "",header = TRUE)
qplot(temp, humidity, data = dat, group = site,color=site, geom = c("point", "line"))
Note that you probably also wanted to do color = factor(site) in order to force a discrete color scale, rather than a continuous one.
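The same fix written with ggplot() instead of qplot(), as a sketch assuming ggplot2 is installed: mapping colour to factor(site) gives a discrete scale, and because ggplot2 groups by the discrete variables in a plot by default, that mapping also splits the data into one line per site, so an explicit group is not needed here.

```r
library(ggplot2)

dat <- read.table(text = "site temp humidity
1 1 3
1 2 4.5
1 12 8
1 14 10
2 1 5
2 3 9
2 4 6
2 8 7", header = TRUE)

# factor(site): discrete colours, and one group (line) per site
p <- ggplot(dat, aes(x = temp, y = humidity, colour = factor(site))) +
  geom_point() +
  geom_line()
p
```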

Creating a delta column to plot time series differences in R

I have a set of motorsport laptime data (mld) of the form:
car lap laptime
1 1 1 138.523
2 1 2 122.373
3 1 3 121.395
4 1 4 137.871
and I want to produce something of the form:
lap car.1 car.1.delta
1 1 138 NA
2 2 122 -16
3 3 121 -1
4 4 137 16
I can use the R command diff(mld$laptime, lag=1) to produce the difference column, but how do I elegantly create the padded difference column in R?
Here are a couple of approaches:
1) zoo
If we represented this as a time series using zoo then the calculation would be particularly simple:
# test data with two cars
Lines <- "car lap laptime
1 1 138.523
1 2 122.373
1 3 121.395
1 4 137.871
2 1 138.523
2 2 122.373
2 3 121.395
2 4 137.871"
cat(Lines, "\n", file = "data.txt")
# read it into a zoo series, splitting it
# on car to give wide form (rather than long form)
library(zoo)
z <- read.zoo("data.txt", header = TRUE, split = 1, index = 2, FUN = as.numeric)
# now that it's in the right form, it's simple
zz <- cbind(z, diff(z))
The last statement gives:
> zz
1.z 2.z 1.diff(z) 2.diff(z)
1 138.523 138.523 NA NA
2 122.373 122.373 -16.150 -16.150
3 121.395 121.395 -0.978 -0.978
4 137.871 137.871 16.476 16.476
To plot zz, one column per panel, try this:
plot(zz, type = "o")
To only plot the differences we do not really need zz in the first place as this will do:
plot(diff(z), type = "o")
(Add the screens = 1 argument to the plot command to plot everything on the same panel.)
2) ave. Here is a second solution that uses just plain R (except for the plotting) and keeps the output in long form; however, it is a bit more complex:
# assume same input as above
DF <- read.table("data.txt", header = TRUE)
DF$diff <- ave(DF$laptime, DF$car, FUN = function(x) c(NA, diff(x)))
The result is:
> DF
car lap laptime diff
1 1 1 138.523 NA
2 1 2 122.373 -16.150
3 1 3 121.395 -0.978
4 1 4 137.871 16.476
5 2 1 138.523 NA
6 2 2 122.373 -16.150
7 2 3 121.395 -0.978
8 2 4 137.871 16.476
To plot just the differences, one per panel, try this:
library(lattice)
xyplot(diff ~ lap | car, DF, type = "o")
Update
Added info above on plotting since the title of the question mentions this.
I think this is enough:
mld$car.1.delta = c(NA, diff(mld$laptime, lag = 1))
In your example you have truncated the lap times but rounded car.1.delta, so it really depends on how you want that to work, but the code below gives what you posted.
Wrap everything in with() to simplify, and create a new data.frame based on modifications of the existing columns. Prepend an NA to the diff to pad it out.
with(mld,
data.frame(
lap = lap,
car.1 = trunc(laptime),
car.1.delta = c(NA, round(diff(laptime)))
)
)
lap car.1 car.1.delta
1 1 138 NA
2 2 122 -16
3 3 121 -1
4 4 137 16
I wonder if you want to do this by car, and if so it will need a bit more handling but since you've literally asked for column car.1 I think this works so far as that goes.
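If you do want it by car, here is a sketch of the extra handling, assuming a hypothetical two-car mld like the test data in the zoo answer: compute the padded difference within each car with ave(), then use base R's reshape() to go wide. The result has columns named laptime.1, delta.1, laptime.2 and delta.2, so car.1-style names would still need a rename; the delta column name is illustrative.

```r
# Hypothetical two-car version of mld (same lap times as the zoo test data)
mld <- data.frame(car = rep(1:2, each = 4),
                  lap = rep(1:4, times = 2),
                  laptime = c(138.523, 122.373, 121.395, 137.871,
                              138.523, 122.373, 121.395, 137.871))

# Padded difference computed separately within each car
mld$delta <- ave(mld$laptime, mld$car, FUN = function(x) c(NA, diff(x)))

# Long -> wide: one row per lap, with laptime.<car> and delta.<car> columns
wide <- reshape(mld, idvar = "lap", timevar = "car", direction = "wide")
wide
```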
