Spider graph in R

The following is some code that produces various spider graphs:
# Data must be given as the data frame, where the first cases show maximum.
maxmin <- data.frame(
total=c(5, 1),
phys=c(15, 3),
psycho=c(3, 0),
social=c(5, 1),
env=c(5, 1))
# data for radarchart function version 1 series, minimum value must be omitted from above.
RNGkind("Mersenne-Twister")
set.seed(123)
dat <- data.frame(
total=runif(3, 1, 5),
phys=rnorm(3, 10, 2),
psycho=c(0.5, NA, 3),
social=runif(3, 1, 5),
env=c(5, 2.5, 4))
dat <- rbind(maxmin,dat)
op <- par(mar=c(1, 2, 2, 1),mfrow=c(2, 2))
radarchart(dat, axistype=1, seg=5, plty=1, vlabels=c("Total\nQOL", "Physical\naspects",
"Psychological\naspects", "Social\naspects", "Environmental\naspects"),
title="(axis=1, 5 segments, with specified vlabels)")
radarchart(dat, axistype=2, pcol=topo.colors(3), plty=1, pdensity=30, pfcol=topo.colors(3),
title="(topo.colors, fill, axis=2)")
radarchart(dat, axistype=3, pty=32, plty=1, axislabcol="grey", na.itp=FALSE,
title="(no points, axis=3, na.itp=FALSE)")
radarchart(dat, axistype=1, plwd=1:5, pcol=1, centerzero=TRUE,
seg=4, caxislabels=c("worst", "", "", "", "best"),
title="(use lty and lwd but b/w, axis=1,\n centerzero=TRUE, with centerlabels)")
par(op)
The output of the graphs consists of two sets of line segments with different colors. Where did the second set of line segments come from? Also what is a good way to graph multiple items on the same spider graph?

You should mention that you are using the fmsb library to create the graph. The code you show is the example in the documentation. The puzzling thing at first glance is why three sets of lines are shown (not two as you imply with "second set") while there are 5 records in dat.
It is all in the same documentation you took the code from:
row 1 = the maximum values (defined in `maxmin` in the example code)
row 2 = the minimum values (defined in `maxmin` in the example code)
rows 3 to 5 = the example data points; each row produces one of the three lines that you see in the example graphs.
Just read the documentation for radarchart {fmsb} again and play with the numbers in the example as you do so. It should be pretty clear what is happening and what options you have for your own data. You can add as many data-rows and create corresponding lines as you wish. But these do tend to become unreadable if you overdo it.
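As a minimal sketch of plotting several items on one spider graph, the layout described above (maximum row, minimum row, then one row per series) can be assembled with a small helper. The helper name make_radar_data and the example values are hypothetical and not part of fmsb; the radarchart call only runs if fmsb is installed.

```r
# Hypothetical helper: stack the max row, the min row and the data rows
# in the order that radarchart() expects.
make_radar_data <- function(values, max_vals, min_vals) {
  stopifnot(ncol(values) == length(max_vals),
            ncol(values) == length(min_vals))
  limits <- as.data.frame(rbind(max_vals, min_vals))
  names(limits) <- names(values)
  out <- rbind(limits, values)
  rownames(out) <- c("max", "min", paste0("series", seq_len(nrow(values))))
  out
}

# Made-up example values: three series measured on three axes
values <- data.frame(total  = c(4, 2, 3),
                     phys   = c(12, 8, 5),
                     psycho = c(1, 2, 3))
radar <- make_radar_data(values, max_vals = c(5, 15, 3), min_vals = c(1, 3, 0))

if (requireNamespace("fmsb", quietly = TRUE)) {
  # one line per data row (rows 3 onward), each in its own colour
  fmsb::radarchart(radar, axistype = 1, pcol = 1:3, plty = 1)
  legend("topright", legend = rownames(radar)[-(1:2)],
         col = 1:3, lty = 1, bty = "n")
}
```

Adding a further series is then just a matter of adding a row to values.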

Related

R - Histogram Doesn't show density due to magnitude of the Data

I have a vector called data with a length of approximately 444000, and almost all of the numeric values are between 1 and 100. I want to draw the histogram and draw the appropriate density on it. However, when I draw the histogram I get this:
hist(data,freq=FALSE)
What can I do to actually see a more detailed histogram? I tried the breaks argument, which helped, but it's really hard to see the histogram because it's so small. For example I used breaks = 2000 and got this:
Is there something that I can do? Thanks!
Since you don't show data, I'll generate some random data:
d <- c(rexp(1e4, 100), runif(100, max=5e4))
hist(d)
When dealing with outliers like this, you can display the histogram of the logs, but that may be difficult to interpret.
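A minimal sketch of the log-scale histogram, regenerating the random data with a seed so it is reproducible. The choice of log10 and of 50 breaks is an assumption made for illustration, not something stated above.

```r
set.seed(1)
d <- c(rexp(1e4, 100), runif(100, max = 5e4))

# Histogram of the base-10 logs; the x axis is now in orders of magnitude
h <- hist(log10(d), breaks = 50,
          main = "Histogram of log10(d)", xlab = "log10(d)")
```

Every point stays in the plot, but readers have to translate the axis back to the original units themselves.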
If you are okay with showing a subset of the data, then you can filter the outliers out either dynamically (perhaps using quantile) or manually. The important thing when showing this visualization in your analysis is that if you must remove data for the plot, then be up-front about the removal. (This is terse ... it would also be informative to include the range and/or other properties of the omitted data, but that's subjective and will differ based on the actual data.)
quantile(d, seq(0, 1, len=11))
d2 <- d[ d < quantile(d, 0.90) ]
hist(d2)
txt <- sprintf("(%d points shown, %d excluded)", length(d2), length(d) - length(d2))
mtext(txt, side = 1, line = 3, adj = 1)
d3 <- d[ d < 10 ]
hist(d3)
txt <- sprintf("(%d points shown, %d excluded)", length(d3), length(d) - length(d3))
mtext(txt, side = 1, line = 3, adj = 1)
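Since the question also asks for a density drawn on top of the histogram, here is a minimal sketch that combines the quantile filter with a kernel density overlay. The 90% cutoff and the number of breaks are assumptions carried over or chosen for illustration.

```r
set.seed(1)
d <- c(rexp(1e4, 100), runif(100, max = 5e4))

# Keep only the values below the 90th percentile
cutoff <- quantile(d, 0.90)
d2 <- d[d < cutoff]

# freq = FALSE puts the histogram on the density scale so the curve is comparable
hist(d2, freq = FALSE, breaks = 50, main = "Values below the 90th percentile")
lines(density(d2), lwd = 2)

# Say how much data was left out of the plot
txt <- sprintf("(%d points shown, %d excluded)", length(d2), length(d) - length(d2))
mtext(txt, side = 1, line = 3, adj = 1)
```

Note that a kernel density estimate tends to be biased near a hard boundary such as zero, so the curve may sit below the first bars.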

R: Increase space between multiple boxplots to avoid omitted x axis labels

Let's say I generate 5 sets of random data and want to visualize them using boxplots and save those to a file "boxplots.png". Using the code
png("boxplots.png")
data <- matrix(rnorm(25),5,5)
boxplot(data, names = c("Name1","Name2","Name3","Name4","Name5"))
dev.off()
there are 5 boxplots created as desired in "boxplots.png", however the names for the second ("Name2") and the fourth ("Name4") boxplot are omitted. Even changing the window of my png-view makes no difference. How can I avoid this behavior?
Thank you!
Your offered code does not produce an overlap in my setting, but that point is relatively moot: you want a way to allow more space between words.
One (brute-force-ish) way to fix the symptom is to alternate putting them on separate lines:
set.seed(42)
data <- matrix(rnorm(25),5,5)
nms <- c("Name1","Name2","Name3","Name4","Name5")
oddnums <- which(seq_along(nms) %% 2 == 1)
evennums <- which(seq_along(nms) %% 2 == 0)
(There's got to be a better way to do that, but it works.)
From here:
png("boxplot.png", height = 240)
boxplot(data, names = FALSE)
mtext(nms[oddnums], side = 1, line = 2, at = oddnums)
mtext(nms[evennums], side = 1, line = 1, at = evennums)
dev.off()
(The use of png is not important here, I just used it because of your edit.)
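One possibly simpler way to get the same staggering is to rely on mtext recycling its line argument, so that no odd/even index vectors are needed. This is a sketch under that assumption; label_lines is a name introduced here, and xaxt = "n" is used to suppress the default axis instead of names = FALSE.

```r
set.seed(42)
data <- matrix(rnorm(25), 5, 5)
nms <- c("Name1", "Name2", "Name3", "Name4", "Name5")

# Alternate the margin line for consecutive labels: 1, 2, 1, 2, ...
label_lines <- rep(c(1, 2), length.out = length(nms))

boxplot(data, xaxt = "n")  # draw the boxes without the default x axis
mtext(nms, side = 1, line = label_lines, at = seq_along(nms))
```

The plotting calls can be wrapped in png(...) and dev.off() as before. If the labels only overlap slightly, rotating them with boxplot(data, names = nms, las = 2) may be enough on its own.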

How to overlay multiple TA in new plot using quantmod?

We can plot a candlestick chart using the chartSeries function, e.g. chartSeries(Cl(PSEC)). I have created some custom values (I1, I2 and I3) which I want to plot together (overlaid) outside the candlestick pattern. I have used addTA() for this purpose:
chartSeries(Cl(PSEC), TA="addTA(I1,col=2);addTA(I2,col=3);addTA(I3,col=4)")
The problem is that it draws four separate plots, for Cl(PSEC), I1, I2 and I3, instead of the two plots I want: Cl(PSEC) and (I1, I2, I3).
EDITED
For clarity, here is sample code that creates the variables I1, I2 and I3:
library(quantmod)
PSEC=getSymbols("PSEC",auto.assign=F)
price=Cl(PSEC)
I1=SMA(price,3)
I2=SMA(price,10)
I3=SMA(price,15)
chartSeries(price, TA="addTA(I1,col=2);addTA(I2,col=3);addTA(I3,col=4)")
Here is an option which largely preserves your original code.
You can obtain the desired result using the option on=2 for each TA after the first:
library(quantmod)
getSymbols("PSEC")
price <- Cl(PSEC)
I1 <- SMA(price,3)
I2 <- SMA(price,10)
I3 <- SMA(price,15)
chartSeries(price, TA=list("addTA(I1, col=2)", "addTA(I2, col=4, on=2)",
"addTA(I3, col=5, on=2)"), subset = "last 6 months")
If you want to overlay the price and the SMAs in one chart, you can use the option on=1 for each TA.
Thanks to @hvollmeier, whose answer made me realize that I had misunderstood your question in the previous version of my answer.
PS: Note that several options are described in ?addSMA, including with.col, which can be used to select a specific column of the time series (Cl is the default column).
If I understand you correctly, you want the 3 SMAs in a SUBPLOT and NOT in your main chart window. You can do the following using newTA.
Using your data:
PSEC=getSymbols("PSEC",auto.assign=F)
price=Cl(PSEC)
Now plotting a 10,30,50 day SMA in a window below the main window:
chartSeries(price['2016'])
newSMA <- newTA(SMA, Cl, on=NA)
newSMA(10)
newSMA(30,on=2)
newSMA(50,on=2)
The key is the argument on. Use on = NA in defining your new TA function, because the default value for on is 1, which is the main window. on = NA plots in a new window. Then plot the remaining SMAs to the same window as the first SMA. Style the colours etc. to your liking :-).
You may want to consider solving this task with the newer charting functions in the quantmod package (chart_Series as opposed to chartSeries).
Pros:
- The plots look cleaner and better (?)
- You have more flexibility via editing the pars and themes options to chart_Series (see other examples here on SO for the basics of things you can do with pars and themes)
Cons:
- Not well documented.
PSEC=getSymbols("PSEC",auto.assign=F)
price=Cl(PSEC)
chart_Series(price, subset = '2016')
add_TA(SMA(price, 10))
add_TA(SMA(price, 30), on = 2, col = "green")
add_TA(SMA(price, 50), on = 2, col = "red")
# Make plot all at once (this approach is useful in shiny applications):
print(chart_Series(price, subset = '2016', TA = 'add_TA(SMA(price, 10), yaxis = list(0, 10));
add_TA(SMA(price, 30), on = 2, col = "purple"); add_TA(SMA(price, 50), on = 2, col = "red")'))

Printing plot depending on variable conditions on 2 pdf pages

I am trying to print a plot that depends on a variable with 12 values. This plot is the result of a cluster classification of sequences, using OM distance.
I print this plot on one PDF page:
pdf("YYY.pdf", height=11,width=20)
seqIplot(XXX.seq, group=XXX$variable, cex.legend = 2, cex.plot = 1.5, border = NA, sortv =XXX.om)
dev.off()
But the printed plot is too small, so I tried to print it on 2 pages, like this:
pdf("YYY.pdf", height=11,width=20)
seqIplot(XXX.seq, group=XXX$variable, variable="1":"6", cex.legend = 2, cex.plot = 1.5, border = NA, sortv =XXX.om)
seqIplot(XXX.seq, group=XXX$variable, variable="7":"12", cex.legend = 2, cex.plot = 1.5, border = NA, sortv = XXX.om)
dev.off()
But it doesn't work. Do you know how I can ask R to split the variable's values into two groups, so as to print 6 graphics per PDF page?
The solution is to plot separately the subset of groups you want on each page. Here is an example using the biofam data provided by TraMineR. The group variable p02r04 is religious participation which takes 10 different values.
library(TraMineR)
data(biofam)
bs <- seqdef(biofam[,10:25])
group <- factor(biofam$p02r04)
lv <- levels(group)
sel <- (group %in% lv[1:6])
seqIplot(bs[sel,], group=group[sel], sortv="from.end", withlegend=FALSE)
seqIplot(bs[!sel,], group=group[!sel], sortv="from.end")
If you are sorting the index plot with a variable you should indeed take the same subset of the sort variable, e.g. sortv=XXX.om[sel] in your case.
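Putting the pieces together for the two-page PDF, here is a rough sketch that splits the group levels into chunks of six and draws one page per chunk. The helper split_groups and the output file name are hypothetical; the TraMineR part only runs if the package is installed, and it leaves the legend settings at their defaults.

```r
# Hypothetical helper: one logical selector per page,
# each covering at most `per_page` levels of the grouping factor.
split_groups <- function(group, per_page = 6) {
  lv <- levels(group)
  chunks <- split(lv, ceiling(seq_along(lv) / per_page))
  lapply(chunks, function(ch) group %in% ch)
}

if (requireNamespace("TraMineR", quietly = TRUE)) {
  library(TraMineR)
  data(biofam)
  bs <- seqdef(biofam[, 10:25])
  group <- factor(biofam$p02r04)

  pdf(file.path(tempdir(), "index_plots.pdf"), height = 11, width = 20)
  for (sel in split_groups(group)) {
    # each seqIplot call starts a new page in the PDF device
    seqIplot(bs[sel, ], group = droplevels(group[sel]), sortv = "from.end")
  }
  dev.off()
}
```

If a separate sort variable is used, it would need the same subset inside the loop, for example sortv = XXX.om[sel].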
I'm not sure I understood your question; you could post some data to help us reproduce what you want. Maybe this helps: to plot six graphs on one page you should adjust the mfrow parameter. Is that what you wanted?
pdf("test.pdf")
par(mfrow=c(3,2))
plot(1:10, 21:30)
plot(1:10, 21:30, pch=20)
hist(rnorm(1000))
barplot(VADeaths)
...
dev.off()

avoiding over-crowding of labels in r graphs

I am trying to avoid overcrowding of the labels in the following plot:
set.seed(123)
position <- c(rep (0,5), rnorm (5,1,0.1), rnorm (10, 3,0.1), rnorm (3, 4, 0.2), 5, rep(7,5), rnorm (3, 8,2), rnorm (10,9,0.5),
rep (0,5), rnorm (5,1,0.1), rnorm (10, 3,0.1), rnorm (3, 4, 0.2), 5, rep(7,5), rnorm (3, 8,2), rnorm (10,9,0.5))
group <- c(rep (1, length (position)/2),rep (2, length (position)/2) )
mylab <- paste ("MR", 1:length (group), sep = "")
barheight <- 0.5
y.start <- c(group-barheight/2)
y.end <- c(group+barheight/2)
mydf <- data.frame (position, group, barheight, y.start, y.end, mylab)
plot(0,type="n",ylim=c(0,3),xlim=c(0,10),axes=F,ylab="",xlab="")
#Create two horizontal lines
require(fields)
yline(1,lwd=4)
yline(2,lwd=4)
#Create text for the lines
text(10,1.1,"Group 1",cex=0.7)
text(10,2.1,"Group 2",cex=0.7)
#Draw vertical bars
lng = length(position)/2
lg1 = lng+1
lg2 = lng*2
segments(mydf$position[1:lng],mydf$y.start[1:lng],y1=mydf$y.end[1:lng])
segments(mydf$position[lg1:lg2],mydf$y.start[lg1:lg2],y1=mydf$y.end[lg1:lg2])
text(mydf$position[1:lng],mydf$y.start[1:lng]+0.65, mydf$mylab[1:lng], srt = 90)
text(mydf$position[lg1:lg2],mydf$y.start[lg1:lg2]+0.65, mydf$mylab[lg1:lg2], srt = 90)
You can see that some areas are crowded with labels, where the x values are the same or similar. I want to display only one label when there are multiple labels at the same point. For example, mydf$position[1:5] are all 0, but the corresponding labels mydf$mylab[1:5] are
MR1 MR2 MR3 MR4 MR5
and I just want to display the first one, "MR1".
Similarly, when the following points are too close (say, within a difference of 0.35), they should be considered a single cluster and only the first label should be displayed. In this way I would be able to get rid of the overcrowding of labels. How can I achieve this?
If you space the labels out and add some extra lines you can label every marker.
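clpl <- function(xdata, names, y=1, dy=0.25, add=FALSE){
  # sort markers and labels by position so the connector lines do not cross
  o <- order(xdata)
  xdata <- xdata[o]
  names <- names[o]
  if(!add) plot(0,type="n",ylim=c(y-1,y+2),xlim=range(xdata),axes=FALSE,ylab="",xlab="")
  # horizontal baseline at height y
  abline(h=y,lwd=4)
  # vertical markers of half-height dy
  segments(xdata,y-dy,xdata,y+dy)
  # evenly spaced label positions across the x range
  tpos <- seq(min(xdata),max(xdata),length.out=length(xdata))
  text(tpos,y+2*dy,names,srt=90,adj=0)
  # connect each marker to its label
  segments(xdata,y+dy,tpos,y+2*dy)
}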
Then using your data:
clpl(mydf$position[lg1:lg2],mydf$mylab[lg1:lg2])
gives:
You could then think about labelling clusters underneath the main line.
I've not given much thought to doing multiple lines in a plot, but I think with a bit of mucking with my code and the add parameter it should be possible. You could also use colour to show clusters. I'm fairly sure these techniques are present in some of the clustering packages for R...
Obviously with a lot of markers even this is going to get smushed, but with a lot of clusters the same thing is going to happen. Maybe you end up labelling clusters with this technique?
In general, I agree with @Joran that cluster labelling can't be automated, but you've said that labelling a group of lines with the first label in the cluster would be OK, so it is possible to automate some of the process.
Putting the following code after the line lg2 = lng*2 gives the result shown in the image below:
clust <- cutree(hclust(dist(mydf$position[1:lng])),h=0.75)
# label only the first marker in each cluster; leave the rest blank
clust.labels <- ifelse(duplicated(clust), "", as.character(mydf$mylab[1:lng]))
segments(mydf$position[1:lng],mydf$y.start[1:lng],y1=mydf$y.end[1:lng])
segments(mydf$position[lg1:lg2],mydf$y.start[lg1:lg2],y1=mydf$y.end[lg1:lg2])
text(mydf$position[1:lng],mydf$y.start[1:lng]+0.65, clust.labels, srt = 90)
text(mydf$position[lg1:lg2],mydf$y.start[lg1:lg2]+0.65, mydf$mylab[lg1:lg2], srt = 90)
(I've only labelled the clusters on the lower line -- the same principle could be applied to the upper line too). The parameter h of cutree() might have to be adjusted case-by-case to give the resolution of labels that you want, but this approach is at least easier than labelling every cluster by hand.
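If a plain distance rule is preferred over hierarchical clustering, the "difference of 0.35" idea from the question can be sketched directly: sort the positions and start a new cluster whenever the gap to the previous point exceeds the threshold. This is a different technique from the cutree() approach above; the function name first_label_per_cluster is hypothetical, and it keeps the label of the leftmost point in each cluster.

```r
# Hypothetical helper: keep one label per cluster, where a cluster is a run of
# sorted positions whose consecutive gaps are all <= gap.
first_label_per_cluster <- function(pos, labels, gap = 0.35) {
  o <- order(pos)                             # stable sort by position
  new_cluster <- c(TRUE, diff(pos[o]) > gap)  # TRUE where a new cluster starts
  out <- character(length(pos))               # all labels blank by default
  keep <- o[new_cluster]
  out[keep] <- as.character(labels[keep])
  out
}

pos  <- c(0, 0, 0, 1, 1.1, 3)
labs <- paste0("MR", 1:6)
first_label_per_cluster(pos, labs)
# returns "MR1" "" "" "MR4" "" "MR6"
```

The result can be passed to text() in place of the full label vector, for example text(mydf$position[1:lng], mydf$y.start[1:lng] + 0.65, first_label_per_cluster(mydf$position[1:lng], mydf$mylab[1:lng]), srt = 90). Because gaps are chained, a long run of closely spaced points counts as a single cluster even if its ends are far apart.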
