How can I create a boxplot with whiskers? - r

I have created a plot with 3 boxplots, but my whiskers are not showing for one of them.
How can I make them show?
This is my data:
Class 3: 5.055052 3.028838 3.423485 6.434745 6.396239 4.114418
3.687380 2.633139 7.356185 5.736677 4.462504 7.137034
Class 4: 4.738094 21.736701 6.716363 10.306583 4.757640 6.265024
My code is as follows:
boxplot(hvol.concentration,class.3, class.4, ylab="8-OHdG Concentration (ng/ml)", main="Boxplot Distribution of 8-OHdG", ylim=c(0,25), pch=16, names=c("Control", "NYHA III", "NYHA IV"))

boxplot(c(4.738094, 21.736701, 6.716363, 10.306583, 4.757640, 6.265024), plot = FALSE)$stats
##           [,1]
## [1,]  4.738094 <<== The lower whisker is definitely there, but the lower hinge (Q1) is almost the same as the min val
## [2,]  4.757640
## [3,]  6.490694
## [4,] 10.306583 <<== Upper hinge (Q3) == largest non-outlier value, so the upper whisker has zero length (21.736701 is drawn as an outlier)
## [5,] 10.306583
If you make the plot window bigger, the resolution will be sufficient to see the lower whisker:
boxplot(
  c(4.738094, 21.736701, 6.716363, 10.306583, 4.757640, 6.265024),
  horizontal = TRUE
)
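To see where the whiskers come from, the sketch below recomputes the fences by hand with boxplot.stats, assuming the default coef = 1.5 that boxplot() uses. The variable names class4, hinges and upper_fence are only illustrative.

```r
# Class 4 values from the question
class4 <- c(4.738094, 21.736701, 6.716363, 10.306583, 4.757640, 6.265024)

# boxplot.stats() returns the same five numbers that boxplot() draws
s <- boxplot.stats(class4)

# Hinges (roughly Q1 and Q3) and the upper fence at 1.5 * hinge spread
hinges <- s$stats[c(2, 4)]
upper_fence <- hinges[2] + 1.5 * diff(hinges)

# Whiskers stop at the most extreme data points inside the fences;
# anything beyond a fence is reported in s$out and drawn as a point
print(s$stats)
print(upper_fence)
print(s$out)
```

Because 21.736701 lies beyond the upper fence, the upper whisker ends at 10.306583, which is also the upper hinge, so it has no visible length.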

Related

Fanchart Color Scale

I have the following predictions, which I obtained from library(vars). Let's call this vecm.pred
$price
         fcst    lower    upper        CI
[1,] 4956.787 4864.032 5049.543  92.75548
[2,] 4948.936 4844.545 5053.327 104.39064
[3,] 5089.440 4979.941 5198.939 109.49891
[4,] 5076.999 4939.429 5214.569 137.56992
[5,] 5000.012 4854.955 5145.068 145.05669
[6,] 5072.107 4910.435 5233.780 161.67272
$people
         fcst    lower    upper       CI
[1,] 2529.799 2417.699 2641.899 112.1000
[2,] 2498.627 2269.438 2727.817 229.1893
[3,] 2410.037 2116.672 2703.402 293.3648
[4,] 2418.197 2094.965 2741.429 323.2320
[5,] 2371.373 2028.816 2713.929 342.5561
[6,] 2289.163 1941.386 2636.939 347.7764
I am trying to use fanchart to show my forecasts below:
fanchart(vecm.pred, ylab = c("Price (€)","Volume"), main = c("Price","People"))
But I cannot get past the following issues:
1) How do I change the colors from the default grey scale to a heatmap running from red to yellow?
2) How do I set a different ylab for my first and second plot? The ylab argument above just puts both y-axis names on each plot.

Plot R matrix columns according to column name

I would like to plot the following matrix x, so the column data are plotted according to their column name (i.e. 0.1, 0.2, etc.) on the x-axis.
> x
               0.1          0.2          0.3          0.4          0.5
 [1,] 5.000000e-01 5.000000e-01 5.000000e-01 5.000000e-01 0.5000000000
 [2,] 2.500000e-02 5.000000e-02 7.500000e-02 1.000000e-01 0.1250000000
 [3,] 2.437500e-03 9.500000e-03 2.081250e-02 3.600000e-02 0.0546875000
 [4,] 2.431559e-04 1.881950e-03 6.113802e-03 1.388160e-02 0.0258483887
 [5,] 2.430967e-05 3.756817e-04 1.822927e-03 5.475560e-03 0.0125901247
 [6,] 2.430908e-06 7.510810e-05 5.458812e-04 2.178231e-03 0.0062158067
 [7,] 2.430902e-07 1.502049e-05 1.636750e-04 8.693947e-04 0.0030885852
 [8,] 2.430902e-08 3.004053e-06 4.909445e-05 3.474555e-04 0.0015395229
 [9,] 2.430902e-09 6.008089e-07 1.472761e-05 1.389339e-04 0.0007685764
[10,] 2.430902e-10 1.201617e-07 4.418219e-06 5.556585e-05 0.0003839928
But when I use
plot(x, pch=20, ylim=c(0, 1))
I get the following (image: plot of the R matrix).
I want a plot where x[1, 1] (i.e. 5.000000e-01) is plotted as a point at 0.1 on the x-axis and 0.5 on the y-axis.
set.seed(123)
mat <- matrix(rnorm(25), 5, 5)
colnames(mat) <- seq(0.1, 0.5, length.out = 5)
plot(x = matrix(rep(as.numeric(colnames(mat)), 5), 5, 5, byrow = TRUE), y = mat)
Here the first argument x repeats the numeric column names five times and fills a 5 x 5 matrix row by row, so every value in a column of y gets that column's name as its x position.
matplot(x = matrix(rep(as.numeric(colnames(mat)), 5), 5, 5, byrow = TRUE), y = mat)
matplot can be used in the same way.
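An alternative way to build the x positions, shown here only as a sketch, is to index the numeric column names with col(mat). This avoids hard-coding the matrix dimensions. The name xs is illustrative.

```r
set.seed(123)
mat <- matrix(rnorm(25), 5, 5)
colnames(mat) <- seq(0.1, 0.5, length.out = 5)

# col(mat) holds the column index of every cell, so indexing the numeric
# column names with it yields one x position per cell (column-major order)
xs <- as.numeric(colnames(mat))[col(mat)]

# as.vector(mat) flattens the values in the same column-major order
plot(x = xs, y = as.vector(mat), pch = 20,
     xlab = "column name", ylab = "value")
```

The same xs vector works for a matrix of any size, since col() adapts to the dimensions of its argument.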

Rolling PCA and plotting proportional variance of principal components

I'm using the following code to perform PCA:
PCA <- prcomp(Ret1, center = TRUE, scale. = TRUE)
summary(PCA)
I get the following result:
#Importance of components:
# PC1 PC2 PC3 PC4
#Standard deviation 1.6338 0.9675 0.60446 0.17051
#Proportion of Variance 0.6673 0.2340 0.09134 0.00727
#Cumulative Proportion 0.6673 0.9014 0.99273 1.00000
What I would like to do is a rolling PCA for a specific window (e.g. 180 days). The result should be a matrix which shows the evolution of the "Proportion of Variance" of all principal components through time.
I tried it with
rollapply(Ret1, 180, prcomp)
but this doesn't work, and I have no idea how to save the "Proportion of Variance" for each time step in a matrix.
The output matrix should look like this:
# PC1 PC2 PC3 PC4
#Period 1 0.6673 0.2340 0.09134 0.00727
#Period 2 0.7673 0.1340 0.09134 0.00727
# ....
Here is a mini subset of my data Ret1:
Cats Dogs Human Frogs
2016-12-13 0.0084041063 6.518479e-03 6.096295e-04 5.781271e-03
2016-12-14 -0.0035340384 -8.150321e-03 4.418382e-04 -5.978296e-03
2016-12-15 0.0107522782 3.875708e-03 -1.784663e-02 3.012253e-03
2016-12-16 0.0033034130 -1.752174e-03 -1.753624e-03 -4.448850e-04
2016-12-17 0.0000000000 0.000000e+00 0.000000e+00 0.000000e+00
2016-12-18 0.0000000000 0.000000e+00 0.000000e+00 0.000000e+00
2016-12-19 0.0019876743 1.973190e-03 -8.577261e-03 1.996151e-03
2016-12-20 0.0033235161 3.630921e-03 -4.757395e-03 4.594355e-03
2016-12-21 0.0003401156 -2.460351e-03 3.708875e-03 -1.636413e-03
2016-12-22 -0.0010940147 -1.864724e-03 -7.991572e-03 -1.158029e-03
2016-12-23 -0.0005387228 1.250898e-03 -2.843725e-03 7.492594e-04
2016-12-24 0.0000000000 0.000000e+00 0.000000e+00 0.000000e+00
2016-12-25 0.0000000000 0.000000e+00 0.000000e+00 0.000000e+00
2016-12-26 0.0000000000 0.000000e+00 0.000000e+00 0.000000e+00
2016-12-27 0.0019465877 2.245918e-03 0.000000e+00 5.632058e-04
2016-12-28 0.0002396803 -8.391658e-03 8.307552e-03 -5.598988e-03
2016-12-29 -0.0020884556 -2.933868e-04 1.661246e-03 -7.010738e-04
2016-12-30 0.0026172923 -4.647865e-03 9.574997e-03 -2.889166e-03
I tried the following:
PCA <- function(x) {
  Output <- cumsum((apply((prcomp(x, center = TRUE, scale. = TRUE))$x, 2, var)) / sum(vars))
  return(Output)
}
window <- 10
data <- Ret1
result <- rollapply(data, window,PCA)
plot(result)
#Gives you the Proportion of Variance = cumsum((apply((prcomp(x,center = TRUE, scale. = TRUE))$x, 2, var))/sum(vars))
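First, the correct function for your purpose may be written as follows, using the $sdev result of prcomp. I have left out center = TRUE because it is the function default. Note, however, that scale. defaults to FALSE, so the function below does not standardize the columns; pass scale. = TRUE to prcomp if you want to match your original call.
PCA <- function(x) {
  oo <- prcomp(x)$sdev
  oo / sum(oo)
}
Now we can easily use sapply to perform the rolling operation: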
## for your mini dataset of 18 rows
window <- 10
n <- nrow(Ret1)
oo <- sapply(seq_len(n - window + 1), function (i) PCA(Ret1[i:(i + window - 1), ]))
oo <- t(oo) ## an extra transposition as `sapply` does `cbind`
# [,1] [,2] [,3] [,4]
# [1,] 0.5206345 0.3251099 0.12789683 0.02635877
# [2,] 0.5722264 0.2493518 0.14588631 0.03253553
# [3,] 0.6051199 0.1973694 0.16151859 0.03599217
# [4,] 0.5195527 0.2874197 0.16497219 0.02805543
# [5,] 0.5682829 0.3100708 0.09456654 0.02707977
# [6,] 0.5344804 0.3149862 0.08912882 0.06140464
# [7,] 0.5954948 0.2542775 0.10434155 0.04588616
# [8,] 0.5627977 0.2581071 0.13068875 0.04840648
# [9,] 0.6089650 0.2559285 0.11022974 0.02487672
Each column is a PC, while each row gives proportional variance for each component in that period.
To further plot the result, you can use matplot:
matplot(oo, type = "l", lty = 1, col = 1:4,
        xlab = "period", ylab = "proportional variance")
PCs 1-4 are drawn with colours 1:4, i.e., "black", "red", "green" and "blue".
Additional comments:
If you want to use zoo::rollapply, do
oo <- zoo::rollapply(Ret1, window, PCA, by.column = FALSE)
To be precise, I am reporting the proportional standard deviation. If you really want the proportional variance, change the PCA function to:
PCA <- function(x) {
  oo <- prcomp(x)$sdev ^ 2
  oo / sum(oo)
}
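For completeness, here is a self-contained sketch of the whole rolling computation in base R. It uses simulated returns in place of Ret1 (the matrix ret, its size and the helper name prop_var are assumptions for illustration) and passes scale. = TRUE explicitly, as in the original prcomp call.

```r
set.seed(1)
# Simulated stand-in for Ret1: 40 days x 4 series (hypothetical data)
ret <- matrix(rnorm(40 * 4, sd = 0.01), ncol = 4,
              dimnames = list(NULL, c("Cats", "Dogs", "Human", "Frogs")))

# Proportion of variance explained by each principal component
prop_var <- function(x) {
  v <- prcomp(x, center = TRUE, scale. = TRUE)$sdev^2
  v / sum(v)
}

window <- 10
starts <- seq_len(nrow(ret) - window + 1)

# One row per window, one column per principal component
res <- t(sapply(starts, function(i) {
  prop_var(ret[i:(i + window - 1), , drop = FALSE])
}))
colnames(res) <- paste0("PC", seq_len(ncol(res)))

head(res)
```

Each row of res sums to 1, and the columns are ordered from the largest to the smallest component because prcomp returns sdev in decreasing order. Note that scale. = TRUE fails if a column is constant within a window.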

Measuring bandwidth of a signal in R

I am trying to measure the bandwidth of a signal from the power spectra. I want to be able to extract the min and max values given a relative amplitude value. I have been using "seewave" to calculate the power spectra, and I can make a density plot, and provide the abline, but I cannot figure out how to get R to tell me where the abline intersects with the plot. I will need to change the relative amplitude values of interest, depending on the signal quality, but want to find a straightforward way to measure bandwidth using R. Thanks in advance!
power.spec <- spec(IBK.trill.1, flim=c(0,2))
pow.spec <- as.matrix(power.spec)
head(pow.spec)
# x y
# [1,] 0.000000000 0.007737077
# [2,] 0.007470703 0.029795630
# [3,] 0.014941406 0.021248476
# [4,] 0.022412109 0.015603801
# [5,] 0.029882813 0.014103307
# [6,] 0.037353516 0.014584454
freq <- pow.spec[1:2941,1]
head(freq)
# [1] 0.000000000 0.007470703 0.014941406 0.022412109 0.029882813 0.037353516
ampl <- pow.spec[,2]
head(ampl)
# [1] 0.007737077 0.029795630 0.021248476 0.015603801 0.014103307 0.014584454
plot(ampl ~ freq, type="l",xlim=c(0,2))
abline(h=0.45)
Save the results of the identification of "y" values that exceed your threshold:
wspec <- which( power.spec[, "y"] > 0.45)
Then use those indices to pull the corresponding "x" values and place vertical lines at the first and last of them:
abline( v= power.spec[ c( wspec[1], tail(wspec, 1) ) , "x"], col="blue" )
BTW, I used the original "power.spec" values rather than your as.matrix version because spec returns a matrix, so coercion is not needed. I tested this on the first example from the ?spec page, with the result stored as myspec. If you want to be more precise, you could take the mean of the "x" value where the threshold is first (or last) exceeded and the one just before (or after) it, which would be:
abline( v= c( mean( myspec[ c( wspec[1]-1, wspec[1]), "x"]) ,
              mean( myspec[ c( tail(wspec, 1), tail(wspec, 1)+1 ) , "x"]) ), col="blue" )
I did look at the differences with diff, and the typical separation in my example was
mean( diff(myspec[ , "x"]) )
[1] 0.0005549795
So I could have gone back and ahead by half that amount to get a reasonable estimate. (I used max(myspec[, "y"])/2 as my estimate for "half-height".)
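As a rough sketch of the idea above, the following example measures the bandwidth on a synthetic spectrum, so it runs without seewave. The Gaussian peak, the threshold and the helper cross are all assumptions for illustration. Instead of averaging neighbouring "x" values, it interpolates linearly between the two samples on either side of each crossing. It assumes a single band that does not touch the edges of the spectrum.

```r
# Synthetic spectrum: one Gaussian peak centred at 1 kHz (hypothetical data)
freq <- seq(0, 2, length.out = 401)
ampl <- exp(-((freq - 1)^2) / (2 * 0.1^2))

threshold <- 0.45
above <- which(ampl > threshold)   # indices where the spectrum exceeds the line
lo <- above[1]
hi <- tail(above, 1)

# Linear interpolation of the frequency at which the segment between
# samples i and j crosses the threshold
cross <- function(i, j) {
  freq[i] + (threshold - ampl[i]) * (freq[j] - freq[i]) / (ampl[j] - ampl[i])
}

f_low  <- cross(lo - 1, lo)    # rising edge
f_high <- cross(hi, hi + 1)    # falling edge
bandwidth <- f_high - f_low

c(f_low = f_low, f_high = f_high, bandwidth = bandwidth)
```

With a real spectrum, replace freq and ampl with the "x" and "y" columns of the matrix returned by spec. If the signal has several bands above the threshold, split above into runs of consecutive indices first.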

Extracting gap statistic info to identify K for Kmeans clustering

I was looking at the 'cluster' library which has the function 'clusGap' to extract the number of clusters for Kmeans clustering.
This is the code:
# Compute Gap statistic (http://web.stanford.edu/~hastie/Papers/gap.pdf)
computeGapStatistic <- function(data) {
  # Note: this uses the global shift_len_avg_data, not the `data` argument,
  # and `<<-` also stores the result in the global variable `gap`
  gap <<- clusGap(shift_len_avg_data, FUN = kmeans, K.max = 8, B = 3)
  if (ENABLE_PLOTS) {
    plot(gap, main = "Gap statistic for the Nursing shift data")
  }
  print(gap)
  return(gap)
}
Which gives me the following output when 'gap' is printed out:
> print(gap)
Clustering Gap statistic ["clusGap"].
B=3 simulated reference sets, k = 1..8
--> Number of clusters (method 'firstSEmax', SE.factor=1): 2
logW E.logW gap SE.sim
[1,] 8.702334 9.238385 0.53605067 0.007945542
[2,] 7.940133 8.544323 0.60418996 0.003790244
[3,] 7.772673 8.139836 0.36716303 0.005755805
[4,] 7.325798 7.849233 0.52343473 0.002732731
[5,] 7.233667 7.629954 0.39628748 0.003496058
[6,] 7.020220 7.439709 0.41948820 0.006451708
[7,] 6.707678 7.285907 0.57822872 0.002810682
[8,] 7.166932 7.150724 -0.01620749 0.004274151
and this is what the plot looks like (image not included).
Question:
How do I extract the number of clusters from the 'gap' variable? 'gap' seems to be a list. From the above output it seems to have found 2 clusters.
I figured this out on my own. This is what I used: with(gap, maxSE(Tab[, "gap"], Tab[, "SE.sim"]))
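To illustrate, the sketch below applies maxSE from the cluster package directly to the gap and SE.sim columns copied from the printed table above. The vector names gap_vals and se_vals are illustrative. The method and SE.factor are spelled out to match the settings reported in the printout.

```r
library(cluster)

# "gap" and "SE.sim" columns copied from the printed clusGap table
gap_vals <- c(0.53605067, 0.60418996, 0.36716303, 0.52343473,
              0.39628748, 0.41948820, 0.57822872, -0.01620749)
se_vals  <- c(0.007945542, 0.003790244, 0.005755805, 0.002732731,
              0.003496058, 0.006451708, 0.002810682, 0.004274151)

# Same rule as the printout: method 'firstSEmax', SE.factor = 1
k <- maxSE(f = gap_vals, SE.f = se_vals,
           method = "firstSEmax", SE.factor = 1)
k
```

Other rules such as "globalmax" or "Tibs2001SEmax" can be selected through the method argument and may suggest a different number of clusters.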
