add data labels to stripchart - r

I have made a stripchart with a threshold marked in red. I would like to label the point that falls to the left of the threshold, but can't seem to get the 'text' function working at all.
Here is the stripchart code:
stripchart(ctrls$`Staining Green`, method="jitter", pch=4, xlab='Staining Green', cex.lab=2)
abline(v=5,col=2,lty=3)
I first tried to filter only those samples below the threshold:
Staining.Green <- filter(QCcontrols, Staining.Green < 5)
then adding the text with
text(Staining.Green$`Staining Green` + 0.1, 1.1, labels = Staining.Green$Sample_Name, cex = 2)
This didn't add any text to the chart.
Then I tried labeling all the points, in case I was making it too complicated, with variations on:
text(ctrls$`Staining Green` + 0.1, 1.1, labels = ctrls$Sample_Name)
Again, no text added, and no error message.
Any suggestions greatly appreciated!
Update: my ctrls object is more complex than I realized - maybe this is tripping me up:
List of 17
$ Restoration : num [1:504] 0.0799 0.089 0.1015 0.1096 0.1092 ...
..- attr(*, "threshold")= num 0
$ Staining Green : num [1:504] 25.1 23.5 21.1 19.7 22.3 ...
..- attr(*, "threshold")= num 5
$ Staining Red : num [1:504] 39.8 40.9 36.9 33.2 33.2 ...
..- attr(*, "threshold")= num 5.......

Here is one example using the built in data set for airquality:
stripchart(airquality$Ozone,
main="Mean ozone in parts per billion at Roosevelt Island",
xlab="Parts Per Billion",
ylab="Ozone",
method="jitter",
col="orange",
pch=4
)
abline(v = 5, col = 2, lty = 3)
with(subset(airquality, Ozone < 5), text(Ozone, 1.1, labels = Ozone))
Plot
Data
The lowest values of Ozone are:
head(sort(airquality$Ozone), 5)
[1] 1 4 6 7 7
Edit:
Here's a quick demo with a list with a similar structure:
vec1 <- c(0.0799, 0.089, 0.1015, 0.1096, 0.1092)
attr(vec1, 'threshold') <- 4
vec2 <- c(25.1, 3, 21.1, 19.7, 22.3)
attr(vec2, 'threshold') <- 5
ctrls <- list(Restoration = vec1, `Staining Green` = vec2)
stripchart(ctrls$`Staining Green`,
method="jitter",
pch=4,
xlab='Staining Green',
cex.lab=2
)
abline(v=5,col=2,lty=3)
text(ctrls$`Staining Green`[ctrls$`Staining Green` < 5], 1.1, labels = ctrls$`Staining Green`[ctrls$`Staining Green` < 5])
Note: Instead of explicitly including 5 for threshold you can substitute the threshold from your list attribute:
attr(ctrls$`Staining Green`, "threshold")
[1] 5
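Putting the two ideas together, here is a minimal sketch that reads the threshold from the attribute and labels the points below it with sample names. The `sample_names` vector is hypothetical: it stands in for wherever Sample_Name lives in your data and is assumed to be in the same order as the values.

```r
# Toy data mirroring the structure of ctrls (values are made up)
vec2 <- c(25.1, 3, 21.1, 19.7, 22.3)
attr(vec2, "threshold") <- 5
ctrls <- list(`Staining Green` = vec2)
sample_names <- paste0("S", seq_along(vec2))  # hypothetical labels

thr   <- attr(ctrls$`Staining Green`, "threshold")
x     <- as.numeric(ctrls$`Staining Green`)   # plain numeric, attribute dropped
below <- x < thr                              # TRUE for points left of the line

stripchart(x, method = "jitter", pch = 4, xlab = "Staining Green")
abline(v = thr, col = 2, lty = 3)
# label only the points below the threshold, slightly above the strip
text(x[below], 1.1, labels = sample_names[below])
```

Because the labels are indexed with the same logical vector as the values, each label stays attached to its own point.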
Plot

Related

Colouring brain surface with heat map

library(rgl)
library(brainR)
template <- readNIfTI(system.file("MNI152_T1_2mm_brain.nii.gz",
package = "brainR"), reorient = FALSE)
misc3d::contour3d(template, level = 4500, alpha = .7, draw = T)
With the above code one can generate a 3D model of the brain.
The argument draw = FALSE asks contour3d to compute and return the contour surface as a triangle mesh object without drawing it.
a <- misc3d::contour3d(template, level = 4500, alpha = .7, draw = F)
str(a)
List of 10
$ v1 : num [1:110433, 1:3] 45 45 46 46 47 47 43 43 44 44 ...
$ v2 : num [1:110433, 1:3] 44.1 46 46 47 47 ...
$ v3 : num [1:110433, 1:3] 45 45 45 46 46 ...
$ color : chr "white"
$ color2 : logi NA
$ fill : logi TRUE
$ material: chr "default"
$ col.mesh: logi NA
$ alpha : num 0.7
$ smooth : num 0
- attr(*, "class")= chr "Triangles3D"
I would like to use the external surface of the above object to project heat maps, or say, colouring the surface... I also wonder how to determine certain positions in this model, e.g. EEG channel positions. Is it possible to generate only the surface with a$v1, a$v2 ... using function rgl:::surface3d? Thank you in advance,
I was able to draw the triangle mesh in colors with rgl. Some considerations:
I'm using rgl::triangles3d. With this function, points are taken in consecutive triplets, each triplet giving the three vertices of one triangle (see ?triangles3d), so I had to extract the points from v1, v2 and v3 and reorder them.
Colors are mapped to vertices, also taken in groups of three. I created a simple color_map based on the x position of each vertex. Of course, you will need to create your own color map.
Hope this helps.
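data <- misc3d::contour3d(template, level = 4500, alpha = .7, draw = FALSE)
# stack the three vertex matrices, then interleave the rows so that the
# vertices of each triangle (v1, v2, v3) are consecutive, as triangles3d expects
points <- rbind(data$v1, data$v2, data$v3)
n <- nrow(data$v1)
points <- points[order(rep(seq_len(n), 3), rep(1:3, each = n)), ]
# simple color map: three bands along the x axis
color_map <- c("red", "blue", "green")[cut(points[, 1], 3, labels = FALSE)]
rgl::open3d()
rgl::triangles3d(points, alpha = .7, color = color_map)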
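To make the reordering step easier to follow, here is a small sketch on two made-up triangles, assuming the same layout as the contour3d() result (row i of v1, v2 and v3 holds the three vertices of triangle i). The toy coordinates are invented for illustration only.

```r
# Two toy triangles stored the way contour3d() returns them
v1 <- rbind(c(0, 0, 0), c(2, 0, 0))
v2 <- rbind(c(1, 0, 0), c(3, 0, 0))
v3 <- rbind(c(0, 1, 0), c(2, 1, 0))

n   <- nrow(v1)
pts <- rbind(v1, v2, v3)
# sort by triangle index first, then by vertex index within the triangle
idx <- order(rep(seq_len(n), times = 3), rep(1:3, each = n))
pts <- pts[idx, ]

# one color per vertex, here banded by x position
cols <- c("red", "blue", "green")[cut(pts[, 1], 3, labels = FALSE)]
# rgl::triangles3d(pts, color = cols)  # uncomment to draw (needs rgl)
```

After the reordering, rows 1 to 3 of `pts` are the first triangle and rows 4 to 6 the second, which is the layout triangles3d reads.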
Edit
Coordinates x, y, z are on the scale of the index positions of the template array, i.e. the point c(5.4, 10.8, 30.1) corresponds to the array value near template[5, 10, 30], which should be close to level = 4500.

Why the red line in histogram is too short?

test <- NULL
for(i in 1:1000){
p <- rgamma(1,239,10)
yrep <- rpois (10,p)
test <- c(test,yrep)}
hist (test, xlab="T (yrep)", yaxt="n", cex=1,col = "yellow")
lines(rep(22,2), col="red", c(0,100))
print(mean(test<=22))
I got
But why does the red line not reach the top of the histogram? How can I edit my code so that the red line spans the full height?
You can try abline instead:
test <- NULL
for(i in 1:1000){
p <- rgamma(1,239,10)
yrep <- rpois (10,p)
test <- c(test,yrep)}
hist (test, xlab="T (yrep)", yaxt="n", cex=1,col = "yellow")
abline(v=22, col="red")
Vincent's answer fixes the problem using abline. But if you need to know how high to go (perhaps you don't want a full-height vertical line), then here's the "why":
First, hist(.) returns a list that includes some metadata about the histogram.
set.seed(42)
# test <- ...
h <- hist (test, xlab="T (yrep)", yaxt="n", cex=1,col = "yellow")
str(h)
# List of 6
# $ breaks : int [1:20] 8 10 12 14 16 18 20 22 24 26 ...
# $ counts : int [1:19] 18 70 191 405 812 1154 1554 1545 1358 1084 ...
# $ density : num [1:19] 0.0009 0.0035 0.00955 0.02025 0.0406 ...
# $ mids : num [1:19] 9 11 13 15 17 19 21 23 25 27 ...
# $ xname : chr "test"
# $ equidist: logi TRUE
# - attr(*, "class")= chr "histogram"
The y-axis is based on the $counts element, so we can see that it goes up to at least 1554.
Another way to see what the axis is doing is with
par("usr")
# [1] 6.48 47.52 -62.16 1616.16
This tells us that the x-axis ranges from 6.48 to 47.52 and the y-axis from -62.16 to 1616.16. (The y range includes negative values because, by default, R extends each axis by 4% at both ends.) From this, you know that your line needs to span from 0 (or -62.16 if you want to start at the true bottom) to 1616.16 (or round up). The original line only went up to y = 100, far below the tallest bar. Note also that a line drawn up to max(h$counts) ends at the top of the tallest bar, not at the top of the plotted region.
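Both options can be sketched as follows. This is only an illustration under the same simulation as in the question; the vectorised loop replacement and the blue dashed styling are my own choices, not part of the original code.

```r
set.seed(42)
# same simulation as in the question, without growing a vector in a loop
test <- unlist(lapply(1:1000, function(i) rpois(10, rgamma(1, 239, 10))))

h <- hist(test, xlab = "T (yrep)", yaxt = "n", col = "yellow")

# Option 1: stop at the top of the tallest bar
top <- max(h$counts)
lines(c(22, 22), c(0, top), col = "red")

# Option 2: span the whole plotting region, like abline(v = 22)
usr <- par("usr")  # c(xmin, xmax, ymin, ymax) of the plot region
segments(22, usr[3], 22, usr[4], col = "blue", lty = 2)
```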

Drawing Dendrogram using R with Agglomerative hierarchical clustering (AHC) techniques with Complete link method

I have calculated the Distance matrix with the complete link method as shown in the image below:
The pairwise distances between the clusters are
{0.5,1.12,1.5,3.61}
But when I run the same matrix through R with the code below:
Matrix
x1,x2,x3,x4,x5
0,0.5,2.24,3.35,3
0.5,0,2.5,3.61,3.04
2.24,2.5,0,1.12,1.41
3.35,3.61,1.12,0,1.5
3,3.04,1.41,1.5,0
Implementation:
library(cluster)
dt<-read.csv("cluster.csv")
df<-scale(dt[-1])
dc<-dist(df,method = "euclidean")
hc1 <- hclust(dc, method = "complete" )
plot(hc1, labels = c("x1", "x2","x3","x4","x5"),
hang = 0.1,
main = "Cluster dendrogram", sub = NULL,
xlab = NULL, ylab = "Height")
abline(h = hc1$height, lty = 2, col = "lightgrey")
str(hc1)
List of 7
$ merge : int [1:4, 1:2] -1 -3 -5 1 -2 -4 2 3
$ height : num [1:4] 0.444 1.516 1.851 3.753
$ order : int [1:5] 1 2 5 3 4
$ labels : NULL
$ method : chr "complete"
$ call : language hclust(d = dc, method = "complete")
$ dist.method: chr "euclidean"
- attr(*, "class")= chr "hclust"
I got the heights: 0.444 1.516 1.851 3.753
This means the dendrogram differs between the two approaches. Why is that? Have I done something wrong in one of them?
Since the provided matrix is already a Euclidean distance matrix, I don't need to compute distances again with dist(). Instead, I should convert the data.frame to a matrix and pass it through as.dist(m).
The code below gives exactly the result obtained from the hand calculation:
library(reshape)
dt<-read.csv("C:/Users/Aakash/Desktop/cluster.csv")
m <- as.matrix(dt)
hc1 <- hclust(as.dist(m), method = "complete" )
plot(hc1, labels = c("x1", "x2","x3","x4","x5"),
hang = 0.1,
main = "Complete Method Dendrogram", sub = NULL,
xlab = "Items", ylab = "Height")
abline(h = hc1$height, lty = 2, col = "lightgrey")
str(hc1)
$ height : num [1:4] 0.5 1.12 1.5 3.61
Obtained dendrogram:
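For a version that does not depend on the CSV file, here is a self-contained sketch that types the distance matrix from the question in directly. The variable names `labs` and `hc` are just illustrative.

```r
labs <- c("x1", "x2", "x3", "x4", "x5")
# the distance matrix from the question, entered row by row
m <- matrix(c(0,    0.5,  2.24, 3.35, 3,
              0.5,  0,    2.5,  3.61, 3.04,
              2.24, 2.5,  0,    1.12, 1.41,
              3.35, 3.61, 1.12, 0,    1.5,
              3,    3.04, 1.41, 1.5,  0),
            nrow = 5, byrow = TRUE, dimnames = list(labs, labs))

# as.dist() treats m as distances; dist() would instead treat its rows
# as coordinates and compute new distances between them
hc <- hclust(as.dist(m), method = "complete")
hc$height

plot(hc, hang = 0.1, main = "Complete linkage", xlab = "Items", ylab = "Height")
abline(h = hc$height, lty = 2, col = "lightgrey")
```

The merge heights match the hand calculation: x1 and x2 join at 0.5, x3 and x4 at 1.12, x5 joins {x3, x4} at max(1.41, 1.5) = 1.5, and the two remaining clusters join at 3.61.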

Plot from package "lomb" in ggplot2

I am using the package "lomb" to calculate Lomb-Scargle Periodograms, a method for analysing biological time series data. The package does create a plot if you tell it to do so. However, the plots are not too nice (compared to ggplot2 plots). Therefore, I would like to plot the results with ggplot. However, I do not know how to access the function for the curve plotted...
This is a sample code for a plot:
TempDiff <- runif(4033, 3.0, 18) # just generate random numbers
Time2 <- seq(1,4033) # Time vector
Rand.LombScargle <- randlsp(repeats=10, TempDiff, times = Time2, from = 12, to = 36,
type = c("period"), ofac = 10, alpha = 0.01, plot = T,
trace = T, xlab="period", main = "Lomb-Scargle Periodogram")
I have also tried to find out something about the function looking into the function randlsp itself, but could not really find anything that seemed useful to me there...
getAnywhere(randlsp)
A single object matching ‘randlsp’ was found
It was found in the following places
package:lomb
namespace:lomb
with value
function (repeats = 1000, x, times = NULL, from = NULL, to = NULL,
type = c("frequency", "period"), ofac = 1, alpha = 0.01,
plot = TRUE, trace = TRUE, ...)
{
if (is.ts(x)) {
x = as.vector(x)
}
if (!is.vector(x)) {
times <- x[, 1]
x <- x[, 2]
}
if (plot == TRUE) {
op <- par(mfrow = c(2, 1))
}
realres <- lsp(x, times, from, to, type, ofac, alpha, plot = plot,
...)
realpeak <- realres$peak
pks <- NULL
if (trace == TRUE)
cat("Repeats: ")
for (i in 1:repeats) {
randx <- sample(x, length(x))
randres <- lsp(randx, times, from, to, type, ofac, alpha,
plot = F)
pks <- c(pks, randres$peak)
if (trace == TRUE) {
if (i/10 == floor(i/10))
cat(i, " ")
}
}
if (trace == TRUE)
cat("\n")
prop <- length(which(pks >= realpeak))
p.value <- prop/repeats
if (plot == TRUE) {
mx = max(c(pks, realpeak)) * 1.25
hist(pks, xlab = "Peak Amplitude", xlim = c(0, mx), main = paste("P-value: ",
p.value))
abline(v = realpeak)
par(op)
}
res = realres[-(8:9)]
res = res[-length(res)]
res$random.peaks = pks
res$repeats = repeats
res$p.value = p.value
class(res) = "randlsp"
return(invisible(res))
}
Any idea will be appreciated!
Best,
Christine
PS: Here an example of the plot with real data.
The key to getting ggplot graphs out of any returned object is to convert the data that you need into some sort of data.frame. To do this, you can look at what kind of object your returned value is and see what sort of data you can immediately extract into a data.frame:
str(Rand.LombScargle) # get the data type and structure of the returned value
List of 12
$ scanned : num [1:2241] 12 12 12 12 12 ...
$ power : num [1:2241] 0.759 0.645 0.498 0.341 0.198 ...
$ data : chr [1:2] "times" "x"
$ n : int 4033
$ type : chr "period"
$ ofac : num 10
$ n.out : int 2241
$ peak : num 7.25
$ peak.at : num [1:2] 24.6908 0.0405
$ random.peaks: num [1:10] 4.99 9.82 7.03 7.41 5.91 ...
$ repeats : num 10
$ p.value : num 0.3
- attr(*, "class")= chr "randlsp"
In the case of randlsp, it's a list, which is what statistical functions usually return. Most of this information can also be obtained from ?randlsp.
It looks as if Rand.LombScargle$scanned and Rand.LombScargle$power contain most of what is needed for the first graph.
There is also a horizontal line on the Periodogram, but it doesn't correspond to anything that was returned by randlsp. Looking at the source code that you provided, it looks as if the Periodogram is actually generated by lsp().
LombScargle <- lsp( TempDiff, times = Time2, from = 12, to = 36,
type = c("period"), ofac = 10, alpha = 0.01, plot = F)
str(LombScargle)
List of 12
$ scanned : num [1:2241] 12 12 12 12 12 ...
$ power : num [1:2241] 0.759 0.645 0.498 0.341 0.198 ...
$ data : chr [1:2] "Time2" "TempDiff"
$ n : int 4033
$ type : chr "period"
$ ofac : num 10
$ n.out : int 2241
$ alpha : num 0.01
$ sig.level: num 10.7
$ peak : num 7.25
$ peak.at : num [1:2] 24.6908 0.0405
$ p.value : num 0.274
- attr(*, "class")= chr "lsp"
I am guessing that, based on this data, the line indicates the significance level, LombScargle$sig.level.
Putting this together, we can create our data to pass to ggplot from lsp:
library(ggplot2)
lomb.df <- data.frame(period=LombScargle$scanned, power=LombScargle$power)
# use the data frame to set up the line plot
g <- ggplot(lomb.df, aes(period, power)) + geom_line() +
labs(y="normalised power", title="Lomb-Scargle Periodogram")
# add the sig.level horizontal line
g + geom_hline(yintercept=LombScargle$sig.level, linetype="dashed")
For the histogram, it looks like this is based on the vector Rand.LombScargle$random.peaks from randlsp:
rpeaks.df <- data.frame(peaks=Rand.LombScargle$random.peaks)
ggplot(rpeaks.df, aes(peaks)) +
geom_histogram(binwidth=1, fill="white", colour="black") +
geom_vline(xintercept=Rand.LombScargle$peak, linetype="dashed") +
xlim(c(0,12)) +
labs(title=paste0("P-value: ", Rand.LombScargle$p.value),
x="Peak Amplitude",
y="Frequency")
Play around with these graphs to get them looking to your taste.
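If you want to try the list-to-data.frame step without installing lomb, here is a rough sketch using a stand-in list. The object `fake_lsp` is hypothetical: its field names follow the str() output above, but the numbers are invented, and the plotting part only runs if ggplot2 is available.

```r
# Stand-in for the list returned by lsp(); values are made up
fake_lsp <- list(
  scanned   = seq(12, 36, length.out = 100),
  power     = abs(sin(seq(0, 6, length.out = 100))) * 8,
  sig.level = 10.7,
  type      = "period"
)

# keep only the per-point vectors; scalars such as sig.level stay in the list
lomb.df <- data.frame(period = fake_lsp$scanned, power = fake_lsp$power)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  g <- ggplot(lomb.df, aes(period, power)) +
    geom_line() +
    geom_hline(yintercept = fake_lsp$sig.level, linetype = "dashed")
  print(g)
}
```

The same pattern should carry over to other list-returning functions: equal-length vectors become columns, and single values are passed separately to layers such as geom_hline.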

Draw 3D plot of two classes according to 3 variables with R

I have an R data.frame:
> str(trainTotal)
'data.frame': 1000 obs. of 41 variables:
$ V1 : num 0.299 -1.174 1.192 1.573 -0.613 ...
$ V2 : num -1.227 0.332 -0.414 -0.58 -0.644 ...
etc.
$ V40 : num 0.101 -1.818 2.987 1.883 0.408 ...
$ Class: int 1 0 0 1 0 1 0 1 1 0 ...
and I would like to draw a 3D scatter plot of Class "0" in blue and Class "1" in red according to V13, V5, and V24.
V13, V5, V24 are the top variables when sorted by scaled variance, so my intuition tells me the 3D visualization could be interesting. Not sure if that makes sense.
How can I plot this with R ?
Edit:
I have tried the following:
install.packages("Rcmdr")
library(Rcmdr)
scatter3d(x=trainTotal[[13]], y= trainTotal[[5]], z= trainTotal[[24]], point.col = as.numeric(as.factor(trainTotal[,41])), size = 10)
which gives me this plot:
I am not sure how to read this plot.
I would prefer to see only dots of two colors, for a start.
Maybe something like this? Using scatterplot3d.
library(scatterplot3d)
#random data
DF <- data.frame(V13 = sample(1:100, 10, T), V5 = sample(1:100, 10, T), V24 = sample(1:100, 10, T), class = sample(0:1, 10, T))
#plot
scatterplot3d(x = DF$V13, y = DF$V5, z = DF$V24, color = c("blue", "red")[as.factor(DF$class)], pch = 19)
This gives:
In scatterplot3d there is also an angle argument for different views.
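Applied to a data frame shaped like the one in the question, this could look roughly as follows. The `trainTotal` below is a small random stand-in containing only the columns used, and the plotting call is skipped when scatterplot3d is not installed.

```r
set.seed(1)
# stand-in for trainTotal: only the three variables of interest plus Class
trainTotal <- data.frame(
  V5    = rnorm(20),
  V13   = rnorm(20),
  V24   = rnorm(20),
  Class = rep(0:1, 10)
)

# Class 0 in blue, Class 1 in red
cols <- ifelse(trainTotal$Class == 1, "red", "blue")

if (requireNamespace("scatterplot3d", quietly = TRUE)) {
  scatterplot3d::scatterplot3d(
    x = trainTotal$V13, y = trainTotal$V5, z = trainTotal$V24,
    color = cols, pch = 19,
    xlab = "V13", ylab = "V5", zlab = "V24"
  )
}
```

Selecting columns by name rather than by position keeps the call correct even if the column order changes.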
Perspective issues mean that static 3d plots are mostly horrible and misleading. If you really want a 3d scatterplot, it's best to draw one where you can view it from different angles. The rgl package allows this.
EDIT: I've updated the plot to use colours, in this case picked using the colorspace package, though you can define them however you like. Specifying attributes for points is described on the ?rgl.material help page.
library(rgl)
library(colorspace)
n_points <- 50
n_groups <- 5
some_data <- data.frame(
x = seq(0, 1, length.out = n_points),
y = runif(n_points),
z = rnorm(n_points),
group = gl(n_groups, n_points / n_groups)
)
colors <- rainbow_hcl(n_groups)
with(some_data, points3d(x, y, z, color = colors[group], size = 7))
axes3d()
