I am trying to draw several lines in the same plot. The x values are dates and the y values are numbers. I first load the data, store it in a list, and record the minimum and maximum values for the closing price and the date:
stocks <- list()
stocks.min <- 0
stocks.max <- 0
stocks.min.date <- NULL
stocks.max.date <- NULL
for (name in names(files))
{
stocks[[name]] <- read.csv(files[[name]], sep=";")
# Convert to date in R
stocks[[name]]$Date <- as.Date(stocks[[name]]$Date, "%d/%m/%Y")
# Sets max value for ylim in the plotting
if (stocks.max < max(stocks[[name]]$Close))
{
stocks.max <- max(stocks[[name]]$Close)
}
# Sets the date value for the xlim in the plot
if (is.null(stocks.min.date) || min(stocks[[name]]$Date) < stocks.min.date)
{
stocks.min.date <- min(stocks[[name]]$Date)
}
if (is.null(stocks.max.date) || max(stocks[[name]]$Date) > stocks.max.date)
{
stocks.max.date <- max(stocks[[name]]$Date)
}
}
After that I create an empty plot using the values from above:
plot(0, xlab="Time", ylab="Closing Prices", main="Stock Values",
xlim=c(stocks.min.date, stocks.max.date), ylim=c(stocks.min, stocks.max))
And then I add the lines with the data:
for (name in names(stocks))
{
lines(x=stocks[[name]]$Date, y=stocks[[name]]$Close, col=colors[[name]], type="l",
lwd=2)
}
When the graph is plotted, the data is displayed correctly, but the x axis shows numbers instead of dates.
How can I correct this issue?
I would strongly suggest plotting normalized series for stock data like yours. quantmod helps a lot here. It serves two purposes:
Get the x-axis labels as dates.
Normalize the series so that you can view any number of them without worrying about the magnitude of their absolute values (~67 for INR, ~1120 for KRW, and so on).
This is what I generally use for my purposes.
library(quantmod)
tickers <- c('GOOG', 'MSFT', 'AAPL', 'AMZN')
getSymbols(tickers, src = 'yahoo', from = '2015-01-01')
normalise <- function(x) x/as.numeric(x)[1] - 1
# Use a separate name so the chart_theme() function is not shadowed
my_theme <- chart_theme()
my_theme$col$line.col <- "red"
chart_Series(normalise(Cl(GOOG)), theme = my_theme)
add_TA(normalise(Cl(MSFT)), on = 1, col = "black", lty = 1)
add_TA(normalise(Cl(AMZN)), on = 1, col = "blue", lty =1)
add_TA(normalise(Cl(AAPL)), on = 1, col = "darkgreen", lty =2)
Hope this helps.
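For the base-graphics code in the question, a likely cause is that plot(0, ...) sets up a numeric x axis, so the Date limits are shown as day counts. Below is a minimal sketch of one way around this, assuming made-up data in place of the CSV files (the stocks and colors values are hypothetical): pass Date values as x with type = "n", so the axis is labelled with dates, then add the lines as before.

```r
# Hypothetical stand-in for the data read from the CSV files
stocks <- list(
  A = data.frame(Date = as.Date("2015-01-01") + 0:9, Close = 10:19),
  B = data.frame(Date = as.Date("2015-01-05") + 0:9, Close = 30:21)
)
colors <- list(A = "red", B = "blue")

# Overall ranges across all series
date_range  <- range(do.call(c, unname(lapply(stocks, function(s) s$Date))))
close_range <- range(unlist(lapply(stocks, function(s) s$Close)))

# Empty plot: because x is of class Date, the x axis is labelled with dates
plot(x = date_range, y = close_range, type = "n",
     xlab = "Time", ylab = "Closing Prices", main = "Stock Values")

# Add one line per series
for (name in names(stocks)) {
  lines(x = stocks[[name]]$Date, y = stocks[[name]]$Close,
        col = colors[[name]], lwd = 2)
}
```

If more control over the tick positions is needed, the axis can be suppressed with xaxt = "n" and drawn afterwards with axis.Date().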
I am generating a landscape pattern that evolves over time. The problem is that I have clearly defined a window for the object that raises the error, but the window is not being recognised. I also do not see how any points could be falling outside the window, or why that would make a difference.
library(spatstat)
library(dplyr)
# Define the window
win <- owin(c(0, 100), c(0, 100))
# Define the point cluster
cluster1 <- rMatClust(kappa = 0.0005, scale = 0.1, mu = 20,
win = win, center = c(5,5))
# define the spread of the points
spread_rate <- 1
new_nests_per_year<-5
years<-10
# Plot the initial cluster
plot(win, main = "Initial cluster")
points(cluster1, pch = 20, col = "red")
newpoints<-list()
# Loop for n years
for (i in 1:years) {
# Generate new points that spread from the cluster
# The first element becomes column "x" and the second becomes column "y" below
newpoints[[1]] <-rnorm(new_nests_per_year, mean = centroid.owin(cluster1)$x, sd = spread_rate)
newpoints[[2]] <-rnorm(new_nests_per_year, mean = centroid.owin(cluster1)$y, sd = spread_rate)
# Convert the list to a data frame
newpoints_df <- data.frame(newpoints)
# Rename the columns of the data frame
colnames(newpoints_df) <- c("x", "y")
# Combine the new points with the existing points
cluster1_df <- data.frame(cluster1)
newtotaldf<-bind_rows(cluster1_df,newpoints_df)
cluster1<-as.ppp(newtotaldf, x = newtotaldf$x, y = newtotaldf$y,
window = win)
# Plot the updated cluster
plot(win, main = paste("Cluster after year", i))
points(cluster1, pch = 20, col = "red")
}
However, when I run the line:
cluster1<-as.ppp(newtotaldf, x = newtotaldf$x, y = newtotaldf$y,
window = win)
I receive the error:
Error: x,y coords given but no window specified
Why would this be the case?
In as.ppp, the window argument for a data frame is named W, so using W = win instead of window = win should solve the issue. I also believe you can simplify the call by not specifying x and y:
## ...[previous code]...
cluster1 <- as.ppp(newtotaldf, W = win)
plot(win)
points(cluster1, pch = 20, col = "red")
I was wondering if anyone knows of a package that allows partial row labeling of heatmaps. I am currently using pheatmap() to construct my heatmaps, but I can use any package that has this functionality.
I have plots with many rows of differentially expressed genes and I would like to label a subset of them. There are two main things to consider (that I can think of):
The placement of the text annotation depends on the height of the row. If the rows are too narrow, then the text label will be ambiguous without some sort of pointer.
If multiple adjacent rows are significant (i.e. will be labelled), then these will need to be offset, and again, a pointer will be needed.
Below is an example of a partial solution that really only gets maybe halfway there, but I hope illustrates what I'd like to be able to do.
set.seed(1)
require(pheatmap)
require(RColorBrewer)
require(grid)
### Data to plot
data_mat <- matrix(sample(1:10000, 300), nrow = 50, ncol = 6)
rownames(data_mat) <- paste0("Gene", 1:50)
colnames(data_mat) <- c(paste0("A", 1:3), paste0("B", 1:3))
### Set how many genes to annotate
### TRUE - make enough labels that some overlap
### FALSE - no overlap
tooMany <- T
### Select a few genes to annotate
if (tooMany) {
sigGenes_v <- paste0("Gene", c(5,20,26,42,47,16,28))
newMain_v <- "Too Many Labels"
} else {
sigGenes_v <- paste0("Gene", c(5,20,26,42))
newMain_v <- "OK Labels"
}
### Make color list
colors_v <- brewer.pal(8, "Dark2")
colors_v <- colors_v[c(1:length(sigGenes_v), 8)]
names(colors_v) <- c(sigGenes_v, "No")
annColors_lsv <- list("Sig" = colors_v)
### Column Metadata
colMeta_df <- data.frame(Treatment = c(rep("A", 3), rep("B", 3)),
Replicate = c(rep(1:3, 2)),
stringsAsFactors = F,
row.names = colnames(data_mat))
### Row metadata
rowMeta_df <- data.frame(Sig = rep("No", 50),
stringsAsFactors = F,
row.names = rownames(data_mat))
for (gene_v in sigGenes_v) rowMeta_df[rownames(rowMeta_df) == gene_v, "Sig"] <- gene_v
### Heatmap
heat <- pheatmap(data_mat,
annotation_row = rowMeta_df,
annotation_col = colMeta_df,
annotation_colors = annColors_lsv,
cellwidth = 10,
main = "Original Heat")
### Get order of genes after clustering
genesInHeatOrder_v <- heat$tree_row$labels[heat$tree_row$order]
whichSigInHeatOrder_v <- which(genesInHeatOrder_v %in% sigGenes_v)
whichSigInHeatOrderLabels_v <- genesInHeatOrder_v[whichSigInHeatOrder_v]
sigY <- 1 - (0.02 * whichSigInHeatOrder_v)
### Change title
whichMainGrob_v <- which(heat$gtable$layout$name == "main")
heat$gtable$grobs[[whichMainGrob_v]] <- textGrob(label = newMain_v,
gp = gpar(fontsize = 16))
### Remove rows
whichRowGrob_v <- which(heat$gtable$layout$name == "row_names")
heat$gtable$grobs[[whichRowGrob_v]] <- textGrob(label = whichSigInHeatOrderLabels_v,
y = sigY,
vjust = 1)
grid.newpage()
grid.draw(heat)
Here are a few outputs (images not shown): the original heatmap; OK labels; OK labels with flags; too many labels; too many labels with flags.
The "with flags" outputs are the desired final results.
I just saved these as images from the Rstudio plot viewer. I recognize that I could save them as pdfs and provide a larger file size to get rid of the label overlap, but then the individual cells would be larger than I want.
Based on your code, you seem fairly comfortable with gtables & grobs. A (relatively) straightforward way to achieve the look you want is to zoom in on the row label grob, & make some changes there:
replace unwanted labels with "";
evenly spread out labels within the available space;
add line segments joining the old and new label positions.
I wrote a wrapper function for this, which works as follows:
# heat refers to the original heatmap produced from the pheatmap() function
# kept.labels should be a vector of labels you wish to show
# repel.degree is a number in the range [0, 1], controlling how much the
# labels are spread out from one another
add.flag(heat,
kept.labels = sigGenes_v,
repel.degree = 0)
add.flag(heat,
kept.labels = sigGenes_v,
repel.degree = 0.5)
add.flag(heat,
kept.labels = sigGenes_v,
repel.degree = 1)
Function (explanations in annotations):
add.flag <- function(pheatmap,
kept.labels,
repel.degree) {
# repel.degree = number within [0, 1], which controls how much
# space to allocate for repelling labels.
## repel.degree = 0: spread out labels over existing range of kept labels
## repel.degree = 1: spread out labels over the full y-axis
heatmap <- pheatmap$gtable
new.label <- heatmap$grobs[[which(heatmap$layout$name == "row_names")]]
# keep only labels in kept.labels, replace the rest with ""
new.label$label <- ifelse(new.label$label %in% kept.labels,
new.label$label, "")
# calculate evenly spaced out y-axis positions
repelled.y <- function(d, d.select, k = repel.degree){
# d = vector of distances for labels
# d.select = vector of T/F for which labels are significant
# recursive function to get current label positions
# (note the unit is "npc" for all components of each distance)
strip.npc <- function(dd){
if(!"unit.arithmetic" %in% class(dd)) {
return(as.numeric(dd))
}
d1 <- strip.npc(dd$arg1)
d2 <- strip.npc(dd$arg2)
fn <- dd$fname
return(lazyeval::lazy_eval(paste(d1, fn, d2)))
}
full.range <- sapply(seq_along(d), function(i) strip.npc(d[i]))
selected.range <- sapply(seq_along(d[d.select]), function(i) strip.npc(d[d.select][i]))
return(unit(seq(from = max(selected.range) + k*(max(full.range) - max(selected.range)),
to = min(selected.range) - k*(min(selected.range) - min(full.range)),
length.out = sum(d.select)),
"npc"))
}
new.y.positions <- repelled.y(new.label$y,
d.select = new.label$label != "")
new.flag <- segmentsGrob(x0 = new.label$x,
x1 = new.label$x + unit(0.15, "npc"),
y0 = new.label$y[new.label$label != ""],
y1 = new.y.positions)
# shift position for selected labels
new.label$x <- new.label$x + unit(0.2, "npc")
new.label$y[new.label$label != ""] <- new.y.positions
# add flag to heatmap
heatmap <- gtable::gtable_add_grob(x = heatmap,
grobs = new.flag,
t = 4,
l = 4
)
# replace label positions in heatmap
heatmap$grobs[[which(heatmap$layout$name == "row_names")]] <- new.label
# plot result
grid.newpage()
grid.draw(heatmap)
# return a copy of the heatmap invisibly
invisible(heatmap)
}
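The spacing rule inside repelled.y can be looked at in isolation. The sketch below applies the same seq() formula to plain numeric positions rather than grid units; spread_positions and the example values are hypothetical and only meant to illustrate how repel.degree widens the range.

```r
# Evenly spread the selected positions; k plays the role of repel.degree
spread_positions <- function(y, selected, k = 0) {
  sel <- y[selected]
  seq(from = max(sel) + k * (max(y) - max(sel)),
      to   = min(sel) - k * (min(sel) - min(y)),
      length.out = sum(selected))
}

# Hypothetical row centres (npc-like values), listed from top to bottom
y <- c(0.9, 0.7, 0.5, 0.3, 0.1)
selected <- c(FALSE, TRUE, TRUE, FALSE, FALSE)

pos_k0  <- spread_positions(y, selected, k = 0)    # range of the selected labels only
pos_k05 <- spread_positions(y, selected, k = 0.5)  # halfway towards the full range
pos_k1  <- spread_positions(y, selected, k = 1)    # full range of y
```

With k = 0 the labels stay between the highest and lowest selected rows, and with k = 1 they are spread over the whole y range, which matches the comments at the top of add.flag.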
I have made a loop for making multiple plots, but I have no way of saving them. My code looks like this:
#----------------------------------------------------------------------------------------#
# RING data: Mikkel
#----------------------------------------------------------------------------------------#
# Packages assumed for the functions used below
library(doBy)      # summaryBy()
library(reshape2)  # melt() (assumption: could also come from the reshape package)
library(lattice)   # dotplot()
# Set working directory
setwd()
#### Read data & Converting factors ####
dat <- read.table("Complete RING.txt", header =TRUE)
str(dat)
dat$Vial <- as.factor(dat$Vial)
dat$Line <- as.factor(dat$Line)
dat$Fly <- as.factor(dat$Fly)
dat$Temp <- as.factor(dat$Temp)
str(dat)
datSUM <- summaryBy(X0.5_sec+X1_sec+X1.5_sec+X2_sec+X2.5_sec+X3_sec~Vial_nr+Concentration+Sex+Line+Vial+Temp,data=dat, FUN=sum)
fl<-levels(datSUM$Line)
colors = c("#e41a1c", "#377eb8", "#4daf4a", "#984ea3")
meltet <- melt(datSUM, id=c("Concentration","Sex","Line","Vial", "Temp", "Vial_nr"))
levels(meltet$variable) <- c('0,5 sec', '1 sec', '1,5 sec', '2 sec', '2,5 sec', '3 sec')
meltet20 <- subset(meltet, Line=="20")
meltet20$variable <- as.factor(meltet20$variable)
AllConcentrations <- levels(meltet20$Concentration)
for (i in AllConcentrations) {
meltet.i <- meltet20[meltet20$Concentration ==i,]
quartz()
print(dotplot(value~variable|Temp, group=Sex, data = meltet.i ,xlab="Time", ylab="Total height pr vial [mm above bottom]", main=paste('Line 20 concentration ', meltet.i$Concentration[1]),
key = list(points = list(col = colors[1:2], pch = c(1, 2)),
text = list(c("Female", "Male")),
space = "top"), col = colors, pch =c(1, 2))) }
I have tried the quartz.save function, but that just overwrites the files. I'm using a Mac, if that makes any difference.
When I want to save multiple plots in a loop, I tend to do something like this:
for(i in AllConcentrations){
meltet.i <- meltet20[meltet20$Concentration ==i,]
pdf(paste("my_filename", i, ".pdf", sep = ""))
# lattice plots must be print()ed explicitly inside a loop, otherwise the file stays empty
print(dotplot(value~variable|Temp, group=Sex, data = meltet.i ,xlab="Time", ylab="Total height pr vial [mm above bottom]", main=paste('Line 20 concentration ', meltet.i$Concentration[1]),
key = list(points = list(col = colors[1:2], pch = c(1, 2)),
text = list(c("Female", "Male")),
space = "top"), col = colors, pch =c(1, 2)))
dev.off()
}
This creates one PDF file for every level in AllConcentrations and saves it in your working directory. The file name pastes together my_filename, the current level i, and .pdf, which makes each file name unique. You will probably want to adjust height and width in the pdf() call.
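The same pattern can be tried without the original data. Here is a minimal sketch, assuming a small made-up data frame and base graphics instead of lattice; dat, out_dir and the file name prefix are hypothetical. It builds one file name per level and sets the page size explicitly.

```r
# Hypothetical stand-in data: one plot per concentration level
dat <- data.frame(Concentration = factor(rep(c("0", "5", "10"), each = 4)),
                  time  = rep(1:4, times = 3),
                  value = c(1, 2, 3, 4, 2, 3, 5, 6, 1, 1, 2, 2))
out_dir <- tempdir()   # assumption: write to a temporary directory

files <- character(0)
for (lvl in levels(dat$Concentration)) {
  dat_lvl <- dat[dat$Concentration == lvl, ]
  # Unique file name per level, so earlier files are not overwritten
  file <- file.path(out_dir, sprintf("concentration_%s.pdf", lvl))
  pdf(file, width = 7, height = 5)   # page size in inches
  plot(dat_lvl$time, dat_lvl$value, type = "b",
       xlab = "Time", ylab = "Value",
       main = paste("Concentration", lvl))
  dev.off()                          # close the device to finish writing the file
  files <- c(files, file)
}
```

Base graphics draw as soon as plot() is called, so no print() is needed here; with lattice or ggplot2 objects the plot call inside the loop would have to be wrapped in print().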
I've written code to plot density data for variations of an A/B test. I'd like to improve the visual by shading (with the fill being slightly transparent) the area below each curve. I'm currently using matplot, but understand ggplot might be a better option.
Any ideas? Thanks.
# Setup data frame - these are results from an A/B experiment
conv_data = data.frame(
VarNames = c("Variation 1", "Variation 2", "Variation 3") # Set variation names
,NumSuccess = c(1,90,899) # Set number of successes / conversions
,NumTrials = c(10,100,1070) # Set number of trials
)
conv_data$NumFailures = conv_data$NumTrials - conv_data$NumSuccess # Set number of failures [no conversions]
num_var = NROW(conv_data) # Set total number of variations
plot_col = rainbow(num_var) # Set plot colors
get_density_data <- function(n_var, s, f) {
x = seq(0,1,length.out=100) # 0.01,0.02,0.03...1
dens_data = matrix(data = NA, nrow=length(x), ncol=(n_var+1))
dens_data[,1] = x
# set density data
for(j in 1:n_var) {
# +1 to s[], f[] to ensure uniform prior
dens_data[,j+1] = dbeta(x, s[j]+1, f[j]+1)
}
return(dens_data)
}
density_data = get_density_data(num_var, conv_data$NumSuccess, conv_data$NumFailures)
matplot(density_data[,1]*100, density_data[,-1], type = "l", lty = 1, col = plot_col, ylab = "Probability Density", xlab = "Conversion Rate %", yaxt = "n")
legend("topleft", col=plot_col, legend = conv_data$VarNames, lwd = 1)
This produces the following plot:
# Setup data frame - these are results from an A/B experiment
conv_data = data.frame(
VarNames = c("Variation 1", "Variation 2", "Variation 3") # Set variation names
,NumSuccess = c(1,90,899) # Set number of successes / conversions
,NumTrials = c(10,100,1070) # Set number of trials
)
conv_data$NumFailures = conv_data$NumTrials - conv_data$NumSuccess # Set number of failures [no conversions]
num_var = NROW(conv_data) # Set total number of variations
plot_col = rainbow(num_var) # Set plot colors
get_density_data <- function(n_var, s, f) {
x = seq(0,1,length.out=100) # 0.01,0.02,0.03...1
dens_data = matrix(data = NA, nrow=length(x), ncol=(n_var+1))
dens_data[,1] = x
# set density data
for(j in 1:n_var) {
# +1 to s[], f[] to ensure uniform prior
dens_data[,j+1] = dbeta(x, s[j]+1, f[j]+1)
}
return(dens_data)
}
density_data = get_density_data(num_var, conv_data$NumSuccess, conv_data$NumFailures)
matplot(density_data[,1]*100, density_data[,-1], type = "l",
lty = 1, col = plot_col, ylab = "Probability Density",
xlab = "Conversion Rate %", yaxt = "n")
legend("topleft", col=plot_col, legend = conv_data$VarNames, lwd = 1)
## and add this part
for (ii in seq_along(plot_col))
polygon(c(density_data[, 1] * 100, rev(density_data[, 1] * 100)),
c(density_data[, ii + 1], rep(0, nrow(density_data))),
col = adjustcolor(plot_col[ii], alpha.f = .25))
I was able to answer my own question with ggplot2:
library(ggplot2)
# Note: conversion_data is not defined in this post
df = as.data.frame(t(conversion_data))
dfs = stack(df)
ggplot(dfs, aes(x=values)) + geom_density(aes(group=ind, colour=ind, fill=ind), alpha=0.3)
I found a nice tutorial on self-organizing map clustering in R that explains how to display the input data in the unit space (see below). To set up some rules for labeling, I would like to compute the probability of each class in each neuron and plot it. Computing the probability is fairly easy: for each unit, take the number of observations of class i and divide it by the total number of observations in that unit. I end up with the data.frame pc. Now I am struggling to map this result. Any clue on how to do it?
library(kohonen)
data(yeast)
set.seed(7)
yeast.supersom <- supersom(yeast, somgrid(8, 8, "hexagonal"),whatmap = 3:6)
classes <- levels(yeast$class)
colors <- c("yellow", "green", "blue", "red", "orange")
par(mfrow = c(3, 2))
plot(yeast.supersom, type = "mapping",pch = 1, main = "All", keepMargins = TRUE,bgcol = gray(0.85))
library(plyr)
pc <- data.frame(Var1=c(1:64))
for (i in seq(along = classes)) {
X.class <- lapply(yeast, function(x) subset(x, yeast$class == classes[i]))
X.map <- map(yeast.supersom, X.class)
plot(yeast.supersom, type = "mapping", classif = X.map,
col = colors[i], pch = 1, main = classes[i], add=TRUE)
# compute percentage per unit
v1F <- levels(as.factor(X.map$unit.classif))
v2F <- levels(as.factor(yeast.supersom$unit.classif))
fList<- base::union(v2F,v1F)
pc <- join(pc,as.data.frame(table(factor(X.map$unit.classif,levels=fList))/table(factor(yeast.supersom$unit.classif,levels=fList))*100),by = 'Var1')
colnames(pc)[NCOL(pc)]<-classes[i]
}
Okay, here is a solution:
Once the probability is computed, a colour is derived for each unit from a defined gradient (rbPal). The gradient is defined by a lower and an upper bound, and findInterval maps each probability to the corresponding shade.
# compute percentage per unit
v1F <- levels(as.factor(X.map$unit.classif))
v2F <- levels(as.factor(yeast.supersom$unit.classif))
fList<- base::union(v2F,v1F)
pc <- join(pc,as.data.frame(table(factor(X.map$unit.classif,levels=fList))/table(factor(yeast.supersom$unit.classif,levels=fList))*100),by = 'Var1')
colnames(pc)[NCOL(pc)]<-classes[i]
rbPal <- colorRampPalette(c('blue','yellow','red'))
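# Breaks 0, 1, ..., 100 with rightmost.closed = TRUE give indices 1..100, so 100% also gets a colour
plot(yeast.supersom, type = "mapping",
bgcol = rbPal(100)[findInterval(pc[, as.character(classes[i])], seq(0, 100), rightmost.closed = TRUE)],
main = paste("Probability Clusters:", classes[i]))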
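The percentage-to-colour lookup can be checked on its own. This is a minimal sketch, assuming a handful of made-up percentages (pct is hypothetical); it uses breaks at 0, 1, ..., 100 with rightmost.closed = TRUE so that every value from 0 to 100 maps to one of the 100 palette entries.

```r
# Gradient from blue (low) through yellow to red (high)
rbPal <- colorRampPalette(c("blue", "yellow", "red"))
palette_100 <- rbPal(100)

# Hypothetical class percentages for a few units
pct <- c(0, 25, 50, 99.5, 100)

# Interval index in 1..100; rightmost.closed = TRUE keeps 100 in the last bin
idx  <- findInterval(pct, seq(0, 100), rightmost.closed = TRUE)
cols <- palette_100[idx]
```

Units with no observations would give NaN percentages, and findInterval returns NA for those, so they may need separate handling (for example a neutral background colour).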