Legend colours to call on dictionary R - r

I have a plot that draws points with a particular symbol and color, and I want my legend to show exactly the same colors and symbols. I can do this manually, but I have over 50 plots to generate and the data is going to be continually updated, so I would like to automate the process. I tried to create a dictionary to search: if a value is found in levels(Color_test), the legend symbol should get the color assigned to it in the dictionary.
My legend code is as follows:
legend(legend_X, legend_Y,
xjust=x_adj, yjust=y_adj,
levels(Color_test),
col=Labels.col,
pch=Labels.sym,
horiz=FALSE)

Maybe what you are looking for is a way to join your data with the dictionary. Here is how to do it with colors only, as an example:
data <- data.frame(type = sample(letters[1:3], 20, replace = TRUE),
                   x = runif(20),
                   y = runif(20),
                   stringsAsFactors = FALSE)
dict <- data.frame(type = letters[1:4],
                   color = c("red", "green", "blue", "black"),
                   stringsAsFactors = FALSE)
# merge() reorders rows by key, so look the colors up with match()
# to keep them aligned with the rows of data
plot(data$x, data$y, col = dict$color[match(data$type, dict$type)])
legend("topleft", legend = dict$type, col = dict$color, pch = 1)
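You can easily modify the legend so that it displays only the colors actually used:
data_dict <- merge(data, dict)
plot(y ~ x, col = color, data = data_dict, pch = as.vector(type))
# keep each type paired with its own color, and reuse the letters as legend symbols
used <- unique(data_dict[, c("type", "color")])
legend("topleft", legend = used$type, col = used$color, pch = as.character(used$type))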
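The dictionary lookup described in the question can also be sketched with named vectors, which R lets you index by name. This is only an illustration: col_dict, pch_dict, and the sample data are made-up names and values, while Color_test, Labels.col, and Labels.sym follow the question.

```r
# Hypothetical dictionaries: names are the factor levels, values are the styles.
col_dict <- c(a = "red", b = "green", c = "blue", d = "black")
pch_dict <- c(a = 16, b = 17, c = 15, d = 18)

# Made-up data that uses only two of the four dictionary entries.
Color_test <- factor(rep(c("a", "c"), times = 10))
set.seed(1)
x <- runif(20)
y <- runif(20)

# Per-point styles: index the dictionaries by each point's level.
pt_col <- col_dict[as.character(Color_test)]
pt_pch <- pch_dict[as.character(Color_test)]

# Legend styles: index the dictionaries by the levels actually present.
lev <- levels(Color_test)
Labels.col <- col_dict[lev]
Labels.sym <- pch_dict[lev]

plot(x, y, col = pt_col, pch = pt_pch)
legend("topleft", legend = lev, col = Labels.col, pch = Labels.sym)
```

Because the plot and the legend read from the same two vectors, they cannot drift apart when the data changes.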

Related

How to combine state distribution plot and separate legend in TraMineR?

Plotting several clusters using seqdplot in TraMineR can make the legend messy, especially in combination with numerous states. This calls for additional options for modifying the legend, which are available with the function seqlegend. However, I have a hard time combining a state distribution plot (seqdplot) with a separate modified legend (seqlegend). Ideally one wants to plot the clusters (e.g. 9) without a legend and then add the separate legend in the free bottom-right panel, but instead the separate legend opens a new plot window. Can anyone help?
Here's an example using the biofam data. With the data I use in my own research the legend becomes much more messy since I have 11 states.
#Data
library(TraMineR)
library(WeightedCluster)
data(biofam)
biofam.seq <- seqdef(biofam[501:600, 10:25])
#OM distances
biofam.om <- seqdist(biofam.seq, method = "OM", indel = 3, sm = "TRATE")
#9 clusters
wardCluster <- hclust(as.dist(biofam.om), method = "ward.D2")
cluster9 <- cutree(wardCluster, k = 9)
#State distribution plot
seqdplot(biofam.seq, group = cluster9, with.legend = FALSE)
#Separate legend
seqlegend(biofam.seq, title = "States", ncol = 2)
#Combine state distribution plot and separate legend
#??
Thank you.
The seqplot function does not let you control the number of columns of the legend, nor does it let you add a legend title. So you have to compose the plot yourself by generating a separate plot for each group with the legend disabled and adding the legend afterwards. Here is how you can do that:
cluster9 <- factor(cluster9)
levc <- levels(cluster9)
lev <- length(levc)
# 5 x 2 grid: nine cluster panels plus one panel for the legend
par(mfrow = c(5, 2))
for (i in 1:lev) {
  seqdplot(biofam.seq[cluster9 == levc[i], ], border = NA, main = levc[i], with.legend = FALSE)
}
seqlegend(biofam.seq, ncol = 4, cex = 1.2, title = 'States')
Update, Oct 1, 2018: Since TraMineR V 2.0-9, the seqplot family of functions supports (when applicable) the argument ncol to control the number of columns in the legend. To add a title to the legend, you still have to proceed as shown above.
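The layout idea in this answer (one panel per group with no legend, then a single legend in the leftover panel) is not specific to TraMineR. A minimal base-graphics sketch of the same pattern follows; the data, state names, and colors are invented for illustration, and barplot stands in for seqdplot.

```r
# Made-up example: 9 groups, 4 states, simple bar charts instead of seqdplot.
set.seed(42)
groups <- factor(rep(1:9, each = 10))
states <- c("a", "b", "c", "d")
state_cols <- c("tomato", "steelblue", "gold", "seagreen")
obs <- factor(sample(states, length(groups), replace = TRUE), levels = states)

# One panel per group plus one extra panel reserved for the legend.
n_panels <- nlevels(groups) + 1
n_col <- 2
n_row <- ceiling(n_panels / n_col)

op <- par(mfrow = c(n_row, n_col), mar = c(2, 2, 2, 1))
for (g in levels(groups)) {
  barplot(table(obs[groups == g]), col = state_cols, main = g)
}
# Advance to the empty panel and draw only the legend there.
plot.new()
legend("center", legend = states, fill = state_cols,
       ncol = 2, title = "States", bty = "n")
par(op)
```

Computing the grid from the number of groups avoids hard-coding c(5, 2) when the number of clusters changes.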
AFAIK seqlegend() doesn't work when the other plots you are drawing use the group argument. In your case the only thing seqlegend() adds is the title "States". If you want a legend whose contents you can customize, you can get that by providing the alphabet and state labels used in your analysis.
The package's website has several walkthroughs and guides covering the various options.
#Data
library(TraMineR)
library(WeightedCluster)
data(biofam)
## Generate alphabet and states
alphabet <- 0:7
states <- letters[seq_along(alphabet)]
biofam.seq <- seqdef(biofam[501:600, 10:25], states = states, alphabet = alphabet)
#OM distances
biofam.om <- seqdist(biofam.seq, method = "OM", indel = 3, sm = "TRATE")
#9 clusters
wardCluster <- hclust(as.dist(biofam.om), method = "ward.D2")
cluster9 <- cutree(wardCluster, k = 9)
#State distribution plot
seqdplot(biofam.seq, group = cluster9, with.legend = TRUE)

Plot a table with box size changing

Does anyone know how this kind of chart is plotted? It looks like a heat map, but instead of color, the size of each cell indicates the magnitude. I want to plot a figure like this but I don't know how to do it. Can this be done in R or MATLAB?
Try scatter:
scatter(x,y,sz,c,'s','filled');
where x and y are the positions of each square, sz is the size (must be a vector of the same length as x and y), and c is a length(x)-by-3 matrix with the RGB color value for each entry. The labels for the plot can be set with set(gca,properties) or xticklabels:
X=30;
Y=10;
[x,y]=meshgrid(1:X,1:Y);
x=reshape(x,[size(x,1)*size(x,2) 1]);
y=reshape(y,[size(y,1)*size(y,2) 1]);
sz=50;
sz=sz*(1+rand(size(x)));
c=[1*ones(length(x),1) repmat(rand(size(x)),[1 2])];
scatter(x,y,sz,c,'s','filled');
xlab={'ACC';'BLCA';etc}
xticks(1:X)
xticklabels(xlab)
set(get(gca,'XLabel'),'Rotation',90);
ylab={'RAPGEB6';etc}
yticks(1:Y)
yticklabels(ylab)
EDIT: yticks & co are only available from R2016b onward; if you have an older version you should use set instead:
set(gca,'XTick',1:X,'XTickLabel',xlab,'XTickLabelRotation',90) % XTickLabelRotation requires R2014b or later
set(gca,'YTick',1:Y,'YTickLabel',ylab)
In R, you can use ggplot2, which lets you map your values (gene expression in your case?) onto the size variable. Here is a simulation that resembles your data structure:
my_data <- matrix(rnorm(8*26,mean=0,sd=1), nrow=8, ncol=26,
dimnames = list(paste0("gene",1:8), LETTERS))
Then, reshape the matrix into a long data frame that ggplot2 can work with:
library(reshape)
dat_m <- melt(my_data, varnames = c("gene", "cancer"))
Now, use ggplot2::geom_tile() to map the values onto the size variable. You may update additional features of the plot.
library(ggplot2)
ggplot(data=dat_m, aes(cancer, gene)) +
geom_tile(aes(size=value, fill="red"), color="white") +
scale_fill_discrete(guide="none") + ## hide the fill legend
scale_size_continuous(guide="none") ## hide the size legend
In R, the corrplot package can be used. Specifically, you have to use method = 'square' when creating the plot.
Try this as an example:
library(corrplot)
corrplot(cor(mtcars), method = 'square', col = 'red')
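If no extra package is wanted, base R's symbols() can draw squares whose side length encodes the value. The sketch below is an assumption-laden illustration: the matrix is simulated, the gene and column labels are invented, and scaling the side by the square root of the value (so that area tracks magnitude) is my choice rather than something taken from the original figure.

```r
# Simulated non-negative values: 8 hypothetical genes x 26 hypothetical cancer types.
set.seed(1)
vals <- matrix(runif(8 * 26), nrow = 8,
               dimnames = list(paste0("gene", 1:8), LETTERS))

# One row of 'grid' per cell of the matrix.
grid <- expand.grid(row = seq_len(nrow(vals)), col = seq_len(ncol(vals)))
cell <- vals[cbind(grid$row, grid$col)]

# Side length in axis units; 0.9 keeps the largest square inside its cell.
side <- 0.9 * sqrt(cell / max(vals))

symbols(grid$col, grid$row, squares = side, inches = FALSE,
        bg = "red", fg = "red", asp = 1,
        xaxt = "n", yaxt = "n", xlab = "", ylab = "")
axis(1, at = seq_len(ncol(vals)), labels = colnames(vals), las = 2)
axis(2, at = seq_len(nrow(vals)), labels = rownames(vals), las = 1)
```

With inches = FALSE the squares are sized in x-axis units, so asp = 1 keeps them square on screen.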

Dynamically coloring boxplot in R

I have data with the following columns: lot, sublot, size, data. I have multiple lot(s) and each lot can have multiple sublot(s). Each sublot has size(s) of 1 to 4.
I have created a boxplot for this data using the following code:
df <-
readXL("Z:/R_Files/example.xlsx",
rownames=FALSE, header=TRUE, na="", sheet="Sheet1",
stringsAsFactors=TRUE)
x11()
par(mar=c(10.1, 5.1, 4.1, 2.1))
boxplot(data ~ size*sublot*lot,
xlab="", ylab="Data", main="Data by Size, Sublot, Lot",
las=2,
data=df)
title(xlab='Size.Sublot.Lot', line=9)
I wanted to use the boxfill command to color each boxplot based on the lot#. I have seen two solutions:
1. Create a vector and explicitly specify the colors to be used, e.g. colr = c("red", "red", "red", .... "green", "green", "green", ... "blue"). The problem with this solution is that it requires me to know a priori the number of lots in df and the number of times each color needs to be repeated.
2. Use an "ifelse" statement. The problem with this solution is that (a) I need to know the number of lots and (b) I need to create multiple nested ifelse statements.
I would prefer to create a "dynamic" solution which creates the color vector based on the number of lot entries I have in my file.
I have tried to create:
uniqlot <- unique(df$lot)
colr <- palette(rainbow(length(uniqlot)))
but am stuck since the entries in the colr vector do not repeat for the number of unique combinations of size.sublot.lot. Note: I want all boxplots for lot ABC to be colored with one color, all boxplots for lot DEF to be colored with another color etc.
I am attaching a picture of the uncolored boxplot. Uncolored Boxplot
Raw data can be accessed at the following link: example.xlsx
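This is what I would do:
n <- length(unique(df$lot))       # number of lots, one color each
n1 <- length(unique(df$sublot))
n2 <- length(unique(df$size))
colr <- rainbow(n)                # palette() would return the previous palette, not the new colors
colr <- rep(colr, each = n1*n2)   # lot varies slowest in size*sublot*lot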
boxplot(data ~ size*sublot*lot,
col = colr,
xlab="", ylab="Data", main="Data by Size, Sublot, Lot",
las=2,
data=df)
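A variant that avoids counting repeats by hand is to rebuild the box order with expand.grid() and look each box's lot up in a named color vector. The data frame below is simulated and the names lot_cols, boxes, and box_cols are mine; the sketch also assumes lot, sublot, and size are factors, as they would be after reading with stringsAsFactors=TRUE and converting size.

```r
# Simulated stand-in for df: 3 lots x 2 sublots x 2 sizes, 5 rows each.
set.seed(1)
df <- expand.grid(size = factor(1:2),
                  sublot = factor(c("s1", "s2")),
                  lot = factor(c("ABC", "DEF", "GHI")))
df <- df[rep(seq_len(nrow(df)), each = 5), ]
df$data <- rnorm(nrow(df))

# One color per lot, stored under the lot's name.
lot_cols <- setNames(rainbow(nlevels(df$lot)), levels(df$lot))

# Boxes are drawn with size varying fastest and lot slowest,
# which is the same order expand.grid() produces.
boxes <- expand.grid(size = levels(df$size),
                     sublot = levels(df$sublot),
                     lot = levels(df$lot))
box_cols <- lot_cols[as.character(boxes$lot)]

par(mar = c(10.1, 5.1, 4.1, 2.1))
boxplot(data ~ size * sublot * lot, data = df,
        col = box_cols, las = 2, xlab = "", ylab = "Data")
```

The color vector then follows the number of lots, sublots, and sizes found in the file.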
Using ggplot:
library(ggplot2)
df$size <- as.factor(df$size)
ggplot(df, aes(sublot, data, group = interaction(size, sublot), col = size)) +
geom_boxplot() +
facet_wrap(~lot, nrow = 1)
Also, you can get rid of df$size <- as.factor(df$size) if you want continuous colour.
Thanks to the pointers provided in the responses and after digging around a little more, I was able to find a solution to my own question. I am posting this code in case someone needs to replicate it.
Here is a picture of the boxplot this code creates (the one I wanted): colored boxplot
df <-
readXL("Z:/R_Files/example.xlsx",
rownames=FALSE, header=TRUE, na="", sheet="Sheet1",
stringsAsFactors=TRUE)
unqlot <- unique(df$lot)
unqsublot <- unique(df$sublot)
unqsize <- unique(df$size)
# one color per lot; rainbow() returns the colors directly
cul <- rainbow(length(unqlot))
culur <- character()
# repeat each lot's color once per size/sublot combination
for (i in seq_along(unqlot)) {
  culur_temp <- rep(cul[i], times = length(unqsize) * length(unqsublot))
  culur <- c(culur, culur_temp)
}
par(mar=c(10.1, 5.1, 4.1, 2.1))
boxplot(data ~ size*sublot*lot,
xlab="", ylab="Data", main="Data by Size, Sublot, Lot",
col = culur,
las=2,
data=df)

Changing descriptive statistics parameters in a map. R

I created a UK map representing some information by downloading a SpatialPolygonsDataFrame from GADM.org and running the following script.
library(RColorBrewer) # brewer.pal
library(classInt)     # classIntervals, findColours
lat<-c(51.5163,52.4847,51.4544,53.5933,51.481389,51.367778,55.953056,55.864167,51.482778)
lon<-c(-0.061389,-1.89,-2.587778,-2.296389,-3.178889,-0.07,-3.188056,-4.251667,-0.388056)
fr<-c(0.004278509,0.004111901,0.004150415,0.00421649,0.004221205,0.004191472,0.004507773,0.004314193,0.004098154)
uk<-data.frame(cbind(lat,lon,fr))
plotvar<-uk$fr
nclr<-4
plotclr <- brewer.pal(nclr,"Blues")
max.symbol.size=6
min.symbol.size=1
class <- classIntervals(plotvar, nclr, style="quantile")
colcode <- findColours(class, plotclr)
symbol.size <- ((plotvar-min(plotvar))/
(max(plotvar)-min(plotvar))*(max.symbol.size-min.symbol.size)
+min.symbol.size)
windows()
par(mai=c(0,0,0,0))
plot(UnK, col = 'lightgrey', border = 'darkgrey',xlim=c(-6,0),ylim=c(50,60)) #Unk is the map downloaded from GADM
points(uk$lon, uk$lat, col=2, pch=18)
points(uk$lon, uk$lat, pch=16, col=colcode, cex=symbol.size)
points(uk$lon, uk$lat, cex = symbol.size)
text(-120, 46.5, "Area: Frho")
legend(locator(1), legend=names(attr(colcode, "table")),
fill=attr(colcode, "palette"), cex=1, bty="n")
The following figure is the outcome of the above script.
Now, my problem is that I'm not happy with the colors and the breaks of the variable uk$fr. I need to change them in order to be able to compare this map with others, but I don't know how to do the following. My intention is to break this variable into 3 classes, (0, 0.0125], (0.0125, 0.0625], and (0.0625, 0.125], and to represent these classes with "Blues" colors and circles of different sizes. I also want to force the legend to include all three classes.
One last question, how can I put title to the legend?
Thanks.
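One possible approach, sketched here with base R only, is to classify the values with cut() on fixed breaks and pass title= to legend(). The three blues and the three symbol sizes are picked by hand for illustration (brewer.pal(3, "Blues") could be substituted), the legend title "Frho" is borrowed from the text() call in the script, and the GADM base map is left out so the sketch runs on its own.

```r
lat <- c(51.5163, 52.4847, 51.4544, 53.5933, 51.481389,
         51.367778, 55.953056, 55.864167, 51.482778)
lon <- c(-0.061389, -1.89, -2.587778, -2.296389, -3.178889,
         -0.07, -3.188056, -4.251667, -0.388056)
fr <- c(0.004278509, 0.004111901, 0.004150415, 0.00421649, 0.004221205,
        0.004191472, 0.004507773, 0.004314193, 0.004098154)

# Fixed breaks shared across maps; cut() builds right-closed intervals by default.
brks <- c(0, 0.0125, 0.0625, 0.125)
cls <- cut(fr, breaks = brks)

# Hand-picked light-to-dark blues and symbol sizes, one per class (assumptions).
plotclr <- c("#DEEBF7", "#9ECAE1", "#3182BD")
sizes <- c(1, 3.5, 6)
idx <- as.integer(cls)

plot(lon, lat, pch = 16, col = plotclr[idx], cex = sizes[idx],
     xlim = c(-6, 0), ylim = c(50, 60))
points(lon, lat, cex = sizes[idx])

# levels(cls) lists all three classes, including ones with no points.
legend("topright", legend = levels(cls), pch = 16, col = plotclr,
       pt.cex = sizes, title = "Frho", bty = "n")
```

With the sample values above, every point falls in the first class, which is exactly why fixed breaks make maps comparable: the legend still shows all three classes.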

Changing the legend in chartSeries to display values - Quantmod addTA

Is it possible to change the legend on the plot displayed in Quantmod so that values are displayed rather than the variable name? For example:
library("quantmod")
getSymbols("YHOO")
temp1 <- 6
temp2 <- "SMA"
barChart(YHOO)
addTA(ADX(YHOO, n=temp1, maType=temp2))
The legend that is displayed in the plot is ADX(YHOO, n=temp1, maType=temp2). I would like it to display the specific values instead i.e. ADX(YHOO, n=6, maType='SMA').
There isn't a way to do this automatically with addTA, because it would need to know which of the parameters of the TA call it needs to evaluate. But you can do it manually by setting the legend= argument yourself.
One way to do it is to use paste (or paste0).
barChart(YHOO)
Legend <- paste0('ADX(YHOO, n=',temp1,', maType=',temp2,')')
addTA(ADX(YHOO, n=temp1, maType=temp2), legend=Legend)
Or you could create and manipulate the call to get what you want.
barChart(YHOO)
callTA <- call("ADX",quote(YHOO),n=temp1,maType=temp2)
eval(call("addTA", callTA, legend=deparse(callTA)))
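To see what label the call-based approach yields without loading quantmod, the call can be built and deparsed in base R alone. ADX and YHOO are never evaluated in this sketch, so they do not need to exist; bquote() is an alternative construction I am adding for comparison, not part of the answer above.

```r
temp1 <- 6
temp2 <- "SMA"

# Build the call with the current values of temp1 and temp2 substituted in.
callTA <- call("ADX", quote(YHOO), n = temp1, maType = temp2)

# bquote() does the same substitution with .( ) placeholders.
callTA2 <- bquote(ADX(YHOO, n = .(temp1), maType = .(temp2)))

# deparse() turns the call back into the text used as the legend.
label <- deparse(callTA)
label2 <- deparse(callTA2)
print(label)
```

The resulting string is what gets passed as legend= in the eval(call("addTA", ...)) line.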
The following is a partial solution which displays the values rather than variable names in the legend as well as the relevant output values for the TA. However, unlike the default settings of addTA, the text for each output value doesn't match the colour of the line on the addTA plot. Unfortunately I haven't worked out how to get the text of the output values to match the colour of its relevant line on the addTA plot. Any suggestions?
library("quantmod")
getSymbols("YHOO")
barChart(YHOO, subset="last 4 months")
col <- c("red", "blue", "green", "orange")
temp1 <- 8
temp2 <- "SMA"
temp <- ADX(HLC(YHOO), n=temp1, maType=temp2)
legend <- rep(NA, NCOL(temp)+1)
legend[1] <- paste0("ADX(HLC(YHOO), n=", temp1, ", maType=", temp2, ")")
for(x in 2:(NCOL(temp)+1)){
legend[x] <- (paste(colnames(temp[,(x-1)]),": ", round(last(temp[,(x-1)]),3), sep=""))
}
addTA(temp, legend = legend, col=col)
