I am totally new to R.
I have expression profile data which is preprocessed and combined. Looks like this ("exp.txt")
STUDY_1_CANCER_1 STUDY_1_CON_1 STUDY_2_CANCER_1 STUDY_2_CANCER_2
P53 1.111 1.22 1.3 1.4
.....
Also, I created phenotype data. It looks like this ("pheno.txt")
Sample Disease Study
STUDY_1_CANCER_1 Cancer GSE1
STUDY_1_CON_1 Normal GSE1
STUDY_2_CANCER_1 Cancer GSE2
STUDY_2_CON_1 Normal GSE2
Here, I tried to make an MDS plot using the classical cmdscale command, like this:
data=read.table("exp.txt", row.names=1, header=T)
DATA=as.matrix(data)
pc=cor(DATA, method="p")
mds=cmdscale(as.dist(1-pc),2)
plot(mds)
I'd like to create a plot like this figure, with color double-labeling (Study and Disease). How can I do this?
First create an empty plot, then add the points with specified colors/shapes.
Here's an example:
require(vegan)
data(dune)
data(dune.env)
mds <- cmdscale(vegdist(dune, method='bray'))
# set colors and shapes
cols = c('red', 'blue', 'black', 'steelblue')
shps = c(15, 16, 17)
# empty plot
plot(mds, type = 'n')
# add points
points(mds, col = cols[dune.env$Management], pch = shps[dune.env$Use])
# add legend
legend('topright', col=cols, legend=levels(dune.env$Management), pch = 16, cex = 0.7)
legend('bottomright', legend=levels(dune.env$Use), pch = shps, cex = 0.7)
Note that factors are internally coded as integers, which is helpful here.
> levels(dune.env$Management)
[1] "BF" "HF" "NM" "SF"
so
cols[dune.env$Management]
will take the first entry of cols for the first factor level, the second entry for the second level, and so on. The same applies to the shapes.
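To make the indexing concrete, here is a minimal sketch with a made-up factor (the values are illustrative and not taken from dune.env):

```r
# Hypothetical factor; its levels are sorted alphabetically: BF, HF, NM, SF
f <- factor(c("BF", "SF", "HF", "NM", "BF"))
cols <- c("red", "blue", "black", "steelblue")

codes  <- as.integer(f)  # internal integer codes of the factor
picked <- cols[f]        # indexing by a factor uses those integer codes

print(codes)
print(picked)
```

Each observation picks the colour whose position matches its level, which is why the order of cols has to follow the order of levels(f).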
Finally, add the legend. Of course this plot still needs some polishing, but that's the way to go...
BTW: Gavin Simpson has a nice blogpost about customizing ordination plots.
Actually, you can do this directly in the default plot command, which can take the pch and col arguments as vectors. Use:
with(data, plot(mds, col = as.numeric(Study), pch = as.numeric(Disease), asp = 1))
You must use asp = 1 when you plot cmdscale results: both axes must be scaled similarly. You can also add xlab and ylab arguments for nicer axis labels. For adding a legend and selecting plotting characters and colours, see the other responses.
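Applied to the data layout in the question, the two suggestions might be combined roughly as follows. This is only a sketch: the expression values are simulated stand-ins for exp.txt, the gene names are invented, and the colours and symbols are arbitrary choices.

```r
# Simulated stand-ins for exp.txt and pheno.txt (hypothetical values)
set.seed(1)
pheno <- data.frame(
  Sample  = c("STUDY_1_CANCER_1", "STUDY_1_CON_1",
              "STUDY_2_CANCER_1", "STUDY_2_CON_1"),
  Disease = factor(c("Cancer", "Normal", "Cancer", "Normal")),
  Study   = factor(c("GSE1", "GSE1", "GSE2", "GSE2"))
)
expr <- matrix(rnorm(50 * 4), nrow = 50,
               dimnames = list(paste0("gene", 1:50), pheno$Sample))

# Make sure phenotype rows are in the same order as the expression columns
pheno <- pheno[match(colnames(expr), pheno$Sample), ]

# Classical MDS on 1 - Pearson correlation between samples
mds <- cmdscale(as.dist(1 - cor(expr, method = "pearson")), k = 2)

study_cols   <- c("red", "blue")  # one colour per Study level
disease_pchs <- c(16, 17)         # one symbol per Disease level
cols <- study_cols[pheno$Study]
shps <- disease_pchs[pheno$Disease]

plot(mds, col = cols, pch = shps, asp = 1, xlab = "MDS 1", ylab = "MDS 2")
legend("topright", legend = levels(pheno$Study), col = study_cols,
       pch = 15, title = "Study", cex = 0.7)
legend("bottomright", legend = levels(pheno$Disease), pch = disease_pchs,
       title = "Disease", cex = 0.7)
```

The match() step matters with real files, because the sample order in the phenotype table is not guaranteed to follow the column order of the expression matrix.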
I'm trying to produce a cumulative incidence plot for a competing hazards survival analysis using plot() in R. For some reason, the plot that is produced has a legend that I have not called. The legend is intersecting with the lines on my graph and I can't figure out how to get rid of it. Please help!
My code is as follows:
CompRisk2 <- cuminc(ftime=ADI$time_DeathTxCensor, fstatus=ADI$status, group=ADI$natADI_quart)
cols <- c("darkorange","coral1","firebrick1","firebrick4","lightskyblue","darkturquoise","dodgerblue","dodgerblue4")
par(bg="white")
plot(CompRisk2,
col=cols,
xlab="Years",
ylab="Probability of Mortality or Transplant",
xlim=c(0,10),
ylim=c(0,0.6))
Which produces the following plot:
I tried adding the following code to move the legend out of the frame, but I got an error:
legend(0,5, legend=c(11,21,31,41,12,22,32,42),
col=c("darkorange","coral1","firebrick1","firebrick4","lightskyblue","darkturquoise","dodgerblue","dodgerblue4"),
lty=1:2, cex=0.8, text.font=4, box.lty=0)
Error: Error in title(...) : invalid graphics parameter
Any help would be much appreciated!
You are using the cuminc function from the cmprsk package. This produces an object of class cuminc, which has an S3 plot method. ?plot.cuminc shows you the documentation and typing plot.cuminc shows you the code.
There is some slightly obscure code that suggests a workaround:
u <- list(...)
if (length(u) > 0) {
i <- pmatch(names(u), names(formals(legend)), 0)
do.call("legend", c(list(x = wh[1], y = wh[2], legend = curvlab,
col = color, lty = lty, lwd = lwd, bty = "n", bg = -999999),
u[i > 0]))
}
This says that any additional arguments passed in ... whose names match the names of arguments to legend will be passed to legend(). legend() has a plot argument:
plot: logical. If ‘FALSE’, nothing is plotted but the sizes are returned.
So it looks like adding plot=FALSE to your plot() command will work.
In principle you could try looking at the other arguments to legend() and see if any of them will adjust the legend position/size as you want. Unfortunately the x argument to legend (which would determine the horizontal position) is masked by the first argument to plot.cuminc.
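The filtering step in that snippet can be tried on its own. The helper below is hypothetical (it is not part of cmprsk); it only reproduces the pmatch() logic to show which extra arguments would be handed on to legend():

```r
# Hypothetical helper mirroring the pmatch() filter quoted from plot.cuminc
forward_to_legend <- function(...) {
  u <- list(...)
  # Positions of the formal arguments of legend() matched by each name (0 = no match)
  i <- pmatch(names(u), names(formals(legend)), 0)
  u[i > 0]
}

kept <- forward_to_legend(plot = FALSE, xlab = "Years", cex = 0.8)
print(names(kept))
```

Here plot and cex are arguments of legend() and are kept, while xlab is not and is dropped from the legend call.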
I don't think that the ellipsis arguments are intended for the legend call inside plot.cuminc. The code offered in Ben's answer suggests that there might be a wh argument that determines the location of the legend. It is not named within the parameters as "x" in the code he offered, but is rather given as a positionally-defined argument. If you look at the plot.cuminc function you do in fact find that wh is documented.
I cannot test this because you have not offered us access to the ADI-object but my suggestion would be to try:
opar <- par(xpd=TRUE) # xpd lets graphics be placed 'outside'
plot(CompRisk2,
col=cols, wh=c(-.5, 7),
xlab="Years",
ylab="Probability of Mortality or Transplant",
xlim=c(0,10),
ylim=c(0,0.6))
par(opar) # restores original graphics parameters
It's always a bit risky to put out a code chunk without testing, but I'm happy to report that I did find a suitable test case and it seems to work reasonably as predicted. I used the code below on the object from a prior SO question about using the gg-packages for cmprsk:
library(cmprsk)
# some simulated data to get started
comp.risk.data <- data.frame("tfs.days" = rweibull(n = 100, shape = 1, scale = 1)*100,
"status.tfs" = c(sample(c(0,1,1,1,1,2), size=50, replace=T)),
"Typing" = sample(c("A","B","C","D"), size=50, replace=T))
# fitting a competing risks model
CR <- cuminc(ftime = comp.risk.data$tfs.days,
fstatus = comp.risk.data$status.tfs,
cencode = 0,
group = comp.risk.data$Typing)
opar <- par(xpd=TRUE) # xpd lets graphics be placed 'outside'
plot(CR,
wh=c(-15, 1.1), # obviously different than the OP's coordinates
xlab="Years",
ylab="Probability of Mortality or Transplant",
xlim=c(0,400),
ylim=c(0,1))
par(opar) # restores graphics parameters
I get the legend to move up and leftward from its original position.
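The same xpd idea can be seen without cmprsk. The following base-graphics sketch uses invented curves and group labels, and widens the right margin so that a legend anchored at the top-right corner of the plot region sits in the margin:

```r
# Allow drawing outside the plot region and leave room on the right
opar <- par(xpd = TRUE, mar = c(5.1, 4.1, 4.1, 8.1))

plot(1:10, type = "l", col = "darkorange",
     xlab = "Years", ylab = "Probability")
lines(10:1, col = "dodgerblue")

usr <- par("usr")  # plot region limits: x1, x2, y1, y2
# Anchor the legend's top-left corner at the top-right corner of the plot region
lg <- legend(usr[2], usr[4], legend = c("Group 1", "Group 2"),
             col = c("darkorange", "dodgerblue"), lty = 1, bty = "n")

par(opar)  # restore the previous graphics parameters
```

Without xpd = TRUE the legend would be clipped at the edge of the plot region.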
I've recently been trying to analyse my data and want to make the graphs a little nicer, but I'm failing at this.
I have a data set with 144 sites and 5 environmental variables. It's basically about the substrate composition around an island and the fish abundance. On this island there is supposed to be a difference in the substrate composition between the north and the south side. Right now I am doing a PCA, and with the biplot function it works quite fine, but I would like to change the plot a bit.
I need one where the sites are just points and not numbered, arrows point to the different variables, and the sites are colored according to their location (north or south side). So I tried everything I could find.
Most examples were with the dune data and suggested something like this:
library(vegan)
library(biplot)
data(dune)
mod <- rda(dune, scale = TRUE)
biplot(mod, scaling = 3, type = c("text", "points"))
So according to this, I would just need to say "text" and "points", and R would label the variables and just draw points for the sites. When I do this, however, I get the error:
Error in plot.default(x, type = "n", xlim = xlim, ylim = ylim, col = col[1L], :
formal argument "type" matched by multiple actual arguments
No idea how to get around this.
The next strategy I found is to make the plot manually, like this:
require("vegan")
data(dune, dune.env)
mod <- rda(dune, scale = TRUE)
scl <- 3 ## scaling == 3
colvec <- c("red2", "green4", "mediumblue")
plot(mod, type = "n", scaling = scl)
with(dune.env, points(mod, display = "sites", col = colvec[Use],
scaling = scl, pch = 21, bg = colvec[Use]))
text(mod,display="species", scaling = scl, cex = 0.8, col = "darkcyan")
with(dune.env, legend("bottomright", legend = levels(Use), bty = "n",
col = colvec, pch = 21, pt.bg = colvec))
This works fine so far as well: I get different colors and points, but now the arrows are missing. I found that this should be easy to correct by putting display="bp" in the text line. But this doesn't work either. Every time I use "bp", R says:
Error in match.arg(display) :
argument "display" is missing, with no default
So I'm kind of desperate now. I looked through all the answers here and I don't understand why display="bp" and type=c("text","points") are not working for me.
If anyone has an idea, I would be super grateful.
https://www.dropbox.com/sh/y8xzq0bs6mus727/AADmasrXxUp6JTTHN5Gr9eufa?dl=0
This is the link to my Dropbox folder. It contains my R script and the CSV files. The one named environmentalvariables_Kon1 also contains the data about the north and south sides.
So yeah... if anyone could help me, that would be awesome. I really don't know what to do anymore.
Best regards,
Nancy
You can add arrows with arrows(). See the code for vegan:::biplot.rda to see how it works in the original function.
With your plot, add
g <- scores(mod, display = "species")
len <- 1
arrows(0, 0, len * g[, 1], len * g[, 2], length = 0.05, col = "darkcyan")
You might want to adjust the value of len to make the arrows longer.
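For readers without vegan at hand, the same recipe (empty plot, coloured site points, arrows from the origin) can be sketched with prcomp() from base R. Note that this swaps in prcomp for rda, and that the variable names, the north/south grouping and the arrow scaling factor are all invented for illustration:

```r
# Simulated environmental data: 20 sites, 3 substrate variables (hypothetical)
set.seed(42)
env  <- data.frame(sand = runif(20), rock = runif(20), coral = runif(20))
side <- factor(rep(c("north", "south"), each = 10))

pca   <- prcomp(env, scale. = TRUE)
sites <- pca$x[, 1:2]         # site scores on PC1 and PC2
vars  <- pca$rotation[, 1:2]  # variable loadings on PC1 and PC2

colvec <- c("red2", "mediumblue")
len <- 2                      # arbitrary stretch factor for the arrows

plot(sites, type = "n", asp = 1, xlab = "PC1", ylab = "PC2")
points(sites, pch = 21, col = colvec[side], bg = colvec[side])
arrows(0, 0, len * vars[, 1], len * vars[, 2], length = 0.05, col = "darkcyan")
text(1.1 * len * vars[, 1], 1.1 * len * vars[, 2],
     labels = rownames(vars), col = "darkcyan", cex = 0.8)
legend("bottomright", legend = levels(side), bty = "n",
       col = colvec, pch = 21, pt.bg = colvec)
```

The scores and loadings from prcomp are not scaled the same way as vegan's scaling = 3, so the picture will differ somewhat from the rda biplot.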
I have data with the following columns: lot, sublot, size, data. I have multiple lot(s) and each lot can have multiple sublot(s). Each sublot has size(s) of 1 to 4.
I have created a boxplot for this data using the following code:
df <-
readXL("Z:/R_Files/example.xlsx",
rownames=FALSE, header=TRUE, na="", sheet="Sheet1",
stringsAsFactors=TRUE)
x11()
par(mar=c(10.1, 5.1, 4.1, 2.1))
boxplot(data ~ size*sublot*lot,
xlab="", ylab="Data", main="Data by Size, Sublot, Lot",
las=2,
data=df)
title(xlab='Size.Sublot.Lot', line=9)
I wanted to use the boxfill command to color each boxplot based on the lot#. I have seen two solutions:
1. Create a vector and explicitly specify the colors to be used, e.g. colr = c("red", "red", "red", .... "green", "green", "green", ... "blue"). The problem with this solution is that it requires me to know a priori the number of lots in df and the number of times each color needs to be repeated.
2. Use an "ifelse" statement. The problem with this solution is that (a) I need to know the number of lots and (b) I need to create multiple nested ifelse statements.
I would prefer to create a "dynamic" solution which creates the color vector based on the number of lot entries I have in my file.
I have tried to create:
uniqlot <- unique(df$lot)
colr <- palette(rainbow(length(uniqlot)))
but am stuck since the entries in the colr vector do not repeat for the number of unique combinations of size.sublot.lot. Note: I want all boxplots for lot ABC to be colored with one color, all boxplots for lot DEF to be colored with another color etc.
I am attaching a picture of the uncolored boxplot.
Raw data (example.xlsx) can be accessed at the following link:
example.xlsx
This is what I would do:
n <- length(unique(df$lot))      # number of lots
n1 <- length(unique(df$sublot))
n2 <- length(unique(df$size))
colr <- rainbow(n)               # one colour per lot (palette(...) would return the previous palette)
colr <- rep(colr, each = n1*n2)
boxplot(data ~ size*sublot*lot,
col = colr,
xlab="", ylab="Data", main="Data by Size, Sublot, Lot",
las=2,
data=df)
Using ggplot:
library(ggplot2)
df$size <- as.factor(df$size)
ggplot(df, aes(sublot, data, group = interaction(size, sublot), col = size)) +
geom_boxplot() +
facet_wrap(~lot, nrow = 1)
Also, you can get rid of df$size <- as.factor(df$size) if you want continuous colour.
Thanks to the pointers provided in the responses, and after digging around a little more, I was able to find a solution to my own question. I wanted to submit this piece of code in case someone needs to replicate it.
Here is a picture of the boxplot this code creates (and that I wanted to create).
df <-
readXL("Z:/R_Files/example.xlsx",
rownames=FALSE, header=TRUE, na="", sheet="Sheet1",
stringsAsFactors=TRUE)
unqlot <- unique(df$lot)
unqsublot <- unique(df$sublot)
unqsize <- unique(df$size)
cul <- rainbow(length(unqlot))   # one colour per lot
culur <- character()
for (i in seq_along(unqlot)) {   # loop over lots, not sizes
  culur_temp = rep(cul[i], each=(length(unqsize)*length(unqsublot)))
  culur = c(culur, culur_temp)
}
par(mar=c(10.1, 5.1, 4.1, 2.1))
boxplot(data ~ size*sublot*lot,
xlab="", ylab="Data", main="Data by Size, Sublot, Lot",
col = culur,
las=2,
data=df)
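An alternative that avoids counting repetitions is to read the lot off each box label. The sketch below uses simulated data in place of example.xlsx and assumes that lot names contain no "." (the separator boxplot puts between factor levels in the labels):

```r
# Simulated stand-in for the spreadsheet (hypothetical lots, sublots, sizes)
set.seed(1)
df <- expand.grid(size = factor(1:2), sublot = factor(c("a", "b")),
                  lot = factor(c("ABC", "DEF", "GHI")), rep = 1:5)
df$data <- rnorm(nrow(df))

# Compute the box layout without drawing, to get the labels "size.sublot.lot"
bp <- boxplot(data ~ size*sublot*lot, data = df, plot = FALSE)

# The lot is the part after the last "." in each label
box_lot <- sub(".*\\.", "", bp$names)
colr <- rainbow(nlevels(df$lot))[match(box_lot, levels(df$lot))]

boxplot(data ~ size*sublot*lot, data = df, col = colr, las = 2,
        xlab = "", ylab = "Data", main = "Data by Size, Sublot, Lot")
```

Because the colour is looked up per box, it stays aligned with the labels regardless of how many sizes or sublots there are.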
I created a UK map representing some info by downloading a SpatialPolygonsDataFrame from GADM.org and using the following script.
lat<-c(51.5163,52.4847,51.4544,53.5933,51.481389,51.367778,55.953056,55.864167,51.482778)
lon<-c(-0.061389,-1.89,-2.587778,-2.296389,-3.178889,-0.07,-3.188056,-4.251667,-0.388056)
fr<-c(0.004278509,0.004111901,0.004150415,0.00421649,0.004221205,0.004191472,0.004507773,0.004314193,0.004098154)
uk<-data.frame(cbind(lat,lon,fr))
plotvar<-uk$fr
nclr<-4
plotclr <- brewer.pal(nclr,"Blues")
max.symbol.size=6
min.symbol.size=1
class <- classIntervals(plotvar, nclr, style="quantile")
colcode <- findColours(class, plotclr)
symbol.size <- ((plotvar-min(plotvar))/
(max(plotvar)-min(plotvar))*(max.symbol.size-min.symbol.size)
+min.symbol.size)
windows()
par(mai=c(0,0,0,0))
plot(UnK, col = 'lightgrey', border = 'darkgrey',xlim=c(-6,0),ylim=c(50,60)) #Unk is the map downloaded from GADM
points(uk$lon, uk$lat, col=2, pch=18)
points(uk$lon, uk$lat, pch=16, col=colcode, cex=symbol.size)
points(uk$lon, uk$lat, cex = symbol.size)
text(-120, 46.5, "Area: Frho")
legend(locator(1), legend=names(attr(colcode, "table")),
fill=attr(colcode, "palette"), cex=1, bty="n")
The following figure is the outcome of the above script.
Now, my problem is that I'm not happy with the colors and the breaks of the variable uk$fr. I need to change them in order to be able to compare this map with others, but I don't know how to do the following. My intention is to break this variable into 3 different classes like this: (0-0.0125], (0.0125-0.0625], (0.0625-0.125], and to represent these classes by "Blues" colors and by circles of different sizes. I also want to force the legend to include these three classes.
One last question: how can I add a title to the legend?
Thanks.
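One possible approach, sketched here with base R only: cut() assigns each value to one of the fixed classes, and legend() accepts a title argument. The hex colours are intended as the 3-class "Blues" palette but are typed in by hand, the symbol sizes are arbitrary, and the GADM map layer is left out so that the example is self-contained. (classInt also documents style = "fixed" with a fixedBreaks argument, which may be closer to the original script.)

```r
lat <- c(51.5163, 52.4847, 51.4544, 53.5933, 51.481389,
         51.367778, 55.953056, 55.864167, 51.482778)
lon <- c(-0.061389, -1.89, -2.587778, -2.296389, -3.178889,
         -0.07, -3.188056, -4.251667, -0.388056)
fr  <- c(0.004278509, 0.004111901, 0.004150415, 0.00421649, 0.004221205,
         0.004191472, 0.004507773, 0.004314193, 0.004098154)

# Fixed class limits; cut() builds right-closed intervals (a, b] by default
breaks <- c(0, 0.0125, 0.0625, 0.125)
cls <- cut(fr, breaks = breaks)

plotclr <- c("#DEEBF7", "#9ECAE1", "#3182BD")  # assumed 3-class Blues colours
sizes   <- c(1, 3.5, 6)                        # one symbol size per class

plot(lon, lat, pch = 16, col = plotclr[cls], cex = sizes[cls], asp = 1)
points(lon, lat, cex = sizes[cls])             # outline circles
# All three classes appear in the legend, even those with no points
legend("topright", legend = levels(cls), pch = 16, col = plotclr,
       pt.cex = sizes, title = "Frho", bty = "n")
```

With the values shown in the question, every point falls in the first class, so all circles share the lightest colour and smallest size.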
I cannot figure out how the lattice levelplot works. I have played with this for some time now, but could not find a reasonable solution.
Sample data:
Data <- data.frame(x=seq(0,20,1),y=runif(21,0,1))
Data.mat <- data.matrix(Data)
Plot with levelplot:
rgb.palette <- colorRampPalette(c("darkgreen","yellow", "red"), space = "rgb")
levelplot(Data.mat, main="", xlab="Time", ylab="", col.regions=rgb.palette(100),
cuts=100, at=seq(0,1,0.1), ylim=c(0,2), scales=list(y=list(at=NULL)))
This is the outcome:
Since I do not understand how levelplot really works, I cannot make it work. What I would like is for the colour strips to fill the whole window for the corresponding x (Time).
Alternative solution with other method.
Basically, I'm trying here to plot the increasing risk over time, where the red is the highest risk = 1. I would like to visualize the sequence of possible increase or clustering risk over time.
From ?levelplot we're told that if the first argument is a matrix then "'x' provides the 'z' vector described above, while its rows and columns are interpreted as the 'x' and 'y' vectors respectively.", so
> m = Data.mat[, 2, drop=FALSE]
> dim(m)
[1] 21 1
> levelplot(m)
plots a levelplot with 21 columns and 1 row, where the levels are determined by the values in m. The formula interface might look like
> df <- data.frame(x=1, y=1:21, z=runif(21))
> levelplot(z ~ y + x, df)
(these approaches do not quite result in the same image).
Unfortunately I don't know much about lattice, but I noted your "Alternative solution with other method", so may I suggest another possibility:
library(plotrix)
color2D.matplot(t(Data[ , 2]), show.legend = TRUE, extremes = c("yellow", "red"))
Heaps of things to do to make it prettier. Still, a start. Of course it is important to consider the breaks in your time variable. In this very simple attempt, regular intervals are implicitly assumed, which happens to be the case in your example.
Update
Following the advice in the 'Details' section in ?color2D.matplot: "The user will have to adjust the plot device dimensions to get regular squares or hexagons, especially when the matrix is not square". Well, well, quite ugly solution.
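windows(width = 10, height = 2.5)  # Windows-only device; open it first
par(mar = c(5.1, 4.1, 0, 2.1))     # par() applies to the current device, so set it after opening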
color2D.matplot(t(Data[ , 2]),
show.legend = TRUE,
axes = TRUE,
xlab = "",
ylab = "",
extremes = c("yellow", "red"))
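Another option that needs no extra package is image() from base graphics, which draws one coloured cell per time step and fills the plot region. This is a sketch using the sample data from the question with a fixed seed; the palette size of 100 is an arbitrary choice:

```r
set.seed(1)
Data <- data.frame(x = seq(0, 20, 1), y = runif(21, 0, 1))

# One column of risk values: rows map to the x axis (Time), the single column to y
z <- matrix(Data$y, ncol = 1)
pal <- colorRampPalette(c("darkgreen", "yellow", "red"))(100)

# x gives the cell midpoints; y = c(0, 1) gives the lower and upper edge of the strip
image(x = Data$x, y = c(0, 1), z = z, col = pal, zlim = c(0, 1),
      xlab = "Time", ylab = "", yaxt = "n")
```

Fixing zlim to c(0, 1) keeps the colour scale comparable between plots, so that red always means a risk of 1.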