Concatenate a list for plot labels - r

I want to create intervals (discretize/bin) of continuous variables to plot a choropleth map using ggplot. After reading various threads, I decided to use cut and quantile to avoid two problems: a) creating bins manually, and b) handling dominant states (otherwise I would have to create bins by hand, look at the map, and readjust the bins).
However, I am facing another problem now. Intervals coming out of cut are hardly pretty. So, I am trying to follow this example and this example to come up with my pretty labels.
Here is my list:
x <- seq(1,50)
Rounded quantiles:
qs_x <- round(quantile(x, probs=c(seq(0,0.8,by=0.2),0.9)))
which results in:
0% 20% 40% 60% 80% 90%
1 11 21 30 40 45
Using these cuts, I want to come up with these labels:
1-11, 12-21, 22-30, 31-40, 41-45, 45+
I am sure there is an easy solution to convert a list using some apply function, but I am not well-versed with those functions.
Help appreciated.

A 3-liner produces the output you want, without using apply.
labels <- paste(qs_x+1, qs_x[-1], sep="-")
labels[1] <- paste(qs_x[1], qs_x[2], sep="-")
labels[length(labels)] <- paste(tail(qs_x, 1), "+", sep = "")
The first line constructs labels of the form (x1 + 1)-x2; because paste recycles the shorter vector, the first and last elements are wrong at that point. The second line fixes the first label, and the third line fixes the last label. Here is the output:
> labels
[1] "1-11" "12-21" "22-30" "31-40" "41-45" "45+"
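The same idea can be wrapped in a small helper and passed straight to cut. This is only a sketch: make_labels is a hypothetical name, not something from the answer above, and the extra Inf break is an assumption so that values above the last quantile fall into the "45+" bin.

```r
x <- seq(1, 50)
qs_x <- round(quantile(x, probs = c(seq(0, 0.8, by = 0.2), 0.9)))

# Hypothetical helper: builds "lo-hi" labels from sorted integer breaks,
# plus an open-ended label for everything above the last break.
make_labels <- function(breaks) {
  breaks <- unname(breaks)
  n <- length(breaks)
  lower <- c(breaks[1], breaks[-c(1, n)] + 1)  # first break, then previous upper + 1
  upper <- breaks[-1]
  c(paste(lower, upper, sep = "-"), paste0(breaks[n], "+"))
}

labels <- make_labels(qs_x)

# Append Inf so that there is one interval per label (assumption, see above).
binned <- cut(x, breaks = c(qs_x, Inf), labels = labels, include.lowest = TRUE)
table(binned)
```

The resulting factor can be mapped to the fill aesthetic of the choropleth, and its levels appear in the legend in bin order.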

Related

R rasterVis levelplot: a white line erroneously appears

I am plotting maps of atmospheric pollutant fields or meteorological fields, or differences between such fields, often overlaid with orography.
My fields are gridded.
A white line mysteriously appears, sometimes two.
This seems to happen a bit randomly. I mean: same code and fields, same line; but when I change fields, or color scales, it changes position, or it disappears, or another one appears. Sometimes horizontal, sometimes vertical.
Here is my code
#!/usr/bin/env Rscript
library(rasterVis)
library(RColorBrewer)
NX <- 468
NY <- 421
hgt <- matrix(0.,NX,NY)
# read from file:
ucon <- file("hgt.dat", open="rb")
for (n in seq(1,NX)) {
hgt[n,] <- readBin(ucon, "numeric", n=NY, size=4)
}
close(ucon)
hgtbks <- c(-100,10,500,1000,1500,2000,2500,3000,3500)
hgtcols <- colorRampPalette(c("gray30","white"))(length(hgtbks)-1)
tit <- "Orography"
bkstart=50.0; bkmax=1500.; bkby=100.
bks <- seq(bkstart, bkmax, bkby)
nbks <- length(bks)
cols <- rev(colorRampPalette(brewer.pal(11,"Spectral"))(nbks-2))
cols <- c("white",cols)
legendbreaks <- seq(1,nbks)
legendlabels <- formatC(bks,digits=3)
legendlabpos <- legendbreaks
rpl <-
levelplot(hgt, margin=FALSE , col.regions= hgtcols, at= hgtbks
, main= list(label=tit, cex=1.8)
, colorkey=list(draw= TRUE, col=cols, at=legendbreaks
, labels=list(labels=legendlabels, at=legendlabpos, cex=1.2))
, xlab=NULL, ylab=NULL, scales= list(draw= FALSE))
png("whiteline.png", width=800, height=840)
plot(rpl)
graphics.off()
I would really like to upload a file with my data, but for the moment I could not find a way to do it (I don't think I can do it, not even an ASCII file). The data matrix (468x421) is too big to be explicitly included in the code, but it really is the orography file shown in the picture (elevation in meters above mean sea level).
And here is the resulting "white line" map:
Really, I think this might be a levelplot bug. It seems to happen both when hgt is a matrix and when it is a proper raster object: this doesn't seem to make a difference.
Any idea?
I think I found a workaround.
By setting zero padding on the 4 sides, I managed to make the white line disappear from a series of maps.
First I defined:
zpadding <- list(layout.heights= list(top.padding=0, bottom.padding=0),
layout.widths= list(left.padding=0, right.padding=0))
then I added, among the parameters of the levelplot call:
par.settings=zpadding
As I said, I don't think this is a proper solution, but a workaround.
The problem seems related to any rescaling of the plot area.
In fact, when a rescaling is forced by, for example, having 4 or 5 digits (instead of 2 or 3) in the colorbar labels, a white line may reappear.
I hope this points other people, whether users or developers of levelplot and related software, in the right direction.
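For reference, here is how the two fragments of the workaround fit into a complete call. This is a sketch under assumptions: the built-in volcano matrix stands in for the hgt data, which is not available here, the breaks are chosen to suit that stand-in, and lattice's levelplot is called directly (rasterVis's levelplot is built on lattice, so par.settings is passed the same way).

```r
library(lattice)

# Zero padding on all four sides, as defined in the workaround
zpadding <- list(layout.heights = list(top.padding = 0, bottom.padding = 0),
                 layout.widths  = list(left.padding = 0, right.padding = 0))

# `volcano` is a built-in elevation matrix used as stand-in data;
# these breaks are chosen to cover its value range.
hgtbks  <- seq(90, 200, by = 10)
hgtcols <- colorRampPalette(c("gray30", "white"))(length(hgtbks) - 1)

rpl <- levelplot(volcano, col.regions = hgtcols, at = hgtbks,
                 main = list(label = "Orography", cex = 1.8),
                 xlab = NULL, ylab = NULL, scales = list(draw = FALSE),
                 par.settings = zpadding)   # the workaround goes here
print(rpl)
```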

R: Problems while plotting sampled values from a curve

I am trying to simulate a signal in order to apply some methods of non-linear fittings, but I have some problems when plotting it.
x<-sample(seq(0,1,length.out = 1000),200)
y<-2*sin(4*pi*x)-6*abs(x-0.4)^(0.3)+2*exp(-30*(4*x-2)^2)+8*x+rnorm(200,0,0.5)
s<-2*sin(4*pi*x)-6*abs(x-0.4)^(0.3)+2*exp(-30*(4*x-2)^2)+8*x
plot(x,y)
lines(x,s,col="red")
The idea is that I want 200 uniformly sampled observations with an additive white noise term, and then I would like to plot this "perturbed" signal together with the original signal (y and s respectively).
However, with the code above I get something like this:
It's probably a simple thing, but I'm kind of stuck with this.
Any hint or suggestion will be greatly appreciated.
Lines are plotted sequentially, and you decided to randomly draw your X values, so x values sitting next to each other in x are not next to each other on the axis - hence the mess. Just sort it:
x<-sort(sample(seq(0,1,length.out = 1000),200))
y<-2*sin(4*pi*x)-6*abs(x-0.4)^(0.3)+2*exp(-30*(4*x-2)^2)+8*x+rnorm(200,0,0.5)
s<-2*sin(4*pi*x)-6*abs(x-0.4)^(0.3)+2*exp(-30*(4*x-2)^2)+8*x
plot(x,y)
lines(x,s,col="red")
Another way to do this on the fly, mentioned by mickey, is:
ord = order(x)
lines(x[ord], s[ord], col = 'red')
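A compact variant of the same fix, as a sketch: the noise-free signal is wrapped in a function (signal is a name introduced here for illustration) so the formula is written only once, and the seed is arbitrary.

```r
# Noise-free signal from the question, written once
signal <- function(x) {
  2 * sin(4 * pi * x) - 6 * abs(x - 0.4)^0.3 +
    2 * exp(-30 * (4 * x - 2)^2) + 8 * x
}

set.seed(1)                                   # arbitrary seed, only for reproducibility
x <- sample(seq(0, 1, length.out = 1000), 200)
s <- signal(x)
y <- s + rnorm(length(x), mean = 0, sd = 0.5)

ord <- order(x)                               # permutation that sorts x ascending
plot(x, y)                                    # point order does not matter for a scatter plot
lines(x[ord], s[ord], col = "red")            # lines() joins points in the order given
```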
You need to sort the x observations in ascending order. You can do that by storing everything in a data frame and then ordering it:
x<-sample(seq(0,1,length.out = 1000),200)
df_p= data.frame(x)
df_p$y<-2*sin(4*pi*df_p$x)-6*abs(df_p$x-0.4)^(0.3)+2*exp(-30*(4*df_p$x-2)^2)+8*df_p$x+rnorm(200,0,0.5)
df_p$s<-2*sin(4*pi*df_p$x)-6*abs(df_p$x-0.4)^(0.3)+2*exp(-30*(4*df_p$x-2)^2)+8*df_p$x
df_p = df_p[order(df_p$x),]
plot(df_p$x,df_p$y)
lines(df_p$x, df_p$s,col="red")
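Alternatively, you can skip the sorting step by using the ggplot2 library, since geom_line connects observations in order of x:
library(ggplot2)
# colour is set outside aes() so the line is drawn in red without a spurious legend
p <- ggplot(df_p) + geom_point(aes(x = x, y = y)) + geom_line(aes(x = x, y = s), color = 'red')
plot(p)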

How to make multiple plots with a for loop?

I was experimenting with the waffle package in R and was trying to use a for loop to make multiple plots at once, but was not able to get my code to work. I have a dataset with values for renewables for each year, and since it covers over 40 years of data, I was looking for a simple way to plot these with a for loop rather than manually, year by year. What am I doing wrong?
I have it from 1:16 as an experiment to see if it would work, although in reality I would do it for all the years in my dataset.
for(i in 1:16){
renperc<-islren$Value[i]
parts <- c(`Renewable`=(renperc), `Non-Renewable`=100-renperc)
waffle(parts, rows=10, size=1, colors=c("#00CC00", "#A9A9A9"),
title="Iceland Primary Energy Supply",
xlab=islren$TIME)
}
If I understand your question correctly, you want to plot all 16 iterations in the same panel? You can divide your plot window into 16 smaller plots using par(mfrow = c(4,4)) (creating a 4 by 4 matrix and plotting into each cell in turn).
## Setting the graphical parameters
par(mfrow = c(4,4))
## Running the loop normally
for(i in 1:16){
renperc<-islren$Value[i]
parts <- c(`Renewable`=(renperc), `Non-Renewable`=100-renperc)
waffle(parts, rows=10, size=1, colors=c("#00CC00", "#A9A9A9"),
title="Iceland Primary Energy Supply",
xlab=islren$TIME)
}
If you need more plots (e.g. 40) you can increase the numbers in the graphical parameters (e.g. par(mfrow = c(6,7))) but that will create really tiny plots. One solution is to do it in multiple loops (for(i in 1:16); for(i in 17:32); etc.)
UPDATE: The code simply wasn't plotting anything when I tried putting in anything above one value (e.g. 1:16) or a letter, either as separate plots or as many plots in one plot window (which I think perhaps waffle does not support in the same way as regular plots). In the end, I managed by making it into a function, although I'm still not sure why my original method wouldn't work if this did. See the code that worked below. I also tweaked it a bit, adding ggsave for example.
#function
waffling <- function(x){
renperc<-islren$Value[x]
parts <- c(`Renewable`=(renperc), `Non-Renewable`=100-renperc)
waffle(parts, rows=10, size=1, colors=c("#00CC00", "#A9A9A9"), title="",
xlab=islren$TIME[x])
ggsave(file=paste0("plot_", x,".png"))}
for(i in 1:57){
waffling(i)
}
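A likely explanation for the behaviour described in the update: waffle() returns a ggplot2 object rather than drawing directly, and such objects are only drawn when printed. At the top level R prints them automatically, but inside a for loop nothing is auto-printed, so the plot has to be passed to print() explicitly. The function version produced files because ggsave() saves the most recently created plot whether or not it was displayed. par(mfrow = ...) only affects base graphics, so it does not arrange ggplot2 output. The sketch below shows the loop with an explicit print(); the islren data frame here is a made-up stand-in for the real one.

```r
library(ggplot2)
library(waffle)

# Stand-in for the poster's `islren` data frame (values are made up)
islren <- data.frame(TIME = 2001:2003, Value = c(70, 75, 80))

plots <- vector("list", nrow(islren))
for (i in seq_len(nrow(islren))) {
  renperc <- islren$Value[i]
  parts <- c(Renewable = renperc, `Non-Renewable` = 100 - renperc)
  p <- waffle(parts, rows = 10, size = 1,
              colors = c("#00CC00", "#A9A9A9"),
              title = "Iceland Primary Energy Supply",
              xlab = as.character(islren$TIME[i]))  # label each plot with its own year
  print(p)          # explicit print: nothing is auto-printed inside a loop
  plots[[i]] <- p   # keep the object, e.g. for ggsave(filename, plot = p)
}
```

Keeping the plots in a list also makes it possible to save or arrange them later without rerunning the loop.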

Stretch x-axis between two values

I have to plot several IR spectra. The x-axis of these plots has to be stretched between 2000 and 500. I've tried axis(side=1,at=c(4000,3500,2000,1500,1000,500)), but this does not produce the same distance between the labels. I've searched for nearly 2 hours but can't figure out how to achieve this.
Help would be appreciated.
Thanks in advance
I don't think that there's a particularly clean way to do this in base graphics. No doubt there's something in one of the many graphics packages that would do it, but here's my workaround for what I think you're trying to do.
#Some data to plot
x <- 0:4000
y <- sin(x/100)
#A function to do the stretching that you describe
stretcher <- function(x)
{
lower <- 500 ##lower end of expansion
upper <- 2000 ##upper end of expansion
stretchfactor <- 3 ##must be greater than 1, factor of expansion
x[x>upper] <- x[x>upper] + (stretchfactor-1) * (upper-lower)
x[x<=upper & x>lower] <- (x[x<=upper & x>lower] - lower) * stretchfactor + lower
x
}
#Create the plot
plot(stretcher(x),y,axes=FALSE)
labels <- c(4000,3500,3000,2500,2000,1500,1000,500)
box()
axis(2)
axis(1,labels=labels,at=stretcher(labels))
I'd also emphasise the breaks with something like:
abline(v=stretcher(2000),col='red',lty=2)
abline(v=stretcher(500),col='red',lty=2)
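A parameterised variant of the same transformation, as a sketch: stretch_axis is a name introduced here, and the reversed xlim is an assumption based on the usual convention of drawing IR spectra from high to low wavenumber.

```r
# Piecewise-linear stretch: values in (lower, upper] are expanded by `factor`,
# values above `upper` are shifted so the mapping stays continuous.
stretch_axis <- function(x, lower = 500, upper = 2000, factor = 3) {
  stopifnot(factor >= 1, lower < upper)
  ifelse(x > upper,
         x + (factor - 1) * (upper - lower),
         ifelse(x > lower, (x - lower) * factor + lower, x))
}

# Same toy data as above
x <- 0:4000
y <- sin(x / 100)
ticks <- c(4000, 3500, 3000, 2500, 2000, 1500, 1000, 500)

plot(stretch_axis(x), y, type = "l", axes = FALSE,
     xlim = rev(range(stretch_axis(x))),   # reversed axis (assumption, see above)
     xlab = "x", ylab = "y")
box()
axis(2)
axis(1, at = stretch_axis(ticks), labels = ticks)
abline(v = stretch_axis(c(500, 2000)), col = "red", lty = 2)
```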

Long vector-plot/Coverage plot in R

I really need your R skills here. I've been working on this plot for several days now. I'm an R newbie, so that might explain it.
I have sequence coverage data for chromosomes (basically a value for each position along the length of every chromosome, making the length of the vectors many millions). I want to make a nice coverage plot of my reads. This is what I got so far:
Looks alright, but I'm missing y-labels so I can tell which chromosome it is, and I've also been having trouble modifying the x-axis so that it ends where the coverage ends. Additionally, my own data is much, much bigger, so this plot in particular takes an extremely long time. That is why I tried HilbertVis plotLongVector. It works, but I can't figure out how to modify the x-axis and the labels, or how to make the y-axis logged, and the vectors all get the same length on the plot even though they are not equally long.
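# biocLite() has been retired; Bioconductor packages are now installed via BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("HilbertVis")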
library(HilbertVis)
chr1 <- abs(makeRandomTestData(len=1.3e+07))
chr2 <- abs(makeRandomTestData(len=1e+07))
par(mfcol=c(8, 1), mar=c(1, 1, 1, 1), ylog=T)
# 1st way of trying with some code I found on stackoverflow
# Chr1
plotCoverage <- function(chr1, start, end) { # Defines coverage plotting function.
plot.new()
plot.window(c(start, length(chr1)), c(0, 10))
axis(1, labels=F)
axis(4)
lines(start:end, log(chr1[start:end]), type="l")
}
plotCoverage(chr1, start=1, end=length(chr1)) # Plots coverage result.
# Chr2
plotCoverage <- function(chr2, start, end) { # Defines coverage plotting function.
plot.new()
plot.window(c(start, length(chr1)), c(0, 10))
axis(1, labels=F)
axis(4)
lines(start:end, log(chr2[start:end]), type="l")
}
plotCoverage(chr2, start=1, end=length(chr2)) # Plots coverage result.
# 2nd way of trying with plotLongVector
plotLongVector(chr1, bty="n", ylab="Chr1") # ylab doesn't work
plotLongVector(chr2, bty="n")
Then I have another vector called genes that are of special interest. They are about the same length as the chromosome-vectors but in my data they contain more zeroes than values.
genes_chr1 <- abs(makeRandomTestData(len=1.3e+07))
genes_chr2 <- abs(makeRandomTestData(len=1e+07))
I would like these gene vectors plotted as red dots under the chromosomes. Basically, if the vector has a value there (>0), it is presented as a dot (or line) under the long vector plot. I have no idea how to add this, but it seems fairly straightforward.
Please help me! Thank you so much.
DISCLAIMER: Please do not simply copy and paste this code and run it on every position of your chromosome. Please sample positions (for example, as @Gx1sptDTDa shows) and plot those. Otherwise you'd probably get a huge black filled rectangle after many, many hours, if your computer survives the drain.
Using ggplot2, this is easily achieved with geom_area. Here, I've generated some random data for three chromosomes with 100 positions each (300 rows in total), just to show an example. I hope you can build on this.
# construct a test data with 3 chromosomes and 100 positions
# and random coverage between 0 and 500
set.seed(45)
chr <- rep(paste0("chr", 1:3), each=100)
pos <- rep(1:100, 3)
cov <- sample(0:500, 300)
df <- data.frame(chr, pos, cov)
require(ggplot2)
p <- ggplot(data = df, aes(x=pos, y=cov)) + geom_area(aes(fill=chr))
p + facet_wrap(~ chr, ncol=1)
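One possible way to add the gene marks asked about in the question, as a sketch only: the gene column is made-up indicator data, and drawing the red points at y = -25 (just below the zero line) is an arbitrary choice.

```r
library(ggplot2)

set.seed(45)
df <- data.frame(chr = rep(paste0("chr", 1:3), each = 100),
                 pos = rep(1:100, 3),
                 cov = sample(0:500, 300))

# Hypothetical gene indicator: non-zero where a gene is present (made-up data)
df$gene <- rbinom(nrow(df), size = 1, prob = 0.1)
genes <- df[df$gene > 0, c("chr", "pos")]

p <- ggplot(df, aes(x = pos, y = cov)) +
  geom_area(aes(fill = chr)) +
  # one red mark per gene position, drawn just below zero
  geom_point(data = genes, aes(x = pos, y = -25), colour = "red", size = 0.8) +
  facet_wrap(~ chr, ncol = 1)
print(p)
```

Because genes keeps the chr column, each mark lands in the facet of its own chromosome.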
You could use the ggplot2 package.
I'm not sure what exactly you want, but here's what I did:
This has 7000 random data points (about double the amount of genes on Chromosome 1 in reality). I used alpha to show dense areas (not many here, as it's random data).
library(ggplot2)
Chr1_cov <- sample(1.3e+07,7000)
Chr1 <- data.frame(Cov=Chr1_cov,fil=1)
pl <- qplot(Cov,fil,data=Chr1,geom="pointrange",ymin=0,ymax=1.1,xlab="Chromosome 1",ylab="-",alpha=I(1/50))
print(pl)
And that's it. This ran in less than a second. ggplot2 has a humongous amount of settings, so just try some out. Use facets to create multiple graphs.
The code below computes a sort of moving average and then plots the output. It is not a real moving average, as a real moving average would have (almost) the same number of data points as the original and would only make the data smoother. This code, however, takes one average for every n points. It will of course run quite a bit faster, but you will lose a lot of detailed information.
VeryLongVector <- sample(500,1e+07,replace=TRUE)
movAv <- function(vector, n) {
  chops <- length(vector) %/% n          # number of complete blocks of n points
  pos <- numeric(chops)                  # block midpoints
  Cov <- numeric(chops)                  # block means
  for (i in seq_len(chops)) {
    idx <- ((i - 1) * n + 1):(i * n)     # indices of the i-th block (1-based, no overlap)
    pos[i] <- median(idx)
    Cov[i] <- mean(vector[idx])
  }
  data.frame(pos = pos, cov = Cov)
}
Chr1 <- movAv(VeryLongVector,10000)
qplot(pos,cov,data=Chr1,geom="line")
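For very long vectors the per-block loop can be replaced by a reshape plus colMeans. This is a sketch, not the code above: block_means is a name introduced here, it averages non-overlapping blocks of n points, and it drops any incomplete block at the end.

```r
# Average every n consecutive points without an explicit loop.
block_means <- function(v, n) {
  chops <- length(v) %/% n                       # number of complete blocks
  m <- matrix(v[seq_len(chops * n)], nrow = n)   # one block per column
  data.frame(pos = (seq_len(chops) - 0.5) * n + 0.5,  # midpoint index of each block
             cov = colMeans(m))
}

set.seed(1)                                      # arbitrary seed
VeryLongVector <- sample(500, 1e+06, replace = TRUE)
Chr1 <- block_means(VeryLongVector, 10000)
plot(Chr1$pos, Chr1$cov, type = "l", xlab = "pos", ylab = "cov")
```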

Resources