Sorting values for plotting in R

I have data that looks like this:
> print(dat)
cutoff tp fp
1 0.6 414 45701
2 0.7 172 16820
3 0.8 51 4326
4 0.9 49 3727
5 1.0 0 0
I want to plot them in reverse order, from the smallest dat$tp to the largest.
However, this code plots them in the order shown above (i.e. largest to smallest) instead.
> fp_max <- max(dat$fp);
> tp_max <- max(dat$tp);
> op <- par(xaxs = "i", yaxs = "i")
> plot(tp ~ fp, data = dat, xlim = c(0,fp_max),ylim = c(0,tp_max), type = "n")
> with(dat, lines(c(0, fp, fp_max), c(0, tp, tp_max), lty=1, type = "l", col = "black"))
> lines( par()$usr[1:2], par()$usr[3:4], col="red" )
How can I modify the code above to address the problem?
Of course, the x-axis and y-axis coordinates should run from the smallest to the largest value.
The following shows the result of my current code.
Notice that the line starts at (0,0) and then 'goes back' to 0 again.
I want to avoid it going back to 0.

Ahh, I understand.
It's because lines() draws line segments between the points in the order they are given.
There are a few ways you could get around this:
Use type='l' in your plot command; the with(dat,lines(...)) call is then not necessary:
# can also do the col='black',lty=1 in here.
plot(tp ~ fp, data = dat, xlim = c(0,fp_max),ylim = c(0,tp_max), type = "l")
Note that by definition of your fp_max and tp_max, you will include the point (fp_max,tp_max) already. And as long as you have a row with (0,0) for tp and fp in dat, you'll also get the (0,0) point.
Sort dat$fp and use the same ordering to reorder dat$tp too:
plot(tp ~ fp, ..., type='n')
# sort dat$fp, keeping the sort indices
obj <- sort(dat$fp, index.return=TRUE)
# use obj$x as the sorted fp and obj$ix to reorder dat$tp prior to plotting
with(dat,
lines(c(0, obj$x, fp_max), c(0, tp[obj$ix], tp_max),
lty=1, type = "l", col = "black"))

#Get order of rows
idx <- order(dat$tp)
#Select data in sorted order
sorted <- dat[idx,]
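To show how the reordered rows feed into the plot, here is a minimal sketch. It assumes dat is the data frame printed in the question (rebuilt by hand below), and the pdf(NULL) call is only an assumption so that the sketch runs without opening a window.

```r
# Rebuild the data frame from the question (values copied from print(dat))
dat <- data.frame(cutoff = c(0.6, 0.7, 0.8, 0.9, 1.0),
                  tp = c(414, 172, 51, 49, 0),
                  fp = c(45701, 16820, 4326, 3727, 0))

# Reorder the rows from smallest to largest tp
sorted <- dat[order(dat$tp), ]

fp_max <- max(sorted$fp)
tp_max <- max(sorted$tp)

pdf(NULL)  # assumption: null device; drop this line to plot on screen
op <- par(xaxs = "i", yaxs = "i")
plot(tp ~ fp, data = sorted, xlim = c(0, fp_max), ylim = c(0, tp_max), type = "n")
# The points are now in increasing order, so the line no longer doubles back
with(sorted, lines(fp, tp, lty = 1, col = "black"))
# Diagonal reference line across the plot region
lines(par("usr")[1:2], par("usr")[3:4], col = "red")
par(op)
invisible(dev.off())
```

Because the sorted data already contains (0,0) and (fp_max,tp_max), there is no need to pad the vectors with extra end points here.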

Related

Creating boxplot on log scale in R

I am trying to plot a boxplot in R, where the input file has multiple columns and each column has a different number of rows. With the help given at the following link:
boxplot of vectors with different length
I am trying:
x <- read.csv( 'filename.csv', header = T )
plot(
1, 1,
xlim=c(1,ncol(x)), ylim=range(x[-1,], na.rm=TRUE),
xaxt='n', xlab='', ylab=''
)
axis(1, labels=colnames(x), at=1:ncol(x))
for(i in 1:ncol(x)) {
p <- x[,i]
boxplot(p, add=T, at=i)
}
I am trying to plot the values on a log scale. But when I set log = "y", I get the following error:
Error in xypolygon(xx, yy, lty = "blank", col = boxfill[i]) :
plot.new has not been called yet
The following is a sample of my input CSV data:
A B C D
2345.42 932.19 40.8 26.19
138.48 1074.1 4405.62 4077.16
849.35 0.0 1451.66 1637.39
451.38 146.22 4579.6 5133.14
5749.01 7250.08 12.23 0.09
4125.48 129.46 49.51
440.38 6405.02
Your data as a reproducible example
Note I had to remove an extra element
library(data.table)
df <- fread("A,B,C,D
2345.42,932.19,40.8,26.19
138.48,1074.1,4405.62,4077.16
849.35,0.0,1451.66,1637.39
451.38,146.22,4579.6,5133.14
5749.01,7250.08,12.23,0.09
4125.48,129.46,49.51,440.38", sep=",", header=T)
dplyr and tidyr solution
library(dplyr)
library(tidyr)
df1 <- df %>%
replace(.==0,NA) %>% # make 0 into NA
gather(var,values,A:D) %>% # convert from wide (4-col) to long (2-col) format
mutate(values = log10(values)) # log10 transform
If you want log2, simply replace log10 with log2
Output
boxplot(values ~ var, df1)
A little extra
For a log10 scale, I like to add 1 to my values to eliminate negative results, since log10(x) is negative for 0 < x < 1. This sets the minimum value on your plot to 0, since 0 + 1 = 1 and log10(1) = 0.
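The effect of the +1 shift can be checked on a few numbers. This is only a sketch with made-up values (a zero and a value between 0 and 1 are included on purpose):

```r
# Example values, including a zero and a value between 0 and 1
values <- c(0, 0.09, 12.23, 932.19)

# Plain log10: zero becomes -Inf and 0.09 becomes negative
plain <- log10(values)

# Shifted log10: every result is >= 0 because values + 1 >= 1
shifted <- log10(values + 1)

print(plain)
print(shifted)
```

Keep in mind that the axis then shows log10(x + 1), not log10(x), so the labels should say so.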

Add a line to coplot {graphics}, classic approaches don't work

I found coplot {graphics} very useful for my plots. However, I would like to include not just one line but add another one. For base graphics I just need add = TRUE to add another line, or to use plot(..) and lines(..). For {lattice} I can save my plots as objects
a<-xyplot(..)
b<-xyplot(..)
and display them simply with a + as.layer(b). Neither of these approaches works for coplot(), apparently because assigning a <- coplot() doesn't produce a trellis object but NULL.
Please, any help on how to add a data line in coplot()? I really like its graphics, so I wish to keep it. Thank you!
My example data are here: http://ulozto.cz/xPfS1uRH/repr-exemple-csv
My code:
sub.tab<-read.csv("repr_exemple.csv", , header = T, sep = "")
attach(sub.tab)
cells.f<-factor(cells, levels=c(2, 25, 100, 250, 500), # unique(cells.in.cluster)???
labels=c("size2", "size25", "size100", "size250", "size500"))
perc.f<-factor(perc, levels=c(5, 10), # unique(cells.in.cluster)???
labels=c("perc5", "perc10"))
# how to put these plots together?
a<- coplot(max_dist ~ time |cells.f + perc.f, data = sub.tab,
xlab = "ticks", type = "l", col = "black", lwd = 1)
b<- coplot(mean_dist ~ time |cells.f * perc.f, data = sub.tab,
xlab = "ticks", type = "l", col = "grey", lwd = 1)
a + as.layer(b) # this doesn't work
Please, how can I merge these two plots (grey and black lines)? I couldn't figure it out. Thank you!
Linking to sample data isn't really that helpful. Here's a randomly created sample data set:
set.seed(15)
dd <- do.call("rbind",
do.call("Map", c(list(function(a,b) {
cbind.data.frame(a,b, x=1:5,
y1=cumsum(rpois(5,7)),
y2=cumsum(rpois(5,9)))
}),
expand.grid(a=letters[1:5], b=letters[20:22])))
)
head(dd)
# a b x y1 y2
# 1 a t 1 8 16
# 2 a t 2 13 28
# 3 a t 3 25 35
# 4 a t 4 33 45
# 5 a t 5 39 57
# 6 b t 1 4 12
I will note that coplot() is a base graphics function, not lattice. But it does have a panel= parameter, and you can have coplot() take care of subsetting your data for you (well, calculating the indexes at least). Like other base graphics functions, though, plotting different groups isn't exactly trivial. You can do it in this case with
coplot(y~x|a+b,
# make a fake y col to cover range of all y1 and y2 values
cbind(dd, y=seq(min(dd$y1, dd$y2), max(dd$y1, dd$y2), length.out=nrow(dd))),
#request subscripts to be sent to panel function
subscripts=TRUE,
panel=function(x,y,subscripts, ...) {
# draw group 1
lines(x, dd$y1[subscripts])
# draw group 2
lines(x, dd$y2[subscripts], col="red")
})
This gives

Plotting raster images using custom colours in R

This might sound like a strange process, but it's the best I can think of to control rasterised colour gradients with respect to discrete objects (points, lines, polygons). I'm 95% there but can't quite plot correctly.
This should illustrate proof of concept:
require(raster)
r = matrix(56:255, ncol=20) # reds
b = t(matrix(56:255, ncol=10)) # blues
col = matrix(rgb(r, 0, b, max=255), ncol=20) # matrix of colour strings
ras = raster(r) # data raster object
extent(ras) = extent(1,200,1,100) # set extent for aspect
plot(ras, col = col, axes=F, asp=T) # overwrite data with custom colours
Here I want to clip a raster to a triangle and create a colour gradient of the pixels inside based on their distances to one of the sides. Sorry for the length, but it's the most minimal example I can design.
require(raster); require(reshape2); require(rgeos)
# equilateral triangle
t_s = 100 # half side
t_h = floor(tan(pi*60/180) * t_s) # height
corners = cbind(c(0, -t_s, t_s, 0), c(t_h, 0, 0, t_h))
trig = SpatialPolygons(list(Polygons(list(Polygon(corners)),"triangle")))
# line to measure pixel distances to
redline = SpatialLines(list(Lines(Line(corners[1:2,]), ID='redline')))
plot(trig); plot(redline, add=T, col='red', lwd=3)
# create a blank raster and clip to triangle
r = raster(mat.or.vec(nc = t_s*2 + 1, nr = t_h))
extent(r) = extent(-t_s, t_s, 0, t_h)
r = mask(r, trig)
image(r, asp=T)
# extract cell coordinates into d.f.
cells = as.data.frame(coordinates(rasterToPoints(r, spatial=T)))
# calculate distance of each pixel to redline with apply
dist_to_line = function(xy, line){
point = readWKT(paste('POINT(', xy[1], xy[2], ')'))
gDistance(point, line) / t_h
}
cells$dists = apply(cells, 1, dist_to_line, line=redline)
cells$cols = rgb(1 - cells$dists, 0, 0)
length(unique(cells$cols)) # count unique colours
# use custom colours to colour triangle pixels
image(r, col = cells$cols, asp=T)
plot(r, col = cells$cols, asp=T)
As you can see the plotting fails to overwrite as in the first example, but the data seems fine. Trying to convert to matrix also fails:
# try converting colours to matrix
col_ras = acast(cells, y~x, value.var='cols')
col_ras = apply(col_ras, 1, rev) # rotate acw to match r
plot(r, col = col_ras, asp=T)
Very grateful for any assistance on what's going wrong.
Edit:
To show Spacedman's plotRGB method:
b = brick(draster, 1-draster, 1-draster)
plotRGB(b, scale=1)
plot(trig, col=NA, border='white', lwd=5, add=T)
Easy way is to go from your points to a spatial pixels data frame to a raster, then do the colour mapping...
Start with:
> head(cells)
x y dists
1 0.0000000 172.5 0.0014463709
2 0.0000000 171.5 0.0043391128
3 -0.9950249 170.5 0.0022523089
4 0.0000000 170.5 0.0072318546
5 0.9950249 170.5 0.0122114004
convert:
> coordinates(cells)=~x+y
> draster = raster(as(cells,"SpatialPixelsDataFrame"))
colourise:
> cols=draster
> cols[!is.na(draster)]= rgb(1-draster[!is.na(draster)],0,0)
> plot(cols, col=cols)
I'm not sure this is the right way to do things, though; you might be better off creating an RGB raster stack and using plotRGB if you want fine colour control.
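For comparison, here is a sketch of per-cell colour control that does not use the raster package at all. It swaps in base graphics (as.raster() and rasterImage()), so it is an alternative and not the plotRGB route mentioned above. It reuses the red/blue gradient from the question's first example, and pdf(NULL) is only an assumption so that it runs without a window.

```r
# Red and blue channel matrices, both 10 rows x 20 columns
r <- matrix(56:255, ncol = 20)
b <- t(matrix(56:255, ncol = 10))

# One colour string per cell, kept in matrix form
cols <- matrix(rgb(r, 0, b, maxColorValue = 255), nrow = nrow(r))

pdf(NULL)  # assumption: null device; drop this line to plot on screen
plot(c(0, ncol(cols)), c(0, nrow(cols)), type = "n", asp = 1,
     xlab = "", ylab = "", axes = FALSE)
# as.raster() keeps each cell's colour exactly as given; row 1 is drawn at the top
rasterImage(as.raster(cols), 0, 0, ncol(cols), nrow(cols), interpolate = FALSE)
invisible(dev.off())
```

Each cell is painted with the colour string stored for it, so no colour ramp or value-to-colour mapping is involved.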

Print frequencies (as numbers) in plot

In R, I would like to insert frequencies (as numbers) in a plot:
My code to create the plot:
par(mar=c(4.5,4.5,9.5,4), xpd=TRUE)
plot(factor(ArtMehrspr)~Mehrspr_Vielf, data=datProjektMehr, col=terrain.colors(4),
bty='L', main="Vielfalt nutzen")
legend("topright", inset=c(0,-.225), title="Art der Mehrsprachigkeit", levels(factor(datProjektMehr$ArtMehrspr)),
fill=terrain.colors(4), horiz=TRUE)
par(mar=c(5,4,4,2)+0.1)
In the plot, 2 columns of my dataframe are depicted: ArtMehrspr and Mehrspr_Vielf.
Now what I would like to know is, how many "Kombi" are in category "1", how many "Paral" are in category "1" and so on, and then to print this number in the plot, so that in every box of the plot, I can see the corresponding number of observations. R must know these numbers, otherwise it could not vary the height of the different boxes according to the number of observations. So it cannot be that hard to get these numbers into the plot, can it?
With the command table(), I can get these numbers, but I would have to have 5 table()-commands to get all the numbers. Example for category = 1:
> table(subset(datProjektMehr, Mehrspr_Vielf=="1")$ArtMehrspr)
einspr Kombi Paral Versc Wechs
0 1 9 2 1
Apparently, you can achieve what I am looking for by adding the command labels = TRUE. But it does not work:
par(mar=c(4.5,4.5,9.5,4), xpd=TRUE, labels = TRUE)
plot(factor(ArtMehrspr)~Mehrspr_Vielf, data=datProjektMehr, col=terrain.colors(4),
bty='L', main="Vielfalt nutzen")
legend("topright", inset=c(0,-.225), title="Art der Mehrsprachigkeit", levels(factor(datProjektMehr$ArtMehrspr)),
fill=terrain.colors(4), horiz=TRUE)
par(mar=c(5,4,4,2)+0.1)
R gives me the following warning message:
Warning message:
In par(mar = c(4.5, 4.5, 9.5, 4), xpd = TRUE, labels = TRUE) :
"labels" is not a graphical parameter
Is this not the right command? Does anyone know how to do this?
First of all, the warning tells you that labels is not an argument you can use inside par().
Regarding the plotting of the table output, I'm not aware of an easy way of doing this, but I managed a pretty UNreliable and, maybe, inefficient piece of code. On my machine, though, it works every time I run it.
The idea is to use text() to write all the values from your table inside the plot. To do so, the coordinates along the x and y axes had to be estimated. I prefer the term "estimated" instead of "calculated" because I didn't find a way to compute absolute values for the coordinates, due to the fact that the plot method was plot.factor.
So:
#random data. DF = datProjektMehr, artmehr = ArtMehrspr, mehrviel = Mehrspr_Vielf
DF <- data.frame(artmehr = sample(letters[1:4], 20, T), mehrviel = as.factor(sample(1:5, 20, T)))
#your code of plotting
par(mar = c(4.5,4.5,9.5,4), xpd = TRUE)
plot(factor(artmehr) ~ mehrviel, data = DF, col = terrain.colors(4),
bty = 'L', main = "Vielfalt nutzen")
legend("topright", inset=c(0,-.225), title="Art der Mehrsprachigkeit", levels(factor(DF$artmehr)),
fill=terrain.colors(4), horiz=TRUE)
#no need to "table()" many times
tab = table(DF$artmehr, DF$mehrviel)
#maximum value of x axis (at least on my machine)
#I found -through trial and error- that for a factor of n levels, x.max = 1 + (n-1)*0.02
x.max = 1 + (length(levels(DF$mehrviel)) - 1) * 0.02
#coordinates of "mehrviel" (as I named it)
mehrviel.coords = ((cumsum(apply(tab, 2, sum)) / sum(tab)) * x.max) - ((apply(tab, 2, sum) / sum(tab)) / 2)
#coordinates of "artmehr" (as I named it)
artmehr.coords <- apply(tab, 2, function(x) { cumsum(x / sum(x)) })
artmehr.coords <- apply(artmehr.coords, 2, function(x) { x - c(x[1]/2, diff(x)/2) })
#"text" the values in your table
#don't plot "0"s
for(i in 1:ncol(artmehr.coords))
{
text(x = mehrviel.coords[i], y = artmehr.coords[,i], labels = ifelse(tab[,i] != 0, tab[,i], ""), cex = 2)
}
The values of table:
tab
1 2 3 4 5
a 1 1 0 1 0
b 0 0 2 1 2
c 1 1 2 1 0
d 2 0 0 3 2
The plot:
EDIT: 1) "Tidied" the answer. 2) Added an extra level to the factor plotted on the x axis to match your data exactly. 3) Placed the frequencies in the middle of each box.
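The y coordinates used above are just the midpoints of the stacked proportions in each column. A small sketch on a hypothetical 2 x 2 table (the counts are made up) shows the arithmetic on its own:

```r
# Hypothetical table of counts, filled column by column:
# column "1" holds (1, 3) and column "2" holds (2, 2)
tab <- matrix(c(1, 3, 2, 2), nrow = 2,
              dimnames = list(c("a", "b"), c("1", "2")))

# Upper edge of each box, as a cumulative proportion within each column
upper <- apply(tab, 2, function(x) cumsum(x / sum(x)))

# Midpoint of each box: upper edge minus half the box height
mids <- apply(upper, 2, function(x) x - c(x[1], diff(x)) / 2)

print(mids)
```

In column "1" the boxes cover 0 to 0.25 and 0.25 to 1, so the labels land at 0.125 and 0.625.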

Generate multiple serial graphs/scatterplots from data in two dataframes

I have 2 dataframes, Tg and Pf, each of 127 columns. All columns have at least one row and can have up to thousands of them. All the values are between 0 and 1 and there are some missing values (empty cells). Here is a little subset:
Tg
Tg1 Tg2 Tg3 ... Tg127
0.9 0.5 0.4 0
0.9 0.3 0.6 0
0.4 0.6 0.6 0.3
0.1 0.7 0.6 0.4
0.1 0.8
0.3 0.9
0.9
0.6
0.1
Pf
Pf1 Pf2 Pf3 ...Pf127
0.9 0.5 0.4 1
0.9 0.3 0.6 0.8
0.6 0.6 0.6 0.7
0.4 0.7 0.6 0.5
0.1 0.6 0.5
0.3
0.3
0.3
Note that some cells are empty, and the vectors for the same subset (i.e. 1 to 127) can differ greatly in length and are rarely exactly the same length.
I want to generate 127 graphs as follows for the 127 vectors (i.e. graph 1 is for column 1 from each dataframe, graph 2 is for column 2 from each dataframe, etc.):
Hope that makes sense. I'm looking forward to your assistance as I don't want to make those graphs one by one...
Thanks!
Here is an example to get you started (data at https://gist.github.com/1349300). For further tweaking, check out the excellent ggplot2 documentation that is all over the web.
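library(ggplot2)
library(reshape2) # provides melt()
library(plyr)     # provides ddply(), summarize() and join(), used further below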
# Load data
Tg = read.table('Tg.txt', header=T, fill=T, sep=' ')
Pf = read.table('Pf.txt', header=T, fill=T, sep=' ')
# Format data
Tg$x = as.numeric(rownames(Tg))
Tg = melt(Tg, id.vars='x')
Tg$source = 'Tg'
Tg$variable = factor(as.numeric(gsub('Tg(.+)', '\\1', Tg$variable)))
Pf$x = as.numeric(rownames(Pf))
Pf = melt(Pf, id.vars='x')
Pf$source = 'Pf'
Pf$variable = factor(as.numeric(gsub('Pf(.+)', '\\1', Pf$variable)))
# Stack data
data = rbind(Tg, Pf)
# Plot
dev.new(width=5, height=4)
p = ggplot(data=data, aes(x=x)) + geom_line(aes(y=value, group=source, color=source)) + facet_wrap(~variable)
p
Highlighting the area between the lines
First, interpolate the data onto a finer grid. This way the ribbon will follow the actual envelope of the lines, rather than just where the original data points were located.
data = ddply(data, c('variable', 'source'), function(x) data.frame(approx(x$x, x$value, xout=seq(min(x$x), max(x$x), length.out=100))))
names(data)[4] = 'value'
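What approx() does here can be seen on a tiny series. This is a sketch with made-up points, interpolating three values onto five evenly spaced positions:

```r
# Coarse series: three points forming a peak
x <- 1:3
y <- c(0, 1, 0)

# Linearly interpolate onto five evenly spaced x positions
fine <- approx(x, y, xout = seq(min(x), max(x), length.out = 5))

print(data.frame(fine))
```

approx() returns a list with components x and y, which is why the answer wraps it in data.frame() and then renames the interpolated column back to value.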
Next, calculate the data needed for geom_ribbon - namely ymax and ymin.
ribbon.data = ddply(data, c('variable', 'x'), summarize, ymin=min(value), ymax=max(value))
Now it is time to plot. Notice how we've added a new ribbon layer, for which we've substituted our new ribbon.data frame.
dev.new(width=5, height=4)
p + geom_ribbon(aes(ymin=ymin, ymax=ymax), alpha=0.3, data=ribbon.data)
Dynamic coloring between the lines
The trickiest variation is if you want the coloring to vary based on the data. For that, you currently must create a new grouping variable to identify the different segments. Here, for example, we might use a function that indicates when the "Tg" group is on top:
GetSegs <- function(x) {
segs = x[x$source=='Tg', ]$value > x[x$source=='Pf', ]$value
segs.rle = rle(segs)
on.top = ifelse(segs, 'Tg', 'Pf')
on.top[is.na(on.top)] = 'Tg'
group = rep.int(1:length(segs.rle$lengths), times=segs.rle$lengths)
group[is.na(segs)] = NA
data.frame(x=unique(x$x), group, on.top)
}
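The run-length step inside GetSegs is easier to follow in isolation. The sketch below uses a made-up logical vector standing in for the Tg > Pf comparison:

```r
# Hypothetical comparison result: is Tg above Pf at each x position?
segs <- c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE)

# rle() compresses the vector into runs of equal values
runs <- rle(segs)

# Give every run its own id, repeated once per element of the run
group <- rep(seq_along(runs$lengths), times = runs$lengths)

print(group)
```

Each stretch where the same series stays on top gets its own group id, which is what lets the ribbon be filled segment by segment.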
Now we apply it and merge the results back with our original ribbon data.
groups = ddply(data, 'variable', GetSegs)
ribbon.data = join(ribbon.data, groups)
For the plot, the key is that we now specify a grouping aesthetic to the ribbon geom.
dev.new(width=5, height=4)
p + geom_ribbon(aes(ymin=ymin, ymax=ymax, group=group, fill=on.top), alpha=0.3, data=ribbon.data)
Code is available together at: https://gist.github.com/1349300
Here is a three-liner to do the same :-). We first use reshape from base R to convert the data into long form. Then it is melted to suit ggplot2. Finally, we generate the plot!
mydf <- reshape(cbind(Tg, Pf), varying = 1:8, direction = 'long', sep = "")
mydf_m <- melt(mydf, id.var = c(1, 4), variable = 'source')
qplot(id, value, colour = source, data = mydf_m, geom = 'line') +
facet_wrap(~ time, ncol = 2)
NOTE. The reshape function in base R is extremely powerful, albeit very confusing to use. It is used to transform data between long and wide formats.
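As a small illustration of that wide-to-long step, here is a sketch on a made-up data frame with two Tg and two Pf columns (the values are hypothetical):

```r
# Hypothetical wide data: two Tg columns and two Pf columns
wide <- data.frame(Tg1 = c(0.9, 0.4), Tg2 = c(0.5, 0.6),
                   Pf1 = c(0.9, 0.6), Pf2 = c(0.5, 0.7))

# sep = "" lets reshape() split names such as "Tg1" into "Tg" and 1
long <- reshape(wide, varying = 1:4, direction = "long", sep = "")

print(long)
```

The result has one row per original row and column number, with a time column holding the number taken from the column name and an id column identifying the original row.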
Kudos for automating something you used to do in Excel using R! That's exactly how I got started with R and a common path to R enlightenment :)
All you really need is a little looping. Here's an example, most of which is creating example data that represents your data structure:
## create some example data
Tg <- data.frame(Tg1 = rnorm(10))
for (i in 2:10) {
vec <- rep(NA, 8)
vec <- c(rnorm(sample(5:10,1)), vec)
Tg[paste("Tg", i, sep="")] <- vec[1:10]
}
Pf <- data.frame(Pf1 = rnorm(10))
for (i in 2:10) {
vec <- rep(NA, 8)
vec <- c(rnorm(sample(5:10,1)), vec)
Pf[paste("Pf", i, sep="")] <- vec[1:10]
}
## ok, sample data created
## now let's loop through all the columns
## if you didn't know how many columns there are you could
## use ncol(Tg) to figure out
for (i in 1:10) {
plot(1:10, Tg[,i], type = "l", col="blue", lwd=5, ylim=c(-3,3),
xlim=c(1, max(length(na.omit(Tg[,i])), length(na.omit(Pf[,i])))))
lines(1:10, Pf[,i], type = "l", col="red", lwd=5) # ylim is set by plot(); lines() does not take it
dev.copy(png, paste('rplot', i, '.png', sep=""))
dev.off()
}
This will result in 10 graphs in your working directory that look like the following:
