In my problem there are subregions of a larger region that can be classified as positive or negative. I have several files with different classifications, in the following format:
start | end
10 | 20
60 | 120
178 | 220
They are sorted and contain only the positive subregions; the rest are assumed negative.
I would like to represent this data in a 2D graphic in R, but I don't know what type of graph I should use. It's something like this:
http://i.imgur.com/VaSvEKr.jpg
That kind of chart is called a Gantt chart. Here's one possible way to draw it in base R:
# input example
DF <-
  read.csv(text=
'"file","start","end"
"file1",10,20
"file1",60,120
"file1",178,220
"file2",10,20
"file2",25,100
"file2",130,140
"file2",190,210
"file3",0,50
"file3",55,400', stringsAsFactors=FALSE)
minval <- min(DF$start) # or different if you know the limits
maxval <- max(DF$end)   # or different if you know the limits
files <- rev(unique(DF$file))
nfiles <- length(files)
# empty plot to make space for everything
filehigh <- 1.0
plot(c(minval,maxval), c(filehigh/2,nfiles+filehigh/2), type='n',
     xlab='Time', ylab=NA, yaxt='n')
# add y labels
axis(side=2, at=1:nfiles, labels=files, las=1)
# plot the rectangles
negcolor <- 'red'
poscolor <- 'green'
for(i in 1:nfiles){
  file <- files[i]
  subDF <- DF[DF$file == file,]
  # vertical extent of this file's row
  yTop <- i+(filehigh/2)
  yBottom <- i-(filehigh/2)
  lastend <- minval
  for(r in 1:nrow(subDF)){
    start <- subDF[r,'start']
    end <- subDF[r,'end']
    # negative gap before this positive subregion
    if(start > lastend){
      rect(lastend, yBottom, start, yTop, col=negcolor)
    }
    # the positive subregion itself
    rect(start, yBottom, end, yTop, col=poscolor)
    lastend <- end
  }
  # trailing negative region up to maxval
  if(lastend < maxval){
    rect(lastend, yBottom, maxval, yTop, col=negcolor)
  }
}
Result: (image not available)
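The core of the loop above is working out the negative gaps between the sorted positive subregions. That step can be isolated as a small function; negative_gaps() is a hypothetical name, and it assumes the intervals of one file are sorted and non-overlapping, as stated in the question:

```r
# Hypothetical helper: given sorted, non-overlapping positive intervals,
# return the negative gaps between minval and maxval.
negative_gaps <- function(start, end, minval, maxval) {
  gap_start <- c(minval, end)    # a gap can begin at minval or after each positive end
  gap_end   <- c(start, maxval)  # ... and ends at the next positive start or at maxval
  keep <- gap_end > gap_start    # drop zero-length gaps (e.g. file3 starting at 0)
  data.frame(start = gap_start[keep], end = gap_end[keep])
}

# file1 from the example, on a 0..400 axis
negative_gaps(c(10, 60, 178), c(20, 120, 220), minval = 0, maxval = 400)
# gaps: 0-10, 20-60, 120-178, 220-400
```

Because rect() accepts vectors, the positive and negative intervals of each file could then be drawn with one rect() call each instead of inside the inner loop.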
I would like to remove the pixels that form a large cluster and keep only the small clusters for analysis (i.e., get the number of pixels and their locations). First I apply a filter that colors white all pixels with a value lower than 0.66. Then I use the function clump() in R. The model works, but I cannot remove only the large cluster. I do not understand how the clump function works.
Initial image: (not available here; the code below reads it from https://i.stack.imgur.com/2CjCh.jpg)
Results image (not available): plot_r is the image in which pixels with value < 0.66 are set to 0, and plot_rc is the result after the clump() function. As you can see, I cannot remove only the large cluster of pixels (at the top of plot_r). I changed the threshold (700 in the code), but it did not help. How can I do this?
Here is the code:
library(magick)
library(pixmap)
library(raster)
library(igraph)
f <- "https://i.stack.imgur.com/2CjCh.jpg"
x <- image_read(f)
x <- image_convert(x, format = "pgm", depth = 8)
# Save the PGM file
f <- tempfile(fileext = ".pgm")
image_write(x, path = f, format = "pgm")
# Read in the PGM file
picture <- read.pnm(file = f, cellres = 1)
str(picture)
picture@size
mat <- picture@grey
mat[mat<0.66] <- 0; x
##############################################################
##Remove clumps of pixels in R using package Raster and igraph
#Detect clumps (patches) of connected cells
r <-raster(mat)
rc <- clump(r)
#extract IDs of clumps according to some criteria
clump9 = data.frame(freq(rc))
#remove clump observations with frequency smaller/larger than N
clump9 = clump9[ ! clump9$count > 700, ]
# record IDs from clumps which met the criteria in previous step
clump9 = as.vector(clump9$value)
#replace cells with IDs which do not belong to the group of interest
rc[rc != clump9[1] & rc != clump9[2]] = NA
# converting rasterlayer to matrix
n <- as.matrix(r)
m <- as.matrix(rc)
Perhaps something like this
library(raster)
library(igraph)
This shortcuts your approach a bit:
f <- "https://i.stack.imgur.com/2CjCh.jpg"
b <- brick(f)
x <- sum(b)
r <- x > 450
rc <- clump(r)
f <- freq(rc, useNA="no")
Replace each clump with the number of cells it consists of, set the large ones (here, more than 100 cells) to NA, and use the result to mask the original raster:
rs <- subs(rc, data.frame(f))
rsc <- reclassify(rs, cbind(100,Inf,NA))
m <- mask(b, rsc)
plotRGB(m)
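The subs()/reclassify() step replaces every clump ID with its cell count and then blanks out the big clumps. The same idea can be shown in base R on a small, made-up clump-ID matrix; lab and drop_large_clumps() are hypothetical, and NA marks background as in clump() output. This is a sketch of the logic, not how raster implements it. The last line lists the row/column positions of the remaining cells, which is what the question wanted to analyse:

```r
# Hypothetical clump-ID matrix, as clump() might produce (NA = background)
lab <- matrix(c( 1,  1, NA,  2,
                 1,  1, NA, NA,
                NA, NA,  3,  3), nrow = 3, byrow = TRUE)

# Set every clump with more than max_cells cells to NA, keep the rest
drop_large_clumps <- function(lab, max_cells) {
  counts <- table(lab)                                   # cells per clump ID (NA ignored)
  size <- as.vector(counts[match(lab, as.numeric(names(counts)))])  # like subs()
  out <- lab
  out[!is.na(size) & size > max_cells] <- NA             # like reclassify(rs, cbind(max_cells, Inf, NA))
  out
}

small <- drop_large_clumps(lab, max_cells = 3)
small
# row/column positions of the cells that remain
which(!is.na(small), arr.ind = TRUE)
```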
This is more a question to see if anyone has seen anything like this in their travels. I am working with a lot of weather data and I would like to plot wind based on wind barbs.
I have looked into the RadioSonde package; however, its plotwind() function does not do what I had anticipated. It does have a good example of the type of data: data(ExampleSonde).
Arguably I can use TeachingDemos in conjunction with my.symbols() to create these wind barbs. I was just curious if anyone has found (or created) a way to plot wind barbs. Otherwise my.symbols() it is.
Thanks,
Badger
Another way is to create the wind barbs using grid graphics.
The first step is to calculate how many barbs, and of what type, are needed. As described here, I created three types, representing 50, 10, and 5 knots; I round the speed down to the nearest five.
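The counting step can be checked on its own. A minimal sketch; barb_counts() is a hypothetical helper, separate from wind_barb, that rounds the speed down to the nearest five and splits it into 50-knot pennants, 10-knot long barbs and 5-knot short barbs:

```r
# Hypothetical helper: how many of each barb symbol a wind speed (knots) needs
barb_counts <- function(speed) {
  s <- 5 * floor(speed / 5)          # round down to the nearest five
  pennants <- s %/% 50               # triangles, 50 kn each
  long     <- (s %% 50) %/% 10       # long lines, 10 kn each
  short    <- (s %% 10) %/% 5        # short lines, 5 kn each
  data.frame(speed, pennants, long, short)
}

# the speeds used in the example below
barb_counts(c(10, 15, 50, 75, 100))
```

For example, 75 knots gives one pennant, two long barbs and one short barb.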
The wind_barb function below generates a new grob for each wind speed it is given. Using an idea from Murrell, "Integrating Grid Graphics Output with Base Graphics Output" (p. 4), you can plot the grobs easily and represent the wind direction by rotating the viewport.
An example
Create some data
set.seed(1)
dat <- data.frame(x=-2:2, y=-2:2,
direction=sample(0:360, 5),
speed=c(10, 15, 50, 75, 100))
# x y direction speed
# 1 -2 -2 95 10
# 2 -1 -1 133 15
# 3 0 0 205 50
# 4 1 1 325 75
# 5 2 2 72 100
Plot
library(gridBase)
library(grid)
with(dat, plot(x, y, ylim=c(-3, 3), xlim=c(-3, 3), pch=16))
vps <- baseViewports()
pushViewport(vps$inner, vps$figure, vps$plot)
# Plot
for (i in 1:nrow(dat)) {
  pushViewport(viewport(
    x=unit(dat$x[i], "native"),
    y=unit(dat$y[i], "native"),
    angle=dat$direction[i]))
  wind_barb(dat$speed[i])
  popViewport()
}
popViewport(3)
Which produces: (image not available)
The wind_barb function that creates the barbs (please simplify me!). You can change the height and width of the barb with the mlength and wblength arguments, respectively.
wind_barb <- function(x, mlength=0.1, wblength=0.025) {
  # Calculate which / how many barbs
  # any triangles (50)
  fif <- floor(x /50)
  # and then look for longer lines for remaining speed (10)
  tn <- floor( (x - fif* 50)/10)
  # and then look for shorter lines for remaining speed (5)
  fv <- floor( (x - fif* 50 - tn* 10)/5)
  # Spacing & barb length
  yadj <- 0.5+mlength
  dist <- (yadj-0.5) / 10
  xadj <- 0.5+wblength
  xfadj <- 0.5+wblength/2
  # Create grobs
  main_grob <- linesGrob(0.5, c(0.5, yadj ))
  # 50 windspeed
  if(fif != 0) {
    fify <- c(yadj, yadj-dist*seq_len(2* fif) )
    fifx <- c(0.5, xadj)[rep(1:2, length=length(fify))]
    fif_grob <- pathGrob(fifx, fify, gp=gpar(fill="black"))
  } else {
    fif_grob <- NULL
    fify <- yadj+dist
  }
  # Ten windspeed
  if(tn != 0) {
    tny <- lapply(seq_len(tn) , function(x) min(fify) - dist*c(x, x-1))
    tn_grob <- do.call(gList,
      mapply(function(x,y)
        linesGrob(x=x, y=y, gp=gpar(fill="black")),
        x=list(c(0.5, xadj)), y=tny, SIMPLIFY=FALSE))
  } else {
    tn_grob <- NULL
    tny <- fify
  }
  # Five windspeed
  if(fv != 0) {
    fvy <- lapply(seq_len(fv) , function(x) min(unlist(tny)) -dist* c(x, x-0.5))
    fv_grob <- do.call(gList,
      mapply(function(x,y)
        linesGrob(x=x, y=y, gp=gpar(fill="black")),
        x=list(c(0.5, xfadj)), y=fvy, SIMPLIFY=FALSE))
  } else {
    fv_grob <- NULL
  }
  # Draw
  #grid.newpage()
  grid.draw(gList(main_grob, fif_grob, tn_grob, fv_grob))
}
-------------------------------------
Comment from sezen below:
The plotted wind direction is wrong. To get the correct meteorological wind direction, use angle = 360 - dat$direction[i]. See http://tornado.sfsu.edu/geosciences/classes/m430/Wind/WindDirection.html
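That correction can be wrapped in a tiny helper; met_to_grid_angle() is a hypothetical name. It assumes, as in wind_barb, that the unrotated barb shaft points straight up (north), and relies on grid's viewport angle being measured counter-clockwise in degrees, while meteorological direction is measured clockwise from north:

```r
# Hypothetical helper: meteorological direction (clockwise from north)
# to a grid viewport rotation angle (counter-clockwise), in [0, 360)
met_to_grid_angle <- function(direction) (360 - direction) %% 360

met_to_grid_angle(c(0, 90, 205, 360))
```

Inside the plotting loop it would be used as angle = met_to_grid_angle(dat$direction[i]); the %% 360 only keeps the result in the usual range, since a rotation of 360 degrees is the same as 0.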
I have a data frame of numbers (genetics data) on different chromosomes, and the chromosome can be treated as a factor to split the numbers by. It looks like this (with adjacent columns containing sample info for each position):
awk '{print $2 "\t" $3}' log_values | head
Chr Start Sample1 Sample2
1 102447376 0.46957632 0.38415043
1 102447536 0.49194950 0.30094824
1 102447366 0.49874880 -0.17675325
2 102447366 -0.01910729 0.20264680
1 108332063 -0.03295081 0.07738970
1 109472445 0.02216355 -0.02495788
What I want to do is make a series of plots using values from other columns in this file. Instead of plotting one for each row (which would represent the results in a different region and/or a different sample), I want to draw plots covering ranges where values in the Start column are close enough to each other. To start, I would like a plot to be made if there are three values in the Start column within, say, 1000 of each other. That is, within 1000 from A to B and from B to C, inclusive: A to B <= 1000 and B to C <= 1000, but A to C does not have to be <= 1000. In the code below, this 1000 is "CNV_size". The "flanking_size" variable just zooms the plot out a bit to give it some context.
Looking at the sample values, rows 1, 2 and 3 would be highlighted as one plot for Sample1. These sample values are log2 ratios, so I only want to plot the significant ones, which I define as above 0.4 or below -0.6. This means that the same three rows would not yield a plot for Sample2.
The fourth row would not be included because its Chr value (factor) is different. I want a separate plot for each sample column, showing the values only in the rows that meet this condition. So I can have more than one plot per sample, but each set of regions that meets this criterion will be plotted in all samples. If this doesn't make sense, perhaps my ineffective attempt below will help explain what I'm waffling about.
pdf("All_CNVs_closeup.pdf")
CNV_size <- 1000 # bp
flanking_size <- 1000 # bp
#for(chr in 1:24){
for(chr in 1:1){
  #for(array in 1:24) {
  for(array in 1:4) {
    dat <- subset(file, file$Chr == chr )
    dat <- subset(dat, dat[,array+6] > 0.4 | dat[,array+6] < -0.6)
    if(length(dat$Start) > 1 ) {
      dat <- dat[with(dat, order(Start)), ]
      x=dat$Start[2:length(dat$Start)]-dat$Start[1:(length(dat$Start)-1)]
      cnv <- 1
      while(cnv <= length(x)) {
        for(i in cnv:length(x) ) {
          if(x[i] >= CNV_size) {
            plot_title <- paste(sample_info$Sample.ID[array], files[array], sep = " ")
            plot(dat$Start, -dat[,array+6], main = plot_title , ylim = c(-2,2), xlim = c(dat$Start[cnv] - flanking_size , dat$Start[i ] + flanking_size) , xlab = chr, ylab = "Log2 Ratio")
            abline(h = 0.4, col="blue")
            abline(h = 0, col="red")
            abline(h = -0.6, col="blue")
            break
          } # if(x[i] >= CNV_size) {
          #if(x[i] < CNV_size) i <- i + 1
        } # for(i in cnv:length(x) ) {
        cnv <- i
      } # while(x[cnv] <= length(x)) {
    } # if(length(dat$Start) > 1 ) {
  } # for(array in 1:24) {
} # for(chr in 1:24){
dev.off()
You could write a loop that accumulates indices given the criteria, then plot each window:
# Assuming your data frame is called snps and is sorted by Chr, then Start.
getChrWindows <- function(snps, windowSize) {
  windows <- list()
  if (nrow(snps) < 2) return(windows)
  current <- 1  # row indices of the window being built
  for(i in 2:nrow(snps)) {
    # If the pair is on the same chromosome and within the window, extend the current window
    if(snps$Chr[i-1] == snps$Chr[i] &&
       (snps$Start[i] - snps$Start[i-1]) <= windowSize) {
      current <- c(current, i)
    } else {
      # Otherwise close the current window (if it spans more than one row) and start a new one
      if (length(current) > 1) windows[[length(windows) + 1]] <- current
      current <- i
    }
  }
  # Close the last window
  if (length(current) > 1) windows[[length(windows) + 1]] <- current
  return(windows)
}
Now you can get a list of all the windows in your data.frame, and plot each one:
windows <- getChrWindows(snps, 1000)
for (i in seq_along(windows)) {
# plot snps[windows[[i]],] using your plotting code.
}
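As a vectorized alternative to the loop, consecutive rows can be grouped with diff() and cumsum(). This is a sketch: chr_windows() and its minRows argument are hypothetical, it uses the chained criterion from the question (A to B and B to C each within windowSize, not A to C), and it sorts the rows itself. If you apply the significance filter first, as the question's loop does, pass the filtered columns; the returned indices then refer to rows of the filtered data frame.

```r
# Hypothetical helper: group rows whose consecutive Start values on the same
# chromosome are at most windowSize apart; keep groups of at least minRows rows.
chr_windows <- function(chr, start, windowSize, minRows = 3) {
  o <- order(chr, start)                  # sort by chromosome, then position
  chr <- chr[o]
  start <- start[o]
  # a new window starts when the chromosome changes or the gap is too large
  brk <- c(TRUE, chr[-1] != chr[-length(chr)] | diff(start) > windowSize)
  groups <- split(o, cumsum(brk))         # original row indices per window
  Filter(function(g) length(g) >= minRows, unname(groups))
}

# The example rows from the question
snps <- data.frame(Chr   = c(1, 1, 1, 2, 1, 1),
                   Start = c(102447376, 102447536, 102447366,
                             102447366, 108332063, 109472445))
chr_windows(snps$Chr, snps$Start, windowSize = 1000)
```

On the example rows this yields a single window made of rows 3, 1 and 2 (in position order); the row on chromosome 2 and the two distant rows are left out.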
I have a Landfire Existing Vegetation dataset (http://www.landfire.gov/), which I have projected and cropped to my study site. The raster has ~12,000,000 cells. The cell values represent particular vegetation types, and the values range from 16 to 2200. Not all of those values are represented in my study area (i.e. values jump from 20 to 1087).
As many of the pixel values can be lumped together into one class for my purposes (e.g. different shrub communities into one class), I wanted to reset the values of the raster to simpler values (1:11). This will make it easy to extract data from other rasters by vegetation type and to plot the classification map. I have working code, but it requires a ton of typing to change all 61 of the values I need to change. Here's what I did:
#===============================
############Example#############
#===============================
library(raster)
r <- raster(nrows=30, ncols=10, xmn=0, xmx=10)
r[] <- rep(10:19, 30)
r.omance <- function(x){
x[x==10] <- 1; x[x==11] <- 1; x[x==12] <- 1
x[x==13] <- 1; x[x==14] <- 1; x[x==15] <- 1
x[x==16] <- 2; x[x==17] <- 2; x[x==18] <- 2
x[x==19] <- 2
return(x)}
reclass <- calc(r, fun = r.omance)
Does anyone know of an easier way to go about this? You can imagine the typing needed to change 61 values, especially since x[x==16:20] <- 1 was producing an error, so every value had to be typed out separately. As I said, my code works, but I just want to become a better R coder.
Thanks.
You could use %in%:
x %in% c(1,4,3:10)
This:
x[x==10] <- 1; x[x==11] <- 1; x[x==12] <- 1
x[x==13] <- 1; x[x==14] <- 1; x[x==15] <- 1
would reduce to:
x[x %in% 10:15] <- 1
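For 61 scattered codes, a further step beyond %in% is a lookup table with match(): list each original code once next to its new class, and do the whole reclassification in one assignment. The lookup data frame and reclass_lookup() below are hypothetical and use the example's 10:19 codes; values not listed are left unchanged:

```r
# Hypothetical lookup: each original code once, with its new class
lookup <- data.frame(from = c(10:15, 16:19),
                     to   = c(rep(1, 6), rep(2, 4)))

reclass_lookup <- function(x, lookup) {
  m <- match(x, lookup$from)         # row of lookup for each value (NA if absent)
  ifelse(is.na(m), x, lookup$to[m])  # unmatched values are kept as they are
}

x <- rep(10:19, 30)
table(reclass_lookup(x, lookup))
```

Since it takes and returns a plain vector, it should also work as the fun argument to calc(), the way r.omance was used in the question, e.g. calc(r, fun = function(v) reclass_lookup(v, lookup)).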
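I would use the reclassify function. With right=NA, each from-to interval in the matrix is closed on both ends, so 10-15 becomes 1 and 16-19 becomes 2: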
library(raster)
r <- raster(nrows=30, ncols=10, xmn=0, xmx=10)
r[] <- rep(10:19, 30)
rc <- matrix(c(10,15,1,16,19,2), ncol=3, byrow=TRUE)
x <- reclassify(r, rc, right=NA)
You will save yourself a bit of typing using the & logical operator, e.g.
x[ x >= 10 & x <= 15 ] <- 1
x[ x >= 16 & x <= 19 ] <- 2
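The & approach generalizes to a small from-to-becomes table, similar in spirit to the matrix passed to reclassify(). A base-R sketch; rules and reclass_ranges() are hypothetical names, ranges are closed on both ends, and values outside every range (or NA) are left unchanged:

```r
# Hypothetical rules: one row per range, closed on both ends
rules <- cbind(from = c(10, 16), to = c(15, 19), becomes = c(1, 2))

reclass_ranges <- function(x, rules) {
  out <- x
  for (k in seq_len(nrow(rules))) {
    # test against the original x so one rule's output is never re-matched by another
    hit <- !is.na(x) & x >= rules[k, "from"] & x <= rules[k, "to"]
    out[hit] <- rules[k, "becomes"]
  }
  out
}

x <- rep(10:19, 30)
table(reclass_ranges(x, rules))
```

The loop runs once per rule rather than once per cell, so with 11 classes it stays cheap even on a large raster.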
I'm new to this site. I was wondering if anyone had experience with turning a list of grid coordinates (shown in the example code below as df) into an adjacency matrix. I've written a function that can handle the job for very small data sets, but the run time grows very quickly as the data set gets larger (I think 800 pixels would take about 25 hours). It's because of the nested for loops, but I don't know how to get around it.
## Dummy Data
x <- c(1,1,2,2,2,3,3)
y <- c(3,4,2,3,4,1,2)
df <- as.data.frame(cbind(x,y))
df
## Here's what it looks like as an image
a <- c(NA,NA,1,1)
b <- c(NA,1,1,1)
c <- c(1,1,NA,NA)
image <- cbind(a,b,c)
f <- function(m) t(m)[,nrow(m):1]
image(f(image))
## Here's my adjacency matrix function that's slowwwwww
adjacency.coordinates <- function(x,y) {
  df <- as.data.frame(cbind(x,y))
  colnames(df) = c("V1","V2")
  df <- df[with(df,order(V1,V2)),]
  adj.mat <- diag(1,dim(df)[1])
  for (i in 1:dim(df)[1]) {
    for (j in 1:dim(df)[1]) {
      if((df[i,1]-df[j,1]==0)&(abs(df[i,2]-df[j,2])==1) | (df[i,2]-df[j,2]==0)&(abs(df[i,1]-df[j,1])==1)) {
        adj.mat[i,j] = 1
      }
    }
  }
  return(adj.mat)
}
## Here's the adjacency matrix
adjacency.coordinates(x,y)
Does anyone know of a way to do this that will work well on a set of coordinates a couple thousand pixels long? I've tried converting to a SpatialGridDataFrame and going from there, but I can't get the adjacency matrix right. Thank you so much for your time.
I thought igraph might be the way to go here, but I think you can do it more simply, like this:
result <- apply(df, 1, function(pt)
(pt["x"] == df$x & abs(pt["y"] - df$y) == 1) |
(abs(pt["x"] - df$x) == 1 & pt["y"] == df$y)
)
diag(result) <- 1
This avoids the nested for loops and gives the same result:
> identical(adjacency.coordinates(x,y),result)
[1] TRUE
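apply() still calls an R function once per point. A further step, sketched here under the assumption that the coordinates are integer grid positions, is to use dist(): for integer cells, rook (4-neighbour) adjacency is exactly a Manhattan distance of 1, so the whole matrix comes from one vectorized call. adjacency_dist() is a hypothetical name; it sorts the points the same way adjacency.coordinates() does and sets the diagonal to 1 to match it:

```r
# Rook adjacency for integer grid coordinates via Manhattan distance == 1
adjacency_dist <- function(x, y) {
  o <- order(x, y)                              # same ordering as adjacency.coordinates()
  pts <- cbind(x[o], y[o])
  d <- as.matrix(dist(pts, method = "manhattan"))
  adj <- unname((d == 1) * 1)                   # 1 for neighbours, 0 otherwise
  diag(adj) <- 1                                # the original puts 1 on the diagonal
  adj
}

x <- c(1,1,2,2,2,3,3)
y <- c(3,4,2,3,4,1,2)
adjacency_dist(x, y)
```

dist() still builds an n x n matrix, so memory grows quadratically; a couple of thousand points should be manageable, but much larger sets would call for a sparse representation.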