This is more a question to see if anyone has seen anything like this in their travels. I am working with a lot of weather data and I would like to plot wind using wind barbs.
I have looked into the RadioSonde package, but its plotwind() function is not doing the job I had anticipated. It does, however, come with a good example of the type of data involved: data(ExampleSonde).
Arguably I could use my.symbols() from TeachingDemos to create these wind barbs. I was just curious whether anyone has found (or created) a way to plot wind barbs. Otherwise, my.symbols() it is.
Thanks,
Badger
Another way is to create the wind barbs using grid graphics.
The first step is to calculate how many barbs are needed, and of what type. As described here, I created three types, representing 50, 10, and 5 knots; I round the speed down to the nearest five.
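For instance, a speed of 65 knots decomposes into one 50-knot pennant, one 10-knot barb, and one 5-knot half-barb. A quick sketch of the rounding logic used in wind_barb below:
speed <- 65
floor(speed / 50)                 # triangles (50 knots each)  -> 1
floor((speed %% 50) / 10)         # long lines (10 knots each) -> 1
floor(((speed %% 50) %% 10) / 5)  # short lines (5 knots each) -> 1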
The wind_barb function below generates a new grob for each wind speed it is given. Using an idea from Integrating Grid Graphics Output with Base Graphics Output - Murrell (p. 4), you can plot the grobs easily and represent the wind direction by rotating the viewport.
An example
Create some data
set.seed(1)
dat <- data.frame(x = -2:2, y = -2:2,
                  direction = sample(0:360, 5),
                  speed = c(10, 15, 50, 75, 100))
# x y direction speed
# 1 -2 -2 95 10
# 2 -1 -1 133 15
# 3 0 0 205 50
# 4 1 1 325 75
# 5 2 2 72 100
Plot
library(gridBase)
library(grid)
with(dat, plot(x, y, ylim=c(-3, 3), xlim=c(-3, 3), pch=16))
vps <- baseViewports()
pushViewport(vps$inner, vps$figure, vps$plot)
# Plot
for (i in 1:nrow(dat)) {
  pushViewport(viewport(
    x = unit(dat$x[i], "native"),
    y = unit(dat$y[i], "native"),
    angle = dat$direction[i]))
  wind_barb(dat$speed[i])
  popViewport()
}
popViewport(3)
Which produces a plot of the five points, each with a wind barb rotated to its wind direction.
The wind_barb function that creates the barbs (please simplify me!). You can change the height and width of a barb by adjusting the mlength and wblength arguments respectively.
wind_barb <- function(x, mlength = 0.1, wblength = 0.025) {
  # Calculate which / how many barbs
  # any triangles (50)
  fif <- floor(x / 50)
  # and then look for longer lines for remaining speed (10)
  tn <- floor((x - fif * 50) / 10)
  # and then look for shorter lines for remaining speed (5)
  fv <- floor((x - fif * 50 - tn * 10) / 5)
  # Spacing & barb length
  yadj <- 0.5 + mlength
  dist <- (yadj - 0.5) / 10
  xadj <- 0.5 + wblength
  xfadj <- 0.5 + wblength / 2
  # Create grobs
  main_grob <- linesGrob(0.5, c(0.5, yadj))
  # 50 windspeed
  if (fif != 0) {
    fify <- c(yadj, yadj - dist * seq_len(2 * fif))
    fifx <- c(0.5, xadj)[rep(1:2, length = length(fify))]
    fif_grob <- pathGrob(fifx, fify, gp = gpar(fill = "black"))
  } else {
    fif_grob <- NULL
    fify <- yadj + dist
  }
  # Ten windspeed
  if (tn != 0) {
    tny <- lapply(seq_len(tn), function(x) min(fify) - dist * c(x, x - 1))
    tn_grob <- do.call(gList,
                       mapply(function(x, y)
                                linesGrob(x = x, y = y, gp = gpar(fill = "black")),
                              x = list(c(0.5, xadj)), y = tny, SIMPLIFY = FALSE))
  } else {
    tn_grob <- NULL
    tny <- fify
  }
  # Five windspeed
  if (fv != 0) {
    fvy <- lapply(seq_len(fv), function(x) min(unlist(tny)) - dist * c(x, x - 0.5))
    fv_grob <- do.call(gList,
                       mapply(function(x, y)
                                linesGrob(x = x, y = y, gp = gpar(fill = "black")),
                              x = list(c(0.5, xfadj)), y = fvy, SIMPLIFY = FALSE))
  } else {
    fv_grob <- NULL
  }
  # Draw
  # grid.newpage()
  grid.draw(gList(main_grob, fif_grob, tn_grob, fv_grob))
}
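As a quick standalone check (a minimal sketch; the angle of 225 is arbitrary), you can draw a single 65-knot barb on a fresh grid page:
library(grid)
grid.newpage()
pushViewport(viewport(x = 0.5, y = 0.5, angle = 225))
wind_barb(65)  # one pennant (50), one full barb (10), one half barb (5)
popViewport()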
-------------------------------------
Comment from sezen:
The plotted wind direction is wrong. To get the correct meteorological wind direction, use angle = 360 - dat$direction[i]. See http://tornado.sfsu.edu/geosciences/classes/m430/Wind/WindDirection.html
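A minimal sketch of the corrected viewport call inside the plotting loop above:
pushViewport(viewport(
  x = unit(dat$x[i], "native"),
  y = unit(dat$y[i], "native"),
  angle = 360 - dat$direction[i]))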
-------------------------------------
I am using the code below to list out peaks and valleys.
x_last <- as.numeric(series[1])
x <- as.numeric(series[2])
d_last <- (x - x_last)
series[1:2] <- NULL
output <- list()
for (x_next in series) {
  if (x_next == x) {
    next
  }
  d_next <- (x_next - x)
  # a sign change between consecutive differences marks a peak or valley
  if (d_last * d_next < 0) {
    output <- append(output, x)
  }
  x_last <- x
  x <- x_next
  d_last <- d_next
}
Here output (a list) contains consecutive peaks and valleys:
Output <- c(41.49916, 37.92029, 39.86477, 39.86432, 39.95672, 39.95465, 39.96144, 39.83994, 40.43357, 40.11285, 40.82250, 39.37034, 58.82975, 42.19894)
and so on. The graph is plotted from this output. My question is: how do I add a threshold to this code? In other words, how can I remove small peaks and valleys (swings of less than 1)? I need the result to remain consecutive peaks and valleys. Thank you in advance.
If you just want to plot your data:
You could plot this with ggplot2 and add a geom_smooth() layer. It defaults to method "loess" which is kind of a "do-the-right-thing" smoother for small datasets.
dat <- data.frame(y=c(41.49916, 37.92029, 39.86477, 39.86432, 39.95672, 39.95465, 39.96144, 39.83994, 40.43357, 40.11285, 40.82250, 39.37034, 58.82975, 42.19894))
dat$x <- 1:length(dat$y)
library(ggplot2)
ggplot(dat, aes(x, y)) +
  geom_line() +
  geom_smooth(method = "loess", se = FALSE)
(The resulting plot shows the data as a line with a loess smooth overlaid.)
Or would you rather smooth the data yourself? (Your data series is quite short for that.) Do you need an equation for the fit? It's easy to spend quite some time on that.
I don't fully understand this "peak/valley" stuff. In any case, take a look at the diff() function. Maybe this helps:
dat <- data.frame(y=c(41.49916, 37.92029, 39.86477, 39.86432, 39.95672, 39.95465, 39.96144, 39.83994, 40.43357, 40.11285, 40.82250, 39.37034, 58.82975, 42.19894))
dat[which(diff(dat$y) < 0.01)+1,"y"] <- NA
dat$y
[1] 41.50 NA 39.86 NA 39.96 NA NA NA 40.43 NA 40.82 NA
[13] 58.83 NA
Here I've used a threshold of 0.01.
I'm not sure if it's the right thing, but you can adapt this code to your needs.
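As a further sketch of the diff() idea (my own illustration; it keeps any turning point adjacent to a swing larger than the threshold, and does not guarantee the result still alternates):
y <- c(41.49916, 37.92029, 39.86477, 39.86432, 39.95672, 39.95465, 39.96144,
       39.83994, 40.43357, 40.11285, 40.82250, 39.37034, 58.82975, 42.19894)
d <- diff(y)
turns <- which(d[-length(d)] * d[-1] < 0) + 1            # slope changes sign here
big <- turns[abs(d[turns - 1]) > 1 | abs(d[turns]) > 1]  # keep swings larger than 1
y[big]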
In the end I created a function that removes small cycles while maintaining the alternation of peaks and valleys. For me it works perfectly.
hysteresis <- function(series, min_range) {
  # removes cycles whose magnitude is within min_range
  # series: list of values forming consecutive peaks and valleys
  series <- unlist(series)
  f <- series[1]
  org <- f
  series <- series[2:length(series)]
  for (i in series) {
    val <- abs(i - f)
    if (val > min_range) {
      org <- c(org, i)
      f <- i
    } else {
      # drop the previous extremum to keep peaks and valleys alternating
      org <- org[1:(length(org) - 1)]
      f <- org[length(org)]
    }
  }
  return(org)
}
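For example, applied to the Output values above with min_range = 1 (a sketch; choose the threshold to suit your data):
out <- c(41.49916, 37.92029, 39.86477, 39.86432, 39.95672, 39.95465, 39.96144,
         39.83994, 40.43357, 40.11285, 40.82250, 39.37034, 58.82975, 42.19894)
hysteresis(out, min_range = 1)
# [1] 41.49916 37.92029 40.82250 39.37034 58.82975 42.19894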
I'm working with variables resembling the val values created below:
# data --------------------------------------------------------------------
data("mtcars")
val <- c(mtcars$wt, 10.55)
I'm cutting this variable in the following manner:
# Cuts --------------------------------------------------------------------
cut_breaks <- pretty_breaks(n = 10, eps.correct = 0)(val)
res <- cut2(x = val, cuts = cut_breaks)
which produces the following results:
> table(res)
res
[ 1, 2) [ 2, 3) [ 3, 4) [ 4, 5) [ 5, 6) [ 6, 7) [ 7, 8) [ 8, 9) [ 9,10) [10,11]
      4       8      16       1       3       0       0       0       0       1
In the created output I would like to change the following:
I'm not interested in creating groups with only one value. Ideally, each group should have at least 3-4 values. Paradoxically, I can live with groups having 0 values, as those will be dropped later on when merging onto my real data.
Any changes to the cutting mechanism have to work on a variable with integer values.
The cuts have to be pretty; I'm trying to avoid something like 1.23-2.35, even if those values would be the most sensible given the distribution.
In effect, what I'm trying to achieve is this: make more or less even, pretty groups; if a really tiny group occurs, bump it together with the next group; do not worry about empty groups.
Full code
For convenience, the full code is available below:
# Libs --------------------------------------------------------------------
Vectorize(require)(package = c("scales", "Hmisc"),
character.only = TRUE)
# data --------------------------------------------------------------------
data("mtcars") val <- c(mtcars$wt, 10.55)
# Cuts --------------------------------------------------------------------
cut_breaks <- pretty_breaks(n = 10, eps.correct = 0)(val)
res <- cut2(x = val, cuts = cut_breaks)
What I've tried
First approach
I tried to play with the eps.correct = 0 value in the pretty_breaks like in the code:
cut_breaks <- pretty_breaks(n = cuts, eps.correct = 0)(variable)
but none of the values gets me anywhere close.
Second approach
I've also tried using the m = 5 argument in the cut2 function, but I keep arriving at the same result.
Comment replies
My breaks function
I tried the mybreaks function, but I would have to put some work into it to get nice cuts for more bizarre variables. Broadly speaking, pretty_breaks cuts well for me; it's just that the tiny groups occurring from time to time are not desired.
> set.seed(1); require(scales)
> mybreaks <- function(x, n, r=0) {
+ unique(round(quantile(x, seq(0, 1, length=n+1)), r))
+ }
> x <- runif(n = 100)
> pretty_breaks(n = 5)(x)
[1] 0.0 0.2 0.4 0.6 0.8 1.0
> mybreaks(x = x, n = 5)
[1] 0 1
You could use the quantile() function as a relatively easy way to get similar numbers of observations in each of your groups.
For example, here's a function that takes a vector of values x, a desired number of groups n, and a desired rounding off point r for the breaks, and gives you suggested cut points.
mybreaks <- function(x, n, r=0) {
  unique(round(quantile(x, seq(0, 1, length=n+1)), r))
}
cut_breaks <- mybreaks(val, 5)
res <- cut(val, cut_breaks, include.lowest=TRUE)
table(res)
[2,3] (3,4] (4,11]
8 16 5
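If you want to keep the pretty breaks and just merge any tiny group into a neighbour, here is a rough sketch (merge_small is my own hypothetical helper, not from scales or Hmisc; empty groups are deliberately left alone, per the requirements above):
merge_small <- function(x, breaks, min_n = 3) {
  # repeatedly drop the break separating a too-small group from its neighbour
  repeat {
    counts <- table(cut(x, breaks, include.lowest = TRUE))
    small <- which(counts > 0 & counts < min_n)
    if (length(small) == 0 || length(breaks) <= 2) break
    i <- small[1]
    # merge with the right neighbour; for the last group, merge leftwards
    drop <- if (i < length(counts)) i + 1 else i
    breaks <- breaks[-drop]
  }
  breaks
}
cut_breaks <- merge_small(val, pretty_breaks(n = 10, eps.correct = 0)(val))
table(cut2(x = val, cuts = cut_breaks))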
In my problem there are subregions of a larger region that can be classified as positive or negative. I have several files with different classifications, in the following format:
start | end
10 | 20
60 | 120
178 | 220
They are sorted, and they contain only the positive subregions; the rest of the region is assumed negative.
I would like to represent this data in a 2D graphic in R, but I don't know what type of graph I should use. It's something like this:
http://i.imgur.com/VaSvEKr.jpg
That kind of chart is called a "Gantt chart"; here's a possible way to draw it in base R:
# input example
DF <-
read.csv(text=
'"file","start","end"
"file1",10,20
"file1",60,120
"file1",178,220
"file2",10,20
"file2",25,100
"file2",130,140
"file2",190,210
"file3",0,50
"file3",55,400',stringsAsFactors=F)
minval <- min(DF$start) # or different if you know the limits
maxval <- max(DF$end) # or different if you know the limits
files <- rev(unique(DF$file))
nfiles <- length(files)
# empty plot to make space for everything
filehigh <- 1.0
plot(c(minval,maxval),c(filehigh/2,nfiles+filehigh/2),type='n', xlab='Time',ylab=NA,yaxt='n' )
# add y labels
axis(side=2,at=1:nfiles,labels=files,las=1)
# plot the rectangles
negcolor <- 'red'
poscolor <- 'green'
for (i in 1:nfiles) {
  file <- files[i]
  subDF <- DF[DF$file == file, ]
  lastend <- minval
  for (r in 1:nrow(subDF)) {
    yTop <- i + (filehigh / 2)
    yBottom <- i - (filehigh / 2)
    start <- subDF[r, 'start']
    end <- subDF[r, 'end']
    if (start > lastend) {
      rect(lastend, yBottom, start, yTop, col = negcolor)
    }
    rect(start, yBottom, end, yTop, col = poscolor)
    lastend <- end
  }
  if (lastend < maxval) {
    rect(lastend, yBottom, maxval, yTop, col = negcolor)
  }
}
Result: a Gantt-style chart with green (positive) and red (negative) segments for each file.
I would like to sample values, but with a constraint in place demanding that any two values are at least window apart. This would be akin to sampling days in a year, but requiring the sampled days to be at least a fortnight apart. So far I've tried it like this:
check.diff <- TRUE
window <- 14
while (check.diff == TRUE) {
sampled.session <- sort(sample(1:365, size = 5, replace = FALSE))
check.diff <- any(diff(sampled.session) < window)
}
This works nicely if the window constraint is small. If one specifies a rather large value, this can become an infinite loop. While I can insert all sorts of checks and a maximum number of iterations, I was wondering if there's a smarter way of attacking this?
One way to do this is by removing candidates from the population from which you take the sample:
set.seed(42)
population <- 1:356
n_samples <- 5
window <- 14
sampled_session <- rep(sample(population, 1), n_samples) # initialize the vector
for (i in seq.int(2, n_samples)) {
  borders <- sampled_session[i - 1] + (window - 1) * c(-1, 1)
  days_in_window <- seq.int(borders[1], borders[2])
  population <- setdiff(population, days_in_window)
  sampled_session[i] <- sample(population, 1)
}
sort(sampled_session)
# [1] 90 193 264 309 326
diff(sort(sampled_session))
# [1] 103 71 45 17
Another way would be
set.seed(357)
population <- 1:357
n_samples <- 5
window <- 14
sampled.session <- numeric(n_samples)
for (i in seq_len(n_samples)) {
  sampled.session[i] <- pick <- sample(population, 1)
  population <- population[-which(population < pick + window & population > pick - window)]
}
sort(sampled.session)
[1] 19 39 111 134 267
Well, how about something like this.
window <- 14
sample_pair <- sort(sample(1:365, size = 2))
sample_pair[2] <- sample_pair[2] + window * (diff(sample_pair) < window)  # push the second value up if the pair is too close
Then dump that pair into any larger sample group.
Or you could scale your entire sample set after drawing. Pseudocode:
samp.window <- diff(range(sample.set))
if (samp.window < window) sample.set <- sample.set * window / samp.window
Followed by a round or truncate if desired. Probably worth checking to make sure these distributions are uniform :-(
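A runnable sketch of that idea, scaling on the smallest pairwise gap rather than the overall range (a variation on the pseudocode above; note the rescaled values are no longer uniform on 1:365 and may exceed 365):
set.seed(7)
window <- 14
sample.set <- sort(sample(1:365, size = 5))
gap <- min(diff(sample.set))
if (gap < window) {
  # stretch the whole set so the tightest pair ends up `window` apart
  sample.set <- round(sample.set * window / gap)
}
diff(sample.set)  # every gap is now at least `window` (up to rounding)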
I'm having trouble finding a function in R that performs equal-frequency discretization. I stumbled on the 'infotheo' package, but after some testing I found that the algorithm is broken. 'dprep' seems to no longer be supported on CRAN.
EDIT :
For clarity, I do not need to separate the values between the bins. I really want equal frequency; it doesn't matter if one value ends up in two bins. E.g.:
c(1,3,2,1,2,2)
should give one bin c(1,1,2) and one bin c(2,2,3).
EDIT : given your real goal, why don't you just do (corrected) :
EqualFreq2 <- function(x, n) {
  nx <- length(x)
  nrepl <- floor(nx / n)
  nplus <- sample(1:n, nx - nrepl * n)
  nrep <- rep(nrepl, n)
  nrep[nplus] <- nrepl + 1
  x[order(x)] <- rep(seq.int(n), nrep)
  x
}
This returns a vector of indicators showing which bin each value belongs to. But as some values might be present in both bins, you can't possibly define the bin limits. You can, however, do:
x <- rpois(50,5)
y <- EqualFreq2(x,15)
table(y)
split(x,y)
Original answer:
You can easily just use cut() for this :
EqualFreq <- function(x, n, include.lowest = TRUE, ...) {
  nx <- length(x)
  id <- round(c(1, (1:(n-1)) * (nx/n), nx))
  breaks <- sort(x)[id]
  if (sum(duplicated(breaks)) > 0) stop("n is too large.")
  cut(x, breaks, include.lowest = include.lowest, ...)
}
Which gives :
set.seed(12345)
x <- rnorm(50)
table(EqualFreq(x,5))
[-2.38,-0.886] (-0.886,-0.116] (-0.116,0.586] (0.586,0.937] (0.937,2.2]
10 10 10 10 10
x <- rpois(50,5)
table(EqualFreq(x,5))
[1,3] (3,5] (5,6] (6,7] (7,11]
10 13 11 6 10
As you see, for discrete data an optimal equal binning is rather impossible in most cases, but this method gives you the best possible binning available.
This sort of thing is also quite easily solved by using (abusing?) the conditioning plot infrastructure from lattice, in particular function co.intervals():
cutEqual <- function(x, n, include.lowest = TRUE, ...) {
  stopifnot(require(lattice))
  cut(x, co.intervals(x, n, 0)[c(1, (n+1):(n*2))],
      include.lowest = include.lowest, ...)
}
Which reproduces @Joris' excellent answer:
> set.seed(12345)
> x <- rnorm(50)
> table(cutEqual(x, 5))
[-2.38,-0.885] (-0.885,-0.115] (-0.115,0.587] (0.587,0.938] (0.938,2.2]
10 10 10 10 10
> y <- rpois(50, 5)
> table(cutEqual(y, 5))
[0.5,3.5] (3.5,5.5] (5.5,6.5] (6.5,7.5] (7.5,11.5]
10 13 11 6 10
In the latter, discrete, case the breaks are different although they have the same effect; the same observations are in the same bins.
How about?
a <- rnorm(50)
> table(Hmisc::cut2(a, m = 10))
[-2.2020,-0.7710) [-0.7710,-0.2352) [-0.2352, 0.0997) [ 0.0997, 0.9775)
10 10 10 10
[ 0.9775, 2.5677]
10
The classInt library is created "for choosing univariate class intervals for mapping or other graphics purposes". You can just do:
dataset <- c(1,3,2,1,2,2)
library(classInt)
classIntervals(dataset, 2, style = 'quantile')
where 2 is the number of bins you want and the quantile style provides quantile breaks. There are several styles available for this function: "fixed", "sd", "equal", "pretty", "quantile", "kmeans", "hclust",
"bclust", "fisher", or "jenks". Check docs for more info.
Here is a function that handles the error 'breaks' are not unique and automatically selects the closest workable n_bins value to the one you set:
equal_freq <- function(var, n_bins) {
  require(ggplot2)
  n_bins_orig <- n_bins
  res <- tryCatch(cut_number(var, n = n_bins), error = function(e) return(e))
  while (grepl("'breaks' are not unique", res[1]) & n_bins > 1) {
    n_bins <- n_bins - 1
    res <- tryCatch(cut_number(var, n = n_bins), error = function(e) return(e))
  }
  if (n_bins_orig != n_bins)
    warning(sprintf("It's not possible to calculate with n_bins=%s, setting n_bins in: %s.", n_bins_orig, n_bins))
  return(res)
}
Example:
equal_freq(mtcars$carb, 10)
Which retrieves the binned variable and the following warning:
It's not possible to calculate with n_bins=10, setting n_bins in: 5.
Here is a one-liner solution inspired by @Joris' answer:
x <- rpois(50,5)
binSize <- 5
desiredFrequency = floor(length(x)/binSize)
split(sort(x), rep(1:binSize, rep(desiredFrequency, binSize)))
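To confirm each bin ends up with the same number of values (this check assumes length(x) is divisible by binSize; otherwise split() will recycle the factor with a warning):
bins <- split(sort(x), rep(1:binSize, rep(desiredFrequency, binSize)))
lengths(bins)  # each bin should contain exactly desiredFrequency values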
Here's another solution using mltools.
set.seed(1)
x <- round(rnorm(20), 2)
x.binned <- mltools::bin_data(x, bins = 5, binType = "quantile")
table(x.binned)
x.binned
[-2.21, -0.622) [-0.622, 0.1) [0.1, 0.526) [0.526, 0.844) [0.844, 1.6]
4 4 4 4 4
We can use the cutr package with the argument what = "rough"; the look of the labels can be customized to taste:
# devtools::install_github("moodymudskipper/cutr")
library(cutr)
smart_cut(c(1, 3, 2, 1, 2, 2), 2, "rough", brackets = NULL, sep="-")
# [1] 1-2 2-3 1-2 1-2 2-3 2-3
# Levels: 1-2 < 2-3