I use Julia with Plots to generate my plots.
I want to plot data (A, B), and I know that all the interesting data lies in two regions of A. The two regions should be plotted next to each other in one plot.
My A data is evenly spaced, so I cut out the interesting pieces and glued them together into one object.
My problem is that I don't know how to manipulate the scale on the x-axis.
When I just plot the B data against its array index, I basically get the shape I want. I just need the numbers from A on the x-axis.
Here is a toy example:
using Plots
N=5000
B=rand(N)
A=(1:1:N)
xl_1=100
xu_1=160
xl_2=600
xu_2=650
A_new=vcat(A[xl_1:xu_1],A[xl_2:xu_2])
B_new=vcat(B[xl_1:xu_1],B[xl_2:xu_2])
plot(A_new,B_new) # This leaves the spacing between the data explicit
plot(B_new) # This creates basically the right spacing, but
# without the right x-axis grid
I did not find any way to use two successive xlims, which is why I tried it this way.
You can't pass two successive xlims, because you can't have a break in the axis. That is by design in Plots.
So your options are: 1) use two subplots showing the different parts of the data, or 2) plot against the index and just change the axis labels.
The second approach would use a command like xticks = ([1, 61, 62, 112], ["100", "160", "600", "650"]), but I'd recommend the first, as it is strictly speaking a more correct way of displaying the data:
plot(
plot(A[xl_1:xu_1], B[xl_1:xu_1], legend = false),
plot(A[xl_2:xu_2], B[xl_2:xu_2], yshowaxis = false),
link = :y
)
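For the second approach, the tick positions and labels follow from the lengths of the two slices. Below is a minimal sketch of that arithmetic in Python (it is not Plots code). It assumes, as in the toy example, that A[i] == i, and it uses 1-based positions to match Julia's indexing. The helper name tick_map is made up for illustration.

```python
def tick_map(ranges):
    """Map the end points of each inclusive (lo, hi) range of A to
    1-based positions in the concatenated array."""
    positions, labels = [], []
    offset = 0  # number of points already placed by earlier ranges
    for lo, hi in ranges:
        length = hi - lo + 1
        positions += [offset + 1, offset + length]
        labels += [str(lo), str(hi)]
        offset += length
    return positions, labels


positions, labels = tick_map([(100, 160), (600, 650)])
print(positions)  # [1, 61, 62, 112]
print(labels)     # ['100', '160', '600', '650']
```

The two lists are the values that would go into the xticks tuple when plotting B_new against its index.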
Related
Is there a way to add text labels to the points on a scatterplot? Each point has a string associated with it as its label. I would like to label only as many points as can be labeled without overlapping.
using Plots, DataFrames, Random # randstring comes from Random
df = DataFrame(x=rand(100), y=rand(100), z=randstring.(fill(5,100)))
scatter(df.x, df.y)
annotate!(df.x, df.y, text.(df.z))
Using the StatisticalGraphics package:
using InMemoryDatasets
using StatisticalGraphics
using Random
ds=Dataset(x=rand(100), y=rand(100), z=randstring.(fill(5,100)))
sgplot(ds, Scatter(x=:x,y=:y,labelresponse=:z))
Here is something I wrote for Makie.jl that suited my needs:
Non-overlapping labels for scatter plots
It works best for single line, short text labels, and where all labels have similar lengths with one another. It is still WIP, as I am working to improve it for placement of longer text labels.
Here are some samples of what it can do:
Essentially, you call the function viz to plot a scatter chart of your (x, y) data set:
resolution = (600, 600) # figure size (pixels) -- need not be square
fontpt = 12 # label font size (points)
flabel = 1.5 # inflate the label size to create some margins
fdist = 0.3 # inflate the max. distance between a label and its
# anchor point before a line is drawn to connect them.
# Smaller values would create more connecting lines.
viz(x, y, labels; resolution=resolution, flabel=flabel, fdist=fdist, fontpt=fontpt)
where labels is a list containing one text label for each (x, y) point.
You can use the extra named argument series_annotations in the scatter function. Here is an example where I use "1", "2", etc. as labels:
using Plots
x = collect(0:0.1:2)
y = sinpi.(x)
scatter(x, y, series_annotations = text.(1:length(x), :top))
Avoiding overlaps is more difficult. You could set the label to an empty string "" for duplicates where the points coincide, or, for Makie, see: Non-overlapping label placement algorithm for scatter plots
iFacColName <- "hireMonth"
iTargetColName <- "attrition"
iFacVector <- as.factor(c(1,1,1,1,10,1,1,1,12,9,9,1,10,12,1,9,5))
iTargetVector <- as.factor(c(1,1,0,1,1,0,0,1,1,0,1,0,1,1,1,1,1))
sp <- spineplot(iFacVector,iTargetVector,xlab=iFacColName,ylab=iTargetColName,main=paste0(iFacColName," vs. ",iTargetColName," Spineplot"))
spLabelPass <- sp[,2]/(sp[,1]+sp[,2])
spLabelFail <- 1-spLabelPass
text(seq_len(nrow(sp)),rep(.95,length(spLabelPass)),labels=as.character(spLabelPass),cex=.8)
For some reason, the text() function only plots one label far to the right of the graph. I have used this format to apply data labels to other types of graphs, so I am confused.
EDIT: added more code to make example work
You're not placing your labels inside the plotting region. It only extends to around 1.3 on the x axis. Try plotting something like
text(
cumsum(prop.table(table(iFacVector))),
rep(.95, length(spLabelPass)),
labels = as.character(round(spLabelPass, 1)),
cex = .8
)
and you'll get something like
These are obviously not the right positions for the labels, but you should be able to figure that out by yourself. (You're going to have to subtract half of the frequency for each bar from the cumulative frequency and account for the fact that the bars are padded with some amount of whitespace.)
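The "subtract half of the frequency from the cumulative frequency" step can be sketched as follows. This is a language-neutral illustration in Python, not R code, and it deliberately ignores the whitespace padding between bars, so the values are only a starting point for the real x coordinates.

```python
from collections import Counter

# Factor values from the question's iFacVector
values = [1, 1, 1, 1, 10, 1, 1, 1, 12, 9, 9, 1, 10, 12, 1, 9, 5]

counts = Counter(values)
levels = sorted(counts)   # numeric level order: 1, 5, 9, 10, 12
total = len(values)

midpoints = []
cumulative = 0.0
for level in levels:
    share = counts[level] / total             # relative width of this bar
    cumulative += share                       # right edge of the bar
    midpoints.append(cumulative - share / 2)  # centre = right edge - half width

for level, mid in zip(levels, midpoints):
    print(level, round(mid, 3))
```

Each midpoint is the horizontal centre of one bar on a 0 to 1 scale. The padding between bars still has to be added on top of these values.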
I am trying to plot several histograms for the same data set, but with different numbers of bins. I am using Gadfly.
Suppose x is just an array of real values, plotting each histogram works:
plot(x=x, Geom.histogram(bincount=10))
plot(x=x, Geom.histogram(bincount=20))
But I'm trying to put all the histograms together. I've added the number of bins as another dimension to my data set:
x2 = vcat(hcat(10*ones(length(x)), x), hcat(20*ones(length(x)), x))
df = DataFrame(Bins=x2[:,1], X=x2[:,2])
Is there any way to send the number of bins (the value from the first column) to Geom.histogram when using Geom.subplot_grid? Something like this:
plot(df, x="X", ygroup="Bins", Geom.subplot_grid(Geom.histogram(?)))
I think you would be better off not using subplot_grid at that point, and instead just combining the plots with vstack or hstack. From the docs:
Plots can also be stacked horizontally with ``hstack`` or vertically with
``vstack``. This allows more customization in regards to tick marks, axis
labeling, and other plot details than is available with ``subplot_grid``.
For educational purposes I'm trying to plot a single horizontal "number line" with some labeled data points in R. I got this far:
library(plotrix)
source("spread.labels.R")
plot(0:100,axes=FALSE,type="n",xlab="",ylab="")
axis(1,pos=0)
spread.labels(c(5,5,50,60,70,90),rep(0,6),ony=FALSE,
labels=c("5","5","50","60","70","90"),
offsets=rep(20,6))
This gave me a number line with small lines pointing up to (and a little bit into) the labels from where the data points should lie on the number line, but without the points themselves. Can anyone give me additional or alternative R code to solve these problems:
- the data points themselves are not plotted,
- the labels may not be evenly distributed over the whole number line,
- the lines run into the labels instead of merely pointing to them
Thanks a lot,
Benjamin Telkamp
I usually like to create plots using primitive base R graphics functions, such as points(), segments(), lines(), abline(), rect(), polygon(), text(), and mtext(). You can easily create curves (e.g. for circles) and more complex shapes using segments() and lines() across granular coordinate vectors that you define yourself. For example, see Plot angle between vectors. This provides much more control over the plot elements you create; however, it often takes more work and careful coding than more "pre-packaged" solutions, so it's a tradeoff.
For your case, it sounds to me like you're happy with what spread.labels() is trying to do, you just want the following changes:
add point symbols at the labelled points.
prevent overlap between labels and lines.
Here's how this can be done:
## define plot data
xlim <- c(0,100);
ylim <- c(0,100);
px <- c(5,5,50,60,70,90);
py <- c(0,0,0,0,0,0);
lx.buf <- 5;
lx <- seq(xlim[1]+lx.buf,xlim[2]-lx.buf,len=length(px));
ly <- 20;
## create basic plot outline
par(xaxs='i',yaxs='i',mar=c(5,1,1,1));
plot(NA,xlim=xlim,ylim=ylim,axes=F,ann=F);
axis(1);
## plot elements
segments(px,py,lx,ly);
points(px,py,pch=16,xpd=NA);
text(lx,ly,px,pos=3);
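The label positions lx above are simply evenly spaced across the axis with a small buffer at each end. Here is a minimal sketch of that calculation in Python (not R); the helper name spread is made up for illustration.

```python
def spread(lo, hi, n, buf=5):
    """n evenly spaced positions between lo + buf and hi - buf (n >= 2)."""
    start, stop = lo + buf, hi - buf
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


px = [5, 5, 50, 60, 70, 90]    # data points on the number line
lx = spread(0, 100, len(px))   # label x positions, one per data point
print(lx)  # [5.0, 23.0, 41.0, 59.0, 77.0, 95.0]
```

Each connecting segment then runs from (px[i], 0) to (lx[i], ly), which keeps the labels apart even when data points coincide.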
I'm looking to plot a set of sparklines in R with just a 0 and 1 state that looks like this:
Does anyone know how I might create something like that ideally with no extra libraries?
I don't know of any simple way to do this, so I'm going to build up this plot from scratch. This would probably be a lot easier to design in Illustrator or something like that, but here's one way to do it in R (if you don't want to read the whole step-by-step, I provide my solution wrapped in a reusable function at the bottom of the post).
Step 1: Sparklines
You can use the pch argument of the points function to define the plotting symbol. ASCII symbols are supported, which means you can use the "pipe" symbol for vertical lines. The ASCII code for this symbol is 124, so to use it for our plotting symbol we could do something like:
plot(df, pch=124)
Step 2: labels and numbers
We can put text on the plot by using the text command:
text(x,y,char_vect)
Step 3: Alignment
This is basically just going to take a lot of trial and error to get right, but it'll help if we use values relative to our data.
Here's the sample data I'm working with:
df = data.frame(replicate(4, rbinom(50, 1, .7)))
colnames(df) = c('steps','atewell','code','listenedtoshell')
I'm going to start out by plotting an empty box to use as our canvas. To make my life a little easier, I'm going to set the coordinates of the box relative to values meaningful to my data. The Y positions of the 4 data series will be the same across all plotting elements, so I'm going to store that for convenience.
n=ncol(df)
m=nrow(df)
plot(1:m,
seq(1,n, length.out=m),
# The following arguments suppress plotting values and axis elements
type='n',
xaxt='n',
yaxt='n',
ann=F)
With this box in place, I can start adding elements. For each element, the X values will all be the same, so we can use rep to set that vector, and seq to set the Y vector relative to the Y range of our plot (1:n). I'm going to shift the positions by percentages of the X and Y ranges to align my values, and modify the size of the text using the cex parameter. Ultimately, I found that this works out:
ypos = rev(seq(1+.1*n,n*.9, length.out=n))
text(rep(1,n),
ypos,
colnames(df), # These are our labels
pos=4, # This positions the text to the right of the coordinate
cex=2) # Increase the size of the text
I reversed the sequence of Y values because I built my sequence in ascending order, and the values on the Y axis in my plot increase from bottom to top. Reversing the Y values then makes it so the series in my dataframe will print from top to bottom.
I then repeated this process for the second label, shifting the X values over but keeping the Y values the same.
text(rep(.37*m,n), # Shifted towards the middle of the plot
ypos,
colSums(df), # new label
pos=4,
cex=2)
Finally, we shift X over one last time and use points to build the sparklines with the pipe symbol as described earlier. I'm going to do something sort of weird here: I'm going to tell points to plot at as many positions as I have data points, but I'm going to use ifelse to determine whether to actually plot a pipe symbol at each one. This way everything will be properly spaced. When I don't want to plot a line, I'll use a 'space' as my plotting symbol (ASCII code 32). I will repeat this procedure, looping through all columns in my dataframe:
for(i in 1:n){
points(seq(.5*m,m, length.out=m),
rep(ypos[i],m),
pch=ifelse(df[,i], 124, 32), # This determines whether to plot or not
cex=2,
col='gray')
}
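The positions and symbol codes used so far can be checked in isolation. Below is a small sketch in Python (not R) that assumes 4 series and only 5 observations to keep the numbers readable; the linspace helper and the series values are made up for illustration.

```python
def linspace(start, stop, n):
    """n evenly spaced values from start to stop inclusive (n >= 2)."""
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


n, m = 4, 5  # 4 series, 5 observations (kept small for display)

# Y position per series, reversed so the first column ends up on top
ypos = linspace(1 + 0.1 * n, 0.9 * n, n)[::-1]

# X positions of the sparkline marks: the right half of the plot
xpos = linspace(0.5 * m, m, m)

series = [1, 0, 1, 1, 0]                      # one hypothetical 0/1 column
symbols = [124 if v else 32 for v in series]  # 124 = '|', 32 = ' '
print(ypos)
print(xpos)
print(symbols)
```

Because a symbol is emitted for every observation, with a space standing in for the zeros, the marks keep their spacing regardless of how many ones a series contains.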
So, piecing it all together and wrapping it in a function, we have:
df = data.frame(replicate(4, rbinom(50, 1, .7)))
colnames(df) = c('steps','atewell','code','listenedtoshell')
BinarySparklines = function(df,
L_adj=1,
mid_L_adj=0.37,
mid_R_adj=0.5,
R_adj=1,
bottom_adj=0.1,
top_adj=0.9,
spark_col='gray',
cex1=2,
cex2=2,
cex3=2
){
# 'adj' parameters are scalar multipliers in [-1,1]. For most purposes, use [0,1].
# The exception is L_adj which is any value in the domain of the plot.
# L_adj < mid_L_adj < mid_R_adj < R_adj
# and
# bottom_adj < top_adj
n=ncol(df)
m=nrow(df)
plot(1:m,
seq(1,n, length.out=m),
# The following arguments suppress plotting values and axis elements
type='n',
xaxt='n',
yaxt='n',
ann=F)
ypos = rev(seq(1+bottom_adj*n,n*top_adj, length.out=n)) # bottom_adj was previously hard-coded as .1
text(rep(L_adj,n),
ypos,
colnames(df), # These are our labels
pos=4, # This positions the text to the right of the coordinate
cex=cex1) # Increase the size of the text
text(rep(mid_L_adj*m,n), # Shifted towards the middle of the plot
ypos,
colSums(df), # new label
pos=4,
cex=cex2)
for(i in 1:n){
points(seq(mid_R_adj*m, R_adj*m, length.out=m),
rep(ypos[i],m),
pch=ifelse(df[,i], 124, 32), # This determines whether to plot or not
cex=cex3,
col=spark_col)
}
}
BinarySparklines(df)
Which gives us the following result:
Try playing with the alignment parameters and see what happens. For instance, to shrink the side margins, you could try decreasing the L_adj parameter and increasing the R_adj parameter like so:
BinarySparklines(df, L_adj=-1, R_adj=1.02)
It took a bit of trial and error to get the alignment right for the result I provided (which is what I used to inform the default values for BinarySparklines), but I hope I've given you some intuition about how I achieved it and how moving things using percentages of the plotting range made my life easier. In any event, I hope this serves as both a proof of concept and a template for your code. I'm sorry I don't have an easier solution for you, but I think this basically gets the job done.
I did my prototyping in RStudio so I didn't have to specify the dimensions of my plot, but for posterity: the plot was 832 x 456 with the aspect ratio maintained.