I used the information provided in
How to label vector in gnuplot
to label some of my plotted vectors. The issue is that I have twenty vectors, but I don't want twenty different labels. Every two vectors have the same label. E.g. the 0th and the 1st have label "1", the 2nd and 3rd have label "2", et cetera. How can I create a custom labeling scheme to do this without labeling each vector manually? Doing this manually is not practical because I have several files and twenty vectors to label for each file.
This is my command:
plot "gnuCors.txt" using 1:2:3:4 with vectors, "gnuCors.txt" u 5:6:0 with labels left
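The pseudo-column $0 holds the zero-based row number, so integer division by two gives every pair of consecutive rows the same value. Label your vectors with int($0)/2 + 1: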
plot "gnuCors.txt" using 1:2:3:4 with vectors,\
"" u 5:6:(int($0)/2 + 1) with labels left offset 0.5
Note that this should work fine, but gnuplot sometimes has trouble automatically converting the number given in the last using column to a string. In that case, format the values explicitly with sprintf:
plot "gnuCors.txt" using 1:2:3:4 with vectors,\
"" u 5:6:(sprintf("%d", int($0)/2 + 1)) with labels left offset 0.5
Related
Is there a way to add text labels to the points on a scatterplot? Each point has a string associated with it as its label. I would like to label only as many points as can be labeled without overlapping.
using DataFrames, Plots, Random # randstring comes from Random
df = DataFrame(x=rand(100), y=rand(100), z=randstring.(fill(5,100)))
scatter(df.x, df.y)
annotate!(df.x, df.y, text.(df.z))
Using the StatisticalGraphics package:
using InMemoryDatasets
using StatisticalGraphics
using Random
ds=Dataset(x=rand(100), y=rand(100), z=randstring.(fill(5,100)))
sgplot(ds, Scatter(x=:x,y=:y,labelresponse=:z))
Here is something I wrote for Makie.jl that suited my needs:
Non-overlapping labels for scatter plots
It works best for single line, short text labels, and where all labels have similar lengths with one another. It is still WIP, as I am working to improve it for placement of longer text labels.
Here are some samples of what it can do:
Essentially, you call the function viz to plot a scatter chart of your (x, y) data set:
resolution = (600, 600) # figure size (pixels); need not be square
fontpt = 12 # label font size (points)
flabel = 1.5 # inflate the label size to create some margins
fdist = 0.3 # inflate the max. distance between a label and its
# anchor point before a line is drawn to connect them.
# Smaller values create more connecting lines.
viz(x, y, labels; resolution=resolution, flabel=flabel, fdist=fdist, fontpt=fontpt)
where labels is a list containing one text label for each (x, y) point.
You can use the extra named argument series_annotations in the scatter function. Here is an example where I use "1", "2", etc. as labels:
using Plots
x = collect(0:0.1:2)
y = sinpi.(x)
scatter(x, y, series_annotations = text.(1:length(x), :top))
Avoiding overlaps is more difficult. You could set the label to an empty string "" for duplicates where the points coincide, or, for Makie, see: Non-overlapping label placement algorithm for scatter plots
I am trying to plot a time series graph, but am having issues getting it to be a line graph while showing the decades at the bottom.
My data set has the decades (as factors) next to performance (integer)
If I write
plot(StockPerformance$Decade, StockPerformance$Performance)
I will get a graph that has horizontal lines in it
PLOT PICTURE
adding,
type ="o"
like this:
plot(StockPerformance$Decade, StockPerformance$Performance, type ="o")
doesn't change it....
In R, when you read or create a data frame using read.table (or a variant thereof) or data.frame, it tries to figure out what you have and treat it appropriately. Specifically, character vectors (like "1830s") get converted to factors (this was the default before R 4.0.0; newer versions do so only with stringsAsFactors = TRUE).
Factors are a way to efficiently store character strings, which was a lot more important when R was first created than it is now. The important thing for you is that when the x variable is a factor, plot() draws boxplots, one per level. That's why you are seeing horizontal lines: they are boxplots with only one point each.
To get around this, you need to convert them to numbers for the purpose of plotting. Then, you need to "fix" the axes afterwards. Here's how:
plot(Performance ~ as.numeric(Decade),
data = StockPerformance,
xlab = "Decade", # otherwise we have "as.numeric(Decade)"
xaxt = 'n', # removes default axis ticks and labels
pch = 1 # default open circle. Change the number to get other options. 16 and 20 are both closed circles (20 is small, 16 is big)
)
with(StockPerformance, # This just makes it so I don't have to type StockPerformance twice below.
axis(1, at = 1:nlevels(Decade),
labels = levels(Decade) # axis() takes the tick text in its labels argument
))
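To see why the conversion works, it can help to inspect what as.numeric() returns for a factor. The sketch below uses made-up data (the values are invented for illustration and are not the asker's StockPerformance data). It also passes type = "o", which should give the line graph the question asks for once x is numeric:

```r
# Hypothetical stand-in for the asker's data frame (values are invented)
StockPerformance <- data.frame(
  Decade = factor(c("1830s", "1840s", "1850s", "1860s")),
  Performance = c(3L, 7L, 5L, 9L)
)

# as.numeric() on a factor returns the integer level codes, not the text
codes <- as.numeric(StockPerformance$Decade)
print(codes)
print(levels(StockPerformance$Decade))

# With a numeric x, type = "o" overplots points and connecting lines
plot(codes, StockPerformance$Performance, type = "o",
     xaxt = "n", xlab = "Decade", ylab = "Performance")

# Put the decade names back on the axis at the positions of the level codes
axis(1, at = seq_along(levels(StockPerformance$Decade)),
     labels = levels(StockPerformance$Decade))
```

Because the level codes follow the order of levels(), the tick at position 1 carries the first level, position 2 the second, and so on.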
I am using RStudio and I am having an issue with a ggplot2 graph. My data set has around 86,200 observations; so I am expecting these points to show up in my plot but strangely it is showing only one point in the middle of the plot.
ggplot(mydata,aes("Package Revenue EXCL VAT","Total Spending",colour=PropertyCode, size=5, alpha=0.5)) + geom_point()
The 2 columns used for the scatterplot are numeric columns. Running a str(mydata) gives the following for those 2 columns:
Package Revenue EXCL VAT: num
Total Spending: num
And this is how the plot shows in the plot viewer window of RStudio (I have excluded the legends from the screen capture):
Any idea what I am doing wrong?
As the comments said, use identifiers, not character strings. As you can see in your plot, you have one point, and its coordinates are, literally, the discrete values x = “Package Revenue EXCL VAT” and y = “Total Spending”.
In addition, you need to remove the fixed properties from the aesthetics and put them into the geometry instead: otherwise ggplot2 will map them to constant but arbitrary values (i.e. not the ones you want).
ggplot(mydata) +
aes(`Package Revenue EXCL VAT`, `Total Spending`, color = PropertyCode) +
geom_point(size = 5, alpha = 0.5)
(With added formatting cleanup.)
In case that’s unclear, the backticks in the above code don’t delimit character strings, they delimit identifiers: in R, `foo` is identical to foo. However, backticks allow you to use otherwise invalid characters in the identifier. This includes spaces.
The matter is confused by the fact that R allows you to use quoted strings instead of backtick identifiers in some cases. But aes isn’t one of these cases, and if you want to keep your sanity you shouldn’t use this confusing feature of R.
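The difference between a backtick identifier and a quoted string can be checked in base R without ggplot2. A minimal sketch, assuming a small made-up data frame that borrows one column name from the question (the values and the helper names same_object, column_values and just_a_string are illustrative):

```r
foo <- 42
# Backticks delimit an identifier, so both expressions refer to the same object
same_object <- identical(`foo`, foo)

# check.names = FALSE keeps the column name with spaces unchanged
mydata <- data.frame(`Package Revenue EXCL VAT` = c(10, 20, 30),
                     check.names = FALSE)

# Backticks look up the column; double quotes yield a one-element string
column_values <- with(mydata, `Package Revenue EXCL VAT`)
just_a_string <- with(mydata, "Package Revenue EXCL VAT")

print(same_object)
print(column_values)
print(just_a_string)
```

The quoted version evaluates to a single constant string, which matches the single point seen in the plot.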
I'm looking to plot a set of sparklines in R with just a 0 and 1 state that looks like this:
Does anyone know how I might create something like that ideally with no extra libraries?
I don't know of any simple way to do this, so I'm going to build up this plot from scratch. This would probably be a lot easier to design in Illustrator or something like that, but here's one way to do it in R (if you don't want to read the whole step-by-step, I provide my solution wrapped in a reusable function at the bottom of the post).
Step 1: Sparklines
You can use the pch argument of the points function to define the plotting symbol. ASCII symbols are supported, which means you can use the "pipe" symbol for vertical lines. The ASCII code for this symbol is 124, so to use it for our plotting symbol we could do something like:
plot(df, pch=124)
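The character codes can be looked up rather than memorised. A small sketch, assuming an invented 0/1 series named state (not part of the original data), which draws a pipe for each 1 and a space for each 0:

```r
pipe_code  <- utf8ToInt("|")  # code point of the pipe character
space_code <- utf8ToInt(" ")  # code point of a space, an "invisible" symbol
print(c(pipe_code, space_code))

# Illustrative 0/1 series (made up); only the 1s are drawn as vertical bars
state <- c(1, 0, 1, 1, 0, 1)
plot(seq_along(state), rep(1, length(state)),
     pch = ifelse(state == 1, pipe_code, space_code),
     xaxt = "n", yaxt = "n", ann = FALSE)
```

Keeping a symbol at every position, visible or not, keeps the spacing of the bars uniform.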
Step 2: labels and numbers
We can put text on the plot by using the text command:
text(x,y,char_vect)
Step 3: Alignment
This is basically just going to take a lot of trial and error to get right, but it'll help if we use values relative to our data.
Here's the sample data I'm working with:
df = data.frame(replicate(4, rbinom(50, 1, .7)))
colnames(df) = c('steps','atewell','code','listenedtoshell')
I'm going to start out by plotting an empty box to use as our canvas. To make my life a little easier, I'm going to set the coordinates of the box relative to values meaningful to my data. The Y positions of the 4 data series will be the same across all plotting elements, so I'm going to store that for convenience.
n=ncol(df)
m=nrow(df)
plot(1:m,
seq(1,n, length.out=m),
# The following arguments suppress plotting values and axis elements
type='n',
xaxt='n',
yaxt='n',
ann=F)
With this box in place, I can start adding elements. For each element, the X values will all be the same, so we can use rep to set that vector, and seq to set the Y vector relative to the Y range of our plot (1:n). I'm going to shift the positions by percentages of the X and Y ranges to align my values, and modify the size of the text using the cex parameter. Ultimately, I found that this works out:
ypos = rev(seq(1+.1*n,n*.9, length.out=n))
text(rep(1,n),
ypos,
colnames(df), # These are our labels
pos=4, # This positions the text to the right of the coordinate
cex=2) # Increase the size of the text
I reversed the sequence of Y values because I built my sequence in ascending order, and the values on the Y axis in my plot increase from bottom to top. Reversing the Y values then makes it so the series in my dataframe will print from top to bottom.
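The effect of rev() on the Y positions can be checked with a few lines. A minimal sketch, assuming n = 4 series as in the sample data; the name ypos_ascending is introduced here only for illustration:

```r
n <- 4  # number of series (columns) in the sample data

# Evenly spaced Y positions, inset by 10% from the bottom and top of the 1..n range
ypos_ascending <- seq(1 + 0.1 * n, n * 0.9, length.out = n)

# Reverse so the first column gets the largest Y value, i.e. the top row
ypos <- rev(ypos_ascending)

print(round(ypos_ascending, 2))
print(round(ypos, 2))
```

The first element of ypos is the largest, so the first column of the data frame ends up at the top of the plot.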
I then repeated this process for the second label, shifting the X values over but keeping the Y values the same.
text(rep(.37*m,n), # Shifted towards the middle of the plot
ypos,
colSums(df), # new label
pos=4,
cex=2)
Finally, we shift X over one last time and use points to build the sparklines with the pipe symbol as described earlier. I'm going to do something sort of weird here: I'm going to tell points to plot at as many positions as I have data points, but use ifelse to determine whether to actually plot a pipe symbol at each one. This way everything will be properly spaced. When I don't want to plot a line, I'll use a space as my plotting symbol (ASCII code 32). I will repeat this procedure, looping through all columns in my dataframe:
for(i in 1:n){
points(seq(.5*m,m, length.out=m),
rep(ypos[i],m),
pch=ifelse(df[,i], 124, 32), # This determines whether to plot or not
cex=2,
col='gray')
}
So, piecing it all together and wrapping it in a function, we have:
df = data.frame(replicate(4, rbinom(50, 1, .7)))
colnames(df) = c('steps','atewell','code','listenedtoshell')
BinarySparklines = function(df,
L_adj=1,
mid_L_adj=0.37,
mid_R_adj=0.5,
R_adj=1,
bottom_adj=0.1,
top_adj=0.9,
spark_col='gray',
cex1=2,
cex2=2,
cex3=2
){
# 'adj' parameters are scalar multipliers in [-1,1]. For most purposes, use [0,1].
# The exception is L_adj which is any value in the domain of the plot.
# L_adj < mid_L_adj < mid_R_adj < R_adj
# and
# bottom_adj < top_adj
n=ncol(df)
m=nrow(df)
plot(1:m,
seq(1,n, length.out=m),
# The following arguments suppress plotting values and axis elements
type='n',
xaxt='n',
yaxt='n',
ann=F)
ypos = rev(seq(1+bottom_adj*n,n*top_adj, length.out=n)) # use bottom_adj instead of the hard-coded .1
text(rep(L_adj,n),
ypos,
colnames(df), # These are our labels
pos=4, # This positions the text to the right of the coordinate
cex=cex1) # Increase the size of the text
text(rep(mid_L_adj*m,n), # Shifted towards the middle of the plot
ypos,
colSums(df), # new label
pos=4,
cex=cex2)
for(i in 1:n){
points(seq(mid_R_adj*m, R_adj*m, length.out=m),
rep(ypos[i],m),
pch=ifelse(df[,i], 124, 32), # This determines whether to plot or not
cex=cex3,
col=spark_col)
}
}
BinarySparklines(df)
Which gives us the following result:
Try playing with the alignment parameters and see what happens. For instance, to shrink the side margins, you could try decreasing the L_adj parameter and increasing the R_adj parameter like so:
BinarySparklines(df, L_adj=-1, R_adj=1.02)
It took a bit of trial and error to get the alignment right for the result I provided (which is what I used to inform the default values for BinarySparklines), but I hope I've given you some intuition about how I achieved it and how moving things using percentages of the plotting range made my life easier. In any event, I hope this serves as both a proof of concept and a template for your code. I'm sorry I don't have an easier solution for you, but I think this basically gets the job done.
I did my prototyping in RStudio so I didn't have to specify the dimensions of my plot, but for posterity I had 832 x 456 with the aspect ratio maintained.
I have a data.frame with 72 discrete categories. When I colour by these categories I get 72 different keys in the legend. I would prefer to only use every 2nd or 3rd key.
Any idea how I cut down on the number of lines in the legend?
Thanks
H.
Code that reproduces my problem is given below.
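library(reshape2) # provides melt()
library(ggplot2)
alt=1:10 # assumption: alt is not defined in the original post; any altitude vector of length 10 works
t=seq(0,2*pi,length.out=10)
RR=rep(cos(t),72)+0.1*rnorm(720)
dim(RR)=c(10,72)
stuff=data.frame(alt,RR)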
names(stuff)=c("altitude",
paste(rep(15:20,each=12),
rep(c("00","05",as.character(seq(from=10,to=55,by=5))),6),
sep=":"))
bb=melt(stuff,id.vars=c(1))
names(bb)[2:3]=c("period","velocity")
ggplot(data=bb,aes(altitude,velocity))+geom_point(aes(color=period))+geom_smooth()
You can treat your period values as numeric in geom_point(). That maps color to a continuous gradient (values from 1 to 72, corresponding to the factor levels). Then, with scale_colour_gradient(), you can set the breaks you need and use your actual period values as their labels.
ggplot(data=bb,aes(altitude,velocity))+
geom_point(aes(color=as.numeric(period)))+
geom_smooth()+
scale_colour_gradient("Period",low="red", high="blue",
breaks=c(seq(1,72,12),72),labels=unique(bb$period)[c(seq(1,72,12),72)])
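To check which legend entries the breaks select, the index arithmetic can be run on its own in base R. A minimal sketch that rebuilds the 72 period labels from the question; the names minutes, period_levels and breaks are introduced here for illustration:

```r
# Rebuild the 72 period labels from the question ("15:00", "15:05", ..., "20:55")
minutes <- c("00", "05", as.character(seq(from = 10, to = 55, by = 5)))
period_levels <- paste(rep(15:20, each = 12), rep(minutes, 6), sep = ":")

# Every 12th level plus the last one, as in the breaks argument above
breaks <- c(seq(1, 72, 12), 72)

print(breaks)
print(period_levels[breaks])
```

Changing the step in seq(), for example seq(1, 72, 3), would keep every third period instead of one per hour.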
It looks hard to customize the legend here for a discrete color scale! So I propose a lattice solution. You just need to give the right text to the auto.key list.
library(latticeExtra)
labels.time <- unique(bb$period)[rep(c(F,F,T),each=3)] ## recycling here to get third label
g <- xyplot(velocity~altitude, data=bb,groups=period,
auto.key=list(text=as.character(labels.time),
columns=length(labels.time)/3),
par.settings = ggplot2like(), axis = axis.grid,
panel = function(x,y,...){
panel.xyplot(x,y,...)
panel.smoother(x,y,...)
})