Passing labels to Plots.jl histogram - julia

I am new to Julia and was wondering how to pass labels to the histogram function in the Plots.jl package.
using Plots
gr()
histogram(
data[:sentiment_labels],
title = "Histogram of sentiment labels",
xlabel = "Sentiment",
ylabel = "count",
label = ["Negative" "Positive" "Neutral"],
fillcolor = [:coral,:dodgerblue,:slategray]
)
Only the first label, "Negative", appears in the plot.

The short answer: there is only one label in your plot because there is only one data series in it. A histogram plots a single data series, which has a single label attached to it. It may seem odd that you get multiple colours but only one legend entry, so I'll break down why that happens; it is instructive and, I believe, a frequent source of confusion for Plots.jl users.
It is a bit of a coincidence that you are getting three different colours for the bars you are plotting. You are providing a single colour argument whose entries are cycled through the bars (bins) of the histogram. You can see this if you pass more colours to your histogram call:
using Plots
sentiment_labels = [fill(-1, 200); fill(0, 700); fill(1, 100)]
histogram(
sentiment_labels,
fillcolor = [:coral, :red, :green, :dodgerblue, :slategray]
)
gives a plot in which only one colour is visible.
What's happening here? We have provided five colours, and it turns out that your histogram only has a non-empty bar every five bins (there are bins between -1, 0 and 1; they just contain zero observations). Every fifth bar therefore gets the same colour, and since the empty bars are invisible, only one colour shows up in the plot.
Another way of seeing this is having data that's more continuous than your sentiment labels:
cont_data = rand(1_000)
histogram(
cont_data,
fillcolor = [:coral, :red, :green, :dodgerblue, :slategray]
)
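gives a plot in which the five colours repeat across adjacent bars.
So there is actually only one colour argument passed in here. The crucial difference between the labels and the colours in your histogram call is that the labels are a row vector (a 1×3 matrix), while the colours are a column vector: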
julia> ["Negative" "Neutral" "Positive"]
1×3 Array{String,2}:
"Negative" "Neutral" "Positive"
julia> [:coral, :slategrey, :dodgerblue]
3-element Array{Symbol,1}:
:coral
:slategrey
:dodgerblue
Plots will interpret the first of these as applying to three different series ("Negative" is the label for the first series, "Neutral" for the second, "Positive" for the third), while it interprets the second as applying to one series only (so :coral, :slategrey and :dodgerblue are all colours for the first series passed in). This is quite a subtle distinction in Plots.jl, which often trips people up (me included!).
To get three labels, you should therefore have three series for which you plot histograms. One way of doing this is to split your vector of sentiment labels into three vectors:
histogram(
[filter(x -> x == y, sentiment_labels) for y ∈ -1:1],
fillcolor = [:coral :slategray :dodgerblue],
label = ["Negative" "Neutral" "Positive"]
)
gives a histogram with three series, each with its own legend entry.
That said, I would argue that a histogram isn't the right tool in your case: if your labels are only ever going to be negative, neutral and positive, a simple bar chart will do, as you don't need the automatic binning that a histogram provides. So I would probably do:
bar(
title = "Count of sentiment labels",
xlabel = "Sentiment",
ylabel = "count",
[-1 0 1], [[sum(sentiment_labels .== x)] for x ∈ -1:1],
label = ["Negative" "Neutral" "Positive"],
fillcolor = [:coral :slategray :dodgerblue],
linecolor = [:coral :slategray :dodgerblue],
xticks = -1:1
)
to get a bar chart with one labelled bar per sentiment.

Related

Labels on grouped bars in barplot() [duplicate]

I have a barplot with grouped bars. Is it possible to include a label for each bar? Here is an example of the plot without bar labels:
test <- structure(
  c(0.431031856834624, 0.54498742364355, 0.495317895592119,
    0.341002949852507, 0.40229990800368, 0.328769657724329,
    0.258600583090379, 0.343181818181818, 0.260619469026549),
  .Dim = c(3L, 3L),
  .Dimnames = list(c("2015", "2016", "2017"), c("a", "b", "c"))
)
barplot(test, ylim = c(0, 1), beside = TRUE)
p <- barplot(test, ylim=c(0, 1), beside=T)
text(p, test + .05*sign(test), labels=format(round(test, digits=2), nsmall=2))
The last line adds the labels above the bars.
p holds the return value of barplot(), which gives the x-axis positions of the bars.
In this example it is a 3x3 matrix.
text() then needs p for its x= argument. For its y= argument it needs values slightly offset from the bar heights (test). sign() determines the direction of the offset (+1 for above, -1 for below the bar), and the .05 I found empirically by trial and error; it depends on the values in your table.
So x= and y= are the x and y coordinates of the labels.
Finally, labels= determines which text is printed.
Combining format() and round() gives you full control over how many digits are displayed and keeps the number of displayed digits uniform, which is not the case if you use round() alone.
With xpd=TRUE you allow labels to be drawn outside the plot region.
cex= sets the font size of the labels,
col= the colour and font= the font.
Alternatively, you can pass just test for y= and use pos=3 to place the text above the bars, with offset=1 setting the offset of the text in character widths.
p <- barplot(test, ylim=c(0, 1), beside=T)
text(x=p, y=test, pos=3, offset=1, labels=format(round(test, digits=2), nsmall=2))
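To see why format() is combined with round(), here is a small sketch using three of the values from test. It compares the text you get from round() alone with what format(..., nsmall = 2) produces. The variable names vals, rounded_only and padded are only illustrative and not part of the original answer.

```r
# Three values taken from the test matrix above
vals <- c(0.431031856834624, 0.40229990800368, 0.495317895592119)

# round() alone: trailing zeros are dropped when the numbers become text
rounded_only <- as.character(round(vals, digits = 2))

# format() with nsmall = 2 pads every label to two decimal places
padded <- format(round(vals, digits = 2), nsmall = 2)

print(rounded_only)
print(padded)
```

With round() alone the second and third labels lose their trailing zero, so the labels over the bars would have different widths; the padded version keeps them uniform.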
You can find more details in the documentation by running
?text
# and
?barplot
in the R console.
You can add labels with the text function after drawing your barplot. You can play with the parameters as you wish. Here is some sample code.
x <- barplot(test, ylim = c(0, 1), beside = TRUE)
text(x, test, labels = test, pos = 1, offset = .5, col = "red", srt = 90) # srt = 90 rotates the labels to vertical
If you really want a better-looking plot, I would recommend ggplot, as it has several other features such as themes and is easier to customise.
If you want to label every bar's category rather than its value, you can do something like this:
allPermutations <- unlist(lapply(colnames(test), function(x) paste(x, rownames(test)) ))
barplot(test,ylim=c(0,1),beside=T, names.arg = allPermutations, las=2)
The first line builds all the combinations of categories. The plot call lets you specify individual bar names with names.arg, while las=2 rotates the names so they display a bit more nicely.

Base Plot, correctly defining axis [duplicate]

How can I change the spacing of tick marks on the axis of a plot?
What parameters should I use with base plot or with rgl?
There are at least two ways of achieving this in base graphics (my examples are for the x-axis, but they work the same for the y-axis):
One option is to use par(xaxp = c(x1, x2, n)) or plot(..., xaxp = c(x1, x2, n)) to define the positions (x1 and x2) of the extreme tick marks and the number of intervals between the tick marks (n). Accordingly, n+1 tick marks are drawn. (This works only if you do not use a logarithmic scale; for the behaviour with logarithmic scales see ?par.)
Alternatively, you can suppress the drawing of the axis altogether and add the tick marks later with axis().
To suppress the drawing of the axis, use plot(..., xaxt = "n").
Then call axis() with side, at and labels: axis(side = 1, at = v1, labels = v2). Here side refers to the side of the plot on which the axis is drawn (1 = x-axis, 2 = y-axis), v1 is a vector containing the positions of the ticks (e.g., c(1, 3, 5) if your axis ranges from 0 to 6 and you want three marks), and v2 is a vector containing the labels for the specified tick marks (it must be of the same length as v1, e.g., c("group a", "group b", "group c")). See ?axis and my updated answer to a post on stats.stackexchange for an example of this method.
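The two approaches described above can be sketched as follows. This is only a minimal illustration with made-up data; the names x1, x2, n, tick_positions, v1 and v2 follow the placeholders used in the text, and the plotted values are arbitrary.

```r
# Approach 1: xaxp = c(x1, x2, n) gives n + 1 ticks between x1 and x2
x1 <- 0; x2 <- 6; n <- 3
tick_positions <- seq(x1, x2, length.out = n + 1) # where the ticks are expected
plot(0:6, 0:6, xaxp = c(x1, x2, n))

# Approach 2: suppress the x-axis, then draw it by hand with custom labels
v1 <- c(1, 3, 5)
v2 <- c("group a", "group b", "group c")
plot(v1, c(2, 4, 3), xlim = c(0, 6), xaxt = "n", xlab = "", ylab = "value")
axis(side = 1, at = v1, labels = v2)
```

The second approach is the more flexible one, since at and labels can be any pair of vectors of equal length.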
With base graphics, the easiest way is to stop the plotting functions from drawing axes and then draw them yourself.
plot(1:10, 1:10, axes = FALSE)
axis(side = 1, at = c(1,5,10))
axis(side = 2, at = c(1,3,7,10))
box()
I have a data set with Time on the x-axis and Intensity on the y-axis. I first need to remove all the default axes, keeping only the axis labels, with:
plot(Time, Intensity, axes = FALSE)
Then I rebuild the plot's elements with:
box() # draw a box around the plotted points
axis(labels = NA, side = 1, tck = -0.015, at = seq(from = 0, to = 1000, by = 100)) # labels = NA draws the tick marks without numbers; tck sets the tick length
axis(labels = NA, side = 2, tck = -0.015)
axis(lwd = 0, side = 1, line = -0.4, at = seq(from = 0, to = 1000, by = 100)) # lwd = 0 suppresses the axis line and ticks, so only the numbers are drawn (the ticks come from the calls above)
axis(lwd = 0, line = -0.4, side = 2, las = 1) # las = 1 makes the number labels horizontal instead of vertical
So at = ... specifies the positions at which to put the tick marks. Here I'd like marks at 0, 100, 200, ..., 1000. seq(from = ..., to = ..., by = ...) lets me choose the limits and the increment.
If you don't want R to add decimals or zeros, you can stop it from drawing the x-axis, the y-axis, or both, using xaxt or yaxt. Then you can add your own ticks and labels:
plot(x, y, xaxt = "n") # suppress the x-axis
plot(x, y, yaxt = "n") # or suppress the y-axis
axis(side = 1, at = c(1, 5, 10), labels = c("First", "Second", "Third")) # side = 1 for the x-axis, 2 for the y-axis
I just discovered the Hmisc package:
Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX and html code, and recoding variables.
library(Hmisc)
plot(...)
minor.tick(nx = 10, ny = 10) # split each interval between major ticks into 10 parts with unlabelled minor ticks

axis length (not scaling!) in a scatter plot

How can I change the axis length? For example:
library(data.table)
s <- data.table(school=rep(1:3,5), wave=c(rep(1,7), rep(2,8)), v2=rpois(15,10))
plot(s$wave,s$v2)
I get a scatter plot where the data sit at the edges of the plot, with a lot of white space in between. Changing the xaxp values doesn't help (I tried xaxp=c(-1, +2, 4), but nothing happened), and when I try to define wave as a factor I get a box plot. I know I can "squeeze" the plot when I save it to .png, but is there any other way?
I tried to upload pictures to convey the problem but I don't have enough reputation.
Edit: thanks to whoever uploaded it (although the axes are reversed: wave is the x and v2 is the y). The thing is that there is a lot of "free space" between the 1st and the 2nd wave. The positioning is perfect when I define wave as a factor (it's centred and each factor level takes half the axis length), but it keeps giving me a box plot!
You can pass many more arguments to plot(), such as the colour, the title, and the axis limits.
Your code:
s <- data.frame(school=rep(1:3,5), wave=c(rep(1,7), rep(2,8)), v2=rpois(15,10))
plot(s$wave,s$v2)
And now just add some more:
plot(
x = s$wave,
y = s$v2,
col = "red",
main = "This is my title",
xlab = "the label of the x-axis",
ylab = "the label of the y-axis",
xlim = c(-5, 5), # the limits of the x-axis,
ylim = c(-4, 10) # the limits of the y-axis
)
You can add much more, such as the size and type of the points.
Just as jlhoward mentioned, I found a function in the lattice package that does exactly what I want: a boxplot without the box.
The function is called stripplot.
http://www.math.ucla.edu/~anderson/rw1001/library/base/html/stripplot.html
Thank you all for the help.
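For readers who want to try this, here is a minimal sketch of what such a stripplot call could look like, assuming the data frame from the question with the response column named v2. The set.seed call and the axis labels are my own additions for illustration.

```r
library(lattice) # lattice ships with standard R installations

set.seed(42) # only so the made-up counts are reproducible
s <- data.frame(school = rep(1:3, 5),
                wave   = c(rep(1, 7), rep(2, 8)),
                v2     = rpois(15, 10))

# Treating wave as a factor spaces the two waves evenly, and stripplot()
# draws the individual points instead of a box
p <- stripplot(v2 ~ factor(wave), data = s, xlab = "wave", ylab = "v2")
print(p) # lattice plots are drawn when printed
```

If points overlap heavily, stripplot also accepts jitter.data = TRUE to spread them out.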

Labels/points colored by category with PCA

I'm using prcomp to do PCA in R. I want to plot PC1 vs PC2 with differently coloured text labels for each of the two categories.
I do the plot with:
plot(pca$x, main = "PC1 Vs PC2", xlim=c(-120,+120), ylim = c(-70,50))
Then, to draw all the text in the different colours, I've tried:
text(pca$x[,1][1:18], pca$[,1][1:18], labels=rownames(cava), col="green",
adj=c(0.3,-0.5))
text(pca$x[,1][19:35], pca$[,1][19:35], labels=rownames(cava), col="red",
adj=c(0.3,-0.5))
But R seems to plot two labels on top of each other instead of one. I know pca$x[,1][1:18] selects the correct points, because if I use it to plot the points it works and produces the same plot as plot(pca$x).
It would be great if anyone could help me plot the labels for the two categories, or even plot the points in different colours to make it easy to differentiate between the groups.
You need to specify your x and y coordinates a bit differently:
text(pca$x[1:18,1], pca$x[1:18,2] ...)
This means take the first 18 rows and the first column (which is PC1) for the x coord, etc.
I'm surprised what you did doesn't throw an error.
If you want the points themselves colored, you can do it this way:
plot(pca$x, main = "PC1 Vs PC2", col = c(rep("green", 18), rep("red", 17)))
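Putting both suggestions together, a self-contained sketch might look like the following. The original cava data is not available, so a random 35-row matrix stands in for it here; the row names, the group_col vector and the 18/17 split are assumptions based on the indices used in the question.

```r
# Hypothetical stand-in for the asker's data: 35 samples, 5 variables
set.seed(1)
cava <- matrix(rnorm(35 * 5), nrow = 35,
               dimnames = list(paste0("s", 1:35), NULL))
pca <- prcomp(cava)

# One colour per row: rows 1-18 are group one, rows 19-35 group two
group_col <- c(rep("green", 18), rep("red", 17))

# Set up empty axes, then draw each row name at (PC1, PC2) in its group colour
plot(pca$x[, 1], pca$x[, 2], type = "n",
     main = "PC1 vs PC2", xlab = "PC1", ylab = "PC2")
text(pca$x[, 1], pca$x[, 2], labels = rownames(cava), col = group_col)
```

Passing one colour vector for all rows avoids having to subset the coordinates and the labels separately for each group, which is where the original attempt went wrong.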

change distance of x-axis labels from axis in sciplot bargraph

I'm constructing a plot using bargraph.CI from sciplot. The x-axis represents a categorical variable, so the values of this variable are the names for the different positions on the x-axis. Unfortunately these names are long, so at the default settings some of them just disappear. I solved this by splitting them over multiple lines, inserting "\n" where needed. This basically worked, but because the names are now multi-line, they sit too close to the x-axis. I need to move them farther away. How?
I know I can do this with mgp, but that affects the y-axis too.
I know I can set axisnames=FALSE in my call to bargraph.CI and then use axis to create a separate x-axis. (In fact, I'm already doing that, but only to make the x-axis extend farther than it would by default; see my code below.) Then I could give the x-axis its own mgp parameter that would not affect the y-axis. But as far as I can tell, axis() is set up for ordinal or continuous variables and doesn't seem to work well for categorical variables. After some fiddling, I couldn't get it to put the names in the right locations (i.e. right under their corresponding bars).
Finally, I tried using mgp.axis.labels from Hmisc to set ONLY the x-axis mgp, which is precisely what I want, but as far as I could tell it had no effect on anything.
Any ideas? Here's my code.
ylim = c(0.5,0.8)
yticks = seq(ylim[1],ylim[2],0.1)
ylab = paste(100*yticks,"%",sep="")
bargraph.CI(
response = D$accuracy,
ylab = "% Accuracy on Test",
ylim = ylim,
x.factor = D$training,
xlab = "Training Condition",
axes = FALSE
)
axis(
side = 1,
pos = ylim[1],
at = c(0,7),
tick = TRUE,
labels = FALSE
)
axis(
side = 2,
tick = TRUE,
at = yticks,
labels = ylab,
las = 1
)
axis() works fine with categories, but you need to set the right tick positions and play with the pos parameter to offset the labels. Here I use xvals, part of the return value of bargraph.CI, to set the axis tick positions.
Here is a reproducible example:
library(sciplot)
# ToothGrowth is a built-in R data set
dat <- ToothGrowth
### create long labels
labels <- c('aaaaaaaaaa\naaaaaaaaaaa\nhhhhhhhhhhhhhhh',
'bbbbbbbbbb\nbbbbbbbbbbb\nhhhhhhhhhhhhhh',
'cccccccccc\nccccccccccc\ngdgdgdgdgd')
## change the factor labels
dat$dose <- factor(dat$dose,labels=labels)
ll <- bargraph.CI(x.factor = dose, response = len, data = dat,axisnames=FALSE)
## set at to the bar positions (xvals)
axis(side=1,at=ll$xvals,labels=labels,pos=-2,tick=FALSE)
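The same idea can be tried without sciplot. The sketch below uses base barplot(), which also returns the bar positions, and moves the labels with the line argument of axis() rather than pos. The bar heights and label texts are made up, and using barplot() in place of bargraph.CI is my own substitution for illustration.

```r
heights <- c(10.6, 19.7, 26.1) # made-up bar heights
labels <- c("first\nlong\nname", "second\nlong\nname", "third\nlong\nname")

# barplot() returns the bar midpoints, much like xvals from bargraph.CI
mids <- barplot(heights, axisnames = FALSE, ylab = "response")

# tick = FALSE draws only the labels; line = 1.5 moves them away from the axis
axis(side = 1, at = mids, labels = labels, tick = FALSE, line = 1.5)
```

Because the offset is given only in this axis() call, the y-axis is unaffected, which is the behaviour the question asks for.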
