I'm new to R and I'd like to create a plot with irregular axis intervals, like the second one suggested in another discussion (Uneven axis in base r plot). I'm not able to run the script given there with my data.
I'd like to add more space between each Y label from 0 to 2000.
I'd also like to have less space between each Y label from 2000 to 7000. This would help me to distinguish the different lines in my graph that are really close to each other.
I don't want to use the ggplot function if it is possible.
Thanks a lot!!
Here is what I've done (see my graph):
axis(2, c(seq(0, 2000, by=250), seq(2000,7000, by = 1000)), las = 1)
My actual graph
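One possible way to get the spacing described above in base R is to plot the data on a piecewise-linear transformed scale and then label the axis with the original values. This is only a sketch: the helper name squash, the 70/30 split of the axis height, and the sample data are all made up for illustration.

```r
# Hypothetical helper: map 0..brk onto the lower `frac` of the axis
# and brk..top onto the remaining (1 - frac), so low values get more room.
squash <- function(y, brk = 2000, top = 7000, frac = 0.7) {
  ifelse(y <= brk,
         y / brk * frac,
         frac + (y - brk) / (top - brk) * (1 - frac))
}

# Made-up data, only to have something to draw
x <- 1:7
y <- c(100, 400, 900, 1500, 2500, 4000, 7000)

# Plot the transformed values without the default y axis
plot(x, squash(y), type = "l", yaxt = "n", ylab = "y", ylim = c(0, 1))

# Put ticks at the transformed positions but label them with the raw values
ticks <- c(seq(0, 2000, by = 250), seq(3000, 7000, by = 1000))
axis(2, at = squash(ticks), labels = ticks, las = 1)
```

Every additional line has to go through the same squash() call before being passed to lines(), otherwise it will not line up with the axis.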
I want to do a simple plot using Plots.jl.
I calculated a rate for each month over a couple of years. The problem that I am facing now is that I want to add a trendline to this plot. I did not find how this is done in Julia or Plots, if this is somewhere, please tell me.
My second question: since I just have a vector with, let's say, 150 elements, one per month, Plots.jl only gives me numbers on the x-axis at 0, 50, 100 and 150, each with a grid line. I would like to change this to one of these lines every 12 elements, with the year as the label on the axis.
I hope my question is clear, and thank you very much in advance.
Cheers
No fancy features needed if I understand your question correctly.
using Plots
dates = 1:150
ticks = 1:12:150
ticks_labels = 0:12
values = rand(150).+dates*0.01
plot(dates, values, xticks = (ticks, ticks_labels), label="my series")
bhat = [dates ones(150)]\values
Plots.abline!(bhat..., label = "trendline")
Plots now has a simple keyword option for adding a trend line.
using Plots
scatter(collect(1:10), collect(1:10) + rand(10), smooth = true)
I want to change the x-axis in my plot, but it doesn't work properly with axis(). The data in the plot are daily and I want to show only years. I hope someone understands me and can find a solution. This is how it looks now: (image). And this is how it looks with the code axis(1, at = seq(1800, 1975, by = 25), las=2): (image).
Without reproducible code it is not easy to see what the problem could be, so I'll try a "quick and dirty" approach.
High-level plots are composed of elements that can each be drawn separately. Separate drawing commands are therefore useful because they allow finer control over the plotting procedure.
In practice, the first thing to do is to plot "nothing".
> plot(x, y, type = "n", xlab = "", ylab = "", axes = F)
type = "n" causes the data not to be drawn. axes = F suppresses the axes and the box around the plot. Even so, the plotting region is set up and ready to show the data.
The main benefit is that the plotting area now has the correct dimensions. Now try to add the desired x axis as you did before.
> points(x, y) # Plots the data in the area
> axis() # Plots the desired axis with your scale
> title() # Plots the desired titles
> box() # Prints the box surrounding the plot
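To make the sequence above concrete, here is a small self-contained version. The data are made up, and the tick positions and titles are only placeholders for whatever your own plot needs.

```r
# Made-up data standing in for the real series
set.seed(1)
x <- 1:100
y <- cumsum(rnorm(100))

# 1. Set up the plotting region without drawing data, axes or box
plot(x, y, type = "n", xlab = "", ylab = "", axes = FALSE)

# 2. Add each element separately
points(x, y)                          # the data
axis(1, at = seq(0, 100, by = 25))    # x axis with hand-picked ticks
axis(2, las = 1)                      # default y axis, horizontal labels
title(main = "Example", xlab = "x", ylab = "y")
box()                                 # frame around the plot
```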
EDITED based on comment by @scoa
As a quick and dirty solution, you can simply enter the following line after your plot() line:
# This reads as: on the x axis (1), place ticks from the first (day) value of 0
# to the last (day) value of 63917, in increments (by) of 9131 days (25 years),
# with labels perpendicular to the axis (las = 2) for readability.
# EDITED: and AT those tick locations, put the labels
# 1800 (year) to 1975 (year) in 25 (year) increments
axis(1, at = seq(0, 63917, by = 9131), las = 2, labels = seq(1800, 1975, by = 25))
For other parameters, check out ?axis. As @scoa mentioned, this is approximate. I have used 365.25 days per year as the conversion, which is not quite right, but it should suffice for visual accuracy at the scale you have provided. If you need a precise conversion from days to years, operate on your original data set before plotting.
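If the day counter can be tied to a calendar date, the tick positions can be computed from Date objects instead of a fixed days-per-year constant. The sketch below assumes that day 0 is 1 January 1800, which is a guess and not something stated in the question, and it uses made-up values for the series. Under that assumption the eight positions happen to coincide with seq(0, 63917, by = 9131), but with a different origin or tick interval they need not.

```r
# Assumption: day 0 of the series corresponds to 1800-01-01
origin <- as.Date("1800-01-01")
years  <- seq(1800, 1975, by = 25)

# Exact day offsets of 1 January of each labelled year
at_exact <- as.numeric(as.Date(paste0(years, "-01-01")) - origin)

# Made-up daily series, only for illustration
set.seed(1)
days   <- 0:63917
values <- cumsum(rnorm(length(days)))

# Draw without the default x axis, then add year labels at the exact offsets
plot(days, values, type = "l", xaxt = "n", xlab = "Year", ylab = "")
axis(1, at = at_exact, labels = years, las = 2)
```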
I'm trying to create a key for a heatmap, but as far as I know, I cannot use the existing tools for adding a legend since I've generated the colors myself (I manually turn a scaled variable into RGB values for a short rainbow, [255,0,0] to [0,0,255]).
Basically, all I want to do is use the rightmost 10th of the screen to create a rectangle with these 10 colors: "#0000FF", "#0072FF", "#00E3FF", "#00FFAA", "#00FF38", "#39FF00", "#AAFF00", "#FFE200", "#FF7100", "#FF0000"
with three numerical labels - at 0, max/2, and max
In essence, I want to manually produce an object that looks like a rudimentary heatmap color key.
As far as I know, split.screen can only split the screen in half, which isn't what I'm looking for. I want the graphic I already know how to produce to take up the leftmost 90% of the screen, and I want this colored rectangle to take up the other 10%.
Thanks.
EDIT: I greatly appreciate the advice about the best way to do the plot. That said, I would still like to know the best way to do the task originally asked: creating the legend by hand. I am already able to produce the exact heatmap graphic that I'm looking for. The false coloring wasn't the only problem I was having with ggplot; it was just the final factor that convinced me to switch. I need a non-ggplot solution.
EDIT #2: This is close to the solution I am looking for, except this only goes up to 10 instead of accepting a maximum value as a parameter (I will be running this code on multiple data-sets, all with different maximum values - I want the legend to reflect this). Additionally, if I change the size of the graph, the key falls apart into disconnected squares.
Take a look at the layout function. I think you want something like this:
layout(matrix(c(1,2), 1, 2, byrow = TRUE), widths=c(9,1))
## plot heatmap
## plot legend
I would also recommend the ggplot2 package and the geom_tile function which will take care of all of this for you.
Assuming your data is in a data frame with the x and y coordinates and the heatmap value (e.g. gdat <- data.frame(x_coord=c(1,2,...), y_coord=c(1,1,...), val=c(6,2,...))), you should be able to produce your desired heatmap with the following ggplot command:
ggplot(gdat) + geom_tile(aes(x=x_coord, y=y_coord, fill=val)) +
scale_fill_gradient(low="#0000FF", high="#FF0000")
To get your data into the format above, you may want to look into the very useful reshape2 package.
Given the strict no-ggplot restriction on this answer, here is how one could produce the plot with just base R.
colors <- c("#0000FF", "#0072FF", "#00E3FF", "#00FFAA", "#00FF38",
"#39FF00", "#AAFF00", "#FFE200", "#FF7100", "#FF0000")
layout(matrix(c(1,2), 1, 2, byrow = TRUE), widths=c(9,1))
plot(rnorm(20), rnorm(20), col=sample(colors, 20, replace=TRUE))
par(mar=c(0,0,0,0))
plot(x=rep(1,10), y=1:10, col=colors, pch=15, cex=7.1)
You may have to adjust the cex for your device.
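A variant that avoids tuning cex is to draw the key with rect() in data coordinates, so the colored cells stay connected when the device is resized, and to pass the maximum value in as a parameter. This is only a sketch: draw_key is a made-up helper name, the margin sizes are guesses that may need shrinking on a narrow device, and the main plot is random placeholder data.

```r
colors <- c("#0000FF", "#0072FF", "#00E3FF", "#00FFAA", "#00FF38",
            "#39FF00", "#AAFF00", "#FFE200", "#FF7100", "#FF0000")

# Hypothetical helper: draws a vertical color key in the current panel
# and returns the three labels (0, max/2, max) invisibly.
draw_key <- function(colors, max_value) {
  n <- length(colors)
  labs <- c(0, max_value / 2, max_value)
  par(mar = c(4, 0, 4, 2.5))   # guessed margins; adjust for your device
  plot(NA, xlim = c(0, 1), ylim = c(0, n), type = "n", axes = FALSE,
       xlab = "", ylab = "", xaxs = "i", yaxs = "i")
  # One rectangle per color, stacked bottom to top with no gaps
  rect(0, 0:(n - 1), 1, 1:n, col = colors, border = NA)
  axis(4, at = c(0, n / 2, n), labels = labs, las = 1)
  invisible(labs)
}

# Left 90% for the main plot, right 10% for the key
layout(matrix(c(1, 2), 1, 2, byrow = TRUE), widths = c(9, 1))
plot(rnorm(20), rnorm(20), col = sample(colors, 20, replace = TRUE), pch = 19)
key <- draw_key(colors, max_value = 250)
```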
I am currently trying to write an automated script to create a labeled line graph. These line graphs will have one or more maxima that must be labeled if they are above a certain threshold.
Currently I can label the maxima just fine.
For that I am using textxy() in the calibrate package.
However, when multiple peaks occur things get more complicated:
It quickly becomes difficult to see which peak each label belongs to. So what I would like is a line from the label to the top (or just below the top) of the peak.
I've looked around all day today with no luck. I tried everything in "Intelligent point label placement in R":
wordcloud, but that unfortunately doesn't allow you to offset the labels, and fails if you have only one label.
identify, is much too slow. I need to be able to automate this to do thousands of images a day.
pointLabel, thigmophobe.labels both didn't work as they don't draw a line, and I am not dealing with lots of labels anyway.
I also tried manually drawing an arrow between the label and the point, but that got very time consuming.
Does anyone know of any package or easy way to do this? Is this not possible to automate?
Thanks!
Cameron
You can try the labelPeaks function provided by the MALDIquant package. Its algorithm to place the peak labels is taken from Ian Fellows' wordcloud::wordlayout.
library("MALDIquant")
data("fiedler2009subset")
## a simplified preprocessing to provide some example data
s <- trim(fiedler2009subset[[1]], c(3000, 3500))
r <- removeBaseline(s)
p <- detectPeaks(r)
## plot the peaks and label them
plot(p, ylim=c(0, 30000))
labelPeaks(p, avoidOverlap=TRUE, underline=FALSE, digits=1)
## rotate labels by 90 degree
plot(p, ylim=c(0, 30000))
labelPeaks(p, underline=FALSE, srt=90, adj=c(0, 0.5), col="red")
## label peaks above 5000
plot(p, ylim=c(0, 30000))
labelPeaks(p, index=intensity(p) > 5000)
Please see ?labelPeaks for more details.
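If you would rather stay in base R, the "label plus connecting line" idea from the question can also be automated by hand. The sketch below is an assumption-laden illustration, not part of MALDIquant: label_peaks is a made-up helper, the peak detection is a simple "higher than both neighbours" rule, and the curve is synthetic.

```r
# Hypothetical helper: find local maxima above `threshold`, draw a short
# vertical connector from each peak, and put the label at its upper end.
label_peaks <- function(x, y, threshold, offset = 0.08 * diff(range(y))) {
  # A local maximum is a point where the slope changes from + to -
  i <- which(diff(sign(diff(y))) == -2) + 1
  i <- i[y[i] > threshold]
  if (length(i) == 0) return(invisible(i))
  segments(x[i], y[i], x[i], y[i] + offset, col = "grey40")
  text(x[i], y[i] + offset, labels = round(x[i], 1), pos = 3, cex = 0.8)
  invisible(i)
}

# Synthetic curve with three bumps of different heights
x <- seq(0, 10, by = 0.1)
y <- dnorm(x, 3, 0.3) + 0.6 * dnorm(x, 6, 0.4) + 0.1 * dnorm(x, 8, 0.5)

# Leave head room above the curve for the labels
plot(x, y, type = "l", ylim = c(0, max(y) * 1.25))
peaks <- label_peaks(x, y, threshold = 0.5)
```

This does nothing to keep labels of neighbouring peaks from overlapping; for that, the overlap avoidance in labelPeaks is the better fit.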
I have two related problems.
Problem 1: I'm currently using the code below to generate a histogram overlaid with a density plot:
hist(x,prob=T,col="gray")
axis(side=1, at=seq(0,100, 20), labels=seq(0,100,20))
lines(density(x))
I've pasted the data (i.e. x above) here.
I have two issues with the code as it stands:
The last tick and label (100) of the x-axis do not appear on the histogram/plot. How can I put these on?
I'd like the y-axis to show count or frequency rather than density, but I'd like to retain the density plot as an overlay on the histogram. How can I do this?
Problem 2: using a similar solution to problem 1, I now want to overlay three density plots (not histograms), again with frequency on the y-axis instead of density. The three data sets are at:
http://pastebin.com/z5X7yTLS
http://pastebin.com/Qg8mHg6D
http://pastebin.com/aqfC42fL
Here are your first two questions:
myhist <- hist(x,prob=FALSE,col="gray",xlim=c(0,100))
dens <- density(x)
axis(side=1, at=seq(0,100, 20), labels=seq(0,100,20))
lines(dens$x,dens$y*(1/sum(myhist$density))*length(x))
The histogram has a bin width of 5, which is equal to 1/sum(myhist$density), whereas density(x)$x advances in small steps, around 0.2 in your case (512 evenly spaced points). sum(density(x)$y) is definitely not 1, but only because of those small steps: multiplied by the x step it is approximately 1, i.e. sum(density(x)$y)/(1/diff(density(x)$x)[1]). You don't need to apply this correction when drawing the line, because the y values are already matched to their own x values. So scale the density 1) by the bin width of hist() and 2) by the number of observations, length(x), as DWin says. The last axis tick became visible after setting the xlim argument.
For your problem 2, set up a plot with the correct dimensions (xlim and ylim) and type = "n", then draw 3 lines for the densities, scaled in a similar way to the density line above. Think, however, about whether you want those semi-continuous lines to reflect the heights of imaginary bars with bin width 5... Do you see how that might make the density lines exaggerate the counts at any particular point?
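The scaling can be checked on made-up data. The sample below is synthetic, not the data from the question, and the bin width is read from the histogram's own breaks rather than assumed to be 5.

```r
# Synthetic two-component sample, only for illustration
set.seed(42)
x <- c(rnorm(200, 30, 8), rnorm(100, 70, 10))

# Histogram on the count scale
h <- hist(x, freq = TRUE, col = "gray", main = "", xlab = "x")
binwidth <- diff(h$breaks)[1]   # equal-width bins, so the first gap is enough

# Density rescaled to counts: density * bin width * number of observations
dens <- density(x)
lines(dens$x, dens$y * binwidth * length(x))

# Sanity check: the density integrates to roughly 1 over its own grid
dx   <- diff(dens$x)[1]
area <- sum(dens$y) * dx
```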
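A possible shape for problem 2, again with made-up samples in place of the three pastebin data sets and an assumed imaginary bin width of 5:

```r
# Three synthetic samples standing in for the real data sets
set.seed(1)
sets <- list(a = rnorm(150, 30, 8),
             b = rnorm(100, 50, 10),
             c = rnorm(80, 70, 6))
binwidth <- 5   # assumed width of the imaginary histogram bars

# Rescale each density to "counts per bin of width 5"
scaled <- lapply(sets, function(v) {
  d <- density(v)
  list(x = d$x, y = d$y * binwidth * length(v))
})

# Empty plot sized to hold all three curves
xlim <- range(unlist(lapply(scaled, `[[`, "x")))
ylim <- c(0, max(unlist(lapply(scaled, `[[`, "y"))))
plot(NA, xlim = xlim, ylim = ylim, type = "n",
     xlab = "x", ylab = "Frequency (per bin of width 5)")

# Draw the three scaled densities
cols <- c("black", "red", "blue")
for (k in seq_along(scaled)) {
  lines(scaled[[k]]$x, scaled[[k]]$y, col = cols[k])
}
legend("topright", legend = names(sets), col = cols, lty = 1)
```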
Although this is an old thread, in case anyone comes across it: whether it is a 'good idea' to forgo translating the y density to a count scale depends on what the user is attempting to do.
There are perfectly good reasons for using frequency as the y value. One idea in particular that comes to mind is that using counts for the y scale can give an analyst a good idea about where to begin the 'data hunt' for stratifying heterogeneous data, if a mixed distribution model cannot soundly or intuitively be applied.
In practice, overlaying a density estimate over the observed histogram can be very useful in data quality checks. For example, in the above, if I were looking at the above graphic as a single source of data with the assumption that it describes "1 thing" and I wish to model this as "1 thing", I have an issue. That is, I have heterogeneous data which may require some level of stratification. The density overlay then becomes a simple visual tool for detecting heterogeneity (apart from using log transformations to smooth between-interval variation), and a direction (locations of the mixed distributions) for stratifying the data.