I have some 100k values. When I plot them as a line in R (using plot(type="l")), the numbers next to the x-axis ticks are printed in scientific format (e.g. 0e+00,2e+04,...,1e+05). Instead, I would like them to be:
A) 0,20kb,...,100kb
B) the same but now the first coordinate should be 1 (i.e. starting to count from 1 instead of 0).
BTW, R arrays use numbering that starts from 1 (in contrast to arrays in Perl, Java, etc.), so I wonder why "they" decided to start from 0 when plotting...
A)
R> xpos <- seq(0, 1000, by=100)
R> plot(1:1000, rnorm(1000), type="l", xaxt="n")
R> axis(1, at=xpos, labels=sprintf("%.2fkb", xpos/1000))
B) same as above, adjust xpos
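For the 100k case from the question, the same idea could look roughly like the sketch below. The data (y) is made up, and the label rule (plain number below 1000, otherwise value/1000 with a "kb" suffix) is just my assumption about what is wanted for variant B.

```r
# Sketch only: y is invented example data standing in for the 100k values
set.seed(42)
y <- rnorm(1e5)

# Suppress the default x-axis so we can draw our own
plot(seq_along(y), y, type = "l", xaxt = "n", xlab = "position")

# Tick positions every 20k; for variant B the first tick is moved from 0 to 1
xpos <- seq(0, 1e5, by = 2e4)
xpos[1] <- 1

# Positions below 1000 are labelled as-is, the rest in kb
labs <- ifelse(xpos < 1000, as.character(xpos), paste0(xpos / 1000, "kb"))
axis(1, at = xpos, labels = labs)
```

For variant A, leave out the xpos[1] <- 1 line so the first label is 0.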
The question is quite old but when I looked for solutions for the described problem it was ranked quite high. Therefore, I add this - quite late - answer and hope that it might help some others :-) .
In some situations it might be useful to use the tick locations which R suggests. R provides the function axTicks for this purpose. Possibly it did not exist in R2.X but only since R3.X.
A)
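# Draw the plot without an x-axis first; axTicks() needs an existing plot
plot(1:1e5, rnorm(1e5), type = 'l', xaxt = 'n')
myTicks = axTicks(1)
axis(1, at = myTicks, labels = paste(formatC(myTicks/1000, format = 'd'), 'kb', sep = ''))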
B)
If you plot data like plot(rnorm(1000)), then the first x-value is 1 and not 0. Therefore, numbering automatically starts with 1. Maybe this was a problem with a previous version of R?!
I'd like to plot a dataset that consists of two vectors of length 100. The mean difference of the vectors being high and the variance of each of them being considerably smaller, it is quite difficult to plot both vectors and still be able to see the variation within each vector.
What I'd like is to be able to manually set the breaks so that we could see both the difference between the vectors and the variation within them.
Consider this data set
a=rnorm(100,sd=0.005)+1
b=rnorm(100,sd=0.005)+10
vec = c(a,b)
Neither plot(vec) nor plot(vec,log="y") gives satisfying results, as it is not possible to distinguish the variation within the vector (see picture).
I'd like the breaks on the y-axis to be (min(a), max(a), 5, min(b), max(b)) (and get equal distance between them). How could one achieve that?
Depending on exactly what you are trying to do, a simple transformation of the data in each part of the vector might be enough:
vec2 <- c( (a - min(a))/ (max(a)-min(a)) , 3 + (b - min(b))/ (max(b)-min(b)) )
plot(vec2, axes=F)
box()
axis(1)
axis(2, at=c(0,1,2,3,4), labels = round(c(min(a), max(a), 5, min(b), max(b)),2))
Alternative approaches might be a custom transformation in ggplot, a secondary axis in ggplot, breaking the graph into facets, or using ggbreak.
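If you want to stay in base graphics, the "facets" idea can be approximated with two stacked panels. This is only a sketch using par(mfrow = ...), not the ggplot or ggbreak approaches mentioned above, and the margin values are arbitrary choices of mine.

```r
# Sketch: one panel per vector, each with its own y-axis range
set.seed(1)
a <- rnorm(100, sd = 0.005) + 1
b <- rnorm(100, sd = 0.005) + 10

op <- par(mfrow = c(2, 1), mar = c(2, 4, 1, 1))  # two rows, one column
plot(b, ylab = "b")  # upper panel: values around 10
plot(a, ylab = "a")  # lower panel: values around 1
par(op)              # restore the previous graphics settings
```

Each panel scales its y-axis to its own data, so the variation within a and within b is visible, at the cost of no longer sharing one continuous axis.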
In my plots I try to replace the axis labels with value 'Inf' with the infinity sign (e.g. unicode '\u221e'). Since I have many plots with different labels, I don't want to do it by hand.
I thought it would be easier to use unicode than plotmath. However I can't figure out how to reach my goal. For example, I have the following vector:
xlab <- as.character(c(1:10,Inf))
x <- y <- 1:11
plot(x,y,xaxt="n")
axis(1,at=x,labels=gsub("Inf","\u221E",xlab))
axis(3,at=x,labels=gsub("Inf","\\u221E",xlab))
Neither works. What am I missing? Thank you for your help!
Edit on 2018-02-06:
I was wrong, rawr's solution works only halfway. I think I need to clarify my problem a bit more.
1) I have many different plots (with different x, y and corresponding xlab values) that I want to loop over. That's why I try to use a sub/gsub solution, because I don't want to write a hundred times the labels.
2) My first example (axis(1,at=x,labels=gsub("Inf","\u221E",xlab))) is not working on any of my Windows machines. It is working on Debian, though.
3) rawr's solution does have the problem that it annotates all available labels, no matter how much space there is available for annotating. Simple example:
x <- y <- exp(-1:11)
xlab <- as.character(c(Inf,10:-1))
plot(x,y,xaxt="n")
axis(1, at = x, labels = parse(text = gsub("Inf", "infinity", xlab)))
is not that nice.
Is there any solution for my windows machines? Possibly not by code, but by changing some settings?
Thanks!
Try
axis(1, at=x, labels=c(1:10, expression(infinity)))
A more flexible approach that can handle any unicode character is available using the stringi package:
library(stringi)  # provides stri_unescape_unicode()
axis(1, at=x, labels=c(1:10, stri_unescape_unicode('\\u221E')))
I need to plot a vector of numbers. Let's say these numbers range from 0 to 1000. I need to make a histogram where the x axis goes from 100 to 500, and I want to specify the number of bins to be 10. How do I do this?
I know how to use xlim and breaks separately, but I don't know how to make a given number of bins inside the custom range.
This is a very good question actually! I was bothered by this all the time, and your question has finally kicked me into solving it :-)
Well, in this case we cannot simply do hist(x, xlim = c(100, 500), breaks = 9), as the breaks refer to the whole range of x, not related to xlim (in other words, xlim is used only for plotting, not for computing the histogram and setting the actual breaks). This is a clear flaw of the hist function and there is no simple remedy found in the documentation.
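To see this for yourself, you can look at the break points hist computes without drawing anything. A small sketch with made-up data (plot = FALSE just returns the histogram object):

```r
# Sketch: breaks are computed from the full range of x, whatever xlim is later
set.seed(1)
x <- runif(1000, 0, 1000)         # invented example data

h <- hist(x, breaks = 9, plot = FALSE)
h$breaks                          # spans the whole data range, not just 100 to 500
```

The break points cover all of x, so restricting the view with xlim afterwards only crops the picture.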
I think the easiest way out is to "xlim" the values before they go to the hist function:
x <- runif(1000, 0, 1000) # example data
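hist(x[x > 100 & x < 500], breaks = seq(100, 500, length.out = 11))
A single number passed to breaks is only a suggested number of cells (R rounds the break points to "pretty" values), so it does not guarantee 10 bins. Passing the 11 break points explicitly gives exactly 10 bins between 100 and 500.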
For more info see ?hist
I'm looking to plot a set of sparklines in R with just a 0 and 1 state that looks like this:
Does anyone know how I might create something like that ideally with no extra libraries?
I don't know of any simple way to do this, so I'm going to build up this plot from scratch. This would probably be a lot easier to design in illustrator or something like that, but here's one way to do it in R (if you don't want to read the whole step-by-step, I provide my solution wrapped in a reusable function at the bottom of the post).
Step 1: Sparklines
You can use the pch argument of the points function to define the plotting symbol. ASCII symbols are supported, which means you can use the "pipe" symbol for vertical lines. The ASCII code for this symbol is 124, so to use it for our plotting symbol we could do something like:
plot(df, pch=124)
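As a tiny standalone illustration of the pipe symbol, here is a sketch for a single 0/1 series. The vector state is invented just for this example; the real data frame comes further down.

```r
# Sketch: draw a pipe at every position where the series is 1
set.seed(7)
state <- rbinom(30, 1, 0.7)   # made-up binary series
on <- which(state == 1)       # x positions that get a tick mark

plot(on, rep(1, length(on)),
     pch = 124,               # 124 is the code for "|"
     xlim = c(1, 30),         # keep the spacing of the full series
     yaxt = "n", ann = FALSE)
```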
Step 2: labels and numbers
We can put text on the plot by using the text command:
text(x,y,char_vect)
Step 3: Alignment
This is basically just going to take a lot of trial and error to get right, but it'll help if we use values relative to our data.
Here's the sample data I'm working with:
df = data.frame(replicate(4, rbinom(50, 1, .7)))
colnames(df) = c('steps','atewell','code','listenedtoshell')
I'm going to start out by plotting an empty box to use as our canvas. To make my life a little easier, I'm going to set the coordinates of the box relative to values meaningful to my data. The Y positions of the 4 data series will be the same across all plotting elements, so I'm going to store that for convenience.
n=ncol(df)
m=nrow(df)
plot(1:m,
seq(1,n, length.out=m),
# The following arguments suppress plotting values and axis elements
type='n',
xaxt='n',
yaxt='n',
ann=F)
With this box in place, I can start adding elements. For each element, the X values will all be the same, so we can use rep to set that vector, and seq to set the Y vector relative to the Y range of our plot (1:n). I shifted the positions by percentages of the X and Y ranges to align my values, and adjusted the size of the text using the cex parameter. Ultimately, I found that this works out:
ypos = rev(seq(1+.1*n,n*.9, length.out=n))
text(rep(1,n),
ypos,
colnames(df), # These are our labels
pos=4, # This positions the text to the right of the coordinate
cex=2) # Increase the size of the text
I reversed the sequence of Y values because I built my sequence in ascending order, and the values on the Y axis in my plot increase from bottom to top. Reversing the Y values then makes it so the series in my dataframe will print from top to bottom.
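To make the arithmetic concrete, here is what ypos works out to for the four series in my sample data (n = 4). The rounded values in the comment are only for reading convenience.

```r
# Worked example for n = 4 data series
n <- 4
ypos <- rev(seq(1 + .1 * n, n * .9, length.out = n))
ypos
# seq() runs from 1.4 up to 3.6 in steps of 2.2/3;
# rev() flips it, giving approximately 3.6, 2.87, 2.13, 1.4
```

So the first column of the data frame is drawn at the top (y = 3.6) and the last one at the bottom (y = 1.4).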
I then repeated this process for the second label, shifting the X values over but keeping the Y values the same.
text(rep(.37*m,n), # Shifted towards the middle of the plot
ypos,
colSums(df), # new label
pos=4,
cex=2)
Finally, we shift X over one last time and use points to build the sparklines with the pipe symbol as described earlier. I'm going to do something sort of weird here: I'm going to tell points to plot at as many positions as I have data points, but use ifelse to determine whether to actually plot a pipe symbol. This way everything will be properly spaced. When I don't want to plot a line, I'll use a 'space' as my plotting symbol (ASCII code 32). I will repeat this procedure, looping through all columns in my dataframe:
for(i in 1:n){
points(seq(.5*m,m, length.out=m),
rep(ypos[i],m),
pch=ifelse(df[,i], 124, 32), # This determines whether to plot or not
cex=2,
col='gray')
}
So, piecing it all together and wrapping it in a function, we have:
df = data.frame(replicate(4, rbinom(50, 1, .7)))
colnames(df) = c('steps','atewell','code','listenedtoshell')
BinarySparklines = function(df,
L_adj=1,
mid_L_adj=0.37,
mid_R_adj=0.5,
R_adj=1,
bottom_adj=0.1,
top_adj=0.9,
spark_col='gray',
cex1=2,
cex2=2,
cex3=2
){
# 'adj' parameters are scalar multipliers in [-1,1]. For most purposes, use [0,1].
# The exception is L_adj which is any value in the domain of the plot.
# L_adj < mid_L_adj < mid_R_adj < R_adj
# and
# bottom_adj < top_adj
n=ncol(df)
m=nrow(df)
plot(1:m,
seq(1,n, length.out=m),
# The following arguments suppress plotting values and axis elements
type='n',
xaxt='n',
yaxt='n',
ann=F)
ypos = rev(seq(1+bottom_adj*n, n*top_adj, length.out=n)) # bottom_adj was previously hard-coded as .1
text(rep(L_adj,n),
ypos,
colnames(df), # These are our labels
pos=4, # This positions the text to the right of the coordinate
cex=cex1) # Increase the size of the text
text(rep(mid_L_adj*m,n), # Shifted towards the middle of the plot
ypos,
colSums(df), # new label
pos=4,
cex=cex2)
for(i in 1:n){
points(seq(mid_R_adj*m, R_adj*m, length.out=m),
rep(ypos[i],m),
pch=ifelse(df[,i], 124, 32), # This determines whether to plot or not
cex=cex3,
col=spark_col)
}
}
BinarySparklines(df)
Which gives us the following result:
Try playing with the alignment parameters and see what happens. For instance, to shrink the side margins, you could try decreasing the L_adj parameter and increasing the R_adj parameter like so:
BinarySparklines(df, L_adj=-1, R_adj=1.02)
It took a bit of trial and error to get the alignment right for the result I provided (which is what I used to inform the default values for BinarySparklines), but I hope I've given you some intuition about how I achieved it and how moving things using percentages of the plotting range made my life easier. In any event, I hope this serves as both a proof of concept and a template for your code. I'm sorry I don't have an easier solution for you, but I think this basically gets the job done.
I did my prototyping in RStudio so I didn't have to specify the dimensions of my plot, but for posterity I had 832 x 456 with the aspect ratio maintained.
I have two vectors (in a data frame) that I want to plot like this plot(df$timeStamp,df$value), which works nicely by itself. Now the plot is showing the timestamp in a pure numerical way as markers on the x axis.
When I format the vector of timestamps into a vector of "hh:mm:ss", plot() complains (which makes sense, as the x-axis data is now a vector of strings).
Is there a way to say plot(x-vector, y-vector, label-x-vector) where the label-x-vector contains the elements to display along the x-axis?
The last part of your general question is done in two commands rather than one. If you look at ?plot.default (linked from ?plot) you'll see an option to leave off the x-axis altogether using the xaxt argument (xaxt = 'n'). Do that and then use axis to make the x-axis what you want (check ?axis). I don't know what format your timestamp is currently in, so it's hard to help further.
In general it's...
plot(x-vector, y-vector, xaxt = 'n')
axis(1, x-vector, label-x-vector)
(The help for plotting may be just about the messiest part of R-help but once you get used to looking at plot.default, axis, and par you'll start getting a better handle on things)
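A runnable version of that pattern might look like the sketch below. Since I don't know your timestamp format, I'm assuming numeric seconds since 1970-01-01 in UTC; the names timeStamp, value and labs are just placeholders.

```r
# Sketch, assuming timestamps are numeric seconds since the Unix epoch (UTC)
set.seed(3)
start <- as.numeric(as.POSIXct("2011-12-28 00:00:00", tz = "UTC"))
timeStamp <- start + seq(0, by = 300, length.out = 10)  # one point every 5 minutes
value <- runif(10)

# Build the "hh:mm:ss" strings only for the labels; keep x numeric for plotting
labs <- format(as.POSIXct(timeStamp, origin = "1970-01-01", tz = "UTC"), "%H:%M:%S")

plot(timeStamp, value, xaxt = "n")         # no default x-axis
axis(1, at = timeStamp, labels = labs)     # custom labels at the data positions
```

If the labels overlap, axis will silently skip some of them; pass a subset of positions and labels if you want control over which ones appear.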
The standard R plots are pretty good at doing what you want if you give them the correct information. If you can convert your timestamps to actual time objects (Date or POSIXct objects) then plot will tend to do the correct thing. Try the following examples:
tmp <- as.POSIXct( seq(0, length=10, by=60*5), origin='2011-12-28' )
tmp
plot( tmp, runif(10) )
tmp2 <- as.POSIXct( seq(0, length=10, by=60*60*5), origin='2011-12-28' )
tmp2
plot( tmp2, runif(10) )
tmp3 <- as.POSIXct( seq(0, length=10, by=60*60/2), origin='2011-12-28' )
tmp3
plot( tmp3, runif(10) )
In each case the tick labels are pretty meaningful, but if you would like a different format then you can follow @John's example and suppress the default axis, then use axis.POSIXct and specify what format you want.
The examples use equally spaced times (due to my laziness), but will work equally well for unequally spaced times.
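For completeness, the axis.POSIXct variant could look something like this sketch. The "%H:%M" format and the explicit tz = "UTC" are my choices for the example, not requirements.

```r
# Sketch: suppress the default axis, then label with a chosen time format
tmp <- as.POSIXct(seq(0, length.out = 10, by = 60 * 5),
                  origin = "2011-12-28", tz = "UTC")

plot(tmp, runif(10), xaxt = "n")
axis.POSIXct(1, at = tmp, format = "%H:%M")  # hours:minutes at each data point
```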