Here is my data file (update 2021: the link is dead: http://s.yunio.com/87HT7f). Please download it and save it as mydata.
y<-scan("mydata")
hist(y,breaks=c(0,60,70,80,90,100),freq=TRUE)
axis(2,at=seq(0,20,length.out=5),labels=c(0,5,10,15,20))
There are two problems:
1. Warning message:
In plot.histogram(r, freq = freq1, col = col, border = border, angle = angle, :
the AREAS in the plot are wrong -- rather use freq=FALSE
I want frequencies (counts) on the y axis, not probabilities. How can I make the warning message go away?
2. When I run
axis(2,at=seq(0,20,length.out=5),labels=c(0,5,10,15,20))
there is no 20 on the y axis.
For the first problem: it is a warning, not an error. It says that the areas of the bars do not correspond to their frequencies. You can see this in the first bar, which has the largest area although its frequency is only 5.
For the second problem: set ylim=c(0,20) inside hist() so that the y axis extends to 20. axis() only draws tick marks and labels; it does not change the length of the axis, so originally there was no room for the label 20.
hist(y,breaks=c(0,60,70,80,90,100),freq=TRUE,ylim=c(0,20))
axis(2,at=seq(0,20,length.out=5),labels=c(0,5,10,15,20))
Check the manual for hist:
freq:
Defaults to 'TRUE' _if and only if_ 'breaks' are equidistant
(and 'probability' is not specified).
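If counts are what you want with unequal bins, one way to sidestep the warning is to let hist() do only the counting and draw the bars yourself. The following is a minimal sketch, not the original poster's data: `y` is a made-up sample, and `bin_labels` and `dens` are names introduced here for illustration.

```r
# Hypothetical scores standing in for the original data file
set.seed(1)
y <- round(runif(40, min = 30, max = 100))
breaks <- c(0, 60, 70, 80, 90, 100)

# Compute the bins without drawing, so no area warning is raised
h <- hist(y, breaks = breaks, plot = FALSE)

# Draw the raw counts as equal-width bars labelled with their ranges
bin_labels <- paste(head(breaks, -1), tail(breaks, -1), sep = "-")
barplot(h$counts, names.arg = bin_labels,
        ylim = c(0, max(20, h$counts)),
        xlab = "score range", ylab = "count")

# The heights hist() would use with freq = FALSE: count / (n * bin width)
dens <- h$counts / (sum(h$counts) * diff(breaks))
```

Because the bars in barplot() all have the same width, their heights can be read as counts directly. The `dens` line shows why hist() prefers freq=FALSE for unequal breaks: dividing by the bin width is what makes the areas comparable.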
I'm trying to create a histogram which involves a lot of repeated values in one of the cases. One of the data points is not being represented in the graph. Here is the smallest, simplest subset I could find that still reproduced my issue.
cleanVar <- c(rep(1,9),1.25,1.5)
plot_ly(data.table(cleanVar),
x = ~cleanVar,
type = "histogram")
The above graph shows only two bars. One centered at 1 of height 9, and one centered at 1.2 of height 1.
Also strangely, the hover-over shows "1" for the first bar, despite it covering the range [.9,1.1], and it shows "1.25" for the second bar, despite it covering the range [1.1,1.3].
If we change the 1 to only be repeated 8 times cleanVar <- c(rep(1,8),1.25,1.5), so that there are 10 total values in the histogram, it works better, but still, the three bins it creates are .25 wide according to the hover-over, yet they are only .2 wide on the graph itself.
What is plotly doing? How can I properly show 3 bins of height 9,1,1 and width .25? binning options in layout() aren't working.
By default plotly uses the following procedure to define the bins:
start:
Sets the starting value for the x axis bins. Defaults to the minimum data value, shifted down if necessary to make nice round values and to remove ambiguous bin edges. For example, if most of the data is integers we shift the bin edges 0.5 down, so a size of 5 would have a default start of -0.5, so it is clear that 0-4 are in the first bin, 5-9 in the second, but continuous data gets a start of 0 and bins [0,5), [5,10) etc. Dates behave similarly, and start should be a date string. For category data, start is based on the category serial numbers, and defaults to -0.5. If multiple non-overlaying histograms share a subplot, the first explicit start is used exactly and all others are shifted down (if necessary) to differ from that one by an integer number of bins.
end:
Sets the end value for the x axis bins. The last bin may not end
exactly at this value, we increment the bin edge by size from
start until we reach or exceed end. Defaults to the maximum data
value. Like start, for dates use a date string, and for category
data end is based on the category serial numbers.
You can find this information via:
library(plotly)
library(listviewer)
schema(jsonedit = interactive())
Navigate as follows: object ► traces ► histogram ► attributes ► xbins ► start
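To make the start/size/end rule concrete, here is a small base R sketch of the edge-stepping described in the documentation text. It does not call plotly; `bin_edges` is a hypothetical helper, and the values start = 0.875 and size = 0.25 are assumptions chosen so that the bins are centred on 1, 1.25 and 1.5.

```r
cleanVar <- c(rep(1, 9), 1.25, 1.5)

# Hypothetical helper: step from start by size until end is reached or exceeded
bin_edges <- function(start, end, size) {
  edges <- start
  while (edges[length(edges)] < end) {
    edges <- c(edges, edges[length(edges)] + size)
  }
  edges
}

# Assumed settings: bins 0.25 wide, centred on 1, 1.25 and 1.5
edges <- bin_edges(start = 0.875, end = 1.5, size = 0.25)

# Count the values per half-open bin [lower, upper)
counts <- as.vector(table(cut(cleanVar, breaks = edges, right = FALSE)))
```

If that is the binning you are after, the same three numbers can be tried on the trace itself through the xbins attribute that the schema path above documents, for example xbins = list(start = 0.875, end = 1.5, size = 0.25).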
To avoid the default behaviour just make your x variable a factor:
library(plotly)
library(data.table)
cleanVar <- c(rep(1, 9), 1.25, 1.5)
plot_ly(data.table(cleanVar),
x = ~factor(cleanVar),
type = "histogram")
I want to present percentages over a 24h period in 15 min intervals as a bar plot.
When I use barplot(), the labels for those timepoints are more or less randomly chosen by R (depending on how I format the window. I know it's not random, but it's not what I want either). I would rather have them evenly spaced at 1 h intervals (that is every 4th bar).
I have searched extensively on this and know I can add labels later with axis(), but I have not found a way to set which bars are labeled and which are left blank.
So here is an example. Sorry for the long lines:
x<-sample(1:100,96)
Labels<-c("09","09:15","09:30","09:45","10:00","10:15","10:30","10:45","11","11:15","11:30","11:45","12","12:15","12:30","12:45","13","13:15","13:30","13:45","14","14:15","14:30","14:45","15","15:15","15:30","15:45","16","16:15","16:30","16:45","17","17:15","17:30","17:45","18","18:15","18:30","18:45","19","19:15","19:30","19:45","20","20:15","20:30","20:45","21","21:15","21:30","21:45","22","22:15","22:30","22:45","23","23:15","23:30","23:45","00","00:15","00:30","00:45","01","01:15","01:30","01:45","02","02:15","02:30","02:45","03","03:15","03:30","03:45","04","04:15","04:30","04:45","05","05:15","05:30","05:45","06","06:15","06:30","06:45","07","07:15","07:30","07:45","08","08:15","08:30","08:45")
names(x)<-Labels
barplot(x)
I do not think you can force R to show every label if there is not enough space. But if you want a label every hour, the following code should work:
x<-sample(1:100,96)
Labels<-c("09","09:15","09:30","09:45","10","10:15","10:30","10:45","11","11:15","11:30","11:45","12","12:15","12:30","12:45","13","13:15","13:30","13:45","14","14:15","14:30","14:45","15","15:15","15:30","15:45","16","16:15","16:30","16:45","17","17:15","17:30","17:45","18","18:15","18:30","18:45","19","19:15","19:30","19:45","20","20:15","20:30","20:45","21","21:15","21:30","21:45","22","22:15","22:30","22:45","23","23:15","23:30","23:45","00","00:15","00:30","00:45","01","01:15","01:30","01:45","02","02:15","02:30","02:45","03","03:15","03:30","03:45","04","04:15","04:30","04:45","05","05:15","05:30","05:45","06","06:15","06:30","06:45","07","07:15","07:30","07:45","08","08:15","08:30","08:45")
b=barplot(x,axes = F)
axis(2)
axis(1,at=c(b[seq(1,length(Labels),4)],b[length(b)]+diff(b)[1]),labels = c(Labels[seq(1,length(Labels),4)],"09"))
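The same idea can be written more compactly by generating the labels instead of typing them. This is only a sketch: the date 2000-01-01 is an arbitrary placeholder needed to build the time sequence, and `times`, `time_labels`, `mids` and `hourly` are names introduced here.

```r
set.seed(1)
x <- sample(1:100, 96)

# 96 quarter-hour steps starting at 09:00 (the date itself is arbitrary)
times <- seq(as.POSIXct("2000-01-01 09:00", tz = "UTC"),
             by = "15 min", length.out = 96)
time_labels <- format(times, "%H:%M")

# barplot() returns the x position of each bar's midpoint
mids <- barplot(x, axes = FALSE)
axis(2)

# Label every 4th bar, i.e. the full hours
hourly <- seq(1, length(x), by = 4)
axis(1, at = mids[hourly], labels = time_labels[hourly], las = 2)
```

Using the midpoints returned by barplot() is what keeps the labels aligned with the bars, and las = 2 turns them vertical so that all 24 fit.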
I want to change the x-axis in my plot, but it doesn't work properly with axis(). The data in the plot are daily, and I want to show only years. I tried axis(1, at = seq(1800, 1975, by = 25), las=2), but the result is not what I want. I hope someone understands the problem and can find a solution.
Without reproducible code it is not easy to tell what the problem is, so I'll try a "quick and dirty" approach.
High-level plots are composed of elements that can also be drawn separately. Using separate drawing commands gives finer control over the plotting procedure.
In practice, the first thing to do is plot "nothing".
> plot(x, y, type = "n", xlab = "", ylab = "", axes = F)
type = "n" causes the data to not be drawn. axes = F suppresses the axis and the box around the plot. In spite of that, the plotting region is ready to show the data.
The main benefit is that now the plotting area is correctly dimensioned. Try now to add the desired x axis as you tried before.
> points(x, y) # Plots the data in the area
> axis() # Plots the desired axis with your scale
> title() # Plots the desired titles
> box() # Prints the box surrounding the plot
EDITED based on comment by @scoa
As a quick and dirty solution, you can simply enter the following line after your plot() line:
# This reads as, on axis x (1), anchored at the first (day) value of 0
# and last (day) value of 63917 with 9131 day year increments (by)
# and labels (las) perpendicular (2) to axis (for readability)
# EDITED: and AT the anchor locations, put the labels
# 1800 (year) to 1975 (year) in 25 (year) increments
axis (1, at = seq(0, 63917, by = 9131), las=2, labels=seq(1800, 1975, by=25));
For other parameters, check out ?axis. As @scoa mentioned, this is approximate. I have used 365.25 as the day-to-year conversion, which is not quite exact, but it should suffice for visual accuracy at the scale you have provided. If you need a precise conversion from days to years, convert your original data set before plotting.
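If the calendar date of day 0 is known, the tick positions can be computed from dates rather than from an average year length. The sketch below combines that with the empty-canvas approach; it assumes that day 0 is 1 January 1800, which the question does not state, and uses made-up values for the series.

```r
# Assumption: day 0 of the series is 1 January 1800 (not stated in the question)
origin <- as.Date("1800-01-01")

# Hypothetical daily data covering the same range of days
set.seed(1)
day <- 0:63917
value <- cumsum(rnorm(length(day)))

# Exact day offsets of 1 January for every 25th year
years <- seq(1800, 1975, by = 25)
ticks <- as.numeric(as.Date(paste0(years, "-01-01")) - origin)

# Draw an empty canvas first, then add each element separately
plot(day, value, type = "n", xlab = "", ylab = "", axes = FALSE)
lines(day, value)
axis(1, at = ticks, labels = years, las = 2)
axis(2)
box()
```

Subtracting two Date objects gives the number of days between them, so leap years are handled by R rather than by a fixed multiplier.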
I'm looking to plot a set of sparklines in R with just a 0 and 1 state that looks like this:
Does anyone know how I might create something like that ideally with no extra libraries?
I don't know of any simple way to do this, so I'm going to build up this plot from scratch. This would probably be a lot easier to design in illustrator or something like that, but here's one way to do it in R (if you don't want to read the whole step-by-step, I provide my solution wrapped in a reusable function at the bottom of the post).
Step 1: Sparklines
You can use the pch argument of the points function to define the plotting symbol. ASCII symbols are supported, which means you can use the "pipe" symbol for vertical lines. The ASCII code for this symbol is 124, so to use it for our plotting symbol we could do something like:
plot(df, pch=124)
Step 2: labels and numbers
We can put text on the plot by using the text command:
text(x,y,char_vect)
Step 3: Alignment
This is basically just going to take a lot of trial and error to get right, but it'll help if we use values relative to our data.
Here's the sample data I'm working with:
df = data.frame(replicate(4, rbinom(50, 1, .7)))
colnames(df) = c('steps','atewell','code','listenedtoshell')
I'm going to start out by plotting an empty box to use as our canvas. To make my life a little easier, I'm going to set the coordinates of the box relative to values meaningful to my data. The Y positions of the 4 data series will be the same across all plotting elements, so I'm going to store that for convenience.
n=ncol(df)
m=nrow(df)
plot(1:m,
seq(1,n, length.out=m),
# The following arguments suppress plotting values and axis elements
type='n',
xaxt='n',
yaxt='n',
ann=F)
With this box in place, I can start adding elements. For each element, the X values will all be the same, so we can use rep to set that vector, and seq to set the Y vector relative to Y range of our plot (1:n). I'm going to shift the positions by percentages of the X and Y ranges to align my values, and modified the size of the text using the cex parameter. Ultimately, I found that this works out:
ypos = rev(seq(1+.1*n,n*.9, length.out=n))
text(rep(1,n),
ypos,
colnames(df), # These are our labels
pos=4, # This positions the text to the right of the coordinate
cex=2) # Increase the size of the text
I reversed the sequence of Y values because I built my sequence in ascending order, and the values on the Y axis in my plot increase from bottom to top. Reversing the Y values then makes it so the series in my dataframe will print from top to bottom.
I then repeated this process for the second label, shifting the X values over but keeping the Y values the same.
text(rep(.37*m,n), # Shifted towards the middle of the plot
ypos,
colSums(df), # new label
pos=4,
cex=2)
Finally, we shift X over one last time and use points to build the sparklines with the pipe symbol as described earlier. I'm going to do something slightly unusual here: I'll tell points to plot at as many positions as I have data points, but use ifelse to decide whether to actually plot a pipe symbol at each one. This way everything will be properly spaced. When I don't want to plot a line, I'll use a space as my plotting symbol (ASCII code 32). I repeat this procedure in a loop over all columns of the data frame:
for(i in 1:n){
points(seq(.5*m,m, length.out=m),
rep(ypos[i],m),
pch=ifelse(df[,i], 124, 32), # This determines whether to plot or not
cex=2,
col='gray')
}
So, piecing it all together and wrapping it in a function, we have:
df = data.frame(replicate(4, rbinom(50, 1, .7)))
colnames(df) = c('steps','atewell','code','listenedtoshell')
BinarySparklines = function(df,
L_adj=1,
mid_L_adj=0.37,
mid_R_adj=0.5,
R_adj=1,
bottom_adj=0.1,
top_adj=0.9,
spark_col='gray',
cex1=2,
cex2=2,
cex3=2
){
# 'adj' parameters are scalar multipliers in [-1,1]. For most purposes, use [0,1].
# The exception is L_adj which is any value in the domain of the plot.
# L_adj < mid_L_adj < mid_R_adj < R_adj
# and
# bottom_adj < top_adj
n=ncol(df)
m=nrow(df)
plot(1:m,
seq(1,n, length.out=m),
# The following arguments suppress plotting values and axis elements
type='n',
xaxt='n',
yaxt='n',
ann=F)
ypos = rev(seq(1+bottom_adj*n,n*top_adj, length.out=n))
text(rep(L_adj,n),
ypos,
colnames(df), # These are our labels
pos=4, # This positions the text to the right of the coordinate
cex=cex1) # Increase the size of the text
text(rep(mid_L_adj*m,n), # Shifted towards the middle of the plot
ypos,
colSums(df), # new label
pos=4,
cex=cex2)
for(i in 1:n){
points(seq(mid_R_adj*m, R_adj*m, length.out=m),
rep(ypos[i],m),
pch=ifelse(df[,i], 124, 32), # This determines whether to plot or not
cex=cex3,
col=spark_col)
}
}
BinarySparklines(df)
Which gives us the following result:
Try playing with the alignment parameters and see what happens. For instance, to shrink the side margins, you could try decreasing the L_adj parameter and increasing the R_adj parameter like so:
BinarySparklines(df, L_adj=-1, R_adj=1.02)
It took a bit of trial and error to get the alignment right for the result I provided (which is what I used to inform the default values for BinarySparklines), but I hope I've given you some intuition about how I achieved it and how moving things using percentages of the plotting range made my life easier. In any event, I hope this serves as both a proof of concept and a template for your code. I'm sorry I don't have an easier solution for you, but I think this basically gets the job done.
I did my prototyping in Rstudio so I didn't have to specify the dimensions of my plot, but for posterity I had 832 x 456 with the aspect ratio maintained.
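A variation, not the approach used above: instead of the pipe plotting symbol, the ticks can be drawn with segments(), which sets their height in data coordinates rather than through cex. This is a rough sketch with the labels and totals left out; `spark_segments` and `half_height` are names made up for this example.

```r
set.seed(42)
df <- data.frame(replicate(4, rbinom(50, 1, 0.7)))
colnames(df) <- c("steps", "atewell", "code", "listenedtoshell")

# Hypothetical helper: one row of vertical ticks per column of df
spark_segments <- function(df, half_height = 0.3, col = "gray") {
  n <- ncol(df)
  m <- nrow(df)
  # Empty canvas sized to the data, without axes or annotation
  plot(c(1, m), c(0.5, n + 0.5), type = "n",
       xaxt = "n", yaxt = "n", ann = FALSE)
  ypos <- rev(seq_len(n))  # first column at the top
  drawn <- 0
  for (i in seq_len(n)) {
    on <- which(df[[i]] == 1)  # positions where the state is 1
    if (length(on) > 0) {
      segments(on, ypos[i] - half_height, on, ypos[i] + half_height, col = col)
    }
    drawn <- drawn + length(on)
  }
  invisible(drawn)  # number of ticks drawn
}

n_drawn <- spark_segments(df)
```

The trade-off is that segments() keeps tick height independent of the font, while the pch approach keeps the ticks visually matched to the text size.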
Using R and polygon I'm trying to shade the area under the line of a plot from the line to the x-axis and I'm not sure what I am doing wrong here.
The shading is using some point in the middle of the y range to shade from, not 0, the x-axis.
The data set ratioresults is a zoo object but I don't think that's the issue since I tried coercing the y values to as.numeric and as.vector and got the same results.
Code:
plot(index(ratioresults),ratioresults$ratio, type="o", col="red")
polygon(c(1,index(ratioresults),11),c(0, ratioresults$ratio, 0) , col='red')
What's index(ratioresults)? For a simple zoo object I see:
> index(x)
[1] "2003-02-01" "2003-02-03" "2003-02-07" "2003-02-09" "2003-02-14"
which is a vector of Date objects. You are trying to prepend/append the values 1 and 11 to this vector. It's not going to work.
Here's a reproducible example:
library(zoo)
x=zoo(matrix(runif(11),ncol=1),as.Date("2012-08-01") + 0:10)
colnames(x)="ratio"
plot(index(x),x$ratio,type="o",col="red",ylim=c(0,1))
polygon(index(x)[c(1,1:11,11)],c(0,x$ratio,0),col="red")
Differences from yours:
I call my thing x.
I set ylim on the plot - I don't know how your plot managed to start at 0 on the Y axis.
I complete the polygon using the x-values of the first and 11th (last) point, rather than 1 and 11 themselves.
#With an example dataset: please provide one when you need help!
library(zoo)
ratioresults<-as.zoo(runif(10,0,1))
plot(index(ratioresults),ratioresults, type="o", col="red",
xaxs="i",yaxs="i", ylim=c(0,2))
polygon(c(index(ratioresults),rev(index(ratioresults))),
c(as.vector(ratioresults),rep(0,length(ratioresults))),col="red")
The issue is that, by default, the x-axis is not drawn at a fixed y value. One way to fill under a curve down to the x-axis with polygon is to fix the y value of the x-axis using ylim (here I chose 0). Whatever value you choose, make the plot stop exactly at that value with yaxs="i".
You also have to construct your polygon using the value you chose for your x-axis.
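For comparison, here is the same construction without zoo, as a minimal sketch with a plain numeric x and made-up y values. The names `baseline`, `px` and `py` are introduced here for illustration.

```r
set.seed(1)
x <- 1:10
y <- runif(10)
baseline <- 0  # y value at which the x-axis is pinned

# Pin the bottom of the plot to the baseline so the fill reaches the axis
plot(x, y, type = "o", col = "red", ylim = c(baseline, 1), yaxs = "i")

# Close the polygon by dropping to the baseline at the first and last x value
px <- c(x[1], x, x[length(x)])
py <- c(baseline, y, baseline)
polygon(px, py, col = "red")
```

The two extra vertices reuse the first and last x values, so the shape is closed along the baseline instead of at arbitrary positions such as 1 and 11.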