R: use bin counts and bin breaks to get a histogram - r

I generated a random vector from normal distribution and plotted a histogram.
I modified the count of each bin, and I want to plot another histogram with the same breaks (break_vector) and the new bin count vector (new_counts).
How to do that?
I tried barplot(), but the way it displays the bin labels is different.
x = rnorm(500,1,6)
delta = 1
break_vector = seq(min(x)-delta,max(x)+delta,by=delta)
hist_info = hist(x,breaks=break_vector)
new_counts = hist_info$counts+5

Try
new_hist <- hist_info
new_hist$counts <- hist_info$counts + 5
plot(new_hist)
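A slightly fuller sketch of the same idea, assuming you also want the density slot to stay consistent with the edited counts (only needed if you later plot with freq = FALSE). The seed and plot labels are illustrative choices, not part of the question:

```r
set.seed(1)                                   # illustrative seed
x <- rnorm(500, 1, 6)
delta <- 1
break_vector <- seq(min(x) - delta, max(x) + delta, by = delta)

# Compute the histogram without drawing it
hist_info <- hist(x, breaks = break_vector, plot = FALSE)

# Copy the "histogram" object and overwrite its counts
new_hist <- hist_info
new_hist$counts <- hist_info$counts + 5

# Keep density in step with the new counts (bar areas sum to 1)
new_hist$density <- new_hist$counts /
  (sum(new_hist$counts) * diff(new_hist$breaks))

# plot() dispatches to plot.histogram and reuses the stored breaks
plot(new_hist, main = "Histogram with modified counts", xlab = "x")
```

Because the object keeps its class and breaks, the axis is drawn exactly as hist() would draw it, which is what barplot() does not give you.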

Related

r - how to find the first empty bin of a histogram

I have a large dataset with response times. I need to make reference to the first empty bin of the histogram (with x being milliseconds), and exclude all data that comes after that.
I think
Can anybody help?
If you capture the return of hist it contains all of the information that you need.
set.seed(3)
x = rnorm(20)
H = hist(x)
min(which(H$counts == 0))
[1] 5
To exclude the data in that bin and above:
MIN = min(which(H$counts == 0))
x = x[x<H$breaks[MIN]]
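The same steps can be wrapped in a small helper. This is only a sketch: the function name drop_after_first_empty_bin is hypothetical, and it makes two assumptions beyond the answer above. It returns the data unchanged when no bin is empty (min(which(...)) would otherwise give Inf with a warning), and it keeps values that sit exactly on the cutoff break, because hist() uses right-closed bins by default.

```r
# Hypothetical helper: keep only data below the first empty histogram bin
drop_after_first_empty_bin <- function(x, ...) {
  h <- hist(x, plot = FALSE, ...)      # extra arguments (e.g. breaks) go to hist()
  empty <- which(h$counts == 0)
  if (length(empty) == 0) {
    return(x)                          # no empty bin: nothing to exclude
  }
  cutoff <- h$breaks[empty[1]]         # left edge of the first empty bin
  x[x <= cutoff]                       # bins are (a, b], so a value equal to the cutoff belongs to the bin below
}

rt <- c(0.5, 1.5, 2.5, 9.5)            # toy response times
kept <- drop_after_first_empty_bin(rt, breaks = 0:10)
print(kept)
```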

Generating histogram that can calculate percent recovered

I have the following dataset called df:
Amp Injected Recovered Percent less_0.1_True
0.13175 25.22161274 0.96055540 3.81 0
0.26838 21.05919344 21.06294791 100.02 1
0.07602 16.88526724 16.91541763 100.18 1
0.04608 27.50209048 27.55404507 100.19 0
0.01729 8.31489333 8.31326976 99.98 1
0.31867 4.14961918 4.14876247 99.98 0
0.28756 14.65843377 14.65248551 99.96 1
0.26177 10.64754579 10.76435667 101.10 1
0.23214 6.28826689 6.28564299 99.96 1
0.20300 17.01774090 1.05925850 6.22 0
...
Here, the less_0.1_True column flags whether the Recovered period was close enough to the Injected period to count as a successful recovery. If the flag is 1, it is a successful recovery. Based on this, I need to generate a plot (Henderson & Stassun, The Astrophysical Journal, 747:51, 2012) like the following:
I am not sure how to create a histogram like this. The closest I have been able to get is a bar plot with the following code:
breaks <- seq(0,30,by=1)
df <- split(dat, cut(dat$Injected,breaks)) # I make bins with width = 1 day
x <- seq(1,30,by=1)
len <- numeric() #Here I store the total number of objects in each bin
sum <- numeric() #Here I store the total number of 1s in each bin
for (i in 1:30){
n <- nrow(df[[i]])
len <- c(len,n)
s <- sum(df[[i]]$less_0.1_True == 1, na.rm = TRUE)
sum <- c(sum,s)
}
percent = sum/len*100 #Here I calculate what the percentage is for each bin
barplot(percent, names = x, xlab = "Period [d]" , ylab = "Percent Recovered", ylim=c(0,100))
And it generates the following bar plot:
Obviously, this plot does not look like the first one and there are issues such as it does not show from 0 to 1 like the first graph (which I understand is the case because the latter is a bar graph and not a histogram).
Could anyone please guide me as to how I may reproduce the first figure based on my dataset?
If I run your code I get errors. You need to use border = NA to get rid of the bar borders:
set.seed(42)
hist(rnorm(1000,4), xlim=c(0,10), col = 'skyblue', border = NA, main = "Histogram", xlab = NULL)
Another example using ggplot2:
library(ggplot2)
ggplot(iris, aes(x=Sepal.Length)) +
  geom_histogram()
I finally found a solution to the problem in StackOverflow. I guess the solved question was worded differently than mine and so I could not find it when I was looking for it initially. The solution is here: How to plot a histogram with a custom distribution?
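For reference, one possible way to draw per-bin percentages on a true histogram axis is to compute the percentages yourself and store them in the counts slot of a histogram object. This is a sketch under assumptions, not necessarily the approach of the linked answer: the data frame below is simulated as a stand-in for the real table, and empty bins are shown as 0 percent.

```r
set.seed(1)
# Simulated stand-in for the asker's data (hypothetical values)
dat <- data.frame(Injected = runif(200, 0, 30),
                  less_0.1_True = rbinom(200, 1, 0.7))

breaks <- seq(0, 30, by = 1)                       # bins of width 1 day
bin <- cut(dat$Injected, breaks, include.lowest = TRUE)

total     <- tapply(dat$less_0.1_True, bin, length)  # objects per bin
recovered <- tapply(dat$less_0.1_True, bin, sum)     # flagged 1 per bin
percent   <- as.numeric(100 * recovered / total)
percent[is.na(percent)] <- 0                       # assumption: empty bins plotted as 0

# Build a histogram object on the same breaks, then swap in the percentages
h <- hist(dat$Injected, breaks = breaks, plot = FALSE)
h$counts <- percent
plot(h, ylim = c(0, 100), main = "",
     xlab = "Period [d]", ylab = "Percent Recovered")
```

Unlike barplot(), the x axis here is numeric and runs from 0 to 30, so the first bar covers 0 to 1.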

want to use another df for errorbars in R with barplot

I have these two df.
x;
experiment expression
1 HC 50
2 LC 4
3 HR 10
4 LR 2
y;
HC_conf_lo HC_conf_hi LC_conf_lo LC_conf_hi HR_conf_lo HR_conf_hi LR_conf_lo LR_conf_hi
1 63.3293 109.925 2.33971 5.26642 8.8504 16.7707 0.124013 0.434046
I want to use the data frame y to plot the low and high confidence points. The output should be a bar plot with error bars. Can someone show me how to do this with lines in base graphics?
I don't know whether your data is valid, so I am assuming the confidence intervals are correct.
Here's what you can do to get error bars for your data:
#First reading in your data
x <- read.table("x.txt", header=TRUE)
y <- read.table("y.txt", header=TRUE)
#reshaping y to merge it with x
#Transpose y, then fill a 2-column matrix by row,
#so we get the lo and hi values in one row
y.wide <- data.frame(matrix(t(y), ncol=2, byrow=TRUE))
names(y.wide) <- c("lo","hi") #name the columns in y.wide
#Make a data.frame of x and y.wide
xy.df <- data.frame(x, y.wide) # this will be used for plotting the error bars
#make a matrix for using with barplot (barplot takes a vector or matrix)
xy <- as.matrix(cbind(expression=x$expression, y.wide))
rownames(xy) <- x$experiment #rownames, so barplot can label the bars
#Get ylimits for barplot (xy is a matrix, so index columns by name, not with $)
ylimits <- range(xy[,"expression"], xy[,"lo"], xy[,"hi"])
barx <-barplot(xy[,1],ylim=c(0,ylimits[2])) #get the x co-ords of the bars
barplot(xy[,1],ylim=c(0,ylimits[2]),main = "barplot of Expression with ? bars")
# ? as don't know if it's C.I, or what
with(xy.df, arrows(barx,expression,barx,lo,angle=90, code=1,length=0.1))
with(xy.df, arrows(barx,expression,barx,hi,angle=90, code=1,length=0.1))
Resultant Plot
But it doesn't look right. This is because your expression values don't fall between the lo and hi values.
With the hack below,
barplot(xy[,1],ylim=c(0,ylimits[2]),main = "barplot of Expression with ? bars")
with(xy.df, arrows(barx,lo,barx,hi,angle=90, code=2,length=0.1))
with(xy.df, arrows(barx,hi,barx,lo,angle=90, code=2,length=0.1))
The resultant plot
Look at both arrows calls carefully, and you will see how I achieved it.
I would recommend double checking your calculations though.
And this is far easier with ggplot2. Look at this page for examples and code
http://docs.ggplot2.org/0.9.3.1/geom_errorbar.html
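A self-contained variant of the base graphics approach, with the question's numbers typed in directly instead of read from files. It is a sketch that assumes the column names in y always follow the <experiment>_conf_lo / <experiment>_conf_hi pattern, and it draws each interval with a single arrows() call using code = 3 (a flat cap at both ends):

```r
x <- data.frame(experiment = c("HC", "LC", "HR", "LR"),
                expression = c(50, 4, 10, 2))
y <- data.frame(HC_conf_lo = 63.3293,  HC_conf_hi = 109.925,
                LC_conf_lo = 2.33971,  LC_conf_hi = 5.26642,
                HR_conf_lo = 8.8504,   HR_conf_hi = 16.7707,
                LR_conf_lo = 0.124013, LR_conf_hi = 0.434046)

# Look up the bounds by name so they line up with the rows of x
lo <- unlist(y[1, paste0(x$experiment, "_conf_lo")])
hi <- unlist(y[1, paste0(x$experiment, "_conf_hi")])

# barplot() returns the x positions of the bar midpoints
barx <- barplot(x$expression, names.arg = x$experiment,
                ylim = c(0, max(x$expression, hi)),
                ylab = "expression")

# One vertical segment per bar, capped at both ends
arrows(barx, lo, barx, hi, angle = 90, code = 3, length = 0.1)
```

As noted above, several expression values fall outside their intervals, so the bars and the error bars will not overlap for those experiments.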

How to annotate subplots with ggplot from rpy2?

I'm using Rpy2 to plot dataframes with ggplot2. I make the following plot:
p = ggplot2.ggplot(iris) + \
ggplot2.geom_point(ggplot2.aes_string(x="Sepal.Length", y="Sepal.Width")) + \
ggplot2.facet_wrap(Formula("~Species"))
p.plot()
r["dev.off"]()
I'd like to annotate each subplot with some statistics about the plot. For example, I'd like to compute the correlation between each x/y subplot and place it on the top right corner of the plot. How can this be done? Ideally I'd like to convert the dataframe from R to a Python object, compute the correlations and then project them onto the scatters. The following conversion does not work, but this is how I'm trying to do it:
# This does not work
#iris_df = pandas.DataFrame({"Sepal.Length": rpy2.robjects.default_ri2py(iris.rx("Sepal.Length")),
# "Sepal.Width": rpy2.robjects.default_ri2py(iris.rx("Sepal.Width")),
# "Species": rpy2.robjects.default_ri2py(iris.rx("Species"))})
# So we access iris using R to compute the correlation
x = iris_py.rx("Sepal.Length")
y = iris_py.rx("Sepal.Width")
# compute r.cor(x, y) and divide up by Species
# Assume we get a vector of length Species saying what the
# correlation is for each Species' Petal Length/Width
p = ggplot2.ggplot(iris) + \
ggplot2.geom_point(ggplot2.aes_string(x="Sepal.Length", y="Sepal.Width")) + \
ggplot2.facet_wrap(Formula("~Species")) + \
# ...
# How to project correlation?
p.plot()
r["dev.off"]()
But assuming I could actually access the R dataframe from Python, how could I plot these correlations? thanks.
The solution is to create a dataframe with a label for each sample plotted. Its column names should match the corresponding column names of the dataframe with the original data. Then this can be plotted with:
p += ggplot2.geom_text(data=labels_df, mapping=ggplot2.aes_string(x="1", y="1", label="labels"))
where labels_df is the dataframe containing the labels and labels is the column name of labels_df with the labels to be plotted. (1,1) in this case will be the coordinate position of the label in each subplot.
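The same technique written in plain R, which the rpy2 calls mirror, may make the shape of labels_df clearer. This is a sketch: the label text format and the corner placement via Inf with hjust/vjust are illustrative choices rather than part of the answer above, and the plotting step is skipped if ggplot2 is not installed.

```r
# One row per facet; the Species column must match the faceting variable
species <- levels(iris$Species)
labels_df <- data.frame(
  Species = factor(species, levels = species),
  labels  = sapply(species, function(s) {
    d <- iris[iris$Species == s, ]
    sprintf("r = %.2f", cor(d$Sepal.Length, d$Sepal.Width))
  })
)
print(labels_df)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_point() +
    facet_wrap(~Species) +
    # x = Inf, y = Inf pins each label to the top right corner of its panel
    geom_text(data = labels_df, aes(x = Inf, y = Inf, label = labels),
              hjust = 1.1, vjust = 1.5, inherit.aes = FALSE)
  print(p)
}
```

Because labels_df carries a Species column, facet_wrap sends each row to its own panel.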
I found that @user248237dfsf's answer didn't work for me. ggplot got confused between the data frame I was plotting and the data frame I was using for labels.
Instead, I used
import rpy2.robjects as robjects
# assumes the ggplot2 R package is already attached, e.g. by importing rpy2.robjects.lib.ggplot2
ggplot2_env = robjects.baseenv['as.environment']('package:ggplot2')
class GBaseObject(robjects.RObject):
    @classmethod
    def new(*args, **kwargs):
        args_list = list(args)
        cls = args_list.pop(0)
        res = cls(cls._constructor(*args_list, **kwargs))
        return res
class Annotate(GBaseObject):
    _constructor = ggplot2_env['annotate']
annotate = Annotate.new
Now, I have something that works just like the standard annotate.
annotate(geom = "text", x = 1, y = 1, label = "MPC")
One minor comment: I don't know if this will work with faceting.

NP chart using ggplot2

How can I generate an NP chart using ggplot2?
I made a simple Rscript which generates bar and point charts. I am supplying the data in a csv file. How many columns do I need to specify, and what arguments do I need to pass to the ggplot functions?
I am very new to R and ggplot2.
EDIT :
This is what is meant by an NP chart.
Current code attempt:
#load library ggplot2
library(ggplot2)
#get arguments
args <- commandArgs(TRUE)
pdfname <- args[1]
graphtype <- args[2]
datafile <- args[3]
#read csv file
tasks <- read.csv(datafile , header = T)
#name the pdf from passed arg 1
pdf(pdfname)
#main magic that generates the graph
qplot(x,y, data=tasks, geom = graphtype)
#clean up
dev.off()
The .csv file has 2 columns, x and y. I call this script with Rscript cne.R 11_16.pdf "point" "data.csv".
Thank you very much @mathematical.coffee, this is what I need, but:
1> I am reading the data from a csv file, which contains the following:
Month,Rate
"Jan","37.50"
"Feb","32.94"
"Mar","25.00"
"Apr","33.33"
"May","33.08"
"Jun","29.09"
"Jul","12.00"
"Aug","10.00"
"Sep","6.00"
"Oct","23.00"
"Nov","9.00"
"Dec","14.00"
2> I want to display the value at each plotted point, display the values for UCL, CL and LCL, and give different labels to the x and y axes.
Also, when I read the data, it is not in the same order as in the csv file. How do I fix this?
You combine ggplot(tasks,aes(x=x,y=y)) with geom_line and geom_point to get the lines connected by points.
If you additionally want the UCL/LCL/etc drawn you add in a geom_hline (horizontal line).
To add text to these lines you can use geom_text.
An example:
library(ggplot2)
# generate some data to use, say monthly up to a year from today.
n <- 12
tasks <- data.frame(
  x = seq(Sys.Date(), by="month", length.out=n),
  y = runif(n) )
CL = median(tasks$y) # substitute however you calculate CL here
LCL = quantile(tasks$y,.25) # substitute however you calculate LCL here
UCL = quantile(tasks$y,.75) # substitute however you calculate UCL here
# one row per limit line; x is where the text label is placed
lim <- data.frame(x = tasks$x[n], limits = c(UCL,CL,LCL), lbls = c('UCL','CL','LCL'))
p <- ggplot(tasks,aes(x=x,y=y)) + # store x/y values
  geom_line() + # add line
  geom_point(aes(colour=(y>LCL&y<UCL))) + # add points, coloured by whether they lie inside the limits
  theme(legend.position='none', # remove legends (opts() in old ggplot2 versions)
        axis.text.x=element_text(angle=90)) # rotate x axis labels (theme_text() in old versions)
# Now add in the limits.
# horizontal lines, with a different line type for each limit
p <- p + geom_hline(data=lim, aes(yintercept=limits,linetype=lbls)) + # draw lines
  geom_text(data=lim, aes(x=x,y=limits,label=lbls), vjust=-0.2, size=3) # draw text
# display
print(p)
which gives:
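On the follow-up points about month order, value labels and axis titles: ggplot2 orders a text column alphabetically, so one common fix is to turn Month into a factor whose levels follow the file order. The sketch below reads the question's csv from an inline string for illustration; the axis titles are placeholders, and the plotting step is skipped if ggplot2 is not installed.

```r
csv_text <- 'Month,Rate
"Jan","37.50"
"Feb","32.94"
"Mar","25.00"
"Apr","33.33"
"May","33.08"
"Jun","29.09"
"Jul","12.00"
"Aug","10.00"
"Sep","6.00"
"Oct","23.00"
"Nov","9.00"
"Dec","14.00"'

tasks <- read.csv(text = csv_text)
tasks$Rate <- as.numeric(as.character(tasks$Rate))   # make sure Rate is numeric
# Levels in order of first appearance, so the axis follows the csv order
tasks$Month <- factor(tasks$Month, levels = unique(as.character(tasks$Month)))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(tasks, aes(x = Month, y = Rate, group = 1)) +  # group = 1 joins points across factor levels
    geom_line() +
    geom_point() +
    geom_text(aes(label = Rate), vjust = -0.8, size = 3) +   # value at each point
    labs(x = "Month", y = "Rate")                            # placeholder axis titles
  print(p)
}
```

The UCL, CL and LCL lines from the example above can be added to this plot in the same way with geom_hline and geom_text.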
