Histogram with one column per value in R

I am trying to plot a simple histogram in R. I have an integer vector and I want to draw a histogram with one column per value.
test_data = c(1,1,1,2,2,3,3,4)
hist(test_data)
But I get this
Is it possible to get the same result as I get in Python?
import matplotlib.pyplot as plt
test_data = [1,1,1,2,2,3,3,4]
plt.hist(test_data)
plt.show()

You could use the barplot and table functions:
barplot(table(test_data))
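For the Python side of the question, the same count-then-bar idea can be sketched roughly as follows. The helper names (value_counts, integer_bin_edges, plot_counts) are made up for this illustration and are not part of matplotlib. The bin edges are one possible way to centre plt.hist bins on each integer, assuming the data are integers.

```python
from collections import Counter


def value_counts(data):
    """Return (values, counts) with one entry per distinct value, sorted by value."""
    counts = Counter(data)
    values = sorted(counts)
    return values, [counts[v] for v in values]


def integer_bin_edges(data):
    """Bin edges centred on each integer, usable as plt.hist(data, bins=edges)."""
    lo, hi = min(data), max(data)
    return [lo - 0.5 + i for i in range(hi - lo + 2)]


def plot_counts(data):
    """Draw one bar per distinct value (hypothetical helper for this sketch)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    values, counts = value_counts(data)
    fig, ax = plt.subplots()
    ax.bar(values, counts)
    ax.set_xticks(values)  # one tick per value, like barplot(table(...))
    plt.show()


test_data = [1, 1, 1, 2, 2, 3, 3, 4]
values, counts = value_counts(test_data)
edges = integer_bin_edges(test_data)
```

Calling plot_counts(test_data) should then draw four bars, one per distinct value.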

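You can use the nclass or breaks argument to adjust the number of bins. Note that R treats a single number passed to breaks as a suggestion only, so the resulting number of bins may differ.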
test_data = c(1,1,1,2,2,3,3,4)
hist(test_data,breaks=5)
hist(test_data,nclass=5)
In fact, it works the same way in Python: the argument is bins, and its default value is 10 (according to this page).
If you change it, you get a different plot:
import matplotlib.pyplot as plt
test_data = [1,1,1,2,2,3,3,4]
plt.hist(test_data,bins=4)
plt.show()


ObsPy: plot streams in one plot with different colors

Hi, I am new to ObsPy.
I want to plot two streams in one plot.
I wrote the code below.
st1=read('/path/1.SAC')
st1+=read('/path/2.SAC')
st1.plot()
I succeeded in plotting the two waveforms, but what I want is to plot them in two different colors.
When I set the 'color' option, the color of both changes.
How can I set the colors separately?
Currently it is not possible to change the color of individual waveforms, and changing color will change all waveforms as you mentioned. I suggest you make your own graph from the ObsPy Stream using Matplotlib:
from obspy import read
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
st1=read('/path/1.SAC')
st1+=read('/path/2.SAC')
# Start figure
fig, ax = plt.subplots(nrows=2, sharex='col')
ax[0].plot(st1[0].times("matplotlib"), st1[0].data, color='red')
ax[1].plot(st1[1].times("matplotlib"), st1[1].data, color='blue')
# Format xaxis
xfmt_day = mdates.DateFormatter('%H:%M')
ax[0].xaxis.set_major_formatter(xfmt_day)
ax[0].xaxis.set_major_locator(mdates.MinuteLocator(interval=1))
plt.show()
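If the goal is a single plot rather than two stacked subplots, the same approach can be adapted to draw every trace on one Axes. This is only a sketch: pair_with_colors and plot_overlaid are hypothetical helper names, and it assumes a Stream that yields Trace objects when iterated, using the same times("matplotlib") and data attributes as the code above.

```python
from itertools import cycle


def pair_with_colors(traces, colors=("red", "blue")):
    """Pair each trace with a color, cycling through `colors` if needed."""
    return list(zip(traces, cycle(colors)))


def plot_overlaid(stream, colors=("red", "blue")):
    """Draw all traces of an ObsPy Stream on one Axes, each in its own color."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    fig, ax = plt.subplots()
    for tr, color in pair_with_colors(stream, colors):
        # Same accessors as in the answer above: matplotlib date numbers and samples
        ax.plot(tr.times("matplotlib"), tr.data, color=color, label=tr.id)
    ax.xaxis_date()  # interpret x values as dates
    ax.legend()
    plt.show()
```

With the stream from the question, plot_overlaid(st1) would draw the first trace in red and the second in blue on shared axes.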

Plotting TimeDeltas in Pandas

I'm trying to plot timedeltas on the x-axis, but am seeing strange behaviour. With the following code, I would expect two curved plots:
import datetime
from pandas import DataFrame
dates = [datetime.datetime(2013,1,1) + datetime.timedelta(seconds=x**2) for x in range(1000)]
deltas = [datetime.timedelta(seconds=x**2) for x in range(1000)]
values = range(1000)
foo = DataFrame.from_dict({'dates': dates, 'deltas': deltas, 'vals': values})
foo.plot(x='dates', y='vals')
foo.plot(x='deltas', y='vals')
but in fact the second plot comes out as a straight line, because in that case the x-axis is rescaled. Is this a bug or am I just doing it wrong?
This is not properly supported by matplotlib at the moment; see this issue here.
The workaround is easy enough: set the index to the formatted (string) version of the timedeltas and it will work.
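A different option from the string index is to convert the timedeltas to plain numbers before plotting, so the x-axis keeps its true spacing. The sketch below uses elapsed seconds; this is a swapped-in technique, not the workaround described in the answer, and plot_against_seconds is a hypothetical helper name.

```python
import datetime

deltas = [datetime.timedelta(seconds=x**2) for x in range(1000)]
values = list(range(1000))

# Elapsed time as floats, which matplotlib can place on a numeric axis
seconds = [d.total_seconds() for d in deltas]


def plot_against_seconds(seconds, values):
    """Plot values against elapsed seconds (hypothetical helper for this sketch)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    fig, ax = plt.subplots()
    ax.plot(seconds, values)
    ax.set_xlabel("elapsed time (s)")
    plt.show()
```

Because the x values grow quadratically here, plot_against_seconds(seconds, values) should give a curve rather than a straight line.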

Changing color and shape of point in time series plot depending on value of different variable in r

I have a time series which contains probabilities. For simplicity, imagine each point represents my estimate of whether it would rain in New York on a given day. When I plot the time series, I want the symbol to use pch=1 and col="green" where I was correct, and pch=4 and col="red" where I was wrong.
I've tried and failed with a couple of approaches. Mainly I am using the chart.TimeSeries function in PerformanceAnalytics. I'm open to ideas.
I have tried plotting one series including vectors for pch and col to represent correct and incorrect. I have tried plotting just the correct points and then using the points function to add the incorrect ones.
Thanks.
I will try to guess at what you are asking. First create a set of predictions for each day in some month and decide if your prediction was correct:
df <- data.frame(days = 1:31,
                 p_rain = runif(31),
                 correct = sample(c(TRUE, FALSE), 31, replace = TRUE))
Now create your symbols and colors based on whether or not you were correct:
df$ppch <- 1 # the default
df$ppch[df$correct==FALSE] <- 4 # when you were wrong
Similarly,
df$pcolor <- as.character('green')
df$pcolor[df$correct==FALSE] <- 'red'
Now plot:
with(df, plot(days,p_rain,pch=ppch,col=pcolor))
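For readers who work in Python, the same per-point styling idea can be sketched with matplotlib by splitting the points into a "correct" and a "wrong" group and drawing each group with its own marker and color. The helper names are hypothetical, and the markers "o" and "x" only approximate R's pch=1 and pch=4.

```python
def style_for(correct):
    """Marker and color for a point: circle/green if correct, cross/red if not."""
    return ("o", "green") if correct else ("x", "red")


def split_by_correct(days, p_rain, correct):
    """Group (day, probability) pairs by whether the prediction was correct."""
    groups = {True: ([], []), False: ([], [])}
    for d, p, c in zip(days, p_rain, correct):
        xs, ys = groups[bool(c)]
        xs.append(d)
        ys.append(p)
    return groups


def plot_predictions(days, p_rain, correct):
    """Scatter plot with one style per group (hypothetical helper for this sketch)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    fig, ax = plt.subplots()
    for flag, (xs, ys) in split_by_correct(days, p_rain, correct).items():
        marker, color = style_for(flag)
        ax.scatter(xs, ys, marker=marker, color=color,
                   label="correct" if flag else "wrong")
    ax.legend()
    plt.show()
```

Drawing one scatter call per group also gives a legend entry per group, which a single call with per-point styles would not.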

Plotting - pandas - distribution in boxplots and norm distribution in histograms

I'd like to add distribution to boxplot when using it with pandas dataframe like this:
In [52]: df = DataFrame(rand(10,5))
In [53]: plt.figure();
In [54]: bp = df.boxplot()
but this generates this:
and I would like something like this:
Is this possible using pandas? Thanks.
The same goes for histograms, for example:
plt.figure()
pd.tools.plotting.hist_frame(fr_q, color="k", alpha=0.5,bins=20, figsize=fgsize)
and now I would like to add a "kde" curve. It's easy for a single plot, for example:
plt.figure()
a.hist(normed=True)
a.plot(kind="kde")
but how can I add it to every subplot?
Thanks
Here is a good resource for plotting:
http://nbviewer.ipython.org/urls/gist.github.com/fonnesbeck/5850463/raw/a29d9ffb863bfab09ff6c1fc853e1d5bf69fe3e4/3.+Plotting+and+Visualization.ipynb

ggplot2 2d Density Weights

I'm trying to plot some data with 2d density contours using ggplot2 in R.
I'm getting one slightly odd result.
First I set up my ggplot object:
p <- ggplot(data, aes(x=Distance,y=Rate, colour = Company))
I then plot this with geom_point and geom_density2d. I want geom_density2d to be weighted by the organisation's size (the OrgSize variable). However, when I add OrgSize as a weighting variable, nothing changes in the plot:
This:
p+geom_point()+geom_density2d()
Gives an identical plot to this:
p+geom_point()+geom_density2d(aes(weight = OrgSize))
However, if I do the same with a loess line using geom_smooth, the weighting does make a clear difference.
This:
p+geom_point()+geom_smooth()
Gives a different plot to this:
p+geom_point()+geom_smooth(aes(weight=OrgSize))
I was wondering whether I'm using density2d inappropriately. Should I instead be using contour and supplying OrgSize as the 'height'? If so, then why does geom_density2d accept a weighting factor?
Code below:
require(ggplot2)
Company <- c("One","One","One","One","One","Two","Two","Two","Two","Two")
Store <- c(1,2,3,4,5,6,7,8,9,10)
Distance <- c(1.5,1.6,1.8,5.8,4.2,4.3,6.5,4.9,7.4,7.2)
Rate <- c(0.1,0.3,0.2,0.4,0.4,0.5,0.6,0.7,0.8,0.9)
OrgSize <- c(500,1000,200,300,1500,800,50,1000,75,800)
data <- data.frame(Company,Store,Distance,Rate,OrgSize)
p <- ggplot(data, aes(x=Distance,y=Rate))
# Difference is apparent between these two
p+geom_point()+geom_smooth()
p+geom_point()+geom_smooth(aes(weight = OrgSize))
# Difference is not apparent between these two
p+geom_point()+geom_density2d()
p+geom_point()+geom_density2d(aes(weight = OrgSize))
geom_density2d is "accepting" the weight parameter, but then not passing it to MASS::kde2d, since that function has no weights argument. As a consequence, you will need to use a different 2D density method.
(I realize my answer is not addressing why the help page says that geom_density2d "understands" the weight argument, but when I have tried to calculate weighted 2D-KDEs, I have needed to use other packages besides MASS. Maybe this is a TODO that @hadley put in the help page that then got overlooked?)
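To make concrete what a weighted 2D kernel density estimate computes, here is a minimal plain-Python sketch using a product of Gaussian kernels. It is not what ggplot2 or MASS do internally; the function name weighted_kde2d and the bandwidths hx and hy are assumptions chosen only for illustration, while the data are the vectors from the question.

```python
import math


def weighted_kde2d(xs, ys, weights, hx, hy):
    """Return f(x, y): a weighted product-Gaussian kernel density estimate."""
    total = float(sum(weights))
    norm = 2.0 * math.pi * hx * hy * total  # makes the density integrate to 1

    def density(x, y):
        acc = 0.0
        for xi, yi, wi in zip(xs, ys, weights):
            zx = (x - xi) / hx
            zy = (y - yi) / hy
            acc += wi * math.exp(-0.5 * (zx * zx + zy * zy))
        return acc / norm

    return density


distance = [1.5, 1.6, 1.8, 5.8, 4.2, 4.3, 6.5, 4.9, 7.4, 7.2]
rate = [0.1, 0.3, 0.2, 0.4, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
org_size = [500, 1000, 200, 300, 1500, 800, 50, 1000, 75, 800]

# Bandwidths are arbitrary illustrative choices, not values from the question
hx, hy = 1.0, 0.15

unweighted = weighted_kde2d(distance, rate, [1] * len(distance), hx, hy)
weighted = weighted_kde2d(distance, rate, org_size, hx, hy)
```

Evaluating both functions at the store with the largest OrgSize, (4.2, 0.4), gives a higher weighted density than unweighted density, which is the kind of difference the question expected to see in the contours.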
