Plotting TimeDeltas in Pandas

I'm trying to plot timedeltas on the x-axis, but am seeing strange behaviour. With the following code, I would expect two curved plots:
import datetime
from pandas import DataFrame

dates = [datetime.datetime(2013,1,1) + datetime.timedelta(seconds=x**2) for x in range(1000)]
deltas = [datetime.timedelta(seconds=x**2) for x in range(1000)]
values = range(1000)
foo = DataFrame.from_dict({'dates': dates, 'deltas': deltas, 'vals': values})
foo.plot(x='dates', y='vals')
foo.plot(x='deltas', y='vals')
but in fact the second plot comes out as a straight line, because in that case the x-axis is rescaled. Is this a bug, or am I just doing it wrong?

This is not properly supported by matplotlib at the moment; see the linked issue.
The workaround is easy enough: set the index to the formatted (string) version of the timedeltas and it will work.
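A minimal sketch of that workaround, reusing the question's data. The names labelled, seconds and plot_both are illustrative only. The numeric-seconds variant is an assumption added for comparison, not part of the answer above: string labels are placed by position, so only a numeric x-axis keeps the quadratic spacing.

```python
import datetime

import pandas as pd

# Same data as in the question.
deltas = [datetime.timedelta(seconds=x**2) for x in range(1000)]
foo = pd.DataFrame({'deltas': deltas, 'vals': range(1000)})

# Workaround from the answer: index by the string form of each timedelta.
labelled = foo.set_index(foo['deltas'].map(str))['vals']

# Alternative (an assumption, not from the answer): a plain numeric column
# holding the elapsed seconds, which any plotting backend can scale.
foo['seconds'] = foo['deltas'].dt.total_seconds()


def plot_both():
    """Draw both variants; requires matplotlib to be installed."""
    labelled.plot()
    foo.plot(x='seconds', y='vals')
```

Calling plot_both() draws the two figures; the data preparation itself does not need matplotlib.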

Related

TimeSeries: Can't get the type = "o" to work, my data is only plotting horizontal lines for each data point when I need a connected line graph

I am trying to plot a time series graph, but am having issues getting it to be a line graph while showing the decades at the bottom.
My data set has the decades (as factors) next to performance (integer)
If I write
plot(StockPerformance$Decade, StockPerformance$Performance)
I will get a graph that has horizontal lines in it
adding,
type ="o"
like this:
plot(StockPerformance$Decade, StockPerformance$Performance, type ="o")
doesn't change it....

In R, when you read a data frame with read.table (or a variant thereof) or create one with data.frame, R tries to figure out what you have and treat it appropriately. Specifically, character vectors (like "1830s") get converted to factors.
Factors are a way to efficiently store character strings - which was a lot more important when R was first created than now. The important thing for you is that characters don't have any order to them unless you put it there, so R automatically makes boxplots out of them. That's why you are seeing lines - they are boxplots with only one point.
To get around this, you need to convert them to numbers for the purpose of plotting. Then, you need to "fix" the axes afterwards. Here's how:
plot(Performance ~ as.numeric(Decade),
     data = StockPerformance,
     xlab = "Decade", # otherwise the label would be "as.numeric(Decade)"
     xaxt = 'n',      # removes default axis ticks and labels
     pch = 1          # default open circle. Change the number to get other options. 16 and 20 are both closed circles (20 is small, 16 is big)
)
with(StockPerformance, # This just makes it so I don't have to type StockPerformance twice below.
     axis(1, at = 1:nlevels(Decade),
          labels = levels(Decade) # the argument is "labels"; axis() has no "value" argument
))
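For readers coming from the pandas thread above, the same idea (plot against integer positions, then relabel the ticks) can be sketched in Python. The data frame below is made up to stand in for StockPerformance, and plot_decades is a hypothetical helper, not something from the answer.

```python
import pandas as pd

# Hypothetical data standing in for StockPerformance.
stock = pd.DataFrame({
    'Decade': ['1830s', '1840s', '1850s', '1860s'],
    'Performance': [3, 7, 5, 9],
})

decade = pd.Categorical(stock['Decade'], ordered=True)
positions = decade.codes + 1            # 1-based, like as.numeric() on an R factor
tick_labels = list(decade.categories)   # like levels(Decade)


def plot_decades():
    """Connected line with markers; requires matplotlib to be installed."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(positions, stock['Performance'], marker='o')
    ax.set_xticks(positions)
    ax.set_xticklabels(tick_labels)
    ax.set_xlabel('Decade')
    return ax
```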

Plots.jl adding a trendline, plus changing the x-axis

I want to do a simple plot using Plots.jl.
I calculated a rate for each month over a couple of years. The problem that I am facing now is that I want to add a trendline to this plot. I did not find how this is done in Julia or Plots, if this is somewhere, please tell me.
My second question: since I just have a vector with, let's say, 150 elements, one for each month, Plots.jl only gives me numbers on the x-axis at 0, 50, 100 and 150, with horizontal lines. I would like to change this to one of these lines every 12 values, with the year as the label on the axis.
I hope my question is clear, and thank you very much in advance.
Cheers

No fancy features needed if I understand your question correctly.
using Plots
dates = 1:150
ticks = 1:12:150          # one tick every 12 months (13 ticks)
ticks_labels = 0:12       # one label per tick
values = rand(150) .+ dates*0.01
plot(dates, values, xticks = (ticks, ticks_labels), label = "my series")
bhat = [dates ones(150)]\values   # least-squares fit: [slope, intercept]
Plots.abline!(bhat..., label = "trendline")
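The backslash line in the Julia code is an ordinary least-squares fit of a straight line. A rough Python equivalent, as a sketch only (NumPy is an assumption here, and the variable names are illustrative), makes the computation explicit:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(1, 151)                 # months 1..150
y = rng.random(150) + 0.01 * x        # noisy series with a small upward trend

# Design matrix [x 1], mirroring [dates ones(150)] in the Julia code.
A = np.column_stack([x, np.ones(len(x))])
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

trend = slope * x + intercept         # points on the trendline
ticks = np.arange(1, 151, 12)         # one tick per 12 months
```

The fitted slope and intercept match what np.polyfit(x, y, 1) returns, so either call can be used to draw the line.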

Plots now has a simple keyword option for adding a trend line.
using Plots
scatter(collect(1:10), collect(1:10) + rand(10), smooth = true)

Odd axis label behaviour after setting xlim in pyramid.plot [plotrix]

I'm trying to make an "opposing stacked bar chart" and have found pyramid.plot from the plotrix package seems to do the job. (I appreciate ggplot2 will be the go-to solution for some of you, but I'm hoping to stick with base graphics on this one.)
Unfortunately it does an odd thing with the x axis when I try to set the limits to non-integer values. If I let it define the limits automatically, they are integers, and in my case that just leaves too much white space. But defining them as xlim=c(1.5,1.5) produces the odd result below.
If I understand the documentation correctly, there is no way to pass on additional graphical parameters, e.g. to suppress the axis and add it later, let alone define the tick points. Is there a way to make it more flexible?
Here is a minimal working example used to produce the plot below.
require(plotrix)
set.seed(42)
pyramid.plot(cbind(runif(7,0,1),
                   rep(0,7),
                   rep(0,7)),
             cbind(rep(0,7),
                   runif(7,0,1),
                   runif(7,0,1)),
             top.labels=NULL,
             gap=0,
             labels=rep("",7),
             xlim=c(1.5,1.5))
Just in case it is of interest to anyone else, I'm not doing a population pyramid, but rather attempting a stacked bar chart with some of the values negative. The code above includes a 'trick' I use to make it possible to have a different number of sets of bars on each side, namely adding empty columns to the matrix, hopefully someone will find that useful - so sorry the working example is not as minimal as it could have been!

Setting the x axis labels using laxlab and raxlab creates a continuous axis:
pyramid.plot(cbind(runif(7,0,1),
                   rep(0,7),
                   rep(0,7)),
             cbind(rep(0,7),
                   runif(7,0,1),
                   runif(7,0,1)),
             top.labels=NULL,
             gap=0,
             labels=rep("",7),
             xlim=c(1.5,1.5),
             laxlab = seq(from = 0, to = 1.5, by = 0.5),
             raxlab = seq(from = 0, to = 1.5, by = 0.5))

R: how to make multiple plots from one CSV, grouping by a column

I'd like to put multiple plots onto a single visual output in R, based on data that I have in a CSV that looks something like this:
user,size,time
fred,123,0.915022
fred,321,0.938769
fred,1285,1.185608
wilma,5146,2.196687
fred,7506,1.181990
barney,5146,1.860287
wilma,1172,1.158015
barney,5146,1.219313
wilma,13185,1.455904
wilma,8754,1.381372
wilma,878,1.216908
barney,2974,1.223852
I can read this just fine, using, e.g.:
data = read.csv('data.csv')
For the moment, a fairly simple plot is fine, so I'm just trying plot() without much to it (setting type='o' to get lines and points), and from solving a past problem I know that I can do, e.g., the following to get data for just fred:
plot(data$time[which(data$user == 'fred')], data$size[which(data$user == 'fred')], type='o')
What I'd like, though, is to have the data for each user all showing up on one set of axes, with color coding (and a legend to match users to colors) to identify different user data.
And if another user shows up, I'd like another line to show up, with another color (perhaps recycling if I have too many users at once).
However, just this doesn't do it:
plot(data$size, data$time, type='o',col=c("red", "blue", "green"))
Because it doesn't seem to group by the user.
And just this:
plot(data, type='o')
gives me an error:
Error in plot.default(...) :
formal argument "type" matched by multiple actual arguments
This:
plot(data)
does do something, but not what I want.
I've poked around, but I'm new enough to R that I'm not quite sure how best to search for this, nor where to look for examples that would hit a use-case like this.
I even got somewhat closer with this:
plot(data$size[which(data$user == 'wilma')], data$time[which(data$user == 'wilma')], type='o', col=c('red'))
lines(data$size[which(data$user == 'fred')], data$time[which(data$user == 'fred')], type='o', col=c('green'))
lines(data$size[which(data$user == 'barney')], data$time[which(data$user == 'barney')], type='o', col=c('blue'))
This gives me a plot (which I'd post inline, but as a new user, I'm not allowed to yet):
not-quite-right plot
which is kind of close to what I want, except that it:
doesn't have a legend
has ugly axis labels, instead of just time and size
is scaled to the first plot, and thus is missing data from some of the others
isn't sorted by x-axis, which I could do externally, though I'm guessing I could do it fairly easily in R.
So, the question, ultimately, is this:
What's an easy way to plot data like this which:
has multiple lines based on the labels in the first column of the CSV
uses the same set of axes for the data in columns 2 and 3, regardless of the label
has a legend and color-coding for which label is being used for a particular line (or set of points)
will adapt to adding new labels to the data file, hopefully without change to the R code.
Thanks in advance for any help or pointers on this.
P.S. I looked around for similar questions, and found one that's sort of close, but it's not quite the same, and I failed to figure out how to adapt it to what I'm trying to do.

Good question. This is doable in base plot, but it's even easier and more intuitive using ggplot2. Below is an example of how to do this with random data in ggplot2.
First download and install the package
install.packages("ggplot2",repos='http://cran.us.r-project.org')
require(ggplot2)
Next generate the data
a <- c(rep('a',3),rep('b',3),rep('c',3))
b <- rnorm(9,50,30)
c <- rep(seq(1,3),3)
dat <- data.frame(a,b,c)
Finally, make the plot
ggplot(data=dat, aes(x=c, y=b , group=a, colour=a)) + geom_line() + geom_point()
Basically, you are telling ggplot that your x axis corresponds to the c column (dat$c), your y axis corresponds to the b column (dat$b), and that it should group (draw separate lines) by the a column (dat$a). colour specifies that you want to colour by the a column as well.
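For comparison with the pandas thread above, the same grouping can be sketched in Python on the CSV from the question. This is an illustration under the assumption that pandas and matplotlib are available; series_by_user and plot_users are hypothetical names, not part of the answer.

```python
import io

import pandas as pd

# The CSV from the question, inlined so the sketch is self-contained.
csv_text = """user,size,time
fred,123,0.915022
fred,321,0.938769
fred,1285,1.185608
wilma,5146,2.196687
fred,7506,1.181990
barney,5146,1.860287
wilma,1172,1.158015
barney,5146,1.219313
wilma,13185,1.455904
wilma,8754,1.381372
wilma,878,1.216908
barney,2974,1.223852
"""
data = pd.read_csv(io.StringIO(csv_text))

# One frame per user, sorted along the x-axis; a new user in the file
# simply becomes a new key here, with no change to the code.
series_by_user = {
    user: group.sort_values('size')
    for user, group in data.groupby('user')
}


def plot_users():
    """One coloured line per user on shared axes; requires matplotlib."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for user, group in series_by_user.items():
        ax.plot(group['size'], group['time'], marker='o', label=user)
    ax.set_xlabel('size')
    ax.set_ylabel('time')
    ax.legend()
    return ax
```

Because every line is drawn on the same axes, matplotlib rescales to cover all users, and the legend is built from the group labels.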

ggplot2 lines from point to point on xy are sorted

I'm trying to look at the path of an MCMC trace and the following plain plot() shows the sort of thing I am after, however, when I try the same in ggplot2, it unhelpfully sorts the x-axis values - which I might like in some circumstances, but not now.
set.seed(123)
t1 <- data.frame(x=rnorm(20), y=rnorm(20))
plot(t1$x, t1$y, type='b')
qplot(t1$x, t1$y, geom=c('point','line'))
How do I get something like in the plot() in ggplot2?

Use path instead of line. line connects points from the smallest to the largest x value, whereas path connects them in the order in which they appear in the data frame.
qplot(t1$x, t1$y, geom=c('point','path'))
