I'm very much a novice at this, so this might be a dumb question.
I have CSV data with regular x and y values. The x values, however, do not increase at a constant rate. The graph that Plotly made for me spaced the x values according to the data rather than at regular intervals. The x values are dates, so this leads to misinterpretation of the graph. Is there a way to have the dates increase at regular intervals in the graph?
Here's what the graph looks like (a snippet):
[Figure: beginning and end of the graph]
You would need to convert your x-values to datetime objects first. Plotly will then recognize the x-values as date values and plot them accordingly.
from datetime import datetime
import plotly
plotly.offline.init_notebook_mode()
x = ['1961/04/12',
'1961/04/13',
'1961/05/04',
'1961/06/06',
'1961/07/20',
'1961/07/22',
'1961/08/05',
'1961/08/07',
'1962/02/19']
y = [1, 0, 0, 0, 0, 1, 2, 1, 0]
# convert your x-values to datetime objects
d = []
for t in x:
    # split 'YYYY/MM/DD' into integer year, month, day
    parts = [int(p) for p in t.split('/')]
    d.append(datetime(*parts))
data = [plotly.graph_objs.Scatter(x=d, y=y, line=dict(shape='hv'))]
fig = plotly.graph_objs.Figure(data=data)
plotly.offline.iplot(fig)
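As a side note, the conversion step can also be sketched with the standard library's datetime.strptime, which parses the 'YYYY/MM/DD' strings directly. The snippet below uses only the first three dates from the example and prints the gaps in days between them, to show the uneven spacing that a date axis will respect. The variable name gaps is just for illustration.

```python
from datetime import datetime

# First three dates from the example above
x = ['1961/04/12', '1961/04/13', '1961/05/04']

# Parse each string into a datetime object
d = [datetime.strptime(t, '%Y/%m/%d') for t in x]

# Days between consecutive observations: the spacing is not constant
gaps = [(b - a).days for a, b in zip(d, d[1:])]
print(gaps)
```

Passing d as the x argument works the same way as in the loop-based version.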
I'm trying to plot several data series onto the same plot in R, but even with the showZeroValues=TRUE argument in dyLegend(), the legend stops showing values on mouseover when at least one of the series has a y=0 at the current x. I am not sure what I am doing wrong.
Below is a simplified example:
library(dygraphs)
library(xts)
x=data.frame(a=c(1, 2, 3, 1, 0, 0, 2), b=c(2, 3, 1, 0, 1, 4, 5))
x$Date=seq(as.Date("2017-06-01"), (as.Date("2017-06-01")+dim(x)[1]-1), by="days")
d=xts(x, order.by=x$Date)[,1:2]
dygraph(d) %>%
dyOptions(drawGrid=FALSE, fillGraph=TRUE) %>%
dyLegend(labelsSeparateLines=TRUE, showZeroValues=TRUE)
On my computer the dynamic legend skips all x values at which one of the two series has y=0: with the cursor close to the zeros, the legend stays stuck on the value at the right end of the graph (example).
I had the same issue and found out that it was caused by the xts object containing character strings. The original data frame had a Date column, which I used to create the xts object, but I did not subset the numerical data. This resulted in the xts object being created but with character values (see issue here). Surprisingly enough, the resulting plots were not much impacted, and the output was correct, which made troubleshooting less straightforward.
In your example, the following should solve the issue:
x=data.frame(a=c(1, 2, 3, 1, 0, 0, 2), b=c(2, 3, 1, 0, 1, 4, 5))
x$Date=seq(as.Date("2017-06-01"), (as.Date("2017-06-01")+dim(x)[1]-1), by="days")
d=xts(x[, 1:2], order.by=x$Date) # This is the only change in your code
dygraph(d) %>%
dyOptions(drawGrid=FALSE, fillGraph=TRUE) %>%
dyLegend(labelsSeparateLines=TRUE, showZeroValues=TRUE)
I have a vector called data with a length of approximately 444000, and almost all of the numeric values are between 1 and 100. I want to draw the histogram and draw the appropriate density on it. However, when I draw the histogram I get this:
hist(data,freq=FALSE)
What can I do to actually see a more detailed histogram? I tried the breaks argument and it helped, but it's still really hard to see the histogram because it's so small. For example, I used breaks = 2000 and got this:
Is there something that I can do? Thanks!
Since you don't show data, I'll generate some random data:
d <- c(rexp(1e4, 100), runif(100, max=5e4))
hist(d)
When dealing with outliers like this, you can display the histogram of the logs, but that may be difficult to interpret.
If you are okay with showing a subset of the data, then you can filter the outliers out either dynamically (perhaps using quantile) or manually. The important thing when showing this visualization in your analysis is that if you must remove data for the plot, then be up-front about the removal. (This is terse ... it would also be informative to include the range and/or other properties of the omitted data, but that's subjective and will differ based on the actual data.)
quantile(d, seq(0, 1, len=11))
d2 <- d[ d < quantile(d, 0.90) ]
hist(d2)
txt <- sprintf("(%d points shown, %d excluded)", length(d2), length(d) - length(d2))
mtext(txt, side = 1, line = 3, adj = 1)
d3 <- d[ d < 10 ]
hist(d3)
txt <- sprintf("(%d points shown, %d excluded)", length(d3), length(d) - length(d3))
mtext(txt, side = 1, line = 3, adj = 1)
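For readers who want the same filter-and-report idea outside R, here is a minimal Python sketch using the standard library. It assumes Python 3.8+ for statistics.quantiles, whose 'inclusive' method interpolates the same way as R's default quantile type. The helper split_at_quantile and the toy data are hypothetical, not part of the answer above.

```python
import statistics

def split_at_quantile(values, n=10):
    """Keep values strictly below the last of the n-quantile cut points.

    With n=10 the last cut point is the 90th percentile.
    """
    cutoff = statistics.quantiles(values, n=n, method='inclusive')[-1]
    shown = [v for v in values if v < cutoff]
    return cutoff, shown

# Toy data standing in for the real vector
values = list(range(1, 101))
cutoff, shown = split_at_quantile(values)

# Caption that states how much data was left out of the plot
caption = "(%d points shown, %d excluded)" % (len(shown), len(values) - len(shown))
print(caption)
```

The caption string plays the role of the mtext annotation: whatever plotting tool is used, the count of excluded points travels with the figure.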
I have a dummy variable (call it "drink") and a corresponding age variable that represents a precise age estimate (to several decimal places) for each person in a dataset. I want to first "bin" the age variable, compute the mean of the "drink" dummy within each bin, and then graph the result. My code to do so looks like this:
df$bins <- cut(df$age, seq(from = 17, to = 31, by = .2), include.lowest = TRUE)
df.plot <- ddply(df, .(bins), summarise, avg.drink = mean(drinks_alcohol))
qplot(bins, avg.drink, data = df.plot)
This works well enough, but the x-axis in the graph is unreadable because it has one label per bin. Is there a way to modify the x-axis to show, for example, ages 19-23 only, with the "ticks" still aligning with the correct bins? For example, in my current code there is a bin for (19, 19.2] and another bin for (20, 20.2]. I would want only the bins that start at whole numbers to be identified on the x-axis, labelled with the first number (19, 20) and not the second (19.2, 20.2).
Is there any straightforward way to do this?
The most direct way to specify axis labels is with the appropriate scale function... in the case of factors on the x axis, scale_x_discrete. It will use whatever labels you give it with the labels argument, or you can give it a function that formats things as you like.
To "manually" specify the labels, you just need to create a vector of appropriate length. In this case, if your factor levels are intervals beginning at seq(17, 31.8, by = 0.2) and you want to label only the bins that begin at integers, then your labels vector will be
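bin_starts = seq(17, 31.8, by = 0.2)
bin_labels = ifelse(abs(bin_starts - round(bin_starts)) < 0.0001, as.character(bin_starts), "")
(I compare against a small tolerance rather than testing for exact equality in case of precision problems, though it shouldn't be a problem in this particular case. Measuring the distance to the nearest integer also catches values that land just below a whole number.)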
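To illustrate the precision concern in a language-neutral way, here is a small Python sketch (not part of the original answer; is_whole is a hypothetical helper). It measures the distance to the nearest integer from either side, because a one-sided difference from the truncated value would miss a number such as 18.9999999999, whose truncated value is 18.

```python
def is_whole(x, tol=1e-4):
    """True when x is within tol of the nearest integer, from either side."""
    return abs(x - round(x)) < tol

# Bin starts 17.0, 17.2, ..., 31.8 (75 values), built by multiplication
bin_starts = [17 + 0.2 * i for i in range(75)]

# Label only the bins that start at a whole number; blank out the rest
labels = [str(int(round(b))) if is_whole(b) else "" for b in bin_starts]

print([label for label in labels if label])
```

The labels list has the same length as bin_starts, which is what a discrete axis expects: one label per level, with empty strings where no text should appear.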
A more robust solution would be to label the factor levels with the number at the start of the interval from the beginning. cut also has a labels argument.
my_breaks = seq(17, 32, by = 0.2)
df$bins <- cut(df$age, breaks = my_breaks, labels = head(my_breaks, -1),
include.lowest = TRUE)
You could then fairly easily write a formatter (following templates from the scales package) to print only the ones you want:
int_only = function(x) {
    # test if we can coerce to numeric, if not do nothing
    if (any(is.na(as.numeric(x)))) return(x)
    # otherwise convert to numeric and return integers and blanks as labels
    x = as.numeric(x)
    # compare against the nearest integer so values just below a whole number still match
    return(ifelse(abs(x - round(x)) < 1e-10, as.character(x), ""))
}
Then, using the nicely formatted data created above, you should be able to pass int_only as a formatter function to labels to get the labels you want. (Note: untested! necessary tweaks left as an exercise for the reader, though I'll gladly accept edits :) )
Please help: I want to shade a time-series figure drawn with R's plot for all values where an indicator variable z == 1.
Here follows a code which generates a similar scenario that I am looking at:
x <-runif(100, 5.0, 7.5)
y <-runif(100, 1, 10)
z = as.numeric(y >= 5)
date = seq(as.Date("1910/1/1"), as.Date("2009/1/1"), "years")
data = data.frame(cbind(x,y,z))
color <- rgb(190, 190, 190, alpha=80, maxColorValue=255)
plot(date,x, type='l')
rect(xleft=date[10], xright=date[40], ybottom=5, ytop=7.5, col = color,density=100)
From the code, I can only specify dates one by one. But suppose I want to shade all the areas where z==1? I.e. all the dates where z == 1. Any ideas how this could be done?
Many thanks, Nic
Just feed an entire vector of dates into the xleft and xright parameters, as indexed by z==1. Don't do line shading, it will run a long time, just change the color to grey. Afterwards, plot the time series again over the rectangles:
plot(date,x, type='l')
rect(xleft=date[z==1]-180,xright=date[z==1]+180,
ybottom=5, ytop=7.5, col="grey",border=NA)
lines(date,x)
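If many consecutive dates have z == 1, one alternative (an assumption on my part, not something the answer above does) is to merge them into contiguous runs first, so that each run becomes a single rectangle instead of one rectangle per date. A Python sketch of the run-finding step, with a hypothetical helper runs_of_ones:

```python
from itertools import groupby

def runs_of_ones(z):
    """Return inclusive (start, end) index pairs for each run of 1s in z."""
    runs = []
    index = 0
    for value, group in groupby(z):
        length = len(list(group))
        if value == 1:
            runs.append((index, index + length - 1))
        index += length
    return runs

# Toy indicator vector; in the question z marks the dates to shade
z = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]
print(runs_of_ones(z))
```

Each (start, end) pair would then index into the date vector to give the left and right edges of one shaded block.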
I have a chart of financial activity and a couple running sums. Things are getting a little busy and I'm having trouble distinguishing fiscal (ends June 30th) vs calendar year. Is there a way to set the background to different colors based on date?
In other words, could I set the background to light green where 2009-06-30 < date < 2010-07-01?
Combine a piece of both suggestions, by @G-Grothendieck and @vincent: use rect together with the zoo package. zoo is excellent for visualizing time series.
library(zoo)
#random data combined with time series that starts in 2009-01
v <- zooreg(rnorm(37), start = as.yearmon("2009-1"), freq=12)
plot(v, type = "n",xlab="",xaxt="n")
#this will catch some min and max values for y-axis points in rect
u <- par("usr")
#plot green rect - notice that x-coordinates are defined by date points
rect(as.yearmon("2009-6-30"), u[3], as.yearmon("2010-7-1"), u[4],
border = 0, col = "lightgreen")
lines(v)
axis(1, floor(time(v)))
#customized x-axis labels based on dates values
axis(1,at=c(2009.4, 2010.5),padj=-2,lty=0,labels=c("start","end"),cex.axis=0.8)
Check out xblocks.zoo in the zoo package. e.g., example(xblocks.zoo)
You can plot grey rectangles, with rect, before plotting the curves.
You will also need the dimensions of the plotting area: they are in par("usr").
library(quantmod)
library(lubridate)  # assumption: year() used below is taken from lubridate
getSymbols("A")
plot( index(A), coredata(Ad(A)), type="n" )
# This example uses calendar years: adapt as needed
dates <- c(
ISOdate( year(min(index(A))), 1, 1 ),
ISOdate( year(max(index(A))) + 1, 1, 1 )
)
dates <- as.Date(dates)
dates <- seq.Date(dates[1], dates[2], by="2 year")
rect(
dates,
par("usr")[3],
as.Date( ISOdate( year(dates) + 1, 1, 1 ) ),
par("usr")[4],
col="grey",
border=NA
)
lines(index(A), coredata(Ad(A)), lwd=3)