Cannot convert Array{Any,2} to series data for plotting - julia

I am learning Julia from a Coursera course.
using DelimitedFiles
EVDdata = DelimitedFiles.readdlm("wikipediaEVDdatesconverted.csv", ',')
# extract the data
epidays = EVDdata[:,1]
EVDcasesbycountry = EVDdata[:, [4, 6, 8]]
# load Plots and plot them
using Plots
gr()
plot(epidays, EVDcasesbycountry)
I am getting the error message: Cannot convert Array{Any,2} to series data for plotting
In the course, however, the lecturer plots the data successfully. Where am I going wrong?
I searched for the error and ended up reading about parsing strings into integers, since the data set may contain string values.
Or am I missing something else?

I found this to be working for me:
# extract the data
epidays = Array{Integer}(EVDdata[:,1])
EVDcasesbycountry = Array{Integer}(EVDdata[:, [4, 6, 8]])
# load Plots and plot them
using Plots
gr()
plot(epidays, EVDcasesbycountry)

It's a bit hard to tell what's going on in Coursera, as it's not clear what versions of Plots and DataFrames the video is using.
The error you're seeing, however, is telling you that a 2-dimensional Array with element type Any (i.e. an untyped matrix) can't be converted to series data for plotting. One way around this is to call plot with two vectors, one for x and one for y values:
plot(epidays, EVDdata[:, 4])
You can plot multiple columns in a loop:
p = plot()
for c in eachcol(EVDdata[:, [4, 6, 8]])
    plot!(p, epidays, c)
end
display(p)
There is also StatsPlots.jl, which extends the standard Plots.jl package with frequently needed "data science-y" plotting functions. In this case you could use the @df macro for plotting DataFrames; just quoting one of the examples in the Readme:
using StatsPlots
using DataFrames, IndexedTables
df = DataFrame(a = 1:10, b = 10 .* rand(10), c = 10 .* rand(10))
@df df plot(:a, [:b :c], colour = [:red :blue])
Finally, there are some more grammar-of-graphics inspired plotting packages in Julia which are focused on plotting DataFrames, e.g. the pure-Julia Gadfly.jl, or the VegaLite wrapper VegaLite.jl

You can also try this
using StatsPlots
gr()
using DataFrames, IndexedTables
df = DataFrame(EVDdata)
@df df plot(:x1, [:x4 :x6 :x8], marker = ([:octagon :star7 :square], 9), title = "EVD in West Africa, epidemic segregated by country", xlabel = "Days since 22 March 2014", ylabel = "Number of cases to date", line = (:scatter), colour = [:red :blue :black])

On the other hand, this tutorial does (apparently) the same thing as the coursera plot and it works.
https://docs.juliaplots.org/latest/tutorial/#Basic-Plotting:-Line-Plots
x = 1:10; y = rand(10, 2) # 2 columns means two lines
plot(x, y)
And I haven't figured out why either...
Update: The staff answer is that maybe "Julia no longer supports plot 'Array{Any,2}'" and that a simple workaround is to convert the EVDcasesbycountry data to Int like this:
epidays = EVDdata[:,1]
EVDcasesbycountry = convert.(Int, EVDdata[:, [4, 6, 8]])
It worked for me and is kind of consistent with my first answer, because when I checked the types of x and y in the tutorial they weren't Any, unlike the data in epidays and EVDcasesbycountry.
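As a minimal sketch of what the conversion changes, the snippet below builds a small Matrix{Any} by hand and compares the element type before and after convert. The values are made up and only stand in for a few rows of EVDdata; no CSV file or plotting package is involved.

```julia
# Hypothetical stand-in for a few rows of EVDdata (made-up values).
# readdlm returns an Any matrix when a file mixes text and numbers.
raw = Any[1 "a" 3 10; 2 "b" 4 20; 3 "c" 5 30]

# Selecting numeric columns does not narrow the element type.
cols = raw[:, [3, 4]]
println(eltype(cols))        # element type is still Any

# Broadcasting convert over the matrix yields a concretely typed matrix.
converted = convert.(Int, cols)
println(eltype(converted))   # element type is now Int
```

The numbers are unchanged; only the container's element type differs, which appears to be what the plotting call cares about.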

https://docs.juliaplots.org/latest/generated/gr/
This contains some nice examples.
Coming to the problem: you can pass a vector of vectors instead of the matrix for plotting.
using Plots
gr()
y = Vector[EVDdata[:,4], EVDdata[:,6], EVDdata[:,8]]
plot(
    epidays, y,
    color = [:black :orange :red],
    line = (:scatter),
    marker = ([:hex :d :star4], 5)
)

Related

Plot multiple columns saved in data frame with no x

My problem is multifaceted.
I would like to plot multiple columns saved in a data frame. Those columns do not have an x variable but would essentially be 1 to 101 consistent for all. I have seen that I can transfer them into long format but most ggplot options require an X. I tried zoo which does what I want it to, but the x-label is all jumbled and I am not aware of how to fix it. (Example of data below, and plot)
df <- zoo(HIP_131_Y0_LC_walk1[1:9])
plot(df)
I have multiple data frames saved in a list so ultimately would like to run a function and apply to all. The zoo function solves step one but I am not able to apply to all the data frames in the list.
graph<-lapply(myfiles,function(x) zoo(x) )
print(graph)
Ideally I would like to also mark minimum and maximum, which I am aware can be done with ggplot but not zoo.
Thank you so much for your help in advance
Assuming that the problem is overlapped panel names there are numerous solutions to this:
abbreviate the names using abbreviate. We show this for plot.zoo and autoplot.zoo .
put the panel name in the upper left. We show this for plot.zoo using a custom panel.
Use a header on each panel. We show this using xyplot.zoo and using ggplot.
The examples below use the test input in the Note at the end. (Next time please provide a complete example including all input in reproducible form.)
The first two examples below abbreviate the panel names using plot.zoo and autoplot.zoo (which uses ggplot2). The third example uses xyplot.zoo (which uses lattice). This automatically uses headers and is probably the easiest solution.
library(zoo)
plot(z, ylab = abbreviate(names(z), 8))
library(ggplot2)
zz <- setNames(z, abbreviate(names(z), 8))
autoplot(zz)
library(lattice)
xyplot(z)
This fourth example puts the panel names in the upper left of the panel themselves using plot.zoo with a custom panel.
pnl <- function(x, y, ..., pf = parent.frame()) {
  legend("topleft", names(z)[pf$panel.number], bty = "n", inset = -0.1)
  lines(x, y)
}
plot(z, panel = pnl, ylab = "")
We can also get headers with autoplot.zoo similar to in lattice above.
library(ggplot2)
autoplot(z, facets = ~ Series, col = I("black")) +
theme(legend.position = "none")
List
If you have a list of vectors L (see Note at end for a reproducible example of such a list) then this will produce a zoo object:
do.call("merge", lapply(L, zoo))
Note
Test input used above.
library(zoo)
set.seed(123)
nms <- paste0(head(state.name, 9), "XYZ") # long names
m <- matrix(rnorm(101*9), 101, dimnames = list(NULL, nms))
z <- zoo(m)
L <- split(m, col(m)) # test list using m in Note

Using multiple datasets for one graph

I have 2 csv data files. Each file has a "date_time" column and a "temp_c" column. I want to make the x-axis have the "date_time" from both files and then use 2 y-axes to display each "temp_c" with separate lines. I would like to use plot instead of ggplot2 if possible. I haven't been able to find any code help that works with my data and I'm not sure where to really begin. I know how to do 2 separate plots for these 2 datasets, just not combine them into one graph.
plot(grewl$temp_c ~ grewl$date_time)
and
plot(kbll$temp_c ~ kbll$date_time)
work separately but not together.
As others indicated, it is easy to add new data to a graph using points() or lines(). One thing to be careful about is how you format the axes as they will not be automatically adjusted to fit any new data you input using points() and the like.
I've included a small example below that you can copy, paste, run, and examine. Pay attention to why the first plot fails to produce what you want (axes are bad). Also note how I set this example up generally - by making fake data that showcase the same "problem" you are having. Doing this is often a better strategy than simply pasting in your data since it forces you to think about the core component of the problem you are facing.
#for same result each time
set.seed(1234)
#make data
set1<-data.frame("date1" = seq(1,10),
"temp1" = rnorm(10))
set2<-data.frame("date2" = seq(8,17),
"temp2" = rnorm(10, 1, 1))
#first attempt fails
#plot one
plot(set1$date1, set1$temp1, type = "b")
#add points - oops only three showed up bc the axes are all wrong
lines(set2$date2, set2$temp2, type = "b")
#second attempt
#adjust axes to fit everything (set to min and max of either dataset)
plot(set1$date1, set1$temp1,
xlim = c(min(set1$date1,set2$date2),max(set1$date1,set2$date2)),
ylim = c(min(set1$temp1,set2$temp2),max(set1$temp1,set2$temp2)),
type = "b")
#now add the other points
lines(set2$date2, set2$temp2, type = "b")
# we can even add regression lines
abline(reg = lm(set1$temp1 ~ set1$date1))
abline(reg = lm(set2$temp2 ~ set2$date2))

Multiple histograms in Julia using Plots.jl

I am working with a large number of observations and to really get to know it I want to do histograms using Plots.jl
My question is how I can do multiple histograms in one plot as this would be really handy. I have tried multiple things already, but I am a bit confused with the different plotting sources in julia (plots.jl, pyplot, gadfly,...).
I don't know if it would help for me to post some of my code, as this is a more general question. But I am happy to post it, if needed.
There is an example that does just this:
using Plots
pyplot()
n = 100
x1, x2 = rand(n), 3rand(n)
# see issue #186... this is the standard histogram call
# our goal is to use the same edges for both series
histogram(Any[x1, x2], line=(3,0.2,:green), fillcolor=[:red :black], fillalpha=0.2)
I looked for "histograms" in the Plots.jl repo, found this related issue and followed the links to the example.
With Plots, there are two possibilities to show multiple series in one plot:
First, you can use a matrix, where each column constitutes a separate series:
a, b, c = randn(100), randn(100), randn(100)
histogram([a b c])
Here, hcat is used to concatenate the vectors (note the spaces instead of commas).
This is equivalent to
histogram(randn(100,3))
You can apply options to the individual series using a row matrix:
histogram([a b c], label = ["a" "b" "c"])
(Again, note the spaces instead of commas)
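To make the spaces-versus-commas distinction concrete, here is a small sketch that uses only Base Julia (no plotting) and compares the shapes of the three literals. The names m, v and labels are only for illustration.

```julia
a, b, c = randn(100), randn(100), randn(100)

m = [a b c]               # spaces: hcat, a 100×3 matrix (one column per series)
v = [a, b, c]             # commas: a 3-element vector whose elements are vectors
labels = ["a" "b" "c"]    # spaces: a 1×3 row matrix (one entry per series)

println(size(m))
println(size(v))
println(size(labels))
```

The row-matrix shape of labels is what lets each entry line up with one column of m.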
Second, you can use plot! and its variants to update a previous plot:
histogram(a) # creates a new plot
histogram!(b) # updates the previous plot
histogram!(c) # updates the previous plot
Alternatively, you can specify which plot to update:
p = histogram(a) # creates a new plot p
histogram(b) # creates an independent new plot
histogram!(p, c) # updates plot p
This is useful if you have several subplots.
Edit:
Following Felipe Lema's links, you can implement a recipe for histograms that share the edges:
using StatsBase
using PlotRecipes
function calcbins(a, bins::Integer)
    lo, hi = extrema(a)
    StatsBase.histrange(lo, hi, bins) # nice edges
end
calcbins(a, bins::AbstractVector) = bins
@userplot GroupHist
@recipe function f(h::GroupHist; bins = 30)
    args = h.args
    length(args) == 1 || error("GroupHist should be given one argument")
    bins = calcbins(args[1], bins)
    seriestype := :bar
    bins, mapslices(col -> fit(Histogram, col, bins).weights, args[1], 1)
end
grouphist(randn(100,3))
Edit 2:
Because it is faster, I changed the recipe to use StatsBase.fit for creating the histogram.
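The idea behind sharing edges can also be sketched without any packages. The following is not the recipe above and does not use StatsBase: shared_counts is a hypothetical helper that uses plain equal-width edges from range (rather than the "nice" edges histrange picks) and counts each column against the same edges.

```julia
# Hypothetical helper: per-column counts over one shared set of edges.
# Assumes the data are not all identical (so lo < hi).
function shared_counts(data::AbstractMatrix, nbins::Integer)
    lo, hi = extrema(data)                       # range over ALL columns
    edges = range(lo, hi; length = nbins + 1)    # equal-width shared edges
    counts = zeros(Int, nbins, size(data, 2))
    for j in axes(data, 2), x in view(data, :, j)
        # index of the bin containing x; the maximum goes into the last bin
        i = clamp(searchsortedlast(edges, x), 1, nbins)
        counts[i, j] += 1
    end
    return edges, counts
end

data = randn(100, 3)
edges, counts = shared_counts(data, 10)
println(sum(counts, dims = 1))   # each column's counts add up to 100
```

Because every column is binned against the same edges, the resulting count columns are directly comparable bar by bar.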

How to overlay multiple TA in new plot using quantmod?

We can plot a candlestick chart using the chartSeries function, e.g. chartSeries(Cl(PSEC)). I have created some custom values (I1, I2 and I3) which I want to plot together (overlaid) outside the candlestick pattern. I have used addTA() for this purpose:
chartSeries(Cl(PSEC), TA="addTA(I1,col=2);addTA(I2,col=3);addTA(I3,col=4)")
The problem is that it plots four plots for Cl(PSEC),I1,I2 and I3 separately instead of two plots which I want Cl(PSEC) and (I1,I2,I3)
EDITED
For clarity I am giving a sample code with I1, I2 and I3 variable created for this purpose
library(quantmod)
PSEC=getSymbols("PSEC",auto.assign=F)
price=Cl(PSEC)
I1=SMA(price,3)
I2=SMA(price,10)
I3=SMA(price,15)
chartSeries(price, TA="addTA(I1,col=2);addTA(I2,col=3);addTA(I3,col=4)")
Here is an option which preserves largely your original code.
You can obtain the desired result using the option on=2 for each TA after the first:
library(quantmod)
getSymbols("PSEC")
price <- Cl(PSEC)
I1 <- SMA(price,3)
I2 <- SMA(price,10)
I3 <- SMA(price,15)
chartSeries(price, TA=list("addTA(I1, col=2)", "addTA(I2, col=4, on=2)",
"addTA(I3, col=5, on=2)"), subset = "last 6 months")
If you want to overlay the price and the SMAs in one chart, you can use the option on=1 for each TA.
Thanks to @hvollmeier who made me realize with his answer that I had misunderstood your question in the previous version of my answer.
PS: Note that several options are described in ?addSMA(), including with.col which can be used to select a specific column of the time series (Cl is the default column).
If I understand you correctly, you want the 3 SMAs in a SUBPLOT and NOT in your main chart window. You can do the following using newTA.
Using your data:
PSEC=getSymbols("PSEC",auto.assign=F)
price=Cl(PSEC)
Now plotting a 10,30,50 day SMA in a window below the main window:
chartSeries(price['2016'])
newSMA <- newTA(SMA, Cl, on=NA)
newSMA(10)
newSMA(30,on=2)
newSMA(50,on=2)
The key is the argument on. Use on = NA in defining your new TA function, because the default value for on is 1, which is the main window. on = NA plots in a new window. Then plot the remaining SMAs to the same window as the first SMA. Style the colours etc. to your liking :-).
You may want to consider solving this task using plotting with the newer quantmod charts in the quantmod package (chart_Series as opposed to chartSeries).
Pros:
-The plots look cleaner and better (?)
-have more flexibility via editing the pars and themes options to chart_Series (see other examples here on SO for the basics of things you can do with pars and themes)
Cons:
-Not well documented.
PSEC=getSymbols("PSEC",auto.assign=F)
price=Cl(PSEC)
chart_Series(price, subset = '2016')
add_TA(SMA(price, 10))
add_TA(SMA(price, 30), on = 2, col = "green")
add_TA(SMA(price, 50), on = 2, col = "red")
# Make plot all at once (this approach is useful in shiny applications):
print(chart_Series(price, subset = '2016', TA = 'add_TA(SMA(price, 10), yaxis = list(0, 10));
add_TA(SMA(price, 30), on = 2, col = "purple"); add_TA(SMA(price, 50), on = 2, col = "red")'))

Graphing a polynomial output of poly.calc

I apologize first for bringing what I imagine to be a ridiculously simple problem here, but I have been unable to glean from the help file for package 'polynom' how to solve this problem. For one out of several years, I have two vectors of x (d for day of year) and y (e for an index of egg production) data:
d=c(169,176,183,190,197,204,211,218,225,232,239,246)
e=c(0,0,0.006839425,0.027323127,0.024666883,0.005603878,0.016599262,0.002810977,0.005603878,0,0.002810977,0.002810977)
I want to, for each year, use the poly.calc function to create a polynomial function that I can use to interpolate the timing of maximum egg production. I want then to superimpose the function on a plot of the data. To begin, I have no problem with the poly.calc function:
egg1996<-poly.calc(d,e)
egg1996
3216904000 - 173356400*x + 4239900*x^2 - 62124.17*x^3 + 605.9178*x^4 - 4.13053*x^5 +
0.02008226*x^6 - 6.963636e-05*x^7 + 1.687736e-07*x^8
I can then simply
plot(d,e)
But when I try to use the lines function to superimpose the function on the plot, I get confused. The help file states that the output of poly.calc is an object of class polynomial, and so I assume that "egg1996" will be the "x" in:
lines(x, len = 100, xlim = NULL, ylim = NULL, ...)
But I cannot seem to, based on the example listed:
lines (poly.calc( 2:4), lty = 2)
Or based on the arguments:
x an object of class "polynomial".
len size of vector at which evaluations are to be made.
xlim, ylim the range of x and y values with sensible defaults
Come up with a command that successfully graphs the polynomial "egg1996" onto the raw data.
I understand that this question is beneath you folks, but I would be very grateful for a little help. Many thanks.
I don't work with the polynom package, but the resultant data set is on a completely different scale (both X & Y axes) than the first plot() call. If you don't mind having it in two separate panels, this provides both plots for comparison:
library(polynom)
d <- c(169,176,183,190,197,204,211,218,225,232,239,246)
e <- c(0,0,0.006839425,0.027323127,0.024666883,0.005603878,
0.016599262,0.002810977,0.005603878,0,0.002810977,0.002810977)
egg1996 <- poly.calc(d,e)
par(mfrow=c(1,2))
plot(d, e)
plot(egg1996)
