ggplot geom_line to date axis not working - r

I have several data sets similar to https://www.dropbox.com/s/j9ihawgfqwxmkgc/pred.csv?dl=0
Loading them from CSV and then plotting them with base graphics works fine:
predictions$date <- as.Date(predictions$date)
plot(predictions$date, predictions$pct50)
But I want to use ggplot to draw the predicted points on top of the original points so I can compare them. The base plot is:
p = ggplot(theRealPastDataValues,aes(x=date,y=cumsum(amount)))+geom_line()
This command
p + geom_line(predictions, aes(x=as.numeric(date), y=pct50))
generates the following error:
ggplot2 doesn't know how to deal with data of class uneval
But since the first call, plot(predictions$date, predictions$pct50), works with the same data, I do not understand what is wrong.
Edit
dput(predictions[1:10, c("date", "pct50")])
structure(list(date = c("2009-07-01", "2009-07-02", "2009-07-03",
"2009-07-04", "2009-07-05", "2009-07-06", "2009-07-07", "2009-07-08",
"2009-07-09", "2009-07-10"), pct50 = c(4276, 4076, 4699.93, 4699.93,
4699.93, 4699.93, 4664.76, 4627.37, 4627.37, 4627.37)), .Names = c("date",
"pct50"), row.names = c(NA, 10L), class = "data.frame")
Edit 2
I changed the call to
p + geom_line(data = predictions, aes(x=as.numeric(date), y=pct50))
and the error changed to:
Invalid input: date_trans works with objects of class Date only
In addition: Warning message:
In eval(expr, envir, enclos) : NAs created
So I think the pointer to How to deal with "data of class uneval" error from ggplot2? (see comments) was a good idea, but the plot still does not work.

Your first issue (the "class uneval" error, fixed in Edit 2) arises because geom_line() takes mapping = NULL as its first argument (see ?geom_line), so you need to explicitly name the data argument:
p + geom_line(data = predictions, aes(x=as.numeric(date), y=pct50))
similar question
Your second issue arises because predictions$date is a character vector, so calling as.numeric() on it introduces NAs. If you want numbers, parse it as a date first (the format belongs to as.Date(), and these dates look like "%Y-%m-%d"), then convert the result to numeric:
as.numeric(as.Date(predictions$date, format = "%Y-%m-%d"))
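A minimal sketch of how the two layers might be combined. Since the base plot maps a Date column to x, ggplot sets up a date scale, and the second layer probably needs a Date there as well. So the sketch converts predictions$date once and maps it directly rather than wrapping it in as.numeric(). The past data frame and its values are invented stand-ins for theRealPastDataValues, which is not shown in the question.

```r
library(ggplot2)

# Hypothetical stand-in for theRealPastDataValues (values are made up)
past <- data.frame(
  date   = as.Date("2009-06-21") + 0:9,
  amount = c(400, 420, 390, 450, 430, 410, 440, 460, 455, 470)
)
past$total <- cumsum(past$amount)

# First rows of predictions as in the dput() output: date is still character
predictions <- data.frame(
  date  = c("2009-07-01", "2009-07-02", "2009-07-03", "2009-07-04"),
  pct50 = c(4276, 4076, 4699.93, 4699.93),
  stringsAsFactors = FALSE
)

# Convert once so both layers share the same x class (Date)
predictions$date <- as.Date(predictions$date, format = "%Y-%m-%d")

p  <- ggplot(past, aes(x = date, y = total)) + geom_line()
p2 <- p + geom_line(data = predictions, aes(x = date, y = pct50), colour = "red")
p2
```

If numeric x values are really needed, both layers would have to use them, otherwise the date scale from the first layer is likely to reject the second.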

Related

Error in axis(side = side, at = at, labels = labels, ...) : invalid value specified for graphical parameter "pch"

I have applied the DBSCAN algorithm to the built-in iris dataset in R, but I get an error when I try to visualise the output using plot().
Following is my code.
library(fpc)
library(dbscan)
data("iris")
head(iris,2)
data1 <- iris[,1:4]
head(data1,2)
set.seed(220)
db <- dbscan(data1,eps = 0.45,minPts = 5)
table(db$cluster,iris$Species)
plot(db,data1,main = 'DBSCAN')
Error: Error in axis(side = side, at = at, labels = labels, ...) :
invalid value specified for graphical parameter "pch"
How to rectify this error?
I have a suggestion below, but first I see two issues:
You're loading two packages, fpc and dbscan, both of which have different functions named dbscan(). This could create tricky bugs later (e.g. if you change the order in which you load the packages, different functions will be run).
It's not clear what you're trying to plot, either what the x- or y-axes should be or the type of plot. The function plot() generally takes a vector of values for the x-axis and another for the y-axis (although not always, consult ?plot), but here you're passing it a data.frame and a dbscan object, and it doesn't know how to handle it.
Here's one way of approaching it, using ggplot() to make a scatterplot, and dplyr for some convenience functions:
# load our packages
# note: only loading dbscan, not loading fpc since we're not using it
library(dbscan)
library(ggplot2)
library(dplyr)
# run dbscan::dbscan() on the first four columns of iris
db <- dbscan::dbscan(iris[,1:4],eps = 0.45,minPts = 5)
# create a new data frame by binding the derived clusters to the original data
# this keeps our input and output in the same dataframe for ease of reference
data2 <- bind_cols(iris, cluster = factor(db$cluster))
# make a table to confirm it gives the same results as the original code
table(data2$cluster, data2$Species)
# using ggplot, make a point plot with "jitter" so each point is visible
# x-axis is species, y-axis is cluster, also coloured according to cluster
ggplot(data2) +
  geom_point(mapping = aes(x = Species, y = cluster, colour = cluster),
             position = "jitter") +
  labs(title = "DBSCAN")
If you're looking for something else, please be more specific about what the final plot should look like.
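If the goal was instead a scatterplot of the measurements coloured by cluster, a base-graphics sketch along these lines may be closer. It assumes the dbscan package's convention that cluster 0 marks noise points; the cols name is purely illustrative.

```r
library(dbscan)

# Call the function with its namespace so it is clear which dbscan() runs
db <- dbscan::dbscan(iris[, 1:4], eps = 0.45, minPts = 5)

# Cluster ids start at 0 (noise), so shift by 1 to get valid colour indices
cols <- db$cluster + 1L

# Pairwise scatterplots of the four measurements, coloured by cluster
pairs(iris[, 1:4], col = cols, pch = 19, main = "DBSCAN")
```

Passing pch explicitly here avoids relying on any plot method to pick a plotting symbol from the clustering object.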

Error message when creating ggplot with recession bands

I have a dataset which plots unemployment over time, and I want to add in bands highlighting when there is a recession.
The original data frame is called quarterly_data.
library(ggplot2)
library(lubridate)
recession <- data.frame(
  date_start = as_date(c("1973-07-01", "1980-01-01", "1990-07-01", "2008-04-01")),
  date_end   = as_date(c("1975-07-01", "1981-04-01", "1991-07-01", "2009-04-01"))
)
recession$date_start <- ymd(recession$date_start)
recession$date_end <- ymd(recession$date_end)
ggplot(quarterly_data, aes(x = date, y = Unemployment)) +
  geom_line() +
  geom_rect(data = recession, inherit.aes = FALSE,
            aes(xmin = date_start, xmax = date_end, ymin = -0.1, ymax = 0.1),
            fill = "red", alpha = 0.3)
However, when I run the ggplot, I get this error message:
Error: Invalid input: time_trans works with objects of class POSIXct only
Does anyone know how to fix this?
You have supplied the recession data frame, but not quarterly_data, which is where the error may be coming from. Below are a few things to try, but first, some background on what might be causing the issue.
First of all, time_trans appears to be from the scales package, but it's not clear why that needs to run based on the code above. Is there anything else that could be using the scales package here?
As for the error message itself: time_trans requires an object of class POSIXct. That is different from class Date, which is what lubridate's as_date() returns when you build the recession data frame.
You can confirm this yourself by running class(recession$date_start); the output is "Date".
The ymd() calls also return Date objects. According to the documentation, supplying a tz= (time zone) argument makes ymd() return a POSIXct/POSIXt object instead. You can see this with the following:
> class(ymd(recession$date_start))
[1] "Date"
> class(ymd(recession$date_start, tz='GMT'))
[1] "POSIXct" "POSIXt"
So that might fix your problem. You still have some detective work to do, though, since we don't have your other data frame and nothing in the code shown obviously calls time_trans from the scales package. The other possibility is that ggplot is calling it to adjust an axis based on a POSIXt object, but I don't see a scale_ call or coord_flip() that might cause this error. I would recommend the following sequence:
1. Try the "home run" approach: run your ymd() calls again, but supply tz="GMT" to force the output to be POSIXct. I am not sure this will be successful.
2. Run the ggplot() line by itself. Do you get the same error? If so, the problem lies in the quarterly_data data frame and not in recession. If it works, run the ggplot() line with geom_line() added. If that still works, the issue is with geom_rect(), which likely means the recession data frame.
3. Check the class of the date column in quarterly_data. Is it Date or POSIXct? If it is Date, try converting it to POSIXct (maybe just use as.POSIXct()).
4. Is there more code in your plot call than you have shown? If you have coord_flip(), any scale_x call, or other thematic elements added to the plot, they could well be trying to adjust the time scale and produce that error.
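The class check in the steps above can be sketched as follows. This assumes, without being able to confirm it, that quarterly_data$date is POSIXct while the recession columns are Date; the quarterly_data values here are invented for illustration. When x is POSIXct, ggplot picks a date-time scale on its own, which would explain time_trans appearing without any explicit scale_ call.

```r
library(ggplot2)

# Hypothetical stand-in for quarterly_data, with a POSIXct date column
quarterly_data <- data.frame(
  date = as.POSIXct(c("1973-01-01", "1975-01-01", "1980-01-01", "1982-01-01"),
                    tz = "GMT"),
  Unemployment = c(4.9, 8.1, 6.3, 8.6)
)
recession <- data.frame(
  date_start = as.Date(c("1973-07-01", "1980-01-01")),
  date_end   = as.Date(c("1975-07-01", "1981-04-01"))
)

# Compare the classes; a mismatch here is the thing to look for
class(quarterly_data$date)
class(recession$date_start)

# Put both on the same class; here the rectangles are converted to POSIXct
recession$date_start <- as.POSIXct(recession$date_start, tz = "GMT")
recession$date_end   <- as.POSIXct(recession$date_end, tz = "GMT")

p <- ggplot(quarterly_data, aes(x = date, y = Unemployment)) +
  geom_line() +
  geom_rect(data = recession, inherit.aes = FALSE,
            aes(xmin = date_start, xmax = date_end, ymin = -Inf, ymax = Inf),
            fill = "red", alpha = 0.3)
p
```

The sketch uses ymin = -Inf and ymax = Inf so the bands span the whole panel; that is a design choice of this example, not something taken from the question. Converting quarterly_data$date to Date with as.Date() instead should work just as well, as long as both data frames end up with the same class.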

Plot two columns in R from csv file

I have just started to learn R and I have a problem with plotting some values read from a CSV file.
I have managed to load the csv file:
timeseries <- read.csv(file="R/scripts/timeseries.csv",head=FALSE,sep=",")
When checking the content of timeseries, I get the correct results (so far, so good):
1 2016-12-29T19:00:00Z 6
...
17497 2016-12-30T00:00:00Z 3
Now, I am trying to plot the values - the date should be on the x-axis and the values on the y-axis.
I found some SO questions about this topic: How to plot a multicolumn CSV file?. But I am unable to make it work following the instructions.
I tried:
matplot(timeseries[, 1], timeseries[, -1], type="1")
I also tried various barplot and matplot variations, but I usually get an error like this one: Error in plot.window(...) : need finite 'xlim' values
Could someone suggest how to tackle this problem? Sorry for elementary question...
You need to make sure your dates have class Date.
dates <- c("2016-12-29T19:00:00Z", "2016-12-30T00:00:00Z")
values <- c(6,3)
df <- data.frame(dates, values)
df$dates <- as.Date(df$dates)
Then you could use ggplot2
library(ggplot2)
qplot(df$dates, df$values) + geom_line()
or even the default
plot(df$dates, df$values, type = "l")
or with lattice as in the question you referred to
library(lattice)
xyplot(df$values ~ df$dates, type = "l")
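Note that as.Date() drops the time of day, so rows from the same day would all share one x value. If the time detail matters, a sketch using as.POSIXct() could look like this. The column names V1 and V2 are what read.csv() assigns when there is no header, and the format string assumes every timestamp looks like the two shown in the question.

```r
# Two sample rows in the same shape as the CSV (no header, so V1 and V2)
timeseries <- data.frame(
  V1 = c("2016-12-29T19:00:00Z", "2016-12-30T00:00:00Z"),
  V2 = c(6, 3),
  stringsAsFactors = FALSE
)

# Parse the ISO 8601 strings as UTC date-times, keeping hours and minutes
timeseries$time <- as.POSIXct(timeseries$V1,
                              format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

# Base graphics understands POSIXct and labels the x-axis accordingly
plot(timeseries$time, timeseries$V2, type = "l",
     xlab = "Time (UTC)", ylab = "Value")
```

A character or factor column on the x-axis is a common reason for the "need finite 'xlim' values" error, so checking class(timeseries$time) before plotting is a cheap sanity check.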

R qqplot argument "y" is missing error

I am relatively new to R and I am struggling with an error message related to qqplot. Some sample data are at the bottom. I am trying to do a qqplot on some azimuth data, i.e. compass directions. I've looked around here and at the ?qqplot R documentation, but I don't see a solution I can understand in either. I don't understand the syntax of the function or the format the data are supposed to be in, or probably both. First I tried loading the data as a single column of values, i.e. just the "Azimuth" column.
azimuth <- read.csv(file.choose(), header=TRUE)
qqplot(azimuth$Azimuth)
returns the following error,
Error in sort(y) : argument "y" is missing, with no default
Then I tried including the corresponding dip angles along with the azimuth data and received the same error. I also tried,
qqnorm(azimuth)
but this returned the following error,
Error in xy.coords(x, y, xlabel, ylabel, log) :
'x' and 'y' lengths differ
Dataframe "azimuth":
Azimuth Altitude
23.33211466 -6.561729793
31.51267873 4.801537153
29.04577711 5.24504954
23.63450905 14.03342708
29.12535459 7.224141678
20.76972007 47.95686329
54.89253987 4.837417689
56.57958227 13.12587996
13.09845182 -7.417776178
26.45155154 31.83546988
29.15718557 25.47767069
28.09084746 14.61603384
28.93436865 -1.641785416
28.77521371 17.30536039
29.58690392 -2.202076058
0.779859221 12.92044019
27.1359178 12.20305106
23.57084707 11.97925859
28.99803063 3.931326877
dput() version:
azimuth <-
structure(list(Azimuth = c(23.33211466, 31.51267873, 29.04577711,
23.63450905, 29.12535459, 20.76972007, 54.89253987, 56.57958227,
13.09845182, 26.45155154, 29.15718557, 28.09084746, 28.93436865,
28.77521371, 29.58690392, 0.779859221, 27.1359178, 23.57084707,
28.99803063), Altitude = c(-6.561729793, 4.801537153, 5.24504954,
14.03342708, 7.224141678, 47.95686329, 4.837417689, 13.12587996,
-7.417776178, 31.83546988, 25.47767069, 14.61603384, -1.641785416,
17.30536039, -2.202076058, 12.92044019, 12.20305106, 11.97925859,
3.931326877)), .Names = c("Azimuth", "Altitude"), class = "data.frame", row.names = c(NA, -19L))
Try
qqPlot
with a capital P (for example, the qqPlot function in the car package).
Perhaps you just want a normal Q-Q plot.
Have you tried the following?
qqnorm(azimuth$Azimuth);qqline(azimuth$Azimuth)
It seems that the qqplot function takes two input parameters, x and y as follows:
qqplot(x, y, plot.it = TRUE, xlab = "your x-axis label", ylab="your y-axis label", ...)
When you made the call shown above, you passed only one vector, so R complained that the y argument was missing. Check your input data set and decide what x and y should be for your call to qqplot.
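The difference between the two functions can be sketched with the first eight rows of the data from the question. qqnorm() compares one sample with the normal distribution and needs a single numeric vector (passing the whole data frame is what triggered the xy.coords error), while qqplot() compares two samples and needs both x and y. The qq name is just for illustration.

```r
# First eight rows of the azimuth data from the question
azimuth <- data.frame(
  Azimuth  = c(23.33211466, 31.51267873, 29.04577711, 23.63450905,
               29.12535459, 20.76972007, 54.89253987, 56.57958227),
  Altitude = c(-6.561729793, 4.801537153, 5.24504954, 14.03342708,
               7.224141678, 47.95686329, 4.837417689, 13.12587996)
)

# One sample against the normal distribution: pass the column, not the data frame
qqnorm(azimuth$Azimuth)
qqline(azimuth$Azimuth)

# Two samples against each other: qqplot() needs both x and y
qq <- qqplot(azimuth$Azimuth, azimuth$Altitude,
             xlab = "Azimuth quantiles", ylab = "Altitude quantiles")
```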

Forecast with data series with quantmod and forecast package

I'm new to ts and xts objects.
While working with time series data, I ran into some problems.
require(quantmod)
require(forecast)
ticker <- "^GSPC"
getSymbols(ticker, src="yahoo", to = "2013-12-31")
prices <- GSPC[,6] # First get data using package quantmod
# then forecasting using package forecast
prices.ts <- as.ts(prices)
prices.ets <- ets(prices.ts)
prices.fore <- forecast(prices.ets, h=10)
# then plot
plot(prices.fore, xaxt = "n")
My problems are :
1. I tried to save GSPC, including the dates, to a CSV file. After searching, I tried this:
write.zoo((GSPC, file = "GSPC.csv", sep = ",", qmethod = "double"))
The error message was: Error: unexpected ',' in "write.zoo((GSPC,". I checked the syntax and it seems correct to me. I tried other combinations, and all failed with a similar error message.
I also tried index(GSPC) to get the dates,
and then cbind(index(GSPC), GSPC[, 6]). That failed too.
Error message: Error in merge.xts(..., all = all, fill = fill, suffixes = suffixes) :
dims [product 1762] do not match the length of object [3524]
but when I checked the length
> length(GSPC[,6])
[1] 1762
> length(index(GSPC))
[1] 1762
2. The plot has no x-axis label and no y-axis label. I tried the methods from the accepted answer posted here, but they failed.
In particular, I don't understand the purpose of the following code. It is supposed to change the appearance of the plot, but for me it changes nothing at all. I don't know whether I am missing something.
a = seq(as.Date("2011-11-01"), by="weeks", length=11)
axis(1, at = decimal_date(a), labels = format(a, "%Y %b %d"), cex.axis=0.6)
abline(v = decimal_date(a), col='grey', lwd=0.5)
Also, I want to plot from as.Date("2013-01-01").
Could you please give some suggestions?
Thanks a lot!
You have an extra pair of parentheses. Use
write.zoo(GSPC, file = "GSPC.csv", sep = ",", qmethod = "double")
I don't know what you are trying to achieve with your index and cbind commands. index does not give the data. And if you want the 6th column of GSPC just use GSPC[,6].
It looks like you have some non-standard plotting dimensions. Start a new graphics window and you will reset them to defaults. But you won't get xlab and ylab unless you specify them explicitly. And you won't get an x-axis because you have set xaxt="n"
The questions about the last code block do not seem to relate to your data at all.
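If the aim of the index and cbind commands was to get dates and prices side by side, a sketch like the following may help. It uses a small made-up xts series in place of GSPC[, 6], so it runs without downloading anything, and it writes to a temporary file rather than GSPC.csv. The error message in the question shows that cbind() on an xts object goes through merge.xts, so this sketch builds a plain data frame instead.

```r
library(xts)  # attaching xts also attaches zoo

# Hypothetical stand-in for GSPC[, 6]; the prices are invented
dates  <- as.Date("2013-12-20") + 0:4
prices <- xts(c(1818.3, 1827.9, 1833.3, 1842.0, 1841.4), order.by = dates)
colnames(prices) <- "GSPC.Adjusted"

# index() gives the dates and coredata() gives the values without the index
price_df <- data.frame(date = index(prices), coredata(prices))

# write.zoo() writes the index as the first column of the CSV
out_file <- tempfile(fileext = ".csv")
write.zoo(prices, file = out_file, sep = ",", qmethod = "double")
```

Either price_df (via write.csv) or the write.zoo() call should produce a file with one date column and one price column.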
