Plotting POSIXct in ggplot manually scaling x-axis - r

I am trying to plot up this windspeed data, with years displaying on the x-axis. The data frame was set up as
wsAvg<-data.frame(date=as.POSIXct(ws07$date[1224:1559]),u.1=(ws07$u[1224:1559]),stringsAsFactors = FALSE)
wsAvg<-rbind(wsAvg,c(date=as.POSIXct(ws08$date[1032:1367]),(ws08$u[1032:1367])))
And below using ggplot to plot my windspeed data frame.
ggplot(wsAvg,aes(x=date,y=as.numeric(u.1)))+geom_point(size=3,pch=2)+
geom_smooth(method="lm",colour="black",se=FALSE)+
#scale_x_datetime(limits=as.POSIXct(c('2006-09-01','2016-10-01')),breaks=date_breaks("1 year"),labels=date_format("%Y"))+
Without the scale_x_datetime() in my command, I get those dates. When I add in the scale_x_datetime() function to manually scale my x-axis to display only years. All my data lines up onto 2007. Anyone know why this is?

It is very difficult to provide the answer to your question, since we don't have a clear picture of any of your data. With that being said, let's look at the information you did provide and see where the likely source of the problem is for your question.
The issue is clearly related to the formatting/data located in your "date" column. It's best to look at this stepwise and test at each step to see what can go wrong here:
Your raw data: There is likely nothing wrong with your base data, but we don't know the format of the "date" vector coming from ws07$date[1224:1559] and ws08$date[1032:1367]. Your raw data originates from two data frames, so just confirm that the raw data from these two vectors is formatted identically, but more importantly, is it already formatted as a date? What is class(ws08$date)? Also, what does the data look like if you took a sample of that dataset? (e.g. ws07$date[sample(1224:1559, 20)]).
Conversion to POSIXct: The first code you show includes as.POSIXct(), but does not include the argument for format=. You may or may not need to specify this, but I would recommend consulting the documentation to be sure you're using the function correctly. You can try converting a small subset of the data just using as.POSIXct(ws07$date[1224:1250]) or something like that. Does it give you the dates formatted correctly? If not, try specifying the format= arg until it "works" as you intended.
Initial Plot and Second plot The data is spread out in the first plot, likely kind of how you expected. What about the month/day combinations in the first plot - are they correct? If they are correct, it may indicate the year is being read wrong, since apparently all dates are clustered around May and June of 2007. Comparing the first and second plots, there's no obvious issue with scale_x_datetime() here. Those two plots are consistent with data that has x values = dates ranging from May-June of 2007.
Bottom line: Hard to discern exactly where it's going wrong for you, but likely it's (1) in the conversion to date using as.POSIXct from your ws07 and ws08 datasets, or (2) the format of ws07$date or ws08$date being imported/converted incorrectly. The solution is to use the format= argument in the date conversion/import function you are using to ensure that the format is correct and years/months/dates are imported accordingly.

The code that worked for me. Instead of using c() function when I was binding data from other datasets, I had to use data.frame() to add other years into the wsAvg data frame.
wsAvg<-data.frame(date=as.POSIXct(ws07$date[1224:1559]),u.1=(ws07$u[1224:1559]),stringsAsFactors = FALSE)
wsAvg<-rbind(wsAvg,data.frame(date=as.POSIXct(ws08$date[1032:1367]),u.1=(ws08$u[1032:1367])))

Related

Reading dates as dates and not characters

The data I have been working with reads everything just fine, **except** for the date column. It always reads it as characters instead.
This would be fine except that, when I have lots of dates (like over 400 of them), then you can see something like this on a scatterplot:
Scatter Plot
In essence, I have two questions.
The first is, apart from using as.Date, which is fine when I'm needed temporary stuff, how do I permanently make R read the date column as legit dates? What I mean is, is there a way I can make that date column read as dates when I am using read.csv or read.excel?
When graphing, like the graph I have included here, how can I only include some of the labels throughout so that it won't be so cramped up? I still want all the data, but really do not want all those labels.
I was hoping to add the data file, but I am unaware of how to add excel/csv files on this website and my data set is quite long (n = 491). I do have 9 columns, 1 of which is the date column. The others are numbers or actual letters (the latter of which is in fact a character). I can add maybe a few rows just to help out.
Some of the data set

Trouble showing year on x-axis with ggplot

I am using ggplot to show publications over time by year. However, my x-axis is showing up as integers (ex 2015.0) instead of each year showing up under each bar.
p <-ggplot(pubs, aes(x = Year, y=Pubs, fill=Author.order)) +
geom_bar(stat="identity", position=position_stack(reverse = TRUE),size=.3)
Like the other user mentioned, it's difficult to help you without seeing your dataset, pubs. With that being said, if you are seeing your axis labeled as 2015.0 for the year 2015, that definitely means pubs$Year is not formatted as a date. As such, ggplot treats it like a normal numeric.
Try formatting as date first:
pubs$Year <- as.Date(pubs$Year, format="your.format.here")
If I saw how pubs$Year was constructed, I could give you a suggestion for what to use for format=. By default the as.Date function will try different formats according to the documentation and hit you with an error if there are none found based on your data. as.Date uses strptime() to convert into Date class, so you can look at that documentation to help understand how to write the format= piece. Note that strptime converts characters to date types, so if your data is numeric, you may actually want to convert to character then convert to date. Sounds weird, but I've done it myself because it works. :)
There are more classes other than Date in R for representation of a date or date/time value, however. You can also use POSIXct or POSIXlt using any of the as.POSIX functions, which operate similarly to the as.Date function. Any of those formats should work fine with ggplot.
Finally, if you want to specify anything related to the scales for representation in your plot, you can use scale_x_date to change breaks, limits, format labels, etc. See documentation here.

Looking for advice on creating Tidy data from the start

I have a data set that will be growing. It is categorical observations (i.e., 1=yes, 2=no) by date and hour. Is the following an acceptable method of formatting for import to R or is there a better way?
I would use a template like this:
Using one column for the date makes it much easier to read/import into R. Also, the YYYY-MM-DD is the default format in R for date columns. Trying to write date and hour together in one column could be done but seems like it could be tedious and not as easy to see what is going on in the data. As was mentioned in the comments above, each observation should be on a separate row. Once you save the data as a csv, it will be easily imported into R.
Good luck.

Dates on X Axis

I have looked everywhere ( Example ) and a few other posts, but I just cannot understand how to setup the date on the X axis. Can someone help me? This is what I have so far:
Data.CSV:
21-Oct-14,
22-Oct-14,
23-Oct-14,....etc
I have tried this: axis(1,at=NVDA_Data$Date) and it doesn't show up and the console says its NULL
I just want to change where it says 0-250 with the dates
You can improve your question by letting us know you are using base R plot (which it appears from your screen shot to be true).
One easy solution is to convert your date-looking character or factor variable into a Date variable. Plot will generally make reasonable axis marks from a Date variable. Use summary() on your data frame to determine if the Date variable is a factor or character. If it is a character, then do something like this:
data.frame$Date2 <- as.Date(data.frame$Date, "%d-%b-%y")
If your Date column is read in as a factor, use colClasses to read it in as character, then use the snippet above.
The link #user20650 provided gives some good tips for labeling Date variables.

Plotting Time Series

I'm working on 16 world indices over three year and i want to make a plot from these 16 indices.
all<-read.table("C.../16indices.txt")
dimnames(all)[[2]]<-c("Date","BEL 20","CAC 40","AEX","DAX","FTSE 100","IBEXx 35","ATX","SMI","FTSE MIB","RTX","HSI","NIKKEI 225","S&P 500","NASDAQ","Dow Jones","BOVESPA")
attach(all)
Problems
My dates are written in the form "2009-01-05". I want only "2009" to appear otherwise i would have to many jumps.
For example the prices from the BOVESPA go from 40.000,15 to 60.000,137. How do I get nice y-labels? For instance 40.000, 45.000,...,60.000.
How do i get 16 of these plots in one nice figure/plot?
I'm not used to work with R. I tried something like this but that didn't work...
plot(all[1,],all[,2])
Biggest problem is no sample data> Here is advice based on guesswork:
I tried something like this but that didn't work... plot(all[1,],all[,2])
You need to format your date values as R Date class. If they are in YYYY-MM-DD format it will be as simple as:
all$Date <- as.Date(all.Date)
To your specific questions:
1) My dates are written in the form "2009-01-05". I want only "2009" to appear otherwise i would have to many jumps.
You will need to suppress axis plotting in the plot call and then need to add an axis() call.
2) For example the prices from the BOVESPA go from 40.000,15 to 60.000,137. How do I get nice y-labels? For instance 40.000, 45.000,...,60.000.
You appear to be in a European locale and that mean your initial read.table call probably mangled the data input and you need to read the documentation for read.csv2 which will properly handle the reversal of the decimal point and comma meanings for numeric data. You should also use colClasses.
3) How do i get 16 of these plots in one nice figure/plot?
You should probably calculate ratios from an initial starting point for each series so there can be a common scale for display.

Resources