Plotting Time Series - r

I'm working on 16 world indices over three years and I want to make a plot from these 16 indices.
all<-read.table("C.../16indices.txt")
dimnames(all)[[2]]<-c("Date","BEL 20","CAC 40","AEX","DAX","FTSE 100","IBEXx 35","ATX","SMI","FTSE MIB","RTX","HSI","NIKKEI 225","S&P 500","NASDAQ","Dow Jones","BOVESPA")
attach(all)
Problems
My dates are written in the form "2009-01-05". I want only "2009" to appear; otherwise I would have too many jumps.
For example the prices from the BOVESPA go from 40.000,15 to 60.000,137. How do I get nice y-labels? For instance 40.000, 45.000,...,60.000.
How do I get 16 of these plots in one nice figure/plot?
I'm not used to working with R. I tried something like this, but it didn't work...
plot(all[1,],all[,2])

The biggest problem is that there is no sample data. Here is advice based on guesswork:
I tried something like this but that didn't work... plot(all[1,],all[,2])
You need to format your date values as R Date class. If they are in YYYY-MM-DD format it will be as simple as:
all$Date <- as.Date(all$Date)
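As a minimal sketch of that conversion, assuming the Date column holds character strings in YYYY-MM-DD form (the data frame `prices` and its values are hypothetical stand-ins for `all`):

```r
# Hypothetical stand-in for the real data; only the Date column matters here
prices <- data.frame(
  Date = c("2009-01-05", "2009-01-06", "2010-01-04"),
  stringsAsFactors = FALSE
)

# Convert the character column to the Date class
prices$Date <- as.Date(prices$Date)

class(prices$Date)          # should report "Date"
years <- format(prices$Date, "%Y")  # year-only labels, as character strings
```

Once the column is a Date, plotting functions can treat it as a continuous time axis rather than as a set of unrelated labels.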
To your specific questions:
1) My dates are written in the form "2009-01-05". I want only "2009" to appear; otherwise I would have too many jumps.
You will need to suppress axis plotting in the plot call and then need to add an axis() call.
2) For example the prices from the BOVESPA go from 40.000,15 to 60.000,137. How do I get nice y-labels? For instance 40.000, 45.000,...,60.000.
You appear to be in a European locale, which means your initial read.table call probably mangled the numeric input. Read the documentation for read.csv2, which handles the swapped meanings of the decimal point and comma in numeric data. You should also use colClasses.
3) How do I get 16 of these plots in one nice figure/plot?
You should probably calculate ratios from an initial starting point for each series so there can be a common scale for display.
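The suggestions above can be sketched together in base R. This is only an illustration under stated assumptions: the two series `A` and `B` are simulated, and the rebasing to 100 at the first observation is one possible way to "calculate ratios from an initial starting point".

```r
set.seed(1)

# Simulated weekly data standing in for the real indices (hypothetical values)
dates <- seq(as.Date("2009-01-05"), as.Date("2011-12-26"), by = "week")
n <- length(dates)
prices <- data.frame(
  Date = dates,
  A = 40000 + cumsum(rnorm(n, 0, 300)),
  B = 3000 + cumsum(rnorm(n, 0, 30))
)

# Rebase every series so that its first value is 100 (a common scale)
rebased <- prices
rebased[-1] <- lapply(prices[-1], function(p) 100 * p / p[1])

# One tick per year, labelled with the year only
year_ticks <- seq(as.Date("2009-01-01"), as.Date("2012-01-01"), by = "year")

# One panel per series; for 16 series, mfrow = c(4, 4) would give a 4 x 4 grid
op <- par(mfrow = c(1, 2))
for (nm in names(rebased)[-1]) {
  plot(rebased$Date, rebased[[nm]], type = "l", xaxt = "n",
       main = nm, xlab = "Year", ylab = "Index (start = 100)")
  axis.Date(1, at = year_ticks, format = "%Y")  # draw the suppressed x-axis
}
par(op)
```

Here xaxt = "n" suppresses the default x-axis in the plot call, and axis.Date() then adds year-only labels.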

Related

Reading dates as dates and not characters

The data I have been working with reads everything just fine, **except** for the date column. It always reads it as characters instead.
This would be fine except that, when I have lots of dates (like over 400 of them), then you can see something like this on a scatterplot:
[Image: Scatter Plot]
In essence, I have two questions.
The first is, apart from using as.Date, which is fine when I need a temporary fix, how do I permanently make R read the date column as legitimate dates? What I mean is, is there a way I can make that date column read as dates when I am using read.csv or read.excel?
When graphing, like the graph I have included here, how can I only include some of the labels throughout so that it won't be so cramped up? I still want all the data, but really do not want all those labels.
I was hoping to add the data file, but I am unaware of how to add excel/csv files on this website and my data set is quite long (n = 491). I do have 9 columns, 1 of which is the date column. The others are numbers or actual letters (the latter of which is in fact a character). I can add maybe a few rows just to help out.
[Image: Some of the data set]
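One possible approach to both questions, sketched under assumptions since the actual file is not shown: the column names `date` and `value`, the ISO date format, and the inline CSV text are all hypothetical. The colClasses argument of read.csv accepts "Date", and axis.Date() can label only a chosen subset of positions.

```r
# Hypothetical CSV contents; a real call would pass a file path instead of text=
csv_text <- paste(
  "date,value",
  "2020-01-15,3.1",
  "2020-02-15,2.7",
  "2020-03-15,4.0",
  "2020-04-15,3.6",
  sep = "\n"
)

# Ask read.csv to convert the date column while reading
dat <- read.csv(text = csv_text,
                colClasses = c(date = "Date", value = "numeric"))
class(dat$date)

# Plot every point, but label only every second month on the x-axis
plot(dat$date, dat$value, xaxt = "n", xlab = "Date", ylab = "Value")
ticks <- seq(min(dat$date), max(dat$date), by = "2 months")
axis.Date(1, at = ticks, format = "%b %Y")
```

If the dates in the file are not in YYYY-MM-DD form, the column would likely need to be read as character and converted with as.Date() and an explicit format= afterwards.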

Plotting POSIXct in ggplot manually scaling x-axis

I am trying to plot up this windspeed data, with years displaying on the x-axis. The data frame was set up as
wsAvg<-data.frame(date=as.POSIXct(ws07$date[1224:1559]),u.1=(ws07$u[1224:1559]),stringsAsFactors = FALSE)
wsAvg<-rbind(wsAvg,c(date=as.POSIXct(ws08$date[1032:1367]),(ws08$u[1032:1367])))
And below using ggplot to plot my windspeed data frame.
ggplot(wsAvg,aes(x=date,y=as.numeric(u.1)))+geom_point(size=3,pch=2)+
geom_smooth(method="lm",colour="black",se=FALSE)+
#scale_x_datetime(limits=as.POSIXct(c('2006-09-01','2016-10-01')),breaks=date_breaks("1 year"),labels=date_format("%Y"))+
Without scale_x_datetime() in my command, I get those dates. When I add scale_x_datetime() to manually scale my x-axis to display only years, all my data lines up on 2007. Does anyone know why this is?
It is very difficult to provide the answer to your question, since we don't have a clear picture of any of your data. With that being said, let's look at the information you did provide and see where the likely source of the problem is for your question.
The issue is clearly related to the formatting/data located in your "date" column. It's best to look at this stepwise and test at each step to see what can go wrong here:
Your raw data: There is likely nothing wrong with your base data, but we don't know the format of the "date" vector coming from ws07$date[1224:1559] and ws08$date[1032:1367]. Your raw data originates from two data frames, so just confirm that the raw data from these two vectors is formatted identically, but more importantly, is it already formatted as a date? What is class(ws08$date)? Also, what does the data look like if you took a sample of that dataset? (e.g. ws07$date[sample(1224:1559, 20)]).
Conversion to POSIXct: The first code you show includes as.POSIXct(), but does not include the argument for format=. You may or may not need to specify this, but I would recommend consulting the documentation to be sure you're using the function correctly. You can try converting a small subset of the data just using as.POSIXct(ws07$date[1224:1250]) or something like that. Does it give you the dates formatted correctly? If not, try specifying the format= arg until it "works" as you intended.
Initial plot and second plot: The data is spread out in the first plot, likely kind of how you expected. What about the month/day combinations in the first plot - are they correct? If they are correct, it may indicate the year is being read wrong, since apparently all dates are clustered around May and June of 2007. Comparing the first and second plots, there's no obvious issue with scale_x_datetime() here. Those two plots are consistent with data that has x values = dates ranging from May-June of 2007.
Bottom line: Hard to discern exactly where it's going wrong for you, but likely it's (1) in the conversion to date using as.POSIXct from your ws07 and ws08 datasets, or (2) the format of ws07$date or ws08$date being imported/converted incorrectly. The solution is to use the format= argument in the date conversion/import function you are using to ensure that the format is correct and years/months/dates are imported accordingly.
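The format= check described above might look like the following sketch. The input strings and their day/month/year layout are assumptions, since the real format of ws07$date is unknown.

```r
# Hypothetical raw strings in day/month/year order
raw <- c("05/01/2007 13:00", "17/06/2008 09:30")

# Spell out the layout so year, month and day are not guessed
parsed <- as.POSIXct(raw, format = "%d/%m/%Y %H:%M", tz = "UTC")

class(parsed)                  # confirms a date-time class
years <- format(parsed, "%Y")  # quick check that the years came through
anyNA(parsed)                  # NA values would signal a format mismatch
```

Testing a small subset like this before building the full data frame makes it easier to see whether the years are being read as intended.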
Here is the code that worked for me. Instead of using the c() function when binding data from the other datasets, I had to use data.frame() to add the other years to the wsAvg data frame.
wsAvg<-data.frame(date=as.POSIXct(ws07$date[1224:1559]),u.1=(ws07$u[1224:1559]),stringsAsFactors = FALSE)
wsAvg<-rbind(wsAvg,data.frame(date=as.POSIXct(ws08$date[1032:1367]),u.1=(ws08$u[1032:1367])))
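A small self-contained check of that fix, using made-up stand-ins for ws07 and ws08 (the values and the "UTC" time zone are assumptions), shows that binding two data frames keeps the date column as POSIXct and keeps both years:

```r
# Hypothetical miniature versions of the two source data frames
ws07 <- data.frame(date = c("2007-05-01 00:00:00", "2007-05-02 00:00:00"),
                   u = c(3.2, 4.1), stringsAsFactors = FALSE)
ws08 <- data.frame(date = c("2008-05-01 00:00:00", "2008-05-02 00:00:00"),
                   u = c(2.8, 5.0), stringsAsFactors = FALSE)

# Build the first year, then append the second year as a data frame
wsAvg <- data.frame(date = as.POSIXct(ws07$date, tz = "UTC"), u.1 = ws07$u)
wsAvg <- rbind(wsAvg,
               data.frame(date = as.POSIXct(ws08$date, tz = "UTC"),
                          u.1 = ws08$u))

str(wsAvg)                                 # date should still be POSIXct
yrs <- unique(format(wsAvg$date, "%Y"))    # both years should be present
```

Because each piece passed to rbind() is a data frame with matching column names, the rows are appended column by column instead of being flattened into one vector.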

simple R Time Series function plotting

Thank you kindly for your time.
I'm merely trying to plot a simple time series data set, but am running into a number of basic issues (one of which I'll ask here). For example, I have a notepad file that starts with:
"x"
"1",2.731
"2",2.562
"3",2.632
"4",2.495
"5",1.978
...and so on...
So R reads it just fine, e.g. myfile=read.table("F:/Documents/myfile.txt",sep=""). However, the values seem to change under a conversion using R's ts function, i.e.
myfile = ts(myfile,start=1,end=120,frequency=1)
plot(myfile, type="o",pch=22,lty=1,pty=2,xlab="Month",ylab="Values",main="My File")
So when plotted, the first value starts at 20+ for some reason, as opposed to 2+. Furthermore, R assumes that the y-axis goes from 1 to 120 (mirroring the x-axis), which is not the right scale (i.e. 0 through 10). In another data set that I did (using integers), it was shifted upward by 1. In any event, I believe the issue is probably about how to properly specify the y-axis.
Any ideas on how to tackle this? Thanks!
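One plausible cause, offered as a guess: the excerpt looks comma-separated, with a header that has one fewer field than the data rows, so reading it with sep="" may not split the columns as intended. A hedged sketch of reading the same five rows with read.csv (the inline text stands in for the real file):

```r
# The five rows shown in the question, supplied inline instead of from a file
txt <- paste('"x"', '"1",2.731', '"2",2.562', '"3",2.632',
             '"4",2.495', '"5",1.978', sep = "\n")

# read.csv splits on commas; because the header has one fewer field than the
# data rows, the first field of each row is used as row names
dat <- read.csv(text = txt)
str(dat)  # x should be a numeric column

# Build the series from the numeric column only
series <- ts(dat$x, start = 1, frequency = 1)
plot(series, type = "o", pch = 22, lty = 1,
     xlab = "Month", ylab = "Values", main = "My File")
```

Checking str() of the imported object before calling ts() should show whether the values arrived as numbers or as text.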

Limiting Window Size and/or Removing Specific Rows of Time Values In R

I'm trying to figure out how to observe just one particular section of the data in the graph below (e.g. 5pm onwards). I know there are basically two methods of doing this:
1) Method 1: Limiting the window size, which requires the following function:
symbols(Data$Times, Data$y, circles=Data$z, xlim=c("5:00pm","10:00pm"))
The problem is, I get an "invalid 'xlim' value" error when I try to input the two time endpoints.
2) Method 2: Clearing out the rows in Data$Times that have values over 5pm.
The problem here is that I'm not sure how to sort the rows by earliest time -> latest time OR how to define a new variable such that TimesPM <- Data$Times>"5pm" (what I typed just now obviously did not work.)
Any ideas? Thanks in advance.
ETA: This is what I plotted:
Times<-strptime(DATA$Time,format="%I:%M%p")
symbols(Times, y, circles=z, xaxt='n', inches=.4, fg="3", bg=(a), xlab="Times", ylab="y")
axis.POSIXct(1, at=Times, format="%I:%M%p")
Both approaches have the problem that in all likelihood your datetime values will not equal values expressed as a character vector like "5:00pm", even after the coercion done by the ">" comparison operator. To get the best advice you need to present str(DATA$Times) or dput(head(DATA$Times)) or class(Data$Times). Generally plotting functions recognize either valid date or datetime classes or their numeric representation. If the ordering operation is not working, then it raises the question whether you have a proper class. But you appear to have an axis labeling that suggests a date-time format of some sort, and we just need to figure out what class it really is.
Because you are creating a character vector from your Time column, you probably want to apply the restriction before you send the DATA$Time vector to strptime(). You still have not offered the requested clarifications, so I have no way to give tested or even very specific code, but you might be doing something like
Times<-strptime(DATA$Time[ as.POSIXlt(DATA$Time)$hour >= 17 &
as.POSIXlt(DATA$Time)$hour <= 22 ] ,
format="%I:%M%p")

Plotting hundreds of hours of data with gnuplot

I am trying to plot data from a simulation that tracks simulation time in (hours):(minutes):(seconds) format, but does not turn (hours) into days - so (hours) can be in the hundreds. When gnuplot plots data by time, however ("set xdata time"), it only plots up to 99 hours in one continuous plot; after that, it loops back around and starts overplotting hour 100+ near the beginning (and even then, does weird stuff). Does anyone know why this happens and/or how to get around it?
I also looked into reading the components of the time column (which is the 3rd field of data on each line, but not necessarily a fixed number of characters into the line) in as 3 simple numbers (integers), then converting to a real number, which happens to be a decimal version of the time (e.g., 107:45:00 -> 107.75), which would be fine for the plot, but I haven't been able to figure out how to get gnuplot to do that, either.
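The arithmetic behind that conversion, shown as an R sketch rather than in gnuplot itself (the helper name hms_to_hours is hypothetical):

```r
# Convert "H:M:S" strings, where H may exceed 99, into decimal hours
hms_to_hours <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) {
    p <- as.numeric(p)
    p[1] + p[2] / 60 + p[3] / 3600  # hours + minutes/60 + seconds/3600
  }, numeric(1))
}

hours <- hms_to_hours(c("107:45:00", "0:30:00"))
hours  # 107.75 and 0.5
```

The same formula, hours + minutes/60 + seconds/3600, is what any tool would need to apply to the third field to obtain a plain numeric x-value.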
Any other ideas are welcome. (I would rather not alter the original file, due to the additional complexity of multiple versions of each file, having to teach others how to convert the file and how to figure out the plot didn't work because they didn't convert the file, etc.)
Version 2 of MathGL (a GPL plotting library) has time ticks which can be set as you want (using the standard strftime() format). However, it is in beta now; a stable version should appear in October 2011.
