I have a dataset with an x-y-z structure.
X = age of arrival in the city
Y = year of arrival
Z = number of current survivors from X/Y combination
I have no problem plotting this for any given time using RGL in R. However I would like to introduce a time dimension.
I could of course make 23 plots and paste them together, but I would like to be able to manipulate the view on the fly and treat the whole time series as one plot. I have Z values for 23 years. I would also like to colour my plot with an extra variable z2, defined as z_year/z_(year-1). Is this possible within the rgl package with some programming, or is there a better package available?
Try creating a video as described on SO.
An alternative is a for-loop that draws each plot and then pauses before the next one; look at ?Sys.sleep.
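A minimal sketch of the for-loop idea, assuming simulated survivor counts: the array `z`, the decay rates, and the axis ranges are all made up for illustration. It uses base `persp()` rather than rgl so that it runs without extra packages; the same loop structure should carry over to an rgl surface such as `persp3d()`. The colour of each facet comes from z2 = z_year / z_(year-1).

```r
# Hypothetical data: z[i, j, t] = survivors for age i, arrival year j, at year t
ages <- 0:9
arrival_years <- 1:8
n_years <- 23
z0 <- outer(ages, arrival_years, function(a, y) 200 - 3 * a + 5 * y)
rate <- 0.99 - 0.005 * ages               # made-up yearly survival rate per age
z <- array(NA_real_, dim = c(length(ages), length(arrival_years), n_years))
for (t in seq_len(n_years)) z[, , t] <- z0 * rate^(t - 1)

# z2[, , t - 1] holds z at year t divided by z at year t - 1
z2 <- z[, , -1, drop = FALSE] / z[, , -n_years, drop = FALSE]

# One fixed colour scale for all frames
pal <- heat.colors(20)
breaks <- seq(min(z2), max(z2), length.out = length(pal) + 1)

if (interactive()) {
  nr <- length(ages)
  nc <- length(arrival_years)
  for (t in 2:n_years) {
    r <- z2[, , t - 1]
    # persp() colours facets, so average the four corners of each facet
    facet <- (r[-1, -1] + r[-1, -nc] + r[-nr, -1] + r[-nr, -nc]) / 4
    persp(ages, arrival_years, z[, , t],
          col = pal[findInterval(facet, breaks, all.inside = TRUE)],
          zlim = range(z), theta = 35, phi = 25,
          xlab = "age of arrival", ylab = "year of arrival", zlab = "survivors",
          main = paste("Year", t))
    Sys.sleep(0.3)                        # pause so each frame stays visible
  }
}
```

Keeping `zlim` and the colour breaks fixed across frames makes the frames comparable from one year to the next.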
I am trying to find a function that matches two time series such that the datetime corresponds to reality.
So I need a function that minimizes the distance between the two curves shown above and outputs a new dataframe that has TAIR time-shifted towards the values of tre200h0.
By eye, this shift looks to be about 22h.
Best,
Fabio
I don't know a function that does this job for me.
Solved by Ric Villalba in the comments to the original question.
Two base R functions to analyze time series lags are acf and pacf. That is, given x and y, you can use acf(y - x) and look for the zeros in the plot (if your series have adequate seasonal behaviour) or, if you prefer, use acf(y - x, plot = FALSE) and get the data. Try which.min(acf(x - y)$acf^2). Note that the first element of $acf is lag 0, so the index returned is the lag plus one.
Of course, this is a simplification of an otherwise complex matter.
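As a related sketch, here is a different technique from the acf suggestion above: the cross-correlation function `ccf()` from base R's stats package, applied to simulated data. The series `reference` and `delayed` are hypothetical stand-ins for the two measured curves, and the 22-step offset is built in on purpose so the result can be checked.

```r
set.seed(42)
shift <- 22                               # built-in offset, in time steps
n <- 500
base <- as.numeric(arima.sim(model = list(ar = 0.7), n = n + shift))

reference <- base[(shift + 1):(n + shift)]
delayed   <- base[1:n]                    # same signal, seen 22 steps later

# Lag k of ccf(x, y) estimates the correlation between x[t + k] and y[t]
cc <- ccf(reference, delayed, lag.max = 48, plot = FALSE)
best_lag <- cc$lag[which.max(cc$acf)]

# Shift the delayed series back by the estimated offset.
# This sketch only handles the case where the offset is positive.
offset <- -best_lag
aligned <- data.frame(
  reference       = reference[1:(n - offset)],
  delayed_shifted = delayed[(1 + offset):n]
)
```

With real, noisy and strongly periodic data the peak of the cross-correlation may be ambiguous (a 22h shift and a 2h shift in the other direction look alike for a 24h cycle), so the estimate is worth checking against a plot.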
I've just started learning R and was wondering: say I have the dataset quakes, and I want to generate the probability histogram of quakes near Fiji. Would the code simply be hist(quakes$lat,freq=F)?
A histogram shows the frequency or proportion of a given value out of all the values in a data set. You need a numeric vector as the x argument for hist(). There is no flat variable in quakes, but there is a lat variable. hist(quakes$lat, freq = F) would show the following:
This shows the north/south geographical distribution of earthquakes, centering around -20. Since it is approximately normal (with a left skew), it suggests that there is a mechanism for earthquake generation that centers around a specific latitude.
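A short sketch of what freq = FALSE means, using the built-in quakes data: the bar heights are densities, so the bar areas (height times bin width) add up to 1. The variable names here are only illustrative.

```r
# Compute the histogram without drawing it
h <- hist(quakes$lat, plot = FALSE)

# With densities, area = height * bin width, and the areas sum to 1
bin_widths <- diff(h$breaks)
total_area <- sum(h$density * bin_widths)

if (interactive()) {
  hist(quakes$lat, freq = FALSE,
       main = "Latitude of quakes near Fiji", xlab = "latitude")
  lines(density(quakes$lat))              # kernel density estimate on top
}
```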
The best way to learn is to try. If you wonder if that would be the way to do it, try it.
You might also want to look at this tutorial on creating kernel density plots with ggplot.
I need some help with Excel plotting
I have two sets of data, similar to the following (sorry for the poor formatting, I'm new here)
Point,Date,MaxCPU,PercentCPU
1,1/1/2016,400,50
2,2/1/2016,400,65
3,3/1/2016,400,75
4,4/1/2016,400,63
5,5/1/2016,600,75
6,6/1/2016,600,80
7,7/1/2016,600,68
etc
I want to plot the PercentCPU as a column graph using 2 data series (A and B, differentiated when MaxCPU changes from 400 to 600)
The first data series (Series A) is the PercentCPU for points 1 to 4 inclusive (Colour Blue), then I want to plot a second series (series B) using PercentCPU for points 5 - 7 inclusive (Colour Red)
I've seen plenty of help videos and help documents on how to do this, but they are not what I want, as I need the second series B to continue on along the x-axis after series A finishes
The net result is to have 1 continuous looking chart that has both series A and B on it, both with different colours but series B following on from series A, so that visually one can easily see the PercentCPU changes when MaxCPU changed from 400 to 600 (MaxCPU is not being graphed)
Try as I might, all I can get is series B plonking itself right on top of series A on the x-axis (line graph) or intermingling with it (column graph). I'm at a loss as to how to get these two series side by side
Even trying a two series Y axis doesn't help, the graph still resorted to merging the two data series
How to get two data series on the same graph, one following the other along the x-axis, side by side, instead of both data series starting at the 0 x-axis origin?
Please enlighten me, oh Excel gurus :-)
If you want different colored lines, use a different series for each line. The easiest way to achieve that is to arrange the data into separate columns, one column for each series. This can be done in a helper table on a separate sheet, using formulas that reference the original data.
For correct X axis placement on a category x axis, ensure that all rows of data are included in the series, even empty cells.
To connect two series, you need two data points in exactly the same position, one in each series, so that they overlap.
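The helper-table layout described above can be sketched outside Excel too. The following R snippet is only an illustration of the arrangement, using the sample rows from the question; the column names SeriesA and SeriesB are made up. Each series gets its own column, and every row is kept, with an empty (NA) cell where a point belongs to the other series.

```r
cpu <- data.frame(
  Point      = 1:7,
  Date       = c("1/1/2016", "2/1/2016", "3/1/2016", "4/1/2016",
                 "5/1/2016", "6/1/2016", "7/1/2016"),
  MaxCPU     = c(400, 400, 400, 400, 600, 600, 600),
  PercentCPU = c(50, 65, 75, 63, 75, 80, 68)
)

# One column per series; rows belonging to the other series stay empty
helper <- data.frame(
  Point   = cpu$Point,
  SeriesA = ifelse(cpu$MaxCPU == 400, cpu$PercentCPU, NA),
  SeriesB = ifelse(cpu$MaxCPU == 600, cpu$PercentCPU, NA)
)

if (interactive()) {
  # Stacking is harmless here because only one series is non-empty per row
  bars <- t(as.matrix(helper[, c("SeriesA", "SeriesB")]))
  bars[is.na(bars)] <- 0
  barplot(bars, names.arg = helper$Point, col = c("blue", "red"),
          xlab = "Point", ylab = "PercentCPU")
}
```

Because both columns span all seven rows, series B starts at point 5 on the shared axis instead of at the origin.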
I have a time series dataset with spatial data (x,y coordinates). Each point is static in location, but its value varies over time, i.e. each point has its own unique function. I want to assign these functions as a mark, so I can plot the point pattern with each individual time series as a plotting symbol.
This is an exploratory step to eventually perform some spatial functional data analysis.
As an example, I want something like Figure 2 published in this article:
Delicado, P., R. Giraldo, C. Comas, and J. Mateu. 2010. Spatial Functional Data: Some Recent Contributions. Environmetrics 21:224-239
I'm having trouble posting an image of the figure
1) Working in R with ggplot2, I can plot a line showing the change in quant for each id over time:
(Fake example dataset, where x and y are Cartesian coordinates, id is an individual observation, and quant holds the value of id in each year):
library(ggplot2)
x <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
y <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
year <- c(1, 2, 3, 1, 2, 3, 1, 2, 3)
id <- c("a", "a", "a", "b", "b", "b", "c", "c", "c")
quant <- c(5, 2, 4, 2, 4, 2, 4, 4, 6)
allData <- data.frame(x, y, year, id, quant)
ggplot(allData, aes(x = year, y = quant, group = id)) + geom_line()
2) Or I can plot the geographic point pattern of id:
ggplot(allData,aes(x=x,y=y,color=id))+geom_point()
I want to plot the graph from (2), but use the line plots from (1) as the point symbols (marks). Any suggestions?
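One possible approach, sketched in base graphics under stated assumptions: rescale year and quant into a small box centred on each point's (x, y) location and draw each id's series as a line inside that box. The glyph size and the helper `rescale()` are arbitrary choices for illustration, not part of any package.

```r
allData <- data.frame(
  x     = rep(1:3, each = 3),
  y     = rep(1:3, each = 3),
  year  = rep(1:3, times = 3),
  id    = rep(c("a", "b", "c"), each = 3),
  quant = c(5, 2, 4, 2, 4, 2, 4, 4, 6)
)

# Map a vector onto [-0.5, 0.5] so it can be used as an offset
rescale <- function(v) (v - min(v)) / (max(v) - min(v)) - 0.5

glyph_width  <- 0.6                       # glyph size in map units (arbitrary)
glyph_height <- 0.6

# Glyph coordinates: location plus scaled time (horizontal) and value (vertical)
allData$gx <- allData$x + glyph_width  * rescale(allData$year)
allData$gy <- allData$y + glyph_height * rescale(allData$quant)

if (interactive()) {
  plot(allData$x, allData$y, pch = 3, col = "grey",
       xlim = range(allData$gx), ylim = range(allData$gy),
       xlab = "x", ylab = "y")
  for (g in split(allData, allData$id)) lines(g$gx, g$gy)
}
```

Here quant is rescaled over all ids together, so glyph heights are comparable between locations; rescaling within each id instead would emphasise shape over magnitude.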
Hi, I am using the partitioning around medoids algorithm for clustering, via the pam function in the cluster package. I have 4 attributes in the dataset that I clustered, and they seem to give me around 6 clusters. I want to generate a plot of these clusters across those 4 attributes like this [1]: http://www.flickr.com/photos/52099123#N06/7036003411/in/photostream/lightbox/ "Centroid plot"
But the only way I can draw the clustering result is either using a dendrogram or using
plot (data, col = result$clustering) command which seems to generate a plot similar to this
[2] : http://www.flickr.com/photos/52099123#N06/7036003777/in/photostream "pam results".
Although the first image is a centroid plot, I am wondering if there are any tools available in R to do the same with a medoid plot. Note that it also prints the size of each cluster in the plot. It would be great to know if there are any packages/solutions available in R that make this easy or, if not, what would be a good starting point for achieving plots similar to the one in image 1.
Thanks
Hi all, I was trying to work out the problem the way joran suggested, but I think I did not understand it correctly and have not done it the way it is supposed to be done. Anyway, this is what I have done so far. The file that I tried to cluster looks like this:
geneID RPKM-base RPKM-1cm RPKM+4cm RPKMtip
GRMZM2G181227 3.412444267 3.16437442 1.287909035 0.037320722
GRMZM2G146885 14.17287135 11.3577013 2.778514642 2.226818648
GRMZM2G139463 6.866752401 5.373925806 1.388843962 1.062745344
GRMZM2G015295 1349.446347 447.4635291 29.43627879 29.2643755
GRMZM2G111909 47.95903081 27.5256729 1.656555758 0.949824883
GRMZM2G078097 4.433627458 0.928492841 0.063329249 0.034255945
GRMZM2G450498 36.15941083 9.45235616 0.700105077 0.194759794
GRMZM2G413652 25.06985426 15.91342458 5.372151214 3.618914949
GRMZM2G090087 21.00891969 18.02318412 17.49531186 10.74302155
The following is the PAM clustering output:
GRMZM2G181227 1
GRMZM2G146885 2
GRMZM2G139463 2
GRMZM2G015295 2
GRMZM2G111909 2
GRMZM2G078097 3
GRMZM2G450498 3
GRMZM2G413652 2
GRMZM2G090087 2
AC217811.3_FG003 2
Using the above two files I generated a third file that somewhat looks like this and has cluster information in the form of cluster type K1,K2,etc
geneID RPKM-base RPKM-1cm RPKM+4cm RPKMtip Cluster_type
GRMZM2G181227 3.412444267 3.16437442 1.287909035 0.037320722 K1
GRMZM2G146885 14.17287135 11.3577013 2.778514642 2.226818648 K2
GRMZM2G139463 6.866752401 5.373925806 1.388843962 1.062745344 K2
GRMZM2G015295 1349.446347 447.4635291 29.43627879 29.2643755 K2
GRMZM2G111909 47.95903081 27.5256729 1.656555758 0.949824883 K2
GRMZM2G078097 4.433627458 0.928492841 0.063329249 0.034255945 K3
GRMZM2G450498 36.15941083 9.45235616 0.700105077 0.194759794 K3
GRMZM2G413652 25.06985426 15.91342458 5.372151214 3.618914949 K2
GRMZM2G090087 21.00891969 18.02318412 17.49531186 10.74302155 K2
I certainly don't think that this is the file joran wanted me to create, but I could not think of anything else, so I ran lattice on the above file using the following code.
library(lattice)
clusres <- read.table("clusinput.txt", header = TRUE, sep = "\t")
jpeg(filename = "clusplot.jpeg", width = 800, height = 1078,
     pointsize = 12, quality = 100, bg = "white", res = 100)
# parallel() is called parallelplot() in current lattice; print() draws it to the device
print(parallelplot(~clusres[2:5] | Cluster_type, data = clusres, horizontal.axis = FALSE))
dev.off()
and I get a picture like this
Since I want a single line representing the whole cluster at the four points, this output is wrong. I also tried playing with lattice, but I cannot figure out how to make it accept the RPKM values as the X coordinate. It always seems to plot many lines against a maximum or minimum value on the Y axis, and I don't understand what that is.
It would be great if anybody could help me out. Sorry if my question still seems absurd to you.
I do not know of any pre-built functions that generate the plot you indicate, which looks to me like a sort of parallel coordinates plot.
But generating such a plot would be a fairly trivial exercise.
Add a column of cluster labels (K1,K2, etc.) to your original data set, based on your clustering algorithm's output.
Use one of the many, many tools in R for aggregating data (plyr, aggregate, etc.) to calculate the relevant summary statistics by cluster on each of the four variables. (You haven't said what the first graph is actually plotting. Mean and sd? Median and MAD?)
Since you want the plots split into six separate panels, or facets, you will probably want to plot the data using either ggplot or lattice, both of which provide excellent support for creating the same plot, split across a single grouping vector (i.e. the clusters in your case).
But that's about as specific as anyone can get, given that you've provided so little information (i.e. no minimal runnable example, as recommended here).
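The steps above can be sketched as follows, using the nine rows posted in the question. This is only one reading of them: the median is an assumed choice of summary statistic, and the column names are rewritten into syntactic R names (RPKM_base, RPKM_m1cm, RPKM_p4cm, RPKM_tip) for convenience.

```r
clusres <- read.table(header = TRUE, text = "
geneID RPKM_base RPKM_m1cm RPKM_p4cm RPKM_tip Cluster_type
GRMZM2G181227 3.412444267 3.16437442 1.287909035 0.037320722 K1
GRMZM2G146885 14.17287135 11.3577013 2.778514642 2.226818648 K2
GRMZM2G139463 6.866752401 5.373925806 1.388843962 1.062745344 K2
GRMZM2G015295 1349.446347 447.4635291 29.43627879 29.2643755 K2
GRMZM2G111909 47.95903081 27.5256729 1.656555758 0.949824883 K2
GRMZM2G078097 4.433627458 0.928492841 0.063329249 0.034255945 K3
GRMZM2G450498 36.15941083 9.45235616 0.700105077 0.194759794 K3
GRMZM2G413652 25.06985426 15.91342458 5.372151214 3.618914949 K2
GRMZM2G090087 21.00891969 18.02318412 17.49531186 10.74302155 K2
")

# One summary row per cluster (median chosen as an assumption)
centers <- aggregate(. ~ Cluster_type, data = clusres[-1], FUN = median)
sizes <- table(clusres$Cluster_type)

if (interactive()) {
  # One representative line per cluster across the four variables
  matplot(t(as.matrix(centers[-1])), type = "b", lty = 1, pch = 16,
          col = 1:3, log = "y", xaxt = "n",
          xlab = "", ylab = "median RPKM")
  axis(1, at = 1:4, labels = names(centers)[-1])
  legend("topright", lty = 1, pch = 16, col = 1:3,
         legend = paste0(centers$Cluster_type, " (n = ",
                         sizes[as.character(centers$Cluster_type)], ")"))
}
```

For separate panels per cluster, the same `centers` table could be reshaped to long format and passed to ggplot or lattice.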
How about using clusplot from package cluster with partitioning around medoids? Here is a simple example (from the example section):
library(cluster)
# generate 25 objects, divided into 2 clusters
x <- rbind(cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
           cbind(rnorm(15, 5, 0.5), rnorm(15, 5, 0.5)))
clusplot(pam(x, 2))  # `pam` does your partitioning
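If the goal is a profile plot of the medoids with cluster sizes, the object returned by pam already carries both: the `medoids` component holds one representative row per cluster and `clusinfo` has a `size` column. A minimal sketch on simulated data with four made-up attributes (attr1 to attr4):

```r
library(cluster)

set.seed(1)
# 25 simulated objects with 4 attributes, in 2 well-separated groups
x <- rbind(matrix(rnorm(10 * 4, mean = 0, sd = 0.5), ncol = 4),
           matrix(rnorm(15 * 4, mean = 5, sd = 0.5), ncol = 4))
colnames(x) <- paste0("attr", 1:4)

fit <- pam(x, k = 2)
medoid_profiles <- fit$medoids            # one row per cluster
cluster_sizes <- fit$clusinfo[, "size"]

if (interactive()) {
  # One line per medoid across the four attributes
  matplot(t(medoid_profiles), type = "b", lty = 1, pch = 16, col = 1:2,
          xaxt = "n", xlab = "", ylab = "value at medoid")
  axis(1, at = 1:4, labels = colnames(x))
  legend("topleft", lty = 1, pch = 16, col = 1:2,
         legend = paste0("cluster ", 1:2, " (n = ", cluster_sizes, ")"))
}
```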