Given the plot of the time-series data, I was wondering if there is a robust function/mathematical formula I can use in R to determine which plots are oscillating. For example each individual graph corresponds to a single cell's intensity value over a certain time period. I would want a method to give a score or some value that would be able to differentiate between plots that are not oscillating (#513 and 559) compared to the plots that are oscillating (508,512,557,558). All the plots have the same scaling.
Related
I'm using DTWCLUST package in r for multivariate time series clustering. Here's my code.
data("uciCT")
mvc <- tsclust(CharTrajMV, k = 4L, distance = "gak", seed = 390L)
plot(mvc)
The CharTrajMV data set has 100 observations with 3 variables. As I understand, clusters are determined based on 3 variables as opposed to univariate time series clustering.
Each cluster graph shows several similarly patterned time series (observations) belonging to that cluster. How is this graph drawn? There are 3 time series variables used for clustering, how does one pattern graph come out? I mean the input is 3-dimentional(variables) dataset, but the output is 1-dimentional.
Moreover, I can get the 3 variables's centroid for each cluster (using mvc#centroids)
plot(mvc, labels = list(nudge_x = -10, nudge_y = 1), type="centroids")
this code shows only one centroid for each cluster. Can I get 3 variables' centroid graphs for each cluster with plot option? or is this right approach?
This is covered in the documentation. Plotting so many different series in separate panes would get very congested, so, for multivariate plots, the variables are appended one after the other, and you get vertical dotted lines to see the place where that happened, maybe injecting some missing values in some places to account for differences in length. This does mean the x axis isn't so meaningful anymore, but it's only meant to be a quick visualization aid.
I've just started learning R, and was wondering, say I have the dataset quake, and I want to generate the probability histogram of quakes near Fiji, would the code simply be hist(quakes$lat,freq=F)?
A histogram shows the frequency or proportion of a given value out of all the values in a data set. You need a numeric vector as the x argument for hist(). There is no flat variable in quakes, but there is a lat variable. hist(quakes$lat, freq = F) would show the following:
This shows the north/south geographical distribution of earthquakes, centering around -20, and, since it is approximately normal (with a left skew) suggests that there is a mechanism for earthquake generation that centers around a specific latitude.
The best way to learn is to try. If you wonder if that would be the way to do it, try it.
You might also want to look at this tutorial on creating kernel density plots with ggplot.
I have a time series dataset with spatial data (x,y coordinates). Each point is static in location, but its value varies over time, ie. each point has its own unique function. I want to assign these functions as a mark, so I can plot the point pattern with each individual time series as a plotting symbol.
This is an exploratory step to eventually perform some spatial functional data analysis.
As an example, I want something like Figure 2 published in this article:
*Delicado,P., R. Giraldo, C. Comas, and J. Mateu. 2010. Spatial Functional Data: Some Recent Contibutions. Environmetrics 21:224-239
I'm having trouble posting an image of the figure
1) Working in R with ggplot2, I can plot a line of change in quant of each id over time:
(Fake example dataset, where x and y are Carteian coordinates, id is an individual observation, and quant are values of id at each year):
x<-c(1,1,1,2,2,2,3,3,3)
y<-c(1,1,1,2,2,2,3,3,3)
year<-c(1,2,3,1,2,3,1,2,3)
id<-c("a","a","a","b","b","b","c","c","c")
quant<-c(5,2,4,2,4,2,4,4,6)
allData<-data.frame(x,y,year,id,quant)
ggplot(allData,aes(x=year,y=quant, group=id))+geom_line()
2) Or I can plot the geographic point pattern of id:
ggplot(allData,aes(x=x,y=y,color=id))+geom_point()
I want to plot the graph from (2), but use the line plots from (1) as the point symbols (marks). Any suggestions?
Is it possible to generate a heatmap taking into consideration both the color and the transparency, with these two parameters given from two different matrices (matrix 1 defines color, matrix 2 defines alpha)?
A little more information on what I'm after:
I have successfully used R and the heatmap.2 function in the gplots package to generate heatmaps - in this case to visualize miRNA interactions. Here, what I want to show is the probability of a particular nucleotide along the typical 20-24 nucleotides of the miRNA in being engaged in target pairing. My heatmap matrix consists of miRNAs (rows) and positions 1-24 (columns) with numeric paring probability in each cell. An example would be changing the alpha parameter of the color determined by the matrix values, such that white=no pairing and dark red=high pairing.
The heatmap.2 function works great for a single such plot, but I would now like to take in overlap information from two different species. Thus, I would need my heatmap to basically consider two matrices:
1) A matrix with the degree of species overlap, e.g. ranging from red-purple-blue for species1-only to species1+2 to species2-only.
2) A matrix with the average degree of pairing, e.g. visualized by the alpha parameter going from a weak-to-strong average pairing (whatever the color) at a given position in matrix 1.
I have tried to use the principles from this post:
Place 1 heatmap on another with transparency in R
But haven't been able to apply its suggestions to my own question.
Thanks in advance!
I need to get a plot of a Lorentz curve of a cumulative variable as a function of the number of observations. I want both axes to be displayed on a percentage basis (e.g. say observations are the number of buyers and the y variable is the amount they bought, buyers are already ranked in descending order, I want to get the plot that says "The top 10% buyers purchased 90% of the total bought"). My dataset is a couple million observations.
What is the best way to do this? Sub-questions:
If I need to add two variables for the quantiles of total observations and total $ bought (so as to use them to plot), what is the object that returns the row number? I tried:
user_quantile <- row(df)/nrow(df)
but I get a matrix of identical columns (user_quantile.1, user_quantile.2) of which I only need one column.
Is there instead any way to skip adding percentages as variables and only have them for axes values?
The plot has way to many points than I need to get the line. What is the best approach to minimize the computational effort and get a nice graph?
Thanks.
You may want to acquaint yourself with the excellent RSeek search engine for R content. One quick query for Lorentz curve (and Lorenz curve) lead to these packages:
ineq: Measuring inequality, concentration, and poverty
reldist: Relative Distribution Methods
GeoXp: Interactive exploratory spatial data analysis
lawstat: An R package for biostatistics, public policy and law
all of which seem to supply a Lorenz curve function.
In order to get the plot done you need first to arrange the raw data.
1) You can use the cut2() function from the Hmisc package to cut the data in quantiles. Check the documentation, it's not hard. It's similar to the cut() from the base package.
2) After using the cut2() function with the income data, you need to compute the frequency of each decile. Use table() for that. Then calculate percentages of income for each decile.
3) Now you should have a very small table with the following columns:
Decile, cumulative % of total income.
Add another column with the 45 degree line. Just add a constant cumulative % of income.
finaltable$cumulative_equality_line = seq(0.1, 1, by = 0.1)
4) You can use base graphics or ggplot2 for plotting. I guess you can do it with the info of step 3 or perhaps check out specific plotting questions.
I'll have to do it soon, but i already have the final table. I'll post the code for plotting once i do it.
Good luck!