Cross-Recurrence plots in R (with or without ggplot)

I have different time-series corresponding to different individuals and their location within a building (a categorical variable -- more like a room name).
I would like to study the similarity in movement of different individuals by something like cross-recurrence plots, where the two time-series correspond to the two axes and the actual points correspond to the presence/absence of individuals in the same room.
Has anyone tried doing such plots in R or while using ggplot? Any help would be great!

I haven't used this routine myself; I have only used the d2 dimension and Lyapunov exponent routines for EEG. However, the TISEAN package (RTisean in your case) has a routine, recurr, that returns this specific plot.
This link has a nice wrap-up of tutorials and links.
Edited:
In this link you can find a nice example of an application of a recurrence plot.
You can access the values returned by recurr (and by similar functions in other packages) by putting $ after the name of the returned object, as you would with a data frame,
and then use them inside a ggplot call with the appropriate aes mapping.
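For categorical room labels, the presence/absence matrix described in the question can also be built directly in base R, without any recurrence package. This is only a minimal sketch and not the RTisean routine: the vectors person_a and person_b and their room names are made up for illustration, and both series are assumed to be sampled at the same time steps.

```r
# Hypothetical room sequences for two individuals (one entry per time step)
person_a <- c("lobby", "lab", "lab", "office", "lobby", "lab")
person_b <- c("lab", "lab", "office", "office", "lobby", "lobby")

# crp[i, j] is TRUE when A at time i and B at time j are in the same room
crp <- outer(person_a, person_b, "==")

# Long format: one row per recurrence point
pts <- as.data.frame(which(crp, arr.ind = TRUE))
names(pts) <- c("time_a", "time_b")

# Base graphics version of the cross-recurrence plot
plot(pts$time_a, pts$time_b, pch = 15,
     xlab = "Time (person A)", ylab = "Time (person B)")
```

The same pts data frame can be handed to ggplot2, for example with ggplot(pts, aes(time_a, time_b)) + geom_tile(). Points on the main diagonal mark moments when both individuals were in the same room at the same time.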

Related

Single linkage hierarchical clustering - boxplots on height of the branches to detect outliers

Before running k-means clustering for consumer segmentation, I want to identify and delete the outliers in my sample. I tried hierarchical clustering with the single linkage algorithm. The problem is that my sample has more than 800 cases, and in my plot (a single linkage dendrogram) the labels are printed on top of each other and are unreadable, so I cannot clearly identify the outliers just by looking at the graph :-/
Here they say you can create boxplots based on the branch distance to identify outliers in a more objective way. I thought that would also be a good way to make the row numbers of the outliers in my dataset readable, but I am struggling to create the boxplots.
https://link.springer.com/article/10.1186/s12859-017-1645-5/figures/3
Does anyone know how to write the code to get boxplots based on the height of the branches?
This is the code I use for clustering; the plot is attached:
dr_dist <- dist(dr_ma_cluster[, 148:154])
hc_dr <- hclust(dr_dist, method = "single")  # single linkage
plot(hc_dr, labels = row.names(dr_ma_cluster))
This is my failed attempt at the boxplot, as I don't know how to access the branch heights:
> boxplot(hc_dr)
Error in x[floor(d)] + x[ceiling(d)] :
non-numeric argument for binary operator
> boxplot(hc_dr[,c(148:154)])
Error in hc_dr[, c(148:154)] : Incorrect number of dimensions
Here is another way to draw the tree (with an automated outlier detection approach), but it is even less readable for large datasets:
Delete outliers automatically of a calculated agglomerative hierarchical clustering data
Thanks for any help!!
boxplot(hc_dr$height), as suggested by StupidWolf, was the simple thing I was looking for.
Unfortunately I did not manage to label the outlier dots with the row names from the original data frame. The row names of the branch height table were useless because they are assigned in ascending order.
hang = 0.0001 made the dendrogram look better, but the labels were still unreadable because they still overlapped.
If anyone has a similar problem, check "R Shiny, zoomable dendrogram program".
The code given in the answer there was very easy to adapt and results in a zoomable dendrogram, which makes it easy to identify the relevant cases (the outliers). For details, search for dendextend, as proposed by csgroen.
Together, the boxplot and this tool let me identify the row names of the outliers after single linkage clustering, so that I could delete them before k-means clustering.
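One possible way to get from the boxplot dots back to row names is to go through the merge table that hclust returns. The sketch below uses made-up data (x, with two planted outliers) rather than dr_ma_cluster, and assumes that an outlier shows up as a single observation joining the tree at an unusually large height; small groups that join late would need extra handling.

```r
set.seed(1)
# Hypothetical data: 50 clustered points plus two obvious outliers
x <- rbind(matrix(rnorm(100), ncol = 2), c(10, 10), c(-12, 8))
rownames(x) <- paste0("case", seq_len(nrow(x)))

hc <- hclust(dist(x), method = "single")
boxplot(hc$height)  # the dots above the upper whisker are the late merges

# Merge steps whose height lies above the upper whisker of the boxplot
upper_whisker <- boxplot.stats(hc$height)$stats[5]
steps <- which(hc$height > upper_whisker)

# In hc$merge, negative entries are single observations (row indices of x)
members <- hc$merge[steps, , drop = FALSE]
singletons <- -members[members < 0]

# Row names of the observations that joined the tree at an outlying height
outlier_rows <- hc$labels[singletons]
print(outlier_rows)
```

With the question's objects, the same steps would presumably be applied to hc_dr, and the resulting names used to drop rows from dr_ma_cluster before k-means.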

Issues with combining different (continuous and ordinal) plot types into one plot

I am preparing a figure for a paper presenting data for 2 different experiments in one plot. For that reason I don't need a legend for every plot, so I try to combine them with ggdraw from cowplot.
My code
should generate a reproducible example
and gives this output:
It seems like the two figures get the same slot (A) and the legend gets slot (B). Typically, I would use facet_wrap to plot them together (which should also guarantee that the scaling and legend are consistent across the two plots), but that will probably not work in this case, as I am trying to add an additional figure type in C and D.
The problem is that this figure type is ordinal, so I have used a somewhat "hacky" approach to plot it, giving me this figure, which looks essentially as I want it to:
So far I have not been able to extract it into an element that ggdraw can use.
Ideally the final plot should roughly look like this (of course with different labels):
How would you go about plotting these different types together?
Thank you for taking the time to read my question; I hope that you can help me. I know it is quite a mouthful, but I was not sure how I could meaningfully reduce it to smaller chunks.

Select multiple points on scatterplot, save selection to new table

I have a very large data set (~250,000 records) that I have used to create a linear model. I have plotted predicted vs. actual.
I tried to use identify() to select the two clusters of values near the center of the graph and coord() to identify them. There are a few problems here: 1) there are many, many more points in those clusters than I can click on and identify individually, and 2) I need to know ALL of them, select all of them somehow without selecting any others, and subset my data to just those points.
This model was created using a satellite image paired with ancillary spatial data. Each entry in the table corresponds to a particular point on the map. I need to identify where these two clusters are located on the map. My data frame includes the FID (which I can use to link back to the map), the original predictor, the response, and my predicted values.
I appreciate any help!

ggplot2: Impossible to create stacked bar chart WITHOUT reshape/melt?

I am a novice at R and experimenting with ggplot2 as an alternative for data visualisation.
I am having trouble creating a stacked bar chart.
I have tried the reshape2 package with the melt function and have successfully produced one, but I had to explicitly create a dataset containing JUST the x-axis and variables that I want stacked.
It seems extremely counter-intuitive to me that we can't visualise data from a left to right sense (x-axis constant, y variables summed and overlapping).
Is there an alternate method, where I could simply perform a ggplot with the logic of:
ggplot(data=dataset, aes(x=Time, y1=var1, y2=var2, y3=var3.....)) +
geom_bar(stat="identity",position="stack")
where y1, y2, y3 are the variables I want stacked, but do not have corresponding flags for me to use a "fill=flag" type?
I basically want to work off one large master dataset and export multiple analyses without having to isolate and melt a separate dataset for each one.
In general, a stacked bar chart is used to show variation within a single category of data. For example, suppose you had a bar chart showing the populations of three species of migratory fowl that inhabit one specific marsh.
The bars might be mallard ducks, mute swans and Canada geese. Each would have a single whole bar.
The stacking comes in when you look at a trait they share and want to compare, such as the number that migrate versus the number that overwinter locally. The population of each type of fowl would then be split into two stacks within its bar: the Canada geese that migrate, those that do not, and so on.
It is not really meant to bring disparate traits together into one stack.
So if your data separates out categories of the same population, the right move is to reshape it so that the individual types are in one column and the factor that differentiates them is in another column.
If you need to keep the data wide for some reason, you can probably use something like y = x$var1 + x$var2 + x$var3 to build your bars, but depending on the data that might fail miserably. The best thing to do is reshape so that the quantity you are counting is in one column and you compare its members across some other column with stacks.
If you need the data in another format later, create a temporary table, plot it, and then remove() it and call gc() after graphing to get your memory back.
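The reshape does not have to leave a second dataset lying around, since it can be done inline in the ggplot call. A minimal sketch, assuming a wide data frame called dataset with columns Time, var1, var2 and var3 (names borrowed from the question's pseudocode, values made up), and using base R's stack() in place of melt. The helper to_long is a hypothetical name, not part of any package.

```r
library(ggplot2)

# Hypothetical wide "master" dataset; values are for illustration only
dataset <- data.frame(Time = 1:3,
                      var1 = c(2, 3, 4),
                      var2 = c(1, 1, 2),
                      var3 = c(5, 4, 3))

# Hypothetical helper: wide -> long for the chosen columns only
to_long <- function(d, id, vars) {
  long <- stack(d[vars])  # gives columns "values" and "ind"
  long[[id]] <- rep(d[[id]], times = length(vars))
  long
}

# Reshape inline; the master dataset itself is left untouched
p <- ggplot(to_long(dataset, "Time", c("var1", "var2", "var3")),
            aes(x = Time, y = values, fill = ind)) +
  geom_bar(stat = "identity", position = "stack")
print(p)
```

Because the long table only exists as an argument to ggplot(), there is nothing to remove() afterwards, and a different set of columns can be stacked by changing the vars argument.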

How to symbolize groups separately in a graph (R)

I have a small data set consisting of two columns of data and a column designating which of the two sites the data was taken at. I have used xyplot to sort by groups, but I can't figure out how to alter the symbology of each group separately. I also need to add a regression line, and can only figure out how to do that in plot. What graphics package can give me these features in the same graph?
I have looked into different graphics packages to find one in which I can do everything I need, but I am new to R and am not having much luck.
ggplot2 is your go-to.
Try this:
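install.packages("ggplot2")  # only needed once
library(ggplot2)
# Toy data: a grouping column and two numeric columns
one <- c("A", "B", "A", "B")
two <- c(1, 2, 3, 4)
three <- c(5, 7, 8, 20)
df <- data.frame(one, two, three)
# One line per group, coloured by the grouping column
ggplot(df, aes(x = two, y = three, col = one)) + geom_line()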
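To get separate symbols per site together with regression lines, which is what the question asks for, the same idea can be extended with geom_point and geom_smooth. This is a sketch on made-up data; the column names site, x and y and the chosen shape codes are assumptions for illustration.

```r
library(ggplot2)

# Hypothetical data: two measurements and a site label
df <- data.frame(
  site = rep(c("A", "B"), each = 5),
  x = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
  y = c(2.1, 3.9, 6.2, 7.8, 10.1, 1.0, 1.4, 2.1, 2.4, 3.1)
)

p <- ggplot(df, aes(x = x, y = y, colour = site, shape = site)) +
  geom_point(size = 3) +                            # one symbol per site
  scale_shape_manual(values = c(A = 16, B = 17)) +  # circle and triangle
  geom_smooth(method = "lm", se = FALSE)            # one fitted line per site
print(p)
```

Because colour and shape are mapped to site, geom_smooth fits one line per site. For a single regression line across both sites, the smooth layer can be given its own mapping, for example geom_smooth(aes(group = 1), method = "lm", se = FALSE).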
