hcclust, clustered observed correlation network data - r

I am new to R and Network theory. I would like to calculate the cluster correlation to investigate the structural equivalence of my data. I have been looking for a solution of the problem for quite some time now, also in this forum. I could find relevant contributions, but could not find out a solution for my problem.
The first note I should make is that I stacked directed network matrices on top of each other (one transposed and one not transposed for the directed network data). Following the structure http://www.imsbio.co.jp/RGM/R_rdfile?f=NetCluster/man/clusterCorr.Rd&d=R_CC
when I calculate
clustered_observed_cors <- clustConfigurations(num_vertices, advice_m1_money_m1_kinship_m1_hclust, advice_m1_money_m1_kinship_m1_cors)
the program stops and I cannot compute anything anymore until I hit the red stop button in the bottom left window of R Studio. That's my problem. does anyone have an idea what could be the reason for this being the case?
The error I get is
In cor(as.vector(d[g1[i], , ]), as.vector(d[g2[j], , ]), use = "complete.obs") :
the standard deviation is zero
Any help appreciated,
Simon

Related

R - replicate weight survey

Currently I'm interested in learning how to obtain information from the American Community Survey PUMS files. I have read some of the the ACS documentation and found that to replicate weights I must use the following formula:
And thanks to google I also found that there's the SURVEY package and the svrepdesign function to help me get this done
https://www.rdocumentation.org/packages/survey/versions/3.33-2/topics/svrepdesign
Now, even though I'm getting into R and learning statistics and have a SQL background, there are two BIG problems:
1 - I have no idea what that formula means and I would really like to understand it before going any further
2 - I don't understand how the SVREPDESIGN function works nor how to use it.
I'm not looking for someone to solve my life/problems, but I would really appreciate if someone points me in the right direction and gives a jump start.
Thank you for your time.
When you are using svyrepdesign, you are specifying that it is a design with replicated weights, and it uses the formula you provided to calculate the standard errors.
The American Community Survey has 80 replicate weights, so it first calculates the statistic you are interested in with the full sample weights (X), then it calculates the same statistic with all 80 replicate weights (X_r).
You should read this: https://usa.ipums.org/usa/repwt.shtml

How to add zoom option for wordcloud in Shiny (with reproducible example)

Could you please help me to add zooming option for wordcloud
Please find reproducible example #
´http://shiny.rstudio.com/gallery/word-cloud.html´
I tried to incorporate rbokeh and plotly but couldnt find wordcloud equivalent render function
Additionally, I found ECharts from github #
´https://github.com/XD-DENG/ECharts2Shiny/tree/8ac690a8039abc2334ec06f394ba97498b518e81´
But incorporating this ECharts are also not convenient for really zoom.
Thanks in advance,
Abi
Normalisation is required only if the predictors are not meant to be comparable on the original scaling. There's no rule that says you must normalize.
PCA is a statistical method that gives you a new linear transformation. By itself, it loses nothing. All it does is to give you new principal components.
You lose information only if you choose a subset of those principal components.
Usually PCA includes centering the data as a Pre Process Step.
PCA only arranges the data in its own Axis (Eigne Vectors) System.
If you use all axis you lose no information.
Yet, usually we want to apply Dimensionality Reduction, intuitively, having less coordinates for the data.
This process means projecting the data into Sub Space which is spanned by only some of the Eigen Vectors of the data.
If one chose wisely the number of vectors one might end up with a significant reduction in the number of dimensions of the data with negligible loss of data / information.
The way to do so is by choosing Eigen Vectors which their Eigen Values sum to most of the data power.
PCA itself is invertible, so lossless.
But:
It is common to drop some components, which will cause a loss of information.
Numerical issues may cause a loss in precision.

Multiple regression lines to define a set of data

I am trying to use a regression model to establish a relationship between two parameters, A and B(more specifically, runtime and workload, so that can I recommend what an optimal workload could be maybe, or how strongly one affects the other etc. ) I am using 'rlm'(robust linear model) for this purpose since it saves me the trouble of dealing with outliers before hand.
However, rather than output one single regression model, I would like to determine a band that can confidently explain most of the points. Here is an image I took from the web. Those additional red lines are what I want to determine.
This is what I had in mind :
1. I found the mean of the residuals of all the points lying above the line. Then we probably shift the original regression line by some multiple of mean + k*sigma. The same can be done for the points below the line.
In SVM, in order to find the support vectors, we draw parallel lines(essentially shift the middle line until we find support vectors on either sides). I had something like that in mind. Play around with the intercepts a little and find the the number of points which can be explained by the band. Keep a threshold so you can stop somewhere.
The problem is, I am unable to implement this in R. For that matter, I am not sure if these approaches even work either. I would like to know what you would suggest. Also, is there a classic way to do this using one of the many R packages?
Thanks a lot for helping. Appreciate it.

Clustering time series in R

i have a problem with clustering time series in R.
I googled a lot and found nothing that fits my problem.
I have made a STL-Decomposition of Timeseries.
The trend component is in a matrix with 64 columns, one for every series.
Now i want to cluster these series in simular groups, involve the curve shapes and the timely shift. I found some functions that imply one of these aspects but not both.
First i tried to calculte a distance matrix with the dtw-distance so i
found clusters based on the values and inply the time shift but not on the shape of the timeseries. After this i tried some correlation based clustering, but then the timely shift
we're not recognized and the result dont satisfy my claims.
Is there a function that could cover my problem or have i to build up something
on my own. Im thankful for every kind of help, after two days of tutorials and examples i totaly uninspired. I hope i could explain the problem well enough to you.
I attached a picture. Here you can see some example time series.
There you could see the problem. The two series in the middle are set to one cluster,
although the upper and the one on the bottom have the same shape as one of the middle.
Have you tried the R package dtwclust
https://cran.r-project.org/web/packages/dtwclust/index.html
(I'm just starting to explore this package, but it seems like it covers a lot of aspects of time series clustering and it has lots of good references.)
you can use the kml package. It is used specifically to longitudinal data. You can consult its help. It has the next example:
### Generation of some data
cld1 <- generateArtificialLongData(25)
### We suspect 3, 4 or 6 clusters, we want 3 redrawing.
### We want to "see" what happen (so printCal and printTraj are TRUE)
kml(cld1,c(3,4,6),3,toPlot='both')
### 4 seems to be the best. We want 7 more redrawing.
### We don't want to see again, we want to get the result as fast as possible.
kml(cld1,4,10)
Example cluster

Peak to Gaussian function estimation

I have a continuous flow of input data : more than 100.000. There are two values a time and an intensity. The data contains many peaks. Let se part of the data.
Objectives : Search for peaks -> identify them -> calculate area.
Problem : a huge peak (like the one betwen 8.0 and 8.5) could be contain multiple Gaussian (this is just one "normal", there are other type of estimation functions also) peaks for example.
Question : How can I "deconvolute" this peaks in order to measure the area of them.
Example : I want to do something similar like the following matlab code :
iPeak
Well, if "iPeak" is good enough, just port the code to R. There are a couple packages in R that do thresholded peak finding. Naturally I forgot their names, being away from my main machine. If I can refresh my brain... possible fits for you: PROcess, ppc, seewave, Peaks, and the one I like: pracma.
Note: I found these using the incredibly useful tool sos

Resources