I am using the TSclust package for SAX (symbolic aggregate approximation) plots. Following the example shown on page 25, I am using the function
SAX.plot(as.ts(df$power), w=30, alpha=4)
But it generates the following error:
Error in if ((n <- as.integer(n[1L])) > 0) { : argument is of length zero
I am not able to debug it. I even looked into the source code of the SAX.plot function, but I could not find that error message anywhere in it.
The required R data object can be found at this link
R version: 3.2
TSclust version: 1.2.3
Hello, apparently it's because you need to normalize your data. Check out this example:
library(TSclust)
# Parameters
w <- 30
alpha <- 4
# PAA
x <- df$power
paax <- PAA(x, w)
plot(x, type="l", main="PAA reduction of series x")
p <- rep(paax,each=length(x)/length(paax)) #just for plotting the PAA
lines(p, col="red")
# SAX
convert.to.SAX.symbol(paax , alpha)
# [1] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
# You need to scale the PAA result
convert.to.SAX.symbol(scale(paax) , alpha)
# [1] 1 1 1 1 1 1 1 1 1 2 2 1 4 3 3 1 2 2 2 4 4 4 1 1 2 4 3 3 4 4
# SAX plot : with scaling this works
SAX.plot(as.ts(scale(df$power)), w=w, alpha=alpha)
That's likely the example you can find on the function's help page.
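To see why scaling matters, here is a base-R sketch of the PAA and SAX symbol steps. It assumes the standard SAX breakpoints (equiprobable cut points of a standard normal, computed with qnorm) and uses a hypothetical series with a large offset standing in for df$power. It does not call TSclust, so treat it as an illustration of the idea rather than that package's implementation.

```r
# Hypothetical series with a large offset, standing in for df$power
set.seed(42)
x <- 1000 + cumsum(rnorm(300))
w <- 30
alpha <- 4

# PAA: mean of each of w equal-length segments (length(x) is divisible by w here)
paa <- colMeans(matrix(x, ncol = w))

# SAX breakpoints: cut points that split N(0, 1) into alpha equiprobable regions
breaks <- qnorm(seq_len(alpha - 1) / alpha)

# Map each PAA value to a symbol 1..alpha
to_symbols <- function(v) findInterval(v, breaks) + 1

raw_sym <- to_symbols(paa)                       # unscaled PAA means
scaled_sym <- to_symbols(as.numeric(scale(paa))) # z-normalized PAA means
print(raw_sym)
print(scaled_sym)
```

Without scaling, every PAA mean lies far above the top breakpoint (about 0.674 for alpha = 4), so every symbol is 4, just like the all-4 output above; after scale() the means straddle the breakpoints and several symbols appear.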
I'm trying to run grm in the ltm package. My script is as follows:
library(ltm)
library(msm)
library(polycor)
dim(data)
head(data)
str(data)
descript(data)
options(max.print=1000000)
rcor.test(data, method = "pearson")
data_2 <- data
data_2[] <- lapply(data_2, factor)
out <- grm(data_2)
out2 <- grm(data_2, constrained = TRUE)
anova(out2,out)
margins(out)
However, when I run margins(out) I get this error: Error in exp[ind] <- n * colSums(GHw * pp) : subscript out of bounds
Would someone please explain this? And how can I resolve this?
I have 35 items in my questionnaire and 576 respondents. Here is an example of the data (first 6 respondents and first 6 items).
pespd_qa1 pespd_qa2 pespd_qa3 pespd_qa4 pespd_qa5 pespd_qa6
1 9 5 7 4 1 3
2 5 0 9 6 0 8
3 5 3 5 6 3 5
4 7 5 4 3 1 1
5 2 3 0 0 0 0
6 10 1 8 2 2 5
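For reference, the data_2[] <- lapply(data_2, factor) step in the script converts every column to a factor while keeping the data frame structure. A small sketch using the first three items of the six rows shown above lets you check how many response categories each item actually has after the conversion. It is only a quick inspection and does not by itself explain the margins() error.

```r
# First six respondents and first three items from the example above
items <- data.frame(
  pespd_qa1 = c(9, 5, 5, 7, 2, 10),
  pespd_qa2 = c(5, 0, 3, 5, 3, 1),
  pespd_qa3 = c(7, 9, 5, 4, 0, 8)
)

items_f <- items
# Same idiom as in the script: replace the columns in place, keep the data frame
items_f[] <- lapply(items_f, factor)
str(items_f)

# Number of distinct response categories observed for each item
n_cat <- sapply(items_f, nlevels)
n_cat
```

In these six rows the three items show 5, 4 and 6 distinct categories, so the items do not share the same set of levels.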
Suppose I have a nice little data frame:
df <- data.frame(x=seq(1,5),y=seq(5,1),z=c(1,2,3,2,1),a=c(1,1,1,2,2))
df
## x y z a
## 1 1 5 1 1
## 2 2 4 2 1
## 3 3 3 3 1
## 4 4 2 2 2
## 5 5 1 1 2
and I want to aggregate a part of it:
aggregate(cbind(x,z)~a,FUN=sum,data=df)
## a x z
## 1 1 6 6
## 2 2 9 3
How do I go about making it programmatic? I want to pass:
The list of variables to be aggregated cbind(x,z)
The grouping variable a (I will be using it in several other parts of the program, so passing the whole thing cbind(x,z)~a is not helpful)
The environment within which the things are happening
My starting point is
blah <- function(varlist,groupvar,df) {
# I kinda like to see what I am doing here
cat(paste0(deparse(substitute(varlist)),"~",deparse(substitute(groupvar))),"\n")
cat(is.data.frame(df),"\n")
cat(dim(df),"\n")
# but I really need to aggregate this
return( aggregate(eval(deparse(substitute(varlist))~deparse(substitute(groupvar)),df),
FUN=sum,data=df) )
}
and it works halfway:
blah(cbind(x,z),a,df)
## [1] "cbind(x, z)~a"
## TRUE
## 5 4
## Error in FUN(X[[i]], ...) : invalid 'type' (character) of argument
So I am kind of able to build the character representation of the formula that I need, but putting it into aggregate() fails.
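One way to get from the character string to something aggregate() accepts is to turn it into a real formula object with as.formula(). A minimal sketch, keeping the same argument style as blah() (the name blah2 is hypothetical):

```r
df <- data.frame(x = seq(1, 5), y = seq(5, 1), z = c(1, 2, 3, 2, 1), a = c(1, 1, 1, 2, 2))

# Hypothetical variant of blah(): build a formula object instead of a string
blah2 <- function(varlist, groupvar, df) {
  # substitute() captures the unevaluated expressions, e.g. cbind(x, z) and a
  lhs <- deparse(substitute(varlist))
  rhs <- deparse(substitute(groupvar))
  # as.formula() parses the string into a formula that aggregate() understands
  fml <- as.formula(paste0(lhs, " ~ ", rhs))
  print(fml)
  aggregate(fml, FUN = sum, data = df)
}

res <- blah2(cbind(x, z), a, df)
res
```

The variables are still looked up in data = df, so the formula only has to carry the names. For long variable lists, deparse() can return several strings, so paste(deparse(...), collapse = "") is the safer form.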
I have a weighted directed graph with nine strongly connected components (SCCs).
The SCCs are obtained from the igraph::clusters function
library(igraph)
SCC<- clusters(graph, mode="strong")
SCC$membership
[1] 9 2 7 7 8 2 6 2 2 5 2 2 2 2 2 1 2 4 2 2 2 3 2 2 2 2 2 2 2 2
SCC$csize
[1] 1 21 1 1 1 1 2 1 1
SCC$no
[1] 9
I want to visualize the SCCs with circles and a colored background, as in the graph below. Is there any way to do this in R? Thanks!
Take a look at the mark.groups argument of plot.igraph. Something like the following will do the trick:
# Create some toy data
set.seed(1)
library(igraph)
graph <- erdos.renyi.game(20, 1/20, directed = TRUE)  # directed, like the graph in the question
# Do the clustering
SCC <- clusters(graph, mode="strong")
# Add colours and use the mark.groups argument
V(graph)$color <- rainbow(SCC$no)[SCC$membership]
plot(graph, mark.groups = split(1:vcount(graph), SCC$membership))
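If you only want circles around the non-trivial SCCs (more than one vertex), with background colours matching the vertex colours, a variant along these lines should work. It is a sketch on a hypothetical directed toy graph and uses sample_gnp() and components(), the newer igraph names for erdos.renyi.game() and clusters(), plus the mark.col and mark.border arguments of plot.igraph.

```r
library(igraph)
set.seed(1)

# Hypothetical directed toy graph, so that "strong" components are meaningful
g <- sample_gnp(20, 0.1, directed = TRUE)
scc <- components(g, mode = "strong")

# One vector of vertex ids per SCC, named by SCC number
groups <- split(seq_len(vcount(g)), scc$membership)
# Keep only the SCCs with more than one vertex
big <- groups[lengths(groups) > 1]

# Colour vertices by SCC and reuse the same colours for the shaded groups
pal <- rainbow(scc$no)
V(g)$color <- pal[scc$membership]
big_ids <- as.integer(names(big))
plot(g,
     mark.groups = big,
     mark.col = adjustcolor(pal[big_ids], alpha.f = 0.3),
     mark.border = pal[big_ids])
```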
I have the following code that performs hierarchical
clustering and plots the result as a heatmap.
library(gplots)
set.seed(538)
# generate data
y <- matrix(rnorm(50), 10, 5, dimnames=list(paste("g", 1:10, sep=""), paste("t", 1:5, sep="")))
# the actual data is much larger than the above
# perform hierarchical clustering and plot heatmap
test <- heatmap.2(y)
which produces this plot:
What I want is to get the cluster members at each level of the hierarchy in the plot,
yielding:
Clust 1: g3-g2-g4
Clust 2: g2-g4
Clust 3: g4-g7
etc.
Cluster last: g1-g2-g3-g4-g5-g6-g7-g8-g9-g10
Is there a way to do it?
I did have the answer, after all! @zkurtz identified the problem: the data I was using were different from the data you were using. I added a set.seed(538) statement to your code to stabilize the data.
Use the following code to create a matrix of cluster membership for the row dendrogram:
cutree(as.hclust(test$rowDendrogram), 1:dim(y)[1])
This will give you:
1 2 3 4 5 6 7 8 9 10
g1 1 1 1 1 1 1 1 1 1 1
g2 1 2 2 2 2 2 2 2 2 2
g3 1 2 2 3 3 3 3 3 3 3
g4 1 2 2 2 2 2 2 2 2 4
g5 1 1 1 1 1 1 1 4 4 5
g6 1 2 3 4 4 4 4 5 5 6
g7 1 2 2 2 2 5 5 6 6 7
g8 1 2 3 4 5 6 6 7 7 8
g9 1 2 3 4 4 4 7 8 8 9
g10 1 2 3 4 5 6 6 7 9 10
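The cutree() matrix gives cluster labels for every number of clusters k. If you instead want the listing from the question, i.e. the members of each merge node of the dendrogram, you can walk the merge matrix of the hclust object. A sketch, assuming the default dist() and hclust() that heatmap.2 uses for the rows; with the heatmap object itself you would use hc <- as.hclust(test$rowDendrogram) instead.

```r
set.seed(538)
y <- matrix(rnorm(50), 10, 5,
            dimnames = list(paste0("g", 1:10), paste0("t", 1:5)))

# Same row clustering that heatmap.2 performs by default
hc <- hclust(dist(y))

# Row i of hc$merge describes merge step i: a negative entry -j is the single
# observation j, a positive entry j is the cluster formed at step j
members <- vector("list", nrow(hc$merge))
for (i in seq_len(nrow(hc$merge))) {
  members[[i]] <- unlist(lapply(hc$merge[i, ], function(j) {
    if (j < 0) hc$labels[-j] else members[[j]]
  }))
}

# Clusters are numbered in merge order; the last one contains every row
for (i in seq_along(members)) {
  cat(sprintf("Clust %d: %s\n", i, paste(members[[i]], collapse = "-")))
}
```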
This solution requires computing the cluster structure using a different package:
# Generate data
y = matrix(rnorm(50), 10, 5, dimnames=list(paste("g", 1:10, sep=""), paste("t", 1:5, sep="")))
# The new package:
library(nnclust)
# Create the links between all pairs of points with
# squared euclidean distance less than threshold
links = nncluster(y, threshold = 2, fill = 1, give.up =1)
# Assign a cluster number to each point
clusters=clusterMember(links, outlier = FALSE)
# Display the points that are "alone" in their own cluster:
nas = which(is.na(clusters))
print(rownames(y)[nas])
# Keep the NA entries so that clusters stays aligned with rownames(y);
# which() below skips them
# For each cluster (with at least two points), display the included points
for(i in 1:max(clusters, na.rm = TRUE)) print(rownames(y)[which(clusters == i)])
Obviously you would want to revise this into a function of some kind to make it more user-friendly. In particular, this gives the clusters at only one level of the dendrogram; to get the clusters at other levels, you would have to vary the threshold parameter.
I have some data:
Length(cm) Frequency
1 5
2 2
3 3
4 5
Is there a way in R to expand these numbers without typing them out manually, so that I get a dataset like:
1 1 1 1 1 2 2 3 3 3 4 4 4 4 4
which I can then use to work out the standard error of the mean for length? Thanks
You can use rep.
> l <- 1:4
> f <- c(5,2,3,5)
> rep(l,f)
[1] 1 1 1 1 1 2 2 3 3 3 4 4 4 4 4
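With the expanded vector, the standard error of the mean is the sample standard deviation divided by the square root of the number of observations. A short sketch continuing with l and f from above:

```r
l <- 1:4
f <- c(5, 2, 3, 5)
x <- rep(l, f)

# Standard error of the mean: sample SD divided by sqrt(n)
se <- sd(x) / sqrt(length(x))
se
```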
In addition to using rep to replicate the observations you could also use the wtd.mean and wtd.var functions in the Hmisc package to compute the weighted summaries without expanding (this will be better if the expanded vector would take up a large portion of memory).
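The same quantities can be computed directly from the frequencies, treating them as frequency weights, without building the expanded vector. A base-R sketch of that weighted formula (it does not call Hmisc):

```r
l <- 1:4
f <- c(5, 2, 3, 5)
n <- sum(f)

# Frequency-weighted mean and unbiased variance (divide by n - 1)
m <- sum(f * l) / n
v <- sum(f * (l - m)^2) / (n - 1)
se_w <- sqrt(v / n)

# Should agree with the expanded-vector computation
all.equal(se_w, sd(rep(l, f)) / sqrt(n))
```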
I recommend using a data frame:
data <- data.frame(length = 1:4, freq = c(5, 2, 3, 5))
sd(rep(data$length, data$freq))