I'm trying to build a combinatorial tree model, where the root node is the first 6 digits. The 2nd level is all possible combinations of 5 digits of the parent's 6 digits. Then the 3rd level is all possible combinations of 4 digits of its parent's digits. This pattern continues until the 6th level, which is composed of only single digits.
So my question is: is there a way to generate a tree in this fashion? I've been searching for examples of basic trees in R and have wound up empty-handed. Any advice would be much appreciated. Thank you.
You can get something like that using this:
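f <- function(x)
{
  # Leaf: a single digit has no children
  if (length(x) == 1) return(c(value = x))
  # Node: keep the digits, then recurse on every subset with one digit dropped
  c(list(value = x), child = lapply(seq_along(x), function(i) f(x[-i])))
}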
Example:
> f(1:3)
$value
[1] 1 2 3
$child1
$child1$value
[1] 2 3
$child1$child1
value
3
$child1$child2
value
2
$child2
$child2$value
[1] 1 3
$child2$child1
value
3
$child2$child2
value
1
$child3
$child3$value
[1] 1 2
$child3$child1
value
2
$child3$child2
value
1
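If you only need the distinct combinations at each level rather than the full parent/child nesting, a minimal sketch with base R's combn might be enough. Note this is a different structure from the recursive list above (no duplicates, no parent links), and the names tree_levels and n_per_level are just for illustration.

```r
x <- 1:6

# Level 1 holds all 6 digits, level 2 every 5-digit subset, ... level 6 the single digits.
# simplify = FALSE returns each combination as its own vector inside a list.
tree_levels <- lapply(rev(seq_along(x)), function(k) combn(x, k, simplify = FALSE))

# Number of distinct nodes per level: choose(6, 6), choose(6, 5), ..., choose(6, 1)
n_per_level <- lengths(tree_levels)
n_per_level
# [1]  1  6 15 20 15  6
```

The recursive f() repeats a subset once for every path that reaches it, so it grows much faster than this; which one you want depends on whether the path matters.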
First I create a document term matrix like below
dtm <- DocumentTermMatrix(docs)
Then I take the sum of the occurrences of each word as below
totalsums <- colSums(as.matrix(dtm))
My totalsums (R says type 'double') looks like below for first 7 elements.
aaab aabb aabc aacc abbb abbc abcc ...
   9    2   10    4    7    3   12 ...
I managed to sort this with the following command
sorted.sums <- sort(totalsums, decreasing=TRUE)
Now I want to extract the first 4 terms/words with the highest sums which are greater than value 5.
I could get the first 4 highest with sorted.sums[1:4] but how can I set a threshold value?
I managed to do this with the order function like below, but is there a way to do this other than with the sort function, and without using the findFreqTerms function?
ord.totalsums <- order(totalsums)
findFreqTerms(dtm, lowfreq=5)
Appreciate your thoughts on this.
You can use
sorted.sums[sorted.sums > 5][1:4]
But if you have at least 4 values that are greater than 5, using only sorted.sums[1:4] should work as well.
To get the words, you can use names:
names(sorted.sums[sorted.sums > 5][1:4])
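One thing to watch for: if fewer than 4 values pass the threshold, indexing with [1:4] pads the result with NA. A small sketch using head() avoids that. The helper name top_terms is made up here, and the data are only the 7 example values from the question, not a real document-term matrix.

```r
# The 7 example counts from the question (an assumption standing in for colSums of a real dtm)
totalsums <- c(aaab = 9, aabb = 2, aabc = 10, aacc = 4, abbb = 7, abbc = 3, abcc = 12)
sorted.sums <- sort(totalsums, decreasing = TRUE)

# Hypothetical helper: keep values above the threshold, then take at most n of them
top_terms <- function(sums, threshold, n = 4) {
  head(sums[sums > threshold], n)
}

names(top_terms(sorted.sums, 5))
# [1] "abcc" "aabc" "aaab" "abbb"

# Only three values exceed 8, so head() returns three terms instead of padding with NA
names(top_terms(sorted.sums, 8))
# [1] "abcc" "aabc" "aaab"
```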
I am running scripts for a school project on a Hidden Markov Model with 2 hidden states. At some point, I use the Viterbi algorithm to find the most likely sequence of hidden states. My output is a vector like this:
c("1","1","1","2","2","1", "1","1","1","2", "2","2")
I would like to count how many subsequences of each state there are, and also record their lengths and positions. The output would be, for example, a matrix like this:
State Length Starting_Position
1 3 1
2 2 4
1 4 6
2 3 10
Is there any R command or package that can do that easily?
Thank you.
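One possible approach, as a minimal sketch rather than a definitive answer, is base R's rle(), which splits a vector into runs of equal values. The variable names states, runs and result are only for illustration; the start positions are derived with cumsum() over the run lengths.

```r
states <- c("1","1","1","2","2","1","1","1","1","2","2","2")

# rle() returns the value of each run and how long it is
runs <- rle(states)

result <- data.frame(
  State             = runs$values,
  Length            = runs$lengths,
  # A run starts right after all previous runs end; the first run starts at 1
  Starting_Position = cumsum(c(1, head(runs$lengths, -1)))
)
result
#   State Length Starting_Position
# 1     1      3                 1
# 2     2      2                 4
# 3     1      4                 6
# 4     2      3                10
```

nrow(result) then gives the total number of runs, and table(result$State) the count per state.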
I've been working on this problem and can't seem to figure out the proper solution. Ultimately I'm going to use dplyr in order to group by and apply a function to a column. I turned the column into a vector. Here is a snippet:
vec1 <- append(append(append(rep(1,3),rep(2,6)), rep(3,5)),rep(4,2))
If a number repeats more than 3 times, I want to change the number that follows it to 1. So in the vector above, the number 2 occurs 6 times and the number 3 occurs 5 times. That means I want to replace the numbers 3 and 4 with 1. Ultimately, in this snippet, the answer I'm looking for is:
c(1,1,1,2,2,2,2,2,2,1,1,1,1,1,1,1)
What I have below worked for cases when only one number was repeated more than 3 times, but not multiple. In addition, if I'm doing this inefficiently I'd like to learn how to better script it.
stack <- table(vec1)
stack1 <- list(as.numeric(rownames(data.frame(stack[stack>3]))) + 1)
replace(vec1,vec1 == stack1,1)
Thanks in advance for any help.
Try
inverse.rle(within.list(rle(vec1),
values[c(FALSE,(lengths >3)[-length(lengths)])] <- 1))
#[1] 1 1 1 2 2 2 2 2 2 1 1 1 1 1 1 1
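In case the one-liner is hard to read, here is the same idea unpacked step by step, as a sketch. The intermediate names r, long_run, follows_long and out are mine, not part of the answer above.

```r
vec1 <- c(rep(1, 3), rep(2, 6), rep(3, 5), rep(4, 2))

# Run-length encode: values 1 2 3 4 with lengths 3 6 5 2
r <- rle(vec1)

# Which runs are longer than 3?
long_run <- r$lengths > 3

# Shift by one: a run is flagged if the run just before it was long
follows_long <- c(FALSE, head(long_run, -1))

# Overwrite the value of every flagged run, then expand back to a full vector
r$values[follows_long] <- 1
out <- inverse.rle(r)
out
# [1] 1 1 1 2 2 2 2 2 2 1 1 1 1 1 1 1
```

Because this works on runs rather than on table() counts, it also handles a value that appears in several separate runs.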
I have a vector of numbers stored in R.
my_vector <- c(1,2,3,4,5)
I want to add two to each number.
my_vector + 2
[1] 3 4 5 6 7
However, I want there to only be a twenty percent chance of adding two to the numbers in my vector each time I run the code. Is there a way to code this in R?
What I mean is, if I run the code, the output could be:
[1] 3 4 5 6 9
Or perhaps
[1] 5 4 5 6 7
i.e. there is only a 20% chance that any one number in the vector will get two added to it.
my_vector + 2*sample(c(TRUE,FALSE), length(my_vector), prob=c(0.2,0.8), replace=TRUE)
That adds 2 to a variable number of elements (which is what you were asking for), but sometimes people want exactly 20% of the elements to have 2 added, in which case it would be:
my_vector + 2*sample(c(TRUE,rep(FALSE,4)))
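Both variants can be written so that they do not depend on the vector having exactly 5 elements. This is only a sketch: the seed value is arbitrary, and the names p, hit, n_hit, independent and exact are made up for the example.

```r
set.seed(42)  # arbitrary seed, only so the run is repeatable
my_vector <- c(1, 2, 3, 4, 5)
p <- 0.2

# Variant 1: each element independently gets +2 with probability p
hit <- rbinom(length(my_vector), size = 1, prob = p)
independent <- my_vector + 2 * hit

# Variant 2: exactly round(p * n) elements, chosen at random, get +2
n_hit <- round(p * length(my_vector))
idx   <- sample(length(my_vector), n_hit)
exact <- my_vector
exact[idx] <- exact[idx] + 2
```

With variant 1 the number of changed elements varies from run to run (including zero); with variant 2 it is always n_hit.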
I have a distance matrix dMat and want to find the 5 nearest samples to the first one. What function can I use in R? I know how to find the closest sample (cf. 3rd line of code), but can't figure out how to get the other 4 samples.
The code:
Mat <- replicate(10, rnorm(10))
dMat <- as.matrix(dist(Mat))
which(dMat[,1]==min(dMat[,1]))
The 3rd line of code finds the index of the closest sample to the first sample.
Thanks for any help!
Best,
Chega
You can use order to do this:
head(order(dMat[-1,1]),5)+1
[1] 10 3 4 8 6
Note that I removed the first one, as you presumably don't want to include the fact that your reference point is 0 distance away from itself.
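A reproducible sketch of the same idea, assuming a fixed seed (the value 1 is arbitrary). as.matrix() on a dist object labels rows and columns "1" to "10" when the data have no row names, so the names can also be used to read off the sample indices without the + 1 offset. The names d1, nearest_by_order and nearest_by_name are only for illustration.

```r
set.seed(1)  # arbitrary seed so the random matrix is the same on every run
Mat  <- replicate(10, rnorm(10))
dMat <- as.matrix(dist(Mat))

# Distances from every other sample to sample 1 (drop the 0 self-distance)
d1 <- dMat[-1, 1]

# Positions within d1 are shifted by one, hence the + 1
nearest_by_order <- head(order(d1), 5) + 1

# Same answer via the row labels kept by sort()
nearest_by_name <- as.integer(names(sort(d1))[1:5])
```

Both vectors should hold the same five sample indices, none of them equal to 1.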
Alternative using sort:
sort(dMat[,1], index.return = TRUE)$ix[1:6]
It would be nice to add a set.seed(.) when using random numbers to build the matrix so that we could show the results are identical. I will skip the results here.
Edit (correct solution): The above solutions only work if the first element is the smallest value in the column! (For a true distance matrix that holds, since the diagonal is 0.) If the column can hold arbitrary values, here's a solution that will always give the 5 closest values to the first element of the column:
> sort(abs(dMat[-1,1] - dMat[1,1]), index.return=TRUE)$ix[1:5] + 1
Example:
> dMat <- matrix(c(70,4,2,1,6,80,90,100,3), ncol=1)
# James' solution
> head(order(dMat[-1,1]),5) + 1
[1] 4 3 9 2 5 # values are 1,2,3,4,6 (wrong)
# old sort solution
> sort(dMat[,1], index.return = TRUE)$ix[1:6]
[1] 4 3 9 2 5 1 # values are 1,2,3,4,6,70 (wrong)
# Correct solution
> sort(abs(dMat[-1,1] - dMat[1,1]), index.return=TRUE)$ix[1:5] + 1
[1] 6 7 8 5 2 # values are 80,90,100,6,4 (right)