When you want to iterate sequentially over a list of numbers from 1 to N in Julia you will write:
for i in 1:N
# do something with i
end
But what if you want to iterate over the list of numbers from the range (1...N) randomly? There is a need in every iteration to randomly choose the number that wasn't chosen in any previous iteration and there is a need to iterate over all of the numbers from the range (1...N).
using Random
for i in shuffle(1:N)
# do something with i
end
Related
I am supposed to find the mean and standard deviation at each given sample size (N), using the "FOR LOOP". I started writing the code as below, I am required to save all the means into vector "p". How do I save all the means into one vector?
sample.sizes =c(3,10,50,100,500,1000)
mean.sds = numeric(0)
for ( N in sample.sizes ){
x <- rnorm(3,mean=0,sd=1)
mean.sds[i]
}
mean(x)
Actually you are doing many thing wrong?
If you are using variable N in for loop, you are not using it anywhere
for (N in 'some_vector') actually means N will take that value one by one. So N in sample sizes will first take, 3 then 10 then 50 and so on.
Now where does i come into picture?
You are calculating x for each iteration of N. In fact you are not using N anywhere in the loop?
first x will return 3 values. In the next line you intend to store these three values in just ith value of mean.sds where i is unknown and storing three values into one value, as it is, is not logically possible.
Do you want this?
sample.sizes =c(3,10,50,100,500,1000)
mean.sds = numeric(0)
for ( i in seq_along(sample.sizes )){
x <- rnorm(sample.sizes[i], mean=0, sd=1)
mean.sds[i] <- mean(x)
}
mean.sds
[1] 0.6085489531 -0.1547286299 0.0052106559 -0.0452804986 -0.0374094936 0.0005667246
I replaced N with seq_along(sample.sizes) which will give iterations equal to the number of that vector. Six in this example.
I passed each ith element to first argument of rnorm to generate these many random values.
Stored each random value into single vector. calculated its mean (one value only) and stored in ith value of your empty vector.
I am dealing with some computational feature extracting problem from RNA data, and I found myself unable to deal with this question:
I have n sequences (say two for example) from which I obtained an iterated statistic i times (kind of doing a Monte Carlo iteration for analizing distribution of obtained statistics compared with original).
Example:
Say we iterate 10 times
n <- 10
I got a vector of 20 values with all the iterations, but this vector corresponds to two different sequences, so I must divide this vector in two equal parts (the iterations are ordered 1:10 - 1:10 for each sequence).
MFEit <- c(10, 12, 34, 32, 12 .....) ## vector of length 20
MFEit.split <- split(MFEit, ceiling(MFEit.along/n5))
This generates a list of two items each with 10 values, named $1 and $2
On the other hand I have a vector of two values which are the original statistics, each corresponding to each original sequence
MFE <- c(25, 15)
What I want to do is to know how many values of first item in the list MFEit.split, are equal or less than the first value of MFE, and, iteratively, how many values of second item in the list MFEit.split, are equal or less than the second value of MFE, and so on, provided that I would have more than two values or items.
I know how to do it one by one, say:
R <- length(subset(MFEit.split$`1`, MFEit.split$`1`<=MFE[1]))
R <- length(subset(MFEit.split$`2`, MFEit.split$`1`<=MFE[2]))
But... how to include this into a loop so that I can get iteratively each comparison, no matter how many MFE values or items in the list I have?
The desired output would be a vector called R, with n values corresponding to each comparison.
Any help?...
Lets say I have this data. My objective is to extraxt combinations of sequences.
I have one constraint, the time between two events may not be more than 5, lets call this maxGap.
User <- c(rep(1,3)) # One users
Event <- c("C","B","C") # Say this is random events could be anything from LETTERS[1:4]
Time <- c(c(1,12,13)) # This is a timeline
df <- data.frame(User=User,
Event=Event,
Time=Time)
If want to use these sequences as binary explanatory variables for analysis.
Given this dataframe the result should be like this.
res.df <- data.frame(User=1,
C=1,
B=1,
CB=0,
BC=1,
CBC=0)
(CB) and (CBC) will be 0 since the maxGap > 5.
I was trying to write a function for this using many for-loops, but it becomes very complex if the sequence becomes larger and the different number of evets also becomes larger. And also if the number of different User grows to 100 000.
Is it possible of doing this in TraMineR with the help of seqeconstraint?
Here is how you would do that with TraMineR
df.seqe <- seqecreate(id=df$User, timestamp=df$Time, event=df$Event)
constr <- seqeconstraint(maxGap=5)
subseq <- seqefsub(df.seqe, minSupport=0, constraint=constr)
(presence <- seqeapplysub(subseq, method="presence"))
which gives
(B) (B)-(C) (C)
1-(C)-11-(B)-1-(C) 1 1 1
presence is a table with a column for each subsequence that occurs at least once in the data set. So, if you have several individuals (event sequences), the table will have one row per individual and the columns will be the binary variable you are looking for. (See also TraMineR: Can I get the complete sequence if I give an event sub sequence? )
However, be aware that TraMineR works fine only with subsequences of length up to about 4 or 5. We suggest to set maxK=3 or 4 in seqefsub. The number of individuals should not be a problem, nor should the number of different possible events (the alphabet) as long as you restrict the maximal subsequence length you are looking for.
Hope this helps
Hello I am new to R and I can't find the way to do exactly what I want to. I have a vector of x numbers, and what i want to do is order it in increasing order, and then start making subtractions like this (let's say the vecto has 100 numbers for example):
[x(100)-x(99)]+[x(99)-x(98)]+[x(98)-x(97)]+[x(97)-x(96)]+...[x(2)-x(1)]
and then divide all that sum by the number of elements the vector has, in this case 100.
The only thing that I am able to do at the moment is order the vector with:
sort(nameOfTheVector)
Sorry for my bad English.
diff returns suitably lagged and iterated differences. In your case you want the default single lag. sum will return the sum any arguments passed to it, so....
sum(diff(sort(nameOfTheVector))) / length(nameOfTheVector)
I have 2 questions,
I've made a vector from a document by finding out how many times each word appeared in a document. Is this the right way of making the vector? Or do I have to do something else also?
Using the above method I've created vectors of 16 documents, which are of different sizes. Now i want to apply cosine similarity to find out how similar each document is. The problem I'm having is getting the dot product of two vectors because they are of different sizes. How would i do this?
Sounds reasonable, as long as it means you have a list/map/dict/hash of (word, count) pairs as your vector representation.
You should pretend that you have zero values for the words that do not occur in some vector, without storing these zeros anywhere. Then, you can use the following algorithm to compute the dot product of these vectors (pseudocode):
algorithm dot_product(a : WordVector, b : WordVector):
dot = 0
for word, x in a do
y = lookup(word, b)
dot += x * y
return dot
The lookup part can be anything, but for speed, I'd use hashtables as the vector representation (e.g. Python's dict).