rle command counting changes in vector

rle command counting changes in vector - r

n <- length(rle(sign(z)))
z contains 1 and -1. n should indicate the number of how many times the sign of z changes.
The code above does not lead to the desired outcome. If I expand the command to
length(rle(sign(z))[[1]])
it works. I don't understand the underlying mechanism of how [[1]] solves the problem?

rle returns a list consisting of two components: lengths, and values. As such, its own length is always 2. By contrast, you want to know the length of either of those components (they obviously have the same length). So either length(rle(…)[[1]]) or length(rle(…)[[2]]) would work. Better to use the names instead of an index though, e.g.
length(rle(z)$lengths)
However, this won’t be the number of times the sign changes; rather, it will be the number of times the changes plus 1.

Related

Julia Langauge Differential Equations example

I tried playing around with this example in the Julia documentation. My attempt was to make the cell split into two parts that have half of the amount of protein each, so I set Theta=0.5. However, the plot looks like this:
It is obvious that the number of cells doubles every time they hit the target amount of protein, at the same time, since they are equal. How could I plot this? I also don't understand why the number of cells stops at 3 in the case below.

Plot the protein amount in each cell and think about the model you've created. After the first division, both cells have the same value. So at exactly the same time, you have an event fire. The "maximum" (whichever index is lower, so 1) will split, while 2 will keep growing above 1. But now that u[2] > 1, the rootfinding condition 1-maximum(u) will never hit zero again, and thus no more splits will occur. This means you'll have two splits total, i.e. 3 cells.
Remember, programs will do exactly what you tell them to. I assume that what you meant was, as your effect, split any cells that are greater than or equal to 1. If that's the affect! that you wanted, then you'd have to write it:
function affect!(integrator)
u = integrator.u
idxs = findall(x->x>=1-eps(eltype(u)),u)
resize!(integrator,length(u)+length(idxs))
u[idxs] ./ 2
u[end-idxs:end] = 0.5
nothing
end
would be one way to do it, and of course there are many others.

Matrice help: Finding average without the zeros

I'm creating a Monte Carlo model using R. My model creates matrices that are filled with either zeros or values that fall within the constraints. I'm running a couple hundred thousand n values thru my model, and I want to find the average of the non zero matrices that I've created. I'm guessing I can do something in the last section.
Thanks for the help!
Code:
n<-252500
PaidLoss_1<-numeric(n)
PaidLoss_2<-numeric(n)
PaidLoss_3<-numeric(n)
PaidLoss_4<-numeric(n)
PaidLoss_5<-numeric(n)
PaidLoss_6<-numeric(n)
PaidLoss_7<-numeric(n)
PaidLoss_8<-numeric(n)
PaidLoss_9<-numeric(n)
for(i in 1:n){
claim_type<-rmultinom(1,1,c(0.00166439057698873, 0.000810856947763742, 0.00183509730283373, 0.000725503584841243, 0.00405428473881871, 0.00725503584841243, 0.0100290201433936, 0.00529190850119495, 0.0103277569136224, 0.0096449300102424, 0.00375554796858996, 0.00806589279617617, 0.00776715602594742, 0.000768180266302492, 0.00405428473881871, 0.00226186411744623, 0.00354216456128371, 0.00277398429498122, 0.000682826903379993))
claim_type<-which(claim_type==1)
claim_Amanda<-runif(1, min=34115, max=2158707.51)
claim_Bob<-runif(1, min=16443, max=413150.50)
claim_Claire<-runif(1, min=30607.50, max=1341330.97)
claim_Doug<-runif(1, min=17554.20, max=969871)
if(claim_type==1){PaidLoss_1[i]<-1*claim_Amanda}
if(claim_type==2){PaidLoss_2[i]<-0*claim_Amanda}
if(claim_type==3){PaidLoss_3[i]<-1* claim_Bob}
if(claim_type==4){PaidLoss_4[i]<-0* claim_Bob}
if(claim_type==5){PaidLoss_5[i]<-1* claim_Claire}
if(claim_type==6){PaidLoss_6[i]<-0* claim_Claire}
}
PaidLoss1<-sum(PaidLoss_1)/2525
PaidLoss3<-sum(PaidLoss_3)/2525
PaidLoss5<-sum(PaidLoss_5)/2525
PaidLoss7<-sum(PaidLoss_7)/2525
partial output of my numeric matrix

First, let me make sure I've wrapped my head around what you want to do: you have several columns -- in your example, PaidLoss_1, ..., PaidLoss_9, which have many entries. Some of these entries are 0, and you'd like to take the average (within each column) of the entries that are not zero. Did I get that right?
If so:
Comment 1: At the very end of your code, you might want to avoid using sum and dividing by a number to get the mean you want. It obviously works, but it opens you up to a risk: if you ever change the value of n at the top, then in the best case scenario you have to edit several lines down below, and in the worst case scenario you forget to do that. So, I'd suggest something more like mean(PaidLoss_1) to get your mean.
Right now, you have n as 252500, and your denominator at the end is 2525, which has the effect of inflating your mean by a factor of 100. Maybe that's what you wanted; if so, I'd recommend mean(PaidLoss_1) * 100 for the same reasons as above.
Comment 2: You can do what you want via subsetting. Take a smaller example as a demonstration:
test <- c(10, 0, 10, 0, 10, 0)
mean(test) # gives 5
test!=0 # a vector of TRUE/FALSE for which are nonzero
test[test!=0] # the subset of test which we found to be nonzero
mean(test[test!=0]) # gives 10, the average of the nonzero entries
The middle three lines are just for demonstration; the only necessary lines to do what you want are the first (to declare the vector) and the last (to get the mean). So your code should be something like PaidLoss1 <- mean(PaidLoss_1[PaidLoss_1 != 0]), or perhaps that times 100.
Comment 3: You might consider organizing your stuff into a dataframe. Instead of typing PaidLoss_1, PaidLoss_2, etc., it might make sense to organize all this PaidLoss stuff into a matrix. You could then access elements of the matrix with [ , ] indexing. This would be useful because it would clean up some of the code and prevent you from having to type lots of things; you could also then make use of things like the apply() family of functions to save you from having to type the same commands over and over for different columns (such as the mean). You could also use a dataframe or something else to organize it, but having some structure would make your life easier.
(And to be super clear, your code is exactly what my code looked like when I first started writing in R. You can decide if it's worth pursuing some of that optimization; it probably just depends how much time you plan to eventually spend in R.)

closed/fixed:Interpertation of basic R code

I have a basic question in regards to the R programming language.
I'm at a beginners level and I wish to understand the meaning behind two lines of code I found online in order to gain a better understanding. Here is the code:
as.data.frame(y[1:(n-k)])
as.data.frame(y[(k+1):n])
... where y and n are given. I do understand that the results are transformed into a data frame by the function as.data.frame() but what about the rest? I'm still at a beginners level so pardon me if this question is off-topic or irrelevant in this forum. Thank you in advance, I appreciate every answer :)

Looks like you understand the as.data.frame() function so let's look at what is happening inside of it. We're looking at y[1:(n-k)]. Here, y is a vector which is a collection of data points of the same type. For example:
> y <- c(1,2,3,4,5,6)
Try running that and then calling back y. What you get are those numbers listed out. Now, consider the case you want to just call out the number 1 in that vector. How would you do that? Well, this is where the brackets come into play. If you wanted to just call the number 1 in y:
> y[1]
[1] 1
Therefore, the brackets are a way of calling out or indexing specific items in the vector. Note that the indexing starts at the value 1 and goes up to the number of items in the vector, or length. One last thing before we go back to the example you gave. What if we want to index the numbers 1, 2, and 3 from the vector but not the rest?
> y[1:3]
[1] 1 2 3
This is where the colon comes into play. It allows us to reference a subset of the numbers. However, it will reference all the numbers between the index left of the colon and right of it. Try this out for yourself in R! Play around and see what happens.
Finally going back to your example:
y[1:(n-k)]
How would this work based on what we discussed? Well, the colon means that we are indexing all values in the vector y from two index values. What are those values? Well, they are the numbers to the left and right of the colon. Therefore, we are asking R to give us the values from the first position (index of 1) to the (n-k) position. Therefore, it's important to know what n and k are. If n is 4 and k is 1 then the command becomes:
y[1:3]
The same logic can apply to the second as.data.frame() command in your question. Essentially, R is picking out different numbers from a vector y and multiplying them together.
Hope this helps. The best way to learn R is to play around with a command, throw different numbers at it, guess what will happen, and then see what happens!

Compute nearly equal pattern of a string

Find near duplicate string. Hi, I know there is a match, unique, duplicated function in R, but none of these does wha I'm really need. I've a unique column in my dataset that I need to go trough it to check if the number are nearly the same. For instance, the first element compared with the second has nearly equal pattern, except for the number '9'. The second compared with the third is nearly equal, except for the last number o the sequence, one is ending with 6 while other ending with 5. Lastly, the two last numbers are 100% equal. If I've used unique() function, only the last case would be correctly excluded.
I'm wondering if there is a function that I can flag nearly equal, maybe calculating the percentage of equality, so I can drive my attention to those cases with highly equality rate.
dat <- data.frame(text = c("87775956",
"987775956",
"987775955",
"987481732",
"987481732"))

Stopping a large number of zeros being printed (not scientific notation)

What I'm trying to achieve is to have all printed numbers display at maximum 7 digits. Here are examples of what I want printed:
0.000000 (versus the actual number which is 0.000000000029481.....)
0.299180 (versus the actual number which is 0.299180291884922.....)
I've had success with the latter types of numbers by using options(scipen=99999) and options(digits=6). However, the former example will always print a huge number of zeros followed by five non-zero digits. How do I stop this from occurring and achieve my desired result? I also do not want scientific notation.
I want this to apply to ALL printed numbers in EVERY context. For example if I have some matrix, call it A, and I print this matrix, I want every element to just be 6-7 digits. I want this to be automatic for every print in every context; just like using options(digits=6) and options(scipen=99999) makes it automatic for every context.

You can define a new print method for the type you wish to print. For example, if all your numbers are doubles, you can create
print.double=function(x){sprintf("%.6f", x)}
Now, when you print a double (or a vector of doubles), the function print.double() will be called instead of print.default().
You may have to create similar functions print.integer(), print.complex(), etc., depending on the types you need to print.
To return to the default print method, simply delete the function print.double().

Are all your numbers < 1? You could try a simple sprintf( "%.6f", x ). Otherwise you could try wrapping things to sprintf based on the number of digits; check ?sprintf for other details.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

rle command counting changes in vector - r

Related

Julia Langauge Differential Equations example

Matrice help: Finding average without the zeros

closed/fixed:Interpertation of basic R code

Compute nearly equal pattern of a string

Stopping a large number of zeros being printed (not scientific notation)

Categories

Resources