Sum of all consecutive differences of a vector's elements in R

Hello, I am new to R and I can't find a way to do exactly what I want. I have a vector of x numbers, and what I want to do is sort it in increasing order and then make subtractions like this (let's say the vector has 100 numbers, for example):
[x(100)-x(99)]+[x(99)-x(98)]+[x(98)-x(97)]+[x(97)-x(96)]+...[x(2)-x(1)]
and then divide all that sum by the number of elements the vector has, in this case 100.
The only thing that I am able to do at the moment is order the vector with:
sort(nameOfTheVector)
Sorry for my bad English.

diff returns suitably lagged and iterated differences. In your case you want the default single lag. sum will return the sum of any arguments passed to it, so:
sum(diff(sort(nameOfTheVector))) / length(nameOfTheVector)
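As a small worked sketch (the vector `x` below is made-up example data, not from the question), note that the consecutive differences of a sorted vector telescope: every intermediate term cancels, so the sum equals the largest value minus the smallest.

```r
# Hypothetical example data; any numeric vector works.
x <- c(7, 2, 9, 4)

s <- sort(x)                     # 2 4 7 9
consecutive <- sum(diff(s))      # (4 - 2) + (7 - 4) + (9 - 7)
shortcut <- max(x) - min(x)      # telescoped form of the same sum

# Divide by the number of elements, as the question asks.
result <- consecutive / length(x)
```

If only the final number is needed, `(max(x) - min(x)) / length(x)` should give the same result without sorting.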

Related

How can I randomly change the sign of numbers in a vector?

I have a long vector of numbers that vary in their sign, e.g.:
data <- c(1,-23,67,-21,10,32,64,-34,-6,10)
Working in R, how do I create a new vector that contains the same list of numbers, but give them a random sign (either positive or negative)? For each number, the probability of it being negative should be 0.5.
There are a bunch of options but
sample(c(-1,1), size=length(data), replace=TRUE) * abs(data)
should work. You could also multiply by sign(runif(length(data))-0.5) or sign(runif(length(data),-1,1)) [either of which should be a little more efficient than sample(), although in this case it hardly matters].
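A minimal sketch of the sample() approach applied to the vector from the question. The `set.seed()` call and the names `signs` and `flipped` are additions for illustration only; the seed just makes the run repeatable.

```r
set.seed(42)  # assumption: fixed seed only so the example is repeatable

data <- c(1, -23, 67, -21, 10, 32, 64, -34, -6, 10)

# Draw one sign per element, each -1 or +1 with probability 0.5.
signs <- sample(c(-1, 1), size = length(data), replace = TRUE)

# abs() discards the original sign so the new sign is fully random.
flipped <- signs * abs(data)
```

Whatever signs are drawn, `abs(flipped)` should match `abs(data)` element by element.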

R programming- adding column in dataset error

cv.uk.df$new.d[2:nrow(cv.uk.df)] <- tail(cv.uk.df$deaths, -1) - head(cv.uk.df$deaths, -1) # this line of code works
I wanted to know why we use -1 in tail and -1 in head to create this new column.
I tried to understand it by removing the -1, and R (the code is in RStudio) throws an error.
Could anyone shed some light on this? I can't explain how much I would appreciate it.
Look at what is being done. On the left-hand side of the assignment operator, we have:
cv.uk.df$new.d[2:nrow(cv.uk.df)] <-
Let's pick this apart.
cv.uk.df # This is the data.frame
$new.d # a new column to assign or a column to reassign
[2:nrow(cv.uk.df)] # the rows which we are going to assign
Specifically, this line of code will assign a new value to all rows of this column except the first. Why would we want to do that? We don't have your data, but from your example, it looks like you want to calculate the change from one row to the next. That calculation is invalid for the first row (there is no previous row).
Now let's look at the right-hand side.
<- tail(cv.uk.df$deaths, -1) - head(cv.uk.df$deaths, -1)
The cv.uk.df$deaths column has the same number of rows as the data.frame. R gets grouchy when the numbers of elements don't follow some rules. For data.frames, the right-hand side needs to have the same number of elements, or a number that can be recycled a whole number of times. For example, if you have 10 rows, you need to have a replacement of 10 values. Or you can have 5 values that R will recycle.
If your data.frame has 100 rows, only 99 are being replaced in this operation. You cannot feed 100 values into an operation that expects 99. We need to trim the data. Let's look at what is happening. The tail() function has the usage tail(x, n), where it returns the last n values of x. If n is a negative integer, tail() returns all values but the first abs(n). The head() function works similarly: with a negative n it returns all values but the last abs(n).
tail(cv.uk.df$deaths, -1) # This returns all values but the first
head(cv.uk.df$deaths, -1) # This returns all values but the last
This makes sense for your calculation. You cannot subtract the number of deaths in the row before the first row from the number in the first row, nor can you subtract the number of deaths in the last row from the number in the row after the last row. There are more intuitive ways to do this thing using functions from other packages, but this gets the job done.
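To make the trimming concrete, here is a small sketch with invented numbers. The data frame `df` and its values are hypothetical stand-ins for cv.uk.df, since the original data is not shown.

```r
# Hypothetical cumulative counts standing in for cv.uk.df$deaths.
df <- data.frame(deaths = c(1, 3, 6, 10, 15))

later   <- tail(df$deaths, -1)   # 3 6 10 15  (everything but the first)
earlier <- head(df$deaths, -1)   # 1 3 6 10   (everything but the last)
change  <- later - earlier       # 2 3 4 5    (one value fewer than rows)

# Pad with NA for the first row, which has no previous row.
df$new.d <- c(NA, change)

# Base R's diff() computes the same row-to-row change.
same <- identical(change, diff(df$deaths))
```

Both sides have four elements here, which is why the subtraction lines up against rows 2 to 5 of a five-row data frame.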

Octave: Values inside a matrix that are close

I have a vector that is filled with random numbers in the range [0,1]. I want to accept only those vectors in which every element deviates by at most 0.02 from its previous and its next element.
For example, I have the 3x1 vector below. It is acceptable, because the second element differs from both the first and the third element by no more than 0.02. The vector does not always have 3 rows; it could have more.
**Vector**
0.32957
0.33097
0.33946
This is what i thought:
n=4
P=rand(1,n);
sort(P,"ascend");
for L=2:n
while P(L-1)-P(L)>0.02
P=rand(1,n);
endwhile
endfor
Vectorize this!
isvalid=~any(diff(sort(a))>0.02);
sort(a) : if it's not sorted, sort it
diff() : take the difference between adjacent elements
___ >0.02: Check if any of those differences is bigger than what you accept
~any(): if any is bigger, then return zero, "not valid".
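For readers following the R threads on this page, the same check can be sketched in R, where diff(), sort(), and any() behave like their Octave counterparts here. The variable names are illustrative, and the vector is the example from the question.

```r
# Example vector taken from the question.
a <- c(0.32957, 0.33097, 0.33946)

# Gaps between neighbours after sorting: about 0.0014 and 0.0085.
gaps <- diff(sort(a))

# Valid only if no gap exceeds the allowed deviation.
is_valid <- !any(gaps > 0.02)

# A hypothetical counterexample with one gap of 0.05.
b <- c(0.10, 0.15, 0.16)
b_valid <- !any(diff(sort(b)) > 0.02)
```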
From your code, it seems that there may be more to the question than what you ask, you seem to have the XY problem. You want to create a random vector that has the properties that you describe. You seem to be using uniform random numbers, so let me propose a way to generate your vector where your conditions are always true.
a(1)=rand(1); %or any other way to generate a first value.
n=100; %desired length (named n to avoid shadowing the built-in length function).
a(2:n)=rand(n-1,1)*0.02; %generate random increments never bigger than 0.02
a=cumsum(a); %cumulative sum turns the increments into the final values
This ensures the vector is increasing in value, and that no element exceeds the previous one by more than 0.02.
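The same construction can be sketched in R, with runif() in place of rand(). The seed and the variable names are assumptions added for illustration, and the small tolerance in the check only guards against floating-point rounding in the cumulative sum.

```r
set.seed(1)  # assumption: fixed seed only so the example is repeatable

n <- 100                               # desired length
first <- runif(1)                      # any starting value works
increments <- runif(n - 1) * 0.02      # each step lies in [0, 0.02)
a <- cumsum(c(first, increments))      # running total gives the vector

# Every step is non-negative and at most 0.02 (up to rounding error).
steps <- diff(a)
ok <- all(steps >= 0) && all(steps <= 0.02 + 1e-12)
```

Note that nothing in this construction keeps the later values below 1, so a long vector can leave the original [0,1] range.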

Representing closeness among elements of a double vector

I have a double vector:
r = -50 + (50+50)*rand(10,1)
Ideally, I want all the numbers in the vector to be equal up to a tolerance of, say, 1e-4. I want to represent each r with a scalar, say s(r), whose value gives an idea of the quality of the vector. The vector is high quality if all of its elements are nearly equal. I can easily run a for loop like
for i=1:10
for j=i+1:10
% check equality up to the tolerance
end
end
But even then I cannot figure out what computation to do inside the nested for loops to assign a scalar representing the quality. Is there a better way, such that given any vector r of length n, I can quickly calculate a scalar representing the quality of the vector?
Your double-loop algorithm is somewhat slow, of order O(n^2), where n is the number of elements in the vector. Here is a quick way to find the closeness of the vector elements, which can be done in order O(n), with just one pass through the elements.
Find the maximum and the minimum of the vector elements. Just use two variables to store the maximum and minimum so far and run once through all the elements. The difference between the maximum and the minimum is called the range of the values, a commonly accepted measure of dispersion. If the values are exactly equal, the range is zero, which indicates perfect quality. If the range is below 1e-4, then the vector is of acceptable quality. The bigger the range, the worse the quality.
The code is obvious for just about any given language, so I'll leave that to you. If the fact that the range only really considers the two extreme values of the vector bothers you, you could use other measures of variation such as the interquartile range, variance, or standard deviation. But the range seems to best fit what you request.
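As one possible sketch in R (the function name `quality` and both test vectors are hypothetical), the range measure comes down to a single subtraction:

```r
# Hypothetical helper: range of the values, 0 means all elements are equal.
quality <- function(r) max(r) - min(r)

tol <- 1e-4                               # tolerance from the question

r_good <- c(1.00000, 1.00003, 0.99999)    # spread of about 4e-5
r_bad  <- c(-12.5, 3.1, 48.0)             # spread of 60.5

good_ok <- quality(r_good) < tol          # within tolerance
bad_ok  <- quality(r_bad) < tol           # far outside tolerance
```

max() and min() each make one pass over the data, so this stays O(n); `diff(range(r))` should give the same value in a single call.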

Identifying most frequent fractional numbers in vector

I have a vector that contains fractional numbers:
a<-c(0.5,0.5,0.3,0.5,0.2)
I would like to determine the most frequent (i.e. majority) number in the vector and return that number.
table(a) doesn't work because it will return the whole table. I want it to return only 0.5.
In case of ties I would like to choose randomly.
I have a function that does this for integers:
function(x){
a<-tabulate(x,nbins=max(x)); b<-which(a==max(a))
if (length(b)>1) {a<-sample(b,1)} else{b}
}
However, this won't work for fractions.
Can someone help?
You can use
names(which.max(table(a)))
If you want the numeric one as in your case, then coerce it to numeric
as.numeric(names(which.max(table(a))))
To randomize the tie case, you can shuffle the table before taking the maximum
as.numeric(names(which.max(sample(table(a))))) #note this works only if length(unique(a)) > 1
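The length restriction noted above comes from sample(): when given a single number it samples from 1 up to that number instead of returning it. A sketch that sidesteps this by sampling an index with sample.int() follows; the function name `most_frequent` is hypothetical.

```r
# Hypothetical helper: most frequent value, ties broken at random.
most_frequent <- function(x) {
  tab <- table(x)
  # Names of every value that reaches the highest count.
  top <- names(tab)[as.vector(tab) == max(tab)]
  # Pick one index at random; safe even when only one candidate exists.
  as.numeric(top[sample.int(length(top), 1)])
}

a <- c(0.5, 0.5, 0.3, 0.5, 0.2)
winner <- most_frequent(a)        # 0.5 is the only mode

tied <- c(0.1, 0.1, 0.7, 0.7)
pick <- most_frequent(tied)       # either 0.1 or 0.7
```

Because the value passes through character names and back, this relies on as.numeric() recovering the original number, which should hold for short decimals like these.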
