Proportion of dataset equal to a value - r

I have the following dataset called asteroids
3 4 3 3 1 4 1 3 2 3
1 1 4 2 3 3 2 6 1 1
3 3 2 2 2 2 1 3 2 1
6 1 3 2 2 1 2 2 4 2
I need to find out what proportion of this dataset is 1.

If you have a specific value in mind you can just do an equality comparison and then use mean on the resulting logical vector.
> asteroids <- scan(what=numeric())
1: 3 4 3 3 1 4 1 3 2 3 1 1 4 2 3 3 2 6 1 1 3 3 2 2 2 2 1 3 2 1 6 1 3 2 2 1 2 2 4 2
41:
Read 40 items
> mean(asteroids == 1)
[1] 0.25
This works because the equality comparison returns a logical vector of TRUE and FALSE values. When these are coerced to numeric, TRUE becomes 1 and FALSE becomes 0, so mean gives the proportion of TRUEs.
I assumed asteroids was a vector. You don't specify in your question, but if it's a different type of structure you'll probably need to coerce it into a vector first.
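As a minimal sketch of why the mean of a logical vector is a proportion, the snippet below enters the 40 values with c() instead of scan() and compares mean against an explicit count. The na.rm variant is an assumption about data that might contain missing values; the example data has none.

```r
# The 40 values from the question, entered directly instead of via scan()
asteroids <- c(3, 4, 3, 3, 1, 4, 1, 3, 2, 3,
               1, 1, 4, 2, 3, 3, 2, 6, 1, 1,
               3, 3, 2, 2, 2, 2, 1, 3, 2, 1,
               6, 1, 3, 2, 2, 1, 2, 2, 4, 2)

is_one <- asteroids == 1                      # logical vector: TRUE where the value is 1
n_ones <- sum(is_one)                         # TRUE counts as 1, FALSE as 0
prop_by_count <- n_ones / length(asteroids)   # explicit count divided by length
prop_by_mean  <- mean(is_one)                 # the same proportion in one step

# Assumption: if the data could contain NA, drop them before averaging
prop_no_na <- mean(asteroids == 1, na.rm = TRUE)

print(prop_by_mean)
```

Both routes give 0.25 here, since 10 of the 40 values are 1.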

Assuming that 'asteroids' is a data.frame, unlist it, get the table and find the proportion with prop.table.
prop.table(table(unlist(asteroids)==1))
# FALSE TRUE
# 0.75 0.25
Or, as @Richard Scriven mentioned, we can compare the data.frame directly, which gives a logical matrix, and use table on that, since a matrix is just a vector with a dim attribute.
prop.table(table(asteroids == 1))
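The data.frame case can be sketched as follows. The 4 x 10 layout and the name asteroids_df are assumptions for illustration, since the question does not say how the data is stored.

```r
# Assumed layout: the 40 values stored row by row in a 4 x 10 data.frame
values <- c(3, 4, 3, 3, 1, 4, 1, 3, 2, 3,
            1, 1, 4, 2, 3, 3, 2, 6, 1, 1,
            3, 3, 2, 2, 2, 2, 1, 3, 2, 1,
            6, 1, 3, 2, 2, 1, 2, 2, 4, 2)
asteroids_df <- as.data.frame(matrix(values, nrow = 4, byrow = TRUE))

# Comparing a data.frame with a scalar yields a logical matrix
is_one <- asteroids_df == 1

prop_mean  <- mean(is_one)                             # proportion of TRUE
prop_table <- prop.table(table(is_one))                # proportions of FALSE and TRUE
prop_all   <- prop.table(table(unlist(asteroids_df)))  # proportion of every distinct value

print(prop_table)
```

If only one value matters, mean on the logical matrix is the shortest route; prop_all is handy when the proportion of every distinct value is wanted at once.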

Related

R generate number (Id) along sequence using two different vectors, lapply?

I have two vectors, which are basically starting and ending row indices.
I want to generate a group id for each row using these vectors.
Example
a<-c(1,4,7,12)
b<-c(3,6,11,15)
my output vector should be
d <- c(1,1,1,2,2,2,3,3,3,3,3,4,4,4,4)
You can use rep to repeat each group index b - a + 1 times (the number of rows in that group).
rep(seq_along(a), (b - a) + 1)
#[1] 1 1 1 2 2 2 3 3 3 3 3 4 4 4 4
Will this work? (It relies on the ranges being contiguous and starting at 1.)
> rep(1:length(a), c(b[1],diff(b)))
[1] 1 1 1 2 2 2 3 3 3 3 3 4 4 4 4
We can use
rep(seq_len(length(a)), (b - a) + 1)
#[1] 1 1 1 2 2 2 3 3 3 3 3 4 4 4 4
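The answers above can be compared with a short sketch. The vectors a_gap and b_gap are hypothetical and only illustrate what happens when the ranges leave a gap; they are not from the question.

```r
# Ranges from the question: group i covers rows a[i] to b[i]
a <- c(1, 4, 7, 12)
b <- c(3, 6, 11, 15)

lens <- b - a + 1                      # number of rows in each group
d <- rep(seq_along(a), times = lens)   # group id repeated once per row

# Hypothetical ranges with a gap: row 4 belongs to no group
a_gap <- c(1, 5)
b_gap <- c(3, 6)
d_gap  <- rep(seq_along(a_gap), times = b_gap - a_gap + 1)
d_diff <- rep(seq_along(a_gap), times = c(b_gap[1], diff(b_gap)))

print(d)
print(d_gap)    # one id per covered row
print(d_diff)   # the diff-based version counts the uncovered row as well
```

With the question's data both approaches agree; they differ only when the ranges do not tile the rows exactly.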

Retain Max Value of Vector until vector Catches up

I have some cumulative count data. Because of reporting inaccuracies, the cumulative sum sometimes decreases, such as 0 1 2 2 3 3 2 4 5.
I would like to create a new vector that retains the largest value reported and carries it forward until the cumulative count data catches up. So the corrected version of the above would be 0 1 2 2 3 3 3 4 5.
I tried the following:
library(dplyr)  # lag() here is dplyr::lag; stats::lag does not shift a plain vector
mydf <- data.frame(ts1 = c(0,1,1,1,2,3,2,2,3,4,4,5))
mydf$lag1 <- lag(mydf[,1])
mydf$corrected <- ifelse(is.na(mydf[,2]), mydf[,1],
                         ifelse(mydf[,2] > mydf[,1], mydf[,2], mydf[,1]))
which returns:
   ts1 lag1 corrected
1    0   NA         0
2    1    0         1
3    1    1         1
4    1    1         1
5    2    1         2
6    3    2         3
7    2    3         3
8    2    2         2
9    3    2         3
10   4    3         4
11   4    4         4
12   5    4         5
This works the first time a value is smaller than the previous one (row 7), but it fails when the next value is still below the earlier maximum (row 8).
I thought there must be a better way of doing this: a new vector that is equal to the input vector, except that when the value decreases it retains the prior value until the input vector exceeds that retained value.
You are looking for cummax:
cummax(mydf$ts1)
#[1] 0 1 1 1 2 3 3 3 3 4 4 5
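As a quick check, here is a sketch that applies cummax to the shorter series from the question text. The Reduce version is only an equivalent formulation to show what a running maximum does, and the names reported, corrected and adjusted are illustrative.

```r
# Series from the question text, with one reporting dip at position 7
reported <- c(0, 1, 2, 2, 3, 3, 2, 4, 5)

# Running maximum: each element is the largest value seen so far
corrected <- cummax(reported)

# Equivalent formulation: fold max over the vector, keeping intermediate results
corrected_reduce <- Reduce(max, reported, accumulate = TRUE)

# Positions where the reported value was replaced by a carried-forward maximum
adjusted <- which(corrected != reported)

print(corrected)
print(adjusted)
```

Here corrected is 0 1 2 2 3 3 3 4 5, matching the desired output in the question, and only position 7 is adjusted.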

rep and/or seq function to create continuously reducing vector?

Suppose I have a vector from 1 to 5,
a<-c(1:5)
What I need to do is repeat the vector, dropping the last element on each repetition. That is, the final outcome should be like
1 2 3 4 5 1 2 3 4 1 2 3 1 2 1
We can reverse the vector and apply sequence
sequence(rev(a))
#[1] 1 2 3 4 5 1 2 3 4 1 2 3 1 2 1
Or another option is toeplitz
m1 <- toeplitz(a)
m1[lower.tri(m1, diag=TRUE)]
#[1] 1 2 3 4 5 1 2 3 4 1 2 3 1 2 1
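To see what sequence is doing, the sketch below spells it out with lapply and then reuses the result as an index. The character vector v is a hypothetical example showing how the same idea could extend to vectors that are not simply 1:n.

```r
a <- 1:5

# sequence(nvec) concatenates seq_len(n) for each n in nvec
out_sequence <- sequence(rev(a))

# The same thing written out explicitly: 1:5, then 1:4, ..., then 1
out_lapply <- unlist(lapply(rev(seq_along(a)), seq_len))

# Hypothetical non-numeric vector: use the sequence as an index instead
v <- c("x", "y", "z")
out_v <- v[sequence(rev(seq_along(v)))]

print(out_sequence)
print(out_v)
```

For v this gives x y z x y x, that is, the full vector followed by progressively shorter prefixes.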

R merge matrices with function

I would like to merge two matrices of different lengths on their common row names, applying a function.
My first matrix (T) looks similar to this:
   1  2  3  4
1 -4  3  2  2
1  2  1  1  5
2  3 -2  4  6
2 -2  1 -1 -9
Now I want to fill my new matrix (M) from it. For each row name, M should contain, per column, the number of values in the matching rows that are >= 0, plus 1:
1 2 3 4
1 2 3 3 3
2 2 2 2 2
I tried the following call, which I found here in the forum, but it does not work:
merge.default(as.data.frame(M), as.data.frame(T), by = "row.names", function(x){colSums(T[,]>0)+1})
Do you have any idea where my mistake is?
Thank you very much.
EDIT: my desired output would be my matrix M, which is empty at the moment:
M now:
  1 2 3 4
1
2
M after the merge, filled using the function:
colSums(T[,] >= 0) + 1
  1 2 3 4
1 2 3 3 3
2 2 2 2 2
M[1,1] = 2, as there is 1 value in matrix T which is >= 0, and then I add 1 to it
M[1,2] = 3: two values >= 0, plus 1
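One possible way to get the described result, offered as a sketch rather than a definitive answer, is base R's rowsum, which sums the rows of a matrix within groups. The names vals and counts are hypothetical (T is avoided as a variable name because it is also R's shorthand for TRUE), and the data is the example matrix from the question.

```r
# Example data from the question; row names repeat and define the groups
vals <- matrix(c(-4,  3,  2,  2,
                  2,  1,  1,  5,
                  3, -2,  4,  6,
                 -2,  1, -1, -9),
               nrow = 4, byrow = TRUE,
               dimnames = list(c("1", "1", "2", "2"), as.character(1:4)))

# Turn the >= 0 test into 0/1 so that summing counts the matches
non_negative <- (vals >= 0) * 1

# Sum within each row name, then add 1 as the question requires
counts <- rowsum(non_negative, group = rownames(vals)) + 1

print(counts)
```

The result has one row per distinct row name, with rows 2 3 3 3 and 2 2 2 2, which matches the desired output shown in the question.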

Recoding an arbitrary grouping variable or factor in R

Suppose I have a vector or column of arbitrary length representing some grouping/factor variable with an arbitrary number of groups and arbitrary values for same along the lines of this:
a <- c(2,2,2,2,2,7,7,7,7,10,10,10,10,10)
a
[1] 2 2 2 2 2 7 7 7 7 10 10 10 10 10
How would I most easily turn that into this:
a
[1] 1 1 1 1 1 2 2 2 2 3 3 3 3 3
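a <- c(2,2,2,2,2,7,7,7,7,10,10,10,10,10)
as.integer(factor(a))
#[1] 1 1 1 1 1 2 2 2 2 3 3 3 3 3
Explanation:
A factor is just an integer vector with a levels attribute and a class attribute, and as.integer returns the underlying integer codes. Older code often uses c(factor(a)), which relied on c removing attributes as a side effect; since R 4.1.0, c has a method for factors and returns a factor, so as.integer is the safer choice. as.numeric gives the same values as doubles.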
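One caveat of factor-based recoding is that the codes follow the sorted order of the values. As a sketch, match against unique numbers the groups in order of first appearance instead. The vector a_unsorted is a hypothetical example chosen so that the two approaches differ.

```r
a <- c(2, 2, 2, 2, 2, 7, 7, 7, 7, 10, 10, 10, 10, 10)

# Factor codes: groups numbered by sorted value
by_value <- as.integer(factor(a))

# match/unique: groups numbered by first appearance
by_appearance <- match(a, unique(a))

# Hypothetical unsorted input where the two numberings differ
a_unsorted <- c(7, 7, 2, 2, 10)
unsorted_by_value      <- as.integer(factor(a_unsorted))    # 2 2 1 1 3
unsorted_by_appearance <- match(a_unsorted, unique(a_unsorted))  # 1 1 2 2 3

print(by_value)
print(unsorted_by_appearance)
```

For the question's data, which is already sorted, both give 1 1 1 1 1 2 2 2 2 3 3 3 3 3.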
