More than one value for "each" argument in "rep" function? - r

How to assign more than one value for "each" argument in "rep" function in R?
A trivial example, where each value in a vector is repeated three times in a row:
a <- seq(2,6,2)
rep (a,each = 3)
However, if I pass more than one value to the "each" argument in order to change the number of repetitions of each value, it doesn't work properly:
rep (a, each = c(2,4,7))
How to solve it? Thank you in advance.

Depending on what you think the output should be, I'm guessing you want the times= parameter:
rep (a, times = c(2, 4, 7))
# [1] 2 2 4 4 4 4 6 6 6 6 6 6 6
See ?rep for the difference
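To make the difference concrete, here is a small sketch comparing the two arguments on the vector from the question. As far as ?rep describes it, each expects a single number (only the first element is used when a vector is supplied), whereas times accepts one count per element. The variable names by_each, by_times and per_element are only for illustration.

```r
a <- seq(2, 6, 2)   # 2 4 6

# each: one count, applied to every element in turn
by_each <- rep(a, each = 3)
by_each
# [1] 2 2 2 4 4 4 6 6 6

# times with a single number: the whole vector is repeated
by_times <- rep(a, times = 3)
by_times
# [1] 2 4 6 2 4 6 2 4 6

# times with one count per element of a
per_element <- rep(a, times = c(2, 4, 7))
per_element
# [1] 2 2 4 4 4 4 6 6 6 6 6 6 6
```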

Related

Find a vector given certain criteria in R

I was searching online to find out whether it is possible to create a vector given certain conditions: it must contain 2 and 6 but not 5 and 1, it must lie in a specific range (2 000 000-4 999 999), and it must be even.
I have genuinely no idea how to give these commands to R, even though I know the basic functions to create a vector.
Thanks in advance for your time and for the big help
You can try the code below
# create a sequence from 2000000 to 4999999
v <- 2e6:(5e6 - 1)
# filter the sequence with given criteria
v[grepl("(2.*6)|(6.*2)", v) & !grepl("(1.*5)|(5.*1)", v)]
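Note that the snippet above reads "not 5 and 1" as "does not contain both a 1 and a 5", and it does not test for evenness. If the intended reading is instead "contains neither 1 nor 5, and is even", a variant could look like the sketch below. The helper name filter_candidates is hypothetical, and the reading of the conditions is an assumption about what the question meant.

```r
# Hypothetical helper: keep even numbers whose digits include both 2 and 6
# and include neither 1 nor 5.
filter_candidates <- function(v) {
  # Convert to plain digit strings (no scientific notation)
  s <- format(v, scientific = FALSE, trim = TRUE)
  keep <- grepl("2", s, fixed = TRUE) &
    grepl("6", s, fixed = TRUE) &
    !grepl("[15]", s) &
    v %% 2 == 0
  v[keep]
}

# Try it on a small slice of the range first
small <- filter_candidates(2000000:2000300)
head(small)
```

The same call on the full range, filter_candidates(2000000:4999999), should work the same way but takes noticeably longer.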
You can create it using "seq" function.
a <- seq(from = 2, to = 7, by = 2)
a
#> [1] 2 4 6
Then use the "setdiff" function to remove specific values you don't need.
remove <- c(2)
setdiff(a, remove)
#> [1] 4 6

R -- mean function of column section

I am trying to include a mean calculation as part of a larger piece of code. The idea is to calculate the mean from a series of values within a column, but not the whole column.
For example, from column_x (10 entries) in yFile, calculate the mean of the last 4 values:
column_x
1
5
8
3
0
3
3
7
9
9
Result = 7
This is what I've got:
avg_subx <- mean(yFile$column_x, 7:10, trim = 0, na.rm = FALSE)
But for some reason, the result I am getting back is not the correct value.
Could you help me find out where I'm going wrong?
Thanks!
Have you tried the tail function? With tail you can select the last n values of a data frame or a vector.
Example:
avg_subx <- mean(tail(yFile$column_x,4))
In this case you're selecting the last 4 values.
Hope this can help you!
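As for why the original call returns the wrong value: mean() does not take a range of positions as its second argument. Because trim and na.rm are passed by name, the unnamed 7:10 most likely ends up in the ... arguments and is silently ignored, so the mean of the whole column is returned. The sketch below rebuilds the column from the question (the data frame yFile is reconstructed here, so its construction is an assumption) and compares the calls.

```r
# Reconstruct the example column from the question
yFile <- data.frame(column_x = c(1, 5, 8, 3, 0, 3, 3, 7, 9, 9))

# Original call: 7:10 is not used for subsetting, so this is the
# mean of all ten values
whole <- mean(yFile$column_x, 7:10, trim = 0, na.rm = FALSE)
whole
# [1] 4.8

# Subset first, then average
by_index <- mean(yFile$column_x[7:10])
by_tail  <- mean(tail(yFile$column_x, 4))
by_index
# [1] 7
by_tail
# [1] 7
```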

Create vector by given distribution of values

Let's say I have a vector a = (1,3,4).
I want to create new vector with integer numbers in range [1,length(a)]. But the i-th number should appear a[i] times.
For the vector a I want to get:
(1,2,2,2,3,3,3,3)
Could you explain how to implement this operation without several messy concatenations?
You can try rep
rep(seq_along(a), a)
#[1] 1 2 2 2 3 3 3 3
data
a <- c(1,3,4)
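As a quick sanity check on the rep approach: the second positional argument of rep is times, so naming it makes the intent explicit, and tabulate recovers the original counts. A minimal sketch (the name result is only for illustration):

```r
a <- c(1, 3, 4)

# The i-th index appears a[i] times
result <- rep(seq_along(a), times = a)
result
# [1] 1 2 2 2 3 3 3 3

# Counting the occurrences of each index gives back a
tabulate(result)
# [1] 1 3 4
```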

Using cut2 from Hmisc to calculate cuts for different number of groups

I was trying to calculate equal quantile cuts for a vector by using cut2 from Hmisc.
library(Hmisc)
c <- c(-4.18304,-3.18343,-2.93237,-2.82836,-2.13478,-2.01892,-1.88773,
-1.83124,-1.74953,-1.74858,-0.63265,-0.59626,-0.5681)
cut2(c, g=3, onlycuts=TRUE)
[1] -4.18304 -2.01892 -1.74858 -0.56810
But I was expecting the following result (33%, 33%, 33%):
[1] -4.18304 -2.13478 -1.74858 -0.56810
Should I still use cut2 or try something different? How can I make it work? Thanks for your advice.
You are seeing the cutpoints, but you want the tabular counts, and you want them as fractions of the total, so do this instead:
> prop.table(table(cut2(c, g=3) ) )
[-4.18,-2.019) [-2.02,-1.749) [-1.75,-0.568]
     0.3846154      0.3076923      0.3076923
(Obviously you cannot expect cut2 to create an exact split when the count of elements was not evenly divisible by 3.)
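The arithmetic behind that remark, as a small sketch. The group sizes 5, 4 and 4 are inferred from the proportions printed above rather than taken from cut2 directly.

```r
n <- 13   # number of values in the original vector
g <- 3    # number of requested groups

n %% g    # remainder of 1, so the groups cannot all be the same size
# [1] 1

# Group sizes consistent with the proportions shown above
counts <- c(5, 4, 4)
props <- round(counts / n, 7)
props
# [1] 0.3846154 0.3076923 0.3076923
```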
It seems that there were accidentally thirteen values in the original data set, instead of twelve. Thirteen values cannot be equally divided into three quantile groups (as mentioned by BondedDust). Here is the original problem, except that one selected data value (-1.74953) is excluded, making it twelve values. This gives the result originally expected:
library(Hmisc)
c<-c(-4.18304,-3.18343,-2.93237,-2.82836,-2.13478,-2.01892,-1.88773,-1.83124,-1.74858,-0.63265,-0.59626,-0.5681)
cut2(c, g=3,onlycuts=TRUE)
#[1] -4.18304 -2.13478 -1.74858 -0.56810
To make it clearer to anyone not familiar with cut2 from the Hmisc package (like me as of this morning), here's a similar problem, except that we'll use the integers 1 through 12 (assigned to the vector dozen_values).
library(Hmisc)
dozen_values <-1:12
quantile_groups <- cut2(dozen_values,g=3)
levels(quantile_groups)
## [1] "[1, 5)" "[5, 9)" "[9,12]"
cutpoints <- cut2(dozen_values, g=3, onlycuts=TRUE)
cutpoints
## [1] 1 5 9 12
# Show which values belong to which quantile group, using a data frame
quantile_DF <- data.frame(dozen_values, quantile_groups)
names(quantile_DF) <- c("value", "quantile_group")
quantile_DF
##    value quantile_group
## 1      1         [1, 5)
## 2      2         [1, 5)
## 3      3         [1, 5)
## 4      4         [1, 5)
## 5      5         [5, 9)
## 6      6         [5, 9)
## 7      7         [5, 9)
## 8      8         [5, 9)
## 9      9         [9,12]
## 10    10         [9,12]
## 11    11         [9,12]
## 12    12         [9,12]
Notice that the first quantile group includes everything up to, but not including, 5 (i.e. 1 through 4, in this case). The second quantile group contains 5 up to, but not including, 9 (i.e. 5 through 8, in this case). The third (last) quantile group contains 9 through 12, which includes the last value 12. Unlike the other quantile groups, the third quantile group includes the last value shown.
Anyway, you can see that the "cutpoints" 1, 5, 9, and 12 describe the start and end points of the quantile groups in the most concise way, but they are hard to interpret without reading the relevant documentation (the single-page Inside-R site is more convenient than the almost 400-page PDF manual).
If the parentheses vs square bracket notation is unfamiliar to you: a square bracket means the endpoint is included in the interval, and a parenthesis means it is excluded.
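The interval notation can also be reproduced without Hmisc. The sketch below uses base R's cut with the cutpoints 1, 5, 9 and 12 supplied by hand. cut2 works the cutpoints out itself, so this only illustrates the notation and is not a replacement for cut2; the base R labels also differ slightly in spacing.

```r
dozen_values <- 1:12

# right = FALSE makes the intervals closed on the left, open on the right;
# include.lowest = TRUE closes the last interval so that 12 is included
groups <- cut(dozen_values, breaks = c(1, 5, 9, 12),
              right = FALSE, include.lowest = TRUE)

levels(groups)
# [1] "[1,5)"  "[5,9)"  "[9,12]"

# Four values fall into each group
table(groups)
```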

Determining minimum values in a vector in R

I need some help in determining more than one minimum value in a vector. Let's suppose, I have a vector x:
x<-c(1,10,2, 4, 100, 3)
and would like to determine the indexes of the smallest 3 elements, i.e. 1, 2 and 3. I need the indexes because I will be using them to access the corresponding elements in another vector. Of course, sorting will provide the minimum values, but I want to know the indexes of their actual occurrence prior to sorting.
In order to find the index try this
which(x %in% sort(x)[1:3]) # this gives you an index vector
[1] 1 3 6
This says that the first, third and sixth elements are the three lowest values in your vector. To see which values these are, try:
x[ which(x %in% sort(x)[1:3])] # this gives the vector of values
[1] 1 2 3
or just
x[c(1,3,6)]
[1] 1 2 3
If you have any duplicated values, you may want to select the unique values first and then sort them in order to find the indexes, like this (suggested by Jeffrey Evans in his answer):
which(x %in% sort(unique(x))[1:3])
I think you mean you want to know what are the indices of the bottom 3 elements? In that case you want order(x)[1:3]
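A short sketch of the order approach, including the step the question asks about: using the indices to pick elements from another vector. The companion vector y is hypothetical and only serves the illustration.

```r
x <- c(1, 10, 2, 4, 100, 3)
y <- c("a", "b", "c", "d", "e", "f")  # hypothetical second vector

# Positions of the three smallest values, ordered from smallest to largest
idx <- order(x)[1:3]
idx
# [1] 1 3 6

x[idx]
# [1] 1 2 3

# Use the same positions to access the other vector
y[idx]
# [1] "a" "c" "f"
```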
You can use unique to account for duplicate minimum values.
x<-c(1,10,2,4,100,3,1)
which(x %in% sort(unique(x))[1:3])
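With duplicates, the two approaches answer slightly different questions, which the sketch below shows on the same vector: the unique-based version returns every position holding one of the three smallest distinct values, while order(x)[1:3] returns exactly three positions. The names by_value and by_order are only for illustration.

```r
x <- c(1, 10, 2, 4, 100, 3, 1)

# All positions whose value is among the three smallest distinct values (1, 2, 3)
by_value <- which(x %in% sort(unique(x))[1:3])
by_value
# [1] 1 3 6 7

# Exactly three positions; the duplicated 1 takes two of them
by_order <- order(x)[1:3]
by_order
# [1] 1 7 3
```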
Here's another way with rank that includes duplicates.
x <- c(x, 3)
# [1] 1 10 2 4 100 3 3
which(rank(x, ties.method='min') <= 3)
# [1] 1 3 6 7
