How to make a sequence from a range - r

I am trying to make a sequence from the output of range.
> range(wines$quality)
[1] 3 8
> seq(3, 8)
[1] 3 4 5 6 7 8
> seq(range(wines$quality))
[1] 1 2
I am trying to pass the output of range (3, 8) into seq to get the sequence 3, 4, 5, 6, 7, 8. Why is it giving me 1 2 instead? How do I make it behave as I want?

Another option:
do.call(seq, as.list(range(wines$quality)))
# [1] 3 4 5 6 7 8
Your problem right now is that you are passing a two-element vector as one argument, when seq expects two one-element arguments in order to do what you want.
do.call calls seq with each of the items in as.list(...) as a separate argument.
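A minimal sketch of the difference, using a literal vector `r` as a stand-in for `range(wines$quality)` (the name `r` is only for illustration). With a single argument of length greater than one, seq behaves like seq_along, which is where the 1 2 comes from:

```r
r <- c(3, 8)                             # stand-in for range(wines$quality)

one_arg  <- seq(r)                       # single vector argument: same as seq_along(r)
two_args <- do.call(seq, as.list(r))     # unpacked: same as seq(3, 8)

print(one_arg)
print(two_args)
```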

I am sure there is a fancier way to do it but why not just:
x <- range(wines$quality)
seq(x[1], x[2])

Some possible solutions, though the eval(parse()) one is more for fooling around:
set.seed(10)
x <- rpois(20, 10)
y <- range(x); y[1]:y[2]
seq(y[1], y[2])
eval(parse(text = paste(range(x), collapse=":")))
## > y <- range(x); y[1]:y[2]
## [1] 5 6 7 8 9 10 11 12 13 14 15
## > seq(y[1], y[2])
## [1] 5 6 7 8 9 10 11 12 13 14 15
## > eval(parse(text = paste(range(x), collapse=":")))
## [1] 5 6 7 8 9 10 11 12 13 14 15

Related

How to implement extract/separate functions (from dplyr and tidyr) to separate a column into multiple columns. based on arbitrary values?

I have a column:
Y = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
I would like to split into multiple columns, based on the positions of the column values. For instance, I would like:
Y1=c(1,2,3,4,5)
Y2=c(6,7,8,9,10)
Y3=c(11,12,13,14,15)
Y4=c(16,17,18,19,20)
Since I am working with a large time series data set, the divisions will be arbitrary, depending on the length of one time period.
You can use the base split to split this vector into vectors that are each 5 items long. You could also use a variable to store this interval length.
Using rep with each = 5, and creating a sequence programmatically, gets you a sequence of the numbers 1, 2, ... up to the length divided by 5 (in this case, 4), each 5 times consecutively. Then split returns a list of vectors.
It's worth noting that a variety of SO posts will recommend you store similar data in lists such as this, rather than creating multiple variables, so I'm leaving it in list form here.
Y <- 1:20
breaks <- rep(1:(length(Y) / 5), each = 5)
split(Y, breaks)
#> $`1`
#> [1] 1 2 3 4 5
#>
#> $`2`
#> [1] 6 7 8 9 10
#>
#> $`3`
#> [1] 11 12 13 14 15
#>
#> $`4`
#> [1] 16 17 18 19 20
Created on 2019-02-12 by the reprex package (v0.2.1)
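As a sketch of the "variable for the interval length" idea, the grouping index can also be built with ceiling(seq_along(Y) / chunk_len), which additionally copes with lengths that are not a multiple of the chunk length. The names `chunk_len`, `chunks`, and `uneven` are hypothetical:

```r
Y <- 1:20
chunk_len <- 5                                  # hypothetical interval length

# Positions 1..5 map to group 1, 6..10 to group 2, and so on
groups <- ceiling(seq_along(Y) / chunk_len)
chunks <- split(Y, groups)

# A length that is not a multiple of chunk_len leaves a shorter last chunk
Z <- 1:22
uneven <- split(Z, ceiling(seq_along(Z) / chunk_len))

print(chunks)
print(uneven[["5"]])
```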
Not a dplyr solution, but I believe the easiest way would involve using matrices.
foo = function(data, sep.in=5) {
  # matrix() fills column by column, so sep.in must be the number of rows
  data.matrix = matrix(data, nrow=sep.in)
  data.df = as.data.frame(data.matrix)
  return(data.df)
}
I have not tested it, but this function should create a data.frame that can be merged into an existing one using cbind().
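One thing to watch with the matrix approach: matrix() fills column by column, so to get 1 to 5 in the first column the chunk length has to be the number of rows, not the number of columns. A small sketch, where `chunk_len` and the Y1..Y4 column names are assumptions chosen to match the question:

```r
Y <- 1:20
chunk_len <- 5                                # hypothetical chunk length

m  <- matrix(Y, nrow = chunk_len)             # 5 rows, 4 columns, filled column-wise
df <- as.data.frame(m)
names(df) <- paste0("Y", seq_len(ncol(df)))   # Y1, Y2, Y3, Y4

print(df)
```

This only works when length(Y) is a multiple of the chunk length; otherwise matrix() warns and recycles values.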
We can use split to divide the vector into a list of vectors (this turns my earlier comment into an answer).
lst <- split(Y, as.integer(gl(length(Y), 5, length(Y))))
lst
#$`1`
#[1] 1 2 3 4 5
#$`2`
#[1] 6 7 8 9 10
#$`3`
#[1] 11 12 13 14 15
#$`4`
#[1] 16 17 18 19 20
Here, gl creates a grouping index from the n, k and length parameters, where n is an integer giving the number of levels, k is an integer giving the number of replications, and length is an integer giving the length of the result.
In our case, we want 'k' to be 5.
as.integer(gl(length(Y), 5, length(Y)))
#[1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4
If we want to have multiple objects in the global environment, use list2env
list2env(setNames(lst, paste0("Y", seq_along(lst))), envir = .GlobalEnv)
Y1
#[1] 1 2 3 4 5
Y2
#[1] 6 7 8 9 10
Y3
#[1] 11 12 13 14 15
Y4
#[1] 16 17 18 19 20
Or as the OP mentioned dplyr/tidyr in the question, we can use those packages as well
library(tidyverse)
tibble(Y) %>%
  group_by(grp = (row_number()-1) %/% 5 + 1) %>%
  summarise(Y = list(Y)) %>%
  pull(Y)
#[[1]]
#[1] 1 2 3 4 5
#[[2]]
#[1] 6 7 8 9 10
#[[3]]
#[1] 11 12 13 14 15
#[[4]]
#[1] 16 17 18 19 20
data
Y <- 1:20

Create all possible combinations from two values for each element in a vector in R [duplicate]

This question already has answers here:
How to generate a matrix of combinations
(3 answers)
Closed 6 years ago.
I have been trying to create vectors where each element can take two different values present in two different vectors.
For example, if there are two vectors a and b, where a is c(6,2,9) and b is c(12,5,15) then the output should be 8 vectors given as follows,
6 2 9
6 2 15
6 5 9
6 5 15
12 2 9
12 2 15
12 5 9
12 5 15
The following piece of code works,
aa1 <- c(6,12)
aa2 <- c(2,5)
aa3 <- c(9,15)
for(a1 in 1:2)
  for(a2 in 1:2)
    for(a3 in 1:2)
    {
      v <- c(aa1[a1],aa2[a2],aa3[a3])
      print(v)
    }
But I was wondering if there is a simpler way to do this than writing nested for loops, whose number grows linearly with the number of elements in the final vector.
expand.grid is a function that makes all combinations of whatever vectors you pass it. In this case, though, you need to rearrange your vectors so that you have a pair of first elements, a pair of second elements, and a pair of third elements, so the ultimate call is:
expand.grid(c(6, 12), c(2, 5), c(9, 15))
A quick way to rearrange the vectors in base R is Map, the multivariate version of lapply, with c() as the function:
a <- c(6, 2, 9)
b <- c(12, 5, 15)
Map(c, a, b)
## [[1]]
## [1] 6 12
##
## [[2]]
## [1] 2 5
##
## [[3]]
## [1] 9 15
Conveniently expand.grid is happy with either individual vectors or a list of vectors, so we can just call:
expand.grid(Map(c, a, b))
## Var1 Var2 Var3
## 1 6 2 9
## 2 12 2 9
## 3 6 5 9
## 4 12 5 9
## 5 6 2 15
## 6 12 2 15
## 7 6 5 15
## 8 12 5 15
If Map is confusing, you can instead put a and b in a list; purrr::transpose will do the same thing, flipping a list of two elements of length three into a list of three elements of length two:
library(purrr)
list(a, b) %>% transpose() %>% expand.grid()
which returns the same thing.
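expand.grid varies the first column fastest, so the rows come out in a different order from the one listed in the question. If that order matters, one possible sketch is to reverse the list before expanding and then reverse the columns back. The names `pairs`, `g`, and `combos` are only illustrative:

```r
a <- c(6, 2, 9)
b <- c(12, 5, 15)

pairs <- Map(c, a, b)            # list(c(6, 12), c(2, 5), c(9, 15))

# Expand the reversed list so the last position varies fastest,
# then put the columns back in their original left-to-right order
g <- expand.grid(rev(pairs))
combos <- g[rev(names(g))]
rownames(combos) <- NULL

print(combos)
```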
I think what you're looking for is expand.grid.
a <- c(6,2,9)
b <- c(12,5,15)
expand.grid(a,b)
Var1 Var2
1 6 12
2 2 12
3 9 12
4 6 5
5 2 5
6 9 5
7 6 15
8 2 15
9 9 15

How can I add vector elements to corresponding vectors in lists?

I have a list of vectors of variable length, and a separate vector, somewhat like this:
set.seed(0)
x <- lapply(as.list(sample(1:10, 10, repl=TRUE)),
function(x) sample(1:10, x, repl=TRUE))
y <- sample(1:10, 10, repl=TRUE)
I need to add each element of y to the corresponding vector in x. Currently I accomplish this like so:
newList <- list()
for (i in seq_along(y)) {
  newList <- c(newList, list(y[i] + x[[i]]))
}
> x[1:2]
[[1]]
[1] 1 3 2 7 4 8 5 8 10
[[2]]
[1] 4 8 10
> y[1:2]
[1] 4 8
> newList
[[1]]
[1] 5 7 6 11 8 12 9 12 14
[[2]]
[1] 12 16 18
[[3]]
[1] 13 17 12 13
...
Is there a better way, perhaps using a lapply-like function?
This is very similar to previous questions, which use Map or mapply to operate on two lists/vectors of the same length in tandem:
How do I apply an index vector over a list of vectors?
Add respective dataframes in list together in R
For this specific case, try:
Map("+",x,y)
#[[1]]
#[1] 5 7 6 11 8 12 9 12 14
#
#[[2]]
#[1] 12 16 18
#
#[[3]]
#[1] 13 17 12 13
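For a deterministic illustration that does not depend on the random seed, here is a small sketch with hand-written inputs (the values are made up for the example). Map is a thin wrapper around mapply with SIMPLIFY = FALSE, so both calls give the same list:

```r
x <- list(c(1, 3, 2), c(4, 8, 10))   # made-up list of vectors
y <- c(4, 8)                         # one value per list element

res_map    <- Map(`+`, x, y)
res_mapply <- mapply(`+`, x, y, SIMPLIFY = FALSE)

print(res_map)
```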

Subset columns using logical vector

I have a data frame from which I want to drop columns whose NA rate is above 70%, or in which a single dominant value takes up over 99% of rows. How can I do that in R?
I find it easy to select rows with a logical vector in the subset function, but how can I do something similar for columns? For example, if I write:
isNARateLt70 <- function(column) {
  # some code
}
apply(dataframe, 2, isNARateLt70)
Then how can I use the resulting vector to subset the data frame?
If you have a data.frame like
dd <- data.frame(matrix(rpois(7*4,10),ncol=7, dimnames=list(NULL,letters[1:7])))
# a b c d e f g
# 1 11 2 5 9 7 6 10
# 2 10 5 11 13 11 11 8
# 3 14 8 6 16 9 11 9
# 4 11 8 12 8 11 6 10
You can subset with a logical vector using one of
mycols <- c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE)
dd[mycols]
dd[, mycols]
There's really no need to write a function when we have colMeans (thanks to @MrFlick for the advice to switch from colSums()/nrow(); it is shown at the bottom of this answer).
Here's how I would approach your function if you want to use sapply on it later.
> d <- data.frame(x = rep(NA, 5), y = c(1, NA, NA, 1, 1),
z = c(rep(NA, 3), 1, 2))
> isNARateLt70 <- function(x) mean(is.na(x)) <= 0.7
> sapply(d, isNARateLt70)
# x y z
# FALSE TRUE TRUE
Then, to subset your data using the above line of code, it's
> d[sapply(d, isNARateLt70)]
But as mentioned, colMeans works just the same,
> d[colMeans(is.na(d)) <= 0.7]
# y z
# 1 1 NA
# 2 NA NA
# 3 NA NA
# 4 1 1
# 5 1 2
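The question also asks about dropping columns where one value dominates more than 99% of rows. A rough sketch of how that could be combined with the NA filter is below; `dominant_share` is a hypothetical helper, the extra column `w` is made up for the example, and the share is computed over all rows (NAs included in the denominator), which is an assumption about what "dominant" should mean:

```r
d <- data.frame(x = rep(NA, 5), y = c(1, NA, NA, 1, 1),
                z = c(rep(NA, 3), 1, 2), w = rep(7, 5))

na_ok <- colMeans(is.na(d)) <= 0.7

# Hypothetical helper: share of rows taken by the most frequent non-NA value
dominant_share <- function(col) {
  counts <- table(col)                 # table() drops NA by default
  if (length(counts) == 0) return(0)   # column is entirely NA
  max(counts) / length(col)
}
not_dominated <- sapply(d, dominant_share) <= 0.99

keep <- na_ok & not_dominated
print(d[keep])
```

Here x is dropped for its NA rate and w because a single value fills every row.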
Maybe this will help too. The 2 argument in apply() means the function is applied column-wise to the data.frame cars.
> columns <- apply(cars, 2, function(x) {mean(x) > 10})
> columns
speed dist
TRUE TRUE
> cars[1:10, columns]
speed dist
1 4 2
2 4 10
3 7 4
4 7 22
5 8 16
6 9 10
7 10 18
8 10 26
9 10 34
10 11 17

Split a numeric vector into continuous chunks in R [duplicate]

This question already has answers here:
Collapse continuous integer runs to strings of ranges
(6 answers)
Closed 9 years ago.
If I have a numeric vector [1 2 3 4 7 8 9 10 15 16 17], how can I split it so that I have multiple vectors returned that separate the continuous elements of that vector? I.e. [1 2 3 4] [7 8 9 10] [15 16 17]. I've found an answer of how to do this in matlab, but I only use R.
Thanks.
Here's another alternative:
vec <- c( 1, 2, 3, 4, 7, 8, 9, 10, 15, 16, 17 )
split(vec, cumsum(seq_along(vec) %in% (which(diff(vec)>1)+1)))
# $`0`
# [1] 1 2 3 4
#
# $`1`
# [1] 7 8 9 10
#
# $`2`
# [1] 15 16 17
Another option:
split(vec, cummax(c(1,diff(vec))))
Result
$`1`
[1] 1 2 3 4
$`3`
[1] 7 8 9 10
$`5`
[1] 15 16 17
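A caveat on the cummax version: it only starts a new group when a gap is larger than every earlier gap, so it happens to work for the example vector but can merge runs when a smaller gap follows a larger one. A sketch of a variant that counts every break instead, using a made-up vector with gaps of 6 and then 2:

```r
vec <- c(1, 2, 3, 4, 10, 11, 12, 14, 15)   # made-up input: gaps of 6, then 2

# cummax misses the second, smaller gap
by_cummax <- split(vec, cummax(c(1, diff(vec))))

# cumsum over "is this a break?" starts a new group at every gap
by_cumsum <- split(vec, cumsum(c(1, diff(vec) != 1)))

print(by_cumsum)
```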

Resources