How to generate a sequence with exclusions in R

I need four functions, each of which generates a sequence of numbers.
The first function generates the odd numbers, each repeated n times, skipping 5, 15, 25, etc.
Example with n=2: 1, 1, 3, 3, 7, 7, 9, 9, 11, 11, 13, 13, 17, 17,...
The second function generates the even numbers, each repeated n times, skipping 10, 20, 30, etc.
Example with n=2: 2, 2, 4, 4, 6, 6, 8, 8, 12, 12, 14, 14, 16, 16,...
The third function generates the numbers starting at 5 in steps of 10, each repeated n times.
Example with n=2: 5, 5, 15, 15, 25, 25,...
The fourth function generates the numbers starting at 10 in steps of 10, each repeated n times.
Example with n=2: 10, 10, 20, 20, 30, 30,...
Each function has to take the vector 1:N and n as inputs.
For example,
f1(1:10, 3)
> 1, 1, 1, 3, 3, 3, 7, 7, 7, 9
f2(1:5, 10)
> 2, 2, 2, 2, 2
f3(1:15, 5)
> 5, 5, 5, 5, 5, 15, 15, 15, 15, 15, 25, 25, 25, 25, 25
f4(1:2, 1)
> 10, 20
I have a partial solution for the first two functions, but I don't know how to exclude some numbers:
f1 <- function(x) 2*((x-1) %/% 10) + 1 # goes 1, 3, 5, etc for n = 10
f2 <- function(x) 2*((x-1) %/% 10 + 1) # goes 2, 4, 6, etc for n = 10

Why not use seq and rep?
n = 25
nrep = 2 # number of repetitions
by5 <- sort(rep(seq(5, n, by = 10), nrep )) # numbers from 5 by 10
by5
by10 <- sort(rep(seq(10, n, by = 10), nrep )) # numbers from 10 by 10
by10
odd <- sort(rep(seq(1, n, by = 2), nrep )) # odd number
odd[!odd %in% by5] # remove all the by5 values
even <- sort(rep(seq(2, n, by = 2), nrep )) # Even numbers
even[!even %in% by10] # remove all the by 10 values
Output:
> [1] 5 5 15 15 25 25
> [1] 10 10 20 20
> [1] 1 1 3 3 7 7 9 9 11 11 13 13 17 17 19 19 21 21 23 23
> [1] 2 2 4 4 6 6 8 8 12 12 14 14 16 16 18 18 22 22 24 24
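The seq and rep approach above is driven by an upper bound rather than by the index vector 1:N that the question asks for. A minimal sketch of the four functions with the requested f(x, n) signature might look like the following. The helper repeat_cycle and its argument names are hypothetical (they are not part of the question or the answer), and the sketch assumes x holds positive integer positions.

```r
# Hypothetical helper: position p falls into block k = (p - 1) %/% n,
# and block k maps to the k-th value of a pattern that repeats every 10.
repeat_cycle <- function(x, n, pattern) {
  k <- (x - 1) %/% n                 # zero-based index of the distinct value
  decade <- k %/% length(pattern)    # number of full tens to add
  pos <- k %% length(pattern) + 1    # position within the pattern
  10 * decade + pattern[pos]
}

f1 <- function(x, n) repeat_cycle(x, n, c(1, 3, 7, 9))  # odd, skipping 5, 15, ...
f2 <- function(x, n) repeat_cycle(x, n, c(2, 4, 6, 8))  # even, skipping 10, 20, ...
f3 <- function(x, n) repeat_cycle(x, n, 5)              # 5, 15, 25, ...
f4 <- function(x, n) repeat_cycle(x, n, 10)             # 10, 20, 30, ...

f1(1:10, 3)
f4(1:2, 1)
```

Because the exclusions follow a fixed pattern within every block of ten, the value can be computed directly from the position, so nothing has to be generated and then filtered out.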

Related

How to cut off values in a dataframe?

From the following dataframe:
df1 <- data.frame(id=c(1, 2, 3, 4, 5, 6, 7),
revenue=c(34, 1000, 40, 49, 43, 55, 99))
df2 <- data.frame(id=c(1, 2, 3, 4, 5, 6, 7),
expenses=c(22, 26, 31, 40, 20, 2000, 22))
df3 <- data.frame(id=c(1, 2, 3, 4, 5, 6, 7),
profit=c(12, 10, 14, 12, 9, 15, 16))
df_list <- list(df1, df2, df3)
test <- Reduce(function(x, y) merge(x, y, all=TRUE), df_list)
rownames(test) <- test[,1]
test[,1] <- NULL
test
I would like to eliminate extreme values (e.g. 1000 and 2000). I need to cut off everything that is greater than 100. When I check test<100 I see the TRUE and FALSE positions, but I would like to replace the extreme values with NA or zeroes.
To replace all values in a dataframe (df) that are greater than 100 with 0, simply use: df[df > 100] <- 0
We can also use replace():
replace(test, test>100, NA)
  revenue expenses profit
1      34       22     12
2      NA       26     10
3      40       31     14
4      49       40     12
5      43       20      9
6      55       NA     15
7      99       22     16
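Both suggestions rely on the same idea: comparing a data frame with a number yields a logical matrix, which can then be used to select the cells to overwrite. A small self-contained sketch, assuming a data frame with only numeric columns (the names capped and zeroed are made up for this illustration):

```r
# Same values as the merged data frame in the question, built directly.
test <- data.frame(
  revenue  = c(34, 1000, 40, 49, 43, 55, 99),
  expenses = c(22, 26, 31, 40, 20, 2000, 22),
  profit   = c(12, 10, 14, 12, 9, 15, 16)
)

capped <- test                          # work on a copy; test stays unchanged
capped[capped > 100] <- NA              # logical-matrix indexing, in place

zeroed <- replace(test, test > 100, 0)  # returns a modified copy instead

colSums(is.na(capped))                  # number of replaced cells per column
```

Indexed assignment modifies the object it is applied to, whereas replace() leaves its input untouched and returns the result, which is convenient inside a pipeline.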

dplyr differences between pairs in nested groups

I'd like to use dplyr to calculate the difference in value between the two people in each pair, separately for each session.
dat <- data.frame(person=c(rep(1, 10),
rep(2, 10),
rep(3, 10),
rep(4, 10),
rep(5, 10),
rep(6, 10),
rep(7, 10),
rep(8, 10)),
pair=c(rep(1, 20),
rep(2, 20),
rep(3, 20),
rep(4, 20)),
condition=c(rep("NEW", 10),
rep("OLD", 10),
rep("NEW", 10),
rep("OLD", 10),
rep("NEW", 10),
rep("OLD", 10),
rep("NEW", 10),
rep("OLD", 10)),
session=rep(seq(from=1, to=10, by=1), 8),
value=c(0, 2, 4, 8, 16, 16, 18, 20, 20, 20,
0, 1, 1, 2, 4, 5, 8, 12, 15, 15,
0, 2, 8, 10, 15, 16, 18, 20, 20, 20,
0, 4, 4, 6, 6, 8, 10, 12, 12, 18,
0, 6, 8, 10, 16, 16, 18, 20, 20, 20,
0, 2, 2, 3, 4, 8, 8, 8, 10, 12,
0, 10, 12, 16, 18, 18, 18, 20, 20, 20,
0, 2, 2, 8, 10, 10, 11, 12, 15, 20)
)
For instance, person 1 and 2 make a pair (pair==1):
person==1 & session==2: 2
person==2 & session==2: 1
Difference (NEW-OLD) is 2-1=1.
Here's what I have tried so far. I think I need to group_by() first and then summarise(), but I have not cracked this nut.
dat %>%
  mutate(session = factor(session)) %>%
  group_by(condition, pair, session) %>%
  summarise(pairDiff = value-first(value))
Desired output:
Your output can be obtained by:
dat %>%
  group_by(pair, session) %>%
  arrange(condition) %>%
  summarise(diff = -diff(value))
Source: local data frame [40 x 3]
Groups: pair [?]
# A tibble: 40 x 3
    pair session  diff
   <dbl>   <dbl> <dbl>
 1     1       1     0
 2     1       2     1
 3     1       3     3
 4     1       4     6
 5     1       5    12
 6     1       6    11
 7     1       7    10
 8     1       8     8
 9     1       9     5
10     1      10     5
# ... with 30 more rows
The arrange ensures that NEW and OLD are in the correct positions, but the solution does depend on there being exactly 2 values for each combination of pair and session.
You can spread condition to headers and then do the subtraction NEW - OLD:
library(dplyr); library(tidyr)
dat %>%
  select(-person) %>%
  spread(condition, value) %>%
  mutate(diff = NEW - OLD) %>%
  select(session, pair, diff)
# A tibble: 40 x 3
#    session  pair  diff
#      <dbl> <dbl> <dbl>
#  1       1     1     0
#  2       2     1     1
#  3       3     1     3
#  4       4     1     6
#  5       5     1    12
#  6       6     1    11
#  7       7     1    10
#  8       8     1     8
#  9       9     1     5
# 10      10     1     5
# ... with 30 more rows
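If you want to cross-check either result without loading any packages, the same NEW - OLD subtraction can be sketched in base R by splitting on condition and merging the two halves on pair and session. This is only an illustration on a reduced version of the data (pairs 1 and 2, sessions 1 and 2); the names new, old and wide are made up here, and like the answers above it assumes exactly one NEW and one OLD row per pair and session.

```r
# Reduced example: first two sessions of pairs 1 and 2 from the question.
dat <- data.frame(
  pair      = rep(1:2, each = 4),
  condition = rep(rep(c("NEW", "OLD"), each = 2), times = 2),
  session   = rep(1:2, times = 4),
  value     = c(0, 2, 0, 1, 0, 2, 0, 4)
)

new <- dat[dat$condition == "NEW", c("pair", "session", "value")]
old <- dat[dat$condition == "OLD", c("pair", "session", "value")]

# One row per (pair, session), with the two values side by side.
wide <- merge(new, old, by = c("pair", "session"), suffixes = c("_new", "_old"))
wide$diff <- wide$value_new - wide$value_old
wide <- wide[order(wide$pair, wide$session), ]
wide
```

Matching rows by key rather than by position means the result does not depend on how the rows happen to be sorted.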

2 columns into list and sort in R

Let's say we have two vectors:
x <- c(1, 3, 4, 2, 6, 5)
y <- c(12, 14, 15, 61, 71, 21)
I want to combine them into a list with two columns, x and y, keeping the values in the same order.
x <- c(1, 3, 4, 2, 6, 5)
y <- c(12, 14, 15, 61, 71, 21)
Once I have the list, I want to sort it on y so that the final list looks like
x <- c(1, 3, 4, 5, 2, 6)
y <- c(12, 14, 15, 21, 61, 71)
I am really new to R.
I tried list(x,y) but it seems to make a
list(1, 3, 4, 2, 6, 5, 12, 14, 15, 61, 71, 21)
so I was wondering if someone could help me.
You need to put them in a data.frame first and then use order:
x <- c(1, 3, 4, 2, 6, 5)
y <- c(12, 14, 15, 61, 71, 21)
DF <- data.frame(x, y)
> DF[order(DF$y),]
  x  y
1 1 12
2 3 14
3 4 15
6 5 21
4 2 61
5 6 71
Keeping it as a list, using lapply:
x <- c(1, 3, 4, 2,6,5)
y <- c(12, 14,15,61,71,21)
l <- list(x = x, y = y)
## thelatemail
lapply(l, `[`, order(l$y))
# $x
# [1] 1 3 4 5 2 6
#
# $y
# [1] 12 14 15 21 61 71
A more explicit version of the short one given by @thelatemail above, which however doesn't preserve the names:
lapply(seq_along(l), function(x) l[[x]][order(l$y)])
# [[1]]
# [1] 1 3 4 5 2 6
#
# [[2]]
# [1] 12 14 15 21 61 71
Or rapply:
rapply(l, function(x) x[order(l$y)], how = 'list')
# $x
# [1] 1 3 4 5 2 6
#
# $y
# [1] 12 14 15 21 61 71
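All of the list variants above apply one and the same permutation, order(l$y), to every element. A small sketch that computes this permutation once and keeps the element names (ord and sorted are names chosen for this illustration):

```r
x <- c(1, 3, 4, 2, 6, 5)
y <- c(12, 14, 15, 61, 71, 21)
l <- list(x = x, y = y)

ord <- order(l$y)                         # positions that sort y ascending
sorted <- lapply(l, function(v) v[ord])   # lapply over the list keeps its names
sorted
```

Iterating over the list itself, rather than over seq_along(l), is what keeps the names x and y in the result.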

Categorizing the contents of a vector

Here is my problem:
myvec <- c(1, 2, 2, 2, 3, 3,3, 4, 4, 5, 6, 6, 6, 6, 7, 8, 8, 9, 10, 10, 10)
I want to develop a function that can categorize this vector depending on the number of categories I define.
If the number of categories is 1, all newvec elements will be 1.
If the number of categories is 2, then the unique values of myvec, i.e. unique(myvec), are mapped as
1 = 1, 2 = 2, 3 = 1, 4 = 2, 5 = 1, 6 = 2, 7 = 1, 8 = 2, 9 = 1, 10 = 2
(which is the odd/even case).
If the number of categories is 3, then the first three numbers are mapped to 1:3 and then the pattern is repeated:
1 = 1, 2 = 2, 3 = 3, 4 = 1, 5 = 2, 6 = 3, 7 = 1, 8 = 2, 9 = 3, 10 = 1
If the number of categories is 4, then the first four numbers are mapped to 1:4 and then the pattern is repeated:
1 = 1, 2 = 2, 3 = 3, 4 = 4, 5 = 1, 6 = 2, 7 = 3, 8 = 4, 9 = 1, 10 = 2
Similarly, for n categories the first n numbers are mapped to 1:n and then the pattern is repeated.
This should do what you need, if I correctly understood the question. You can vary variable n to choose the number of groups.
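myvec <- c(1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 6, 6, 6, 6, 7, 8, 8, 9, 10, 10, 10)
out <- vector(mode="integer", length=length(myvec))
uid <- sort(unique(myvec))
n <- 3
for (i in 1:n) {
  # positions i, i+n, i+2n, ... within the sorted unique values
  s <- seq(i, length(uid), n)
  # look the values up in uid so this also works when they are not exactly 1:length(uid)
  out[myvec %in% uid[s]] <- i
}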
Using the recycling features of R (this gives a warning if the vector length is not divisible by n):
R> myvec <- c(1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 6, 6, 6, 6, 7, 8, 8, 9, 10, 10, 10)
R> n <- 3
R> y <- cbind(x=sort(unique(myvec)), y=1:n)[, 2]
R> y
[1] 1 2 3 1 2 3 1 2 3 1
or using rep:
R> x <- sort(unique(myvec))
R> y <- rep(1:n, length.out=length(x))
R> y
[1] 1 2 3 1 2 3 1 2 3 1
Update: since the values here are the consecutive integers 1 to 10, you could just use the modulo operator:
R> myvec
[1] 1 2 2 2 3 3 3 4 4 5 6 6 6 6 7 8 8 9 10 10 10
R> n <- 4
R> ((myvec - 1) %% n) + 1
[1] 1 2 2 2 3 3 3 4 4 1 2 2 2 2 3 4 4 1 2 2 2
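The modulo shortcut works on myvec directly only because its values happen to be 1, 2, ..., 10. A sketch of a reusable function that first converts each value to its rank among the sorted unique values, and so should also handle other values, could look like this (categorize is a hypothetical name, not something from the answers above):

```r
# Hypothetical helper: cycle the categories 1..n over the sorted unique values.
categorize <- function(v, n) {
  rank <- match(v, sort(unique(v)))   # 1 for the smallest value, 2 for the next, ...
  (rank - 1) %% n + 1                 # wrap the ranks around after n
}

myvec <- c(1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 6, 6, 6, 6, 7, 8, 8, 9, 10, 10, 10)
categorize(myvec, 3)
categorize(c(10, 30, 20, 30), 2)      # values that are not 1, 2, 3, ...
```

For myvec this gives the same result as the loop and the modulo one-liner; the match() step only matters once the values have gaps or do not start at 1.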

Another nested loop in R

I have the following data and nested for loop:
x <- c(12, 27, 21, 16, 12, 21, 18, 16, 20, 23, 21, 10, 15, 26, 21, 22, 22, 19, 26, 26)
y <- c(8, 10, 7, 7, 9, 5, 7, 7, 10, 4, 10, 3, 9, 6, 4, 2, 4, 2, 3, 6)
a <- c(20,25)
a.sub <- c()
df <- c()
for(j in 1:length(a)){
  a.sub <- which(x >= a[j])
  for(i in 1:length(a.sub)){
    df[i] <- y[a.sub[i]]
  }
  print(df)
}
I'd like the loop to return values for df as:
[1] 10 7 5 10 4 10 6 4 2 4 3 6
[1] 10 6 3 6
As written, however, the loop prints the right values for a = 20, but the output for a = 25 still carries leftover values from a = 20:
[1] 10 7 5 10 4 10 6 4 2 4 3 6
[1] 10 6 3 6 4 10 6 4 2 4 3 6
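The inner loop
for(i in 1:length(a.sub)){
  df[i] <- y[a.sub[i]]
}
can become
df <- y[a.sub]
This replaces df as a whole on every pass instead of overwriting only its first elements, so no stale values are left over. Neither a.sub nor df needs to be predefined then, and thus...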
x <- c(12, 27, 21, 16, 12, 21, 18, 16, 20, 23, 21, 10, 15, 26, 21, 22, 22, 19, 26, 26)
y <- c(8, 10, 7, 7, 9, 5, 7, 7, 10, 4, 10, 3, 9, 6, 4, 2, 4, 2, 3, 6)
a <- c(20,25)
for(j in 1:length(a)){
  a.sub <- which(x >= a[j])
  df <- y[a.sub]
  print(df)
}
It could be made shorter. df is unnecessary if you're just printing the subset of y anyway; just print it directly. The selector is short enough that putting it on a single line isn't confusing. Furthermore, why loop over the indices of a? Loop through a directly. So, it could be...
a <- c(20,25)
for(ax in a){
  print( y[ which(x >= ax) ] )
}
Not sure if this is a simplified version of a more complex problem, but I'd probably solve this using some direct indexing and an apply function. Something like this:
z <- cbind(x,y)
sapply(c(20,25), function(x) z[z[, 1] >= x, 2])
[[1]]
[1] 10 7 5 10 4 10 6 4 2 4 3 6
[[2]]
[1] 10 6 3 6
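If the subsets are needed later rather than just printed, one option is to collect them in a list with one element per threshold. A minimal sketch using lapply on the vectors directly (threshold and result are names chosen for this illustration):

```r
x <- c(12, 27, 21, 16, 12, 21, 18, 16, 20, 23, 21, 10, 15, 26, 21, 22, 22, 19, 26, 26)
y <- c(8, 10, 7, 7, 9, 5, 7, 7, 10, 4, 10, 3, 9, 6, 4, 2, 4, 2, 3, 6)
a <- c(20, 25)

# One element per threshold; each is built from scratch, so nothing is carried over.
result <- lapply(a, function(threshold) y[x >= threshold])
names(result) <- a      # label each element with its threshold ("20", "25")
result
```

Unlike sapply, lapply always returns a list, so the shape of the result does not change if the subsets happen to have equal lengths.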

Resources