Add sequence to each element of a vector - r

I have a vector as indicated below
x <- c(1,32,60,86,115,142,171,198)
I would like to create a sequence as seq(x[i],x[i]+2,by=1) for each element of the vector. The resulting vector should be
1,2,3,32,33,34,60,61,62,86,87,88.....
I was wondering if there is a function similar to rep to do this? Appreciate your input on this.

We can use the vectorized rep
rep(x, each = 3) + 0:2
#[1] 1 2 3 32 33 34 60 61 62 86 87 88 115 116 117 142 143
#[18] 144 171 172 173 198 199 200

You can use sapply to loop over every element of x, generate a sequence of numbers for each, and combine the results with c
c(sapply(x, function(x) seq(x, x+2)))
# [1] 1 2 3 32 33 34 60 61 62 86 87 88 115 116 117 142 143 144 171 172 173
# 198 199 200
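If the run length needs to vary, both ideas generalise. The following is only a sketch: `k` and `offsets` are illustrative names that do not appear in the question, and the `outer` variant is an alternative to the two answers above rather than something they proposed.

```r
x <- c(1, 32, 60, 86, 115, 142, 171, 198)

k <- 3                      # hypothetical run length (the question uses 3)
offsets <- seq_len(k) - 1   # 0, 1, ..., k - 1

# rep() repeats each element k times; the offsets are recycled along the result
res_rep <- rep(x, each = k) + offsets

# outer() builds a length(x)-by-k matrix of sums; transposing first makes
# as.vector() read it one original element at a time
res_outer <- as.vector(t(outer(x, offsets, "+")))

identical(res_rep, res_outer)
```

The recycling in the `rep` version is exact only because the length of `rep(x, each = k)` is a multiple of `k`.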

Related

Optimal way to reshape dataframe in R to have observation on columns

Given e.g. the Orange data set, I would like to arrange the observations in a matrix in which the measurements (circumference) taken on each tree are arranged in rows (for a total of 5 rows).
One unsatisfactory way of obtaining this result is as follows:
mat<-matrix(Orange[,3],nrow=5, ncol = 7,byrow=T, dimnames = list(c(unique(Orange$Tree)),c(1:7)))
An alternative way would be using the dcast( ) function within the data.table package.
This allows you to convert data from long to wide. In this case, I've created an ID to count the number of records per Tree.
In the re-shaped data, Tree becomes our primary column and circumference is recorded in 7 unique columns (one for each age).
library(data.table)
Orange <- data.table(Orange)[, ID := seq_len(.N), by = Tree]
Orange2 <- dcast(
data = Orange,
formula = Tree ~ ID,
value.var = "circumference")
Orange2
Tree 1 2 3 4 5 6 7
1: 3 30 51 75 108 115 139 140
2: 1 30 58 87 115 120 142 145
3: 5 30 49 81 125 142 174 177
4: 2 33 69 111 156 172 203 203
5: 4 32 62 112 167 179 209 214
EDIT (in response to additional comments/questions):
Technically the data is already ordered by Tree (defined within the data). This is because the variable Tree is a factor variable with preset levels. To order numerically, there are two options: (1) order by as.character( ) and (2) re-level the variable.
Orange2[order(as.character(Tree)),]
1: 1 30 58 87 115 120 142 145
2: 2 33 69 111 156 172 203 203
3: 3 30 51 75 108 115 139 140
4: 4 32 62 112 167 179 209 214
5: 5 30 49 81 125 142 174 177
class(Orange$Tree)
[1] "ordered" "factor"
levels(Orange$Tree)
[1] "3" "1" "5" "2" "4"
Orange2[,Tree := factor(Tree, c("1","2","3","4","5"), ordered = FALSE)]
Orange2[order(Tree),]
Tree 1 2 3 4 5 6 7
1: 1 30 58 87 115 120 142 145
2: 2 33 69 111 156 172 203 203
3: 3 30 51 75 108 115 139 140
4: 4 32 62 112 167 179 209 214
5: 5 30 49 81 125 142 174 177
In base, you could simply do:
aggregate(circumference ~ Tree, Orange, I)
If you don't want to order it afterwards: aggregate(circumference ~ as.character(Tree), Orange, I) (that will strip the factor ordering).
Or similar to @RyanF:
Orange$id <- sequence(rle(as.character(Orange$Tree))$lengths)
reshape(Orange[,-2],
idvar = "Tree",
timevar = "id",
direction = "wide")
Output:
Tree circumference.1 circumference.2 circumference.3 circumference.4 circumference.5 circumference.6 circumference.7
1 1 30 58 87 115 120 142 145
8 2 33 69 111 156 172 203 203
15 3 30 51 75 108 115 139 140
22 4 32 62 112 167 179 209 214
29 5 30 49 81 125 142 174 177
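If a plain matrix is all that is needed, as in the question, another base R route is to split the measurements by tree and bind them as rows. This is a sketch rather than one of the answers above; `orange_df`, `by_tree` and `circ_mat` are illustrative names, and it assumes every tree has the same number of records.

```r
# Work on a fresh copy of the built-in data set as a plain data frame
orange_df <- as.data.frame(datasets::Orange)

# One vector of circumferences per tree; as.character() drops the factor's
# preset level order, so the list names sort as "1", "2", ..., "5"
by_tree <- split(orange_df$circumference, as.character(orange_df$Tree))

# Stack the per-tree vectors as rows of a matrix
circ_mat <- do.call(rbind, by_tree)

dim(circ_mat)
circ_mat["1", ]
```

The row names of `circ_mat` are the tree labels, so rows can be looked up by tree without any reordering step.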

I've a vector from which I want to select its elements that aren't divisible by 3 nor 5

I've the following vector:
set.seed(1)
v2 <- sample(1:200,50)
I've been trying to select all numbers that are not divisible by 3 nor 5 but I haven't had success
nodivisibles <- function(v2){
prueba <- (v2%%3==0)
tres <- v2[-prueba]
pruebados <- (v2%%5==0)
cinco <- v2[-pruebados]
list(tres,cinco)
}
As others mentioned, use ! (not -) to negate a logical vector.
You can do it in one line without the function.
v2[!(v2%%3==0 | v2%%5==0)]
# > v2[!(v2%%3==0 | v2%%5==0)]
# [1] 179 176 184 128 121 196 34 133 182 169 38 116 23 68 67 59 82 83
# 32 131 118 173 103 124 4 74 112
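To see why the attempt in the question does not work, it helps to look at what `-` does to a logical vector: TRUE becomes -1 and FALSE becomes 0, so the result is treated as a set of numeric indices, not as a mask. The small vector `v` below is made up for illustration and is not the sampled `v2`.

```r
v <- c(9, 10, 11, 14, 15)

is_div3 <- v %% 3 == 0   # TRUE FALSE FALSE FALSE TRUE

# Arithmetic negation gives c(-1, 0, 0, 0, -1): zeros are ignored and -1
# drops the first element only
dropped_first <- v[-is_div3]

# Logical negation flips the mask, which is what was intended
not_div3 <- v[!is_div3]

# Keep values divisible by neither 3 nor 5
neither <- v[v %% 3 != 0 & v %% 5 != 0]

dropped_first
not_div3
neither
```

The last line is the same condition as `!(v %% 3 == 0 | v %% 5 == 0)`, written with the negation pushed inside.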

Creating a (half-) regular sequence - a regular rate of varying intervals

I want to create a kind of nested regular sequence in R. It follows a repeating pattern, but without consistent intervals between values. It is:
8, 9, 10, 11, 12, 13, 17, 18, 19, 20, 21, 22, 26, 27, 28, ....
So 6 numbers with an interval of 1, then an interval of 3, and then the same again. I'd like to have this all the way up to about 200, ideally being able to specify that end point.
I have tried using rep and seq, but do not know how to get the regularly varying interval length into either function.
I started plotting it and thinking about creating a step function based on the length... it can't be that difficult - what's the trick/magic package I don't know of??
Without doing any math to figure out how many groups and such, we can just over-generate.
Defining terminology, I'll say you have a bunch of groups of sequences, with 6 elements per group. We'll start with 100 groups to make sure we definitely cross the 200 threshold.
n_per_group = 6
n_groups = 100
# first generate a regular sequence, with no adjustments
x = seq(from = 8, length.out = n_per_group * n_groups)
# then calculate an adjustment to add
# as you say, the interval is 3 (well, 3 more than the usual 1)
adjustment = rep(0:(n_groups - 1), each = n_per_group) * 3
# if you prefer integer division, this is equivalent
# adjustment = ((seq_along(x) - 1) %/% n_per_group) * 3
# then we just add
x = x + adjustment
# and subset down to the desired result
x = x[x <= 200]
x
# [1] 8 9 10 11 12 13 17 18 19 20 21 22 26 27 28 29 30
# [18] 31 35 36 37 38 39 40 44 45 46 47 48 49 53 54 55 56
# [35] 57 58 62 63 64 65 66 67 71 72 73 74 75 76 80 81 82
# [52] 83 84 85 89 90 91 92 93 94 98 99 100 101 102 103 107 108
# [69] 109 110 111 112 116 117 118 119 120 121 125 126 127 128 129 130 134
# [86] 135 136 137 138 139 143 144 145 146 147 148 152 153 154 155 156 157
#[103] 161 162 163 164 165 166 170 171 172 173 174 175 179 180 181 182 183
#[120] 184 188 189 190 191 192 193 197 198 199 200
The differences between successive values in the sequence are given by diffs below, so take the cumsum of those. To get it to go to about 200, repeat the pattern (200-8)/9 times, where 9 = 1+1+1+1+1+4 is the length of one full cycle (rep truncates a fractional count).
diffs <- c(8, rep(c(1, 1, 1, 1, 1, 4), (200-8)/9))
cumsum(diffs)
giving:
[1] 8 9 10 11 12 13 17 18 19 20 21 22 26 27 28 29 30 31
[19] 35 36 37 38 39 40 44 45 46 47 48 49 53 54 55 56 57 58
[37] 62 63 64 65 66 67 71 72 73 74 75 76 80 81 82 83 84 85
[55] 89 90 91 92 93 94 98 99 100 101 102 103 107 108 109 110 111 112
[73] 116 117 118 119 120 121 125 126 127 128 129 130 134 135 136 137 138 139
[91] 143 144 145 146 147 148 152 153 154 155 156 157 161 162 163 164 165 166
[109] 170 171 172 173 174 175 179 180 181 182 183 184 188 189 190 191 192 193
[127] 197
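Since the question asks for a configurable end point, the same idea can be wrapped in a small function. This is a sketch only: the function name `half_regular` and its arguments `start`, `run`, `gap` and `end` are invented here, with defaults taken from the sequence in the question (runs of 6, then a jump of 4 from 13 to 17).

```r
half_regular <- function(start = 8, run = 6, gap = 4, end = 200) {
  if (end < start) return(numeric(0))
  # Distance between the first values of consecutive runs (8 -> 17 is 9)
  period <- (run - 1) + gap
  # Number of runs whose first value does not exceed `end`
  n_groups <- (end - start) %/% period + 1
  run_starts <- start + period * (seq_len(n_groups) - 1)
  # Expand each run start into `run` consecutive values
  x <- rep(run_starts, each = run) + (seq_len(run) - 1)
  # The last run may overshoot, so trim it
  x[x <= end]
}

s <- half_regular()
head(s, 8)
tail(s, 4)
```

Computing the number of runs up front avoids over-generating; the final subset is still needed because the last run can extend past `end`.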
My first attempt would use a for loop, but keep in mind that loops are slow compared to built-in functions. As you only want to "count" to 200, it should be fast enough.
result = zeros(1, 200);
result(1) = 8;  % first value of the sequence
for i = 1:199
    if mod(i, 6) ~= 0
        result(i+1) = result(i) + 1;  % within a run of six
    else
        result(i+1) = result(i) + 4;  % jump to the next run (13 -> 17)
    end
end
Note: I do not have Matlab on my computer at the time of answering, so the code above is untested, but I hope you get the idea.
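For readers working in R, the loop idea might be written as follows. This is a sketch that assumes runs of six values followed by a jump of 4, as in the question's sequence; the names `n`, `step` and `result` are chosen here for illustration.

```r
n <- 200                 # number of elements to generate before trimming
result <- numeric(n)
result[1] <- 8           # first value of the sequence

for (i in seq_len(n - 1)) {
  # Every sixth element ends a run, so the next step jumps by 4
  step <- if (i %% 6 != 0) 1 else 4
  result[i + 1] <- result[i] + step
}

# Keep only the values up to the desired end point
result <- result[result <= 200]
head(result, 8)
```

Preallocating `result` with `numeric(n)` keeps the loop from growing the vector on every iteration.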

Return id numbers if missing over a set of variables

If I have a large database, including an 'id' var, I want to list all variables of interest, and return back to myself a list of ids that are missing each particular variable.
#Fake Data:
set.seed(11100)
missdata<-data.frame(id<-1:1000,C1<-sample(c(1,NA),1000,replace=TRUE,prob=c(.8,.2)), C2<-sample(c(1,NA),1000,replace=TRUE,prob=c(.8,.2)))
names(missdata)<-c("id","v1","v2")
#One variable solution:
missdatatest<-subset(missdata, is.na(v1),select=id)
missdatatest[1:10,]
> missdatatest[1:10,]
[1] 5 30 44 47 48 49 57 65 68 74
#Looking to build a function...
FindMissings<-function(indata,varslist,printvar){
printonevar<-function(var){
missdatalist<-subset(indata, is.na(var),select=printvar)
print(missdatalist)
}
lapply(vars,printonevar)
}
#Run function:
vars<-c("v1","v2")
FindMissings(missdata,vars,id)
#Error:
> FindMissings(missdata,vars,id)
Error in `[.data.frame`(x, r, vars, drop = drop) : undefined columns selected
Any help would be appreciated. I originally wrote a function to do this in SAS, and it works perfectly fine, but I'm trying to move a lot of my work into R.
There's no need for such a function. Just use lapply:
> lapply(missdata[-1], function(x) which(is.na(x)))
$v1
[1] 5 30 44 47 48 49 57 65 68 74 89 103 107 110 115 119 152 167
[19] 175 176 194 197 199 202 204 212 215 223 231 232 233 239 245 280 281 293...
<<SNIP>>
$v2
[1] 3 6 18 19 22 23 27 28 33 38 41 50 51 55 60 66 68 77
[19] 81 84 86 96 97 99 109 116 117 134 139 141 143 146 148 153 165 168...
<<SNIP>>
If you specifically wanted to return the values from your "id" column (not just the position of the NA values), you can modify the statement to be:
lapply(missdata[-1], function(x) missdata$id[which(is.na(x))])
If your concern is how to use this approach for specific variables, it's pretty straightforward:
vars <- c("v1","v2")
lapply(missdata[vars], function(x) which(is.na(x)))
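If a reusable function is still wanted, one way to sidestep the error in the question is to pass column names as strings and index with `[[ ]]`, since `subset()` evaluates its arguments non-standardly and tends to misbehave inside functions. The sketch below uses a made-up data frame `toy` and a hypothetical function name `find_missings`; neither comes from the thread.

```r
find_missings <- function(indata, varslist, idvar = "id") {
  # For each variable name, return the ids of the rows where it is NA
  out <- lapply(varslist, function(v) indata[[idvar]][is.na(indata[[v]])])
  stats::setNames(out, varslist)
}

# Small illustrative data set
toy <- data.frame(
  id = 101:105,
  v1 = c(1, NA, 1, NA, 1),
  v2 = c(NA, 1, 1, 1, NA)
)

res <- find_missings(toy, c("v1", "v2"))
res
```

The result is a named list with one vector of ids per variable, the same shape as the `lapply` output above.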

Create a for loop which prints every number that is x%%3=0 between 1-200

Like the title says I need a for loop which will write every number from 1 to 200 that is evenly divided by 3.
Every other method posted so far generates the 1:200 vector then throws away two thirds of it. What a waste. In an attempt to be eco-conscious, this method does not waste any electrons:
seq(3,200,by=3)
You don't need a for loop; use the which function instead, as in:
which(1:200 %% 3 == 0)
[1] 3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72 75 78 81
[28] 84 87 90 93 96 99 102 105 108 111 114 117 120 123 126 129 132 135 138 141 144 147 150 153 156 159 162
[55] 165 168 171 174 177 180 183 186 189 192 195 198
Two other alternatives:
(1:200)[c(FALSE, FALSE, TRUE)]
c(1:200)[1:200 %% 3 == 0]
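If an explicit for loop is required, as the title asks, a minimal version could look like the sketch below. Collecting the matches in `divisible_by_3` (a name chosen here) is optional and only there so the result can be compared with the vectorised answers.

```r
divisible_by_3 <- integer(0)

for (i in 1:200) {
  if (i %% 3 == 0) {
    print(i)                                 # print each match as it is found
    divisible_by_3 <- c(divisible_by_3, i)   # fine for a short loop like this
  }
}

length(divisible_by_3)
```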
