Creating a (half-) regular sequence - a regular rate of varying intervals - r

I want to create a kind of nested regular sequence in R. It follows a repeating pattern, but without consistent intervals between values. It is:
8, 9, 10, 11, 12, 13, 17, 18, 19, 20, 21, 22, 26, 27, 28, ....
So 6 numbers with an interval of 1, then an interval of 3, and then the same again. I'd like to have this all the way up to about 200, ideally being able to specify that end point.
I have tried using rep and seq, but do not know how to get the regularly varying interval length into either function.
I started plotting it and thinking about creating a step function based on the length... it can't be that difficult. What's the trick or magic package I don't know of?

Without doing any math to figure out how many groups and such, we can just over-generate.
To define some terminology, I'll say you have a bunch of groups of sequences, with 6 elements per group. We'll start with 100 groups to make sure we definitely cross the 200 threshold.
n_per_group = 6
n_groups = 100
# first generate a regular sequence, with no adjustments
x = seq(from = 8, length.out = n_per_group * n_groups)
# then calculate an adjustment to add
# as you say, the interval is 3 (well, 3 more than the usual 1)
adjustment = rep(0:(n_groups - 1), each = n_per_group) * 3
# if you prefer integer division, this is equivalent
# adjustment = ((seq_along(x) - 1) %/% n_per_group) * 3
# then we just add
x = x + adjustment
# and subset down to the desired result
x = x[x <= 200]
x
# [1] 8 9 10 11 12 13 17 18 19 20 21 22 26 27 28 29 30
# [18] 31 35 36 37 38 39 40 44 45 46 47 48 49 53 54 55 56
# [35] 57 58 62 63 64 65 66 67 71 72 73 74 75 76 80 81 82
# [52] 83 84 85 89 90 91 92 93 94 98 99 100 101 102 103 107 108
# [69] 109 110 111 112 116 117 118 119 120 121 125 126 127 128 129 130 134
# [86] 135 136 137 138 139 143 144 145 146 147 148 152 153 154 155 156 157
#[103] 161 162 163 164 165 166 170 171 172 173 174 175 179 180 181 182 183
#[120] 184 188 189 190 191 192 193 197 198 199 200
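Since the question asks for a configurable end point, the same idea can be wrapped in a function. This is only a sketch: the name half_regular_seq and its arguments are made up for illustration, and it builds the groups from their start values with outer() rather than over-generating.

```r
# Hypothetical helper; the name and argument names are illustrative only
half_regular_seq <- function(from = 8, to = 200, n_per_group = 6, gap = 3) {
  # each group starts n_per_group + gap after the previous group's start
  starts <- seq(from, to, by = n_per_group + gap)
  # add the within-group offsets 0..(n_per_group - 1) to every start;
  # c() flattens the matrix column by column, i.e. group by group
  x <- c(outer(seq_len(n_per_group) - 1, starts, "+"))
  # drop anything past the requested end point
  x[x <= to]
}

half_regular_seq(to = 30)
# [1]  8  9 10 11 12 13 17 18 19 20 21 22 26 27 28 29 30
```

It assumes to is not smaller than from; otherwise seq() stops with an error.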

The differences between successive values in the sequence are given by diffs, so take the cumsum of those. To get it to go to about 200, use the indicated repetition count, where 1+1+1+1+1+4 = 9 is the total increase per group.
diffs <- c(8, rep(c(1, 1, 1, 1, 1, 4), (200-8)/9))
cumsum(diffs)
giving:
[1] 8 9 10 11 12 13 17 18 19 20 21 22 26 27 28 29 30 31
[19] 35 36 37 38 39 40 44 45 46 47 48 49 53 54 55 56 57 58
[37] 62 63 64 65 66 67 71 72 73 74 75 76 80 81 82 83 84 85
[55] 89 90 91 92 93 94 98 99 100 101 102 103 107 108 109 110 111 112
[73] 116 117 118 119 120 121 125 126 127 128 129 130 134 135 136 137 138 139
[91] 143 144 145 146 147 148 152 153 154 155 156 157 161 162 163 164 165 166
[109] 170 171 172 173 174 175 179 180 181 182 183 184 188 189 190 191 192 193
[127] 197
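The version above stops at 197 because it only emits whole groups. If the sequence should run right up to a chosen end point, one possible variation (a sketch, with end and cycle as illustrative names) is to over-generate the increments and trim afterwards:

```r
end <- 200
# one cycle of increments: five steps of 1, then a step of 4
cycle <- c(1, 1, 1, 1, 1, 4)
# `end` increments of at least 1 each are guaranteed to pass `end`
diffs <- c(8, rep(cycle, length.out = end))
x <- cumsum(diffs)
# keep only the values up to the end point
x <- x[x <= end]
tail(x, 4)
# [1] 197 198 199 200
```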

My first attempt would be a for loop, but keep in mind that loops are slow compared to built-in vectorized functions. Since you only want to count to about 200, it should be fast enough.
result = numeric(200)
result[1] = 8
for (i in 1:199) {
  # every 6th step jumps by 4 (13 -> 17); all other steps add 1
  if (i %% 6 != 0) {
    result[i + 1] = result[i] + 1
  } else {
    result[i + 1] = result[i] + 4
  }
}
# keep only the values up to the end point
result = result[result <= 200]
Note: I was not able to run this at the time of answering, so treat the code above as untested, but I hope you get the idea.

Related

Working on vectors and finding their square

I am getting familiar with R by working through some mathematical exercises. I am currently working on indexing and the seq function.
First I create a vector with all the integers from 1 to 200, using the code below:
t <- 1:200
Now I want to display every 5th number from that vector, which I do like this:
u <- seq (1,200, by=5)
First question: every 5th number should be 5, 10, 15, but it is showing me 1, 6, 11, etc.
Next I want to take the square of some arbitrary numbers from vector t, which I do as follows:
square <- t[c(4, 6, 7, 9, 16, 24, 26, 29,30)]^2
Second question: this displays only the squares of those numbers. Without using loops, how can I display the whole vector with those positions squared, like 1, 2, 3, 16, 5, 36, etc.?
I am using the web pages below for practice and understanding:
https://rspatial.org/intr/4-indexing.html
https://www.r-exercises.com/start-here-to-learn-r/
Another option is replace
t <- 1:200
v <- c(4, 6, 7, 9, 16, 24, 26, 29, 30)
replace(t, v, t[v]^2)
We can use an ifelse
ifelse(seq_along(t) %in% c(4, 6, 7, 9, 16, 24, 26, 29,30), t^2, t)
-output
[1] 1 2 3 16 5 36 49 8 81 10 11 12 13 14 15 256 17 18 19 20 21 22 23 576 25 676 27 28 841 900 31 32 33 34 35
[36] 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70
[71] 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105
[106] 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140
[141] 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175
[176] 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200
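On the first question: seq(1, 200, by = 5) starts counting at 1, which is why it returns 1, 6, 11 and so on. A minimal sketch of two ways to get every 5th number instead, assuming the vector t from the question (u2 is an illustrative name):

```r
t <- 1:200
# start the sequence at 5 instead of 1
u <- seq(5, 200, by = 5)
# or index t with a recycled logical vector that keeps every 5th element
u2 <- t[c(FALSE, FALSE, FALSE, FALSE, TRUE)]
head(u)
# [1]  5 10 15 20 25 30
```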

How do I retrieve an element along with the list in which it resides from a list of lists?

Suppose I have a list of lists like the one below:
> myList
[[1]]
[1] 0 7 14 21 28 35 42 49 56 63 70 77 84 91 98 105 112 119 126 133 140 147 154 161 168 175 182 189 196 203 210 217
[[2]]
[1] 1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106 113 120 127 134 141 148 155 162 169 176 183 190 197 204 211 218
[[3]]
[1] 2 9 16 23 30 37 44 51 58 65 72 79 86 93 100 107 114 121 128 135 142 149 156 163 170 177 184 191 198 205 212 219
[[4]]
[1] 3 10 17 24 31 38 45 52 59 66 73 80 87 94 101 108 115 122 129 136 143 150 157 164 171 178 185 192 199 206 213 220
[[5]]
[1] 4 11 18 25 32 39 46 53 60 67 74 81 88 95 102 109 116 123 130 137 144 151 158 165 172 179 186 193 200 207 214 221
How do I search for an element in this list of lists and retrieve the entire list in which it belongs?
I tried something like below:
> myList[grep(7, myList)][[1]]
[1] 0 7 14 21 28 35 42 49 56 63 70 77 84 91 98 105 112 119 126 133 140 147 154 161 168 175 182 189 196 203 210 217
This case looks correct, but when I tried this for the below case, I got the wrong result.
> myList[grep(18, myList)][[1]]
[1] 0 7 14 21 28 35 42 49 56 63 70 77 84 91 98 105 112 119 126 133 140 147 154 161 168 175 182 189 196 203 210 217
while the correct output should be :
[1] 4 11 18 25 32 39 46 53 60 67 74 81 88 95 102 109 116 123 130 137 144 151 158 165 172 179 186 193 200 207 214 221
Is there any possible solution to this?
EDIT::
The sample list can be produced using --
l <- seq(0, 194)
myList <- list()
for (d in l){
temp <- intersect(seq(d, max(l), by = 7),l)
if (any(sapply(myList,function(x) d %in% x)) == FALSE){
myList <- append(myList, list(temp))
}
}
Could try:
myList[sapply(myList, function(x) any(x %in% 7))]
Use purrr package:
library(purrr)
keep(myList, function(x, y) {any(x == y)}, y = 18)
purrr provides many useful list-handling functions, which are documented in the purrr cheatsheet.
If 18 is a number you wish to find in the list, try:
myList[sapply(myList, function(x) 18 %in% x)]
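A likely explanation for the grep result, offered as a sketch rather than a definitive diagnosis: grep works on character data, so each inner vector is coerced to a single string, and "18" then matches as a substring of values such as 182 or 189 in the first element. Comparing values with %in% avoids that. The helper name find_containing is made up for illustration, and the list is rebuilt with lapply on the assumption that it is equivalent to the question's data.

```r
# rebuild a list like the one in the question (assumed equivalent)
myList <- lapply(0:6, function(d) seq(d, 194, by = 7))

# hypothetical helper: return every inner vector that contains `value`
find_containing <- function(lst, value) {
  Filter(function(x) value %in% x, lst)
}

res <- find_containing(myList, 18)
res[[1]][1:5]
# [1]  4 11 18 25 32
```

The result is still a list, so res[[1]] extracts the matching vector itself.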

R, how to create a sequence of numbers with gaps in it? [duplicate]

How to efficiently generate the following sequence of numbers in R?
4, 5, 10, 11, 16, 17, ..., 178, 179
One way is to think of it as the union of two sequences.
sort(c(seq(4,178,6), seq(5,179,6)))
[1] 4 5 10 11 16 17 22 23 28 29 34 35 40 41 46 47 52 53 58
[20] 59 64 65 70 71 76 77 82 83 88 89 94 95 100 101 106 107 112 113
[39] 118 119 124 125 130 131 136 137 142 143 148 149 154 155 160 161 166 167 172
[58] 173 178 179
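Because each pair starts 6 after the previous one, the same result can also be built in order, without sorting. A minimal sketch using rep and vector recycling:

```r
# pair starts 4, 10, ..., 178, each repeated twice, plus offsets 0 and 1
x <- rep(seq(4, 178, by = 6), each = 2) + 0:1
head(x)
# [1]  4  5 10 11 16 17
```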

Filtering my R data frame is causing it to sort the data frame incorrectly

Consider the following two code snippets.
A:
download.file("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FGDP.csv", destfile = "./data/gdp.csv", method = "curl" )
gdp <- read.csv('./data/gdp.csv', header=F, skip=5, nrows=190) # Specify nrows, get correct answer
download.file("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FEDSTATS_Country.csv", destfile = "./data/education.csv", method = "curl" )
education = read.csv('./data/education.csv')
mergedData <- merge(gdp, education, by.x='V1', by.y='CountryCode')
# No need to remove unranked countries because we specified nrows
# No need to convert V2 from factor to numeric
sortedMergedData = arrange(mergedData, -V2)
sortedMergedData[13,1] # Get KNA, correct answer
B:
download.file("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FGDP.csv", destfile = "./data/gdp.csv", method = "curl" )
gdp <- read.csv('./data/gdp.csv', header=F, skip=5) # Don't specify nrows, get incorrect answer
download.file("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FEDSTATS_Country.csv", destfile = "./data/education.csv", method = "curl" )
education = read.csv('./data/education.csv')
mergedData <- merge(gdp, education, by.x='V1', by.y='CountryCode')
mergedData = mergedData[which(mergedData$V2 != ""),] # Remove unranked countries
mergedData$V2 = as.numeric(mergedData$V2) # make V2 a numeric column
sortedMergedData = arrange(mergedData, -V2)
sortedMergedData[13,1] # Get SRB, incorrect answer
I would think the two code snippets would be identical, except that in A you never add the unranked countries to your dataframe and in B you add them but then remove them. Why is the sorting different for these two code snippets?
The file downloads are from Coursera's Getting and Cleaning Data class (Quiz 3, Question 3).
Edit: To avoid security concerns, I've pasted the raw .csv files below
gdp.csv - http://pastebin.com/raw.php?i=4aRZwBRd
education.csv - http://pastebin.com/raw.php?i=0pbhDCSX
Edit2: The problem is occurring in the as.numeric step. For case B, here is mergedData$V2 before and after mergedData$V2 = as.numeric(mergedData$V2) is applied:
> mergedData$V2
[1] 161 105 60 125 32 26 133 172 12 27 68 162 25 140 128 59 76 93
[19] 138 111 69 169 149 96 7 153 113 167 117 165 11 20 36 2 99 98
[37] 121 30 182 166 81 67 102 51 4 183 33 72 48 64 38 159 13 103
[55] 85 43 155 5 185 109 6 114 86 148 175 176 110 42 178 77 160 37
[73] 108 71 139 58 16 10 46 22 47 122 40 9 116 92 3 50 87 145
[91] 120 189 178 15 146 56 136 83 168 171 70 163 84 74 94 82 62 147
[109] 141 132 164 14 188 135 129 137 151 130 118 154 127 152 34 123 144 39
[127] 126 18 23 107 55 66 44 89 49 41 187 115 24 61 45 97 54 52
[145] 8 142 19 73 119 35 174 157 100 88 186 150 63 80 21 158 173 65
[163] 124 156 31 143 91 170 184 101 79 17 190 95 106 53 78 1 75 180
[181] 29 57 177 181 90 28 112 104 134
194 Levels: .. Not available. 1 10 100 101 102 103 104 105 106 107 ... Note: Rankings include only those economies with confirmed GDP estimates. Figures in italics are for 2011 or 2010.
> mergedData$V2 = as.numeric(mergedData$V2)
> mergedData$V2
[1] 72 10 149 32 118 111 41 84 26 112 157 73 110 49 35 147 166 185
[19] 46 17 158 80 58 188 159 63 19 78 23 76 15 105 122 104 191 190
[37] 28 116 94 77 172 156 7 139 126 95 119 162 135 153 124 69 37 8
[55] 176 130 65 137 97 14 148 20 177 57 87 88 16 129 90 167 71 123
[73] 13 161 47 146 70 4 133 107 134 29 127 181 22 184 115 138 178 54
[91] 27 101 90 59 55 144 44 174 79 83 160 74 175 164 186 173 151 56
[109] 50 40 75 48 100 43 36 45 61 38 24 64 34 62 120 30 53 125
[127] 33 91 108 12 143 155 131 180 136 128 99 21 109 150 132 189 142 140
[145] 170 51 102 163 25 121 86 67 5 179 98 60 152 171 106 68 85 154
[163] 31 66 117 52 183 82 96 6 169 81 103 187 11 141 168 3 165 92
[181] 114 145 89 93 182 113 18 9 42
Can anyone explain why the numbers change when I apply as.numeric()?
The real reason for the different results is that in the second case the full dataset includes footer notes, which read.csv also reads. Because of the character elements in the footer, most of the columns end up as class 'factor'. This could have been avoided by either
limiting the rows read with the nrows argument in read.csv (as in snippet A; skip only drops lines at the top of the file), or
using stringsAsFactors=FALSE in the read.csv call and then removing the footer rows.
Calling as.numeric() directly on a factor returns the integer codes of its levels, not the original numbers, so the column ended up ordered by the factor "levels".
If you have already read the files with the footer included, convert the columns to their proper classes. For a column that should be numeric, use as.numeric(as.character(df$column)) or as.numeric(levels(df$column))[df$column].
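The effect is easy to reproduce on a tiny factor. This is a minimal sketch with made-up values standing in for the V2 column:

```r
# a factor holding numbers as text, like V2 in case B
f <- factor(c("161", "105", "60"))
levels(f)                     # levels are sorted as strings
# [1] "105" "161" "60"
as.numeric(f)                 # level codes, not the values
# [1] 2 1 3
as.numeric(as.character(f))   # the original numbers
# [1] 161 105  60
as.numeric(levels(f))[f]      # same result via the levels
# [1] 161 105  60
```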

Create a for loop which prints every number that is x%%3=0 between 1-200

Like the title says I need a for loop which will write every number from 1 to 200 that is evenly divided by 3.
Every other method posted so far generates the 1:200 vector then throws away two thirds of it. What a waste. In an attempt to be eco-conscious, this method does not waste any electrons:
seq(3,200,by=3)
You don't need a for loop; use the which function instead, as in:
which(1:200 %% 3 == 0)
[1] 3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72 75 78 81
[28] 84 87 90 93 96 99 102 105 108 111 114 117 120 123 126 129 132 135 138 141 144 147 150 153 156 159 162
[55] 165 168 171 174 177 180 183 186 189 192 195 198
Two other alternatives:
(1:200)[c(FALSE, FALSE, TRUE)]
(1:200)[1:200 %% 3 == 0]
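If the exercise really does require a for loop, as the question says, a minimal sketch could look like this. The hits vector is an illustrative addition that collects the printed values so they can be reused afterwards.

```r
# collect the matches as well as printing them
hits <- integer(0)
for (i in 1:200) {
  # %% is the remainder operator; a remainder of 0 means divisible by 3
  if (i %% 3 == 0) {
    print(i)
    hits <- c(hits, i)
  }
}
```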
