Working on vectors and finding their square - r

I am getting myself familiar with R by working through some mathematical exercises. I am practicing indexing and the seq function, with help from the pages listed below.
First, I create a vector t with all the integers from 1 to 200, using the code below:
t <- 1:200
Now I want to display every 5th number from the vector above, which I am doing like this:
u <- seq (1,200, by=5)
First question: every 5th number should be 5, 10, 15, ..., but it is showing me 1, 6, 11, etc.
Next, I want to square some arbitrary elements of vector t, which I am doing like this:
square <- t[c(4, 6, 7, 9, 16, 24, 26, 29,30)]^2
Second question: this displays only the squares of those numbers. Without using loops, how can I display the whole vector with just those positions squared, like 1, 2, 3, 16, 5, 36, etc.?
I am using the web pages below for practice and understanding:
https://rspatial.org/intr/4-indexing.html
https://www.r-exercises.com/start-here-to-learn-r/

Another option is replace
t <- 1:200
v <- c(4, 6, 7, 9, 16, 24, 26, 29, 30)
replace(t, v, t[v]^2)
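replace(t, v, t[v]^2) returns a copy of t with the positions in v overwritten. A minimal sketch of the same idea using plain index assignment, assuming t and v as defined above:

```r
t <- 1:200
v <- c(4, 6, 7, 9, 16, 24, 26, 29, 30)

# Copy the vector, then overwrite only the selected positions with their squares
s <- t
s[v] <- s[v]^2
head(s, 10)  # first ten values: 1 2 3 16 5 36 49 8 81 10
```

Both forms leave t itself unchanged; the assignment form is convenient when you want to keep modifying the copy afterwards.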

We can use an ifelse
ifelse(seq_along(t) %in% c(4, 6, 7, 9, 16, 24, 26, 29,30), t^2, t)
Output:
[1] 1 2 3 16 5 36 49 8 81 10 11 12 13 14 15 256 17 18 19 20 21 22 23 576 25 676 27 28 841 900 31 32 33 34 35
[36] 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70
[71] 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105
[106] 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140
[141] 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175
[176] 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200
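For the first question: seq() starts counting at its first argument (from), so seq(1, 200, by = 5) gives 1, 6, 11, .... Starting at 5 instead gives the multiples of 5. A short sketch, assuming t is the vector 1:200 from the question:

```r
t <- 1:200

# Start the sequence at 5 instead of 1 to get 5, 10, 15, ..., 200
u <- seq(5, 200, by = 5)

# Or use a sequence as positions, taking every 5th element of t
every_fifth <- t[seq(5, length(t), by = 5)]
head(every_fifth)
```

Here both give the same numbers only because t[i] == i; the second form is the one to use when the vector holds other values.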

Related

How do I retrieve an element along with the list in which it resides from a list of lists?

Suppose I have a list of lists like the one below:
> myList
[[1]]
[1] 0 7 14 21 28 35 42 49 56 63 70 77 84 91 98 105 112 119 126 133 140 147 154 161 168 175 182 189 196 203 210 217
[[2]]
[1] 1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106 113 120 127 134 141 148 155 162 169 176 183 190 197 204 211 218
[[3]]
[1] 2 9 16 23 30 37 44 51 58 65 72 79 86 93 100 107 114 121 128 135 142 149 156 163 170 177 184 191 198 205 212 219
[[4]]
[1] 3 10 17 24 31 38 45 52 59 66 73 80 87 94 101 108 115 122 129 136 143 150 157 164 171 178 185 192 199 206 213 220
[[5]]
[1] 4 11 18 25 32 39 46 53 60 67 74 81 88 95 102 109 116 123 130 137 144 151 158 165 172 179 186 193 200 207 214 221
How do I search for an element in this list of lists and retrieve the entire list in which it belongs?
I tried something like below:
> myList[grep(7, myList)][[1]]
[1] 0 7 14 21 28 35 42 49 56 63 70 77 84 91 98 105 112 119 126 133 140 147 154 161 168 175 182 189 196 203 210 217
This case looks correct, but when I tried this for the below case, I got the wrong result.
> myList[grep(18, myList)][[1]]
[1] 0 7 14 21 28 35 42 49 56 63 70 77 84 91 98 105 112 119 126 133 140 147 154 161 168 175 182 189 196 203 210 217
while the correct output should be :
[1] 4 11 18 25 32 39 46 53 60 67 74 81 88 95 102 109 116 123 130 137 144 151 158 165 172 179 186 193 200 207 214 221
Is there any possible solution to this?
EDIT:
The sample list can be produced with:
l <- seq(0, 194)
myList <- list()
for (d in l) {
  temp <- intersect(seq(d, max(l), by = 7), l)
  # only start a new group if d is not already in an existing one
  if (!any(sapply(myList, function(x) d %in% x))) {
    myList <- append(myList, list(temp))
  }
}
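The same seven groups can probably be built without a loop: under my reading of the loop above, each group holds the numbers from 0 to 194 that share a remainder modulo 7. A minimal sketch under that assumption (split() names the groups "0" to "6", so unname() drops the names):

```r
l <- seq(0, 194)

# Group 0..194 by remainder modulo 7: 0, 7, 14, ... / 1, 8, 15, ... / etc.
myList <- unname(split(l, l %% 7))

length(myList)     # 7 groups
head(myList[[5]])  # the group that contains 18
```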
Could try:
myList[sapply(myList, function(x) any(x %in% 7))]
Use the purrr package:
library(purrr)
keep(myList, function(x, y) any(x == y), y = 18)
purrr provides many useful list-handling functions which are documented in a cheatsheet that can be found here
If 18 is a number you wish to find in the list, try:
myList[sapply(myList, function(x) 18 %in% x)]
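Why grep() goes wrong: it coerces each list element to a string such as "c(0, 7, 14, ...)" and does a text search, so 18 also matches inside 182 (which is in the first vector), and [[1]] then picks the first hit. A small sketch on a made-up two-element list, not the asker's data:

```r
# Hypothetical small list for illustration
myList <- list(c(0, 7, 182), c(4, 11, 18))

# grep() works on the deparsed strings "c(0, 7, 182)" and "c(4, 11, 18)",
# so the pattern "18" also matches the substring inside "182"
as.character(myList)
grep(18, myList)  # both elements match

# Exact membership test per element instead of a text search
hits <- vapply(myList, function(x) 18 %in% x, logical(1))
myList[hits]
```

vapply() behaves like the sapply() answers above but guarantees a logical vector, even for an empty list.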

R, how to create a sequence of numbers with gaps in it? [duplicate]

This question already has answers here:
Generate sequence with alternating increments in R? [duplicate]
(4 answers)
Closed 6 years ago.
How to efficiently generate the following sequence of numbers in R?
4, 5, 10, 11, 16, 17, ..., 178, 179
One way is to think of it as the union of two sequences.
sort(c(seq(4,178,6), seq(5,179,6)))
[1] 4 5 10 11 16 17 22 23 28 29 34 35 40 41 46 47 52 53 58
[20] 59 64 65 70 71 76 77 82 83 88 89 94 95 100 101 106 107 112 113
[39] 118 119 124 125 130 131 136 137 142 143 148 149 154 155 160 161 166 167 172
[58] 173 178 179
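Another option, assuming the pattern is always a pair of consecutive numbers every 6 steps, is to repeat each start value twice and add 0 and 1 alternately via recycling, which avoids the sort:

```r
starts <- seq(4, 178, by = 6)         # 4, 10, 16, ..., 178
x <- rep(starts, each = 2) + c(0, 1)  # c(0, 1) is recycled: 4, 5, 10, 11, ...
head(x)
tail(x)
```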

Creating a (half-) regular sequence - a regular rate of varying intervals

I want to create a kind of nested regular sequence in R. It follows a repeating pattern, but without consistent intervals between values. It is:
8, 9, 10, 11, 12, 13, 17, 18, 19, 20, 21, 22, 26, 27, 28, ....
So 6 numbers with an interval of 1, then an interval of 3, and then the same again. I'd like to have this all the way up to about 200, ideally being able to specify that end point.
I have tried using rep and seq, but do not know how to get the regularly varying interval length into either function.
I started plotting it and thinking about creating a step function based on the length... it can't be that difficult. What's the trick/magic package I don't know of?
Without doing any math to figure out how many groups and such, we can just over-generate.
Defining terminology, I'll say you have a bunch of groups of sequences, with 6 elements per group. We'll start with 100 groups to make sure we definitely cross the 200 threshold.
n_per_group = 6
n_groups = 100
# first generate a regular sequence, with no adjustments
x = seq(from = 8, length.out = n_per_group * n_groups)
# then calculate an adjustment to add
# as you say, the interval is 3 (well, 3 more than the usual 1)
adjustment = rep(0:(n_groups - 1), each = n_per_group) * 3
# if you prefer modular arithmetic, this is equivalent
# adjustment = ((seq_along(x) - 1) %/% n_per_group) * 3
# then we just add
x = x + adjustment
# and subset down to the desired result
x = x[x <= 200]
x
# [1] 8 9 10 11 12 13 17 18 19 20 21 22 26 27 28 29 30
# [18] 31 35 36 37 38 39 40 44 45 46 47 48 49 53 54 55 56
# [35] 57 58 62 63 64 65 66 67 71 72 73 74 75 76 80 81 82
# [52] 83 84 85 89 90 91 92 93 94 98 99 100 101 102 103 107 108
# [69] 109 110 111 112 116 117 118 119 120 121 125 126 127 128 129 130 134
# [86] 135 136 137 138 139 143 144 145 146 147 148 152 153 154 155 156 157
#[103] 161 162 163 164 165 166 170 171 172 173 174 175 179 180 181 182 183
#[120] 184 188 189 190 191 192 193 197 198 199 200
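The differences between successive values in the sequence are given by diffs, so take the cumsum of those. To make it run to about 200, use the indicated repetition count, where 1+1+1+1+1+4 = 9 is the length of one cycle; rep() truncates (200-8)/9 ≈ 21.3 to 21 repetitions, so the sequence stops at 197.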
diffs <- c(8, rep(c(1, 1, 1, 1, 1, 4), (200-8)/9))
cumsum(diffs)
giving:
[1] 8 9 10 11 12 13 17 18 19 20 21 22 26 27 28 29 30 31
[19] 35 36 37 38 39 40 44 45 46 47 48 49 53 54 55 56 57 58
[37] 62 63 64 65 66 67 71 72 73 74 75 76 80 81 82 83 84 85
[55] 89 90 91 92 93 94 98 99 100 101 102 103 107 108 109 110 111 112
[73] 116 117 118 119 120 121 125 126 127 128 129 130 134 135 136 137 138 139
[91] 143 144 145 146 147 148 152 153 154 155 156 157 161 162 163 164 165 166
[109] 170 171 172 173 174 175 179 180 181 182 183 184 188 189 190 191 192 193
[127] 197
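Since the question asks to be able to specify the end point, the modular-arithmetic idea from the first answer can be wrapped in a function. This is a sketch; gapped_seq and its argument names are hypothetical, not from any package:

```r
# Hypothetical helper: runs of `run` consecutive integers, where each new run
# starts `extra` later than plain counting would (13 -> 17 when extra = 3)
gapped_seq <- function(from, to, run = 6, extra = 3) {
  n <- to - from + 1                   # enough raw values to reach `to`
  i <- seq_len(n) - 1                  # 0-based positions
  x <- from + i + (i %/% run) * extra  # shift every run by `extra` per group
  x[x <= to]
}

x <- gapped_seq(8, 200)
head(x, 8)  # 8 9 10 11 12 13 17 18
tail(x, 4)  # 197 198 199 200
```

Over-generating n values is safe because each value is at least from + i, so the last raw value is always at or beyond `to` before subsetting.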
My first attempt would be a for loop, but keep in mind that loops are slow compared to built-in vectorized functions. Since you only want to "count" to 200, it should be fast enough.
result <- numeric(200)  # preallocate the output
result[1] <- 8          # first value of the sequence
for (i in 1:199) {
  if (i %% 6 != 0) {
    result[i + 1] <- result[i] + 1  # step of 1 inside a run of six numbers
  } else {
    result[i + 1] <- result[i] + 4  # jump to the next run, e.g. 13 -> 17
  }
}
result <- result[result <= 200]  # keep only values up to 200

Filtering my R data frame is causing it to sort the data frame incorrectly

Consider the following two code snippets.
A:
download.file("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FGDP.csv", destfile = "./data/gdp.csv", method = "curl" )
gdp <- read.csv('./data/gdp.csv', header=F, skip=5, nrows=190) # Specify nrows, get correct answer
download.file("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FEDSTATS_Country.csv", destfile = "./data/education.csv", method = "curl" )
education = read.csv('./data/education.csv')
mergedData <- merge(gdp, education, by.x='V1', by.y='CountryCode')
# No need to remove unranked countries because we specified nrows
# No need to convert V2 from factor to numeric
sortedMergedData = arrange(mergedData, -V2)
sortedMergedData[13,1] # Get KNA, correct answer
B:
download.file("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FGDP.csv", destfile = "./data/gdp.csv", method = "curl" )
gdp <- read.csv('./data/gdp.csv', header=F, skip=5) # Don't specify nrows, get incorrect answer
download.file("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FEDSTATS_Country.csv", destfile = "./data/education.csv", method = "curl" )
education = read.csv('./data/education.csv')
mergedData <- merge(gdp, education, by.x='V1', by.y='CountryCode')
mergedData = mergedData[which(mergedData$V2 != ""),] # Remove unranked countries
mergedData$V2 = as.numeric(mergedData$V2) # make V2 a numeric column
sortedMergedData = arrange(mergedData, -V2)
sortedMergedData[13,1] # Get SRB, incorrect answer
I would think the two code snippets would be identical, except that in A you never add the unranked countries to your dataframe and in B you add them but then remove them. Why is the sorting different for these two code snippets?
The file downloads are from Coursera's Getting and Cleaning Data class (Quiz 3, Question 3).
Edit: To avoid security concerns, I've pasted the raw .csv files below
gdp.csv - http://pastebin.com/raw.php?i=4aRZwBRd
education.csv - http://pastebin.com/raw.php?i=0pbhDCSX
Edit2: The problem is occurring in the as.numeric step. For case B, here is mergedData$V2 before and after mergedData$V2 = as.numeric(mergedData$V2) is applied:
> mergedData$V2
[1] 161 105 60 125 32 26 133 172 12 27 68 162 25 140 128 59 76 93
[19] 138 111 69 169 149 96 7 153 113 167 117 165 11 20 36 2 99 98
[37] 121 30 182 166 81 67 102 51 4 183 33 72 48 64 38 159 13 103
[55] 85 43 155 5 185 109 6 114 86 148 175 176 110 42 178 77 160 37
[73] 108 71 139 58 16 10 46 22 47 122 40 9 116 92 3 50 87 145
[91] 120 189 178 15 146 56 136 83 168 171 70 163 84 74 94 82 62 147
[109] 141 132 164 14 188 135 129 137 151 130 118 154 127 152 34 123 144 39
[127] 126 18 23 107 55 66 44 89 49 41 187 115 24 61 45 97 54 52
[145] 8 142 19 73 119 35 174 157 100 88 186 150 63 80 21 158 173 65
[163] 124 156 31 143 91 170 184 101 79 17 190 95 106 53 78 1 75 180
[181] 29 57 177 181 90 28 112 104 134
194 Levels: .. Not available. 1 10 100 101 102 103 104 105 106 107 ... Note: Rankings include only those economies with confirmed GDP estimates. Figures in italics are for 2011 or 2010.
> mergedData$V2 = as.numeric(mergedData$V2)
> mergedData$V2
[1] 72 10 149 32 118 111 41 84 26 112 157 73 110 49 35 147 166 185
[19] 46 17 158 80 58 188 159 63 19 78 23 76 15 105 122 104 191 190
[37] 28 116 94 77 172 156 7 139 126 95 119 162 135 153 124 69 37 8
[55] 176 130 65 137 97 14 148 20 177 57 87 88 16 129 90 167 71 123
[73] 13 161 47 146 70 4 133 107 134 29 127 181 22 184 115 138 178 54
[91] 27 101 90 59 55 144 44 174 79 83 160 74 175 164 186 173 151 56
[109] 50 40 75 48 100 43 36 45 61 38 24 64 34 62 120 30 53 125
[127] 33 91 108 12 143 155 131 180 136 128 99 21 109 150 132 189 142 140
[145] 170 51 102 163 25 121 86 67 5 179 98 60 152 171 106 68 85 154
[163] 31 66 117 52 183 82 96 6 169 81 103 187 11 141 168 3 165 92
[181] 114 145 89 93 182 113 18 9 42
Can anyone explain why the numbers change when I apply as.numeric()?
The real reason for the different results is that, in the second case, the full dataset has some footer notes, which read.csv also read. Because of the character entries in the footer, most of the columns became 'factor' class. This could have been avoided either by
reading only the data rows with the nrows argument of read.csv (skip only drops lines at the top of the file), or
using stringsAsFactors=FALSE in the read.csv call (the default since R 4.0.0) along with dropping the footer lines.
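A minimal sketch of the nrows fix on a made-up miniature of such a file (three data rows followed by one footer row; the footer text is invented for illustration):

```r
# Hypothetical miniature: data rows, then a footer with non-numeric text in V2
txt <- "USA,1\nJPN,10\nDEU,2\nNote,..\n"

# Reading everything pulls the footer into V2, so V2 is not numeric
# (character in R >= 4.0.0, factor in older versions)
all_rows <- read.csv(text = txt, header = FALSE)
class(all_rows$V2)

# nrows stops before the footer, so V2 is read as integer
ranked <- read.csv(text = txt, header = FALSE, nrows = 3)
class(ranked$V2)
```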
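The rows were ordered based on the "levels" of the factor: as.numeric() on a factor returns its integer codes, i.e. each value's position in the levels, and the levels are sorted as character strings ("1", "10", "100", ...), not as the ranks themselves.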
If you have already read the files without skipping the lines, convert to the respective classes. If it is 'numeric' column, convert it to numeric by as.numeric(as.character(df$column)) or as.numeric(levels(df$column))[df$column].
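A small sketch of the difference on a made-up three-value factor:

```r
f <- factor(c("10", "2", "1"))

levels(f)                    # "1" "10" "2": levels are sorted as strings
as.numeric(f)                # 2 3 1: the level codes, not the values
as.numeric(as.character(f))  # 10 2 1: the values
as.numeric(levels(f))[f]     # 10 2 1: same result, converting each level once
```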

R: number in a txt file split up by line

I have a problem reading a .txt file into R.
The data is something like this:
68 89 103 1
37 8 103 9
78 93 8 12
3 50
I used readLines() in R and ended up with a list. But when I compare it to the raw data, I find that, for example, the last "1" in the first line is not a separate 1; it should be joined to the start of the second line, making the number 137 instead of 1 and 37. I think the data is separated by " ". If I use readLines(), the lines are split up where the file wraps. How can I read it correctly?
Also, 9 is not joined to 78, since there is a space at the beginning of line 3; 12 is joined with 3 to form 123, since there is no space before 3.
Thanks. I don't even know how to search for my problem on Google, or how to express it.
182 63 68 152 130 134 145 152 98 152 182 88 95 105 130 137 167 152 81 71 84 126 134 152 116 130 91 63 68 84 95 152 105 152 63
102 152 63 77 112 140 77 119 152 161 167 105 112 145 161 182 152 81 95 84 91 102 108 130 134 91
1 2 1 4 3 6 1 1 5 2 1 5 2 3 4 5 5 1 2 6 1
63 102 119 161 161 172 179 88 91 95 105 112 119 119 137 145 167 172 91 98 108 112 134 137 161 161 179 71 174 95 105 134 134 1
37 140 145 150 150 68 68 130 137 77 95 112 137 161 174 81 84 126 134 161 161 174 68 77 98 102 102 102 112 88 88 91 98 112 134
134 137 137 140 140 152 152 77 179 112 71 71 74 77 112 116 116 140 140 167 77 95 126 150 88 126 130 130 134 63 74 84 84 88 9
1 95 108 134 137 179 81 88 105 116 123 140 145 152 161 161 179 88 95 112 119 126 126 150 157 179 68 68 84 102 105 119 123 123
137 161 179 182 140 152 182 182 81 63 88 134 84 134 182
7 11 9 2 9 4 6 7 6 1 13 2 1 10 4 5 11 11 9 12 1 3 1 3 3
Basically, this is what I am doing now.
For example, the vector
ind <- c(7, 11, 9, 2 ,9 ,4 ,6, 7, 6 ,1, 13, 2 ,1 ,10 ,4 ,5 ,11 ,11, 9 ,12, 1, 3 ,1, 3 ,3)
indicates that the block of numbers above should be split up according to the lengths given in the vector. I know I can split up a vector with
split(vector, rep(1:length(ind), ind))
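A small sketch of that split() call on made-up numbers, showing how rep(seq_along(ind), ind) labels each element with its group number:

```r
v <- 1:10
ind <- c(3, 2, 5)  # hypothetical group lengths; they must sum to length(v)

groups <- rep(seq_along(ind), ind)  # 1 1 1 2 2 3 3 3 3 3
split(v, groups)
```

seq_along(ind) is a safer spelling of 1:length(ind), since it returns an empty vector rather than c(1, 0) when ind is empty.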
However, the problem is I can't read the block of number correctly.
Based on the conditions you described, i.e. if a line starts with a space after you read the file with readLines, the last number on the previous line should be joined with the first number on the current line.
Using your second example (I didn't understand the ind though)
lines1 <- readLines(n=10)
182 63 68 152 130 134 145 152 98 152 182 88 95 105 130 137 167 152 81 71 84 126 134 152 116 130 91 63 68 84 95 152 105 152 63
102 152 63 77 112 140 77 119 152 161 167 105 112 145 161 182 152 81 95 84 91 102 108 130 134 91
1 2 1 4 3 6 1 1 5 2 1 5 2 3 4 5 5 1 2 6 1
63 102 119 161 161 172 179 88 91 95 105 112 119 119 137 145 167 172 91 98 108 112 134 137 161 161 179 71 174 95 105 134 134 1
37 140 145 150 150 68 68 130 137 77 95 112 137 161 174 81 84 126 134 161 161 174 68 77 98 102 102 102 112 88 88 91 98 112 134
134 137 137 140 140 152 152 77 179 112 71 71 74 77 112 116 116 140 140 167 77 95 126 150 88 126 130 130 134 63 74 84 84 88 9
1 95 108 134 137 179 81 88 105 116 123 140 145 152 161 161 179 88 95 112 119 126 126 150 157 179 68 68 84 102 105 119 123 123
137 161 179 182 140 152 182 182 81 63 88 134 84 134 182
lines2 <- lines1[lines1!=''] #remove blank lines
indx <- grep("^ ", lines2) #create a numeric index for lines that start with a space
indx1 <- indx-1 #index that is one above the previous `indx`
lines2[indx1] <- paste0(lines2[indx1], gsub("^\\s+", "", lines2[indx])) #paste the lines together using the two indexes
lines3 <- if (length(indx) > 0) lines2[-indx] else lines2 #remove the merged lines (lines2[-integer(0)] would drop every line)
lines3
#[1] "182 63 68 152 130 134 145 152 98 152 182 88 95 105 130 137 167 152 81 71 84 126 134 152 116 130 91 63 68 84 95 152 105 152 63102 152 63 77 112 140 77 119 152 161 167 105 112 145 161 182 152 81 95 84 91 102 108 130 134 91"
#[2] "1 2 1 4 3 6 1 1 5 2 1 5 2 3 4 5 5 1 2 6 1"
#[3] "63 102 119 161 161 172 179 88 91 95 105 112 119 119 137 145 167 172 91 98 108 112 134 137 161 161 179 71 174 95 105 134 134 1"
#[4] "37 140 145 150 150 68 68 130 137 77 95 112 137 161 174 81 84 126 134 161 161 174 68 77 98 102 102 102 112 88 88 91 98 112 134134 137 137 140 140 152 152 77 179 112 71 71 74 77 112 116 116 140 140 167 77 95 126 150 88 126 130 130 134 63 74 84 84 88 9"
#[5] "1 95 108 134 137 179 81 88 105 116 123 140 145 152 161 161 179 88 95 112 119 126 126 150 157 179 68 68 84 102 105 119 123 123137 161 179 182 140 152 182 182 81 63 88 134 84 134 182"
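The question itself states the rule the other way round: a line that starts with a space begins a new number, while a line that does not continues the last number of the previous line. Under that reading, which is an assumption about the file, collapsing the lines with no separator already does the join, because the leading space acts as the separator. A sketch using the question's first example, with the leading space on line 3 written out explicitly:

```r
# First example from the question; line 3 starts with a space
lines <- c("68 89 103 1", "37 8 103 9", " 78 93 8 12", "3 50")

# Joining with "" glues "1" + "37" and "12" + "3",
# while the leading space keeps "9" and "78" apart
joined <- paste(lines, collapse = "")

nums <- as.numeric(strsplit(trimws(joined), "[[:space:]]+")[[1]])
nums  # 68 89 103 137 8 103 9 78 93 8 123 50
```

With a real file, lines would come from readLines() on that file (blank lines removed first), and nums could then be passed to the split() call from the question.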

Resources