How do I flip numerical data in R? - r

I am an R beginner but have thus far been able to find answers to my questions by googling. After a few days of searching I still can't figure this out though.
I have a dataset with cognitive test results. Most tests are scored so that higher scores are better. ONE test is scored in the opposite way, so that lower scores are better (completion time of the task). I want to combine three tests (so values from three columns in my dataframe) but first I need to flip the values of this one test.
By flip I mean that my lowest value (i.e. the fastest completion time and best score) instead gets the highest value, and that the highest value (i.e. the slowest completion time and worst score) gets the lowest value. My data is numerical.
I have tried the dense_rank() function as well as the rev() function. dense_rank() returns a vector where the values are ranked but the spread of the values is not kept. rev() only reverses the order of the values in the vector; it does not change the values themselves.
Example code:
> (.packages())
[1] "readxl" "rethinking" "parallel" "rstan" "StanHeaders" "uwIntroStats"
[7] "ggplot2" "dplyr" "quantreg" "SparseM" "foreign" "aod"
[13] "stats" "graphics" "grDevices" "utils" "datasets" "methods"
[19] "base"
> testresults <- seq(from = 12, to = 120, by = 2)
>
> testresults
[1] 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58
[25] 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 100 102 104 106
[49] 108 110 112 114 116 118 120
> test.frame <- data.frame(testresults, rev(testresults), rank(testresults))
> test.frame
testresults rev.testresults. rank.testresults.
1 12 120 1
2 14 118 2
3 16 116 3
4 18 114 4
5 20 112 5
6 22 110 6
7 24 108 7
8 26 106 8
9 28 104 9
10 30 102 10
11 32 100 11
12 34 98 12
13 36 96 13
14 38 94 14
15 40 92 15
16 42 90 16
17 44 88 17
18 46 86 18
19 48 84 19
20 50 82 20
21 52 80 21
22 54 78 22
23 56 76 23
24 58 74 24
25 60 72 25
26 62 70 26
27 64 68 27
28 66 66 28
29 68 64 29
30 70 62 30
31 72 60 31
32 74 58 32
33 76 56 33
34 78 54 34
35 80 52 35
36 82 50 36
37 84 48 37
38 86 46 38
39 88 44 39
40 90 42 40
41 92 40 41
42 94 38 42
43 96 36 43
44 98 34 44
45 100 32 45
46 102 30 46
47 104 28 47
48 106 26 48
49 108 24 49
50 110 22 50
51 112 20 51
52 114 18 52
53 116 16 53
54 118 14 54
55 120 12 55
I am sure I have overlooked a simple solution to this problem, thank you in advance to anyone who can help or point me in the right direction.
Best,
Maria

You can subtract your values from the maximum value and then add the minimum value. For example:
x <- seq(1, 5, by = .4)
x
[1] 1.0 1.4 1.8 2.2 2.6 3.0 3.4 3.8 4.2 4.6 5.0
(max(x) - x) + min(x)
[1] 5.0 4.6 4.2 3.8 3.4 3.0 2.6 2.2 1.8 1.4 1.0
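Applied to the situation in the question, the same idea might look like the sketch below. The data frame, its column names and the helper name flip() are made up for illustration; only the max-minus-value-plus-min arithmetic comes from the answer above.

```r
# Hypothetical data: two "higher is better" scores and one completion time
scores <- data.frame(
  memory   = c(10, 14, 18),
  fluency  = c(30, 25, 40),
  time_sec = c(45, 60, 30)   # lower is better
)

# Mirror the values around the middle of their range:
# the minimum becomes the maximum and the spacing between values is kept
flip <- function(x) max(x, na.rm = TRUE) + min(x, na.rm = TRUE) - x

scores$time_flipped <- flip(scores$time_sec)
scores$time_flipped
# [1] 45 30 60
```

The fastest time (30) now carries the highest value (60) and the slowest time (60) the lowest (30), while the differences between the values are unchanged. If the three tests are on different scales, they may still need standardising before being combined.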

Related

Can RStudio show number of changes done to an object after a line is run (as in Stata)?

In Stata, when changing the values of variables (or performing other related operations), the output includes a note reporting the number of changes made.
Is there a way to obtain similar commentary in RStudio?
For instance, sometimes I want to check how many changes a command made (partly to see whether the command worked, or to gauge the extent of a potential problem in the data). Currently, I have to inspect the data manually or do a pretty uninformative comparison using all(), for instance.
Base R doesn't do this, but you could write a function to do it, and then instead of saying
x <- y
you'd say
x <- showChanges(x, y)
For example,
library(waldo)
showChanges <- function(oldval, newval) {
print(compare(oldval, newval))
newval
}
set.seed(123)
x <- 1:100
x <- showChanges(x, x + rbinom(100, size = 1, prob = 0.01))
#> `old[21:27]`: 21 22 23 24 25 26 27
#> `new[21:27]`: 21 22 23 25 25 26 27
x
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
#> [19] 19 20 21 22 23 25 25 26 27 28 29 30 31 32 33 34 35 36
#> [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
#> [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
#> [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
#> [91] 91 92 93 94 95 96 97 98 99 100
Created on 2021-10-21 by the reprex package (v2.0.0)
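If only the number of changes is wanted, as in Stata's note, a base R variant along the following lines might be enough. This is a minimal sketch, not part of the answer above: countChanges() is a hypothetical name, and it assumes both vectors have the same length.

```r
# Hypothetical helper: report how many elements differ, then return the new value
countChanges <- function(oldval, newval) {
  stopifnot(length(oldval) == length(newval))
  # A position counts as changed if the values differ,
  # or if exactly one of the two is NA; NA -> NA is not a change
  differs <- (oldval != newval) | (is.na(oldval) != is.na(newval))
  n <- sum(differs, na.rm = TRUE)
  message(sprintf("(%d change(s) made)", n))
  newval
}

x <- c(1, 2, NA, 4)
x <- countChanges(x, replace(x, is.na(x), 0))
x
# [1] 1 2 0 4
```

Here the message reports one change, because only the NA was replaced.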

Generate sequence with alternating increments in R? [duplicate]

I want to use R to create the sequence of numbers 1:8, 11:18, 21:28, etc. through 1000 (or the closest it can get, i.e. 998). Obviously typing that all out would be tedious, but since the sequence increases by one 7 times and then jumps by 3 I'm not sure what function I could use to achieve this.
I tried seq(1, 998, c(1,1,1,1,1,1,1,3)) but it does not give me the results I am looking for so I must be doing something wrong.
This is a perfect case for vectorisation (and recycling) in R; both are worth reading about.
(1:100)[rep(c(TRUE,FALSE), c(8,2))]
# [1] 1 2 3 4 5 6 7 8 11 12 13 14 15 16 17 18 21 22 23 24 25 26 27 28 31 32
#[27] 33 34 35 36 37 38 41 42 43 44 45 46 47 48 51 52 53 54 55 56 57 58 61 62 63 64
#[53] 65 66 67 68 71 72 73 74 75 76 77 78 81 82 83 84 85 86 87 88 91 92 93 94 95 96
#[79] 97 98
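To spell out what the recycling does, the sketch below builds the logical mask explicitly and applies it to the full range asked for in the question. The names mask and s are just illustrative.

```r
# Ten-element pattern: keep 8 values, drop 2
mask <- rep(c(TRUE, FALSE), times = c(8, 2))

# The mask is shorter than 1:1000, so R recycles it 100 times along the index
s <- (1:1000)[mask]

length(s)
# [1] 800
tail(s, 3)
# [1] 996 997 998
```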
rep(seq(0,990,by=10), each=8) + seq(1,8)
You want to exclude numbers that are 0 or 9 (mod 10). So you can try this too:
n <- 1000 # upper bound
x <- 1:n
x <- x[! (x %% 10) %in% c(0,9)] # filter out (0, 9) mod (10)
head(x,80)
# [1] 1 2 3 4 5 6 7 8 11 12 13 14 15 16 17 18 21 22 23 24 25 26 27
# 28 31 32 33 34 35 36 37 38 41 42 43 44 45 46 47 48 51 52 53 54 55 56 57
# 58 61 62 63 64 65 66 67 68 71 72 73 74 75 76 77 78 81 82 83 84 85
# 86 87 88 91 92 93 94 95 96 97 98
Or in a single line using Filter:
Filter(function(x) !((x %% 10) %in% c(0,9)), 1:100)
# [1] 1 2 3 4 5 6 7 8 11 12 13 14 15 16 17 18 21 22 23 24 25 26 27 28 31 32 33 34 35 36 37 38 41 42 43 44 45 46 47 48 51 52 53 54 55 56 57
# [48] 58 61 62 63 64 65 66 67 68 71 72 73 74 75 76 77 78 81 82 83 84 85 86 87 88 91 92 93 94 95 96 97 98
With a loop (the result vector must be initialised first):
vector <- numeric(0)
for (value in seq(1, 991, 10)) vector <- c(vector, seq(value, value + 7))

How to write OR condition inside which in R

I am unable to figure out how to write an OR condition inside which() in R.
This statement does not work.
which(value>100 | value<=200)
I know it is a very basic thing, but I am unable to find the right solution.
Every value is either larger than 100 or smaller-or-equal to 200. Maybe you need other numbers or & instead of |? Otherwise, there is no problem with that statement, the syntax is correct:
> value <- c(110, 2, 3, 4, 120)
> which(value>100 | value<=200)
[1] 1 2 3 4 5
> which(value>100 | value<=2)
[1] 1 2 5
> which(value>100 & value<=200)
[1] 1 5
> which(iris$Species == "setosa" | iris$Species == "versicolor")
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
[19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
[37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
[55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
[73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
[91] 91 92 93 94 95 96 97 98 99 100
also works. Remember to fully qualify the names of the variables you are selecting, such as iris$Species in this example (and not only Species).
Have a look at the documentation (?which).
Also note that whatever you do with which() can generally be done in a faster and better way without it, for example by subsetting with the logical vector directly.
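As a small illustration of that last point, the sketch below reuses the value vector from the first answer; in_range is just an illustrative name.

```r
value <- c(110, 2, 3, 4, 120)

# The condition itself is a logical vector, one TRUE/FALSE per element
in_range <- value > 100 & value <= 200

value[in_range]          # subset directly, no which() needed
# [1] 110 120
value[which(in_range)]   # same result via positions
# [1] 110 120
sum(in_range)            # how many elements match
# [1] 2
```

which() is still the right tool when the positions themselves are needed rather than the matching values.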

Indexing a matrix with another matrix in R - index out of bounds

I am experiencing some strange behavior in R when trying to index a matrix with another matrix. I get a subscript out of bounds error when indexing with a two-column matrix, but not with a four-column matrix. See the following reproducible code. Any insight would be appreciated!
This
data <- matrix(rbinom(100, 1, .5), nrow = 10)
idx <- cbind(1:50, 51:100)
data[idx]
results in:
Error in data[idx] : subscript out of bounds
However
data[cbind(idx,idx)]
works.
My session info:
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin15.5.0 (64-bit)
Running under: OS X 10.11.5 (El Capitan)
The key insight as to why this isn't working is given in ?'[':
When indexing arrays by [ a single argument i can be a matrix with as many columns as there are dimensions of x; the result is then a vector with elements corresponding to the sets of indices in each row of i.
so it is clear why the subscript out of bounds error arises: data doesn't have 50 rows and 100 columns.
What's happening in the second example is that the indexing matrix is just being treated as a vector, because it has more columns than the matrix being indexed has dimensions, so it extracts elements c(1:100, 1:100) from data.
This is more easily seen with
m <- matrix(1:100, ncol = 10, byrow = TRUE)
and indexing with cbind(idx, idx) gives
> m[cbind(idx,idx)]
[1] 1 11 21 31 41 51 61 71 81 91 2 12 22 32 42 52 62 72
[19] 82 92 3 13 23 33 43 53 63 73 83 93 4 14 24 34 44 54
[37] 64 74 84 94 5 15 25 35 45 55 65 75 85 95 6 16 26 36
[55] 46 56 66 76 86 96 7 17 27 37 47 57 67 77 87 97 8 18
[73] 28 38 48 58 68 78 88 98 9 19 29 39 49 59 69 79 89 99
[91] 10 20 30 40 50 60 70 80 90 100 1 11 21 31 41 51 61 71
[109] 81 91 2 12 22 32 42 52 62 72 82 92 3 13 23 33 43 53
[127] 63 73 83 93 4 14 24 34 44 54 64 74 84 94 5 15 25 35
[145] 45 55 65 75 85 95 6 16 26 36 46 56 66 76 86 96 7 17
[163] 27 37 47 57 67 77 87 97 8 18 28 38 48 58 68 78 88 98
[181] 9 19 29 39 49 59 69 79 89 99 10 20 30 40 50 60 70 80
[199] 90 100
which is the same as
m[c(idx[,1], idx[,2], idx[,1], idx[,2])]
or specifically,
m[c(1:50, 51:100, 1:50, 51:100)]
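For contrast, here is a sketch of what a valid two-column index matrix looks like for the same m: each row is one (row, column) pair, and every pair has to lie within dim(m). The name idx2 is made up for illustration.

```r
m <- matrix(1:100, ncol = 10, byrow = TRUE)

# Each row of idx2 picks one cell: m[1, 1], m[2, 5] and m[10, 10]
idx2 <- cbind(c(1, 2, 10), c(1, 5, 10))
m[idx2]
# [1]   1  15 100

# A linear (vector) index walks the matrix column by column instead
m[15]        # 15th element in column-major order, i.e. m[5, 2]
# [1] 42
```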

Shingles with lattice package's equal.count()

Why does the equal.count() function create overlapping shingles when it is clearly possible to create groupings with no overlap? Also, on what basis are the overlaps decided?
For example:
equal.count(1:100,4)
Data:
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
[23] 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
[45] 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66
[67] 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88
[89] 89 90 91 92 93 94 95 96 97 98 99 100
Intervals:
min max count
1 0.5 40.5 40
2 20.5 60.5 40
3 40.5 80.5 40
4 60.5 100.5 40
Overlap between adjacent intervals:
[1] 20 20 20
Wouldn't it be better to create groups of size 25 ? Or maybe I'm missing something that makes this functionality useful?
The overlap smooths transitions between the shingles (which, as the name says, overlap on the roof), but a better choice would have been to use some windowing function such as in spectral analysis.
I believe it is a pre-historic relic, because the behavior goes back to some very old pre-lattice code and is used in coplot remembered only by veteRans. lattice::equal.count calls co.intervals in graphics, where you will find some explanation. Try:
lattice::equal.count(1:100, 4, overlap = 0)
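The effect of the overlap argument can also be inspected directly with co.intervals() from the graphics package, which needs no extra package. This is only a sketch; the names iv_default and iv_none are illustrative.

```r
x <- 1:100

# Default: adjacent intervals share half of their points (overlap = 0.5)
iv_default <- co.intervals(x, number = 4, overlap = 0.5)
iv_default

# overlap = 0: adjacent intervals should only touch at their boundaries
iv_none <- co.intervals(x, number = 4, overlap = 0)
iv_none
```

Each result is a two-column matrix of lower and upper interval limits, one row per interval.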
