Why does the equal.count() function create overlapping shingles when it is clearly possible to create groupings with no overlap? Also, on what basis are the overlaps decided?
For example:
equal.count(1:100,4)
Data:
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
[23] 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
[45] 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66
[67] 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88
[89] 89 90 91 92 93 94 95 96 97 98 99 100
Intervals:
min max count
1 0.5 40.5 40
2 20.5 60.5 40
3 40.5 80.5 40
4 60.5 100.5 40
Overlap between adjacent intervals:
[1] 20 20 20
Wouldn't it be better to create groups of size 25? Or maybe I'm missing something that makes this functionality useful?
The overlap smooths transitions between the shingles (which, as the name says, overlap on the roof), but a better choice would have been to use some windowing function such as in spectral analysis.
I believe it is a prehistoric relic: the behavior goes back to some very old pre-lattice code, is used in coplot, and is remembered only by veterans. lattice::equal.count calls co.intervals in the graphics package, where you will find some explanation. Try:
lattice::equal.count(1:100, 4, overlap = 0)
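For a concrete check, co.intervals() in the graphics package (the function equal.count() calls) returns disjoint intervals when asked for zero overlap. The small counting helper below is my own addition, purely for illustration:

```r
x <- 1:100
iv <- co.intervals(x, number = 4, overlap = 0)  # graphics::co.intervals
iv  # four disjoint (min, max) interval rows

# count the data points falling inside each interval (illustrative helper);
# with 100 points, 4 intervals, and no overlap, expect four groups of 25
apply(iv, 1, function(r) sum(x >= r[1] & x <= r[2]))
```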
Related
I am an R beginner but have thus far been able to find answers to my questions by googling. After a few days of searching I still can't figure this out though.
I have a dataset with cognitive test results. Most tests are scored so that higher scores are better. ONE test is scored in the opposite way, so that lower scores are better (completion time of the task). I want to combine three tests (so values from three columns in my dataframe) but first I need to flip the values of this one test.
By flip I mean that my lowest value (i.e. the fastest completion time and best score) instead gets the highest value, and the highest value (i.e. the slowest completion time and worst score) gets the lowest value. My data is numerical.
I have tried the dense_rank() function as well as the rev() function. dense_rank() returns a vector where the values are ranked but the spread of the values is not kept, and rev() only reverses the order of the values in the vector; it does not change the values themselves.
Example code:
> (.packages())
[1] "readxl" "rethinking" "parallel" "rstan" "StanHeaders" "uwIntroStats"
[7] "ggplot2" "dplyr" "quantreg" "SparseM" "foreign" "aod"
[13] "stats" "graphics" "grDevices" "utils" "datasets" "methods"
[19] "base"
> testresults <- seq(from = 12, to = 120, by = 2)
>
> testresults
[1] 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58
[25] 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 100 102 104 106
[49] 108 110 112 114 116 118 120
> test.frame <- data.frame(testresults, rev(testresults), rank(testresults))
> test.frame
testresults rev.testresults. rank.testresults.
1 12 120 1
2 14 118 2
3 16 116 3
4 18 114 4
5 20 112 5
6 22 110 6
7 24 108 7
8 26 106 8
9 28 104 9
10 30 102 10
11 32 100 11
12 34 98 12
13 36 96 13
14 38 94 14
15 40 92 15
16 42 90 16
17 44 88 17
18 46 86 18
19 48 84 19
20 50 82 20
21 52 80 21
22 54 78 22
23 56 76 23
24 58 74 24
25 60 72 25
26 62 70 26
27 64 68 27
28 66 66 28
29 68 64 29
30 70 62 30
31 72 60 31
32 74 58 32
33 76 56 33
34 78 54 34
35 80 52 35
36 82 50 36
37 84 48 37
38 86 46 38
39 88 44 39
40 90 42 40
41 92 40 41
42 94 38 42
43 96 36 43
44 98 34 44
45 100 32 45
46 102 30 46
47 104 28 47
48 106 26 48
49 108 24 49
50 110 22 50
51 112 20 51
52 114 18 52
53 116 16 53
54 118 14 54
55 120 12 55
I am sure I have overlooked a simple solution to this problem, thank you in advance to anyone who can help or point me in the right direction.
Best,
Maria
You can subtract your values from the maximum value and then add the minimum value. For example:
x <- seq(1, 5, by = .4)
x
[1] 1.0 1.4 1.8 2.2 2.6 3.0 3.4 3.8 4.2 4.6 5.0
(max(x) - x) + min(x)
[1] 5.0 4.6 4.2 3.8 3.4 3.0 2.6 2.2 1.8 1.4 1.0
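Applied to the test scores from the question, the same max-minus-value-plus-min trick flips the values in place (order preserved, spread preserved); here `testresults` is re-created as in the question:

```r
testresults <- seq(from = 12, to = 120, by = 2)
flipped <- max(testresults) - testresults + min(testresults)  # i.e. 132 - testresults
head(flipped)   # 120 118 116 114 112 110: the best (lowest) time now scores highest
range(flipped)  # 12 120: the spread is preserved, unlike with rank()
```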
I would like to take a random sample of rows from a data.frame, apply a function to the subset, then take a sample from the remaining rows, apply the function to the new subset (with different parameters), and so on.
A simple example would be if 5% of a population dies each month, in month 2 I need the population minus those ones who died in time month 1.
I have put together a very verbose method of doing this, in which I save the IDs from the sampled rows and then subset them out of the data for the second period, etc.
library(data.table)
dt <- data.table(Number=1:100, ID=paste0("A", 1:100))
first <- dt[sample(nrow(dt), nrow(dt) * .05)]$ID
mean(dt[ID %in% first]$Number)
second <- dt[!(ID %in% first)][sample(nrow(dt[!(ID %in% first)]),
                                      nrow(dt[!(ID %in% first)]) * .05)]$ID
mean(dt[ID %in% c(first, second)]$Number)
dt[!(ID %in% first)][!(ID %in% second)] # ...
Obviously, this is not sustainable past a couple periods. What is the better way to do this? I imagine this is a standard method but couldn't think what to look for specifically. Thanks for any and all input.
This shows how to "grow" a vector of items, sampling 5% of the remainder at each interval:
removed <- numeric(0)
for (i in 1:10) {
  removed <- c(removed, sample((1:100)[!(1:100) %in% removed],  # items still in
                               (100 - length(removed)) * .05))  # 5% of remainder
  cat(c(removed, "\n"))  # print to console with each iteration
}
54 1 76 96 93
54 1 76 96 93 81 16 13 79
54 1 76 96 93 81 16 13 79 80 74 30 29
54 1 76 96 93 81 16 13 79 80 74 30 29 52 33 86 19
54 1 76 96 93 81 16 13 79 80 74 30 29 52 33 86 19 34 32 41 62
54 1 76 96 93 81 16 13 79 80 74 30 29 52 33 86 19 34 32 41 62 5 70 8
54 1 76 96 93 81 16 13 79 80 74 30 29 52 33 86 19 34 32 41 62 5 70 8 66 82 50
54 1 76 96 93 81 16 13 79 80 74 30 29 52 33 86 19 34 32 41 62 5 70 8 66 82 50 6 91 99
54 1 76 96 93 81 16 13 79 80 74 30 29 52 33 86 19 34 32 41 62 5 70 8 66 82 50 6 91 99 46 27 51
54 1 76 96 93 81 16 13 79 80 74 30 29 52 33 86 19 34 32 41 62 5 70 8 66 82 50 6 91 99 46 27 51 22 23 20
Notice that the number of items added to the list of "removals" decreases with each iteration: 5% of an ever-shrinking remainder, truncated to an integer by sample().
This question already has answers here:
Get a seq() in R with alternating steps
(6 answers)
Closed 6 years ago.
I want to use R to create the sequence of numbers 1:8, 11:18, 21:28, etc. through 1000 (or the closest it can get, i.e. 998). Obviously typing that all out would be tedious, but since the sequence increases by one 7 times and then jumps by 3 I'm not sure what function I could use to achieve this.
I tried seq(1, 998, c(1,1,1,1,1,1,1,3)) but it does not give me the results I am looking for so I must be doing something wrong.
This is a perfect case for vectorisation (and recycling) in R; read about them.
(1:100)[rep(c(TRUE,FALSE), c(8,2))]
# [1] 1 2 3 4 5 6 7 8 11 12 13 14 15 16 17 18 21 22 23 24 25 26 27 28 31 32
#[27] 33 34 35 36 37 38 41 42 43 44 45 46 47 48 51 52 53 54 55 56 57 58 61 62 63 64
#[53] 65 66 67 68 71 72 73 74 75 76 77 78 81 82 83 84 85 86 87 88 91 92 93 94 95 96
#[79] 97 98
rep(seq(0,990,by=10), each=8) + seq(1,8)
You want to exclude numbers that are 0 or 9 (mod 10). So you can try this too:
n <- 1000 # upper bound
x <- 1:n
x <- x[! (x %% 10) %in% c(0,9)] # filter out (0, 9) mod (10)
head(x,80)
# [1] 1 2 3 4 5 6 7 8 11 12 13 14 15 16 17 18 21 22 23 24 25 26 27
# 28 31 32 33 34 35 36 37 38 41 42 43 44 45 46 47 48 51 52 53 54 55 56 57
# 58 61 62 63 64 65 66 67 68 71 72 73 74 75 76 77 78 81 82 83 84 85
# 86 87 88 91 92 93 94 95 96 97 98
Or in a single line using Filter:
Filter(function(x) !((x %% 10) %in% c(0,9)), 1:100)
# [1] 1 2 3 4 5 6 7 8 11 12 13 14 15 16 17 18 21 22 23 24 25 26 27 28 31 32 33 34 35 36 37 38 41 42 43 44 45 46 47 48 51 52 53 54 55 56 57
# [48] 58 61 62 63 64 65 66 67 68 71 72 73 74 75 76 77 78 81 82 83 84 85 86 87 88 91 92 93 94 95 96 97 98
With a loop (initialising the result vector first): result <- integer(0); for (value in seq(1, 991, 10)) result <- c(result, seq(value, value + 7))
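Equivalently, the whole sequence can be built in one vectorised expression with outer(), adding each offset 0, 10, ..., 990 to 1:8 (the same arithmetic as the rep() answer above):

```r
s <- as.vector(outer(1:8, seq(0, 990, by = 10), `+`))
head(s, 10)  # 1 2 3 4 5 6 7 8 11 12
tail(s, 2)   # 997 998
length(s)    # 800 values in total
```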
I am unable to figure out how I can write an OR condition inside which() in R.
This statement does not work:
which(value>100 | value<=200)
I know it is a very basic thing, but I am unable to find the right solution.
Every number is either larger than 100 or smaller than or equal to 200, so that condition is TRUE for every element. Maybe you need other numbers, or & instead of |? Otherwise, there is no problem with that statement; the syntax is correct:
> value <- c(110, 2, 3, 4, 120)
> which(value>100 | value<=200)
[1] 1 2 3 4 5
> which(value>100 | value<=2)
[1] 1 2 5
> which(value>100 & value<=200)
[1] 1 5
> which(iris$Species == "setosa" | iris$Species == "virginica")
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
 [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
 [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50 101 102 103 104
 [55] 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122
 [73] 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140
 [91] 141 142 143 144 145 146 147 148 149 150
does work. Remember to fully qualify the names of the variables you are selecting, as iris$Species in the example at hand (and not only Species).
Have a look at the documentation here.
Also notice that whatever you do with which can generally be done without it, in a faster and better way.
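For instance, when the goal is simply to subset, direct logical indexing does the same job without the intermediate index vector (a small illustrative sketch, not from the original answer):

```r
value <- c(110, 2, 3, 4, 120)
# via which(): build indices, then subset
value[which(value > 100 & value <= 200)]  # 110 120
# direct logical subsetting, usually preferred
value[value > 100 & value <= 200]         # 110 120
# the two differ only when NAs are present: which() silently drops them,
# while logical indexing propagates them as NA entries
```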
The max.print option (queried with getOption("max.print")) can be used to limit the number of values printed by a single function call. For example:
options(max.print=20)
print(cars)
prints only the first 10 rows of 2 columns. However, max.print doesn't work very well for lists. Especially if they are deeply nested, the number of lines printed to the console can still be practically unbounded.
Is there any way to specify a harder cutoff of the amount that can be printed to the screen? For example by specifying the amount of lines after which the printing can be interrupted? Something that also protects against printing huge recursive objects?
Based in part on this question, I would suggest just building a wrapper for print that uses capture.output to regulate what is printed:
print2 <- function(x, nlines = 10, ...)
  cat(head(capture.output(print(x, ...)), nlines), sep = "\n")
For example:
> print2(list(1:10000,1:10000))
[[1]]
[1] 1 2 3 4 5 6 7 8 9 10 11 12
[13] 13 14 15 16 17 18 19 20 21 22 23 24
[25] 25 26 27 28 29 30 31 32 33 34 35 36
[37] 37 38 39 40 41 42 43 44 45 46 47 48
[49] 49 50 51 52 53 54 55 56 57 58 59 60
[61] 61 62 63 64 65 66 67 68 69 70 71 72
[73] 73 74 75 76 77 78 79 80 81 82 83 84
[85] 85 86 87 88 89 90 91 92 93 94 95 96
[97] 97 98 99 100 101 102 103 104 105 106 107 108