Stop printing after n number of lines - r

The getOption("max.print") can be used to limit the number of values that can be printed from a single function call. For example:
options(max.print=20)
print(cars)
prints only the first 10 rows of 2 columns. However, max.print doesn't work very well lists. Especially if they are nested deeply, the amount of lines printed to the console can still be infinite.
Is there any way to specify a harder cutoff of the amount that can be printed to the screen? For example by specifying the amount of lines after which the printing can be interrupted? Something that also protects against printing huge recursive objects?

Based in part on this question, I would suggest just building a wrapper for print that uses capture.output to regulate what is printed:
print2 <- function(x, nlines=10,...)
cat(head(capture.output(print(x,...)), nlines), sep="\n")
For example:
> print2(list(1:10000,1:10000))
[[1]]
[1] 1 2 3 4 5 6 7 8 9 10 11 12
[13] 13 14 15 16 17 18 19 20 21 22 23 24
[25] 25 26 27 28 29 30 31 32 33 34 35 36
[37] 37 38 39 40 41 42 43 44 45 46 47 48
[49] 49 50 51 52 53 54 55 56 57 58 59 60
[61] 61 62 63 64 65 66 67 68 69 70 71 72
[73] 73 74 75 76 77 78 79 80 81 82 83 84
[85] 85 86 87 88 89 90 91 92 93 94 95 96
[97] 97 98 99 100 101 102 103 104 105 106 107 108

Related

How do I flip numerical data in R?

I am an R beginner but have thus far been able to find answers to my questions by googling. After a few days of searching I still can't figure this out though.
I have a dataset with cognitive test results. Most tests are scored so that higher scores are better. ONE test is scored in the opposite way, so that lower scores are better (completion time of the task). I want to combine three tests (so values from three columns in my dataframe) but first I need to flip the values of this one test.
By flip I mean that my lowest value (i.e. fastest completion time and best score) instead gets the highest value and that the highetst value (i.e. the slowest completion time and worst score) gets the lowest value. My data is numerical.
I have tried the dense_rank() function as well as the rev() function. dense_rank() returns a vector where the values are ranked but where the spread of the values are not kept and rev() only reverses the order of the values in the vector, it does not change the values themselves.
Example code:
> (.packages())
[1] "readxl" "rethinking" "parallel" "rstan" "StanHeaders" "uwIntroStats"
[7] "ggplot2" "dplyr" "quantreg" "SparseM" "foreign" "aod"
[13] "stats" "graphics" "grDevices" "utils" "datasets" "methods"
[19] "base"
> testresults <- seq(from = 12, to = 120, by = 2)
>
> testresults
[1] 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58
[25] 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 100 102 104 106
[49] 108 110 112 114 116 118 120
> test.frame <- data.frame(testresults, rev(testresults), rank(testresults))
> test.frame
testresults rev.testresults. rank.testresults.
1 12 120 1
2 14 118 2
3 16 116 3
4 18 114 4
5 20 112 5
6 22 110 6
7 24 108 7
8 26 106 8
9 28 104 9
10 30 102 10
11 32 100 11
12 34 98 12
13 36 96 13
14 38 94 14
15 40 92 15
16 42 90 16
17 44 88 17
18 46 86 18
19 48 84 19
20 50 82 20
21 52 80 21
22 54 78 22
23 56 76 23
24 58 74 24
25 60 72 25
26 62 70 26
27 64 68 27
28 66 66 28
29 68 64 29
30 70 62 30
31 72 60 31
32 74 58 32
33 76 56 33
34 78 54 34
35 80 52 35
36 82 50 36
37 84 48 37
38 86 46 38
39 88 44 39
40 90 42 40
41 92 40 41
42 94 38 42
43 96 36 43
44 98 34 44
45 100 32 45
46 102 30 46
47 104 28 47
48 106 26 48
49 108 24 49
50 110 22 50
51 112 20 51
52 114 18 52
53 116 16 53
54 118 14 54
55 120 12 55
I am sure I have overlooked a simple solution to this problem, thank you in advance to anyone who can help or point me in the right direction.
Best,
Maria
You can subtract your values from the maximum value and then add the minimum value. For example:
x <- seq(1, 5, by = .4)
x
[1] 1.0 1.4 1.8 2.2 2.6 3.0 3.4 3.8 4.2 4.6 5.0
(max(x) - x) + min(x)
[1] 5.0 4.6 4.2 3.8 3.4 3.0 2.6 2.2 1.8 1.4 1.0

How to write OR condition inside which in R

I am unable to figure out how can i write or condition inside which in R.
This statemnet does not work.
which(value>100 | value<=200)
I know it very basic thing but i am unable to find the right solution.
Every value is either larger than 100 or smaller-or-equal to 200. Maybe you need other numbers or & instead of |? Otherwise, there is no problem with that statement, the syntax is correct:
> value <- c(110, 2, 3, 4, 120)
> which(value>100 | value<=200)
[1] 1 2 3 4 5
> which(value>100 | value<=2)
[1] 1 2 5
> which(value>100 & value<=200)
[1] 1 5
> which(iris$Species == "setosa" | iris$Species == "virginica")
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
[19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
[37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
[55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
[73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
[91] 91 92 93 94 95 96 97 98 99 100
does work. Remember to fully qualify the names of the variables you are selecting, as iris$Species in the example at hand (and not only Species).
Have a look at the documentation here.
Also notice that whatever you do with which can be generally done otherwise in a faster and better way.

Indexing a matrix with another matrix in R - index out of bounds

I am experiencing some strange behavior in R when trying to index a matrix with another matrix. I run into an error of subscript out of bounds with indexing with a 2 column matrix, but not with a four column matrix. See the following reproducible code. Any insight would be appreciated!
This
data <- matrix(rbinom(100, 1, .5), nrow = 10)
idx <- cbind(1:50, 51:100)
data[idx]
results in:
Error in data[idx] : subscript out of bounds
However
data[cbind(idx,idx)]
works.
My session info:
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin15.5.0 (64-bit)
Running under: OS X 10.11.5 (El Capitan)
The key insight as to why this is wrong isn't working is given in ?'[':
When indexing arrays by [ a single argument i can be a matrix with as many columns as there are dimensions of x; the result is then a vector with elements corresponding to the sets of indices in each row of i.
and it is clear when the subscript out of bounds error arises; data doesn't have 50 rows and 100 columns.
What's happening in the second example the indexing matrix is just being treated as a vector because it has more columns than the matrix being indexed has dimensions, and is extracting elements c(1:100, 1:100) from data.
This is more easily see with
m <- matrix(1:100, ncol = 10, byrow = TRUE)
and indexing with cbind(idx, idx) gives
> m[cbind(idx,idx)]
[1] 1 11 21 31 41 51 61 71 81 91 2 12 22 32 42 52 62 72
[19] 82 92 3 13 23 33 43 53 63 73 83 93 4 14 24 34 44 54
[37] 64 74 84 94 5 15 25 35 45 55 65 75 85 95 6 16 26 36
[55] 46 56 66 76 86 96 7 17 27 37 47 57 67 77 87 97 8 18
[73] 28 38 48 58 68 78 88 98 9 19 29 39 49 59 69 79 89 99
[91] 10 20 30 40 50 60 70 80 90 100 1 11 21 31 41 51 61 71
[109] 81 91 2 12 22 32 42 52 62 72 82 92 3 13 23 33 43 53
[127] 63 73 83 93 4 14 24 34 44 54 64 74 84 94 5 15 25 35
[145] 45 55 65 75 85 95 6 16 26 36 46 56 66 76 86 96 7 17
[163] 27 37 47 57 67 77 87 97 8 18 28 38 48 58 68 78 88 98
[181] 9 19 29 39 49 59 69 79 89 99 10 20 30 40 50 60 70 80
[199] 90 100
which is the same as
m[c(idx[,1], idx[,2], idx[,1], idx[,2])]
or specifically,
m[c(1:50, 51:100, 1:50, 51:100)]

Finding files present in directory

An analysis which I ran produced around 500 files which are named file1 to file500
However, some files in between are missing (such as file233 and file245 as well as others). I would like to further process them in a loop in R but then I would need to filter out the files which are not present.
Is there an easy way to store the number after file in a vector in R which I can then use for the loop?
v<-containing all numbers after file which are present in the directory
Should have mentioned that the files do not have the ending .txt but are just names fileXX where the XX is the number
The best way is to simply create a list of the files that are actually present in the directory, like #beginneR said:
list_of_files = list.files('/path/to/dir')
do_some_processing = function(list_element) {
# Perform some processing and return something
}
lapply(list_of_files, do_some_processing)
If you need the numbers in the filename, a simple regular expression will do:
> grep('[0-9]', sprintf('file%d', 1:100))
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
[19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
[37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
[55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
[73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
[91] 91 92 93 94 95 96 97 98 99 100

Shingles with lattice package's equal.count()

Why does the equal.count() function create overlapping shingles when it is clearly possible to create groupings with no overlap. Also, on what basis are the overlaps decided?
For example:
equal.count(1:100,4)
Data:
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
[23] 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
[45] 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66
[67] 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88
[89] 89 90 91 92 93 94 95 96 97 98 99 100
Intervals:
min max count
1 0.5 40.5 40
2 20.5 60.5 40
3 40.5 80.5 40
4 60.5 100.5 40
Overlap between adjacent intervals:
[1] 20 20 20
Wouldn't it be better to create groups of size 25 ? Or maybe I'm missing something that makes this functionality useful?
The overlap smooths transitions between the shingles (which, as the name says, overlap on the roof), but a better choice would have been to use some windowing function such as in spectral analysis.
I believe it is a pre-historic relic, because the behavior goes back to some very old pre-lattice code and is used in coplot remembered only by veteRans. lattice::equal.count calls co.intervals in graphics, where you will find some explanation. Try:
lattice:::equal.count(1:100,4,overlap=0)

Resources