Finding files present in directory - r

An analysis I ran produced around 500 files, named file1 to file500.
However, some files in between are missing (such as file233 and file245, among others). I would like to process the files further in a loop in R, but for that I need to skip the ones that are not present.
Is there an easy way to store the number after file in a vector in R which I can then use for the loop?
v <- # all numbers after "file" that are present in the directory
I should have mentioned that the files do not have a .txt extension; they are just named fileXX, where XX is the number.

The best way is to simply create a list of the files that are actually present in the directory, like @beginneR said:
list_of_files = list.files('/path/to/dir')
do_some_processing = function(list_element) {
# Perform some processing and return something
}
lapply(list_of_files, do_some_processing)
If you need the numbers in the filename, a simple regular expression will do:
> as.integer(gsub('[^0-9]', '', sprintf('file%d', 1:100)))
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
 [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
 [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
 [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
 [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
 [91]  91  92  93  94  95  96  97  98  99 100
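Putting the two steps together, here is a minimal sketch, assuming the files sit in one directory and are named exactly file followed by digits. The temporary directory and the handful of file numbers are made up for illustration only:

```r
# Throwaway directory with a few "fileN" files (hypothetical data);
# file2 and file4 are deliberately missing.
dir_path <- tempfile("filedemo")
dir.create(dir_path)
invisible(file.create(file.path(dir_path, sprintf("file%d", c(1, 3, 5, 10)))))

# Keep only names that are "file" followed by digits and nothing else.
present <- list.files(dir_path, pattern = "^file[0-9]+$")

# Strip the prefix and convert to integers. Sort numerically, because
# list.files() sorts alphabetically ("file10" comes before "file3").
v <- sort(as.integer(sub("^file", "", present)))
print(v)

# Loop over the files that actually exist.
for (i in v) {
  path <- file.path(dir_path, paste0("file", i))
  # ... process `path` here ...
}
```

The pattern is anchored with ^ and $ so that stray names such as file12.bak are not picked up by accident.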

Related

Problems with column labels after importing a csv file

I'm trying to import an anova data set from a csv file into R using the read.csv function. When I import it, the only column is labelled X......., even though in the csv file the column labels are clearly Person, gender, etc.
I don't know why this is. I've copied the code below. Any help would be appreciated
read.csv("/Users/Desktop/R /anova data set.csv")
X.......
1 ;Person;gender;Age;Height;pre.weight;Diet;weight6weeks
2 ;25; ;41;171;60;2;60
3 ;26; ;32;174;103;2;103
4 ;1;0;22;159;58;1;54.2
5 ;2;0;46;192;60;1;54
6 ;3;0;55;170;64;1;63.3
7 ;4;0;33;171;64;1;61.1
8 ;5;0;50;170;65;1;62.2
9 ;6;0;50;201;66;1;64
10 ;7;0;37;174;67;1;65
11 ;8;0;28;176;69;1;60.5
12 ;9;0;28;165;70;1;68.1
13 ;10;0;45;165;70;1;66.9
14 ;11;0;60;173;72;1;70.5
15 ;12;0;48;156;72;1;69
16 ;13;0;41;163;72;1;68.4
17 ;14;0;37;167;82;1;81.1
18 ;27;0;44;174;58;2;60.1
19 ;28;0;37;172;58;2;56
20 ;29;0;41;165;59;2;57.3
21 ;30;0;43;171;61;2;56.7
22 ;31;0;20;169;62;2;55
23 ;32;0;51;174;63;2;62.4
24 ;33;0;31;163;63;2;60.3
25 ;34;0;54;173;63;2;59.4
26 ;35;0;50;166;65;2;62
27 ;36;0;48;163;66;2;64
28 ;37;0;16;165;68;2;63.8
29 ;38;0;37;167;68;2;63.3
30 ;39;0;30;161;76;2;72.7
31 ;40;0;29;169;77;2;77.5
32 ;52;0;51;165;60;3;53
33 ;53;0;35;169;62;3;56.4
34 ;54;0;21;159;64;3;60.6
35 ;55;0;22;169;65;3;58.2
36 ;56;0;36;160;66;3;58.2
37 ;57;0;20;169;67;3;61.6
38 ;58;0;35;163;67;3;60.2
39 ;59;0;45;155;69;3;61.8
40 ;60;0;58;141;70;3;63
41 ;61;0;37;170;70;3;62.7
42 ;62;0;31;170;72;3;71.1
43 ;63;0;35;171;72;3;64.4
44 ;64;0;56;171;73;3;68.9
45 ;65;0;48;153;75;3;68.7
46 ;66;0;41;157;76;3;71
47 ;15;1;39;168;71;1;71.6
48 ;16;1;31;158;72;1;70.9
49 ;17;1;40;173;74;1;69.5
50 ;18;1;50;160;78;1;73.9
51 ;19;1;43;162;80;1;71
52 ;20;1;25;165;80;1;77.6
53 ;21;1;52;177;83;1;79.1
54 ;22;1;42;166;85;1;81.5
55 ;23;1;39;166;87;1;81.9
56 ;24;1;40;190;88;1;84.5
57 ;41;1;51;191;71;2;66.8
58 ;42;1;38;199;75;2;72.6
59 ;43;1;54;196;75;2;69.2
60 ;44;1;33;190;76;2;72.5
61 ;45;1;45;160;78;2;72.7
62 ;46;1;37;194;78;2;76.3
63 ;47;1;44;163;79;2;73.6
64 ;48;1;40;171;79;2;72.9
65 ;49;1;37;198;79;2;71.1
66 ;50;1;39;180;80;2;81.4
67 ;51;1;31;182;80;2;75.7
68 ;67;1;36;155;71;3;68.5
69 ;68;1;47;179;73;3;72.1
70 ;69;1;29;166;76;3;72.5
71 ;70;1;37;173;78;3;77.5
72 ;71;1;31;177;78;3;75.2
73 ;72;1;26;179;78;3;69.4
74 ;73;1;40;179;79;3;74.5
75 ;74;1;35;183;83;3;80.2
76 ;75;1;49;177;84;3;79.9
77 ;76;1;28;164;85;3;79.7
78 ;77;1;40;167;87;3;77.8
79 ;78;1;51;175;88;3;81.9
colnames(aov)
[1] "X......."
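Judging only from the printed output above, the fields look semicolon-separated, and the real header appears as data row 1. One guess consistent with the X....... name is that the file starts with an empty line of seven semicolons. The sketch below rebuilds a tiny stand-in for the file under that assumption (csv_text is hypothetical, not the real file) and reads it with sep = ";" and skip = 1:

```r
# Hypothetical stand-in for the start of the file: an empty first line of
# separators (an assumption), then the real header, then two data rows.
csv_text <- paste(
  ";;;;;;;",
  ";Person;gender;Age;Height;pre.weight;Diet;weight6weeks",
  ";25; ;41;171;60;2;60",
  ";1;0;22;159;58;1;54.2",
  sep = "\n"
)

# With the default comma separator the whole first line becomes one header.
bad <- read.csv(text = csv_text)
colnames(bad)

# sep = ";" splits on semicolons; skip = 1 jumps over the empty first line.
dat <- read.csv(text = csv_text, sep = ";", skip = 1, strip.white = TRUE)

# Drop the empty leading column created by the semicolon that starts each line.
dat <- dat[, -1]
colnames(dat)
```

read.csv2 also splits on semicolons, but it expects a decimal comma by default, so values such as 54.2 would probably not be read as numbers with it.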

How to write OR condition inside which in R

I am unable to figure out how I can write an OR condition inside which in R.
This statement does not work.
which(value>100 | value<=200)
I know it is a very basic thing, but I am unable to find the right solution.
Every value is either larger than 100 or less than or equal to 200, so the condition is always true. Maybe you need other numbers, or & instead of |? Otherwise, there is no problem with that statement; the syntax is correct:
> value <- c(110, 2, 3, 4, 120)
> which(value>100 | value<=200)
[1] 1 2 3 4 5
> which(value>100 | value<=2)
[1] 1 2 5
> which(value>100 & value<=200)
[1] 1 5
> which(iris$Species == "setosa" | iris$Species == "versicolor")
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
 [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
 [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
 [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
 [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
 [91]  91  92  93  94  95  96  97  98  99 100
does work. Remember to fully qualify the names of the variables you are selecting, as iris$Species in the example at hand (and not only Species).
Have a look at the documentation here.
Also note that whatever you do with which can generally be done another way, often faster and better, for example by indexing with the logical condition directly.
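One way to read that last remark is that the logical vector can be used as an index on its own. A small sketch with the same value vector as above (in_range is just an illustrative name):

```r
value <- c(110, 2, 3, 4, 120)

# The condition itself is a logical vector, one TRUE/FALSE per element.
in_range <- value > 100 & value <= 200

# Use it directly as an index; no which() needed to pull out the values.
value[in_range]

# which() is still the tool when the positions themselves are required.
which(in_range)

# Count the matches without building the index vector.
sum(in_range)
```

Whether this is noticeably faster will depend on the data. The main gain is usually shorter code.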

Fixing inconsistent spacing after ## in output of knitted document

Working in RStudio, I am using knitr to create pdf files with chunks of code. In the following example, notice how in the output, spacing after the ## characters is different across the three vectors:
This looks pretty neat, but I am writing a document with examples having only one line of output and I'd like to have all the [1]'s properly in line with one another.
In the example, that would mean removing an extra space after the ##'s for the second vector. I am only starting to work with knitr and latex, so I'm not sure how I would achieve such a thing. Some sort of post-processing of the .tex? Or maybe something simpler?
This is not a knitr problem but arises from R's printing:
> 1:5
[1] 1 2 3 4 5
> 1:10
 [1]  1  2  3  4  5  6  7  8  9 10
> 1:100
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
 [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
 [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
 [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
 [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
 [91]  91  92  93  94  95  96  97  98  99 100
Post processing would stop your output looking like it would from R.
I'd work on getting that fixed in base R (if it is really a bug and not intended) rather than try to special-case this. An email to R-devel with the above example (confirmed in a recent R; the above was with 3.0.x-patched) would help you clarify whether you need a workaround.
To focus attention, consider (from @Dominic Comtois' comment)
> 20:28
[1] 20 21 22 23 24 25 26 27 28
> 20:29
 [1] 20 21 22 23 24 25 26 27 28 29
why does adding a tenth element change the way R prints the vector?
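As far as I can tell, the index label is padded to the width of the label the last element would get, so "[1]" gains a leading space as soon as a "[10]" could be needed, even when everything fits on one line. A quick sketch to check this (label_of is a hypothetical helper, not part of R):

```r
# Return the index label, including any padding, from the first printed line.
label_of <- function(x) {
  first_line <- capture.output(print(x))[1]
  # Keep everything up to and including the first closing bracket.
  sub("\\].*$", "]", first_line)
}

label_of(20:28)   # 9 elements: the widest possible label is "[9]"
label_of(20:29)   # 10 elements: the widest possible label is "[10]"
label_of(1:100)   # 100 elements: the widest possible label is "[100]"

# Width of each label, padding included.
nchar(c(label_of(20:28), label_of(20:29), label_of(1:100)))
```

If that reading is right, the padding is deliberate, which would make it a design choice rather than a bug.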
This may not necessarily be an ideal solution, but I hope it will vaguely suit your needs after some tweaks.
I've defined an "adjusted" print function:
library(stringr)
print_adj <- function(x, adjpos=6, width=3) {
   # capture the lines print() would emit, without writing to the global workspace
   text <- capture.output(print(format(x, width=width), quote=FALSE))
   # column of the first "]" (end of the index label) on each line
   pos <- str_locate(text, fixed("]"))
   # left-pad each line so that "]" lands in column adjpos
   text <- str_c(str_dup(" ", adjpos-pos[,1]), text)
   cat(text, sep="\n")
}
It prints a vector x in such a way that:
the square bracket ] always occurs in the given text column
each element occupies exactly width text columns
Sample output:
> print_adj(1:5)
   [1]   1   2   3   4   5
> print_adj(1:10)
   [1]   1   2   3   4   5   6   7   8   9  10
> print_adj(1:100)
   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28
  [29]  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56
  [57]  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84
  [85]  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100
If you'd like to use this function in a knitr chunk, try:
<<eval=2,echo=1>>=
print(1:100) # don't eval
print_adj(1:100) # don't print this cmd
@
I was able to solve my problem by defining a hook, as Gavin Simpson suggested.
\documentclass{article}
\begin{document}
<<setup, include=FALSE>>=
require(stringr)
hook.out <- function(x, options)
  return(str_c("\\begin{verbatim}",
               sub("\\s+\\[1\\]\\s+"," [1] ",x),
               "\\end{verbatim}"))
knit_hooks$set(output=hook.out)
@
<<>>=
1:9
1:10
@
\end{document}
Output now looks like this:
My only remaining concern is that for longer vectors, I will need to bypass the hook and I don't know how to do that.
Credits also go to Rod Alence for his example on this page.

Stop printing after n number of lines

The max.print option (see getOption("max.print")) can be used to limit the number of values that are printed by a single function call. For example:
options(max.print=20)
print(cars)
prints only the first 10 rows of 2 columns. However, max.print doesn't work very well with lists. Especially if they are deeply nested, the number of lines printed to the console can still be effectively unlimited.
Is there any way to specify a harder cutoff on the amount that can be printed to the screen? For example, by specifying the number of lines after which printing is interrupted? Something that also protects against printing huge recursive objects?
Based in part on this question, I would suggest just building a wrapper for print that uses capture.output to regulate what is printed:
print2 <- function(x, nlines=10,...)
   cat(head(capture.output(print(x,...)), nlines), sep="\n")
For example:
> print2(list(1:10000,1:10000))
[[1]]
    [1]     1     2     3     4     5     6     7     8     9    10    11    12
   [13]    13    14    15    16    17    18    19    20    21    22    23    24
   [25]    25    26    27    28    29    30    31    32    33    34    35    36
   [37]    37    38    39    40    41    42    43    44    45    46    47    48
   [49]    49    50    51    52    53    54    55    56    57    58    59    60
   [61]    61    62    63    64    65    66    67    68    69    70    71    72
   [73]    73    74    75    76    77    78    79    80    81    82    83    84
   [85]    85    86    87    88    89    90    91    92    93    94    95    96
   [97]    97    98    99   100   101   102   103   104   105   106   107   108
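A possible variation on the same idea, sketched here under the name print3 (hypothetical, not an existing function), also reports how many lines were cut so the truncation is visible. Note that capture.output still formats the whole object before anything is dropped, so this limits what reaches the screen, not the work done:

```r
# Print at most `nlines` lines of print(x) and say how many were left out.
print3 <- function(x, nlines = 10, ...) {
  out <- capture.output(print(x, ...))
  cat(head(out, nlines), sep = "\n")
  omitted <- length(out) - nlines
  if (omitted > 0) {
    cat(sprintf("[... %d more lines omitted ...]\n", omitted))
  }
  invisible(x)
}

# A nested list that would otherwise flood the console.
print3(list(1:10000, list(1:10000)), nlines = 5)
```

For objects that are expensive to format in full, inspecting the structure with str() first may be the cheaper option.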

Shingles with lattice package's equal.count()

Why does the equal.count() function create overlapping shingles when it is clearly possible to create groupings with no overlap? Also, on what basis are the overlaps decided?
For example:
equal.count(1:100,4)
Data:
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22
 [23]  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44
 [45]  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66
 [67]  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88
 [89]  89  90  91  92  93  94  95  96  97  98  99 100
Intervals:
   min   max count
1  0.5  40.5    40
2 20.5  60.5    40
3 40.5  80.5    40
4 60.5 100.5    40
Overlap between adjacent intervals:
[1] 20 20 20
Wouldn't it be better to create groups of size 25? Or maybe I'm missing something that makes this functionality useful?
The overlap smooths transitions between the shingles (which, as the name says, overlap on the roof), but a better choice would have been to use some windowing function such as in spectral analysis.
I believe it is a prehistoric relic, because the behavior goes back to some very old pre-lattice code and is used in coplot, remembered only by veteRans. lattice::equal.count calls co.intervals in graphics, where you will find some explanation. Try:
lattice::equal.count(1:100,4,overlap=0)
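To see where the numbers come from without loading lattice, co.intervals from the graphics package can be called directly. This is a small sketch comparing its default overlap of 0.5 with overlap = 0 (iv_default, iv_none and counts are illustrative names):

```r
# co.intervals() lives in the graphics package that ships with R.
x <- 1:100

# Default: neighbouring intervals share about half of their points.
iv_default <- graphics::co.intervals(x, number = 4)
iv_default

# No overlap: four disjoint intervals.
iv_none <- graphics::co.intervals(x, number = 4, overlap = 0)
iv_none

# Number of observations falling into each non-overlapping interval.
counts <- apply(iv_none, 1, function(r) sum(x >= r[1] & x <= r[2]))
counts
```

With overlap = 0 each interval holds 25 of the 100 values, which is the grouping the question asks for.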

Resources