Read multiple files into single list using pipes - r

I have a number of simple files with a single entry per line, and I want to read those files into a list in which the content of each file is a vector.
> list(file_one = c(1,2,3,4), file_two = c(9,99,999))
$file_one
[1] 1 2 3 4
$file_two
[1] 9 99 999
...
This is basically the resulting format I want.
What I have so far is a similar result, but not correct:
> list.files("/home/x/y/z", pattern="^rep.*List$", full.names=TRUE) %>% lapply(read.table)
[[1]]
V1
1 a
2 b
3 c
4 d
How can I read the data in the correct format, or transform it from here? Preferably I would have a "pipeline" to read the data:
list files
read files in correct format or
format the read data into a list of named vectors

Perhaps you need something like this
library(tidyverse)
list.files("xyz/", full.names = TRUE) %>%
set_names(basename(.)) %>%
map(read_lines)
#> $`rep1List`
#> [1] "a" "b" "c" "d" "e" "f"
#>
#> $rep2List
#> [1] "e" "f" "g" "h" "i" "j" "k"
#>
#> $rep3List
#> [1] "l" "m" "m" "o" "p" "q" "r" "s"
where each file contains one entry per line.
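For comparison, here is a minimal base R sketch of the same list, name, read pipeline without the tidyverse. The temporary directory and the file names rep1List and rep2List are made up for illustration; they are not the asker's real files.

```r
# Hypothetical setup: two small one-entry-per-line files in a temp directory
demo_dir <- file.path(tempdir(), "rep_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("a", "b", "c"), file.path(demo_dir, "rep1List"))
writeLines(c("9", "99", "999"), file.path(demo_dir, "rep2List"))

# 1. list the files, 2. name each path by its file name, 3. read each file
files <- list.files(demo_dir, pattern = "^rep.*List$", full.names = TRUE)
res <- lapply(setNames(files, basename(files)), readLines)

# readLines() always returns character vectors; convert where numbers are expected
nums <- as.numeric(res$rep2List)
```

lapply keeps the names of its input, so naming the paths before reading is enough to get a named list of vectors.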

Based on the information you gave, I would try something like the code below, using the purrr package:
list.files("/home/x/y/z", pattern="^rep.*List$", full.names = TRUE) %>%
purrr::map_df(read.table) # add your own read.table arguments after read.table
This works for me on a real-life example, but it fails with your made-up file. I would have posted this as a comment, but my reputation is too low.

Related

Get a list of directories in R in order

I'm trying to get a list of directories in R. I ran the following code and mostly got what I wanted, except for one hitch: R didn't list them in order. It shows me a numbered list, but the numbers go 1, 3, 5, 7, etc., with two folders listed beside each other on each line. I want to know how to get a list with one folder name per line. I attached a picture for reference.
This is how R prints a vector by default. The bracketed number at the start of each line is the index of the first element on that line. If you want different formatting, use something like writeLines(list.dirs("c:/"))
# vector printing
letters[1:10]
#> [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
# formatted printing
writeLines(letters[1:10])
#> a
#> b
#> c
#> d
#> e
#> f
#> g
#> h
#> i
#> j
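As a small sketch of the same idea applied to directory names: the vector below is a made-up stand-in for the output of list.dirs(), and sorting it first is an assumption about what "in order" should mean here.

```r
# Hypothetical directory names standing in for list.dirs("c:/")
dirs <- c("c:/temp", "c:/Users", "c:/data")

# sort() orders the names; the exact order depends on the locale
sorted <- sort(dirs)

# One name per line, without the [1] index prefix
writeLines(sorted)

# cat() gives the same layout but does not add a final newline
cat(sorted, sep = "\n")
```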

How to use a list as an argument in a function in R

myfunction <- function(x) {
  if (x == "g") {
    g_var <- x
    g_nvar <- length(g_var)
  }
  return(g_nvar)
}
I have written the above script to obtain specific elements out of a list. The argument x will be a list when I call this function, but R does not treat x as a list. How can I write a function such that, when I provide a list, the output is the elements that I have specified in the function?
m
[[1]]
[[1]][[1]]
[1] "g" "g" "h" "g" "g" "g" "k" "l"
[[2]]
[[2]][[1]]
[1] "g" "h" "k" "k" "l" "g"
Expected result
[[1]] 5 # No. of g
[[2]] 2 # No. of g
Similarly I would like to obtain numbers for h,k and l also. I am putting m as x while calling the function.
For example: myfunction(m)
Your case is somewhat complicated by the fact that your m is not simply a list of character vectors, but, in your example, a list of 2 lists of 1 vector of characters, as would be generated by
m = list(strsplit("gghgggkl", ""), strsplit("ghkklg", ""))
If we want myfunction to operate on this data structure, we have to refer to the component of the length-1-lists with the operation [[1]] (see x[[1]] below), and, as loki suggested, we can use lapply to work on all components of the outer list, and sum with a logical expression to obtain the desired count:
myfunction = function(m) lapply(m, function(x) sum(x[[1]]=='g'))
myfunction(m)
result:
[[1]]
[1] 5
[[2]]
[1] 2
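Since the question also asks for the counts of h, k and l, one possible extension is to tabulate every letter at once. This is only a sketch: count_letters and its letters_wanted argument are names I made up, not part of the original answer.

```r
# Same nested structure as in the question: a list of length-1 lists
m <- list(strsplit("gghgggkl", ""), strsplit("ghkklg", ""))

# Hypothetical helper: count each requested letter in every component of m.
# factor(..., levels = ...) keeps letters that never occur, with a count of 0.
count_letters <- function(m, letters_wanted = c("g", "h", "k", "l")) {
  lapply(m, function(x) table(factor(x[[1]], levels = letters_wanted)))
}

counts <- count_letters(m)
counts[[1]]["g"]  # count of "g" in the first component
```

Each element of counts is a table, so a single count can be pulled out by name as in the last line.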

How to retain character strings using positional indexing?

What I need to do is very similar to what the function below does
library(data.table) # tstrsplit comes from data.table
x = c("abcde", "ghij", "klmnopq")
tstrsplit(x, "", fixed=TRUE, keep=c(1,3,5), names=c('first','second','third'))
However, I would like to be able to return strings using ranges of values. For example, I would like to specify that in first I want to have the first two letters for each element.
Thus instead of having:
$first
[1] "a" "g" "k"
$second
[1] "c" "i" "m"
$third
[1] "e" NA "o"
The output should look like
$first
[1] "ab" "gh" "kl"
$second
[1] "c" "i" "m"
$third
[1] "e" NA "o"
Background:
I have a large .txt file of records and a lookup table that tells from which position to which position each attribute goes, and the expected max width from which position. The txt file looks like:
James Brown M 01-01-1970
And then in a separate file I have a lookup table that says:
Field    Start  width
Name     1      7
FamilyN  9      7
Gender   11     1
Incidentally, I would appreciate any feedback on the best way to import this type of large .txt file. I feel like read.table is inappropriate since it tries to reduce to a dataframe format which is not what these files really are.
Something like this maybe:
x = c("abcde", "ghij", "klmnopq")
library(tidyverse)
list(c(1,3,5), c(2,1,1)) %>%
pmap(~ substr(x, .x, .x + .y - 1) %>% replace(., .=="", NA))
[[1]]
[1] "ab" "gh" "kl"
[[2]]
[1] "c" "i" "m"
[[3]]
[1] "e" NA "o"
I've hardcoded the positions. Per @MrFlick's comment, if you have a large number of strings, you'll need some strategy for deciding on the character positions so that you can automate it, rather than hardcoding it.
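One way to avoid the hardcoding is to drive the extraction from a lookup table of start positions and widths. Below is a base R sketch under that assumption; the lookup data frame is invented to match the toy strings, not the asker's real table. For reading a whole fixed-width file, utils::read.fwf with a widths argument may also be worth a look.

```r
x <- c("abcde", "ghij", "klmnopq")

# Hypothetical lookup table: one row per field, with start position and width
lookup <- data.frame(
  field = c("first", "second", "third"),
  start = c(1, 3, 5),
  width = c(2, 1, 1),
  stringsAsFactors = FALSE
)

# Extract each field from every string; empty results become NA
fields <- Map(function(s, w) {
  out <- substr(x, s, s + w - 1)
  out[out == ""] <- NA
  out
}, lookup$start, lookup$width)
names(fields) <- lookup$field
```

The result is a named list with one character vector per field, in the same shape as the desired output in the question.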

R - How to get at a string from a single column and row in a data frame

So I'm trying to do these problems in R in order to learn it.
But I'm stuck on the first problem, which is simply to count the frequency of characters in a string. I can't even seem to get past loading the data and getting to the string :-(
How do I do something like print the first character of the string from this text file?
Here's what I've tried so far:
> rosalind_dna <- read.table("~/Downloads/rosalind_dna.txt", quote="")
Warning message:
In read.table("~/Downloads/rosalind_dna.txt", quote = "") :
incomplete final line found by readTableHeader on '~/Downloads/rosalind_dna.txt'
> viewData(rosalind_dna)
> str(rosalind_dna[1,1,1])
Factor w/ 1 level "GGCCCGGTTACTGCGACTGAACAATCAAAATCTGAAGCATTTAAGCCAAACCAATTGAGATCGACTTACGAGCGATAACCCAGTATATTCAAGTGCTACTGATGAGGCGTGGTCCCCTGGACAAGGC"| __truncated__: 1
What you've done so far is just fine.
read.table returns a data frame. In this case, you just get a data frame with a single column and only a single value in that column.
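In R versions before 4.0.0, read.table converts character columns to factors by default (since 4.0.0 the default is stringsAsFactors = FALSE, so you get a character column directly). You can convert a factor back using as.character.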
Then you'll simply want to split that single string into individual characters (strsplit) and then make a table (table). (No need for loops!)
Here's a toy example illustrating all the functions I mentioned:
> dat <- data.frame(V1 = factor("abcdfjtusje"))
> str(dat)
'data.frame': 1 obs. of 1 variable:
$ V1: Factor w/ 1 level "abcdfjtusje": 1
> x <- as.character(dat[1,1])
> x
[1] "abcdfjtusje"
> strsplit(x,"")
[[1]]
[1] "a" "b" "c" "d" "f" "j" "t" "u" "s" "j" "e"
> strsplit(x,"")[[1]]
[1] "a" "b" "c" "d" "f" "j" "t" "u" "s" "j" "e"
> table(strsplit(x,"")[[1]])
a b c d e f j s t u
1 1 1 1 1 1 2 1 1 1
>
I've copied the file in the link into /tmp/string.txt. This file has just a single line:
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC
We can read the file using the readLines command:
s = readLines("/tmp/string.txt")
The variable s is just a single string. To split up the bases, we use:
strsplit(s, "")
then tabulate using table:
table(strsplit(s, ""))
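Put together, the steps above can be sketched as one self-contained script. The temporary file is a stand-in for /tmp/string.txt so the example can be run anywhere.

```r
# Stand-in for /tmp/string.txt: write the single-line string to a temp file
path <- tempfile(fileext = ".txt")
writeLines("AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC", path)

s <- readLines(path)            # one string per line of the file
bases <- strsplit(s, "")[[1]]   # individual characters of the first line
counts <- table(bases)          # frequency of each character

first_char <- substr(s, 1, 1)   # first character of the string
```

Taking [[1]] before calling table makes it explicit that only the first line is tabulated, which matters if the file ever has more than one line.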
If you want to display the first character of the whole file, you can do the following:
s = readLines("Your file.txt",n=1)
substr(s, 1, 1)
To display the first character of every line:
s = readLines("Your file.txt")
substr(s, 1, 1)
To display n-th character of every line:
n = 5
s = readLines("Your file.txt")
substr(s, n, n)
You can use readLines and substr to solve the problem, but if you insist on taking the first character from a data frame, you can simply use
substr(dataframe$colname,1,1)
It returns a character vector.

what does '[[' mean in the function lapply(x, '[[', VarNames[[type]]) in R?

Can anyone tell me what [[ means in the function lapply(x, '[[', VarNames[[type]]) in R?
It's an extraction function. As @mnel notes, the help file at ?Extract will give you lots of information.
Here are a couple of examples that call [[ and [ as functions, the same way you would call more normal-looking base functions like sum, table, etc.:
> test <- list(a=1:10,b=letters[1:10])
> test
$a
[1] 1 2 3 4 5 6 7 8 9 10
$b
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
> "[["(test,1)
[1] 1 2 3 4 5 6 7 8 9 10
> "[["(test,2)
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
> "["(test,1)
$a
[1] 1 2 3 4 5 6 7 8 9 10
> "["(test,2)
$b
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
It is the function [[ which extracts single elements. See ?"[["
It is the same function you see at work in
VarNames[[type]]
That expression will cause each successive value of 'x' to be given to [[ as its first argument and for VarNames[[type]] to be evaluated and used as the second argument. The result should be a series of function calls of the form:
`[[`( x[[1]], VarNames[[type]] )
Notice I presented this as a functional form. The usual way of seeing this written for a first single case would be :
x[[1]][[ VarNames[[type]] ]]
That second form gets parsed into the first form by the R interpreter.
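A small runnable sketch of that equivalence follows. The contents of x, VarNames and type are invented for illustration, since the original call does not show them.

```r
# Hypothetical data: x is a list of lists, VarNames maps a type to an element name
x <- list(list(a = 1, b = "u"), list(a = 2, b = "v"))
VarNames <- list(num = "a", chr = "b")
type <- "num"

# Passing `[[` as the function: each element of x becomes its first argument
res1 <- lapply(x, `[[`, VarNames[[type]])

# The same thing written with an explicit anonymous function
res2 <- lapply(x, function(el) el[[VarNames[[type]]]])

identical(res1, res2)
```

Both calls extract the element named "a" from every component of x.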

Resources