Two Column R Dataframe to Named List [duplicate] - r

This question already has answers here:
Named List To/From Data.Frame
(4 answers)
Closed 2 years ago.
I am trying to convert a two-column data frame to a named list. There are several solutions on Stack Overflow where every value in the first column becomes a name, but I want to group the values in column 2 under the shared values in column 1.
For example, the list should look like the following:
# Create a Named list of keywords associated with each file.
fileKeywords <- list(fooBar.R = c("A","B","C"),
driver.R = c("A","F","G"))
so that I can retrieve all keywords for "fooBar.R" using:
# Get the keywords for a named file
fileKeywords[["fooBar.R"]]
My data frame looks like:
df <- read.table(header = TRUE, text = "
file keyWord
'fooBar.R' 'A'
'fooBar.R' 'B'
'fooBar.R' 'C'
'driver.R' 'A'
'driver.R' 'F'
'driver.R' 'G'
")
I'm sure there is a simple solution that I am missing.

You could use unstack:
as.list(unstack(rev(df)))
$driver.R
[1] "A" "F" "G"
$fooBar.R
[1] "A" "B" "C"
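Here rev(df) swaps the two columns so that unstack's default formula treats keyWord as the values and file as the groups; this is equivalent to as.list(unstack(df, keyWord ~ file)). The as.list() is needed because unstack returns a data frame when all groups have the same length.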
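Another base-R option, not used in the answers here, is split(), which groups one vector by the values of another and returns a plain named list. This is a minimal sketch that rebuilds the question's data frame inline (with stringsAsFactors = FALSE so the result holds character vectors on any R version); the list names come out sorted because split() uses factor levels.

```r
# Rebuild the question's two-column data frame
df <- data.frame(
  file    = rep(c("fooBar.R", "driver.R"), each = 3),
  keyWord = c("A", "B", "C", "A", "F", "G"),
  stringsAsFactors = FALSE
)

# Group the keyWord values by file; names are the sorted unique files
fileKeywords <- split(df$keyWord, df$file)

# Retrieve all keywords for one file
fileKeywords[["fooBar.R"]]  # "A" "B" "C"
```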

We can use stack in base R to go from the named list to the data frame:
stack(fileKeywords)[2:1]
For the opposite direction (data frame to named list, as asked here), we can use tapply:
with(df, tapply(keyWord, file, FUN = I))
Output:
#$driver.R
#[1] "A" "F" "G"
#$fooBar.R
#[1] "A" "B" "C"
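For the list-to-data-frame direction, the sketch below shows what stack(fileKeywords)[2:1] produces: stack() returns columns named values and ind, and [2:1] puts the grouping column first. Renaming the columns to file and keyWord and converting ind (a factor) to character are illustrative choices so the result matches the question's layout.

```r
fileKeywords <- list(fooBar.R = c("A", "B", "C"),
                     driver.R = c("A", "F", "G"))

# stack() gives columns `values` and `ind`; [2:1] reorders them
long <- stack(fileKeywords)[2:1]

# Match the column names used in the question's data frame
names(long) <- c("file", "keyWord")

# `ind` is a factor; convert it to plain character strings
long$file <- as.character(long$file)
long
```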

Related

R remove repetitive strings [duplicate]

This question already has answers here:
How to delete multiple values from a vector?
(9 answers)
Closed 3 years ago.
I want to remove repetitive strings in R.
I simplified my situation and tried two things.
First, I tried negative indexing:
x=c("a","a","b","c","d")
x[-(x=="a")]
I expected it to remove all "a"s, but the result is
[1] "a" "b" "c" "d"
Second, I tried assigning NULL:
x[x=="a"]=NULL
But I got an error:
Error in x[x == "a"] = NULL : replacement has length zero
How can I remove repetitive strings? In this example, I want to remove all "a"s and get
[1] "b" "c" "d"
If the intention is to remove 'a' because it is repeated, use table to get the frequency of each element, then subset the vector with %in% and negate the result with !:
x[!x %in% names(which(table(x) > 1))]
#[1] "b" "c" "d"
Or, using duplicated:
x[!(duplicated(x)|duplicated(x, fromLast = TRUE))]
Or, if only runs of adjacent repeated elements should be removed, use rle:
with(rle(x), values[lengths ==1])
#[1] "b" "c" "d"
NOTE: All of the above identify the repeated elements programmatically instead of requiring them to be checked by hand.
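The three approaches agree on the question's vector because its repeats are adjacent, but they differ when repeats are not adjacent. This sketch uses a made-up vector y (not from the question) to show that the table() and duplicated() versions drop every value occurring more than once anywhere, while the rle() version only looks at runs of neighbouring equal values.

```r
# Hypothetical input with a non-adjacent repeat of "a"
y <- c("a", "b", "a", "c")

# table() + %in%: drop values whose total count is above 1
by_table <- y[!y %in% names(which(table(y) > 1))]

# duplicated() from both ends: same idea without building a table
by_dup <- y[!(duplicated(y) | duplicated(y, fromLast = TRUE))]

# rle(): only runs of adjacent equal values count as repeats
by_rle <- with(rle(y), values[lengths == 1])

by_table  # "b" "c"
by_dup    # "b" "c"
by_rle    # "a" "b" "a" "c"
```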
From the OP's comments, it is clear that they want to remove only specific elements that are known to be duplicates. In that case,
x[! x %in% c("a")]
Here, we use %in% because == compares element by element (recycling the shorter vector), so it cannot test membership in a set of several values.
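To see why the two attempts in the question fail: -(x == "a") turns the logical vector into the integers -1 -1 0 0 0, and a negative index with zeros drops only element 1, leaving the second "a" in place. Assigning NULL removes elements from a list but not from an atomic character vector, hence the error. A minimal sketch of what happens, plus plain logical indexing that does what the question wanted:

```r
x <- c("a", "a", "b", "c", "d")

# Unary minus coerces the logical vector to integers
idx <- -(x == "a")     # -1 -1 0 0 0

# Zeros select nothing; the repeated -1 drops element 1 only
x[idx]                 # "a" "b" "c" "d"

# Logical indexing keeps exactly the elements that are not "a"
x[x != "a"]            # "b" "c" "d"
```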

Modifying the suffix of a specific range of columns in R

I have a data frame with N columns, in this case 25, and would like to change the suffix only for column variables 15 to 30.
t0 is the dataframe with the 30 column variables
For all variables 1 to 30, the following command works perfectly:
t0<-data.frame(a=c(1),b=c(1),c=c(1),d=c(1),e=c(1),f=c(1),g=c(1),h=c(1))
colnames(t0) <- paste( colnames(t0), "Sub",sep = "_")
names(t0)
[1] "a_Sub" "b_Sub" "c_Sub" "d_Sub" "e_Sub"
[6] "f_Sub" "g_Sub" "h_Sub" "i_Sub" "ii_Sub"
[15] "j_Sub" "k_Sub" "l_Sub" "m_Sub" "n_Sub"
Desired output:
names(t0)
[1] "a" "b" "c" "d" "e"
[6] "f" "g" "h" "i" "ii"
[15] "j_Sub" "k_Sub" "l_Sub" "m_Sub" "n_Sub"
Any idea how to get this done in R?
Thanks,
Albit
The reason it didn't work is that the dataset was subset first and the column names were then taken from the subset. Instead, we can take the column names of the entire dataset and replace only those at the desired numeric positions:
colnames(t0)[15:30] <- paste(colnames(t0)[15:30], "Sub", sep="_")
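A self-contained sketch using the 8-column t0 from the question. Since 15:30 assumes at least 30 columns, this example instead adds the suffix to columns 5 through the last one; the starting position 5 is an arbitrary choice for illustration, and ncol(t0) keeps the upper bound in range.

```r
t0 <- data.frame(a = 1, b = 1, c = 1, d = 1, e = 1, f = 1, g = 1, h = 1)

# Positions of the columns that should get the suffix
idx <- 5:ncol(t0)

# Replace only those names, leaving the others untouched
names(t0)[idx] <- paste(names(t0)[idx], "Sub", sep = "_")
names(t0)  # "a" "b" "c" "d" "e_Sub" "f_Sub" "g_Sub" "h_Sub"
```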

Regex Matching Negative values

I'm trying to create some simple, easy-to-write content clusters with multiple regexes.
Imagine a list of strings: c("a","b","ac")
The groups I need to define are "All: a's" and "All: b's", so the values "a" and "ac" belong to "A" and "b" belongs to "B".
myDF$contentGroup <- sub(".*a.*", "A", myDF$stringList)
However, this results in a column "contentGroup" in my data frame that contains the original value of "stringList" wherever no match occurred. So if I run the same line of code with "B", it overwrites the "A"s.
myDF$contentGroup <- sub(".*b.*", "B", myDF$stringList)
I just can't figure out how to do this simple clustering in a single line of code, keeping it as simple as possible.
You can use grep to find the elements that contain 'a' or 'b' and replace them in place (here x is the vector c("a","b","ac")):
x[grep('a', x, fixed = TRUE)] <- 'A'
x[grep('b', x, fixed = TRUE)] <- 'B'
x
#[1] "A" "B" "A"
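If the groups should go into a new data frame column in one statement, nested ifelse() with grepl() is one base-R option (not from the answer above). This sketch assumes a hypothetical myDF with an extra non-matching value "z", gives unmatched rows NA instead of their original string, and lets the "a" test take priority over the "b" test.

```r
myDF <- data.frame(stringList = c("a", "b", "ac", "z"),
                   stringsAsFactors = FALSE)

# Check "a" first, then "b"; anything else becomes NA
myDF$contentGroup <- ifelse(grepl("a", myDF$stringList, fixed = TRUE), "A",
                     ifelse(grepl("b", myDF$stringList, fixed = TRUE), "B", NA))

myDF$contentGroup  # "A" "B" "A" NA
```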

Create dataframe from list of lists in R [duplicate]

This question already has answers here:
R list of lists to data.frame
(6 answers)
Closed 6 years ago.
I have a variable out that is a list of lists, and I want to convert its child lists into a data frame. Say out looks like this:
[[1]]
[[1]]$id
[1] "1"
[[1]]$input
[1] "A" "B" "C"
[[2]]
[[2]]$id
[1] "2"
[[2]]$input
[1] "R" "S" "T"
class(out) and class(out[[1]]) confirm that this is a list of lists.
I want to create a "long" dataframe that should look like this:
id input
1 "A"
1 "B"
1 "C"
2 "R"
2 "S"
2 "T"
I tried:
lapply(out, function(x){
as.data.frame(x)
})
but this seems to do a cbind and create new columns for each child list.
Any help is highly appreciated.
Try:
library(plyr)
ldply(out, as.data.frame)
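Without plyr, the same idea can be sketched in base R: convert each child list with as.data.frame() (which recycles the length-1 id to match input) and stack the pieces with do.call(rbind, ...). This rebuilds out from the question's printout; stringsAsFactors = FALSE is passed through so the columns stay character on older R versions.

```r
out <- list(
  list(id = "1", input = c("A", "B", "C")),
  list(id = "2", input = c("R", "S", "T"))
)

# One long data frame per child list (id is recycled to 3 rows)
pieces <- lapply(out, as.data.frame, stringsAsFactors = FALSE)

# Stack the pieces vertically into a single long data frame
long <- do.call(rbind, pieces)
long
```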

R - How to get at a string from a single column and row in a data frame

So I'm trying to do these problems in R in order to learn it.
But I'm stuck on the first problem: simply counting the frequency of characters in a string. I can't even seem to get past loading the data and getting to the string :-(
How do I do something like print the first character of the string from this text file?
Here's what I've tried so far:
> rosalind_dna <- read.table("~/Downloads/rosalind_dna.txt", quote="")
Warning message:
In read.table("~/Downloads/rosalind_dna.txt", quote = "") :
incomplete final line found by readTableHeader on '~/Downloads/rosalind_dna.txt'
> viewData(rosalind_dna)
> str(rosalind_dna[1,1,1])
Factor w/ 1 level "GGCCCGGTTACTGCGACTGAACAATCAAAATCTGAAGCATTTAAGCCAAACCAATTGAGATCGACTTACGAGCGATAACCCAGTATATTCAAGTGCTACTGATGAGGCGTGGTCCCCTGGACAAGGC"| __truncated__: 1
What you've done so far is just fine.
read.table returns a data frame. In this case, you just get a data frame with a single column and only a single value in that column.
Before R 4.0.0, read.table converted character columns in data frames to factors by default (stringsAsFactors = TRUE); since R 4.0.0 the default is FALSE. You can convert a factor back using as.character.
Then you'll simply want to split that single string into individual characters (strsplit) and then make a table (table). (No need for loops!)
Here's a toy example illustrating all the functions I mentioned:
> dat <- data.frame(V1 = factor("abcdfjtusje"))
> str(dat)
'data.frame': 1 obs. of 1 variable:
$ V1: Factor w/ 1 level "abcdfjtusje": 1
> x <- as.character(dat[1,1])
> x
[1] "abcdfjtusje"
> strsplit(x,"")
[[1]]
[1] "a" "b" "c" "d" "f" "j" "t" "u" "s" "j" "e"
> strsplit(x,"")[[1]]
[1] "a" "b" "c" "d" "f" "j" "t" "u" "s" "j" "e"
> table(strsplit(x,"")[[1]])
a b c d e f j s t u
1 1 1 1 1 1 2 1 1 1
>
I've copied the file from the link into /tmp/string.txt. This file has just a single line:
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC
We can read the file using the readLines function:
s = readLines("/tmp/string.txt")
The variable s is just a single string. To split up the bases, we use:
strsplit(s, "")
then tabulate using table:
table(strsplit(s, ""))
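The same steps can be run on the sequence shown above without a file. This sketch takes the [[1]] element of strsplit()'s result so that table() gets a plain character vector, then formats the A, C, G, T counts on one line separated by spaces (that output format is an illustrative choice).

```r
s <- "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC"

# Split the single string into individual bases
bases <- strsplit(s, "")[[1]]

# Count each base
counts <- table(bases)

# Counts in A C G T order on one line
paste(counts[c("A", "C", "G", "T")], collapse = " ")  # "20 12 17 21"
```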
If you want to display the first character of the whole file, you can do the following:
s = readLines("Your file.txt",n=1)
substr(s, 1, 1)
To display the first character of every line:
s = readLines("Your file.txt")
substr(s, 1, 1)
To display n-th character of every line:
n = 5
s = readLines("Your file.txt")
substr(s, n, n)
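A runnable version of the three snippets above, assuming a hypothetical two-line file written to a temporary path in place of "Your file.txt":

```r
# Hypothetical sample file with two lines
path <- tempfile(fileext = ".txt")
writeLines(c("GATTACA", "CCGGTTAA"), path)

# First character of the whole file (read only the first line)
first_char <- substr(readLines(path, n = 1), 1, 1)  # "G"

# First character of every line
s <- readLines(path)
substr(s, 1, 1)  # "G" "C"

# n-th character of every line
n <- 5
substr(s, n, n)  # "A" "T"

unlink(path)
```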
You can use readLines and substr to solve the problem, but if you want to extract the first character from a data frame column, you can simply use:
substr(dataframe$colname,1,1)
It returns a character vector containing the first character of each value in the column.
