I'm trying to find the difference between two columns in a CSV file, which I named Test.
I'd like to add a new column called 'Results' that contains the difference between Events_1 & Events_2. If there is no difference the Results can be blank.
This is a basic example, for what I'm trying to accomplish, the real list contains hundreds of events in both columns.
Not tested with your data, but
vec2 <- c("hello,goodbye","hello,goodbye")
vec1 <- c("hello","hello,goodbye")
Map(setdiff, strsplit(vec2, "[,\\s]+"), strsplit(vec1, "[,\\s]+"))
# [[1]]
# [1] "goodbye"
# [[2]]
# character(0)
If you need them to be comma-delimited strings, then
mapply(function(a,b) paste(setdiff(a,b), collapse=","), strsplit(vec2, "[,\\s]+"), strsplit(vec1, "[,\\s]+"))
# [1] "goodbye" ""
I have a character string that I have split into a list of smaller strings using strsplit. For example:
> full.seq <- "FZpcgK3VdAQzEFZpcAVdV8QM8ZpsEFZpacgGKi3VdVSQzEFZpcgGKAVdVRpEFKGIZpg13"
> full.seq
[1] "FZpcgK3VdAQzEFZpcAVdV8QM8ZpsEFZpacgGKi3VdVSQzEFZpcgGKAVdVRpEFKGIZpg13"
> sequences <- strsplit(full.seq, "cg")
> sequences
[[1]]
[1] "FZp" "K3VdAQzEFZpcAVdV8QM8ZpsEFZpa" "GKi3VdVSQzEFZp"
[4] "GKAVdVRpEFKGIZpg13"
I would like to give each of these new strings a unique, sequential name that I can still use to identify that they were from the same original string (for a later analysis I will do on these strings). For example, "ID.seq1", "ID.seq2", "ID.seq3" etc. I have tried doing this manually but receive this error:
> names(sequences) <- c("ID.seq1", "ID.seq2", "ID.seq3", "ID.seq4")
Error in names(sequences) <- c("ID.seq1", "ID.seq2", "ID.seq3", "ID.seq4") :
'names' attribute [4] must be the same length as the vector [1]
I would also like an automated way of doing this though, as I will need to label up to 30 new strings from a number of original strings. Any suggestions?
First of all, if you want a character vector, you will have to subset the list, because strsplit returns a list. After doing that, you can easily assign names to that vector of terms.
full.seq <- "FZpcgK3VdAQzEFZpcAVdV8QM8ZpsEFZpacgGKi3VdVSQzEFZpcgGKAVdVRpEFKGIZpg13"
sequences <- strsplit(full.seq, "cg")[[1]]
names(sequences) <- paste0("ID.seq", c(1:4))
sequences
ID.seq1 ID.seq2
"FZp" "K3VdAQzEFZpcAVdV8QM8ZpsEFZpa"
ID.seq3 ID.seq4
"GKi3VdVSQzEFZp" "GKAVdVRpEFKGIZpg13"
Answer by Tim is perfect. I just want to add if you want to keep your list and name elements of each item:
full.seq <- "FZpcgK3VdAQzEFZpcAVdV8QM8ZpsEFZpacgGKi3VdVSQzEFZpcgGKAVdVRpEFKGIZpg13"
full.seq
sequences <- strsplit(full.seq, "cg")
names(sequences[[1]]) <- paste("ID.seq",1:4,sep="")
I am trying to name a nested list. This would be one of the several lists in my nested list:
paths_list[i]
[[1]]
[[1]]$CLASS
[1] "Signal Transduction (Saccharomyces cerevisiae)"
[[1]]$GENES
[1] "YPR165W"
[[1]]$ORGANISM
[1] "Saccharomyces cerevisiae"
Basically what I want to do is to put an ID name for example R-SCE-198203 as the main name for the list (so above $CLASS it should appear the name R-SCE-198203). List paths_list[i] to have the name R-SCE-198203.
I want this:
paths_list[i]
[[1]]R-SCE-198203
[[1]]$CLASS
[1] "Signal Transduction (Saccharomyces cerevisiae)"
[[1]]$GENES
[1] "YPR165W"
[[1]]$ORGANISM
[1] "Saccharomyces cerevisiae"
I have searched and the closest I have found was with lapply but you ends up like this:
setNames(lapply(tabs, setNames, varB), varA)
#$varA1
#$varA1$varB1
#[1] "integer"
#
#$varA1$varB2
#[1] "integer"
# ...
I want to avoid the main ID to appear in every element of the list (do not want $varA1 being repeated all the time).
Is that possible?
Thanks in advance
I think your lapply approach is already what you want. The "true" names of the sub-elements do not have the $ signs. The output in the console when you print a full list object shows these signs to help you read the data, however, if you access the individual sub-element via [[]] their names do not have these signs. Maybe the following code example helps. Check the outputs.
a_list <- list("dummy")
names(a_list) <- "dummyname"
a_list <- list(a_list)
names(a_list) <- "name"
a_list
#$name
#$name$dummyname
#[1] "dummy"
names(a_list)
#[1] "name"
a_list[[1]]
#$dummyname
#[1] "dummy"
names(a_list[[1]])
#[1] "dummyname"
I want to read the synonyms from a csv file , where the first word is the "main" word and the rest of the words in the same record are its synonyms
now i basically want to create a list like i would have in R ,
**synonyms <- list(
list(word="ss", syns=c("yy","yyss")),
list(word="ser", syns=c("sert","sertyy","serty"))
)**
This gives me a list as
synonyms
[[1]]
[[1]]$word
[1] "ss"
[[1]]$syns
[1] "yy" "yyss"
[[2]]
[[2]]$word
[1] "ser"
[[2]]$syns
[1] "sert" "sertyy" "serty"
which is essentially a list of lists of "word" and "syns".
how do i go about creating the similar list while reading the word and synonyms from a csv file
any pointers would help !! Thanks
This process should return what you want.
# read in data using readLines
myStuff <- readLines(textConnection(temp))
This will return a character vector with one element per line in the file. Note that textConnection is not necessary for reading in files. Just supply the file path. Now, split each vector element into a vectors using strsplit and return a list.
myList <- strsplit(myStuff, split=" ")
Now, separate the first element from the remaining element for each vector within the list.
result <- lapply(myList, function(x) list(word=x[1], synonyms=x[-1]))
This returns the desired result. We use lapply to move through the list items. For each list item, we return a named list where the first element, named word, corresponds to the first element of the vector that is the list item and the remaining elements of this vector are placed in a second list element called synonyms.
result
[[1]]
[[1]]$word
[1] "ss"
[[1]]$synonyms
[1] "yy" "yyss"
[[2]]
[[2]]$word
[1] "ser"
[[2]]$synonyms
[1] "sert" "sertyy" "serty"
[[3]]
[[3]]$word
[1] "at"
[[3]]$synonyms
[1] "ate" "ater" "ates"
[[4]]
[[4]]$word
[1] "late"
[[4]]$synonyms
[1] "lated" "lates" "latee"
data
temp <-
"ss yy yyss
ser sert sertyy serty
at ate ater ates
late lated lates latee"
I have a list in R with the following elements:
[[812]]
[1] "" "668" "12345_s_at" "667" "4.899777748"
[6] "49.53333333" "10.10930207" "1.598228663" "5.087437057"
[[813]]
[1] "" "376" "6789_at" "375" "4.899655078"
[6] "136.3333333" "27.82508792" "2.20223398" "5.087437057"
[[814]]
[1] "" "19265" "12351_s_at" "19264" "4.897730912"
[6] "889.3666667" "181.5874908" "1.846451572" "5.087437057"
I know I can access them with something like list_elem[[814]][3] in case that I want to extract the third element of the position 814.
I need to extract the third element of all the list, for example 12345_s_at, and I want to put them in a vector or list so I can compare their elements to another list later on. Below is my code:
elem<-(c(listdata))
lp<-length(elem)
for (i in 1:lp)
{
newlist<-c(listdata[[i]][3]) ###maybe to put in a vector
print(newlist)
}
When I print the results I get the third element, but like this:
[1] "1417365_a_at"
[1] "1416336_s_at"
[1] "1416044_at"
[1] "1451201_s_at"
so I cannot traverse them with an index like newlist[3], because it returns NA. Where is my mistake?
If you want to extract the third element of each list element you can do:
List <- list(c(1:3), c(4:6), c(7:9))
lapply(List, '[[', 3) # This returns a list with only the third element
unlist(lapply(List, '[[', 3)) # This returns a vector with the third element
Using your example and taking into account #GSee comment you can do:
yourList <- list(c("","668","12345_s_at","667", "4.899777748","49.53333333",
"10.10930207", "1.598228663","5.087437057"),
c("","376", "6789_at", "375", "4.899655078","136.3333333",
"27.82508792", "2.20223398", "5.087437057"),
c("", "19265", "12351_s_at", "19264", "4.897730912",
"889.3666667", "181.5874908","1.846451572","5.087437057" ))
sapply(yourList, '[[', 3)
[1] "12345_s_at" "6789_at" "12351_s_at"
Next time you can provide some data using dput on a portion of your dataset so we can reproduce your problem easily.
With purrr you can extract elements and ensure data type consistency:
library(purrr)
listdata <- list(c("","668","12345_s_at","667", "4.899777748","49.53333333",
"10.10930207", "1.598228663","5.087437057"),
c("","376", "6789_at", "375", "4.899655078","136.3333333",
"27.82508792", "2.20223398", "5.087437057"),
c("", "19265", "12351_s_at", "19264", "4.897730912",
"889.3666667", "181.5874908","1.846451572","5.087437057" ))
map_chr(listdata, 3)
## [1] "12345_s_at" "6789_at" "12351_s_at"
There are other map_ functions that enforce the type consistency as well and a map_df() which can finally help end the do.call(rbind, …) madness.
In case you wanted to use the code you typed in your question, below is the fix:
listdata <- list(c("","668","12345_s_at","667", "4.899777748","49.53333333",
"10.10930207", "1.598228663","5.087437057"),
c("","376", "6789_at", "375", "4.899655078","136.3333333",
"27.82508792", "2.20223398", "5.087437057"),
c("", "19265", "12351_s_at", "19264", "4.897730912",
"889.3666667", "181.5874908","1.846451572","5.087437057" ))
v <- character() #creates empty character vector
list_len <- length(listdata)
for(i in 1:list_len)
v <- c(v, listdata[[i]][3]) #fills the vector with list elements (not efficient, but works fine)
print(v)
[1] "12345_s_at" "6789_at" "12351_s_at"