I've been looking around for quite a while now, but can't seem to solve this problem, although I feel like it should be an easy one.
I have 54 factors containing differing amounts of strings, names of pathways to be exact. For example, here are two factors with the elements they contain:
> PWe1
[1] Gene_Expression
[2] miR-targeted_genes_in_muscle_cell_-_TarBase
[3] Generic_Transcription_Pathway
> PWe2
[1] miR-targeted_genes_in_epithelium_-_TarBase
[2] miR-targeted_genes_in_leukocytes_-_TarBase
[3] miR-targeted_genes_in_lymphocytes_-_TarBase
[4] miR-targeted_genes_in_muscle_cell_-_TarBase
What I would like to do is take these, and combine them into one big data frame with 54 columns, where each column has the names of one corresponding factor. I've tried cbind, cbind.data.frame and a couple of other options but those return numeric values instead of strings.
Expected output:
PWe1 PWe2
Gene_Expression miR-targeted_genes_in_epithelium_-_TarBase
miR-targeted_genes_in_muscle_cell_-_TarBase miR-targeted_genes_in_leukocytes_-_TarBase
Generic_Transcription_Pathway miR-targeted_genes_in_lymphocytes_-_TarBase
NA miR-targeted_genes_in_muscle_cell_-_TarBase
I'm quite a beginner when it comes to R, could anyone nudge me towards a possible solution?
Thanks in advance!
lst <- mget(ls(pattern="PW")) #<--- Create a named list of all the PW* objects.
ind <- lengths(lst) #<--- Length of each element.
as.data.frame(do.call(cbind,
lapply(lst, function(v) `length<-`(as.character(v), max(ind))))) #<--- Convert factors to character, pad with NA to the maximum length, bind into a data.frame.
# PWe1 PWe2
# 1 Gene_Expression miR-targeted_genes_in_epithelium_-_TarBase
# 2 miR-targeted_genes_in_muscle_cell_-_TarBase miR-targeted_genes_in_leukocytes_-_TarBase
# 3 Generic_Transcription_Pathway miR-targeted_genes_in_lymphocytes_-_TarBase
# 4 <NA> miR-targeted_genes_in_muscle_cell_-_TarBase
l1 <- max(length(v1), length(v2))
length(v1) <- l1
length(v2) <- l1
cbind(as.character(v1), as.character(v2))
# [,1] [,2]
#[1,] "Gene_Expression" "miR-targeted_genes_in_epithelium_-_TarBase"
#[2,] "miR-targeted_genes_in_muscle_cell_-_TarBase" "miR-targeted_genes_in_leukocytes_-_TarBase"
#[3,] "Generic_Transcription_Pathway" "miR-targeted_genes_in_lymphocytes_-_TarBase"
#[4,] NA "miR-targeted_genes_in_muscle_cell_-_TarBase"
If you convert your factors to characters before you use cbind, you don't get numeric values:
testFrame <- data.frame(cbind(as.character(PWe1), as.character(PWe3)))
If the lengths of the vectors differ, cbind recycles the elements of the shorter vector, and it warns when the longer length is not a multiple of the shorter one. If that is unsatisfactory in your case, maybe a data.frame is not the right choice?
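To make the point about numeric values concrete, here is a minimal sketch. The factors f1 and f2 are made-up stand-ins for PWe1 and PWe2, and pad() is a hypothetical helper, not something from the question. It shows that cbind on factors returns the integer level codes, and that converting to character and padding with NA avoids both the codes and the recycling.

```r
# Made-up stand-ins for two of the 54 factors (names and values are assumptions)
f1 <- factor(c("Gene_Expression", "Generic_Transcription_Pathway"))
f2 <- factor(c("pathway_x", "pathway_y", "pathway_z"))

# cbind on factors keeps only the underlying integer level codes
codes <- cbind(f1, f1)

# Hypothetical helper: convert to character, then pad with NA up to length n
pad <- function(f, n) {
  v <- as.character(f)
  length(v) <- n
  v
}

n <- max(length(f1), length(f2))
df <- data.frame(f1 = pad(f1, n), f2 = pad(f2, n), stringsAsFactors = FALSE)
df
```

With all 54 factors, the same pad() could be applied over a list with lapply before building the data frame.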
I'm heavily simplifying my actual problem, but I am trying to find a way to append the values inside the vectors of one list to the values in the vectors of another list, and to do it by name (assuming the two lists are not in the same order). This is the setup (the numbers themselves are arbitrary):
Data1 <- list( c(1),c(2),c(3))
names(Data1) <- c("A", "B","C")
Data2 <- list(c(11), c(12), c(13))
names(Data2) <- c("B","A","C")
What I'm trying to do is get a third list, say Data3, so that calling Data3[["A"]] gives the same result as calling c(1,12):
[1] 1 12
so printing Data3 should give:
[1] 1 12
[2] 2 11
[3] 3 13
Essentially I'm looking to append many values from one list of vectors to another list of vectors, and to do it by name rather than by order, if that makes sense. (I did think about trying some loops, but I feel like there should be a simpler way.)
nm = names(Data1)
setNames(lapply(nm, function(x){
c(Data1[[x]], Data2[[x]])
}), nm)
#$A
#[1] 1 12
#$B
#[1] 2 11
#$C
#[1] 3 13
list(do.call("cbind", list(Data1, Data2)))
# Note: this pairs the elements by position, not by name
[,1] [,2]
A 1 11
B 2 12
C 3 13
If you don't mind your output being a data frame (rbind on data frames matches columns by name):
Data3 <- rbind(data.frame(Data1), data.frame(Data2))
Then Data3[["A"]] will give you:
[1] 1 12
We can use Map and arrange the elements of Data2 in the same order as Data1 (or vice versa) using names and then concatenate them.
Map(c, Data1, Data2[names(Data1)])
#$A
#[1] 1 12
#$B
#[1] 2 11
#$C
#[1] 3 13
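The Map approach above assumes both lists carry exactly the same names. As a hedged sketch of one way to relax that, the variant below loops over the union of the names. The extra element D in Data2 is an assumption added for illustration and is not part of the question.

```r
Data1 <- list(A = 1, B = 2, C = 3)
# Data2 is in a different order and has a name (D) that Data1 lacks
Data2 <- list(B = 11, A = 12, D = 14)

all_names <- union(names(Data1), names(Data2))

# [[ returns NULL for a name that is missing from a list,
# and c() silently drops NULL, so unmatched names keep their single value
Data3 <- setNames(
  lapply(all_names, function(nm) c(Data1[[nm]], Data2[[nm]])),
  all_names
)
Data3[["A"]]
```

Names that appear in only one list simply keep the values from that list.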
I'm trying to create a calculator that multiplies permutation groups written in cyclic form (the process of which is described in this post, for anyone unfamiliar: https://math.stackexchange.com/questions/31763/multiplication-in-permutation-groups-written-in-cyclic-notation). Although I know this would be easier to do with Python or something else, I wanted to practice writing code in R since it is relatively new to me.
My gameplan for this is take an input, such as "(1 2 3)(2 4 1)" and split it into two separate lists or vectors. However, I am having trouble starting this because from my understanding of character functions (which I researched here: https://www.statmethods.net/management/functions.html) I will ultimately have to use the function grep() to find the points where ")(" occur in my string to split from there. However, grep only takes vectors for its argument, so I am trying to coerce my string into a vector. In researching this problem, I have mostly seen people suggest to use as.integer(unlist(str_split())), however, this doesn't work for me as when I split, not everything is an integer and the values become NA, as seen in this example.
library(tidyverse)
x <- "(1 2 3)(2 4 1)"
x <- as.integer(unlist(str_split(x," ")))
x
Is there an alternative way to turn a string into a vector when there are not just integers involved? I also realize that the means by which I am trying to split up the two permutations is very roundabout, but that is because of the character functions that I researched this seems like the only way. If there are other functions that would make this easier, please let me know.
Thank you!
Comments in the code.
x <- "(1 2 3)(2 4 1)"
out1 <- strsplit(x, split = ")(", fixed = TRUE)[[1]] # split on close and open bracket
out2 <- gsub("[\\(|\\)]", replacement = "", out1) # remove brackets
out3 <- strsplit(out2, " ") # tease out numbers between spaces
lapply(out3, as.integer)
[[1]]
[1] 1 2 3
[[2]]
[1] 2 4 1
There aren't really any scalars in R. Single values like 1, TRUE, and "a" are all 1-element vectors, so grep(pattern, x) will work fine on your original string. As a starting point for getting towards your desired goal, I would suggest splitting the groups with str_extract_all from stringr:
> str_extract_all(x, "\\([0-9 ]+\\)")
[[1]]
[1] "(1 2 3)" "(2 4 1)"
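For anyone who would rather avoid the stringr dependency, here is a rough base R sketch of the same idea, carried one step further to integer vectors. The variable names groups and cycles are my own, and it assumes the input contains only digits, spaces, and round brackets.

```r
x <- "(1 2 3)(2 4 1)"

# Pull out each bracketed group, e.g. "(1 2 3)"
groups <- regmatches(x, gregexpr("\\([0-9 ]+\\)", x))[[1]]

# Drop the brackets, split on spaces, convert each piece to integer
cycles <- lapply(strsplit(gsub("[()]", "", groups), " +"), as.integer)
cycles
```

Each element of cycles is then one permutation cycle as an integer vector.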
If we need to split the strings with the brackets
strsplit(x, "(?<=\\))(?=\\()", perl = TRUE)[[1]]
#[1] "(1 2 3)" "(2 4 1)"
Or we can use convenient wrapper from qdapRegex
library(qdapRegex)
ex_round(x, include.marker = TRUE)[[1]]
#[1] "(1 2 3)" "(2 4 1)"
alternative: using library(magrittr)
x <- "(1 2 3)(2 4 1)"
x %>%
gsub("^\\(","c(",.) %>% gsub("\\)\\(","),c(",.) %>% gsub("(?=\\s\\d)",", ",.,perl=T) %>%
paste0("list(",.,")") %>% {eval(parse(text=.))}
result:
# [[1]]
# [1] 1 2 3
#
# [[2]]
# [1] 2 4 1
You could use chartr with read.table :
read.table(text= chartr("()"," \n",x))
# V1 V2 V3
# 1 1 2 3
# 2 2 4 1
What is the difference between matrix(unlist(DF[1,])) and matrix(DF[1,]), where DF is my data frame? How does unlist() help here?
DF[1,] extracts the first row of the data.frame. This row is still a data.frame, which is a type of list. unlist() converts it to an atomic vector that can be made into a matrix. If you don't use unlist, then you can still make a matrix, but it is a matrix whose cells are the elements of the list, rather than the elements of a vector. For example,
> cars[1,]
speed dist
1 4 2
> a <- matrix(cars[1,])
> b <- matrix(unlist(cars[1,]))
> a[,1]
[[1]]
[1] 4
[[2]]
[1] 2
> b[,1]
[1] 4 2
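A quick way to see the difference is to inspect the storage type of the two results. This is only a small sketch using the built-in cars data set from the example above:

```r
row1 <- cars[1, ]

a <- matrix(row1)          # matrix whose cells are list elements
b <- matrix(unlist(row1))  # ordinary numeric matrix

typeof(a)
typeof(b)

# Arithmetic works directly on b, while the cells of a
# have to be extracted with [[ before they can be used as numbers
sum(b)
a[[1, 1]] + a[[2, 1]]
```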
I have a list with 100 vectors indexed by:
[[1]]
[1]
[[2]]
[1]
[[3]]
[1]
.
.
.
[[100]]
[1]
Each vector has 3 entries.
I would like to apply a function separately for each of the vectors. The function returns a single number for each vector so the result of the apply() would be a 100 element vector.
How can this be done using apply?
I know I can use apply for matrices by indexing 1 or 2 depending on row or column but can it also be used for lists?
You are looking for sapply:
l <- list(1:3, 2:4, 5:7)
sapply(l, sum)
# [1] 6 9 18
This answer might help you in the future.
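If you want a guarantee that every call returns exactly one number, vapply is a stricter alternative to sapply. A minimal sketch, using mean as a stand-in for your function:

```r
l <- list(1:3, 2:4, 5:7)

# FUN.VALUE = numeric(1) makes vapply stop with an error
# if any call returns something other than a single number
res <- vapply(l, mean, FUN.VALUE = numeric(1))
res
```

With sapply the result type depends on what the function happens to return, so vapply tends to be the safer choice inside functions and scripts.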
I have a csv that looks like
Deamon,Host,1:2:4,aaa.03
Pixe,Paradigm,1:3:5,11.us
I need to read this into a data frame for analysis, but the third column in my data is separated by : and needs to be read as a set or list, i.e. split on : so that it returns (1,2,4). Is it possible to have a column of class list in R? If not, how would you approach this problem?
You can use strsplit to split a character vector into a list of components:
x <- c("1:2:4", "1:3:5")
strsplit(x, split=":")
[[1]]
[1] "1" "2" "4"
[[2]]
[1] "1" "3" "5"
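To answer the list-column part of the question: a data frame column can hold a list. The sketch below reads the two sample rows from an inline string rather than a file (that is an assumption made so the example is self-contained) and stores the split values back into the third column.

```r
csv_text <- "Deamon,Host,1:2:4,aaa.03\nPixe,Paradigm,1:3:5,11.us"
dat <- read.csv(text = csv_text, header = FALSE, stringsAsFactors = FALSE)

# Split on ":" and convert each piece to integer;
# assigning a list with one element per row creates a list column
dat$V3 <- lapply(strsplit(dat$V3, ":", fixed = TRUE), as.integer)

dat$V3[[1]]
```

Each cell of V3 is then an integer vector that can be processed with lapply or sapply.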
As noted above, the answer will vary depending on whether the number of separators in the column is consistent. The answer is more straightforward if that number is consistent. Here's one way to do that, building on Andrie's strsplit answer:
dat <- read.csv("yourData.csv", header=FALSE, stringsAsFactors = FALSE)
#If always going to be a consistent number of separators
dat <- cbind(dat, do.call("rbind", strsplit(dat[, 3], ":")))
V1 V2 V3 V4 1 2 3
1 Deamon Host 1:2:4 aaa.03 1 2 4
2 Pixe Paradigm 1:3:5 11.us 1 3 5
Note that the above is essentially how colsplit.character from package reshape is implemented and may be a better option for you as it forces you to give proper names.
If the number of separators is different, then using rbind.fill is an option from package plyr. rbind.fill expects data.frames which was a bit annoying, and I couldn't figure out how to get a one row data.frame without first converting to a matrix, so I imagine this can be made more efficient, but here's the basic idea:
library(plyr)
x <- c("1:2:4", "1:3:5:6:7")
rbind.fill(
lapply(
lapply(strsplit(x, ":"), matrix, nrow = 1)
, as.data.frame)
)
V1 V2 V3 V4 V5
1 1 2 4 <NA> <NA>
2 1 3 5 6 7
Which can then be cbinded as shown above.
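If you would rather not depend on plyr, a base R sketch of the same padding idea is to extend every split vector to the longest length with `length<-`, which fills the gaps with NA. The names parts, n, and padded are my own.

```r
x <- c("1:2:4", "1:3:5:6:7")
parts <- strsplit(x, ":", fixed = TRUE)

# Pad each character vector with NA up to the longest length
n <- max(lengths(parts))
padded <- t(sapply(parts, `length<-`, n))
padded
```

The result is a character matrix with one row per input string, which can be bound onto the data frame with cbind in the same way.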
Try using gsub to replace that character:
R> str <- "1:2:4"
R> str
[1] "1:2:4"
R> gsub(":", ",", str)
[1] "1,2,4"
Make sure the column is a string not a factor beforehand.