I have created a list like the following one that contains all combinations of the positions of a specific character inside a string. The code that creates the list is as follows:
library(stringr)
test = str_locate_all("TTEST" , "T")
ind1 = lapply( lapply(1:nrow(test[[1]]), combn , x=test[[1]][,1]) , t )
ind1[[1]] = rbind(ind1[[1]], 0 )
The list that I'm getting looks like this:
[[1]]
[,1]
[1,] 1
[2,] 2
[3,] 5
[4,] 0
[[2]]
[,1] [,2]
[1,] 1 2
[2,] 1 5
[3,] 2 5
[[3]]
[,1] [,2] [,3]
[1,] 1 2 5
What I want now is to collapse the columns (wherever there is more than one) and unlist the whole object, in order to create a final vector that looks like c(1, 2, 5, 0, 1:2, 1:5, 2:5, 1:2:5) and that I can use with the expand.grid() function later.
I tried to solve it partially with the following code, but the ":" character ended up in a different position than I wanted.
do.call(paste, c( as.data.frame(ind1[[2]]) ,collapse=":") )
[1] "1 2:1 5:2 5"
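Here is an idea via base R: convert each list element to a data frame and use do.call to paste its columns together row by row. Note the use of sep = ':' rather than collapse = ':'; sep joins the columns within each row, whereas collapse joins the resulting row strings into a single string, which is what happened in your attempt.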
unlist(lapply(ind1, function(i) do.call(paste, c(as.data.frame(i), sep = ':'))))
#[1] "1" "2" "5" "0" "1:2" "1:5" "2:5" "1:2:5"
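As an alternative sketch, the same vector can be built directly from the positions, skipping the intermediate list of matrices. This assumes base R only (gregexpr stands in for str_locate_all purely to keep the example self-contained), and the names pos and labels are illustrative, not from the question.

```r
# Positions of "T" in "TTEST"; as.integer() drops the match attributes.
pos <- as.integer(gregexpr("T", "TTEST", fixed = TRUE)[[1]])

# For each subset size k, combn() applies paste(..., collapse = ":")
# to every combination of k positions.
labels <- unlist(lapply(seq_along(pos), function(k) {
  combn(pos, k, FUN = paste, collapse = ":")
}))

# Insert the extra "0" after the single positions, as in the question.
labels <- append(labels, "0", after = length(pos))
labels
```

One caveat: if pos has length one, combn() treats a single positive number n as seq_len(n), so that case would need separate handling.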
Hi, I'm trying to extract some information from a chemical formula and add it to a pre-existing table in R. Currently I have a column of chemical formulas such as C4H8O2. I have no problem extracting each element and its corresponding number. However, I have a problem when brackets are involved in the formula, such as C3[13]C1H8O2. I want the column title to say 13[C] and the value to be 1. However, my code doesn't recognize '[13]C1', so it gives me an error.
Any suggestions would be great.
#First manipulation - extracting information out of the "Composition" column, into separate columns for each element
data2 <- dataframe%>%mutate(Composition=gsub("\\b([A-Za-z]+)\\b","\\11",Composition),
name=str_extract_all(Composition,"[A-Za-z]+"),
value=str_extract_all(Composition,"\\d+"))%>%
unnest()%>%spread(name,value,fill=0)
I already have a pre-made csv file with the table organized, and I read that into a data frame, so now I'm just trying to parse out the elements for the 'C' column and the '[13]C' column and their corresponding numbers.
The following regular expression should extract the isotope number, the element, and the number of atoms.
library(stringr)
str_match_all( "C3[13]C1H8O2", "(\\[[0-9]+\\])?([A-Za-z]+)([0-9]+)" )
## [[1]]
## [,1] [,2] [,3] [,4]
## [1,] "C3" NA "C" "3"
## [2,] "[13]C1" "[13]" "C" "1"
## [3,] "H8" NA "H" "8"
## [4,] "O2" NA "O" "2"
With a data.frame:
library(tidyr)
library(dplyr)
d <- data.frame( Composition = c( "H2O1", "C3[13]C1H8O2" ) )
pattern <- "(\\[[0-9]+\\])?([A-Za-z]+)([0-9]+)"
d %>%
mutate( Details = lapply( str_match_all( Composition, pattern ), as.data.frame ) ) %>%
unnest(Details) %>%
transmute(
Composition,
element = paste0( ifelse(is.na(V2),"",V2), V3 ),
number = V4
) %>%
spread(key="element", value="number") %>%
replace(., is.na(.), 0)
## Composition [13]C C H O
## 1 C3[13]C1H8O2 1 3 8 2
## 2 H2O1 0 0 2 1
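For a single formula, a base R sketch of the same idea (no stringr or tidyr) could look like the following. It reuses the pattern from above; the names tokens and counts are illustrative assumptions, and it assumes every element symbol in the formula is followed by a count, as in the examples.

```r
formula <- "C3[13]C1H8O2"
pattern <- "(\\[[0-9]+\\])?([A-Za-z]+)([0-9]+)"

# All full matches: one token per (optional isotope) element-count group.
tokens <- regmatches(formula, gregexpr(pattern, formula))[[1]]

# Trailing digits are the atom count; the rest is the (isotope-tagged) element.
counts <- as.integer(regmatches(tokens, regexpr("[0-9]+$", tokens)))
names(counts) <- sub("[0-9]+$", "", tokens)
counts
```

The result is a named integer vector, so counts[["[13]C"]] gives the number of 13C atoms.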
I have a list of lists of strings as follows:
> ll
[[1]]
[1] "2" "1"
[[2]]
character(0)
[[3]]
[1] "1"
[[4]]
[1] "1" "8"
The longest list is of length 2, and I want to build a data frame with 2 columns from this list. Bonus points for also converting each item in the list to a number or NA for character(0). I have tried using mapply() and data.frame to convert to a data frame and fill with NA's as follows.
# Find length of each list element
len = sapply(ll, length)
# Number of NAs to fill for column shorter than longest
len = 2 - len
df = data.frame(mapply( function(x,y) c( x , rep( NA , y ) ) , ll , len))
However, I do not get a data frame with 2 columns (and NA's as fillers) using the code above.
Thanks for the help.
We can use stri_list2matrix from stringi. As the list elements are all character vectors, it seems okay to use this function:
library(stringi)
t(stri_list2matrix(ll))
# [,1] [,2]
#[1,] "2" "1"
#[2,] NA NA
#[3,] "1" NA
#[4,] "1" "8"
If we need to convert to data.frame, wrap it with as.data.frame
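For the "bonus" part of the question (numbers instead of strings, NA for character(0)), a base R sketch without stringi might look like this. The names n, padded and df are illustrative; the trick is that indexing a vector past its end yields NA.

```r
ll <- list(c("2", "1"), character(0), "1", c("1", "8"))

# Length of the longest element determines the number of columns.
n <- max(lengths(ll))

# Convert each element to numeric and pad to length n;
# indexing beyond the end of a vector returns NA.
padded <- lapply(ll, function(x) as.numeric(x)[seq_len(n)])

# One row per list element, one column per position.
df <- as.data.frame(do.call(rbind, padded))
df
```

The columns get the default names V1 and V2; they can be renamed with names(df) <- ... if needed.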
I read these:
https://stackoverflow.com/a/5159049/1175496
Matrices are for data of the same type.
https://stackoverflow.com/q/29732279/1175496
Vectors (and so matrix) can accept only one type of data
If matrix can only accept one data type, why can I do this:
> m_list<-matrix(list('1',2,3,4),2,2)
> m_list
[,1] [,2]
[1,] "1" 3
[2,] 2 4
The console output looks like I am combining character and integer data types.
The console output looks similar to this matrix:
> m_vector<-matrix(1:4,2,2)
> m_vector
[,1] [,2]
[1,] 1 3
[2,] 2 4
When I assign to m_list, it doesn't coerce the other values (as in https://stackoverflow.com/q/29732279/1175496 )
> m_list[2,2] <-'4'
> m_list
[,1] [,2]
[1,] "1" 3
[2,] 2 "4"
OK here is what I gather from replies so far:
Question
How can I have a matrix with different types?
Answer
You cannot; the elements are not different types; all (4) elements of this matrix are lists
all(
is.list(m_list[1,1]),
is.list(m_list[2,1]),
is.list(m_list[1,2]),
is.list(m_list[2,2]))
#[1] TRUE
Question
But I constructed matrix like this: matrix(list('1',2,3,4),2,2), how did this become a matrix of (4) lists, rather than a matrix of (4) characters, or even (4) integers?
Answer
I'm not sure. Even though the documentation says re: the first argument to matrix:
Non-atomic classed R objects are coerced by as.vector and all
attributes discarded.
It seems these are identical
identical(as.vector(list('1',2,3,4)), list('1',2,3,4))
#[1] TRUE
Question
But I assign a character ('4') to an element of m_list, how does that work?
m_list[2,2] <-'4'
Answer
It is "coerced", as if you did this:
m_list[2,2] <- as.list('4')
Question
If the elements in m_list are lists, is m_list equivalent to matrix(c(list('1'),list(2),list(3),list(4)),2,2)?
Answer
Yes, these are equivalent:
m_list <- matrix(list('1',2,3,4),2,2)
m_list2 <- matrix(c(list('1'),list(2),list(3),list(4)),2,2)
identical(m_list, m_list2)
#[1] TRUE
Question
So how can I retrieve the typeof the '1' hidden in m_list[1,1]?
Answer
At least two ways:
typeof(m_list[1,1][[1]])
#[1] "character"
...or, can directly do this (thanks, Frank) (since indexing has this "is applied in turn to the list, the selected component, the selected component of that component, and so on" behavior)...
typeof(m_list[[1,1]])
#[1] "character"
Question
How can I tell the difference between these two
m1 <- matrix(c(list(1), list(2), list(3), list(4)), 2, 2)
m2 <- matrix(1:4, 2, 2)
Answer
If you are using RStudio,
m1 is described as List of 4
m2 is described as int [1:2, 1:2] 1 2 3 4
..or else, just use typeof(), which for vectors and matrices, identifies the type of their elements... (thanks, Martin)
typeof(m1)
#[1] "list"
typeof(m2)
#[1] "integer"
class can also help distinguish, but you must wrap the matrices in vectors first:
#Without c(...)
class(m1)
#[1] "matrix"
class(m2)
#[1] "matrix"
#With c(...)
class(c(m1))
#[1] "list"
class(c(m2))
#[1] "integer"
...you could tell a subtle difference in the console output; notice how the m2 (containing integers) right-aligns its elements (because numerics are usually right-aligned)...
m1
#     [,1] [,2]
#[1,] 1    3
#[2,] 2    4
m2
#     [,1] [,2]
#[1,]    1    3
#[2,]    2    4
Short answer: matrices in R cannot contain different data types. All data must be, or will be coerced to, a single type: logical, numeric, character or list.
Matrices always contain a single type. If the input data to matrix() have different data types, they will automatically be coerced to a common type. Thus, all data will be either logical, numeric, character or list. That is what happens in your case: in your example all elements are stored as individual list elements.
> myList <- list('1',2,3,4)
> myMatrix <- matrix( myList ,2,2)
> myMatrix
[,1] [,2]
[1,] "1" 3
[2,] 2 4
> typeof(myMatrix)
"list"
If you want to convert your data away from a list completely, you need to unlist the data first.
> myList <- list('1',2,3,4)
> myMatrix <- matrix( unlist(myList) ,2,2)
> myMatrix
[,1] [,2]
[1,] "1" "3"
[2,] "2" "4"
> typeof(myMatrix)
"character"
Picking up the comments, verify yourself:
typeof(m_list)
typeof(m_list[2,2])
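To make that check concrete, here is a small sketch that contrasts the type of the matrix as a whole with the type of each stored value. The name element_types is an illustrative assumption.

```r
m_list <- matrix(list('1', 2, 3, 4), 2, 2)

# The matrix itself is a list with a dim attribute.
typeof(m_list)

# Single-bracket indexing returns a length-one list ...
is.list(m_list[2, 2])

# ... while double-bracket indexing returns the stored value,
# so each cell can report its own type.
element_types <- vapply(seq_along(m_list),
                        function(i) typeof(m_list[[i]]),
                        character(1))
element_types
```

So the "one type per matrix" rule still holds: the one type is list, and each list element is free to hold a value of any type.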
I have a column of strings that looks like this:
Topology=o5-22i34-56o74-96i117-139o159-181i210-232o247-269i
Topology=o4-26i35-57o77-99i119-138o161-183i216-238o248-270i
Topology=o4-21i32-54o69-91i112-134o156-178i215-237o252-271i
Topology=i20-42o65-84i105-127o158-180i212-234o249-271i
Topology=o5-27i39-61o76-98i118-140o151-173i194-213o
I want to get the first number after the equals sign and the last number in the string. The output should be something like
5,269
4,270
4,271
20,271
5,213
v <- c("Topology=o5-22i34-56o74-96i117-139o159-181i210-232o247-269i",
"Topology=o4-26i35-57o77-99i119-138o161-183i216-238o248-270i",
"Topology=o4-21i32-54o69-91i112-134o156-178i215-237o252-271i",
"Topology=i20-42o65-84i105-127o158-180i212-234o249-271i",
"Topology=o5-27i39-61o76-98i118-140o151-173i194-213o")
sub("^Topology=.(\\d+)-.*-(\\d+).*$", "\\1,\\2", v)
# [1] "5,269" "4,270" "4,271" "20,271" "5,213"
or
r <- regexec("^Topology=.(\\d+)-.*-(\\d+).*$", v)
m <- regmatches(v, r)
(mat <- do.call(rbind, lapply(m, "[", 2:3)))
# [,1] [,2]
# [1,] "5" "269"
# [2,] "4" "270"
# [3,] "4" "271"
# [4,] "20" "271"
# [5,] "5" "213"
Finally, if you want numeric data (instead of character/string data):
apply(mat, 2, as.numeric)
# [,1] [,2]
# [1,] 5 269
# [2,] 4 270
# [3,] 4 271
# [4,] 20 271
# [5,] 5 213
1) This assumes s is a character vector with one such string per component. The following one-liner extracts all strings of digits, turning each such string to numeric and then takes the first and last of each line. Finally it reshapes it into a matrix which is transposed. fn$sapply allows us to use the formula notation for the function at the end:
> library(gsubfn)
> t(fn$sapply(strapply(s, "\\d+", as.numeric), ~ c(head(x, 1), tail(x, 1))))
[,1] [,2]
[1,] 5 269
[2,] 4 270
[3,] 4 271
[4,] 20 271
[5,] 5 213
2) If we want exactly a vector of comma separated strings then modify it to be:
> fn$sapply(strapply(s, "\\d+"), ~ sprintf("%s,%s", head(x, 1), tail(x, 1)))
[1] "5,269" "4,270" "4,271" "20,271" "5,213"
3) Here is yet another variation. It gives a matrix of character strings:
> strapplyc(s, "(\\d+).*\\D(\\d+)", simplify = rbind)
[,1] [,2]
[1,] "5" "269"
[2,] "4" "270"
[3,] "4" "271"
[4,] "20" "271"
[5,] "5" "213"
4) Here is a variation of the second solution that does not use gsubfn. (A non-gsubfn solution could be derived from the first solution in a similar manner.)
> sapply(strsplit(s,"\\D+"),
+ function(x) sprintf("%s,%s", head(Filter(nzchar, x), 1), tail(x, 1)))
[1] "5,269" "4,270" "4,271" "20,271" "5,213"
The first 3 solutions use the gsubfn package and all but the third use only simple regular expressions "\\d+" or "\\D+".
A slightly more general solution:
sub("^\\D*(\\d+)-.*-(\\d+)\\D*$", "\\1,\\2", v)
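Another base R sketch, closer in spirit to solution 1 but without gsubfn: pull out every run of digits and keep the first and last per string. Only two of the example strings are used here for brevity, and the names nums and first_last are illustrative.

```r
v <- c("Topology=o5-22i34-56o74-96i117-139o159-181i210-232o247-269i",
       "Topology=i20-42o65-84i105-127o158-180i212-234o249-271i")

# One character vector of digit runs per input string.
nums <- regmatches(v, gregexpr("[0-9]+", v))

# Keep the first and last run of each string, converted to numeric;
# vapply() returns a 2 x n matrix, so transpose to one row per string.
first_last <- t(vapply(nums,
                       function(x) as.numeric(c(x[1], x[length(x)])),
                       numeric(2)))
first_last
```

This does not depend on the "Topology=" prefix or on the letters between the ranges, only on the digits themselves.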
I've got a list called res that looks like this:
[[1]]
[,1] [,2]
[1,] 275.0637 273.9386
[2,] 5.707791 5.755798
[[2]]
[,1] [,2]
[1,] 126.8435 59.08806
[2,] 4.867521 3.258545
[[3]]
[,1] [,2]
[1,] 23.50188 60.96321
[2,] 2.036354 3.737291
The list contains results from a simulation run a total of 6 times. I set a parameter of interest to three different values: '0' (i.e., [[1]]), '25' (i.e., [[2]]), and '50' (i.e., [[3]]). Since the model includes a great deal of randomness, I ran the model twice for each value (i.e., [,1], [,2]). I asked the model to record two results, 'time feeding' (i.e., [1,]) and 'distance traveled' (i.e., [2,]), for each iteration. Ultimately I will iterate the model 30 times for each variable setting. I'd like to use ggplot to create a boxplot showing 'time feeding' and 'distance traveled' for each of the three simulation settings (i.e., 0, 25, 50). I believe ggplot can't plot a list, so I tried to convert res to a data frame using res2 <- data.frame(res), which looked like:
X1 X2 X1.1 X2.1 X1.2 X2.2
1 275.0637 273.9386 126.8435 59.08806 23.50188 60.96321
2 5.707791 5.755798 4.867521 3.258545 2.036354 3.737291
This doesn't quite look right to me, because now the results from all three simulations are on the same row. Any help on bringing this data into ggplot to create a boxplot would be really appreciated. Thanks in advance!
--Neil
Assuming ll is your list, you can use do.call and rbind like this:
do.call(rbind,lapply(seq_along(ll),
function(x)data.frame(ll[[x]],iter=x)))
X..1. X..2. iter
[1,] 275.063700 273.938600 1
[2,] 5.707791 5.755798 1
[1,]1 126.843500 59.088060 2
[2,]1 4.867521 3.258545 2
[1,]2 23.501880 60.963210 3
[2,]2 2.036354 3.737291 3
EDIT after OP clarification:
interest <- c(0,25,50)
do.call(rbind,lapply(seq_along(ll),
function(x)data.frame(x= unlist(ll[[x]]),interest=interest[x])))
x interest
X..1.1 275.063700 0
X..1.2 5.707791 0
X..2.1 273.938600 0
X..2.2 5.755798 0
X..1.11 126.843500 25
X..1.21 4.867521 25
X..2.11 59.088060 25
X..2.21 3.258545 25
X..1.12 23.501880 50
X..1.22 2.036354 50
X..2.12 60.963210 50
X..2.22 3.737291 50
EDIT: since the OP doesn't provide data, here is how I built ll:
ll <- list(read.table(text='
[,1] [,2]
[1,] 275.0637 273.9386
[2,] 5.707791 5.755798'),
read.table(text='
[,1] [,2]
[1,] 126.8435 59.08806
[2,] 4.867521 3.258545'),
read.table(text='
[,1] [,2]
[1,] 23.50188 60.96321
[2,] 2.036354 3.737291'))
I would do
names(res) = c("0", "25", "50")
m = reshape2::melt(res, id = 1)
but it may not work; I only ran it in my head, because you didn't provide the data in a usable form.
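Putting the pieces together, here is a sketch of reshaping the list into a long data frame that a boxplot can consume. It assumes res is a list of numeric matrices laid out as described in the question (rows are measures, columns are runs); the names settings, long, setting, measure and value are illustrative.

```r
# The data from the question, entered as matrices (filled column by column).
res <- list(
  matrix(c(275.0637, 5.707791, 273.9386, 5.755798), nrow = 2),
  matrix(c(126.8435, 4.867521, 59.08806, 3.258545), nrow = 2),
  matrix(c(23.50188, 2.036354, 60.96321, 3.737291), nrow = 2)
)
settings <- c(0, 25, 50)

# One row per (setting, run, measure). as.vector() walks each matrix
# column by column, so the measures alternate: feeding, distance, ...
long <- do.call(rbind, lapply(seq_along(res), function(i) {
  data.frame(
    setting = settings[i],
    measure = rep(c("time feeding", "distance traveled"),
                  times = ncol(res[[i]])),
    value   = as.vector(res[[i]])
  )
}))

# With ggplot2 installed, a plot along these lines should then work:
# library(ggplot2)
# ggplot(long, aes(factor(setting), value)) +
#   geom_boxplot() +
#   facet_wrap(~ measure, scales = "free_y")
```

Faceting by measure with free y scales keeps 'time feeding' and 'distance traveled' in separate panels, which matters here because the two are on very different scales.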