select a part of a string in R [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I have a column of strings that looks like this:
Topology=o5-22i34-56o74-96i117-139o159-181i210-232o247-269i
Topology=o4-26i35-57o77-99i119-138o161-183i216-238o248-270i
Topology=o4-21i32-54o69-91i112-134o156-178i215-237o252-271i
Topology=i20-42o65-84i105-127o158-180i212-234o249-271i
Topology=o5-27i39-61o76-98i118-140o151-173i194-213o
I want to get the first number after the equal sign and the last number in the string. The output should be something like:
5,269
4,270
4,271
20,271
5,213

v <- c("Topology=o5-22i34-56o74-96i117-139o159-181i210-232o247-269i",
"Topology=o4-26i35-57o77-99i119-138o161-183i216-238o248-270i",
"Topology=o4-21i32-54o69-91i112-134o156-178i215-237o252-271i",
"Topology=i20-42o65-84i105-127o158-180i212-234o249-271i",
"Topology=o5-27i39-61o76-98i118-140o151-173i194-213o")
sub("^Topology=.(\\d+)-.*-(\\d+).*$", "\\1,\\2", v)
# [1] "5,269" "4,270" "4,271" "20,271" "5,213"
or
r <- regexec("^Topology=.(\\d+)-.*-(\\d+).*$", v)
m <- regmatches(v, r)
(mat <- do.call(rbind, lapply(m, "[", 2:3)))
# [,1] [,2]
# [1,] "5" "269"
# [2,] "4" "270"
# [3,] "4" "271"
# [4,] "20" "271"
# [5,] "5" "213"
Finally, if you want numeric data (instead of character/string data):
apply(mat, 2, as.numeric)
# [,1] [,2]
# [1,] 5 269
# [2,] 4 270
# [3,] 4 271
# [4,] 20 271
# [5,] 5 213

1) This assumes s is a character vector with one such string per component. The following one-liner extracts every run of digits, converts each one to numeric, and takes the first and last number of each string. Finally, it transposes the resulting matrix. fn$sapply allows us to use formula notation for the function at the end:
> library(gsubfn)
> t(fn$sapply(strapply(s, "\\d+", as.numeric), ~ c(head(x, 1), tail(x, 1))))
[,1] [,2]
[1,] 5 269
[2,] 4 270
[3,] 4 271
[4,] 20 271
[5,] 5 213
2) If we want exactly a vector of comma-separated strings, modify it as follows:
> fn$sapply(strapply(s, "\\d+"), ~ sprintf("%s,%s", head(x, 1), tail(x, 1)))
[1] "5,269" "4,270" "4,271" "20,271" "5,213"
3) Here is yet another variation. It gives a matrix of character strings:
> strapplyc(s, "(\\d+).*\\D(\\d+)", simplify = rbind)
[,1] [,2]
[1,] "5" "269"
[2,] "4" "270"
[3,] "4" "271"
[4,] "20" "271"
[5,] "5" "213"
4) Here is a variation of the second solution that does not use gsubfn. (A non-gsubfn solution could be derived from the first solution in a similar manner.)
> sapply(strsplit(s,"\\D+"),
+ function(x) sprintf("%s,%s", head(Filter(nzchar, x), 1), tail(x, 1)))
[1] "5,269" "4,270" "4,271" "20,271" "5,213"
The first 3 solutions use the gsubfn package and all but the third use only simple regular expressions "\\d+" or "\\D+".
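Point 4) notes that a non-gsubfn version of the first solution could be derived similarly. Here is one possible sketch: it swaps strapply for base R's regmatches() with gregexpr(), and fn$sapply for vapply(). The vector s is the same one assumed above, written out here so the snippet runs on its own.

```r
s <- c("Topology=o5-22i34-56o74-96i117-139o159-181i210-232o247-269i",
       "Topology=o4-26i35-57o77-99i119-138o161-183i216-238o248-270i",
       "Topology=o4-21i32-54o69-91i112-134o156-178i215-237o252-271i",
       "Topology=i20-42o65-84i105-127o158-180i212-234o249-271i",
       "Topology=o5-27i39-61o76-98i118-140o151-173i194-213o")

# Extract every run of digits in each string and convert it to numeric
nums <- lapply(regmatches(s, gregexpr("[0-9]+", s)), as.numeric)

# Keep the first and last number of each string; vapply() returns a 2 x n
# matrix, so transpose it to get one row per input string
res <- t(vapply(nums, function(x) c(head(x, 1), tail(x, 1)), numeric(2)))
res
```

The result should be the same numeric matrix as in 1). vapply() is used instead of sapply() so that each string is guaranteed to yield exactly two numbers.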

A slightly more general solution:
sub("^\\D*(\\d+)-.*-(\\d+)\\D*$", "\\1,\\2", v)

Related

R: preserving 1-row / -column matrix [duplicate]

This question already has an answer here:
Is there anything wrong with using T & F instead of TRUE & FALSE?
(1 answer)
Closed 4 years ago.
Given a matrix with one row, one column, or one cell, I need to reorder the rows while keeping the matrix structure. I tried adding drop=F but it doesn't work! What am I doing wrong?
test = matrix(letters[1:5]) # is a matrix
test[5:1,,drop=F] # not a matrix
test2 = matrix(letters[1:5],nrow=1) # is a matrix
test2[1:1,,drop=F] # not a matrix
test3 = matrix(1) # is a matrix
test3[1:1,,drop=F] # not a matrix
I'd guess it was an overwritten F: F is an ordinary variable that can be reassigned, in which case it is no longer FALSE. Always write out FALSE in full; it is a reserved word and cannot be reassigned.
See Is there anything wrong with using T & F instead of TRUE & FALSE?
Also the R Inferno, section 8.1.32, is a good reference.
> F <- 1
> test = matrix(letters[1:5]) # is a matrix
> test[5:1,,drop=F] # not a matrix
[1] "e" "d" "c" "b" "a"
> test[5:1,,drop=FALSE] # but this is a matrix
[,1]
[1,] "e"
[2,] "d"
[3,] "c"
[4,] "b"
[5,] "a"
> rm(F)
> test[5:1,,drop=F] # now a matrix again
[,1]
[1,] "e"
[2,] "d"
[3,] "c"
[4,] "b"
[5,] "a"
The code in your question works fine in a fresh R session:
test = matrix(letters[1:5]) # is a matrix
result = test[5:1,,drop=F]
result
# [,1]
# [1,] "e"
# [2,] "d"
# [3,] "c"
# [4,] "b"
# [5,] "a"
class(result) # still a matrix
# [1] "matrix"   (R >= 4.0.0 prints "matrix" "array")
dim(result)
# [1] 5 1
Even on the 1x1 matrix:
test3 = matrix(1) # is a matrix
result3 = test3[1:1,,drop=F]
class(result3)
# [1] "matrix"   (R >= 4.0.0 prints "matrix" "array")
dim(result3)
# [1] 1 1
Maybe you've loaded other packages that are overriding the default behavior? What makes you think you don't end up with a matrix?
The following works:
test <- matrix(test[5:1,, drop = FALSE], nrow = 5, ncol = 1)
If you check the result with is.matrix(), it returns TRUE: by specifying the number of rows (nrow) and columns (ncol), you coerce the result to the shape you require, even if the subsetting dropped the dimensions.

merging matrix columns that exist inside a numerical list

I have created a list like the following one that contains all combinations of the positions of a specific character inside a string. The code that creates the list is as follows:
library(stringr)
test = str_locate_all("TTEST" , "T")
ind1 = lapply( lapply(1:nrow(test[[1]]), combn , x=test[[1]][,1]) , t )
ind1[[1]] = rbind(ind1[[1]], 0 )
and the resulting list looks like this:
[[1]]
[,1]
[1,] 1
[2,] 2
[3,] 5
[4,] 0
[[2]]
[,1] [,2]
[1,] 1 2
[2,] 1 5
[3,] 2 5
[[3]]
[,1] [,2] [,3]
[1,] 1 2 5
What I want now is to collapse the columns (wherever there is more than one) and unlist the whole object to create a final vector like c("1", "2", "5", "0", "1:2", "1:5", "2:5", "1:2:5"), which I can later use with the expand.grid() function.
I tried to solve part of it with the following code, but the ":" separator ended up in the wrong positions:
do.call(paste, c( as.data.frame(ind1[[2]]) ,collapse=":") )
[1] "1 2:1 5:2 5"
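Here is an idea in base R: convert each list element to a data frame and use do.call to paste its columns together. The key change from your attempt is sep = ':' (which joins the columns within each row) instead of collapse = ':' (which joins all the rows into a single string):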
unlist(lapply(ind1, function(i) do.call(paste, c(as.data.frame(i), sep = ':'))))
#[1] "1" "2" "5" "0" "1:2" "1:5" "2:5" "1:2:5"
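An alternative sketch pastes each row directly with apply() and collapse = ':', without converting to data frames. To keep it self-contained, it rebuilds ind1 with base R (which() on strsplit()) instead of stringr::str_locate_all; that substitution is an assumption for illustration, not the asker's code.

```r
# Positions of "T" in "TTEST", found with base R instead of stringr
pos <- which(strsplit("TTEST", "")[[1]] == "T")   # 1 2 5

# Same list as in the question: all combinations of sizes 1..3, one per row
ind1 <- lapply(seq_along(pos), function(k) t(combn(pos, k)))
ind1[[1]] <- rbind(ind1[[1]], 0)

# Paste each row with ":" and flatten everything into one character vector
collapsed <- unlist(lapply(ind1, function(m) apply(m, 1, paste, collapse = ":")))
collapsed
```

Here apply(m, 1, ...) hands each row to paste() as a single vector, so collapse = ':' joins the values within that row, which is the behaviour the original attempt was after.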

Can an R matrix contain different datatypes? Does this hacked-up matrix-of-lists work?

I read these:
https://stackoverflow.com/a/5159049/1175496
Matrices are for data of the same type.
https://stackoverflow.com/q/29732279/1175496
Vectors (and so matrix) can accept only one type of data
If a matrix can only accept one data type, why can I do this:
> m_list<-matrix(list('1',2,3,4),2,2)
> m_list
[,1] [,2]
[1,] "1" 3
[2,] 2 4
The console output looks like I am combining character and integer data types.
The console output looks similar to this matrix:
> m_vector<-matrix(1:4,2,2)
> m_vector
[,1] [,2]
[1,] 1 3
[2,] 2 4
When I assign to m_list, it doesn't coerce the other values (as in https://stackoverflow.com/q/29732279/1175496 )
> m_list[2,2] <-'4'
> m_list
[,1] [,2]
[1,] "1" 3
[2,] 2 "4"
OK, here is what I gather from the replies so far:
Question
How can I have a matrix with different types?
Answer
You cannot; the elements are not different types; all (4) elements of this matrix are lists
all(
is.list(m_list[1,1]),
is.list(m_list[2,1]),
is.list(m_list[1,2]),
is.list(m_list[2,2]))
#[1] TRUE
Question
But I constructed matrix like this: matrix(list('1',2,3,4),2,2), how did this become a matrix of (4) lists, rather than a matrix of (4) characters, or even (4) integers?
Answer
I'm not sure. Even though the documentation says re: the first argument to matrix:
Non-atomic classed R objects are coerced by as.vector and all
attributes discarded.
It seems these are identical
identical(as.vector(list('1',2,3,4)), list('1',2,3,4))
#[1] TRUE
Question
But when I assign a character ('4') to an element of m_list, how does that work?
m_list[2,2] <-'4'
Answer
It is "coerced", as if you did this:
m_list[2,2] <- as.list('4')
Question
If the elements in m_list are lists, is m_list equivalent to matrix(c(list('1'),list(2),list(3),list(4)),2,2)?
Answer
Yes, these are equivalent:
m_list <- matrix(list('1',2,3,4),2,2)
m_list2 <- matrix(c(list('1'),list(2),list(3),list(4)),2,2)
identical(m_list, m_list2)
#[1] TRUE
Question
So how can I retrieve the type of the '1' hidden in m_list[1,1]?
Answer
At least two ways:
typeof(m_list[1,1][[1]])
#[1] "character"
...or you can do it directly (thanks, Frank), since indexing has this "is applied in turn to the list, the selected component, the selected component of that component, and so on" behavior...
typeof(m_list[[1,1]])
#[1] "character"
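To check the hidden types of every cell at once rather than one at a time, a small sketch (not from the original answers) applies typeof() to each element with vapply(). Because m_list is a list with a dim attribute, vapply() walks its elements in column-major order, and the dimensions are then restored.

```r
# The list-backed matrix from the question, after m_list[2,2] <- '4'
m_list <- matrix(list('1', 2, 3, 4), 2, 2)
m_list[2, 2] <- '4'

# typeof() is applied to each stored element, not to a 1-element list wrapper
types <- vapply(m_list, typeof, character(1))
dim(types) <- dim(m_list)
types
```

Cells [1,1] and [2,2] should report "character" and the other two "double", while typeof(m_list) itself is still "list".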
Question
How can I tell the difference between these two
m1 <- matrix(c(list(1), list(2), list(3), list(4)), 2, 2)
m2 <- matrix(1:4, 2, 2)
Answer
If you are using RStudio,
m1 is described as List of 4
m2 is described as int [1:2, 1:2] 1 2 3 4
..or else, just use typeof(), which for vectors and matrices, identifies the type of their elements... (thanks, Martin)
typeof(m1)
#[1] "list"
typeof(m2)
#[1] "integer"
class can also help distinguish them, but you must first drop the dimensions by wrapping the matrices in c(...):
#Without c(...)
class(m1)
#[1] "matrix"   (R >= 4.0.0: "matrix" "array")
class(m2)
#[1] "matrix"   (R >= 4.0.0: "matrix" "array")
#With c(...)
class(c(m1))
#[1] "list"
class(c(m2))
#[1] "integer"
...or you can spot a subtle difference in the console output: m2 (containing integers) right-aligns its elements, as numbers usually are, whereas m1 (containing lists) left-aligns them...
m1
#     [,1] [,2]
#[1,] 1    3
#[2,] 2    4
m2
#     [,1] [,2]
#[1,]    1    3
#[2,]    2    4
Short answer: matrices in R cannot contain different data types. All elements are coerced to a single common type, such as logical, numeric, character, or list.
If the input to matrix() mixes data types, it is automatically converted to one type. In your example the input is already a list, so the result is a list matrix: each of its elements is stored as an individual list element.
> myList <- list('1',2,3,4)
> myMatrix <- matrix( myList ,2,2)
> myMatrix
[,1] [,2]
[1,] "1" 3
[2,] 2 4
> typeof(myMatrix)
[1] "list"
If you want to convert your data completely out of list form, unlist it first:
> myList <- list('1',2,3,4)
> myMatrix <- matrix( unlist(myList) ,2,2)
> myMatrix
[,1] [,2]
[1,] "1" "3"
[2,] "2" "4"
> typeof(myMatrix)
[1] "character"
Following up on the comments, you can verify this yourself:
typeof(m_list)
typeof(m_list[2,2])

Print out matrix in R within function for which each column has a specified number of digits defined by a function parameter

I have been thinking about this for some time already but I cannot find the solution. Here is the problem.
I have a function that iteratively calculates the root of a function that I pass in. With every iteration I get closer to the final solution (Newton's method). Within the function I build a matrix that stores the iteration number (i), the value of x (x) and the value of f(x) (y).
matrix <- rbind(matrix, c(i,x,y))
The function itself works perfectly fine. But I want to print out the result in a specific way.
I want to return the matrix that is built in the function like this:
[,1] [,2] [,3]
[1,] "1" "0.000" "3.000"
[2,] "2" "-299999.975" "89999985109.735"
[3,] "3" "-150000.381" "22500114442.253"
[4,] "4" "-75000.123" "5625014307.234"
[5,] "5" "-37500.048" "1406253577.781"
[6,] "6" "-18750.030" "351563619.088"
[7,] "7" "-9375.093" "87890906.234"
[8,] "8" "-4687.507" "21972727.599"
[9,] "9" "-2343.753" "5493182.588"
What I am doing at the moment is:
return(matrix(sprintf(c("%.0f","%.3f","%.3f"),matrix),nrow=N))
But this yields
[,1] [,2] [,3]
[1,] "1" "0" "3"
[2,] "2.000" "-299999.975" "89999985109.735"
[3,] "3.000" "-150000.381" "22500114442.253"
[4,] "4" "-75000" "5625014307"
[5,] "5.000" "-37500.048" "1406253577.781"
[6,] "6.000" "-18750.030" "351563619.088"
[7,] "7" "-9375" "87890906"
[8,] "8.000" "-4687.507" "21972727.599"
[9,] "9.000" "-2343.753" "5493182.588"
So the formats are somehow not being applied column by column as intended; the number of decimals changes from row to row instead.
As a next step, and to make it even more complicated, my function should have a parameter that lets users specify the number of digits for columns 2 and 3,
so something like:
newton <- function(fx, p=0)
where p is the number of digits (0 by default).
Can somebody help me with this? Thank you!
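This happens because sprintf() works element by element through the matrix in column-major order and recycles the vector of three formats along the way, so the format changes from row to row within each column. If your matrix always has 3 columns, you can simply format each column separately: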
# mx is the numeric matrix of iterations (i, x, f(x)) built in your function
x.digits = 3
y.digits = 4
mxStr <-
  cbind(sprintf('%d', mx[,1]),
        sprintf(paste('%.', x.digits, 'f', sep=''), mx[,2]),
        sprintf(paste('%.', y.digits, 'f', sep=''), mx[,3])
  )
Of course you can wrap this code in a function and pass x.digits and y.digits as parameters...
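Wrapping it up as suggested could look like the sketch below. The helper name format_iterations is hypothetical, and it uses a single parameter p for both columns 2 and 3, matching the newton(fx, p=0) signature from the question; inside newton() you would call it on your iteration matrix just before returning.

```r
# Hypothetical helper: format a 3-column iteration matrix (i, x, f(x)),
# using p decimal places for columns 2 and 3
format_iterations <- function(mx, p = 0) {
  fmt <- paste0("%.", p, "f")            # e.g. p = 3 gives "%.3f"
  cbind(sprintf("%d", as.integer(mx[, 1])),
        sprintf(fmt, mx[, 2]),
        sprintf(fmt, mx[, 3]))
}

# First two rows of the example from the question
mx <- rbind(c(1, 0, 3),
            c(2, -299999.975, 89999985109.735))
format_iterations(mx, p = 3)
```

Each column gets its own sprintf() call, so the format is fixed per column regardless of how many rows the matrix has.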

combn unclasses factor variables

UPDATE: FIXED
This is fixed in the upcoming release of R 3.1.0. From the CHANGELOG:
combn(x, simplify = TRUE) now gives a factor result for factor input x (previously user error).
Related to PR#15442
I just noticed a curious thing. Why does combn appear to unclass factor variables to their underlying numeric values for all except the first combination?
x <- as.factor( letters[1:3] )
combn( x , 2 )
# [,1] [,2] [,3]
#[1,] "a" "1" "2"
#[2,] "b" "3" "3"
This doesn't occur when x is a character:
x <- as.character( letters[1:3] )
combn( x , 2 )
# [,1] [,2] [,3]
#[1,] "a" "a" "b"
#[2,] "b" "c" "c"
Reproducible on R64 on OS X 10.7.5 and Windows 7.
I think it is due to the conversion to a matrix done by the simplify parameter. If you set simplify = FALSE, you get:
combn( x , 2 , simplify=FALSE)
[[1]]
[1] a b
Levels: a b c
[[2]]
[1] a c
Levels: a b c
[[3]]
[1] b c
Levels: a b c
The fact that the first column is OK is due to the way combn works: the first column is specified separately and the other columns are then changed from the existing matrix using [<-. Consider:
m <- matrix(x,3,3)
m[,2] <- sample(x)
m
[,1] [,2] [,3]
[1,] "a" "1" "a"
[2,] "b" "3" "b"
[3,] "c" "2" "c"
I think the offending function is therefore [<-.
As Konrad said, the treatment of factors is often odd, or at least inconsistent. In this case I think the behaviour is weird enough to constitute a bug. Try submitting it, and see what the response is.
Since the result is a matrix, and there is no factor matrix type, I think that the correct behaviour would be to convert factor inputs to character somewhere near the start of the function.
I had the same problem. Coercing back to a character vector inside the combn command seems to work:
> combn(as.character(x),2)
[,1] [,2] [,3]
[1,] "a" "a" "b"
[2,] "b" "c" "c"
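Another workaround, sketched here rather than taken from the answers above, sidesteps factor handling inside combn altogether: compute combinations of the positions 1..n, which combn treats as plain integers, and then index the original factor with them. The variable names are illustrative.

```r
x <- as.factor(letters[1:3])

# Combinations of positions; combn(3, 2) is the same as combn(1:3, 2)
idx <- combn(length(x), 2)

# Character matrix with one combination per column
pairs_chr <- matrix(as.character(x)[c(idx)], nrow = nrow(idx))

# Or keep the factor class (and its levels) by returning a list of factors
pairs_fac <- lapply(seq_len(ncol(idx)), function(j) x[idx[, j]])
pairs_chr
```

pairs_chr should match the character result shown above, and each element of pairs_fac is a factor that keeps all three levels, similar to the simplify = FALSE output.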

Resources