I am using R and want to convert the following:
"A,B"
to
"A","B" OR 'A','B'
I tried str_replace(), but that's not working out.
Please suggest, thanks.
Update
I tried the answer suggested by d.b. It works, but I should have mentioned that I am going to use the result as a vector. I need the value "A,B" in data to be split so that I can use the parts as a vector.
Using strsplit
> data
[1] "A,B"
> test <- strsplit(x = data, split = ",")
> test
[[1]]
[1] "A" "B"
The test object above isn't useful because I can't use it for the following:
> output_1 <- c(test)
> outputFinalData <- outputFinal[outputFinal$Column %in% output_1,]
outputFinalData is empty with the above process, but it is not empty when I do:
> output_2 <- c("A", "B")
> outputFinalData <- outputFinal[outputFinal$Column %in% output_2,]
Also, output_1 and output_2 are not the same:
> output_1
[[1]]
[1] "Bin_14" "Bin_15"
> output_2
[1] "Bin_14" "Bin_15"
> output_1 == output_2
[1] FALSE FALSE
Use strsplit:
> data = "A,B"
> strsplit(x=data,split=",")
[[1]]
[1] "A" "B"
Note that it returns a list with a vector. The list is length one because you asked it to split one string. If you ask it to split two strings you get a list of length 2:
> data = c("A,B","Foo,bar")
> strsplit(x=data,split=",")
[[1]]
[1] "A" "B"
[[2]]
[1] "Foo" "bar"
So if you know you are only going to have one thing to split you can get a vector of the parts by taking the first element:
> data = "A,B"
> strsplit(x=data,split=",")[[1]]
[1] "A" "B"
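Applied to the filtering step from the question, a minimal sketch might look like the following. The contents of outputFinal are invented here for illustration; only the column name Column comes from the question.

```r
# Hypothetical stand-in for the outputFinal data frame in the question
outputFinal <- data.frame(Column = c("A", "B", "C"),
                          Value = 1:3,
                          stringsAsFactors = FALSE)

data <- "A,B"

# [[1]] pulls the character vector out of the length-one list
wanted <- strsplit(x = data, split = ",")[[1]]

# %in% now compares against a plain character vector, not a list
outputFinalData <- outputFinal[outputFinal$Column %in% wanted, ]
print(outputFinalData)
```

With a single input string, unlist(strsplit(data, ",")) should give the same vector.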
However it might be more efficient to do a load of splits in one go and put the bits in a matrix. As long as you can be sure everything splits into the same number of parts, then something like:
> data = c("A,B","Foo,bar","p1,p2")
> do.call(rbind,(strsplit(x=data,split=",")))
     [,1]  [,2]
[1,] "A"   "B"
[2,] "Foo" "bar"
[3,] "p1"  "p2"
>
Gets you the two parts in columns of a matrix that you can then add to a data frame if that's what you need.
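As a rough sketch of that last step, the matrix columns can be attached to a data frame as shown here. The column names original, first and second are made up for this example.

```r
data <- c("A,B", "Foo,bar", "p1,p2")

# One row per input string, one column per part
parts <- do.call(rbind, strsplit(x = data, split = ","))

# Hypothetical column names; pick whatever suits your data
df <- data.frame(original = data,
                 first = parts[, 1],
                 second = parts[, 2],
                 stringsAsFactors = FALSE)
print(df)
```

This only works cleanly when every string splits into the same number of parts, as noted above.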
Related
I have a data frame consisting of records like the following. A typical row of the data frame, df[1,] looks as follows
84745,"F",70,7,"Single",2,"N",4,9,1,1,3,4,4,"2 day","<120 and <80",0,8,0,1,1,1,1,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,1
I want to convert it into a variable like myvar below which is of the following type
myvar = list( list(84745,"F",70,7,"Single",2,"N",4,9,1,1,3,4,4,"2 day","<120 and <80",0,8,0,1,1,1,1,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,1))
I have tried doing the following, but it doesn't work to convert it to a list of lists.
myvar <- as.list(as.list(as.data.frame(t(df[1,]))))
How can I do that?
EDIT: I have tried myvar = list(unclass(df[14,])). However, it fails the call, since the format of the resulting myvar is slightly different.
Format of the original line of code
[[1]]
[[1]][[1]]
[1] 21408
[[1]][[2]]
[1] "M"
[[1]][[3]]
[1] 69
[[1]][[4]]
[1] 3
[[1]][[5]]
[1] "Widowed"
Format of myvar = list(unclass(df[14,]))
[[1]]
[[1]]$ID
[1] "21408"
[[1]]$GenderCD
[1] "M"
[[1]]$Age
[1] "69"
[[1]]$LOS
[1] "3"
[[1]]$MaritalStatus
[1] "Widowed"
Try this:
myvar <- list( unclass( df[1,] ) )
Explanation: df[1,] is actually still a list, but with a "data.frame" class attribute. If you remove its class, it is just an ordinary list. When you applied t(df[1,]), you forced that row to become a column, and all values in a column must be of the same class, so coercion occurred.
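To see the difference, here is a small sketch on a made-up three-column data frame. The column names are borrowed from the question's output; the values are only illustrative.

```r
# Hypothetical miniature version of the question's data frame
df <- data.frame(ID = 84745, GenderCD = "F", Age = 70,
                 stringsAsFactors = FALSE)

# t() goes through a matrix, so mixed columns are coerced to character
transposed <- t(df[1, ])
is.character(transposed)      # TRUE

# unclass() keeps each column's own type
row_list <- unclass(df[1, ])
is.numeric(row_list$ID)       # TRUE
is.character(row_list$GenderCD)  # TRUE
```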
If the goal is a row-by-row solution then do this:
myvar <- list()
for (i in seq_len(nrow(df)) ) { myvar[[i]] <- unclass( df[i,] )}
If it also needs to be unnamed, which I rather doubt but suppose is possible, then:
myvar <- list()
for (i in seq_len(nrow(df)) ) { myvar[[i]] <- unname( unclass( df[i,] )) }
I tested the unname strategy with:
> unname(unclass( data.frame(a=345,b="tyt")[1,]))
[[1]]
[1] 345
[[2]]
[1] tyt
Levels: tyt
attr(,"row.names")
[1] 1
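The output above still carries a row.names attribute. If that gets in the way, one possible variation (not part of the tested approach above) is to use as.list(), which drops row.names for data frames, together with lapply() instead of an explicit loop. The data frame here is invented for illustration.

```r
# Hypothetical two-row data frame with columns named as in the question
df <- data.frame(ID = c(84745, 21408),
                 GenderCD = c("F", "M"),
                 stringsAsFactors = FALSE)

# One unnamed list per row; as.list() on a data frame removes row.names
myvar <- lapply(seq_len(nrow(df)), function(i) unname(as.list(df[i, ])))

str(myvar[[1]])
```

Setting stringsAsFactors = FALSE keeps the strings as character values rather than factors, which avoids the "Levels:" line seen in the test output.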
I am facing the following problem: I am trying to import a cell array of strings with the readMat function in R.
Matlab Code:
Names = {'A', 'B', 'C', 'D'};
save('RDataIn.mat', 'Names');
Now I want to use the set of strings in R. I run the following R script:
R Code:
library('R.matlab')
Names <- readMat("RDataIn.mat")
readMat apparently cannot handle cell-type .mat data; it creates a strange list. Does anyone have a solution to this problem? Thanks.
Yeah.... it's pretty weird like that. I wouldn't say it "fails", but it's in a format that requires some work. This is what I get when I save the above cell array and load it into R:
> library("R.matlab")
> Names <- readMat("RDataIn.mat")
> Names
$Names
$Names[[1]]
$Names[[1]][[1]]
[,1]
[1,] "A"
$Names[[2]]
$Names[[2]][[1]]
[,1]
[1,] "B"
$Names[[3]]
$Names[[3]][[1]]
[,1]
[1,] "C"
$Names[[4]]
$Names[[4]][[1]]
[,1]
[1,] "D"
attr(,"header")
attr(,"header")$description
[1] "MATLAB 5.0 MAT-file, Platform: MACI64, Created on: Sat Mar 28 13:12:31 2015 "
attr(,"header")$version
[1] "5"
attr(,"header")$endian
[1] "little"
As you can see, Names contains a nested list where each string is stored in a 1 x 1 matrix. What you can do is access the only element of the outer list, then go through each of its elements and extract the first element of each nested list. That first element holds the "name" or string you're looking for. A standard sapply call does this: for each element in the list, it applies a custom function that extracts the first nested element.
x <- sapply(Names[[1]], function(n) n[[1]])
x would be a vector of names, and I get:
> x
[1] "A" "B" "C" "D"
You can access each "name" by standard vector indexing:
> x[1]
[1] "A"
> x[2]
[1] "B"
> x[3]
[1] "C"
> x[4]
[1] "D"
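If you want to experiment without the .mat file, the nested structure printed above can be imitated by hand. The following is only a sketch: the mock list is an assumption about the shape readMat returned here, and vapply and unlist are alternatives to the sapply call, not something readMat requires.

```r
# Hand-built mock of the structure shown above (no .mat file needed):
# a list of lists, each holding a 1 x 1 character matrix
Names <- list(Names = list(list(matrix("A")), list(matrix("B")),
                           list(matrix("C")), list(matrix("D"))))

# vapply is like sapply but checks that each result is one string
x <- vapply(Names$Names, function(n) n[[1]][1, 1], character(1))

# unlist() flattens the whole nesting in one step
x2 <- unlist(Names$Names, use.names = FALSE)

print(x)
print(x2)
```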
I want to apply a long index vector (50+ non-sequential integers) to a long list of vectors (50+ character vectors containing 100+ names) in order to retrieve specific values (as a list, vector, or data frame).
A simplified example is below:
> my.list <- list(c("a","b","c"),c("d","e","f"))
> my.index <- 2:3
Desired Output
[[1]]
[1] "b"
[[2]]
[1] "f"
##or
[1] "b"
[1] "f"
##or
[1] "b" "f"
I know I can get the same value from each element using:
> lapply(my.list, function(x) x[2])
##or
> lapply(my.list,'[', 2)
I can pull the second and third values from each element by:
> lapply(my.list,'[', my.index)
[[1]]
[1] "b" "c"
[[2]]
[1] "e" "f"
##or
> for(j in my.index) for(i in seq_along(my.list)) print(my.list[[i]][[j]])
[1] "b"
[1] "e"
[1] "c"
[1] "f"
I don't know how to pull just the one value from each element.
I've been looking for a few days and haven't found any examples of this being done, but it seems fairly straightforward. Am I missing something obvious here?
Thank you,
Scott
Whenever you have a problem that is like lapply but involves multiple parallel lists/vectors, consider Map or mapply (Map simply being a wrapper around mapply with SIMPLIFY=FALSE hardcoded).
Try this:
Map("[",my.list,my.index)
#[[1]]
#[1] "b"
#
#[[2]]
#[1] "f"
..or:
mapply("[",my.list,my.index)
#[1] "b" "f"
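For a long list and a long index, it may be worth checking that the two have the same length first, since mapply recycles shorter arguments. A sketch of that check, plus a type-checked vapply variant, using the example data from the question:

```r
my.list <- list(c("a", "b", "c"), c("d", "e", "f"))
my.index <- 2:3

# Guard against silent recycling when the lengths differ
stopifnot(length(my.list) == length(my.index))

# Same idea as mapply("[", ...), written with an explicit function
picked <- mapply(function(v, i) v[i], my.list, my.index)

# Alternative: loop over positions and insist on one string per element
picked2 <- vapply(seq_along(my.list),
                  function(k) my.list[[k]][my.index[k]],
                  character(1))

print(picked)
print(picked2)
```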
This is my current workaround:
> df=data.frame(a=1)
> df$b = list(list(2,'a'))
> df
  a    b
1 1 2, a
It works, and I don't really mind that df=data.frame(a=1,b=list(list(1,'a'))) doesn't work.
But referencing b requires [[]] notation, like this: df$b[[1]].
I'm looking for a solution that would allow simply df$b.
What you're looking for is not possible. If you use lists, you will need to use the [[]] notation. Otherwise, you can transform the elements of df$b using the command unlist().
> df$b
[[1]]
[[1]][[1]]
[1] 2
[[1]][[2]]
[1] "a"
> unlist(df$b)
[1] "2" "a"
> unlist(df$b)[1]
[1] "2"
> unlist(df$b)[2]
[1] "a"
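Note that unlist() coerces everything to a common type, which is why the 2 comes back as "2" above. A short sketch contrasting the two access styles (the variable names inner and flat are only for this example):

```r
df <- data.frame(a = 1)
df$b <- list(list(2, "a"))

# [[ ]] access keeps the original types of the inner elements
inner <- df$b[[1]]
is.numeric(inner[[1]])    # TRUE

# unlist() flattens to a single character vector
flat <- unlist(df$b)
is.character(flat)        # TRUE
```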
I am trying the split method, and I want to get the second element of a string containing only 2 elements. The size of the string is 2.
Example:
string= "AC"
The result should be a split after the first letter ("A"), so that I get:
res= [,1] [,2]
[1,] "A" "C"
I tried it with split, but I have no idea how to split after the first element.
strsplit() will do what you want (if I understand your question). You need to split on "" to split the string into its elements. Here is an example showing how to do what you want on a vector of strings:
strs <- rep("AC", 3) ## your string repeated 3 times
next, split each of the three strings
sstrs <- strsplit(strs, "")
which produces
> sstrs
[[1]]
[1] "A" "C"
[[2]]
[1] "A" "C"
[[3]]
[1] "A" "C"
This is a list, so we can process it with lapply() or sapply(). We need to subset each element of sstrs to select the second element. For this we apply the [ function:
sapply(sstrs, `[`, 2)
which produces:
> sapply(sstrs, `[`, 2)
[1] "C" "C" "C"
If all you have is one string, then
strsplit("AC", "")[[1]][2]
which gives:
> strsplit("AC", "")[[1]][2]
[1] "C"
split isn't used for this kind of string manipulation. What you're looking for is strsplit, which in your case would be used something like this:
strsplit(string,"",fixed = TRUE)
You may not need fixed = TRUE, but it's a habit of mine as I tend to avoid regular expressions. You seem to indicate that you want the result to be something like a matrix. strsplit will return a list, so you'll want something like this:
strsplit(string,"",fixed = TRUE)[[1]]
and then pass the result to matrix.
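Put together, that could look roughly like this. The nrow = 1 argument is an assumption based on the one-row result shown in the question.

```r
string <- "AC"

# Split into single characters and take the vector out of the list
chars <- strsplit(string, "", fixed = TRUE)[[1]]

# One row, one column per character, matching the shape in the question
res <- matrix(chars, nrow = 1)
print(res)

# The second element is then
res[1, 2]
```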
If you are sure that it's always a two-character string (check with all(nchar(x)==2)) and you want only the second character, then you could use sub or substr:
x <- c("ab", "12")
sub(".", "", x)
# [1] "b" "2"
substr(x, 2, 2)
# [1] "b" "2"