Split a column based on a space between numbers [duplicate]


I have the vector
length
# [1] 15,34, 12,24, 225,
# Levels: 12,24, 15,34, 225,
and I want to separate them by the comma to eventually make a list of these values
Tried:
strsplit(length, ",")
but keep getting the error message
Error in strsplit(length, ",") : non-character argument

Your "length" object is a factor, not a character vector.
As the error message indicates, strsplit expects a character vector as the input.
Try:
strsplit(as.character(length), ",")
Demo
x <- factor(c("1,2", "3,4", "5,6"))
strsplit(x, ",")
# Error in strsplit(x, ",") : non-character argument
strsplit(as.character(x), ",")
# [[1]]
# [1] "1" "2"
#
# [[2]]
# [1] "3" "4"
#
# [[3]]
# [1] "5" "6"

You could also use stringr (x taken from @Ananda Mahto's post):
library(stringr)
str_split(x, ",")
# [[1]]
# [1] "1" "2"
#
# [[2]]
# [1] "3" "4"
#
# [[3]]
# [1] "5" "6"
Or
str_extract_all(x, "[0-9]+")
Or
library(stringi)
stri_extract_all_regex(x, "[0-9]+")
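Putting the pieces together for the asker's goal of a list of values: a minimal sketch, assuming the factor holds the strings "15,34,", "12,24," and "225," (reconstructed from the printed output in the question). The name `length_f` is hypothetical; it also avoids masking base R's `length()` function.

```r
# Assumed input, reconstructed from the printed factor in the question
length_f <- factor(c("15,34,", "12,24,", "225,"))

# strsplit() rejects factors, so convert to character first.
# A trailing comma does not produce an empty final piece.
pieces <- strsplit(as.character(length_f), ",")

# Turn each character vector into numbers
values <- lapply(pieces, as.numeric)
values
# [[1]]
# [1] 15 34
#
# [[2]]
# [1] 12 24
#
# [[3]]
# [1] 225
```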

Related

Split a character to letters and numbers

I have a string in which each letter is followed by a number. For instance: A1B10C5
I would like to split it into letter <- c("A", "B", "C") and number <- c(1, 10, 5) using R.

We can use regex lookarounds to split between the letters and numbers
# data
str1 <- "A1B10C5"
v1 <- strsplit(str1, "(?<=[A-Za-z])(?=[0-9])|(?<=[0-9])(?=[A-Za-z])", perl = TRUE)[[1]]
v1[c(TRUE, FALSE)]
#[1] "A" "B" "C"
as.numeric(v1[c(FALSE, TRUE)])
#[1] 1 10 5
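The `[[1]]` above keeps only the first string. If you have several such strings, the same lookaround pattern can be applied to the whole vector and each result paired up. A sketch under the assumption that every string strictly alternates letter runs and digit runs; `strs` and `pairs` are hypothetical names.

```r
# Hypothetical input: several strings of letter-number pairs
strs <- c("A1B10C5", "X7Y42")

# Same lookaround pattern as above, applied to every string at once
pieces <- strsplit(strs, "(?<=[A-Za-z])(?=[0-9])|(?<=[0-9])(?=[A-Za-z])", perl = TRUE)

# Odd positions hold letters, even positions hold numbers
pairs <- lapply(pieces, function(v) {
  data.frame(letter = v[c(TRUE, FALSE)],
             number = as.numeric(v[c(FALSE, TRUE)]),
             stringsAsFactors = FALSE)
})
pairs[[2]]
```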

str_extract_all is another way to do this:
library(stringr)
> str <- "A1B10C5"
> str
[1] "A1B10C5"
> str_extract_all(str, "[0-9]+")
[[1]]
[1] "1" "10" "5"
> str_extract_all(str, "[A-Za-z]+")
[[1]]
[1] "A" "B" "C"

To extract letters and numbers at the same time, you can use str_match_all to get letters and numbers in two separate columns:
library(stringr)
str_match_all("A1B10C5", "([a-zA-Z]+)([0-9]+)")[[1]][,-1]
# [,1] [,2]
#[1,] "A" "1"
#[2,] "B" "10"
#[3,] "C" "5"
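If you'd rather avoid the stringr dependency, the same letter/number pairing can be sketched in base R with `regmatches()`, `gregexpr()` and `sub()`. This is an alternative technique, not what `str_match_all()` does internally, and it assumes every letter run is immediately followed by a digit run.

```r
s <- "A1B10C5"

# Each match is one run of letters followed by one run of digits: "A1", "B10", "C5"
tokens <- regmatches(s, gregexpr("[A-Za-z]+[0-9]+", s))[[1]]

# Remove the trailing digits to keep the letters,
# and remove the leading letters to keep the digits
letter <- sub("[0-9]+$", "", tokens)
number <- as.numeric(sub("^[A-Za-z]+", "", tokens))
letter
number
```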

You can also use the base R regmatches with gregexpr:
this <- "A1B10C5"
regmatches(this, gregexpr("[0-9]+", this))
[[1]]
[1] "1" "10" "5"
regmatches(this, gregexpr("[A-Z]+", this))
[[1]]
[1] "A" "B" "C"
These return lists with a single element, a character vector. As akrun does, you can extract the list item using [[1]], and you can also convert the vector of digits to numeric like this:
as.numeric(regmatches(this, gregexpr("[0-9]+", this))[[1]])

R: Removing blanks from the list

I'm wondering if there is any way to remove blanks from the list.
As far as I've searched, there are many Q&As about removing a whole element from a list, but I couldn't find one about removing specific components within an element.
To be specific, the list I'm working with now looks like this:
[[1]]
[1] "1" "" "" "2" "" "" "3"
[[2]]
[1] "weak"
[[3]]
[1] "22" "33"
[[4]]
[1] "44" "34p" "45"
From above, you can see the blanks (""), which should be removed.
I've tried different commands like
text.words.bl <- text.words.ll[-which(text.words.ll==" ")]
text.words.bl <- text.words.ll[!sapply(text.words.ll, is.null)]
etc., but the blanks in [[1]] of the list still remain.
Is it impossible to apply commands to the small pieces inside each element of the list
(e.g. 1, 2, weak, 22, 33... respectively)?
I've used the lapply function to run specific commands on each element,
and those lapply commands all seemed to work....

Use %in%, but negate it with !:
## Sample data:
L <- list(c(1, 2, "", "", 4), c(1, "", "", 2), c("", "", 3))
L
# [[1]]
# [1] "1" "2" "" "" "4"
#
# [[2]]
# [1] "1" "" "" "2"
#
# [[3]]
# [1] "" "" "3"
The replacement:
lapply(L, function(x) x[!x %in% ""])
# [[1]]
# [1] "1" "2" "4"
#
# [[2]]
# [1] "1" "2"
#
# [[3]]
# [1] "3"
Assign the output back to L if you want to overwrite the original list:
L[] <- lapply(L, function(x) x[!x %in% ""])

Another way would be to use nchar(). I borrowed L from @Ananda Mahto.
lapply(L, function(x) x[nchar(x) >= 1])
#[[1]]
#[1] "1" "2" "4"
#
#[[2]]
#[1] "1" "2"
#
#[[3]]
#[1] "3"
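The question also mentions " ". If the data may contain whitespace-only strings as well as truly empty ones, one option is to trim before testing. A minimal sketch; `L2` and `cleaned` are hypothetical names, and dropping list elements that end up empty with `Filter()` is an optional extra step the question did not ask for.

```r
# Sample list that also contains whitespace-only strings such as " "
L2 <- list(c("1", "", " ", "2"), "weak", c("", "  "))

# trimws() turns " " into "", and nzchar() is FALSE for "",
# so whitespace-only strings are dropped along with empty ones
cleaned <- lapply(L2, function(x) x[nzchar(trimws(x))])

# Optional: drop list elements that are now empty (length 0)
cleaned <- Filter(length, cleaned)
cleaned
```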

Iterating over characters of string R

Could somebody explain to me why this does not print all the numbers separately in R?
numberstring <- "0123456789"
for (number in numberstring) {
print(number)
}
Aren't strings just arrays of chars? What's the way to do it in R?

In R, "0123456789" is a character vector of length 1, not an array of characters, so the loop runs only once.
If you want to iterate over the characters, you have to split the string into a vector of single characters using strsplit.
numberstring <- "0123456789"
numberstring_split <- strsplit(numberstring, "")[[1]]
for (number in numberstring_split) {
print(number)
}
# [1] "0"
# [1] "1"
# [1] "2"
# [1] "3"
# [1] "4"
# [1] "5"
# [1] "6"
# [1] "7"
# [1] "8"
# [1] "9"
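If the goal is to work with the digits as numbers rather than as one-character strings, the split result can be converted. A small sketch; `digits` and `digits2` are hypothetical names, and the `utf8ToInt()` variant only makes sense when every character is an ASCII digit.

```r
numberstring <- "0123456789"

# Split into single characters, then convert each to an integer digit
digits <- as.integer(strsplit(numberstring, "")[[1]])

# Alternative without splitting: code points minus the code point of "0"
# (assumes the string contains only the ASCII digits 0-9)
digits2 <- utf8ToInt(numberstring) - utf8ToInt("0")

for (d in digits) {
  print(d)
}
```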

Just for fun, here are a few other ways to split a string at each character.
x <- "0123456789"
substring(x, 1:nchar(x), 1:nchar(x))
# [1] "0" "1" "2" "3" "4" "5" "6" "7" "8" "9"
regmatches(x, gregexpr(".", x))[[1]]
# [1] "0" "1" "2" "3" "4" "5" "6" "7" "8" "9"
scan(text = gsub("(.)", "\\1 ", x), what = character())
# [1] "0" "1" "2" "3" "4" "5" "6" "7" "8" "9"

Possible with stringr::str_split (stringr is part of the tidyverse):
library(stringr)
numberstring <- "0123456789"
str_split(numberstring, boundary("character"))
# [[1]]
# [1] "0" "1" "2" "3" "4" "5" "6" "7" "8" "9"

Here's a naive approach for iterating over a string using a for loop and substring. This isn't any better than the existing answers for the common case, but it might be useful if you want to break out of the loop early instead of always traversing the entire string once up front, as str_split, scan, substring(x, 1:nchar(x), 1:nchar(x)) and regmatches all do.
s <- "0123456789"
if (s != "") {
for (i in 1:nchar(s)) {
print(substring(s, i, i))
}
}
The if is needed because, for an empty string, 1:nchar(s) is 1:0, i.e. c(1, 0), so the loop would run backwards over both 1 and 0.
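An alternative to the if guard is seq_len(), which yields an empty sequence for a zero-length string. A sketch of the early-exit use case; `chars_until()` and its `stop_char` argument are hypothetical.

```r
# Hypothetical helper: collect characters until a stop character is seen
chars_until <- function(s, stop_char) {
  out <- character(0)
  # seq_len(0) is empty, so an empty string skips the loop without an if guard
  # (1:nchar("") would be 1:0, i.e. c(1, 0))
  for (i in seq_len(nchar(s))) {
    ch <- substring(s, i, i)
    if (ch == stop_char) break  # leave the loop early
    out <- c(out, ch)
  }
  out
}

chars_until("0123456789", "5")
chars_until("", "5")
```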

Your question is not 100% clear as to the desired outcome (print each character individually from a string, or store each number in a way that the given print loop will result in each number being produced on its own line).
To store numberstring such that it prints using the loop you included:
numberstring <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
for (number in numberstring) { print(number) }
[1] 0
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9

How to get empty last elements from strsplit() in R?

I need to process some data that is mostly CSV. The problem is that strsplit ignores the comma if it comes at the end of a line (e.g., the one after 3 in the example below).
> strsplit("1,2,3,", ",")
[[1]]
[1] "1" "2" "3"
I'd like it to be read in as [1] "1" "2" "3" NA instead. How can I do this? Thanks.

Here are a couple of ideas
scan(text="1,2,3,", sep=",", quiet=TRUE)
#[1] 1 2 3 NA
unlist(read.csv(text="1,2,3,", header=FALSE), use.names=FALSE)
#[1] 1 2 3 NA
These return numeric vectors: scan returns a double vector and the read.csv version returns an integer vector. You can wrap as.character around either of them to get the exact output you show in the question:
as.character(scan(text="1,2,3,", sep=",", quiet=TRUE))
#[1] "1" "2" "3" NA
Or, you could specify what="character" in scan, or colClasses="character" in read.csv; the trailing field then comes back as an empty string rather than NA:
scan(text="1,2,3,", sep=",", quiet=TRUE, what="character")
#[1] "1" "2" "3" ""
unlist(read.csv(text="1,2,3,", header=FALSE, colClasses="character"), use.names=FALSE)
#[1] "1" "2" "3" ""
You could also specify na.strings="" along with colClasses="character"
unlist(read.csv(text="1,2,3,", header=FALSE, colClasses="character", na.strings=""),
use.names=FALSE)
#[1] "1" "2" "3" NA
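If you want to stay with plain string splitting in base R rather than parsing, a minimal sketch: `regmatches()` with `invert = TRUE` returns the text between the matched separators, so it keeps the trailing empty field that `strsplit()` drops. This is a different technique from the scan()/read.csv() approach above; `pieces` is a hypothetical name.

```r
x <- "1,2,3,"

# invert = TRUE returns the text *between* the matched commas,
# so n commas always give n + 1 pieces, including a trailing ""
pieces <- regmatches(x, gregexpr(",", x), invert = TRUE)[[1]]

# Recode empty pieces as NA to match the desired output
pieces[pieces == ""] <- NA
pieces
```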

Hadley's stringr package (built on top of Marek Gagolewski's stringi) is a huge improvement on the base string functions (fully vectorized, consistent function interface):
library(stringr)
str_split("1,2,3,", ",")
[[1]]
[1] "1" "2" "3" ""
as.integer(unlist(str_split("1,2,3,", ",")))
[1] 1 2 3 NA

Using stringi package:
require(stringi)
> stri_split_fixed("1,2,3,",",")
[[1]]
[1] "1" "2" "3" ""
## you can directly specify whether you want to omit these empty elements
> stri_split_fixed("1,2,3,",",",omit_empty = TRUE)
[[1]]
[1] "1" "2" "3"
