I have a data frame that I need to convert to a character vector. Here is an example of the data frame structure:
texts <- c("TEXT 1", "TEXT 2", "TEXT 3")
data <- data.frame(texts)
I need this structure:
[1] "TEXT 1" "TEXT 2" "TEXT 3"
I already tried as.character(), but it does not work: it collapses all the rows into a single string.
You can transpose and concatenate, i.e.
c(t(data))
#[1] "TEXT 1" "TEXT 2" "TEXT 3"
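For a single-column data frame like this one, extracting the column directly may be simpler. The sketch below assumes the column holds character data (stringsAsFactors = FALSE is set explicitly so it behaves the same on older R versions) and compares the transpose approach with two direct extractions:

```r
texts <- c("TEXT 1", "TEXT 2", "TEXT 3")
data <- data.frame(texts, stringsAsFactors = FALSE)

v1 <- c(t(data))               # transpose, then drop the matrix dimensions
v2 <- data$texts               # extract the column by name
v3 <- as.character(data[[1]])  # extract by position; also converts a factor column

print(v2)
# [1] "TEXT 1" "TEXT 2" "TEXT 3"
```

With more than one column, c(t(data)) interleaves the values row by row, whereas data[[1]] returns only the first column.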
I have a matrix like so
A = matrix(
c("2 (1-3)", "4 (2-6)", "3 (2-4)", "1 (0.5-1.5)", "5 (2.5-7.5)", "7 (5-9)"),
nrow=3,
ncol=2)
I want to replace all strings where the first element is less than 5 (ie "0" or "1" or "2" or "3" or "4") with "< 5". It should be:
B = matrix(
c("< 5", "< 5", "< 5", "< 5", "5 (2.5-7.5)", "7 (5-9)"),
nrow=3,
ncol=2)
Any ideas?
Extract the first number, convert it to numeric, and replace the elements whose first number is less than 5 with "< 5".
A[as.numeric(sub('(\\d+).*', '\\1', A)) < 5] <- '< 5'
A
# [,1] [,2]
#[1,] "< 5" "< 5"
#[2,] "< 5" "5 (2.5-7.5)"
#[3,] "< 5" "7 (5-9)"
A shortcut to extract the first number and to convert it to numeric is using readr::parse_number.
A[readr::parse_number(A) < 5] <- '< 5'
Use substr() to extract the first character of each matrix element. As long as the leading number is a single digit, you can convert that character to a number via as.numeric():
A[as.numeric(substr(A,1,1))<5] <- "< 5"
We don't need to extract and convert to numeric if there are only 5 options:
ie "0" or "1" or "2" or "3" or "4"
A[grep("^[0-4]", A)] <- "< 5"
Or
replace(A, grep("^[0-4]", A), "< 5")
Or
# startsWith() matches literal prefixes only, so grepl() is used for the regular expression
replace(A, grepl("^[0-4]", A), "< 5")
Result
# [,1] [,2]
# [1,] "< 5" "< 5"
# [2,] "< 5" "5 (2.5-7.5)"
# [3,] "< 5" "7 (5-9)"
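The first-character approaches above rely on every leading number being a single digit. A minimal sketch of a helper that compares the whole leading number instead is shown here; the function name mask_below and the test value "10 (8-12)" are made up for illustration and are not part of the question:

```r
A <- matrix(
  c("2 (1-3)", "4 (2-6)", "3 (2-4)", "1 (0.5-1.5)", "5 (2.5-7.5)", "7 (5-9)"),
  nrow = 3,
  ncol = 2)

# Hypothetical helper: replace cells whose leading number is below `threshold`.
mask_below <- function(m, threshold = 5, label = "< 5") {
  # Keep only the leading (possibly decimal) number of each cell
  lead <- sub("^\\s*([0-9]+(\\.[0-9]+)?).*$", "\\1", m)
  # Cells without a leading number become NA and are left untouched
  lead <- suppressWarnings(as.numeric(lead))
  replace(m, !is.na(lead) & lead < threshold, label)
}

B <- mask_below(A)
B
mask_below(matrix("10 (8-12)"))  # stays "10 (8-12)", since 10 is not below 5
```

replace() returns a modified copy, so A itself is left unchanged and the matrix dimensions are kept.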
1) read.table
Use read.table to get the first number in each cell giving vector firstNo. Then use replace to replace those cells with < 5.
The original input A is preserved, which is generally desirable because it makes testing and debugging easier. If you prefer to overwrite it anyway, replace the left-hand side of the second line of code with A.
No regular expressions and no packages are used.
firstNo <- read.table(text = A)[[1]]
B <- replace(A, firstNo < 5, "< 5")
B
giving:
[,1] [,2]
[1,] "< 5" "< 5"
[2,] "< 5" "5 (2.5-7.5)"
[3,] "< 5" "7 (5-9)"
Although not needed for the sample input in the question, if it is possible that the text after the left parenthesis is irregular then you might need to add the fill=TRUE or comment.char = "(" arguments to read.table.
2) gsubfn
gsubfn is like gsub except it inputs the capture groups in the regular expression, i.e. the parenthesized portions of the regular expression, into the function expressed in formula notation in the second argument and then replaces the match with the output of the function.
library(gsubfn)
B <- replace(A,
TRUE,
gsubfn("^(\\d) (.*)", ~ if (as.numeric(x) < 5) "< 5" else paste(x, y), A)
)
B
giving:
[,1] [,2]
[1,] "< 5" "< 5"
[2,] "< 5" "5 (2.5-7.5)"
[3,] "< 5" "7 (5-9)"
I would like to order the following vector of chr:
x=c("class 1", "class 2", "class 4", "class 7", "class 5", "class 3", "class 6",
"class 10", "class 9", "class 11", "class 8", "class 12", "class 21")
according to the numbers that appear in the characters. E.g., in this case, the desired result is:
class 1, class 2, class 3, class 4, class 5, class 6, class 7, class 8, class 9, class 10, class 11
class 12, class 21
I tried with:
x[order(x)]
but obtaining a different result:
> x[order(x)]
[1] "class 1" "class 10" "class 11" "class 12" "class 2" "class 21" "class 3"
[8] "class 4" "class 5" "class 6" "class 7" "class 8" "class 9"
As mentioned, it is sorting alphabetically, and not considering the numeric value contained within the string.
There are a number of options to address this:
library(stringr)
str_sort(x, numeric = TRUE)
[1] "class 1" "class 2" "class 3" "class 4" "class 5" "class 6" "class 7" "class 8" "class 9" "class 10" "class 11" "class 12" "class 21"
Or
library(gtools)
mixedsort(x)
[1] "class 1" "class 2" "class 3" "class 4" "class 5" "class 6" "class 7" "class 8" "class 9" "class 10" "class 11" "class 12" "class 21"
Or without using another package, strip away "class" and use the numeric result to sort:
values <- as.numeric(gsub("class", "", x))
x[order(values)]
[1] "class 1" "class 2" "class 3" "class 4" "class 5" "class 6" "class 7" "class 8" "class 9" "class 10" "class 11" "class 12" "class 21"
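If the prefix is not always the literal word "class", the same idea can be written with a pattern that pulls out the first run of digits. This is only a sketch and assumes every string contains at least one number:

```r
x <- c("class 1", "class 2", "class 4", "class 7", "class 5", "class 3", "class 6",
       "class 10", "class 9", "class 11", "class 8", "class 12", "class 21")

# Skip leading non-digits, capture the first run of digits, drop the rest
num <- as.numeric(sub("\\D*(\\d+).*", "\\1", x))

# order() returns the positions that put num in increasing order
sorted <- x[order(num)]
sorted
```

order() is used rather than sort() because the numeric key and the strings being rearranged are different vectors.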
That's because x is a vector of class "character", whose elements (strings) are ordered alphabetically. Extract the numbers from the strings and convert them to a numeric type:
y <- as.integer(substr(x, 7,8))
# y has the same order as x
# sort the integers (numeric order) and match their positions in the unordered integers
# match returns the indexes of y in the order given by sort(y)
x[match(sort(y), y)]
# Output is:
# [1] "class 1" "class 2" "class 3" "class 4" "class 5" "class 6" "class 7" "class 8" "class 9" "class 10" "class 11" "class 12"
# [13] "class 21"
I have a data column called "Health" with the following four levels: "0 0", "0 1", "1 0" and "1 1"
How do I create a new column where I:
combine the three levels "0 1", "1 0" and "1 1" and rename it as "1"
rename the fourth level "0 0" as "0"
Thank you
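One possible approach, sketched under the assumption that the column lives in a data frame: the names df and Health2 and the sample values are hypothetical. Everything other than "0 0" is mapped to "1":

```r
# Hypothetical sample data with the four levels from the question
df <- data.frame(Health = factor(c("0 0", "0 1", "1 0", "1 1", "0 0")))

# "0 0" becomes "0"; the other three levels collapse into "1"
df$Health2 <- factor(ifelse(df$Health == "0 0", "0", "1"), levels = c("0", "1"))

table(df$Health, df$Health2)
```

The cross-tabulation on the last line is a quick check that each original level landed in the intended new level.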
Here is my sample:
a = c("a","b","c")
b = c("1","2","3")
I need to concatenate a and b automatically. The result should be "a 1","a 2","a 3","b 1","b 2","b 3","c 1","c 2","c 3".
For now, I am using the paste function:
paste(a[1],b[1])
I need an automatic way to do this. Besides writing a loop, is there any easier way to achieve this?
c(outer(a, b, paste))
# [1] "a 1" "b 1" "c 1" "a 2" "b 2" "c 2" "a 3" "b 3" "c 3"
Other options are :
paste(rep(a, each = length(b)), b)
or :
with(expand.grid(b,a),paste(Var2,Var1))
You can do:
c(sapply(a, function(x) {paste(x,b)}))
[1] "a 1" "a 2" "a 3" "b 1" "b 2" "b 3" "c 1" "c 2" "c 3"
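The answers above differ in the order of the result: outer() fills its matrix column by column, so flattening it directly varies a fastest. A short sketch comparing two ways to get the a-major order asked for in the question:

```r
a <- c("a", "b", "c")
b <- c("1", "2", "3")

# Repeat each element of a once per element of b, and cycle through b
by_a <- paste(rep(a, each = length(b)), rep(b, times = length(a)))

# Transposing the outer() matrix before flattening gives the same order
via_outer <- c(t(outer(a, b, paste)))

by_a
# [1] "a 1" "a 2" "a 3" "b 1" "b 2" "b 3" "c 1" "c 2" "c 3"
```

Both versions also work when a and b have different lengths, because the repetition counts are derived from length(a) and length(b).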
I have two different character vectors in R, that I want to combine to use for column names:
groups <- c("Group A", "Group B")
label <- c("Time","Min","Mean","Max")
When I try using paste I get the result:
> paste(groups,label)
[1] "Group A Time" "Group B Min" "Group A Mean" "Group B Max"
Is there a simple function or setting that can paste these together to get the following output?
[1] "Group A Time" "Group A Min" "Group A Mean" "Group A Max" "Group B Time"
[6] "Group B Min" "Group B Mean" "Group B Max"
outer is probably what you need. Try this:
> c(t(outer(groups, label, paste)))
[1] "Group A Time" "Group A Min" "Group A Mean" "Group A Max" "Group B Time" "Group B Min"
[7] "Group B Mean" "Group B Max"
Using outer (this returns a 2 x 4 matrix rather than a vector):
outer(groups, label, FUN=paste)
Since groups has only two elements, I would do
c(paste(groups[1],label),paste(groups[2],label))