I want to combine a concatenation of two lists [duplicate] - r

This question already has answers here:
Creating a sequential list of numbers and letters with R
(2 answers)
Closed 2 years ago.
I need to create a vector combining the numbers c(1:10) and the terms c("-KM","-COX"), so that it turns out like this:
c("1-KM", "1-COX", "2-KM", "2-COX", "3-KM", "3-COX", ...)
I have tried using expand.grid to do that, but it returns a data frame and I need a vector. How could I do that?

Try this version:
apply(expand.grid(v1, v2), 1, function(x) trimws(paste0(x[1], x[2])))
[1] "1-KM" "2-KM" "3-KM" "4-KM" "5-KM" "6-KM" "7-KM" "8-KM"
[9] "9-KM" "10-KM" "1-COX" "2-COX" "3-COX" "4-COX" "5-COX" "6-COX"
[17] "7-COX" "8-COX" "9-COX" "10-COX"
Data:
v1 <- c(1:10)
v2 <- c("-KM", "-COX")

After expand.grid, you can pass the returned data.frame to paste0 via do.call to get a vector.
do.call(paste0, expand.grid(1:10, c("-KM","-COX")))
# [1] "1-KM" "2-KM" "3-KM" "4-KM" "5-KM" "6-KM" "7-KM" "8-KM"
# [9] "9-KM" "10-KM" "1-COX" "2-COX" "3-COX" "4-COX" "5-COX" "6-COX"
#[17] "7-COX" "8-COX" "9-COX" "10-COX"
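Both answers above list every "-KM" value before any "-COX" value, whereas the question asks for the two terms to alternate for each number. A minimal sketch of one way to get that order, assuming the same v1 and v2 as in the Data section: repeat each number once per term and let paste0 recycle the terms.

```r
v1 <- 1:10
v2 <- c("-KM", "-COX")

# Repeat each number once per term, then let paste0 recycle v2 across them
res <- paste0(rep(v1, each = length(v2)), v2)
head(res, 6)
# [1] "1-KM"  "1-COX" "2-KM"  "2-COX" "3-KM"  "3-COX"
```

This avoids expand.grid entirely, so no data frame is created along the way.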

Related

Create an alternating series of vector in R

I have a vector that looks like this:
tt <- c("chr1:110363793:G:C", "chr1:110363823:A:G", "chr1:110363849:A:G")
How do I create a vector with POS and NEG characters appended alternatively to get this below?
"chr1:110363793:G:C_POS", "chr1:110363823:A:G_NEG", "chr1:110363849:A:G_POS", "chr1:110363793:G:C_NEG", "chr1:110363823:A:G_POS", "chr1:110363849:A:G_NEG"
Just make use of vector recycling with rep and paste0:
paste0(rep(tt, 2), c("_POS", "_NEG"))
Output:
[1] "chr1:110363793:G:C_POS" "chr1:110363823:A:G_NEG" "chr1:110363849:A:G_POS" "chr1:110363793:G:C_NEG" "chr1:110363823:A:G_POS"
[6] "chr1:110363849:A:G_NEG"
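To make the recycling visible, the suffix vector can be built to full length first. This is only a sketch of the same idea; ids, suffix and res are names introduced here for illustration.

```r
tt <- c("chr1:110363793:G:C", "chr1:110363823:A:G", "chr1:110363849:A:G")

ids    <- rep(tt, 2)                                  # each ID appears twice
suffix <- rep_len(c("_POS", "_NEG"), length(ids))     # POS, NEG, POS, NEG, ...
res    <- paste0(ids, suffix)
res
```

The alternation gives each ID both suffixes only because length(tt) is odd. With an even number of IDs, both copies of an ID would receive the same suffix, and something like paste0(rep(tt, 2), rep(c("_POS", "_NEG"), each = length(tt))) would be needed instead.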

Padding lost zeros not universally in a column [duplicate]

This question already has answers here:
How to add leading zeros?
(8 answers)
Closed 1 year ago.
I have a list of 5-digit US postal zip codes, but some lost their leading zeros. How do I add those zeros back in, while leaving the values that are already 5 digits long intact? I tried formatC, sprintf, and str_pad, and none of them worked, because I am not adding 0s to all values.
We can use sprintf
sprintf('%05d', as.integer(zipcodes))
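As a quick check that this pads only the values that need it, here is a small sketch with made-up sample values (the zipcodes vector below is hypothetical, not the asker's data):

```r
zipcodes <- c("1234", "02134", "90210", "501")   # hypothetical sample values

# as.integer drops any existing leading zeros; %05d pads back to width 5
padded <- sprintf("%05d", as.integer(zipcodes))
padded
# [1] "01234" "02134" "90210" "00501"
```

Values that already have 5 digits come back unchanged, and shorter ones get as many zeros as they are missing.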
In which way did str_pad not work?
https://www.rdocumentation.org/packages/stringr/versions/1.4.0/topics/str_pad
df<-data.frame(zip=c(1,22,333,4444,55555))
df$zip <- stringr::str_pad(df$zip, width=5, pad = "0")
[1] "00001" "00022" "00333" "04444" "55555"
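Update:
Following the valuable comment from r2evans:
My solution is not very efficient, and to get a leading 0 the paste0 call has to be modified slightly. Note that it adds a single zero, so it only fixes values that are exactly one digit short. Here it is with a data frame example: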
sapply(df$zip, function(x){if(nchar(x)<5){paste0(0,x)}else{x}})
data:
library(tibble)
df <- tribble(
~zip,
7889,
2345,
45567,
4394,
34566,
4392,
4599)
df
Output:
[1] "07889" "02345" "45567" "04394" "34566" "04392" "04599"
First answer:
This adds a trailing zero to each integer with fewer than 5 digits, which is not what the question asks for; the update above corrects it.
Where zip is a vector:
sapply(zip, function(x){if(nchar(x)<5){paste0(x,0)}else{x}})
If they start as strings and you don't want to (or cannot) convert to integers first, then an alternative to sprintf is
vec <- c('1','11','11111')
paste0(strrep('0', pmax(0, 5 - nchar(vec))), vec)
# [1] "00001" "00011" "11111"
This handles strings of any length and is a no-op for strings of 5 or more characters.
In a frame, that would be
dat$colname <- paste0(strrep('0', pmax(0, 5 - nchar(dat$colname))), dat$colname)

How to add leading zeros in a dataframe [duplicate]

This question already has answers here:
How to add leading zeros?
(8 answers)
Closed 2 years ago.
I'm trying to change the format of my data. I have a centre number that runs from 1 to 15 and a participant number that runs from 1 to about 3000.
I would like them to start with zeros, so that the centre-number will have two digits and the participant-number will have 4 digits. (For example participant number 1 would then be 0001).
Thank you!
You can use the str_pad function in the 'stringr' package.
library(stringr)
values <- c(1, 5, 23, 123, 43, 7)
str_pad(values, 3, pad='0')
Output:
[1] "001" "005" "023" "123" "043" "007"
In your case as you have two parts to your strings, you can apply the function like this to pad your strings correctly.
# dummy data
centre_participants <- c('1-347', '13-567', '9-7', '15-2507')
# split the strings on "-"
centre_participants <- strsplit(centre_participants, '-')
# apply the right string padding to each component and join together
centre_participants <- sapply(centre_participants, function(x)
  paste0(str_pad(x[1], 2, pad='0'), '-', str_pad(x[2], 4, pad='0')))
Output:
[1] "01-0347" "13-0567" "09-0007" "15-2507"
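If the centre and participant numbers are stored as two separate numeric columns rather than as one combined string, base R's sprintf can pad both in a single call. A minimal sketch, assuming hypothetical vectors named centre and participant:

```r
centre      <- c(1, 13, 9, 15)          # hypothetical centre numbers
participant <- c(347, 567, 7, 2507)     # hypothetical participant numbers

# %02d pads the centre to 2 digits, %04d pads the participant to 4 digits
ids <- sprintf("%02d-%04d", as.integer(centre), as.integer(participant))
ids
# [1] "01-0347" "13-0567" "09-0007" "15-2507"
```

Dropping the "-" from the format string, or calling sprintf once per column, gives the padded values without the separator.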

change numbers in string vector [duplicate]

This question already has answers here:
R: gsub, pattern = vector and replacement = vector
(6 answers)
Closed 3 years ago.
I have a string vector including numbers like this:
x <- c("abc122", "73dj", "lo7833ll")
x
[1] "abc122" "73dj" "lo7833ll"
I want to change the numbers in the x vector and replace them with numbers I have stored in another vector:
right_numbers <- c(500, 700, 23)
> right_numbers
[1] 500 700 23
How can I do this even if the numbers are in different positions in the string (some are at the beginning, some at the end, and so on)?
This is how the x vector should look after the changes:
> x
[1] "abc500" "700dj" "lo23ll"
A vectorized solution with stringr -
str_replace(x, "[0-9]+", as.character(right_numbers))
[1] "abc500" "700dj" "lo23ll"
Possibly a more efficient version with the stringi package, thanks to @sindri_baldur -
stri_replace_first_regex(x, '[0-9]+', right_numbers)
[1] "abc500" "700dj" "lo23ll"
Here is an idea,
mapply(function(i, y)gsub('[0-9]+', y, i), x, right_numbers)
# abc122 73dj lo7833ll
#"abc500" "700dj" "lo23ll"
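For a base-R route without extra packages, regexpr can locate the first run of digits in each string and the replacement form of regmatches can overwrite it. This is a sketch that assumes every element of x contains at least one number and that right_numbers holds one value per element; m is a name introduced here.

```r
x <- c("abc122", "73dj", "lo7833ll")
right_numbers <- c(500, 700, 23)

m <- regexpr("[0-9]+", x)                        # position of the first digit run
regmatches(x, m) <- as.character(right_numbers)  # overwrite each match in place
x
# [1] "abc500" "700dj"  "lo23ll"
```

Unlike the mapply version, the result carries no names, so no unname() call is needed afterwards.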

R: Replacing rownames of data frame by a substring[2]

I have a question about the use of gsub. The rownames of my data share the same partial names. See below:
> rownames(test)
[1] "U2OS.EV.2.7.9" "U2OS.PIM.2.7.9" "U2OS.WDR.2.7.9" "U2OS.MYC.2.7.9"
[5] "U2OS.OBX.2.7.9" "U2OS.EV.18.6.9" "U2O2.PIM.18.6.9" "U2OS.WDR.18.6.9"
[9] "U2OS.MYC.18.6.9" "U2OS.OBX.18.6.9" "X1.U2OS...OBX" "X2.U2OS...MYC"
[13] "X3.U2OS...WDR82" "X4.U2OS...PIM" "X5.U2OS...EV" "exp1.U2OS.EV"
[17] "exp1.U2OS.MYC" "EXP1.U20S..PIM1" "EXP1.U2OS.WDR82" "EXP1.U20S.OBX"
[21] "EXP2.U2OS.EV" "EXP2.U2OS.MYC" "EXP2.U2OS.PIM1" "EXP2.U2OS.WDR82"
[25] "EXP2.U2OS.OBX"
In my previous question, I asked if there is a way to get the same names for the same partial names. See this question: Replacing rownames of data frame by a sub-string
The answer is a very nice solution. The function gsub is used in this way:
transfecties = gsub(".*(MYC|EV|PIM|WDR|OBX).*", "\\1", rownames(test))
Now I have another problem: the program I run R with (Galaxy) doesn't recognize the | character. Is there another way to get the same result without using |?
Thanks!
If you don't want to use the "|" character, you can try something like:
Rnames <-
c( "U2OS.EV.2.7.9", "U2OS.PIM.2.7.9", "U2OS.WDR.2.7.9", "U2OS.MYC.2.7.9" ,
"U2OS.OBX.2.7.9" , "U2OS.EV.18.6.9" ,"U2O2.PIM.18.6.9" ,"U2OS.WDR.18.6.9" )
Rlevels <- c("MYC","EV","PIM","WDR","OBX")
tmp <- sapply(Rlevels,grepl,Rnames)
apply(tmp,1,function(i)colnames(tmp)[i])
[1] "EV" "PIM" "WDR" "MYC" "OBX" "EV" "PIM" "WDR"
But I would seriously consider mentioning this to the Galaxy team, as it seems rather awkward not to be able to use the symbol for OR...
I wouldn't recommend doing this in general in R, as it is far less efficient than the solution @csgillespie provided. An alternative, though, is to loop over the strings you want to match and do the replacement for each string separately, i.e. search for "MYC" and replace only in those rownames that match "MYC".
Here is an example using the x data from @csgillespie's answer:
x <- c("U2OS.EV.2.7.9", "U2OS.PIM.2.7.9", "U2OS.WDR.2.7.9", "U2OS.MYC.2.7.9",
"U2OS.OBX.2.7.9", "U2OS.EV.18.6.9", "U2O2.PIM.18.6.9","U2OS.WDR.18.6.9",
"U2OS.MYC.18.6.9","U2OS.OBX.18.6.9", "X1.U2OS...OBX","X2.U2OS...MYC")
Copy the data so we have something to compare with later (this is just for the example):
x2 <- x
Then create a list of strings you want to match on:
matches <- c("MYC","EV","PIM","WDR","OBX")
Then we loop over the values in matches and do three things (numbered ##X in the code):
1. Create the regular expression by pasting together the current match string i with the other bits of the regular expression we want to use.
2. Using grepl(), return a logical indicator for those elements of x (the original names) that contain the string i.
3. Use the same style of gsub() call as you were already shown, but apply it only to the elements of x2 that matched the string, and replace only those elements.
The loop is:
for(i in matches) {
  rgexp <- paste(".*(", i, ").*", sep = "") ## 1
  ind <- grepl(rgexp, x) ## 2
  x2[ind] <- gsub(rgexp, "\\1", x2[ind]) ## 3
}
x2
Which gives:
> x2
[1] "EV" "PIM" "WDR" "MYC" "OBX" "EV" "PIM" "WDR" "MYC" "OBX" "OBX" "MYC"
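Since each replacement is just the matched string itself, the loop can also be written without any regular expression syntax by using literal matching. A minimal sketch under that assumption, with x3 as a name introduced here:

```r
x <- c("U2OS.EV.2.7.9", "U2OS.PIM.2.7.9", "U2OS.WDR.2.7.9", "U2OS.MYC.2.7.9",
       "U2OS.OBX.2.7.9", "U2OS.EV.18.6.9", "U2O2.PIM.18.6.9", "U2OS.WDR.18.6.9",
       "U2OS.MYC.18.6.9", "U2OS.OBX.18.6.9", "X1.U2OS...OBX", "X2.U2OS...MYC")
matches <- c("MYC", "EV", "PIM", "WDR", "OBX")

x3 <- x
for (i in matches) {
  # fixed = TRUE treats i as a literal string, so no regex syntax is involved
  x3[grepl(i, x, fixed = TRUE)] <- i
}
x3
# [1] "EV"  "PIM" "WDR" "MYC" "OBX" "EV"  "PIM" "WDR" "MYC" "OBX" "OBX" "MYC"
```

If a name contained more than one of the strings, the one that comes last in matches would win, so the order of matches matters in that case.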
