How to create a vector of character strings using a loop? - r

I am trying to create a vector of character strings in R using a loop, but am having some trouble. I'd appreciate any help anyone can offer.
The code I'm working with is a bit more detailed, but I've tried to code a reproducible example here which captures all the key bits:
vector1<-c(1,2,3,4,5,6,7,8,9,10)
vector2<-c(1,2,3,4,5,6,7,8,9,10)
thing<-character(10)
for(i in 1:10) {
line1<-vector1[i]
line2<-vector2[i]
thing[i]<-cat(line1,line2,sep="\n")
}
R then prints out the following:
1
1
Error in thing[i] <- cat(line1, line2, sep = "\n") :
replacement has length zero
What I'm trying to achieve is a character vector where each element is split over two lines, such that thing[1] is
1
1
and thing[2] is
2
2
and so on. Does anyone know how I could do this?

cat prints to the screen, but it returns NULL. To build a new character vector, you need to use paste instead:
thing[i]<-paste(line1,line2,sep="\n")
For example, in an interactive session:
> line1 = "hello"
> line2 = "world"
> paste(line1,line2,sep="\n")
[1] "hello\nworld"
> ret <- cat(line1,line2,sep="\n")
hello
world
> ret
NULL
Note, though, that because paste is vectorized, in your case the entire for loop can be replaced with a single, more concise and efficient line:
thing <- paste(vector1, vector2, sep="\n")
# [1] "1\n1" "2\n2" "3\n3" "4\n4" "5\n5" "6\n6" "7\n7" "8\n8"
# [9] "9\n9" "10\n10"
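Printing thing displays each line break as the escape sequence \n, but the strings do contain real newlines. A short sketch, reusing the question's vectors, that builds the vector without a loop and renders one element with writeLines(); it also confirms that cat() returns NULL, which is why the original assignment failed:

```r
vector1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
vector2 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# paste() works element-wise, so no loop is needed
thing <- paste(vector1, vector2, sep = "\n")

# writeLines() renders the embedded newline as a real line break
writeLines(thing[2])
# 2
# 2

# cat() only prints (here: 2 and 2 on separate lines); its value is NULL,
# so thing[i] <- cat(...) tries to assign a zero-length value
ret <- cat(thing[2], sep = "\n")
is.null(ret)
# [1] TRUE
```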

Related

How do I write a function that will run based on the first letter of a line?

I've been trying to come up with the logic for this; please help me.
if the first character of the line is "H", run function Header
if the first character of the line is "P", run function Patient
... (other line types)
Inside each function, e.g. when "H" is the first character of the line:
Header <- stringr::str_split(Hline, stringr::fixed('|'))[[1]] # Hline = the string holding the H line
ID_H <- Header[1] # [1] = the first value extracted from the line, split on '|'
Del_H <- Header[2]
There are five types of line, each handled by its own function, chosen by the first letter of the line read.
An example would be great. Please help me.
I would just create a wrapper around readline() that extracts the first character of the input and uses switch() to call the relevant function. Note that you can't use 'str'[1] to get 's' in R; that is just vector indexing and returns 'str'. We can use substr() instead.
reader <- function() {
x <- readline()
x1 <- tolower(substr(x, 1, 1))
switch(
x1,
h = cumsum(1:4),
p = cumprod(1:4)
)
}
reader()
#> hardy
#> [1] 1 3 6 10
reader()
#> plenty
#> [1] 1 2 6 24
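readline() only reads from the console in an interactive session. When the lines come from a file (for example via readLines()), the same switch() dispatch can be applied to each line. A minimal sketch, assuming '|'-separated fields; the handlers parse_header() and parse_patient() and the sample lines are hypothetical stand-ins for the question's Header and Patient functions:

```r
# Hypothetical handlers: each receives the '|'-separated fields of one line
parse_header  <- function(fields) list(type = "H", id = fields[2], del = fields[3])
parse_patient <- function(fields) list(type = "P", id = fields[2], name = fields[3])

dispatch_line <- function(line) {
  # strsplit() returns a list; [[1]] is the character vector for this one line
  fields <- strsplit(line, "|", fixed = TRUE)[[1]]
  switch(
    substr(line, 1, 1),  # the first character picks the branch
    H = parse_header(fields),
    P = parse_patient(fields),
    NULL                 # unnamed last argument: default for any other letter
  )
}

# Hypothetical input; in practice this could be readLines("yourfile.txt")
lines <- c("H|001|X", "P|002|Smith", "Z|ignored")
parsed <- lapply(lines, dispatch_line)
parsed[[2]]$name
# [1] "Smith"
```

Supporting the remaining line types means adding one named branch to switch() per first letter.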

How to add an index to a character vector in a for loop in R

My original data is in a data.frame with 311 rows and 1 column. Each row holds a URL. I am writing a for loop to extract one section of each URL:
new_data_git <- data.frame(readLines("new_data.txt"), stringsAsFactors = FALSE)
colnames(new_data_git) <- "url_link_new_git"
new_data_git$url_link_zip <- paste0(new_data_git$url_link_new_git, "/archive/master.zip")
for (i in 1:nrow(new_data_git)){
split_file = strsplit(new_data_git$url_link_zip[i], split = "/", fixed = TRUE)[[1]]
name_of_saved_projs_zip = paste0(split_file[length(split_file)-3 +1], ".zip")
print(name_of_saved_projs_zip)
}
My output is:
[1] "abc.zip"
[1] "def.zip"
[1] "gef.zip"
[1] "hdgadg.zip"
[1] "model.zip"
[1] "delays.zip"
[1] "recipe.zip"
[1] "food.zip"
[1] "Recipe.zip"
But when I print name_of_saved_projs_zip outside the for loop, I only get the last value, and I'm not sure how to index into name_of_saved_projs_zip.
Can someone tell me what is wrong with my for loop and how I can see the right index for name_of_saved_projs_zip both inside and outside the loop? The index is crucial for a following loop I need to write. Thanks!
As @Rui Barradas mentioned, you need to preallocate the result vector before the for loop:
name_of_saved_projs_zip <- character(nrow(new_data_git))
and then assign into it by index inside the for loop (paste0, not paste, so no space is inserted before ".zip"):
name_of_saved_projs_zip[i] <- paste0(split_file[length(split_file)-3 +1], ".zip")
That did the trick!
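Preallocating and filling by index works. As an alternative, strsplit() and paste0() are vectorized, so the whole column can be processed at once. A sketch, where the two sample URLs are hypothetical stand-ins for the contents of new_data.txt and the repository name is assumed to be the third-from-last piece, as in the loop above:

```r
# Hypothetical sample URLs standing in for the lines of new_data.txt
urls <- c("https://github.com/user1/abc", "https://github.com/user2/def")
zip_urls <- paste0(urls, "/archive/master.zip")

# strsplit() splits every URL in one call and returns a list of character vectors
parts <- strsplit(zip_urls, "/", fixed = TRUE)

# vapply() takes the third-from-last piece (the repository name) from each vector
name_of_saved_projs_zip <- paste0(
  vapply(parts, function(p) p[length(p) - 2], character(1)),
  ".zip"
)
name_of_saved_projs_zip
# [1] "abc.zip" "def.zip"
```

The result is a character vector, so name_of_saved_projs_zip[i] is available by index for the following loop.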

R: For loop works on list, not individual element

I'm trying to learn by writing a function. It should convert the UOM (unit of measure) into a fraction of the standard UOM; in this case, 1/10 or 0.1.
I'm trying to loop through the list returned by strsplit, but I only get the whole list, not each element in it. Is strsplit the wrong function? I don't think the problem is strsplit itself, but I can't figure out what I'm doing wrong in the for loop:
qty<-0
convf<-0
uom <- "EA"
std <- "CA"
pack <-"1EA/10CA"
if(uom!=std){
s<-strsplit(pack,split = '/')
for (i in s){
print(i)
if(grep(uom,i)){
qty<- regmatches(i,regexpr('[0-9]+',i))
}
if(grep(std,i)){
convf<-regmatches(i, regexpr('[0-9]+',i))
}
} #end for
qty<-as.numeric(qty)
convf<-as.numeric(convf)
}
return(qty/convf)
Maybe the problem is the indexing of the list. Have you tried using [[1]] after the strsplit call?
Example:
string <- "Hello/World"
mylist <- strsplit(string, "/")
## [[1]]
## [1] "Hello" "World"
But if we explicitly ask for the first element of the list with [[1]], we get the character vector itself:
Example:
string <- "Hello/World"
mylist <- strsplit(string, "/")[[1]]
## [1] "Hello" "World"
Hope this can help you in your problem.
There are a few issues here. The main problem is that s is a list of length 1. Within that list, the first (and only) element is a character vector of length 2. Consequently, you need to loop with for (i in s[[1]]).
However, we can go one step further. Try the following code:
library(stringr)
lapply(strsplit(pack,split = '/'), # works within the list, can handle larger vectors for `pack`
function(x, uom, std) {
reg_expr <- paste(uom,std, sep = "|") # builds the pattern "EA|CA", which matches the text in either uom or std
qty <- as.numeric(str_remove(x, reg_expr)) # removes that text and converts the string to a number
names(qty) <- str_extract(x, reg_expr) # extracts the text and uses it to name elements in qty
qty[uom] / qty[std] # your desired result.
},
uom = uom, # since these are part of the function call, we need to specify what they are. This is where you should change them.
std = std)
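For comparison, here is the question's loop with the s[[1]] fix applied, wrapped in a function. It also swaps grep() for grepl(): grep() returns matching indices, so for an element that doesn't match it returns integer(0), and if() then fails with "argument is of length zero". The function name uom_fraction is hypothetical, and returning 1 when uom equals std is an assumption:

```r
# Hypothetical wrapper around the question's logic, with the loop fixed
uom_fraction <- function(pack, uom = "EA", std = "CA") {
  qty <- 1    # assumption: same unit on both sides gives a ratio of 1
  convf <- 1
  if (uom != std) {
    # strsplit() returns a list; [[1]] is the character vector c("1EA", "10CA")
    parts <- strsplit(pack, split = "/", fixed = TRUE)[[1]]
    for (i in parts) {
      # grepl() always returns TRUE or FALSE, which is what if() needs
      if (grepl(uom, i, fixed = TRUE)) {
        qty <- as.numeric(regmatches(i, regexpr("[0-9]+", i)))
      }
      if (grepl(std, i, fixed = TRUE)) {
        convf <- as.numeric(regmatches(i, regexpr("[0-9]+", i)))
      }
    }
  }
  qty / convf
}

uom_fraction("1EA/10CA")
# [1] 0.1
```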
I don't know if this is what you're trying to practice, but I'd avoid loops when extracting the digits from a string like "1EA/10CA". If it helps, the lst column is a list-column inside the data frame.
library(magrittr)
ds <- data.frame(pack = c("1EA/10CA", "1EA/4CA", "2EA/2CA"))
pattern <- "^(\\d+)EA/(\\d+)CA$"
ds %>%
dplyr::mutate(
qty = as.numeric(sub(pattern, "\\1", pack)),
convf = as.numeric(sub(pattern, "\\2", pack)),
ratio = qty / convf,
lst = purrr::map2(qty, convf, ~list(qty=.x[[1]], convf=.y[[1]]))
)
Result:
pack qty convf ratio lst
1 1EA/10CA 1 10 0.10 1, 10
2 1EA/4CA 1 4 0.25 1, 4
3 2EA/2CA 2 2 1.00 2, 2

storing long strings (DNA sequence) in R

I have written a function that finds the indices of subsequences in a long DNA sequence. It works when my longer DNA sequence is under about 4,000 characters. However, when I try to apply the same function to a much longer sequence, the console gives me a + prompt instead of a >, which leads me to believe that the length of the string is the problem.
For example, when the longer sequence is "GATATATGCATATACTT" and the subsequence is "ATAT", I get the indices 1, 3, 9 (0-based).
library(stringr) # provides str_length() and str_sub()
dnaMatch <- function(dna, sequence) {
ret <- list()
k <- str_length(sequence)
c <- str_length(dna) - k
for(i in 1:(c+1)) {
ret[i] = str_sub(dna, i, i+k-1)
}
ret <- unlist(ret)
TFret <- lapply (ret, identical, sequence)
TFret <- which(unlist(TFret), arr.ind = TRUE) -1
print(TFret)
}
Basically, my question is: is there any way around the character limit in the string class?
I can replicate nrussell's example, but building the string in code assigns correctly: x <- paste0(rep("abcdef", 1000), collapse = ""). A potential workaround is to write the character string to a .txt file and read that file into R directly.
Here, test.txt contains a 6,000-character string.
test <- read.table('test.txt', stringsAsFactors = FALSE)
length(class(test[1,1]))
[1] 1
class(test[1,1])
[1] "character"
nchar(test[1,1])
[1] 6000
Rather than writing your own function, why not use words.pos() from the seqinr package? It seems to work even for strings of up to a million base pairs.
For example,
library(seqinr)
data(ec999)
myseq <- paste(ec999[[1]], collapse="")
myseq <- paste(rep(myseq,100), collapse="")
words.pos("atat", myseq)
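The character class itself has no 4,000-character limit. The R console, however, limits a single command line to about 4095 bytes, so a long string literal pasted at the prompt gets cut off, and the unclosed quote leaves R showing the + continuation prompt. Building or reading the sequence in code avoids this. As a base-R alternative to the substring loop (not taken from the answers above), gregexpr() with a Perl lookahead finds overlapping matches in one call. A sketch; find_motif is a hypothetical name, and the motif is assumed to be plain letters with no regex metacharacters:

```r
# Returns 0-based start positions of every (possibly overlapping) occurrence
# of motif in dna, matching the convention used in the question
find_motif <- function(dna, motif) {
  # A zero-width lookahead lets matches overlap, e.g. "ATAT" twice in "ATATAT"
  hits <- gregexpr(paste0("(?=", motif, ")"), dna, perl = TRUE)[[1]]
  if (hits[1] == -1) return(integer(0))  # gregexpr() reports "no match" as -1
  as.integer(hits) - 1L                  # drop attributes, convert to 0-based
}

find_motif("GATATATGCATATACTT", "ATAT")
# [1] 1 3 9

# A 17,000-character sequence built in code, never typed at the prompt
long_dna <- paste(rep("GATATATGCATATACTT", 1000), collapse = "")
nchar(long_dna)
# [1] 17000
length(find_motif(long_dna, "ATAT"))
# [1] 3000
```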

Replace non-ascii chars with a defined string list without a loop in R

I want to replace non-ASCII characters (for now, only Spanish ones) with their ASCII equivalents. If I have "á", I want to replace it with "a", and so on.
I built this function (it works fine), but I don't want to use a loop (including implicit loops like sapply).
latin2ascii<-function(x) {
if(!is.character(x)) stop ("input must be a character object")
require(stringr)
mapL<-c("á","é","í","ó","ú","Á","É","Í","Ó","Ú","ñ","Ñ","ü","Ü")
mapA<-c("a","e","i","o","u","A","E","I","O","U","n","N","u","U")
for(y in 1:length(mapL)) {
x<-str_replace_all(x,mapL[y],mapA[y])
}
x
}
Is there an elegant way to solve this? Any help, suggestion or modification is appreciated.
gsubfn() in the package of the same name is really nice for this sort of thing:
library(gsubfn)
# Create a named list, in which:
# - the names are the strings to be looked up
# - the values are the replacement strings
mapL <- c("á","é","í","ó","ú","Á","É","Í","Ó","Ú","ñ","Ñ","ü","Ü")
mapA <- c("a","e","i","o","u","A","E","I","O","U","n","N","u","U")
# ll <- setNames(as.list(mapA), mapL) # An alternative to the 2 lines below
ll <- as.list(mapA)
names(ll) <- mapL
# Try it out
string <- "ÍÓáÚ"
gsubfn("[áéíóúÁÉÍÓÚñÑüÜ]", ll, string)
# [1] "IOaU"
Edit:
G. Grothendieck points out that base R also has a function for this:
A <- paste(mapA, collapse="")
L <- paste(mapL, collapse="")
chartr(L, A, "ÍÓáÚ")
# [1] "IOaU"
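Putting the chartr() suggestion back into the shape of the question's latin2ascii() gives a loop-free version. A sketch reusing the question's two mapping vectors; chartr() translates one character into one character, so this only works while every entry of mapL and mapA is a single character, as it is here:

```r
# Loop-free rewrite of the question's latin2ascii(), using base R only
latin2ascii <- function(x) {
  if (!is.character(x)) stop("input must be a character object")
  mapL <- c("á","é","í","ó","ú","Á","É","Í","Ó","Ú","ñ","Ñ","ü","Ü")
  mapA <- c("a","e","i","o","u","A","E","I","O","U","n","N","u","U")
  # chartr() maps the i-th character of old to the i-th character of new
  chartr(paste(mapL, collapse = ""), paste(mapA, collapse = ""), x)
}

# Vectorized over x: returns c("IOaU", "Ano", "pinguino")
latin2ascii(c("ÍÓáÚ", "Año", "pingüino"))
```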
I like Josh's version, but I thought I might add another 'vectorized' solution. It returns a vector of unaccented strings and relies only on base R functions.
x=c('íÁuÚ','uíÚÁ')
mapL<-c("á","é","í","ó","ú","Á","É","Í","Ó","Ú","ñ","Ñ","ü","Ü")
mapA<-c("a","e","i","o","u","A","E","I","O","U","n","N","u","U")
split=strsplit(x,split='')
m=lapply(split,match,mapL)
mapply(function(split,m) paste(ifelse(is.na(m),split,mapA[m]),collapse='') , split, m)
# [1] "iAuU" "uiUA"
