I use RStudio. Within a loop, I want to display in a single console line a string of variable length. I am using cat(). If I use \n, different lines are written (not what I want):
A <- c("AAAAA","BBB","C")
for (i in 1:3){cat(A[i],"\n"); Sys.sleep(1)}
AAAAA
BBB
C
The use of \r works well when names are of the (nearly) same length, but in this case, the result is again not what I want:
for (i in 1:3){cat(A[i],"\r"); Sys.sleep(1)}
C B A
as it should be only the string "C" when the loop is finished.
I have also tried deleting many spaces with \b, but the length difference is large and many times the information is written one line above the current console line.
Is there a simple way to do this? (base R preferred)
Edit: What I want is that, in a single line, first the string "AAAAA" appears. After one second, only the string "BBB" should appear (not "BBB A"). After one second, only the string "C" should appear (not "C B A").
Your current method works if you first pad all the strings to the length of the longest one:
A <- c("AAAAA","BBB","C")
max_length = max(nchar(A))
A_filled = stringr::str_pad(A, max_length, side = "right")
for (i in 1:3){cat(A_filled[i],"\r"); Sys.sleep(1)}
To pad the strings in base R you can use sprintf:
max_length = max(nchar(A))
pad_format = paste0("%-", max_length, "s")
A_filled = sprintf(pad_format, A)
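The base R pieces above can be put together roughly as follows. This is a minimal sketch, not a definitive implementation: the helper name show_in_place and its delay argument are made up for illustration, and it assumes a console (such as RStudio's) that treats \r as "return to the start of the line".

```r
# Hypothetical helper: show each string in turn on the same console line.
show_in_place <- function(strings, delay = 1) {
  width <- max(nchar(strings))
  # "%-5s" left-justifies and pads with spaces on the right, so a shorter
  # string fully overwrites a longer one printed before it.
  padded <- sprintf(paste0("%-", width, "s"), strings)
  for (s in padded) {
    cat("\r", s, sep = "")   # return to line start, then overwrite
    utils::flush.console()   # make sure the output shows up immediately
    Sys.sleep(delay)
  }
  cat("\n")                  # finish the line once the loop is done
  invisible(padded)
}

A <- c("AAAAA", "BBB", "C")
padded <- show_in_place(A, delay = 0.2)
```

Writing "\r" before the string rather than after it leaves the cursor at the end of the text, which tends to look cleaner in the console.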
I assume you want each string shown in turn on the same line. Here is a base R solution:
A <- c("AAAAA","BBB","C")
x <-formatC(A, width = -max(nchar(A)))
for (i in 1:3){cat("\r",x[i]); Sys.sleep(1)}
A plain cat() without a newline may be enough:
> for (i in 1:3){cat(A[i], " "); Sys.sleep(1)}
AAAAA BBB C
> for (i in 1:3){cat(A[i]); Sys.sleep(1)}
AAAAABBBC
Consider this simple vector:
x <- c(1,2,3,4,5)
\Sexpr{x} will print 1,2,3,4,5 in LaTeX, but I often need to report vectors in running text the way a human would write them, with "and" before the last number.
I tried to do this automatically with this function:
x <- c(1,2,3,4,5)
nicevector <- function(x){
a <- head(x,length(x)-1)
b <- tail(x,1)
cat(a,sep=", ");cat(" and ");cat(b)}
nicevector(x)
That seems to work in the console, but \Sexpr{nicevector(x)} fails miserably in the .Rnw file (while \Sexpr{x} works). Any ideas?
You can use knitr::combine_words(x).
Using cat() is only for its side-effect: printing in the console. cat() won't return a character string, so you won't see anything in the output. By comparison, knitr::combine_words() returns a character string.
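If you would rather not depend on knitr for this, the same idea can be sketched in base R. The function name nice_vector and its sep and last arguments are made up for illustration; the point is only that the function returns a string via paste() instead of printing with cat(), so \Sexpr{} has something to insert.

```r
# Hypothetical base R helper: returns a string instead of printing it.
nice_vector <- function(x, sep = ", ", last = " and ") {
  x <- as.character(x)
  n <- length(x)
  # Zero or one element: nothing to join with "and".
  if (n < 2) return(paste(x, collapse = ""))
  # Join all but the last element with sep, then append "and <last>".
  paste0(paste(x[-n], collapse = sep), last, x[n])
}

x <- c(1, 2, 3, 4, 5)
nice_vector(x)
# [1] "1, 2, 3, 4 and 5"
```

In the .Rnw file this would then be used as \Sexpr{nice_vector(x)}.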
There is also a function for this in the glue package:
glue::glue_collapse(1:4, ", ", last = " and ")
#> 1, 2, 3 and 4
See ?glue::glue_collapse for details.
Could anyone please help to achieve the following with gsub in R?
input string: a=5.00,b=120,c=0.0003,d=0.02,e=5.20, f=1200.0,g=850.02
desired output: a=5,b=120,c=0.0003,d=0.02,e=5.2, f=1200, g=850.02
Practically, removing the redundant 0s after the decimal point if they are all just 0s, don't remove if real fractions exist.
I couldn't get this to work using gsub alone, but we can try splitting your input vector on comma, and then using an apply function with gsub:
x <- "a=5.00,b=120,c=0.0003,d=0.02,e=5.20, f=1200.0,g=850.02"
input <- sapply(unlist(strsplit(x, ",")), function(x) gsub("(?<=\\d)\\.$", "", gsub("(\\.[1-9]*)0+$", "\\1", x), perl=TRUE))
input <- paste(input, collapse=",")
input
[1] "a=5,b=120,c=0.0003,d=0.02,e=5.2, f=1200,g=850.02"
I actually make two calls to gsub. The first call strips all trailing zeroes that appear after a decimal point, should the number have one. The second call removes a stray decimal point, as in a number like 5.00, which the first call would leave as 5. rather than the 5 we want.
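To make the two passes easier to follow, here is a small sketch that runs them one at a time on a few of the already-split pieces. The variable names vals, step1 and step2 are only for illustration.

```r
vals <- c("a=5.00", "e=5.20", "c=0.0003", "b=120")

# Pass 1: drop trailing zeroes that follow a decimal point.
step1 <- gsub("(\\.[1-9]*)0+$", "\\1", vals)
step1
# [1] "a=5."     "e=5.2"    "c=0.0003" "b=120"

# Pass 2: drop a decimal point left dangling after a digit.
step2 <- gsub("(?<=\\d)\\.$", "", step1, perl = TRUE)
step2
# [1] "a=5"      "e=5.2"    "c=0.0003" "b=120"
```

Note that the first pattern only allows non-zero digits between the dot and the trailing zeroes, so a value such as 1.050 would be left untouched. That case does not occur in the sample input.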
To remove trailing 0s after the decimal, try this:
EDIT Forgot 5.00
x = c('5.00', '0.500', '120', '0.0003', '0.02', '5.20', '1200', '850.02')
gsub("\\.$" "", gsub("(\\.(|[1-9]+))0+$", "\\1", x))
# [1] "5" "0.5" "120" "0.0003" "0.02" "5.2" "1200" "850.02"
HT @TimBiegeleisen: I misread the input as a vector of strings. For a single-string input, convert it to a vector of strings, which you can call gsub on, then collapse the output back into a single string:
paste(
gsub("\\.$", "", gsub("(\\.(|[1-9]+))0+$", "\\1",
unlist(strsplit(x, ", ")))),
collapse=", ")
[1] "a=5, b=0.5, c=120, d=0.0003, e=0.02, f=5.2, g=1200, h=850.02"
gsub is a text processing tool that works on character level. It’s ignorant of any semantic interpretation.
However, you are specifically interested in manipulating this semantic interpretation, namely, the precision of numbers encoded in your text.
So use that: parse the numbers in the text, and write them out with the desired precision:
parse_key_value_pairs = function (text) {
parse_pair = function (pair) {
pair = strsplit(pair, "\\s*=\\s*")[[1]]
list(key = pair[1], value = as.numeric(pair[2]))
}
pairs = unlist(strsplit(text, "\\s*,\\s*"))
structure(lapply(pairs, parse_pair), class = 'kvp')
}
as.character.kvp = function (x, ...) {
format_pair = function (pair) {
sprintf('%s = %g', pair[1], pair[2])
}
pairs = vapply(x, format_pair, character(1))
paste(pairs, collapse = ", ")
}
And use it as follows:
text = "a=5.00,b=120,c=0.0003,d=0.02,e=5.20, f=1200.0,g=850.02"
parsed = parse_key_value_pairs(text)
as.character(parsed)
This uses several interesting features of R:
For text processing, it still uses regular expressions (inside strsplit).
To process multiple values, use lapply to apply a parsing function to parts of the string in turn
To reconstruct a key–value pair, format the string using sprintf. sprintf is a primitive text formatting tool adapted from C. But it’s fairly universal and it works OK in our case.
The parsed value is tagged with an S3 class name. This is how R implements object orientation.
Provide an overload of the standard generic as.character for our type. This means that any existing function that takes an object and displays it via as.character can deal with our parsed data type. In particular, this works with the {glue} library:
> glue::glue("result: {parsed}")
result: a = 5, b = 120, c = 0.0003, d = 0.02, e = 5.2, f = 1200, g = 850.02
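One caveat about the %g conversion used in format_pair: by default it keeps six significant digits, which is fine for the sample values but would switch to scientific notation for larger or more precise numbers. A quick sketch of that behaviour (the extra test value 1234567.89 is made up for illustration):

```r
# The sample values round-trip cleanly with the default %g.
sprintf("%g", c(5.00, 0.0003, 1200.0, 850.02))
# [1] "5"      "0.0003" "1200"   "850.02"

# With more than six significant digits, %g falls back to scientific notation.
sprintf("%g", 1234567.89)
# [1] "1.23457e+06"

# Asking for more significant digits avoids that.
sprintf("%.10g", 1234567.89)
# [1] "1234567.89"
```

If the data may contain such values, a wider precision like %.10g in format_pair is one possible adjustment.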
This is probably not the most ideal solution, but for educational purposes, here is one way to call gsub only once using conditional regex:
x = 'a=5.00,b=120,c=0.0003,d=0.02,e=5.20, f=1200.0,g=850.02'
gsub('(?!\\d+(?:,|$))(\\.[0-9]*[1-9])?(?(1)0+\\b|\\.0+(?=(,|$)))', '\\1', x, perl = TRUE)
# [1] "a=5,b=120,c=0.0003,d=0.02,e=5.2, f=1200,g=850.02"
Notes:
1. (?!\\d+(?:,|$)) is a negative lookahead: it fails at any position that is followed by one or more digits and then a comma or the end of the string. Such positions are effectively excluded from the overall regex match.
2. (\\.[0-9]*[1-9])? matches a literal dot, zero or more digits, and then a non-zero digit. The ? makes this group optional, which is crucial to how the conditional handles the back reference.
3. (?(1)0+\\b|\\.0+(?=(,|$))) is a conditional with the logic (?(IF)THEN|ELSE):
(1) is the (IF) part, which checks whether capture group 1 matched. This refers to (\\.[0-9]*[1-9]).
0+\\b is the (THEN) part, which is tried only if (IF) is TRUE. In this case, only if (\\.[0-9]*[1-9]) matched will the regex try to match one or more zeroes followed by a word boundary.
\\.0+(?=(,|$)) is the (ELSE) part, which is tried only if (IF) is FALSE. In this case, only if (\\.[0-9]*[1-9]) didn't match will the regex try to match a literal dot and one or more zeroes followed by a comma or the end of the string.
If we put 2. and 3. together, we get either (\\.[0-9]*[1-9])0+\\b or \\.0+(?=(,|$)).
\\1 as the replacement therefore turns (\\.[0-9]*[1-9])0+\\b into the text matched by (\\.[0-9]*[1-9]), and \\.0+(?=(,|$)) into an empty string, which translates to:
5.20 to 5.2 for the former
5.00 to 5 and 1200.0 to 1200 for the latter
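A quick way to gain some confidence in a dense pattern like this is to compare it against a simpler split-and-clean approach on the same input. The sketch below does that for the sample string only; the names one_call, parts and two_calls are made up for illustration, and agreement on one input is of course not a proof.

```r
x <- "a=5.00,b=120,c=0.0003,d=0.02,e=5.20, f=1200.0,g=850.02"

# Single gsub call with the conditional regex (needs perl = TRUE).
one_call <- gsub(
  "(?!\\d+(?:,|$))(\\.[0-9]*[1-9])?(?(1)0+\\b|\\.0+(?=(,|$)))",
  "\\1", x, perl = TRUE
)

# Simpler reference: split on commas, clean each piece in two passes, rejoin.
parts <- strsplit(x, ",")[[1]]
parts <- gsub("(\\.[1-9]*)0+$", "\\1", parts)  # strip trailing zeroes
parts <- gsub("\\.$", "", parts)               # strip a dangling dot
two_calls <- paste(parts, collapse = ",")

identical(one_call, two_calls)
# [1] TRUE
```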
I am trying to understand names, lists and lists of lists in R. It would be convenient to have a way to dynamically label them like this:
> ll <- list("1" = 2)
> ll
$`1`
[1] 2
But this is not working:
> ll <- list(as.character(1) = 2)
Error: unexpected '=' in "ll <- list(as.character(1) ="
Neither is this:
> ll <- list(paste(1) = 2)
Error: unexpected '=' in "ll <- list(paste(1) ="
Why is that? Both paste() and as.character() are returning "1".
The reason is that paste(1) is a function call that evaluates to a string, not a string itself.
The R Language Definition says this:
Each argument can be tagged (tag=expr), or just be a simple expression.
It can also be empty or it can be one of the special tokens ‘...’, ‘..2’, etc.
A tag can be an identifier or a text string.
Thus, tags can't be expressions.
However, if you want to set names (which are just an attribute), you can do so with structure, eg
> structure(1:5, names=LETTERS[1:5])
A B C D E
1 2 3 4 5
Here, LETTERS[1:5] is most definitely an expression.
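The same idea applies to lists. As a minimal sketch, names computed at run time can be attached with setNames() (from the stats package, attached by default) or by assigning to names(). The variables keys, values, named and ll2 below are made up for illustration.

```r
# A computed name for a one-element list, equivalent to list("1" = 2).
ll <- setNames(list(2), as.character(1))
identical(ll, list("1" = 2))
# [1] TRUE

# Several computed names at once.
keys <- paste0("item", 1:3)
values <- list(2, "x", 1:3)
named <- setNames(values, keys)
names(named)
# [1] "item1" "item2" "item3"

# Names can also be assigned after the list has been created.
ll2 <- list(2)
names(ll2) <- paste(1)
```

In all three cases the name is an ordinary argument or assigned value, so it is evaluated like any other expression.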
If your goal is simply to use integers as names (as in the question title), you can type them in with backticks or single- or double-quotes (as the OP already knows). They are converted to characters, since all names are characters in R.
I can't offer a deep technical explanation for why your later code fails beyond "the left-hand side of = is not evaluated in that context (of enumerating items in a list)". Here's one workaround:
mylist <- list()
mylist[[paste("a")]] <- 2
mylist[[paste("b")]] <- 3
mylist[[paste("c")]] <- matrix(1:4,ncol=2)
mylist[[paste("d")]] <- mean
And here's another:
library(data.table)
tmp <- rbindlist(list(
list(paste("a"), list(2)),
list(paste("b"), list(3)),
list(paste("c"), list(matrix(1:4,ncol=2))),
list(paste("d"), list(mean))
))
res <- setNames(tmp$V2,tmp$V1)
identical(mylist,res) # TRUE
The drawbacks of each approach are pretty serious, I think. On the other hand, I've never found myself in need of richer naming syntax.
I have a data frame sp which contains several species names but as they come from different databases, they are written in different ways.
For example, one species can be called Urtica dioica and Urtica dioica L..
To correct this, I use the following code, which extracts only the first two words from a row:
paste(strsplit(sp[i,"sp"]," ")[[1]][1],strsplit(sp[i,"sp"]," ")[[1]][2],sep=" ")
For now, this code is integrated in a for loop, which works but takes ages to finish:
for (i in seq_along(sp$sp)) {
sp[i,"sp2"] = paste(strsplit(sp[i,"sp"]," ")[[1]][1],
strsplit(sp[i,"sp"]," ")[[1]][2],
sep=" ")
}
Is there a way to improve this basic code using vectors or an apply function?
You could just use vectorized regular expression functions:
library(stringr)
x <- c("Urtica dioica", "Urtica dioica L.")
> str_extract(string = x,"\\w+ \\w+")
[1] "Urtica dioica" "Urtica dioica"
I happen to have found stringr convenient here, but with the right regex for your specific data you could do this just as well with base functions like gsub.
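For reference, a base R version might look like the sketch below. It assumes that words are separated by whitespace and that anything after the second word should be dropped; the variable name first_two is made up for illustration.

```r
x <- c("Urtica dioica", "Urtica dioica L.", "Urtica")

# Capture the first two whitespace-separated words and discard the rest.
# Strings with fewer than two words do not match and are returned unchanged.
first_two <- sub("^(\\S+\\s+\\S+).*$", "\\1", x, perl = TRUE)
first_two
# [1] "Urtica dioica" "Urtica dioica" "Urtica"
```

Because sub() is vectorized, this can be applied to the whole column at once, for example sp$sp2 <- sub("^(\\S+\\s+\\S+).*$", "\\1", sp$sp, perl = TRUE), with no loop needed.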
You might want to check to see if there are more than 2 words in the string before doing each extraction:
if((sapply(gregexpr("\\W+", i), length) + 1) > 2){
...
}
There's a function for that.
Also from stringr, the word function
> choices <- c("Urtica dioica", "Urtica dioica L..")
> library(stringr)
> word(choices, 1:2)
# [1] "Urtica" "dioica"
> word(choices, rep(1:2, 2))
# [1] "Urtica" "dioica" "Urtica" "dioica"
These return individual words. To get strings containing the first two words of each element,
> word(choices, 1, 2)
# [1] "Urtica dioica" "Urtica dioica"
The final line gets the first two words from each string in the vector choices
I am trying to create a vector of character strings in R using a loop, but am having some trouble. I'd appreciate any help anyone can offer.
The code I'm working with is a bit more detailed, but I've tried to code a reproducible example here which captures all the key bits:
vector1<-c(1,2,3,4,5,6,7,8,9,10)
vector2<-c(1,2,3,4,5,6,7,8,9,10)
thing<-character(10)
for(i in 1:10) {
line1<-vector1[i]
line2<-vector2[i]
thing[i]<-cat(line1,line2,sep="\n")
}
R then prints out the following:
1
1
Error in thing[i] <- cat(line1, line2, sep = "\n") :
replacement has length zero
What I'm trying to achieve is a character vector where each character is split over two lines, such that thing[1] is
1
1
and thing[2] is
2
2
and so on. Does anyone know how I could do this?
cat prints to the screen, but it returns NULL. To concatenate into a new character vector, you need to use paste:
thing[i]<-paste(line1,line2,sep="\n")
For example in an interactive terminal:
> line1 = "hello"
> line2 = "world"
> paste(line1,line2,sep="\n")
[1] "hello\nworld"
> ret <- cat(line1,line2,sep="\n")
hello
world
> ret
NULL
Though note that in your case, the entire for loop could just be replaced with the more concise and efficient line:
thing <- paste(vector1, vector2, sep="\n")
# [1] "1\n1" "2\n2" "3\n3" "4\n4" "5\n5" "6\n6" "7\n7" "8\n8"
# [9] "9\n9" "10\n10"