I'm looking for the functions get_output_content or at least get_output_length below, that would tell me how many characters were printed in the console.
test <- function(){
cat("ab")
cat("\b")
cat("cd")
c <- get_output_content() # "acd" (I'd be happy with "ab\bcd" as well)
l <- get_output_length() # 3
return(list(c,l))
}
test()
In this example obviously I could easily count the characters in the input, but If I'm using other functions I may not. Can you help me build one or both of these functions ?
EDIT to clarify:
in my real situation, I cannot work upstream and count before, like in the proposed solutions, I need to count the displayed output at a given time without monitoring what's before.
here's a reproducible example looking more like what I want to achieve
library(pbapply)
my_files <- paste0(1000:1,".pdf")
work_on_pdf <- function(pdf_file){
Sys.sleep(0.001)
}
report <- pbsapply(my_files,work_on_pdf) # the simple version, but I want to add the pdf name next to the bar to have more info about my progress
# so I tried this but it's not satisfying because it "eats" some of the current output of pbapply
report <- pbsapply(my_files,function(x){
buffer_length <- 25
work_on_pdf(x)
catmsg <- paste0(c( # my additional message, which is in 3 parts:
rep("\b",buffer_length), # (1) eat 25 characters
x, # (2) print filename
rep(" ",buffer_length-nchar(x))), # (3) print spaces to cover up what may have been printed before
collapse="")
cat(catmsg)
})
if I was able to count what's in the console I could easily tweak my function to get something satisfying.
NEW EDIT : FYI solution to example but not to general question:
I could solve my precise issue with this, though it doesn't solve the general question, which is measuring the current output of the console when you don't have any other info.
library(pbapply)
my_files <- paste0(1000:1,".pdf")
work_on_pdf <- function(pdf_file){
Sys.sleep(0.01)
}
pbsapply2 <- function(X,FUN,FUN2){
# FUN2 will give the additional message
pbsapply(X,function(x){
msg <- FUN2(x)
cat(msg)
output <- FUN(x)
eraser <- paste0(c(
rep("\b",nchar(msg)), # go back to position before additional message
rep(" ",nchar(msg)), # cover with blank spaces
rep("\b",nchar(msg))), # go back again to initial position
collapse="")
cat(eraser)
return(output)
})
}
report <- pbsapply2(my_files,work_on_pdf,function(x) paste("filename:",x))
Something like this (?):
test <- function(){
c <- paste0(capture.output(cat("ab")),
capture.output(cat("\b")),
capture.output(cat("cd")))
n <- nchar(c)
l <- length(c)
return(list(c,n,l))
}
test()
Related
I have a list of URLs (more than 4000) from a specific domain (pixilink.com) and what I want to do is to figure out if the provided domain is a picture or a video. To do this, I used the solutions provided here: How to write trycatch in R and Check whether a website provides photo or video based on a pattern in its URL and wrote the code shown below:
#Function to get the value of initial_mode from the URL
urlmode <- function(x){
mycontent <- readLines(x)
mypos <- grep("initial_mode = ", mycontent)
if(grepl("0", mycontent[mypos])){
return("picture")
} else if(grepl("tour", mycontent[mypos])){
return("video")
} else{
return(NA)
}
}
Also, in order to prevent having error for URLs that don't exist, I used the code below:
readUrl <- function(url) {
out <- tryCatch(
{
readLines(con=url, warn=FALSE)
return(1)
},
error=function(cond) {
return(NA)
},
warning=function(cond) {
return(NA)
},
finally={
message( url)
}
)
return(out)
}
Finally, I separated the list of URLs and pass it into the functions (here for instance, I used 1000 values from URL list) described above:
a <- subset(new_df, new_df$host=="www.pixilink.com")
vec <- a[['V']]
vec <- vec[1:1000] # only chose first 1000 rows
tt <- numeric(length(vec)) # checking validity of url
for (i in 1:length(vec)){
tt[i] <- readUrl(vec[i])
print(i)
}
g <- data.frame(vec,tt)
g2 <- g[which(!is.na(g$tt)),] #only valid url
dd <- numeric(nrow(g2))
for (j in 1:nrow(g2)){
dd[j] <- urlmode(g2[j,1])
}
Final <- cbind(g2,dd)
Final <- left_join(g, Final, by = c("vec" = "vec"))
I ran this code on a sample list of URLs with 100, URLs and it worked; however, after I ran it on whole list of URLs, it returned an error. Here is the error : Error in textConnection("rval", "w", local = TRUE) : all connections are in use Error in textConnection("rval", "w", local = TRUE) : all connections are in use
And after this even for sample URLs (100 samples that I tested before) I ran the code and got this error message : Error in file(con, "r") : all connections are in use
I also tried closeAllConnection after each recalling each function in the loop, but it didn't work.
Can anyone explain what this error is about? is it related to the number of requests we can have from the website? what's the solution?
So, my guess as to why this is happening is because you're not closing the connections that you're opening via tryCatch() and via urlmode() through the use of readLines(). I was unsure of how urlmode() was going to be used in your previous post so it had made it as simple as I could (and in hindsight, that was badly done, my apologies). So I took the liberty of rewriting urlmode() to try and make it a little bit more robust for what appears to be a more expansive task at hand.
I think the comments in the code should help, so take a look below:
#Updated URL mode function with better
#URL checking, connection handling,
#and "mode" investigation
urlmode <- function(x){
#Check if URL is good to go
if(!httr::http_error(x)){
#Test cases
#x <- "www.pixilink.com/3"
#x <- "https://www.pixilink.com/93320"
#x <- "https://www.pixilink.com/93313"
#Then since there are redirect shenanigans
#Get the actual URL the input points to
#It should just be the input URL if there is
#no redirection
#This is important as this also takes care of
#checking whether http or https need to be prefixed
#in case the input URL is supplied without those
#(this can cause problems for url() below)
myx <- httr::HEAD(x)$url
#Then check for what the default mode is
mycon <- url(myx)
open(mycon, "r")
mycontent <- readLines(mycon)
mypos <- grep("initial_mode = ", mycontent)
#Close the connection since it's no longer
#necessary
close(mycon)
#Some URLs with weird formats can return
#empty on this one since they don't
#follow the expected format.
#See for example: "https://www.pixilink.com/clients/899/#3"
#which is actually
#redirected from "https://www.pixilink.com/3"
#After that, evaluate what's at mypos, and always
#return the actual URL
#along with the result
if(!purrr::is_empty(mypos)){
#mystr<- stringr::str_extract(mycontent[mypos], "(?<=initial_mode\\s\\=).*")
mystr <- stringr::str_extract(mycontent[mypos], "(?<=\').*(?=\')")
return(c(myx, mystr))
#return(mystr)
#So once all that is done, check if the line at mypos
#contains a 0 (picture), tour (video)
#if(grepl("0", mycontent[mypos])){
# return(c(myx, "picture"))
#return("picture")
#} else if(grepl("tour", mycontent[mypos])){
# return(c(myx, "video"))
#return("video")
#}
} else{
#Valid URL but not interpretable
return(c(myx, "uninterpretable"))
#return("uninterpretable")
}
} else{
#Straight up invalid URL
#No myx variable to return here
#Just x
return(c(x, "invalid"))
#return("invalid")
}
}
#--------
#Sample code execution
library(purrr)
library(parallel)
library(future.apply)
library(httr)
library(stringr)
library(progressr)
library(progress)
#All future + progressr related stuff
#learned courtesy
#https://stackoverflow.com/a/62946400/9494044
#Setting up parallelized execution
no_cores <- parallel::detectCores()
#The above setup will ensure ALL cores
#are put to use
clust <- parallel::makeCluster(no_cores)
future::plan(cluster, workers = clust)
#Progress bar for sanity checking
progressr::handlers(progressr::handler_progress(format="[:bar] :percent :eta :message"))
#Website's base URL
baseurl <- "https://www.pixilink.com"
#Using future_lapply() to recursively apply urlmode()
#to a sequence of the URLs on pixilink in parallel
#and storing the results in sitetype
#Using a future chunk size of 10
#Everything is wrapped in with_progress() to enable the
#progress bar
#
range <- 93310:93350
#range <- 1:10000
progressr::with_progress({
myprog <- progressr::progressor(along = range)
sitetype <- do.call(rbind, future_lapply(range, function(b, x){
myprog() ##Progress bar signaller
myurl <- paste0(b, "/", x)
cat("\n", myurl, " ")
myret <- urlmode(myurl)
cat(myret, "\n")
return(c(myurl, myret))
}, b = baseurl, future.chunk.size = 10))
})
#Converting into a proper data.frame
#and assigning column names
sitetype <- data.frame(sitetype)
names(sitetype) <- c("given_url", "actual_url", "mode")
#A bit of wrangling to tidy up the mode column
sitetype$mode <- stringr::str_replace(sitetype$mode, "0", "picture")
head(sitetype)
# given_url actual_url mode
# 1 https://www.pixilink.com/93310 https://www.pixilink.com/93310 invalid
# 2 https://www.pixilink.com/93311 https://www.pixilink.com/93311 invalid
# 3 https://www.pixilink.com/93312 https://www.pixilink.com/93312 floorplan2d
# 4 https://www.pixilink.com/93313 https://www.pixilink.com/93313 picture
# 5 https://www.pixilink.com/93314 https://www.pixilink.com/93314 floorplan2d
# 6 https://www.pixilink.com/93315 https://www.pixilink.com/93315 tour
unique(sitetype$mode)
# [1] "invalid" "floorplan2d" "picture" "tour"
#--------
Basically, urlmode() now opens and closes connections only when necessary, checks for URL validity, URL redirection, and also "intelligently" extracts the value assigned to initial_mode. With the help of future.lapply(), and the progress bar from the progressr package, this can now be applied quite conveniently in parallel to as many pixilink.com/<integer> URLs as desired. With a bit of wrangling thereafter, the results can be presented very tidily as a data.frame as shown.
As an example, I've demonstrated this for a small range in the code above. Note the commented out 1:10000 range in the code in this context: I let this code run the last couple of hours over this (hopefully sufficiently) large range of URLs to check for errors and problems. I can attest that I encountered no errors (only the regular warnings In readLines(mycon) : incomplete final line found on 'https://www.pixilink.com/93334'). For proof, I have the data from all 10000 URLs written to a CSV file that I can provide upon request (I don't fancy uploading that to pastebin or elsewhere unnecessarily). Due to oversight on my part, I forgot to benchmark that run, but I suppose I could do that later if performance metrics are desired/would be considered interesting.
For your purposes, I believe you can simply take the entire code snippet below and run it verbatim (or with modifications) by just changing the range assignment right before the with_progress(do.call(...)) step to a range of your liking. I believe this approach is simpler and does away with having to deal with multiple functions and such (and no tryCatch() messes to deal with).
I have the following code and for some reason I can not get it to go to the next record. I put the print(innallURL) before the if(!file.exists) command and I get all the urls, I put it after where it is no, and nothing shows up, so that makes me think the files are failing the file.exists part. I've test 5 of the URLs and they work/exist in the web directory. I've used the next in other loops and have not had any problems. I also tried adding url= in fron of the string. I've searched here, but I can only find the formula I'm using in the examples. Please point out what idiotic thing I'm overlooking. ha! Thanks!
gidURLs <- as.list(as.data.frame(t(allGids)))
#get pitches
#set filename
for (thisgidURL in gidURLs) {
innallURL = paste(thisgidURL, "inning/inning_all.xml", sep="")
if (!file.exists(innallURL)) next
print(innallURL)
pitches <- read_xml(innallURL)
# get all at-bats ---------------------------------------------------------
atbat <- xml_find_all(pitches, "//atbat")
# make a giant data frame -------------------------------------------------
bind_rows(lapply(atbat, function(x) {
pitches <- xml_find_all(x, "./pitch")
bind_rows(lapply(pitches, function(y) {
data.frame(t(xml_attrs(y)), stringsAsFactors=FALSE)
})) -> pitch_dat
pitch_dat$atbat_num <- xml_attr(x, "num")
pitch_dat
})) -> bats
game_id = substr(url, 66, 95)
bats$game_id <- game_id
allbatsdf <- data.frame(bats)
allbats <- rbind(bats, allbats)
}
example innallURLs from the list when its printed before the file.exists command.
"http://gd2.mlb.com/components/game/win/year_2015/month_01/day_30/gid_2015_01_30_magwin_oriwin_1/inning/inning_all.xml"
"http://gd2.mlb.com/components/game/win/year_2015/month_01/day_31/gid_2015_01_31_sydwin_prtwin_1/inning/inning_all.xml"
We can use httr package to use GET method.
library(httr)
if(GET(innallURL)$status_code!=200) next
So my title is wordy mostly because I don't actually know what I'm talking about. I've been struggling for two days now with trying to use the package memisc so that I can discriminate between different types of NAs in my data. (Sidenote: This is already a compromise I'm unhappy about making, but there weren't really any other good options. If you think you have one, I could make another separate question for that.)
Skipping through all of what brought me to this point, here's where we are right now.
#install.packages("memisc")
library(memisc)
df <- data.frame('a' = 1:4, 'b' = 2:5, 'c' = 3:6)
ds <- data.set(df)
descs <- c("This is a", "This is b", "This is c")
Obviously my data is much larger or I wouldn't be bothering with this at all, but maybe that needs to be said just in case.
Here's what happens when I try to give an 'item' a description:
Things that work
ds <- within(ds, description(df.a) <- "test") # The way the package suggests
description(ds$df.a) == "test" # TRUE, as expected
description(ds$df.a) <- "test2" # Calling it with a name
description(ds$df.a) == "test2" # TRUE too
The end goal is to have all 176 columns in my "data.set" described by the already-existing, long-winded descriptions I have in a vector. So I need to have it accept a vector or use apply or iterate over it somehow, and I'm comfortable with my options there. But everything I've tried thus far to get the replacement function description() to work in a way that will accept references to objects somewhere along the line has failed.
Things that don't work
description(ds[, 1]) <- "test" # Calling it by number doesn't wirj
description(ds$df.a) == "test" # FALSE
test_name <- "df.a"
ds <- within(ds, description(get(test_name)) <- "test") # No.
test_name <- quote("df.a")
ds <- within(ds, description(eval(test_name)) <- "test") # No.
Whether I use get(), quote()/eval() or some similar setup, I get the same style of error:
> ds <- within(ds, description(get(test_name)) <- "test")
Error in description(get(test_name)) <- "test" :
could not find function "get<-"
So I thought I would be creative and call the function itself with both values...
ds <- within(ds, 'description<-'(test_name, "test3"))
description(ds$df.a) == "test3" # FALSE
And that also fails, presumably because description() is a wrapper (I think?) for another function (method?), annotate(), which has the same thing going on.
Not mine, obv, but just for reference
#### Description function ####
"description<-" <- function(x,value){
annotation(x)["description"] <- value
x
}
#### I'd paste the stupid method code, but stackoverflow doesn't ####
#### think it's properly formatted as code when I do, so pfft. ####
Actual question
How the heck do I take my data.set, ds, and assign the vector of descriptions, descs, to the apprpopriate items?
Try:
for (i in seq_along(descs) ) description(ds[[i]]) <- descs[i]
Simple problem, but I can't find an answer that works anywhere. When I use readline() (for example, as demonstrated here - http://www.rexamples.com/4/Reading%20user%20input) it works perfectly:
readinteger <- function()
{
n <- readline(prompt="Enter an integer: ")
return(as.integer(n))
}
a <- print(readinteger())
However, if I add any code after this, readline() is skipped and the code just continues:
readinteger <- function()
{
n <- readline(prompt="Enter an integer: ")
return(as.integer(n))
}
a <- print(readinteger())
b <- 7
Any solutions (and/or easier ways to get user input)?
The problem here is that as soon as a <- print(readinteger()) is entered, it is evaluated, and b <- 7 is interpreted as the input to readline. A solution is to wrap your code in a function or a block:
{
a <- print(readinteger())
b <- 7
}
By putting everything into a block, the whole block is read as code and only after, as it is evaluated, you will be prompted for an integer.
put multiple outputs if processed by your function or the print function into "ONE object"
you name it: myOput <- list(abandoned.b,that.c)
print(myoutput)
I have a BATCH script (on a Windows machine, would like this to be generalised in time), that opens and runs the following code in the background:
library(svDialogs)
library(ggplot2)
library (XML)
sharesID <- display(guiDlg("SciViews-R", "Please Enter Shares ID:"))
test.df <- xmlToDataFrame(sharesID)
test.df
sapply(test.df, class)
test.df$timeStamp <- strptime(as.character(test.df$timeStamp), "%H:%M:%OS")
test.df$Price <- as.numeric(as.character(test.df$Price))
sapply(test.df, class)
options("digits.secs"=3)
summary (test.df)
with(test.df, plot(timeStamp, Price))
sd (test.df$Price)
mean(test.df$timeStamp)
test.df$timeStamp <- test.df[1,"timeStamp"] + cumsum(runif(7)*60)
summary(test.df)
qplot(timeStamp,Price,data=test.df,geom=c("point","line"))
Price <- summary(test.df$Price)
print (Price)
When it gets to the
sharesID <- display(guiDlg("SciViews-R", "Please Enter Shares ID:"))
It brings up a dialogue box asking the user to Enter Shares ID. At present you have to use the full path of the file you want the rest of the code to execute on. Is there a way that you can enter a file number from a list of files kept in a database or such.
The other question I have is that it generates a pdf file of the both plots only. While I like this is there a way to specify the output type and location (ie as a graph on a webpage).
I want to include a print out of the summary of Price in the output but this is not achieved using the above commands.
I've never seen the svDialogs package before, but it looks pretty awesome. Staying in base R, then maybe something like this is what you're after (or at least maybe it'll spark an idea); just copy and paste, it's a self contained example:
# set up the relations between file paths and the file names you want
f.names <- c("file_01", "file_02", "file_03")
f.paths <- c("C:\\text01.txt", "C:\\text02.txt", "C:\\text03.txt")
# ask the user to select one of your specified file names
choice <- select.list(choices = f.names, title = "Please Select Shares ID:")
# return the full file path based on the file name selected by the user
index <- grep(choice, f.names, fixed = TRUE)
sharesID <- f.paths[index]
The above will bring up a dialogue box with file choices as defined by you. The user then selects one of the choices and eventually you'll get the full file path:
> sharesID
[1] "C:\\text01.txt"
Hope that helps a little mate,
Tony Breyal
I'm addressing the second desire -- a web page of the graph. Here is a template for how you can use gWidgetsWWW to provide that. You can use this package locally through the localServerStart() command (it uses the help page web server). If you save the following to some file, say "makeGraph.R", then you load it from within R with localServerStart("makeGraph.R") (assuming, you are in the right directory, otherwise add your info):
require(ggplot2)
## a simple web page
w <- gwindow("Make a neat graph")
g <- ggroup(cont=w, horizontal=FALSE)
glabel("Select a data frame to produce a graph", cont=g)
cb <- gcombobox(names(mapNameToFile), selected=-1, cont=g)
f <- gframe("Summary", cont=g)
t <- ghtml("", cont=g)
f <- gframe("Plot", cont=g)
ourDevice <- gsvg(width=500, height=500, cont=f)
addHandlerChanged(cb, handler=function(h,...) {
makePlot(svalue(h$obj))
})
visible(w) <- TRUE
## Below here you must change to suit your application:
## example of map from names to some other object
mapNameToFile <- list("mtcars"=mtcars,
"CO2" = CO2)
## your main function
makePlot <- function(nm) {
df <- mapNameToFile[[nm]]
if(is.null(df)) {
galert(sprintf("Can't find file %s", nm))
return()
}
## your plot
p <- qplot(df[,1], df[,2])
## put into svg device
f <- getStaticTmpFile(ext=".svg")
require(RSVGTipsDevice, quietly=TRUE, warn=FALSE)
devSVGTips(f)
print(p)
dev.off()
svalue(ourDevice) <- f
## write a summary
svalue(t) <- paste("<pre>",
paste(capture.output(summary(df)), collapse="<br>"),
"</pre>",
sep="")
}