Background
I want to show part of my bookmarks on my Hugo website. The bookmarks from Firefox can be saved in JSON format, this is the source. The result should represent the nested structure somehow, in a format of a nested list, treeview or accordion. The source files of contents on the website are written in markdown. I want to generate a markdown file from the JSON input.
As I searched for possible solutions:
treeview or accordion: HTML, CSS and Javascript needed. I could not nest accordions with the <details> tag. Also, seems like overkill at the moment.
unordered list: can be done with bare markdown.
I chose to generate an unordered nested list from JSON. I would like to do this with R.
Input/output
Input sample: https://gist.github.com/hermanp/c01365b8f4931ea7ff9d1aee1cbbc391
Preferred output (indentation with two spaces):
- Info
- Python
- [The Ultimate Python Beginner's Handbook](https://www.freecodecamp.org/news/the-python-guide-for-beginners/)
- [Python Like You Mean It](https://www.pythonlikeyoumeanit.com/index.html)
- [Automate the Boring Stuff with Python](https://automatetheboringstuff.com/)
- [Data science Python notebooks](https://github.com/donnemartin/data-science-ipython-notebooks)
- Frontend
- [CodePen](https://codepen.io/)
- [JavaScript](https://www.javascript.com/)
- [CSS-Tricks](https://css-tricks.com/)
- [Butterick’s Practical Typography](https://practicaltypography.com/)
- [Front-end Developer Handbook 2019](https://frontendmasters.com/books/front-end-handbook/2019/)
- [Using Ethics In Web Design](https://www.smashingmagazine.com/2018/03/using-ethics-in-web-design/)
- [Client-Side Web Development](https://info340.github.io/)
- [Stack Overflow](https://stackoverflow.com/)
- [HUP](https://hup.hu/)
- [Hope in Source](https://hopeinsource.com/)
Bonus preferred output: show favicons before links, like below (other suggestion welcomed, like loading them from the website's server instead of linking):
- ![https://cdn.sstatic.net/Sites/stackoverflow/Img/apple-touch-icon.png?v=c78bd457575a][Stack Overflow](https://stackoverflow.com/)
Attempt
generate_md <- function (file) {
# Encoding problem with tidyjson::read_json
bmarks_json_lite <- jsonlite::fromJSON(
txt = paste0("https://gist.githubusercontent.com/hermanp/",
"c01365b8f4931ea7ff9d1aee1cbbc391/raw/",
"33c21c88dad35145e2792b6258ede9c882c580ec/",
"bookmarks-example.json"))
# This is the start point, a data frame
level1 <- bmarks_json_lite$children$children[[2]]
# Get the name of the variable to modify it.
# Just felt that some abstraction needed.
varname <- deparse(substitute(level1))
varlevel <- as.integer(substr(varname, nchar(varname), nchar(varname)))
# Get through the data frame by its rows.
for (i in seq_len(nrow(get(varname)))) {
# If the type of the element in the row is "text/x-moz-place",
# then get its title and create a markdown list element from it.
if (get(varname)["type"][i] == "text/x-moz-place"){
# The two space indentation shall be multiplied as many times
# as deeply nested in the lists (minus one).
md_title <- paste0(strrep(" ", varlevel - 1),
"- ",
get(varname)["title"][i],
"\n")
# Otherwise do this and also get inside the next level.
} else if (get(varname)["type"][i] == "text/x-moz-place-container") {
md_title <- paste0(strrep(" ", varlevel - 1),
"- ",
get(varname)["title"][i],
"\n")
# I know this is not good, just want to express my thought.
# Create the next, deeper level's variable, whoose name shall
# represent the depth in the nest.
# Otherwise how can I multiply the indentation for the markdown
# list elements? It depends on the name of this variable.
varname <- paste0(regmatches(varname, regexpr("[[:alpha:]]+", varname)),
varlevel + 1L)
varlevel <- varlevel + 1L
assign(varname, get(varname)["children"][[i]])
# The same goes on as seen at the higher level.
for (j in seq_len(nrow(get(varname)))){
if (get(varname)["type"][i] == "text/x-moz-place"){
md_title <- paste0(strrep(" ", varlevel - 1),
"- ",
get(varname)["title"][i],
"\n")
} else if (get(varname)["type"][i] == "text/x-moz-place-container") {
md_title <- paste0(strrep(" ", varlevel - 1),
"- ",
get(varname)["title"][i],
"\n")
varname <- paste0(regmatches(varname, regexpr("[[:alpha:]]+", varname)),
varlevel + 1L)
varlevel <- varlevel + 1L
assign(varname, get(varname)["children"][[i]])
for (k in seq_len(nrow(get(varname)))){
# I don't know where this goes...
# Also I need to paste somewhere the md_title strings to get the
# final markdown output...
}
}
}
}
}
}
Question
How can I recursively grab and paste strings from this JSON file? I tried to search for tips in recursion, but it's quite a hard topic. Any suggestion, package, function, link will be welcomed!
I know you asked for a solution in R.
Just as a suggestion, here is a solution using jq, as it is very suitable for json transformations.
#!/bin/bash
BOOKMARKS='FirefoxBookmarks.json'
jq -r '
def bookmark($iconuri; $title; $uri):
if $iconuri != null then "![\($iconuri)]" else "" end +
"[\($title)](\($uri))";
def bookmarks:
(objects | to_entries[]
| if .value | type == "array" then (.value | bookmarks)
else .value end ) //
(arrays[] | [bookmarks] | " - \(.[0])", " \(.[1:][])" );
(.. | .children? | arrays)
|= map(if .uri != null then {bookmark: bookmark(.iconuri; .title; .uri)}
else {title} end +
{children})
| del(..| select(length == 0)) # remove empty children and empty titles
| del(..| select(length == 0)) # remove objects that got empty because of previous deletion
| del(..| objects | select(has("title") and (.children | length == 0))) # remove objects with title but no children
| .children # remove root level
| bookmarks
' < "$BOOKMARKS"
output:
- Könyvjelzők eszköztár
- Info
- Python
- ![fake-favicon-uri:https://www.freecodecamp.org/news/the-python-guide-for-beginners/][The Ultimate Python Beginner's Handbook](https://www.freecodecamp.org/news/the-python-guide-for-beginners/)
- [Python Like You Mean It](https://www.pythonlikeyoumeanit.com/index.html)
- [Automate the Boring Stuff with Python](https://automatetheboringstuff.com/)
- ![https://github.githubassets.com/favicons/favicon.svg][Data science Python notebooks](https://github.com/donnemartin/data-science-ipython-notebooks)
- Frontend
- ![https://static.codepen.io/assets/favicon/favicon-touch-de50acbf5d634ec6791894eba4ba9cf490f709b3d742597c6fc4b734e6492a5a.png][CodePen](https://codepen.io/)
- ![https://www.javascript.com/etc/clientlibs/pluralsight/main/images/favicons/android-chrome-192x192.png][JavaScript](https://www.javascript.com/)
- ![https://css-tricks.com/apple-touch-icon.png][CSS-Tricks](https://css-tricks.com/)
- [Butterick’s Practical Typography](https://practicaltypography.com/)
- [Front-end Developer Handbook 2019](https://frontendmasters.com/books/front-end-handbook/2019/)
- ![https://www.smashingmagazine.com/images/favicon/app-icon-512x512.png][Using Ethics In Web Design](https://www.smashingmagazine.com/2018/03/using-ethics-in-web-design/)
- ![https://info340.github.io/img/busy-spider-icon.png][Client-Side Web Development](https://info340.github.io/)
- ![https://cdn.sstatic.net/Sites/stackoverflow/Img/apple-touch-icon.png?v=c78bd457575a][Stack Overflow](https://stackoverflow.com/)
- ![https://hup.hu/profiles/hupper/themes/hup_theme/favicon.ico][HUP](https://hup.hu/)
- [Hope in Source](https://hopeinsource.com/)
After I watched a few videos on recursion and saw a few code examples, I tried, manually stepped through the code and somehow managed to do it with recursion. This solution is independent on the nestedness of the bookmarks, therefore a generalized solution for everyone.
Note: all the bookmarks were in the Bookmarks Toolbar in Firefox. This is highlighted in the generate_md function. You can tackle with it there. If I improve the answer later, I will make it more general.
library(jsonlite)
# This function recursively converts the bookmark titles to unordered
# list items.
recursive_func <- function (level) {
md_result <- character()
# Iterate through the current data frame, which may have a children
# column nested with other data frames.
for (i in seq_len(nrow(level))) {
# If this element is a bookmark and not a folder, then grab
# the title and construct a list item from it.
if (level[i, "type"] == "text/x-moz-place"){
md_title <- level[i, "title"]
md_uri <- level[i, "uri"]
md_iconuri <- level[i, "iconuri"]
# Condition: the URLs all have schema (http or https) part.
# If not, filname will be a zero length character vector.
host_url <- regmatches(x = md_uri,
m = regexpr(pattern = "(?<=://)[[:alnum:].-]+",
text = md_uri,
perl = T))
md_link <- paste0("[", md_title, "]", "(", md_uri, ")")
md_listitem <- paste0("- ", md_link, "\n")
# If this element is a folder, then get into it, call this
# function over it. Insert two space (for indentation) in
# the generated sting before every list item. Paste this
# list of items to the folder list item.
} else if (level[i, "type"] == "text/x-moz-place-container") {
md_title <- level[i, "title"]
md_listitem <- paste0("- ", md_title, "\n")
md_recurs <- recursive_func(level = level[i, "children"][[1]])
md_recurs <- gsub("(?<!(\\w ))-(?= )", " -", md_recurs, perl = T)
md_listitem <- paste0(md_listitem, md_recurs)
}
# Collect and paste the list items of the current data frame.
md_result <- paste0(md_result, md_listitem)
}
# Return the (sub)list of the data frame.
return(md_result)
}
generate_md <- function (jsonfile) {
# Encoding problem with tidyjson::read_json
bmarks_json_lite <- fromJSON(txt = jsonfile)
# This is the start point, a data frame. It represents the
# elements inside the Bookmarks Toolbar in Firefox.
level1 <- bmarks_json_lite$children$children[[2]]
# Do not know how to make it prettier, but it works.
markdown_result <- recursive_func(level = level1)
return(markdown_result)
}
You can run the generate_md function with the example.
generate_md(paste0("https://gist.githubusercontent.com/hermanp/",
"c01365b8f4931ea7ff9d1aee1cbbc391/raw/",
"33c21c88dad35145e2792b6258ede9c882c580ec/",
"bookmarks-example.json"))
# Output
[1] "- Info\n - Python\n - [The Ultimate Python Beginner's Handbook](https://www.freecodecamp.org/news/the-python-guide-for-beginners/)\n - [Python Like You Mean It](https://www.pythonlikeyoumeanit.com/index.html)\n - [Automate the Boring Stuff with Python](https://automatetheboringstuff.com/)\n - [Data science Python notebooks](https://github.com/donnemartin/data-science-ipython-notebooks)\n - Frontend\n - [CodePen](https://codepen.io/)\n - [JavaScript](https://www.javascript.com/)\n - [CSS-Tricks](https://css-tricks.com/)\n - [Butterick’s Practical Typography](https://practicaltypography.com/)\n - [Front-end Developer Handbook 2019](https://frontendmasters.com/books/front-end-handbook/2019/)\n - [Using Ethics In Web Design](https://www.smashingmagazine.com/2018/03/using-ethics-in-web-design/)\n - [Client-Side Web Development](https://info340.github.io/)\n - [Stack Overflow](https://stackoverflow.com/)\n - [HUP](https://hup.hu/)\n - [Hope in Source](https://hopeinsource.com/)\n"
You can cat it and write it to a file also with writeLines. But bevare! In Windows environments, you probably need to turn useBytes = TRUE to get the correct characters in the file. Reference: UTF-8 file output in R
cat(generate_md(paste0("https://gist.githubusercontent.com/hermanp/",
"c01365b8f4931ea7ff9d1aee1cbbc391/raw/",
"33c21c88dad35145e2792b6258ede9c882c580ec/",
"bookmarks-example.json")))
# Output
- Info
- Python
- [The Ultimate Python Beginner's Handbook](https://www.freecodecamp.org/news/the-python-guide-for-beginners/)
- [Python Like You Mean It](https://www.pythonlikeyoumeanit.com/index.html)
- [Automate the Boring Stuff with Python](https://automatetheboringstuff.com/)
- [Data science Python notebooks](https://github.com/donnemartin/data-science-ipython-notebooks)
- Frontend
- [CodePen](https://codepen.io/)
- [JavaScript](https://www.javascript.com/)
- [CSS-Tricks](https://css-tricks.com/)
- [Butterick’s Practical Typography](https://practicaltypography.com/)
- [Front-end Developer Handbook 2019](https://frontendmasters.com/books/front-end-handbook/2019/)
- [Using Ethics In Web Design](https://www.smashingmagazine.com/2018/03/using-ethics-in-web-design/)
- [Client-Side Web Development](https://info340.github.io/)
- [Stack Overflow](https://stackoverflow.com/)
- [HUP](https://hup.hu/)
- [Hope in Source](https://hopeinsource.com/)
There was a problem with the regex part. If there are bookmarks with some - title (space, hyphen, space) characters in their titles, these hyphens will also be "indented" as the list items.
# Input JSON
https://gist.github.com/hermanp/381eaf9f2bf5f2b9cdf22f5295e73eb5
cat(generate_md(paste0("https://gist.githubusercontent.com/hermanp/",
"381eaf9f2bf5f2b9cdf22f5295e73eb5/raw/",
"76b74b2c3b5e34c2410e99a3f1b6ef06977b2ec7/",
"bookmarks-example-hyphen.json")))
# Output (two space indentation) markdown:
- Info
- Python
- [The Ultimate Python Beginner's Handbook](https://www.freecodecamp.org/news/the-python-guide-for-beginners/)
- [Python Like You Mean It](https://www.pythonlikeyoumeanit.com/index.html)
- [Automate the Boring Stuff with Python](https://automatetheboringstuff.com/)
- [Data science Python notebooks](https://github.com/donnemartin/data-science-ipython-notebooks)
- Frontend
- [CodePen](https://codepen.io/)
- [JavaScript - Wikipedia](https://en.wikipedia.org/wiki/JavaScript) # correct
- [CSS-Tricks](https://css-tricks.com/)
- [Butterick’s Practical Typography](https://practicaltypography.com/)
- [Front-end Developer Handbook 2019](https://frontendmasters.com/books/front-end-handbook/2019/)
- [Using Ethics In Web Design](https://www.smashingmagazine.com/2018/03/using-ethics-in-web-design/)
- [Client-Side Web Development](https://info340.github.io/)
- [Stack Overflow](https://stackoverflow.com/)
- [HUP](https://hup.hu/)
- [Hope in Source](https://hopeinsource.com/)
I posted another question about this problem. After some hint and try I answered my own question.
Related
I am working with an S4 object (PairwiseAlignmentsSingleSubject). When I type the name of an instance of this class into the console and hit enter, I get something like the following:
Global PairwiseAlignmentsSingleSubject (1 of 1)
pattern: ATCGATCGATCGATCG
subject: -TCGATCG-TCGATC-
score: -16.23717
The actual class is much larger than just this, so I assume some print function for the class is being called. I really want to see the code used in this print function, but I cannot figure out how to pull it up. Could someone please offer some insights into this?
Thank you
Edit based on JDL's answer:
I was able to use selectMethod (I guess this class inherits its "print" function). However, the result is just:
> selectMethod("print",signature(x="BioStrings"))
Method Definition (Class "derivedDefaultMethod"):
function (x, ...)
UseMethod("print")
<bytecode: 0x5572111b6d58>
<environment: namespace:base>
Signatures:
x
target "BioStrings"
defined "ANY"
Which is still not very informative and definitely not the full code for the function. Does anyone know how I can take this further?
To get the code associated with a method, the function to use is getMethod. If you know the class you are interested in then the call is of the form below
getMethod("print",signature(x="numeric"))
which in your specific case is
getMethod("print",signature(x="PairwiseAlignmentsSingleSubject")).
If you have an object obj and want to see what code will be applied to it when you call print, then a slight modification allows this:
getMethod("print",signature(x=class(obj)))
This also works for any generic function, you can substitute "plot" etc. in place of "print".
Note that this only works for methods defined for the exact class you specified. If there isn't such a method (i.e. one based on inheritance was used instead), then replace getMethod with selectMethod.
For further information, consult the help page at ?getMethod which documents these and some similar functions.
Looking through their github is usually useful :)
They don't use 'print' but 'show'.
If you look at the source code for the class PairwiseAlignmentsSingleSubject you can see a show method, but it's only for the class PairwiseAlignmentsSingleSubjectSummary.
But in the last method for 'PairwiseAlignmentsSingleSubject' you see it creates a 'newPairwiseAlignments' and if you look a the source code for the class PairwiseAlignments you'll find the "show" method starting at line 398.
### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### The "show" method
###
### TODO: Maybe make the "show" method format the alignment in a SGD fashion
### i.e. split in 60-letter blocks and use the "|" character to highlight
### exact matches.
###
.show_PairwiseAlignments <- function(x)
{
x_len <- length(x)
if (x_len == 0L)
cat("Empty ")
cat(switch(type(x), "global"="Global", "overlap"="Overlap",
"local"="Local", "global-local" = "Global-Local",
"local-global"="Local-Global"),
" ", class(x), sep="")
if (x_len == 0L) {
cat("\n")
return()
}
cat(" (1 of ", x_len, ")\n", sep="")
x1 <- x[1L]
x_type <- type(x)
global.pattern <- x_type %in% c("global", "global-local")
global.subject <- x_type %in% c("global", "local-global")
p1start <- if (global.pattern)
""
else
paste0("[", start(x1#pattern#range), "]")
s1start <- if (global.subject)
""
else
paste0("[", start(x1#subject#range), "]")
width <- max(nchar(p1start), nchar(s1start))
if (width != 0L) {
width <- width + 1L
p1start <- format(p1start, justify="right", width=width)
s1start <- format(s1start, justify="right", width=width)
}
width <- getOption("width") - 9L - width
pattern1 <- toSeqSnippet(alignedPattern(x1)[[1L]], width)
subject1 <- toSeqSnippet(alignedSubject(x1)[[1L]], width)
cat("pattern:", p1start, " ", add_colors(pattern1), "\n", sep="")
cat("subject:", s1start, " ", add_colors(subject1), "\n", sep="")
cat("score:", score(x1), "\n")
}
setMethod("show", "PairwiseAlignments",
function(object) .show_PairwiseAlignments(object)
)
So I am trying to learn R on my own and am just working through the online tutorial. I am trying to code a recursive function that prints the first n terms of the Fibonacci sequence and can't get the code to run without the error:
Error in if (nterms <= 0) { : missing value where TRUE/FALSE needed
My code does ask me for input before entering the if else statement either which I think is odd as well. Below is my code any help is appreciated.
#Define the fibonacci sequence
recurse_fibonacci <- function(n) {
# Define the initial two values of the sequence
if (n <= 1){
return(n)
} else {
# define the rest of the terms of the sequence using recursion
return(recurse_fibonacci(n-1) + recurse_fibonacci(n-2))
}
}
#Take input from the user
nterms = as.integer(readline(prompt="How many terms? "))
# check to see if the number of terms entered is valid
if(nterms <= 0) {
print("please enter a positive integer")
} else {
# This part actually calculates and displays the first n terms of the sequence
print("Fibonacci Sequence: ")
for(i in 0:(nterms - 1)){
print(recurse_fibonacci(i))
}
}
This is a problem of readline in non-interactive mode. readline does not wait for a keypress and immediately executes the next instruction. The solution below is the solution posted in this other SO post.
I post below a complete answer, with the Fibonnaci numbers function a bit modified.
recurse_fibonacci <- function(n) {
# Define the initial two values of the sequence
if (n <= 1){
n
} else{
# define the rest of the terms of the sequence using recursion
Recall(n - 1) + Recall(n - 2)
}
}
#Take input from the user
cat("How many terms?\n")
repeat{
nterms <- scan("stdin", what = character(), n = 1)
if(nchar(nterms) > 0) break
}
nterms <- as.integer(nterms)
# check to see if the number of terms entered is valid
if(nterms <= 0) {
print("please enter a positive integer")
} else {
# This part actually calculates and displays the first n terms of the sequence
print("Fibonacci Sequence: ")
for(i in 0:(nterms - 1)){
print(recurse_fibonacci(i))
}
}
This code is the contents of file fib.R. Running in a Ubuntu 20.04 terminal gives
rui#rui:~$ Rscript fib.R
How many terms?
8
Read 1 item
[1] "Fibonacci Sequence: "
[1] 0
[1] 1
[1] 1
[1] 2
[1] 3
[1] 5
[1] 8
[1] 13
rui#rui:~$
To make it work with Rscript replace
nterms = as.integer(readline(prompt="How many terms? "))
with
cat ("How many terms?")
nterms = as.integer (readLines ("stdin", n = 1))
Then you can run it as Rscript fib.R, assuming that the code is in the file fib.R in the current working directory.
Otherwise, execute it with source ("fib.R") within an R shell.
Rscript does not operate in interactive mode and does not expect any input from the terminal. Check what interactive () returns in both the cases. Rscript will return FALSE as it is non-interactive, but the same function when run within an R shell (with source ()) it will be true.
?readline mentions that it cannot be used in non-interactive mode. Whereas readLines explicitely connect to stdin.
The code works fine but you shouldn't enter it into the terminal as is. My suggestion: put the code into a script file (ending .R) and source it (get help about it with ?source but it's actually pretty straightforward).
In R-Studio you can simply hit the source button.
I'm doing a sequence analysis on a large data sample. What I want to do is to rewrite my old Stata code in R, so that all of my analysis is performed in one single environment.
However, I would also like to improve it a little bit - the code is pretty long, and I would like to rewrite it using loops, so that it becomes more readable. Unfortunately my loop-writing skills are questionable.
1st loop [I think it needs to include an if statement]
I would like to write a loop for the following commands:
dt$dur.ofA1 <-(dt$M2_3R_A_1 - dt$M2_2R_A_1)
dt$dur.ofB1<-(dt$M2_3R_B_1 - dt$M2_2R_B_1)
dt$dur.ofC1<-(dt$M2_3R_C_1 - dt$M2_2R_C_1)
dt$dur.ofD1<-(dt$M2_3R_D_1 - dt$M2_2R_D_1)
dt$dur.ofE1<-(dt$M2_3R_E_1 - dt$M2_2R_E_1)
dt$dur.ofF1<-(dt$M2_3R_F_1 - dt$M2_2R_F_1)
dt$dur.ofG1<-(dt$M2_3R_G_1 - dt$M2_2R_G_1)
dt$dur.ofH1<-(dt$M2_3R_H_1 - dt$M2_2R_H_1)
dt$dur.ofA2<-(dt$M2_3R_A_2 - dt$M2_2R_A_2)
dt$dur.ofB2<-(dt$M2_3R_B_2 - dt$M2_2R_B_2)
dt$dur.ofC2<-(dt$M2_3R_C_2 - dt$M2_2R_C_2)
dt$dur.ofD2<-(dt$M2_3R_D_2 - dt$M2_2R_D_2)
dt$dur.ofE2<-(dt$M2_3R_E_2 - dt$M2_2R_E_2)
dt$dur.ofF2<-(dt$M2_3R_F_2 - dt$M2_2R_F_2)
dt$dur.ofG2<-(dt$M2_3R_G_2 - dt$M2_2R_G_2)
dt$dur.ofH2<-(dt$M2_3R_H_2 - dt$M2_2R_H_2)
dt$dur.ofA3<-(dt$M2_3R_A_3 - dt$M2_2R_A_3)
dt$dur.ofB3<-(dt$M2_3R_B_3 - dt$M2_2R_B_3)
dt$dur.ofC3<-(dt$M2_3R_C_3 - dt$M2_2R_C_3)
dt$dur.ofD3<-(dt$M2_3R_D_3 - dt$M2_2R_D_3)
dt$dur.ofE3<-(dt$M2_3R_E_3 - dt$M2_2R_E_3)
dt$dur.ofF3<-(dt$M2_3R_F_3 - dt$M2_2R_F_3)
dt$dur.ofG3<-(dt$M2_3R_G_3 - dt$M2_2R_G_3)
dt$dur.ofH3<-(dt$M2_3R_H_3 - dt$M2_2R_H_3)
My attempt:
db1 <- paste(rep("M2_", 24), "2R_", rep(LETTERS[seq( from = 1, to = 8)],3), "_",
rep(seq(from=1, to =3), 8),
sep = "")
db2 <- paste(rep("M2_", 24), "3R_", rep(LETTERS[seq( from = 1, to = 8)],3), "_",
rep(seq(from=1, to =3), 8),
sep = "")
dur <- paste(rep("dur.of", 24), rep(LETTERS[seq( from = 1, to = 8)],3),
rep(seq(from=1, to =3), 8),
sep = "")
dur <- as.list(dur)
for(e in dur){
for (j in db1){
for (i in db2){
{
dt[,e] <- dt[,i] - dt[,j]
}
I think the loop needs an if statement in the middle, so that it stops at a single item (subtracts A1 from A1, A2 from A2 etc.) from the list.
2) The second case is a little bit more complicated, but essentially it is the same case as described above:
The M2_2R_A_1 (start) M2_3R_A_1 (finish) indicate the yearly dates in which an educational activity took place. I would like to generate 1948:2013 variables that indicate that an activity took place in a particular year (stedu==x). A part of my Stata code is as follows (it goes on like that up to 2013):
recode stedu1948(0=2) if M2_2R_A_1<=1948 & 1948<= M2_3R_A_1 | M2_2R_A_2<=1948 & 1948<= M2_3R_A_2 | M2_2R_A_3<=1948 & 1948<= M2_3R_A_3
recode stedu1949(0=2) if M2_2R_A_1<=1949 & 1949<= M2_3R_A_1 | M2_2R_A_2<=1949 & 1949<= M2_3R_A_2 | M2_2R_A_3<=1949 & 1949<= M2_3R_A_3
recode stedu1950(0=2) if M2_2R_A_1<=1950 & 1950<= M2_3R_A_1 | M2_2R_A_2<=1950 & 1950<= M2_3R_A_2 | M2_2R_A_3<=1950 & 1950<= M2_3R_A_3
So in order to write a loop I would also need to include some conditions in order to stop the loop at a given point.
For your first item, use #thelatemail's suggestion. For second item, consider the following for loop using the ifelse() function:
for (i in 1948:2013) {
dt[[paste0("stedu", i)]] <- ifelse((dt$M2_2R_A_1 <= i & dt$M2_3R_A_1 >= i) OR
(dt$M2_2R_A_2 <= i & dt$M2_3R_A_2 >= i) OR
(dt$M2_2R_A_3 <= i & dt$M2_3R_A_3 >= i),
2,
dt[[paste0("stedu", i)]]
}
Using R.exe or Rterm.exe, this gives an excellent progress meter.
page=getURL(url="ftp.wcc.nrcs.usda.gov", noprogress=FALSE)
In Rgui I am limited to:
page=getURL(url="ftp.wcc.nrcs.usda.gov",
noprogress=FALSE, progressfunction=function(down,up) print(down))
which gives a very limited set of download information.
Is there a way to improve this?
I start doubting that with standard R commands it is possible to reprint overwriting the current line, which is what RCurl does in non-GUI mode.
I am glad to tell that I was wrong. At least for a single line, \r can do the trick. In fact:
conc=function(){
cat(" abcd")
cat(" ABCD", '\n')
}
conc()
# abcd ABCD
But:
over=function(){
cat(" abcd")
cat("\r ABCD", "\n")
}
over()
# ABCD
That given, I wrote this progressDown function, which can monitor download status rewriting always on the same same line:
library(RCurl) # Don't forget
### Callback function for curlPerform
progressDown=function(down, up, pcur, width){
total=as.numeric(down[1]) # Total size as passed from curlPerform
cur=as.numeric(down[2]) # Current size as passed from curlPerform
x=cur/total
px= round(100 * x)
## if(!is.nan(x) && px>60) return(pcur) # Just to debug at 60%
if(!is.nan(x) && px!=pcur){
x= round(width * x)
sc=rev(which(total> c(1024^0, 1024^1, 1024^2, 1024^3)))[1]-1
lb=c('B', 'KB', 'MB', 'GB')[sc+1]
cat(paste(c(
"\r |", rep.int(".", x), rep.int(" ", width - x),
sprintf("| %g%s of %g%s %3d%%",round(cur/1024^sc, 2), lb, round(total/1024^sc, 2), lb, px)),
collapse = ""))
flush.console() # if the outptut is buffered, it will go immediately to console
return(px)
}
return(pcur)
}
Now we can use the callback with curlPerform
curlProgress=function(url, fname){
f = CFILE(fname, mode="wb")
width= getOption("width") - 25 # you can make here your line shorter/longer
pcur=0
ret=curlPerform(url=url, writedata=f#ref, noprogress=FALSE,
progressfunction=function(down,up) pcur<<-progressDown(down, up, pcur, width),
followlocation=T)
close(f)
cat('\n Download', names(ret), '- Ret', ret, '\n') # is success?
}
Running it with a small sample binary:
curlProgress("http://www.nirsoft.net/utils/websitesniffer-x64.zip", "test.zip")
the intermediate output at 60% is (no # protection):
|................................. | 133.74KB of 222.75KB 60%
where KB, will be adjusted to B, KB, MB, GB, based on total size.
Final output with success status, is:
|.......................................................| 222.61KB of 222.75KB 100%
Download OK - Ret 0
Note, the output line width is relative to R width option (which controls the maximum number of columns on a line) and can be customised changing the curlProgress line:
width= getOption("width") - 25
This is enough for my needs and solves my own question.
Here's a simple example using txtProgressBar. Basically, just do a HEAD request first to get the file size of the file you want to retrieve, then setup a txtProgressBar with that as its max size. Then you use the progressfunction argument to curlPerform to call setTxtProgressBar. It all works very nicely (unless there is no "content-length" header, in which case this code works by just not printing a progress bar).
url <- 'http://stackoverflow.com/questions/21731548/rcurl-display-progress-meter-in-rgui'
h <- basicTextGatherer()
curlPerform(url=url, customrequest='HEAD',
header=1L, nobody=1L, headerfunction=h$update)
if(grepl('Transfer-Encoding: chunked', h$value())) {
size <- 1
} else {
size <- as.numeric(strsplit(strsplit(h$value(),'\r\nContent-Type')[[1]][1],
'Content-Length: ')[[1]][2])
}
bar <- txtProgressBar(0, size)
h2 <- basicTextGatherer()
get <- curlPerform(url=url, noprogress=0L,
writefunction=h2$update,
progressfunction=function(down,up)
setTxtProgressBar(bar, down[2]))
h2$value() # return contents of page
The output is just a bunch of ====== across the console.
What about:
curlProgress=function(url, fname){
f = CFILE(fname, mode="wb")
prev=0
ret=curlPerform(url=url, writedata=f#ref, noprogress=FALSE,
progressfunction=function(a,b){
x=round(100*as.numeric(a[2])/as.numeric(a[1]))
if(!is.nan(x) && x!=prev &&round(x/10)==x/10) prev<<-x else x='.'
cat(x)
}, followlocation=T)
close(f)
cat(' Download', names(ret), '- Ret', ret, '\n')
}
?
It prints dots or percent download divisible by 10 and breaks line on 50%.
And with a small 223 KB file:
curlProgress("http://www.nirsoft.net/utils/websitesniffer-x64.zip", "test.zip")
it sounds like this:
................10...............20................30...............40...............50
..............................70...............80...............90...............100... Download OK - Ret 0
I start doubting that with standard R commands it is possible to reprint overwriting the current line, which is what RCurl does in non-GUI mode.
I'm a beginner with R (and coding in general). In January 14 I hopefully can begin and finish a R course, but I would like to learn before. I have understanding of the basics and have used functions like read.table,intersect,cbind,paste,write.table.
But I only was able to achieve partially what I want to do with two input files (shortened samples):
REF.CSV
SNP,Pos,Mut,Hg
M522 L16 S138 PF3493 rs9786714,7173143,G->A,IJKLT-M522
P128 PF5504 rs17250121,20837553,C->T,KLT-M9
M429 P125 rs17306671,14031334,T->A,IJ-M429
M170 PF3715 rs2032597,14847792,A->C,I-M170
M304 Page16 PF4609 rs13447352,22749853,A->C,J-M304
M172 Page28 PF4908 rs2032604,14969634,T->G,J2-M172
L228,7771358,C->T,J2-M172
L212,22711465,T->C,J2a-M410
SAMPLE.CSV
SNP,Chr,Allele1,Allele2
L16,Y,A,A
P128,Y,C,C
M170,Y,A,A
P123,Y,C,C
M304,Y,C,C
M172,Y,T,G
L212,Y,-0,-0
Description what I like to do:
A) Check if SAMPLE.SNP is in REF.SNP
B) if YES check SAMPLE.Allele status (first read, second read) vs REF.Mut (Ancestral->Derived)
B1) if both Alleles are the same and match Derived create output "+ Allele1-Allele2"
B2) if both Alleles are the same and match Ancestral create output "- Allele1-Allele2"
B3) if Alleles are not the same check if Allele2 is Derived and create output "+ Allele1-Allele2"
B4) if both Alleles are "-0" create output "? NC"
B5) else create output "? Allele1-Allele2"
B6) if NO create output "? NA"
C) Write REF.CSV + output in new row (Sample) and create OUTPUT file
OUTPUT.CSV (like wanted)
SNP,Pos,Mut,Hg,Sample
M522 L16 S138 PF3493 rs9786714,7173143,G->A,IJKLT-M522,+ A-A
P128 PF5504 rs17250121,20837553,C->T,KLT-M9,- C-C
M429 P125 rs17306671,14031334,T->A,IJ-M429,? NA
M170 PF3715 rs2032597,14847792,A->C,I-M170,- A-A
M304 Page16 PF4609 rs13447352,22749853,A->C,J-M304,+ C-C
M172 Page28 PF4908 rs2032604,14969634,T->G,J2-M172,+ T-G
L228,7771358,C->T,J2-M172,? NA
L212,22711465,T->C,J2a-M410,? NC
What functions I have found interesting and tried so far.
Variant1: A) is done, but I guess it is not possible to write C) with this?
Have not tried to code down B) here
GT <- read.table("SAMPLE.CSV",sep=',',skip=1)[,c(1,3,4)]
REF <- read.table("REF.CSV",sep=',')
rownames(REF) <- REF[,1]
COMMON <- intersect(rownames(GT),rownames(REF))
REF <- REF[COMMON,]
GT <- GT[COMMON,]
GT<-cbind(REF,paste(GT[,2],'-',X[,3],sep=','))
write.table(GT,file='OUTPUT.CSV',quote=F,row.names=F,col.names=F)
Variant2: This is probably a complete mess, forgive me. I was just rying to build a solution on for if looping functions, but I haven't understood R's syntax and logic in this probably.
I was not able to get this to run - A) and C)
Have not tried to code down B) here
GT<-read.table("SAMPLE.CSV",sep=',',skip=1)[,c(1,3,4)]
rownames(GT)<-GT[,1]
REF <- read.table("REF.CSV",sep=',')
rownames(REF)<-REF[,1]
for (i in (nrow(REF))) {
for (j in (nrow(GT))) {
if (GT[j,] %in% REF[i,]) {
ROWC[i,]<-cbind(REF[i,],paste(GT[j,2],"-",GT[j,3],sep=','))
} else {
ROWC[i,]<-cbind(REF[i,],"NA",sep=',')
}
}
}
write.table(ROWC,file='OUTPUT.CSV',quote=F,row.names=F,col.names=F)
I would be just happy if you can indicate what logic/functions would lead to reach the task I have described. I will then try to figure it out. Thx.