I'm trying to assign the result of a chained matrix multiplication in Maxima to a new variable. As a new user, I'm not sure why line %o6 isn't the same as the previous output and doesn't fully evaluate the chain. I also don't understand why, when I enter the new variable name "B", I simply get "B" returned back to me and not ([32, 32], [32, 32]). Basic questions, I know, but I've spent hours searching the documentation and tutorials, and the syntax I'm supposed to use here to get the output I was expecting is still unclear to me.
I can't tell for sure, but it appears that the problem is that B : A.A.A is entered while holding the Shift key for at least one of the spaces, and Shift+Space is interpreted as a non-breaking space instead of an ordinary space. This appears to be a known bug, or at least a serious misfeature, in wxMaxima; see: https://github.com/wxMaxima-developers/wxmaxima/issues/1031
(I say misfeature because Shift+Space --> non-breaking space is documented in the wxMaxima documentation, but it seems like a classic example of "bad affordance"; it is all too easy to do the wrong thing without knowing it. Anyway this is just my opinion.)
I built wxMaxima from current source code and it appears that Shift+Space is now not interpreted as non-breaking space in code, so B : A.A.A should have the expected effect even if shift key is held while typing space. The current version is 19.07.0-DevelopmentSnapshot. I poked through the commit log a bit, but I can't figure out which commit changed the behavior of Shift+Space, so it's possible that the problem is not fixed and it is just fortuitous that I am not encountering it.
There are two workarounds, if one doesn't want to hazard an upgrade. (1) Omit spaces. (2) Be careful to only type space without shift.
Hope this is helpful in some way.
I have a list of names in my dataframe and I want to find a way to query them on Wikipedia. It's not as simple as just appending the name to "https://en.wikipedia.org/wiki/"; I want to actually query Wikipedia so that it will offer a suggestion even if the name isn't spelt correctly. For example, if I were to put in Dick Dawkins, it would come up with Richard Dawkins. I checked, and that is actually the first hit on Wikipedia.
Ideally I'd want to use rvest, but I don't want to manually get every URL. Is this possible?
You are right. I, too, had a hard time getting Dick Dawkins out of Wikipedia; so much so that even searching for Dick Dawkins in the Wikipedia search brought me straight to Richard Dawkins.
However, if you want to search for a term (say "Richard Dawkins") then Wikipedia has a proper API for you (https://www.mediawiki.org/wiki/API:Tutorial). You can play around and find the right parameters that work for you.
Just to get you started, I wrote a function (which is somewhat similar to rg255's post). You can change the parameters of the MySearch function. Please make sure that spaces in the search string are replaced by '%20' for every query from your dataframe; a simple gsub() call should do the job. You will also have to install the 'jsonlite' package for this to work.
library(jsonlite)
# Build the MediaWiki search URL and return the parsed JSON response
MySearch <- function(srsearch){
  FullSearchString <- paste("http://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=",
                            srsearch, "&format=json", sep = "")
  Response <- fromJSON(FullSearchString)
  return(Response)
}
Response <- MySearch("Richard%20Dawkins")
You can now work with the parsed JSON and pull out the properties that you want. As I said, you will have to play with the parameters to get it right.
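For instance, here is a minimal sketch of applying this to a column of names and keeping the top suggested title; the data frame, its name column, and the TopHit helper are purely illustrative assumptions:

# Hypothetical data frame of names; replace with your own
df <- data.frame(name = c("Dick Dawkins", "Steven Hawking"),
                 stringsAsFactors = FALSE)

# Encode spaces as %20, query the API, and keep the first suggested title (if any)
TopHit <- function(name) {
  res  <- MySearch(gsub(" ", "%20", name))
  hits <- res$query$search
  if (is.null(hits) || NROW(hits) == 0) return(NA_character_)
  hits$title[1]
}

df$wiki_title <- vapply(df$name, TopHit, character(1))
df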
Please let me know if this is not what you wanted.
I've been playing around with Google's OCR recently, using the default tutorial, and was trying to parse numbers. I've seen previous issues dealing with numbers on license plates, but was wondering if there is a solution when special characters affect the results of OCR. Most notably, including the '#' character with a number (such as #1, #2, etc., as shown below) results in the output ##Z#T#, and occasionally even gives me Chinese characters, despite setting the to/from language settings to English.
[Image: numbers with pound sign]
For a similar comparison, the image below is easily read by the OCR:
[Image: numbers without pound sign]
Is there a setting that I'm missing that can improve the results or is this just a constraint by the model?
Whenever I try to use symbols in a plotmath expression in R, I get white squares. For example, when I run demo(plotmath), I get the following.
Does anyone have an idea where the problem may lie? I am using R 3.4.1 in RStudio on Mac OS X 10.11.6.
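If it helps to narrow things down, a minimal way to reproduce without running the whole demo (assuming any plotmath symbol triggers it, not only the demo) is:

# Any Greek letter or operator drawn by plotmath shows the same white squares
plot(1:10, main = expression(alpha + beta >= 2 * pi))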
Update:
As mentioned in the comments, it seems to be an issue with my fonts.
When I look at Symbol, I have two "Symbol Regular"s, and the second one appears as question marks when viewing both together. However, when I click on the second one individually, the fonts appear normally. I tried to validate fonts and remove duplicates, but Font Book did not detect any problems. What should I do?
If you go to Font Book.app and examine the Symbol font, is it perhaps duplicated, or can you see any other evidence of corruption? – 42-
@42- Thanks very much, it does seem to be an issue with the Symbol font. I listed what I see above; do you know what I should do with the font to fix it? – angryavian
Delete it. It will get replaced from some magic Apple storeroom buried deep in the bowels of the System.
I don't know how this happens, but it used to happen to me fairly often. It doesn't seem to be happening lately. I remain puzzled. I'm guessing there may be answers at Ask Different (but I didn't find one). Whatever the mechanism, it's been around for a long, long time:
http://hints.macworld.com/article.php?story=20031025010930633
I'm reading the R FAQ source in texinfo, and thinking that it would be easier to manage and extend if it was parsed as an R structure. There are several existing examples related to this:
the fortunes package
bibtex entries
Rd files
each with some desirable features.
In my opinion, FAQs are underused in the R community because they lack i) easy access from the R command line (i.e. through an R package); ii) powerful search functions; iii) cross-references; iv) extensions for contributed packages. Drawing ideas from the bibtex and fortunes packages, we could conceive a new system where:
FAQs can be searched from R. Typical calls would resemble the fortune() interface: faq("lattice print"), or faq() # surprise me!, faq(51), faq(package = "ggplot2") (see the sketch after this list).
Packages can provide their own FAQ.rda, the format of which is not clear yet (see below)
Sweave/knitr drivers are provided to output nicely formatted Markdown/LaTeX, etc.
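To make the first point concrete, here is a minimal sketch of what such an interface could look like; the faq_db layout, the example entries, and the function body are purely illustrative assumptions, not an existing implementation:

# Illustrative FAQ "database"; the columns are an assumption, not a fixed format
faq_db <- data.frame(
  number  = 1:2,
  title   = c("Why doesn't my lattice plot print inside a loop?",
              "How do I suppress warnings?"),
  entry   = c("Wrap the call in print(), e.g. print(xyplot(y ~ x, data)).",
              "Use suppressWarnings() around the offending expression."),
  package = c("lattice", "base"),
  stringsAsFactors = FALSE
)

# fortune()-like front end: keyword search, numeric lookup, package filter,
# or a random entry when called with no arguments
faq <- function(query = NULL, package = NULL, db = faq_db) {
  hits <- db
  if (!is.null(package)) hits <- hits[hits$package == package, ]
  if (is.numeric(query)) return(hits[hits$number %in% query, ])
  if (is.character(query)) {
    terms <- strsplit(query, "\\s+")[[1]]
    text  <- paste(hits$title, hits$entry)
    keep  <- Reduce(`&`, lapply(terms, grepl, x = text, ignore.case = TRUE))
    return(hits[keep, ])
  }
  if (!is.null(package)) return(hits)   # all FAQs of one package
  hits[sample(nrow(hits), 1), ]         # faq(): surprise me!
}

faq("lattice print")        # keyword search
faq(1)                      # by number
faq(package = "lattice")    # FAQs provided by a package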
QUESTION
I'm not sure, however, what the best input format is, either for converting the existing FAQ or for adding new entries.
It is rather cumbersome to use R syntax with a tree of nested lists (or an ad hoc S3/S4/ref class or structure), for example:
\list(title = "Something to be \\escaped", entry = "long text with quotes, links and broken characters", category = c("windows", "mac", "test"))
Rd documentation, even though not an R structure per se (it is more a subset of LaTeX with its own parser), can perhaps provide a more appealing example of an input format. It also has a set of tools to parse the structure in R. However, its current purpose is rather specific and different, being oriented towards general documentation of R functions, not FAQ entries. Its syntax is not ideal either; I think a more modern markup, something like Markdown, would be more readable.
Is there something else out there, maybe examples of parsing markdown files into R structures? An example of deviating Rd files away from their intended purpose?
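On the markdown side, even without a dedicated parser the mapping to an R structure can be fairly direct. A rough sketch, assuming one entry per level-2 heading (my convention here, not an established format):

# Toy markdown FAQ: each "## " heading starts a new entry
md <- c("## How do I cite R?",
        "Use citation().",
        "",
        "## How do I quit R?",
        "Call q(), or quit().")

starts  <- grep("^## ", md)                       # heading positions
groups  <- findInterval(seq_along(md), starts)    # entry index for each line
entries <- lapply(split(md, groups), function(block) {
  list(title = sub("^## ", "", block[1]),
       entry = paste(block[-1], collapse = "\n"))
})

entries[[1]]$title   # "How do I cite R?"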
To summarise
I would like to come up with:
1- a good design for an R structure (class, perhaps) that would extend the fortunes package to more general entries such as FAQ items
2- a more convenient format to enter new FAQs (rather than the current texinfo format)
3- a parser, either written in R or some other language (bison?) to convert the existing FAQ into the new structure (1), and/or the new input format (2) into the R structure.
Update 2: in the last two days of the bounty period I got two answers, both interesting but completely different. Because the question is quite broad (arguably ill-posed), neither answer provides a complete solution, so I will not (for now, anyway) accept one. As for the bounty, I'll award it to the most up-voted answer before the bounty expires, wishing there were a way to split it more equally.
(This addresses point 3.)
You can convert the texinfo file to XML
wget http://cran.r-project.org/doc/FAQ/R-FAQ.texi
makeinfo --xml R-FAQ.texi
and then read it with the XML package.
library(XML)
doc <- xmlParse("R-FAQ.xml")
# Extract each node's title and full contents
r <- xpathSApply(doc, "//node", function(u) {
  list(list(
    title    = xpathSApply(u, "nodename", xmlValue),
    contents = as(u, "character")
  ))
})
free(doc)
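From there, a quick example of what one might do with the result (assuming the simplification above leaves one title/contents pair per node, which is worth checking with str(r)):

# Collect the node titles and look up entries mentioning "memory"
titles <- sapply(r, function(u) u$title[1])
grep("memory", titles, ignore.case = TRUE, value = TRUE)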
But it is much easier to convert it to text
makeinfo --plaintext R-FAQ.texi > R-FAQ.txt
and parse the result manually.
doc <- readLines("R-FAQ.txt")
# Split the document into questions
# i.e., around lines like ****** or ======.
i <- grep("[*=]{5}", doc) - 1
i <- c(1,i)
j <- rep(seq_along(i)[-length(i)], diff(i))
stopifnot(length(j) == length(doc))
faq <- split(doc, j)
# Clean the result: since the questions
# are in the subsections, we can discard the sections.
faq <- faq[ sapply(faq, function(u) length(grep("[*]", u[2])) == 0) ]
# Use the result
cat(faq[[ sample(seq_along(faq),1) ]], sep="\n")
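Once split like this, even a crude keyword search (point ii of the question) is a one-liner; a sketch:

# Keep only the entries that mention the search term anywhere in their text
hits <- faq[sapply(faq, function(u) any(grepl("memory", u, ignore.case = TRUE)))]
length(hits)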
I'm a little unclear on your goals. You seem to want all the R-related documentation converted into some format which R can manipulate, presumably so that one can write R routines to better extract information from the documentation.
There seem to be three assumptions here.
1) That it will be easy to convert these different document formats (texinfo, Rd files, etc.) to some standard form with (I emphasize) some implicit uniform structure and semantics.
Because if you cannot map them all to a single structure, you'll have to write separate R tools for each type and perhaps for each individual document, and then the post-conversion tool work will overwhelm the benefit.
2) That R is the right language in which to write such document processing tools; I suspect you're a little biased towards R because you work in R and don't want to contemplate "leaving" the development environment to better get at information about working with R. I'm not an R expert, but I think R is mainly a numerical language, and does not offer any special help for string handling, pattern recognition, natural language parsing or inference, all of which I'd expect to play an important part in extracting information from the converted documents, which largely contain natural language. I'm not suggesting a specific alternative language (Prolog??), but if you succeed with the conversion to normal form (task 1), you might be better off carefully choosing the target language for processing.
3) That you can actually extract useful information from those structures. Library science was what the 20th century tried to push; now we're all into "Information Retrieval" and "Data Fusion" methods. But in fact reasoning about informal documents has defeated most of the attempts to do it. There are no obvious systems that organize raw text and extract deep value from it (IBM's Jeopardy-winning Watson system being the apparent exception, but even there it isn't clear what Watson "knows"; would you want Watson to answer the question "Should the surgeon open you with a knife?" no matter how much raw text you gave it?). The point is that you might succeed in converting the data, but it isn't clear what you can successfully do with it.
All that said, most markup systems on text have markup structure and raw text. One can "parse" those into tree-like structures (or graph-like structures if you assume certain things are reliable cross-references; texinfo certainly has these). XML is widely pushed as a carrier for such parsed structures, and being able to represent arbitrary trees or graphs it is ... OK ... for capturing such trees or graphs. [People then push RDF or OWL or some other knowledge-encoding system that uses XML, but this isn't changing the problem; you pick a canonical target independent of R.]
So what you really want is something that will read the various marked-up structures (texinfo, Rd files) and spit out XML or equivalent trees/graphs. Here I think you are doomed to building separate O(N) parsers to cover all the N markup styles; how otherwise would a tool know what the valid markup (and therefore parse) was? (You can imagine a system that could read marked-up documents when given a description of the markup, but even this is O(N): somebody still has to describe the markup.) Once this parsing into the uniform notation is done, you can then use an easily built R parser to read the XML (assuming one doesn't already exist), or, if R isn't the right answer, parse this with whatever the right answer is.
There are tools that help you build parsers and parse trees for arbitrary languages (and even translators from the parse trees to other forms). ANTLR is one; it is used by enough people that you might even accidentally find a texinfo parser somebody already built. Our DMS Software Reengineering Toolkit is another; after parsing, DMS will export an XML document with the parse tree directly (but it won't necessarily be in that uniform representation you ideally want). These tools will likely make it relatively easy to read the markup and represent it in XML.
But I think your real problem will be deciding what you want to extract/do, and then finding a way to do that. Unless you have a clear idea of how to do the latter, doing all the up front parsers just seems like a lot of work with unclear payoff. Maybe you have a simpler goal ("manage and extend" but those words can hide a lot) that's more doable.