Permalink URLs in CSL Bibliography

I'm writing a research paper in deep learning, so some of my citations are inevitably to things like Medium posts. To avoid link rot, I created perma.cc links to these posts; using my reference manager (Mendeley), I added both the original URL and the permalink URL.
In the exported BibTeX file, I see that both URLs are included in one URL field, separated by a space. However, the CSL processor I'm using only includes the first URL in the bibliography.
A previous question asked how to change a CSL style to include two URL fields, and the answer was "CSL can't do that", but there wasn't any discussion of what a use case would be for that. So, if I can't do that with CSL, what SHOULD I do? Is it okay (academically) to cite a perma.cc link but not the original URL? Is there another field I can abuse to store a permalink?
I don't want this citation to depend on Medium staying in service indefinitely, especially since the page doesn't load on the Wayback Machine (which apparently gets caught in reload loops with Medium articles).

On the CSL end, you can basically use any variable you want for an archived link -- most logically I'd suggest archive. You might have to adjust the citation style to print that -- that'd depend on the style.
Unfortunately, Mendeley doesn't have a field for archive, so you'd either have to use something less suitable (maybe Series, mapping to CSL collection? -- I don't see any really good options) or, if you're using the Desktop version of Mendeley, add the archive to the Notes in the form archive: perma.cc/9265-T4NB. That gets picked up by citation styles.
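If retyping Notes for every reference sounds tedious, you could also post-process the exported .bib file instead. Here's a minimal sketch in Python, assuming the export is named references.bib, that each url field holds the two space-separated links on one line, and that entries don't already have a note field:

    # Rough fix-up for a Mendeley BibTeX export: keep the first link in
    # the url field and move any perma.cc link into a note field, using
    # the "archive: ..." form described above. Assumes url fields use
    # braces with no nested braces.
    import re

    with open("references.bib", encoding="utf-8") as f:
        bib = f.read()

    def split_urls(match):
        urls = match.group(1).split()
        field = "url = {%s}" % urls[0]
        archived = next((u for u in urls[1:] if "perma.cc" in u), None)
        if archived:
            field += ",\nnote = {archive: %s}" % archived
        return field

    bib = re.sub(r"\burl\s*=\s*\{([^}]*)\}", split_urls, bib)

    with open("references_fixed.bib", "w", encoding="utf-8") as f:
        f.write(bib)

It keeps the original link in url and moves any perma.cc link into note in the archive: form above; tweak the regex if your export is formatted differently.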

Related

Recipes PDF batch extraction

I am now working with 500 PDF recipe files, which I want to display on my website. How can I batch extract them and display the information from the PDFs on my website? The PDFs contain all the information for the recipes. For each recipe, I need to display its description, image, ingredients, instructions, nutrition label, and so on. Is there any way to do this so that I don't need to work on it manually?
Do these all have the same basic template for how the information is structured? This isn't really a WordPress-specific issue. One thing you can do is use Go to loop through and process all the files. I've played with Go and it's incredibly fast at parsing large amounts of information. You could try this library: https://github.com/unidoc/unidoc.
There are also a lot of library options to try in PHP. Here's just one example: https://www.pdfparser.org/. It can be installed via Composer, and there's documentation at https://www.pdfparser.org/documentation.
If every recipe follows the same sort of template and you want to extract specific details from specific sections of the PDF, it should be easy enough. If you don't mind extracting all the text from each PDF and simply displaying that on your website, any of these libraries will do. If you go the Go route, you could parse all the text for each PDF, save it to files, and then have PHP code insert everything into custom post types or something similar.
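If you'd rather avoid Go and PHP entirely, a Python alternative (not one of the libraries named above) is pypdf. A minimal sketch, assuming the PDFs sit in a recipes/ folder, that dumps the raw text of each file for later processing:

    # Sketch: batch-extract the raw text from every PDF in a folder
    # using pypdf (pip install pypdf). Folder names are placeholders.
    from pathlib import Path

    from pypdf import PdfReader

    src = Path("recipes")        # the folder holding the 500 PDFs
    dst = Path("extracted")
    dst.mkdir(exist_ok=True)

    for pdf_path in src.glob("*.pdf"):
        reader = PdfReader(pdf_path)
        text = "\n".join(page.extract_text() or "" for page in reader.pages)
        (dst / (pdf_path.stem + ".txt")).write_text(text, encoding="utf-8")

From there, if every recipe really does follow the same template, you can pull the Ingredients/Instructions sections out of the extracted text with a few regular expressions.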

How can I automatically add author and copyright info to newly created ipython notebooks?

I'd like to prepend a simple author and copyright notice to the beginning of newly created IPython notebooks. Is this possible? If so, how can it be done?
We want to support this in metadata at the notebook's top level, but nobody has taken the time to write a proposal for the metadata structure, how to edit it, and how to show it.
This would be useful for viewing on nbviewer, but also for conversion to LaTeX and other formats. It might be slightly more complicated than it appears at first, as you probably want the authors to be more than just first name/last name (a full embedded vCard, for example).
If you want to work on that, you are welcome to; otherwise, in the meantime, I suggest adding a simple markdown cell at the top with that info.
This should be easy to do on a bunch of notebooks at once, as they are easily parsable JSON.
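As a sketch of that last suggestion: since .ipynb files are just JSON, a short Python script can prepend a markdown cell to every notebook in a directory. This assumes nbformat 4 files (cells at the top level) and placeholder author/copyright text:

    # Sketch: prepend an author/copyright markdown cell to every
    # notebook in the current directory. Assumes nbformat 4 files;
    # the header text is a placeholder.
    import json
    from pathlib import Path

    header = {
        "cell_type": "markdown",
        "metadata": {},
        "source": ["**Author:** Jane Doe  \n", "**Copyright:** 2014, MIT License"],
    }

    for path in Path(".").glob("*.ipynb"):
        nb = json.loads(path.read_text(encoding="utf-8"))
        nb["cells"].insert(0, dict(header))   # nbformat 4: cells at top level
        path.write_text(json.dumps(nb, indent=1), encoding="utf-8")

Older nbformat 3 notebooks keep their cells inside a worksheets list, so the script would need a small adjustment for those.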

Bulk upload of Microsoft Word files to WordPress pages

I have been asked to upload 200 Microsoft Word documents — many of them containing lengthy, complex math problems or scientific notation — into a WordPress setting. Each Word file would become a separate WordPress post.
I would clearly prefer not to cut and paste each file one by one into a post and then save it. Does anyone know of a way to automate the process while ensuring the accuracy of the conversion, or at least minimizing the number of issues we might find when converting from Word to WordPress? Or am I dreaming the impossible dream?
Thanks for any input you can offer.
Sounds like an interesting problem. I have an idea that might be worth exploring. There are a number of free or shareware tools that can convert Word docs to HTML.
If you can manage to convert them into decently clean markup with one of those tools, I would recommend using the HTML Import 2 WordPress plugin. It can take a batch of HTML files and create Posts / Pages out of them.
It's a two-step process, but I bet it'll work. (And it will certainly be faster than copying and pasting 200 times.)
Hope that helps, have fun!
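To make the first step concrete: one converter option (my suggestion, not something the plugin requires) is the Python library mammoth, which produces fairly clean HTML from .docx files. A sketch with placeholder folder names; note that complex math and scientific notation may not survive the conversion, so watch the warning messages:

    # Sketch: convert a folder of .docx files to HTML with mammoth
    # (pip install mammoth), ready for a batch importer such as the
    # HTML Import 2 plugin. Folder names are placeholders.
    from pathlib import Path

    import mammoth

    src = Path("word_docs")
    dst = Path("html_out")
    dst.mkdir(exist_ok=True)

    for doc in src.glob("*.docx"):
        with doc.open("rb") as f:
            result = mammoth.convert_to_html(f)
        for message in result.messages:   # warnings about unsupported content
            print(doc.name, message)
        (dst / (doc.stem + ".html")).write_text(result.value, encoding="utf-8")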
Well, I found a solution that works for me. It's a bit manual, but it still saves a lot of time.
Here are the steps:
1. Connect your blog to MS Word 2007/2013.
2. Make sure remote publishing is enabled in WP.
3. Copy all the posts into one Word document, or use merge to make one single DOC.
4. Set the default posting category in WP and save it.
5. From Word, copy each post and publish them one by one.
Tips:
- Make a shortcut key for publishing to WP.
- Use Ctrl+C to copy the text before publishing.

Best way to design URIs when they are based on user generated content

In our system, we have URLs for pages where the content, including the title, is based on user generated content. I'm trying to figure out the best design that balances SEO, human readability and resiliency.
I've been reading a bunch of material on this, including Tim Berners-Lee's document from way back: Cool URIs don't change.
As an example, imagine I have a book review site where users are submitting content (a worded review) and the book's title and author.
So say they submitted a book review for A Tale of Two Cities (which the user unintentionally misspells) with Charles Dickens as the author. The URL could be:
http://foo.com/charles-dickens/a-tale-of-two-cities
Later on, if another book by Dickens is added, it could be:
http://foo.com/charles-dickens/oliver-twist
Then http://foo.com/charles-dickens/ could be a list of all the reviewed books on the site.
However, a problem comes into play if a change is made to the book title. Imagine the user misspelled something, like A Tale of Two City, which is later corrected. This would also change the URL, breaking any external links to that page, PageRank, etc.
What is the recommended way to handle this type of problem? Options I see:
First commit wins: No changes to URL are possible after it's initially established
Last commit wins: Always change the URL. So if there's a change to the user-generated content, revise the URL. With this approach, either the old URL is dead, or a trail of all the URL changes is preserved and all of them still function. Stack Overflow seems to do this.
Don't base the URL on UGC: Ignore the user-generated content and just come up with URLs not based on it. So the URL could be http://foo.com/reviews/1234.
What are people's thoughts on this?
You're slightly wrong; Stack Overflow combines #2 and #3. A question has a specific id, and that's all you need to locate the question. For example, this question's id is 11011252. You can access the question with https://stackoverflow.com/questions/11011252, no need to add the portion of the URL (or would you call it a URI in this case?) generated from the question title. In fact, that will get automatically tacked on (whether by redirect or some other method) when you use the titleless address.
Even better, you can append whatever you want (within reason, I suppose) to the end of the address. https://stackoverflow.com/questions/11011252/this-text-will-be-ignored will take you to the question without any problem.
Stack Overflow isn't the only website that does this, either; many other sites I've seen focused on user-generated content follow the same protocol/whatever you call it. It seems like the best method to go with, as it combines the advantages of #3 (underlying URI remains the same) with the advantages of #2 (the URL contains some information about its target, which users will like), and best of all means you won't get any URI conflicts if two people generate content with the same non-unique identifiers.
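Here's a minimal sketch of that pattern in Python with Flask (a hypothetical app with made-up data, just to show the routing logic): the numeric id alone locates the record, the slug is decorative, and a stale or missing slug gets a 301 redirect to the canonical URL, so links keep working after a title is corrected.

    # Sketch: id-based routing with a decorative slug.
    # The numeric id alone locates the review; a wrong or missing slug
    # 301-redirects to the canonical URL, so title edits never break links.
    from flask import Flask, abort, redirect, url_for

    app = Flask(__name__)

    # Stand-in for a database of reviews, keyed by id.
    REVIEWS = {1234: {"slug": "a-tale-of-two-cities", "body": "Review text..."}}

    @app.route("/reviews/<int:review_id>/")
    @app.route("/reviews/<int:review_id>/<slug>")
    def review(review_id, slug=None):
        entry = REVIEWS.get(review_id)
        if entry is None:
            abort(404)
        if slug != entry["slug"]:
            return redirect(
                url_for("review", review_id=review_id, slug=entry["slug"]),
                code=301,
            )
        return entry["body"]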

Storing content in multiple languages? E.g. English, French, German

How should I store (and present) the text on a website intended for worldwide use, with several languages? The content is mostly in the form of 500+ word articles, although I will need to translate tiny snippets of text on each page too (such as "print this article" or "back to menu").
I know there are several CMS packages that handle multiple languages, but I have to integrate with our existing ASP systems too, so I am ignoring such solutions.
One concern I have is that Google should be able to find the pages, even for foreign users. I am less concerned about issues with processing dates and currencies.
I worry that, left to my own devices, I will invent a way of doing this which works at first but eventually leads to disaster! I want to know what professional solutions you have actually used on real projects, not untried ideas! Thanks very much.
I looked at RESX files, but felt they were unsuitable for all but the most trivial translation solutions (I will elaborate if anyone wants to know).
Google will help me with translating the text, but not storing/presenting it.
Has anyone worked on a multi-language project that relied on their own code for presentation?
Any thoughts on serving up content in the following ways, and which is best?
http://www.website.com/text/view.asp?id=12345&lang=fr
http://www.website.com/text/12345/bonjour_mes_amis.htm
http://fr.website.com/text/12345
(These are not real URLs; I was just showing examples.)
Firstly, put the content for all languages under one domain -- it will help your Google rank.
We have a fully multi-lingual system, with localisations stored in a database but cached with the web application.
Wherever we want a localisation to appear we use:
<%$ Resources: LanguageProvider, Path/To/Localisation %>
Then in our web.config:
<globalization resourceProviderFactoryType="FactoryClassName, AssemblyName"/>
FactoryClassName then implements ResourceProviderFactory to provide the actual dynamic functionality. Localisations are stored in the DB with a string key "Path/To/Localisation"
It is important to cache the localised values - you don't want to have lots of DB lookups on each page, and we cache thousands of localised strings with no performance issues.
Use the user's current browser localisation to choose what language to serve up.
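For what it's worth, the same pattern translated outside .NET looks roughly like this Python sketch (table and column names are made up): look localisations up by string key, and cache them in memory so each key only hits the DB once.

    # Sketch: DB-backed localisations with an in-memory cache, the same
    # pattern as above but in Python. Table and column names are made up.
    import sqlite3
    from functools import lru_cache

    conn = sqlite3.connect("localisations.db")

    @lru_cache(maxsize=None)           # each key/language pair hits the DB once
    def localise(key: str, lang: str) -> str:
        row = conn.execute(
            "SELECT text FROM localisations WHERE key = ? AND lang = ?",
            (key, lang),
        ).fetchone()
        return row[0] if row else key  # fall back to the key itself

    # e.g. localise("Path/To/Localisation", "fr")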
You might want to check out the GNU gettext project -- at least as something to start with.
Edited to add info about projects:
I've worked on several multilingual projects using gettext in different technologies, including C++/MFC and J2EE/JSP, and it all worked fine. However, you need to write or find your own code to display the localized data, of course.
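For a feel of what that code looks like, here's a minimal gettext example using Python's standard library (the domain name site and the locale/ directory are placeholders; you'd compile your .po catalogs into .mo files there):

    # Sketch: basic GNU gettext usage from Python's standard library.
    # Assumes compiled catalogs at locale/<lang>/LC_MESSAGES/site.mo.
    import gettext

    t = gettext.translation("site", localedir="locale",
                            languages=["fr"], fallback=True)
    _ = t.gettext

    print(_("print this article"))  # the French string, if translated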
If you are using .NET, I would recommend going with one or more resource files (.resx). There is plenty of documentation on this on MSDN.
As with most general programming questions, it depends on your needs.
For static text, I would use RESX files. For me, as a .NET programmer, they are easy to use, and the .NET Framework has good support for them.
For any dynamic text, I tend to store such information in the database, especially if the site maintainer is going to be a non-developer. In the past I've used two approaches: adding a language column and creating different entries for the different languages, or creating a separate table to store the language-specific text.
The table for the first approach might look something like this:
Article Id | Language Id | Language Specific Article Text | Created By | Created Date
This works for situations where you can create different entries for a given article and you don't need to keep any data associated with these different entries in sync (such as an Updated timestamp).
The other approach is to have two separate tables, one for non-language specific text (id, created date, created user, updated date, etc) and another table containing the language specific text. So the tables might look something like this:
First Table: Article Id | Created By | Created Date | Updated By | Updated Date
Second Table: Article Id | Language Id | Language Specific Article Text
For me, the question comes down to updating the non-language dependent data. If you are updating that data then I would lean towards the second approach, otherwise I would go with the first approach as I view that as simpler (can't forget the KISS principle).
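Here's the second approach sketched as actual DDL, run through Python's sqlite3 for the sake of a runnable example (table and column names are illustrative):

    # Sketch: the two-table approach as SQLite DDL, run via Python.
    # Non-language data lives in article; translations in article_text.
    import sqlite3

    conn = sqlite3.connect("content.db")
    conn.executescript("""
    CREATE TABLE IF NOT EXISTS article (
        article_id   INTEGER PRIMARY KEY,
        created_by   TEXT,
        created_date TEXT,
        updated_by   TEXT,
        updated_date TEXT
    );
    CREATE TABLE IF NOT EXISTS article_text (
        article_id INTEGER REFERENCES article(article_id),
        lang_id    TEXT,              -- e.g. 'en', 'fr', 'de'
        body       TEXT,
        PRIMARY KEY (article_id, lang_id)
    );
    """)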
If you're just worried about the article content being translated, and do not need a fully integrated option, I have used Google Translate in the past and it works great on a smaller scale.
Wonderful question.
I solved this problem for the website I made (link in my profile) with a homemade Python 3 script that translates the general template on the fly and inserts a specific content page in the language requested (or guessed by Apache from Accept-Language).
It was fun, since I got to learn Python and write my own mini-library for creating content pages. One downside was that our hosting didn't have Python 3, so I made my script generate static HTML (the original one examined the User-Agent) and then uploaded it to the server. That has worked so far, and making a new language version of the site is now a breeze :)
The biggest downside of this method is that it is time-consuming to write things from scratch. So if you want, drop me a line and I'll help you use my script :)
As for the URL format, I use site.com/content/example.fr, since this allows Apache to perform language negotiation when somebody asks for /content/example and their browser says it prefers French. When you do this, Apache also adds .html or whatever as a bonus.
So when a request comes in for example and I have the files
example.fr
example.en
example.vi
Apache will automatically serve example.vi to a person with a Vietnamese-configured browser, or example.en to a person with a German-configured browser (there is no German variant, so the English default wins). Pretty useful.
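For illustration, here's roughly what that negotiation does, as a Python sketch (simplified: it honours header order and ignores q-value sorting, which Apache does handle properly):

    # Sketch: a rough imitation of Apache's language negotiation.
    # Simplified: honours header order and ignores q-value sorting.
    AVAILABLE = ["fr", "en", "vi"]           # example.fr, example.en, example.vi

    def pick_variant(accept_language: str, base: str = "example") -> str:
        # "vi,en;q=0.8" -> ["vi", "en"], dropping region tags like "en-GB"
        prefs = [part.split(";")[0].strip().split("-")[0]
                 for part in accept_language.split(",")]
        for lang in prefs:
            if lang in AVAILABLE:
                return f"{base}.{lang}"
        return f"{base}.en"                  # default when nothing matches

    print(pick_variant("vi,en;q=0.8"))       # -> example.vi
    print(pick_variant("de"))                # -> example.en (the fallback)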
