Web scraping Obsidian Published vaults - web-scraping

I'm trying to download Obsidian public vaults like this: https://publish.obsidian.md/bryan-jenks/Z/INDEX
I would like to get in each folder all its .md (markdown) notes.
I have tried HTTrack and wget without success: only some of the files are downloaded.
How should I do it?

Related

Download specific file from tar.gz

I want to extract a single file from a tar.gz archive hosted on a web server without first downloading the entire archive, because it is large (about 3 GB). I am using a Unix-like environment. How can I achieve this with either a shell command or a Python module?
Related tools:
5 Ways to Preview ZIP and Download Selected Files in Archive, On Windows
Related questions:
Download specific folder from tar.gz file using wget command
How to download only a single file from an online ZIP archive via Powershell?
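One possible approach, sketched in Python under the assumption that the server delivers the archive over plain HTTP(S): read the archive as a stream and stop at the wanted member. A gzip stream cannot be seeked, so the bytes that precede the member still have to be transferred, but nothing after it does and nothing is written to disk. The function name extract_member, the URL, and the member path below are hypothetical.

```python
import tarfile


def extract_member(fileobj, member_name):
    """Return the bytes of member_name from a gzip-compressed tar stream.

    The archive is read sequentially (mode "r|gz"), so reading stops at
    the first matching member instead of consuming the whole archive.
    """
    with tarfile.open(fileobj=fileobj, mode="r|gz") as archive:
        for member in archive:
            if member.name == member_name and member.isfile():
                return archive.extractfile(member).read()
    raise FileNotFoundError(member_name)


# Usage (hypothetical URL and member path):
# import urllib.request
# with urllib.request.urlopen("https://example.com/big.tar.gz") as response:
#     data = extract_member(response, "big/path/to/file.txt")
```

The saving depends on where the member sits in the archive: a file near the start costs very little, a file near the end costs almost the full download.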

Download only *.Rmd files from a github repository using R or Rmd

I would like to download all of the *.Rmd files in a github repository.
For a simple example, say I wanted to use R or an Rmd file to download all of the *.Rmd files in this repo:
https://github.com/maelle/rmd-blogging-course
I tried using a bash chunk in my Rmd file and wget, but wasn't able to get the Rmd files:
#\```{bash}
wget -r -k --accept *.Rmd https://github.com/maelle/rmd-blogging-course
#\```
I've seen this previous question on how to download an entire repo, but I'm after only the files of a certain extension.
How to download entire repository from Github using R?
You should use Git to clone the repository, or if you only need one revision, you can download a tarball or a zip file, the latter of which you can access from the button that says “Code”. As for downloading only the *.Rmd files, GitHub doesn't provide a way to recursively download a large number of files without cloning or downloading a tarball or zip file.
While there are raw file endpoints, they won't work with wget --recursive because there are no directory listings to follow. Trying to do so anyway would likely get you rate-limited and possibly flagged, since those endpoints aren't intended for bulk download. A tarball or zip file will also likely be much faster.
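The zip route can be sketched in Python as follows: download the archive once, then keep only the entries with the wanted extension. This is a minimal sketch, not a GitHub-specific tool. The function name extract_by_suffix is hypothetical, and the branch name in the commented usage is an assumption that should be checked against the repository.

```python
import io
import zipfile


def extract_by_suffix(zip_bytes, suffix, dest_dir):
    """Extract only the files whose names end with suffix into dest_dir.

    Returns the archive paths of the files that were written.
    """
    written = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if not info.is_dir() and info.filename.endswith(suffix):
                # extract() recreates the entry's folder path under dest_dir
                archive.extract(info, dest_dir)
                written.append(info.filename)
    return written


# Usage (the branch name "main" is an assumption):
# import urllib.request
# url = "https://github.com/maelle/rmd-blogging-course/archive/refs/heads/main.zip"
# with urllib.request.urlopen(url) as response:
#     print(extract_by_suffix(response.read(), ".Rmd", "rmd_files"))
```

The extracted files keep the top-level folder that GitHub places inside its archives, so they land under dest_dir in a folder named after the repository and branch.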

R blogdown::serve_site() doesn't generate "public" folder when using Hugo "gesquive/slate" theme

I am trying to create a new blogdown project, and I've been using Yihui Xie's wonderful documentation to get started.
This works as expected:
When I run the code below, I see a live representation of the demo page, and the public folder is generated in my local directory.
library(blogdown)
blogdown::new_site(theme = "gcushen/hugo-academic")
blogdown::build_site()
blogdown::serve_site()
This does not work as expected:
But when I re-run using the "gesquive/slate" theme, I see a live representation of the demo page as expected, but no public folder is generated in my local directory. Without a public folder, I have nothing to send to Netlify.
library(blogdown)
blogdown::new_site(theme = "gesquive/slate")
blogdown::build_site()
blogdown::serve_site()
Why is a public folder generated when I use any Hugo theme other than "gesquive/slate"?
I expect that I'm misunderstanding something about how the package works with Hugo.
Look at the theme's config.toml: the publish directory (publishDir) is set to "docs". You can change it to "public" if you wish.

How to Unzip a Folder at Jelastic?

I'm new here. I would like to know how I can unzip an uploaded folder on Apache / Jelastic / WordPress, such as a folder with all my plugins or images.
You can deploy a project from an archive directly from the Jelastic dashboard - see Upload and Deploy your PHP Application in the official docs - but in this case the archive should contain your entire site (e.g. WordPress and all desired plugins, themes etc.).
Alternatively you can upload your archive (via the dashboard, FTP/S, or SFTP) and then connect to your node via SSH.
Then you can use a command like this to extract the archive in the desired location:
tar -xzvf archive.tar.gz
EDIT: Since you mentioned zip archive, the command to use would be:
unzip archive.zip

GitLab rendering the wrong README in Project & Public views

I'm using R Markdown (README.Rmd) to knit/render to README.md at the top level of a project directory. GitLab (in both Project and Public views) chooses to render the .Rmd file instead of the .md file and produces a visual mess rather than a nicely formatted project description.
Is there any way to tell GitLab to ignore .Rmd files when picking the "right" one to use for the project/public view or am I left with a workflow that will mean keeping the README.Rmd in a separate directory then having the R project build process render and copy a knitted README.md to the top-level project directory?
This is a fresh install (this week) of a self-hosted instance of GitLab, but you can see it rendering the wrong README here.
I've reproduced this issue. I'll look into this tomorrow and make a PR for fixing this.
