I want to crawl/scrape a WordPress website with wget.
Problem: wget downloads linked documents even when the link carries a rel=nofollow attribute. And yes, I do honor robots.txt.
Example:
wget --mirror --page-requisites --adjust-extension --convert-links --restrict-file-names=windows --no-parent --span-hosts --domains=randomascii.wordpress.com,wp.com https://randomascii.wordpress.com/about/
Now open the about folder; after a few seconds you will see dozens of HTML files that stem from nofollow links: index.html@share=reddit.html, index.html@share=twitter.html, index.html@replytocom=74214.html ...
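One possible workaround, sketched here under assumptions rather than as a confirmed fix: instead of relying on rel=nofollow, filter the unwanted URLs by pattern with wget's --reject-regex option. The sketch assumes the unwanted links are the ones carrying a share= or replytocom= query parameter; REJECT_RE, would_fetch and mirror_about are hypothetical names introduced only for illustration.

```shell
#!/bin/sh
# Assumption: unwanted URLs carry a share= or replytocom= query parameter.
REJECT_RE='[?&](share|replytocom)='

# Hypothetical helper: read URLs on stdin and print only those that do NOT
# match the reject pattern, to preview what the filter would let through.
would_fetch() {
    grep -Ev "$REJECT_RE"
}

# Hypothetical wrapper around the original command with the extra filter.
# It is only defined here, not called, so sourcing this file downloads nothing.
mirror_about() {
    wget --mirror --page-requisites --adjust-extension --convert-links \
         --restrict-file-names=windows --no-parent --span-hosts \
         --domains=randomascii.wordpress.com,wp.com \
         --reject-regex "$REJECT_RE" \
         https://randomascii.wordpress.com/about/
}
```

Piping a handful of sample URLs through would_fetch is a cheap way to check the pattern before starting a long mirror run with mirror_about.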
GNU Wget 1.20.3 built on msys.
-cares +digest +gpgme +https +ipv6 +iri +large-file +metalink +nls
+ntlm +opie +psl +ssl/openssl
Wgetrc:
/etc/wgetrc (system)
Locale:
/usr/share/locale
Compile:
gcc -DHAVE_CONFIG_H -DSYSTEM_WGETRC="/etc/wgetrc"
-DLOCALEDIR="/usr/share/locale" -I. -I../lib -I../lib -DHAVE_LIBSSL
-DNDEBUG -march=x86-64 -mtune=generic -O2 -pipe
Link:
gcc -DHAVE_LIBSSL -DNDEBUG -march=x86-64 -mtune=generic -O2 -pipe
-pipe -lmetalink -lexpat -lpcre2-8 -luuid -lssl -lcrypto -lz -lz
-lpsl -lidn2 -liconv -lunistring -lgpgme -lassuan -lgpg-error
ftp-opie.o openssl.o http-ntlm.o ../lib/libgnu.a -liconv -lintl
/usr/lib/libunistring.dll.a
I am writing an example of an R-package that uses C++ to call a C-function from an external library (I used the chron.dll from the R-package chron).
The version of R is R-4.2.2. On my machine (Windows 10) rtools42 is installed in C:/Program Files aka C:/PROGRA~1.
Since this is not the standard location I am running
Sys.setenv(PATH = paste("C:/PROGRA~1/rtools42/bin/", Sys.getenv("PATH"), sep=";"))
Sys.setenv(BINPREF = "C:/PROGRA~1/rtools42/x86_64-w64-mingw32.static.posix/bin/")
prior to the build. Then I check the build with devtools::check(). Compilation and linking seem to work, but the installation fails. The 00install.out file is as follows:
* installing *source* package 'PackageExample' ...
** using staged installation
** libs
C:/PROGRA~1/rtools42/x86_64-w64-mingw32.static.posix/bin/g++ -std=gnu++11 -I"C:/PROGRA~1/R/R-42~1.2/include" -DNDEBUG -I../chron/include -I. -I'C:/Users/JohnDoe/PROJECTS/R/libraries/Rcpp/include' -I"C:/Programme/rtools42/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -mfpmath=sse -msse2 -mstackrealign -c RcppExports.cpp -o RcppExports.o
C:/PROGRA~1/rtools42/x86_64-w64-mingw32.static.posix/bin/g++ -std=gnu++11 -I"C:/PROGRA~1/R/R-42~1.2/include" -DNDEBUG -I../chron/include -I. -I'C:/Users/JohnDoe/PROJECTS/R/libraries/Rcpp/include' -I"C:/Programme/rtools42/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -mfpmath=sse -msse2 -mstackrealign -c callchron.cpp -o callchron.o
C:/PROGRA~1/rtools42/x86_64-w64-mingw32.static.posix/bin/g++ -std=gnu++11 -I"C:/PROGRA~1/R/R-42~1.2/include" -DNDEBUG -I../chron/include -I. -I'C:/Users/JohnDoe/PROJECTS/R/libraries/Rcpp/include' -I"C:/Programme/rtools42/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -mfpmath=sse -msse2 -mstackrealign -c matrix.cpp -o matrix.o
C:/PROGRA~1/rtools42/x86_64-w64-mingw32.static.posix/bin/g++ -std=gnu++11 -I"C:/PROGRA~1/R/R-42~1.2/include" -DNDEBUG -I../chron/include -I. -I'C:/Users/JohnDoe/PROJECTS/R/libraries/Rcpp/include' -I"C:/Programme/rtools42/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -mfpmath=sse -msse2 -mstackrealign -c rcpp_hello_world.cpp -o rcpp_hello_world.o
C:/PROGRA~1/rtools42/x86_64-w64-mingw32.static.posix/bin/g++ -std=gnu++11 -I"C:/PROGRA~1/R/R-42~1.2/include" -DNDEBUG -I../chron/include -I. -I'C:/Users/JohnDoe/PROJECTS/R/libraries/Rcpp/include' -I"C:/Programme/rtools42/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -mfpmath=sse -msse2 -mstackrealign -c vector.cpp -o vector.o
C:/PROGRA~1/rtools42/x86_64-w64-mingw32.static.posix/bin/g++ -shared -s -static-libgcc -o PackageExample.dll tmp.def RcppExports.o callchron.o matrix.o rcpp_hello_world.o vector.o -L../chron/libs/x64 -lchron -LC:/Programme/rtools42/x86_64-w64-mingw32.static.posix/lib/x64 -LC:/Programme/rtools42/x86_64-w64-mingw32.static.posix/lib -LC:/PROGRA~1/R/R-42~1.2/bin/x64 -lR
installing to C:/Users/JohnDoe/AppData/Local/Temp/Rtmpg9Wn7W/file43d46e066097/PackageExample.Rcheck/00LOCK-PACKAG~1/00new/PackageExample/libs/x64
** R
** byte-compile and prepare package for lazy loading
Reading Tests.R.
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
Error: loading of package or namespace 'PackageExample' in inDL(x, as.logical(local), as.logical(now), ...): failed
cannot load shared object 'C:/Users/JohnDoe/AppData/Local/Temp/Rtmpg9Wn7W/file43d46e066097/PackageExample.Rcheck/00LOCK-PACKAG~1/00new/PackageExample/libs/x64/PackageExample.dll':
LoadLibrary failure: the referenced module was not found.
Error: loading failed
Execution terminated
ERROR: loading failed
* removing 'C:/Users/JohnDoe/AppData/Local/Temp/Rtmpg9Wn7W/file43d46e066097/PackageExample.Rcheck/PackageExample'
(Some error messages translated from German.)
Interestingly the folder
C:/Users/JohnDoe/AppData/Local/Temp/Rtmpg9Wn7W/
contains files foo.dll, foo.o.
It looks like PackageExample.dll is not put into
'C:/Users/JohnDoe/AppData/Local/Temp/Rtmpg9Wn7W/file43d46e066097/PackageExample.Rcheck/00LOCK-PACKAG~1/00new/PackageExample/libs/x64/'
and instead foo.dll and foo.o are generated. What might be the problem?
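A "module not found" failure from LoadLibrary is often reported when a DLL that the target depends on cannot be located, not only when the target itself is missing. Since the link line above uses -lchron, one thing worth checking is which DLLs PackageExample.dll imports. The following is a minimal sketch, assuming objdump (from binutils, as shipped with Rtools) is on the PATH; list_dll_deps is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `objdump -p` output on stdin and print the names
# of the imported DLLs, one per line.
list_dll_deps() {
    sed -n 's/^[[:space:]]*DLL Name:[[:space:]]*//p'
}

# Example usage (not executed here):
#   objdump -p PackageExample.dll | list_dll_deps
```

If chron.dll shows up in that list, it would have to be findable at load time (for example via PATH) for the package DLL to load; this is a guess at the cause, not a confirmed diagnosis.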
Project file:
CONFIG -= qt
QMAKE_CXXFLAGS += -std=c++98
QMAKE_CXXFLAGS -= -std=c++11
QMAKE_CXXFLAGS -= -std=gnu++11
CONFIG -= c++11
Result:
g++ -c -pipe -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-but-set-variable -Wno-reorder -Wno-missing-field-initializers -std=c++98 -DDEBUG -g -std=gnu++11 -Wall -W -fPIC -DQT_QML_DEBUG -I../../qqq -I. -I../src/jsoncpp -I../lib -I../../../../Qt/5.9.1/gcc_64/mkspecs/linux-g++ -o ../build/debug/obj/TaskManager.o ../src/TaskManager.cpp
g++ -c -pipe -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-but-set-variable -Wno-reorder -Wno-missing-field-initializers -std=c++98 -DDEBUG -g -std=gnu++11 -Wall -W -fPIC -DQT_QML_DEBUG -I../../qqq -I. -I../src/jsoncpp -I../lib -I../../../../Qt/5.9.1/gcc_64/mkspecs/linux-g++ -o ../build/debug/obj/Utils.o ../src/Utils.cpp
As you can see, the option -std=gnu++11 is still present (there is no reaction to "QMAKE_CXXFLAGS -=" or "CONFIG -=").
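When GCC receives several -std= options, the last one on the command line is the one that takes effect, so the position of the flags matters as much as their presence. A small sketch to pull the effective standard out of a compile line (effective_std is a hypothetical helper name, not part of any tool):

```shell
#!/bin/sh
# Hypothetical helper: read a compiler command line on stdin and print the
# last -std= option it contains, which is the one GCC ends up using.
effective_std() {
    tr ' ' '\n' | grep -- '^-std=' | tail -n 1
}

# Example usage:
#   echo 'g++ -c -std=c++98 -g -std=gnu++11 -Wall foo.cpp' | effective_std
```

In the output above, -std=c++98 appears before -std=gnu++11, so the added flag is presumably overridden by the one qmake appends later.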
Qt:
Qt Creator 4.3.1
Based on Qt 5.9.1 (GCC 5.3.1 20160406 (Red Hat 5.3.1-6), 64 bit)
Built on Jun 29 2017 04:10:39
Thank you a lot.
P.S. I have compatibility troubles with gnu++11 (C++ with GCC extensions) on a clean system:
./qqq: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `CXXABI_1.3.8' not found (required by ./qqq)
./qqq: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by ./qqq)
./qqq: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.21' not found
and want to use c++11 (C++ without GCC extensions) instead of gnu++11, but I cannot disable it :-(
I'm not new to R but can't figure out what went wrong. I'm just trying to install RcppEigen package using install.packages('RcppEigen') and receive the above error.
The command below (issued by the installer) fails:
g++ -m64 -I/usr/include/R -DNDEBUG -I"/usr/lib64/R/library/Rcpp/include" -I/usr/local/include -std=c++11 -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -c RcppEigen.cpp -o RcppEigen.o
I have installed both R-Rcpp and R-Rcpp-devel version 0.12.12 from the EPEL repository, as well as eigen3-devel v3.2.5 (not sure whether it is required, but anyway ...)
I cloned RcppEigen from GitHub and tried to build it in RStudio, with the same error.
Makevars has PKG_CXXFLAGS = -I../inst/include but the compiler is invoked as below:
g++ -m64 -I/usr/include/R -DNDEBUG -I"/home/zer0hedge/R/x86_64-redhat-linux-gnu-library/3.4/Rcpp/include" -I/usr/local/include -std=c++11 -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -c RcppEigen.cpp -o RcppEigen.o
For some reason, I had PKG_CXXFLAGS defined in $HOME/.R/Makevars. It overrode the PKG_CXXFLAGS set in the Makevars file in the package's src directory, so the -I../inst/include flag was dropped and the C++ files failed to compile.
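A quick way to spot this kind of override is to look for PKG_* assignments in the per-user Makevars file. The following is a minimal sketch; user_pkg_overrides is a hypothetical helper name, and the default path is the $HOME/.R/Makevars location mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: list PKG_* assignments (with line numbers) found in a
# Makevars file. Defaults to the per-user file; prints nothing if it is absent.
user_pkg_overrides() {
    file=${1:-"$HOME/.R/Makevars"}
    [ -f "$file" ] || return 0
    grep -nE '^[[:space:]]*PKG_[A-Z_]+[[:space:]]*[:+?]?=' "$file"
}

# Example usage:
#   user_pkg_overrides                 # checks $HOME/.R/Makevars
#   user_pkg_overrides src/Makevars    # checks the package's own file
```

Any line it prints for the per-user file is a candidate for shadowing the package's own settings.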
I am trying to install rgdal (a dependency of gstat) in R on a Calculate Linux (effectively a generic Gentoo) clean installation. I have sci-libs/gdal installed. Here are the last few lines of output:
x86_64-pc-linux-gnu-g++ -I/usr/lib64/R/include -DNDEBUG -I/usr/include/gdal -I"/home/wjc721/R/x86_64-pc-linux-gnu-library/3.2/sp/include" -fpic -O2 -march=x86-64 -pipe -c OGR_write.cpp -o OGR_write.o
x86_64-pc-linux-gnu-g++ -I/usr/lib64/R/include -DNDEBUG -I/usr/include/gdal -I"/home/wjc721/R/x86_64-pc-linux-gnu-library/3.2/sp/include" -fpic -O2 -march=x86-64 -pipe -c gdal-bindings.cpp -o gdal-bindings.o
x86_64-pc-linux-gnu-gcc -std=gnu99 -I/usr/lib64/R/include -DNDEBUG -I/usr/include/gdal -I"/home/wjc721/R/x86_64-pc-linux-gnu-library/3.2/sp/include" -fpic -O2 -march=x86-64 -pipe -c init.c -o init.o
x86_64-pc-linux-gnu-gcc -std=gnu99 -I/usr/lib64/R/include -DNDEBUG -I/usr/include/gdal -I"/home/wjc721/R/x86_64-pc-linux-gnu-library/3.2/sp/include" -fpic -O2 -march=x86-64 -pipe -c inverser.c -o inverser.o
inverser.c:3:22: fatal error: projects.h: No such file or directory
#include <projects.h>
^
compilation terminated.
Existing answers on Stack Overflow are for Linux distributions other than Gentoo. They suggest installing packages (in Debian) such as libgdal1h, libgdal1-dev, libproj-dev and gdal-bin, none of which exist in Gentoo.
Any help would be very much appreciated! It was working fine on the previous version of Calculate :(
Thanks,
Bill
Edit: I upgraded R from v3.2.2 to 3.3.2 and gdal from 2.0.2-r3 to 2.0.3. This did not help; the error is the same.
I encountered the same problem. Upgrading to proj-4.9.2 did the trick.
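The missing projects.h is, as far as I know, a header that PROJ.4 releases installed alongside the library, which would fit with the proj upgrade fixing the build. Before reinstalling anything, it can help to check whether the header is present at all. A minimal sketch, where find_header is a hypothetical helper name and the default search directories are an assumption about a typical layout:

```shell
#!/bin/sh
# Hypothetical helper: search for a header file by name.
# Usage: find_header NAME [DIR...]
# With no directories given, it falls back to two common include locations.
find_header() {
    name=$1
    shift
    [ "$#" -gt 0 ] || set -- /usr/include /usr/local/include
    find "$@" -name "$name" -type f 2>/dev/null
}

# Example usage:
#   find_header projects.h
```

An empty result suggests the installed proj package does not ship the header the R package expects.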
I just stumbled over a linker error when trying to install some R packages which have Rcpp as a dependency. My setup is Mac OS X 10.9.1 (Mavericks), R 3.0.2 installed by Homebrew. Here's the error output:
> install.packages('Rcpp')
trying URL 'http://cran.fhcrc.org/src/contrib/Rcpp_0.10.6.tar.gz'
Content type 'application/x-gzip' length 1985569 bytes (1.9 Mb)
opened URL
==================================================
downloaded 1.9 Mb
* installing *source* package ‘Rcpp’ ...
** package ‘Rcpp’ successfully unpacked and MD5 sums checked
** libs
clang++ -I/usr/local/Cellar/r/3.0.2/R.framework/Resources/include -DNDEBUG -I../inst/include/ -I/usr/local/include -fPIC -g -O2 -c Date.cpp -o Date.o
clang++ -I/usr/local/Cellar/r/3.0.2/R.framework/Resources/include -DNDEBUG -I../inst/include/ -I/usr/local/include -fPIC -g -O2 -c Module.cpp -o Module.o
clang -I/usr/local/Cellar/r/3.0.2/R.framework/Resources/include -DNDEBUG -I../inst/include/ -I/usr/local/include -fPIC -c Rcpp_init.c -o Rcpp_init.o
clang++ -I/usr/local/Cellar/r/3.0.2/R.framework/Resources/include -DNDEBUG -I../inst/include/ -I/usr/local/include -fPIC -g -O2 -c Timer.cpp -o Timer.o
clang++ -I/usr/local/Cellar/r/3.0.2/R.framework/Resources/include -DNDEBUG -I../inst/include/ -I/usr/local/include -fPIC -g -O2 -c api.cpp -o api.o
clang++ -I/usr/local/Cellar/r/3.0.2/R.framework/Resources/include -DNDEBUG -I../inst/include/ -I/usr/local/include -fPIC -g -O2 -c attributes.cpp -o attributes.o
clang++ -I/usr/local/Cellar/r/3.0.2/R.framework/Resources/include -DNDEBUG -I../inst/include/ -I/usr/local/include -fPIC -g -O2 -c barrier.cpp -o barrier.o
clang++ -I/usr/local/Cellar/r/3.0.2/R.framework/Resources/include -DNDEBUG -I../inst/include/ -I/usr/local/include -fPIC -g -O2 -c exceptions.cpp -o exceptions.o
clang++ -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -single_module -multiply_defined suppress -L/usr/local/lib -o Rcpp.so Date.o Module.o Rcpp_init.o Timer.o api.o attributes.o barrier.o exceptions.o -F/usr/local/Cellar/r/3.0.2/R.framework/.. -framework R -lintl -Wl,-framework -Wl,CoreFoundation
ld: library not found for -lintl
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Apparently, libintl is part of the gettext package. I did the following, possibly redundant, reinstall to make sure my copy was up to date:
$ brew install gettext
Warning: gettext-0.18.3.2 already installed
$ brew reinstall gettext
==> Reinstalling gettext
==> Downloading http://ftpmirror.gnu.org/gettext/gettext-0.18.3.2.tar.gz
Already downloaded: /Library/Caches/Homebrew/gettext-0.18.3.2.tar.gz
==> ./configure --prefix=/usr/local/Cellar/gettext/0.18.3.2 --with-included-gettext --with-included-glib --with-included-libcroco --with-included-libunistring --with-emac
==> make
==> make install
==> Caveats
This formula is keg-only, so it was not symlinked into /usr/local.
OS X provides the BSD gettext library and some software gets confused if both are in the library path.
Generally there are no consequences of this for you. If you build your
own software and it requires this formula, you'll need to add to your
build variables:
LDFLAGS: -L/usr/local/opt/gettext/lib
CPPFLAGS: -I/usr/local/opt/gettext/include
It says in the above output that brew doesn't symlink the library, which might explain why install.packages can't find it. What did the trick was adding a library path into ~/.R/Makevars like so:
PKG_LIBS=-L/usr/local/Cellar/gettext/0.18.3.2/lib
This answer corrects a typo in Giupo's answer; I believe the fix is important enough to be more prominent than a comment. The solution is a very effective way to install the Rserve package from Homebrew without causing broader problems on OS X:
flags="CPPFLAGS=-I/usr/local/opt/gettext/include LDFLAGS=-L/usr/local/opt/gettext/lib"
install.packages('Rserve', configure.args=flags)
To reduce namespace pollution even further, you can wrap it in local():
local({
  flags="CPPFLAGS=-I/usr/local/opt/gettext/include LDFLAGS=-L/usr/local/opt/gettext/lib"
  install.packages('Rserve', configure.args=flags)
})
I want to add my 2 cents by suggesting a less intrusive approach (meaning: no file or environment changes for the user that could bring unwanted side effects in the future).
Take note of the LDFLAGS and CPPFLAGS reported when reinstalling gettext, as @cbare did, and pass them to install.packages (inside R) with the configure.args parameter:
flags="LDFLAGS=-L/usr/local/opt/gettext/lib CPPFLAGS=-I/usr/local/opt/gettext/include"
install.packages('Rcpp', configure.args=flags)
This should do the trick (it worked for me while struggling with the same problem installing Rserve).
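To avoid mixing up the two paths when typing the flags by hand, the string can be generated from the gettext prefix. This is only a sketch: gettext_flags is a hypothetical helper name, and the default prefix is the /usr/local/opt/gettext location from the brew caveats above.

```shell
#!/bin/sh
# Hypothetical helper: print the configure flags for a given gettext prefix.
# The library path goes into LDFLAGS (-L), the header path into CPPFLAGS (-I).
gettext_flags() {
    prefix=${1:-/usr/local/opt/gettext}
    printf 'LDFLAGS=-L%s/lib CPPFLAGS=-I%s/include\n' "$prefix" "$prefix"
}

# Example usage with Homebrew (not executed here):
#   gettext_flags "$(brew --prefix gettext)"
```

The printed line can then be pasted as the value of flags in the R snippet.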
This worked fine for me:
brew link gettext --force