Installing the RODBC package on Ubuntu is a bit of a kludge. First I learned to install the following:
$ sudo apt-get install r-cran-rodbc
That wasn't good enough as the package was still looking for header files. I solved this issue by:
$ sudo apt-get install unixodbc-dev
Good, RODBC installed properly on the Ubuntu machine. But when I try to run the following script:
## import excel file from Dropbox
require("RODBC")
channel <- odbcConnectExcel("~/Dropbox/DATA/SAMPLE/petro.xls")
petro <- sqlFetch (channel, "weekly")
odbcClose(channel)
str(petro)
head(petro)
I get an error saying that the function odbcConnectExcel could not be found. I checked the case of each letter, making sure it was not a simple typo. Nope. Then I ran this same script on a Windows R installation (file path different, of course) and the script works.
Any idea why the Ubuntu R installation cannot find the odbcConnectExcel function, and how I can get this to work?
That functionality is only available where the Excel ODBC driver is available, i.e. on Windows. In other words: not on Ubuntu.
For reference, from the R Data Import / Export manual (with my highlighting):
4.3.2 Package RODBC
Package RODBC on CRAN provides an interface to database sources supporting an ODBC interface. This is very widely available, and allows the same R code to access different database systems. RODBC runs on Unix/Linux, Windows and Mac OS X, and almost all database systems provide support for ODBC. We have tested Microsoft SQL Server, Access, MySQL, PostgreSQL, Oracle and IBM DB2 on Windows and MySQL, Oracle, PostgreSQL and SQLite on Linux.
ODBC is a client-server system, and we have happily connected to a DBMS running on a Unix server from a Windows client, and vice versa.
On Windows ODBC support is normally installed, and current versions are available from http://www.microsoft.com/data/odbc/ as part of MDAC. On Unix/Linux you will need an ODBC Driver Manager such as unixODBC (http://www.unixODBC.org) or iOBDC (http://www.iODBC.org: this is pre-installed in Mac OS X) and an installed driver for your database system.
Windows provides drivers not just for DBMSs but also for Excel (.xls) spreadsheets, DBase (.dbf) files and even text files. (The named applications do not need to be installed. Which file formats are supported depends on the the versions of the drivers.) There are versions for Excel 2007 and Access 2007 (go to http://download.microsoft.com, and search for Office ODBC, which will lead to AccessDatabaseEngine.exe), the `2007 Office System Driver'.
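A script that has to run on both platforms can check for the function before calling it. This is only a minimal sketch: the helper name excel_odbc_available is hypothetical and not part of RODBC, and it assumes RODBC may or may not be installed.

```r
# Hypothetical helper: TRUE only where RODBC's Excel connector can be used.
excel_odbc_available <- function() {
  .Platform$OS.type == "windows" &&
    requireNamespace("RODBC", quietly = TRUE) &&
    exists("odbcConnectExcel", envir = asNamespace("RODBC"), inherits = FALSE)
}

if (excel_odbc_available()) {
  message("odbcConnectExcel() can be used on this machine")
} else {
  message("No Excel ODBC connector here; read the file another way")
}
```

On Ubuntu the first condition is already false, so the check never touches RODBC at all.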
I've found RODBC to be a real pain in the Ubuntu. Maybe it's because I don't know the right incantations, but I switched to RJDBC and have had much better luck with it. As discussed here.
As Dirk says, that won't solve your Excel problem. For writing Excel I've had very good luck with the WriteXLS package. In Ubuntu I found it quite easy to set up. I had Perl and many of the packages already installed and had to simply install Text::CSV_XS, which I installed with the GUI package manager. The reason I like WriteXLS is the ability to write data frames to different sheets in the Excel file. And now that I look at your question I see that you want to READ Excel files, not WRITE them. Hell. WriteXLS doesn't do that. Stick with gdata, like Dirk said in his comments:
gdata is on CRAN, and you are going to want the read.xls() function:
read.xls("//path//to/excelfile.xls", sheet = 1, verbose=FALSE, pattern, ...,
method=c("csv","tsv","tab"), perl="perl")
You may need to run installXLSXsupport, which installs the needed Perl modules.
read.xls expects sheet numbers, not names. The method parameter is simply the intermediate file format. If your data has tabs then don't use tab as the intermediate format. And likewise for commas and csv.
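Applied to the script in the question, that might look like the sketch below. It assumes gdata and a working Perl are installed, and that the "weekly" sheet is the first sheet in the workbook, which the question does not state.

```r
# Sketch: read the first sheet of the question's workbook with gdata.
xls_path <- path.expand("~/Dropbox/DATA/SAMPLE/petro.xls")

if (requireNamespace("gdata", quietly = TRUE) && file.exists(xls_path)) {
  # sheet = 1 is an assumption: "weekly" is taken to be the first sheet
  petro <- gdata::read.xls(xls_path, sheet = 1, method = "csv")
  str(petro)
  head(petro)
} else {
  message("gdata or the Excel file is missing; nothing was read")
}
```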
I'm noticing an issue where the pdftools package in R seems to be performing differently when run locally on my Windows 7 machine versus when I run it on a shared Ubuntu server via ssh.
My code:
download.file("http://www.nber.org/lbid/docs/LinkCO95Guide.pdf",
"1995codebook.pdf",
mode = "wb",
method = "libcurl")
codebook <- pdf_text("1995codebook.pdf")
On my local Windows 7 machine, the object codebook shows up as "Large character (258 elements, 710.2 Kb)", whereas on the Ubuntu server it shows up as "Large character (258 elements, 701.9 Kb)".
As you might imagine, this is causing problems for me downstream where code that works on my local machine is not producing the same results on the Ubuntu server. Looking at the text contained in codebook the first difference I notice right away is that where the version produced on Windows has "\r\n" the version produced on Ubuntu only has "\n" instead (I rely on "\r\n" downstream).
Why would that character series be different? Might it have something to do with encoding? Any help appreciated on what's causing this and how I can get the same results on both machines.
One last thing to mention: I had to install the poppler library to my home directory on the Ubuntu server (don't have sudo access) in order to get pdftools to install:
apt-get source poppler
cd poppler-0.24.5
./configure --prefix=$HOME/myapps
make
make install
export PKG_CONFIG_PATH=$HOME/myapps/lib/pkgconfig
After having done so, install.packages("pdftools") seems to run correctly. And pdftools loads without issue. So if it's a bad install I'm not sure what has gone wrong.
A few things:
Windows has different line endings; this is extensively documented. This alone accounts for the size difference.
Even after the download, you can convert between both conventions. One tool to do so is dos2unix, which you can get via apt-get install dos2unix.
You are making your life too complicated by building poppler. As the configure script for pdftools says, just install the library via apt-get install libpoppler-cpp-dev.
However: most sane programs, R included, treat \r\n and \n identically, so your imported data should be the same. If yours does not, use dos2unix or equivalent tools to convert as needed. In the longer run you want your code to not care.
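Inside R itself, one way to make downstream code not care is to normalize the line endings right after extraction. A minimal sketch using base R only; the function name normalize_newlines and the sample strings are made up for illustration and stand in for the output of pdf_text():

```r
# Hypothetical helper: turn Windows line endings into Unix ones.
normalize_newlines <- function(x) {
  gsub("\r\n", "\n", x, fixed = TRUE)
}

# Stand-in for pdf_text() output: one "Windows" page and one "Unix" page
pages <- c("line one\r\nline two\r\n", "no carriage returns\n")
clean <- normalize_newlines(pages)

# After normalization, both platforms split into lines the same way
lines_per_page <- strsplit(clean, "\n", fixed = TRUE)
```

Code that previously relied on "\r\n" would then match on "\n" on both machines.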
In the R FAQ section 4.6 (Package TclTk does not work) I found the following sentence:
... although they [missing Tcl/tk packages] may be downloaded via the Teacup facility
What is "teacup"? How can I install and use it?
I am using RStudio running on Ubuntu Linux and Windows 7.
Teacup is a program that ships as part of ActiveTcl, a commercial zero-cost distribution of Tcl (and Tk and many other packages) for various platforms. It does package management, looking after the key parts: downloading, installing and upgrading packages from a remote repository. It is not open source, though Tcl itself is (as are the majority of packages that aren't single-company-specific).
If you've got it installed, you use these commands from a shell:
teacup update-self
teacup update
Depending on where your Tcl installation is, you might need to elevate privileges to make these command calls work. How you do this is platform-dependent; on Unix it's usually simplest to use sudo for each of the commands, whereas on Windows it is probably easier to create an elevated command shell and run inside that.
Depending on your site, you might need to configure a web proxy with teacup proxy. Try without first.
If you're using a non-ActiveTcl installation but you have an ActiveTcl installation present, you can still use teacup. You just need to use teacup link to connect that Tcl installation to the teacup local repository. This is slightly more complex because you can have multiple repositories on the one system (though I've never needed that).
First, you find where the repository is:
teacup default
Then you need to link the shell to the repository:
teacup link make $PATH_FROM_TEACUP_DEFAULT $LOCATION_OF_TCLSH_TO_LINK
Making this work with RStudio will be a matter of determining which Tcl installation it is using. If it's already an ActiveTcl, you just need the first part of this answer. Otherwise, you need the second part as well. Also note that this pretty much requires that you be using either Tcl 8.5 or 8.6; there are no guarantees for older, unsupported versions.
I'd like to use Julia on a computer which is disconnected from the Internet.
Is there simple procedure to download a package and then install it offline?
Sure, it's possible.
Pkg.dir() # => gets you the package installation path
Check pkg.julialang.org to find the right package and follow its GitHub link; you can then download a zip archive from github.com and extract it into Pkg.dir().
But you may be getting yourself into trouble, because you must do many things manually, e.g.:
rename the folder to remove the .jl suffix
run the build steps
install all related packages
I think a better way is to install the packages on a connected machine and then copy the contents of Pkg.dir() from that machine to your system. This approach works well only if both machines have the same configuration (CPU architecture, OS, and Julia version).
I need to support an R environment on a Windows 7 PC that doesn't have internet access.
I'd like to download (to DVD, eventually) a current version of all ~ 5,000 packages to make available to users of R on this PC.
Is there an FTP script, or another good way, to download all of the zip files for the R packages?
I know there are daily updates to R, but one good day will be enough to get me started.
Presumably you have an installation somewhere that does have internet access. I would just set that installation to download everything. There's an example at http://www.r-bloggers.com/r-package-automated-download/. Start R, and try this:
pkg.list = available.packages()
# available.packages() returns a matrix; the package names are its row names
download.packages(pkgs = rownames(pkg.list), destdir = "E:/MyRPackages")
Once you have these files, copy them to some kind of portable media (thumb drive, hard drive, whatever) or burn a CD / DVD and take that to the standalone machine.
Note: there may be a reason this other machine was not connected to the internet. So be careful! Make sure the virus protection is up to date on the non-connected machine, and that your IT folks won't come down on you like a ton of bricks for transferring data this way.
Next, you need to point the standalone machine at the portable media or the CD / DVD. A simple way to do this is to redefine where R looks for the repository. See e.g. Creating a local R package repository for examples.
In your case, try something like this in R:
update.packages(repos = "complete-path-to-portable-media", type = "source")
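If setting up a repository index is more than you need, a simpler alternative is to install the downloaded files directly. This is a sketch of that different technique, not of the repository approach above: the function name install_from_media is hypothetical, and it assumes the media holds Windows binary .zip packages, as the question describes.

```r
# Hypothetical helper: install every .zip package found in a directory.
install_from_media <- function(media_dir, type = "win.binary") {
  zips <- list.files(media_dir, pattern = "\\.zip$", full.names = TRUE)
  if (length(zips) == 0L) {
    stop("no .zip packages found in ", media_dir)
  }
  # repos = NULL tells install.packages() to treat 'pkgs' as file paths
  install.packages(zips, repos = NULL, type = type)
}
```

On the standalone machine this would be called as install_from_media("E:/MyRPackages"), with the path adjusted to wherever the media is mounted. With repos = NULL, R does not resolve dependencies for you, so this only works cleanly when every needed package is on the media.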
Use rsync to create a mirror and then install packages by pointing to your local mirror as the repos argument of install.packages. No need to make the repository publicly available. Specialize the path (e.g., to rsync based on /bin/windows/contrib/3.0/) to retrieve just the windows binaries (to a directory that you've created with similar structure repos/bin/windows/contrib/3.0/) if that's all that needs to be supported.
rsync -rtlzv --delete \
cran.r-project.org::CRAN/bin/windows/contrib/3.0/ \
repos/bin/windows/contrib/3.0/
RAndFriends, which includes all the items needed to run RExcel, includes just R 2.15.2.
I am currently using the latest version of R, but the rcom 2.3.1 and rscproxy 2.0.5 packages I have installed do not allow me to start an R server within Excel.
Setting the foreground R server within Excel returns a fatal error, and R rejects any connection with Excel via rcom.
I get two error messages:
R Server not available
There seems to be no R process connected to Excel
The main difference from a working RExcel session is that in the latter you can see rscproxy and rcom being loaded when the session starts.
Is anyone currently using RExcel with R 3.0.1 who can explain, step by step, how they got it running?
Found it on statconn's Wiki section.
Assuming you have a suitable version of R installed, the following steps are necessary to install RExcel and the infrastructure. You need to be logged into Windows with administrator privileges to do this!
You also need to follow these instructions if you upgrade R, i.e. you install a new release of R after you have installed RExcel.
Download the statconn DCOM server and execute the program you downloaded.
Start R as administrator (on Windows 7 you need to right-click the R icon and click the corresponding item).
In R, run the following commands (you must start R as administrator to do this).
Commands:
install.packages(c("rscproxy", "rcom"), repos = "http://rcom.univie.ac.at/download", lib = .Library)
library(rcom)
comRegisterRegistry()
Now you have rcom installed, but RExcel is not installed yet.
To install RExcel: download the RExcel installer and run this installation program. Installing RExcel this way will set the background server of R as the default R server for RExcel. You can change this in the configuration settings in R. If you want to set the foreground server as the default site wide server, there is an appropriate option in one of the dialogs of the installation.
The RExcel installer modifies one of the configuration files of R, the file Rprofile.site, usually found at a location like C:\Program Files\R\R-2.13.1\etc\Rprofile.site.
If you do not install RExcel and want the package rcom to be loaded into R each time you start it, you have to add the line
library(rcom)
to Rprofile.site. You have to start your editor as administrator to be able to modify this file.
I think the focus of those instructions is on "Assuming you have a suitable version of R installed". R 3.0.1 does not work with RExcel.
I have noticed that development of all Rmetrics products has ceased since Diethelm Wuertz's untimely passing. Some of his associates on the team maintain them, but further development stopped abruptly after Diethelm Wuertz, the project leader and main inspiration behind the team, died in a car accident in 2015. That is how long it has been since any serious development has occurred on RExcel and the whole range of Rmetrics products. It is a real tragedy; they are still cutting edge 6 years after Diethelm's passing, and he would be sad to see his legacy slowly die. I am looking at xlwings and converting the not-too-difficult code to Python for speed and power. As for the many great R libraries, I do not have the time to reinvent the wheel in Python; it is not a labour of love for me.