How can I generate a GUID in R? - r

How can I generate GUIDs and UUIDs in R?
I would like to be able to generate GUIDs based on the hardware etc. of the machine running the rsession.
As a fallback, however, I would be happy to create UUIDs that comply with rfc4122.
Is there a package that can create GUIDs? Otherwise, does someone have some RFC4122 compatible UUID code lying about?

The optimal choice for this now is the uuid package. It consists of one function (UUIDgenerate) that doesn't rely on R's internal random number generators and so doesn't suffer any consequences from using set.seed in a session as #thelatemail's answer does. You can choose to have the UUID be generated either by the package's internal random number generator or based on time.

If you are using R in Unix environment, you can get UUID in R using system() command.
On Linux (Ubuntu 12.04 LTS):
my_uuid <- system("uuid",intern=T)
my_uuid
[1] 0f62f1de-418d-11e3-8a19-cb0ceccb58ec
On Mac OS X 10.8:
my_uuid <- system("uuidgen", intern=T)
my_uuid
[1] 9A9D64DF-EB01-47E7-B16E-DC0343951883
Far as I know, both uuid and uuidgen follows UUID Version 4 format.

I know nothing about the intricacies of UUID's, but would something like this do?
baseuuid <- paste(sample(c(letters[1:6],0:9),30,replace=TRUE),collapse="")
paste(
substr(baseuuid,1,8),
"-",
substr(baseuuid,9,12),
"-",
"4",
substr(baseuuid,13,15),
"-",
sample(c("8","9","a","b"),1),
substr(baseuuid,16,18),
"-",
substr(baseuuid,19,30),
sep="",
collapse=""
)
# result like: "f7bd11ed-fca9-42e5-8c3e-4464cd02e0fa"
This should be in line with http://en.wikipedia.org/wiki/Uuid#Version_4_.28random.29

Bioconductor had an Ruuid package that doesn't seem to be there anymore, but a google search on "Ruuid" will point to places that you can download it.

Related

Walsh-Hadamard Transform in r

I search for a command to compute Walsh-Hadamard Transform of an image in R, but I don't find anything. In MATLAB fwht use for this. this command implement Walsh-Hadamard Tranform to each row of matrix. Can anyone introduce a similar way to compute Walsh-Hadamard on rows or columns of Matrix in R?
I find a package here:
http://www2.uaem.mx/r-mirror/web/packages/boolfun/boolfun.pdf
But why this package is not available when I want to install it?
Packages that are not maintained get put in the Archive. They get put there when that aren't updated to match changing requirements or start making errors with changing R code base. https://cran.r-project.org/web/packages/boolfun/index.html
It's possible that you might be able to extract useful code from the archive version, despite the relatively ancient version of R that package was written under.
The R code for walshTransform calls an object code routine:
walshTransform <- function ( truthTable ) # /!\ should check truthTable values are in {0,1}
{
len <- log(length(truthTable),base=2)
if( len != round(len) )
stop("bad truth table length")
res <- .Call( "walshTransform",
as.integer(truthTable),
as.integer(len))
res
}
Installing the package succeeded on my Mac, but would require the appropriate toolchain on whatever OS you are working in.

running all examples in r package

I am developing a package in Rstudio. Many of my examples need updating so I am going through each one. The only way to check the examples is by running devtools::check() but of course this runs all the checks and it takes a while.
Is there a way of just running the examples so I don't have to wait?
Try the following code to run all examples
devtools::run_examples()
You can also do this without devtools, admittedly it's a bit more circuitous.
package = "rgl"
# this gives a key-value mapping of the various `\alias{}`es
# in each Rd file to that file's canonical name
aliases <- readRDS(system.file("help", "aliases.rds", package=package))
# or sapply(unique(aliases), example, package=package, character.only=TRUE),
# but I think the for loop is superior in this case.
for (topic in unique(aliases)) example(topic, package=package, character.only = TRUE)

Do you have to escape back slashes or do any special encoding with Rs Digest function within the Alteryx R tool

I'm using the Alteryx R tool to do some sha256 hash calculations, but I'm running into trouble with one of my inputs. I'm trying to produce the sha256 hash for the following input:
POST\n/\n\ncontent_type:\nhost:dynamodb.us-east-1.amazonaws.com\nx-amz-date:20150707T201951Z\nx-amz-target:DynamoDB_20120810.CreateTable\n\ncontent_type;host;x-amz-date;x-amz-target\n09a8bcdeea1d20631f887235820bbff0a614679080a2e74a89ceb1a1bcc71b44
My r function is:
digest('POST\n/\n\ncontent_type:\nhost:dynamodb.us-east-1.amazonaws.com\nx-amz-date:20150707T201951Z\nx-amz-target:DynamoDB_20120810.CreateTable\n\ncontent_type;host;x-amz-date;x-amz-target\n09a8bcdeea1d20631f887235820bbff0a614679080a2e74a89ceb1a1bcc71b44', algo='sha256', serialize = FALSE)
and the hash produced by R is:
7fe2c3fc70134481217952f27bb5f4af95193645903ba3a6d4d7ad45c3adade1
This value is not correct. The correct value is:
9a493c643eeb736decc195a8e0e84e08f45a00bdbc21feaafa94be5f0f299af0
You can see the correct value calculated below using Python;
Calculated With Python
I've also calculated the correct value using the R command line tool. This leads me to believe that Alteryx is somehow altering the input and as a result is producing the wrong output. Has anyone come across this or know a possible workaround.
My R tool script follows:
where c =
POST\n/\n\ncontent_type:\nhost:dynamodb.us-east-1.amazonaws.com\nx-amz-date:20150707T201951Z\nx-amz-target:DynamoDB_20120810.CreateTable\n\ncontent_type;host;x-amz-date;x-amz-target\n09a8bcdeea1d20631f887235820bbff0a614679080a2e74a89ceb1a1bcc71b44
If you replace all \ with \\ in your input string (in regular R or Python), it will digest to the value starting with "7fe..." so this suggests that indeed, Alteryx is doing something like that in the hand-off to R.
A work-around would be to run it through the Alteryx base64 tool prior to the R tool, then have the R tool base 64 decode, something like:
library("digest")
library("caTools")
df <- as.data.frame(read.Alteryx("#1", mode="data.frame"))
df$out <- digest(base64decode(as.character(df$b64),what="character"), algo='sha256', serialize = FALSE)
write.Alteryx(df, 1)
While not ideal, it will give the correct output starting with "9a493..."

Extract English words from a text in R

I have a text and I need to extract all English words from it. For instance I want to have a function which would analyse the vector
vector <- c("picture", "carpet", "lamp", "notaword", "anothernotaword")
And return only English words from this vector i.e. "picture", "carpet", "lamp"
I do understand that the definition of "English word" depends on the dictionary but I would be satisfied even with a basic dictionary.
You could use the package I maintain qdapDictionaries (no need for the parent package qdap to be installed). If your data is more complex you may need to use tools like tolower etc. to make it work. The idea here is basically to see where a known word list ?GradyAugmented intersects with your words. Here are two very similar approaches, the first is likely slightly faster depending on data:
vector <- c("picture", "carpet", "lamp", "notaword", "anothernotaword")
library(qdapDictionaries)
vector[vector %in% GradyAugmented]
## [1] "picture" "carpet" "lamp"
intersect(vector, GradyAugmented)
## [1] "picture" "carpet" "lamp"
The error you are receiving with installing qdap sounds like #Ben Bolker is correct. You will need a newer version (I'd suggest the latest version) of data.table installed (use packageVersion("data.table") to check this). That is an oversight on my part with not requiring a minimal version of data.table, I thought setDT (a function in the data.table package) was always around but it appears to not be in your version. But to solve this particular problem you wouldn't need to install the parent qdap package, just qdapDictionaries.

How to read data from Cassandra with R?

I am using R 2.14.1 and Cassandra 1.2.11, I have a separate program which has written data to a single Cassandra table. I am failing to read them from R.
The Cassandra schema is defined like this:
create table chosen_samples (id bigint , temperature double, primary key(id))
I have first tried the RCassandra package (http://www.rforge.net/RCassandra/)
> # install.packages("RCassandra")
> library(RCassandra)
> rc <- RC.connect(host ="192.168.33.10", port = 9160L)
> RC.use(rc, "poc1_samples")
> cs <- RC.read.table(rc, c.family="chosen_samples")
The connection seems to succeed but the parsing of the table into data frame fails:
> cs
Error in data.frame(..dfd. = c("#\"ffffff", "#(<cc><cc><cc><cc><cc><cd>", :
duplicate row.names:
I have also tried using JDBC connector, as described here: http://www.datastax.com/dev/blog/big-analytics-with-r-cassandra-and-hive
> # install.packages("RJDBC")
> library(RJDBC)
> cassdrv <- JDBC("org.apache.cassandra.cql.jdbc.CassandraDriver", "/Users/svend/dev/libs/cassandra-jdbc-1.2.5.jar", "`")
But this one fails like this:
Error in .jfindClass(as.character(driverClass)[1]) : class not found
Even though the location to the java driver is correct
$ ls /Users/svend/dev/libs/cassandra-jdbc-1.2.5.jar
/Users/svend/dev/libs/cassandra-jdbc-1.2.5.jar
You have to download apache-cassandra-2.0.10-bin.tar.gz and cassandra-jdbc-1.2.5.jar and cassandra-all-1.1.0.jar.
There is no need to install Cassandra on your local machine; just put the cassandra-jdbc-1.2.5.jar and the cassandra-all-1.1.0.jar files in the lib directory of unziped apache-cassandra-2.0.10-bin.tar.gz. Then you can use
library(RJDBC)
drv <- JDBC("org.apache.cassandra.cql.jdbc.CassandraDriver",
list.files("D:/apache-cassandra-2.0.10/lib",
pattern="jar$",full.names=T))
That is working on my unix but not on my windows machine.
Hope that helps.
This question is old now, but since it's the one of the top hits for R and Cassandra I thought I'd leave a simple solution here, as I found frustratingly little up-to-date support for what I thought would be a fairly common task.
Sparklyr makes this pretty easy to do from scratch now, as it exposes a java context so the Spark-Cassandra-Connector can be used directly. I've wrapped up the bindings in this simple package, crassy, but it's not necessary to use.
I mostly made it to demystify the config around how to make sparklyr load the connector, and as the syntax for selecting a subset of columns is a little unwieldy (assuming no Scala knowledge).
Column selection and partition filtering are supported. These were the only features I thought were necessary for general Cassandra use cases, given CQL can't be submitted directly to the cluster.
I've not found a solution to submitting more general CQL queries which doesn't involve writing custom scala, however there's an example of how this can work here.
Right, I found an (admittedly ugly) way, simply by calling python from R, parsing the NA manually and re-assigning the data-frames names in R, like this
# install.packages("rPython")
# (don't forget to "pip install cql")
library(rPython)
python.exec("import sys")
# adding libraries from virtualenv
python.exec("sys.path.append('/Users/svend/dev/pyVe/playground/lib/python2.7/site-packages/')")
python.exec("import cql")
python.exec("connection=cql.connect('192.168.33.10', cql_version='3.0.0')")
python.exec("cursor = connection.cursor()")
python.exec("cursor.execute('use poc1_samples')")
python.exec("cursor.execute('select * from chosen_samples' )")
# coding python None into NA (rPython seem to just return nothing )
python.exec("rep = lambda x : '__NA__' if x is None else x")
python.exec( "def getData(): return [rep(num) for line in cursor for num in line ]" )
data <- python.call("getData")
df <- as.data.frame(matrix(unlist(data), ncol=15, byrow=T))
names(df) <- c("temperature", "maxTemp", "minTemp",
"dewpoint", "elevation", "gust", "latitude", "longitude",
"maxwindspeed", "precipitation", "seelevelpressure", "visibility", "windspeed")
# and decoding NA's
parsena <- function (x) if (x=="__NA__") NA else x
df <- as.data.frame(lapply(df, parsena))
Anybody has a better idea?
I had the same error message when executing Rscript with RJDBC connection via batch file (R 3.2.4, Teradata driver).
Also, when run in RStudio it worked fine in the second run but not first.
What helped was explicitly call:
library(rJava)
.jinit()
It not enough to just download the driver, you have to also download the dependencies and put them into your JAVA ClassPath (MacOS: /Library/Java/Extensions) as stated on the project main page.
Include the Cassandra JDBC dependencies in your classpath : download dependencies
As of the RCassandra package, right now it's still too primitive compared to RJDBC.

Resources