How do I run an LDAP query using R?

I want to query an LDAP directory to see how employees are distributed across departments and groups.
Something like "Give me the department name of all the members of a group", and then use R to do a frequency analysis. However, I cannot find any examples of how to connect to LDAP and run a query from R.
RCurl seems to have some kind of support (http://cran.r-project.org/web/packages/RCurl/index.html):
Additionally, the underlying implementation is robust and extensive,
supporting FTP/FTPS/TFTP (uploads and downloads), SSL/HTTPS, telnet,
dict, ldap, and also supports cookies, redirects, authentication, etc.
But I am no expert in R and have not been able to find a single example that uses RCurl (or any other R library) to do this.
Right now I am using curl like this to obtain the members of a group:
curl "ldap://ldap.replaceme.com/o=replaceme.com?memberuid?sub?(cn=group-name)"
Does anyone here know how to do the same in R with RCurl?

Found the answer myself:
First run these commands to make sure RCurl is installed (as described in http://www.programmingr.com/content/webscraping-using-readlines-and-rcurl/):
install.packages("RCurl", dependencies = TRUE)
library("RCurl")
Then use getURL with an LDAP URL (as described in http://www.ietf.org/rfc/rfc2255.txt, although I couldn't understand it until I read http://docs.oracle.com/cd/E19396-01/817-7616/ldurl.html and saw the format ldap[s]://hostname:port/base_dn?attributes?scope?filter):
getURL("ldap://ldap.replaceme.com/o=replaceme.com?memberuid?sub?(cn=group-name)")
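Since the URL has the shape ldap[s]://hostname:port/base_dn?attributes?scope?filter, it can also be assembled from its parts in R. A minimal sketch follows; build_ldap_url() is a hypothetical helper (not part of RCurl) that joins the parts and percent-encodes the base DN and the filter with utils::URLencode(). The resulting string is what would be passed to getURL().

```r
# Hypothetical helper: assemble an RFC 2255-style LDAP URL of the form
#   ldap[s]://hostname[:port]/base_dn?attributes?scope?filter
build_ldap_url <- function(host, base_dn, attributes = character(0),
                           scope = c("base", "one", "sub"), filter = NULL,
                           port = NULL, secure = FALSE) {
  scope    <- match.arg(scope)
  scheme   <- if (secure) "ldaps" else "ldap"
  hostport <- if (is.null(port)) host else paste0(host, ":", port)
  # percent-encode characters such as spaces; "=", ",", "(" and ")" are kept
  url <- sprintf("%s://%s/%s?%s?%s", scheme, hostport,
                 utils::URLencode(base_dn),
                 paste(attributes, collapse = ","), scope)
  if (!is.null(filter)) url <- paste0(url, "?", utils::URLencode(filter))
  url
}

url <- build_ldap_url("ldap.replaceme.com", "o=replaceme.com",
                      attributes = "memberuid", scope = "sub",
                      filter = "(cn=group-name)")
# then, with RCurl loaded: getURL(url)
```

Building the URL this way mostly helps when the base DN or filter is computed, for example when looping over many group names.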

I've written a function to parse LDAP output into a data frame, using the examples provided here as a reference for getting everything going.
I hope it helps someone!
library(RCurl)
library(gtools)  # for smartbind()
parseldap <- function(url, userpwd = NULL) {
  ldapraw <- getURL(url, userpwd = userpwd)
  # separate entries by two newlines: add a blank line after every "DN:" line
  ldapraw <- gsub("(DN: .*?)\n", "\\1\n\n", ldapraw)
  ldapsplit <- strsplit(ldapraw, "\n\n")
  ldapsplit <- unlist(ldapsplit)
  # init list and count
  mylist <- list()
  count <- 0
  for (ldapline in ldapsplit) {
    # if this is the beginning of an entry
    if (grepl("^DN:", ldapline)) {
      count <- count + 1
      # when the second entry starts, turn the first one into a data frame
      if (count == 2) {
        df <- data.frame(mylist)
        mylist <- list()
      }
      # from then on, append the previous entry (smartbind fills missing columns with NA)
      if (count > 2) {
        df <- smartbind(df, mylist)
        mylist <- list()
      }
      mylist["DN"] <- gsub("^DN: ", "", ldapline)
    } else {
      # attribute lines look like "\tattr: value"
      linesplit <- unlist(strsplit(ldapline, "\n"))
      if (length(linesplit) > 1) {
        for (line in linesplit) {
          linesplit2 <- unlist(strsplit(line, "\t"))
          linesplit2 <- unlist(strsplit(linesplit2[2], ": "))
          # multi-valued attributes are collapsed into one "|"-separated string
          if (!is.null(unlist(mylist[linesplit2[1]]))) {
            x <- strsplit(unlist(mylist[linesplit2[1]]), "|", fixed = TRUE)
            x <- append(unlist(x), linesplit2[2])
            x <- paste(x, sep = "", collapse = "|")
            mylist[linesplit2[1]] <- x
          } else {
            mylist[linesplit2[1]] <- linesplit2[2]
          }
        }
      } else {
        ldaplinesplit <- unlist(strsplit(ldapline, "\t"))
        ldaplinesplit <- unlist(strsplit(ldaplinesplit[2], ": "))
        mylist[ldaplinesplit[1]] <- ldaplinesplit[2]
      }
    }
  }
  # add the last entry
  if (count == 1) {
    df <- data.frame(mylist)
  } else {
    df <- smartbind(df, mylist)
  }
  return(df)
}
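For the frequency analysis in the question, a full data frame is not always needed. The sketch below assumes the same text format that parseldap() relies on (a "DN: ..." line followed by "\tattr: value" lines); ldap_attr_values() is a hypothetical helper, and the department attribute and the filter in the usage comment are placeholders for whatever your directory uses.

```r
# Hypothetical helper: pull every value of one attribute out of the text that
# getURL() returns for an LDAP URL, where attribute lines look like "\tattr: value".
ldap_attr_values <- function(ldapraw, attr) {
  ldapraw <- gsub("\r", "", ldapraw, fixed = TRUE)   # tolerate CRLF line ends
  lines   <- unlist(strsplit(ldapraw, "\n", fixed = TRUE))
  prefix  <- paste0("\t", attr, ": ")
  hits    <- lines[startsWith(lines, prefix)]
  substring(hits, nchar(prefix) + 1)                 # drop the "\tattr: " part
}

# e.g. (placeholder attribute and filter):
# ldapraw <- getURL("ldap://ldap.replaceme.com/o=replaceme.com?department?sub?(objectClass=person)")
# sort(table(ldap_attr_values(ldapraw, "department")), decreasing = TRUE)
```

LDAP attribute names are case-insensitive, while this match is not, so use the spelling the server actually returns.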

I followed this strategy:
(1) Run a Perl script with an LDAP query and write the data to disk as JSON.
(2) Read the JSON structure into R and create a data frame.
For step (1), I used this script:
#use Modern::Perl;
use strict;
use warnings;
use feature 'say';
use Net::LDAP;
use JSON;
# Perl's chdir does not expand "~", so build the path from $HOME
chdir("$ENV{HOME}/git/_my/R_one-offs/R_grabbag") or die "chdir failed: $!";
my $ldap = Net::LDAP->new( 'ldap.mydomain.de' ) or die "$@";
my $outfile = "ldapentries_mydomain_ldap.json";
my $mesg = $ldap->bind;    # an anonymous bind
# get all cn's (= all names)
$mesg = $ldap->search(
    base   => "ou=People,dc=mydomain,dc=de",
    filter => "(cn=*)"
);
# e.g. "size limit exceeded": the entries received so far are still kept
warn $mesg->error if $mesg->code;
my $json_text = "";
my @entries;
foreach my $entry ($mesg->entries) {
    my %entry;
    foreach my $attr ($entry->attributes) {
        # for multi-valued attributes only the last value is kept
        foreach my $value ($entry->get_value($attr)) {
            $entry{$attr} = $value;
        }
    }
    push @entries, \%entry;
}
$json_text = to_json(\@entries);
say "Length json_text: " . length($json_text);
open(my $FH, ">", $outfile) or die "Cannot open $outfile: $!";
print $FH $json_text;
close($FH);
$mesg = $ldap->unbind;
You might need to check the maximum number of entries the LDAP server returns per search (its size limit).
See https://serverfault.com/questions/328671/paging-using-ldapsearch
For step (2), I used this R code:
setwd("~/git/_my/R_one-offs/R_grabbag")
library(rjson)
# read into R list, from file, created from perl script
json <- rjson::fromJSON(file="ldapentries_mydomain_ldap.json",method = "C")
head(json)
# create a data frame from list
library(reshape2)
library(dplyr)
library(tidyr)
# not really efficient, maybe there's a better way to do it
df.ldap <- json %>% melt %>% spread( L2,value)
# optional:
# turn factors into characters
i <- sapply(df.ldap, is.factor)
df.ldap[i] <- lapply(df.ldap[i], as.character)
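If installing reshape2 and tidyr is undesirable, the list that rjson::fromJSON() returns (one named list per LDAP entry, with fields varying between entries) can also be turned into a data frame in base R. A sketch, with entries_to_df() as a hypothetical helper; if a field holds more than one value, only the first is kept.

```r
# Hypothetical helper: one row per LDAP entry, one column per attribute name
# seen in any entry; entries lacking an attribute get NA in that column.
entries_to_df <- function(entries) {
  cols <- unique(unlist(lapply(entries, names)))
  out <- lapply(cols, function(col) {
    vapply(entries, function(e) {
      v <- e[[col]]                      # [[ ]] matches names exactly
      if (is.null(v)) NA_character_ else as.character(v)[1]
    }, character(1), USE.NAMES = FALSE)
  })
  names(out) <- cols
  # check.names = FALSE keeps attribute names such as "mail-alias" unchanged
  data.frame(out, stringsAsFactors = FALSE, check.names = FALSE)
}

# usage with the list read above:
# df.ldap <- entries_to_df(json)
```

Because every column is built as character, the optional factor-to-character step is not needed here.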

I wrote an R package for accessing LDAP servers using the OpenLDAP library.
In particular, the function searchldap is a wrapper around OpenLDAP's search functionality.
https://github.com/LukasK13/ldapr

Related

While loop for creating multiple resources with capacity

I need to create 52 resources with capacity 2 in the Simmer simulation package. I am trying to do this by using a while loop that creates these resources for me, instead of creating each resource myself.
The idea is that I have a while loop as given below. In each loop, a resource should be created called Transport_vehicle1, Transport_vehicle2, ..., Transport_vehicle52, with capacity 2.
Now I do not know how to insert the number i in the name of the resource that I am trying to create
i<-1
while (i<=52)
{ env %>%
add_resource("Transport_vehicle"[i],capacity = 2)
i <- i+1
}
Could someone please help me out? Thanks!
You can use the paste function to concatenate the string and the number:
i <- 1
while (i <= 52) {
  env %>%
    add_resource(paste("Transport_vehicle", i), capacity = 2)
  i <- i + 1
}
If you do not want a space between the string and the number, add the sep = "" argument:
paste("Transport_vehicle", i, sep="")
or use
paste0("Transport_vehicle", i)
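Since paste0() is vectorised, all 52 names can also be built at once, and a for loop can replace the hand-maintained counter. A sketch assuming the simmer package; the environment name "transport" is made up, and the simmer part is guarded so the snippet still runs where simmer is not installed.

```r
# paste0() works on whole vectors: one call builds all 52 names
resource_names <- paste0("Transport_vehicle", seq_len(52))

# a for loop over the names needs no manual i <- i + 1
if (requireNamespace("simmer", quietly = TRUE)) {
  env <- simmer::simmer("transport")
  for (name in resource_names) {
    simmer::add_resource(env, name, capacity = 2)
  }
}
```

In a script that already calls library(simmer), the loop body can stay in the question's pipe style: env %>% add_resource(name, capacity = 2).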

Get the URL of an .url (Windows URL shortcut) file

I want to get the URL of an .url shortcut file (made in Windows) in R.
The file format looks like this:
[{000214A0-0000-0000-C000-000000000046}]
Prop4=31,Stack Overflow - Where Developers Learn, Share, & Build Careers
Prop3=19,11
[{A7AF692E-098D-4C08-A225-D433CA835ED0}]
Prop5=3,0
Prop9=19,0
[InternetShortcut]
URL=https://stackoverflow.com/
IDList=
IconFile=https://cdn.sstatic.net/Sites/stackoverflow/img/favicon.ico?v=4f32ecc8f43d
IconIndex=1
[{9F4C2855-9F79-4B39-A8D0-E1D42DE1D5F3}]
Prop5=8,Microsoft.Website.E7533471.CBCA5933
and has some documentation.
I have used file.info(), but I guess it only shows the information for the first properties header.
I need to do this in R because I have a long list of .url files whose addresses I need to convert.
Crude way (I'll update this in a sec):
ini::read.ini("https://rud.is/dl/example.url")$InternetShortcut$URL
## [1] "https://rud.is/b/2017/11/11/measuring-monitoring-internet-speed-with-r/"
Made slightly less crude:
read_url_shortcut <- function(x) {
require(ini)
x <- ini::read.ini(x)
x[["InternetShortcut"]][["URL"]]
}
Without the ini package dependency:
read_url_shortcut <- function(x) {
x <- readLines(x)
x <- grep("^URL", x, value=TRUE)
gsub("^URL[[:space:]]*=[[:space:]]*", "", x)
}
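The readLines() version matches any line starting with URL, whichever section it is in. A slightly stricter sketch (read_url_shortcut_section() is a hypothetical name) only looks inside the [InternetShortcut] section and is then applied to a whole vector of files; the ~/shortcuts directory is a placeholder.

```r
# Hypothetical helper: return the URL= value from the [InternetShortcut]
# section of one .url file, or NA if the section or key is missing.
read_url_shortcut_section <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  start <- match("[InternetShortcut]", lines)
  if (is.na(start)) return(NA_character_)
  rest <- lines[-seq_len(start)]
  # the section ends at the next "[...]" header (or at the end of the file)
  end <- match(TRUE, startsWith(rest, "["))
  if (!is.na(end)) rest <- rest[seq_len(end - 1)]
  hit <- grep("^URL[[:space:]]*=", rest, value = TRUE)
  if (length(hit) == 0) return(NA_character_)
  sub("^URL[[:space:]]*=[[:space:]]*", "", hit[1])
}

# placeholder directory holding the .url files
files <- list.files("~/shortcuts", pattern = "\\.url$", full.names = TRUE)
urls  <- vapply(files, read_url_shortcut_section, character(1))
```

vapply() keeps the file paths as names, so urls can be turned into a two-column lookup table with data.frame(file = names(urls), url = unname(urls)).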
More "production-worthy" version:
#' Read in internet shortcuts (.url or .webloc) and extract URL target
#'
#' @param shortcuts character vector of file path+names or web addresses
#'   to .url or .webloc files to have URL fields extracted from.
#' @return character vector of URLs
read_shortcut <- function(shortcuts) {
  require(ini)
  require(xml2)
  require(purrr)
  purrr::map_chr(shortcuts, ~{
    # "https?" matches both http:// and https:// addresses
    if (!grepl("^https?://", .x)) {
      .x <- path.expand(.x)
      if (!file.exists(.x)) return(NA_character_)
    }
    if (grepl("\\.url$", .x)) {
      .ini <- suppressWarnings(ini::read.ini(.x)) # get encoding issues otherwise
      .ini[["InternetShortcut"]][["URL"]][1] # some evidence multiple are supported but not sure so being safe
    } else if (grepl("\\.webloc$", .x)) {
      .x <- xml2::read_xml(.x)
      xml2::xml_text(xml2::xml_find_first(.x, ".//dict/key[contains(., 'URL')]/../string"))[1] # some evidence multiple are supported but not sure so being safe
    } else {
      NA_character_
    }
  })
}
Ideally, such a function would return a single data frame row with all relevant info that could be found (title, URL and icon URL, creation/mod dates, etc). I'd rather not keep my Windows VM up long enough to generate sufficient samples to do that.
NOTE: Said "production"-ready version still doesn't gracefully handle edge cases where the file or web address is not readable/reachable nor does it deal with malformed .url or .webloc files.

Loop works outside function but in functions it doesn't.

I've been going around in circles with this for hours; it's my first question online about R. I'm trying to create a function that contains a loop. The function takes a vector that the user supplies, as in pollutantmean(4:6), then loads a set of csv files (from the directory mentioned) and binds them together. What is strange (to me) is that if I assign the variable id and then run the loop without a function, it works! When I put the loop inside a function so that the user can supply the id vector, it does nothing. Can someone help? Thank you!
pollutantmean<-function(id=1:332)
{
#read files
allfiles<-data.frame()
id<-str_pad(id,3,pad = "0")
direct<-"/Users/ped/Documents/LearningR/"
for (i in id) {
path<-paste(direct,"/",i,".csv",sep="")
file<-read.csv(path)
allfiles<-rbind(allfiles,file)
}
}
Your function is missing a return value. (@Roland)
library(stringr)  # for str_pad()
pollutantmean <- function(id = 1:332) {
  #read files
  allfiles <- data.frame()
  id <- str_pad(id, 3, pad = "0")
  direct <- "/Users/ped/Documents/LearningR/"
  for (i in id) {
    path <- paste0(direct, i, ".csv")
    file <- read.csv(path)
    allfiles <- rbind(allfiles, file)
  }
  return(allfiles)
}
Edit:
Your mistake was that you did not specify in your function what you want to get out of it. In R, the objects you create inside a function live in that function's own environment, which disappears when the function finishes, so you have to specify which object you want it to return.
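A small illustration, with made-up function names: the object built inside the function does not appear in the global environment, and a function whose last line is an assignment still returns the value, just invisibly, which is why calling it at the console seems to do nothing.

```r
make_df_assign <- function() {
  allfiles <- data.frame(x = 1:3)   # last expression is an assignment, so the
}                                   # value is returned, but invisibly
make_df_value <- function() {
  allfiles <- data.frame(x = 1:3)
  allfiles                          # last expression is the value itself
}

make_df_assign()             # prints nothing at the console
make_df_value()              # prints the data frame
res <- make_df_assign()      # the invisible value is still there once assigned
# FALSE in a fresh session: the calls above did not create a global allfiles
exists("allfiles", envir = globalenv(), inherits = FALSE)
```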
With my comment about accepting my answer, I meant this: (...To mark an answer as accepted, click on the check mark beside the answer to toggle it from greyed out to filled in...).
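Consider also lapply() and do.call(). An R function returns the value of its last expression, so return() is not needed; just leave that last result unassigned, because a final assignment such as allfiles <- ... returns its value invisibly and nothing is printed at the console:
library(stringr)  # for str_pad()
pollutantmean <- function(id = 1:332) {
  id <- str_pad(id, 3, pad = "0")
  direct_files <- paste0("/Users/ped/Documents/LearningR/", id, ".csv")
  # READ FILES INTO LIST AND ROW BIND (the last expression is the return value)
  do.call(rbind, lapply(direct_files, read.csv))
}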
OK, I got it. I was expecting the objects built inside the function to actually be created and show up in the R environment, but for some reason they don't. R still does all the calculations, though. Thanks a lot for the replies!
library(stringr)  # for str_pad()
pollutantmean <- function(directory, pollutant, id) {
  #read files
  allfiles <- data.frame()
  id2 <- str_pad(id, 3, pad = "0")
  direct <- paste("/Users/pedroalbuquerque/Documents/Learning R/", directory, sep = "")
  for (i in id2) {
    path <- paste(direct, "/", i, ".csv", sep = "")
    file <- read.csv(path)
    allfiles <- rbind(allfiles, file)
  }
  #averaging pollutants (the last expression is the return value)
  mean(allfiles[, pollutant], na.rm = TRUE)
}
pollutantmean("specdata", "nitrate", 23:35)
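For comparison, a base-R sketch of the same function without stringr: sprintf("%03d") does the zero-padding and file.path() inserts the separator. pollutantmean_base() is a hypothetical name, and it assumes every id has a matching file in the directory.

```r
pollutantmean_base <- function(directory, pollutant, id = 1:332) {
  # e.g. 23 -> "023.csv"
  files <- file.path(directory, sprintf("%03d.csv", id))
  # read every file into a list, then row-bind the list in one step
  allfiles <- do.call(rbind, lapply(files, read.csv))
  mean(allfiles[[pollutant]], na.rm = TRUE)
}

# usage, with a placeholder path:
# pollutantmean_base("/Users/pedroalbuquerque/Documents/Learning R/specdata", "nitrate", 23:35)
```

Reading everything into a list and binding once also avoids growing allfiles with rbind() inside the loop, which copies the data frame on every iteration.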

R code does not work when called from function

Hi, I just started learning R and find this problem really interesting: when I run the code directly, without wrapping it in a function, it works, but when I place it inside a function it doesn't. What could be the reason?
fill_column<-function(colName){
count <- 0
for(i in fg_data$particulars) {
count <- count +1
if(grepl(colName, i) && fg_data$value[count] > 0.0){
fg_data[,colName][count] <- as.numeric(fg_data$value[count])
} else {
fg_data[,colName][count] <- 'NA'
}
}
}
fill_column('volume')
Here I am creating a new column named volume, filled in where this string exists in the particulars column.
I have added a comment explaining why the solution given in another question does not work for me; please look at my comment below.
Finally I got it working after reading another answer on SO. Here is the solution:
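fill_column <- function(colName) {
  count <- 0
  for (i in fg_data$particulars) {
    count <- count + 1
    if (grepl(colName, i) && fg_data$value[count] > 0.0) {
      fg_data[, colName][count] <- as.numeric(fg_data$value[count])
    } else {
      # a real missing value, not the string 'NA' (which would make the column character)
      fg_data[, colName][count] <- NA
    }
  }
  # fg_data here is a modified local copy; hand it back to the caller
  return(fg_data)
}
fg_data = fill_column('volume')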
Now the reason: when you modify a global object inside an R function, R actually modifies a local copy, and the global object is left unchanged. In R you have to return the modified object from the function and assign it back to the global object to see your changes. Another way is to assign the local object from within the function to the global environment, e.g. assign("fg_data", fg_data, envir = .GlobalEnv).
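The same result can be had without the loop and without touching the global object. A sketch, with fill_column_vec() as a hypothetical name; it takes the data frame as an argument, assumes value is numeric or numeric-looking text, and keeps the original regex match on particulars.

```r
# Hypothetical vectorised version: the data frame comes in as an argument and
# the modified copy goes out as the return value, so nothing global is touched.
fill_column_vec <- function(df, colName) {
  v   <- suppressWarnings(as.numeric(df$value))   # non-numeric text becomes NA
  hit <- grepl(colName, df$particulars) & !is.na(v) & v > 0
  df[[colName]] <- ifelse(hit, v, NA_real_)       # real NA, not the string "NA"
  df
}

# usage, mirroring the answer above:
# fg_data <- fill_column_vec(fg_data, "volume")
```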

Use of variable in Unix command line

I'm trying to make life a little bit easier for myself but it is not working yet. What I'm trying to do is the following:
NOTE: I'm running R on a Unix server, since the rest of my script is in R. That's why there is system("...").
system("TRAIT=some_trait")
system("grep var.resid.anim rep_model_$TRAIT.out > res_var_anim_$TRAIT'.xout'",wait=T)
When I run the exact same thing in PuTTY (without system("...") of course), the right file is read and the right output is created. The script also works when I simply remove the variable I created. However, I need to do this many times, so a variable would be very convenient, but I can't get it to work.
This code prints nothing on the console.
system("xxx=foo")
system("echo $xxx")
But the following does.
system("xxx=foo; echo $xxx")
Each call to system() starts a new shell, so your variable definition is forgotten as soon as that call finishes.
In your case, how about trying:
system("TRAIT=some_trait; grep var.resid.anim rep_model_$TRAIT.out > res_var_anim_$TRAIT'.xout'",wait=T)
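Two other ways to get the value into the shell, sketched for a Unix server (the trait name is a placeholder): set the variable in R's own environment with Sys.setenv(), which every shell started later by system() inherits, or build the complete command string in R with sprintf() and shQuote().

```r
# Option 1: the variable lives in R's process environment, so each child
# shell started by system() sees it as $TRAIT
Sys.setenv(TRAIT = "some_trait")
system("echo $TRAIT")

# Option 2: build the whole command in R; shQuote() protects the file names
trait   <- "some_trait"
infile  <- sprintf("rep_model_%s.out", trait)
outfile <- sprintf("res_var_anim_%s.xout", trait)
cmd <- sprintf("grep var.resid.anim %s > %s", shQuote(infile), shQuote(outfile))
if (file.exists(infile)) system(cmd)   # only run when the input file exists
```

Option 2 fits the "many times" use case well: wrap the last four lines in for (trait in c(...)) and only the trait names change.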
You can keep this all in R:
grep_trait <- function(search_for, in_trait, out_trait = in_trait) {
  l <- readLines(sprintf("rep_model_%s.out", in_trait))
  # keep only the matching lines (search_for is a regex, as with the shell's grep)
  l <- grep(search_for, l, value = TRUE)
  writeLines(l, sprintf("res_var_anim_%s.xout", out_trait))
}
grep_trait("var.resid.anim", "haptoglobin")
If there's a concern about the files being read into memory first (i.e. if they are huge files), then:
grep_trait <- function(search_for, in_trait, out_trait = in_trait) {
  fin <- file(sprintf("rep_model_%s.out", in_trait), "r")
  fout <- file(sprintf("res_var_anim_%s.xout", out_trait), "w")
  # read and test one line at a time, so only one line is held in memory
  repeat {
    l <- readLines(fin, 1)
    if (length(l) == 0) break
    if (grepl(search_for, l)[1]) writeLines(l, fout)
  }
  close(fin)
  close(fout)
}
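Calling readLines() once per line is slow on large files. A sketch that still keeps memory bounded but reads in chunks; grep_trait_chunked() and its chunk_size default are assumptions, not part of the answer above.

```r
# Hypothetical chunked variant: at most chunk_size lines are in memory at once.
grep_trait_chunked <- function(search_for, in_trait, out_trait = in_trait,
                               chunk_size = 10000L) {
  fin  <- file(sprintf("rep_model_%s.out", in_trait), "r")
  fout <- file(sprintf("res_var_anim_%s.xout", out_trait), "w")
  # close both connections even if something fails part-way
  on.exit({ close(fin); close(fout) })
  repeat {
    chunk <- readLines(fin, n = chunk_size)
    if (length(chunk) == 0) break
    writeLines(grep(search_for, chunk, value = TRUE), fout)
  }
  invisible(NULL)
}

# grep_trait_chunked("var.resid.anim", "haptoglobin")
```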
