How to pass R variables as input to the searchES method in Elasticsearch?

I am working with the RElasticSearch package in R, and I am able to connect to the proper index in Elasticsearch. Suppose my index contains two fields, id and name, and two of my R variables, rid and rname, contain the values I want to search for. How should I use the searchES method to do this? I have tried:
searchES(server=es.index,query="id":rid & "name":rname)
but it keeps throwing an error! Can someone please help me out?

You need to build your query as a single character string for this to work. To concatenate strings in R, use paste() (or paste0(), which inserts no separator). For example:
searchES(server=es.index,query=paste0("id:", rid, " AND name:", rname))
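A minimal sketch of building the query string on its own, so you can inspect it before sending it. The values of rid and rname are hypothetical, and the searchES() call is left as a comment because it needs the asker's es.index connection. The quoted phrase variant assumes the server parses the string with Lucene query-string syntax, which matters if a name can contain spaces.

```r
# Hypothetical values standing in for the asker's rid and rname
rid <- 42
rname <- "john"

# paste0() joins its arguments with no separator, giving one query string
query <- paste0("id:", rid, " AND name:", rname)
print(query)
# [1] "id:42 AND name:john"

# sprintf() builds the same string from a template
query2 <- sprintf("id:%s AND name:%s", rid, rname)

# Assumed Lucene syntax: double quotes make the name a phrase query
phrase_query <- sprintf('id:%s AND name:"%s"', rid, rname)

# With a real connection (es.index comes from the question):
# searchES(server = es.index, query = query)
```

Printing the string first is a cheap way to confirm the query is what you expect before blaming the search call.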

Related

How to use partial matches across multiple columns in R to set final value

I am new to R, moving over from Excel VBA. I would like to categorize a final value based on the text provided in multiple columns and 20k+ rows.
I've been semi-successful with "if" and "identical" but have struggled with partial matches through using "grep"
I'll share pseudo-code of what I'm trying to achieve:
If d$Removal_Reason_Code contains "SCH" AND
If d$Shop_Action_Code is an exact match to "Test" AND
If d$Repair_Summary contains "No Fault Found"
Then
set d$Category to "NFF"
Else
go back to row 1 and check against other keywords
I can post the working VBA code if that is helpful. I'm just getting my head round how R works, and was hoping it may be a quick and easy answer for one of you gurus!
Much appreciated :)
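We can use grepl() for the partial matches and == for the exact match:
d$Category <- NA_character_  # create the column first so the subset assignment below has the right length
i1 <- with(d, grepl("SCH", Removal_Reason_Code, fixed = TRUE) & Shop_Action_Code == "Test" &
  grepl("No Fault Found", Repair_Summary, fixed = TRUE))
d$Category[i1] <- "NFF"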
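To cover the "else, check against other keywords" part, one possible approach (a sketch, not the only way) is to keep the keyword combinations in a small rule table and apply them in order, so the first matching rule wins. The data frame d, the rules table and the SEAL category below are all hypothetical.

```r
# Hypothetical data standing in for the asker's d
d <- data.frame(
  Removal_Reason_Code = c("SCH-01", "UNS-02", "SCH-03"),
  Shop_Action_Code    = c("Test", "Repair", "Test"),
  Repair_Summary      = c("No Fault Found", "Replaced seal", "Found corrosion"),
  stringsAsFactors = FALSE
)

# Hypothetical rule table: one row per category, checked top to bottom
rules <- data.frame(
  reason   = c("SCH", "UNS"),
  action   = c("Test", "Repair"),
  summary  = c("No Fault Found", "seal"),
  category = c("NFF", "SEAL"),
  stringsAsFactors = FALSE
)

d$Category <- NA_character_
for (k in seq_len(nrow(rules))) {
  # Only rows not yet categorised are eligible, so earlier rules take priority
  hit <- is.na(d$Category) &
    grepl(rules$reason[k], d$Removal_Reason_Code, fixed = TRUE) &  # partial match
    d$Shop_Action_Code %in% rules$action[k] &                      # exact match
    grepl(rules$summary[k], d$Repair_Summary, fixed = TRUE)        # partial match
  d$Category[hit] <- rules$category[k]
}
d$Category
```

The loop runs once per rule rather than once per row, so each test is vectorised over all 20k+ rows.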

R - Update column by looping through it

I want to update a column so that only the part of each string after the last '.' is stored. I wrote code that does this, but it only works when it's given a single input. How do I loop through all the rows of my data frame?
For example, one value looks like this, and I only want to store the last part, "gif":
GET /./enviro/gif/emcilogo.gif
I wrote the following code, which successfully does this:
tail(c(do.call(rbind, strsplit(as.character(sapply(strsplit("GET /./enviro/gif/emcilogo.gif", "\\s+"), `[`, 2)),"\\."))), n=1)
Output:
"gif"
However, I am using the literal string "GET /./enviro/gif/emcilogo.gif" as input. As soon as I replace it with my data frame column df$request, I receive this error:
Error in strsplit(epa.df$request) :
argument "split" is missing, with no default
I tried writing a function that loops through my column values one by one and updates them, but I can't seem to get it working.
Any help would be highly appreciated!
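A vectorised sketch that avoids an explicit loop. strsplit() needs its split argument (hence the error above) but already works on a whole vector, while tail(..., n = 1) in the original collapses everything to a single value. The data frame below is hypothetical, and the approach assumes the method and path in each request are separated by whitespace. If the column is a factor, wrap it in as.character() first.

```r
# Hypothetical data standing in for the asker's df$request column
df <- data.frame(
  request = c("GET /./enviro/gif/emcilogo.gif",
              "GET /./enviro/html/index.html"),
  stringsAsFactors = FALSE
)

# strsplit() is vectorised: it returns one character vector per row.
# sapply() then pulls out element 2 (the path) from each of them.
path <- sapply(strsplit(df$request, "\\s+"), `[`, 2)

# The greedy ".*\\." matches everything up to and including the last ".",
# so sub() leaves only the text after it; this updates every row at once.
df$request <- sub(".*\\.", "", path)
df$request
```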

Combining many vectors into one larger vector (in an automated way)

I have a list of identifiers as follows:
url_num <- c('85054655', '85023543', '85001177', '84988480', '84978776', '84952756', '84940316', '84916976', '84901819', '84884081', '84862066', '84848942', '84820189', '84814935', '84808144')
And from each of these I'm creating a unique variable:
for (id in url_num){
assign(paste('test_', id, sep = ""), FUNCTION GOES HERE)
}
This leaves me with my variables which are:
test_85054655, test_85023543, etc.
Each of them holds the correct output from the function (I've checked). My next step is to combine them into one big vector that holds each of these variables as a separate element. This is easy enough via:
c(test_85054655,test_85023543,test_85001177,test_84988480,test_84978776,test_84952756,test_84940316,test_84916976,test_84901819,test_84884081,test_84862066,test_84848942,test_84820189,test_84814935,test_84808144)
However, as I update the original 'url_num' vector with new identifiers, I'd also have to come down to the above chunk and update this too!
Surely there's a more automated way I can setup the above chunk?
Maybe some sort of concat() function in the original for-loop which just adds each created variable straight into an empty vector right then and there?
So far I've just been trying to list all the variable names and somehow get the output to be in an acceptable format to get thrown straight into the c() function.
for (id in url_num){
cat(as.name(paste('test_', id, ",", sep = "")))
}
...which results in:
test_85054655,test_85023543,test_85001177,test_84988480,test_84978776,test_84952756,test_84940316,test_84916976,test_84901819,test_84884081,test_84862066,test_84848942,test_84820189,test_84814935,test_84808144,
This is close to the output I'm looking for but because it's using the cat() function it's essentially a print statement and its output can't really get put anywhere. Not to mention I feel like this method I've attempted is wrong to begin with and there must be something simpler I'm missing.
Thanks in advance for any help you guys can give me!
Troy
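One way to avoid maintaining the c() call by hand is to not create separate variables at all. The sketch below uses a hypothetical my_fun() in place of FUNCTION GOES HERE and a shortened url_num. sapply() returns one element per identifier, named by that identifier, and it picks up new identifiers automatically. mget() shows how to gather existing test_* variables by name if you do want to keep them.

```r
url_num <- c('85054655', '85023543', '85001177')

# Hypothetical stand-in for the asker's function: last two digits of the id
my_fun <- function(id) as.numeric(id) %% 100

# Option 1: no assign() needed; one named element per id
results <- sapply(url_num, my_fun)

# Option 2: if the test_* variables already exist, collect them by name
for (id in url_num) {
  assign(paste0("test_", id), my_fun(id))
}
combined <- unlist(mget(paste0("test_", url_num), envir = environment()))
combined
```

If each result is itself a vector of length greater than one, unlist(lapply(url_num, my_fun)) flattens them into one long vector, whereas sapply() would try to build a matrix.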

Bracket-escaped table names with dplyr

I'm programmatically fetching a bunch of datasets, many of them having silly names that begin with numbers and have special characters like minus signs in them. Because none of the datasets are particularly large, and I wanted the benefit of R making its best guess about data types, I'm (ab)using dplyr to dump these tables into SQLite.
I am using square brackets to escape the horrible table names, but this doesn't seem to work. For example:
data(iris)
foo.db <- src_sqlite("foo.sqlite3", create = TRUE)
copy_to(foo.db, df=iris, name="[14m3-n4m3]")
This results in the error message:
Error in sqliteSendQuery(conn, statement, bind.data) : error in statement: no such table: 14m3-n4m3
This works if I choose a sensible name. However, due to a variety of reasons, I'd really like to keep the cumbersome names. I am also able to create such a badly-named table directly from sqlite:
sqlite> create table [14m3-n4m3](foo,bar,baz);
sqlite> .tables
14m3-n4m3
Without cracking into things too deeply, this looks like dplyr is handling the square brackets in some way that I cannot figure out. My suspicion is that this is a bug, but I wanted to check here first to make sure I wasn't missing something.
EDIT: I forgot to mention the case where I just pass the janky name directly to dplyr. This errors out as follows:
library(dplyr)
data(iris)
foo.db <- src_sqlite("foo.sqlite3", create = TRUE)
copy_to(foo.db, df=iris, name="14M3-N4M3")
Error in sqliteSendQuery(conn, statement, bind.data) :
error in statement: unrecognized token: "14M3"
This is a bug in dplyr, and it's still there in the current GitHub master. As @hadley indicates, he has tried to escape things like table names in dplyr to prevent this issue. The problem you're having arises from a lack of escaping in two functions. Table creation works fine when the table name is provided unescaped (it is done with dplyr::db_create_table). However, the data is inserted into the table using DBI::dbWriteTable, which doesn't support odd table names. If the table name is passed to this function escaped, it fails to find it in the list of tables (the first error you report). If it is passed unescaped, the SQL generated for the insertion is not syntactically valid.
The second issue comes when the table is updated. The code to get the field names, this time actually in dplyr, again fails to escape the table name because it uses paste0 rather than build_sql.
I've fixed both errors in a fork of dplyr. I've also put in a pull request to @hadley and made a note on the issue https://github.com/hadley/dplyr/issues/926. In the meantime, if you want, you could use devtools::install_github("NikNakk/dplyr", ref = "sqlite-escape") and then revert to the master version once it's been fixed.
Incidentally, the correct SQL-99 way to escape table names (and other identifiers) in SQL is with double quotes (see SQL standard to escape column names?). MS Access uses square brackets, while MySQL defaults to backticks. dplyr uses double quotes, per the standard.
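As a minimal illustration of standard double-quote identifier quoting, the hypothetical helper quote_ident() below wraps a name in double quotes and doubles any embedded double quote. It is not part of dplyr; when a DBI connection is available, DBI::dbQuoteIdentifier() is the real API for this job and should be preferred.

```r
# Hypothetical helper (not from dplyr or DBI): SQL-standard identifier quoting.
# Embedded double quotes are escaped by doubling them.
quote_ident <- function(x) {
  paste0('"', gsub('"', '""', x, fixed = TRUE), '"')
}

quoted <- quote_ident("14M3-N4M3")

# The quoted name can then be spliced into a statement
sql <- paste0("CREATE TABLE ", quoted, " (foo, bar, baz)")
cat(sql, "\n")
```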
Finally, the proposal from @RichardScriven wouldn't work universally. For example, select is a perfectly valid name in R, but it is not a syntactically valid table name in SQL unless it is quoted. The same is true for other reserved words.
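make.names() is base R's way of turning arbitrary strings into syntactically valid R names. The sketch below is only illustrative (it is not necessarily the proposal referred to): it shows that sanitising for R fixes the leading digit and the minus sign, but leaves an SQL reserved word such as select untouched.

```r
# Leading digit gets an "X" prefix; the invalid "-" becomes "."
r_name_1 <- make.names("14M3-N4M3")

# Valid in R, so unchanged, yet SELECT is reserved in SQL
r_name_2 <- make.names("select")

# R's own reserved words get a trailing "." appended
r_name_3 <- make.names("if")

c(r_name_1, r_name_2, r_name_3)
```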

How to use indirect reference to read contents into a data table in R.

How do you use indirect references in R? More specifically, in the following simple read statement, I would like to be able to use a variable name to read inputFile into data table myTable.
myTable <- read.csv(inputFile, sep=",", header=T)
Instead of the above, I want to define
refToMyTable <- "myTable"
Then, how can I use refToMyTable instead of myTable to read inputFile into myTable?
Thanks for the help.
R doesn't really have references like that, but you can use strings to retrieve/create variables of that name.
But first let me say this is generally not a good practice. If you're looking to do this type of thing, it's usually a sign that you're not doing it "the R way."
Nevertheless,
assign(refToMyTable, read.csv(inputFile, sep = ",", header = TRUE))
should do the trick. The complement to assign is get, which retrieves a variable's value using its name.
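A self-contained sketch of assign() and get() working together. The temporary CSV file and its contents are made up so that the example runs anywhere.

```r
# Write a small CSV to a temporary file (hypothetical contents)
inputFile <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:2, name = c("a", "b")), inputFile, row.names = FALSE)

refToMyTable <- "myTable"

# Create a variable whose name is the value of refToMyTable
assign(refToMyTable, read.csv(inputFile, header = TRUE))

exists("myTable")        # a variable called myTable now exists
head(get(refToMyTable))  # retrieve it again through the string
```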
I think you mean something like the following:
reftomytable='~/Documents/myfile.csv'
myTable=read.csv(reftomytable)
Perhaps assign as mentioned by MrFlick.
When you want the contents of the object named "myTable" you would use get:
get("myTable")
get(refToMyTable) # since get will evaluate its argument
(It would be better to assign the results of multiple such data frames to a list object or a Reference Class.)
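A sketch of the list-based alternative: tables are stored under string keys in a named list, so no loose variables are created and the name can live in a variable such as refToMyTable. The file contents are hypothetical.

```r
# Hypothetical CSV so the example is self-contained
inputFile <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:3, b = c("x", "y", "z")), inputFile, row.names = FALSE)

refToMyTable <- "myTable"

# Store each table under its name in one list
tables <- list()
tables[[refToMyTable]] <- read.csv(inputFile, header = TRUE)

# Retrieve by literal name or by the variable holding the name
str(tables[["myTable"]])
names(tables)
```

Because everything sits in one list, you can also loop over all tables with lapply(tables, ...) instead of juggling get() calls.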
If you wanted a language-name object you would use as.name:
as.name("myTable")
# myTable .... is printed at the console; note no quotes
str(as.name("myTable"))
#symbol myTable
