R: Updating an entry in MongoDB using mongolite

I have a mongo database with information that I am passing to some R scripts for analysis. I am currently using the mongolite package to pass the information from mongo to R.
I have a field in each mongo entry called checkedByR, which is a binary flag that indicates whether the entry has already been analysed by the R scripts. Specifically, I am collecting a mongo entry by its mongo ID, running the scripts on the entry, setting the checkedByR field to 1, and then moving on.
For completeness, I am querying the database with the following request:
library(mongolite)
mongoID <- "1234abcd1234abcd1234"
m <- mongolite::mongo(url = "mongodb://localhost:27017",
collection = "collection",
db = "database")
rawData <- m$find(query = paste0('{"_id": { "$oid" : "',mongoID,'" }}'),
fields = '{"_id" : 1,
"checkedByR" : 1,
"somethingToCheck" : 1}')
checkedByR <- 1
However, I am having trouble successfully updating the mongo entry with the new checkedByR field.
I realise that an update function exists in the mongolite package (see https://cran.r-project.org/web/packages/mongolite/mongolite.pdf), but I am having trouble finding relevant examples to help me complete the update.
Any help would be greatly appreciated.

The mongo$update() function takes a query and an update argument. You use the query to find the documents you want to update, and the update to tell it which fields to change.
Consider this example:
library(mongolite)
## create some dummy data and insert into mongodb
df <- data.frame(id = 1:10,
value = letters[1:10]
)
mongo <- mongo(collection = "another_test",
db = "test",
url = "mongodb://localhost")
mongo$insert(df)
## the 'id' of the document I want to update
mongoID <- "575556825dabbf2aea1d7cc1"
## find some data
rawData <- mongo$find(query = paste0('{"_id": { "$oid" : "',mongoID,'" }}'),
fields = '{"_id" : 1,
"id" : 1,
"value" : 1}'
)
## ...
## do whatever you want to do in R...
## ...
## use update to query on your ID, then 'set' to set the 'checkedByR' value to 1
mongo$update(
query = paste0('{"_id": { "$oid" : "', mongoID, '" } }'),
update = '{ "$set" : { "checkedByR" : 1} }'
)
## in my original data I didn't have a 'checkedByR' value, but it's added anyway
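
The query and update arguments are plain JSON strings, so they can be built and checked before they are sent to the database. The following is a minimal sketch using base R only; oid_query() and set_field_update() are hypothetical helper names introduced here for illustration and are not part of mongolite. It assumes the ID is a standard 24-character hexadecimal ObjectId.

```r
# Hypothetical helper (not part of mongolite): build the JSON query that
# selects a single document by its ObjectId.
oid_query <- function(id) {
  # An ObjectId is written as exactly 24 hexadecimal characters.
  stopifnot(is.character(id), length(id) == 1L,
            grepl("^[0-9a-fA-F]{24}$", id))
  sprintf('{"_id": {"$oid": "%s"}}', id)
}

# Hypothetical helper: build a "$set" update for one field.
# 'value' is pasted in as-is, so it suits numbers such as 1.
set_field_update <- function(field, value) {
  sprintf('{"$set": {"%s": %s}}', field, value)
}

query  <- oid_query("575556825dabbf2aea1d7cc1")
update <- set_field_update("checkedByR", 1)
print(query)
print(update)

# With a live connection these strings would be passed on as:
# mongo$update(query = query, update = update)
```

Validating the ID first means a malformed value fails in R with a clear error instead of producing a query that matches nothing.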
Update: the rmongodb library is no longer on CRAN, so the code below will not run as written.
For more complex structures and updates, you can do things like:
library(mongolite)
library(jsonlite)
library(rmongodb) ## used to insert a non-data.frame into mongodb
## create some dummy data and insert into mongodb
lst <- list(id = 1,
value_doc = data.frame(id = 1:5,
value = letters[1:5],
stringsAsFactors = FALSE),
value_array = c(letters[6:10])
)
## using rmongodb
mongo <- mongo.create(db = "test")
coll <- "test.another_test"
mongo.insert(mongo,
ns = coll,
b = mongo.bson.from.list(lst)
)
mongo.destroy(mongo)
## update document with specific ID
mongoID <- "5755f646ceeb7846c87afd90"
## using mongolite
mongo <- mongo(db = "test",
collection = "another_test",
url = "mongodb://localhost"
)
## to add a single value to an array
mongo$update(
query = paste0('{"_id": { "$oid" : "', mongoID, '" } }'),
update = '{ "$addToSet" : { "value_array" : "checkedByR" } }'
)
## To add a document to the value_array
mongo$update(
query = paste0('{"_id": { "$oid" : "', mongoID, '" } }'),
update = '{ "$addToSet" : { "value_array" : { "checkedByR" : 1} } }'
)
## To add to a nested array
mongo$update(
query = paste0('{"_id": { "$oid" : "', mongoID, '" } }'),
update = '{ "$addToSet" : { "value_doc.value" : "checkedByR" } }'
)
rm(mongo); gc()
See the MongoDB update documentation for further details.

Related

R6Class : initialize method raises the error message: "cannot add bindings to a locked environment"

My working environment:
R version: 3.6.3 (64 bits)
OS: Windows 10 (64 bits)
I was working on the following exercise from Hadley Wickham's Advanced R book:
Create a bank account R6 class that stores a balance and allows you to
deposit and withdraw money.
Here is the class that I created
library(R6)
BankAccount <- R6Class(
classname = "BankAccount",
public = list(
initialize = function(first_name,
last_name,
email,
balance
) {
stopifnot(balance >= 0)
self$balance <- balance
self$email <- email
self$first_name <- first_name
self$last_name <- last_name
},
deposit = function(amount) {
stopifnot(amount > 0)
self$balance <- self$balance + amount
invisible(self)
},
withdraw = function(amount) {
stopifnot(amount > 0, self$balance - amount > 0)
self$balance = self$balance - amount
invisible(self)
},
print = function() {
cat(
"first name: ",
self$first_name,
"\n",
"last name: ",
self$last_name,
"\n",
"email : ",
self$email,
"\n",
"current balance : ",
self$balance,
"\n"
)
invisible(self)
}
)
)
my_bank_account <- BankAccount$new(
first_name = "test_firstname",
last_name = "test_lastname",
email = "testemail@somedomain.com",
balance = 140
)
The above code raises the following error upon execution:
Error in self$balance <- balance (from #10) :
cannot add bindings to a locked environment
I cannot understand the problem in the initialize method of my class. What's wrong with the initialize function in my code?

Well, I had to read the R6Class documentation more carefully:
If the public or private lists contain any items that have reference
semantics (for example, an environment), those items will be shared
across all instances of the class. To avoid this, add an entry for that
item with a 'NULL' initial value, and then in the 'initialize' method,
instantiate the object and assign it.
The problem with my code was that I had assigned the fields only inside the constructor. By default R6 locks each object's environment (lock_objects = TRUE), so new fields cannot be added to an object once it has been created. The fields have to be declared first in the public list, outside the initialize function, with a NULL initial value, and then assigned from the corresponding arguments inside initialize. Here is the corrected version of my class:
library(R6)
BankAccount <- R6Class(
classname = "BankAccount",
public = list(
## I had forgotten to write the
## following 4 lines in the previous
## version and apparently this caused
## the problem.
balance = NULL,
email = NULL,
first_name = NULL,
last_name = NULL,
##
##
initialize = function(first_name,
last_name,
email,
balance
) {
stopifnot(balance >= 0)
self$balance <- balance
self$email <- email
self$first_name <- first_name
self$last_name <- last_name
},
deposit = function(amount) {
stopifnot(amount > 0)
self$balance <- self$balance + amount
invisible(self)
},
withdraw = function(amount) {
stopifnot(amount > 0, self$balance - amount > 0)
self$balance = self$balance - amount
invisible(self)
},
print = function() {
cat(
"first name: ",
self$first_name,
"\n",
"last name: ",
self$last_name,
"\n",
"email : ",
self$email,
"\n",
"current balance : ",
self$balance
)
invisible(self)
}
)
)
my_bank_account <- BankAccount$new(
first_name = "test_firstname",
last_name = "test_lastname",
email = "testemail@somedomain.com",
balance = 140
)
print(my_bank_account)
And this time the code runs without any problem.
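
The difference between the two versions can be reproduced without R6. The sketch below uses plain base R environments to illustrate the idea. It is only an illustration of the locking behaviour, not R6's actual implementation, and try_assign() is a hypothetical helper name.

```r
# Hypothetical helper: try to assign 'balance' in an environment and
# report whether the assignment succeeded or raised an error.
try_assign <- function(env) {
  tryCatch({
    assign("balance", 140, envir = env)
    "assigned"
  }, error = function(e) "error")
}

# Case 1: the field was never declared before the environment was locked,
# so the assignment would have to add a new binding.
undeclared <- new.env()
lockEnvironment(undeclared)
result_undeclared <- try_assign(undeclared)

# Case 2: the field was declared (as NULL) before locking, so the
# assignment only changes an existing binding.
declared <- new.env()
assign("balance", NULL, envir = declared)
lockEnvironment(declared)
result_declared <- try_assign(declared)

print(result_undeclared)
print(result_declared)
```

The first case fails and the second succeeds, which mirrors why declaring the fields with NULL in the public list fixes the class.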

I need to write this data to a .csv file

I'm using the following query and trying to write the results to a .csv file:
.libPaths("G:/R/R-3.6.1/library")
library(mongolite)
library(data.table)
library(dplyr)
###Path to list of search terms
#------------------------------------------------------------------------------------------------------------#
#Input the words you want to search in a text file, and call the text file below in place of ".txt"
#-------------------------------------------------------------------------------------------------------------#
searchStrings = readLines("//int/elm/Work/Text Analytics/Opportunities/New Folder/terms.txt")
searchCounts = data.table(searchString = searchStrings, count = 0)
db <- mongo(collection = "2020Emails", db = "textAnalytics", verbose = TRUE,
url = "mongodb://user:m0ng0b0ng0@interactionsprojection:7999/textAnalytics")
firstQuery = TRUE
#Update this JSON if you want different fields returned
fieldsjson = '{"DocEid" : true,
"RequestType" : true,
"FromEmailAddress" : true,
"ReceiptTime" : true,
"RawBodyText" : true,
"CMF" : true,
"ReplyTo" : true,
"InboundMode" : true,
"PartNumbers" : true,
"_id": false}'
for (term in searchStrings) {
queryString = paste0('{"$text" : { "$search" : "\\"',term,'\\"" }}')
if (firstQuery == TRUE) {
results = NULL
results = data.table(db$find(query = queryString, fields = fieldsjson))
if (nrow(results) > 0) {
results[, searchString := term]
searchCounts[searchString == term, count := nrow(results)]
firstQuery = FALSE
}
} else {
tempdt = data.table(db$find(query = queryString, fields = fieldsjson))
if (nrow(tempdt) > 0) {
tempdt[, searchString := term]
results = rbind(results,tempdt, fill= TRUE)
searchCounts[searchString == term, count := nrow(tempdt)]
}
}
}
View(results)
#------------------------------------------------------------------------------#
# Specify the location and new file name where the results will be stored below
#------------------------------------------------------------------------------#
fwrite(results, "C:/Results/terms.csv")
I am getting the following error when trying to write this to a .csv file:
Error in fwrite(results, "C:/Results/terms.csv") :
Row 1 of list column is type 'list' - not yet implemented. fwrite() can write list columns containing items which are atomic vectors of type logical, integer, integer64, double, complex and character.
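
The error message suggests that at least one column of results is a list column whose cells are themselves lists, which fwrite() cannot serialise. One possible workaround, sketched here as an assumption rather than a confirmed fix for this dataset, is to collapse every list cell into a single string before writing. flatten_list_columns() is a hypothetical helper, the sample values are made up, and columns that are nested data frames are deliberately left untouched.

```r
# Hypothetical helper: collapse each cell of every list column into one
# delimited string so the table contains only atomic columns.
flatten_list_columns <- function(df, sep = "; ") {
  # List columns only; nested data.frame columns are skipped on purpose.
  is_list_col <- vapply(df,
                        function(col) is.list(col) && !is.data.frame(col),
                        logical(1))
  for (nm in names(df)[is_list_col]) {
    df[[nm]] <- vapply(df[[nm]],
                       function(cell) paste(unlist(cell), collapse = sep),
                       character(1),
                       USE.NAMES = FALSE)
  }
  df
}

# Made-up sample data with a list column whose cells are lists.
results <- data.frame(DocEid = c("a1", "b2"), stringsAsFactors = FALSE)
results$PartNumbers <- list(list("X-100", "Y-200"), list())

flat <- flatten_list_columns(results)
print(flat)

# The flattened table could then be written as in the question:
# fwrite(flat, "C:/Results/terms.csv")
```

Empty cells become empty strings, so the number of rows is preserved.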

Insert/Update R data.table into PostgreSQL table

I have a PostgreSQL database set up with a table and columns already defined. The primary key for the table is the combination of the (Id, datetime) columns. I need to periodically INSERT data for different Ids from an R data.table into the database. However, if data for a particular (Id, datetime) combination already exists, it should be UPDATED (overwritten). How can I do this using the RPostgres or RPostgreSQL packages?
When I try to insert a data.table where some (Id, datetime) rows already exist I get an error saying the primary key constraint is violated:
dbWriteTable(con, table, dt, append = TRUE, row.names = FALSE)
Error in connection_copy_data(conn@ptr, sql, value) :
COPY returned error: ERROR: duplicate key value violates unique constraint "interval_data_pkey"
DETAIL: Key (id, dttm_utc)=(a0za000000CSdLoAAL, 2018-10-01 05:15:00+00) already exists.
CONTEXT: COPY interval_data, line 1

You can use my pg package, which has upsert functionality, or just grab the upsert code from there: https://github.com/jangorecki/pg/blob/master/R/pg.R#L249
It is basically what others said in the comments: write the data into a temporary (staging) table, then insert from it into the destination table using an ON CONFLICT clause.
pgSendUpsert = function(stage_name, name, conflict_by, on_conflict = "DO NOTHING", techstamp = TRUE, conn = getOption("pg.conn"), .log = getOption("pg.log",TRUE)){
stopifnot(!is.null(conn), is.logical(.log), is.logical(techstamp), is.character(on_conflict), length(on_conflict)==1L)
cols = pgListFields(stage_name)
cols = setdiff(cols, c("run_id","r_timestamp")) # remove techstamp to have clean column list, as the fresh one will be used, if any
# sql
insert_into = sprintf("INSERT INTO %s.%s (%s)", name[1L], name[2L], paste(if(techstamp) c(cols, c("run_id","r_timestamp")) else cols, collapse=", "))
select = sprintf("SELECT %s", paste(cols, collapse=", "))
if(techstamp) select = sprintf("%s, %s::INTEGER run_id, '%s'::TIMESTAMPTZ r_timestamp", select, get_run_id(), format(Sys.time(), "%Y-%m-%d %H:%M:%OS"))
from = sprintf("FROM %s.%s", stage_name[1L], stage_name[2L])
if(!missing(conflict_by)) on_conflict = paste(paste0("(",paste(conflict_by, collapse=", "),")"), on_conflict)
on_conflict = paste("ON CONFLICT",on_conflict)
sql = paste0(paste(insert_into, select, from, on_conflict), ";")
pgSendQuery(sql, conn = conn, .log = .log)
}
#' @rdname pg
pgUpsertTable = function(name, value, conflict_by, on_conflict = "DO NOTHING", stage_name, techstamp = TRUE, conn = getOption("pg.conn"), .log = getOption("pg.log",TRUE)){
stopifnot(!is.null(conn), is.logical(.log), is.logical(techstamp), is.character(on_conflict), length(on_conflict)==1L)
name = schema_table(name)
if(!missing(stage_name)){
stage_name = schema_table(stage_name)
drop_stage = FALSE
} else {
stage_name = name
stage_name[2L] = paste("tmp", stage_name[2L], sep="_")
drop_stage = TRUE
}
if(pgExistsTable(stage_name)) pgTruncateTable(name = stage_name, conn = conn, .log = .log)
pgWriteTable(name = stage_name, value = value, techstamp = techstamp, conn = conn, .log = .log)
on.exit(if(drop_stage) pgDropTable(stage_name, conn = conn, .log = .log))
pgSendUpsert(stage_name = stage_name, name = name, conflict_by = conflict_by, on_conflict = on_conflict, techstamp = techstamp, conn = conn, .log = .log)
}
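
For the overwrite case asked about in the question, the ON CONFLICT action would be DO UPDATE rather than the default DO NOTHING. The sketch below only builds the SQL text in base R; build_upsert_sql(), the staging table name interval_data_stage and the value column are hypothetical names chosen for illustration. It assumes the identifiers are trusted, simple names; untrusted names should be quoted first, for example with DBI::dbQuoteIdentifier().

```r
# Hypothetical helper: build an INSERT ... ON CONFLICT statement that
# copies rows from a staging table and overwrites rows with matching keys.
build_upsert_sql <- function(target, stage, cols, key_cols) {
  stopifnot(length(key_cols) > 0, all(key_cols %in% cols))
  update_cols <- setdiff(cols, key_cols)
  col_list <- paste(cols, collapse = ", ")
  # EXCLUDED refers to the row that was proposed for insertion.
  action <- if (length(update_cols) > 0) {
    paste0("DO UPDATE SET ",
           paste0(update_cols, " = EXCLUDED.", update_cols, collapse = ", "))
  } else {
    "DO NOTHING"
  }
  sprintf("INSERT INTO %s (%s) SELECT %s FROM %s ON CONFLICT (%s) %s;",
          target, col_list, col_list, stage,
          paste(key_cols, collapse = ", "), action)
}

sql <- build_upsert_sql(target   = "interval_data",
                        stage    = "interval_data_stage",
                        cols     = c("id", "dttm_utc", "value"),
                        key_cols = c("id", "dttm_utc"))
print(sql)

# With a live connection, the data would first be written to the staging
# table and the statement then executed, for example:
# DBI::dbWriteTable(con, "interval_data_stage", dt, overwrite = TRUE)
# DBI::dbExecute(con, sql)
```

Keeping the SQL construction separate from the database calls makes the statement easy to inspect before it is run.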

Truncated updated string with R DBI package

I need to update a wide table on a SQL Server database from R, and the DBI package seems well suited for that.
The problem is that the R data.frame contains strings of more than 3000 characters, and when I use the DBI dbSendQuery function, all strings are truncated to 256 characters.
Here is a code example:
con <- odbc::dbConnect(drv = odbc::odbc(),
dsn = '***',
UID = '***',
PWD = '***')
df = data.frame(TEST = paste(rep("A", 300), collapse=""),
TEST_ID = 1068858)
df$TEST = as.character(df$TEST)
query = paste0('UPDATE MY_TABLE SET "TEST"=? WHERE TEST_ID=?')
update <- DBI::dbSendQuery(con, query)
DBI::dbBind(update, df)
DBI::dbClearResult(update)
odbc::dbDisconnect(con)
Then the following query returns 256 instead of 300:
SELECT LEN(TEST) FROM MY_TABLE WHERE TEST_ID = 1068858
NB : TEST is of type (varchar(max), NULL) and already contains strings of more than 256 chars.
Thanks in advance for any advice.

In the end, I chose to get rid of the more sophisticated functions. A solution was to write the table to a .csv file and bulk insert it into the database. Here is an example using the RODBC package:
write.table(x = df,
file = "/path/DBI_error_test.csv",
sep = ";",
row.names = FALSE, col.names = FALSE,
na = "NULL",
quote = FALSE)
query = paste("CREATE TABLE #MY_TABLE_TMP (
TEST varchar(max),
TEST_ID int
);
BULK INSERT #MY_TABLE_TMP
FROM 'C:\\DBI_error_test.csv'
WITH
(
FIELDTERMINATOR = ';',
ROWTERMINATOR = '\n',
BATCHSIZE = 500000,
CHECK_CONSTRAINTS
)
UPDATE R
SET R.TEST = #MY_TABLE_TMP.TEST
FROM MY_TABLE AS R
INNER JOIN #MY_TABLE_TMP ON #MY_TABLE_TMP.TEST_ID = R.TEST_ID;
DROP TABLE #MY_TABLE_TMP;
")
channel <- RODBC::odbcConnect(dsn = .DB_DSN_NAME,
uid = .DB_UID,
pwd = .DB_PWD)
RODBC::sqlQuery(channel = channel, query = query, believeNRows = FALSE)
RODBC::odbcClose(channel = channel)

Mongolite: How to insert Date in mongodb using r

Hi, I'm trying to insert a date, but it is stored as a string.
mongo$update(query = paste0('{"_id": ', c, ' }'),
update = paste0('{"$addToSet": {"values": {"date_data": "ISODate("',dat,'")"
} } }'))
If I remove the quotes around the value "ISODate("',dat,'")", it gives an invalid JSON object error, and with the quotes it is inserted as a string.
Any help will be appreciated.

To insert a date in MongoDB using the mongolite package, use $date.
mongo$update(
query = paste0('{"_id": ', c, ' }'),
update = paste0('{"$addToSet":
{"values":{"date_data":{"$date":"', dat, '"} } } }'
)
)
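
The value placed inside $date is a date-time string. The sketch below shows one way to produce it from an R date-time, assuming an ISO 8601 UTC timestamp is the wanted form; to_iso8601_utc() is a hypothetical helper name and the sample timestamp is made up.

```r
# Hypothetical helper: format an R date-time as an ISO 8601 string in UTC,
# e.g. 2016-06-07T12:30:00Z.
to_iso8601_utc <- function(x) {
  format(as.POSIXct(x, tz = "UTC"), "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
}

dat <- to_iso8601_utc(as.POSIXct("2016-06-07 12:30:00", tz = "UTC"))

# Build the update document with the date wrapped in {"$date": ...}.
update <- sprintf(
  '{"$addToSet": {"values": {"date_data": {"$date": "%s"}}}}',
  dat
)
print(update)

# With a live connection the string would be passed on as:
# mongo$update(query = paste0('{"_id": ', c, ' }'), update = update)
```

Formatting in UTC avoids the stored value depending on the time zone of the machine that runs the script.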
