Write values from R to a PostgreSQL table based on row IDs

I have a PostgreSQL table Scores on a local server that looks like this:
ID Score_X Score_Y
1 NA NA
2 NA NA
3 NA NA
4 NA NA
I do a series of calculations in R that produces a dataframe Calc_Scores that looks like this:
ID Score_X Score_Y
1 0.53 0.81
4 0.75 0.95
I would like to write the scores that correspond with each ID from R to the PostgreSQL table such that the final PostgreSQL table should look like this:
ID Score_X Score_Y
1 0.53 0.81
2 NA NA
3 NA NA
4 0.75 0.95
I have a connection to the PostgreSQL database called connection, which I set up using the function dbConnect(). The actual tables are quite big. What line/code in R could I use to write these scores to the PostgreSQL table? I have been looking for a similar question but couldn't find anything. I have tried:
dbWriteTable(connection, "Scores", value = Calc_Scores, overwrite=T, append = F, row.names = F)
However, the entire table gets overwritten. I want only the scores to be updated.
Thank you.

Creating a temporary table could be an option:
# Create temporary table
dbWriteTable(connection, "ScoresTmp", value = Calc_Scores, overwrite=T, append = F, row.names = F)
# Update main table
dbExecute(connection,"
UPDATE Scores
SET Score_X = ScoresTmp.Score_X,
Score_Y = ScoresTmp.Score_Y
FROM ScoresTmp
WHERE Scores.ID = ScoresTmp.ID
")
# Clean up
dbExecute(connection,"DROP TABLE ScoresTmp")
Note that you should be able to create a real temporary table using the temporary=TRUE option: according to @Sirius's comment below, it works on a PostgreSQL database.
For users of an SQL Server database, this option doesn't work, but they can use the # prefix to create a temporary table.
In the example above, this would be:
dbWriteTable(connection, "#ScoresTmp", value = Calc_Scores, overwrite=T, append = F, row.names = F)

One way of doing this relies on the SQL UPDATE statement. In essence you:
- open a connection to your database
- loop over your changeset and, for each row, form the update statement, for example via
cmd <- paste0("UPDATE Scores SET Score_X = ", Score_X,
              ", Score_Y = ", Score_Y,
              " WHERE ID = ", ID)
- submit the cmd via e.g. dbExecute()
- close the connection
A parameterized query is safer than pasting values into the SQL string. There are examples in RPostgreSQL.
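The loop above can be sketched with a parameterized query. This is a minimal sketch, not the answer's exact code: the helper names (build_params, update_scores) are made up here, and the $1/$2/$3 placeholder syntax assumes an RPostgres-style backend.

```r
# library(DBI)  # needed when actually executing against a connection

# Build one parameter set per row of the changeset (pure, easy to test).
build_params <- function(calc_scores) {
  lapply(seq_len(nrow(calc_scores)), function(i)
    list(calc_scores$Score_X[i], calc_scores$Score_Y[i], calc_scores$ID[i]))
}

# Send a parameterized UPDATE for each row; the driver handles quoting.
update_scores <- function(connection, calc_scores) {
  sql <- "UPDATE Scores SET Score_X = $1, Score_Y = $2 WHERE ID = $3"
  for (p in build_params(calc_scores)) {
    DBI::dbExecute(connection, sql, params = p)
  }
}
```

For a few thousand rows the temporary-table approach above will usually be faster than a per-row loop, since it needs only one UPDATE statement.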

Related

How to efficiently send a dataframe with multiple rows via httr::PUT

Probably due to my limited knowledge of communicating with APIs (which I am trying to remedy :) ), I seem to be unable to execute a PUT request for more than one row of a dataframe at a time. For example, if df_final consists of one row, the following code works. If there are multiple rows, it fails and I get a 400 status.
reqBody <- list(provName = df_final$Provider,site = df_final$Site,
monthJuly = df_final$July, monthAugust = df_final$August,
monthSeptember = df_final$September, monthOctober =df_final$October,
monthNovember = df_final$November ,
monthDecember = df_final$December, monthJanuary = df_final$January, monthFebruary = df_final$February,
monthMarch = df_final$March, monthApril = df_final$April, monthMay = df_final$May,
monthJune = df_final$June,
assumptions = paste("Monthly Volume:", input$Average, "; Baseline Seasonality:", input$Year, "; Trend:", input$Year_slopes),
rationale = as.character(input$Comments), fiscalYear = FY_SET, updateDtm = Sys.time())
r <- PUT(fullURL, body = reqBody, encode = "json", content_type_json())
Using with_verbose() I am able to see that the json being sent is formatted differently for the 2 cases. I haven't found anything in the documentation ( https://cran.r-project.org/web/packages/httr/httr.pdf) that has been particularly helpful in overcoming this.
The format it appears to be sending in the first instance (one row in the data frame) looks like this:
{"provName":"Name","site":"site","monthJuly":56,"monthAugust":71,"monthSeptember":65,"monthOctober":78,"monthNovember":75,"monthDecember":98,"monthJanuary":23,"monthFebruary":39,"monthMarch":38,"monthApril":42,"monthMay":57,"monthJune":54,"assumptions":"Monthly Volume: Last 3 Months of 2019 ; Baseline Seasonality: 2017 ; Trend: 2017","rationale":"","fiscalYear":2022,"updateDtm":"2023-02-03 15:19:40"}
and again, it works without issues.
With 2 rows I get the following format:
{"provName":["Name","Name"],"site":["site","site"],"monthJuly":[56,56],"monthAugust": [71,71],"monthSeptember":[65,65],"monthOctober":[78,78],"monthNovember":[75,75],"monthDecember": [98,98],"monthJanuary":[23,23],"monthFebruary":[39,39],"monthMarch":[38,38],"monthApril": [42,42],"monthMay":[57,57],"monthJune":[54,54],"assumptions":["Monthly Volume: Last 3 Months of 2019 ; Baseline Seasonality: 2017 ; Trend: 2017","Monthly Volume: Last 3 Months of 2019 ; Baseline Seasonality: 2017 ; Trend: 2017"],"rationale":["",""],"17":2,"18":2}
And it fails with status 400.
I suppose I could use lapply and PUT for each row; however, with thousands of rows in a dataframe, I think that would be less than ideal.
Anyone have any light to share on this?
Any help would be greatly appreciated!
PS: this didn't really answer my question:
R httr PUT requests
and, as I mentioned, doing something like this is not ideal:
Convert each data frame row to httr body parameter list without enumeration
Looks like you are using a list as the request body. Use a data frame instead.
Lists and data frames get serialized to JSON differently:
jsonlite::toJSON(list(x = 1:2, y = 3:4))
#> {"x":[1,2],"y":[3,4]}
jsonlite::toJSON(data.frame(x = 1:2, y = 3:4))
#> [{"x":1,"y":3},{"x":2,"y":4}]
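Applied to the question's code, a minimal sketch of the fix (the column subset is shortened here, and the df_final values are made-up stand-ins for the question's real data):

```r
library(jsonlite)

# Stand-in for the question's df_final, two rows:
df_final <- data.frame(Provider = c("A", "B"), Site = c("s1", "s2"),
                       July = c(56, 60), stringsAsFactors = FALSE)

# Build the body as a data frame so each row becomes its own JSON object.
reqBody <- data.frame(provName  = df_final$Provider,
                      site      = df_final$Site,
                      monthJuly = df_final$July,
                      stringsAsFactors = FALSE)  # ...remaining columns as before

toJSON(reqBody)  # serializes to an array of row objects
# r <- PUT(fullURL, body = reqBody, encode = "json", content_type_json())
```

Whether the server accepts an array of row objects in a single PUT is an assumption about the API; check its documentation for the expected payload shape.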

How to handle "write.xlsx" error: arguments imply differing number of rows

I'm trying to write an xlsx file from a list of dataframes that I created, but I'm getting an error due to missing data (I couldn't download it). I just want to write the xlsx file despite this missing data. Any help is appreciated.
For replication of the problem:
library(quantmod)
name_of_symbols <- c("AKER","YECO","SNOA")
research_dates <- c("2018-11-19","2018-11-19","2018-11-14")
my_symbols_df <- lapply(name_of_symbols, function(x) tryCatch(getSymbols(x, auto.assign = FALSE),error = function(e) { }))
my_stocks_OHLCV <- list()
for (i in 1:3) {
trade_date <- paste(as.Date(research_dates[i]))
OHLCV_data <- my_symbols_df[[i]][trade_date]
my_stocks_OHLCV[[i]] <- data.frame(OHLCV_data)
}
And you can see the missing data down here in my_stocks_OHLCV[[2]] and the write.xlsx error I'm getting:
print(my_stocks_OHLCV)
[[1]]
AKER.Open AKER.High AKER.Low AKER.Close AKER.Volume AKER.Adjusted
2018-11-19 2.67 3.2 1.56 1.75 15385800 1.75
[[2]]
data frame with 0 columns and 0 rows
[[3]]
SNOA.Open SNOA.High SNOA.Low SNOA.Close SNOA.Volume SNOA.Adjusted
2018-11-14 1.1 1.14 1.01 1.1 107900 1.1
write.xlsx(my_stocks_OHLCV, "C:/Users/MICRO/Downloads/Datasets_stocks/dux_OHLCV.xlsx")
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
  arguments imply differing number of rows: 1, 0
How do I run write.xlsx even though I have this missing data?
The main question you need to ask is, what do you want instead?
As you are working with stock data, the best approach is: if you don't have data for a stock, remove it. Something like this should work:
my_stocks_OHLCV[lapply(my_stocks_OHLCV,nrow)>0]
If you want a row full of NA or 0 instead, use lapply and, for each element of the list with length 0, replace it with NAs, a vector of zeros (c(0,0,0,0,0,0)), etc.
Something like this:
condition <- !lapply(my_stocks_OHLCV,nrow)>0
my_stocks_OHLCV[condition] <- data.frame(rep(NA,6))
Here we define the condition variable, to be the elements in the list where you don't have any data. We can then replace those by NA or swap the NA for 0. However, I can't think of a reason to do this.
A variation on your question, one you could handle inside your for loop, is to check whether you have data and, if you don't, replace the values with NAs; you could also give the row the correct headers, since you know which stock it relates to.
Hope this helps.
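Putting the first suggestion together with the write step, a small sketch (the helper name drop_empty is made up here; write.xlsx is assumed to be openxlsx's, which writes a list of data frames as one sheet each):

```r
# Keep only the non-empty data frames before writing.
drop_empty <- function(dfs) dfs[vapply(dfs, nrow, integer(1)) > 0]

# library(openxlsx)
# write.xlsx(drop_empty(my_stocks_OHLCV), "dux_OHLCV.xlsx")
```

With the question's data, drop_empty() would remove my_stocks_OHLCV[[2]] (the 0-row frame) so write.xlsx no longer sees rows of differing lengths.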

Selecting features from a feature set using mRMRe package

I am a new user of R and am trying to use the mRMRe package (mRMR is a good, well-known feature selection approach) to obtain a feature subset from a feature set. Please excuse me if my question is simple; I really want to know how to fix an error. The details are below.
Suppose, I have a csv file (gene.csv) having feature set of 6 attributes ([G1.1.1.1], [G1.1.1.2], [G1.1.1.3], [G1.1.1.4], [G1.1.1.5], [G1.1.1.6]) and a target class variable [Output] ('1' indicates positive class and '-1' stands for negative class). Here's a sample gene.csv file:
[G1.1.1.1] [G1.1.1.2] [G1.1.1.3] [G1.1.1.4] [G1.1.1.5] [G1.1.1.6] [Output]
11.688312 0.974026 4.87013 7.142857 3.571429 10.064935 -1
12.538226 1.223242 3.669725 6.116208 3.363914 9.174312 1
10.791367 0.719424 6.115108 6.47482 3.597122 10.791367 -1
13.533835 0.37594 6.766917 7.142857 2.631579 10.902256 1
9.737828 2.247191 5.992509 5.992509 2.996255 8.614232 -1
11.864407 0.564972 7.344633 4.519774 3.389831 7.909605 -1
11.931818 0 7.386364 5.113636 3.409091 6.818182 1
16.666667 0.333333 7.333333 4.333333 2 8.333333 -1
I am trying to get best feature subset of 2 attributes (out of above 6 attributes) and wrote following R code.
library(mRMRe)
file_n<-paste0("E:\\gene", ".csv")
df <- read.csv(file_n, header = TRUE)
f_data <- mRMR.data(data = data.frame(df))
featureData(f_data)
mRMR.ensemble(data = f_data, target_indices = 7,
feature_count = 2, solution_count = 1)
When I run this code, I am getting following error for the statement f_data <- mRMR.data(data = data.frame(df)):
Error in .local(.Object, ...) :
data columns must be either of numeric, ordered factor or Surv type
However, the data in each column of the csv file are real numbers. So, how can I change the R code to fix this problem? Also, I am not sure what the value of target_indices should be in the statement mRMR.ensemble(data = f_data, target_indices = 7, feature_count = 2, solution_count = 1), as my target class variable is named "[Output]" in the gene.csv file.
I will appreciate much if anyone can help me to obtain the best feature subset based on the gene.csv file using mRMRe R package.
I solved the problem by modifying my code as follows.
library(mRMRe)
file_n<-paste0("E:\\gene", ".csv")
df <- read.csv(file_n, header = TRUE)
df[[7]] <- as.numeric(df[[7]])
f_data <- mRMR.data(data = data.frame(df))
results <- mRMR.classic("mRMRe.Filter", data = f_data, target_indices = 7,
feature_count = 2)
solutions(results)
It worked fine. The output of the code gives the indices of the selected 2 features.
I think it has to do with your Output column which is probably of class integer. You can check that using class(df[[7]]).
To convert it to numeric, as the error message requires, just type:
df[[7]] <- as.numeric(df[[7]])
That worked for me.
As for the other question, after reading the documentation, setting target_indices = 7 seems the right choice.
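The class issue can be reproduced without the package. A minimal check, assuming the [Output] column came in as integer (read.csv parses whole-number columns that way, while mRMR.data accepts only numeric, ordered factor, or Surv columns):

```r
# What read.csv gives for a whole-number [Output] column:
output <- c(-1L, 1L, -1L, 1L)
class(output)                   # "integer" -> rejected by mRMR.data

# The fix from the answer above:
output_num <- as.numeric(output)
class(output_num)               # "numeric" -> accepted
```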

R - Number Rounding error when copying to Row

Below is how I created an empty data frame I intend to populate 1 row at a time from a data source.
numTweets=31
finalDataFrame = as.data.frame( matrix(NA, numTweets-1, 23), stringsAsFactors=FALSE)
names(finalDataFrame) = c( "TweetID", "TweetTime", "Text", "Source",
"UserID", "Username", "Screenname", "FollowerCount", "FriendCount",
"Location", "Latitude", "Longitude", "ReplyTweetID", "ReplyUserID",
"ReplyScreenname", "RetweetID", "RetweetCreated", "RetweetUsername",
"RetweetScreename", "RetweetLocation", "RetweetFollowers", "RetweetFriends",
"RetweetSource" )
An example of a row I am inserting is below as well
print( thisRow, row.names=FALSE )
TweetID TweetTime Text
877010425019158529 Tue Jun 20 03:49:14 +0000 2017 #OmniDestiny I would recommend trying to find the facebook group for evergreen because i think their school facebook page got shut down.
Source UserID Username Screenname FollowerCount FriendCount Location Latitude Longitude ReplyTweetID ReplyUserID ReplyScreenname RetweetID
Twitter Web Client 843603187298779137 Albert HellhoyZ 4 72 Bellevue, WA 0 0 876742560328417281 4726147296 OmniDestiny NA
RetweetCreated RetweetUsername RetweetScreename RetweetLocation RetweetFollowers RetweetFriends RetweetSource
NA NA NA NA NA NA NA
So, this row looks perfectly fine, and the data frame I have created to store it in looks fine. However, when I try to copy it in...
## Z minus 1 since we started our loop at 2
finalDataFrame[z-1, ] = thisRow
Many values get weird. For example, thisRow displays the ReplyTweetID value (an int64 value) perfectly as 876742560328417281, but when I look at it in R in the finalDataFrame....
finalDataFrame[1, "ReplyTweetID" ]
[1] 0.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000046816189162956993
>
I'm not sure what can cause this drastic change. Any ideas?
EDIT: I'm pretty sure it's due to the value being an int64, which the numeric matrix can't represent. However, is there a way to prepare the matrix for that? I could alternatively do toString( IDVALUEHERE ) when making "thisRow" in the first place, but that seems like it shouldn't be necessary.
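One workaround, sketched under the assumption that the ID fields can be kept as strings (bit64::integer64 is another option): build finalDataFrame with typed columns instead of a numeric matrix, so character ID columns never pass through double coercion. Only a few of the 23 columns are shown.

```r
# Typed columns instead of matrix(NA, ...): character ID fields keep
# their full 64-bit values, numeric fields stay numeric.
finalDataFrame <- data.frame(
  TweetID       = character(30),
  ReplyTweetID  = character(30),
  FollowerCount = numeric(30),
  stringsAsFactors = FALSE
)
finalDataFrame[1, "ReplyTweetID"] <- "876742560328417281"
finalDataFrame[1, "ReplyTweetID"]  # full ID preserved, no precision loss
```

The original as.data.frame(matrix(NA, ...)) call backs every column with the same (logical, then numeric) matrix type, which is why the int64 gets squeezed into a double and mangled.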

Replace all contents of a googlesheet using R googlesheets package?

Just discovered the googlesheets package and find it very helpful thus far. I would now like to be able to replace all or a subset of the contents in an existing sheet.
Example:
> library(googlesheets)
> set.seed(10)
> test1 <- data.frame(matrix(rnorm(10),nrow = 5))
> test1
X1 X2
1 0.01874617 0.3897943
2 -0.18425254 -1.2080762
3 -1.37133055 -0.3636760
4 -0.59916772 -1.6266727
5 0.29454513 -0.2564784
> gs_new("foo_sheet", input = test1, trim = TRUE)
This creates a new sheet as expected. Let's say that we then need to update the sheet (this data is used for a shinyapps.io hosted shiny app, and I would prefer to not have to redeploy the app in order to change sheet references).
> test1$X2 <- NULL
> test1
X1
1 0.01874617
2 -0.18425254
3 -1.37133055
4 -0.59916772
5 0.29454513
I tried to simply overwrite with gs_new() but run into the following warning message:
> gs_new("foo_sheet", input = test1, trim = TRUE)
Warning message:
At least one sheet matching "foo_sheet" already exists, so you may
need to identify by key, not title, in future.
This results in a new sheet foo_sheet being created with a new key, but does not replace the existing sheet and will therefore produce a key error if we try to register the updated sheet with
gs_title("foo_sheet")
Error in gs_lookup(., "sheet_title", verbose) :
"foo_sheet" matches sheet_title for multiple sheets returned by gs_ls() (which should reflect user's Google Sheets home screen). Suggest you identify this sheet by unique key instead.
This means that if we later try to access the new sheet foo_sheet with gs_read("foo_sheet"), the API will return the original sheet, rather than the new one.
> df <- gs_read("foo_sheet")
> df
X1 X2
1 0.01874617 0.3897943
2 -0.18425254 -1.2080762
3 -1.37133055 -0.3636760
4 -0.59916772 -1.6266727
5 0.29454513 -0.2564784
It is my understanding that one possible solution could be to first delete the sheet with gs_delete(gs_title("foo_sheet")) and then create a new one. Alternatively one could perhaps empty cells with gs_edit_cells(), but I was hoping for some form of overwrite function.
Thanks in advance!
I find that the edit cells function is a good workaround:
gs_edit_cells(ss = "foo_sheet", ws = "worksheet name", input = test1, anchor = "A1", trim = TRUE, col_names = TRUE)
By anchoring the data to the upper-left corner, you can effectively overwrite all other data. The trim argument will eliminate all cells that are not being updated.