Replace all contents of a Google Sheet using the R googlesheets package?

Just discovered the googlesheets package and find it very helpful thus far. I would now like to be able to replace all or a subset of the contents in an existing sheet.
Example:
> library(googlesheets)
> set.seed(10)
> test1 <- data.frame(matrix(rnorm(10),nrow = 5))
> test1
           X1         X2
1  0.01874617  0.3897943
2 -0.18425254 -1.2080762
3 -1.37133055 -0.3636760
4 -0.59916772 -1.6266727
5  0.29454513 -0.2564784
> gs_new("foo_sheet", input = test1, trim = TRUE)
This creates a new sheet as expected. Let's say that we then need to update the sheet (this data is used for a shinyapps.io hosted shiny app, and I would prefer to not have to redeploy the app in order to change sheet references).
> test1$X2 <- NULL
> test1
           X1
1  0.01874617
2 -0.18425254
3 -1.37133055
4 -0.59916772
5  0.29454513
I tried to simply overwrite with gs_new() but ran into the following warning message:
> gs_new("foo_sheet", input = test1, trim = TRUE)
Warning message:
At least one sheet matching "foo_sheet" already exists, so you may
need to identify by key, not title, in future.
This results in a new sheet foo_sheet being created with a new key, but does not replace the existing sheet and will therefore produce a key error if we try to register the updated sheet with
gs_title("foo_sheet")
Error in gs_lookup(., "sheet_title", verbose) :
"foo_sheet" matches sheet_title for multiple sheets returned by gs_ls() (which should reflect user's Google Sheets home screen). Suggest you identify this sheet by unique key instead.
This means that if we later try to access the new sheet foo_sheet with gs_read("foo_sheet"), the API will return the original sheet, rather than the new one.
> df <- gs_read("foo_sheet")
> df
           X1         X2
1  0.01874617  0.3897943
2 -0.18425254 -1.2080762
3 -1.37133055 -0.3636760
4 -0.59916772 -1.6266727
5  0.29454513 -0.2564784
It is my understanding that one possible solution could be to first delete the sheet with gs_delete("foo_sheet") and then create a new one. Alternatively, one could perhaps empty the cells with gs_edit_cells(), but I was hoping for some form of overwrite function.
Thanks in advance!

I find that the gs_edit_cells() function is a good workaround:
gs_edit_cells(ss = "foo_sheet", ws = "worksheet name", input = test1, anchor = "A1", trim = TRUE, col_names = TRUE)
By anchoring the data to the upper-left corner, you can effectively overwrite all other data. The trim argument will remove all cells that are not being updated.
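A fuller sketch of that overwrite step, assuming the sheet is first registered and that the worksheet still has the default name "Sheet1" given by gs_new():
library(googlesheets)

# Register the existing sheet; use gs_key() instead if several sheets
# share the title "foo_sheet"
foo_ss <- gs_title("foo_sheet")

# Overwrite from the top-left corner and trim away any leftover cells
foo_ss <- gs_edit_cells(ss = foo_ss, ws = "Sheet1", input = test1,
                        anchor = "A1", col_names = TRUE, trim = TRUE)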

Related

Write values from R to a PostgreSQL table based on Row IDs

I have a PostgreSQL table Scores on a local server that looks like this:
ID Score_X Score_Y
1 NA NA
2 NA NA
3 NA NA
4 NA NA
I do a series of calculations in R that produces a dataframe Calc_Scores that looks like this:
ID Score_X Score_Y
1 0.53 0.81
4 0.75 0.95
I would like to write the scores that correspond with each ID from R to the PostgreSQL table such that the final PostgreSQL table should look like this:
ID Score_X Score_Y
1 0.53 0.81
2 NA NA
3 NA NA
4 0.75 0.95
I have a connection to the PostgreSQL table called connection, which I set up using dbConnect(). The actual tables are quite big. What code in R could I use to write these scores to the PostgreSQL table? I have been looking for a similar question but couldn't find anything. I have tried
dbWriteTable(connection, "Scores", value = Calc_Scores, overwrite=T, append = F, row.names = F)
However, the entire table gets overwritten. I want only the scores to be updated.
Thank you.
Creating a temporary table could be an option:
# Create temporary table
dbWriteTable(connection, "ScoresTmp", value = Calc_Scores, overwrite=T, append = F, row.names = F)
# Update main table
dbExecute(connection,"
UPDATE Scores
SET Score_X = ScoresTmp.Score_X,
Score_Y = ScoresTmp.Score_Y
FROM ScoresTmp
WHERE Scores.ID = ScoresTmp.ID
")
# Clean up
dbExecute(connection,"DROP TABLE ScoresTmp")
Note that you should be able to create a real temporary table using the temporary = TRUE option; according to #Sirius's comment below, this works on a PostgreSQL database.
For users of a SQL Server database, this option doesn't work, but they can use the # prefix to create a temporary table.
In the example above, this would be:
dbWriteTable(connection, "#ScoresTmp", value = Calc_Scores, overwrite=T, append = F, row.names = F)
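For PostgreSQL, a sketch of the same call with temporary = TRUE, assuming a DBI backend such as RPostgres whose dbWriteTable() exposes that argument:
# Session-scoped temporary table; it is dropped automatically when the connection closes
dbWriteTable(connection, "ScoresTmp", value = Calc_Scores,
             temporary = TRUE, overwrite = TRUE, row.names = FALSE)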
One way of doing this relies on the SQL 'update' and in essence you do
- open a connection to your database
- loop over your changeset and for each row
- form the update statement, for example via
cmd <- paste('UPDATE Scores SET Score_X =', Score_X,
             ', Score_Y =', Score_Y, 'WHERE ID =', ID)
- submit the cmd via e.g. `dbSendQuery`
- close the connection
There are examples in RPostgreSQL.
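A minimal sketch of that loop, assuming the connection and Calc_Scores objects from the question and an RPostgres-style backend (which uses $1 placeholders with dbExecute()):
library(DBI)

# Loop over the changeset and update one row at a time;
# parameterized statements avoid hand-built SQL strings and quoting issues
for (i in seq_len(nrow(Calc_Scores))) {
  dbExecute(connection,
            'UPDATE "Scores" SET "Score_X" = $1, "Score_Y" = $2 WHERE "ID" = $3',
            params = list(Calc_Scores$Score_X[i],
                          Calc_Scores$Score_Y[i],
                          Calc_Scores$ID[i]))
}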

Need to use jsonlite to handle ndjson message list using stream_in() and stream_out()

I have an ndjson data source. For a simple example, consider a text file with three lines, each containing a valid json message. I want to extract 7 variables from the messages and put them in a dataframe.
Please use the following sample data in a text file. You can paste this data into a text editor and save it as "ndjson_sample.txt"
{"ts":"1","ct":"{\"Var1\":6,\"Var2\":6,\"Var3\":-70,\"Var4\":12353,\"Var5\":1,\"Var6\":\"abc\",\"Var7\":\"x\"}"}
{"ts":"2","ct":"{\"Var1\":6,\"Var2\":6,\"Var3\":-68,\"Var4\":4528,\"Var5\":1,\"Var6\":\"def\",\"Var7\":\"y\"}"}
{"ts":"3","ct":"{\"Var1\":6,\"Var2\":6,\"Var3\":-70,\"Var4\":-5409,\"Var5\":1,\"Var6\":\"ghi\",\"Var7\":\"z\"}"}
The following three lines of code accomplish what I want to do:
file1 <- "ndjson_sample.txt"
json_data1 <- ndjson::stream_in(file1)
raw_df_temp1 <- as.data.frame(ndjson::flatten(json_data1$ct))
For reasons I won't get into, I cannot use the ndjson package. I must find a way to use the jsonlite package to do the same thing using the stream_in() and stream_out() functions. Here's what I tried:
con_in1 <- file(file1, open = "rt")
con_out1 <- file(tmp <- tempfile(), open = "wt")
callback_func <- function(df){
  jsonlite::stream_out(df, con_out1, pagesize = 1)
}
jsonlite::stream_in(con_in1, handler = callback_func, pagesize = 1)
close(con_out1)
con_in2 <- file(tmp, open = "rt")
raw_df_temp2 <- jsonlite::stream_in(con_in2)
This is not giving me the same data frame as a final output. Can you tell me what I'm doing wrong and what I have to change to make raw_df_temp1 equal raw_df_temp2?
I could potentially solve this with the fromJSON() function operating on each line of the file, but I'd like to find a way to do it with the stream functions. The files I will be dealing with are quite large, so efficiency will be key. I need this to be as fast as possible.
Thank you in advance.
Currently, under ct you'll find a string that can subsequently be fed to fromJSON on its own, but stream_in will not parse it for you. Ignoring your stream_out(stream_in(...),...) test, here are a couple of ways to read it in:
library(jsonlite)
json <- stream_in(file('ds_guy.ndjson'), simplifyDataFrame=FALSE)
# opening file input connection.
# Imported 3 records. Simplifying...
# closing file input connection.
cbind(
  ts = sapply(json, `[[`, "ts"),
  do.call(rbind.data.frame, lapply(json, function(a) fromJSON(a$ct)))
)
# ts Var1 Var2 Var3 Var4 Var5 Var6 Var7
# 1 1 6 6 -70 12353 1 abc x
# 2 2 6 6 -68 4528 1 def y
# 3 3 6 6 -70 -5409 1 ghi z
Calling fromJSON on each string can be cumbersome, and with larger data that slow-down is exactly why stream_in exists, so if we can capture the "ct" component into a stream of its own, then ...
writeLines(sapply(json, `[[`, "ct"), 'ds_guy2.ndjson')
(There are far-more-efficient ways to do this with non-R tools, including perhaps a simple
sed -e 's/.*"ct":"\({.*\}\)"}$/\1/g' -e 's/\\"/"/g' ds_guy.ndjson > ds_guy.ndjson2
though this makes a few assumptions about the data that may not be perfectly safe. A better solution would be to use jq, which should "always" correctly-parse proper json, then a quick sed to replace escaped quotes:
jq '.ct' ds_guy.ndjson | sed -e 's/\\"/"/g' > ds_guy2.ndjson
and you can do that with system(...) in R if needed.)
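For example, a small sketch of that system() call (assuming jq and sed are available on the PATH):
# Extract the nested "ct" strings with jq, un-escape the quotes with sed,
# and write them to a new ndjson file
system("jq '.ct' ds_guy.ndjson | sed -e 's/\\\\\"/\"/g' > ds_guy2.ndjson")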
From there, under the assumption that each line will contain exactly one row of data.frame data:
json2 <- stream_in(file('ds_guy2.ndjson'), simplifyDataFrame=TRUE)
# opening file input connection.
# Imported 3 records. Simplifying...
# closing file input connection.
cbind(ts=sapply(json, `[[`, "ts"), json2)
# ts Var1 Var2 Var3 Var4 Var5 Var6 Var7
# 1 1 6 6 -70 12353 1 abc x
# 2 2 6 6 -68 4528 1 def y
# 3 3 6 6 -70 -5409 1 ghi z
NB: in the first example, "ts" is a factor, all others are character because that's what fromJSON gives. In the second example, all strings are factor. This can easily be addressed through judicious use of stringsAsFactors=FALSE, depending on your needs.
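For completeness, a sketch that stays entirely in R by parsing the nested ct strings inside a stream_in() handler; this is hypothetical helper code, assuming the ndjson_sample.txt file from the question:
library(jsonlite)

chunks <- list()
stream_in(file("ndjson_sample.txt"), pagesize = 500,
          handler = function(df) {
            # df arrives with character columns ts and ct;
            # parse each nested JSON string and bind the pieces together
            parsed <- do.call(rbind.data.frame, lapply(df$ct, fromJSON))
            chunks[[length(chunks) + 1]] <<- cbind(ts = df$ts, parsed)
          })
raw_df_temp2 <- do.call(rbind, chunks)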

Selecting features from a feature set using mRMRe package

I am a new user of R and am trying to use the mRMRe R package (mRMR is a well-known feature selection approach) to obtain a feature subset from a feature set. Please excuse me if my question is simple; I really want to know how I can fix an error. Below are the details.
Suppose I have a csv file (gene.csv) with a feature set of 6 attributes ([G1.1.1.1], [G1.1.1.2], [G1.1.1.3], [G1.1.1.4], [G1.1.1.5], [G1.1.1.6]) and a target class variable [Output] ('1' indicates the positive class and '-1' the negative class). Here's a sample gene.csv file:
[G1.1.1.1] [G1.1.1.2] [G1.1.1.3] [G1.1.1.4] [G1.1.1.5] [G1.1.1.6] [Output]
11.688312 0.974026 4.87013 7.142857 3.571429 10.064935 -1
12.538226 1.223242 3.669725 6.116208 3.363914 9.174312 1
10.791367 0.719424 6.115108 6.47482 3.597122 10.791367 -1
13.533835 0.37594 6.766917 7.142857 2.631579 10.902256 1
9.737828 2.247191 5.992509 5.992509 2.996255 8.614232 -1
11.864407 0.564972 7.344633 4.519774 3.389831 7.909605 -1
11.931818 0 7.386364 5.113636 3.409091 6.818182 1
16.666667 0.333333 7.333333 4.333333 2 8.333333 -1
I am trying to get the best feature subset of 2 attributes (out of the above 6 attributes) and wrote the following R code.
library(mRMRe)
file_n<-paste0("E:\\gene", ".csv")
df <- read.csv(file_n, header = TRUE)
f_data <- mRMR.data(data = data.frame(df))
featureData(f_data)
mRMR.ensemble(data = f_data, target_indices = 7,
              feature_count = 2, solution_count = 1)
When I run this code, I get the following error for the statement f_data <- mRMR.data(data = data.frame(df)):
Error in .local(.Object, ...) :
data columns must be either of numeric, ordered factor or Surv type
However, the data in each column of the csv file are real numbers. So, how can I change the R code to fix this problem? Also, I am not sure what the value of target_indices should be in the statement mRMR.ensemble(data = f_data, target_indices = 7, feature_count = 2, solution_count = 1), as my target class variable is named "[Output]" in the gene.csv file.
I would much appreciate it if anyone could help me obtain the best feature subset from the gene.csv file using the mRMRe R package.
I solved the problem by modifying my code as follows.
library(mRMRe)
file_n<-paste0("E:\\gene", ".csv")
df <- read.csv(file_n, header = TRUE)
df[[7]] <- as.numeric(df[[7]])
f_data <- mRMR.data(data = data.frame(df))
results <- mRMR.classic("mRMRe.Filter", data = f_data, target_indices = 7,
                        feature_count = 2)
solutions(results)
It worked fine. The output of the code gives the indices of the selected 2 features.
I think it has to do with your Output column which is probably of class integer. You can check that using class(df[[7]]).
To convert it to numeric as the error message requires, just type:
df[[7]] <- as.numeric(df[[7]])
That worked for me.
As for the other question, after reading the documentation, setting target_indices = 7 seems the right choice.
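A small sketch of that check applied to the whole data frame at once (same df as above):
# Inspect the class of every column
sapply(df, class)

# Convert any integer columns to numeric so mRMR.data() accepts them
df[] <- lapply(df, function(col) if (is.integer(col)) as.numeric(col) else col)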

R: trouble assigning values to a dynamic variable in a dataframe

I am trying to assign values to a dataframe variable defined by the user. The user specifies the name of the variable, let's call this x, in the dataframe df. For simplicity I want to assign a value of 3 to everything in the column the user specifies. The simplified code is:
variableName <- paste("df$", x, sep="")
eval(parse(text=variableName)) <- 3
But I get an error:
Error in file(filename, "r") : cannot open the connection
In addition: Warning message:
In file(filename, "r") :
cannot open file 'df$x': No such file or directory
I've tried all kinds of remedies to no avail. If I simply try to print the values of the column:
eval(parse(text=variableName))
I get no errors and it prints out ok. It's only when I try to give that column a value that I get the error. Any help would be appreciated.
I believe the issue is that there is no way to use the result of eval() on the LHS of an assignment.
df = data.frame(foo = 1:5,
                bar = -3)
x = "bar"
variableName <- paste("df$", x, sep="")
eval(parse(text=variableName)) <- 3
#> Warning in file(filename, "r"): cannot open file 'df$bar': No such file or
#> directory
#> Error in file(filename, "r"): cannot open the connection
## This error is a bit misleading. Breaking it apart I get a different error.
eval(expression(df$bar)) <- 3
#> Error in eval(expression(df$bar)) <- 3: could not find function "eval<-"
## And it works if you put it all in the string to be parsed.
ex1 <- paste0("df$", x, "<-3")
eval(parse(text=ex1))
df
#> foo bar
#> 1 1 3
#> 2 2 3
#> 3 3 3
#> 4 4 3
#> 5 5 3
## But I doubt that's the best way to do it!
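For comparison, a minimal sketch of the more idiomatic route, which avoids parse()/eval() entirely by indexing with the column name string (same df and x as above):
# Assign directly to the user-specified column by name
df[[x]] <- 3
# or equivalently
df[x] <- 3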

Missing `parse` information inside vignette build

Goal
The goal is to create a package that parses R scripts and lists the functions they use (functions from the package itself - as mvbutils does - but also from imports).
Function
The main function relies on parsing R script with
d<-getParseData(x = parse(text = deparse(x)))
Reproducible code
For example in an interactive R session the output of
x<-test<-function(x){x+1}
d<-getParseData(x = parse(text = deparse(x)))
Has the following first few lines:
   line1 col1 line2 col2 id parent          token terminal     text
23     1    1     4    1 23      0           expr    FALSE
1      1    1     1    8  1     23       FUNCTION     TRUE function
2      1   10     1   10  2     23            '('     TRUE        (
3      1   11     1   11  3     23 SYMBOL_FORMALS     TRUE        x
4      1   12     1   12  4     23            ')'     TRUE        )
Error
When building a vignette with knitr that contains this chunk of code - either with Knit HTML from RStudio or with devtools::build_vignettes - the output is NULL. On the other hand, using knitr::knit inside an interactive R session gives the correct output.
Questions:
Is there a reason for the parser to behave differently inside the knit function/environment, and is there a way to bypass this?
Update
Changing the code to:
x<-test<-function(x){x+1}
d<-getParseData(x = parse(text = deparse(x), keep.source = TRUE))
Fixes the issue, but this does not answer the question of why the same function behaves differently.
From the help page ?options:
keep.source:
When TRUE, the source code for functions (newly defined or loaded) is stored internally allowing comments to be kept in the right places. Retrieve the source by printing or using deparse(fn, control = "useSource").
The default is interactive(), i.e., TRUE for interactive use.
When building the vignette, you are running a non-interactive R session, so the source code is discarded in parse().
parse(file = "", n = NULL, text = NULL, prompt = "?",
keep.source = getOption("keep.source"), srcfile,
encoding = "unknown")
