How to write the contents of a data frame back to a range in R?

I need to perform the following sequence:
1. Open the Excel workbook
2. Read a specific worksheet into an R data frame
3. Read from a database, updating the data frame
4. Write the data frame back to the worksheet
I have steps 1-3 working OK using the BERT tool (the R scripting interface for Excel).
For step 2 I use range.to.data.frame from BERT.
Any pointers on how to perform step 4? There is no data.frame.to.range.
I tried range$put_Value(df), but it returns no error and makes no update to Excel.
I can update a single cell from R using put_Value - which I cannot see documented.
#
# manipulate status data using the R BERT tool
#
wb <- EXCEL$Application$get_ActiveWorkbook()
wbname <- wb$get_FullName()
ws <- EXCEL$Application$get_ActiveSheet()
topleft <- ws$get_Range( "a1" )
rng <- topleft$get_CurrentRegion()
#rngbody <- rng$get_Offset(1,0)
ssot <- rng$get_Value()
ssotdf <- range.to.data.frame( ssot, headers=T )
# emulate data update on 2 columns
ssotdf$ServerStatus <- "Disposed"
ssotdf$ServerID <- -1
# try to write the data frame back
retcode <- rng$put_Value(ssotdf)

This answer doesn't use R Excel BERT.
Try the openxlsx library. You can probably do all the steps using that library. For step 4, after installing openxlsx, the following code will write a file:
openxlsx::write.xlsx(ssotdf, 'Dataframe.xlsx', asTable = TRUE)
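A minimal sketch of the whole read-modify-write round trip with openxlsx (the file name, sheet name, and column edits are assumptions for illustration):
library(openxlsx)
wb <- loadWorkbook("status.xlsx")          # step 1: open the workbook
df <- read.xlsx(wb, sheet = "Servers")     # step 2: read the worksheet into a data frame
df$ServerStatus <- "Disposed"              # step 3: emulate the database update
df$ServerID <- -1
writeData(wb, sheet = "Servers", x = df)   # step 4: write the data frame back
saveWorkbook(wb, "status.xlsx", overwrite = TRUE)
Note that this works on the saved file rather than on the live Excel session that BERT gives you.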

I think your problem is that you are not changing the size of the range, so you are not going to see your new columns. Try creating a new range that has two extra columns before you insert the data; see the sketch below.
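A sketch of that idea, assuming BERT exposes Excel's Resize property as get_Resize in the same way it exposes get_Offset and get_CurrentRegion above (an unverified assumption):
# resize the target range to match the data before writing
m <- as.matrix(ssotdf)
bigrng <- topleft$get_Resize( nrow(m), ncol(m) )  # get_Resize is assumed, per Excel's COM Resize
retcode <- bigrng$put_Value(m)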

I just had the same question and was able to resolve it by transforming the data.frame to a matrix in the call to put_Value. I figured this out after playing with the old version in excel-functions.r. Try something like:
retcode = rng$put_Value(as.matrix(ssotdf))

You may have already solved your problem but, if not, the following stripped-down R function does what I think you need:
testDF <- function(rng1, rng2){
  app <- EXCEL$Application
  ref1 <- app$get_Range( rng1 )   # get source range reference
  data <- ref1$get_Value()        # get source range data
  ref2 <- app$get_Range( rng2 )   # get destination range reference
  ref2$put_Value( data )          # put data in destination range
}
I simulated a data frame by setting values in range "D4:F6" of the spreadsheet to:
col1 col2 col3
1 2 txt1
7 3 txt2
then ran
testDF("D4:F6","H10:J12")
in the BERT console. The data frame then appears in range "H10:J12".

Related

Importing Excel-tables in R

Is there a way to import a named Excel table into R as a data.frame?
I typically have several named Excel tables on a single worksheet that I want to import as data.frames, without relying on static row and column references for the location of the Excel tables.
I have tried setting namedRegion, which is an available argument for several Excel-import functions, but that does not seem to work for named Excel tables. I am currently using the openxlsx package, which has a function getTables() that creates a variable with the Excel table names from a single worksheet, but not the data in the tables.
Getting your named table takes a little bit of work.
First you need to load the workbook.
library(openxlsx)
wb <- loadWorkbook("name_excel_file.xlsx")
Next you need to extract the name of your named table.
# get the name and the range
tables <- getTables(wb = wb, sheet = 1)
If you have multiple named tables they are all in tables. My named table is called Table1.
Next you need to extract the column numbers and row numbers, which you will later use to extract the named table from the Excel file.
# get the range
table_range <- names(tables[tables == "Table1"])
table_range_refs <- strsplit(table_range, ":")[[1]]
# use a regex to extract out the row numbers
table_range_row_num <- gsub("[^0-9.]", "", table_range_refs)
# extract out the column numbers
table_range_col_num <- convertFromExcelRef(table_range_refs)
Now you re-read the Excel file with the cols and rows parameters.
# finally read it
my_df <- read.xlsx(xlsxFile = "name_excel_file.xlsx",
                   sheet = 1,
                   cols = table_range_col_num[1]:table_range_col_num[2],
                   rows = table_range_row_num[1]:table_range_row_num[2])
You end up with a data frame with only the content of your named table.
I used this a while ago. I found this code somewhere, but I no longer know where.
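Wrapped together, the steps above make a reusable helper (a sketch; the function name is mine, and it assumes exactly one table with the given name on the sheet):
library(openxlsx)
read_named_table <- function(file, sheet, table_name) {
  wb <- loadWorkbook(file)
  tables <- getTables(wb = wb, sheet = sheet)
  table_range <- names(tables[tables == table_name])
  refs <- strsplit(table_range, ":")[[1]]
  rows <- as.numeric(gsub("[^0-9.]", "", refs))  # row numbers of the table range
  cols <- convertFromExcelRef(refs)              # column numbers of the table range
  read.xlsx(xlsxFile = file, sheet = sheet,
            rows = rows[1]:rows[2], cols = cols[1]:cols[2])
}
my_df <- read_named_table("name_excel_file.xlsx", 1, "Table1")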
This link might be useful for you:
https://stackoverflow.com/a/17709204/10235327
1. Install XLConnect package
2. Save a path to your file in a variable
3. Load workbook
4. Save your data to df
To get the table names you can use the function
getTables(wb, sheet = 1, simplify = TRUE)
where:
wb - your workbook
sheet - sheet name, or the sheet number works as well
simplify = TRUE (default) - the result is simplified to a vector
https://rdrr.io/cran/XLConnect/man/getTables-methods.html
Here's the code (not mine, copied from the topic above, just modified a bit):
require(XLConnect)
sampleFile = "C:/Users/your.name/Documents/test.xlsx"
wb = loadWorkbook(sampleFile)
myTable <- getTables(wb, sheet = 1)
df <- readTable(wb, sheet = 1, table = myTable)
You can check the following packages (note that in the xlsx package the sheet is selected with sheetIndex, not sheet):
library(xlsx)
Data <- read.xlsx('YourFile.xlsx', sheetIndex = 1)
library(readxl)
Data <- read_excel('YourFile.xlsx', sheet = 1)
Both options allow you to define specific regions to load the data into R.
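For example (a sketch; the argument names are per each package's documentation, but the concrete ranges are made up):
library(xlsx)
# rowIndex/colIndex restrict the read to specific rows and columns
Data <- read.xlsx('YourFile.xlsx', sheetIndex = 1,
                  rowIndex = 2:10, colIndex = 4:6)
library(readxl)
# range takes an Excel-style range, optionally prefixed with the sheet name
Data <- read_excel('YourFile.xlsx', range = "Sheet1!D2:F10")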
I use read.xlsx from package openxlsx. For example:
library(openxlsx)
fileA <- paste0(some.directory,'excel.file.xlsx')
A <- read.xlsx(fileA, startRow = 3)
hope it helps

Subset large .csv file at reading in R

I have a very large .csv file (~4 GB) which I'd like to read, then subset.
The problem comes at the reading step (memory allocation error): at that size, reading crashes, so what I'd like is a way to subset the file before or while reading it, so that it only gets the rows for one city (Cambridge).
f:
id City Value
1 London 17
2 Coventry 21
3 Cambridge 14
......
I've already tried the usual approaches:
f <- read.csv(f, stringsAsFactors = FALSE, header = TRUE, nrows = 100)
f.colclass <- sapply(f, class)
f <- read.csv(f, sep = ",", nrows = 3000000, stringsAsFactors = FALSE,
              header = TRUE, colClasses = f.colclass)
which seem to work for up to 1-2M rows, but not for the whole file.
I've also tried subsetting while reading itself, using pipe:
f <- read.table(pipe('grep "Cambridge" f.csv'), sep = ",",
                colClasses = f.colclass, stringsAsFactors = FALSE)
and this also seems to crash.
I thought the sqldf or data.table packages would have something for this, but no success yet.
Thanks in advance, p.
I think this was alluded to already, but just in case it wasn't completely clear: the sqldf package creates a temporary SQLite DB on your machine based on the csv file and allows you to write SQL queries to perform subsets of the data before saving the results to a data.frame.
library(sqldf)
query_string <- "select * from file where City=='Cambridge' "
f <- read.csv.sql(file = "f.csv", sql = query_string)
# or, rather than saving all of the raw data in f, you may want to perform a sum
f_sum <- read.csv.sql(file = "f.csv",
                      sql = "select sum(Value) from file where City=='Cambridge' ")
One solution to this type of error is to convert your csv file to an Excel file first.
Then you can map your Excel file into a MySQL table using Toad for MySQL; it is easy, just check the data types of the variables.
Then, using the RODBC package, you can access such a large dataset; see the sketch below.
I work with datasets of more than 20 GB this way.
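A minimal sketch of the RODBC step (the DSN, credentials, and table name are assumptions for illustration):
library(RODBC)
# connect through an ODBC data source pointing at the MySQL database
ch <- odbcConnect("my_mysql_dsn", uid = "user", pwd = "password")
# let the database do the subsetting instead of R
f <- sqlQuery(ch, "SELECT * FROM mytable WHERE City = 'Cambridge'")
odbcClose(ch)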
Although there's nothing wrong with the existing answers, they miss the most conventional/common way of dealing with this: chunks (here's an example from one of the multitude of similar questions/answers).
The only difference is that, unlike most of the answers, which load the whole file, you read it chunk by chunk and keep only the subset you need at each iteration.
# open connection to file (mostly convenience)
file_location = "C:/users/[insert here]/..."
file_name = 'name_of_file_i_wish_to_read.csv'
con <- file(paste(file_location, file_name, sep = '/'), "r")
# set chunk size - basically want to make sure it's small enough that
# your RAM can handle it
chunk_size = 1000 # the larger the chunk the more RAM it'll take, but the faster it'll go
i = 0 # set i to 0 as it'll increase as we loop through the chunks
# loop through the chunks and select rows that contain Cambridge
repeat {
  # things to do only on the first read-through
  if(i==0){
    # read the header only on the first go
    tmp_chunk = read.csv(con, nrows = chunk_size, header = TRUE)
    # subset only to desired criteria
    cond = tmp_chunk[,'City'] == "Cambridge"
    # initiate container for desired data
    df = tmp_chunk[cond,]   # save desired subset in initial container
    cols = colnames(df)     # save column names to re-use on next chunks
  }
  # things to do on all subsequent non-first chunks
  else if(i>0){
    # read.csv errors out (rather than returning 0 rows) once the connection
    # is exhausted, so catch that as the stopping criterion for the loop
    tmp_chunk = tryCatch(
      read.csv(con, nrows = chunk_size, header = FALSE, col.names = cols),
      error = function(e) NULL)
    if(is.null(tmp_chunk) || nrow(tmp_chunk)==0){break}
    # subset only to desired criteria
    cond = tmp_chunk[,'City'] == "Cambridge"
    # append to existing dataframe
    df = rbind(df, tmp_chunk[cond,])
  }
  # add 1 to i so the first-read-only steps are skipped from now on
  i=i+1
}
close(con) # close connection
# check out the results
head(df)

Read Multiple ncdf files and make average in R

Using R, I am trying to open my NetCDF data, which contains a 5-dimensional space with 15 variables (the variable used for calculation is a 1000x920 matrix).
This problem actually looks much like other questions asked before.
I got explanations from here and from other sources.
At first I used the RNetCDF package, but after some trials I found inconsistencies in how the package read my data; things went better after I switched to the ncdf package.
There is no problem opening the data in a single file, but when I try to loop over more than a hundred files inside a folder for a specific variable (for example: variable no. 15), the program fails.
days = formatC(001:004, width = 3, flag = "0")
ncfiles = lapply(days,
                 function(d){ filename = paste("data", d, ".nc", sep = "")
                              open.ncdf(filename) })
It also fails when I try a command like this for a specific variable:
sapply(ncfiles, function(file){ get.var.ncdf(file, "var15") })
So my question is: is there a solution for reading all the NetCDF files, extracting a particular variable, and then doing the calculation in one frame? With the earlier solutions I failed to extract variable no. 15 from the whole NetCDF dataset.
Thanks for any solution to this problem.
UPDATE:
This is the latest thing I have done. When I write
library(ncdf)
files = list.files("allnc/", pattern = '*nc', full.names = TRUE)
for(i in seq_along(files)) {
  nc <- lapply(files[i], open.ncdf)
  lw = get.var.ncdf(nc, "var15")
  x = dim(lw)
  rbind(df, data.frame(lw)) -> df
}
I can get at all the NetCDF data via nc.
So how can I get the variable data with new names automatically, like lw1, lw2, ...etc.?
I cannot apply
var1 <- lapply(files, FUN = get.var.ncdf, variable = "var15")
so that I can then do the calculation with all the data.
The other technique I tried used the RNetCDF package and a loop:
# Declare data frame
df = NULL
# Open all files
files = list.files("allnc/", pattern = '*.nc', full.names = TRUE)
# Loop over files
for(i in seq_along(files)) {
  nc = open.nc(files[i])
  # Read the whole nc file and read the length of the varying dimension
  # (here, the 3rd dimension, specifically time)
  lw = var.get.nc(nc, 'DBZH')
  x = dim(lw)
  # Vary the time dimension for each file as required
  lw = var.get.nc(nc, 'var15')
  # Add the values from each file to a single data.frame
}
With this I can get the variable data, but I only end up with the data from one of my nc files.
Note: my data files are named like data20150102001.nc, data20150102002.nc, ...etc.
This solution uses NCO, not R. You may use it to check your R solution:
ncra -v var15 data20150102*.nc out.nc
That is all.
Full documentation is in the NCO User Guide.
You can use the ensemble statistics capabilities of CDO, but note that on some systems the number of files is limited to 256:
cdo ensmean data20150102*.nc ensmean.nc
You can replace "mean" with the statistic of your choice: max, std, var, min, etc.
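For a pure-R cross-check of the same ensemble mean, here is a sketch using the ncdf package from the question (it assumes the variable really is named var15 and has identical dimensions in every file; in the newer ncdf4 package the equivalents are nc_open, ncvar_get and nc_close):
library(ncdf)
files <- list.files("allnc/", pattern = "\\.nc$", full.names = TRUE)
# read var15 out of every file into a list of arrays
lw_list <- lapply(files, function(f) {
  nc <- open.ncdf(f)
  lw <- get.var.ncdf(nc, "var15")
  close.ncdf(nc)
  lw
})
# element-wise mean across all files
lw_mean <- Reduce(`+`, lw_list) / length(lw_list)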

R not recognizing Excel cells populated with Bloomberg API code

I have built a dynamically updating spreadsheet in Excel with the BBG add-in that pulls price data using the BBG API. I am trying to pull a table from that sheet into R and create a simple scatterplot using the below code:
library(XLConnect)   # loadWorkbook/readWorksheet below come from XLConnect
wb <- loadWorkbook("Fx Vol Framework.xlsx")
data <- readWorksheet(wb,sheet = "Carry", region = "AL40:AN68",header=TRUE, rownames = 1)
plot(data,ylim = c(-2,12))
with(data,text(data, labels = row.names(data), pos = 1))
reg1 <- lm(data[,2]~data[,1])
abline(reg1)
The region I am calling (AL40:AN68) is populated with results from an HLOOKUP formula that calls from cells with BBG API. When I run the code, I get the below error (repeats the same error text for each cell):
There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: Error when trying to evaluate cell AM41 - Name '_xll.BDP' is completely unknown in the current workbook
If I go back to the Excel sheet and populate that same region AL40:AN68 with numeric values (copy -> paste values), save the workbook, and run the same code, I get the scatterplot I was expecting from the original code. Is there any way for me to get the scatterplot using the cells with the Bloomberg API, or do I need to run it with simple numeric values? Do I need the Bbg package for this to work? Thank you.
A simpler approach for other users now might be to use the Rblpapi package from CRAN to connect to the Bloomberg API directly.
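For example, a minimal sketch with Rblpapi (the ticker and field are illustrative, and it needs a running Bloomberg session on the machine):
library(Rblpapi)
blpConnect()   # connect to the local Bloomberg session
# pull the same reference data the BDP() cells compute in Excel
px <- bdp("EURUSD Curncy", "PX_LAST")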
I'm not familiar with the BBG add-in, but it seems like after calling readWorksheet you would want to call a function that actually opens the workbook. I think that would, in a sense, "complete the binding". At any rate, I sometimes need to pass data between R and Excel. Here is how I'd tackle the problem using the package RDCOMClient.
R Code:
library(RDCOMClient)
exB <- COMCreate("Excel.Application")
book <- exB$Workbooks()$Open("C:/the right directory/exp.xlsx")
dNames <- book$Worksheets("Sheet1")$Range("AL40:AN40")
dValues <- book$Worksheets("Sheet1")$Range("AL41:AN68")
dNames <- unlist(dNames[["Value"]])
dValues <- unlist(dValues[["Value"]])
data1 <- matrix(dValues,ncol=3)
colnames(data1) <- dNames
data1 <- as.data.frame(data1)
plot(data1$v1,data1$v2)
Obviously you can plot things, model things, or whatever in a number of ways, but this gets things into R, which is probably the best place for it. At any rate, there is also a good introduction to using the RDCOMClient package to connect R and Excel for quick data tasks at http://www.omegahat.org/RDCOMClient/Docs/introduction.html

Read Excel Tables, not simple named ranges

To avoid "duplicate" close request: I know how to read Excel named ranges; examples are given in the code below. This is about "real" tables in Excel.
Excel2007 and later have the useful concept of tables: you can convert ranges to tables, and avoid hassles when sorting and rearranging. When you create a table in an Excel range, it gets a default name (Tabelle1 in German Version, TableName in the following example), but you can additionally simply name the range of the table (TableAsRangeName); as indicated by the icons in the Excel range name editor, these two seem to be treated differently.
I have not been able to read these tables (in the strict sense) from R. The only known workaround is using CSV intermediate, or converting the table to a normal named range, which has nasty irreversible side effects when you use column names in cell references; these are converted to A1 notation.
The example below shows the problem. You mileage may vary with different combinations of 32/64 bit ODBC drivers and 32/64 bit Java
# Read Excel Tables (not simply named ranges)
# Test Computer: 64 Bit Windows 7, R 32 bit
# My ODBC drivers are 32 bit
library(RODBC)
# Test file has three ranges
# NonTable Simple named range
# TableName Name of table
# TableAsRangeName Named Range covering the above table
sampleFile = "ExcelTables.xlsx"
if (!file.exists(sampleFile)){
  download.file("http://www.menne-biomed.de/uni/ExcelTables.xlsx", sampleFile)
  # Or do it manually, if this fails
}
# ODBC
channel = odbcConnectExcel2007(sampleFile)
sqlQuery(channel, "SELECT * from NonTable") # Ok
sqlQuery(channel, "SELECT * from TableName") # Could not find range
sqlQuery(channel, "SELECT * from TableAsRangeName") # Could not find range
close(channel)
# gdata has read.xls, but seems not to support named regions
library(xlsx)
wb = loadWorkbook(sampleFile)
getRanges(wb) # This one fails already with "TableName" does not exist
ws = getSheets(wb)[[1]]
readRange("NonTable",ws) # Invalid range address
readRange("TableName",ws) # Invalid range address
readRange("TableAsRangeName",ws) # Invalid range address
# my machine requires 64 bit for this one; depends on your Java installation
sampleFile = "ExcelTables.xlsx"
library(XLConnect) # requires Java
readNamedRegionFromFile(sampleFile,"NonTable") # OK
readNamedRegionFromFile(sampleFile,"TableName") # "TableName" does not exist
readNamedRegionFromFile(sampleFile,"TableAsRangeName") # NullPointerException
wb <- loadWorkbook(sampleFile)
readNamedRegion(wb,"NonTable") # Ok
readNamedRegion(wb,"TableName") # does not exist
readNamedRegion(wb,"TableAsRangeName") # Null Pointer
I've added some initial support for Excel tables in XLConnect. Please find the latest changes on GitHub at https://github.com/miraisolutions/xlconnect
The following is a small sample:
require(XLConnect)
sampleFile = "ExcelTables.xlsx"
wb = loadWorkbook(sampleFile)
readTable(wb, sheet = "ExcelTable", table = "TableName")
Note that Excel tables are associated with a sheet. So as far as I can see, it's possible to have multiple tables with the same name associated with different sheets. For this reason there is a sheet argument to readTable.
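If you don't know the table names up front, you can list them first with getTables and then read the one you want (a sketch against the same sample file):
require(XLConnect)
wb <- loadWorkbook("ExcelTables.xlsx")
getTables(wb, sheet = "ExcelTable")   # lists the table names, e.g. "TableName"
readTable(wb, sheet = "ExcelTable", table = "TableName")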
You are correct that the table definitions are stored in XML.
sampleFile = "ExcelTables.xlsx"
unzip(sampleFile, exdir = 'test')
library(XML)
tData <- xmlParse('test/xl/tables/table1.xml')
tables <- xpathApply(tData, "//*[local-name() = 'table']", xmlAttrs)
[[1]]
id name displayName ref totalsRowShown
"1" "TableName" "TableName" "G1:I4" "0"
library(XLConnect)
readWorksheetFromFile(sampleFile, sheet = "ExcelTable", region = tables[[1]]['ref'], header = TRUE)
Name Age AgeGroup
1 Anton 44 4
2 Bertha 33 3
3 Cäsar 21 2
Depending on your situation you could search in the XML files for appropriate quantities.
Later addition:
The readxl package can read "real" tables and probably is the least troublesome solution when you want to read data frames/tibbles.
** After #Jamzy's comment **
I tried again and could not read named ranges. False positive then, or false negative now?
