How to import SQLite data (gathered by an Android device) into either Octave or MATLAB?

I have some data gathered by an Android phone, stored as an SQLite database file. I would like to play around with this data (analysing it) using either MATLAB or Octave.
What commands would you use to import this data into MATLAB, say to put it into a vector or matrix? Do I need any special toolboxes or packages, like the Database Package, to access the SQLite format?

There is the mksqlite tool.
I've used it personally; I had some trouble getting the build that matched my version of MATLAB, but after that, no problems. You can even run queries against the database file directly to reduce the amount of data you import into MATLAB.

Although mksqlite looks nice, it is not available for Octave and may not be suitable as a long-term solution. Exporting the tables to CSV files is an option, but importing them into Octave can be quite slow for larger data sets because of the string parsing involved.
As an alternative, I ended up writing a small Python script to convert my SQLite table into a MAT file, which is fast to load into either MATLAB or Octave. MAT files are platform-neutral binary files, and the method works for both numeric and string columns.
import sqlite3
import scipy.io

conn = sqlite3.connect('my_data.db')
csr = conn.cursor()

# Grab the column names of the table.
res = csr.execute('SELECT * FROM MY_TABLE')
db_parms = list(map(lambda x: x[0], res.description))
# Remove those variables in db_parms you do not want to export.

X = {}
for prm in db_parms:
    csr.execute('SELECT "%s" FROM MY_TABLE' % (prm))
    v = csr.fetchall()
    # v is now a list of 1-tuples; flatten it into a plain list.
    X[prm] = list(*zip(*v))

conn.close()
scipy.io.savemat('my_data.mat', X)
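As a quick sanity check (just a sketch, reusing the my_data.mat name from the script above), the exported file can be read back with scipy.io.loadmat before switching over to MATLAB or Octave:

import scipy.io

# Round-trip check: load the file produced by the script above.
data = scipy.io.loadmat('my_data.mat')

# Each exported column comes back keyed by its column name; the
# '__header__', '__version__' and '__globals__' entries are loadmat metadata.
for key, value in data.items():
    if not key.startswith('__'):
        print(key, value.shape)

In MATLAB or Octave, load('my_data.mat') then makes each exported column available as a workspace variable.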

Related

Alternative to xlwings without using CoInitialize

I am trying to write code that replaces cells in an existing Excel sheet with values from a dataframe. The lines below work, but the problem is that they require the pythoncom library, a Windows-only library. So when I tried to deploy the app to Streamlit Cloud, which runs on Linux, an error arose.
The code is like this:
pythoncom.CoInitialize()
with xw.App() as app:
    wb = xw.Book(path)
    wb.sheets(sheet_name).range(kolom_cluster + str(header + 1)).options(index=False, chunksize=30_000).value = df["cluster"]
    wb.save(path)
    wb.close()
Therefore, I am wondering if there is an alternative way of doing the same thing (writing dataframe values to an existing Excel file) without the need for pythoncom.
This is my first question on Stack Overflow, though I have been using the site for quite a while, and I have searched for a solution here for some time. I am hoping you can help me solve the problem.
Thank you very much for your kind attention.
PS: I have tried a script like this, using openpyxl:
writer = pd.ExcelWriter(
    path,
    engine='openpyxl',
    mode='a',                  # append data instead of overwriting all the book by default
    if_sheet_exists='overlay'  # write the data over the old one instead of raising a ValueError
)
df["cluster"].to_excel(
    writer,
    sheet_name=sheet_name,
    startrow=header + 1,          # upper left corner where to start writing data
    startcol=ord(kolom_cluster),  # note that it starts from 0, not 1 as in Excel
    index=False,                  # don't write index
    header=False                  # don't write headers
)
writer.save()
writer.close()
But it returns an error.
I think we can use pandas.ExcelWriter in this case, with openpyxl as the engine. I hope the code below is self-explanatory:
writer = pd.ExcelWriter(
    "path to the file of interest",
    engine='openpyxl',
    mode='a',                  # append data instead of overwriting the whole book by default
    if_sheet_exists='overlay'  # write the data over the old one instead of raising a ValueError
)
df.to_excel(
    writer,
    sheet_name="worksheet of interest",
    startrow=0,      # upper left corner where to start writing data
    startcol=0,      # note that it starts from 0, not 1 as in Excel
    index=False,     # don't write index
    header=False     # don't write headers
)
writer.save()
Update
Here's a short test to see if it works:
import pandas as pd
from pathlib import Path
from openpyxl import Workbook

df = pd.DataFrame(1, [1, 2, 3], [*'abc'])

f = Path('test_openpyxl.xlsx')
if not f.exists():
    wb = Workbook()
    wb.worksheets[0].title = "Data"
    wb.save(f)
    wb.close()

with pd.ExcelWriter(
    f,
    mode='a',
    if_sheet_exists='overlay'
) as writer:
    assert writer.engine == 'openpyxl'
    for n, (i, j) in enumerate(zip([0, 0, 3, 3], [0, 3, 0, 3]), 1):
        (n * df).to_excel(
            writer,
            sheet_name="Data",
            startrow=i,
            startcol=j,
            index=False,
            header=False
        )
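A quick read-back (again just a sketch, reusing the test_openpyxl.xlsx file created above) confirms that all four blocks landed in the sheet:

import pandas as pd

# Expect a 6x6 block: 1s top-left, 2s top-right, 3s bottom-left, 4s bottom-right.
check = pd.read_excel('test_openpyxl.xlsx', sheet_name='Data', header=None)
print(check)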
My environment:
python 3.9.7
pandas 1.4.3
openpyxl 3.0.10
excel version 2108 (Office 365)

How can I write an array to a specific Excel sheet, beginning at a specific cell, using Julia's XLSX.jl?

Coming from MATLAB, the MATLAB way to write an array to a specific sheet, starting at a specific cell, is straightforward:
xlswrite("filename",array,"sheetname2","cell")
I've tried reading the tutorials on how to do this using the Julia XLSX.jl package, but I just can't understand what's going on or how to do it. Is there a "straightforward" way to do this in Julia using XLSX.jl, or is there another package I could use that has easier syntax? At the beginning of my Julia script I have:
using DataFrames, XLSX
data = DataFrame(XLSX.readtable("filename","sheetname1")...)
which works just fine.
I must say, Julia is fast and free, but many Julia tutorials aren't written for the Julia newbie and leave a lot of the syntax unexplained.
I believe the other answer is not correct so here is mine.
Suppose you have the following Excel file with two sheets (BTW notice this very useful syntax for creating multi-sheet Excel files):
using XLSX, DataFrames

df = DataFrame(a=1:3, b=string.('a':'c'), d=0.1:0.1:0.3)
XLSX.writetable("file.xlsx",
                sheet1=(eachcol(df), names(df)),
                sheet2=(eachcol(df), names(df)))
I understand the goal is to add some data to let's say sheet1. This can be accomplished by:
XLSX.openxlsx("file.xlsx", mode="rw") do xf
XLSX.writetable!(xf["sheet1"], eachcol(df), names(df); anchor_cell=XLSX.CellRef("A6"))
end
If you would rather just put a matrix in place, use setdata!:
julia> mx = rand(3,2)
3×2 Matrix{Float64}:
 0.626637  0.245274
 0.560975  0.59444
 0.439289  0.0400645

julia> XLSX.openxlsx("file.xlsx", mode="rw") do xf
           XLSX.setdata!(xf["sheet1"], XLSX.CellRef("A6"), mx)
       end
This is what finally did it for me (A is the array name):
XLSX.openxlsx("filename", mode="rw") do xf
for i = 1:size(A,1)
XLSX.setdata!(xf["sheet"], XLSX.CellRef(1+i,4), A[i,1])
XLSX.setdata!(xf["sheet"], XLSX.CellRef(1+i,5), A[i,2])
end
end

Exporting embeddings per epoch in Keras

I am trying to get access to the output of the embedding layer (the n-dimensional vectors) in Keras on a per-epoch basis. There doesn't seem to be a specific callback for this. I've tried the TensorBoard callback, since it provides an option for logging the embeddings on each epoch, but when I find the log files I can't read them; they are probably files that can only be accessed by TensorBoard for visualization purposes. I need the embedding vectors to be saved in a format I can use later outside Keras, like a TSV file. Is there a way I could do this?
Thanks a lot!
OK, so I figured out how to do this, with much-needed help from Nazmul Hasan on how to format the file name so that it is updated with each epoch. Essentially, I created a custom callback:
import io
from tensorflow import keras

# `info` comes from tensorflow_datasets; the encoder exposes the vocabulary
# if you also want to write the words alongside the vectors.
encoder = info.features['text'].encoder

class CustomCallback(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        # Weights of the embedding layer (assumed to be the first layer of the model).
        weights = self.model.layers[0].get_weights()[0]
        with io.open('vecs_{}.tsv'.format(epoch), 'w', encoding='utf-8') as out_v:
            for vec in weights[1:]:  # skip index 0, it's the padding token
                out_v.write('\t'.join(str(x) for x in vec) + "\n")
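To hook this into training, the callback just needs to be passed to fit(); in this sketch, train_data, validation_data and the epoch count are placeholders for whatever the model is actually trained on:

history = model.fit(
    train_data,
    epochs=10,
    validation_data=validation_data,
    callbacks=[CustomCallback()],  # writes vecs_<epoch>.tsv after every epoch
)

Each vecs_<epoch>.tsv file can then be opened in a spreadsheet or loaded with any TSV reader outside Keras.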

Issue when importing float as string from Excel. Adding precision incorrectly

I am using openxlsx's read.xlsx to import a dataframe from a column containing mixed types. The desired result is to import all values as strings, exactly as they are represented in Excel. However, some decimals are imported as very long floats.
Sample data is simply an Excel file with a column containing the following rows:
abc123,
556.1,
556.12,
556.123,
556.1234,
556.12345
require(openxlsx)
df <- read.xlsx('testnumbers.xlsx')
Using the above R code to read the file results in df containing these string values:
abc123,
556.1,
556.12,
556.12300000000005,
556.12339999999995,
556.12345000000005
The Excel file provided in production has the column formatted as "General". If I format the column as Text, there is no change unless I explicitly double-click each cell in Excel and hit enter. In that case, the number is correctly displayed as a string. Unfortunately, clicking each cell isn't an option in the production environment. Any solution, Excel, R, or otherwise is appreciated.
Edit: I've read through Why Are Floating Point Numbers Inaccurate? and believe I understand the math behind what's going on. At this point, I suppose I'm looking for a workaround. How can I get a float from Excel into an R dataframe as text without changing the representation?
I was able to get the correct formats into a data frame using pandas in Python.
import pandas as pd
test = pd.read_excel('testnumbers.xlsx', dtype = str)
This will suffice as a workaround, but I'd like to see a solution built in R.
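If the goal is to land these strings in an R data frame, one possible bridge (just a sketch; the testnumbers.csv intermediate file is my own addition) is to have pandas write a plain CSV, which R can then read with read.csv(..., colClasses = "character") so nothing is re-parsed as numeric:

import pandas as pd

# Read every cell as text, then write an intermediate CSV for R to pick up.
test = pd.read_excel('testnumbers.xlsx', dtype=str)
test.to_csv('testnumbers.csv', index=False)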
Here is a workaround in R using openxlsx that I used to solve a similar issue. I think it will solve your question, or at least allow you to format the cells as text in the Excel files programmatically.
I use it to reformat specific cells in a large number of files (in my case, converting from General to Scientific, as an example of how you might adapt this to another format).
This uses functions in the openxlsx package that you reference in the OP.
First, load the xlsx file as a workbook (stored in memory, which preserves all the xlsx formatting etc.; slightly different from the method shown in the question, which pulls in only the data):
library(openxlsx)
testnumbers <- loadWorkbook(here::here("test_data/testnumbers.xlsx"))
Then create a "style" to apply which converts the numbers to "text" and apply it to the virtual worksheet (in memory).
numbersAsText <- createStyle(numFmt = "TEXT")
addStyle(testnumbers, sheet = "Sheet1", style = numbersAsText, cols = 1, rows = 1:10)
Finally, save the workbook back out:
saveWorkbook(testnumbers,
             file = here::here("test_data/testnumbers_formatted.xlsx"),
             overwrite = TRUE)
When you open the Excel file, the numbers will be stored as text.

Loading data with RSQLite which has quoted values

I am trying to load a largish CSV file into an SQLite database using the RSQLite package (I have also tried the sqldf package). The file contains all UK postcodes and a variety of lookup values for them.
I wanted to avoid loading it into R and instead load it directly into the database. Whilst this is not strictly necessary for this task, I want to have the technique ready for larger files which won't fit in memory, should I have to handle them in the future.
Unfortunately, the CSV is provided with the values wrapped in double quotes, and the dbWriteTable function doesn't seem able to strip or ignore them. Here is the download location of the file: http://ons.maps.arcgis.com/home/item.html?id=3548d835cff740de83b527429fe23ee0
Here is my code:
# Load library
library("RSQLite")
# Create a temporary directory
tmpdir <- tempdir()
# Set the file name
file <- "data\\ONSPD_MAY_2017_UK.zip"
# Unzip the ONS Postcode Data file
unzip(file, exdir = tmpdir )
# Create a path pointing at the unzipped csv file
ONSPD_path <- paste0(tmpdir,"\\ONSPD_MAY_2017_UK.csv")
# Create a SQL Lite database connection
db_connection <- dbConnect(SQLite(), dbname="ons_lkp_db")
# Now load the data into our SQL lite database
dbWriteTable(conn = db_connection,
             name = "ONS_PD",
             value = ONSPD_path,
             row.names = FALSE,
             header = TRUE,
             overwrite = TRUE)
# Check the data upload
dbListTables(db_connection)
dbGetQuery(db_connection,"SELECT pcd, pcd2, pcds from ONS_PD LIMIT 20")
Having hit this issue, I found a reference tutorial (https://www.r-bloggers.com/r-and-sqlite-part-1/) which recommended using the sqldf package, but unfortunately when I try the relevant sqldf function (read.csv.sql) I get the same issue with double quotes.
This feels like a fairly common issue when importing CSV files into an SQL system; most import tools are able to handle double quotes, so I'm surprised to be hitting a problem here (unless I've missed an obvious help file somewhere along the way).
EDIT 1
Here is some example data from my csv file in the form of a dput output of the SQL table:
structure(list(pcd = c("\"AB1 0AA\"", "\"AB1 0AB\"", "\"AB1 0AD\"",
"\"AB1 0AE\"", "\"AB1 0AF\""), pcd2 = c("\"AB1 0AA\"", "\"AB1 0AB\"",
"\"AB1 0AD\"", "\"AB1 0AE\"", "\"AB1 0AF\""), pcds = c("\"AB1 0AA\"",
"\"AB1 0AB\"", "\"AB1 0AD\"", "\"AB1 0AE\"", "\"AB1 0AF\"")), .Names = c("pcd",
"pcd2", "pcds"), class = "data.frame", row.names = c(NA, -5L))
EDIT 2
Here is my attempt using the filter argument in sqldf's read.csv.sql function (note that Windows users will need Rtools installed for this). Unfortunately, this still doesn't seem to remove the quotes from my data, although it does mysteriously remove all the spaces.
library("sqldf")
sqldf("attach 'ons_lkp_db' as new")
db_connection <- dbConnect(SQLite(), dbname="ons_lkp_db")
read.csv.sql(ONSPD_path,
             sql = "CREATE TABLE ONS_PD AS SELECT * FROM file",
             dbname = "ons_lkp_db",
             filter = 'tr.exe -d ^"')
dbGetQuery(db_connection,"SELECT pcd, pcd2, pcds from ONS_PD LIMIT 5")
Also, thanks for the close vote from whoever felt this wasn't a programming question in the scope of Stack Overflow(?!).
The CSV importer in the RSQLite package is derived from the sqlite3 shell, which itself doesn't seem to offer support for quoted values when importing CSV files (How to import load a .sql or .csv file into SQLite?, doc). You could use readr::read_delim_chunked():
# `con` is an open DBI connection, e.g. dbConnect(SQLite(), dbname = "ons_lkp_db").
callback <- function(data) {
  name <- "ONS_PD"
  exists <- dbExistsTable(con, name)
  dbWriteTable(con, name, data, append = exists)
}

readr::read_delim_chunked(ONSPD_path, callback, ...)
Substitute ... with any extra arguments you need for your CSV file.
Use read.csv.sql from the sqldf package with the filter argument and provide any utility which strips out double quotes or which translates them to spaces.
The question does not provide a fully reproducible minimal example but I have provided one below. If you are using read.csv.sql in order to pick out a subset of rows or columns then just add the appropriate sql argument to do so.
First set up the test input data and then try any of the one-line solutions shown below. Assuming Windows, ensure that the tr utility (found in R's Rtools distribution) or the third party csvfix utility (found here and for Linux also see this) or the trquote2space.vbs vbscript utility (see Note at end) is on your path:
library(sqldf)
cat('a,b\n"1","2"\n', file = "tmp.csv")
# 1 - corrected from FAQ
read.csv.sql("tmp.csv", filter = "tr.exe -d '^\"'")
# 2 - similar but does not require Windows cmd quoting
read.csv.sql("tmp.csv", filter = "tr -d \\42")
# 3 - using csvfix utility (which must be installed first)
read.csv.sql("tmp.csv", filter = "csvfix echo -smq")
# 4 - using trquote2space.vbs utility as per Note at end
read.csv.sql("tmp.csv", filter = "cscript /nologo trquote2space.vbs")
any of which give:
a b
1 1 2
You could also use any other language or utility that is appropriate. For example, your Powershell suggestion could be used although I suspect that dedicated utilities such as tr and csvfix would run faster.
The first solution above is corrected from the FAQ. (It did work at the time the FAQ was written many years back but testing it now in Windows 10 it seems to require the indicated change or possibly the markdown did not survive intact from the move from Google Code, where it was originally located, to github which uses a slightly different markdown flavor.)
For Linux, tr is available natively although quoting differs from Windows and can even depend on the shell. csvfix is available on Linux too but would have to be installed. The csvfix example shown above would work identically on Windows and Linux. vbscript is obviously specific to Windows.
Note: sqldf comes with a mini-tr utility written in vbscript. If you change the relevant lines to:
Dim sSearch : sSearch = chr(34)
Dim sReplace : sReplace = " "
and change the name to trquote2space.vbs then you will have a Windows specific utility to change double quotes to spaces.
Honestly I could not find anything to solve this problem.
The sqldf documentation says:
"so, one limitation with .csv files is that quotes are not regarded as special within files so a comma within a data field such as "Smith, James" would be regarded as a field delimiter and the quotes would be entered as part of the data which probably is not what is intended"
So, it looks like there is no solution as far as I know.
One possible suboptimal approach (other than the obvious find-and-replace in a text editor) is to use a SQL command like this after import:
dbSendQuery(db_connection,"UPDATE ONS_PD SET pcd = REPLACE(pcd, '\"', '')")
