Databricks, Folder Management and SQL. What is happening behind the scenes? - directory

New Databricks user.
I'm able to create subfolders in the user directory I am provided.
E.g. I am provided /mnt/DUAXXX/USERID/files
and I can create /mnt/DUAXXX/USERID/files/subfolder.
However, I cannot figure out how to create tables in this subfolder and use the resulting dataset.
I issue the following command, because the source datasets reside in this location:
%sql
use DUAXXX
However, I want to create the resulting dataset in the subfolder.
I've tried something like:
create table test
location 'mnt/DUAXXX/USERID/files/subfolder'
select * from
data
This completes, but when I navigate using the Databricks GUI 'Data' tab, the test dataset appears in the DUAXXX folder.
However, when I issue the following command:
dbutils.fs.ls(f"dbfs:/mnt/DUAXXX/USERID/files/subfolder")
I see numerous sorts of .snappy.parquet files.
I know these files are created by the above code.
It's as though the underlying data is stored where I want it, in this .snappy.parquet format, but Databricks is creating a link to all these files in the DUAXXX folder.
I realize a lot of this is likely down to how the administrators implemented Databricks, and I have no access to those people. Does anyone know what is actually going on here?
Ultimately, all I am trying to do is create subfolders to organize my datasets, rather than have everything in a single folder.
Thanks.

Related

Trying to import data into R in a way that will allow anyone to access it when opening the markdown file/ accessing the html knit

I am currently working on a coding project and I am running into trouble with how I should import the data set. We are supposed to read it in a way that lets our instructor open our markdown file, import the data and run the code without changing file paths. I know about using relative file paths to make it accessible to anyone, however I don't know how to get around the /users/owner part of the file path. Any help would be greatly appreciated and if you have any further questions feel free to ask.
I've tried changing the working directory to a certain folder that both I and my instructor have named the same thing, however, like I said above, when I use read.csv to import the data frame I am still forced to use the /users/owner filepath which obviously is specific to my computer.
I can understand your supervisor, I request the same from my students. My recommended solution is to put both data and R script (or the .Rmd file) in the same folder. Then one does not need to add a path in the read.csv (or similar) function.
If you use RStudio, move to the folder in the Files pane and then use the gear icon and select "Set as Working Directory".
Then send both files (.R or .Rmd) and the data to the supervisor, ideally as a zip file. The supervisor can then unpack it to an arbitrary folder and just double-click the .R/.Rmd file. The containing folder will then automatically become the working directory.
Other options are:
- use a subfolder for the data, or
- put the data in a publicly readable internet location, e.g. GitHub, and read it directly from there.
The last option of course requires that the data have a free license.

TestCase for multiple files

I created one TestCase which I want to use on multiple files in one folder. The TestCase is the same for each file. Is there a possibility to do that in the Execution Section?
If possible, I want to see after every file whether it was successful or not.
Thank you and best regards
I have a similar use case where I have 1 test case executing several iterations from a file. Currently, I'm leveraging the Tricentis TDM solution to store the data rather than pulling from a file(s). You can create multiple repositories to store data, e.g. SQLite, MS SQL and other source systems. Check out https://www.tricentis.com/products/automate-continuous-testing-tosca/test-data-management/

How to extract infos in a .db file to create .csv or any viable "bookmark" file?

I am using a quite unknown bookmark manager on Android. I picked this one after trying others because it was possible to import, export, classify by folders, the design was good and it was easy to search in my bookmarks.
After importing all my bookmarks from other browsers and also from files, I started classifying all of them into folders, subfolders, etc..
I spent many days classifying them all as I wanted.
After classifying them, I tried to export them.
The problem is that the only option offered is to export them to a .html file, containing all the bookmarks but without any folders.
The .html file contains all my bookmarks but in complete disorder, and doesn't mention the folders.
In the app there was also a "backup" function, so I tried it and it creates a .db file.
I opened this .db file with an SQLite viewer app and I found inside, among other things I don't understand, a list of all my bookmarks with a number next to each one of them, and also a list of my folders with the corresponding number next to them.
When I open the .db file, I have a choice between
- sqlite_master
- android_metadata
- bookmarks
- folders
- sqlite_sequence
If I click on "Bookmarks", all my bookmarks are in a kind of spreadsheet with rows and columns. Next to them, in another column, there is a number: for example, each bookmark related to "Kitchen recipes" has the number 1.
And in the "Folders" table, the folder called "Recipes" also has the number 1 next to it.
So I'm happy because it seems that my classification is stored in this file.
But the fact is I don't know how to easily extract all that data and create from it a "bookmark" file that can be imported into another bookmark app or browser (for example .csv or .xbel, or .html but with folders).
I guess I need some "script" working like this:
If the first row in "Folders" has the number 8 next to it,
then take all the bookmarks in the "Bookmarks" table that also have an 8 next to them, and put them inside this folder.
I'm a complete noob in coding, I don't know what SQLite is, nor anything else.
So I know that maybe I am asking for too much information at the same time.
But maybe some kind person could point me in the right direction by explaining:
- if that's possible
- what would be the easiest way
- if some solution already exists
- if someone like me can do it, and what I have to learn if I want to be able to do it some day
Thanks
Here are pictures to make it easier to understand: the SQLite table list, the Folders table and the Bookmarks table.
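The matching described above (every bookmark carries the number of its folder) is what a SQL join does, so a short script may be enough. Below is a minimal sketch in Python using the standard sqlite3 and csv modules. The table names bookmarks and folders come from the post; the column names (id, name, title, url, folder_id) are assumptions and would have to be replaced by whatever the SQLite viewer shows for the real backup file. The demo builds a small in-memory database as a stand-in for the backup.

```python
import csv
import io
import sqlite3


def export_bookmarks(conn, out):
    """Write one CSV row per bookmark, together with the name of its folder."""
    # Column names here are assumptions; adjust them to the real backup schema.
    # LEFT JOIN keeps bookmarks whose folder number matches no folder row.
    rows = conn.execute(
        """
        SELECT f.name AS folder, b.title, b.url
        FROM bookmarks AS b
        LEFT JOIN folders AS f ON f.id = b.folder_id
        ORDER BY f.name, b.title
        """
    )
    writer = csv.writer(out)
    writer.writerow(["folder", "title", "url"])
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


def demo():
    # Stand-in for the backup; for a real file one would call
    # sqlite3.connect("backup.db") instead (hypothetical file name).
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE folders (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE bookmarks (
            id INTEGER PRIMARY KEY, title TEXT, url TEXT, folder_id INTEGER
        );
        INSERT INTO folders (id, name) VALUES (1, 'Recipes'), (2, 'News');
        INSERT INTO bookmarks (title, url, folder_id) VALUES
            ('Soup', 'https://example.com/soup', 1),
            ('Bread', 'https://example.com/bread', 1),
            ('Daily', 'https://example.com/news', 2);
        """
    )
    buf = io.StringIO()
    n = export_bookmarks(conn, buf)
    conn.close()
    return n, buf.getvalue()


if __name__ == "__main__":
    count, text = demo()
    print(text)
```

To write a real file instead of a string, the same function can be given a file opened with open("bookmarks.csv", "w", newline=""). Turning the result into .html or .xbel with nested folders would need an extra step that groups the rows by folder; that part is not shown here.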

Modifying tables interactively with Shiny

I am trying to create an interface where I can interactively and permanently modify column values of a given source csv. It must function somewhat like MS Excel: the entire table is displayed, I can change column values on the fly, and the resulting modifications are reflected in the source csv saved in a specific server directory. I was wondering if R Shiny can do this. I have experience in creating fluid/reactive pages and manipulating display (column display, check boxes, sliders, filtering etc.) but I have no clue as to how the source data itself can be modified using the Shiny GUI. Can someone please provide some direction? What package (if available) is needed etc. I have full write access on the source csv so credentials are not a problem.
Once I have some traction I plan to expand the operations onto a database.
Thanks in advance!

Tie R script and QlikSense software together

I need Qlik Sense to create an Excel file when the user selects a set of tuples, and R to automatically pick up this file and run my script on it. My script then creates another CSV file, which I want Qlik Sense to pick up automatically and perform some predefined operations on. Is there any way I can link these two pieces of software together in such a manner?
So to clarify the flowchart is: Qlik gets a large data set -> the user selects a set of rows and creates csv -> My custom R script (picks up this csv automatically) is run on the csv and creates a new csv -> qlik picks it up (automatically) and visually displays the results of the program
Is there any kind of wrapper software to tie them together? Or is it a better idea to perhaps just make some sort of UI that works with R in the background and the user can manually pass the file through the UI? Thanks for the help.
Check out the R extension that has been developed at http://branch.Qlik.com (extensions are being created and added all the time). You will probably need to create an account to access the project, but once you have, the direct link is below.
http://branch.qlik.com/#/project/5677d32d7f70718900987bdd
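If no ready-made extension fits, the "picks up the csv automatically" step of the flowchart can also be approximated with a small polling script that watches a shared folder. The following is only a sketch under assumptions: the folder names, the function names and the handler are made up for illustration, the handler just upper-cases the text as a placeholder for the real R script, and how Qlik Sense reloads the output file is not covered.

```python
import tempfile
from pathlib import Path


def process_new_files(inbox, outbox, handler, seen):
    """Run handler once on every CSV in inbox that has not been handled yet."""
    produced = []
    for src in sorted(Path(inbox).glob("*.csv")):
        if src.name in seen:
            continue  # already processed in an earlier round
        dst = Path(outbox) / src.name
        handler(src, dst)
        seen.add(src.name)
        produced.append(dst)
    return produced


def uppercase_handler(src, dst):
    # Placeholder for the real work. To run an R script instead, a call like
    # subprocess.run(["Rscript", "my_script.R", str(src), str(dst)], check=True)
    # could go here (my_script.R is a hypothetical name).
    dst.write_text(src.read_text().upper())


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        inbox = Path(tmp) / "inbox"
        outbox = Path(tmp) / "outbox"
        inbox.mkdir()
        outbox.mkdir()
        (inbox / "selection.csv").write_text("id,value\n1,a\n")
        seen = set()
        first = process_new_files(inbox, outbox, uppercase_handler, seen)
        # A second round finds nothing new, so the file is not processed twice.
        second = process_new_files(inbox, outbox, uppercase_handler, seen)
        result = (outbox / "selection.csv").read_text()
    return [p.name for p in first], second, result


if __name__ == "__main__":
    print(demo())
```

In practice process_new_files would be called in a loop with a short time.sleep between rounds, or from a scheduled task. The set of seen names lives only in memory here, so a restart would process the existing files again.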
