I have created an MS Access application that is split across three databases. One is a front end for the users (several users). The other two hold only the data tables, and they are identical (same database names, same table names...). One copy is stored locally on each user's laptop and one is on the network (shared). The purpose of this arrangement is to give the user a read-only view of the data when offline and traveling, using the local tables. They can make changes to the data when online; those changes are made to the network tables. I've written databases in MS Access for about 10 years, but I build mostly with queries and I'm not strong in VBA. Ideally I could attach VBA code to a button that switches the table links from one data-storage database to the other. I would then use this code to make an 'online' button and an 'offline' button so I can toggle back and forth. Thanks so much for your time and knowledge. I do appreciate it.
I have some data contained in a CSV file. I need to access that information efficiently, so I want to import it into my existing database.
I am wondering if I can make a pre-loaded database with the tables I need and then build the rest of the database on top of it (or make a second separate connection), or load the database from the CSV files on first startup.
What would be the preferred method, and either way, how can I achieve it efficiently?
P.S. Two of the files are about 1,000 lines long and 2 columns wide, which seems to me to be fairly small... and the other ones really shouldn't be more than 10 lines long and 6-7 columns wide.
Edit: I realised I have a bunch of tables that need to be updated yearly, so any approach that risks the users' input data is unacceptable, and just using the existing DB as-is is not an option...
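In case it helps, here's a minimal sketch of what I'm imagining for the pre-loaded approach, assuming the database ends up being SQLite (the file and table names below are made up): build the database file once with the sqlite3 command-line shell and ship it with the application.

-- run once in the sqlite3 shell to produce the pre-loaded database file
CREATE TABLE lookup_pairs (
    code  TEXT NOT NULL,   -- first CSV column
    value TEXT NOT NULL    -- second CSV column
);
.mode csv
.import lookup_pairs.csv lookup_pairs

On first startup the application would then only have to open the shipped file, and the yearly refresh could rebuild just these lookup tables while leaving any tables that hold user input untouched.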
We can save an image in two ways:
1. Upload the image to a server and save the image URL in the database.
2. Save the image directly into the database.
Which one is better?
There's a really good paper by Microsoft Research called To Blob or Not To Blob.
Their conclusion after a large number of performance tests and analysis is this:
if your pictures or documents are typically below 256 KB in size, storing them in a database VARBINARY column is more efficient
if your pictures or documents are typically over 1 MB in size, storing them in the filesystem is more efficient (and with SQL Server 2008's FILESTREAM attribute, they're still under transactional control and part of the database)
in between those two, it's a bit of a toss-up depending on your use
If you decide to put your pictures into a SQL Server table, I would strongly recommend using a separate table for storing those pictures - do not store the employee photo in the employee table - keep it in a separate table. That way, the Employee table can stay lean, mean and very efficient, assuming you don't always need to select the employee photo, too, as part of your queries.
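A minimal sketch of what that separate table might look like (assuming SQL Server and an existing dbo.Employees table with an EmployeeID key; the names here are made up):

CREATE TABLE dbo.EmployeePhotos
(
    EmployeeID INT NOT NULL
        CONSTRAINT PK_EmployeePhotos PRIMARY KEY
        CONSTRAINT FK_EmployeePhotos_Employees REFERENCES dbo.Employees (EmployeeID),
    Photo      VARBINARY(MAX) NOT NULL   -- the image bytes live here, not in dbo.Employees
);

Queries against dbo.Employees never touch the image pages unless they explicitly join to dbo.EmployeePhotos.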
For filegroups, check out Files and Filegroup Architecture for an intro. Basically, you would either create your database with a separate filegroup for large data structures right from the beginning, or add an additional filegroup later. Let's call it "LARGE_DATA".
Now, whenever you have a new table to create which needs to store VARCHAR(MAX) or VARBINARY(MAX) columns, you can specify this file group for the large data:
CREATE TABLE dbo.YourTable
(
    YourTableID  INT IDENTITY(1,1) PRIMARY KEY,   -- define your regular fields here
    LargePayload VARBINARY(MAX)                   -- the large column that should live on LARGE_DATA
)
ON [Data]                 -- the basic "Data" filegroup for the regular data
TEXTIMAGE_ON LARGE_DATA   -- the filegroup for large chunks of data
Check out the MSDN intro on filegroups, and play around with it!
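If the database already exists, adding such a filegroup after the fact might look roughly like this (the database name and file path below are placeholders - adjust them to your environment):

ALTER DATABASE YourDatabase ADD FILEGROUP LARGE_DATA;

ALTER DATABASE YourDatabase
ADD FILE
(
    NAME = 'LargeData1',
    FILENAME = 'D:\SQLData\YourDatabase_LargeData1.ndf'   -- physical file that backs the filegroup
)
TO FILEGROUP LARGE_DATA;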
Like many questions, the answer is "it depends." Systems like SharePoint use option 2. Many ticket-tracking systems (I know for sure Trac does this) use option 1.
Think also of any (potential) limitations. As your volume increases, are you going to be limited by the size of your database? This has particular relevance to hosted databases and applications where increasing the size of your database is much more expensive than increasing your storage allotment.
Saving the image to the server will work better for a website, given that these are incidental to your website, like per-customer branding images - if you're setting up the next Flickr, obviously the answer would be different :). You'd want to set up one server to act as a file server, share out the /uploaded_images directory (or whatever you name it), and set up an application variable defining the base URL of uploaded images.

Why is it better? Cost. File servers are dirt-cheap commodity hardware. You can back up the file contents using dirt-cheap commodity (even just consumer-grade) backup software. And if your file server croaks and someone loses a day of uploaded images? Who cares. They just upload them again.

Our database server is an enterprise cluster running on an SSD SAN. Our backups and tran logs are shipped to remote sites over expensive bandwidth and maintained even on tape for x period. We use it for all the data where we need the ACID (atomicity, consistency, isolation, durability) benefits of an RDBMS. We don't use it for company logos.
Store them in the database unless you have a good reason not to.
Storing them in the filesystem is premature optimization.
With a database you get referential integrity, you can back everything up at once, integrated security, etc.
The book SQL Anti-Patterns calls storing files in the filesystem an anti-pattern.
I have lots of data to wrangle and I need some help.
I have been using an Excel file that has two worksheets of interest to me. Each produces an OLAP pivot table with the data I need to work with. What I would like to do is move those (.odc) connections into Access queries so I don't have to hand-paste all of this info out, manipulate it, and then go through the whole process several more times.
One table is Throughput (number of parts through an operation(s)) by Part Number and by Date. The other is Hours Logged at the operation(s) by Part Number and by Date. I also have a master list of all part numbers with some more data that I have to mix in.
Biggest problem: Each chart is producing its own subset of dates and part numbers so I have to take care to match up the data to run the calculations. I've tried:
By hand. Got tired of that real quick.
Using LOOKUP, VLOOKUP, MATCH with INDIRECT and all sorts of tricks.
It's a mess. But I'm confident that if I can put the original pivot tables into Access, I can add a few joins and write up a couple of queries (something like the sketch below) and it will turn out beautifully.
Worst comes to worst, I can copy/paste the pivot-table data into Access by hand, but what if I want to change or expand the data set? I'd rather work with the raw data.
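For illustration only - the table and column names here are invented, since the real schema isn't shown - the kind of query I have in mind once both pivots are in Access would be something like:

SELECT t.PartNumber, t.OpDate, t.Throughput, h.HoursLogged,
       t.Throughput / h.HoursLogged AS PartsPerHour
FROM Throughput AS t
INNER JOIN HoursLogged AS h
        ON (h.PartNumber = t.PartNumber) AND (h.OpDate = t.OpDate);

with the master part-number list joined in the same way.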
EDIT:
The data is held on SQL Server and I cannot change that.
The Excel pivot tables use a .odc file for the connection. It gives the following connection strings:
Provider=MSOLAP.3;Integrated Security=SSPI;Persist Security Info=True;Initial Catalog=[MyCatalog];Data Source=[MySource];MDX Compatibility=1;Safety Options=2;MDX Missing Member Mode=Error
Provider=MSOLAP.4;Integrated Security=SSPI;Persist Security Info=True;Initial Catalog=[MyCatalog];Data Source=[MySource];MDX Compatibility=1;Safety Options=2;MDX Missing Member Mode=Error
(I replaced the actual catalog and source)
Can I use the .odc file information to create a pass-through query in Access?
Have you considered using a proper OLAP server?
Comparison of OLAP Servers
Once it's set up, you'll be able to connect your Excel pivot tables to the server (as well as other reporting tools).
Talked to our IT dept. The guy who built the Cubes is working on querying the same info into MS Access for me.
Thanks everyone.
I am a newbie with no comp-sci background, so please forgive me for whatever dumb stuff I may say. I am working on a solar power monitoring project to monitor the power output of the solar power systems my company installs. I am writing a client that will query the inverter of each of our monitoring customers every 15 minutes (for power output, voltage output, current output, system errors/faults, etc. - which together constitute one "reading") for as long as they have their system - which means roughly 35k readings per year per customer. So I was thinking of organizing my sqlite3 database in one of the two following ways.
(1) Have the database be two tables: one table with regular customer information (name, email, etc.) and another much bigger table where each row represents one reading and includes the customer ID and timestamp of the reading as identifiers. This means roughly 35k rows will be added to this bigger table per customer per year. (Data more than two years old will be pared down and archived.)
OR
(2) Store all readings in a csv file (one csv file per customer) and store the csv file name in my table with regular customer information
This database will be serving a website (built on Rails, if that makes any difference for the options) where customers will be able to view their power-output data. I want to minimize the amount of time it takes to load their output data on login. I basically don't have a clear idea of how long it would take for my computer to open and read in lines from a text file, versus opening a huge sqlite3 table, looking up rows (based on customer ID) and reading in the data - and therefore am having trouble judging between the two options above. I'm also having trouble gauging the limits within which sqlite3 functions optimally, despite having read some about it (I don't think I have the background to understand the reading I did, because it seems to say hundreds of millions of rows are just fine, while other people's comments seem to say just the opposite). I am also open to a completely different option, as I'm not married to anything right now. Whatever makes things load faster. Thanks so much in advance!
Storing the parsed data in SQLite would definitely be a time-saver if you're doing any kind of repeated data mining on it. CSV parsing overhead would almost instantly eat up any database space/time savings you'd gain.
As for efficiency, you'd have to test it. There's no one hard-and-fast rule that says "use this database" or "use that database". It's ALWAYS "it depends on the scenario". SQLite may be perfect for you in this case, but useless for someone else with a slightly different workload.
SQL applications in general do very well with large data sets, as long as the columns being queried are indexed. You should keep the readings in the same database. It will take far less time to obtain the data from the database than to parse CSV files. Databases are built for storing and retrieving data; CSV files are not.
I use MySQL databases with tens of millions of rows per table and queries return results in fractions of a second. SQLite might be faster.
Just make sure you create indexes for what you will be searching.
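A minimal sketch of option 1 in SQLite (the column names here are invented), including an index that covers the lookup pattern you describe - all readings for one customer over a time range:

CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    email       TEXT
);

CREATE TABLE readings (
    reading_id    INTEGER PRIMARY KEY,
    customer_id   INTEGER NOT NULL REFERENCES customers(customer_id),
    taken_at      TEXT NOT NULL,      -- timestamp of the reading
    power_output  REAL,
    voltage       REAL,
    current_amps  REAL,
    fault_code    TEXT
);

-- keeps "readings for customer X between two dates" fast even with millions of rows
CREATE INDEX idx_readings_customer_time ON readings (customer_id, taken_at);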
I would do option 1, but use a database server such as PostgreSQL instead of SQLite.
SQLite locks the whole database file during writes, so you may run into locking issues if you read from and write to the table a lot. SQLite is better suited to single-user applications on the desktop or in a smartphone.
You can easily have millions of rows without it causing any problems.
I've searched through the site and haven't found a question/answer that quite answers my question; the closest one I found was: Syncing objects between two disparate systems best approach.
Anyway, to begin: because there is no RSS feed available, I'm screen-scraping a webpage. The scraper fetches the page, goes through it to scrape out all of the information that I'm interested in, and dumps that information into a SQLite database so that I can query the information at my leisure without repeatedly fetching from the website.
However, I'm also storing various metadata about the data itself in the SQLite DB, such as: whether I have looked at the data, whether the data is new or old, and a bookmark to a chunk of data (think of it as a collection of unrelated data, where the bookmark is just a pointer to where I am in processing/reading said data).
So right now my current problem is trying to figure out how to update the local sqlite database with new data and/or changed data from the website in a manner that is effective and straightforward.
Here's my current idea:
Download the page itself
Create a temporary table for the parsed data to go into
Do a comparison between the official and the temporary table and copy updates and/or new information to the official table
This process seems kind of complicated because I would have to figure out how to determine whether the data in the temporary table is new, updated, or unchanged. So I am wondering if there isn't a better approach, or if anyone has any suggestion on how to architect/structure such a system?
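For reference, a hedged sketch of what step 3 could look like in SQLite - the table and column names are made up, and it assumes each scraped item has a stable natural key:

-- new rows: in the temporary (staging) table but not yet in the official table
INSERT INTO items (item_key, title, body)
SELECT s.item_key, s.title, s.body
FROM staging_items AS s
LEFT JOIN items AS i ON i.item_key = s.item_key
WHERE i.item_key IS NULL;

-- changed rows: same key, different content
UPDATE items
SET title = (SELECT s.title FROM staging_items AS s WHERE s.item_key = items.item_key),
    body  = (SELECT s.body  FROM staging_items AS s WHERE s.item_key = items.item_key)
WHERE EXISTS (
    SELECT 1 FROM staging_items AS s
    WHERE s.item_key = items.item_key
      AND (s.title <> items.title OR s.body <> items.body)
);

Rows whose content matches are simply left alone, which covers the "unchanged" case.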
Edit 1:
I'm not sure where to put the additional information, in a comment or as an edit, so I'm going to add it here.
This expands a bit on the metadata regarding bookmarking: basically, the data source can create new data/additions to the current data, so one reason I was thinking of the temporary-table idea was so that I would be able to determine whether a data source that has been "bookmarked" has any new data or not.
Is it really important to determine whether the data in the temporary table is new, updated or unchanged? Do you really need to keep a history of the changes?
NO: don't use the temporary table; just mark your old records as old (with a timestamp), don't do updates, and only insert your new data (a rough sketch follows below).
YES: your idea seems correct to me, but it all depends on how much data you need to process each time; I don't think it is feasible with a large amount of data.
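To make the "NO" branch concrete, a rough SQLite sketch with invented table and column names - every scrape only inserts rows stamped with the fetch time, and the current state of an item is simply its most recent row:

-- each scrape inserts new rows; nothing is ever updated in place
INSERT INTO items (item_key, title, body, fetched_at)
VALUES ('some-key', 'some title', 'some body', datetime('now'));

-- the latest version of every item
SELECT i.*
FROM items AS i
JOIN (SELECT item_key, MAX(fetched_at) AS latest_fetch
      FROM items
      GROUP BY item_key) AS m
  ON m.item_key = i.item_key
 AND m.latest_fetch = i.fetched_at;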