I'm a MongoDB beginner.
I'm using the IntelliJ IDEA IDE to develop a Java program that runs data mining processes on social media such as Twitter and Facebook, based on Twitter4j and Facebook4j.
I use MongoDB to store collections for test and evaluation purposes. I have saved several MongoDB databases in the folder E:/data/db, and all of them were accessible until a few days ago. I could easily inspect the structure of the databases and collections from the mongo shell in a Windows terminal (show dbs, show collections, db.stats()).
Last week, I started a new data mining database with several collections, and I probably made a mistake with its location on my computer: I put the new database in E:/data/db/newdatabase.
The problem is that I need to keep the data mining process running while I analyze the old databases' collections with R.
Right now, I'm not able to access the old MongoDB databases from the Windows terminal: I can see that they take up some bytes, but there are no structured collections, etc. When I try to read the databases and collections from R with the rmongodb package, I can't see the previous collections either.
Could I restore the old databases with mongorestore or something similar? What kind of mistake could have made these old databases inaccessible when they were fine a few days ago?
MongoDB is not intended to be manipulated at the filesystem level.
Instead, you should be using mongoexport and mongoimport to transfer individual databases.
Check whether you still have your data files in your E:\data\db directory. With the MMAPv1 storage engine (the default in older MongoDB versions) they are named after the database, not the collection:
yourDatabase.0
yourDatabase.1
yourDatabase.ns
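To run this check without eyeballing the folder, a short script can list the data files for you. This is only a minimal sketch, assuming the older MMAPv1 storage engine, whose files are named `<database>.ns`, `<database>.0`, `<database>.1` and so on. `find_mmapv1_files` is a hypothetical helper name, and a WiredTiger data directory uses a different layout and would simply come back empty.

```python
import os
import re
from collections import defaultdict

# MMAPv1 data files look like "<database>.ns" or "<database>.<number>".
_DATA_FILE = re.compile(r"^(?P<db>.+)\.(?:ns|\d+)$")


def find_mmapv1_files(dbpath):
    """Map every directory under dbpath to {database name: sorted file names}."""
    found = {}
    for root, _dirs, files in os.walk(dbpath):
        per_db = defaultdict(list)
        for name in files:
            match = _DATA_FILE.match(name)
            if match:
                per_db[match.group("db")].append(name)
        if per_db:
            found[root] = {db: sorted(names) for db, names in per_db.items()}
    return found


# Usage (path taken from the question):
#     for folder, dbs in find_mmapv1_files("E:/data/db").items():
#         print(folder, dbs)
```

Because the walk is recursive, files that ended up in a nested folder such as E:/data/db/newdatabase are reported under their own directory, which makes it easier to see which folder each mongod instance should point at.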
Try copying your newdatabase and db folders into a new folder (like E:\backup), then try to start two mongod instances:
mongod --dbpath=E:\backup\db --port 27001
mongod --dbpath=E:\backup\newdatabase --port 27002
Try to connect to each database and check that everything is OK (no data corruption, ...):
mongo --port 27001
mongo --port 27002
If it's OK, then, as jmkgreen explains, export your database and import it into your previous database.
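The export and import step could be scripted roughly as follows. This is a sketch, not a tested recovery procedure: it assumes `mongoexport` and `mongoimport` are on the PATH, that the copied data is served on port 27001 as above, and that the target mongod listens on the default port 27017. The database name, collection name and file path are placeholders, and `transfer_collection` is a hypothetical helper.

```python
import subprocess


def export_command(port, db, collection, out_file):
    """Argument list for exporting one collection to a JSON file."""
    return ["mongoexport", "--port", str(port), "--db", db,
            "--collection", collection, "--out", out_file]


def import_command(port, db, collection, in_file):
    """Argument list for importing that JSON file into another instance."""
    return ["mongoimport", "--port", str(port), "--db", db,
            "--collection", collection, "--file", in_file]


def transfer_collection(src_port, dst_port, db, collection, tmp_file,
                        run=subprocess.run):
    """Export from the source instance, then import into the target.

    check=True makes subprocess.run raise if a tool exits with an error,
    so the import is never attempted after a failed export.
    """
    run(export_command(src_port, db, collection, tmp_file), check=True)
    run(import_command(dst_port, db, collection, tmp_file), check=True)


# Usage (placeholder names):
#     transfer_collection(27001, 27017, "mydb", "tweets", "E:/backup/tweets.json")
```

Both tools work on one collection at a time, so the call would be repeated for every collection you want to move.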
I have an existing database for which I was looking to create a new clustered environment. I tried the following steps:
Create a new database instance (OS & DB Server).
Take a backup / snapshot from existing database server for all the databases.
Import the snapshot to the new server.
Configure the cluster. I referred to various sites, but they all give the same solution. Example reference site: https://vexxhost.com/resources/tutorials/how-to-configure-a-galera-cluster-with-mariadb-on-ubuntu-12-04/
Ran the command (sudo galera_new_cluster) on the primary server, which started up with no issue. But when we tried starting the secondary server, it crashed for some reason.
Unfortunately, I don't have the logs from the point where it failed stored or backed up. It seemed to try to sync with the primary server and had some failure with that.
Some additional details on the actions above: both servers use the same username and password, and I created a passwordless SSH connection between the two machines. Also, the sync method is set to rsync.
Am I missing something or doing it wrong? Is there a better way to do this?
We need to connect to an on-premises Teradata instance from Azure Databricks.
Is that possible at all?
If yes, please let me know how.
I was looking for this information as well and I recently was able to access our Teradata instance from Databricks. Here is how I was able to do it.
Step 1. Check your cloud connectivity.
%sh nc -vz 'jdbcHostname' 'jdbcPort'
- 'jdbcHostname' is your Teradata server.
- 'jdbcPort' is your Teradata server's listening port. By default, Teradata listens on TCP port 1025.
Also check out Databricks' best practices on connecting to another infrastructure.
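If `nc` is not available on the cluster, the same reachability check can be approximated in a Python notebook cell with the standard library. This is a minimal sketch: `can_connect` is a hypothetical helper, and the host name in the usage comment is a placeholder.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


# Usage (placeholder host, Teradata's default port):
#     print(can_connect("your-teradata-host", 1025))
```

A `False` result only says the port is not reachable from the cluster; it does not distinguish a firewall rule from a wrong host name.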
Step 2. Install Teradata JDBC driver.
Teradata Downloads page provides JDBC drivers by version and archive type. You can also check the Teradata JDBC Driver Supported Platforms page to make sure you pick the right version of the driver.
Databricks offers multiple ways to install a JDBC library JAR for databases whose drivers are not available in Databricks. Please refer to the Databricks Libraries to learn more and pick the one that is right for you.
Once installed, you should see it listed in the Cluster details page under the Libraries tab.
Terajdbc4.jar dbfs:/workspace/libs/terajdbc4.jar
Step 3. Connect to Teradata from Databricks.
You can define some variables to let us programmatically create these connections. Since my instance required LDAP, I added LOGMECH=LDAP in the URL. Without LOGMECH=LDAP it returns “username or password invalid” error message.
(Replace the placeholder values with the values in your environment.)
driver = "com.teradata.jdbc.TeraDriver"
url = "jdbc:teradata://Teradata_database_server/Database=Teradata_database_name,LOGMECH=LDAP"
table = "Teradata_schema.Teradata_tablename_or_viewname"
user = "your_username"
password = "your_password"
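If you connect to more than one server or database, it may help to build the URL from its parts instead of editing the string by hand. The helper below is a sketch that only reproduces the URL shape shown above (`Database=` followed by comma-separated `NAME=value` parameters); `teradata_jdbc_url` and the `TERADATA_PASSWORD` environment variable are hypothetical names, not part of any library.

```python
import os


def teradata_jdbc_url(host, database, **params):
    """Build a URL of the form jdbc:teradata://host/Database=db,KEY=value,..."""
    parts = ["Database={}".format(database)]
    parts += ["{}={}".format(key, value) for key, value in params.items()]
    return "jdbc:teradata://{}/{}".format(host, ",".join(parts))


url = teradata_jdbc_url(
    "Teradata_database_server", "Teradata_database_name", LOGMECH="LDAP"
)

# Reading the password from the environment keeps it out of the notebook
# source; the variable name is an assumption.
password = os.environ.get("TERADATA_PASSWORD", "your_password")
```

Dropping the `LOGMECH="LDAP"` argument gives the plain URL for instances that do not use LDAP.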
Now that the connection variables are specified, you can create a DataFrame. You can also explicitly set this to a particular schema if you have one already. Please refer to Spark SQL Guide for more information.
Now, let’s create a DataFrame in Python.
My_remote_table = spark.read.format("jdbc")\
    .option("driver", driver)\
    .option("url", url)\
    .option("dbtable", table)\
    .option("user", user)\
    .option("password", password)\
    .load()
Now that the DataFrame is created, it can be queried. For instance, you can select particular columns and display them within Databricks.
display(My_remote_table.select("EXAMPLE_COLUMN"))
Step 4. Create a temporary view or a permanent table.
My_remote_table.createOrReplaceTempView("YOUR_TEMP_VIEW_NAME")
or
My_remote_table.write.format("parquet").saveAsTable("MY_PERMANENT_TABLE_NAME")
Steps 3 and 4 can also be combined if the intention is simply to create a table in Databricks from Teradata. Check out the Databricks documentation SQL Databases Using JDBC for other options.
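One way the combined version might look is a single function that reads the remote table and saves it straight away. This is a sketch under the same assumptions as the steps above (a `spark` session provided by the Databricks notebook and the Teradata JDBC driver installed on the cluster); `copy_teradata_table` is a hypothetical helper name.

```python
def copy_teradata_table(spark, url, source_table, target_table, user, password,
                        driver="com.teradata.jdbc.TeraDriver"):
    """Read source_table over JDBC and save it as a Parquet-backed table."""
    options = {
        "driver": driver,
        "url": url,
        "dbtable": source_table,
        "user": user,
        "password": password,
    }
    # Step 3: create the DataFrame from the remote table.
    remote = spark.read.format("jdbc").options(**options).load()
    # Step 4: persist it as a permanent table.
    remote.write.format("parquet").saveAsTable(target_table)
    return remote


# Usage in a notebook, where `spark` already exists:
#     copy_teradata_table(spark, url, table, "MY_PERMANENT_TABLE_NAME",
#                         user, password)
```

As with the original snippet, `saveAsTable` uses the default save mode here, so it will fail if the target table already exists.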
Here is a link to the write-up I published on this topic.
Accessing Teradata from Databricks for Rapid Experimentation in Data Science and Analytics Projects
If you create a virtual network that can connect to on-prem, then you can deploy your Databricks instance into that VNet. See https://docs.azuredatabricks.net/administration-guide/cloud-configurations/azure/vnet-inject.html.
I assume that there is a Spark connector for Teradata. I haven't used it myself, but I'm sure one exists.
You can't. If you run Azure Databricks, all the data needs to be stored in Azure. But you can call the data using REST API from Teradata and then save data in Azure.
I researched several places and could not find any direction on what options there are to archive old data from Cosmos DB into cold storage. For DynamoDB in AWS, I see it mentioned that you can move DynamoDB data into S3, but I'm not sure what the options are for Cosmos DB. I understand there is a time to live option where the data will be deleted after a certain date, but I am interested in archiving rather than deleting. Any direction would be greatly appreciated. Thanks.
I don't think there is a single-click built-in feature in CosmosDB to achieve that.
Still, since you said any direction would be appreciated, I suggest you consider the DocumentDB Data Migration Tool.
Notes about Data Migration Tool:
You can specify a query to extract only the cold data (for example, by a creation date stored within the documents).
It supports exporting to various targets (JSON file, blob storage, DB, another Cosmos DB collection, etc.).
It compacts the data in the process: it can merge documents into a single array document and zip it.
Once you have the configuration set up, you can script this to be triggered automatically using your favorite scheduling tool.
You can easily reverse the source and target to restore the cold data to the active store (or to dev, test, backup, etc.).
To remove exported data you could use the TTL feature you mentioned, but that could cause data loss should your export step fail. I would suggest writing and executing a stored procedure to query and delete all exported documents with a single call. That stored procedure would not execute automatically, but it could be included in the automation script and executed only if the data was exported successfully first.
See: Azure Cosmos DB server-side programming: Stored procedures, database triggers, and UDFs.
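The "delete only after a successful export" ordering can be sketched independently of any particular tool. In the example below, `export` and `delete` are hypothetical callables standing in for the migration tool run and the stored procedure call. The `_ts` field is the system property Cosmos DB sets to the last modification time in epoch seconds, so it is only a stand-in for a creation date; if your documents store their own creation date, filter on that instead.

```python
import time


def select_cold(documents, max_age_seconds, now=None):
    """Return the documents whose _ts is older than the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    return [doc for doc in documents if doc["_ts"] < cutoff]


def archive_then_delete(documents, export, delete):
    """Export the documents, then delete them only if the export succeeded.

    export is expected to raise on failure, which skips the delete call.
    Returns the number of documents archived.
    """
    if not documents:
        return 0
    export(documents)
    delete([doc["id"] for doc in documents])
    return len(documents)
```

A scheduler would call `select_cold` on the query results and pass the outcome to `archive_then_delete`; if the export raises, nothing is removed and the run can simply be retried.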
UPDATE:
Cosmos DB has since added the change feed, which really simplifies writing a carbon copy somewhere else.
According to one of my posts (below) it seems that there is no such thing as a database in Oracle. What we call database in MySQL and MS-SQL is called schema in Oracle.
If that is the case, then why do the Oracle docs mention the CREATE DATABASE statement?
For the record, I am using Oracle 11g and the Oracle SQL Developer GUI tool.
Post-
How to create a small and simple database using Oracle 11 g and SQL Developer?
The CREATE DATABASE statement from the Oracle docs is given below. If there is no database concept, then how did this command come into the picture?
CREATE DATABASE
CREATE DATABASE [ database ]
{ USER SYS IDENTIFIED BY password
| USER SYSTEM IDENTIFIED BY password
| CONTROLFILE REUSE
| MAXDATAFILES integer
| MAXINSTANCES integer
| CHARACTER SET charset
| NATIONAL CHARACTER SET charset
| SET DEFAULT
{ BIGFILE | SMALLFILE } TABLESPACE
| database_logging_clauses
| tablespace_clauses
| set_time_zone_clause
}... ;
There is a concept of a "database" in Oracle. What the term "database" means in Oracle terms is different from what the term means in MySQL or SQL Server.
Since you are using the Express Edition, Oracle automatically runs the CREATE DATABASE statement as part of the installation process. You can only have one Express Edition database on a single machine. If you are installing a different edition, you can choose whether to have the installer create a database as part of the installation process or whether to do that manually via the CREATE DATABASE statement later. If you are just learning Oracle, you're much better off letting Oracle create the database for you at installation time: you can only create the database via command-line tools (not SQL Developer), and it is rare that someone just starting out would need to tweak the database settings in a way that the installer didn't prompt you for.
In Oracle, a "database" is a set of data files that includes the data files for the SYS and SYSTEM schemas, which contain all the Oracle data dictionary tables, the data files for the TEMP tablespace where sorts and other temporary operations occur, and the data files for whatever schemas you want to create. In SQL Server and other RDBMSs, these would be separate "databases". In SQL Server, you have a master database, a tempdb database, additional databases for different products (e.g. msdb for the SQL Server Agent), and then additional user-defined databases. In Oracle, these would all be separate schemas in a larger container that Oracle refers to as a "database".
Occasionally, a DBA will want to run multiple Oracle databases on the same server-- most commonly when there are different packaged applications that have different requirements about database versions or parameters. If you want to run application A that requires an 11.2 database and application B that doesn't support 11.2 yet, you would need to have two different databases on the server. The DBA could create a separate database and a separate instance but that doubles the memory requirements, doubles the number of background processes required to run the database, and generally makes things less scalable. It's necessary if you really want to run different versions of the database simultaneously but it's not ideal.
The person who answered your original question is correct. The DDL (Data Definition Language) above prepares a space for schemas, which is analogous to MySQL's 'database'. The above statement defines characteristics of the schemas, such as timezone, MBs of space for tables, encoding characterset, root account, etc. You would then issue DDL statements such as those in your other post to create schemas, which define what each user can see.
How to create a small and simple database using Oracle 11g and SQL Developer?
I am seeing too many errors and I cannot find any way to make a simple database.
For example
create database company;
Caused the following error:
Error starting at line 1 in command:
create database company
Error at Command Line:1 Column:0
Error report:
SQL Error: ORA-01501: CREATE DATABASE failed
ORA-01100: database already mounted
01501. 00000 - "CREATE DATABASE failed"
*Cause: An error occurred during create database
*Action: See accompanying errors.
EDIT-
This is completely different from MySQL and MS-SQL that I am familiar with.
Not as intuitive as I was expecting.
First off, what Oracle calls a "database" is generally different than what most other database products call a "database". A "database" in MySQL or SQL Server is much closer to what Oracle calls a "schema" which is the set of objects owned by a particular user. In Oracle, you would generally only have one database per server (a large server might have a handful of databases on it) where each database has many different schemas. If you are using the express edition of Oracle, you are only allowed to have 1 database per server. If you are connected to Oracle via SQL Developer, that indicates that you already have the Oracle database created.
Assuming that you really want to create a schema, not a database (using Oracle terminology), you would create the user:
CREATE USER company
  IDENTIFIED BY <<password>>
  DEFAULT TABLESPACE <<tablespace to use for objects by default>>
  TEMPORARY TABLESPACE <<temporary tablespace to use>>;
You would then assign the user whatever privileges you wanted:
GRANT CREATE SESSION TO company;
GRANT CREATE TABLE TO company;
GRANT CREATE VIEW TO company;
...
Once that is done, you can connect to the (existing) database as COMPANY and create objects in the COMPANY schema.
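If you need to repeat this setup for several schemas, the statements can be generated from a small script. This is a sketch only: `schema_setup_statements` is a hypothetical helper, and the tablespace names USERS and TEMP are assumptions that are common in default installations but should be checked against your own database. The statements are returned without a trailing semicolon, which is the form Python database drivers generally expect.

```python
def schema_setup_statements(user, password,
                            default_tablespace="USERS",
                            temp_tablespace="TEMP",
                            privileges=("CREATE SESSION",
                                        "CREATE TABLE",
                                        "CREATE VIEW")):
    """Return the CREATE USER and GRANT statements for one schema.

    DDL cannot take bind variables, so the values are interpolated
    directly; only pass trusted input.
    """
    statements = [
        'CREATE USER {} IDENTIFIED BY "{}" '
        "DEFAULT TABLESPACE {} TEMPORARY TABLESPACE {}".format(
            user, password, default_tablespace, temp_tablespace)
    ]
    statements += ["GRANT {} TO {}".format(priv, user) for priv in privileges]
    return statements


# Usage: print the statements and run them as a privileged user.
#     for statement in schema_setup_statements("company", "change_me"):
#         print(statement + ";")
```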
Actually, the answer from Justin above could not be more incorrect. SQL Server and MySQL are for smallish databases. Oracle is for large enterprise databases, thus the difference in its structure. And it is common to have more than one Oracle database on a server, provided that the server is robust enough to handle the load. If you received the error posted above, then you obviously are trying to create a new Oracle database, and if you are doing that, then you probably already understand the structure of an Oracle database. The likely scenario is that you attempted to create a database using dbca, it initially failed, but the binaries were created. You then adjusted your initial parameters and retried creating the database using dbca. However, the utility sees the binaries and folder structure for the database that you are creating, so it thinks that the database already exists but is not mounted. Dropping the database and removing the binaries and folders, as well as any other cleanup of the initial attempt, should be done first; then try again.
From your question description, I think you wanted to create a database schema, not a database instance. In Oracle terminology, a database instance is a set of files in the file system. It's more like the data files in MySQL, whereas a database in MySQL is somewhat equivalent to Oracle's schema.
To create a schema in Oracle: https://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_6014.htm
To create a database instance in Oracle (I personally prefer DBCA):
https://docs.oracle.com/cd/E11882_01/server.112/e25494/create.htm#ADMIN11068
Note that the Oracle Express edition does not support mounting more than one database instance at a time.