Can't export special characters from DynamoDB with AWS cli - amazon-dynamodb

I am trying to export all my data from DynamoDB using the AWS CLI.
I use this command:
aws dynamodb scan --table-name TABLENAME > output.json
When I open the file, the Norwegian special characters (æøå) are replaced with this symbol �.
Is there any way to export the data with special characters?
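One thing worth checking, assuming the characters are stored correctly in DynamoDB and the replacement happens when the CLI writes its output: AWS CLI v1 is Python-based, so the output encoding follows the shell locale, and forcing UTF-8 via Python's PYTHONIOENCODING variable may keep æøå intact. A minimal sketch (TABLENAME is a placeholder, as in the question):
# Assumes AWS CLI v1 on a shell whose default locale is not UTF-8.
export PYTHONIOENCODING=utf-8
aws dynamodb scan --table-name TABLENAME --output json > output.json
# Check that the file really is UTF-8 before opening it elsewhere.
iconv -f utf-8 -t utf-8 output.json > /dev/null && echo "valid UTF-8"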

Related

Export in parquet file format in Teradata

I am trying to export data using Teradata's tdload utility, and I need the exported file to be in Parquet format.
The command I have used is:
tdload --SourceTdpid xxx.xxx.xxx.xxx --SourceUserName dbc
--SourceUserPassword dbc --SourceTable DimAccount
--TargetFilename DimAccount.parquet
But this does not export the data as Parquet.
How can I achieve that?

Can you use Athena ODBC/JDBC to return the S3 location of results?

I've been using the metis package to run Athena queries via R. While this is great for small queries, there still does not seem to be a viable solution for queries with very large return datasets (tens of thousands of rows, for example). However, when running these same queries in the AWS console, it is fast and straightforward to use the download link to obtain the CSV file of the query result.
This got me thinking: is there a mechanism for sending the query via R but returning/obtaining the S3:// bucket location where the query results live instead of the normal results object?
As mentioned in my comment above, you could investigate the RAthena and noctua packages.
These packages connect to AWS Athena using AWS SDKs as their drivers, which means they also download the data from S3 in a similar way to the method mentioned by @Zerodf. They both use data.table to load the data into R, so they are pretty quick. You can also get at the Query Execution ID if required for some reason.
Here is an example of how to use the packages:
RAthena
Create a connection to AWS Athena (for more information on how to connect, please look at dbConnect):
library(DBI)
con <- dbConnect(RAthena::athena())
Example of how to query Athena:
dbGetQuery(con, "select * from sampledb.elb_logs")
How to access the Query ID:
res <- dbSendQuery(con, "select * from sampledb.elb_logs")
sprintf("%s%s.csv",res#connection#info$s3_staging, res#info$QueryExecutionId)
noctua
Create a connection to AWS Athena (for more information on how to connect, please look at dbConnect):
library(DBI)
con <- dbConnect(noctua::athena())
Example of how to query Athena:
dbGetQuery(con, "select * from sampledb.elb_logs")
How to access the Query ID:
res <- dbSendQuery(con, "select * from sampledb.elb_logs")
sprintf("%s%s.csv",res#connection#info$s3_staging, res#info$QueryExecutionId)
Sum up
These packages should do what you are looking for; however, as they download the data from the query output in S3, I don't believe you will need to go through the Query Execution ID to achieve the same result.
You could look at the Cloudyr Project. They have a package that handles creating the signature requests for the AWS API. Then you can fire off a query, poll AWS until the query finishes (using the QueryExecutionID), and use aws.s3 to download the result set.
You can also use system() to run AWS CLI commands that execute a query, wait for it to finish, and download the results.
For example, you could run the following commands on the command line to get the results of a query.
$ aws athena start-query-execution --query-string "select count(*) from test_null_unquoted" --query-execution-context Database=stackoverflow --result-configuration OutputLocation=s3://SOMEBUCKET/ --output text
XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
Once you get the query execution ID, you can check on the execution:
$ aws athena get-query-execution --query-execution-id=XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX --output text
QUERYEXECUTION select count(*) from test_null_unquoted XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
QUERYEXECUTIONCONTEXT stackoverflow
RESULTCONFIGURATION s3://SOMEBUCKET/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX.csv
STATISTICS 104 1403
STATUS 1528809056.658 SUCCEEDED 1528809054.945
Once the query succeeds, you can download the data.
$ aws s3 cp s3://SOMEBUCKET/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX.csv .
Edit:
You can even turn those commands into a one-liner (Bash example here), but I'm sure you could do the same thing in PowerShell.
$ eid=`aws athena start-query-execution --query-string "select count(*) from test_null_unquoted" --query-execution-context Database=SOMEDATABASE --result-configuration OutputLocation=s3://SOMEBUCKET/ --output text` && until aws athena get-query-execution --query-execution-id=$eid --output text | grep "SUCCEEDED"; do sleep 10 | echo "waiting..."; done && aws s3 cp s3://SOMEBUCKET/$eid.csv . && unset eid

Does SQOOP support export for CLOB/BLOB data back to ORACLE / SQL Server

I am a newbie to Sqoop 1.4.5. I have gone through the Sqoop documentation and have successfully imported/exported records with simple datatypes to and from HDFS.
Next I tried LOB data, for example CLOB.
I have a simple table with a CLOB column; its create query is as follows:
CREATE TABLE "SCOTT"."LARGEDATA" ("ID" VARCHAR2(20 BYTE), "IMG" CLOB ) SEGMENT CREATION DEFERRED PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING TABLESPACE "USERS" LOB ("IMG") STORE AS BASICFILE (TABLESPACE "USERS" ENABLE STORAGE IN ROW CHUNK 8192 RETENTION NOCACHE LOGGING );
I can successfully import data to HDFS:
sqoop import --connect jdbc:oracle:thin:@:1522: --username --password --table 'LARGEDATA' -m 1 --target-dir /home/mydata/tej/LARGEDATA2 --fields-terminated-by , --escaped-by \\ --enclosed-by '\"'
But when I tried to export this data back to Oracle using the following command:
sqoop export --connect jdbc:oracle:thin:@:1522: --username --password --table 'LARGEDATA' -m 1 --export-dir /home/mydata/tej/LARGEDATA2 --fields-terminated-by , --escaped-by \\ --enclosed-by '\"'
I got the following exceptions:
java.lang.CloneNotSupportedException: com.cloudera.sqoop.lib.ClobRef at java.lang.Object.clone(Native Method)
java.io.IOException: Could not buffer record at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:218)
and the error mentioned in this link: https://stackoverflow.com/questions/30778340/sqoop-export-4000-characters-column-data-into-oracle-clob
I googled about it and found the following links, which mention that Sqoop does not support export of BLOB and CLOB data. Some of them are posts from July 2015, and a JIRA issue shows it is still open. The forum links are as follows:
https://issues.apache.org/jira/browse/SQOOP-991
Can sqoop export blob type from HDFS to Mysql?
http://sofb.developer-works.com/article/19310921/Can+sqoop+export+blob+type+from+HDFS+to+Mysql%3F
http://grokbase.com/t/sqoop/user/148te4tghg/sqoop-import-export-clob-datatype
Exporting sequence file to Oracle by Sqoop
Can anyone please let me know whether Sqoop supports export of LOB data? If yes, please guide me on how to do it.
Try creating a staging table in Oracle and use --staging-table together with --clear-staging-table. Keep the staging table column as VARCHAR2(10000).
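A rough sketch of that suggestion, assuming a staging table SCOTT.LARGEDATA_STG already exists with the same columns as LARGEDATA but IMG declared as VARCHAR2(10000); host, SID, and credentials below are placeholders:
# --staging-table and --clear-staging-table are standard sqoop-export options;
# LARGEDATA_STG, dbhost, ORCL and scott/tiger are hypothetical placeholders.
sqoop export \
  --connect jdbc:oracle:thin:@dbhost:1522:ORCL \
  --username scott --password tiger \
  --table LARGEDATA \
  --staging-table LARGEDATA_STG \
  --clear-staging-table \
  --export-dir /home/mydata/tej/LARGEDATA2 \
  --fields-terminated-by ',' \
  --escaped-by '\\' \
  --enclosed-by '"' \
  -m 1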

error during sqoop data migration

I have a doubt. I am trying to pull data from Oracle and push it into HDFS using Sqoop 1.4.6. The table I want to migrate contains a column named "COMMENT" (which is a reserved keyword in Oracle), but when I tried to push the table into HDFS using Sqoop, the error that occurred was:
15/09/30 14:52:49 ERROR db.DBRecordReader: Top level exception:
java.sql.SQLSyntaxErrorException: ORA-00936: missing expression
I have tried putting \ and " around this column, as in "\"\"COMMENT\"\"", when listing the column names in the query.
So how can I get this error fixed?
Please try using the --query option for sqoop import,
e.g.: sqoop import --query "select COMMENT from Table_Name ....."
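For reference, a slightly fuller sketch of a free-form query import; the connection string, credentials, and table name are placeholders, and Sqoop requires the literal $CONDITIONS token plus a --target-dir whenever --query is used:
# Single quotes keep $CONDITIONS from being expanded by the shell;
# double quotes around COMMENT make Oracle treat it as an identifier, not the keyword.
sqoop import \
  --connect jdbc:oracle:thin:@dbhost:1521:ORCL \
  --username scott --password tiger \
  --query 'SELECT ID, "COMMENT" FROM SOME_TABLE WHERE $CONDITIONS' \
  --target-dir /user/scott/some_table \
  -m 1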

R Postgres and shortcut commands?

Do Postgres shortcuts like \d+ tablename work with RPostgreSQL?
If I try to run them I get a syntax error: Error: '\d' is an unrecognized escape in character string starting "\d".
I tried escaping it, but did not figure it out. RPostgreSQL itself works -- I can access and query my database.
No. These are "psql metacommands" and only recognised by the psql command-line interpreter. Only SQL commands can be passed through RPostgreSQL to the Postgres database.
