How to drop delimiter in Hive table - R

I download the file using the download.file function in R,
save it locally, and
move it to HDFS.
My Hive table is mapped to this file's location in HDFS. The file uses , as the delimiter/separator, which I cannot change through R's download.file function.
One of the fields has , inside its content. How do I tell Hive to drop this delimiter wherever it appears within the field contents?
I understand we can change the delimiter, but is there really a way to drop it, the way Sqoop allows us to do?
My R script is as simple as:
download.file()
hdfs.put()
Is there a workaround?
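One possible workaround, sketched below on the assumption that the download is a quoted CSV and that hdfs.put comes from the rhdfs package: read the file with read.csv (which keeps quoted commas inside a single field), strip the embedded commas, and rewrite the file before pushing it to HDFS. The URL and HDFS path are placeholders.

url   <- "http://example.com/data.csv"                 # hypothetical source URL
local <- "data.csv"
download.file(url, local)

df <- read.csv(local, stringsAsFactors = FALSE)        # quoted commas stay inside one field
df[] <- lapply(df, function(col) if (is.character(col)) gsub(",", "", col) else col)

clean <- "data_clean.csv"
write.table(df, clean, sep = ",", quote = FALSE,
            row.names = FALSE, col.names = FALSE)      # Hive data files carry no header row

library(rhdfs)
hdfs.init()
hdfs.put(clean, "/user/hive/warehouse/mytable/")       # hypothetical table location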

Related

Open CSV data in Tableau

I have had problems loading the following file into Tableau:
https://www.kaggle.com/datasets/shivamb/netflix-shows/download
When loaded, Tableau puts the entire first row into a single column, but the file loads fine in R.
Is it possible to load the data in R and then connect to Tableau via Rserve, or is there a way to load the file correctly in Tableau directly?
Looks like a problem within the interpreter.
I can't download the file myself as I don't have a Kaggle account, and it's not clear from your R screenshots, but you could adjust the text file properties to see if you can change how the interpreter works: right-click the object "netflix_titles.csv" in the data model window and select Text file properties from the context menu.
Another option would be to try the Data Interpreter.
It looks like Tableau is reading this file as a text file and not a CSV. Tableau should create a separate column for every comma it sees, but your screenshot shows a single column for the entire first row.
Sometimes, Tableau can correctly read the file if you check the "Use Data Interpreter" checkbox.
If you have trouble making that work, simply open the CSV in Excel and save it as an XLSX. You could even connect to it via an import into Google Sheets if you don't have Excel.
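If you would rather keep the conversion in R (per the Rserve idea above), a minimal sketch, assuming the writexl package is installed and the default Kaggle file name:

library(writexl)
df <- read.csv("netflix_titles.csv", stringsAsFactors = FALSE)
write_xlsx(df, "netflix_titles.xlsx")    # Tableau then reads this via its Excel connector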

Getting embedded nul string while reading an excel csv

I'm trying to open an Excel CSV file within R Studio but I get this error:
Error Is this a valid CSV file? embedded nul in the string: 'C\0a\0m\0p\0a\0g\0n\0e\0_\0N\0o\0C\0a\0r\0a\0v\0a\0g\0g\0i\0o\0_\0C\0o\0s\0t\0o\0>\00'
The file is generated automatically by the Google Ads platform as an "Excel CSV" and opens normally in Excel, but to open it in RStudio I have to convert it to .xlsx.
Is there a way to bypass this, or to convert the file without opening it?
Otherwise the script that is based on this file needs a manual step to convert the source file.
What function are you using to open it? Check your file and see if you have other commas within a column value; this may confuse the function. It is also worth trying the "Import dataset" option in the Environment pane of RStudio: pick the readr option and adjust the import settings until they come out right. Also check the RAdwords package; maybe you can extract your Google Ads information without the CSV export step.
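One further hedged observation: the interleaved \0 bytes in the error message are the signature of a UTF-16 encoded file, which is common for "Excel CSV" exports. If that is the case here, declaring the encoding may remove the manual conversion step. The file name and separator below are assumptions.

# Minimal sketch: Google Ads report downloads are often UTF-16LE and tab-separated.
df <- read.csv("google_ads_report.csv",     # hypothetical file name
               fileEncoding = "UTF-16LE",
               sep = "\t")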

Update csv file from shinyapp

I am building an application using the "shiny" package in R. I pass the path of the input .csv file into a textInput component, and the file is loaded into a data frame on the server side. Now I want to apply a sequence of operations to this data, such as converting it to lowercase and removing punctuation, and then write it back to the .csv file.
Can anybody tell me how this is doable in R?
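A minimal sketch of one way to wire this up, assuming the path typed into textInput points at a CSV the server can read and overwrite; all widget names are illustrative.

library(shiny)

ui <- fluidPage(
  textInput("path", "Path to CSV file"),
  actionButton("go", "Clean and save")
)

server <- function(input, output, session) {
  observeEvent(input$go, {
    req(input$path)
    df <- read.csv(input$path, stringsAsFactors = FALSE)
    # lowercase and strip punctuation in every character column
    df[] <- lapply(df, function(col) {
      if (is.character(col)) gsub("[[:punct:]]", "", tolower(col)) else col
    })
    write.csv(df, input$path, row.names = FALSE)   # write the cleaned data back
  })
}

shinyApp(ui, server)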

Pass R object name as argument in shell

I'm having a little trouble using the shell command in R. I have a Java JAR file that takes as input a file containing a character vector (one tweet per line). I'm calling it from the shell function:
shell("java -Xmx500m -jar C:/Users/User/Documents/R/java/ark-tweet-nlp-0.3.2/ark-tweet-nlp-0.3.2.jar --input-format text C:/Users/User/Documents/R/java/ark-tweet-nlp-0.3.2/examples/test.txt",intern=T)
Rather than pull the character vector from a text file external to the R environment, I want to be able to pass a vector that I have preprocessed within R. For example, if the file "test.txt" is imported into R as a character vector called test, I thought I could do this:
shell(paste("java -Xmx500m -jar C:/Users/User/Documents/R/java/ark-tweet-nlp-0.3.2/ark-tweet-nlp-0.3.2.jar --input-format text",test,sep=" "),intern=T)
But the JAR file that is being called needs to receive a file name, not the file contents. My workaround is to write the preprocessed vector back to disk and point the shell command at that file, but that is clunky and will mess up later processing I plan on doing.
Use the system set command to create an environment variable, then read it from Java. The shared location will be the environment variable table.
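A hedged sketch of that suggestion from the R side; TWEET_TEXT is a made-up variable name, and the JAR would have to be changed to read System.getenv("TWEET_TEXT") itself:

Sys.setenv(TWEET_TEXT = paste(test, collapse = "\n"))   # test is the preprocessed vector
out <- shell("java -jar ark-tweet-nlp-0.3.2.jar", intern = TRUE)

An alternative that needs no change to the JAR is a temporary file: writeLines(test, tmp <- tempfile(fileext = ".txt")), pass tmp as the input path, and unlink(tmp) afterwards.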

Export From Teradata Table to CSV

Is it possible to transfer the data from a Teradata table into a .csv file directly?
The problem is that my table has more than 18 million rows.
If yes, please tell me the process.
For a table that size I would suggest using the FastExport utility. It does not natively support a CSV export, but you can mimic the behavior.
Teradata SQL Assistant will export to a CSV, but it would not be appropriate to use with a table of that size.
BTEQ is another alternative that may be acceptable for a one-time dump of the table.
Do you have access to any of these?
It's actually possible to change the delimiter of exported text files within Teradata SQL Assistant, without needing any separate applications:
Go to Tools > Options > Export/Import. From there, you can change the 'Use this delimiter between columns' option from {Tab} to ','.
You might also want to set the 'Enclose column data in' option to 'Double Quote', so that any commas in the data itself don't upset the file structure.
From there, you use the regular text export: File > Export Results, run the query, and select one of the Delimited Text types.
Then you can just use your operating system to manually change the file extension from .txt to .csv.
These instructions are from SQL Assistant version 16.20.0.7.
I use the following code to export data from the Teradata Table into .csv file directly.
CREATE EXTERNAL TABLE database_name.table_name      -- table to be created
SAMEAS database_name.table_name                     -- existing table whose data is to be exported
USING (DATAOBJECT ('C:\Data\file_name.csv')
       DELIMITER '|' REMOTESOURCE 'ODBC');
You can use the FastExport utility from Teradata Studio to export the table in CSV format, and you can define the delimiter as well.
Very simple.
The basic idea would be to export the table as a TXT file first and then convert the TXT to CSV using R: read.table() followed by write.csv().
Below are the steps for exporting the TD table as a TXT file:
Select the export option from the File menu
Select all records from the table you want to export
Save it as a TXT file
Then use R to convert TXT file to CSV (set working directory to the location where you have saved your big TXT file):
my_table <- read.table("File_name.txt", fill = TRUE, header = TRUE)   # add sep = "\t" if the export is tab-delimited
write.csv(my_table, file = "File_name.csv", row.names = FALSE)        # row.names = FALSE avoids an extra index column
This worked for a table of 15 million records. Hope it helps.
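For tables of this size, a hedged alternative is the data.table package, whose fread and fwrite are much faster than read.table and write.csv; this assumes data.table is installed and the export is a regular delimited file.

library(data.table)
big <- fread("File_name.txt")    # auto-detects the delimiter
fwrite(big, "File_name.csv")     # writes CSV by default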
