R download.file downloading corrupted and smaller file - r

I'm trying to create a script to download a daily updated dataset. But the final file seems corrupted and is only 9~10kb while the original one is 95kb. Below is my code.
#Getting dataset from website daily
dir = getwd()
dash= "/"
date = Sys.Date()
date = gsub("-","",date)
filename = "CoronaDataSet"
extension = ".zip"
save = paste(dir,dash,date," - ",filename,extension, sep = "")
save
url = "https://www.kaggle.com/imdevskp/corona-virus-report/download"
download.file(url,save)
My console returns the following:
trying URL 'https://www.kaggle.com/imdevskp/corona-virus-report/download'
Content type 'text/html; charset=utf-8' length unknown
downloaded 8957 bytes
I also tried mode = "wb", "a", and "ab". None of them downloaded the full-size file.
Setting method = "auto", "libcurl", or "curl" returns an invalid argument error:
cannot open destfile 'C:/Users/Name/Documents/R/Course/Download file from url/20200323 - CoronaDataSet.zip', reason 'Invalid argument'
What may I be missing in my code? Any help is much appreciated.

Related

how to view, open and save a .rdb file in RStudio

I am able to see every database in the .rdb file in the variable environment as a "promise", as per the directions here. Now I want to edit one of the files and save it. How can I do that? I am new to R.
In a discussion on r-pkg-devel, Ivan Krylov provided the following function to read an RDB database:
# filename: the .rdb file
# offset, size: the pair of values from the .rdx
# type: 'gzip' if $compressed is TRUE, 'bzip2' for 2, 'xz' for 3
readRDB <- function(filename, offset, size, type = 'gzip') {
  f <- file(filename, 'rb')
  on.exit(close(f))
  seek(f, offset + 4)
  unserialize(memDecompress(readBin(f, 'raw', size - 4), type))
}
Therefore, you should be able to implement the reverse using a combination of serialize, memCompress, and writeBin.
Note that if the object changes size, you will also have to adjust the index file.

Python, The fastest way to find string in multiple text files (some files are big)

I try to search a string in multiple files, my code works fine but for big text files it takes a few minutes.
import logging
import os
import sys

wrd = b'my_word'
path = r'C:\path\to\files'  # raw string, so that \t is not read as a tab
#### opens the path where all of .txt files are ####
for f in os.listdir(path):
    if f.strip().endswith('.txt'):
        with open(os.path.join(path, f), 'rb') as ofile:
            #### loops through every line in the file comparing the strings ####
            for line in ofile:
                if wrd in line:
                    try:
                        sendMail(...)
                        logging.warning('There is an error {} in this file : {}'.format(line, f))
                        sys.exit(0)
                    except IOError as e:
                        logging.error('Operation failed: {}'.format(e.strerror))
                        sys.exit(0)
I found this topic : Python finds a string in multiple files recursively and returns the file path
but it does not answer my question..
Do you have an idea how to make it faster?
I am using Python 3.4 on Windows Server 2003.
Thx ;)
My files are generated by an Oracle application, and if there is an error, I log it and stop generating my files.
So I search for my string by reading the files from the end, because the string I am looking for is an Oracle error and is at the end of the files.
import logging
import os
import sys

wrd = b'ORA-'
path = r'C:\path\to\files'  # raw string, so that \t is not read as a tab
#### opens the path where all of .txt files are ####
for f in os.listdir(path):
    if f.strip().endswith('.txt'):
        # binary mode: seek() offsets are plain bytes and lines compare against the bytes pattern
        with open(os.path.join(path, f), 'rb') as ofile:
            try:
                ofile.seek(0, 2)  # Seek to end of file
                fsize = ofile.tell()  # Get size
                ofile.seek(max(fsize - 1024, 0), 0)  # Move to the last 1024 bytes
                lines = ofile.readlines()  # Read to end
                lines = lines[-10:]  # Get last 10 lines
                for line in lines:
                    if wrd in line:
                        sendMail(.....)
                        logging.error('There is an error {} in this file : {}'.format(line, f))
                        sys.exit(0)
            except IOError as e:
                logging.error('Operation failed: {}'.format(e.strerror))
                sys.exit(0)
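The tail-reading idea above can be sketched as a pair of small functions, which keeps the file scan separate from the mailing and logging. This is only a sketch under assumptions: the names tail_contains and find_files_with_marker and the tail_bytes parameter are illustrative and not from the original post, and it assumes the error marker always falls within the last tail_bytes bytes of a file.

```python
import os


def tail_contains(file_path, marker, tail_bytes=1024):
    """Return True if `marker` (bytes) occurs in the last `tail_bytes` bytes of the file."""
    with open(file_path, 'rb') as fh:
        fh.seek(0, os.SEEK_END)             # jump to the end to learn the file size
        size = fh.tell()
        fh.seek(max(size - tail_bytes, 0))  # back up, but never before the start
        return marker in fh.read()


def find_files_with_marker(directory, marker, suffix='.txt', tail_bytes=1024):
    """Return the sorted names of `suffix` files in `directory` whose tail contains `marker`."""
    hits = []
    for name in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, name)
        if name.endswith(suffix) and tail_contains(full_path, marker, tail_bytes):
            hits.append(name)
    return hits
```

Calling find_files_with_marker(path, b'ORA-') would then give the list of files to report. Only the final tail_bytes bytes of each file are read, so the cost per file does not grow with file size, but a marker that sits earlier in the file, or that straddles the start of the window, would be missed.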

How can i fetch starting of file name from the path with different extensions using R

"/D/data_DataAnalysis/Progrm/datset1/set2/genus/Huttenhower_LC8_genus_reported.tsv"
"/c/bioinfoTools/data/mock/test/truth/file_sets/genus/Huttenhower_LC8_TRUTH.txt"
I want to extract "Huttenhower_LC8" from these two file names using R.
Similar to this Python code:
fileName_temp = a_file.split("/")[-1]
filename = a_file.split("/")[-1][:-9]
for another_file in all_slim_files:
    a_filename = another_file.split("/")[-1][:-18]

R Write a dbf file

I want to export a data frame to a new dbf file. I already tried:
write.dbf(MyDF,MyDF.dbf,factor2char = F)
but get the error code:
Error in write.dbf(MyDF, MyDF.dbf, factor2char = F) :
object 'MyDF.dbf' not found
I can tell why but I just can't find a solution.
The 2nd parameter should be a string. Try this:
write.dbf(MyDF,"MyDF.dbf",factor2char = F)
The error you are getting is that MyDF.dbf is being treated as a variable name and you haven't defined a variable with that name.

RAmazonS3 broken pipe when upload files

I'm trying to use RAmazonS3 to upload a local file to S3 storage, but I keep getting a broken pipe error.
require(RAmazonS3)
options(AmazonS3 = c('xxx' = "xxx")) #login and secret
setwd('[local directory]/reports') #set working directory to location of "polarity.png"
addFile("polarity.png", "umusergen", "destination.png",type="image/png",meta = c(foo = 123, author = "Duncan Temple Lang"))
Send failure: Broken Pipe
It works fine if I just try to upload content
addFile(I("This is a test"), "umusergen", "destination.png",type="text",meta = c(foo = 123, author = "Duncan Temple Lang"))
addFile() is quite simplistic and is focused on text content unless told otherwise.
Use
content = readBin("polarity.png", raw(), file.info("polarity.png")[1, "size"])
and
addFile(content, "umusergen", "destination.png", type = "image/png")
I'll update the addFile() function to allow one to indicate this is binary or content
or to use the (MIME) type.
