I'm trying to create a script to download a daily updated dataset. But the final file seems corrupted and is only 9~10kb while the original one is 95kb. Below is my code.
#Getting dataset from website daily
dir = getwd()
dash= "/"
date = Sys.Date()
date = gsub("-","",date)
filename = "CoronaDataSet"
extension = ".zip"
save = paste(dir,dash,date," - ",filename,extension, sep = "")
save
url = "https://www.kaggle.com/imdevskp/corona-virus-report/download"
download.file(url,save)
My console returns the following:
trying URL 'https://www.kaggle.com/imdevskp/corona-virus-report/download'
Content type 'text/html; charset=utf-8' length unknown
downloaded 8957 bytes
I also tried mode = "wb", "a", and "ab". None of them downloaded the full-size file.
Setting method = "auto", "libcurl", or "curl" returns an invalid argument error:
cannot open destfile 'C:/Users/Name/Documents/R/Course/Download file from url/20200323 - CoronaDataSet.zip', reason 'Invalid argument'
What may I be missing in my code? Any help is much appreciated.
I am able to see every database in the .rdb file in the variable environment as a "promise", as per the directions here. Now I want to edit one of them and save it. How can I do that? I am new to R.
In a discussion on r-pkg-devel, Ivan Krylov provided the following function to read an RDB database:
# filename: the .rdb file
# offset, size: the pair of values from the .rdx
# type: 'gzip' if $compressed is TRUE, 'bzip2' for 2, 'xz' for 3
readRDB <- function(filename, offset, size, type = 'gzip') {
  f <- file(filename, 'rb')
  on.exit(close(f))
  seek(f, offset + 4)
  unserialize(memDecompress(readBin(f, 'raw', size - 4), type))
}
Therefore, you should be able to implement the reverse using a combination of serialize, memCompress, and writeBin.
Note that if the object changes size, you will also have to adjust the index file.
I am trying to search for a string in multiple files. My code works fine, but for big text files it takes a few minutes.
import logging
import os
import sys

wrd = b'my_word'
path = r'C:\path\to\files'  # raw string: otherwise \t and \f are read as tab and form feed
#### opens the path where all of .txt files are ####
for f in os.listdir(path):
    if f.strip().endswith('.txt'):
        with open(os.path.join(path, f), 'rb') as ofile:
            #### loops through every line in the file comparing the strings ####
            for line in ofile:
                if wrd in line:
                    try:
                        sendMail(...)
                        logging.warning('There is an error {} in this file : {}'.format(line, f))
                        sys.exit(0)
                    except IOError as e:
                        logging.error('Operation failed: {}'.format(e.strerror))
                        sys.exit(0)
I found this topic : Python finds a string in multiple files recursively and returns the file path
but it does not answer my question..
Do you have an idea how to make it faster ?
I am using Python 3.4 on Windows Server 2003.
Thx ;)
My files are generated by an Oracle application, and if there is an error, I log it and stop generating the files.
So I search for my string by reading the files from the end, because the string I am looking for is an Oracle error and is at the end of the files.
import logging
import os
import sys

wrd = b'ORA-'
path = r'C:\path\to\files'  # raw string: otherwise \t and \f are read as tab and form feed
#### opens the path where all of .txt files are ####
for f in os.listdir(path):
    if f.strip().endswith('.txt'):
        # binary mode: wrd is bytes, and seeking to a byte offset is only reliable in 'rb'
        with open(os.path.join(path, f), 'rb') as ofile:
            try:
                ofile.seek(0, 2)                     # Seek to end of file
                fsize = ofile.tell()                 # Get size
                ofile.seek(max(fsize - 1024, 0), 0)  # Set position to the last 1024 bytes
                lines = ofile.readlines()            # Read to end
                lines = lines[-10:]                  # Get last 10 lines
                for line in lines:
                    if wrd in line:                  # was `string`, which is undefined
                        sendMail(...)
                        logging.error('There is an error {} in this file : {}'.format(line, f))
                        sys.exit(0)
            except IOError as e:
                logging.error('Operation failed: {}'.format(e.strerror))
                sys.exit(0)
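The tail-reading idea above can also be packaged as a small helper, which makes it easier to check on sample files before wiring in the mail and logging calls. This is only a sketch: the names tail_contains, files_with_error and tail_bytes are made up for illustration, and it assumes the error marker always falls within the last tail_bytes bytes of a file.

```python
import os


def tail_contains(file_path, needle, tail_bytes=1024):
    """Return True if `needle` (bytes) occurs in the last `tail_bytes` bytes of the file."""
    with open(file_path, 'rb') as fh:
        fh.seek(0, os.SEEK_END)                           # jump to the end to learn the size
        size = fh.tell()
        fh.seek(max(size - tail_bytes, 0), os.SEEK_SET)   # back up, but never before the start
        return needle in fh.read()                        # only the tail is read


def files_with_error(directory, needle=b'ORA-', tail_bytes=1024):
    """Yield the names of .txt files in `directory` whose tail contains `needle`."""
    for name in sorted(os.listdir(directory)):
        if not name.endswith('.txt'):
            continue
        if tail_contains(os.path.join(directory, name), needle, tail_bytes):
            yield name
```

Something like `for name in files_with_error(path): ...` could then replace the nested loop. Since only a fixed number of bytes is read from each file, the cost per file no longer grows with file size. A marker that sits further than tail_bytes from the end would be missed, so that value may need to be larger.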
"/D/data_DataAnalysis/Progrm/datset1/set2/genus/Huttenhower_LC8_genus_reported.tsv"
"/c/bioinfoTools/data/mock/test/truth/file_sets/genus/Huttenhower_LC8_TRUTH.txt"
I want to extract "Huttenhower_LC8" from these two file names using R.
Similar to this Python code:
fileName_temp = a_file.split("/")[-1]
filename = a_file.split("/")[-1][:-9]
for another_file in all_slim_files:
    a_filename = another_file.split("/")[-1][:-18]
I want to create a dbf file to export a data frame to. I already tried:
write.dbf(MyDF,MyDF.dbf,factor2char = F)
but I get the error:
Error in write.dbf(MyDF, MyDF.dbf, factor2char = F) :
object 'MyDF.dbf' not found
I can tell why but I just can't find a solution.
The 2nd parameter should be a string. Try this:
write.dbf(MyDF,"MyDF.dbf",factor2char = F)
The error you are getting is that MyDF.dbf is being treated as a variable name and you haven't defined a variable with that name.
I'm trying to use RAmazonS3 to upload a local file to S3 storage, but I keep getting a broken pipe error.
require(RAmazonS3)
options(AmazonS3 = c('xxx' = "xxx")) #login and secret
setwd('[local directory]/reports') #set working directory to location of "polarity.png"
addFile("polarity.png", "umusergen", "destination.png",type="image/png",meta = c(foo = 123, author = "Duncan Temple Lang"))
Send failure: Broken Pipe
It works fine if I just try to upload content
addFile(I("This is a test"), "umusergen", "destination.png",type="text",meta = c(foo = 123, author = "Duncan Temple Lang"))
addFile() is quite simplistic and is focused on text content unless told otherwise.
Use
content = readBin("polarity.png", raw(), file.info("polarity.png")[1, "size"])
and
addFile(content, "umusergen", "destination.png", type = "image/png")
I'll update the addFile() function to allow one to indicate this is binary or content
or to use the (MIME) type.