I noticed this problem when trying to run the following R script.
library(downloader)
# download the NPPES zip archive, then read the CSV straight out of it
download('http://download.cms.gov/nppes/NPPES_Data_Dissemination_Feb_2016.zip',
         dest = 'dataset.zip', mode = 'wb')
npi <- read.csv(unz('dataset.zip', 'npidata_20050523-20160207.csv'),
                as.is = TRUE)
The script kept spinning for some reason so I manually downloaded the data and noticed the compression ratio was 100%.
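For anyone trying to reproduce this, the archive can also be inspected from R without extracting it (file name as in the script above):
info <- unzip("dataset.zip", list = TRUE)   # list the archive contents
info[, c("Name", "Length")]                 # uncompressed sizes reported inside the zip
file.size("dataset.zip")                    # compressed size on disk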
I am not certain if StackOverflow is the best Exchange for this question, so I am open to moving it if another Exchange is suggested. The Open Data Exchange might be appropriate, but there isn't very much activity on that site.
My question is this: I work a lot with government-curated data from the Centers for Medicare and Medicaid Services (CMS). The data downloads from this site are in the form of zip files, and occasionally they have zip ratios of 100%. This is clearly impossible, since the reported uncompressed size is ~800PB. (CMS notes on their site that they estimate the uncompressed size to be ~4GB.) This has affected me on my work computer; I have replicated the problem on a co-worker's computer as well as my own personal computer.
One example can be found here. (Click the link and then click on NPPES Data Dissemination.) There are other examples I've noticed, and I've emailed CMS about this. They respond that the files are large and can't be handled with Excel. I am aware of this, and it isn't really the problem I'm facing.
Does anyone know why this would be happening and how I can fix it?
Per cdeterman's point, how much system memory do you have available for R to execute the uncompressing and subsequent loading of the data? Looking at both the image you posted and the link to the actual data, which reads as ~560 MB compressed, it did not pose a problem on my system (Win 10, 16 GB RAM, Core i7, R v3.2.3) to download, uncompress, and read the uncompressed CSV into a table.
I would recommend - if nothing else works - decoupling your uncompressing and data-loading steps. You might even go as far as invoking (depending on your OS) an R system command to decompress the data, manually inspect it, and then separately issue piecewise read.table/read.csv calls on the dataset.
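Something along these lines, as a rough sketch (the chunk size is arbitrary and the file names are taken from the question; adjust for your setup):
# decompress first, then read the CSV piecewise so R never holds more than one chunk
unzip("dataset.zip", exdir = "nppes")
path <- "nppes/npidata_20050523-20160207.csv"
hdr  <- names(read.csv(path, nrows = 1, as.is = TRUE))   # grab the column names only
con  <- file(path, open = "r")
invisible(readLines(con, n = 1))                         # skip the header row
repeat {
  chunk <- tryCatch(
    read.csv(con, header = FALSE, col.names = hdr,
             nrows = 100000, as.is = TRUE),
    error = function(e) NULL)                            # NULL once the file is exhausted
  if (is.null(chunk) || nrow(chunk) == 0) break
  # ... filter, aggregate, or append each chunk here ...
}
close(con)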
Best of luck
rudycazabon
Related
I was working on something relatively simple: I have three files that weigh ~150 MB each, with about 240k rows and 145 columns each, and I wanted to join them. The problem is that when I open the first file with readxl::read_excel, it suddenly requires 10 GB of memory just to open the file, making it impossible for me to open all three (I was barely able to open the first one after several tries and reinstalling readxl), even though once the file is read, the data frame object weighs 287 MB as per object_size().
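For context, this is roughly what I'm running (the file name is a placeholder):
library(readxl)
library(pryr)                      # for object_size()
df1 <- read_excel("file_1.xlsx")   # memory use spikes to ~10 GB while this runs
object_size(df1)                   # ~287 MB once the data frame exists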
I'm a bit baffled as to why R needs so much RAM to open my file. Any ideas on what could be happening? Something I might be missing? Any less memory-intensive alternatives?
As extra information, when I opened the file I saw it has filters enabled and some table formatting from Excel.
Thank you very much
I am trying to merge/concatenate multiple videos with sound sequentially into one video using only R (I don't want to work with ffmpeg on the command line, as the rest of the project doesn't require it and I would rather not bring it in at this stage).
My code looks like the following:
library(av)

dir <- "C:/Users/Admin/Documents/r_programs/"
videos <- c(
  paste0(dir, "video_1.mp4"),
  paste0(dir, "video_2.mp4"),
  paste0(dir, "video_3.mp4")
)

# encoding: concatenate the three clips into one output file
av_encode_video(
  videos,
  output = paste0(dir, "output.mp4"),
  framerate = 30,
  vfilter = "null",
  codec = "libx264rgb",
  audio = videos,
  verbose = TRUE
)
It almost works: the output file is an mp4 containing the 3 videos sequentially, one after the other, but the audio is only from the first of the 3 videos and then it cuts off.
It doesn't really matter which videos I use; I have recreated this issue with the videos I was working with as well as with 3 randomly downloaded 1080p 30fps videos from YouTube.
Any help is appreciated & thank you in advance.
The behavior you are experiencing (only 1 audio source) is exactly how it is designed to work. In the C source code, you can see that encode_video only takes the first audio entry and ignores the rest. Overall, audio is poorly supported by ropensci/av at the moment, as its primary focus is turning R plots into videos. Perhaps you could file a feature request issue on GitHub.
Meanwhile, why not just call FFmpeg from R via the base system() function? Assuming the videos have an identical format, using the concat demuxer + stream copy (-c copy) will also likely speed up your process significantly, since nothing gets re-encoded. The av library does not support this feature as far as I can tell. (If the formats differ, you need the concat filter instead, which FFmpeg's concatenation documentation also explains.)
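A rough sketch of what that could look like, assuming ffmpeg is installed and on the PATH (paths reuse the ones from the question):
dir    <- "C:/Users/Admin/Documents/r_programs/"
videos <- paste0(dir, c("video_1.mp4", "video_2.mp4", "video_3.mp4"))
# the concat demuxer reads a text file listing the inputs
list_file <- paste0(dir, "inputs.txt")
writeLines(sprintf("file '%s'", videos), list_file)
# stream-copy both video and audio (-c copy), so nothing is re-encoded
cmd <- sprintf('ffmpeg -y -f concat -safe 0 -i "%s" -c copy "%s"',
               list_file, paste0(dir, "output_concat.mp4"))
system(cmd)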
So, on my projects I require an aerial photo of a site. I usually use ones in the public record: the USGS high-res ortho photos located here: https://earthexplorer.usgs.gov/. I have them uploaded to my server; they are TIFs and have TFWs and XMLs associated with them (I am unsure what the XML is for). I can load these into AutoCAD fine and print them just fine. The average file size of these appears to be in the 250,000 KB range.
On some of my projects I need more detail, so I get privately flown aerial photos of a site. These come in JPG format and are georeferenced by a .jgw. These files are about 25,000 KB depending on the site (I did not notice this at first, as I was told they are very large relative to the TIFs). When these are loaded into AutoCAD and I try to plot, the whole system freezes and won't plot for about 15-20 minutes. At first I thought this was a file-size issue, so I did the following in R to try to reduce the size:
library(jpeg)
library(tiff)
img <- readJPEG("ortho.jpg", native = TRUE)
# recompress at low quality and write it back out (output name is illustrative)
writeJPEG(img, target = "ortho_small.jpg", quality = 0.2)
This got the file size down to about 9,000 KB. I loaded this into AutoCAD and it still would not plot, which leads me to believe that size is not the issue. With this in mind, what property of this photo could be freezing AutoCAD, and how could I fix those properties in R or in AutoCAD?
First, I would check out the first and third causes listed here: https://knowledge.autodesk.com/support/autocad/troubleshooting/caas/sfdcarticles/sfdcarticles/Some-OLE-objects-do-not-plot.html and see if that fixes your issue.
Second, I would convert to a PNG (in my limited experience those seem to be the most stable in AutoCAD):
library(png)
# write the image read above out as a PNG (output name is illustrative)
writePNG(img, target = "ortho.png")
If you really need it in JPG, I would try the solutions here too: https://www.landfx.com/kb/autocad-fxcad/images/item/1926-raster-disappear.html
I am using XLConnect in R for daily report generation. I have a program that runs automatically at a specific time each day to append the most recent date's data to an Excel file (Excel 2007). The program works fine for this task, but sometimes when I open the Excel file it says "Excel found unreadable content. Do you want to recover the contents of this workbook?"
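For reference, the daily append is essentially this (file, sheet, and data frame names are placeholders):
library(XLConnect)
wb <- loadWorkbook("daily_report.xlsx", create = FALSE)
appendWorksheet(wb, data = todays_data, sheet = "data")   # add the most recent date's rows
saveWorkbook(wb)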
The tricky part of this issue is that I can't reproduce it on demand to pin down the exact root cause: it arises at random, and when I try to run the program again it works fine. Can somebody help me identify the root cause?
I have to download binary files stored in a PostgreSQL database as bytea and then work with them in R. I use the DBI library to download the content:
library(DBI)
data <- dbGetQuery(connection, "select binary_content from some_table limit 1")
Next I have to work with this content. The problem is that even after reviewing SO threads (e.g. this one), the PostgreSQL documentation, the R documentation for several functions (writeBin, charToRaw, as.raw, etc.), multiple web pages, and intensive Googling, I am unable to find any hints on how this can be done in R. What I want to do is (1) download the content, (2) save it locally as individual files, and (3) work with the files. No matter what I do, R always saves the content as if it were a long gibberish character string.
Unfortunately, I am unable to provide a reproducible example since I cannot share the data I am using.
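For what it's worth, what I am after is roughly the following, assuming the RPostgres backend, where a bytea column comes back as a list of raw vectors (connection details and the output file name are placeholders):
library(DBI)
library(RPostgres)
con <- dbConnect(Postgres(), dbname = "mydb")    # placeholder connection details
res <- dbGetQuery(con, "select binary_content from some_table limit 1")
raw_bytes <- res$binary_content[[1]]             # one raw vector per row
writeBin(raw_bytes, "binary_content_1.bin")      # save locally as an individual file
dbDisconnect(con)
With the RPostgreSQL backend the column may instead arrive as a hex-escaped character string (starting with "\x"), which would need to be converted back to raw bytes before writeBin().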