Averaging hourly data for each month with files located in multiple folders in Python

Background:
I am working with .nc files that I downloaded in bulk using a package called goes2go. The data is organized so that, for example, if I download data for January 2018, the package creates a folder called 2018. Within that folder are 31 folders representing the days, and within each day folder are 24 folders representing the hours. Each hour folder contains 6 files.
Question:
I want to average all the hours for the month of January and eventually do the same for each month of 2018-2020. I am incredibly stuck and have no clue how to begin, especially since the data is spread across folders nested within folders, but here is what I have so far:
import os
from glob import glob
import xarray as xr

# set the working directory to the ABI-L2-ACMC product folder
os.chdir("C:/Users/elain/data/noaa-goes16/ABI-L2-ACMC")
print("Current directory:", os.getcwd())

# path to the year folder (it holds the day and hour subfolders)
data_path = os.path.join("2018")
print(data_path)

# so far this only matches the folder itself, not the files inside it
stack_bands_path = glob(data_path)
print(stack_bands_path)
Thank you so much.
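
In case it is useful, here is a minimal sketch of one possible approach. It assumes each file carries a decodable time coordinate named t (as GOES ABI L2 files typically do) and that all files share the same grid, so they can be concatenated along time and then grouped by calendar month:

import os
from glob import glob
import xarray as xr

# recursively collect every .nc file under the year folder,
# no matter how deeply the day/hour subfolders are nested
year_dir = "C:/Users/elain/data/noaa-goes16/ABI-L2-ACMC/2018"
files = sorted(glob(os.path.join(year_dir, "**", "*.nc"), recursive=True))

# open everything as one dataset, concatenated along the time coordinate
ds = xr.open_mfdataset(files, combine="nested", concat_dim="t")

# average all the hourly files that fall in the same calendar month
monthly_mean = ds.groupby("t.month").mean(dim="t")
monthly_mean.to_netcdf("ABI-L2-ACMC_2018_monthly_means.nc")

Looping the same pattern over the 2019 and 2020 year folders (or globbing from the product root and grouping by both year and month) would extend this to the full 2018-2020 period.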

Related

find order or hierarchy in directory containing Python code files

I have been given a directory containing 68 files, each containing Python code for doing some task, for a total of about 10,000 lines.
Most of these files import code from "system" locations, typically in lines like:
from datetime import datetime,date,time,timedelta
Many import code from this same directory, for example:
from sendGeneralCallMail import sendGeneralCallMail
In this directory I indeed find the file sendGeneralCallMail.py, and this local importing can go more than one level deep.
What I would like to do is find which files in this directory import other files in the same directory but are NOT likewise imported (call them the upper level of my hierarchy, or my "main" programs if you want), which files do NOT import other local files (the lowest level, the functions, or the tools), and which are in between; you get the idea. A graph of this importing scheme would be the ideal thing.
Do you know of some tool for doing this? Obviously, I have NO documentation of any sort. Some years ago I had a similar problem with a directory of C++ classes in a complex inheritance structure, and I found such a documenting tool but I do not remember its name.
Thank you for any help!
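
A tool such as pydeps can render import graphs. As a rough do-it-yourself sketch (assuming all 68 files sit in one flat directory and use plain import / from ... import statements), Python's ast module is enough to classify the files:

import ast
from pathlib import Path
from collections import defaultdict

directory = Path(".")                       # the folder with the 68 files
modules = {p.stem for p in directory.glob("*.py")}

# map each file to the set of local modules it imports
imports = defaultdict(set)
for path in directory.glob("*.py"):
    tree = ast.parse(path.read_text(encoding="utf-8"))
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module.split(".")[0]]
        else:
            continue
        imports[path.stem].update(n for n in names if n in modules)

imported_by_others = {m for deps in imports.values() for m in deps}
main_programs = sorted(m for m in modules if imports[m] and m not in imported_by_others)
tools = sorted(m for m in modules if not imports[m])
print("upper level (import locals but are not imported):", main_programs)
print("lowest level (import nothing local):", tools)

Feeding the imports dictionary into networkx or graphviz would then give the graph itself.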

Can R create a new folder with different file names every day

Could anyone suggest whether I can create a folder through R and then create new files every day through R?
What I am trying to do: I have to update an Excel file that backs up a column in a database every day. I want to place this R script in the Task Scheduler, which should run it at a particular time, create a folder and a file named with that day's date, and do the extraction.
Presently I manually run the R script and do the data extraction from that table.

xarray.open_dataset will not open all netcdf files in one folder

I have a folder containing many netcdf files for individual station data (about 8,000 files total). When I attempt to read in all of these files using xarray.open_mfdataset, I get an error that reads "OSError: no files to open." I am using the code below:
station_data = xr.open_mfdataset('~/Data/station_data/all_stations/*.nc')
In contrast, I do not have any issue opening individual station data (one file) using xarray.open_dataset, as below:
station1 = xr.open_dataset('~/Data/station_data/all_stations/hadisd.3.1.0.2019f_19310101-20200101_010010-99999.nc')
I have been playing with other ways to express the path, with no luck. Any suggestions on how to properly read in all station data at once are appreciated!
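
One possible culprit is the leading ~: depending on the xarray version, the wildcard string may be handed to glob without tilde expansion, so the pattern silently matches zero files, while open_dataset normalizes a single path and succeeds. A minimal sketch of a workaround, assuming the 8,000 per-station files should be combined along a new station dimension:

import os
from glob import glob
import xarray as xr

# expand "~" by hand; glob does not do tilde expansion on its own
pattern = os.path.expanduser('~/Data/station_data/all_stations/*.nc')
files = sorted(glob(pattern))
print(len(files), "files matched")

# stack one file per station along a new "station" dimension
station_data = xr.open_mfdataset(files, combine="nested", concat_dim="station")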

R import excel file following some pattern

I am trying to import an Excel file from a local folder. Assume the current month is Jun 2018; the file name looks like this:
20180620 data 201803.xlsx
So the first part is the date the file was created, and the file holds the data for 201803, which appears at the end of the file name.
Sometimes there are multiple files created in Jun 2018 for 201803, and I always want the latest one, e.g. given 20180620 data 201803.xlsx vs 20180614 data 201803.xlsx, I want 20180620 data 201803.xlsx.
Currently this is my code, but I don't know how to always pick the file with the latest date. Any idea how to do that? Thank you!
list = list.files(path = folder, pattern = paste0(substr(today,1,4),substr(today,6,7),".xlsx$"))
I actually have something that does just that. Here's the solution I'm using:
files <- list.files(pattern = "\\.xlsx$")
# keep the file with the most recent creation time
file <- files[file.info(files)$ctime == max(file.info(files)$ctime)]
data <- readxl::read_excel(file)
This assumes that the files in the directory are only in the format you suggested; otherwise you might want to pass a more specific pattern.

R unzip slows down when reading more than 50,000 CSVs in a zip file

I'm trying to unzip a 7zip file with the unzip function. The zip file contains around 155,000 CSV files. When extraction starts, the first 25,000 CSV files are read within 5 minutes, and then it slows down and takes almost an hour to read them all. Is this common behavior?
I know not providing a reproducible example makes everyone's life more difficult, but I cannot share the data I'm working with. I'm just interested in whether there is another way to extract files from a zip with R, a way to tweak the function, or maybe a call to an outside program or something.
Extracting with 7zip outside of R takes 20 mins tops, which is why I believe this to be an R-related issue.
