Open CSV files from a directory with IDL - idl-programming-language

I am trying to open 25 CSV files one after the other with a FOR loop in IDL.
I have the following code:
The_file_list=FILE_SEARCH('D:/MapsCharts/PairedStations/','*.csv',/FOLD_CASE)
FOR Filein = 0, N_ELEMENTS(The_file_list)-1 DO BEGIN
  Print, Filein
  OPENR,1,filein
  temp=''
  READF,1,temp
  Station=STRMID(temp,1,13)
ENDFOR
The first line works, but I cannot get the data from the individual files.
Can somebody advise?

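In your code, Filein is an integer from 0 to N_ELEMENTS(the_file_list) - 1, not a file name. So when you do:
openr, 1, filein
that is trying to open a file named 0, 1, etc. You mean:
openr, 1, the_file_list[filein]
Also close the unit (close, 1) at the end of each iteration, because unit 1 cannot be opened again while it is still open.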

The answer is as follows:
The_NETCDF_File=FILE_SEARCH('D:/Rwork/The28000files/*')
The_NETCDF_CODE=STRMID(The_NETCDF_File,23,14)
; These are the CSV files containing the 25 neighbouring stations to
; candidate (pilot) stations
The_file_list=FILE_SEARCH('D:/MapsCharts/PairedStations/','*.csv',/FOLD_CASE)
FOR Filein = 0, N_ELEMENTS(The_file_list)-1 DO BEGIN
  Current_file=the_file_list[filein]
  My_File_Names = READ_CSV(Current_File)
  ; Pulling out only the station ID without the extension of the csv files
  Station_Names=My_File_Names.field1
ENDFOR

Related

How to append suffix to file names in write.csv() in R?

I have many data frames that I write to CSV. I would rather not type the ending '_100' into each file name by hand; I want to specify it once and have every file written with that ending.
write.csv(results_SVM, file = "results_SVM.csv")
write.csv(results_ANN, file = "results_ANN.csv")
write.csv(results_RBF, file = "results_RBF.csv")
I want the same suffix on each file:
write.csv(results_SVM, file = "results_SVM_100.csv")
write.csv(results_ANN, file = "results_ANN_100.csv")
write.csv(results_RBF, file = "results_RBF_100.csv")
You can use paste0 to build the file name:
#suf <- "" #nothing
suf <- "_100" #with _100
write.csv(results_SVM, file = paste0("results_SVM",suf,".csv"))
write.csv(results_ANN, file = paste0("results_ANN",suf,".csv"))
write.csv(results_RBF, file = paste0("results_RBF",suf,".csv"))

How to set a keyword to write fully to the CSV file

This script works in so far as the printed output is correct. However, it does not fully populate the CSV file: the file only ever contains the last iteration of the loop. I am new to IDL and believe I need a keyword, but my attempts at inserting one have all failed.
Can someone amend the script so that the CSV file is populated fully, please?
PRO Lat_Lon_Alt_Array
  ; This program extracts the latitude, longitude and altitude
  ; together with the site name and file code.
  ; The purpose is to output the above values from the station files
  ; into a csv file.
  COMPILE_OPT IDL2
  the_file_list = file_search('D:/Rwork/Project/25_Files/','*.nc')
  FOR filein = 0, N_ELEMENTS(the_file_list)-1 DO BEGIN
    station = NCDF_OPEN(the_file_list[filein])
    NCDF_VARGET, station, 'station_name', St_Name
    NCDF_VARGET, station, 'lat', latitude
    NCDF_VARGET, station, 'lon', longitude
    NCDF_VARGET, station, 'alt', height
    latitude=REFORM(latitude,1)
    longitude=REFORM(longitude,1)
    height=REFORM(height,1)
    Print,the_file_list[filein]
    Print, 'name'
    Print, St_Name
    Print,'lat'
    Print,latitude
    Print,'lon'
    print,longitude
    Print,'alt'
    Print,height
    ; Add each station data to the file
    WRITE_CSV, 'LatLon.csv', the_file_list[filein],latitude,longitude,height
  ENDFOR
  RETURN
END
WRITE_CSV overwrites the file every time it is called, hence you only ever see the last entry.
Create arrays to hold all the values before the for loop:
n_files = N_ElEMENTS(the_file_list)
latitude_arr = DBLARR(n_files) ; Assuming type is double
longitude_arr = DBLARR(n_files)
height_arr = DBLARR(n_files)
In your for loop fill them with:
latitude_arr[filein] = latitude
longitude_arr[filein] = longitude
height_arr[filein] = height
Then after the for loop, write them with:
WRITE_CSV, 'LatLon.csv', the_file_list, latitude_arr, longitude_arr, height_arr
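The same accumulate-then-write-once pattern is sketched below in Python rather than IDL, purely as an illustration. The station records, the column names and the helper name write_rows_once are made up for the example; a real script would read the values from the NetCDF files inside the loop.

```python
import csv
import io


def write_rows_once(rows, handle):
    """Write a header and all collected rows in a single pass."""
    writer = csv.writer(handle)
    writer.writerow(["file", "lat", "lon", "alt"])
    writer.writerows(rows)


# Hypothetical per-file values standing in for what the loop would read.
records = [("a.nc", 53.3, -6.2, 10.0), ("b.nc", 51.9, -8.5, 20.0)]

rows = []
for name, lat, lon, alt in records:
    # Collect inside the loop; do not write to the file here.
    rows.append([name, lat, lon, alt])

# An in-memory buffer keeps the sketch self-contained; a real file
# opened with open(path, "w", newline="") works the same way.
buffer = io.StringIO()
write_rows_once(rows, buffer)
print(buffer.getvalue())
```

Writing once after the loop avoids the overwrite problem entirely, at the cost of holding all rows in memory, which is negligible for 25 stations.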

Unable to combine specific lines in notepad++ files using nested for loops

I'm trying to compare portions of lines in two text files (edited in Notepad++) against each other, using two variables (vg_line and sn_line), in order to combine the lines when the portions are equal. Once a pair is found, the code prints certain information from each loop. However, it only finds the first pair and does not continue looping through the vg_line file to compare its other lines with the sn_line file.
input_file = open(input_VG_name)
input_Server_name = open(input_Server_name)
for line in input_file:
    line_data = line.strip()
    vg_line = line_data[0:44]
    volume_group = line_data[44:58]
    for line1 in input_Server_name:
        line_data = line1.strip()
        sn_line = line_data[0:44]
        server_name = line_data[46:64]
        if vg_line == sn_line:
            print(vg_line, volume_group, server_name)
This is my first post, so any tips on what I can do better when coding or asking questions are much appreciated!
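Iterating over a file object consumes it. After the first pass of the outer loop the inner loop has already read the second file to the end, so the remaining lines of the first file have nothing left to compare against.
Try the following, which reopens the second file for each line of the first: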
input_file = r'c:\file.txt'
input_Server_name = r'c:\server_file.txt'
with open(input_file, 'r') as file:
    for line in file.readlines():
        line_data = line.strip()
        vg_line = line_data[0:44]
        volume_group = line_data[44:58]
        with open(input_Server_name, 'r') as file1:
            for line1 in file1.readlines():
                line1_data = line1.strip()
                sn_line = line1_data[0:44]
                server_name = line1_data[46:64]
                if vg_line == sn_line:
                    print(vg_line, volume_group, server_name)
Note that this code has to read the second file once for every line in the first file (which is what I understood your original code to be doing).
There are other methods to match two files up; have a search around, there are plenty of answers. Don't forget to check "Code Review", which has some good examples as well.
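One such method, sketched here under some assumptions, is to index the second file by its key once and then look each line of the first file up in that index, so each file is read only a single time. The function name match_lines and the sample lines are hypothetical; the slice positions are the ones used in the question. If a key occurs more than once in the second file, this sketch keeps only the last occurrence.

```python
def match_lines(vg_lines, sn_lines):
    """Pair lines whose first 44 characters are equal.

    Returns a list of (key, volume_group, server_name) tuples.
    """
    # Build the lookup table from the second file in one pass.
    servers = {}
    for line in sn_lines:
        data = line.strip()
        servers[data[0:44]] = data[46:64]

    # Look up each line of the first file instead of rescanning.
    matches = []
    for line in vg_lines:
        data = line.strip()
        key = data[0:44]
        if key in servers:
            matches.append((key, data[44:58], servers[key]))
    return matches


# Made-up sample data: a 44-character key followed by the payload.
key_a = "K1".ljust(44)
key_b = "K2".ljust(44)
vg_sample = [key_a + "vg01\n", key_b + "vg02\n"]
sn_sample = [key_a + "  " + "server01\n"]

for row in match_lines(vg_sample, sn_sample):
    print(row)
```

Both arguments can be open file objects as well as lists, since the function only iterates over them once.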

convert date column in a text file to float

I have a data file in which the 2nd column contains dates in the format '01/01/2007'. I am trying to convert this column into a numeric format so that I can insert the data from the text file into a MySQL database. I keep getting these errors when I try to do so:
Traceback (most recent call last):
File "C:/Python27/numpy", line 5, in <module>
x = np.loadtxt(fname='xyz.txt', dtype=[('date', 'str', 12),('x','float')], converters={1:datestr2num}, delimiter=None, skiprows=0, usecols=None);
File "C:\Python27\lib\site-packages\numpy\lib\npyio.py", line 713, in loadtxt
X.append(tuple([conv(val) for (conv, val) in zip(converters, vals)]))
File "C:/Python27/numpy", line 4, in datestr2num
return datetime.datetime.strptime(s,'"%m/%d/%y"')
File "C:\Python27\lib\_strptime.py", line 325, in _strptime
(data_string, format))
ValueError: time data '"01/01/2007"' does not match format '"%m/%d/%y"'
Can anyone please help me with this? Here is the code I am trying:
import numpy as np
import datetime
def datestr2num(s):
    return datetime.datetime.strptime(s,'"%m/%d/%y"')
x = np.loadtxt(fname='xyz.txt', dtype= 'float', converters={1:datestr2num}, delimiter=None, skiprows=0, usecols=None);
print x;
'%y' is for two-digit years (e.g. '14'); you have four-digit years (e.g. '2014'), so you should be using '%Y'. See the documentation.
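A minimal sketch of the difference, using only the standard library. The helper name date_to_number is hypothetical, and returning the proleptic Gregorian ordinal as a float is just one possible numeric encoding, chosen here as an assumption about what "number format" should mean.

```python
from datetime import datetime


def date_to_number(text, fmt="%m/%d/%Y"):
    """Parse a date such as '01/01/2007' and return its ordinal as a float."""
    # Drop surrounding whitespace and the literal double quotes that
    # appear around the value in the data file.
    cleaned = text.strip().strip('"')
    return float(datetime.strptime(cleaned, fmt).toordinal())


value = date_to_number('"01/01/2007"')
print(value)

# With the two-digit directive, parsing a four-digit year fails.
try:
    date_to_number("01/01/2007", fmt="%m/%d/%y")
except ValueError as err:
    print("two-digit %y fails:", err)
```

Stripping the quotes before parsing keeps the format string free of literal quote characters, which makes the converter reusable for unquoted input as well.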

How can I cut large csv files using any R packages like ff or data.table?

I want to cut large csv files (larger than the available RAM) into pieces and either use them or save each piece to disk for later use. Which R package is best for doing this with large files?
I haven't tried it, but the skip and nrows parameters of read.table or read.csv are worth a try. These are from ?read.table:
  skip    integer: the number of lines of the data file to skip before
          beginning to read data.
  nrows   integer: the maximum number of rows to read in. Negative and
          other invalid values are ignored.
To avoid trouble at the end of the file you need some error handling, because I don't know what happens when the skip value is greater than the number of rows in your big csv.
P.S. I also don't know whether header=TRUE affects skip; you will have to check that as well.
The answer given by @berkorbay is OK, and I can confirm that header can be used with skip. However, if your file is really large it gets painfully slow, as each reading after the first must skip over all previously read lines.
I had to do something similar and, after wasting quite a bit of time, I wrote a short script in Perl which fragments the original file into chunks that you can read one after the other. It is much faster. I enclose the source here, translating some parts so that the intent is clear:
#!/usr/bin/perl
system("cls") ;   # clears the console (Windows command)
print("Fragment .csv file keeping header in each chunk\n") ;
print("\nEnter input file name = ") ;
$entrada = <STDIN> ;
chomp($entrada) ;
print("\nEnter maximum number of lines in each fragment = ") ;
$nlineas = <STDIN> ;
chomp($nlineas) ;
print("\nEnter output file name stem = ") ;
$salida = <STDIN> ;
chomp($salida) ;
open(IN, '<', $entrada) || die "Cannot open input file: $!\n" ;
$cabecera = <IN> ;    # header line, repeated in every fragment
$leidas = 0 ;         # data lines written to the current fragment
$fragmento = 1 ;      # fragment counter, appended to the file name stem
$fichero = $salida.$fragmento ;
open(OUT, '>', $fichero) || die "Cannot open output file: $!\n" ;
print OUT $cabecera ;
while(<IN>) {
  # Start a new fragment once the current one holds $nlineas data lines
  if ($leidas >= $nlineas) {
    close(OUT) ;
    $fragmento++ ;
    $fichero = $salida.$fragmento ;
    open(OUT, '>', $fichero) || die "Cannot open output file: $!\n" ;
    print OUT $cabecera ;
    $leidas = 0;
  }
  $leidas++ ;
  print OUT $_ ;
}
close(OUT) ;
close(IN) ;
Just save it under whatever name you like and execute it. The first line might have to be changed if you have Perl in a different place (and, if you are on Windows, you might have to invoke the script as "perl name-of-script").
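For readers who would rather not use Perl, the same idea can be sketched in Python. This is an illustration under assumptions, not a translation of the script: the function name split_csv and the demo file names are made up, and the splitting is line based, so it assumes no quoted field contains an embedded newline.

```python
import itertools
import os
import tempfile


def split_csv(in_path, out_stem, max_lines):
    """Split in_path into out_stem1, out_stem2, ... and return the paths.

    Each fragment starts with the header line and holds at most
    max_lines data lines.
    """
    paths = []
    with open(in_path, "r", newline="") as src:
        header = src.readline()
        for index in itertools.count(1):
            # Pull the next block of lines without loading the whole file.
            chunk = list(itertools.islice(src, max_lines))
            if not chunk:
                break
            path = f"{out_stem}{index}"
            with open(path, "w", newline="") as dst:
                dst.write(header)
                dst.writelines(chunk)
            paths.append(path)
    return paths


# Small demo on a temporary file: 5 data lines, 2 per fragment.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "big.csv")
    with open(source, "w", newline="") as fh:
        fh.write("id,value\n")
        fh.writelines(f"{i},{i * i}\n" for i in range(5))
    parts = split_csv(source, os.path.join(tmp, "part"), max_lines=2)
    part_line_counts = []
    for part in parts:
        with open(part, "r", newline="") as fh:
            part_line_counts.append(len(fh.readlines()))

print(part_line_counts)
```

Only one fragment is held in memory at a time, so max_lines should be chosen so that a single fragment fits comfortably in RAM.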
You can use read.csv.ffdf from the ff package, with parameters like these, to read a big file:
library(ff)
a <- read.csv.ffdf(file="big.csv", header=TRUE, VERBOSE=TRUE, first.rows=1000000, next.rows=1000000, colClasses=NA)
Once the big file has been read into an ffdf object, it can be subset into ordinary data frames using:
a[1000:1000000,]
The rest of the code splits the ffdf into data frames and saves each one:
totalrows = dim(a)[1]
row.size = as.integer(object.size(a[1:10000,])) / 10000 # average size of a row in bytes
block.size = 200000000 # target chunk size in bytes (200 MB)
# rows.block is the number of rows per chunk
rows.block = ceiling(block.size/row.size)
# nmaps is the number of chunks needed to cover the whole ffdf
nmaps = ceiling(totalrows/rows.block)
for(i in seq_len(nmaps)){
  first = (i-1)*rows.block + 1
  # the last chunk may be shorter than rows.block
  last = min(i*rows.block, totalrows)
  df = a[first:last,]
  # process df or save it
  write.csv(df,paste0("M",i,".csv"))
  # remove df
  rm(df)
}
Alternatively, you can first load the file into MySQL using dbWriteTable and then use the read.dbi.ffdf function from the ETLUtils package to read it back into R. Consider the function below:
library(RMySQL)   # provides MySQL() and the DBI functions
library(ETLUtils) # provides read.dbi.ffdf
read.csv.sql.ffdf <- function(file, name,overwrite = TRUE, header = TRUE, drv = MySQL(), dbname = "new", username = "root",host='localhost', password = "1234"){
  conn = dbConnect(drv, user = username, password = password, host = host, dbname = dbname)
  dbWriteTable(conn, name, file, header = header, overwrite = overwrite)
  on.exit(dbRemoveTable(conn, name))
  command = paste0("select * from ", name)
  ret = read.dbi.ffdf(command, dbConnect.args = list(drv =drv, dbname = dbname, username = username, password = password))
  return(ret)
}
