How to set a keyword to write fully to the CSV file (IDL)

This script works insofar as the printed output is correct. However, it is not populating the CSV file fully; it only writes the last iteration of the loop. Being new to IDL, I need to grasp this concept of the keyword.
I believe I need a keyword, but my attempts at inserting one have all failed.
Can someone amend the script so that the CSV file populates fully, please?
PRO Lat_Lon_Alt_Array
  ; This program extracts the latitude, longitude & altitude,
  ; along with the site name and file code.
  ; The purpose is to output the above dimensions from the station files
  ; into a CSV file.
  COMPILE_OPT IDL2
  the_file_list = FILE_SEARCH('D:/Rwork/Project/25_Files/', '*.nc')
  FOR filein = 0, N_ELEMENTS(the_file_list) - 1 DO BEGIN
    station = NCDF_OPEN(the_file_list[filein])
    NCDF_VARGET, station, 'station_name', St_Name
    NCDF_VARGET, station, 'lat', latitude
    NCDF_VARGET, station, 'lon', longitude
    NCDF_VARGET, station, 'alt', height
    latitude = REFORM(latitude, 1)
    longitude = REFORM(longitude, 1)
    height = REFORM(height, 1)
    PRINT, the_file_list[filein]
    PRINT, 'name'
    PRINT, St_Name
    PRINT, 'lat'
    PRINT, latitude
    PRINT, 'lon'
    PRINT, longitude
    PRINT, 'alt'
    PRINT, height
    ; Add each station's data to the file
    WRITE_CSV, 'LatLon.csv', the_file_list[filein], latitude, longitude, height
  ENDFOR
  RETURN
END

WRITE_CSV overwrites the file every time it is called, hence you only ever see the last entry.
Create arrays to hold all the values before the for loop:
n_files = N_ELEMENTS(the_file_list)
latitude_arr = DBLARR(n_files) ; Assuming type is double
longitude_arr = DBLARR(n_files)
height_arr = DBLARR(n_files)
In your for loop fill them with:
latitude_arr[filein] = latitude
longitude_arr[filein] = longitude
height_arr[filein] = height
Then after the for loop, write them with:
WRITE_CSV, 'LatLon.csv', the_file_list, latitude_arr, longitude_arr, height_arr
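Putting those pieces together, a minimal sketch of the reworked procedure could look like the one below (assuming lat, lon and alt hold a single value per file; the HEADER keyword and the column names are optional suggestions):
PRO Lat_Lon_Alt_Array
  COMPILE_OPT IDL2
  the_file_list = FILE_SEARCH('D:/Rwork/Project/25_Files/', '*.nc')
  n_files = N_ELEMENTS(the_file_list)

  ; One slot per file, filled inside the loop
  latitude_arr  = DBLARR(n_files)
  longitude_arr = DBLARR(n_files)
  height_arr    = DBLARR(n_files)

  FOR filein = 0, n_files - 1 DO BEGIN
    station = NCDF_OPEN(the_file_list[filein])
    NCDF_VARGET, station, 'lat', latitude
    NCDF_VARGET, station, 'lon', longitude
    NCDF_VARGET, station, 'alt', height
    NCDF_CLOSE, station

    latitude_arr[filein]  = latitude[0]
    longitude_arr[filein] = longitude[0]
    height_arr[filein]    = height[0]
  ENDFOR

  ; Single write after the loop, so nothing gets overwritten
  WRITE_CSV, 'LatLon.csv', the_file_list, latitude_arr, longitude_arr, height_arr, $
             HEADER = ['file', 'lat', 'lon', 'alt']
END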

Related

Unable to combine specific lines in notepad++ files using nested for loops

I'm trying to compare portions of lines in two Notepad++ files against each other using two variables (vg_line and sn_line) in order to combine them if they are equal. Once it has found a pair it prints out certain information from each for loop, but it only finds the first pair and doesn't continue looping through the vg_line file to compare other lines with the sn_line file.
input_file = open(input_VG_name)
input_Server_name = open(input_Server_name)
for line in input_file:
    line_data = line.strip()
    vg_line = line_data[0:44]
    volume_group = line_data[44:58]
    for line1 in input_Server_name:
        line_data = line1.strip()
        sn_line = line_data[0:44]
        server_name = line_data[46:64]
        if vg_line == sn_line:
            print(vg_line, volume_group, server_name)
First post, so any tips on how I can improve my code or my questions are much appreciated!
You are not re-reading the second file: after the first pass through input_Server_name its iterator is exhausted, so the inner loop has nothing left to compare against.
Try the following:
input_file = r'c:\file.txt'
input_Server_name = r'c:\server_file.txt'

with open(input_file, 'r') as file:
    for line in file.readlines():
        line_data = line.strip()
        vg_line = line_data[0:44]
        volume_group = line_data[44:58]
        with open(input_Server_name, 'r') as file1:
            for line1 in file1.readlines():
                line1_data = line1.strip()
                sn_line = line1_data[0:44]
                server_name = line1_data[46:64]
                if vg_line == sn_line:
                    print(vg_line, volume_group, server_name)
The thing is: this code will have to read the second file once for every line in the first file (which is what I understood from your original code).
There are other methods to match two files up; have a search around, there are plenty of answers. Don't forget to check Code Review, which has some good examples as well. One dictionary-based sketch is shown below.
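For instance, an untested sketch that reads the server file into a dictionary once (reusing the paths and slice positions from the code above) might look like this:
# Read the server file once into a lookup table, then check each vg_line
# against it instead of re-reading the file per line.
server_names = {}
with open(input_Server_name, 'r') as file1:
    for line1 in file1:
        line1_data = line1.strip()
        server_names[line1_data[0:44]] = line1_data[46:64]

with open(input_file, 'r') as file:
    for line in file:
        line_data = line.strip()
        vg_line = line_data[0:44]
        volume_group = line_data[44:58]
        if vg_line in server_names:
            print(vg_line, volume_group, server_names[vg_line])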

IndexError: list index out of range, scores.append( (fields[0], fields[1]))

I'm trying to read a file and put its contents in a list. I have done this many times before and it has worked, but this time it throws the error "list index out of range".
The code is:
with open("File.txt") as f:
scores = []
for line in f:
fields = line.split()
scores.append( (fields[0], fields[1]))
print(scores)
The text file is in the format:
Alpha:[0, 1]
Bravo:[0, 0]
Charlie:[60, 8, 901]
Foxtrot:[0]
I can't see why it is giving me this problem. Is it because I have more than one value for each item? Or is it the fact that I have a colon in my text file?
How can I get around this problem?
Thanks
If I understand you well, this code will print your desired result:
import re
with open("File.txt") as f:
# Let's make dictionary for scores {name:scores}.
scores = {}
# Define regular expressin to parse team name and team scores from line.
patternScore = '\[([^\]]+)\]'
patternName = '(.*):'
for line in f:
# Find value for team name and its scores.
fields = re.search(patternScore, line).groups()[0].split(', ')
name = re.search(patternName, line).groups()[0]
# Update dictionary with new value.
scores[name] = fields
# Print output first goes first element of keyValue in dict then goes keyName
for key in scores:
print (scores[key][0] + ':' + key)
You will receive the following output:
60:Charlie
0:Alpha
0:Bravo
0:Foxtrot
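As an aside, the original IndexError most likely comes from the Foxtrot line: line.split() splits on whitespace, and that line has no space after the colon, so only one field comes back:
>>> "Foxtrot:[0]".split()
['Foxtrot:[0]']      # a single field, so fields[1] raises IndexError
>>> "Alpha:[0, 1]".split()
['Alpha:[0,', '1]']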

Julia dictionary "key not found" only when using loop

Still trying to figure out this problem (I was having problems building a dictionary, but managed to get that working thanks to rickhg12hs).
Here's my current code:
#open files with codon:amino acid pairs, initiate dictionary:
file = open(readall, "rna_codons.txt")
seq = open(readall, "rosalind_prot.txt")
codons = {"UAA" => "stop", "UGA" => "stop", "UAG" => "stop"}
#generate dictionary entries using pairs from file:
for m in eachmatch(r"([AUGC]{3,3})\s([A-Z])\s", file)
codon, aa = m.captures
codons[codon] = aa
end
All of that code seems to work as intended. At this point, I have the dictionary I want, and the right keys point to the right entries. If I just do print(codons["AUG"]) for example, it prints 'M', which is the correct output. Now I want to scan through a string in the second file, and for every 3 letters, pull out the entry referenced in the dictionary and add it to the prot string. So I tried:
for m in eachmatch(r"([AUGC]{3,3})", seq)
amac = codons[m.captures]
prot = "$prot$amac"
end
But this kicks out the error key not found: ["AUG"]. I know the key exists, because I can print codons["AUG"] and it returns the proper entry, so why can't it find that key when it's in the loop?
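Going by the brackets in the error message, the likely culprit is that m.captures is an array of capture strings, not a string, so it never matches the dictionary's String keys. One sketch of a fix, indexing the first capture (m.match should also work here):
for m in eachmatch(r"([AUGC]{3,3})", seq)
    amac = codons[m.captures[1]]   # first capture as a string, not the whole captures array
    prot = "$prot$amac"
end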

How can I cut large csv files using any R packages like ff or data.table?

I want to cut large CSV files (file size greater than RAM size) and use them, or save each piece to disk for later use. Which R package is best for doing this with large files?
I haven't tried it, but using the skip and nrows parameters in read.table or read.csv is worth a try. These are from ?read.table:
skip: integer. The number of lines of the data file to skip before beginning to read data.
nrows: integer. The maximum number of rows to read in. Negative and other invalid values are ignored.
To avoid some troublesome issues at the end you need to do some error handling; in other words, I don't know what happens when the skip value is greater than the number of rows in your big CSV.
P.S. I also don't know whether header=TRUE affects skip or not; you'll have to check that too. A rough chunked-reading sketch is given below.
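For what it's worth, an untested sketch of that chunked read (the file name and chunk size are placeholders; the tryCatch is there because read.csv errors out once skip runs past the end of the file):
chunk_size <- 1e6
hdr <- read.csv("big.csv", nrows = 1, header = TRUE)     # read the column names once
i <- 0
repeat {
  chunk <- tryCatch(
    read.csv("big.csv", header = FALSE, skip = 1 + i * chunk_size,
             nrows = chunk_size, col.names = names(hdr)),
    error = function(e) NULL)                            # nothing left to read
  if (is.null(chunk) || nrow(chunk) == 0) break
  write.csv(chunk, paste0("big_part_", i + 1, ".csv"), row.names = FALSE)
  if (nrow(chunk) < chunk_size) break                    # last, partial chunk
  i <- i + 1
}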
The answer given by @berkorbay is OK, and I can confirm that header can be used with skip. However, if your file is really large it gets painfully slow, as each subsequent read after the first must skip over all previously read lines.
I had to do something similar and, after wasting quite a bit of time, I wrote a short script in Perl which fragments the original file into chunks that you can read one after the other. It is much faster. I enclose the source here, translating some parts so that the intent is clear:
#!/usr/bin/perl
# Fragment a .csv file into chunks, repeating the header in each chunk.
system("cls");
print("Fragment .csv file keeping header in each chunk\n");
print("\nEnter input file name = ");
$entrada = <STDIN>;                      # input file name
print("\nEnter maximum number of lines in each fragment = ");
$nlineas = <STDIN>;                      # lines per fragment
print("\nEnter output file name stem = ");
$salida = <STDIN>;                       # output file name stem
chop($salida);
open(IN, $entrada) || die "Cannot open input file: $!\n";
$cabecera = <IN>;                        # header line, repeated in every chunk
$leidas = 0;                             # lines written to the current chunk
$fragmento = 1;                          # chunk counter
$fichero = $salida.$fragmento;
open(OUT, ">$fichero") || die "Cannot open output file: $!\n";
print OUT $cabecera;
while (<IN>) {
    if ($leidas > $nlineas) {
        # Current chunk is full: close it and start the next one.
        close(OUT);
        $fragmento++;
        $fichero = $salida.$fragmento;
        open(OUT, ">$fichero") || die "Cannot open output file: $!\n";
        print OUT $cabecera;
        $leidas = 0;
    }
    $leidas++;
    print OUT $_;
}
close(OUT);
Just save it under whatever name and execute it. The first line might have to be changed if you have Perl in a different place (and, if you are on Windows, you might have to invoke the script as "perl name-of-script").
One should use read.csv.ffdf from the ff package with specific parameters like this to read a big file:
library(ff)
a <- read.csv.ffdf(file="big.csv", header=TRUE, VERBOSE=TRUE, first.rows=1000000, next.rows=1000000, colClasses=NA)
Once the big file is read into an ff object, subsetting the ff object into data frames can be done using:
a[1000:1000000,]
The rest of the code handles subsetting and saving the broken-up data frames:
totalrows = dim(a)[1]
row.size = as.integer(object.size(a[1:10000, ])) / 10000   # bytes per row
block.size = 200000000                                      # block size in bytes (200 MB)
# rows.block is the number of rows per block
rows.block = ceiling(block.size / row.size)
# nmaps + 1 is the number of chunks/maps the big ff data frame is split into
nmaps = floor(totalrows / rows.block)
for (i in 0:nmaps) {
  if (i == nmaps) {
    df = a[(i * rows.block + 1):totalrows, ]
  } else {
    df = a[(i * rows.block + 1):((i + 1) * rows.block), ]
  }
  # process df or save it
  write.csv(df, paste0("M", i + 1, ".csv"))
  # remove df
  rm(df)
}
Alternatively, you can first read the file into MySQL using dbWriteTable and then use the read.dbi.ffdf function from the ETLUtils package to read it back into R. Consider the function below:
read.csv.sql.ffdf <- function(file, name, overwrite = TRUE, header = TRUE,
                              drv = MySQL(), dbname = "new", username = "root",
                              host = 'localhost', password = "1234") {
  conn = dbConnect(drv, user = username, password = password, host = host, dbname = dbname)
  dbWriteTable(conn, name, file, header = header, overwrite = overwrite)
  on.exit(dbRemoveTable(conn, name))
  command = paste0("select * from ", name)
  ret = read.dbi.ffdf(command,
                      dbConnect.args = list(drv = drv, dbname = dbname,
                                            username = username, password = password))
  return(ret)
}
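Usage would then be something along these lines (an untested sketch; it assumes a reachable MySQL server with the default credentials above, and "big_table" is just a placeholder table name):
library(RMySQL)
library(ETLUtils)
library(ff)

a <- read.csv.sql.ffdf(file = "big.csv", name = "big_table")
dim(a)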

R: Plot ARC/INFO Generate File

I have an ARC/INFO generate file whose contents look like:
3594 -124.049541 44.429077
-123.381222 44.530192
-123.479913 44.625517
-123.578917 44.720704
-123.678234 44.815755
-123.777866 44.910669
-123.946044 44.885032
-124.114074 44.858987
-124.281949 44.832529
-124.449663 44.805654
-124.516511 44.684660
-124.583091 44.563597
-124.649404 44.442465
-124.715451 44.321261
-124.615376 44.227772
-124.515601 44.134147
-124.416125 44.040385
-124.316948 43.946486
-124.151513 43.973082
-123.985926 43.999247
-123.820193 44.024987
-123.654322 44.050307
-123.586447 44.170362
-123.518307 44.290360
-123.449899 44.410303
-123.381222 44.530192
END
3595 -123.103772 45.009223
-122.427717 45.101578
-122.525757 45.198252
-122.624122 45.294789
-122.722814 45.391191
-122.821833 45.487459
-122.992014 45.464007
-123.162072 45.440175
-123.332002 45.415959
-123.501798 45.391355
-123.571234 45.271264
-123.640389 45.151121
-123.709266 45.030923
-123.777866 44.910669
-123.678234 44.815755
-123.578917 44.720704
-123.479913 44.625517
-123.381222 44.530192
-123.213811 44.554460
-123.046278 44.578334
-122.878629 44.601816
-122.710869 44.624913
-122.640504 44.744148
-122.569859 44.863337
-122.498931 44.982480
-122.427717 45.101578
END
3676 -122.989567 44.147495
-122.323040 44.238368
-122.419523 44.335217
-122.516322 44.431923
-122.613437 44.528488
-122.710869 44.624913
-122.878629 44.601816
-123.046278 44.578334
-123.213811 44.554460
-123.381222 44.530192
-123.449899 44.410303
-123.518307 44.290360
-123.586447 44.170362
-123.654322 44.050307
-123.556277 43.955264
-123.458534 43.860080
-123.361093 43.764751
-123.263953 43.669279
-123.098838 43.693189
-122.933613 43.716694
-122.768285 43.739802
-122.602857 43.762515
-122.533309 43.881546
-122.463492 44.000532
-122.393403 44.119472
-122.323040 44.238368
END
END
My strategy is to read in the file, generating a list of latitude-longitude points and beginning a new unique group ID every time I encounter an END. I'll then plot using ggplot and geom_polygon.
Alas, I'm not sure how to efficiently accomplish the reading of the file.
Any thoughts?
Read the spatial task view on CRAN and then use readOGR from the rgdal package to read into an sp class object. You'll need a GDAL/OGR install with ARCGEN format support, which, despite being listed as 'compiled by default', I don't have on my system.
Failing that, open the file as a connection, read each line, build a Polygon, then Polygons and SpatialPolygons.
Here's a fairly sub-optimal but working function:
readUng <- function(f) {
  require(sp)
  stream = file(f, "r")
  first = readLines(stream, 1)
  bits = strsplit(first, " ")[[1]]
  polys = list(); ids = NULL
  while (TRUE) {
    id = bits[1]    # label pt = bits[2], bits[3]
    ids = c(ids, id)
    coords = NULL
    while (TRUE) {
      xy = readLines(stream, 1)
      if (xy == "END") {
        break
      }
      coords = rbind(coords, strsplit(xy, " ")[[1]])
    }
    polys[[length(polys) + 1]] = Polygons(list(Polygon(matrix(as.numeric(coords[, 2:3]), ncol = 2))), ID = id)
    lines = readLines(stream, 1)
    if (lines == "END") {
      break
    }
    bits = strsplit(lines, " ")[[1]]
  }
  return(SpatialPolygons(polys))
}
Now it's a proper spatial data object, and you can also give it a coordinate system (it looks like lat-long to me, so EPSG:4326, but only you know). You could modify all this to produce whatever ggplot wants, but since it's spatial data you should keep it in a spatial data class, and ggplot can be made to deal with that; a rough usage sketch follows.
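As a usage sketch (the file name is hypothetical, and the fortify step assumes ggplot2's methods for sp objects are available):
library(sp)
library(ggplot2)

polys <- readUng("coast.gen")                             # hypothetical generate file
proj4string(polys) <- CRS("+proj=longlat +datum=WGS84")   # i.e. EPSG:4326, as guessed above

# Quick base-graphics check
plot(polys)

# Flatten to a data frame and hand it to ggplot2
df <- fortify(polys)
ggplot(df, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = NA, colour = "black") +
  coord_quickmap()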
