Julia dictionary "key not found" only when using loop

Still trying to figure out this problem (I was having problems building a dictionary, but managed to get that working thanks to rickhg12hs).
Here's my current code:
#open files with codon:amino acid pairs, initiate dictionary:
file = open(readall, "rna_codons.txt")
seq = open(readall, "rosalind_prot.txt")
codons = {"UAA" => "stop", "UGA" => "stop", "UAG" => "stop"}
#generate dictionary entries using pairs from file:
for m in eachmatch(r"([AUGC]{3,3})\s([A-Z])\s", file)
codon, aa = m.captures
codons[codon] = aa
end
All of that code seems to work as intended. At this point, I have the dictionary I want, and the right keys point to the right entries. If I just do print(codons["AUG"]) for example, it prints 'M', which is the correct output. Now I want to scan through a string in the second file, and for every 3 letters, pull out the entry referenced in the dictionary and add it to the prot string. So I tried:
for m in eachmatch(r"([AUGC]{3,3})", seq)
amac = codons[m.captures]
prot = "$prot$amac"
end
But this kicks out the error key not found: ["AUG"]. I know the key exists, because I can print codons["AUG"] and it returns the proper entry, so why can't it find that key when it's in the loop?
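The likely culprit: m.captures is an array of the capture groups, so codons[m.captures] looks up the key ["AUG"] (a one-element array) rather than the string "AUG" — which is exactly what the error message shows. A minimal sketch of the fixed loop (assuming prot is meant to start empty):
prot = ""
for m in eachmatch(r"([AUGC]{3,3})", seq)
    amac = codons[m.match]   # m.match is the matched string; m.captures[1] works too
    prot = "$prot$amac"
end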

Related

Airflow SqlToS3Operator has an unwanted index at the beginning

The recent airflow-providers-amazon release deprecated MySQLToS3Operator and introduced SqlToS3Operator, and now it adds an index column at the beginning of the CSV dump.
For example, if I run the following
sql_to_s3_task = SqlToS3Operator(
    task_id="sql_to_s3_task",
    sql_conn_id=conn_id_name,
    query="SELECT created_at, score FROM my_table",
    s3_bucket=bucket_name,
    s3_key=key,
    replace=True,
)
The S3 file has something like this:
,created_at,score
1,2023-01-01,5
2,2023-01-02,6
The output seems to be a direct dump from Pandas. How can I remove this unwanted preceding index column?
The operator uses a pandas DataFrame under the hood.
You should use pd_kwargs. It lets you pass arguments through to the DataFrame's .to_parquet(), .to_json() or .to_csv() call.
Since your output is CSV, the relevant pandas.DataFrame.to_csv parameters are:
header: bool or list of str, default True
Write out the column names. If a list of strings is given it is assumed to be aliases for the column names.
index: bool, default True
Write row names (index).
Thus you can do:
sql_to_s3_task = SqlToS3Operator(
    task_id="sql_to_s3_task",
    sql_conn_id=conn_id_name,
    query="SELECT created_at, score FROM my_table",
    s3_bucket=bucket_name,
    s3_key=key,
    replace=True,
    file_format="csv",
    pd_kwargs={"index": False, "header": False},
)
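For reference, the same behaviour can be reproduced in plain pandas, independent of Airflow (a standalone sketch):
import pandas as pd

df = pd.DataFrame({"created_at": ["2023-01-01", "2023-01-02"], "score": [5, 6]})
print(df.to_csv())             # leading unnamed index column, like the S3 dump
print(df.to_csv(index=False))  # index suppressed
Note that "header": False also drops the created_at,score header line; pass pd_kwargs={"index": False} alone if you want to keep it.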

The encryption won't decrypt

I was given an encrypted copy of the study guide here, but how do you decrypt and read it???
In a file called pa11.py write a method called decode(inputfile,outputfile). Decode should take two parameters - both of which are strings. The first should be the name of an encoded file (either helloworld.txt or superdupertopsecretstudyguide.txt or yet another file that I might use to test your code). The second should be the name of a file that you will use as an output file.
Your method should read in the contents of the inputfile and, using the scheme described in the hints.txt file above, decode the hidden message, writing to the outputfile as it goes (or all at once when it is done depending on what you decide to use).
The penny math lecture is here.
"""
Program: pennyMath.py
Author: CS 1510
Description: Calculates the penny math value of a string.
"""
# Get the input string
original = input("Enter a string to get its cost in penny math: ")
cost = 0
Go through each character in the input string
for char in original:
value = ord(char) #ord() gives us the encoded number!
if char>="a" and char<="z":
cost = cost+(value-96) #offset the value of ord by 96
elif char>="A" and char<="Z":
cost = cost+(value-64) #offset the value of ord by 64
print("The cost of",original,"is",cost)
Another hint: Don't forget about while loops...
Another hint: After letters - skip ahead by their pennymath value positions + 2
After numbers - skip ahead by their number + 7 positions
After anything else - just skip ahead by 1 position
The issue I'm having is that I can't seem to get the code right to decode the file: it comes out looking the same. This is the current code I have been using, but once I try to decrypt the message it stays the same.
def pennycost(c):
    if c >= "a" and c <= "z":
        return ord(c) - 96
    elif c >= "A" and c <= "Z":
        return ord(c) - 64

def decryption(inputfile, outputfile):
    with open(inputfile) as f:
        fo = open(outputfile, "w")
        count = 0
        while True:
            c = f.read(1)
            if not c:
                break
            if count > 0:
                count = count - 1
                continue
            elif c.isalpha():
                count = pennycost(c)
                fo.write(c)
            elif c.isdigit():
                count = int(c)
                fo.write(c)
            else:
                count = 6
                fo.write(c)
        fo.close()

inputfile = input("Please enter the input file name: ")
outputfile = input("Please enter the output file name (EXISTING FILE WILL BE OVERWRITTEN!): ")
decryption(inputfile, outputfile)
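For comparison, here is one possible reading of the hints — an assumption, not necessarily the intended scheme: "skip ahead by N positions" is taken to mean the next kept character sits N positions after the current one, and only landed-on characters are written out.
def decode(inputfile, outputfile):
    # Reuses pennycost() from above; the skip amounts follow the hints.
    text = open(inputfile).read()
    with open(outputfile, "w") as fo:
        i = 0
        while i < len(text):
            c = text[i]
            fo.write(c)
            if c.isalpha():
                i += pennycost(c) + 2  # after letters: pennymath value + 2
            elif c.isdigit():
                i += int(c) + 7        # after numbers: their value + 7
            else:
                i += 1                 # after anything else: 1 position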

How to set a keyword to write fully to the CSV file

This script works insofar as the printed output is correct. However, it is not populating the CSV file fully: only the last iteration of the loop ends up in the file. Being new to IDL, I need to grasp this concept of the keyword.
I believe I need a keyword, but my attempts at inserting one have all failed.
Can someone amend the script so that the CSV file populates fully, please?
PRO Lat_Lon_Alt_Array
  ; This program extracts the latitude, longitude & altitude
  ; with the site name and file code.
  ; The purpose is to output the above dimensions from the station files
  ; into a csv file.
  COMPILE_OPT IDL2
  the_file_list = FILE_SEARCH('D:/Rwork/Project/25_Files/', '*.nc')
  FOR filein = 0, N_ELEMENTS(the_file_list)-1 DO BEGIN
    station = NCDF_OPEN(the_file_list[filein])
    NCDF_VARGET, station, 'station_name', St_Name
    NCDF_VARGET, station, 'lat', latitude
    NCDF_VARGET, station, 'lon', longitude
    NCDF_VARGET, station, 'alt', height
    latitude = REFORM(latitude, 1)
    longitude = REFORM(longitude, 1)
    height = REFORM(height, 1)
    PRINT, the_file_list[filein]
    PRINT, 'name'
    PRINT, St_Name
    PRINT, 'lat'
    PRINT, latitude
    PRINT, 'lon'
    PRINT, longitude
    PRINT, 'alt'
    PRINT, height
    ; Add each station's data to the file
    WRITE_CSV, 'LatLon.csv', the_file_list[filein], latitude, longitude, height
  ENDFOR
  RETURN
END
WRITE_CSV overwrites the file every time it is called, hence you only ever see the last entry.
Create arrays to hold all the values before the for loop:
n_files = N_ELEMENTS(the_file_list)
latitude_arr = DBLARR(n_files)  ; Assuming type is double
longitude_arr = DBLARR(n_files)
height_arr = DBLARR(n_files)
In your for loop fill them with:
latitude_arr[filein] = latitude
longitude_arr[filein] = longitude
height_arr[filein] = height
Then after the for loop, write them with:
WRITE_CSV, 'LatLon.csv', the_file_list, latitude_arr, longitude_arr, height_arr
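If you also want column names in the file, WRITE_CSV takes a HEADER keyword (the names below are just an example):
WRITE_CSV, 'LatLon.csv', the_file_list, latitude_arr, longitude_arr, height_arr, $
  HEADER=['file', 'lat', 'lon', 'alt']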

String recognition in IDL

I have the following strings:
F:\Sheyenne\ROI\SWIR32_subset\SWIR32_2005210_East_A.dat
F:\Sheyenne\ROI\SWIR32_subset\SWIR32_2005210_Froemke-Hoy.dat
and from each I want to extract the three variables, 1. SWIR32 2. the date and 3. the text following the date. I want to automate this process for about 200 files, so individually selecting the locations won't exactly work for me.
so I want:
variable1=SWIR32
variable2=2005210
variable3=East_A
variable4=SWIR32
variable5=2005210
variable6=Froemke-Hoy
I am going to be using these to add titles to graphs later on, but since the position of the text in each string varies, I am unsure how to do this using STRMID.
I think you want to use a combination of STRPOS and STRSPLIT. Something like the following:
s = ['F:\Sheyenne\ROI\SWIR32_subset\SWIR32_2005210_East_A.dat', $
     'F:\Sheyenne\ROI\SWIR32_subset\SWIR32_2005210_Froemke-Hoy.dat']
name = STRARR(s.length)
date = name
txt = name
foreach sub, s, i do begin
  sub = STRMID(sub, 1 + STRPOS(sub, '\', /REVERSE_SEARCH))
  parts = STRSPLIT(sub, '_', /EXTRACT)
  name[i] = parts[0]
  date[i] = parts[1]
  txt[i] = STRJOIN(parts[2:*], '_')
endforeach
You could also do this with a regular expression (using just STRSPLIT) but regular expressions tend to be complicated and error prone.
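For the record, a regex version could look like the following sketch, using STREGEX with /SUBEXPR; the pattern is an assumption based on the two file names shown, and the arrays are the same as above:
foreach sub, s, i do begin
  parts = STREGEX(sub, '([A-Za-z0-9]+)_([0-9]+)_(.*)\.dat$', /SUBEXPR, /EXTRACT)
  ; parts[0] is the full match; parts[1:3] are the three subexpressions
  name[i] = parts[1]
  date[i] = parts[2]
  txt[i] = parts[3]
endforeach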
Hope this helps!

IndexError: list index out of range, scores.append( (fields[0], fields[1]))

I'm trying to read a file and put the contents in a list. I have done this many times before and it has worked, but this time it throws back the error "list index out of range".
The code is:
with open("File.txt") as f:
    scores = []
    for line in f:
        fields = line.split()
        scores.append((fields[0], fields[1]))
print(scores)
The text file is in the format;
Alpha:[0, 1]
Bravo:[0, 0]
Charlie:[60, 8, 901]
Foxtrot:[0]
I can't see why it is giving me this problem. Is it because I have more than one value for each item? Or is it the fact that I have a colon in my text file?
How can I get around this problem?
Thanks
If I understand you well, this code will print your desired result:
import re

with open("File.txt") as f:
    # Make a dictionary for the scores: {name: scores}.
    scores = {}
    # Define regular expressions to parse the team name and team scores from each line.
    patternScore = r'\[([^\]]+)\]'
    patternName = r'(.*):'
    for line in f:
        # Find the team name and its scores.
        fields = re.search(patternScore, line).groups()[0].split(', ')
        name = re.search(patternName, line).groups()[0]
        # Update the dictionary with the new value.
        scores[name] = fields

# Print the output: first score first, then the name.
for key in scores:
    print(scores[key][0] + ':' + key)
You will receive the following output:
60:Charlie
0:Alpha
0:Bravo
0:Foxtrot
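As for why the original code fails: line.split() splits on whitespace, so a line whose bracketed list contains no space produces a single field, and fields[1] is then out of range:
>>> "Alpha:[0, 1]".split()
['Alpha:[0,', '1]']
>>> "Foxtrot:[0]".split()
['Foxtrot:[0]']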
