How can I change ascii string to hex and vice versa in python 3.7? - data-conversion

I look some solution in this site but those not works in python 3.7.
So, I asked a new question.
Hex string of "the" is "746865"
I want to a solution to convert "the" to "746865" and "746865" to "the"

Given that your string contains ascii only (each char is in range 0-0xff), you can use the following snippet:
In [28]: s = '746865'
In [29]: import math
In [30]: int(s, base=16).to_bytes(math.ceil(len(s) / 2), byteorder='big').decode('ascii')
Out[30]: 'the'
Firstly you need to convert a string into integer with base of 16, then convert it to bytes (assuming 2 chars per byte) and then convert bytes back to string using decode

#!/usr/bin/python3
"""
Program name: txt_to_ASC.py
The program transfers
a string of letters -> the corresponding
string of hexadecimal ASCII-codes,
eg. the -> 746865
Only letters in [abc...xyzABC...XYZ] should be input.
"""
print("Transfer letters to hex ASCII-codes")
print("Input range is [abc...xyzABC...XYZ].")
print()
string = input("Input set of letters, eg. the: ")
print("hex ASCII-code: " + " "*15, end = "")
def str_to_hasc(x):
global glo
byt = bytes(x, 'utf-8')
bythex = byt.hex()
for b1 in bythex:
y = print(b1, end = "")
glo = str(y)
return glo
str_to_hasc(string)

If you have a byte string, then:
>>> import binascii
>>> binascii.hexlify(b'the')
b'746865'
If you have a Unicode string, you can encode it:
>>> s = 'the'
>>> binascii.hexlify(s)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: a bytes-like object is required, not 'str'
>>> binascii.hexlify(s.encode())
b'746865'
The result is a byte string, you can decode it to get a Unicode string:
>>> binascii.hexlify(s.encode()).decode()
'746865'
The reverse, of course, is:
>>> binascii.unhexlify(b'746865')
b'the'

#!/usr/bin/python3
"""
Program name: ASC_to_txt.py
The program's input is a string of hexadecimal digits.
The string is a bytes object, and each byte is supposed to be
the hex ASCII-code of a (capital or small) letter.
The program's output is the string of the corresponding letters.
Example
Input: 746865
First subresult: ['7','4','6','8','6','5']
Second subresult: ['0x74', '0x68', '0x65']
Third subresult: [116, 104, 101]
Final result: the
References
Contribution by alhelal to stackoverflow.com (20180901)
Contribution by QintenG to stackoverflow.com (20170104)
Mark Pilgrim, Dive into Python 3, section 4.6
"""
import string
print("The program converts a string of hex ASCII-codes")
print("into the corresponding string of letters.")
print("Input range is [41, 42, ..., 5a] U [61, 62, ..., 7a]. \n")
x = input("Input the hex ASCII-codes, eg. 746865: ")
result_1 = []
for i in range(0,len(x)//2):
for j in range(0,2):
result_1.extend(x[2*i+j])
# First subresult
lenres_1 = len(result_1)
result_2 = []
for i in range(0,len(result_1) - 1,2):
temp = ""
temp = temp + "0x" + result_1[i] #0, 2, 4
temp = temp + result_1[i + 1] #1, 3, 5
result_2.append(temp)
# Second subresult
result_3 = []
for i in range(0,len(result_2)):
result_3.append(int(result_2[i],16))
# Third subresult
by = bytes(result_3)
result_4 = by.decode('utf-8')
# Final result
print("Corresponding string of letters:" + " "*6, result_4, end = "\n")

Related

Hex string to base58

does anyone know any package that support the following conversion of base58 to hex string or the other way round from hex string to base58 encoding.
below is an example of a python implementation.
https://www.reddit.com/r/Tronix/comments/ja8khn/convert_my_address/
this hex string <- "4116cecf977b1ecc53eed37ee48c0ee58bcddbea5e"
should result in this : "TC3ockcvHNmt7uJ8f5k3be1QrZtMzE8MxK"
here is a link to be used for verification: https://tronscan.org/#/tools/tron-convert-tool
I was looking for it and I was able to design functions that produce the desired result.
import base58
def hex_to_base58(hex_string):
if hex_string[:2] in ["0x", "0X"]:
hex_string = "41" + hex_string[2:]
bytes_str = bytes.fromhex(hex_string)
base58_str = base58.b58encode_check(bytes_str)
return base58_str.decode("UTF-8")
def base58_to_hex(base58_string):
asc_string = base58.b58decode_check(base58_string)
return asc_string.hex().upper()
They are useful if you want to convert the public key (hex) of the transactions to the addresses of the wallets (base58).
public_key_hex = "0x4ab99740bdf786204e57c00677cf5bf8ee766476"
address = hex_to_base58(public_key_hex)
print(address)
# TGnKLLBQyCo6QF911j65ipBz5araDSYQAD

Need help understanding how gsub and tonumber are used to encode lua source code?

I'm new to LUA but figured out that gsub is a global substitution function and tonumber is a converter function. What I don't understand is how the two functions are used together to produce an encoded string.
I've already tried reading parts of PIL (Programming in Lua) and the reference manual but still, am a bit confused.
local L0_0, L1_1
function L0_0(A0_2)
return (A0_2:gsub("..", function(A0_3)
return string.char((tonumber(A0_3, 16) + 256 - 13 + 255999744) % 256)
end))
end
encodes = L0_0
L0_0 = gg
L0_0 = L0_0.toast
L1_1 = "__loading__\226\128\166"
L0_0(L1_1)
L0_0 = encodes
L1_1 = --"The Encoded String"
L0_0 = L0_0(L1_1)
L1_1 = load
L1_1 = L1_1(L0_0)
pcall(L1_1)
I removed the encoded string where I put the comment because of how long it was. If needed I can upload the encoded string as well.
gsub is being used to get 2 digit sections of A0_2. This means the string A0_3 is a 2 digit hexadecimal number but it is not in a number format so we cannot preform math on the value. A0_3 being a hex number can be inferred based on how tonubmer is used.
tonumber from Lua 5.1 Reference Manual:
Tries to convert its argument to a number. If the argument is already a number or a string convertible to a number, then tonumber returns this number; otherwise, it returns nil.
An optional argument specifies the base to interpret the numeral. The base may be any integer between 2 and 36, inclusive. In bases above 10, the letter 'A' (in either upper or lower case) represents 10, 'B' represents 11, and so forth, with 'Z' representing 35. In base 10 (the default), the number can have a decimal part, as well as an optional exponent part (see ยง2.1). In other bases, only unsigned integers are accepted.
So tonumber(A0_3, 16) means we are expecting for A0_3 to be a base 16 number (hexadecimal).
Once we have the number value of A0_3 we do some math and finally convert it to a character.
function L0_0(A0_2)
return (A0_2:gsub("..", function(A0_3)
return string.char((tonumber(A0_3, 16) + 256 - 13 + 255999744) % 256)
end))
end
This block of code takes a string of hex digits and converts them into chars. tonumber is being used to allow for the manipulation of the values.
Here is an example of how this works with Hello World:
local str = "Hello World"
local hex_str = ''
for i = 1, #str do
hex_string = hex_string .. string.format("%x", str:byte(i,i))
end
function L0_0(A0_2)
return (A0_2:gsub("..", function(A0_3)
return string.char((tonumber(A0_3, 16) + 256 - 13 + 255999744) % 256)
end))
end
local encoded = L0_0(hex_str)
print(encoded)
Output
;X__bJbe_W
And taking it back to the orginal string:
function decode(A0_2)
return (A0_2:gsub("..", function(A0_3)
return string.char((tonumber(A0_3, 16) + 13) % 256)
end))
end
hex_string = ''
for i = 1, #encoded do
hex_string = hex_string .. string.format("%x", encoded:byte(i,i))
end
print(decode(hex_string))

Python: Type Error unsupported operand type(s) for +: 'int' and 'str'

I'm really stuck, I am currently reading Python - How to automate the boring stuff and I am doing one of the practice projects.
Why is it flagging an error? I know it's to do with the item_total.
import sys
stuff = {'Arrows':'12',
'Gold Coins':'42',
'Rope':'1',
'Torches':'6',
'Dagger':'1', }
def displayInventory(inventory):
print("Inventory:")
item_total = sum(stuff.values())
for k, v in inventory.items():
print(v + ' ' + k)
a = sum(stuff.values())
print("Total number of items: " + item_total)
displayInventory(stuff)
error i get:
Traceback (most recent call last):
File "C:/Users/Lewis/Dropbox/Python/Function displayInventory p120 v2.py", line 17, in
displayInventory(stuff)
File "C:/Users/Lewis/Dropbox/Python/Function displayInventory p120 v2.py", line 11, in displayInventory
item_total = int(sum(stuff.values()))
TypeError: unsupported operand type(s) for +: 'int' and 'str'
Your dictionary values are all strings:
stuff = {'Arrows':'12',
'Gold Coins':'42',
'Rope':'1',
'Torches':'6',
'Dagger':'1', }
yet you try to sum these strings:
item_total = sum(stuff.values())
sum() uses a starting value, an integer, of 0, so it is trying to use 0 + '12', and that's not a valid operation in Python:
>>> 0 + '12'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'
You'll have to convert all your values to integers; either to begin with, or when summing:
item_total = sum(map(int, stuff.values()))
You don't really need those values to be strings, so the better solution is to make the values integers:
stuff = {
'Arrows': 12,
'Gold Coins': 42,
'Rope': 1,
'Torches': 6,
'Dagger': 1,
}
and then adjust your inventory loop to convert those to strings when printing:
for k, v in inventory.items():
print(v + ' ' + str(k))
or better still:
for item, value in inventory.items():
print('{:12s} {:2d}'.format(item, value))
to produce aligned numbers with string formatting.
You are trying to sum a bunch of strings, which won't work. You need to convert the strings to numbers before you try to sum them:
item_total = sum(map(int, stuff.values()))
Alternatively, declare your values as integers to begin with instead of strings.
This error produce because you was trying sum('12', '42', ..), so you need to convert every elm to int
item_total = sum(stuff.values())
By
item_total = sum([int(k) for k in stuff.values()])
The error lies in line
a = sum(stuff.values())
and
item_total = sum(stuff.values())
The value type in your dictionary is str and not int, remember that + is concatenation between strings and addition operator between integers. Find the code below, should solve the error, converting it into int array from a str array is done by mapping
import sys
stuff = {'Arrows':'12',
'Gold Coins':'42',
'Rope':'1',
'Torches':'6',
'Dagger':'1'}
def displayInventory(inventory):
print("Inventory:")
item_total = (stuff.values())
item_total = sum(map(int, item_total))
for k, v in inventory.items():
print(v + ' ' + k)
print("Total number of items: " + str(item_total))
displayInventory(stuff)

Encode a list of ids in an elegant id

I am trying to find the best way to encode a list of 10 integers (approximately between 1 and 10000) into a single id (I need a one-to-one function between a list of integers to a single integer or string).
I tried base64 (which is good because it is reversible), but the result is much longer than the input and that's quite bad.
For example, if I want to encode 12-54-235-1223-21-765-43-763-9522-908,
base64 gives me MTItNTQtMjM1LTEyMjMtMjEtNzY1LTQzLTc2My05NTIyLTkwOA==
Hashing functions are bad because I can't recover the input easily.
Maybe I could use the fact that I have only numbers as input and use number theory facts, someone has an idea?
If the integers are guaranteed to be smaller than 10^9, you could encode them as:
[number of digits in 1st number][1st number][number of digits in 2nd number][2nd number][...]
So 12,54,235,1223,21,765,43,763,9522,908 yields 21225432354122322137652433763495223908.
Sample Python implementation:
def numDigits(x):
if x < 10:
return 1
return 1 + numDigits(x/10)
def encode(nums):
ret = ""
for number in nums:
ret = ret + str(numDigits(number)) + str(number)
return ret
def decode(id):
nums = []
while id != "":
numDigits = int(id[0])
id = id[1:] #remove first char from id
number = int(id[:numDigits])
nums.append(number)
id = id[numDigits:] #remove first number from id
return nums
nums = [12,54,235,1223,21,765,43,763,9522,908]
id = encode(nums)
decodedNums = decode(id)
print id
print decodedNums
Result:
21225432354122322137652433763495223908
[12, 54, 235, 1223, 21, 765, 43, 763, 9522, 908]

alignment of sequences

I want to do pairwise alignment with uniprot and pdb sequences. I have an input file containing uniprot and pdb IDs like this.
pdb id uniprot id
1dbh Q07889
1e43 P00692
1f1s Q53591
first, I need to read each line in an input file
2) retrieve the pdb and uniprot sequences from pdb.fasta and uniprot.fasta files
3) Do alignment and calculate sequence identity.
Usually, I use the following program for pairwise alignment and seq.identity calculation.
library("seqinr")
seq1 <- "MDEKRRAQHNEVERRRRDKINNWIVQLSKIIPDSSMESTKSGQSKGGILSKASDYIQELRQSNHR"
seq2<- "MKGQQKTAETEEGTVQIQEGAVATGEDPTSVAIASIQSAATFPDPNVKYVFRTENGGQVM"
library(Biostrings)
globalAlign<- pairwiseAlignment(seq1, seq2)
pid(globalAlign, type = "PID3")
I need to print the output like this
pdbid uniprotid seq.identity
1dbh Q07889 99
1e43 P00692 80
1f1s Q53591 56
How can I change the above code ? your help would be appreciated!
'
This code is hopefully what your looking for:
class test():
def get_seq(self, pdb,fasta_file): # Get sequences
from Bio.PDB.PDBParser import PDBParser
from Bio import SeqIO
aa = {'ARG':'R','HIS':'H','LYS':'K','ASP':'D','GLU':'E','SER':'S','THR':'T','ASN':'N','GLN':'Q','CYS':'C','SEC':'U','GLY':'G','PRO':'P','ALA':'A','ILE':'I','LEU':'L','MET':'M','PHE':'F','TRP':'W','TYR':'Y','VAL':'V'}
p=PDBParser(PERMISSIVE=1)
structure_id="%s" % pdb[:-4]
structure=p.get_structure(structure_id, pdb)
residues = structure.get_residues()
seq_pdb = ''
for res in residues:
res = res.get_resname()
if res in aa:
seq_pdb = seq_pdb+aa[res]
handle = open(fasta_file, "rU")
for record in SeqIO.parse(handle, "fasta") :
seq_fasta = record.seq
handle.close()
self.seq_aln(seq_pdb,seq_fasta)
def seq_aln(self,seq1,seq2): # Align the sequences
from Bio import pairwise2
from Bio.SubsMat import MatrixInfo as matlist
matrix = matlist.blosum62
gap_open = -10
gap_extend = -0.5
alns = pairwise2.align.globalds(seq1, seq2, matrix, gap_open, gap_extend)
top_aln = alns[0]
aln_seq1, aln_seq2, score, begin, end = top_aln
with open('aln.fasta', 'w') as outfile:
outfile.write('> PDB_seq\n'+str(aln_seq1)+'\n> Uniprot_seq\n'+str(aln_seq2))
print aln_seq1+'\n'+aln_seq2
self.seq_id('aln.fasta')
def seq_id(self,aln_fasta): # Get sequence ID
import string
from Bio import AlignIO
input_handle = open("aln.fasta", "rU")
alignment = AlignIO.read(input_handle, "fasta")
j=0 # counts positions in first sequence
i=0 # counts identity hits
for record in alignment:
#print record
for amino_acid in record.seq:
if amino_acid == '-':
pass
else:
if amino_acid == alignment[0].seq[j]:
i += 1
j += 1
j = 0
seq = str(record.seq)
gap_strip = seq.replace('-', '')
percent = 100*i/len(gap_strip)
print record.id+' '+str(percent)
i=0
a = test()
a.get_seq('1DBH.pdb','Q07889.fasta')
This outputs:
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------EQTYYDLVKAF-AEIRQYIRELNLIIKVFREPFVSNSKLFSANDVENIFSRIVDIHELSVKLLGHIEDTVE-TDEGSPHPLVGSCFEDLAEELAFDPYESYARDILRPGFHDRFLSQLSKPGAALYLQSIGEGFKEAVQYVLPRLLLAPVYHCLHYFELLKQLEEKSEDQEDKECLKQAITALLNVQSG-EKICSKSLAKRRLSESA-------------AIKK-NEIQKNIDGWEGKDIGQCCNEFI-EGTLTRVGAKHERHIFLFDGL-ICCKSNHGQPRLPGASNAEYRLKEKFF-RKVQINDKDDTNEYKHAFEIILKDENSVIFSAKSAEEKNNW-AALISLQYRSTL---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
MQAQQLPYEFFSEENAPKWRGLLVPALKKVQGQVHPTLESNDDALQYVEELILQLLNMLCQAQPRSASDVEERVQKSFPHPIDKWAIADAQSAIEKRKRRNPLSLPVEKIHPLLKEVLGYKIDHQVSVYIVAVLEYISADILKLVGNYVRNIRHYEITKQDIKVAMCADKVLMDMFHQDVEDINILSLTDEEPSTSGEQTYYDLVKAFMAEIRQYIRELNLIIKVFREPFVSNSKLFSANDVENIFSRIVDIHELSVKLLGHIEDTVEMTDEGSPHPLVGSCFEDLAEELAFDPYESYARDILRPGFHDRFLSQLSKPGAALYLQSIGEGFKEAVQYVLPRLLLAPVYHCLHYFELLKQLEEKSEDQEDKECLKQAITALLNVQSGMEKICSKSLAKRRLSESACRFYSQQMKGKQLAIKKMNEIQKNIDGWEGKDIGQCCNEFIMEGTLTRVGAKHERHIFLFDGLMICCKSNHGQPRLPGASNAEYRLKEKFFMRKVQINDKDDTNEYKHAFEIILKDENSVIFSAKSAEEKNNWMAALISLQYRSTLERMLDVTMLQEEKEEQMRLPSADVYRFAEPDSEENIIFEENMQPKAGIPIIKAGTVIKLIERLTYHMYADPNFVRTFLTTYRSFCKPQELLSLIIERFEIPEPEPTEADRIAIENGDQPLSAELKRFRKEYIQPVQLRVLNVCRHWVEHHFYDFERDAYLLQRMEEFIGTVRGKAMKKWVESITKIIQRKKIARDNGPGHNITFQSSPPTVEWHISRPGHIETFDLLTLHPIEIARQLTLLESDLYRAVQPSELVGSVWTKEDKEINSPNLLKMIRHTTNLTLWFEKCIVETENLEERVAVVSRIIEILQVFQELNNFNGVLEVVSAMNSSPVYRLDHTFEQIPSRQKKILEEAHELSEDHYKKYLAKLRSINPPCVPFFGIYLTNILKTEEGNPEVLKRHGKELINFSKRRKVAEITGEIQQYQNQPYCLRVESDIKRFFENLNPMGNSMEKEFTDYLFNKSLEIEPRNPKPLPRFPKKYSYPLKSPGVRPSNPRPGTMRHPTPLQQEPRKISYSRIPESETESTASAPNSPRTPLTPPPASGASSTTDVCSVFDSDHSSPFHSSNDTVFIQVTLPHGPRSASVSSISLTKGTDEVPVPPPVPPRRRPESAPAESSPSKIMSKHLDSPPAIPPRQPTSKAYSPRYSISDRTSISDPPESPPLLPPREPVRTPDVFSSSPLHLQPPPLGKKSDHGNAFFPNSPSPFTPPPPQTPSPHGTRRHLPSPPLTQEVDLHSIAGPPVPPRQSTSQHIPKLPPKTYKREHTHPSMHRDGPPLLENAHSS
PDB_seq 100 # pdb to itself would obviously have 100% identity
Uniprot_seq 24 # pdb sequence has 24% identity to the uniprot sequence
For this to work on you input file, you need to put my a.get_seq() in a for loop with the inputs from your text file.
EDIT:
Replace the seq_id function with this one:
def seq_id(self,aln_fasta):
import string
from Bio import AlignIO
from Bio import SeqIO
record_iterator = SeqIO.parse(aln_fasta, "fasta")
first_record = record_iterator.next()
print '%s has a length of %d' % (first_record.id, len(str(first_record.seq).replace('-','')))
second_record = record_iterator.next()
print '%s has a length of %d' % (second_record.id, len(str(second_record.seq).replace('-','')))
lengths = [len(str(first_record.seq).replace('-','')), len(str(second_record.seq).replace('-',''))]
if lengths.index(min(lengths)) == 0: # If both sequences have the same length the PDB sequence will be taken as the shortest
print 'PDB sequence has the shortest length'
else:
print 'Uniport sequence has the shortes length'
idenities = 0
for i,v in enumerate(first_record.seq):
if v == '-':
pass
#print i,v, second_record.seq[i]
if v == second_record.seq[i]:
idenities +=1
#print i,v, second_record.seq[i], idenities
print 'Sequence Idenity = %.2f percent' % (100.0*(idenities/min(lengths)))
to pass the arguments to the class use:
with open('input_file.txt', 'r') as infile:
next(infile)
next(infile) # Going by your input file
for line in infile:
line = line.split()
a.get_seq(segs[0]+'.pdb',segs[1]+'.fasta')
It might be something like this; a repeatable example (e.g., with short files posted on-line) would help...
library(Biostrings)
pdb = readAAStringSet("pdb.fasta")
uniprot = readAAStringSet("uniprot.fasta")
to input all sequences into two objects. pairwiseAlignment accepts a vector as first (query) argument, so if you were wanting to align all pdb against all uniprot pre-allocate a result matrix
pids = matrix(numeric(), length(uniprot), length(pdb),
dimnames=list(names(uniprot), names(pdb)))
and then do the calculations
for (i in seq_along(uniprot)) {
globalAlignment = pairwiseAlignment(pdb, uniprot[i])
pids[i,] = pid(globalAlignment)
}

Resources