menkeron.blogg.se - Find stop codons snapgene

# How to find stop codons (SnapGene)

If a contig (i.e. 'GAA') perfectly matches a codon, anything that was added onto it will be removed (e.g. 'TGAAC'). If the beginning or end of the contig is part of another codon (not a perfect multiple of 3), the extra codon on either end will be maintained (e.g. …). I would try to solve the two portions of missing code myself, but I am unsure how to take a known position on a string and search from it.
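The two edge rules above amount to snapping a contig's coordinates onto the codon boundaries of the reading frame. The sketch below is a minimal illustration, not the author's code. It assumes 0-based, half-open `[start, end)` coordinates and that one codon-start index in the frame is known (for example the position of the `ATG`). The function name `snap_to_codons` and its `expand` flag are hypothetical: `expand=False` drops partial codons at either edge, and `expand=True` keeps the codons the edges fall in.

```python
from typing import Tuple


def snap_to_codons(start: int, end: int, frame_start: int, expand: bool) -> Tuple[int, int]:
    """Align the half-open interval [start, end) to codon boundaries.

    frame_start is any index known to begin a codon (e.g. the ATG position).
    expand=False drops partial codons at either edge (shrinks the interval);
    expand=True keeps the codons that the edges fall in (grows the interval).
    """
    lead = (start - frame_start) % 3   # bases between the previous codon boundary and start
    trail = (end - frame_start) % 3    # bases of the incomplete codon just before end
    if expand:
        new_start = start - lead
        new_end = end if trail == 0 else end + (3 - trail)
    else:
        new_start = start if lead == 0 else start + (3 - lead)
        new_end = end - trail
    # a contig shorter than one whole codon trims down to an empty interval
    return new_start, max(new_start, new_end)


seq = 'GG*TAG*CCAATT*ATG*AACGAA*TAG*GAC'.replace('*', '')
atg = seq.find('ATG')  # index of the first ATG in the example string
s, e = snap_to_codons(12, 17, atg, expand=False)
print(seq[s:e])  # AAC
s, e = snap_to_codons(12, 17, atg, expand=True)
print(seq[s:e])  # ATGAAC
```

With the example string (asterisks removed) the `ATG` sits at index 11, so the stretch `'TGAAC'` at `[12, 17)` trims to `'AAC'` or expands to `'ATGAAC'`, depending on which rule applies.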

When the missing code is run, the computer searches to the left of the contig in the string and finds a stop codon before reaching any start codon (e.g. …). Thus, the contig is located in between two genes and nothing needs to be added. These intergenic contigs are just stored in a new list. The last step of the code (which I need help with) will transform prot_contigs into new_prot_contigs = ….
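The leftward search can be sketched as a codon-by-codon scan in the gene's reading frame. This is a minimal sketch under stated assumptions, not the author's code: it assumes the reading frame is known (`frame` is the offset, 0 to 2, at which codons start), uses 0-based half-open `(start, end)` contigs, and the helper names `find_upstream_start` and `split_contigs` are hypothetical. A contig whose scan hits an in-frame stop codon before any `ATG` is treated as intergenic; otherwise it is extended back to the start codon.

```python
from typing import List, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_upstream_start(seq: str, start: int, frame: int) -> Optional[int]:
    """Scan leftward, one in-frame codon at a time, from the codon containing `start`.

    Returns the index of the most upstream ATG reached before an in-frame stop
    codon (or the beginning of seq), or None if no ATG is found (intergenic).
    `frame` (0, 1 or 2) is the offset at which codons start in this reading frame.
    """
    pos = start - (start - frame) % 3  # first codon boundary at or before `start`
    best = None
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break  # the previous gene ended here
        if codon == "ATG":
            best = pos  # keep scanning: a further-upstream ATG may be the real start
        pos -= 3
    return best


def split_contigs(seq: str, contigs: List[Tuple[int, int]], frame: int):
    """Split (start, end) contigs into extended (genic) and intergenic lists."""
    extended, intergenic = [], []
    for start, end in contigs:
        gene_start = find_upstream_start(seq, start, frame)
        if gene_start is None:
            intergenic.append((start, end))  # stop codon hit first: nothing to add
        else:
            extended.append((gene_start, end))  # reach back to the start codon
    return extended, intergenic


seq = "GG*TAG*CCAATT*ATG*AACGAA*TAG*GAC".replace("*", "")
extended, intergenic = split_contigs(seq, [(12, 17), (23, 26), (6, 11)], frame=2)
print(extended)    # [(11, 17)]
print(intergenic)  # [(23, 26), (6, 11)]
```

Because an internal `ATG` also encodes methionine, the scan does not stop at the first `ATG` it meets; it continues to the previous in-frame stop and keeps the most upstream `ATG`, which is the usual convention for the longest open reading frame. Returning at the first `ATG` instead would give the nearest one.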

I need to convert contigs into their respective protein sequences given a reference genome (i.e. I need to take a substring whose position on the string is already known and locate the nearest start and stop codons). The goal is to separate coding and non-coding DNA. This is tricky because sometimes the first position of a triplet codon within the contig will not be a multiple of three and could lie in the intergenic region (i.e. …).

This is my code so far:

```python
from Bio.Seq import Seq
# Bio.Alphabet (generic_dna, generic_protein) was removed in Biopython 1.78;
# Seq objects no longer take an alphabet, so that import is dropped here.

string = 'GG*TAG*CCAATT*ATG*AACGAA*TAG*GAC'.replace('*', '')  # '*' only marks codons for readability
contigs = ...    # list of contigs
positions = ...  # position indices for each contig on string

extended_contigs, extended_positions_contigs = [], []
intergenic_contigs, intergenic_positions_contigs = [], []
extended_positions_contigs.append(...)    # some code
intergenic_positions_contigs.append(...)  # some code
```

I should get extended_contigs = … and extended_positions_contigs = …. These are the contigs that are located within genes. In order to translate them into peptides, I need to reach back in the string until I find the start codon and extend the initial contig (e.g. …). I should also get intergenic_contigs = … and intergenic_positions_contigs = ….
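Once a contig has been extended back to its start codon, translating it into a peptide is a codon lookup. The sketch below uses a hand-written standard genetic code table (NCBI translation table 1) so it runs without Biopython; with Biopython installed, `Seq(...).translate(to_stop=True)` does the same job. The function name `translate_from` is hypothetical, and `start` is assumed to be a codon boundary such as the `ATG` index.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order,
# so "TTT" -> F, "TTC" -> F, "TTA" -> L, ... "GGG" -> G. "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_from(seq: str, start: int, end: int, to_stop: bool = True) -> str:
    """Translate seq[start:end] codon by codon.

    `start` is assumed to be a codon boundary (e.g. the ATG index). Trailing bases
    that do not form a whole codon are ignored; unknown codons become "X".
    With to_stop=True, translation halts at the first stop codon.
    """
    protein = []
    for pos in range(start, end - 2, 3):
        aa = CODON_TABLE.get(seq[pos:pos + 3].upper(), "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


seq = "GG*TAG*CCAATT*ATG*AACGAA*TAG*GAC".replace("*", "")
print(translate_from(seq, 11, len(seq)))                 # MNE
print(translate_from(seq, 11, len(seq), to_stop=False))  # MNE*D
```

For the example string, translating from the `ATG` at index 11 gives `'MNE'` and stops at the in-frame `TAG`; with `to_stop=False` the stop is kept as `*` and translation continues through the trailing `GAC`.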