Sixteen protein sequences of enzymes with known activity against polyethylene terephthalate (PET) were clustered with CD-HIT, yielding a reduced set of twelve centroid sequences. These twelve sequences were aligned in a structure-guided multiple sequence alignment with T-Coffee. A profile hidden Markov model (HMM) was built from this alignment with HMMER.
The profile HMM was trimmed by retaining only the alignment columns corresponding to the region between amino acid positions 32 and 274 of the PETase from Ideonella sakaiensis (IsPETase, UniProt accession A0A0K8P6T7).
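
The column selection described above depends on translating residue numbers in the ungapped reference sequence into column indices in the gapped alignment. The following is a minimal Python sketch of that mapping, not the code used in this work. The function names (`reference_columns`, `trim_alignment`), the toy alignment, and the treatment of both boundary positions as inclusive are assumptions made for illustration; in practice the reference row would be the aligned IsPETase sequence and the bounds would be 32 and 274.

```python
def reference_columns(aligned_ref, start, end, gap_chars="-."):
    """Map 1-based residue positions start..end of the ungapped reference
    onto 0-based column indices of its gapped, aligned form.

    Both bounds are treated as inclusive (an assumption of this sketch).
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    residue = 0          # number of reference residues seen so far
    first = last = None
    for col, ch in enumerate(aligned_ref):
        if ch in gap_chars:
            continue     # gap columns do not advance the residue counter
        residue += 1
        if residue == start:
            first = col
        if residue == end:
            last = col
            break
    if first is None or last is None:
        raise ValueError("reference is shorter than the requested range")
    return first, last


def trim_alignment(alignment, ref_id, start, end):
    """Keep only the columns spanning reference residues start..end.

    `alignment` maps sequence names to equal-length gapped strings.
    """
    first, last = reference_columns(alignment[ref_id], start, end)
    return {name: seq[first:last + 1] for name, seq in alignment.items()}


# Hypothetical toy alignment; real input would be the twelve-sequence MSA.
toy = {
    "ref":  "MK-TAYIA--KQR",
    "seqA": "MKLTA-IAGGKQR",
    "seqB": "-K-TAYIA--KQ-",
}
trimmed = trim_alignment(toy, "ref", 3, 8)
for name, seq in trimmed.items():
    print(name, seq)
```

Columns in which the reference has a gap are kept when they fall inside the selected span, so insertions relative to the reference within the region are preserved in the trimmed alignment.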