Intellectual property offices worldwide are now enforcing the World Intellectual Property Organization (WIPO) Standard ST.26, which prescribes requirements for the presentation of nucleotide and amino acid sequence listings in patent applications. The new standard applies to patent applications filed on or after 1 July 2022 (the “big bang” date).
Standard ST.26 (hereinafter, “ST.26”) is intended to establish a single format for sequence listings that is acceptable worldwide and is compatible with international databases, such as those maintained by the International Nucleotide Sequence Database Collaboration (INSDC). ST.26 replaces Standard ST.25 (hereinafter, “ST.25”) which, according to WIPO, uses a format that is not compliant with INSDC requirements, is interpreted and enforced inconsistently across different patent offices, and does not include rules covering sequence types that are common today.
The full text of ST.26 is published by WIPO. In this article, we summarize some of the key differences between ST.26 and ST.25 and provide guidance for preparing sequence listings that are compliant with the new standard.
Sequence listings in XML:
Under ST.26, sequence listings must be presented as a single file in XML (eXtensible Mark-up Language), rather than the “plain text” TXT format used under ST.25. The use of XML is intended to facilitate searching of the sequence data, particularly in public databases.
XML-editing software must be used to prepare sequence listings that comply with ST.26. To assist applicants, WIPO has developed its own software, called WIPO Sequence, which is specifically designed to create and edit ST.26-compliant sequence listings. Notably, the software includes a built-in validation tool which, when used, generates a verification report identifying potential errors. It also allows users to upload, view and print XML sequence listings in a human-readable format. In addition, ST.25-compliant sequence listings in TXT can be converted to XML using WIPO Sequence, although experience has shown that some care needs to be taken when converting sequence listings. Annex VII of ST.26 sets out recommendations for transforming a sequence listing from ST.25 to ST.26.
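To give a feel for what "a single file in XML" means in practice, the following is a minimal sketch that reads a heavily simplified, hypothetical fragment using only the Python standard library. The element and attribute names reflect our understanding of the ST.26 schema and should be checked against the standard itself. The fragment omits the many other elements a valid listing requires, and the function name `summarize` is our own. It is not a substitute for the validation performed by WIPO Sequence.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment. A real ST.26 listing carries a DOCTYPE
# declaration and many more required elements (applicant, title, features).
SAMPLE = """<ST26SequenceListing>
  <SequenceData sequenceIDNumber="1">
    <INSDSeq>
      <INSDSeq_length>12</INSDSeq_length>
      <INSDSeq_moltype>DNA</INSDSeq_moltype>
      <INSDSeq_sequence>acgtacgtacgt</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>
</ST26SequenceListing>"""


def summarize(xml_text):
    """Return (sequence ID, molecule type, residues) for each listed sequence."""
    root = ET.fromstring(xml_text)
    rows = []
    for entry in root.iter("SequenceData"):
        seq_id = int(entry.get("sequenceIDNumber"))
        moltype = entry.findtext("INSDSeq/INSDSeq_moltype")
        residues = entry.findtext("INSDSeq/INSDSeq_sequence")
        rows.append((seq_id, moltype, residues))
    return rows


for seq_id, moltype, residues in summarize(SAMPLE):
    print(f"SEQ ID NO: {seq_id} ({moltype}, {len(residues)} residues): {residues}")
```

Because the data are structured in this way, each sequence and its attributes can be retrieved by name rather than by position in a text file, which is what makes the format easier to search.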
Inclusion criteria for sequence listings:
ST.26 establishes a number of inclusion criteria for sequence listings that are not found in ST.25. For example, ST.26 specifically requires inclusion of various non-typical sequence types in sequence listings, including D-amino acids, branched sequences, and nucleotide analogs.
In addition, ST.26 changes the minimum length requirement for sequence listings. Sequence listings are limited to unbranched sequences and linear regions of branched sequences containing (i) 10 or more specifically defined nucleotides; or (ii) 4 or more specifically defined amino acids. Sequences that do not meet the minimum length requirement must be disclosed elsewhere in the patent specification instead of in a sequence listing. Importantly, ST.26 specifically prohibits the inclusion of sequences which are shorter than the minimum, whereas ST.25 did not.
Only “specifically defined” residues count toward the length. ST.26 defines “specifically defined” as “any nucleotide other than those represented by the symbol ‘n’ and any amino acid other than those represented by the symbol ‘X’, listed in Annex I” (ST.26, paragraph 3(k)). Stated differently, nucleotides represented by “n” and amino acids represented by “X” do not count toward the sequence length. For example, the nucleotide sequence 5’-acgtnnacgt-3’ has only eight specifically defined nucleotides and is therefore prohibited from being included in a sequence listing. It should be noted that Annex I defines “n” as “a or c or g or t/u; ‘unknown’ or ‘other’”, and defines “X” as “A or R or N or D or C or Q or E or G or H or I or L or K or M or F or P or O or S or U or T or W or Y or V; ‘unknown’ or ‘other’”.
With regard to nucleotides, Annex I provides other symbols for other combinations of alternative nucleotides, such as “v” being the symbol for “a or c or g; not t/u”, and “b” being the symbol for “c or g or t/u; not a”. Thus, the term “specifically defined”, in the context of a nucleotide sequence, in fact excludes only the symbol “n”, and includes these other symbols defining other combinations of alternative nucleotides.
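The counting rule described above can be sketched in a few lines of Python. This is an illustration only, covering a single unbranched sequence. The function and constant names are our own, and the sketch assumes the sequence is written with the Annex I symbols (lower case for nucleotides, upper case for amino acids).

```python
# Minimum numbers of specifically defined residues, as summarized above.
MIN_SPECIFICALLY_DEFINED = {"nucleotide": 10, "amino_acid": 4}

# The only symbols that are NOT "specifically defined" (ST.26, paragraph 3(k)).
UNDEFINED_SYMBOL = {"nucleotide": "n", "amino_acid": "X"}


def count_specifically_defined(sequence, kind):
    """Count every residue except 'n' (nucleotides) or 'X' (amino acids)."""
    undefined = UNDEFINED_SYMBOL[kind]
    return sum(1 for symbol in sequence if symbol != undefined)


def meets_minimum_length(sequence, kind):
    """True if the sequence has enough specifically defined residues."""
    return count_specifically_defined(sequence, kind) >= MIN_SPECIFICALLY_DEFINED[kind]


# The example from the text: eight specifically defined nucleotides.
print(count_specifically_defined("acgtnnacgt", "nucleotide"))  # 8
print(meets_minimum_length("acgtnnacgt", "nucleotide"))        # False

# Ambiguity symbols such as "v" and "b" do count.
print(meets_minimum_length("acgtvbacgt", "nucleotide"))        # True
```

Note that the check is on the number of specifically defined residues, not on the total length of the string, which is why the ten-character sequence in the first example falls short.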
ST.26 also requires certain sequences to be included in the sequence listing as more than one sequence, each with its own sequence identification number. In particular, where a sequence contains an undefined gap, or the symbol “n” or “X” is used to indicate something other than the conventional definition provided in Annex I (e.g. a spacer), each sequence on either side of the gap or “n”/“X” must be included as a separate sequence, each with its own identifier number (ST.26, paragraphs 35 and 37), so long as each sequence meets the minimum length requirement.
In contrast, a sequence that contains regions of “specifically defined” residues (totaling at least 10 nucleotides or at least 4 amino acids, as appropriate) separated by one or more regions of contiguous “n” or “X” residues, wherein the exact number of “n” or “X” residues in each region is disclosed, must be included in the sequence listing as one sequence and assigned its own sequence identification number (ST.26, paragraph 36).
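The contrast between the two situations can be sketched as follows. This is a simplified model under our own assumptions, not a statement of how ST.26 or WIPO Sequence handles every case. A disclosed sequence is represented as a list of defined regions plus a list of gaps between them, where a gap is either an exact count of “n”/“X” residues or `None` for an undefined gap (or a spacer). The function name `listing_entries` is hypothetical.

```python
MINIMUM = {"nucleotide": 10, "amino_acid": 4}
FILLER = {"nucleotide": "n", "amino_acid": "X"}


def listing_entries(regions, gaps, kind):
    """Return the sequences that would each receive their own identifier.

    regions: residue strings in order of appearance.
    gaps:    one entry between each pair of regions; an int is an exact
             number of "n"/"X" residues, None is an undefined gap or spacer.
    """
    if len(gaps) != len(regions) - 1:
        raise ValueError("expected one gap between each pair of regions")
    filler = FILLER[kind]
    pieces = [regions[0]]
    for gap, region in zip(gaps, regions[1:]):
        if gap is None:
            # Undefined gap: the next region starts a separate sequence.
            pieces.append(region)
        else:
            # Exact count known: stay in one sequence, joined by n/X residues.
            pieces[-1] += filler * gap + region
    # Keep only pieces with enough specifically defined residues.
    return [
        piece for piece in pieces
        if sum(1 for symbol in piece if symbol != filler) >= MINIMUM[kind]
    ]


regions = ["acgtacgtacgt", "ttgacc", "ggatccggatcc"]

# Two undefined gaps: three candidates, but the 6-residue one is too short.
print(listing_entries(regions, [None, None], "nucleotide"))

# First gap is exactly 3 residues, so the first two regions form one sequence.
print(listing_entries(regions, [3, None], "nucleotide"))
```

In the first call the short middle region is dropped from the listing and would need to be disclosed elsewhere in the specification. In the second it is retained because it forms part of a longer single sequence.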
Annex VI of ST.26 provides a guidance document with illustrated examples which serve to further clarify the above-noted inclusion criteria.
Other changes:
ST.26 includes a multitude of additional changes over ST.25, which we are not able to detail in their entirety in this article. By way of example, uracil in RNA sequences is represented by “t” instead of “u”, and amino acid residues are represented by one-letter symbols instead of three-letter symbols (e.g., arginine is represented by “R” instead of “Arg”). A further difference is that only the first of multiple applicants, the first of multiple inventors and the first of multiple priority applications are listed in the sequence listing (although more than one can be entered into the WIPO Sequence software when preparing the sequence listing). In addition, the annotation of variant sequences has been standardized, whereas it was not under ST.25.
The WIPO Sequence software manages the above-noted changes automatically, although users are advised to ensure that they are using the most recent version of the software, as WIPO is continuing to fix issues reported to it.
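For illustration only, the two symbol changes mentioned above can be sketched in Python. This covers symbol substitution and nothing else: a real conversion also involves annotations and other requirements, for which Annex VII and WIPO Sequence should be consulted. The function names are our own, the lookup table is deliberately limited to the twenty standard amino acids plus “Xaa”, and the sketch assumes the ST.25 protein sequence is written as space-separated three-letter codes.

```python
# Three-letter to one-letter amino acid symbols (deliberately incomplete).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Xaa": "X",
}


def rna_to_st26(sequence):
    """Write uracil as 't' rather than 'u', in lower case."""
    return sequence.lower().replace("u", "t")


def protein_to_st26(three_letter_sequence):
    """Convert space-separated three-letter codes to one-letter symbols.

    Raises KeyError for any code missing from the table above, so that an
    unexpected residue is noticed rather than silently dropped.
    """
    return "".join(THREE_TO_ONE[code] for code in three_letter_sequence.split())


print(rna_to_st26("augcuu"))               # atgctt
print(protein_to_st26("Met Arg Xaa Lys"))  # MRXK
```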
It should also be noted that, while ST.26 applies to all new applications filed on or after 1 July 2022, different patent offices have adopted different approaches to divisional applications, which benefit from the filing date of the parent application for the purpose of examination of novelty and inventive step. For example, the UKIPO has decided that ST.26 does not apply to divisional applications filed on or after 1 July 2022 (where the parent application was filed before 1 July 2022), while the EPO has decided that ST.26 does apply to divisional applications filed on or after 1 July 2022. In Canada, applications having a filing date before 1 July 2022 (including divisional applications having a presentation date on or after 1 July 2022) are permitted to comply with either ST.26 or ST.25.
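The divergent approaches to divisional applications can be summarized as a simple lookup. The sketch below encodes only the three examples given above, as described in this article. The names are our own, and office practice may change, so it should not be relied upon in place of checking current requirements.

```python
from datetime import date

BIG_BANG = date(2022, 7, 1)

# Approaches to divisionals whose parent was filed before the big bang date,
# as summarized in this article. Check current office practice before relying
# on any of these.
DIVISIONAL_APPROACH = {
    "UKIPO": {"ST.25"},
    "EPO": {"ST.26"},
    "Canada": {"ST.25", "ST.26"},
}


def standards_for_divisional(office, divisional_date, parent_filing_date):
    """Return the standard(s) a divisional's sequence listing may follow."""
    if divisional_date < BIG_BANG:
        raise ValueError("this sketch only covers divisionals filed on or after 1 July 2022")
    if parent_filing_date >= BIG_BANG:
        # The parent itself was filed under the new regime.
        return {"ST.26"}
    return set(DIVISIONAL_APPROACH[office])


parent = date(2021, 3, 15)
divisional = date(2023, 1, 10)
for office in ("UKIPO", "EPO", "Canada"):
    print(office, sorted(standards_for_divisional(office, divisional, parent)))
```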
Conclusion:
Overall, ST.26 represents a significant departure from ST.25, and applicants will need to ensure that the various new requirements introduced under ST.26 are met. While XML documents look very different from TXT documents, the new WIPO Sequence software permits applicants to prepare compliant sequence listings and view a more understandable version of the sequence listing. Thus, applicants or representatives who expect to file applications containing sequence listings are encouraged to download and familiarize themselves with WIPO Sequence, which automatically addresses many of the new requirements under ST.26.
We would also encourage applicants and representatives to familiarize themselves with any variations in approach adopted by their local patent office.
If you require assistance with preparing ST.26-compliant sequence listings, do not hesitate to contact one of our life sciences experts.
Contact: kthomson@marks-clerk.com