PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 30%

This chart shows the annual and cumulative numbers of protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity. The cumulative bars show the growth in unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in a given year.
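The relationship between the yearly and cumulative bars can be illustrated with a minimal sketch. It assumes only that each cumulative value is the running sum of the yearly counts; the variable names are illustrative, and the five values are the 1976 to 1980 yearly counts from the table on this page.

```python
from itertools import accumulate

# Yearly counts of new protein sequences for 1976-1980 (from the table on this page).
new_per_year = {1976: 12, 1977: 9, 1978: 3, 1979: 2, 1980: 3}

# Cumulative total for a year = sum of the yearly counts up to and including that year.
cumulative = dict(zip(new_per_year, accumulate(new_per_year.values())))

for year, total in cumulative.items():
    print(year, new_per_year[year], total)
```

The running totals produced this way (12, 21, 24, 26, 29) match the "Total Number of Protein Sequences" column for those years.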

Note: The total number of sequence clusters in the statistics table links to the corresponding sequence cluster group search results page. To balance performance, the numbers are calculated with a default precision threshold, so the statistics count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search results page provides the accurate count; the statistics page shows the trend.

Year  Number of New Protein Sequences  Total Number of Protein Sequences
1976  12                               12
1977  9                                21
1978  3                                24
1979  2                                26
1980  3                                29
1981  7                                36
1982  14                               50
1983  7                                57
1984  11                               68
1985  10                               78
1986  8                                86
1987  8                                94
1988  18                               112
1989  28                               140
1990  33                               173
1991  41                               214
1992  52                               266
1993  144                              410
1994  297                              707
1995  223                              930
1996  265                              1,195
1997  380                              1,575
1998  460                              2,035
1999  559                              2,594
2000  637                              3,231
2001  680                              3,911
2002  691                              4,602
2003  972                              5,574
2004  1,379                            6,953
2005  1,455                            8,408
2006  1,596                            10,004
2007  1,689                            11,693
2008  1,558                            13,251
2009  1,484                            14,735
2010  1,420                            16,155
2011  1,252                            17,407
2012  1,351                            18,758
2013  1,444                            20,202
2014  1,673                            21,875
2015  1,389                            23,264
2016  1,592                            24,856
2017  1,639                            26,495
2018  1,631                            28,126
2019  1,662                            29,788
2020  2,059                            31,847
2021  1,634                            33,481
2022  2,048                            35,529
2023  2,002                            37,531
2024  2,026                            39,557
2025  2,239                            41,796
2026  535                              42,331
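
Because the totals are cumulative, the number of sequences added over any span of years can be read off by subtracting two totals. The sketch below is a hedged illustration of that arithmetic, not part of the PDB site; the helper name `added_between` is hypothetical, and the four totals are copied from the table above.

```python
# Cumulative totals at the end of selected years (copied from the table above).
total_at_end_of = {1989: 140, 1999: 2594, 2009: 14735, 2019: 29788}

def added_between(start_year, end_year):
    """New sequences added after start_year, up to and including end_year."""
    return total_at_end_of[end_year] - total_at_end_of[start_year]

# Compare consecutive decades: 1990-1999, 2000-2009, 2010-2019.
years = sorted(total_at_end_of)
for prev, curr in zip(years, years[1:]):
    print(f"{prev + 1}-{curr}: {added_between(prev, curr)} new sequences")
```

With these values, the differences are 2,454 for 1990 to 1999, 12,141 for 2000 to 2009, and 15,053 for 2010 to 2019.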