PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 95%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity. The cumulative bars show the growth in unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in a given year.

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search result page. For performance reasons, the counts are calculated with a default precision threshold, so the statistics count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.



Year  Number of New Protein Sequences  Total Number of Protein Sequences
1976  13                               13
1977  10                               23
1978  3                                26
1979  6                                32
1980  4                                36
1981  10                               46
1982  18                               64
1983  11                               75
1984  11                               86
1985  12                               98
1986  9                                107
1987  11                               118
1988  25                               143
1989  46                               189
1990  52                               241
1991  56                               297
1992  66                               363
1993  232                              595
1994  461                              1,056
1995  343                              1,399
1996  407                              1,806
1997  561                              2,367
1998  755                              3,122
1999  896                              4,018
2000  1,004                            5,022
2001  1,042                            6,064
2002  1,112                            7,176
2003  1,557                            8,733
2004  2,120                            10,853
2005  2,340                            13,193
2006  2,646                            15,839
2007  2,966                            18,805
2008  2,760                            21,565
2009  2,821                            24,386
2010  2,870                            27,256
2011  2,640                            29,896
2012  2,888                            32,784
2013  3,094                            35,878
2014  3,800                            39,678
2015  3,145                            42,823
2016  3,720                            46,543
2017  4,003                            50,546
2018  3,764                            54,310
2019  4,032                            58,342
2020  5,033                            63,375
2021  4,508                            67,883
2022  5,613                            73,496
2023  5,238                            78,734
2024  5,464                            84,198
2025  6,175                            90,373
2026  1,430                            91,803
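
The relationship between the two data columns can be illustrated with a short sketch: each cumulative total is the running sum of the yearly counts up to and including that year. The example below uses only the first five rows of the table (1976 to 1980). The function name `cumulative_totals` and the dictionary layout are illustrative assumptions, not part of any PDB tool or API.

```python
from itertools import accumulate

# Yearly counts of new protein sequences, copied from the
# first five rows of the table (1976 to 1980).
new_per_year = {1976: 13, 1977: 10, 1978: 3, 1979: 6, 1980: 4}


def cumulative_totals(yearly):
    """Return {year: running total} for a {year: new count} mapping.

    Hypothetical helper for illustration only.
    """
    years = sorted(yearly)
    running = accumulate(yearly[year] for year in years)
    return dict(zip(years, running))


totals = cumulative_totals(new_per_year)
print(totals)
# {1976: 13, 1977: 23, 1978: 26, 1979: 32, 1980: 36}
```

The printed totals match the third column of the table for the same years, which is a quick way to check that a reconstructed row has been split into the right year, yearly count, and cumulative total.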