PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at 95% Sequence Identity

This chart shows the annual and cumulative numbers of protein sequences in released PDB structures since the beginning of the PDB archive, and it can be viewed at several different levels of sequence identity. The cumulative bars show the growth in unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in each year.
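
The two kinds of bars are related by a running sum: each year's cumulative total is the previous year's total plus that year's new sequences. The minimal Python sketch below shows this using the first five yearly counts from the table (1976 to 1980). The variable names are illustrative only and are not part of any PDB tool.

```python
from itertools import accumulate

# Yearly counts of new protein sequences, 1976 to 1980 (taken from the table)
years = [1976, 1977, 1978, 1979, 1980]
new_sequences = [13, 10, 3, 6, 4]

# The cumulative bars are the running sum of the yearly bars
cumulative = list(accumulate(new_sequences))

for year, new, total in zip(years, new_sequences, cumulative):
    print(year, new, total)
```

The printed totals (13, 23, 26, 32, 36) match the "Total Number of Protein Sequences" column for those years.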

Note: The total number of sequence clusters in the statistics table links to the corresponding sequence cluster group search results page. To balance accuracy against performance, these statistics are calculated with a default precision threshold, so the counts may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. Use the group search results page for an exact count and this statistics page for the overall trend.

Year    Number of New Protein Sequences    Total Number of Protein Sequences
1976    13                                 13
1977    10                                 23
1978    3                                  26
1979    6                                  32
1980    4                                  36
1981    10                                 46
1982    18                                 64
1983    11                                 75
1984    11                                 86
1985    12                                 98
1986    9                                  107
1987    11                                 118
1988    25                                 143
1989    46                                 189
1990    52                                 241
1991    56                                 297
1992    66                                 363
1993    232                                595
1994    462                                1,057
1995    343                                1,400
1996    407                                1,807
1997    561                                2,368
1998    756                                3,124
1999    895                                4,019
2000    1004                               5,023
2001    1043                               6,066
2002    1111                               7,177
2003    1557                               8,734
2004    2120                               10,854
2005    2342                               13,196
2006    2645                               15,841
2007    2963                               18,804
2008    2759                               21,563
2009    2820                               24,383
2010    2874                               27,257
2011    2639                               29,896
2012    2891                               32,787
2013    3096                               35,883
2014    3797                               39,680
2015    3145                               42,825
2016    3724                               46,549
2017    3902                               50,451
2018    3737                               54,188
2019    4069                               58,257
2020    4997                               63,254
2021    4469                               67,723
2022    5499                               73,222
2023    5284                               78,506
2024    5425                               83,931
2025    6239                               90,170
2026    1557                               91,727
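
Because each total should equal the previous year's total plus that year's new sequences, the table can be sanity-checked mechanically. The minimal Python sketch below uses rows transcribed by hand from the table. `ROWS` and `inconsistent_years` are hypothetical names for illustration, not part of any RCSB API. The sketch also reports the year with the most new sequences.

```python
# Rows transcribed from the table: (year, new sequences, cumulative total)
ROWS = [
    (1976, 13, 13), (1977, 10, 23), (1978, 3, 26), (1979, 6, 32),
    (1980, 4, 36), (1981, 10, 46), (1982, 18, 64), (1983, 11, 75),
    (1984, 11, 86), (1985, 12, 98), (1986, 9, 107), (1987, 11, 118),
    (1988, 25, 143), (1989, 46, 189), (1990, 52, 241), (1991, 56, 297),
    (1992, 66, 363), (1993, 232, 595), (1994, 462, 1057), (1995, 343, 1400),
    (1996, 407, 1807), (1997, 561, 2368), (1998, 756, 3124), (1999, 895, 4019),
    (2000, 1004, 5023), (2001, 1043, 6066), (2002, 1111, 7177), (2003, 1557, 8734),
    (2004, 2120, 10854), (2005, 2342, 13196), (2006, 2645, 15841), (2007, 2963, 18804),
    (2008, 2759, 21563), (2009, 2820, 24383), (2010, 2874, 27257), (2011, 2639, 29896),
    (2012, 2891, 32787), (2013, 3096, 35883), (2014, 3797, 39680), (2015, 3145, 42825),
    (2016, 3724, 46549), (2017, 3902, 50451), (2018, 3737, 54188), (2019, 4069, 58257),
    (2020, 4997, 63254), (2021, 4469, 67723), (2022, 5499, 73222), (2023, 5284, 78506),
    (2024, 5425, 83931), (2025, 6239, 90170), (2026, 1557, 91727),
]


def inconsistent_years(rows):
    """Return the years whose total != previous total + new count."""
    bad = []
    previous_total = 0  # the archive starts from zero sequences
    for year, new, total in rows:
        if previous_total + new != total:
            bad.append(year)
        previous_total = total
    return bad


# Year with the largest number of newly added sequences
peak_year, peak_new, _ = max(ROWS, key=lambda row: row[1])

print("inconsistent rows:", inconsistent_years(ROWS))
print("peak year:", peak_year, peak_new)
print("sum of yearly counts:", sum(new for _, new, _ in ROWS))
```

Run as written, this reports no inconsistent rows, 2025 (6239 new sequences) as the peak year, and a summed total of 91727, which equals the final cumulative value in the table.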