PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 50%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity. The cumulative bars show the growth in unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in a given year.

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search result page. For performance reasons, the statistics are calculated with a default precision threshold, so the count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.


Sequence cluster level: 50% identity

Year    Number of New Protein Sequences    Total Number of Protein Sequences
1976    13                                 13
1977    9                                  22
1978    3                                  25
1979    2                                  27
1980    4                                  31
1981    8                                  39
1982    17                                 56
1983    9                                  65
1984    10                                 75
1985    11                                 86
1986    8                                  94
1987    9                                  103
1988    20                                 123
1989    33                                 156
1990    34                                 190
1991    45                                 235
1992    53                                 288
1993    160                                448
1994    316                                764
1995    250                                1,014
1996    292                                1,306
1997    413                                1,719
1998    519                                2,238
1999    636                                2,874
2000    735                                3,609
2001    777                                4,386
2002    813                                5,199
2003    1184                               6,383
2004    1611                               7,994
2005    1756                               9,750
2006    1966                               11,716
2007    2142                               13,858
2008    1988                               15,846
2009    1963                               17,809
2010    1946                               19,755
2011    1710                               21,465
2012    1838                               23,303
2013    1936                               25,239
2014    2249                               27,488
2015    1855                               29,343
2016    2133                               31,476
2017    2148                               33,624
2018    2131                               35,755
2019    2256                               38,011
2020    2754                               40,765
2021    2209                               42,974
2022    2839                               45,813
2023    2726                               48,539
2024    2779                               51,318
2025    2258                               53,576
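
In the table, each year's total is the running sum of the yearly counts up to and including that year. The following is a minimal Python sketch of that relationship, using only the first five rows (1976 to 1980) as input. The names `yearly` and `cumulative_totals` are illustrative and not part of any PDB tool or API.

```python
from itertools import accumulate

# (year, number of new protein sequences) for the first five table rows
yearly = [(1976, 13), (1977, 9), (1978, 3), (1979, 2), (1980, 4)]


def cumulative_totals(rows):
    """Return (year, new, running_total) for each (year, new) pair."""
    totals = accumulate(new for _, new in rows)
    return [(year, new, total) for (year, new), total in zip(rows, totals)]


for year, new, total in cumulative_totals(yearly):
    # Thousands separators match the formatting of the totals column
    print(f"{year}\t{new}\t{total:,}")
```

The same function can be applied to the full yearly column to check that the totals column is internally consistent.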