PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 30%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity. The cumulative bars show the growth in unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in a given year.
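The relationship between the yearly and cumulative figures can be illustrated with a minimal Python sketch. It assumes the cumulative total for a year is simply the running sum of the yearly counts up to and including that year, which is consistent with the table on this page. The function name `cumulative_totals` and the variable names are hypothetical, and the input covers only the first five years of the table.

```python
from itertools import accumulate

# Yearly counts of new protein sequences for 1976-1980,
# copied from the table on this page (30% identity level).
new_per_year = {1976: 12, 1977: 9, 1978: 3, 1979: 2, 1980: 3}


def cumulative_totals(yearly):
    """Return {year: running total}, processing years in ascending order."""
    years = sorted(yearly)
    running = accumulate(yearly[y] for y in years)
    return dict(zip(years, running))


totals = cumulative_totals(new_per_year)
for year, total in totals.items():
    # Columns: year, new sequences that year, cumulative total
    print(year, new_per_year[year], total)
```

For these five years the running totals are 12, 21, 24, 26 and 29, matching the cumulative column of the table.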

Note: The total number of sequence clusters in the statistics table is linked to the sequence cluster group search result page. For performance reasons, the numbers are calculated with a default precision threshold, so the statistics count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.


Sequence cluster level: 30% identity

Year | Number of New Protein Sequences | Total Number of Protein Sequences
1976 | 12 | 12
1977 | 9 | 21
1978 | 3 | 24
1979 | 2 | 26
1980 | 3 | 29
1981 | 7 | 36
1982 | 14 | 50
1983 | 7 | 57
1984 | 11 | 68
1985 | 10 | 78
1986 | 8 | 86
1987 | 8 | 94
1988 | 18 | 112
1989 | 28 | 140
1990 | 33 | 173
1991 | 41 | 214
1992 | 52 | 266
1993 | 144 | 410
1994 | 297 | 707
1995 | 223 | 930
1996 | 265 | 1,195
1997 | 380 | 1,575
1998 | 461 | 2,036
1999 | 559 | 2,595
2000 | 637 | 3,232
2001 | 679 | 3,911
2002 | 691 | 4,602
2003 | 971 | 5,573
2004 | 1379 | 6,952
2005 | 1455 | 8,407
2006 | 1598 | 10,005
2007 | 1693 | 11,698
2008 | 1557 | 13,255
2009 | 1483 | 14,738
2010 | 1421 | 16,159
2011 | 1252 | 17,411
2012 | 1352 | 18,763
2013 | 1444 | 20,207
2014 | 1673 | 21,880
2015 | 1392 | 23,272
2016 | 1589 | 24,861
2017 | 1638 | 26,499
2018 | 1630 | 28,129
2019 | 1666 | 29,795
2020 | 2065 | 31,860
2021 | 1634 | 33,494
2022 | 2043 | 35,537
2023 | 1996 | 37,533
2024 | 2024 | 39,557
2025 | 2244 | 41,801
2026 | 480 | 42,281