<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<!--
This HTML is auto-generated. DO NOT EDIT THIS FILE! If you are writing a new
PEP, see http://www.python.org/dev/peps/pep-0001 for instructions and links
to templates. DO NOT USE THIS HTML FILE AS YOUR TEMPLATE!
-->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="Docutils 0.11: http://docutils.sourceforge.net/" />
<title>PEP 458 -- Surviving a Compromise of PyPI</title>
<style type="text/css">
/*
:Author: David Goodger
:Contact: [email protected]
:date: $Date: 2006-05-21 22:44:42 +0200 (Sun, 21 May 2006) $
:version: $Revision: 4564 $
:copyright: This stylesheet has been placed in the public domain.
Default cascading style sheet for the PEP HTML output of Docutils.
*/
/* "! important" is used here to override other ``margin-top`` and
``margin-bottom`` styles that are later in the stylesheet or
more specific. See http://www.w3.org/TR/CSS1#the-cascade */
.first {
margin-top: 0 ! important }
.last, .with-subtitle {
margin-bottom: 0 ! important }
.hidden {
display: none }
.navigation {
width: 100% ;
background: #99ccff ;
margin-top: 0px ;
margin-bottom: 0px }
.navigation .navicon {
width: 150px ;
height: 35px }
.navigation .textlinks {
padding-left: 1em ;
text-align: left }
.navigation td, .navigation th {
padding-left: 0em ;
padding-right: 0em ;
vertical-align: middle }
.rfc2822 {
margin-top: 0.5em ;
margin-left: 0.5em ;
margin-right: 0.5em ;
margin-bottom: 0em }
.rfc2822 td {
text-align: left }
.rfc2822 th.field-name {
text-align: right ;
font-family: sans-serif ;
padding-right: 0.5em ;
font-weight: bold ;
margin-bottom: 0em }
a.toc-backref {
text-decoration: none ;
color: black }
blockquote.epigraph {
margin: 2em 5em ; }
body {
margin: 0px ;
margin-bottom: 1em ;
padding: 0px }
dl.docutils dd {
margin-bottom: 0.5em }
div.section {
margin-left: 1em ;
margin-right: 1em ;
margin-bottom: 1.5em }
div.section div.section {
margin-left: 0em ;
margin-right: 0em ;
margin-top: 1.5em }
div.abstract {
margin: 2em 5em }
div.abstract p.topic-title {
font-weight: bold ;
text-align: center }
div.admonition, div.attention, div.caution, div.danger, div.error,
div.hint, div.important, div.note, div.tip, div.warning {
margin: 2em ;
border: medium outset ;
padding: 1em }
div.admonition p.admonition-title, div.hint p.admonition-title,
div.important p.admonition-title, div.note p.admonition-title,
div.tip p.admonition-title {
font-weight: bold ;
font-family: sans-serif }
div.attention p.admonition-title, div.caution p.admonition-title,
div.danger p.admonition-title, div.error p.admonition-title,
div.warning p.admonition-title {
color: red ;
font-weight: bold ;
font-family: sans-serif }
/* Uncomment (and remove this text!) to get reduced vertical space in
compound paragraphs.
div.compound .compound-first, div.compound .compound-middle {
margin-bottom: 0.5em }
div.compound .compound-last, div.compound .compound-middle {
margin-top: 0.5em }
*/
div.dedication {
margin: 2em 5em ;
text-align: center ;
font-style: italic }
div.dedication p.topic-title {
font-weight: bold ;
font-style: normal }
div.figure {
margin-left: 2em ;
margin-right: 2em }
div.footer, div.header {
clear: both;
font-size: smaller }
div.footer {
margin-left: 1em ;
margin-right: 1em }
div.line-block {
display: block ;
margin-top: 1em ;
margin-bottom: 1em }
div.line-block div.line-block {
margin-top: 0 ;
margin-bottom: 0 ;
margin-left: 1.5em }
div.sidebar {
margin-left: 1em ;
border: medium outset ;
padding: 1em ;
background-color: #ffffee ;
width: 40% ;
float: right ;
clear: right }
div.sidebar p.rubric {
font-family: sans-serif ;
font-size: medium }
div.system-messages {
margin: 5em }
div.system-messages h1 {
color: red }
div.system-message {
border: medium outset ;
padding: 1em }
div.system-message p.system-message-title {
color: red ;
font-weight: bold }
div.topic {
margin: 2em }
h1.section-subtitle, h2.section-subtitle, h3.section-subtitle,
h4.section-subtitle, h5.section-subtitle, h6.section-subtitle {
margin-top: 0.4em }
h1 {
font-family: sans-serif ;
font-size: large }
h2 {
font-family: sans-serif ;
font-size: medium }
h3 {
font-family: sans-serif ;
font-size: small }
h4 {
font-family: sans-serif ;
font-style: italic ;
font-size: small }
h5 {
font-family: sans-serif;
font-size: x-small }
h6 {
font-family: sans-serif;
font-style: italic ;
font-size: x-small }
hr.docutils {
width: 75% }
img.align-left {
clear: left }
img.align-right {
clear: right }
img.borderless {
border: 0 }
ol.simple, ul.simple {
margin-bottom: 1em }
ol.arabic {
list-style: decimal }
ol.loweralpha {
list-style: lower-alpha }
ol.upperalpha {
list-style: upper-alpha }
ol.lowerroman {
list-style: lower-roman }
ol.upperroman {
list-style: upper-roman }
p.attribution {
text-align: right ;
margin-left: 50% }
p.caption {
font-style: italic }
p.credits {
font-style: italic ;
font-size: smaller }
p.label {
white-space: nowrap }
p.rubric {
font-weight: bold ;
font-size: larger ;
color: maroon ;
text-align: center }
p.sidebar-title {
font-family: sans-serif ;
font-weight: bold ;
font-size: larger }
p.sidebar-subtitle {
font-family: sans-serif ;
font-weight: bold }
p.topic-title {
font-family: sans-serif ;
font-weight: bold }
pre.address {
margin-bottom: 0 ;
margin-top: 0 ;
font-family: serif ;
font-size: 100% }
pre.literal-block, pre.doctest-block {
margin-left: 2em ;
margin-right: 2em }
span.classifier {
font-family: sans-serif ;
font-style: oblique }
span.classifier-delimiter {
font-family: sans-serif ;
font-weight: bold }
span.interpreted {
font-family: sans-serif }
span.option {
white-space: nowrap }
span.option-argument {
font-style: italic }
span.pre {
white-space: pre }
span.problematic {
color: red }
span.section-subtitle {
/* font-size relative to parent (h1..h6 element) */
font-size: 80% }
table.citation {
border-left: solid 1px gray;
margin-left: 1px }
table.docinfo {
margin: 2em 4em }
table.docutils {
margin-top: 0.5em ;
margin-bottom: 0.5em }
table.footnote {
border-left: solid 1px black;
margin-left: 1px }
table.docutils td, table.docutils th,
table.docinfo td, table.docinfo th {
padding-left: 0.5em ;
padding-right: 0.5em ;
vertical-align: top }
td.num {
text-align: right }
th.field-name {
font-weight: bold ;
text-align: left ;
white-space: nowrap ;
padding-left: 0 }
h1 tt.docutils, h2 tt.docutils, h3 tt.docutils,
h4 tt.docutils, h5 tt.docutils, h6 tt.docutils {
font-size: 100% }
ul.auto-toc {
list-style-type: none }
</style>
</head>
<body bgcolor="white">
<table class="navigation" cellpadding="0" cellspacing="0"
width="100%" border="0">
<tr><td class="navicon" width="150" height="35">
<a href="http://www.python.org/" title="Python Home Page">
<img src="http://www.python.org/pics/PyBanner004.gif" alt="[Python]"
border="0" width="150" height="35" /></a></td>
<td class="textlinks" align="left">
[<b><a href="http://www.python.org/">Python Home</a></b>]
[<b><a href="http://www.python.org/dev/peps/">PEP Index</a></b>]
[<b><a href="./pep-0458.txt">PEP Source</a></b>]
</td></tr></table>
<div class="document">
<table class="rfc2822 docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field"><th class="field-name">PEP:</th><td class="field-body">458</td>
</tr>
<tr class="field"><th class="field-name">Title:</th><td class="field-body">Surviving a Compromise of PyPI</td>
</tr>
<tr class="field"><th class="field-name">Version:</th><td class="field-body">$Revision$</td>
</tr>
<tr class="field"><th class="field-name">Last-Modified:</th><td class="field-body"><a class="reference external" href="http://svn.python.org/view/*checkout*/peps/trunk/pep-0458.txt">$Date$</a></td>
</tr>
<tr class="field"><th class="field-name">Author:</th><td class="field-body">Trishank Karthik Kuppusamy &lt;trishank at nyu.edu&gt;,
Donald Stufft &lt;donald at stufft.io&gt;,
Justin Cappos &lt;jcappos at nyu.edu&gt;</td>
</tr>
<tr class="field"><th class="field-name">BDFL-Delegate:</th><td class="field-body">Nick Coghlan &lt;<a class="reference external" href="mailto:ncoghlan@gmail.com">ncoghlan@gmail.com</a>&gt;</td>
</tr>
<tr class="field"><th class="field-name">Discussions-To:</th><td class="field-body">DistUtils mailing list &lt;<a class="reference external" href="mailto:distutils-sig@python.org?subject=PEP%20458">distutils-sig at python.org</a>&gt;</td>
</tr>
<tr class="field"><th class="field-name">Status:</th><td class="field-body">Draft</td>
</tr>
<tr class="field"><th class="field-name">Type:</th><td class="field-body">Standards Track</td>
</tr>
<tr class="field"><th class="field-name">Content-Type:</th><td class="field-body"><a class="reference external" href="http://www.python.org/dev/peps/pep-0012">text/x-rst</a></td>
</tr>
<tr class="field"><th class="field-name">Created:</th><td class="field-body">27-Sep-2013</td>
</tr>
</tbody>
</table>
<hr />
<div class="contents topic" id="contents">
<p class="topic-title first">Contents</p>
<ul class="simple">
<li><a class="reference internal" href="#abstract" id="id117">Abstract</a></li>
<li><a class="reference internal" href="#rationale" id="id118">Rationale</a></li>
<li><a class="reference internal" href="#definitions" id="id119">Definitions</a></li>
<li><a class="reference internal" href="#overview" id="id120">Overview</a></li>
<li><a class="reference internal" href="#responsibility-separation" id="id121">Responsibility Separation</a></li>
<li><a class="reference internal" href="#metadata-management" id="id122">Metadata Management</a><ul>
<li><a class="reference internal" href="#why-do-we-need-consistent-snapshots" id="id123">Why Do We Need Consistent Snapshots?</a></li>
<li><a class="reference internal" href="#producing-consistent-snapshots" id="id124">Producing Consistent Snapshots</a></li>
<li><a class="reference internal" href="#metadata-validation" id="id125">Metadata Validation</a></li>
<li><a class="reference internal" href="#mirroring-protocol" id="id126">Mirroring Protocol</a></li>
<li><a class="reference internal" href="#backup-process" id="id127">Backup Process</a></li>
<li><a class="reference internal" href="#metadata-expiry-times" id="id128">Metadata Expiry Times</a></li>
<li><a class="reference internal" href="#metadata-scalability" id="id129">Metadata Scalability</a></li>
</ul>
</li>
<li><a class="reference internal" href="#key-management" id="id130">Key Management</a><ul>
<li><a class="reference internal" href="#number-of-keys" id="id131">Number Of Keys</a></li>
<li><a class="reference internal" href="#online-and-offline-keys" id="id132">Online and Offline Keys</a></li>
<li><a class="reference internal" href="#key-strength" id="id133">Key Strength</a></li>
<li><a class="reference internal" href="#diversity-of-keys" id="id134">Diversity Of Keys</a></li>
<li><a class="reference internal" href="#key-compromise-analysis" id="id135">Key Compromise Analysis</a></li>
<li><a class="reference internal" href="#in-the-event-of-a-key-compromise" id="id136">In the Event of a Key Compromise</a></li>
</ul>
</li>
<li><a class="reference internal" href="#appendix-rejected-proposals" id="id137">Appendix: Rejected Proposals</a><ul>
<li><a class="reference internal" href="#alternative-proposals-for-producing-consistent-snapshots" id="id138">Alternative Proposals for Producing Consistent Snapshots</a></li>
</ul>
</li>
<li><a class="reference internal" href="#references" id="id139">References</a></li>
<li><a class="reference internal" href="#acknowledgements" id="id140">Acknowledgements</a></li>
<li><a class="reference internal" href="#copyright" id="id141">Copyright</a></li>
</ul>
</div>
<div class="section" id="abstract">
<h1><a class="toc-backref" href="#id117">Abstract</a></h1>
<p>This PEP describes how the Python Package Index (PyPI <a class="footnote-reference" href="#id55" id="id1">[1]</a>) may be integrated
with The Update Framework <a class="footnote-reference" href="#id56" id="id2">[2]</a> (TUF). TUF was designed to be a plug-and-play
security add-on to a software updater or package manager. TUF provides
end-to-end security like SSL, but for software updates instead of HTTP
connections. The framework integrates best security practices such as
separating responsibilities, adopting the many-man rule for signing packages,
keeping signing keys offline, and revoking expired or compromised signing
keys.</p>
<p>The proposed integration will render modern package managers such as pip <a class="footnote-reference" href="#id57" id="id3">[3]</a>
more secure against various types of security attacks on PyPI and protect users
against them. Even in the worst case where an attacker manages to compromise
PyPI itself, the damage is controlled in scope and limited in duration.</p>
<p>Specifically, this PEP will describe how PyPI processes should be adapted to
incorporate TUF metadata. It will not prescribe how package managers such as
pip should be adapted to use TUF metadata when installing or updating projects
from PyPI.</p>
</div>
<div class="section" id="rationale">
<h1><a class="toc-backref" href="#id118">Rationale</a></h1>
<p>In January 2013, the Python Software Foundation (PSF) announced <a class="footnote-reference" href="#id58" id="id4">[4]</a> that the
python.org wikis for Python, Jython, and the PSF were subjected to a security
breach which caused all of the wiki data to be destroyed on January 5, 2013.
Fortunately, the PyPI infrastructure was not affected by this security breach.
However, the incident is a reminder that PyPI should take defensive steps to
protect users as much as possible in the event of a compromise. Attacks on
software repositories happen all the time <a class="footnote-reference" href="#id59" id="id5">[5]</a>. We must accept the possibility
of security breaches and prepare PyPI accordingly because it is a valuable
target used by thousands, if not millions, of people.</p>
<p>Before the wiki attack, PyPI used MD5 hashes to tell package managers such as
pip whether or not a package was corrupted in transit. However, the absence of
SSL made it hard for package managers to verify transport integrity to PyPI.
It was easy to launch a man-in-the-middle attack between pip and PyPI to change
package contents arbitrarily. This can be used to trick users into installing
malicious packages. After the wiki attack, several steps were proposed (some
of which were implemented) to deliver a much higher level of security than was
previously the case: requiring SSL to communicate with PyPI <a class="footnote-reference" href="#id60" id="id6">[6]</a>, restricting
project names <a class="footnote-reference" href="#id61" id="id7">[7]</a>, and migrating from MD5 to SHA-2 hashes <a class="footnote-reference" href="#id62" id="id8">[8]</a>.</p>
<p>These steps, though necessary, are insufficient because attacks are still
possible through other avenues. For example, a public mirror is trusted to
honestly mirror PyPI, but some mirrors may misbehave due to malice or accident.
Package managers such as pip are supposed to use signatures from PyPI to verify
packages downloaded from a public mirror <a class="footnote-reference" href="#id63" id="id9">[9]</a>, but none are known to actually
do so <a class="footnote-reference" href="#id64" id="id10">[10]</a>. Therefore, it is also wise to add more security measures to
detect attacks from public mirrors or content delivery networks <a class="footnote-reference" href="#id65" id="id11">[11]</a> (CDNs).</p>
<p>Even though official mirrors are being deprecated on PyPI <a class="footnote-reference" href="#id66" id="id12">[12]</a>, there remain a
wide variety of other attack vectors on package managers <a class="footnote-reference" href="#id67" id="id13">[13]</a>. Among other
things, these attacks can crash client systems, cause obsolete packages to be
installed, or even allow an attacker to execute arbitrary code. In September
2013, we showed how the then-latest version of pip was susceptible to these
attacks and how TUF could protect users against them <a class="footnote-reference" href="#id68" id="id14">[14]</a>.</p>
<p>Finally, PyPI allows for packages to be signed with GPG keys <a class="footnote-reference" href="#id69" id="id15">[15]</a>, although no
package manager is known to verify those signatures, which negates much of the
benefit of having those signatures at all. Validating integrity through
cryptography is important, but issues such as immediate and secure key
revocation or specifying a required threshold number of signatures still
remain. Furthermore, GPG by itself does not immediately address the attacks
mentioned above.</p>
<p>In order to protect PyPI against infrastructure compromises, we propose
integrating PyPI with The Update Framework <a class="footnote-reference" href="#id56" id="id16">[2]</a> (TUF).</p>
</div>
<div class="section" id="definitions">
<h1><a class="toc-backref" href="#id119">Definitions</a></h1>
<p>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
interpreted as described in RFC <a class="reference external" href="http://www.ietf.org/rfc/rfc2119.txt">2119</a> <a class="footnote-reference" href="#id80" id="id81">[26]</a>.</p>
<p>In order to keep this PEP focused solely on the application of TUF on PyPI, the
reader is assumed to already be familiar with the design principles of
TUF <a class="footnote-reference" href="#id56" id="id18">[2]</a>. It is also strongly RECOMMENDED that the reader be familiar with the
TUF specification <a class="footnote-reference" href="#id70" id="id19">[16]</a>.</p>
<ul class="simple">
<li>Projects: Projects are software components that are made available for
integration. Projects include Python libraries, frameworks, scripts, plugins,
applications, collections of data or other resources, and various
combinations thereof. Public Python projects are typically registered on the
Python Package Index <a class="footnote-reference" href="#id71" id="id20">[17]</a>.</li>
<li>Releases: Releases are uniquely identified snapshots of a project <a class="footnote-reference" href="#id71" id="id21">[17]</a>.</li>
<li>Distributions: Distributions are the packaged files which are used to publish
and distribute a release <a class="footnote-reference" href="#id71" id="id22">[17]</a>.</li>
<li>Simple index: The HTML page which contains internal links to the
distributions of a project <a class="footnote-reference" href="#id71" id="id23">[17]</a>.</li>
<li>Consistent snapshot: A set of TUF metadata and PyPI targets that capture the
complete state of all projects on PyPI as they were at some fixed point in
time.</li>
<li>The <em>consistent-snapshot</em> (<em>release</em>) role: In order to prevent confusion due
to the different meanings of the term "release" as employed by <a class="reference external" href="http://www.python.org/dev/peps/pep-0426">PEP 426</a> <a class="footnote-reference" href="#id71" id="id24">[17]</a>
and the TUF specification <a class="footnote-reference" href="#id70" id="id25">[16]</a>, we rename the <em>release</em> role as the
<em>consistent-snapshot</em> role.</li>
<li>Continuous delivery: A set of processes with which PyPI produces consistent
snapshots that can safely coexist and be deleted independently <a class="footnote-reference" href="#id72" id="id26">[18]</a>.</li>
<li>Developer: Either the owner or maintainer of a project who is allowed to
update the TUF metadata as well as distribution metadata and data for the
project.</li>
<li>Online key: A key that MUST be stored on the PyPI server infrastructure.
This is usually to allow automated signing with the key. However, this means
that an attacker who compromises PyPI infrastructure will be able to read
these keys.</li>
<li>Offline key: A key that MUST be stored off the PyPI infrastructure. This
prevents automated signing with the key. This means that an attacker who
compromises PyPI infrastructure will not be able to immediately read these
keys.</li>
<li>Developer key: A private key whose corresponding public key is
registered with PyPI as being responsible for directly signing for, or
delegating, the distributions belonging to a project. For the purposes of
this PEP, it is offline in the sense that the private key MUST NOT be stored
on PyPI. However, the project is free to require certain developer keys to
be online on its own infrastructure.</li>
<li>Threshold signature scheme: A role can increase its resilience to key
compromises by requiring at least t out of n keys to sign
its metadata. This means that a compromise of t-1 keys is insufficient to
compromise the role itself. We denote this property by saying that the role
requires (t, n) keys.</li>
</ul>
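<p>As an illustrative sketch only (it is not part of TUF or of the normative
text of this PEP), the (t, n) property can be expressed as a count of distinct
authorized keys. The function name, parameter names and key identifiers below
are hypothetical, and the sketch assumes that every signature has already been
cryptographically verified, so that only the identities of the signing keys
remain to be counted.</p>
<pre class="literal-block">
def threshold_met(signing_key_ids, authorized_key_ids, threshold):
    """Return True if at least `threshold` distinct authorized keys signed.

    Assumes every ID in `signing_key_ids` belongs to a signature that has
    already been cryptographically verified (verification is out of scope).
    """
    distinct_authorized = set(signing_key_ids).intersection(authorized_key_ids)
    return len(distinct_authorized) >= threshold


# Hypothetical role that requires (2, 3) keys.
role_keys = {"key-a", "key-b", "key-c"}

print(threshold_met(["key-a", "key-c"], role_keys, 2))  # two distinct authorized keys
print(threshold_met(["key-a", "key-a"], role_keys, 2))  # a repeated key counts once
print(threshold_met(["key-a", "key-x"], role_keys, 2))  # unknown keys are ignored
</pre>
<p>Counting distinct keys, rather than signatures, is what makes a compromise
of t-1 keys insufficient: an attacker cannot reach the threshold by signing
several times with the same key.</p>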
</div>
<div class="section" id="overview">
<h1><a class="toc-backref" href="#id120">Overview</a></h1>
<img alt="https://raw.github.com/theupdateframework/pep-on-pypi-with-tuf/master/figure1.png" src="https://raw.github.com/theupdateframework/pep-on-pypi-with-tuf/master/figure1.png" />
<p>Figure 1: A simplified overview of the roles in PyPI with TUF</p>
<p>Figure 1 shows a simplified overview of the roles that TUF metadata assume on
PyPI. The top-level <em>root</em> role signs for the keys of the top-level
<em>timestamp</em>, <em>consistent-snapshot</em>, <em>targets</em> and <em>root</em> roles. The
<em>timestamp</em> role signs for a new and consistent snapshot. The
<em>consistent-snapshot</em> role signs for the <em>root</em>, <em>targets</em> and all delegated targets
metadata. The <em>claimed</em> role signs for all projects that have registered their
own developer keys with PyPI. The <em>recently-claimed</em> role signs for all
projects that recently registered their own developer keys with PyPI. Finally,
the <em>unclaimed</em> role signs for all projects that have not registered developer
keys with PyPI. The <em>claimed</em>, <em>recently-claimed</em> and <em>unclaimed</em> roles are
numbered 1, 2, 3 respectively because a project will be searched for in each of
those roles in that order: first in <em>claimed</em>, then in
<em>recently-claimed</em> if necessary, and finally in <em>unclaimed</em> if necessary.</p>
<p>Every year, PyPI administrators will sign for <em>root</em> role keys. After
that, automation will continuously sign for a timestamped, consistent snapshot
of all projects. Every few months, PyPI administrators will move projects with
vetted developer keys from the <em>recently-claimed</em> role to the <em>claimed</em> role.
As we will soon see, they will sign for projects in the <em>claimed</em> role with
offline keys.</p>
<p>This PEP does not require project developers to use TUF to secure their
packages from attacks on PyPI. By default, all projects will be signed for by
the <em>unclaimed</em> role. If a project wants stronger security guarantees, then
the project is strongly RECOMMENDED to register developer keys with PyPI so
that it may sign for its own distributions. Having done so, the project must
remain a <em>recently-claimed</em> project until PyPI administrators have had an
opportunity to vet the developer keys of the project, after which the project
will be moved to the <em>claimed</em> role.</p>
<p>This PEP has <strong>not</strong> been designed to be backward-compatible with package
managers that do not use the TUF security protocol to install or update a
project from the PyPI described here. Instead, it is RECOMMENDED that PyPI
maintain a backward-compatible API of itself that does NOT offer TUF so that
older package managers that do not use TUF will be able to install or update
projects from PyPI as usual but without any of the security offered by TUF.
For the rest of this PEP, we will assume that PyPI will simultaneously maintain
a backward-incompatible API of itself for package managers that MUST use TUF to
securely install or update projects. We think that this approach represents a
reasonable trade-off: older package managers that do not use TUF will still be able
to install or update projects without any TUF security from PyPI, and newer
package managers that do use TUF will be able to securely install or update
projects. At some point in the future, PyPI administrators MAY choose to
permanently deprecate the backward-compatible version of itself that does not
offer TUF metadata.</p>
<p>Unless a mirror, CDN or the PyPI repository has been compromised, the end-user
will not be able to discern whether or not a package manager is using TUF to
install or update a project from PyPI.</p>
</div>
<div class="section" id="responsibility-separation">
<h1><a class="toc-backref" href="#id121">Responsibility Separation</a></h1>
<p>Recall that TUF requires four top-level roles: <em>root</em>, <em>timestamp</em>,
<em>consistent-snapshot</em> and <em>targets</em>. The <em>root</em> role specifies the keys of all
the top-level roles (including itself). The <em>timestamp</em> role specifies the
latest consistent snapshot. The <em>consistent-snapshot</em> role specifies the
latest versions of all TUF metadata files (other than <em>timestamp</em>). The
<em>targets</em> role specifies available target files (in our case, it will be all
files on PyPI under the /simple and /packages directories). In this PEP, each
of these roles will fulfil its responsibilities without exception.</p>
<p>Our proposal offers two levels of security to developers. If developers opt in
to secure their projects with their own developer keys, then their projects
will be very secure. Otherwise, TUF will still protect them in many cases:</p>
<ol class="arabic simple">
<li>Minimum security (no action by a developer): protects <em>unclaimed</em> and
<em>recently-claimed</em> projects without developer keys from CDNs <a class="footnote-reference" href="#id73" id="id27">[19]</a> or public
mirrors, but not from some PyPI compromises. This is because continuous
delivery requires some keys to be online. This level of security protects
projects from being accidentally or deliberately tampered with by a mirror
or a CDN because the mirror or CDN will not have any of the PyPI or
developer keys required to sign for projects. However, it would not protect
projects from attackers who have compromised PyPI because they will be able
to manipulate the TUF metadata for <em>unclaimed</em> projects with the appropriate
online keys.</li>
<li>Maximum security (developer signs their project): protects projects with
developer keys not only from CDNs or public mirrors, but also from some PyPI
compromises. This is because many important keys will be offline. This
level of security protects projects from being accidentally or deliberately
tampered with by a mirror or a CDN for reasons identical to the minimum
security level. It will also protect projects (or at least mitigate
damages) from the most likely attacks on PyPI. For example: given access to
online keys after a PyPI compromise, attackers will be able to freeze the
distributions for these projects, but they will not be able to serve
malicious distributions for these projects (not without compromising other
offline keys which would entail more risk, time and energy). Details of
the exact level of security offered are discussed in the section on key
management.</li>
</ol>
<p>In order to complete support for continuous delivery, we propose three
delegated targets roles:</p>
<ol class="arabic simple">
<li><em>claimed</em>: Signs for the delegation of PyPI projects to their respective
developer keys.</li>
<li><em>recently-claimed</em>: This role is almost identical to the <em>claimed</em> role and
could technically be performed by the <em>unclaimed</em> role, but there are two
important reasons why it exists independently: the first reason is to
improve the performance of looking up projects in the <em>unclaimed</em> role (by
moving metadata to the <em>recently-claimed</em> role instead), and the second
reason is to make it easier for PyPI administrators to move
<em>recently-claimed</em> projects to the <em>claimed</em> role.</li>
<li><em>unclaimed</em>: Signs for PyPI projects without developer keys.</li>
</ol>
<p>The <em>targets</em> role MUST delegate all PyPI projects to the three delegated
targets roles in the order of appearance listed above. This means that when
pip uses TUF to download a distribution from a project on PyPI, it will first
consult the <em>claimed</em> role about it. If the <em>claimed</em> role has delegated the
project, then pip will trust the project developers (in order of delegation)
about the TUF metadata for the project. Otherwise, pip will consult the
<em>recently-claimed</em> role about the project. If the <em>recently-claimed</em> role has
delegated the project, then pip will trust the project developers (in order of
delegation) about the TUF metadata for the project. Otherwise, pip will
consult the <em>unclaimed</em> role about the TUF metadata for the project. If the
<em>unclaimed</em> role has not delegated the project, then the project is considered
to be non-existent on PyPI.</p>
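<p>The lookup order described above can be sketched as follows. This is a
minimal illustration only: the in-memory <tt>delegations</tt> mapping, the
function name and the project names are assumptions made for the example, and
no signature verification is shown.</p>

```python
# Hypothetical in-memory model of the three delegated targets roles.
# Each role maps a project name to the developer key names it delegates to
# (an empty list stands for a project that PyPI itself signs for).
ROLE_ORDER = ("claimed", "recently-claimed", "unclaimed")


def resolve_role(project, delegations):
    """Return the first role, in delegation order, that lists the project.

    Returns None when no role has delegated the project, which the text
    above treats as the project not existing on PyPI.
    """
    for role in ROLE_ORDER:
        if project in delegations.get(role, {}):
            return role
    return None


# Illustrative data only; these project and key names are made up.
delegations = {
    "claimed": {"alpha": ["alpha-dev-key"]},
    "recently-claimed": {"beta": ["beta-dev-key"]},
    "unclaimed": {"gamma": [], "alpha": []},
}
```

<p>Because the roles are consulted in order, a project that appears under more
than one role is resolved by the earliest role that lists it.</p>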
<p>A PyPI project MAY begin without registering a developer key. Therefore, the
project will be signed for by the <em>unclaimed</em> role. After registering
developer keys, the project will be removed from the <em>unclaimed</em> role and
delegated to the <em>recently-claimed</em> role. After a probation period and a
vetting process to verify the developer keys of the project, the project will
be removed from the <em>recently-claimed</em> role and delegated to the <em>claimed</em>
role.</p>
<p>The <em>claimed</em> role offers maximum security, whereas the <em>recently-claimed</em> and
<em>unclaimed</em> roles offer minimum security. All three roles support continuous
delivery of PyPI projects.</p>
<p>The <em>unclaimed</em> role offers minimum security because PyPI will sign for
projects without developer keys with an online key in order to permit
continuous delivery.</p>
<p>The <em>recently-claimed</em> role offers minimum security because while the project
developers will sign for their own distributions with offline developer keys,
PyPI will sign with an online key the delegation of the project to those
offline developer keys. The signing of the delegation with an online key
allows PyPI administrators to continuously deliver projects without having to
continuously sign the delegation whenever one of those projects registers
developer keys.</p>
<p>Finally, the <em>claimed</em> role offers maximum security because PyPI will sign with
offline keys the delegation of a project to its offline developer keys. This
means that every now and then, PyPI administrators will vet developer keys and
sign the delegation of a project to those developer keys after being reasonably
sure about the ownership of the developer keys. The process for vetting
developer keys is out of the scope of this PEP.</p>
</div>
<div class="section" id="metadata-management">
<h1><a class="toc-backref" href="#id122">Metadata Management</a></h1>
<p>In this section, we examine the TUF metadata that PyPI must manage by itself,
and other TUF metadata that must be safely delegated to projects. Examples of
the metadata described here may be seen at our testbed mirror of
<a class="reference external" href="http://mirror1.poly.edu/">PyPI-with-TUF</a> <a class="footnote-reference" href="#id82" id="id83">[27]</a>.</p>
<p>The metadata files that change most frequently will be <em>timestamp</em>,
<em>consistent-snapshot</em> and delegated targets (<em>claimed</em>, <em>recently-claimed</em>,
<em>unclaimed</em>, project) metadata. The <em>timestamp</em> and <em>consistent-snapshot</em>
metadata MUST be updated whenever <em>root</em>, <em>targets</em> or delegated targets
metadata are updated. Observe, though, that <em>root</em> and <em>targets</em> metadata are
much less likely to be updated as often as delegated targets metadata.
Therefore, <em>timestamp</em> and <em>consistent-snapshot</em> metadata will most likely be
updated frequently (possibly every minute) due to delegated targets metadata
being updated frequently in order to drive continuous delivery of projects.</p>
<p>Consequently, the processes with which PyPI updates projects will have to be
updated accordingly, the details of which are explained in the following
subsections.</p>
<div class="section" id="why-do-we-need-consistent-snapshots">
<h2><a class="toc-backref" href="#id123">Why Do We Need Consistent Snapshots?</a></h2>
<p>In an ideal world, metadata and data should be immediately updated and
presented whenever a project is updated. In practice, there will be problems
when there are many readers and writers who access the same metadata or data at
the same time.</p>
<p>An important example at the time of writing is that mirrors are very likely,
as far as we can tell, to update inconsistently from PyPI as it is
without TUF. Specifically, a mirror would update itself in such a way that
project A would be from time T, whereas project B would be from time T+5,
project C would be from time T+3, and so on, where T is the time at which the
mirror first began updating itself. There is no known way for a mirror to update
itself such that it captures the state of all projects as they were at time T.</p>
<p>Adding TUF to PyPI will not automatically solve the problem. Consider what we
call the <a class="reference external" href="https://groups.google.com/forum/#!topic/theupdateframework/8mkR9iqivQA">"inverse replay" or "fast-forward" problem</a> <a class="footnote-reference" href="#id84" id="id85">[28]</a>. Suppose that PyPI has
timestamped a consistent snapshot at version 1. A mirror is later in the
middle of copying PyPI at this snapshot. While the mirror is copying PyPI at
this snapshot, PyPI timestamps a new snapshot at, say, version 2. Without
accounting for consistency, the mirror would then find itself with a copy of
PyPI in an inconsistent state which is indistinguishable from arbitrary
metadata or target attacks. The problem would also apply when the mirror is
substituted with a pip user.</p>
<p>Therefore, the problem can be summarized as such: there are problems of
consistency on PyPI with or without TUF. TUF requires its metadata to be
consistent with the data, but how would the metadata be kept consistent with
projects that change all the time?</p>
<p>As a result, we will solve for PyPI the problem of producing a consistent
snapshot that captures the state of all known projects at a given time. Each
consistent snapshot can safely coexist with any other consistent snapshot and
be deleted independently without affecting any other consistent snapshot.</p>
<p>The gist of the solution is that every metadata or data file written to disk
MUST include in its filename the <a class="reference external" href="https://en.wikipedia.org/wiki/Cryptographic_hash_function">cryptographic hash</a> <a class="footnote-reference" href="#id86" id="id87">[29]</a> of the file. How would
this help clients which use the TUF protocol to securely and consistently
install or update a project from PyPI?</p>
<p>Recall that the first step in the TUF protocol requires the client to download
the latest <em>timestamp</em> metadata. However, the client would not know in advance
the hash of the <em>timestamp</em> metadata file from the latest consistent snapshot.
Therefore, PyPI MUST redirect all HTTP GET requests for <em>timestamp</em> metadata to
the <em>timestamp</em> metadata file from the latest consistent snapshot. Since the
<em>timestamp</em> metadata is the root of a tree of cryptographic hashes pointing to
every other metadata or target file that is meant to exist together for
consistency, the client is then able to retrieve any file from this consistent
snapshot by deterministically including, in the request for the file, the hash
of the file in the filename. Assuming infinite disk space and no <a class="reference external" href="https://en.wikipedia.org/wiki/Collision_(computer_science)">hash
collisions</a> <a class="footnote-reference" href="#id88" id="id89">[30]</a>, a client may safely read from one consistent snapshot while PyPI
produces another consistent snapshot.</p>
<p>In this simple but effective manner, we are able to capture a consistent
snapshot of all projects and the associated metadata at a given time. The next
subsection will explicate the implementation details of this idea.</p>
<p>This PEP does not prohibit using advanced file systems or tools to produce
consistent snapshots (such solutions are mentioned in the Appendix). There are
two important reasons why we chose this simple solution for the PEP.
Firstly, the solution does not mandate that PyPI use any particular file system
or tool. Secondly, as we will see later in this section, our generic
file-system based approach allows mirrors to use extant file transfer tools
such as rsync to efficiently transfer consistent snapshots from PyPI.</p>
</div>
<div class="section" id="producing-consistent-snapshots">
<h2><a class="toc-backref" href="#id124">Producing Consistent Snapshots</a></h2>
<p>Given a project, PyPI is responsible for updating, depending on the project,
either the <em>claimed</em>, <em>recently-claimed</em> or <em>unclaimed</em> metadata as well as
associated delegated targets metadata. Every project MUST upload its set of
metadata and targets in a single transaction. We will call this set of files
the project transaction. We will discuss later how PyPI MAY validate the files
in a project transaction. For now, let us focus on how PyPI will respond to a
project transaction. We will call this response the project transaction
process. There will also be a consistent snapshot process that we will define
momentarily; for now, it suffices to know that project transaction processes
and the consistent snapshot process must coordinate with each other.</p>
<p>Also, every metadata and target file MUST include in its filename the <a class="reference external" href="http://docs.python.org/2/library/hashlib.html#hashlib.hash.hexdigest">hex
digest</a> <a class="footnote-reference" href="#id90" id="id91">[31]</a> of its <a class="reference external" href="https://en.wikipedia.org/wiki/SHA-2">SHA-256</a> <a class="footnote-reference" href="#id92" id="id93">[32]</a> hash. For this PEP, it is RECOMMENDED that PyPI
adopt a simple convention of the form digest.filename.ext, where filename is
the original filename without a copy of the hash, digest is the hex digest of
the hash, and ext is the filename extension.</p>
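<p>As a rough sketch of the recommended naming convention, the following
derives a digest.filename.ext name from the content of a file. The function
name is hypothetical, and placing the digest in front of the base name (rather
than in front of the whole path) is an assumption made for this example.</p>

```python
import hashlib
import posixpath


def consistent_name(path, content):
    """Return path with the SHA-256 hex digest prefixed to the base name."""
    digest = hashlib.sha256(content).hexdigest()
    directory, base = posixpath.split(path)
    # Assumption: the digest is prepended to the base name only, so that
    # files stay in their original directory.
    return posixpath.join(directory, digest + "." + base)
```

<p>Any change to the content yields a different filename, which is what allows
files from different consistent snapshots to coexist on disk.</p>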
<p>When an <em>unclaimed</em> project uploads a new transaction, a project transaction
process MUST add all new targets and relevant delegated <em>unclaimed</em> metadata.
(We will see later in this section why the <em>unclaimed</em> role will delegate
targets to a number of delegated <em>unclaimed</em> roles.) Finally, the project
transaction process MUST inform the consistent snapshot process about new
delegated <em>unclaimed</em> metadata.</p>
<p>When a <em>recently-claimed</em> project uploads a new transaction, a project
transaction process MUST add all new targets and delegated targets metadata for
the project. If the project is new, then the project transaction process MUST
also add new <em>recently-claimed</em> metadata with public keys and threshold number
(which MUST be part of the transaction) for the project. Finally, the project
transaction process MUST inform the consistent snapshot process about new
<em>recently-claimed</em> metadata as well as the current set of delegated targets
metadata for the project.</p>
<p>The process for a <em>claimed</em> project is slightly different. The difference is
that PyPI administrators will choose to move the project from the
<em>recently-claimed</em> role to the <em>claimed</em> role. A project transaction process
MUST then add new <em>recently-claimed</em> and <em>claimed</em> metadata to reflect this
migration. As is the case for a <em>recently-claimed</em> project, the project
transaction process MUST always add all new targets and delegated targets
metadata for the <em>claimed</em> project. Finally, the project transaction process
MUST inform the consistent snapshot process about new <em>recently-claimed</em> or
<em>claimed</em> metadata as well as the current set of delegated targets metadata for
the project.</p>
<p>Project transaction processes SHOULD be automated, except when PyPI
administrators move a project from the <em>recently-claimed</em> role to the <em>claimed</em>
role. Project transaction processes MUST also be applied atomically: either
all metadata and targets, or none of them, are added. The project transaction
processes and consistent snapshot process SHOULD work concurrently. Finally,
project transaction processes SHOULD keep in memory the latest <em>claimed</em>,
<em>recently-claimed</em> and <em>unclaimed</em> metadata so that they will be correctly
updated in new consistent snapshots.</p>
<p>All project transactions MAY be placed in a single queue and processed
serially. Alternatively, the queue MAY be processed concurrently in order of
appearance provided that the following rules are observed:</p>
<ol class="arabic simple">
<li>No pair of project transaction processes must concurrently work on the same
project.</li>
<li>No pair of project transaction processes must concurrently work on
<em>unclaimed</em> projects that belong to the same delegated <em>unclaimed</em> targets
role.</li>
<li>No pair of project transaction processes must concurrently work on new
<em>recently-claimed</em> projects.</li>
<li>No pair of project transaction processes must concurrently work on new
<em>claimed</em> projects.</li>
<li>No project transaction process must work on a new <em>claimed</em> project while
another project transaction process is working on a new <em>recently-claimed</em>
project and vice versa.</li>
</ol>
<p>These rules MUST be observed so that metadata is not read from or written to
inconsistently.</p>
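<p>One way to read the five rules above is as a pairwise conflict test that a
scheduler could apply before running two project transaction processes at the
same time. The sketch below is an illustration under assumptions: the
<tt>Transaction</tt> fields, including <tt>unclaimed_bin</tt> for the delegated
<em>unclaimed</em> role, are hypothetical names that do not come from this
PEP.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Transaction:
    project: str
    role: str                            # "claimed", "recently-claimed" or "unclaimed"
    is_new: bool = False                 # True if the project is new to its role
    unclaimed_bin: Optional[str] = None  # delegated unclaimed role, if any


def conflicts(a, b):
    """Return True if the two transactions must not be processed concurrently."""
    # Rule 1: never work on the same project concurrently.
    if a.project == b.project:
        return True
    # Rule 2: unclaimed projects in the same delegated unclaimed role.
    if (a.role == b.role == "unclaimed"
            and a.unclaimed_bin is not None
            and a.unclaimed_bin == b.unclaimed_bin):
        return True
    # Rules 3 to 5: any two new projects among claimed and recently-claimed.
    new_roles = {"claimed", "recently-claimed"}
    if a.is_new and b.is_new and a.role in new_roles and b.role in new_roles:
        return True
    return False
```

<p>A concurrent queue would then only start a transaction once it conflicts
with none of the transactions currently in progress.</p>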
<p>The consistent snapshot process is fairly simple and SHOULD be automated. The
consistent snapshot process MUST keep in memory the latest working set of
<em>root</em>, <em>targets</em> and delegated targets metadata. Every minute or so, the
consistent snapshot process will sign for this latest working set. (Recall
that project transaction processes continuously inform the consistent snapshot
process about the latest delegated targets metadata in a concurrency-safe
manner. The consistent snapshot process will actually sign for a copy of the
latest working set while the actual latest working set in memory will be
updated with information continuously communicated by project transaction
processes.) Next, the consistent snapshot process MUST generate and sign new
<em>timestamp</em> metadata that will vouch for the <em>consistent-snapshot</em> metadata
generated in the previous step. Finally, the consistent snapshot process MUST
add new <em>timestamp</em> and <em>consistent-snapshot</em> metadata representing the latest
consistent snapshot.</p>
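<p>The interplay between the working set and the consistent snapshot process
might look roughly like the sketch below. It is not the reference
implementation: the class and function names are hypothetical, metadata is
modelled as plain dictionaries, and signing is left out entirely. What it shows
is that the snapshot is taken over a frozen copy while the live working set
keeps accepting updates.</p>

```python
import copy
import hashlib
import json
import threading


class WorkingSet:
    """Latest metadata, updated by project transaction processes."""

    def __init__(self):
        self._lock = threading.Lock()
        self._metadata = {}  # role name -> metadata dictionary

    def update(self, role, metadata):
        with self._lock:
            self._metadata[role] = metadata

    def frozen_copy(self):
        # Deep copy under the lock so later updates cannot alter the copy.
        with self._lock:
            return copy.deepcopy(self._metadata)


def digest_of(obj):
    """SHA-256 hex digest of a canonical JSON encoding (an assumed format)."""
    data = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def make_snapshot(working_set, version):
    """Build unsigned consistent-snapshot and timestamp documents."""
    frozen = working_set.frozen_copy()
    snapshot = {
        "version": version,
        "meta": {role: digest_of(meta) for role, meta in frozen.items()},
    }
    # The timestamp vouches for the snapshot by recording its digest.
    timestamp = {"version": version, "snapshot": digest_of(snapshot)}
    # Signing is omitted; a real process would sign both documents here.
    return frozen, snapshot, timestamp
```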
<p>A few implementation notes are now in order. So far, we have seen only that
new metadata and targets are added, but not that old metadata and targets are
removed. Practical constraints are such that eventually PyPI will run out of
disk space to produce a new consistent snapshot. In that case, PyPI MAY then
use something like a "mark-and-sweep" algorithm to delete sufficiently old
consistent snapshots: in order to preserve the latest consistent snapshot, PyPI
would walk objects beginning from the root (<em>timestamp</em>) of the latest
consistent snapshot, mark all visited objects, and delete all unmarked
objects. The last few consistent snapshots may be preserved in a similar
fashion. Deleting a consistent snapshot will cause clients to see nothing
thereafter but HTTP 404 responses to any request for a file in that consistent
snapshot. Clients SHOULD then retry their requests with the latest consistent
snapshot.</p>
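<p>A "mark-and-sweep" pass of the kind mentioned above could be sketched as
follows. The <tt>references</tt> mapping (file name to the file names it points
to) and the function names are assumptions for illustration; how PyPI would
actually enumerate references is not specified here.</p>

```python
def mark(root, references):
    """Return every object reachable from root via the references map."""
    marked = set()
    stack = [root]
    while stack:
        name = stack.pop()
        if name in marked:
            continue
        marked.add(name)
        stack.extend(references.get(name, ()))
    return marked


def sweep(all_files, roots, references):
    """Return the files unreachable from every snapshot root to be kept."""
    keep = set()
    for root in roots:
        keep |= mark(root, references)
    return set(all_files) - keep
```

<p>Passing several <em>timestamp</em> roots to <tt>sweep</tt> preserves the
last few consistent snapshots rather than only the latest one. Files shared
between a kept snapshot and a deleted one survive because they remain
reachable.</p>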
<p>We do <strong>not</strong> consider updates to any consistent snapshot because <a class="reference external" href="https://en.wikipedia.org/wiki/Collision_(computer_science)">hash
collisions</a> <a class="footnote-reference" href="#id88" id="id94">[30]</a> are out of the scope of this PEP. In case a hash collision is
observed, PyPI MAY wish to check that the file being added is identical to the
file already stored. (Should a hash collision be observed, it is far more
likely the case that the file is identical rather than being a genuine
<a class="reference external" href="https://en.wikipedia.org/wiki/Collision_attack">collision attack</a> <a class="footnote-reference" href="#id95" id="id96">[33]</a>.) Otherwise, PyPI MAY either overwrite the existing file
or ignore any write operation to an existing file.</p>
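<p>The optional check described above might be implemented along these lines.
This is a sketch under assumptions: the function name and the returned status
strings are made up for the example, and it takes the "ignore the write"
option when the file already exists.</p>

```python
import hashlib
import os


def add_file(directory, filename, content):
    """Store content as digest.filename in directory; return (path, status)."""
    digest = hashlib.sha256(content).hexdigest()
    path = os.path.join(directory, digest + "." + filename)
    if os.path.exists(path):
        with open(path, "rb") as existing:
            if existing.read() == content:
                # Same name and same bytes: nothing to write.
                return path, "identical"
        # Same digest but different bytes: a genuine collision.
        return path, "collision"
    with open(path, "wb") as new_file:
        new_file.write(content)
    return path, "added"
```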
<p>All clients, such as pip using the TUF protocol, MUST be modified to download
every metadata and target file (except for <em>timestamp</em> metadata) by including,
in the request for the file, the hash of the file in the filename. Following
the filename convention recommended earlier, a request for the file at
filename.ext will be transformed to the equivalent request for the file at
digest.filename.ext.</p>
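<p>On the client side, the transformation could be sketched as below. The
<tt>trusted_hashes</tt> mapping stands for hashes the client has already read
from verified metadata; both it and the function name are assumptions for this
example.</p>

```python
import posixpath


def request_path(path, trusted_hashes):
    """Map filename.ext to digest.filename.ext using verified metadata."""
    # A KeyError here means the file is not part of this consistent snapshot.
    digest = trusted_hashes[path]
    directory, base = posixpath.split(path)
    return posixpath.join(directory, digest + "." + base)
```

<p>The client never computes the digest itself before downloading; it takes
the digest from metadata it has already verified and checks the downloaded
file against it afterwards.</p>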
<p>Finally, PyPI SHOULD use a <a class="reference external" href="https://en.wikipedia.org/wiki/Transaction_log">transaction log</a> <a class="footnote-reference" href="#id97" id="id98">[34]</a> to record project transaction
processes and queues so that it will be easier to recover from errors after a
server failure.</p>
</div>
<div class="section" id="metadata-validation">
<h2><a class="toc-backref" href="#id125">Metadata Validation</a></h2>
<p>A <em>claimed</em> or <em>recently-claimed</em> project will need to upload in its
transaction to PyPI not just targets (a simple index as well as distributions)
but also TUF metadata. The project MAY do so by uploading a ZIP file
containing two directories, /metadata/ (containing delegated targets metadata
files) and /targets/ (containing targets such as the project simple index and
distributions which are signed for by the delegated targets metadata).</p>
<p>Whenever the project uploads metadata or targets to PyPI, PyPI SHOULD check the
project TUF metadata for at least the following properties:</p>
<ul class="simple">
<li>A threshold number of the developer keys registered with PyPI by that
project MUST have signed for the delegated targets metadata file that
represents the "root" of targets for that project (e.g. metadata/targets/
project.txt).</li>
<li>The signatures of delegated targets metadata files MUST be valid.</li>
<li>The delegated targets metadata files MUST NOT be expired.</li>
<li>The delegated targets metadata MUST be consistent with the targets.</li>
<li>A delegator MUST NOT delegate targets that were not delegated to itself by
another delegator.</li>
<li>A delegatee MUST NOT sign for targets that were not delegated to itself by a
delegator.</li>
<li>Every file MUST contain a unique copy of its hash in its filename following
the digest.filename.ext convention recommended earlier.</li>
</ul>
<p>If PyPI chooses to check the project TUF metadata, then PyPI MAY choose to
reject publishing any set of metadata or targets that do not meet these
requirements.</p>
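<p>A few of the checks listed above (signature threshold, expiry, and
consistency between metadata and targets) can be sketched as follows. This is
illustrative only: the metadata layout, the function name and the
<tt>verify</tt> callable are assumptions, and real signature verification is
delegated to whatever the caller supplies. The delegation checks are not
covered.</p>

```python
import hashlib
from datetime import datetime, timezone


def check_project_metadata(metadata, targets, registered_keys, threshold,
                           verify, now=None):
    """Return a list of problems; an empty list means these checks passed.

    metadata: dict with "expires" (timezone-aware datetime), "targets"
        (path -> SHA-256 hex digest) and "signatures" (key id -> signature).
    targets: dict of path -> uploaded file content (bytes).
    verify: caller-supplied callable(key_id, signature, metadata) -> bool.
    """
    now = now or datetime.now(timezone.utc)
    problems = []

    # Count only valid signatures made by keys registered for the project.
    valid_signers = [
        key_id for key_id, signature in metadata["signatures"].items()
        if key_id in registered_keys and verify(key_id, signature, metadata)
    ]
    if threshold > len(valid_signers):
        problems.append("too few valid signatures from registered keys")

    if now >= metadata["expires"]:
        problems.append("metadata has expired")

    # The metadata must describe exactly the uploaded targets.
    listed = metadata["targets"]
    if set(listed) != set(targets):
        problems.append("metadata and uploaded targets list different files")
    for path in sorted(set(listed).intersection(targets)):
        if hashlib.sha256(targets[path]).hexdigest() != listed[path]:
            problems.append("hash mismatch for " + path)
    return problems
```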
<p>PyPI MUST enforce access control by ensuring that each project can only write
to the TUF metadata for which it is responsible. It MUST do so by ensuring
that project transaction processes write to the correct metadata as well as
correct locations within those metadata. For example, a project transaction
process for an <em>unclaimed</em> project MUST write to the correct target paths in
the correct delegated <em>unclaimed</em> metadata for the targets of the project.</p>
<p>On rare occasions, PyPI MAY wish to extend the TUF metadata format for projects
in a backward-incompatible manner. Note that PyPI will NOT be able to
automatically rewrite existing TUF metadata on behalf of projects in order to
upgrade the metadata to the new backward-incompatible format because this would
invalidate the signatures of the metadata as signed by developer keys.
Instead, package managers SHOULD be written to recognize and handle multiple
incompatible versions of TUF metadata so that <em>claimed</em> and <em>recently-claimed</em>
projects could be offered a reasonable time to migrate their metadata to newer
but backward-incompatible formats.</p>
<p>The details of how each project manages its TUF metadata are beyond the scope of
this PEP.</p>
</div>
<div class="section" id="mirroring-protocol">
<h2><a class="toc-backref" href="#id126">Mirroring Protocol</a></h2>
<p>The mirroring protocol as described in <a class="reference external" href="http://www.python.org/dev/peps/pep-0381">PEP 381</a> <a class="footnote-reference" href="#id63" id="id37">[9]</a> SHOULD change to mirror
PyPI with TUF.</p>
<p>A mirror SHOULD have to maintain only one consistent snapshot for its clients,
which would represent the latest consistent snapshot from PyPI known to the
mirror. The mirror would then serve all HTTP requests for metadata or targets
by simply reading directly from this consistent snapshot directory.</p>
<p>The mirroring protocol itself is fairly simple. The mirror would ask PyPI for
<em>timestamp</em> metadata from the latest consistent snapshot and proceed to copy
the entire consistent snapshot from the <em>timestamp</em> metadata onwards. If the
mirror encounters a failure to copy any metadata or target file while copying
the consistent snapshot, it SHOULD retry, resuming the copy of that
particular consistent snapshot. If PyPI has deleted that consistent snapshot,
then the mirror SHOULD delete the failed consistent snapshot and try
downloading the latest consistent snapshot instead.</p>
<p>The mirror SHOULD point users to a previous consistent snapshot directory while
it is copying the latest consistent snapshot from PyPI. Only after the latest
consistent snapshot has been completely copied SHOULD the mirror switch clients
to the latest consistent snapshot. The mirror MAY then delete the previous
consistent snapshot once it finds that no client is reading from the previous
consistent snapshot.</p>
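<p>One possible way for a mirror to switch clients only after a complete copy
is to serve from a symbolic link and repoint it atomically. This technique is
not prescribed by this PEP; the <tt>current</tt> link name, the directory
layout and the function name are assumptions, and the sketch relies on POSIX
rename semantics.</p>

```python
import os


def switch_current(mirror_root, snapshot_dirname):
    """Atomically repoint mirror_root/current at a fully copied snapshot."""
    link = os.path.join(mirror_root, "current")
    temp_link = link + ".new"
    # Clear any leftover temporary link from an interrupted earlier run.
    if os.path.lexists(temp_link):
        os.remove(temp_link)
    # The link target is relative, so it resolves inside mirror_root.
    os.symlink(snapshot_dirname, temp_link)
    # Renaming over the old link is atomic on POSIX file systems.
    os.replace(temp_link, link)
```

<p>Clients that resolved the link before the switch continue reading the
previous snapshot directory, which the mirror can delete later.</p>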
<p>On the other hand, as mentioned earlier, the mirror MAY use extant file
transfer software such as <a class="reference external" href="https://rsync.samba.org/">rsync</a> <a class="footnote-reference" href="#id99" id="id100">[35]</a> to mirror PyPI. In that case, the mirror MUST
first obtain the last known <em>timestamp</em> metadata from PyPI. The mirror MUST NOT
immediately publish the last known <em>timestamp</em> metadata from PyPI. Instead,
the mirror MUST first iteratively transfer all new files from PyPI until there
are no new files left to transfer. Finally, the mirror MUST publish the last
known <em>timestamp</em> it fetched from PyPI so that package managers such as pip may
be directed to the latest consistent snapshot known to the mirror.</p>
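<p>The ordering constraints for a mirror that uses a file transfer tool can be
summarized in a short sketch. The three callables are hypothetical and supplied
by the caller; <tt>transfer_new_files</tt> could, for example, wrap an rsync
invocation and report how many files it copied.</p>

```python
def sync_mirror(fetch_timestamp, transfer_new_files, publish_timestamp):
    """Run one mirroring pass in the order required above.

    fetch_timestamp() returns the last known timestamp metadata from PyPI.
    transfer_new_files() copies new files and returns how many were copied.
    publish_timestamp(ts) makes the timestamp visible to the mirror's clients.
    """
    # Step 1: obtain the timestamp first, but do not publish it yet.
    timestamp = fetch_timestamp()
    # Step 2: keep transferring until a pass copies nothing new.
    while transfer_new_files() > 0:
        pass
    # Step 3: only now publish the timestamp fetched in step 1.
    publish_timestamp(timestamp)
    return timestamp
```

<p>Publishing the timestamp last ensures that every file it refers to is
already present on the mirror when clients start following it.</p>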
<p>Even after this PEP is implemented, the main PyPI server will continue to
operate as a pure web service, exposing only HTTPS resources and the legacy
XML-RPC endpoints.</p>
<p>As Nick Coghlan has observed, since the TUF metadata are simply flat files, it
becomes feasible for a mirror to retrieve a consistent snapshot via the web
API, save it to disk and republish it via pure file system interfaces such as
FTP, NFS or rsync. A mirror could then copy PyPI with rsync via the method
outlined above. Since the <em>timestamp</em> metadata acts as the root defining the
consistent snapshot of interest, it would not matter should the actual rsync
operation add new files from new consistent snapshots to the mirror, because
the new files would not be described in the metadata tree anchored from the
last known timestamp metadata that was copied before the rsync operation
started. This is an improvement that this PEP provides as a side effect of how
consistent snapshots and TUF metadata work.</p>
</div>
<div class="section" id="backup-process">
<h2><a class="toc-backref" href="#id127">Backup Process</a></h2>
<p>In order to be able to safely restore from static snapshots later in the event
of a compromise, PyPI SHOULD maintain a small number of its own mirrors to copy
PyPI consistent snapshots according to some schedule. The mirroring protocol
can be used immediately for this purpose. The mirrors must be secured and
isolated such that they are responsible only for mirroring PyPI. The mirrors
can be checked against one another to detect accidental or malicious failures.</p>
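<p>Checking backup mirrors against one another could be as simple as comparing
their file digests. The sketch below assumes each mirror can produce a mapping
from path to hex digest; that mapping and the function name are hypothetical.</p>

```python
def disagreements(mirrors):
    """Return paths whose digests differ, or are missing, across mirrors.

    mirrors: dict of mirror name -> dict of path -> hex digest.
    """
    all_paths = set()
    for listing in mirrors.values():
        all_paths.update(listing)
    differing = {}
    for path in sorted(all_paths):
        # A path absent from a mirror is recorded as None for that mirror.
        seen = {name: listing.get(path) for name, listing in mirrors.items()}
        if len(set(seen.values())) > 1:
            differing[path] = seen
    return differing
```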
</div>
<div class="section" id="metadata-expiry-times">
<h2><a class="toc-backref" href="#id128">Metadata Expiry Times</a></h2>
<p>The <em>root</em> and <em>targets</em> role metadata SHOULD expire in a year, because these
metadata files are expected to change very rarely.</p>
<p>The <em>claimed</em> role metadata SHOULD expire in three to six months, because this
metadata is expected to be refreshed in that time frame. This time frame was
chosen to make administration easier for PyPI.</p>
<p>The <em>timestamp</em>, <em>consistent-snapshot</em>, <em>recently-claimed</em> and <em>unclaimed</em> role
metadata SHOULD expire in a day because a CDN or mirror SHOULD synchronize
itself with PyPI every day. Furthermore, this generous time frame also takes
into account client clocks that are highly skewed or adrift.</p>
<p>The expiry times for the delegated targets metadata of a project are beyond the
scope of this PEP.</p>
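<p>The recommended expiry times in this subsection could be captured in a
small table such as the one below. The names <tt>EXPIRY</tt> and
<tt>expires_at</tt> are hypothetical, a year is approximated as 365 days, and
90 days is used for the <em>claimed</em> role as the lower end of the three to
six month range.</p>

```python
from datetime import datetime, timedelta, timezone

# Recommended lifetimes per role, taken from the text above.
EXPIRY = {
    "root": timedelta(days=365),
    "targets": timedelta(days=365),
    "claimed": timedelta(days=90),   # assumption: lower end of 3 to 6 months
    "timestamp": timedelta(days=1),
    "consistent-snapshot": timedelta(days=1),
    "recently-claimed": timedelta(days=1),
    "unclaimed": timedelta(days=1),
}


def expires_at(role, signed_at):
    """Return the expiry time for metadata of the given role."""
    return signed_at + EXPIRY[role]
```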
</div>
<div class="section" id="metadata-scalability">
<h2><a class="toc-backref" href="#id129">Metadata Scalability</a></h2>