\documentclass[a4, 12pt, english, USenglish]{scrreprt}
% \usepackage{venn}
\usepackage[latin1]{inputenc}
\usepackage{makeidx}
% \usepackage{pdftricks}
\usepackage{graphicx}
% \usepackage[final]{pdfpages}
\usepackage{geometry, upgreek, booktabs, babel}
\usepackage[journal=rsc,xspace=true]{chemstyle}
\usepackage[version=3]{mhchem}
% \usepackage[footnotes]{notes2bib}
\usepackage[final]{microtype}
\usepackage[final, inactive]{pst-pdf}
\usepackage[colorlinks]{hyperref}
% equals with a "set" on top
\newcommand{\defeq}{\ensuremath{\stackrel{\mbox{set}}{=}}}
\renewcommand{\topfraction}{.85}
\renewcommand{\bottomfraction}{.7}
\renewcommand{\textfraction}{.15}
\renewcommand{\floatpagefraction}{.66}
\renewcommand{\dbltopfraction}{.66}
\renewcommand{\dblfloatpagefraction}{.66}
\setcounter{topnumber}{9}
\setcounter{bottomnumber}{9}
\setcounter{totalnumber}{20}
\setcounter{dbltopnumber}{9}
\newcommand{\xscreenshot}[2]{
\begin{figure}[htb]
\begin{center}
\em Missing image file #1
\end{center}
\caption{#2}
\label{#1}
\end{figure}}
\newcommand{\zcreenshot}[3]{
\begin{figure}[htb]
\includegraphics[width=#3]{screenshots/#1.jpg}
\caption{#2}
\label{#1}
\end{figure}}
\newcommand{\screenshot}[2]{
\begin{figure}[htb]
\includegraphics[width=150mm]{screenshots/#1.jpg}
\caption{#2}
\label{#1}
\end{figure}}
\newcommand{\sscreenshot}[3]{
\begin{figure}[htb]
\includegraphics[width=7500mm]{screenshots/#1.jpg}
\caption{#3}
\label{#2}
\end{figure}}
% XXX Should put a little arrow above its parameter.
\newcommand{\vectorXX}[1]{\ensuremath{#1}}
\newcommand{\fApartial}[1]{\ensuremath{\frac{\partial f}{\partial A_{#1}}}}
\newcommand{\jpartial}[1]{\ensuremath{\frac{\partial J}{\partial \theta_{#1}}}}
\newcommand{\thetaipartial}{\ensuremath{\frac{\partial}{\partial{\theta_i}}}}
\newcommand{\thetapartial}{\ensuremath{\frac{\partial}{\partial{\theta}}}}
\newcommand{\half}{\ensuremath{\frac{1}{2}}}
\newcommand{\sumim}{\ensuremath{\sum_{i=1}^{m}}}
\newcommand{\intinf}{\ensuremath{\int_{-\infty}^{\infty}}}
\newcommand{\Ft}{\ensuremath{{\cal{F}}}}
\newcommand{\ft}[1]{\ensuremath{{\cal{F}}({#1})}}
\newcommand{\sinc}[1]{\ensuremath{\mbox{sinc}{#1}}}
\newcommand{\bb}[1]{\ensuremath{{\bf{#1}}}} % Should be blackboard bold
\newcommand{\proj}[2]{\ensuremath{{\bb {#1}}_{#2}}}
\newcommand{\braces}[1]{\ensuremath{\left\{{#1}\right\}}}
\newcommand{\brackets}[1]{\ensuremath{\left[{#1}\right]}}
\newcommand{\abrackets}[1]{\ensuremath{\left<{#1}\right>}}
\newcommand{\parens}[1]{\ensuremath{\left({#1}\right)}}
\newcommand{\absval}[1]{\ensuremath{\left|{#1}\right|}}
\newcommand{\sqbraces}[1]{\ensuremath{\left[{#1}\right]}}
\newcommand{\commutator}[2]{\sqbraces{{#1}, {#2}}}
\newcommand{\kofn}[2]{\ensuremath{\parens{
\begin{array}{c}
{#2}\\
{#1}
\end{array}
}}}
\newcommand{\dyad}[1]{\ensuremath{\ket{{#1}}\bra{{#1}}}}
\newcommand{\trace}[1]{\ensuremath{\mbox{tr}\, {#1} }}
\newcommand{\erf}[1]{\mbox{erf}\left(#1\right)}
\newcommand{\erfc}[1]{\mbox{erfc}\left(#1\right)}
\newcommand{\mXXX}[1]{\marginpar{\tiny{\bf Rmz:} {\it #1}}}
\newcommand{\celcius}{\ensuremath{^\circ}C}
\newcommand{\ev}[1]{\ensuremath{\left\langle{}#1{}\right\rangle}}
\newcommand{\ket}[1]{\ensuremath{\mid{}#1{}\rangle}}
\newcommand{\bra}[1]{\ensuremath{\langle{}#1{}\mid}}
\newcommand{\braKet}[2]{\ensuremath{\left\langle{}#1{}\mid{#2}\right\rangle}}
\newcommand{\BraKet}[3]{\ensuremath{\left\langle{}#1{}\mid{#2}\mid{#3}\right\rangle}}
\newcommand{\evolvesto}[2]{\ensuremath{{#1}\mapsto{#2}}}
\newcommand{\inrange}[3]{\ensuremath{{#1} \in \braces{{#2}, \ldots,{#3}}}}
\newenvironment{wikipedia}[1]
{
{\bf From wikipedia: {\it #1}}
\begin{quote}
}
{
\end{quote}
}
\newcommand{\idx}[1]{{\em #1}\index{#1}}
\newcommand{\idX}[1]{{#1}\index{#1}}
\usepackage{url}
\newcommand{\tm}{\ensuremath{^{\mbox{tm}}}}
\newcommand{\aangstrom}{\AA{}ngstr\"{o}m{}\ }
%\newcommand{\aaunit}{\mbox{\AA}} % Just use A with ring, once encoding works properly
\newcommand{\aaunit}{\angstrom} % Just use A with ring, once encoding works properly
\newcommand{\munchen}{M\"unchen}
\newcommand{\zurich}{Z\"urich}
\newcommand{\schrodinger}{Schr\"odinger}
\newcommand{\ReneJustHauy}{Ren\'e-Just Ha\"uy}
%% Lavousier (with a lot fo weird spelling)
%% Crystallographic notation
%Coordinate
\newcommand{\crCoord}[3]{\mbox{\(#1,#2,#3\)}}
%Direction
\newcommand{\crDir}[3]{\mbox{\(\left[#1 #2 #3\right]\)}}
%Family of directions
\newcommand{\crDirfam}[3]{\mbox{\(\left<{}#1 #2 #3\right>\)}}
%Plane
\newcommand{\crPlane}[3]{\mbox{\(\left(#1 #2 #3\right)\)}}
%Family of planes
\newcommand{\crPlanefam}[3]{\mbox{\(\left\{#1 #2 #3\right\}\)}}
\newcommand{\oneCol}[2]{
\ensuremath{\left(\begin{array}{r}{#1}\\{#2}\end{array}\right)}
}
\newcommand{\twoCol}[4]{
\ensuremath{\left(\begin{array}{rr}{#1}&{#2}\\{#3}&{#4}\end{array}\right)}
}
%Negative number
\newcommand{\crNeg}[1]{\bar{#1}}
\makeindex
\begin{document}
\title{Lecture notes from the course \\
Statistical Mechanics (PHY 29)\\
taught by \\
Leonard Susskind \\
Spring 2009}
\author{Bj\o{}rn Remseth \\ [email protected]}
\maketitle
\tableofcontents
% Comment out this in final version!
% \parskip=\bigskipamount
% \parindent=0pt.
\begin{abstract}
\end{abstract}
\chapter*{Introduction}
These are my notes for the course in Statistical Mechanics (Stanford University) taught by
Leonard Susskind.

I usually watched the videos while typing notes in \LaTeX. I have
experimented with various note-taking techniques, including free text,
mindmaps and handwritten notes, but I've ended up using \LaTeX, since
it's not too hard, it gives great readability for the math that
inevitably pops up in the things I like to take notes about, and it's
easy to include various types of graphics. Also, it fits nicely into
the rest of the set of tools I use to follow these lectures: more
often than not I'm on a train during my daily commute. My
handwriting is bad on any given day, but when combined with a bumpy
train it's totally unreadable, even by me. However, having one window
with Emacs, another with \LaTeX, and a screengrabber program nearby,
it is easy to get into ``flow'' and stay there while producing notes
that are possible to read. It's nice :-) The graphics in this
document are exclusively screenshots copied directly out of the videos,
and to a large extent, but not completely, the text is based on
Susskind's narrative. I haven't been very creative; that wasn't my
purpose. I did take more screenshots than are actually available in
this text. Some of them are indicated by figures stating that a
screenshot is missing. I may or may not get back to putting these
missing screenshots back in, but for now they are just not there. Deal
with it ;-)

This document will every now and then be made available at
\url{http://dl.dropbox.com/u/187726/statistical-mechanics-notes.pdf}. The
source code can be cloned with git from \url{https://github.com/la3lma/statistical-mechanics}.

A word of warning: these are just my notes. They shouldn't be
interpreted as anything else. I take notes as an aid for myself.
When I take notes I find myself spending more time with the subject at
hand, and that alone lets me remember it better. I can also refer to
the notes, and since I've written them myself, I usually find what
I'm looking for ;). I state this clearly since the use of \LaTeX\ will
give some typographical cues that may lead the unwary reader to
believe that this is a textbook or something more ambitious. It's
not. This is a learning tool for me. If anyone else reads this and
finds it useful, that's nice. I'm happy for you, but I didn't have
that, or you, in mind when writing this. That said, if you have any
suggestions to make the text or presentation better, please let me
know. My email address is [email protected].
\chapter{Getting started with thermodynamics}
\section{Dynamic systems}
The first lecture starts with a longish quote, so I'll quote it in
full:
\begin{quote}\it
Statistical mechanics is often thought of as the theory of how atoms
combine to form gases, liquids, solids and even plasmas and black body
radiation. But it is both much more and less than that.
Statistical mechanics is a useful tool in many areas of science where
a large number of variables has to be dealt with using statistical
methods.
\end{quote}
Here he breaks from reading the quote to interject: ``My son, who
studies neural networks, uses it. In fact, about six months ago he called
me up and said `Pop, did you ever hear about this thing called the
partition function? I'm just learning about it, to use it in
neural networks.'\,''
\begin{quote} \it
I have no doubt that some of the financial wizards of \idx{AIG} and
\idx{Lehman Brothers} used it. Saying that statistical mechanics is the theory of
gases is rather like saying that calculus is the theory of planetary orbits.
\end{quote}
What it really is, is a mathematical structure with
applications. Putting it in a nutshell, one can perhaps say that
statistical mechanics is just probability theory. Susskind says he has
never understood the difference between statistics and probability;
statistical mechanics is probability theory under certain specific circumstances.

It is, however, a bit tricky to say exactly how statistical mechanics
connects to reality. \mXXX{ This is perhaps just a way of
saying that statistical mechanics is a purely empirical branch of
physics :-)}
Let's start with coin flipping (a fair coin: equal probabilities of heads
and tails that sum to one). A convincing argument for the
fairness is that coins are fairly symmetrical (apart from some small
details).

There is a notion of a priori probability, in this case justified by
a symmetry.

Let's take another example: a die. There are six sides, which
we name after colors (r, y, b, g, o, p). The six sides of the die
are symmetric. There is a symmetry operation of turning the die 90
degrees about any axis. If you believe that the die is symmetric
enough, then you are forced to believe that the probability of getting
any one of the colors is 1/6. However, in most situations there aren't
such symmetries.

When there aren't such symmetries, is there a starting point from which you
can start thinking about such probabilities? The answer is ``not
obviously'', but let's look at an example. Let's consider a die where
we replace the purple color by red.
\screenshot{purplered}{Purple side of die replaced by red color}
It now has five colors. Naively, you might see no reason to prefer
any one of the five colors. So what is the probability of flipping red? If you
didn't know better you would perhaps say that the probability of
flipping red was 1/5, but in fact it is (of course) 2/6 (= 1/3). The
reason is that the real symmetry of the system acts on the six faces,
not on the five colors.

But it is very easy to imagine a die that
doesn't have any symmetry at all, one that is weighted and off balance
(an unfair die). Then where would you get your a priori
probabilities from? Well, one way would be to flip it a zillion times
and count how many times each color came up, but that's
not what we're going to do. So we have to get the idea of a
priori probabilities from somewhere else. We might go back a step and say ``{\em if the die is
not a fair die, then the probabilities may depend on all kinds of
things: details such as the way the hand flips it,
the air currents in the room and the surface it may or may not
bounce off, all of which are extraneous to the system itself. The
probabilities may then depend on the environment and not on the system itself}''.

So let's introduce one element: let's think of the die as a dynamical
system that changes with time according to some law of motion. If you
know the state of the system at one instant in time, you will know what it
is at the next instant. Now, with the motion of particles
the motion is smooth and you can divide it into infinitesimal amounts
of time. For a die it is (perhaps) a bit different. Assume that the
die performs one operation per \idx{elementary} time interval, and that
at each step the die rearranges itself depending only on what is shown
on the top of the die. You can then represent the motion of the die
as a rule:
\screenshot{dynamictheory}{The die as a dynamical system: a rule for how the colors succeed each other.}
\[
\begin{array}{l}
R \rightarrow B \\
B \rightarrow Y \\
Y \rightarrow G \\
G \rightarrow O \\
O \rightarrow P \\
P \rightarrow R
\end{array}
\]
That would be a complete dynamical theory for the colors of the die.
Now assuming that the state succession is very fast, it is very
clear that there will be an equal probability of any one of the six
colors, since the system spends the same amount of time in each
state.
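To make the counting explicit, here is a small worked version of the
argument (my own bookkeeping, not from the lecture; \(T\) and \(n_c\)
are symbols introduced here). Follow the single cycle for \(T\)
elementary time steps and let \(n_c\) be the number of steps spent
showing color \(c\). Since the cycle visits each of the six faces once
every six steps,
\[
\mbox{Pr}(c) = \lim_{T \rightarrow \infty} \frac{n_c}{T} = \frac{1}{6}
\qquad \mbox{for each of the six colors } c .
\]
For the die with the purple face repainted red, the same cycle runs
over the six faces, two of which show red, so
\(\mbox{Pr}(\mbox{red}) = 2/6 = 1/3\) while each of the other four
colors keeps \(1/6\).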
If you permute some of the colors, each state would still occupy 1/6 of
the time, so there are many possible laws that give the same answer. However, there are other
laws that don't give that answer at all. Here is an example:
\[
\begin{array}{l}
R \rightarrow B \\
B \rightarrow G \\
G \rightarrow R \\
Y \rightarrow P \\
P \rightarrow O \\
O \rightarrow Y
\end{array}
\]
\subsection{Conserved quantities}
In every state the rule says what the system should do next,
but this law has the funny property of having two separate orbits, or cycles. Now,
there is no way to know a priori which cycle you are in. So this is a
counterexample to the claim that there is always an equal a priori probability of being in
each state. However, the two-cycle example has something called a
conserved quantity. Let's call it \(Z\) (``zilch''). It is 1 for the RGB orbit
and 0 for the YPO orbit. The zilch never changes; this is a
\idx{conservation law}. In this case, and indeed in any case, a
conservation law means that the system breaks up into different orbits,
with each orbit labelled by a value of some \idx{conserved quantity}.

You now have two possibilities. Either you fix the zilch, and then you
have equal probabilities for the different states that share that value of the conserved
\idx{quantum number} (zilch); or you only have
statistical information about which orbit you are in, and then you use that.
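As a worked illustration of the two possibilities (my own notation, not
from the lecture: \(\mbox{Pr}\) denotes a probability, and \(q\) stands
for whatever prior probability you assign to \(Z=1\)). With the zilch
fixed at \(Z=1\), the three states on that orbit are equally likely and
the others are excluded:
\[
\mbox{Pr}(R \mid Z=1) = \mbox{Pr}(B \mid Z=1) = \mbox{Pr}(G \mid Z=1) = \frac{1}{3},
\qquad
\mbox{Pr}(Y \mid Z=1) = \mbox{Pr}(P \mid Z=1) = \mbox{Pr}(O \mid Z=1) = 0 .
\]
If instead all you know is \(\mbox{Pr}(Z=1) = q\), then
\[
\mbox{Pr}(R) = \mbox{Pr}(B) = \mbox{Pr}(G) = \frac{q}{3},
\qquad
\mbox{Pr}(Y) = \mbox{Pr}(P) = \mbox{Pr}(O) = \frac{1-q}{3} .
\]
These add up to one, and they reduce to the uniform value \(1/6\) only
when \(q = 1/2\).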
In physics, or at least in thermodynamics, energy is the most
important conserved quantity. Momentum isn't so important. The
reason is that in statistical mechanics we usually think about systems
enclosed in containers.
When a molecule bumps into the wall of the container it gives up a little momentum
to the container, so momentum is not a useful conserved quantity for the
contents alone. Electric charge can be important. Angular
momentum usually doesn't matter that much.

Now a simple rule could be that you take all the conserved quantities
and fix them, and then you study the system subject to the constraint that
the conserved quantities have certain values. That's the essence of
statistical mechanics: calculating probabilities of things happening
subject to constraints, which usually take the form that some (one or more)
conserved quantity is fixed.
Using two dice we can think up an example. We now number the sides
from 1 to 6. The rule is that the dice interact, in the sense that
when one flips the other flips too, and they flip in such a way that the sum
of the numbers doesn't change. Each value of the sum has a cycle associated with
it: (1,1) must go to (1,1), the sum-3 cycle flips between (1,2) and
(2,1), and so forth. There are a bunch of disconnected orbits. Once
you fix the sum, you can throw away all the others and concentrate on
the subsystem you're interested in.
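To see how large each orbit is (my own counting, assuming as above that
all states with the same sum lie on one cycle; \(s\) and \(n(s)\) are
symbols introduced here): the number of states of the two dice with sum
\(s\) is
\[
n(s) = 6 - \left| s - 7 \right|, \qquad s = 2, \ldots, 12,
\qquad \sum_{s=2}^{12} n(s) = 36 .
\]
Once the sum is fixed, each of the \(n(s)\) states on that orbit has
probability \(1/n(s)\). For \(s = 4\), for example, the orbit consists
of \((1,3)\), \((2,2)\) and \((3,1)\), and each state has probability
\(1/3\).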
The most important conserved quantity is energy, so that is where most
of our energy will be spent. In chemistry there are more: for
instance, the number of atoms of each element is
conserved. In nuclear physics there are other conserved quantities.

But let us go a step back and look at information. Information plays a
very great role. Let's look at a much worse rule of motion than the
ones above.
\[
\begin{array}{l}
R \rightarrow R \\
B \rightarrow R \\
G \rightarrow R \\
Y \rightarrow R \\
O \rightarrow R \\
P \rightarrow R
\end{array}
\]
This is a perfectly good deterministic law. There are no conserved
quantities here. It's certainly true that over reasonable lengths of
time the most likely thing you'll find is red; after a while you'll simply find
nothing else. However, it is not true that there are
equal a priori probabilities for the colors. This system lacks one
property that real laws of motion have, which Susskind calls
``\idx{conservation of information}'', but you could equally well call it
the ``\idx{conservation of distinctions}''. It means that distinctions
don't merge. Here the paths merge, so information is lost. This is
irreversibility, but it is not thermodynamical irreversibility. The
rules of statistical mechanics are fundamentally based on the fact that,
at the deepest level, the laws of physics are consistent with the
conservation of information (distinctions). This is a strong
restriction, and without it we would get nowhere. It is a good physical
assumption, and it follows from the basic structure of classical
mechanics (\idx{Liouville's theorem}). There is a \idx{quantum mechanical}
version (\idx{unitarity}), but we will not do much with quantum mechanics in this course.
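One compact way to state the condition (my own formalization; the
symbols \(\Omega\) for the set of states and \(f\) for the one-step law
are introduced here and are not notation from the lecture): a
deterministic law is a map \(f: \Omega \rightarrow \Omega\), and
conservation of distinctions says that
\[
s \neq s' \;\Longrightarrow\; f(s) \neq f(s') .
\]
For a finite set of states this makes \(f\) a permutation, so the
states split into closed cycles, as in the first two laws above. The
last law violates the condition, since for example \(f(B) = f(O) = R\).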
\subsection{Conditions for applying thermodynamic theory}
The classical world is fully deterministic, but it appears
statistical because the system you study is coupled to a much larger system (a heat
bath) about which you know very little. You don't know enough about
the heat bath to specify its details. Things are random not
because there is any intrinsic randomness in the laws of physics, but
simply because you don't know enough. That's the basic idea of
entropy.

This principle of conservation of distinctions is very important. It is
rarely mentioned because it is so deeply assumed by anyone who does
classical physics. Susskind would call it the zeroth law of
thermodynamics, except that that name is already used for something else, so
perhaps we should call it the minus-first law of thermodynamics.

Classical mechanics deals with continuous systems with coordinates and momenta. Each
coordinate has a momentum associated with it. Of course there is
a deeper idea behind the notion of momentum, but for the purpose of this
class it can be just ordinary momentum.

So what is the state of a system? Well, for a simple die it was just
the color label. For two dice, it is a pair of colors. For a
single point particle, it is a collection of coordinates and the
corresponding momenta. The historical symbol for momentum is ``p''
and for coordinate it is ``x''. For an ordinary particle there are
three coordinates for position and three for momentum, so the phase
space of a particle is six-dimensional, and the configuration space
(the position coordinates) is three-dimensional.

What constitutes the state is the pair of \(x\) and \(p\). This is
because if you want to know where a particle will be next, you not only
need to know where it is, you also have to know how it is moving.

So for a single particle the pair \((x,p)\) is like the color on the top face of
the die: it labels the state of the particle. For a many-particle
system there are many \(x\)'s and many \(p\)'s.
But first, what is a history? What is an orbit? You have some
starting point, and then you have laws of motion, Newton's or others,
and you use them to follow the particle in time, in little
``ticks'' of some length. The motion is of course continuous,
but you can divide it up. You then get the trajectory in
phase space. There are different types of trajectories. In some
systems you just give a particle a little push and it goes off to
infinity, but for the kinds of systems we are interested in, which are
contained within finite boxes, the orbit will
usually wind up coming back close to where it started, though perhaps not to exactly the same point ;-)

What never happens is that trajectories merge. This is
the analog of saying that distinctions are preserved.
\begin{itemize}
\item Trajectories never cross or merge.
\item A given starting point never splits: the motion is always
deterministic.
\end{itemize}
The different trajectories, whatever they do, may be labelled by their
energy, and that energy doesn't change. You pick it once and for all
and it stays that way forever.

Consider a \idx{harmonic oscillator}\mXXX{When physicists don't know
what to do, they often look at how the harmonic oscillator will work
in whatever conditions are being studied.}. The motion in phase
space is just an ellipse. There are many trajectories, and
what distinguishes them is the energy of the oscillator and where you
started them. They don't cross, but that's a general principle.
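For concreteness, here is the ellipse written out (a standard textbook
form, not taken from the lecture; the mass \(m\), angular frequency
\(\omega\) and energy \(E\) are symbols introduced here). The energy
of the oscillator is
\[
E = \frac{p^2}{2m} + \frac{1}{2} m \omega^2 x^2 ,
\]
so a trajectory of fixed \(E\) is the ellipse
\[
\frac{x^2}{a^2} + \frac{p^2}{b^2} = 1, \qquad
a = \sqrt{\frac{2E}{m\omega^2}}, \qquad
b = \sqrt{2mE} .
\]
The area it encloses is \(\pi a b = 2\pi E/\omega\), so ellipses of
different energy are nested inside each other and never cross.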
\subsection{Liouville's theorem}
Never crossing and never merging is not quite enough for us, because
there is a situation that is almost as bad as merging:
the trajectories don't actually merge, but they converge, getting
asymptotically closer and closer together. For practical purposes
you would then lose the distinction (just wait long enough). But that
doesn't happen; trajectories don't merge in that way either. There
is a theorem that states this, \idx{Liouville's theorem}. It says:
\screenshot{liouville}{An illustration of Liouville's theorem.}
\begin{quote}
If you start with all of the points in a patch of phase space at time
\(t=0\), and follow each one of them for a certain length of time, the
volume of the patch of phase space doesn't change.\footnote{The product
of a length and a momentum has the units of ``\idx{action}''. For a
particle in three dimensions a phase-space volume has dimension \(l^3p^3\), so when we're
talking about ``volumes of phase space'', they are volumes measured in this
type of unit.}
\end{quote}
So if the points contract in one direction then they spread out in
another. This is another way to say that distinctions are not lost.
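The harmonic oscillator gives a quick check of the theorem (a sketch
under the usual textbook assumptions, with the mass \(m\) and angular
frequency \(\omega\) as symbols introduced here, not from the lecture).
A point that starts at \((x_0, p_0)\) moves to
\[
\begin{array}{lcl}
x(t) &=& x_0 \cos \omega t + \frac{p_0}{m\omega} \sin \omega t \\
p(t) &=& p_0 \cos \omega t - m\omega\, x_0 \sin \omega t
\end{array}
\]
This is a linear map of the \((x_0, p_0)\) plane, and its Jacobian
determinant is
\[
\frac{\partial x}{\partial x_0}\frac{\partial p}{\partial p_0}
- \frac{\partial x}{\partial p_0}\frac{\partial p}{\partial x_0}
= \cos^2 \omega t + \sin^2 \omega t = 1 ,
\]
so every patch keeps its area as it is carried around the ellipses.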
\subsection{Ergodicity}
Now an important point: can it happen that the volume of this patch
stays conserved, but it branches out in some horrible, fractal-like
way so that it apparently fills up the whole allotted volume of
phase space? The answer is ``yes'', and it usually happens. However, it
preserves its topology: there are no merges etc.

Liouville's theorem can be proved starting from either the
\idx{Lagrangian} form of classical mechanics or the \idx{Hamiltonian} form.
It all traces back to the \idx{principle of
least action}.

If we have a system that is enclosed so that it doesn't escape to
infinity, and it has the property that there are no merges and
splittings etc., then if the system moves through the phase space fast
enough (or you wait long enough), the a priori
probabilities are uniform over the accessible part of phase space
(under the constraint of conserved energy, this is a
surface in phase space). \mXXX{Wow!} Equal volumes of the phase space have equal
a priori probability.

A side note about \idx{ergodicity}. ``Ergodic'' is a bit related to ``chaotic'':
it means that the phase point wanders thoroughly throughout the
phase space and pretty much touches every point in it. In
the above, Susskind is assuming that the system is ergodic. When a
system is {\em not} ergodic it means that there are extra conserved quantities:
the phase space divides
up into different pieces which carry different values of the conserved quantities.
Then you have to pick a value of the conserved quantity.
\screenshot{harmonicoscilatorphasespace}{The phase space of the
harmonic oscillator (to the right), with orbits separated by energy
levels.}
If a path in phase space doesn't touch every point in the
phase space, it means that there are conserved quantities. For example,
for the harmonic oscillator the phase point stays on a given
ellipse. The time spent near each point of the ellipse is uniform
within that single-ellipse subspace.

{\em Comment from the audience}: You can map the points of the
unit circle to a circle with area two; that map is one-to-one but
does not preserve area. Susskind's response is that what we're looking at is
more than one-to-one: it is phase-space-volume preserving. The basic
justification for the principle of equal volumes in phase space really
is quantum mechanical. Statistical mechanics as it is dealt with in
this class is fully classical, so he's reluctant to go too much into it,
but if we were to do it we would divide the phase space into small cells
with a size set by \(\hbar\), specified with the maximum amount
of certainty allowable, and take it from there. But we won't. However,
it is only in quantum mechanics that the number of states of
a system is discrete, so that you can actually count them and make it
more like the ``loopy'' systems described earlier.
\section{Entropy}
Now let's get to \idx{entropy}. We've discussed energy. \idx{Energy}
is a conserved quantity. We fix it for a system, and for a
particular value of the energy the system may hop around among the states it is
allowed to be in. Averaged over time, the probability of finding the
system in any given state is then the same for all the allowed states.

Now \idx{entropy}. You might have thought that the next topic in
thermodynamics would be \idx{temperature}, and you might think that temperature is
more intuitive than entropy, but it's really not. You have, of course,
a bodily sense of temperature (hot and cold), but that doesn't tell you
what temperature really is. However, it is \idx{energy} and \idx{entropy} that are
the more basic concepts, so let's talk about them; but before we do
that, let's talk about probability distributions.
\screenshot{probabilitydistribution}{Just this probability
distribution from somewhere}
It has a horizontal axis for the states, and a vertical axis for the
probabilities. It may be discrete or continuous. The only two
requirements for something to be a probability distribution are that it
is non-negative everywhere and that it sums (or integrates) to one. In the
continuous case we talk about probability densities.

In the discrete case, suppose there is some quantity \(F\)
that depends on the state \(i\) of the system.
What is the average value of \(F\)? We'll use
the standard physicist's notation for averages and put brackets around
it: \(\ev{F}\). The definition of the average is:
\[
\ev{F} = \sum_i F(i) P(i)
\]
Just sum the values, and weight them by their probabilities.
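A one-line example (my own, not from the lecture): for a fair die with
faces numbered \(i = 1, \ldots, 6\), take \(F(i) = i\) and
\(P(i) = 1/6\). Then
\[
\ev{F} = \sum_{i=1}^{6} i \cdot \frac{1}{6} = \frac{21}{6} = 3.5 ,
\]
which is not itself a possible outcome: the average is a property of
the distribution, not of any single state.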
Now keep in mind that the space of states can be multidimensional, but
the states are still enumerated by the index. Suppose that the only thing we know
about the system is that it is in one of \(m\) possible states. The
probability of being anywhere other than in one of the \(m\) states is
zero, and the probability of being in any particular one of the \(m\) states is uniformly
\(1/m\). By the way, saying that we know ``nothing but'' this is another way of saying
that the system is ``equally probably'' in any one of the \(m\) states. A
statement of complete ignorance is equivalent to a statement of equal
probabilities.

The number \(m\) is a measure of our ignorance: the bigger \(m\), the bigger
our ignorance. There are many ways in which we can be ignorant: the states
may be very small (microscopic) and there may be very many of them, and
all you know is some restrictive information. But \(m\) is not the only
measure of ignorance; any monotonically increasing function of \(m\) would do.
The one called ``entropy'' is the logarithm of \(m\), and is written \(S\) (for
entropy; Susskind doesn't know why):
\[
\begin{array}{lccl}
S &=& &\log m \\
S&=& - &\log (1/m) \\
\end{array}
\]
The base of the logarithm can differ a bit between branches of
science. In physics it is always ``e'', but in information theory it
is usually ``2''. The conversion factor is just a factor \(\log(2)\).
Why take the logarithm? Because it's useful :-) Let's look at an
example: Assume that you have \(N\) coins each of which can be heads or
tails. Supposing you know nothing about the system, each of the \(2^N\)
configurations is equally probable, so the entropy is \(N\cdot \log(2)\).
Since the entropy is proportional to the number of coins, it makes
sense to talk about entropy per coin. By taking the logarithm, we
change the description from something really large, to something that
is additive with respect to the number of components in the system.
You have something that is proportional to the number of degrees of
freedom. Entropy, like energy, is something that adds up for a system.
With base-2 logarithms, entropy is measured in bits.
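To make the additivity explicit, here is a small worked check (the labels
\(m_1\), \(m_2\), \(S_1\) and \(S_2\) are introduced for this illustration only
and are not from the lecture). If two independent systems have \(m_1\) and
\(m_2\) equally probable states, the combined system has \(m_1 m_2\) equally
probable states, so
\[
S = \log (m_1 m_2) = \log m_1 + \log m_2 = S_1 + S_2
\]
For the coins each factor is \(2\), so \(N\) coins give \(S = \log 2^N = N \log 2\),
which is \(N\) bits if the logarithm is taken to base 2.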
Suppose instead that we know everything about the system, in this case exactly
which configuration we're in. That would mean that one of the states
has probability 1 and all the others have probability zero. That would give an
entropy of \(\log(1) = 0\). With this definition of entropy, entropy
isn't just defined by the system, but by your state of knowledge about
it. That is a bit ridiculous, because we will treat it as a quantity
of the system (we'll come to why), but really it's a function of the
probability distribution.
\screenshot{entropycontinous}{Entropy based on a general probability distribution}
Now let's move on to a more general definition. Consider a probability distribution
that has some width, close to zero (but not zero) and close to one
(but not 1) at various places. Can we generalize the number of
states (or rather the logarithm of it) to such a probability distribution? The
definition is simply the average \(-\ev{\log(P(x))}\) (with a minus sign,
since \(P < 1\) so \(\log(P) < 0\)). How do we calculate that? We just use the
formula for averages.
\[
\begin{array}{lccl}
\ev{F} &=& &\sum_i F(i)P(i) \\
S &=& -&\sum_i P(i) \log P(i)
\end{array}
\]
This is the final definition of entropy. The contributions from the
probable regions are more important than the others.
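As a sanity check (a sketch, with the example probabilities chosen here for
illustration), the general formula reduces to the earlier one when all \(m\)
states are equally probable, \(P(i) = 1/m\):
\[
S = -\sum_{i=1}^{m} \frac{1}{m} \log \frac{1}{m} = \log m
\]
For a distribution that is not uniform the entropy is smaller. Taking two
states with probabilities \(1/4\) and \(3/4\) gives
\[
S = -\frac{1}{4}\log\frac{1}{4} - \frac{3}{4}\log\frac{3}{4} = 2\log 2 - \frac{3}{4}\log 3 \approx 0.56
\]
which is less than \(\log 2 \approx 0.69\) for two equally probable states.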
\idx{Boltzmann's constant} traces back to the definition of
temperature. Energy and entropy together determine temperature. In
the early days, physicists, steam engineers etc.\ were interested in
temperature, but they really didn't know what temperature was. They knew
how to measure it (with thermometers). They didn't even have the idea
of an absolute zero temperature. The temperature of a gas is a
measure of the energy of the individual molecules of the gas (more or
less). The conversion factor between energy and temperature wasn't
known. Boltzmann (and Maxwell) realized that for an ideal gas temperature is a
measure of the energy of the molecules. The conversion factor was
unknown basically because they didn't know how many molecules there
were in a volume of gas, so Boltzmann's constant was unknown. Today we
know, so Boltzmann's constant is known. We can, if we like, work with
units of temperature which are really just units of energy: we can
measure temperature in Joules. That we use a separate unit is just a historical fact.
\idx{Boltzmann} did not know the value of his own constant. Boltzmann
was so depressed by the fact that he didn't know his own constant that
he committed suicide, and the next year, Susskind believes,
\idx{Einstein} figured out what Boltzmann's constant was from
Brownian motion (Susskind then says he doesn't know why Boltzmann
committed suicide, but believes it wasn't because he didn't know his
own constant). \idx{Newton} btw. didn't know his own constant either.
It took many years for Newton's constant to be measured. \idx{Planck}
knew his constant ;) One of the hard constants to measure was the
electric charge. It was easy to measure the ratio of charge to mass,
then it took some time before \idx{Millikan} figured out how to
measure the charge separate from the mass. Whoever figured out that
electrons had charge didn't know the value of the charge (it may have
been \idx{Benjamin Franklin}, Susskind thinks).
\subsection{Thermal equilibrium}
Let us now postulate the existence of something called ``\idx{thermal
equilibrium}'' of a system. Let's state a necessary, but not
necessarily sufficient condition: Thermal equilibrium is not a
property of an isolated system. If you have a truly isolated system,
it is {\em not} in thermal equilibrium. It has an energy, and it's fixed.
Thermal equilibrium is a property of a system \(A\) in contact with a much
bigger system \(B\), called the ``\idx{heat bath}''. It is a very big system
with many more degrees of freedom. The combined system \((A+B)\) has a
given total energy (that's an assumption). For our purposes the
combined system can be thought of as a closed and isolated system. It
is either contained within insulating walls that don't allow any heat
energy in or out, or something similar. \(A+B\) can be an isolated system
but \(A\) is not. One more thing: \(A+B\) is isolated, but neither \(A\) nor \(B\) is
isolated. They weakly interact. This means that the energy of the
interaction is very small compared to the energy of either \(A\) or \(B\),
but the interactions, whatever they are, allow energy to go back and
forth between \(A\) and \(B\). The whole system has a definite energy, so the
whole system has a definite value of ``zilch'' (``zilch'' being
energy in this case). The whole system will move around its
phase space on a surface of constant energy, but it will move around
and hop from point to point. Among other things, the points will give
different ways of partitioning the given amount of energy into the
energy of the heat bath and the energy of \(A\). Neither of these energies
will have a definite predictable value. Each will fluctuate, and will
have a \idx{probability distribution}.
\subsection{The relation between energy, temperature and entropy}
\screenshot{energyofaovertime}{Time evolution of the energy of A}
If you wait long enough there will be a probability distribution for the energy of
\(A\). The various configurations of \(A\) may be discrete or
continuous, but over time the energy will fluctuate. The
probability of an energy of \(A\) is a function of both the energy of
\(A\) and the average energy of \(A\).
\[
P(i, E_{\mbox{total}})
\]
There are two things you can calculate if you know the probability
distribution. You can calculate the average energy of \(A\), and you can
calculate the entropy of \(A\) (\(S_A = -\ev{\log P_A}\)).
\[
T = \frac{\partial{E}}{\partial{S}}, \qquad \Delta E_{\mbox{one bit}} = T \log 2
\]
In general entropy increases with energy, which leads to the question:
``How much change of energy is necessary to change the entropy by an
amount of one bit''? This change of energy per unit of entropy, is
called ``\idx{temperature}''. To state it colorfully: {\em Temperature is {\idx the energy
needed to hide one more bit of information}}.
An example of this is \idx{Landauer's principle} in computing: If you want
to erase a bit of information in a computer, remembering that we can't
destroy information (principle of conservation of distinction), you
wind up putting at least that bit into the heat bath surrounding the computer.
How much energy do you put out of the computer and into the heat bath to hide a
bit? Well, it's the temperature times the logarithm of two. That is
the energy you put into the environment when you erase a bit.
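To get a feeling for the size of this energy, here is a rough numerical
sketch. It assumes the temperature is given in kelvin, so that Boltzmann's
constant \(k_B\) is needed to convert it to joules, and it assumes a room
temperature of \(T = 300\,\mathrm{K}\); both the symbol \(k_B\) and the chosen
temperature are assumptions made for this illustration.
\[
E_{\mbox{erase}} = k_B T \log 2 \approx (1.38\times 10^{-23}\,\mathrm{J/K}) \cdot (300\,\mathrm{K}) \cdot 0.693 \approx 2.9 \times 10^{-21}\,\mathrm{J}
\]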
\idx{Boltzmann's constant} incidentally is basically the inverse of
\idx{Avogadro's number} (or something very close to it). This means that when
you change a bit of information at a temperature \(T\) (which is the
temperature of the surroundings), the energy involved is very small in everyday units.
The standard thermodynamic definition of entropy also differs from the
definition of entropy used here by a factor of \(k_B\) (upstairs or
downstairs, Susskind can't remember :-).
Einstein realized that the quantities of thermodynamics fluctuate.
This would have consequences for small impurities in the system and
knock them around. What he did was essentially to use statistical mechanics
to calculate the fluctuations of various quantities like
pressure and so on, and demonstrate how that would knock small
impurities about in the system.
Next time we will work out and derive the Boltzmann distribution. The
Boltzmann distribution is the probability distribution for the states
in the system \(A\).
\chapter{Finding Boltzmann's constant}
We'll start this presentation by reviewing some math we are going to
need. It's nothing fancy. Just stuff many of us have had in college or
high school, but we'll need it so we review it and then we'll use it.
\section{Lagrange multipliers}
We're constantly minimizing or maximizing things subject to
constraints. The thing we're most likely to maximize or minimize is
entropy, but we'll get to that later. \idx{Thermal equilibrium} is a
state of \idx{maximum entropy}, so we want to
find the maximum of the entropy as a function of the variables that define it.
Then there are \idx{constraints}. We don't just want to maximize a
function, we want to do so subject to some constraints. Some
plausible constraints are that we know the total energy of the
system, or the average energy of the system. For example we might
have a container full of a lot of molecules, and we study one small
piece of it. Suppose we're told what the energy of that part is, so we know what
the average energy is, and we want to {\em maximize the entropy
subject to the constraint that total energy is fixed}. That is the
kind of problem that occurs over and over. A related problem could be
that we know the total amount of electric charge in a region, and we
may want to maximize the total amount of entropy given a total amount
of electric charge, or the total amount of angular momentum or
whatever.
\screenshot{contourofxsquaredysquared}{Contour plot of \(x^2 +y^2\) and \(x + 2y = 1\)}
We constantly face problems of maximization and minimization of functions of
several variables subject to constraints on those variables. What
does that mean? Let's take an example: We want to minimize \(x^2 +
y^2\). Now anyone can minimize that, it's of course at \(x=y=0\). But
suppose we add a constraint and say that we want the minimum, given
that \(x + 2y = 1\). To draw a picture we can draw the contours of \(x^2 +
y^2\). Right at the center it is minimized, and as we move away the
value gets larger. We then draw the line \(x + 2y = 1\) (figure
\ref{contourofxsquaredysquared}). By inspection it's not so hard to
see that the minimum must be where the straight line is closest to the
origin. How do you find that point? One of the ways to do that would
be to solve the linear equation with respect to one of the variables,
plug that into the quadratic function and minimize. Doing that you get
\(x=1 - 2y\) which when substituted into the quadratic function gives
\((1 - 4y + 4y^2) + y^2 = 1 - 4y + 5 y^2\). Taking the derivative of
this gives \(10y - 4\) which is equal to zero at \(y = 2/5\). By
substituting into the equation of the straight line we find \(x + 2\cdot
2/5 = 1\) which makes \(x = 1/5\), so the minimum is at the point
\((1/5, 2/5)\). That's one way of doing it, but in many cases the
constraint is just too complicated to solve. It might be a very
complicated function. However, there is another way of doing it, and
that is called the method of \idx{Lagrange multipliers}.
The rule for Lagrange multipliers is: Suppose you have some
function of some variables, e.g. \(F(x,y)\), and we want to minimize
it subject to the constraint that some other function \(G(x,y) = 0\).
The trick is to multiply the constraint function by a new variable
often called \(\lambda\) (lambda), called a \idx{Lagrange
multiplier}. You then add that multiplier times the constraint function to
the function you want to minimize, and get a new function:
\[
F(x, y) + \lambda G(x,y)
\]
What we then do is to minimize the new function, ignoring the
constraint. You minimize by differentiating with respect to the
variables and you set the result to zero, so we get a set of equations:
\[
\begin{array}{lclcl}
\frac{\partial{F}}{\partial{x}} &+& \lambda \frac{\partial{G}}{\partial{x}} &=& 0 \\
\frac{\partial{F}}{\partial{y}} &+& \lambda \frac{\partial{G}}{\partial{y}} &=& 0 \\
\end{array}
\]
We have two equations and two unknowns, so we can solve for \(x\) and
\(y\), but what about the \(\lambda\)? The answer will depend on
\(\lambda\). What we do with the \(\lambda\) is to adjust it so that
the original constraint \(G(x,y) = 0\) is really satisfied.
We'll see {\em how} it works in the example we solved above, then we'll see
{\em why} it works.
\subsection{How it works}
We assume that \(F(x,y) = x^2 + y^2\) and we want to minimize that
subject to the constraint that \(G(x,y) = x + 2y -1 = 0 \). We follow
the procedure above and concoct the function:
\[
Z (x,y) = F(x, y) + \lambda G(x,y)
\]
Then we find the partial derivatives and set them to zero:
\[
\begin{array}{lclcl}
\frac{\partial{F}}{\partial{x}} &+& \lambda\frac{\partial{G}}{\partial{x}} &=& 0 \\
2x &+& \lambda &=&0\\
&& x &=& - \lambda/2\\
\end{array}
\]
and
\[
\begin{array}{lclcl}
\frac{\partial{F}}{\partial{y}} &+& \lambda \frac{\partial{G}}{\partial{y}} &=& 0 \\
2y &+& \lambda 2 &=& 0\\
&& y&=& - \lambda\\
\end{array}
\]
We now choose \(\lambda\) so that the constraint \(x + 2y -1 = 0\) is
satisfied. We do this by substituting the solutions for \(x\) and
\(y\) in terms of \(\lambda\) and get \(-\lambda/2 - 2\lambda -1 =
-5\lambda/2 -1 = 0\) which implies that \(\lambda = -2/5\). We can
then use this value of \(\lambda\), substitute back, and get that
\(x=1/5\) and \(y = 2/5\), which is exactly the same result as we got
above, as it should be :-)
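As a quick check (a worked verification added for illustration), the point
\((1/5, 2/5)\) satisfies the constraint, and we can evaluate the function there:
\[
\begin{array}{lcl}
x + 2y &=& 1/5 + 4/5 = 1 \\
F(1/5, 2/5) &=& 1/25 + 4/25 = 1/5
\end{array}
\]
Any other point on the line gives a larger value, e.g. \(F(1,0) = 1\).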
\subsection{Why it works}
It only takes a little bit of algebra to show why this works. This is
in fact a very general method: the constraint didn't have to be
linear, and the function to optimize certainly didn't have to be quadratic.
We'll proceed by solving a general problem both with substitutions and
the Lagrange multiplier method.
\subsubsection{Using substitutions}
We start with a function \(F(x,y)\) subject to some constraint
\(G(x,y) = 0\), and we want to minimize \(F\). We can go about this
by solving \(G(x,y) = 0\) with respect to one of the variables, so we get
some curve e.g. \(y(x)\). We then plug that into \(F\), so that we
get \(F(x,y(x))\). This is now a function of a single variable \(x\)
since \(y\) is now a definite function of \(x\). The next step is to
differentiate with respect to \(x\) to find the minimum with respect
to \(x\):
\[
\begin{array}{lcl}
F(x,y)&=& F(x,y(x)) \\
D(F(x,y)) &=& \frac{\partial F}{\partial x} + \frac{\partial F}{\partial y}\frac{d y}{d x}
\end{array}
\]
\screenshot{smalldifferentialonG}{A small differential on G does not change G}
We then set \(D(F(x,y)) = 0\). But what about \(\frac{d y}{d x}\)?
Let's see what we can figure out about it. The curve \(G(x,y) = 0\) is a
curve of ``constant \(G\)''. Let's assume we're moving along it and
make a small change along it; by definition this will not change
\(G\). So, we can also write:
\[
\frac{\partial G}{\partial x} dx + \frac{\partial G}{\partial y} dy = 0
\]
this says ``G doesn't change when we make a small differential
displacement''. The reason we did this is that it allows us to solve for
\(\frac{dy}{dx}\) in terms of things involving the constraint
\(G\):
\[
\frac{dy}{dx} = \frac{-\frac{\partial G}{\partial x}}{\frac{\partial{G}}{\partial y}}
\]
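For the linear constraint of the earlier example this can be checked directly
(a small illustration): with \(G(x,y) = x + 2y - 1\) we have
\(\frac{\partial G}{\partial x} = 1\) and \(\frac{\partial G}{\partial y} = 2\), so
\[
\frac{dy}{dx} = -\frac{1}{2}
\]
which is indeed the slope of the line \(y = (1-x)/2\).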
Now we plug that into the equation above and get:
\[
\begin{array}{lclcl}