Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Part II -- More Specific

After freshman calculus, it would be good to start with abstract algebra. There you will get handy with (relatively simple versions of, and, thus, a good place to start -- and, I have to say, put you ahead of a surprisingly large fraction of the best chaired professors of computer science) theorems, proofs, sets, axioms, groups (once I published a paper in multi-variate, distribution-free statistical hypothesis tests where the core of the math used some group theory), rings, fields, the integers, rationals, reals, and complex numbers and their leading properties, some important algorithms, e.g., Euclidean greatest common divisor (also the way to find multiplicative inverses in the finite field of the integers modulo a prime number and, thus, the core of a super cute way to do numerical matrix inversion exactly using only short precision arithmetic), number theory and prime numbers (crucial in cryptography), vector spaces (the core of multi-variate statistics and more), and some of the classic results. Some of this material is finite mathematics at times of high interest in computing -- e.g., error correcting coding. For such a course, a good teaching math department would be good. Have a good prof read and correct your early efforts at writing proofs -- could help you a lot.

But starting with linear algebra is also good. There are lots of good books. The grand classic, best as a second text, is P. Halmos, Finite Dimensional Vector Spaces. He wrote this in 1942 after he had gotten his Ph.D. from J. Doob (long the best guy in the US in stochastic processes) and was an assistant to von Neumann at the Institute for Advanced Study (for much of the 20th century a good candidate for the best mathematician in the world). The book is really a finite dimensional introduction to von Neumann's Hilbert space theory.

In

http://www-history.mcs.st-and.ac.uk/BigPictures/Ulam_Feynman...

von Neumann is the guy on the right. The guy on the left is S. Ulam (has a cute result the French mathematician LeCam once called tightness I used once). The guy in the middle is just a physicist! Of course, in that picture they were working up ways to save 1 million US casualties in the Pacific in WWII and were astoundingly successful. Ulam is best known for the Teller-Ulam configuration which in its first test yielded an energy of 15 million tons of TNT. There are rumors that von Neumann worked out the geometry for the US W-88, 475 kilotons in a small package as at

http://en.wikipedia.org/wiki/W88

Von Neumann also has a really nice book on quantum mechanics the first half of which is a totally sweetheart introduction to linear algebra.

Of course, Ulam was an early user of Monte Carlo simulation, still important.

Other linear algebra authors include G. Strang, E. Nering, Hoffman and Kunze, R. Bellman, B. Noble, R. Horn. Also for numerical linear algebra, e.g., G. Forsythe and C. Moler, the LINPACK materials, etc. There are free, on-line PDF versions for some of these. Since the subject has not changed much since Halmos in 1942, don't necessarily need the latest paper copy at $100+!



Part III -- Statistics

For statistics, that is a messy field. It has too many introductory texts that over simplify the subject and not enough well done intermediate or advanced texts.

Also the subject has essentially a lie: They explain that a random variable has a distribution. Right, it does. Then they mention some common distributions, especially Gaussian, exponential, Poisson, multinomial, and uniform. Then the lie: The suggestion is that in practice we collect data and try to find the distribution. Nope: Mostly not. Mostly in practice, we can't find the distribution, not even of one random variable and much less likely for the joint distribution of several random variables (that is, of a vector valued random variable). Or, to estimate the distribution of a vector valued random variable commonly would encounter the curse of dimensionality and require really big big data. Instead, usually we use limit theorems, techniques that don't need the distribution, or in some cases make, say, a Gaussian assumption and get a first-cut approximation.

Early in my career I did a lot in applied statistics but later concluded I'd done a lot of slogging through a muddy swamp of low grade material.

A clean and powerful first cut approach to statistics is just via a good background in probability: With this approach, for statistics, you take some data, regard that as values of some random variables with some useful properties, stuff the data into some computations, and get out data that you regard as the values of some more random variables which are the statistics. The big deal is what properties the output random variables have -- maybe they are unbiased, minimum variance, Gaussian, maximum likelihood, estimates of something, etc.

For this work you will want to know the classic limit theorems of probability theory -- weak and strong laws of large numbers, elementary and advanced (Lindeberg-Feller) versions of the central limit theorem, the law of the iterated logarithm (and its astounding application to an envelope of Brownian motion), and martingales and the martingale convergence theorem ("the most powerful limit theorem in mathematics" -- it's possible to have making applications of that result much of a successful academic career). And, generally beyond the elementary statistics books, you will want to understand sufficient statistics (and the astounding fact that, for the Gaussian, sample mean and variance are sufficient with generalizations to the exponential family) and, also, U-statistics where the order of the input data makes no difference (and order statistics are always sufficient). Sufficient statistics is really from (a classic paper by Halmos and Savage and) the Radon-Nikodym theorem (with a famous, very clever, cute proof by von Neumann), and that result is in, say, the first half of W. Rudin, Real and Complex Analysis (with von Neumann's proof).

Also with the Radon-Nikodym theorem, can quickly do the Hahn decomposition and, then, knock off a very general proof of the Neyman-Pearson result in statistics. How 'bout that!

Thus, to some extent to do well with statistics, both for now and for the future, especially if you want to do some work that is original, you will need much of the rest of a good ugrad major in math and the courses of a Master's in selected topics in pure/applied math.

So, for such study, sure, at one time Harvard's famous Math 55 used the Halmos text above along with W. Rudin, Principles of Mathematical Analysis (calculus done very carefully and a good foundation for more), and Spivak, Calculus on Manifolds, e.g., for people interested in more modern approaches to relativity theory (but Cartan's book is available in English now). It may be that you are not interested in relativity theory or the rest of mathematical physics -- fine, and that can help you set aside some topics.

Then, Royden, Real Analysis and the first half of Rudin's R&CA as above, along with any of several alternatives, cover measure theory and the beginnings of functional analysis. Measure theory does calculus again and in a more powerful way -- in freshman calculus, want to integrate a continuous function defined on a closed interval of finite length, but in measure theory get much more generality.

And measure theory also provides the axiomatic foundation for modern probability theory and of random variables. Seeing that definition of a random variable is a real eye opener, for me a life-changing event: Get a level of understanding of randomness that cuts out and tosses into the dumpster or bit bucket nearly all the elementary and popular (horribly confused) treatments of randomness.

Functional analysis? Well, in linear algebra you get comfortable with vector spaces. So, for positive integer n and the set of real numbers R, you get happy in the n-dimensional vector space R^n. But, also be sure to see the axioms of a vector space where R^n is just the leading example. You want the axioms right away for, say, the (affine) vector subspace of R^n that is the set of all solutions of a system of linear equations. How 'bout that!

Then in functional analysis, you work with functions and where each function is regarded as a point in a vector space. The nicest such vector space is Hilbert space which has an inner product (essentially the same as angle or in probability covariance and in statistics correlation) and gives a metric in which the space is complete -- that is, as in the real numbers but not in the rationals, a sequence that appears to converge really has something to converge to. Then wonder of wonders (really, mostly due just to the Minkowski inequality), the set of all real valued random variables X such that the expectation (measure theory integral) E[X^2] is finite is a Hilbert space, right, is complete. Amazing, but true.

Then in Hilbert space, get to see how to approximate one function by others. So, in particular, get to see how to approximate a random variable don't have by ones you do have -- might call that statistical estimation and would be correct.

Then can drag out the Hahn-Banach result and do projections, that is, least squares, that is, in an important sense (from a classic random variable convergence result you should be sure to learn), best possible linear approximations. And maybe such an approximation is the ad targeting that makes you the most money.

So, that projection is a baby version of regression analysis. There's a problem here: The usual treatments of regression analysis make a long list of assumptions that look essentially impossible in practice to verify or satisfy and, thus, leave one with what look like unjustified applications.

Nope: Just do the derivations yourself with fewer assumptions and get fewer results but still often enough in practice. And they are still solid results.

For the usual text derivations, by assuming so much, they get much more, especially lots of confidence intervals. In practice often you can use those confidence interval results as first-cut, rough measures of goodness of fit or some such.

But the idea of just a projection can give you a lot. In particular there is an easy, sweetheart way around the onerous, hideous, hated over fitting -- it seems silly that having too much data hurts, and it shouldn't hurt and doesn't have to!

And the now popular practice in machine learning of just fitting with learning data and then verifying with test data, with some more considerations which are also appropriate, can also be solid with even fewer assumptions.

Go for it!


Fabulous write up. Do you somewhere besides here to publish it so it's easier for people to wander across in the future? I'd be interested to see what else you've written.


> Fabulous

All praise welcome!

> I'd be interested to see what else you've written.

For some more math, more technical,

https://news.ycombinator.com/item?id=8919311




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: