After freshman calculus, it would be good to start with abstract algebra. There you will get handy with relatively simple versions of theorems, proofs, sets, axioms, groups, rings, fields, the integers, rationals, reals, and complex numbers and their leading properties -- a good place to start and, I have to say, enough to put you ahead of a surprisingly large fraction of the best chaired professors of computer science. (Once I published a paper on multi-variate, distribution-free statistical hypothesis tests where the core of the math used some group theory.) You will also meet some important algorithms, e.g., the Euclidean greatest common divisor algorithm, which is also the way to find multiplicative inverses in the finite field of the integers modulo a prime number and, thus, the core of a super cute way to do numerical matrix inversion exactly using only short precision arithmetic. There is also number theory and prime numbers (crucial in cryptography), vector spaces (the core of multi-variate statistics and more), and some of the classic results. Some of this material is finite mathematics, at times of high interest in computing -- e.g., error correcting coding. For such a course, a math department with good teaching would be good. Have a good prof read and correct your early efforts at writing proofs -- that could help you a lot.
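To make the GCD remark concrete, here is a minimal pure-Python sketch (names mine, not from any particular text) of the extended Euclidean algorithm and how it yields multiplicative inverses modulo a prime:

```python
# Extended Euclidean algorithm: returns (g, x, y) with a*x + b*y == g == gcd(a, b).
def egcd(a, b):
    if b == 0:
        return a, 1, 0
    g, x, y = egcd(b, a % b)
    return g, y, x - (a // b) * y

# Multiplicative inverse of a in the finite field of integers modulo a prime p:
# from a*x + p*y == 1 we get a*x == 1 (mod p), so x mod p is the inverse.
def inverse_mod(a, p):
    g, x, _ = egcd(a % p, p)
    if g != 1:
        raise ValueError("a is not invertible modulo p")
    return x % p

print(inverse_mod(3, 7))  # 5, since 3 * 5 = 15 = 1 (mod 7)
```

With such inverses, Gaussian elimination can be carried out exactly in the field of integers modulo a prime, which is the kernel of the exact matrix inversion trick mentioned above.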
But starting with linear algebra is also good. There are lots of good books. The grand classic, best as a second text, is P. Halmos, Finite Dimensional Vector Spaces. He wrote it in 1942, after he had gotten his Ph.D. under J. Doob (long the best guy in the US in stochastic processes) and while he was an assistant to von Neumann at the Institute for Advanced Study (von Neumann being, for much of the 20th century, a good candidate for the best mathematician in the world). The book is really a finite dimensional introduction to von Neumann's Hilbert space theory.
In

http://www-history.mcs.st-and.ac.uk/BigPictures/Ulam_Feynman...

von Neumann is the guy on the right. The guy on the left is S. Ulam (he has a cute result, one the French mathematician Le Cam once called tightness, that I used once). The guy in the middle is just a physicist! Of course, in that picture they were working up ways to save 1 million US casualties in the Pacific in WWII and were astoundingly successful. Ulam is best known for the Teller-Ulam configuration, which in early tests yielded energies of up to 15 million tons of TNT. There are rumors that von Neumann worked out the geometry for the US W-88, 475 kilotons in a small package, as at

http://en.wikipedia.org/wiki/W88
Von Neumann also has a really nice book on quantum mechanics (Mathematical Foundations of Quantum Mechanics), the first half of which is a totally sweetheart introduction to linear algebra.
Of course, Ulam was an early user of Monte
Carlo simulation, still important.
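For readers who have not seen Monte Carlo simulation, here is the classic toy example -- a minimal sketch, not from Ulam's own work: estimate pi from the fraction of random points in the unit square that land inside the quarter circle.

```python
import random

# Monte Carlo estimate of pi: the quarter circle of radius 1 has area pi/4,
# so about that fraction of uniform random points in the unit square land inside it.
def estimate_pi(n, seed=0):
    rng = random.Random(seed)
    inside = sum(1 for _ in range(n) if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * inside / n

print(estimate_pi(100000))  # roughly 3.14
```

The error shrinks like 1/sqrt(n), which is slow but, crucially, independent of dimension -- one reason the method is still important.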
Other linear algebra authors include G. Strang, E. Nering, Hoffman and Kunze, R. Bellman, B. Noble, and R. Horn. For numerical linear algebra there are, e.g., G. Forsythe and C. Moler and the LINPACK materials. There are free, on-line PDF versions of some of these. Since the subject has not changed much since Halmos in 1942, you don't necessarily need the latest paper copy at $100+!
Statistics is a messy field. It has too many introductory texts that oversimplify the subject and not enough well done intermediate or advanced texts.

Also, the subject has essentially a lie: The texts explain that a random variable has a distribution. Right, it does. Then they mention some common distributions, especially Gaussian, exponential, Poisson, multinomial, and uniform. Then the lie: The suggestion is that in practice we collect data and try to find the distribution. Nope: Mostly not. Mostly in practice we can't find the distribution, not even of one random variable, and much less likely the joint distribution of several random variables (that is, of a vector valued random variable). Estimating the distribution of a vector valued random variable would commonly encounter the curse of dimensionality and require really big data. Instead, usually we use limit theorems, techniques that don't need the distribution, or in some cases make, say, a Gaussian assumption and get a first-cut approximation.
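A quick illustration of why the limit-theorem route works -- a pure-Python sketch, with the exponential distribution chosen by me as an example of something far from Gaussian:

```python
import random, statistics

# Central limit theorem in action: means of samples from a skewed
# distribution (exponential with rate 1) are approximately Gaussian,
# so a Gaussian first-cut approximation for the sample mean is
# reasonable even without knowing the underlying distribution.
rng = random.Random(0)
n = 50  # sample size per mean
means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
         for _ in range(2000)]

# For exponential(1), each sample mean has expectation 1
# and standard deviation 1/sqrt(n), about 0.141 here.
print(statistics.fmean(means))  # close to 1.0
print(statistics.stdev(means))  # close to 0.141
```

The point: we never estimated the exponential distribution itself; the limit theorem told us what we needed about the statistic.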
Early in my career I did a lot in applied
statistics but later concluded I'd done a
lot of slogging through a muddy swamp of
low grade material.
A clean and powerful first cut approach to statistics is just via a good background in probability: With this approach, for statistics, you take some data, regard it as values of some random variables with some useful properties, stuff the data into some computations, and get out data that you regard as the values of some more random variables -- the statistics. The big deal is what properties the output random variables have -- maybe they are unbiased, minimum variance, Gaussian, maximum likelihood estimates of something, etc.
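A small sketch of that view (my example, not from any particular text): push repeated samples through two computations and watch the unbiasedness of the outputs show up empirically.

```python
import random, statistics

# Data in, statistics out: the sample mean is unbiased for the true mean,
# and the n-1 sample variance is unbiased for the true variance.
rng = random.Random(1)
true_mean, true_var = 3.0, 4.0

def sample_stats(n):
    data = [rng.gauss(true_mean, true_var ** 0.5) for _ in range(n)]
    return statistics.fmean(data), statistics.variance(data)  # variance divides by n-1

# Average the output statistics over many repeated samples:
# unbiasedness means the averages sit close to the true parameters.
reps = [sample_stats(10) for _ in range(4000)]
print(statistics.fmean(m for m, _ in reps))  # close to 3.0
print(statistics.fmean(v for _, v in reps))  # close to 4.0
```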
For this work you will want to know the classic limit theorems of probability theory -- the weak and strong laws of large numbers, the elementary and advanced (Lindeberg-Feller) versions of the central limit theorem, the law of the iterated logarithm (and its astounding application to an envelope of Brownian motion), and martingales and the martingale convergence theorem ("the most powerful limit theorem in mathematics" -- it's possible to make applications of that result much of a successful academic career). And, generally beyond the elementary statistics books, you will want to understand sufficient statistics (and the astounding fact that, for the Gaussian, the sample mean and variance are sufficient, with generalizations to the exponential family) and, also, U-statistics, where the order of the input data makes no difference (and the order statistics are always sufficient). Sufficient statistics really comes from (a classic paper by Halmos and Savage and) the Radon-Nikodym theorem (with a famous, very clever, cute proof by von Neumann), and that result is in, say, the first half of W. Rudin, Real and Complex Analysis (with von Neumann's proof).
Also with the Radon-Nikodym theorem, you can quickly do the Hahn decomposition and then knock off a very general proof of the Neyman-Pearson result in statistics. How 'bout that!
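For readers who haven't met Neyman-Pearson, a hedged sketch of what the result says (my toy example, not the general Radon-Nikodym proof): for a simple null versus a simple alternative, the most powerful test at a given false-alarm level rejects when the likelihood ratio is large.

```python
import math, random

# Example: test N(0,1) against N(1,1). The likelihood ratio is
# exp(x - 1/2), increasing in x, so the test reduces to "reject when x > c".
def gauss_pdf(x, mu):
    return math.exp(-(x - mu) ** 2 / 2.0) / math.sqrt(2.0 * math.pi)

def likelihood_ratio(x):
    return gauss_pdf(x, 1.0) / gauss_pdf(x, 0.0)  # equals exp(x - 0.5)

# Monte Carlo check of the two error rates at threshold c = 1.5.
rng = random.Random(0)
c, n = 1.5, 100000
alpha = sum(likelihood_ratio(rng.gauss(0, 1)) > likelihood_ratio(c) for _ in range(n)) / n
power = sum(likelihood_ratio(rng.gauss(1, 1)) > likelihood_ratio(c) for _ in range(n)) / n
print(alpha)  # near 1 - Phi(1.5), about 0.067
print(power)  # near 1 - Phi(0.5), about 0.309
```

The lemma says no other test with false-alarm rate alpha can beat this test's power.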
Thus, to do well with statistics, both for now and for the future, especially if you want to do some work that is original, you will to some extent need much of the rest of a good undergraduate major in math and the courses of a Master's in selected topics in pure/applied math.

So, for such study, sure: At one time Harvard's famous Math 55 used the Halmos text above along with W. Rudin, Principles of Mathematical Analysis (calculus done very carefully and a good foundation for more) and Spivak, Calculus on Manifolds, e.g., for people interested in more modern approaches to relativity theory (but Cartan's book is available in English now). It may be that you are not interested in relativity theory or the rest of mathematical physics -- fine, and that can help you set aside some topics.
Then Royden, Real Analysis and the first half of Rudin's R&CA as above, along with any of several alternatives, cover measure theory and the beginnings of functional analysis. Measure theory does calculus again in a more powerful way -- in freshman calculus you want to integrate a continuous function defined on a closed interval of finite length, but in measure theory you get much more generality.

And measure theory also provides the axiomatic foundation for modern probability theory and random variables. Seeing that definition of a random variable is a real eye opener, for me a life-changing event: You get a level of understanding of randomness that cuts out and tosses into the dumpster or bit bucket nearly all the elementary and popular (horribly confused) treatments of randomness.
Functional analysis? Well, in linear algebra you get comfortable with vector spaces. So, for a positive integer n and the set of real numbers R, you get happy in the n-dimensional vector space R^n. But also be sure to see the axioms of a vector space, where R^n is just the leading example. You want the axioms right away for, say, the (affine) subspace of R^n that is the set of all solutions of a system of linear equations. How 'bout that!
Then in functional analysis you work with functions, where each function is regarded as a point in a vector space. The nicest such vector space is Hilbert space, which has an inner product (essentially the same as angle, or in probability covariance and in statistics correlation) and gives a metric in which the space is complete -- that is, as in the real numbers but not in the rationals, a sequence that appears to converge really has something to converge to. Then, wonder of wonders (really, mostly due just to the Minkowski inequality), the set of all real valued random variables X such that the expectation (a measure theory integral) E[X^2] is finite is a Hilbert space -- right, it is complete. Amazing, but true.
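The "inner product is essentially angle, covariance, correlation" remark can be seen directly on data -- a small sketch with made-up numbers of my own:

```python
import math, statistics

# On centered data vectors, the Euclidean inner product is (up to a factor)
# the sample covariance, and the cosine of the angle between the two
# centered vectors is exactly the sample (Pearson) correlation.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]

def center(v):
    m = statistics.fmean(v)
    return [vi - m for vi in v]

def inner(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

xc, yc = center(x), center(y)
cos_angle = inner(xc, yc) / math.sqrt(inner(xc, xc) * inner(yc, yc))
print(cos_angle)  # this cosine IS the correlation of x and y
```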
Then in Hilbert space you get to see how to approximate one function by others. So, in particular, you get to see how to approximate a random variable you don't have by ones you do have -- you might call that statistical estimation and would be correct.
Then you can drag out the Hahn-Banach result and do projections, that is, least squares, that is, in an important sense (from a classic random variable convergence result you should be sure to learn), best possible linear approximations. And maybe such an approximation is the ad targeting that makes you the most money.
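The projection view can be sketched in a few lines of pure Python (simple linear regression via the normal equations; the data here are mine, chosen to fit exactly): the defining property of the projection is that the residual is orthogonal to everything you projected onto.

```python
# Least squares as projection: project y onto the span of a constant and x.
def fit_line(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope

x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
a, b = fit_line(x, y)
print(a, b)  # 1.0 2.0, since y = 1 + 2x exactly

# Orthogonality check: the residual is orthogonal to both regressors.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(sum(residuals))                               # ~0 (orthogonal to the constant)
print(sum(r * xi for r, xi in zip(residuals, x)))   # ~0 (orthogonal to x)
```

That orthogonality is the whole content of "projection" here, and it needs none of the usual long list of regression assumptions.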
So, that projection is a baby version of regression analysis. There's a problem here: The usual treatments of regression analysis make a long list of assumptions that look essentially impossible to verify or satisfy in practice and, thus, leave one with what look like unjustified applications.

Nope: Just do the derivations yourself with fewer assumptions; you get fewer results, but often still enough in practice. And they are still solid results.
The usual text derivations, by assuming so much, get much more, especially lots of confidence intervals. In practice you can often use those confidence interval results as first-cut, rough measures of goodness of fit or some such.
But the idea of just a projection can give you a lot. In particular there is an easy, sweetheart way around the onerous, hideous, hated overfitting -- it seems silly that having too much data hurts, and it shouldn't hurt and doesn't have to!

And the now popular practice in machine learning of just fitting with training data and then verifying with test data, with some more considerations which are also appropriate, can also be solid with even fewer assumptions.
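A tiny sketch of why the train/test practice catches overfitting -- a toy setup of my own, where the "right" model is just the sample mean and the overfit model memorizes the training data:

```python
import random, statistics

# y is pure noise, independent of x, so the best predictor is a constant.
rng = random.Random(0)
train = [(rng.random(), rng.gauss(0, 1)) for _ in range(30)]
test = [(rng.random(), rng.gauss(0, 1)) for _ in range(1000)]

mean_model = statistics.fmean(y for _, y in train)

def nearest_neighbor(x):
    # Memorizes the training data: predicts the y of the closest training x.
    return min(train, key=lambda p: abs(p[0] - x))[1]

def mse(data, predict):
    return statistics.fmean((y - predict(x)) ** 2 for x, y in data)

print(mse(train, nearest_neighbor))          # 0.0 -- "perfect" on training data
print(mse(train, lambda x: mean_model))      # about 1 (the noise variance)
print(mse(test, nearest_neighbor))           # about 2 -- the overfit exposed
print(mse(test, lambda x: mean_model))       # about 1 -- honest on fresh data
```

Training error alone would crown the memorizer; the held-out test data reverses the verdict.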