In Part 2, we finished defining all the formal terms and symbols you see in the StackOverflow question on the Hindley-Milner algorithm, so now we’re ready to translate what that question was asking about, namely the rules for deducing statements about type inference. Let’s get down to it!
In Part 1, we said what the building blocks of the Hindley-Milner formalization would be, and in this post we’ll thoroughly define them, and actually formulate the formalization:
Formalizing the concept of an expression
We’ll give a recursive definition of what an expression is; in other words, we’ll state what the most basic kind of expression is, we’ll say how to create new, more complex expressions out of existing expressions, and we’ll say that only things made in this way are valid expressions.
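A recursive definition like that maps naturally onto a small algebraic data type. Here's a hedged sketch in Python, assuming the standard lambda-calculus-style grammar (variables, application, abstraction, let) that Hindley-Milner is usually stated over; the constructor names are illustrative, not necessarily the ones the series uses.

```python
# A recursive definition of "expression": a base case (variables),
# three ways to build bigger expressions from smaller ones, and
# nothing else counts. The four cases are the usual HM grammar.
from dataclasses import dataclass

class Expr:
    """An expression is exactly one of the cases below."""

@dataclass
class Var(Expr):      # the most basic expression: a variable
    name: str

@dataclass
class App(Expr):      # applying one expression to another: f x
    fn: Expr
    arg: Expr

@dataclass
class Abs(Expr):      # a function literal: \x -> body
    param: str
    body: Expr

@dataclass
class Let(Expr):      # let name = defn in body
    name: str
    defn: Expr
    body: Expr

# Building a complex expression out of existing ones:
# let id = \x -> x in id id
example = Let("id", Abs("x", Var("x")), App(Var("id"), Var("id")))
```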
Mike Sierchio wrote a cool post on password strength, and the concept of entropy. As he points out, entropy isn’t always entropy. That confusion is apparently not uncommon, as it’s been asked about on IT Security Stack Exchange as well. So what’s really going on?
Let’s step back for a sec and fill in some context. What are we trying to do? We’d like some way to measure how hard it is to guess our passwords, a number that serves as a heuristic standard of password strength. But there are two fundamentally different things we might want to measure:
How hard would it be for someone to guess your password with essentially no knowledge of how you created your password?
How hard would it be for someone to guess your password if they knew the process used to generate it? This is of course assuming that there is a process, for example some script that does some Math.rand-ing and produces a password string.
The term “entropy” has been used to refer to both kinds of calculations, but they’re clearly entirely different things: the former essentially takes a string as input, the latter takes a random process as input. Hence, “entropy is not entropy.”
Alright, well if entropy isn’t entropy, let’s see what entropies are. We’ll look at the standard mathematical formulation of the random-process-entropy which comes from information theory. And we’ll look at the function used to calculate particular-string-entropy in one of the most popular password strength testers. And that’s all we’re going to do, we’ll look at how the calculations are done, without dwelling too much on the differences between the two approaches or what their use cases are.
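To make the two inputs concrete, here's a hedged sketch in Python. `shannon_entropy` is information theory's definition and takes a random process (a probability distribution over outcomes); `naive_string_entropy` takes a particular string and is only in the spirit of what password meters compute; the charset-counting scheme here is an illustrative assumption, not any particular tester's formula.

```python
import math

def shannon_entropy(probabilities):
    """Random-process entropy: H = -sum(p * log2(p)) over outcomes."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def naive_string_entropy(password):
    """Particular-string estimate: length * log2(size of charset used).
    The charset buckets below are an illustrative assumption."""
    pool = 0
    if any(c.islower() for c in password): pool += 26
    if any(c.isupper() for c in password): pool += 26
    if any(c.isdigit() for c in password): pool += 10
    if any(not c.isalnum() for c in password): pool += 33  # printable symbols
    return len(password) * math.log2(pool) if pool else 0.0

# A fair coin flip is exactly 1 bit of entropy:
print(shannon_entropy([0.5, 0.5]))   # -> 1.0
# A string estimate takes a string, not a process:
print(naive_string_entropy("Tr0ub4dor"))
```

Note how the signatures alone show the difference: one function never sees a password, the other never sees a probability.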
I’ve been learning a bit of statistical computing with R on the side lately, from Chris Paciorek’s Berkeley course. I just got introduced to knitr and it’s damned sweet! It’s an R package which takes a LaTeX file with embedded R and produces a pure LaTeX file containing the output of the R code (similar to how Rails renders an .html.erb file into an .html file). It makes it super easy to embed statistical calculations, graphs, and all the good stuff R gives you right into your TeX files. It lets you put math in your math, so you can math while you math.
I’ve got a little project which:
- Runs a Python script which will use Selenium to scrape a web page for 2012 NFL passing statistics.
- “Knits” a TeX file with embedded R that cleans the raw scraped data, produces a histogram of touchdown passes by team, and displays the teams with the fewest and the most touchdowns.
- Compiles the resulting TeX file and opens the resulting PDF.
- Cleans up any temporary work files.
This question came up as a joke during a team standup a few months ago. Although the obvious answer is “no,” if you’re willing to play fast and loose with your metaphysics for a bit, the answer can be “yes” and there’s a cute solution that ties together binary numbers and binary trees. This post itself is a bit of a joke in that it’s just for fun, but it might be nice to see the familiar concepts of binary numbers and binary trees in a new light.
The obvious answer is “no”
Let’s quickly see why the real-life answer is “no.” But first we should lay out the assumptions implicit in the problem. We’re going to assume that at some point in time, everyone was either entirely Swedish or entirely non-Swedish. There’s a chicken-and-egg problem that we’re sweeping under the rug here, but that’s what rugs are for. Next, we’re assuming that every person after that point in time has their Swedishness wholly and equally determined by their parents’ Swedishness. So if mom is 17% Swedish and dad is 66% Swedish, then baby is ½ x 17% + ½ x 66% = 41.5% Swedish.
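The mixing rule is just averaging, and since the weights are halves, the fractions involved are exactly the binary fractions the post's solution hinges on. A quick sketch using Python's `Fraction` so the arithmetic stays exact:

```python
from fractions import Fraction

def baby(mom, dad):
    """A child's Swedishness is the average of the parents'."""
    return Fraction(1, 2) * mom + Fraction(1, 2) * dad

# mom 17% Swedish, dad 66% Swedish -> baby 41.5%
print(baby(Fraction(17, 100), Fraction(66, 100)))  # -> 83/200, i.e. 41.5%
```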
Before we figure out what it means, let’s get an idea for why we care in the first place. Daniel Spiewak’s blog post gives a really nice explanation of the purpose of the HM algorithm, in addition to an in-depth example of its application:
Functionally speaking, Hindley-Milner (or “Damas-Milner”) is an algorithm for inferring value types based on use. It literally formalizes the intuition that a type can be deduced by the functionality it supports.
Okay, so we want to formalize an algorithm for inferring types of any given expression. In this post, I’m going to touch on what it means to formalize something, then describe the building blocks of the HM formalization. In Part 2, I’ll flesh out the building blocks of the formalization. Finally in Part 3, I’ll translate that StackOverflow question.
Whether you’re learning math for pleasure or profit (jumping on the Big Data bandwagon), there are times when it may seem intimidating, overwhelming, confounding, etc. My assertion is that if you think like a programmer, you already have a leg up when it comes to learning math.
I gave today’s Tech Talk at Pivotal Labs SF on this very topic. The outline of the talk is listed below, and these are the slides. I’ll be following up with a series of mini blog posts extracting the contents of the talk.
- Program-y translations of math notation
- Why is math hard to read? Conventions!
- Translation 1: Mathematical functions (lines, sinusoids)
- Translation 2: Sigma notation, and other indexed operations
- Translation 3: Set notation and quantifiers
- Resources for learning on your own
- Program-y proof that the infinity of the reals is bigger than the infinity of the naturals (yes, there are different sizes of infinity)
- Groundwork: The real numbers
- Groundwork: Cardinality, the math way to say “how many”
- Groundwork: Proof by contradiction, the math way to say “when pigs fly”
- The proof that |R| > |N| for programmers (read: finitists)
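The heart of that last bullet is Cantor's diagonal argument, and it has a very program-y flavor. Here's a hedged sketch, assuming we represent an infinite binary sequence as a function from index to bit: given any list of such sequences, we can build one that provably differs from every sequence in the list (the representation and names here are illustrative, not the talk's).

```python
def diagonal(sequences):
    """Return a sequence that differs from sequences[i] at index i.
    (Only meaningful for i < len(sequences) in this finite toy.)"""
    return lambda i: 1 - sequences[i](i)

# A toy "enumeration" of three sequences:
seqs = [
    lambda i: 0,      # 000000...
    lambda i: 1,      # 111111...
    lambda i: i % 2,  # 010101...
]

d = diagonal(seqs)
# d disagrees with seqs[i] at position i, so it can't be in the list:
print([d(i) for i in range(3)])  # -> [1, 0, 1]
```

Since no enumeration can contain its own diagonal, no list indexed by the naturals covers all binary sequences, which is the engine behind |R| > |N|.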