The Product Rule

At some point in every calculus class, we must discover and prove the product rule for derivatives. How a calculus teacher chooses to do this probably says a lot about their pedagogy and educational priorities.

Some teachers might simply write the rule on the board, expect students to accept it, and immediately launch into examples. Should we try to let the students discover the formula on their own? Should we perhaps lead them into a trap by suggesting that the derivative of a product of two functions is the product of the derivatives and let them find counterexamples? Should we state the theorem, but let the students try to prove it on their own? Should we perhaps have an entire mini-lesson on what it even means to have a product of two functions?

Should we try to motivate the entire discussion with a particularly intuitive pair of functions whose product has some real-world significance? Should we interpret the product of two functions geometrically, as the area of the corresponding rectangle? If properly motivated and explained, do we actually gain anything by doing the rigorous proof via limits?

The Status Quo

As a foil, here is the introduction to and proof of the product rule from the textbook that I teach out of.

[Image: the textbook's statement and proof of the product rule]

I understand that textbooks have limited space and are no substitute for a full curriculum, but I think we can all agree that this is awful. There is no motivating example and no geometric intuition is called upon. The author merely proves the theorem, dryly and without understanding or purpose. The author even admits that the proof is unsatisfying and unedifying and apologizes in advance for its opaque maneuvers! Some proofs involve “clever steps that may appear unmotivated to a reader”.

In other words, reader, I am clever and you are not. This proof crucially involves cleverness, but since you’re not clever, you never would have thought of it yourself. I will perform some algebraic manipulations here in blue — they may appear unmotivated to you, but that’s your fault. In fact, I haven’t motivated them at all, but I don’t need to explain my clever methods to you. This is a calculus textbook after all, not a motivational textbook on explaining one’s cleverness. I have proved the rule, what else do you want me to do? If you want meaning and understanding, please consult your local religious figures for guidance.

Can we do better? Yes, I think we can. My friend James Key and I have used the phrase “tyranny of the blue text” to refer to totally opaque and unmotivated algebraic moves in textbook math proofs, since the offending expressions are often rendered in blue. Proving an important theorem to students via seemingly arbitrary, unmotivated algebraic tricks is an intellectual crime, and we should endeavor to banish the tyranny of the blue text from our classrooms and from our consciousness.

Idea #1: A Word Problem

Suppose a particular factory produces toys 24 hours a day.

Let W(t) be a continuous model of the number of workers at the factory at time t. The value of this function fluctuates throughout the day as workers leave and arrive according to their various particular schedules.

Let E(t) be a continuous model of the number of toys produced per worker per hour at time t. This function measures the overall efficiency of the factory at a particular time of day. This could reasonably be expected to fluctuate due to external factors like the electricity supply, the weather (solar panels!), or the tiredness of the workers.

Then (WE)(t) = W(t) \cdot E(t) is the total rate at which the factory produces toys, measured in toys per hour, at a particular time t.

W'(t) is the rate of change of W with respect to t, in other words the rate at which the workforce at the factory is rising or falling, as workers leave and arrive.

E'(t) is the rate of change of E with respect to t, in other words the rate at which the efficiency of the factory (on a per worker basis) is changing at a particular time t.

(WE)'(t) is the rate at which the factory’s output is changing, at a given time t. In other words, if (WE)'(t) is positive and big, the factory’s output is increasing a lot at that moment, but if (WE)'(t) is positive and small, the factory’s output is increasing only a little at that time t.

Using our own common sense, what should (WE)'(t) depend on? Surely W'(t) is relevant, since even if efficiency holds steady, if workers are pouring into the factory at time t, the factory’s output will go up. But surely E'(t) is also relevant, since even if the workforce holds steady, if the workers are becoming more efficient, then the factory’s overall output will go up. But the current size of the workforce, W(t), is also relevant, since if, for example, efficiency is going up but the current workforce is very small, those gains in efficiency will not translate into large increases in output. And the current efficiency, E(t), is also relevant, since if, for example, workers are pouring into the factory, but the current toy production per worker per hour is very small, then those extra workers will also not translate into large increases in output.

Just by having these conversations, we prime our students to have a deep appreciation of what the product rule is about, what differentiation is about, why we would ever want to multiply two functions, and why we would ever want to learn calculus.

This year, when giving this exact introduction to the product rule, I had a student guess the product rule right there on the spot, just from talking out the logic of the toy factory.
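To make the toy-factory reasoning concrete, here is a minimal numerical sketch. The formulas for W and E below are made up purely for illustration (nothing in the story fixes them). The sketch compares the measured rate of change of the factory's output against the combination W'(t)E(t) + W(t)E'(t) that the conversation points toward.

```python
import math

# Hypothetical models, chosen only for illustration.
def W(t):
    """Number of workers at hour t (assumed smooth daily cycle)."""
    return 50 + 20 * math.sin(2 * math.pi * t / 24)

def E(t):
    """Toys per worker per hour at hour t (assumed smooth daily cycle)."""
    return 6 + 2 * math.cos(2 * math.pi * t / 24)

def derivative(f, t, h=1e-5):
    """Central-difference estimate of f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)

def output(t):
    """Total production rate (WE)(t), in toys per hour."""
    return W(t) * E(t)

def product_rule_guess(t):
    """W'(t)E(t) + W(t)E'(t): each rate of change weighted by the other factor's current size."""
    return derivative(W, t) * E(t) + W(t) * derivative(E, t)

for t in (3.0, 9.0, 15.0, 21.0):
    measured = derivative(output, t)
    guess = product_rule_guess(t)
    naive = derivative(W, t) * derivative(E, t)  # the tempting "product of derivatives"
    print(f"t={t:5.1f}  measured={measured:9.4f}  guess={guess:9.4f}  naive={naive:9.4f}")
```

With these made-up models, the measured and guessed columns should agree to several decimal places, while the naive product of derivatives should not.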

Idea #2: A Geometric Interpretation

Some calculus textbooks motivate the product rule geometrically, by interpreting the product of two functions as the area of the rectangle whose side lengths are the values of the two functions at a given time.

[Image: rectangle diagram for the product rule, part 1]

This sloppy picture is taken from a presentation I gave at an NCTM conference a few years ago about calculus proofs. The area of the rectangle with side lengths f and g represents the value of the fg function at a particular time. A moment later, both f and g change, and the derivative wants to measure the size of the change. Here again, we can read the product rule directly off the diagram. A “proof” like this was probably totally sufficient to a mathematician of the 18th century, but in a post-Cauchy/Weierstrass world, we need to verify these intuitions via the definition of the derivative as a limit.

But we can hold onto our geometric intuition and have our rigor as well!

[Image: rectangle diagram for the product rule, part 2]

The same diagram can be used to interpret that mysterious numerator in the definition of the derivative and avoid the tyranny of the blue text. The diagram motivates, but the rigor is preserved, since the limit just under the rectangle can be expanded and verified to be equivalent to the limit just above the diagram. But this time we are doing the proof with meaning and understanding.
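For readers without the picture, here is one way to write out the algebra that a rectangle diagram of this kind encodes. This is a sketch of the standard argument and not a transcription of the image. It assumes f and g are the side lengths, h is the small change in x, and both functions are differentiable at x.

```latex
\begin{aligned}
f(x+h)g(x+h) - f(x)g(x)
  &= \underbrace{[f(x+h)-f(x)]\,g(x)}_{\text{first strip}}
   + \underbrace{f(x)\,[g(x+h)-g(x)]}_{\text{second strip}}
   + \underbrace{[f(x+h)-f(x)]\,[g(x+h)-g(x)]}_{\text{corner}} \\[4pt]
\frac{f(x+h)g(x+h) - f(x)g(x)}{h}
  &= \frac{f(x+h)-f(x)}{h}\,g(x)
   + f(x)\,\frac{g(x+h)-g(x)}{h}
   + \frac{f(x+h)-f(x)}{h}\,[g(x+h)-g(x)] \\[4pt]
  &\xrightarrow{\;h\to 0\;} f'(x)g(x) + f(x)g'(x) + f'(x)\cdot 0
\end{aligned}
```

The corner term vanishes in the limit because g, being differentiable at x, is also continuous there, so g(x+h)-g(x) goes to zero.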

Teaching the product rule this way might even be considered “standard”. The only drawback is that you do kind of have to be clever to think to do all this! Would a student come up with the idea to make a rectangle on their own? I’m not sure. I don’t claim to be the first or the only one to use a rectangle to discover, motivate, and even prove the product rule, but the following is something I have never seen anywhere before and that I just came up with a month ago. It is the excuse to write this blog post.

Idea #3: Start with a particular example

[Image: a worked example of differentiating a product from the limit definition]

I’m getting a bit tired, so I pasted this picture in. The idea is simple. Combine a function with a known derivative and a generic second function f, and then just try to find the derivative of the product. No understanding or geometric intuition is required, and probably no teacher help or input is required either.

A reasonable calculus student who is confident, good at algebra, and experienced with limits and computing derivatives from the definition should be able to get to the end by just doing what comes naturally.
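As a sketch of what that computation might look like, suppose the known function is x^2. That choice is an assumption made here for illustration, since the worked example itself lives in the picture. Expanding (x+h)^2 and grouping terms is all that is needed, with f assumed differentiable at x:

```latex
\begin{aligned}
\frac{(x+h)^2 f(x+h) - x^2 f(x)}{h}
  &= \frac{x^2\,[f(x+h)-f(x)] + (2xh+h^2)\,f(x+h)}{h} \\[4pt]
  &= x^2\cdot\frac{f(x+h)-f(x)}{h} + (2x+h)\,f(x+h) \\[4pt]
  &\xrightarrow{\;h\to 0\;} x^2 f'(x) + 2x\,f(x)
\end{aligned}
```

The last step uses the continuity of f at x, which follows from differentiability, so that f(x+h) approaches f(x).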

I have not tried this before in class, so I can’t say how well it will work. But if it works, then the students have proved themselves to be every bit as clever as is required, and they’ve done it on their own. The teacher can then add extra layers of understanding to the general phenomenon of the product rule and lead the class through the general proof, possibly using geometry as a guide. But the students who figured out this particular example will feel that they could have done the general case on their own. And they will be right.

Area models for multiplication throughout the K-12 curriculum

Let’s take a look at area models, shall we?

My thesis today is that area models should be ubiquitous across the entire curriculum because mathematics is a sense-making discipline. As math educators, we ought to encourage our students to take every opportunity to visualize their mathematics in an effort to illuminate, explain, prove, and bring intuition.

So let’s take a walk through the K-12 math curriculum and highlight the use of area models as they might apply to arithmetic, algebra, and calculus.

base-ten-blocks

Arithmetic

Students experience area models for the first time in elementary school as they work to visualize multi-digit multiplication. The same model can be used for division, just running the logic in reverse–that is, seeking an unknown “side length” rather than an unknown area. And Base Ten Blocks can be used to help students understand the building blocks of our number system.

Here’s how you might work out 27\times 54:

27\times 54 = (20+7)(50+4)=(20)(50)+(20)(4)+(7)(50)+(7)(4)

area-model-multiplication

27\times 54=1000+80+350+28=1458

The advantage of using a visual model like this is that you can easily see your calculation and explain why constituent calculations, taken together, faithfully produce the desired result. If you do a “man on the street” interview with most users or purveyors of the standard algorithm, you would almost certainly not get crystal clear explanations for why it produces results. For a further discussion of area models for multi-digit multiplication, see this article, or read Jo Boaler’s now famous book Mathematical Mindsets.
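The same bookkeeping can be sketched in a few lines of Python. This is only an illustration of the area model, with helper names invented here: each factor is split by place value, and every pairing of parts is one cell of the rectangle.

```python
from itertools import product

def place_value_parts(n):
    """Split a non-negative integer into place-value parts, e.g. 27 -> [20, 7]."""
    digits = str(n)
    parts = [int(d) * 10 ** (len(digits) - i - 1) for i, d in enumerate(digits)]
    return [p for p in parts if p != 0] or [0]

def area_model(a, b):
    """Return the cells of the area model for a * b as (side, side, area) triples."""
    return [(x, y, x * y) for x, y in product(place_value_parts(a), place_value_parts(b))]

cells = area_model(27, 54)
for x, y, area in cells:
    print(f"{x} x {y} = {area}")
print("total:", sum(area for _, _, area in cells))  # 1000 + 80 + 350 + 28 = 1458
```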

Algebra

In middle school, as students first encounter algebra, they may use area models to support their algebraic reasoning around multiplying polynomials. And in an Algebra 2 course they may learn about polynomial division and support their thinking using an area model in the same way they used area models to do division in elementary school. Here Algebra Tiles can be used as physical manipulatives to support student learning.

Here’s how you might work out (x+4)(2x+3):

(x+4)(2x+3)=(x)(2x)+(x)(3)+(4)(2x)+(4)(3)

area-model-polynomials

(x+4)(2x+3)=2x^2+3x+8x+12=2x^2+11x+12

Notice also that if you let x=10, you obtain the following result from arithmetic:

14\times 23 = 200+110+12=322

The Common Core places special emphasis on making such connections. I agree with this effort, even though I can also commiserate with fellow math teachers who say things like, “My Precalculus students still use the box method for multiplying polynomials!” We definitely want to move our students toward fluency, but perhaps we should wait for them to realize that they don’t need their visual models. Eventually most students figure out on their own that it would be more efficient to do without the models.
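The box method for (x+4)(2x+3), and its connection to 14\times 23, can also be sketched in code. This is a minimal illustration with function names chosen here: polynomials are stored as coefficient lists, lowest degree first, and each pair of terms is one cell of the box.

```python
def box_multiply(p, q):
    """Multiply polynomials given as coefficient lists, lowest degree first.

    Each pair (i, j) is one cell of the box; cells with the same total
    degree i + j are collected into the same coefficient.
    """
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return result

def evaluate(coeffs, x):
    """Evaluate a polynomial (lowest degree first) at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))

x_plus_4 = [4, 1]       # x + 4
two_x_plus_3 = [3, 2]   # 2x + 3
product_coeffs = box_multiply(x_plus_4, two_x_plus_3)
print(product_coeffs)                # [12, 11, 2], i.e. 2x^2 + 11x + 12
print(evaluate(product_coeffs, 10))  # 322, matching 14 * 23
```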

Calculus

Later in high school, as students first study calculus, area models can be used to bring understanding to the Product Rule–a result that is often memorized without any understanding. Even the usual “textbook proof” justifies but does not illuminate.

Here’s an informal proof of the Product Rule using an area model:

The “change in” the quantity L\cdot W can be thought of as the change in the area of a rectangle with side lengths L and W. That is, let A=LW. As we change L and W by amounts \Delta L and \Delta W, we are wondering how the overall area changes (that is, what is \Delta A?).

If the side length L increases by \Delta L, the new side length is L+\Delta L. Similarly, the width is now W+\Delta W. It follows that the new area is:

A+\Delta A=(L+\Delta L)(W+\Delta W)=LW+L\Delta W+W\Delta L+\Delta L\Delta W

area-model-product-rule

Keeping in mind that A=LW, we can subtract this quantity from both sides to obtain:

\Delta A=L\Delta W+W\Delta L+\Delta L\Delta W

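Treating L and W (and therefore A) as functions of a variable x, and dividing through by the change \Delta x that produced \Delta L and \Delta W, gives: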

\frac{\Delta A}{\Delta x}=L\cdot\frac{\Delta W}{\Delta x}+W\cdot\frac{\Delta L}{\Delta x}+\frac{\Delta L}{\Delta x} \frac{\Delta W}{\Delta x} \Delta x

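Taking limits as \Delta x\to 0, the last term vanishes, because its two difference quotients approach finite derivatives while the trailing factor \Delta x goes to zero. This gives the desired result: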

\frac{dA}{dx}=L\cdot\frac{dW}{dx}+W\cdot\frac{dL}{dx}
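To watch the corner term \Delta L\,\Delta W/\Delta x fade away numerically, here is a small sketch. The side-length functions are hypothetical, picked only because their derivatives are easy to write down.

```python
import math

# Hypothetical side lengths as functions of x, chosen only for illustration.
def L(x):
    return x ** 2 + 1

def W(x):
    return math.sin(x) + 2

def difference_quotient_parts(x, dx):
    """Return (dA/dx, (L*dW + W*dL)/dx, dL*dW/dx) for a step of size dx."""
    dL = L(x + dx) - L(x)
    dW = W(x + dx) - W(x)
    dA = L(x + dx) * W(x + dx) - L(x) * W(x)
    main = (L(x) * dW + W(x) * dL) / dx
    corner = dL * dW / dx
    return dA / dx, main, corner

x = 1.0
# L*dW/dx + W*dL/dx using the known derivatives cos(x) and 2x
exact = L(x) * math.cos(x) + W(x) * 2 * x

for dx in (0.1, 0.01, 0.001, 0.0001):
    total, main, corner = difference_quotient_parts(x, dx)
    print(f"dx={dx:<7} dA/dx={total:.6f}  corner={corner:.6f}  exact={exact:.6f}")
```

As dx shrinks, the corner column should head toward zero and dA/dx should settle near the exact value.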

Conclusion

If you’re like me, you once looked down on area models as being for those who can’t handle the “real” algebra. But if we take that view, there’s a lot of sense-making that we’re missing out on. Area models are an important tool in our tool belt for bringing clarity and connections to our math students.

Okay, so last question: Base Ten Blocks exist, and Algebra Tiles exist. What do you think? Shall we manufacture and sell Calculus DX Tiles © ? 🙂

What does a point on the normal distribution represent?

Here’s another Quora answer I’m reposting here. This is the question, followed by my answer.

What does the value of a point on the normal distribution actually represent, if anything?

It’s important to note the difference between discrete and continuous random variables as we answer this question. Though naming conventions vary, I think most mathematicians would agree that a discrete random variable has a Probability Mass Function (PMF) and a continuous random variable has a Probability Density Function (PDF).

The words mass and density go a long way in helping to capture the difference between discrete and continuous random variables. For a discrete random variable, the PMF evaluated at a certain x gives the probability of x. For a continuous random variable, the PDF at a certain x does not give the probability at all; it gives the density. (As advertised!)

So what is the probability that a continuous random variable takes on a certain value? For example, assume a certain type of fish has length X that is normally distributed with mean 22 cm and standard deviation 1.6 cm. What is the probability of selecting a fish exactly 26 cm long? That is, what is P(X=26)?


The answer, for any continuous random variable, is zero. More formally, if X is a continuous random variable with support \mathcal{S}, then P(X=x)=0 for all x\in\mathcal{S}.

For the fish problem, this actually does make sense. Think about it. You pull a fish out of the water which you claim is 26 cm long. But is it really 26 cm long? Exactly 26 cm long? Like 26.00000… cm long? With what precision did you make that measurement? This should explain why the probability is zero.

If instead you want to ask about the probability of getting a fish between 25.995 and 26.005 cm long, that’s perfectly fine, and you’ll get a positive answer for the probability (it’s a small answer :-).
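For the fish example, the distinction can be checked with Python's standard-library `statistics.NormalDist`. This is a small sketch using the mean and standard deviation given above; the variable names are my own.

```python
from statistics import NormalDist

# Fish lengths from the example: mean 22 cm, standard deviation 1.6 cm.
fish = NormalDist(mu=22, sigma=1.6)

density_at_26 = fish.pdf(26)                      # height of the curve, not a probability
p_interval = fish.cdf(26.005) - fish.cdf(25.995)  # probability of a length in (25.995, 26.005)
approx = density_at_26 * 0.01                     # density times interval width

print(f"density at 26 cm:         {density_at_26:.6f}")
print(f"P(25.995 < X < 26.005):   {p_interval:.8f}")
print(f"density x width estimate: {approx:.8f}")
```

The interval probability should come out close to the density multiplied by the width of the interval, which is exactly the sense in which the PDF is a density. A density is also not capped at 1: a normal distribution with a small enough standard deviation has a peak well above 1.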

Let’s return to the words mass and density for a second. Think about what those words mean in a physics context. Imagine having a point mass–this is in an ideal case–then the mass of that point is defined by a discrete function. In reality, though, we have density functions that assign a density to each point in an object.

Think about a 1-dimensional rod with density function \rho(x)=x, x\in (0,10). What is the mass of this rod at x=5? Of course, the answer is zero! This should make intuitive sense. Of course, we can get meaningful answers to questions like: What is the mass of the rod between x=5 and x=6? The answer is \int_5^6 x\,dx=5.5.
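Here is a minimal sketch of the rod computation, using a midpoint Riemann sum in place of the exact integral. It also shows the mass of a window around x=5 shrinking toward zero as the window narrows.

```python
def rho(x):
    """Density of the rod from the example: rho(x) = x on (0, 10)."""
    return x

def mass(a, b, n=1000):
    """Midpoint Riemann sum approximating the integral of rho over [a, b]."""
    width = (b - a) / n
    return sum(rho(a + (k + 0.5) * width) for k in range(n)) * width

print(mass(5, 6))  # mass between x=5 and x=6, about 5.5
for eps in (0.1, 0.01, 0.001):
    # mass of the window (5 - eps, 5 + eps), about 10 * eps
    print(eps, mass(5 - eps, 5 + eps))
```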

Does the physical understanding of mass vs density clear things up for you?

Halloween worksheet for Calculus

Anyone who has been in math classes knows those corny worksheets with a joke on them. When you answer the questions, the solution to the (hilarious) joke is revealed. Did I mention these worksheets are corny? But when you get to Calculus or higher math classes, you get nostalgic for those old pre-algebra worksheets your middle school teacher gave you. I think I speak for all of us when I say this.

Not to fear, here’s a very corny joke worksheet I made just for your Calculus students. Print this on orange paper and hand it out on Halloween. When kids successfully solve the problems and discover the solution, give them candy.

Here is the solution:

Happy Halloween. Enjoy!

PS: I normally use my blog to share deep insights about math education or to discuss interesting higher level mathematics. But I was inspired to share more of my day-to-day activities and worksheets because of Rebecka Peterson at Epsilon-Delta. She has shared some great resources, which I’ve stolen and used in my classroom. Thanks, Rebecka!

In Defense of Calculus

In the following article, I expand and clarify my arguments that first appeared in this post.

A colleague recently sent me another article (thanks Doug) claiming that Statistics should replace Calculus as the most important math class for high school students.

Which peak to climb? (CCL)

The argument usually goes: Most kids won’t use Calculus. Statistics is more useful.

As you might know already, I disagree that the most important reason for teaching math is because it is useful. I don’t disagree that math is useful. Math is not just useful, but essential for STEM careers. So “usefulness” is certainly one reason for teaching math. But I don’t think it’s the most important reason for teaching math.

The most important reason for teaching math is because it is beautiful and eternal. Math is the single place in school where students can find deductive certainty and eternal truth. Even when human activity ceases, math will persist. When we study math, we tap into something bigger than ourselves. We taste the divine!

We are teaching students to think deductively—like a mathematician would. This is such an important area of knowledge for students to explore. They need to know what it means to prove something. A proof provides a kind of truth that is unattainable in other subjects, even the hard sciences. At best, the scientific method is still just guesses compared to math.

This is the most important thing we pass on to our students. Though some will, most of our students will not directly use the math we teach. This is actually true about every subject in high school. Most students will not remember the details of The Great Gatsby or remember the chemical formula for Ammonium Nitrate. But we do hope they learn the bigger skills: analyzing text and thinking scientifically. In math, the “bigger skills” are the ones I outlined above—proof, logic, reasoning, argumentation, problem solving. They can always look up the formulas.

Math is a subject that stands on its own and it is not the servant of other subjects. If we treat math as simply a subject that serves other subjects by providing useful formulas, we turn math into magic. We don’t need to defend math in this way. It stands on its own!

Calculus = The Mona Lisa

If students can take both Statistics and Calculus, that is ideal. But if I had to choose one, I would pick Calculus. The development of “the Calculus” is one of the great achievements of mankind and it’s a real crime to go through life never having been exposed to it. Can you imagine never having seen The Mona Lisa? Calculus is like the Mona Lisa of mathematics :-).

What is a Point of Inflection?

Simple question right?

This website, along with the Calc book we’re teaching from, defines it this way:

A point where the graph of a function has a tangent line and where the concavity changes is a point of inflection.

inflection point

No debate about there being an inflection point at x=0 on this graph.

There’s no debate about functions like f(x)=x^3-x, which has an unambiguous inflection point at x=0.

In fact, I think we’re all in agreement that:

  1. There has to be a change in concavity. That is, we require that for x<c we have f''(x)<0 and for x>c we have f''(x)>0, or vice versa.*
  2. The original function f has to be continuous at x=c. That is, f(x)=\frac{1}{x} does not have a point of inflection at x=0 even though there’s a concavity change because f isn’t even defined there. If we then piecewise-define f so that it carries the same values except at x=0 for which we define f(0)=5, we still don’t consider this a point of inflection because of the lack of continuity.
vertical tangent

The point of inflection x=0 is at a location without a first derivative. A “tangent line” still exists, however.

But the part of the definition that requires f to have a tangent line is problematic, in my opinion. I know why they say it this way, of course. They want to capture functions that have a concavity change across a vertical tangent line, such as f(x)=\sqrt[3]{x}. Here we have a concavity change (concave up to concave down) across x=0 and there is a tangent line (x=0) but f'(0) is undefined.

questionable inflection point

Is x=0 a point of inflection? Some definitions say no, because no tangent line exists.

So it’s clear that this definition is built to include vertical tangents. It’s also obvious that the definition is built in such a way as to exclude cusps and corners. Why? What’s wrong with a cusp or corner being a point of inflection? I would claim that the piecewise-defined function f(x) shown above has a point of inflection at x=0 even though no tangent line exists here.

I prefer the definition:

A point where the graph of a function is continuous and where the concavity changes is a point of inflection.

That is, I would only require the two conditions listed at the beginning of this post. What do you think?
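A concavity change can be spot-checked numerically. The sketch below estimates the sign of f'' a little to the left and right of a candidate point. It is only a sample-based check, not a proof, and the `corner` function is a hypothetical stand-in, not the piecewise function pictured above.

```python
import math

def second_difference(f, x, h=1e-3):
    """Central-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

def concavity_changes_at(f, c, offset=0.5):
    """True if the estimated f'' has opposite signs at c - offset and c + offset.

    Only one point is sampled on each side, so this is a spot check.
    """
    left = second_difference(f, c - offset)
    right = second_difference(f, c + offset)
    return left * right < 0

def cubic(x):
    return x ** 3 - x

def cube_root(x):
    # Real cube root; a negative float raised to 1/3 is not a real number in Python.
    return math.copysign(abs(x) ** (1 / 3), x)

def corner(x):
    # Hypothetical example: concave up on the left, concave down on the right,
    # continuous at 0 but with a corner there (slopes 0 and 1).
    return x ** 2 if x < 0 else x - x ** 2

def parabola(x):
    return x ** 2

examples = [("x^3 - x", cubic), ("cube root", cube_root), ("corner", corner), ("x^2", parabola)]
for name, f in examples:
    print(f"{name:10s} concavity changes at 0: {concavity_changes_at(f, 0.0)}")
```

Under the continuity-based definition, the first three examples would all count as having a point of inflection at x=0; under the tangent-line definition, the corner example would not.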

Once you’re done thinking about that, consider this strange example that has no point of inflection even though there’s a concavity change. As my colleague Matt suggests, could we consider this a region of inflection? Now we’re just being silly, right?

interval of inflection

A region/interval of inflection?

———————————————

Footnotes:

* When we say that a function is concave up or down on a certain interval, we actually mean f''(x)>0 or f''(x)<0 for the whole interval except at finitely many locations. If there are point discontinuities, we still consider the interval to have the same concavity.

** This source, interestingly, seems to require differentiability at the point. I think most of us would agree this is too strong a requirement, right?