Laurent's personal blog

Starting a new adventure with Powerweave

2024-04-05T00:00:00Z

I co-founded Powerweave with Mathilde Mounier. We have been working nights and weekends for the better part of a year. Today, I can reveal it publicly.

Starting a business is something that I have been thinking about since finishing graduate school. My academic expertise could not be turned into a business outright (although ex-colleagues are trying!), but my skills in mathematics, software engineering, and computer science have a broader applicability.

Mathilde and I want to act on the climate emergency. As simple citizens, our actions have a limited effect. However, by starting the right business, we can create a movement which has a tangible impact. I am positively inclined towards technology startups specifically because of the growth curve. This means that if everything goes right, Powerweave could change the world.

For many reasons, including the push towards sustainable energy and the increasing demand for energy due to a resurgence of artificial intelligence progress, power grid infrastructure is under tremendous stress, and technologies are evolving rapidly to address this. The space is fertile ground for innovators that seek to help the world in a concrete way. In order to get both startup and industry experience, I joined SocïVolta, a technology startup that operates at the intersection of financial markets and the North American power grid. This is also where I met Mathilde. We both gained invaluable insights into how the North American power infrastructure works – and where its weaknesses are.

In this post, I will describe current power infrastructure, how Powerweave helps, and what’s next for us.

What is the North American power grid?

The power grid is an electrical network which can be categorized into three broad levels:

Generation: power stations, which are often far from population centers;
Transmission: long-distance electrical transmission lines connecting power stations to population centers;
Distribution: network of low-voltage transmission to homes and businesses.

There are places in North America where all three levels are controlled by a single entity (e.g. Hydro-Québec). But increasingly, different entities own and operate different levels of the power grid.

What is the provenance of the electricity that powers your screen right now? It might not be from the closest power station. The provenance of your electricity depends on lots of factors, including who else wants electricity right now, and who is currently generating electricity. Sometimes, it is better to use a power source far from you, but that takes a path which is relatively unused.

Most people want cheap electricity. Getting you the cheapest power every second of the year requires coordination between all operating entities on the power grid. The modern way to coordinate the behavior of the power grid at all levels is through so-called wholesale electricity markets. In wholesale electricity markets, power station owners, transmission line owners, and energy service providers come together to enter auctions to determine the geographical price of electricity. While the details of such auctions are far beyond the scope of this blog post, the important bit is that power auctions answer the following two questions:

Where does the energy produced by a power station get consumed?
What is the cost of this energy at every node in the power grid?

Wholesale electricity markets have been shown to increase competition and lower electricity rates, as well as integrate renewable energy resources more effectively. That is why you see them popping up more and more across the world. The European Union even codified it into law in 2019.

Local electricity distribution

Let’s zoom in on the last level, where Powerweave operates: local electricity distribution.

It used to be that local distribution of electricity was squarely in one direction: homes and businesses would only consume electricity. With the advent of rooftop solar panels and home batteries, this is no longer true. Homes and businesses can act as spontaneous, small power stations called distributed energy resources.

On one hand, we should encourage individuals and businesses to produce their own energy and even export it back to the grid at certain times. Electricity consumption per capita is increasing rapidly, which is stressing power infrastructure. Individuals and businesses that produce and consume energy, prosumers, invest in power infrastructure so that the utilities don’t have to – a concept known as virtual infrastructure upgrades. The alternative is that energy service providers (e.g. utilities) invest in infrastructure, in which case all rate-payers split the bill.

On the other hand, energy service providers are either not ready, or unwilling, to absorb spontaneous extra energy from prosumers. From an electricity distributor’s point of view, it would be much better for electricity to be consumed close to where it is generated (community self-consumption).

At the local distribution level, what is now needed is coordination between all rate-payers in a community. Does this problem remind you of something?

So what is Powerweave?

Powerweave operates a platform that coordinates all rate-payers in a community. It solves the problem of incentivizing rate-payers to invest in power infrastructure and to be more energy efficient when the power grid needs it the most, while also ensuring community self-consumption.

The Powerweave platform is centered around local power auctions, or local electricity markets. For each community, auctions are conducted at regular intervals (usually 5 minutes). Just like in wholesale power markets, each auction covers a period of electricity consumption/generation in the near future.

The time between an auction ending, and its corresponding billing period starting, is the action period. This is a period during which rate-payers know how much power will cost, but they still have influence over how much power they consume or generate.
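The timeline can be made concrete with a minimal sketch. The `AuctionWindow` class, its field names, and the timestamps below are assumptions chosen for illustration; they are not Powerweave's actual data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AuctionWindow:
    """Hypothetical timeline of a single local auction."""
    auction_close: datetime  # no more bids or offers after this instant
    billing_start: datetime  # start of the period covered by the auction
    billing_end: datetime    # end of the period covered by the auction

    @property
    def action_period(self) -> timedelta:
        # Prices are known, but consumption/generation can still be adjusted
        return self.billing_start - self.auction_close


window = AuctionWindow(
    auction_close=datetime(2024, 4, 5, 12, 0),
    billing_start=datetime(2024, 4, 5, 12, 5),
    billing_end=datetime(2024, 4, 5, 12, 10),
)
print(window.action_period)  # 0:05:00
```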

I cannot stress enough how the existence of the action period is key to Powerweave. Consider this example.

I am charging my electric car, which will require quite a bit of power (10kW) over the next 8 hours. I participate in the next power auction, where I bid for 10kW of power. However, when closing the auction, I am only cleared to draw 4kW of solar power from a few neighbors at a reasonable price; the shortfall (6kW) will be covered by my utility at a higher price, since the grid is under stress right now. Knowing this in advance, I can slow down or pause charging my vehicle until I can get a better overall energy price.
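To put numbers on this example, here is a small sketch. The two prices are hypothetical (the example above does not state any), and `hourly_cost` is an illustrative helper, not part of any real billing system.

```python
# Hypothetical prices in cents per kWh; not taken from any real tariff
NEIGHBOR_SOLAR_PRICE = 8
UTILITY_STRESS_PRICE = 20


def hourly_cost(cleared_kw: int, requested_kw: int) -> int:
    """Cost in cents of drawing `requested_kw` for one hour, when only
    `cleared_kw` was cleared locally and the utility covers the shortfall."""
    local_kw = min(cleared_kw, requested_kw)
    shortfall_kw = requested_kw - local_kw
    return local_kw * NEIGHBOR_SOLAR_PRICE + shortfall_kw * UTILITY_STRESS_PRICE


full_speed = hourly_cost(cleared_kw=4, requested_kw=10)  # 4*8 + 6*20 = 152 cents
slowed_down = hourly_cost(cleared_kw=4, requested_kw=4)  # 4*8 = 32 cents

# Average price paid per kWh in each case
print(full_speed / 10, slowed_down / 4)  # 15.2 8.0
```

Under these assumed prices, slowing the charger down to the cleared 4kW roughly halves the average price paid per kWh during the stressed period.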

This example is the foundation of Powerweave. As a rate-payer, I minimized my electricity bill and used clean sustainable power. My utility incentivized me to be more energy efficient when it mattered, and it also saved on infrastructure costs. Win-win!

However, as you can imagine, constantly participating in auctions and adjusting power consumption/generation (day and night!) is unrealistic for most people. Therefore, Powerweave automates all of this.

On the auction side, Powerweave can trade on behalf of rate-payers. Using statistical and machine-learning models trained on historical consumption and generation data, as well as external data such as weather forecasts, Powerweave can place bids and offers for power to achieve goals set by each rate-payer, such as minimizing energy costs, or minimizing carbon emissions. These models are tireless and self-adjusting; you never need to participate in auctions directly if you don’t want to.

On the pricing response side, Powerweave integrates with third-parties (e.g. smartphone notifications, electric vehicle charging systems, smart thermostats) to automatically adjust your load or generation profile.

Our technology is unique in North America (although related technologies have been trialed in the past). Powerweave can legally operate in about 10 US states (including New York and California), with more states potentially opening up soon. Powerweave probably could not have operated even just 5 years ago. We exist at the edge of regulations; this is a great opportunity to influence US-wide reforms on local energy coordination.

The near future

Mathilde and I have completed an early technology demonstration, to show that Powerweave’s offering can indeed be realized. We are using a lot of cool technology, which I hope to share with you soon!

Right now, we are focused on partnering with one energy service provider (utility, community choice aggregation, or independent microgrid) to run a pilot project. This will demonstrate that our technology is beneficial, and give confidence to less-technologically-inclined energy service providers (which are most of them) that Powerweave can help them.

If you can help us connect with decision makers at energy service providers, I would love to hear from you. Getting the first contact with potential partners is difficult; being introduced by someone else helps tremendously.

We hope to grow the team soon as well. If you are interested in Powerweave’s mission, be sure to follow us on LinkedIn!

Trading strategies with typed features using Haskell and type families

2024-02-04T00:00:00Z

I work in the business of algorithmic power trading, which is the automated trading of various power-related products in regulated electricity markets. Products include short-term inter-jurisdiction arbitrage, financial transmission rights, and more.

This year, my employer is expanding its trading operations to a new class of products. Since there is no overlap between this new work and our current operations, I got to design a technology stack most suited for the task. This technology stack includes Haskell, most importantly because wrong or unexpected trading decisions can (and have) cost us dearly.

In this blog post, I want to show you the basics of how we designed the framework in which to express trading strategies.

The fundamentals of trading strategies

The fundamental pieces of trading operations are strategies. In algorithmic trading, strategies are computer programs that decide what to trade, and how to trade it, at any given moment.

Let’s take the example of a simple trading strategy that is only concerned with AAPL stock. The current stock price is about 190 USD today; our example strategy is defined thusly:

If the AAPL price rises above 200, sell our holdings (if we have any);
If the AAPL price falls below 180, buy 10 shares;

In this case, the result of this strategy is some signal to buy or sell AAPL stock. We can run our strategy in a loop:

import qualified Control.Monad

data Action = Buy Int 
            | Sell 
            | Hold

type Price = Double

main :: IO ()
main = Control.Monad.forever $ do
    aaplPrice <- fetchMostRecentPrice
    let action = myStrategy aaplPrice
    executeMarketAction action
    where
        myStrategy :: Price -> Action
        myStrategy aapl_price
            | aapl_price > 200 = Sell
            | aapl_price < 180 = Buy 10
            | otherwise        = Hold

        fetchMostRecentPrice :: IO Price
        fetchMostRecentPrice = (...)

        executeMarketAction :: Action -> IO ()
        executeMarketAction = (...)

and boom, you have a simple trading system!

Once you have a good idea for a strategy, you should test it on historical data. This is called backtesting. Backtesting strategies is, by definition, much more computationally intensive than live trading, since you are evaluating your strategy on much more data. We often backtest strategies on 5-10 years’ worth of data when it makes sense, and sometimes more.
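As a rough illustration of backtesting, the sketch below replays the threshold strategy from above over a short list of past prices. It is written in Python for brevity; the prices are made up, trade quantities are omitted, and `backtest` is a hypothetical helper that only records decisions without simulating their effects.

```python
from enum import Enum
from typing import Callable, Iterable, List


class Action(Enum):
    BUY = "buy"
    SELL = "sell"
    HOLD = "hold"


def my_strategy(price: float) -> Action:
    """Same thresholds as the AAPL example; quantities omitted for brevity."""
    if price > 200:
        return Action.SELL
    if price < 180:
        return Action.BUY
    return Action.HOLD


def backtest(strategy: Callable[[float], Action],
             historical_prices: Iterable[float]) -> List[Action]:
    """Replay the strategy over past prices, one decision per price."""
    return [strategy(price) for price in historical_prices]


# Made-up prices, not real AAPL data
actions = backtest(my_strategy, [190.0, 205.0, 175.0, 185.0])
print([a.name for a in actions])  # ['HOLD', 'SELL', 'BUY', 'HOLD']
```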

I will also note that it is easiest to have strategies that can run in various contexts (including backtesting and live operations) if the strategy is pure (in the mathematical sense). It is in the quest for purity and performance that we decided to implement the trading system for a new asset class in Haskell.

Trading strategies in Haskell

For the simplicity of presentation, we will only consider strategies that involve prices. The simplest such strategies are strategies which depend on the most recent price:

newtype Strategy
    = MkStrategy { runStrategy :: Price -> Action }

Strategy is a type of functions, from the most recent Price known to some market action. This is only re-packaging the example above.

Let’s build a backtesting framework. There are two parts here:

determine historical market actions;
simulate the effects of market actions.

In practice, the two parts of backtesting are handled simultaneously. However, for simplicity, I will only consider the first part here.

The nature of this problem is well-suited to streaming approaches; I will use pipes¹:

-- From the `time` package
import           Data.Time     ( UTCTime )
-- From the `pipes` package
import           Pipes         ( Consumer, Producer, (>->) )
import qualified Pipes
import qualified Pipes.Prelude as Pipes

-- | From a stream of input features, produce a stream
-- of output 'Market' actions 
backtestStrategy :: Monad m 
                 => Strategy
                 -> Producer (UTCTime, Price)  m () -- ^ stream of timestamped AAPL prices
                 -> m BacktestResults
backtestStrategy strat prices 
    =   prices 
    >-> Pipes.map (\(k, f) -> (k, runStrategy strat f)) 
    >-> simulateMarketActions

-- The following is out-of-scope

data BacktestResults = MkBacktestResults (...)

simulateMarketActions :: Consumer (UTCTime, Action) m BacktestResults
simulateMarketActions = (...)

More expressive strategies

I have a problem with the above definition of Strategy: I’m limited to strategies based on the single, most recent Price. What if I had a good idea for a strategy which involves the last 10 price values? Our Strategy type is not expressive enough: it only takes one type of feature, while we want to support a wide range of features.

I will define a Strategy type which removes restrictions on the input feature:

newtype Strategy r 
    = MkStrategy { runStrategy :: r -> Action }

with the understanding that the data of type r is somehow derived from prices.

What are some features derived from prices, that we might be interested in?

Price history, e.g. most recent N ticks;
Price aggregations, e.g. average of most recent M ticks;
Rolling aggregations, e.g. N-tick history of the averages of M ticks;
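The three kinds of features listed above can be sketched as follows. This is an illustrative Python version, not the Haskell implementation used in this post; the function names are hypothetical, and a feature is only emitted once enough ticks have been seen.

```python
from collections import deque
from statistics import fmean
from typing import Iterable, Iterator, Tuple


def price_history(prices: Iterable[float], n: int) -> Iterator[Tuple[float, ...]]:
    """Most recent `n` ticks, emitted once `n` ticks are available."""
    window = deque(maxlen=n)
    for price in prices:
        window.append(price)
        if len(window) == n:
            yield tuple(window)


def average_price(prices: Iterable[float], m: int) -> Iterator[float]:
    """Average of the most recent `m` ticks."""
    for history in price_history(prices, m):
        yield fmean(history)


def rolling_average_history(prices: Iterable[float], n: int, m: int):
    """`n`-tick history of the averages of `m` ticks."""
    return price_history(average_price(prices, m), n)


ticks = [1.0, 2.0, 3.0, 4.0, 5.0]
print(list(price_history(ticks, 3)))
# [(1.0, 2.0, 3.0), (2.0, 3.0, 4.0), (3.0, 4.0, 5.0)]
print(list(average_price(ticks, 2)))
# [1.5, 2.5, 3.5, 4.5]
print(list(rolling_average_history(ticks, 2, 2)))
# [(1.5, 2.5), (2.5, 3.5), (3.5, 4.5)]
```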

Every conceptual feature described above has some free parameters. We don’t want to have separate strategies like Strategy PriceHistoryForPast10Ticks and Strategy PriceHistoryForPast20Ticks. More specifically, for every feature of type r, there is a type of parameters p which describes the parameters of r.

For example²:

-- from the `javelin` package
import Data.Series ( Series )

newtype PriceHistory 
    = MkPriceHistory (Series UTCTime Price)

data NumTicks 
    = MkNumTicks { numTicks :: Int }

Typed features and their parametrization

We could list all possible features in a big sum type:

data Feature 
    = FPrice Price
    | FPriceHistory (Series UTCTime Price)
    | FAveragePrice Price
    | (...)

However, it’s not possible to control what features go in what strategy. We can do better.

We want to be able to link the types PriceHistory and NumTicks such that they are used together when backtesting, to ensure type safety. This is the domain of indexed type families, or type families for short. We will replace the Feature sum type with a Feature typeclass, and amend our backtestStrategy function to take into account feature parametrization:

class Feature r where
    -- For every instance `r` of `Feature`,
    -- there is an associated type `Parameters r` which the user
    -- needs to specify. See examples below.
    type Parameters r

    deriveFeature :: Monad m 
                  => Parameters r
                  -> Producer (UTCTime, Price) m ()
                  -> Producer (UTCTime, r) m ()

backtestStrategy :: (Feature r, Monad m) 
                 => Strategy r
                 -> Parameters r
                 -> Producer (UTCTime, Price)  m ()
                 -> m BacktestResults
backtestStrategy strat params prices 
    =   deriveFeature params prices 
    >-> Pipes.map (\(k, feature) -> (k, runStrategy strat feature)) 
    >-> simulateMarketActions

Let’s look at two example instances of Feature. The simplest is the basic feature of Price:

type NoParameters = ()

instance Feature Price where
    -- The `Price` feature has no free parameters
    type Parameters Price = NoParameters

    deriveFeature :: Monad m 
                  => NoParameters
                  -> Producer (UTCTime, Price) m ()
                  -> Producer (UTCTime, Price) m ()
    deriveFeature _ prices = prices

This is easy because there are no parameters. What about looking at the price history?

newtype PriceHistory 
    = MkPriceHistory (Series UTCTime Price)

newtype NumTicks 
    = MkNumTicks { numTicks :: Int }

instance Feature PriceHistory where
    type Parameters PriceHistory = NumTicks

    deriveFeature :: Monad m 
                  => NumTicks
                  -> Producer (UTCTime, Price) m ()
                  -> Producer (UTCTime, PriceHistory) m ()
    deriveFeature (MkNumTicks numTicks) prices
        = prices >-> accumulate numTicks 
                 >-> Pipes.map (\xs -> (maximum $ Series.index xs, MkPriceHistory xs))
        where
            -- out of scope, see end of blog post for link to source
            accumulate :: Functor m 
                       => Int
                       -> Pipe (UTCTime, a) (Series UTCTime a) m () 
            accumulate = (...)

Finally, as an example of the power of this approach, we’ll create a strategy which combines two features.

First, we’ll extend the Feature class to combine two features a and b into one (a, b) feature:

instance (Feature a, Feature b) => Feature (a, b) where

    type Parameters (a, b) = (Parameters a, Parameters b)
    
    deriveFeature :: Monad m 
                  => Parameters (a, b)
                  -> Producer (UTCTime, Price)  m ()
                  -> Producer (UTCTime, (a, b)) m ()
    deriveFeature (paramsA, paramsB) prices 
        = Pipes.zipWith (\(k,a) (_, b) -> (k, (a, b))) 
                        (deriveFeature paramsA prices) 
                        (deriveFeature paramsB prices)

Second, we’ll define a simple strategy that compares the most recent price against the average price of the last 10 ticks:

import Data.Series ( fold, mean )

finalStrategy :: Strategy (PriceHistory, Price)
finalStrategy 
    = MkStrategy $ \(MkPriceHistory history, price) 
        -> let avgPrice = fold mean history
            in case price `compare` avgPrice of
                GT -> Sell
                LT -> Buy 10
                EQ -> Hold

It is trivial to backtest this strategy like so:

backtestFinalStrategy :: Monad m 
                      => Producer (UTCTime, Price) m () 
                      -> m BacktestResults
backtestFinalStrategy = backtestStrategy finalStrategy ( MkNumTicks 10, () )

and voilà!

Conclusion

In this post, I have shown you how to define trading strategies with typed feature parametrization, which is a neat use of type families.

All code is available in this Haskell module.

¹ If you are unfamiliar with pipes, you should check out its tutorial.
² We are storing the price history in a Series, which comes from the javelin package that I created specifically for this work.

The algebraic structure of a trading stop-loss system

2023-05-07T00:00:00Z

I was once an undergraduate student in a joint Mathematics & Physics program. Some of the math courses, namely group theory and algebra, remained very abstract to me throughout my education. There is some group theory in the description of symmetries of physical systems; but being an experimentalist, I didn’t use more than 5% of what I learned in my undergrad during my PhD.

However, in the course of my work now in finance, I had the pleasure of discovering that I was actually working with an algebraic structure. This post describes how that happened.

The small trading firm for which I work is focusing a bit more on automated performance monitoring these days. With detailed trading performance data streaming in, it is now a good time to implement a stop-loss system.

A stop-loss system is a system which receives trading performance data, and emits three categories of signal:

an all-clear signal, meaning that nothing in recent trading performance indicates a problem;
a warning signal, meaning that recent trading performance is degraded – but not yet concerning – and a human should take a look under the hood;
a halt signal, meaning that there is most probably something wrong, trading should be halted at once.

Of course, we’re trading different products in different markets and even jurisdictions, and therefore the trading performance of every product is monitored independently. Moreover, our risk tolerance or expectations may be different for every product, and so a stop-loss system is really a framework in which to express multiple stop-loss rules, with different products being supervised by completely different stop-loss rules.

Let us consider examples: assume that we’re trading a particular stock like AAPL¹. Sensible stop-loss rules might be:

If our current position has lost >10% in value over the last month, emit a warning; if the position has lost >25% over the last month, emit a halt signal.
If we’re expecting market volatility in the next hour to be high (for example, due to expected high-impact news), emit a halt signal.
If our forecast of the ticker price is way off – perhaps due to a problem in the forecasting model –, emit a halt signal.

Here is what a rule framework might look like²:

from enum import Enum, auto, unique
from typing import Callable

@unique
class Signal(Enum):
    AllClear = auto()
    Warn     = auto()
    Halt     = auto()

class Context:
    ...

Rule = Callable[[Context], Signal]

# Example rule
def rule(context: Context) -> Signal:
    ...

A Rule is a function from some Context object to a Signal. We’re packing all information required to make decisions in a single data structure for reasons which will become obvious shortly. In this framework, we may express one of the stop loss rule examples as:

def rule(context: Context) -> Signal:
    recent_loss = loss_percent( context.recent_performance(period="30d") )
    if recent_loss > 0.25:
        return Signal.Halt
    elif recent_loss > 0.10:
        return Signal.Warn
    else:
        return Signal.AllClear

For the remainder of this post, I don’t care anymore about the domain-specific content of a rule.

My colleagues and I are expecting that, in practice, we will have pretty complex rules. In order to build complex rules from smaller, simpler rules, I wanted to be able to compose Rules together. This is straightforward because all rules have the same input and output types. Consider two rules, rule1 and rule2. If I want a new rule to halt if both rule1 and rule2 emit Signal.Halt, I could write it like this:

def rule1(context: Context) -> Signal:
    ...

def rule2(context: Context) -> Signal:
    ...

def rule_lax(context: Context) -> Signal:
    sig1 = rule1(context)
    sig2 = rule2(context)

    # Emit the less severe of the two signals
    if sig1 == sig2 == Signal.Halt:
        return Signal.Halt
    elif Signal.AllClear in (sig1, sig2):
        return Signal.AllClear
    else:
        return Signal.Warn

That is an acceptable definition of rule composition. Since rule_lax will emit a Halt signal if both sub-rules emit a Halt signal, we’ll call this type of composition conjunction. In order to make it more ergonomic to write, let us wrap all rules in an object and re-use the & (overloaded and) operator:

from dataclasses import dataclass
from enum import Enum
from operator import attrgetter

class Signal(Enum):
    """
    Signals can be composed using (&):

    >>> Signal.AllClear & Signal.AllClear
    <Signal.AllClear: 1>
    >>> Signal.Warn & Signal.Halt
    <Signal.Warn: 2>
    >>> Signal.Halt & Signal.Halt
    <Signal.Halt: 3>
    """
    AllClear = 1
    Warn     = 2
    Halt     = 3

    def __and__(self, other: "Signal") -> "Signal":
        return min(self, other, key=attrgetter('value'))

@dataclass
class rule(Callable):
    _inner: Callable[[Context], Signal]

    def __call__(self, context: Context) -> Signal:
        return self._inner.__call__(context=context)
    
    def __and__(self, other: "rule"):
        def newinner(context: Context) -> Signal:
            # Combine the two rules being composed, not the globals rule1/rule2
            return self(context) & other(context)
        return self.__class__(newinner)

and now we can re-write rule_lax like so:

# The @rule decorator is required in order to lift rule1 from a regular function
# to the `rule` object
@rule
def rule1(context: Context) -> Signal:
    ...

@rule
def rule2(context: Context) -> Signal:
    ...

rule_lax = rule1 & rule2

Now, rule_lax is defined such that it’ll emit Signal.Halt if both rule1 and rule2 emit Signal.Halt. The same is true of warnings; if both rules emit a warning, then rule_lax will emit Signal.Warn. Here is a table which summarizes this composition:

$A$	$B$	$A ~ \& ~ B$
$C$	$C$	$C$
$C$	$W$	$C$
$C$	$H$	$C$
$W$	$C$	$C$
$W$	$W$	$W$
$W$	$H$	$W$
$H$	$C$	$C$
$H$	$W$	$W$
$H$	$H$	$H$

where $C$ is Signal.AllClear, $W$ is Signal.Warn, and $H$ is Signal.Halt. Therefore, & is a binary function from Rules to Rule.

This is not the only natural way to compose rules. What about this?

def rule_strict(context: Context) -> Signal:
    sig1 = rule1(context)
    sig2 = rule2(context)

    if (sig1 == Signal.Halt) or (sig2 == Signal.Halt):
        return Signal.Halt
    elif (sig1 == Signal.Warn) or (sig2 == Signal.Warn):
        return Signal.Warn
    else:
        return Signal.AllClear

In this case, rule_strict is more, uh, strict than rule_lax; it emits Signal.Halt if either rule1 or rule2 emits a halt signal. We’ll call this composition disjunction and re-use the | (overloaded or) operator to make it more ergonomic to write:

class Signal(Enum):
    """
    Signals can be composed using (&) and (|):

    >>> Signal.AllClear & Signal.AllClear
    <Signal.AllClear: 1>
    >>> Signal.Warn & Signal.Halt
    <Signal.Warn: 2>
    >>> Signal.Warn | Signal.Halt
    <Signal.Halt: 3>
    """
    AllClear = 1
    Warn     = 2
    Halt     = 3

    def __and__(self, other: "Signal") -> "Signal":
        return min(self, other, key=attrgetter('value'))

    def __or__(self, other: "Signal") -> "Signal":
        return max(self, other, key=attrgetter('value'))

@dataclass
class rule(Callable):
    _inner: Callable[[Context], Signal]

    def __call__(self, context: Context) -> Signal:
        return self._inner.__call__(context=context)
    
    def __and__(self, other: "rule"):
        def newinner(context: Context) -> Signal:
            return self(context) & other(context)
        return self.__class__(newinner)

    def __or__(self, other: "rule"):
        def newinner(context: Context) -> Signal:
            return self(context) | other(context)
        return self.__class__(newinner)

With this implementation, we can express rule_lax and rule_strict as:

# The @rule decorator is required in order to lift rule1 from a regular function
# to the `rule` object
@rule
def rule1(context: Context) -> Signal:
    ...

@rule
def rule2(context: Context) -> Signal:
    ...

rule_lax    = rule1 & rule2
rule_strict = rule1 | rule2

We can update the table for the definition of & and |:

$A$	$B$	$A ~ \& ~ B$	$A ~ \| ~ B$
$C$	$C$	$C$	$C$
$C$	$W$	$C$	$W$
$C$	$H$	$C$	$H$
$W$	$C$	$C$	$W$
$W$	$W$	$W$	$W$
$W$	$H$	$W$	$H$
$H$	$C$	$C$	$H$
$H$	$W$	$W$	$H$
$H$	$H$	$H$	$H$
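The table can be regenerated mechanically from the min/max definitions. The sketch below is self-contained and redefines a stripped-down `Signal`; the `LETTER` mapping is only an illustrative helper for printing.

```python
from enum import Enum
from itertools import product


class Signal(Enum):
    AllClear = 1
    Warn = 2
    Halt = 3

    def __and__(self, other: "Signal") -> "Signal":
        # Conjunction keeps the less severe signal
        return min(self, other, key=lambda s: s.value)

    def __or__(self, other: "Signal") -> "Signal":
        # Disjunction keeps the more severe signal
        return max(self, other, key=lambda s: s.value)


LETTER = {Signal.AllClear: "C", Signal.Warn: "W", Signal.Halt: "H"}

# One row per pair (A, B): A, B, A & B, A | B
rows = [
    (LETTER[a], LETTER[b], LETTER[a & b], LETTER[a | b])
    for a, b in product(Signal, repeat=2)
]
for row in rows:
    print(" ".join(row))
```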

So for a given Context, which is fixed when the trading stop-loss system is running, we have:

A set of rule outcomes of type Signal;
A binary operation called conjunction (the & operator);
- & is associative;
- & is commutative;
- & has an identity, Signal.Halt;
- & does NOT have an inverse element.
A binary operation called disjunction (the | operator).
- | is associative;
- | is commutative;
- | has an identity, Signal.AllClear;
- | does NOT have an inverse element.

That looks like a commutative semiring to me! Just a few more things to check:

| distributes from both sides over &:
- $a ~|~ (b ~\&~ c)=(a ~|~ b) ~\&~ (a ~|~ c)$ for all $a$, $b$, and $c$;
- $(a ~ \& ~ b) ~|~ c = (a ~|~ c) ~\&~ (b ~|~ c)$ for all $a$, $b$, and $c$.
The identity element of & (called $0$ , in this case Signal.Halt) annihilates the | operation, i.e. $0 ~ | ~ a = 0$ for all $a$ .

Don’t take my word for it, we can check exhaustively:

from itertools import product

zero = Signal.Halt
one  = Signal.AllClear

# Assert & is associative
assert all( (a & b) & c == a & (b & c) for (a, b, c) in product(Signal, repeat=3)  )
# Assert & is commutative
assert all( a & b == b & a for (a, b) in product(Signal, repeat=2)  )
# Assert & has an identity
assert all( a & zero == a for a in Signal )

# Assert | is associative
assert all( (a | b) | c == a | (b | c) for (a, b, c) in product(Signal, repeat=3)  )
# Assert | is commutative
assert all( a | b == b | a for (a, b) in product(Signal, repeat=2)  )
# Assert | has an identity
assert all( a | one == a for a in Signal )

# Assert | distributes over & on both sides
assert all( a | (b & c) == (a | b) & (a | c) for (a, b, c) in product(Signal, repeat=3)  )
assert all( (a & b) | c == (a | c) & (b | c) for (a, b, c) in product(Signal, repeat=3)  )

# Assert identity of & annihilates with respect to |
assert all( (zero | a) == zero for a in Signal)

and there we have it! This design of a trading stop-loss system is an example of a commutative semiring. This fact does absolutely nothing in the practical sense; I’m just happy to have spotted this structure more than 10 years after seeing it in undergrad.

¹ I’m actually not involved in trading securities at all, but I think intuition about stock markets is more common.
² I’ll be using Python in this post because it was a requirement of the implementation, but know that I’m doing this under protest.

Efficient rolling statistics

2023-03-23T00:00:00Z

In the context of an array, rolling operations are operations on a set of values which are computed at each index of the array based on a subset of values in the array. A common rolling operation is the rolling mean, also known as the moving average.

The best way to understand is to see it in action. Consider the following list:

[0, 1, 2, 3, 4, 3, 2, 1]

The rolling average with a window size of 2 is:

[ (0 + 0)/2, (0 + 1)/2, (1 + 2)/2, (2 + 3)/2, (3 + 4)/2, (4 + 3)/2, (3 + 2)/2, (2 + 1)/2]

[0, 0.5, 1.5, 2.5, 3.5, 3.5, 2.5, 1.5]
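The example can be reproduced with a naive sketch that re-sums the window at every index. Treating positions before the start of the list as 0 is an assumption made here to match the first term `(0 + 0)/2` above; `rolling_mean_naive` is a hypothetical name.

```python
from typing import List, Sequence


def rolling_mean_naive(values: Sequence[float], window: int) -> List[float]:
    """Rolling mean that re-sums the full window at every index.

    Positions before the start of the list are padded with 0.
    """
    padded = [0] * (window - 1) + list(values)
    return [sum(padded[i:i + window]) / window for i in range(len(values))]


result = rolling_mean_naive([0, 1, 2, 3, 4, 3, 2, 1], window=2)
print(result)  # [0.0, 0.5, 1.5, 2.5, 3.5, 3.5, 2.5, 1.5]
```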

Rolling operations such as the rolling mean are tremendously useful at my work. When working with time-series, for example, the rolling mean may be a good indicator to include as part of machine learning feature engineering or trading strategy design. Here’s an example of using the rolling average price of AAPL stock as an indicator:

The problem is that rolling operations can be rather slow if implemented improperly. In this post, I’ll show you how to implement efficient rolling statistics using a method based on recurrence relations.

In principle, a general rolling function for lists might have the following type signature:

rolling :: Int        -- ^ Window length
        -> ([a] -> b) -- ^ Rolling function, e.g. the mean or the standard deviation
        -> [a]        -- ^ An input list of values
        -> [b]        -- ^ An output list of values

In this hypothetical scenario, the rolling function of type [a] -> b receives a sublist of length $M$ , the window length. The problem is, if the input list has size $N$ , and the window has length $M$ , the complexity of this operation is at best $\mathcal{O}(N \cdot M)$ . Even if you’re using a data structure which is more efficient than a list – an array, for example –, this is still inefficient.
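For comparison, a direct Python transcription of this hypothetical signature might look like the sketch below. It assumes only full windows are emitted (no padding), which is one of several reasonable conventions. Each call to `fn` touches `window` elements, which is where the $N \cdot M$ cost comes from.

```python
from typing import Callable, List, Sequence, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def rolling(window: int, fn: Callable[[Sequence[A]], B], xs: Sequence[A]) -> List[B]:
    """Apply `fn` to every full window of length `window`.

    There are len(xs) - window + 1 windows, and `fn` reads `window`
    elements for each of them.
    """
    return [fn(xs[i:i + window]) for i in range(len(xs) - window + 1)]


rolling_max = rolling(3, max, [0, 1, 2, 3, 4, 3, 2, 1])
print(rolling_max)  # [2, 3, 4, 4, 4, 3]
```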

Let’s see how to make this operation $\mathcal{O}(N)$ , i.e. constant in the window length!

Recurrence relations and the rolling average

The recipe for these algorithms involves constructing the recurrence relation of the operation. A recurrence relation is a way to describe a series by expressing how a term at index $i$ is related to the term at index $i-1$ .

Let’s proceed by example. Consider a series of values $X$ like so:

$X = \left[ x_0, x_1, ...\right]$

We want to calculate the rolling average $\bar{X} = \left[ \bar{x}_0, \bar{x}_1, ... \right]$ of series $X$ with a window length $N$ . The equation for the $j$ ^th term, $\bar{x}_j$ is given by:

$\bar{x}_j = \frac{1}{N}\sum_{i=j - N + 1}^{j} x_i = \frac{1}{N} \sum \left[ x_{j - N + 1}, x_{j - N + 2}, ..., x_{j} \right]$

Now let’s look at the equation for the $(j-1)$ ^th term:

$\bar{x}_{j-1} = \frac{1}{N}\sum_{i=j - N}^{j-1} x_i = \frac{1}{N} \sum \left[ x_{j - N}, x_{j - N + 1}, ..., x_{j-1} \right]$

Note the large overlap between the computation of $\bar{x}_j$ and $\bar{x}_{j-1}$ ; in both cases, you need to sum up $\left[ x_{j-N+1}, x_{j-N+2}, ..., x_{j-1} \right]$

Given that the overlap is very large, let’s take the difference between two consecutive terms, $\bar{x}_j$ and $\bar{x}_{j-1}$ :

$\begin{aligned} \bar{x}_j - \bar{x}_{j-1} &= \frac{1}{N} \sum \left[ x_{j - N + 1}, x_{j - N + 2}, ..., x_j \right] - \frac{1}{N} \sum \left[ x_{j - N}, x_{j - N + 1}, ..., x_{j-1} \right] \\ &= \frac{1}{N} \sum \left[ -x_{j-N} + x_{j - N + 1} - x_{j - N + 1} + x_{j - N + 2} - x_{j - N + 2} + ... + x_{j-1} - x_{j-1} + x_j\right] \\ &= \frac{1}{N} ( x_{j} - x_{j - N} ) \end{aligned}$

Rewriting a little:

$\bar{x}_j = \bar{x}_{j-1} + \frac{1}{N} ( x_j - x_{j-N} )$

This is the recurrence relation of the rolling average with a window of length $N$ . It tells us that for every term of the rolling average series $\bar{X}$ , we only need to involve two terms of the original series $X$ , regardless of the window. Awesome!
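As a quick sanity check, take the list from the introduction, $X = \left[ 0, 1, 2, 3, 4, 3, 2, 1 \right]$ , with a window of length $N = 2$ . Computing two of the terms both directly and through the recurrence relation gives the same values:

$\begin{aligned} \bar{x}_2 &= \tfrac{1}{2} ( x_1 + x_2 ) = \tfrac{1}{2} ( 1 + 2 ) = 1.5 \\ \bar{x}_2 &= \bar{x}_1 + \tfrac{1}{2} ( x_2 - x_0 ) = 0.5 + \tfrac{1}{2} ( 2 - 0 ) = 1.5 \\ \bar{x}_5 &= \tfrac{1}{2} ( x_4 + x_5 ) = \tfrac{1}{2} ( 4 + 3 ) = 3.5 \\ \bar{x}_5 &= \bar{x}_4 + \tfrac{1}{2} ( x_5 - x_3 ) = 3.5 + \tfrac{1}{2} ( 3 - 3 ) = 3.5 \end{aligned}$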

Haskell implementation

Let’s implement this in Haskell. We’ll use the vector library which is much faster than lists for numerical calculations like this, and comes with some combinators which make it pretty easy to implement the rolling mean. Regular users of vector will notice that the recurrence relation above fits the scanl use-case. If you’re unfamiliar, scanl is a function which looks like this:

scanl :: (b -> a -> b) -- ^ Combination function
      -> b             -- ^ Starting value
      -> Vector a      -- ^ Input
      -> Vector b      -- ^ Output

For example:

>>> import Data.Vector as Vector
>>> Vector.scanl (+) 0 (Vector.fromList [1, 4, 7, 10])
[0,1,5,12,22]

If we decompose the example:

[    0                     -- 0 (the starting value)
,    0 + 1                 -- 1
,   (0 + 1) + 4            -- 5
,  ((0 + 1) + 4) + 7       -- 12
, (((0 + 1) + 4) + 7) + 10 -- 22
]

In this specific case, apart from the leading starting value, Vector.scanl (+) 0 is the same as numpy.cumsum if you’re more familiar with Python. In general, scanl is an accumulation from left to right, where each “scanned” term depends on the next value of the input and on the previous scanned term. This is perfect to represent recurrence relations. Note that in the case of the rolling mean recurrence relation, we’ll need access to the value at index i and i - N, where again N is the length of the window. The canonical way to operate on more than one array at once elementwise is the zip* family of functions.

-- from the `vector` library
import           Data.Vector ( Vector )   
import qualified Data.Vector as Vector

-- | Perform the rolling mean calculation on a vector.
rollingMean :: Int            -- ^ Window length
            -> Vector Double  -- ^ Input series
            -> Vector Double  
rollingMean window vs
    = let w     = fromIntegral window 
          -- Starting point is the mean of the first complete window
          start = Vector.sum (Vector.take window vs) / w
          
          -- Consider the recurrence relation mean[i] = mean[i-1] + (edge - lag)/w 
          -- where w    = window length
          --       edge = vs[i]
          --       lag  = vs[i - w]
          edge = Vector.drop window vs
          lag  = Vector.take (Vector.length vs - window) vs

        -- mean[i] = mean[i-1] + diff, where diff is:
          diff = Vector.zipWith (\p n -> (p - n)/w) edge lag
      
    -- The rolling mean for the elements at indices i < window - 1 (incomplete windows) is set to 0
       in Vector.replicate (window - 1) 0 <> Vector.scanl (+) start diff

With this function, we can compute the rolling mean like so:

>>> import Data.Vector as Vector
>>> rollingMean 2 $ Vector.fromList [0,1,2,3,4,5]
[0.0,0.5,1.5,2.5,3.5,4.5]

Complexity analysis

Let’s say the window length is $N$ and the input array length is $n$ . The naive algorithm has complexity $\mathcal{O}(n \cdot N)$ . On the other hand, rollingMean has a complexity of $\mathcal{O}(n + N)$ :

Vector.sum to compute start is $\mathcal{O}(N)$ ;
Vector.replicate (window - 1) has order $\mathcal{O}(N)$ ;
Vector.drop and Vector.take are both $\mathcal{O}(1)$ ;
Vector.scanl and Vector.zipWith are both $\mathcal{O}(n)$ (and in practice, these operations should get fused to a single pass).

However, usually $N \ll n$ . For example, at work, we typically roll 10+ years of data with a window on the order of days / weeks. Therefore, we have that rollingMean scales linearly with the length of the input ( $\mathcal{O}(n)$ ).

Efficient rolling variance

Now that we’ve developed a procedure on how to determine an efficient rolling algorithm, let’s do it for the (unbiased) variance.

Again, consider a series of values:

$X = \left[ x_0, x_1, ...\right]$

We want to calculate the rolling variance $\sigma^2(X)$ of series $X$ with a window length $N$ . The equation for the $j$ ^th term, $\sigma^2_j$ is given by:

$\sigma^2_j = \frac{1}{N - 1}\sum_{i=j - N + 1}^{j} (x_i - \bar{x}_j)^2 = \frac{1}{N-1} \sum \left[ (x_{j - N + 1} - \bar{x}_j)^2, ..., (x_j - \bar{x}_j)^2 \right]$

where $\bar{x}_i$ is the rolling mean at index $i$ , just like in the previous section. Let’s simplify a bit by expanding the squares:

$\begin{aligned} (N - 1) ~ \sigma^2_j &= \sum_{i=j-N+1}^{j} (x_i - \bar{x}_j)^2 \\ &= N\bar{x}^2_j + \sum_{i=j - N + 1}^{j} \left( x^2_i - 2 x_i \bar{x}_j \right) \end{aligned}$

We note here that $\sum_{i=j - N + 1}^{j} x_i \equiv N \bar{x}_j$ , which allows us to simplify the equation further:

$\begin{aligned} (N - 1) ~ \sigma^2_j &= N\bar{x}^2_j - 2 N \bar{x}^2_j + \sum_{i=j - N + 1}^{j} x^2_i \\ &= -N\bar{x}^2_j + \sum_{i=j - N + 1}^{j} x^2_i \end{aligned}$

This leads to the following difference between consecutive rolling unbiased variance terms:

$\begin{aligned} (N - 1) \left( \sigma^2_j - \sigma^2_{j-1} \right) &= N\bar{x}^2_{j - 1} - N\bar{x}^2_j + \sum_{i=j - N + 1}^{j} x^2_i - \sum_{i'=j - N}^{j-1} x^2_{i'} \\ &= N\bar{x}^2_{j - 1} - N\bar{x}^2_j + x^2_j - x^2_{j-N} \end{aligned}$

and therefore, the recurrence relation:

$\sigma^2_j = \sigma^2_{j-1} + \frac{1}{N-1} \left[ N\bar{x}^2_{j - 1} - N\bar{x}^2_j + x^2_j - x^2_{j - N} \right]$

This recurrence relation looks pretty similar to the rolling mean recurrence relation, with the added wrinkle that you need to know the rolling mean in advance.

Haskell implementation

Let’s implement this in Haskell again. We can re-use our rollingMean. We’ll also need to compute the unbiased variance in the starting window; I’ll use the statistics library for brevity, but it’s easy to implement yourself if you care about minimizing dependencies.

-- from the `vector` library
import           Data.Vector ( Vector )   
import qualified Data.Vector as Vector
-- from the `statistics` library
import           Statistics.Sample ( varianceUnbiased )

rollingMean :: Int          
            -> Vector Double
            -> Vector Double
rollingMean = (...)  -- see above

-- | Perform the rolling unbiased variance calculation on a vector.
rollingVar :: Int          
           -> Vector Double
           -> Vector Double
rollingVar window vs
    = let start   = varianceUnbiased $ Vector.take window vs
          n       = fromIntegral window
          ms      = rollingMean window vs
        
          -- Rolling mean terms leading by N
          ms_edge = Vector.drop window ms
          -- Rolling mean terms leading by N - 1
          ms_lag  = Vector.drop (window - 1) ms
          
          -- Values leading by N
          xs_edge = Vector.drop window vs
          -- Values leading by 0
          xs_lag  = vs
          
          -- Implementation of the recurrence relation, minus the previous term in the series
          -- There's no way to make the following look nice, sorry.
          -- N * \bar{x}^2_{N-1} - N * \bar{x}^2_{N} + x^2_N - x^2_0
          term xbar_nm1 xbar_n x_n x_0 = (n * (xbar_nm1**2) - n * (xbar_n ** 2) + x_n**2 - x_0**2)/(n - 1)
        
    -- The rolling variance for the elements at indices i < window - 1 (incomplete windows) is set to 0
       in Vector.replicate (window - 1) 0 <> Vector.scanl (+) start (Vector.zipWith4 term ms_lag ms_edge xs_edge xs_lag)

Note that it may be beneficial to reformulate the $N\bar{x}^2_{j - 1} - N\bar{x}^2_j + x^2_j - x^2_{j - N}$ part of the recurrence relation to optimize the rollingVar function. For example, is it faster to minimize the number of exponentiations, or multiplications? I do not know, and leave further optimizations aside.
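One way to gain some confidence in the recurrence relation is to compare it against the direct definition of the unbiased variance. The sketch below does this with plain lists from base rather than vector and statistics, to stay dependency-free. All names (windows, mean, varUnbiased, naiveRollingVar, recurrenceRollingVar) are introduced for this illustration only, the rolling means are computed naively for brevity, and the input is assumed to be at least as long as the window, with a window of at least 2. Only complete windows produce a value, so there is no zero padding.

```haskell
import Data.List (tails, zipWith4)

-- | All complete windows of length w, in order.
windows :: Int -> [a] -> [[a]]
windows w xs = takeWhile ((== w) . length) (map (take w) (tails xs))

mean :: [Double] -> Double
mean ys = sum ys / fromIntegral (length ys)

-- | Unbiased variance, computed directly from its definition.
varUnbiased :: [Double] -> Double
varUnbiased ys = sum [ (y - m) * (y - m) | y <- ys ] / fromIntegral (length ys - 1)
  where m = mean ys

-- | Naive rolling variance: every window is processed from scratch.
naiveRollingVar :: Int -> [Double] -> [Double]
naiveRollingVar w = map varUnbiased . windows w

-- | Rolling variance through the recurrence relation.
recurrenceRollingVar :: Int -> [Double] -> [Double]
recurrenceRollingVar w xs =
  scanl (+) start (zipWith4 term means (drop 1 means) (drop w xs) xs)
  where
    n     = fromIntegral w
    means = map mean (windows w xs)  -- naive rolling means, for brevity
    start = varUnbiased (take w xs)  -- variance of the first complete window
    sq x  = x * x
    -- (N * mean[j-1]^2 - N * mean[j]^2 + x[j]^2 - x[j-N]^2) / (N - 1)
    term mPrev mCur xNew xOld =
      (n * sq mPrev - n * sq mCur + sq xNew - sq xOld) / (n - 1)
```

For the list [0, 1, 2, 3, 4, 3, 2, 1] with a window of 3, both functions agree up to floating-point rounding.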

Complexity analysis

Again, let’s say the window length is $N$ and the input array length is $n$ . The naive algorithm still has complexity $\mathcal{O}(n \cdot N)$ . On the other hand, rollingVar has a complexity of $\mathcal{O}(n + N)$ :

varianceUnbiased to compute start is $\mathcal{O}(N)$ ;
Vector.replicate (window - 1) has order $\mathcal{O}(N)$ ;
Vector.drop and Vector.take are both $\mathcal{O}(1)$ ;
Vector.scanl and Vector.zipWith4 are both $\mathcal{O}(n)$ (and in practice, these operations should get fused to a single pass).

Since usually $N \ll n$ , as before, we have that rollingVar scales linearly with the length of the input ( $\mathcal{O}(n)$ ).

Bonus: rolling Sharpe ratio

The Sharpe ratio¹ is a common financial indicator of return on risk. Its definition is simple. Consider excess returns in a set $X$ . The Sharpe ratio $S(X)$ of these excess returns is:

$S(X) = \frac{\bar{X}}{\sigma_X}$

For ordered excess returns $X = \left[ x_0, x_1, ... \right]$ , the rolling Sharpe ratio at index $j$ is:

$S_j = \frac{\bar{x}_j}{\sigma_j}$

where $\bar{x}_j$ and $\sigma_j$ are the rolling mean and standard deviation at index $j$ , respectively.

Since the rolling variance requires knowledge of the rolling mean, we can easily compute the rolling Sharpe ratio by modifying the implementation of rollingVar:

-- from the `vector` library
import           Data.Vector ( Vector )   
import qualified Data.Vector as Vector
-- from the `statistics` library
import           Statistics.Sample ( varianceUnbiased )

rollingMean :: Int          
            -> Vector Double
            -> Vector Double
rollingMean = (...)  -- see above

rollingSharpe :: Int          
              -> Vector Double
              -> Vector Double
rollingSharpe window vs
    = let start   = varianceUnbiased $ Vector.take window vs
          n       = fromIntegral window
          ms      = rollingMean window vs
          
          -- The following expressions are taken from rollingVar
          ms_edge = Vector.drop window ms
          ms_lag  = Vector.drop (window - 1) ms
          xs_edge = Vector.drop window vs
          xs_lag  = vs
          term xbar_nm1 xbar_n x_n x_0 = (n * (xbar_nm1**2) - n * (xbar_n ** 2) + x_n**2 - x_0**2)/(n - 1)

          -- standard deviation from variance
          std = sqrt <$> Vector.scanl (+) start (Vector.zipWith4 term ms_lag ms_edge xs_edge xs_lag)
        
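    -- The rolling Sharpe ratio for the elements at indices i < window - 1 (incomplete windows) is set to 0.
    -- Note that std starts at index (window - 1), so the rolling mean is dropped by the same amount.
       in Vector.replicate (window - 1) 0 <> Vector.zipWith (/) (Vector.drop (window - 1) ms) std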

Conclusion

In this blog post, I’ve shown you a recipe to design rolling statistics algorithms which are efficient (i.e. $\mathcal{O}(n)$ ) based on recurrence relations. Efficient rolling statistics as implemented in this post are an essential part of backtesting software, which is software to test trading strategies.

All code is available in this Haskell module.

William F. Sharpe. Mutual Fund Performance. Journal of Business, 39 (1966). DOI: 10.1086/294846 ↩︎

Filtering noise with discrete wavelet transforms

2022-11-23T00:00:00Z

In this post, I’ll show you a class of filtering techniques, based on discrete wavelet transforms, which is suited to noise that cannot be filtered away with more traditional techniques – such as ones that rely on the Fourier transform. This has been important in my past research¹ ², and I hope that this can help you too.

Integral transforms

A large category of filtering techniques is based on integral transforms. Broadly speaking, an integral transform $T$ is an operation that is performed on a function $f$ , and builds a new function $T\left[ f\right]$ which is defined on a variable $s$ , such that:

$T\left[ f \right](s) = \int dt ~ f(t) \cdot K(t, s)$

Here, $K$ (for kernel) is a function which “selects” which parts of $f(t)$ are important at a fixed $s$ . Note that for an integral transform to be useful as a filter, we’ll need the ability to invert the transformation, i.e. there exists an inverse kernel $K^{-1}(s, t)$ such that:

$f(t) = \int ds ~ \left( T \left[ f\right] (s) \right) \cdot K^{-1}(s,t)$

All of this was very abstract, so let’s look at a concrete example: the Fourier transform. The Fourier transform is an integral transform where³:

$\begin{aligned} K(t, \omega) &\equiv \frac{e^{-i \omega t}}{\sqrt{2 \pi}}\\ K^{-1}(\omega, t) &\equiv \frac{e^{i \omega t}}{\sqrt{2 \pi}}\\ \omega & \in \mathbb{R} \end{aligned}$

There are many other integral transforms, such as:

The Laplace transform ( $K(t, s) \equiv e^{- s t}$ ), which is useful to solve linear ordinary differential equations;
The Legendre transform ( $K_n(t, s) \equiv P_n(s)$ , where $P_n$ is the n^th Legendre polynomial), which is used to solve for electron motion in hydrogen atoms;
The Radon transform (for which I cannot write down a kernel), which is used to analyze computed tomography data.

So why are integral transforms interesting? Well, depending on the function $f(t)$ you want to transform, you might end up with a representation of $f$ in the transformed space, $T \left[ f\right] (s)$ , which has nice properties! Re-using the Fourier transform for a simple example, consider a function made up of two well-defined frequencies:

$f(t) \equiv e^{-i ~ 2t} + e^{-i ~ 5t}$

The representation of $f(t)$ in frequency space – the Fourier transform of $f$ , $F\left[ f\right](\omega)$ – is very simple:

$F\left[ f\right](\omega) = \sqrt{2 \pi} \left[ \delta(\omega + 2) + \delta(\omega + 5) \right]$

The Fourier transform of $f$ is perfectly localized in frequency space, being zero everywhere except at $\omega=-2$ and $\omega=-5$ (the frequencies are negative because of the sign convention of the kernel above). Functions composed of infinite waves (like the example above) always have the nice property of being localized in frequency space, which makes it easy to manipulate them… like filtering some of their components away!

Discretization

It is much more efficient to use discretized versions of integral transforms on computers. Loosely speaking, given a discrete signal composed of $N$ terms $x_0$ , …, $x_{N-1}$ :

$T\left[ f \right](k) = \sum_n x_n \cdot K(n, k)$

i.e. the integral is now a finite sum. For example, the discrete Fourier transform of the signal $x_n$ , $X_k$ , can be written as:

$X_k = \sum_n x_n \cdot e^{-i 2 \pi k n / N}$

and its inverse becomes:

$x_n = \frac{1}{N}\sum_k X_k \cdot e^{i 2 \pi k n / N}$

This is the definition used by numpy. Let’s use this definition to compute the discrete Fourier transform of $f(t) \equiv e^{-i ~ 2t} + e^{-i ~ 5t}$ :
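As a rough illustration of that definition, here is a naive $\mathcal{O}(N^2)$ transform on plain lists, using Data.Complex from base. This is a sketch written directly from the sum above and not how numpy computes it (numpy uses a fast Fourier transform). The names dft, signal and peaks, the sampling of $f$ at evenly spaced points over $[0, 2\pi)$ , and the $10^{-6}$ threshold are all assumptions made for the example.

```haskell
import Data.Complex (Complex, cis, magnitude)

-- | Naive discrete Fourier transform: X_k = sum_n x_n * exp(-i 2 pi k n / N).
dft :: [Complex Double] -> [Complex Double]
dft xs =
  [ sum [ x * cis (-2 * pi * fromIntegral (k * n) / len) | (n, x) <- zip [0 ..] xs ]
  | k <- [0 .. size - 1] ]
  where
    size = length xs
    len  = fromIntegral size

-- | f(t) = exp(-2it) + exp(-5it), sampled at `size` evenly spaced points over [0, 2 pi).
signal :: Int -> [Complex Double]
signal size =
  [ cis (-2 * t) + cis (-5 * t)
  | j <- [0 .. size - 1]
  , let t = 2 * pi * fromIntegral j / fromIntegral size
  ]

-- | Indices of the coefficients whose magnitude is not negligible.
peaks :: [Complex Double] -> [Int]
peaks coeffs = [ k | (k, c) <- zip [0 ..] coeffs, magnitude c > 1e-6 ]
```

With 16 samples, peaks (dft (signal 16)) contains only two indices, one per frequency component of $f$ : the discrete transform is just as localized as the continuous one.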

Using the discrete Fourier transform to filter noise

Let’s add some noise to our signal and see how we can use the discrete Fourier transform to filter it away. The discrete Fourier transform is most effective if your noise has some nice properties in frequency space. For example, consider high-frequency noise:

$N(t) = \sum_{\omega=20}^{50} \sin(\omega t + \phi_{\omega})$

where $\phi_\omega$ are random phases, one for each frequency component of the noise. While the signal looks very noisy, it’s very obvious in frequency-space what is noise and what is signal:

The basics of filtering are as follows: set the transform of a signal to 0 in regions which are thought to be undesirable. In the case of the Fourier transform, this is known as a band-pass filter; frequency components of a particular frequency band are passed-through unchanged, and frequency components outside of this band are zeroed. Special names are given to band-pass filters with no lower bound (low-pass filter) and no upper bound (high-pass filter). We can express this filtering as a window function $W_k$ in the inverse discrete Fourier transform:

$x_{n}^{\text{filtered}} = \frac{1}{N}\sum_k W_k \cdot X_k \cdot e^{i 2 \pi k n / N}$

In the case of the plot above, we want to apply a low-pass filter with a cutoff at $\omega=10$ . That is:

$W_k = \left\{ \begin{array}{cl} 1 & : \ |k| \leq 10 \\ 0 & : \ |k| > 10 \end{array} \right.$

Visually:

Top: Noisy signal with the pure signal shown in comparison. Middle: Discrete Fourier transform of the noisy signal. The band of our band-pass filter is shown, with a cutoff of $\omega=10$ . All Fourier components in the zeroed region are set to 0 before performing the inverse discrete Fourier transform. Bottom: Comparison between the filtered signal and the pure signal. The only (small) deviations can be observed at the edges. (Source code)

The lesson here is that filtering signals using a discretized integral transform (like the discrete Fourier transform) consists in:

Performing a forward transform;
Modifying the transformed signal using a window function, usually by zeroing components;
Performing the inverse transform on the modified signal.
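These three steps can be sketched with a naive transform on plain lists. This is a minimal illustration rather than the code behind the figure: the names transform, dft, idft, lowPass and lowPassReal are introduced here, the transform is the slow $\mathcal{O}(N^2)$ sum, and I assume the usual convention where the coefficients past the middle of the array represent negative frequencies, so that $|k|$ is the smaller of $k$ and $N - k$ .

```haskell
import Data.Complex (Complex ((:+)), cis, realPart)

-- | Shared sum for both directions; `sign` is -1 (forward) or +1 (inverse).
transform :: Double -> [Complex Double] -> [Complex Double]
transform sign xs =
  [ sum [ x * cis (sign * 2 * pi * fromIntegral (k * n) / len) | (n, x) <- zip [0 ..] xs ]
  | k <- [0 .. size - 1] ]
  where
    size = length xs
    len  = fromIntegral size

-- | Step 1: forward transform.
dft :: [Complex Double] -> [Complex Double]
dft = transform (-1)

-- | Step 3: inverse transform, including the 1/N normalization.
idft :: [Complex Double] -> [Complex Double]
idft xs = map (/ fromIntegral (length xs)) (transform 1 xs)

-- | Step 2 sits in the middle: W_k = 1 if |k| <= cutoff, 0 otherwise.
lowPass :: Int -> [Complex Double] -> [Complex Double]
lowPass cutoff xs = idft (zipWith window [0 ..] (dft xs))
  where
    size = length xs
    window k c
      | min k (size - k) <= cutoff = c
      | otherwise                  = 0

-- | Convenience wrapper for real-valued signals.
lowPassReal :: Int -> [Double] -> [Double]
lowPassReal cutoff = map realPart . lowPass cutoff . map (:+ 0)
```

For example, filtering 64 samples of $\cos(2t) + \frac{1}{2}\sin(20t)$ taken over $[0, 2\pi)$ with lowPassReal 10 recovers $\cos(2t)$ up to floating-point rounding, because the component at frequency 20 falls entirely in the zeroed region.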

Discrete wavelet transforms

Discrete wavelet transforms are a class of discrete transforms which decompose signals into a sum of wavelets. While the complex exponential functions which make up the Fourier transform are localized in frequency but infinite in time, wavelets are localized in both time and frequency.

In order to generate the basis wavelets, the original wavelet is stretched. This is akin to the Fourier transform, where the sine/cosine basis functions are ‘stretched’ by decreasing their frequency. In technical terms, the amount of ‘stretch’ is called the level. For example, the discrete wavelet transform using the db4⁴ wavelet up to level 5 is the decomposition of a signal into the following wavelets:

Five of the db4 basis wavelets shown. As the level increases, the wavelet is stretched such that it can represent lower-frequency components of a signal. (Source code)

In practice, discrete wavelet transforms are expressed as two transforms per level. This means that a discrete wavelet transform of level 1 gives back two sets of coefficients. One set of coefficients contains the low-frequency components of the signal, and is usually called the approximate coefficients. The other set of coefficients contains the high-frequency components of the signal, and is usually called the detail coefficients. A wavelet transform of level 2 is done by taking the approximate coefficients of level 1, and transforming them using a stretched wavelet into two sets of coefficients: the approximate coefficients of level 2, and the detail coefficients of level 2. Therefore, a signal transformed using a wavelet transform of level $N$ has $N + 1$ sets of coefficients: the approximate and detail coefficients of level $N$ , and the detail coefficients of levels $N-1$ , $N-2$ , …, $1$ .
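To make the level structure concrete, here is a small sketch using the Haar wavelet, which is the simplest wavelet there is and not one of the db4 or sym17 wavelets used in this post. I picked it only because one level fits in a few lines: the approximate coefficients are scaled sums of neighbouring samples, and the detail coefficients are scaled differences. The names haarStep, pairs and haarDecompose are made up for this example, and the input length is assumed to be divisible by $2^{\text{level}}$ .

```haskell
-- | Group a list into consecutive pairs; a trailing odd element is dropped.
pairs :: [a] -> [(a, a)]
pairs (a : b : rest) = (a, b) : pairs rest
pairs _              = []

-- | One level of the Haar transform: (approximate, detail) coefficients.
haarStep :: [Double] -> ([Double], [Double])
haarStep xs = (map approx ps, map detail ps)
  where
    ps            = pairs xs
    approx (a, b) = (a + b) / sqrt 2  -- low-frequency content
    detail (a, b) = (a - b) / sqrt 2  -- high-frequency content

-- | Multi-level decomposition: the approximate coefficients of the deepest
-- level, and the detail coefficients ordered from the deepest level to level 1.
haarDecompose :: Int -> [Double] -> ([Double], [[Double]])
haarDecompose level xs
  | level <= 0 = (xs, [])
  | otherwise  =
      let (approx, detail)       = haarStep xs
          (deepest, deepDetails) = haarDecompose (level - 1) approx
      in  (deepest, deepDetails ++ [detail])
```

Decomposing 8 samples up to level 3 gives one set of approximate coefficients (of length 1) and three sets of detail coefficients (of lengths 1, 2 and 4). For a constant signal, all detail coefficients are zero.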

Filtering using the discrete wavelet transform

The discrete Fourier transform excels at filtering away noise which has nice properties in frequency space. This isn’t always the case in practice; for example, noise may have frequency components which overlap with the signal we’re looking for. This was the case in my research on ultrafast electron diffraction of polycrystalline samples⁵ ⁶, where the ‘noise’ was a trendline which moved over time, and whose frequency components overlapped with the diffraction pattern we were trying to isolate.

As an example, let’s use real electron diffraction data and we’ll pretend this is a time signal, to keep the units familiar. We’ll take a look at some really annoying noise: normally-distributed white noise drawn from this distribution⁷:

$P(x) = \frac{1}{\sqrt{2 \pi}} \exp\left( -\frac{(x + 1/2)^2}{2} \right)$

Visually:

This example shows a common situation: realistic noise whose frequency components overlap with the signal we’re trying to isolate. We wouldn’t be able to use filtering techniques based on the Fourier transform.

Now let’s look at a particular discrete wavelet transform, with the underlying wavelet sym17. Decomposing the noisy signal up to level 3, we get four components:

All coefficients from a discrete wavelet transform up to level 3 with wavelet sym17. (Source code)

Looks like the approximate coefficients at level 3 contain all the information we’re looking for. Let’s set all detail coefficients to 0, and invert the transform:

That’s looking pretty good! Not perfect of course, which I expected because we’re using real data here.
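The same forward / zero / inverse procedure can be sketched in a few lines if we swap in the Haar wavelet, which is much simpler than sym17 and would give a blockier result on real data. This is an illustration under stated assumptions, not the code behind the figures: the names pairs, haarStep, haarInverseStep and haarSmooth are introduced here, and the input length is assumed to be divisible by $2^{\text{level}}$ .

```haskell
-- | Group a list into consecutive pairs; a trailing odd element is dropped.
pairs :: [a] -> [(a, a)]
pairs (a : b : rest) = (a, b) : pairs rest
pairs _              = []

-- | One level of the forward Haar transform: (approximate, detail) coefficients.
haarStep :: [Double] -> ([Double], [Double])
haarStep xs = (map approx ps, map detail ps)
  where
    ps            = pairs xs
    approx (a, b) = (a + b) / sqrt 2
    detail (a, b) = (a - b) / sqrt 2

-- | One level of the inverse Haar transform.
haarInverseStep :: [Double] -> [Double] -> [Double]
haarInverseStep approx detail = concat (zipWith unpair approx detail)
  where
    unpair a d = [(a + d) / sqrt 2, (a - d) / sqrt 2]

-- | Decompose up to the given level, set every detail coefficient to 0,
-- and invert the transform.
haarSmooth :: Int -> [Double] -> [Double]
haarSmooth level xs
  | level <= 0 || length xs < 2 = xs
  | otherwise =
      let (approx, detail) = haarStep xs
          zeroed           = map (const 0) detail
      in  haarInverseStep (haarSmooth (level - 1) approx) zeroed
```

With the Haar wavelet, keeping only the approximate coefficients of level 2 amounts to replacing every block of 4 samples by its average: haarSmooth 2 [1, 3, 5, 7, 10, 10, 14, 14] gives four values of 4 followed by four values of 12, up to floating-point rounding.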

Conclusion

In this post, I’ve tried to give some of the intuition behind filtering signals using discrete wavelet transforms as an analogy to filtering with the discrete Fourier transform.

This was only a basic explanation. There is so much more to wavelet transforms. There are many classes of wavelets with different properties, some of which⁸ are very useful when dealing with higher-dimensional data (e.g. images and videos). If you’re dealing with noisy data, it won’t hurt to try and see if wavelets will help you understand it!

L. P. René de Cotret and B. J. Siwick, A general method for baseline-removal in ultrafast electron powder diffraction data using the dual-tree complex wavelet transform, Struct. Dyn. 4 (2017) DOI:10.1063/1.4972518 ↩︎
M. R. Otto, L. P. René de Cotret, et al, How optical excitation controls the structure and properties of vanadium dioxide, PNAS (2018) DOI: 10.1073/pnas.1808414115.↩︎
Note that it is traditional in physics to represent the transform variable as $\omega$ instead of $s$ . If $t$ is time (in seconds), then $\omega$ is angular frequency (in radians per seconds). If $t$ is distance (in meters), $\omega$ is spatial angular frequency (in radians per meter).↩︎
I will be using the wavelet naming scheme from PyWavelets.↩︎
L. P. René de Cotret and B. J. Siwick, A general method for baseline-removal in ultrafast electron powder diffraction data using the dual-tree complex wavelet transform, Struct. Dyn. 4 (2017) DOI:10.1063/1.4972518 ↩︎
M. R. Otto, L. P. René de Cotret, et al, How optical excitation controls the structure and properties of vanadium dioxide, PNAS (2018) DOI: 10.1073/pnas.1808414115.↩︎
Note that this distribution contains a bias of - $1/2$ , which is useful in order to introduce low-frequencies in the noise which overlap with the spectrum of the signal.↩︎
N. G. Kingsbury, The dual-tree complex wavelet transform: a new technique for shift invariance and directional filters, IEEE Digital Signal Processing Workshop, DSP 98 (1998)↩︎

Chesterton's fence and why I'm not sold on the blockchain

2022-08-02T00:00:00Z

The key technological advances which brought Bitcoin to life are the blockchain and its associated proof-of-work consensus algorithm. The Bitcoin whitepaper¹ is very clear on its purpose:

A purely peer-to-peer version of electronic cash would allow online payments to be sent directly from one party to another without going through a financial institution. Digital signatures provide part of the solution, but the main benefits are lost if a trusted third party is still required to prevent double-spending.

The double-spending problem to which Nakamoto refers is a unique challenge of digital cash implementations. Contrary to physical cash, which is difficult to copy, digital cash is but bytes; it can be trivially copied. Before Bitcoin, the most popular way to prevent double-spending has been to route all digital cash transactions on a particular network through a trusted entity which ensures that no double-spending occurs. This is how the credit card and Interac networks work, for example.

The Bitcoin whitepaper brings a new solution to the double-spending problem, a solution designed to explicitly avoid centralized trusted entities.

In software engineering, there is a principle that one should understand why something is the way it is, before trying to change it. This principle is known as Chesterton’s fence²:

There exists (…) a fence or gate erected across a road. The more modern type of reformer goes gaily up to it and says, ‘I don’t see the use of this; let us clear it away.’ To which the more intelligent type of reformer will do well to answer: ’If you don’t see the use of it, I certainly won’t let you clear it away. Go away and think. Then, when you can come back and tell me that you do see the use of it, I may allow you to destroy it.

To me, the push towards decentralization is a case of Chesterton’s fence. No one wants to involve a third party in every transaction, but it is this way for two main reasons: fraud management and performance (transaction throughput).

Fraud management is a weak point of an anonymous peer-to-peer network like Bitcoin. While I appreciate the desire for anonymity, this leads to the same behaviors which led to the founding of the US Securities and Exchange Commission almost a hundred years ago. Decentralization also enabled the rise of ransomware, as it is now much harder to track the flow of money between anonymous, single-use cryptocurrency accounts.

Performance is another major downside of decentralization. As an example, Bitcoin’s throughput has never reached more than 6 transactions per second as of the time of writing. By contrast, the electronic payment network VisaNet (which powers Visa credit cards) can process up to 76 000 transactions per second.

Until blockchain enthusiasts understand the advantages of centralization presented above, I don’t think cryptocurrencies will become mainstream.

This post was inspired by the Tim O’Reilly interview on the Rational Reminder podcast.

S. Nakamoto, Bitcoin: A Peer-to-Peer Electronic Cash System (2008). Link to PDF.↩︎
G. K. Chesterton, The Thing: Why I Am a Catholic, chapter 4 (1929).↩︎

Exploring the multiverse of possibilities all at once using monads

2022-03-02T00:00:00Z

I’m working on a global optimization problem these days. Unlike local optimization problems, e.g. what you would solve using least-squares minimization, global optimization inevitably involves exhaustively evaluating all possible solutions and choosing the best one. As you can imagine, global optimization is much more computationally-intensive than local optimization, due to the size of the set of potential solutions. Speeding up a global optimization problem involves reducing the set of possible solutions to a minimum, based on the specifics of the problem.

In this post, I’ll show you how to build the minimal set of possible solutions to an optimization problem, instead of searching for solutions in a larger space. As we’ll see, only viable solutions are ever considered. This will be done by splitting the computations into multiple universes whenever a choice is presented to us, such that we traverse the multiverse of possibilities all at once.

An example problem

Let’s say we’ve got 8 friends going out for a drink, in two cars with four seats each. How many arrangements of people can we have? If we don’t care about where people sit in each car, the number of arrangements is the number of combinations of 4 people we can make from 8 people, since the remaining 4 people will go in the second car. For every configuration, there’s also a configuration which swaps the car. Therefore, there are:

$\binom{8}{4} \times 2 = \frac{8!}{4!(8-4)!} \times 2 = 140$

possible combinations. If you’re not familiar with this notation, you can read $\binom{8}{4}$ as choose 4 people out of 8 people, of which there are 70 possibilities (and then 70 other possibilities with the cars swapped). That means that if we wanted to optimize the distribution of people into the two cars – for example, if we wanted to group up the best friends together, or minimize the total weight of people in car1, or some other objective –, we would need to look at 140 solutions. This problem is purely combinatorial.

Now let’s add some constraints. Our 8 friends are coming back from the bar. Out of the 8 friends, 3 of them didn’t drink and are therefore allowed to drive. Thus, the number of possible arrangements of friends in the car has been reduced, as each car needs a driver. For one car, we need to select 1 driver out of 3, and 3 remaining passengers out of 7. However, the other car will need a driver, so really there are 6 passengers to choose from. Finally, for every arrangement there is a duplicate arrangement with the cars swapped. The number of possibilities is therefore:

$\binom{3}{1} \binom{6}{3} \times 2 = 120$

Potential solutions as a decision graph

How else can we express the number of combinations? Think of building a solution, instead of searching for one. We may want to start by assigning a driver to car 1. For each possible decision here, we’ll assign a driver to the second car next, then passengers. The possibilities look like this:

In the figure above, no one is assigned at the start. Then, we assign the first driver (out of three choices). Then, we need to assign a second driver, of which there are only two remaining. Each of the 6 passengers is then assigned. A potential solution (i.e. an assignment between people and cars) is represented by a path in the decision tree. Three possibilities are shown as examples.

This way of thinking about solutions reminds me strongly of the Everett interpretation of quantum mechanics, also known as the many-worlds interpretation or the multiverse interpretation. The three potential assignments are three universes that split from the same starting point. Enumerating all possible solutions to our example problem consists in crawling the decision tree, or crawling the multiverse of possibilities.

Expressing the multiverse of solutions in Haskell

Based on the decision tree above, I want to run a computation which, when presented with choices, explores all possibilities all at once.

Consider the following type constructor:

newtype Possibilities a = Possibilities [a]

A computation that returns a result Possibilities a represents all possible answers of final type a. For example, a computation that can possibly have multiple answers might look like:

possibly :: [a] -> Possibilities a
possibly xs = Possibilities xs

Alternatively, a computation which is certain, i.e. has a single possibility, is represented by:

certainly :: a -> Possibilities a
certainly x = Possibilities [x] -- A single possibility = a certainty.

Possibilities is basically a list, so we’ll start with a Foldable instance which is useful for counting the number of possibilities using length:

instance Foldable Possibilities where
    foldMap m (Possibilities xs) = foldMap m xs

Possibilities is a functor:

instance Functor Possibilities where
    fmap f (Possibilities ps) = Possibilities (fmap f ps)

The interesting tidbit starts with the Applicative instance. Combining possibilities should be combinatorial, e.g. combining the possibilities of 3 drivers and 6 passengers results in 18 possibilities.

instance Applicative Possibilities where
    pure x = certainly x -- see above

    (Possibilities fs) <*> (Possibilities ps) = Possibilities [f p | f <- fs, p <- ps]

Recall that the list comprehension notation is combinatorial, i.e. [(n,m) | n <- [1..3], m <- [1..3]] has 9 elements ([(1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(3,1),(3,2),(3,3)]).

Now for the crucial part of composing possibilities. We want past possibilities to influence future possibilities; we’ll need a monad instance. A monad instance means that if we start with multiple possibilities, and each possibility can result in multiple possibilities, the whole computation should produce multiple possibilities¹.

instance Monad Possibilities where

    Possibilities ps >>= f = Possibilities $ concat [toList (f p) | p <- ps] -- concat :: [ [a] ] -> [a]
        where
            toList (Possibilities xs) = xs

Let’s define some helper datatypes and functions.

{- 
With the following imports:

import           Data.Set    (Set, (\\))
import qualified Data.Set as Set 
-}

-- | All possible people which can be assigned to cars
data Person = Driver1    | Driver2    | Driver3
            | Passenger1 | Passenger2 | Passenger3
            | Passenger4 | Passenger5 | Passenger6
    deriving (Bounded, Eq, Ord, Enum, Show) -- Ord is needed by Set, Show by CarAssignment


-- A car assignment consists in two cars, each with a driver, 
-- as well as passengers
data CarAssignment 
    = CarAssignment { driver1        :: Person
                    , driver2        :: Person
                    , car1Passengers :: Set Person
                    , car2Passengers :: Set Person
                    }
    deriving Show


allDrivers :: Set Person
allDrivers = Set.fromList [Driver1, Driver2, Driver3]


-- Pick a driver from an available group of people.
-- Returns the assigned driver, and the remaining unassigned people
assignDriver :: Set Person -> Possibilities (Person, Set Person)
assignDriver people 
    = possibly [ (driver, Set.delete driver people) 
               | driver <- Set.toList $ people `Set.intersection` allDrivers
               ]


-- Pick three passengers from an available group of people.
-- Returns the assigned passengers, and the remaining unassigned people
assign3Passengers :: Set Person -> Possibilities (Set Person, Set Person)
assign3Passengers people = possibly [ (passengers, people \\ passengers) 
                                   | passengers <- setsOf3
                                   ]
    where setsOf3 = filter (\s -> length s == 3) $ Set.toList $ Set.powerSet people

Finally, we can express the multiverse of possible drivers-and-passengers assignments with great elegance. Behold:

carAssignments :: Possibilities CarAssignment
carAssignments = do
    let everyone = Set.fromList $ enumFromTo minBound maxBound -- [Driver1, Driver2, ..., Passenger6]
    
    (driver1, rest) <- assignDriver everyone
    (driver2, rest) <- assignDriver rest

    (car1Passengers, rest) <- assign3Passengers rest
    (car2Passengers, _)    <- assign3Passengers rest

    return $ CarAssignment driver1 driver2 car1Passengers car2Passengers

Given the monad instance for Possibilities, the do block above enumerates every possible assignment. Let’s take a look at the size of the multiverse in this case:

ghci> let multiverse = carAssignments
ghci> print $ length multiverse
120

Just as we had calculated by hand. Amazing!

Conclusion

What I’ve shown you today is how to structure computations in such a way that you are exploring the multiverse of possibilities all at once. The seasoned Haskell programmer will have recognized that the Functor, Applicative, and Monad instances of Possibilities are just like lists!

Although I’m not using Haskell at work², I expect that something similar will need to be built in the near future to speed up our global optimization problem. The specific problem we are tackling has many more constraints than the example presented in this post. It’s easier to generate a list of solutions, most of which are unsuitable, and filter the solutions one by one. There is a fixed computational cost associated with generating and checking a solution, and so reducing the set of possible solutions is even more important.

This post was partly inspired by the legendary blog post Typing the technical interview.

A self-contained Haskell source file containing all code from this post is available for download here.

This is why some people like to think of monads as types that support flatMap.↩︎
Boss, if you’re reading this, please let me use Haskell :).↩︎

Can you make heterogeneous lists in Haskell? Sure — as long as your intent is clear

2021-09-26T00:00:00Z

Featured in Haskell Weekly issue 283

Sometimes, Haskell’s type system seems a bit restrictive compared to dynamic languages like Python. The most obvious example is the heterogeneous list:

>>> # Python
>>> mylist = ["hello", "world", 117, None]
>>>
>>> for item in mylist:
...     print(item) 
hello
world
117
None

but in Haskell, list items must be of the same type:

-- Haskell
mylist = ["hello", "world", 117, ()] -- Invalid: type cannot be inferred!

This is a contrived example, of course. But consider this use-case: I just want to print the content of the list. It’s unfortunate I can’t write:

mylist :: Show a => [a]
mylist =  ["hello", "world", 117, ()] -- All these types have Show instances, but this won't compile

For this specific application, the type system is overly restrictive – as long as all I want to do is print the content of my list! In this post, I’ll show you how to do something like this using the ExistentialQuantification language extension.

A more complex example

Let’s say I want to list American football players. There are two broad classes of players (offensive and defensive) and we want to keep track of the players in a list – the player registry. Our final objective is to print the list of players to standard output.
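In Python, this takes very little ceremony. Here is a minimal sketch; the class and field names are picked for illustration only:

```python
from dataclasses import dataclass


@dataclass
class OffensivePlayer:
    name: str
    position: str


@dataclass
class DefensivePlayer:
    name: str
    position: str


# Nothing stops us from mixing both classes in a single list
player_registry = [
    OffensivePlayer("Tom Brady", "Quarterback"),
    DefensivePlayer("Michael Strahan", "Defensive end"),
]

for player in player_registry:
    print(player)
```

Python will happily print both players, but it will also let us put anything else in that list without complaint.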

Let’s try to do the same in Haskell. Our first reflex might be to use a sum type:

data Player = OffensivePlayer String String -- name and position
            | DefensivePlayer String String -- name and position

playerRegistry :: [Player]
playerRegistry = ...

However, not all sports stats apply to OffensivePlayer and DefensivePlayer constructors. For example:

passingAccuracy :: Player -> IO Double
passingAccuracy (OffensivePlayer name pos) = lookupFromDatabase "passingAccuracy" name
passingAccuracy (DefensivePlayer name pos) = return 0 -- Defensive players don't pass


tacklesPerGame :: Player -> IO Double
tacklesPerGame (OffensivePlayer name pos) = return 0 -- Offensive players don't tackle
tacklesPerGame (DefensivePlayer name pos) = lookupFromDatabase "tacklesPerGame" name

The Player type is too general; we’re not using the type system to its full potential. It’s much more representative of our situation to use two separate types:

data OffensivePlayer = OffensivePlayer String String
data DefensivePlayer = DefensivePlayer String String

passingAccuracy :: OffensivePlayer -> IO Double
passingAccuracy = ...

tacklesPerGame :: DefensivePlayer -> IO Double
tacklesPerGame = ...

This is much safer and more appropriate. Now let’s give ourselves the ability to print players:

instance Show OffensivePlayer where
    show (OffensivePlayer name pos) = mconcat ["< ", name, " : ", pos, " >"]

instance Show DefensivePlayer where
    show (DefensivePlayer name pos) = mconcat ["< ", name, " : ", pos, " >"]

Awesome. One last problem:

-- This won't typecheck
playerRegistry = [ OffensivePlayer "Tom Brady"       "Quarterback"
                 , DefensivePlayer "Michael Strahan" "Defensive end"
                 ]

printPlayerList :: IO ()
printPlayerList = forM_ playerRegistry print -- `forM_` from Control.Monad

Rather annoying. We could wrap the two player types in a sum type:

data Player = OP OffensivePlayer
            | DP DefensivePlayer

instance Show Player where
    show (OP p) = show p
    show (DP p) = show p

playerRegistry :: [Player]
playerRegistry = [ OP (OffensivePlayer "Tom Brady"       "Quarterback")
                 , DP (DefensivePlayer "Michael Strahan" "Defensive end")
                 ]

printPlayerList :: IO ()
printPlayerList = forM_ playerRegistry print

but this is quite clunky. It also doesn’t scale well to cases where we have a lot more types!

Enter existential quantification

The latest version of the Haskell language (Haskell 2010) is somewhat dated at this point. However, the Glasgow Haskell Compiler supports language extensions at the cost of portability. Turns out that the ExistentialQuantification language extension can help us with this problem.

We turn on the extension at the top of our module:

{-# LANGUAGE ExistentialQuantification #-}

and create an existential datatype:

data ShowPlayer = forall a. Show a
                => ShowPlayer a

The datatype ShowPlayer is a real datatype that bundles any data a which can be shown. Note that everything else about the internal type is forgotten, since the ShowPlayer type wraps any type that can be shown (that’s what forall a. Show a means).

We can facilitate the construction of a ShowPlayer with the following helper function:

mkPlayer :: Show a => a -> ShowPlayer
mkPlayer a = ShowPlayer a

Now since the data bundled in a ShowPlayer can be shown, the only operation supported by ShowPlayer is show:

instance Show ShowPlayer where
    show (ShowPlayer a) = show a

Finally, our heterogeneous list:

playerRegistry :: [ShowPlayer]
playerRegistry = [ -- ✓ OffensivePlayer has a Show instance ✓
                   ShowPlayer (OffensivePlayer "Tom Brady"       "Quarterback")
                   -- ✓ DefensivePlayer has a Show instance ✓
                 , ShowPlayer (DefensivePlayer "Michael Strahan" "Defensive end")
                 ]

printPlayerList :: IO ()
printPlayerList = forM_ playerRegistry print

So we can have a heterogeneous list – as long as the only thing we can do with it is show it!

The advantage here compared to the sum-type approach is when we extend our code to many more types:

data Quarterback    = Quarterback  String deriving Show
data Lineman        = Lineman      String deriving Show
data Runningback    = Runningback  String deriving Show
data WideReceiver   = WideReceiver String deriving Show

data DefensiveEnd   = DefensiveEnd String deriving Show
data Linebacker     = Linebacker   String deriving Show
data Safety         = Safety       String deriving Show
data Corner         = Corner       String deriving Show


-- Example: some functions are specific to certain positions
passingAccuracy :: Quarterback -> IO Double
passingAccuracy = ...


playerRegistry :: [ShowPlayer]
playerRegistry = [ mkPlayer (Quarterback  "Tom Brady")
                 , mkPlayer (DefensiveEnd "Michael Strahan")
                 , mkPlayer (Safety       "Richard Sherman")
                 , ...
                 ]

printPlayerList :: IO ()
printPlayerList = forM_ playerRegistry print

This way, we can keep the benefits of the type system when we want it, but also give ourselves some flexibility when we need it. This is actually similar to object-oriented programming, where classes bundle data and operations on them into an object!

A bit more functionality

Let’s pack in more operations on our heterogeneous list. We might want to not only show players, but also access their salaries. We describe the functionality common to all players in a typeclass called BasePlayer:

class Show p => BasePlayer p where
    -- Operate in IO because of database access, for example
    getYearlySalary :: p -> IO Double

instance BasePlayer Quarterback where
    ...

instance BasePlayer Lineman where
    ...

We can update our player registry to support the same operations as BasePlayer through the Player existential type:

data Player = forall a. BasePlayer a
            => Player a

instance Show Player where
    show (Player a) = show a

instance BasePlayer Player where
    getYearlySalary (Player a) = getYearlySalary a

and our new heterogeneous list now supports both operations:

playerRegistry :: [Player]
playerRegistry = [ Player (Quarterback  "Tom Brady")
                 , Player (DefensiveEnd "Michael Strahan")
                 , Player (Safety       "Richard Sherman")
                 , ...
                 ]

printPlayerList :: IO ()
printPlayerList = forM_ playerRegistry print -- unchanged


average_salary :: IO Double
average_salary = do
    salaries <- for playerRegistry getYearlySalary -- (`for` from Data.Traversable)
    return $ sum salaries / fromIntegral (length salaries) -- `length` returns an Int

So we can have a heterogeneous list – but we can only perform operations which are supported by the Player type. In this sense, the Player type encodes our intent.

Conclusion

In this post, we’ve seen how to create heterogeneous lists in Haskell. However, contrary to dynamic languages, we can only do so provided we are explicit about our intent. That means we get the safety of strong, static types with some added flexibility if we so choose.

If you’re interested in type-level programming, including but not limited to the content of this present post, I strongly recommend Rebecca Skinner’s An Introduction to Type Level Programming.

Thanks to Brandon Chinn for some explanation on how to simplify existential types.

In defence of the PhD prelim exam

2021-06-12T00:00:00Z

In the department of Physics at McGill University, there are a few requirements for graduation in the PhD program. One of these requirements is to pass the preliminary examination, or prelim for short, at the end of the first year¹. This type of examination is becoming rarer across North America. The Physics department has been discussing the modernization of the prelim, either by changing its format or removing it entirely.

In this post, I want to explain what the prelim is and why I think its essence should be preserved.

What is the prelim?

The prelim in its pre-COVID-19 form is a 6h sit-down exam, split in two 3h sessions. It aims to test students’ mastery of Physics concepts at the undergraduate level. At McGill, there are four themes of questions:

Classical mechanics and special relativity;
Thermodynamics and statistical mechanics;
Electromagnetism;
Quantum mechanics.

The first 3h session is composed of 16 short questions, 10 of which must be answered. Some of the short questions are conceptual, while others involve small calculations. Here is an example of a short question from the year I passed the prelim:

Imagine a planet being a long infinite solid cylinder of radius $R$ with a mass per unit length $\Lambda$. The matter is uniformly distributed over its radius. Find the potential and gravitational field everywhere, i.e. inside and outside the cylinder, and sketch the field lines.

The second 3h session is composed of 8 long questions, split evenly among the four themes. Four questions must be answered (no more!), with at least one question from each theme. Here is an example of a long question from the year I passed the prelim:

A simple 1-dimensional model for an ionic crystal (such as NaCl) consists of an array of $N$ point charges in a straight line, alternately $+e$ and $-e$ and each at a distance $a$ from its nearest neighbours. If $N$ is very large, find the potential energy of a charge in the middle of the row and of one at the end of the row in the form $\alpha e^2/(4\pi \epsilon_0 a)$.

I passed the prelim exam in 2018. For the curious, here are all the questions from that year: short (PDF) and long (PDF). The department of Physics also keeps a record of the prelim questions going back to 1996. Senior undergraduates are well-equipped to answer prelim questions. The difficulty comes from the breadth of possible questions, as well as the time constraint.

A test of competence

Of course, the prelim is only one of the requirements on the way to earn a doctoral degree. Most importantly, PhD students need to write a dissertation and defend its content in front of a committee of experts. So why have the prelim at all?

The prelim serves as a way to ensure that all PhD students have a certain level of competence in all historical areas of Physics. Evaluating students for admission to the Physics department is inherently hard because it is difficult to compare academic records from different institutions across the world.

Earning a PhD makes you an expert in a narrow subject. Passing the prelim indicates that students have a baseline knowledge across all historical Physics disciplines.

Proposed alternative: the comprehensive examination

Not every department in the McGill Faculty of Science requires PhD students to pass a prelim exam. Another popular alternative, in use in the Chemistry department for example, is the so-called comprehensive examination².

The structure of the comprehensive exam varies across departments, but generally it involves the student writing a multi-page project proposal and defending this proposal in front of a committee of faculty members. In the course of the comprehensive exam, committee members may ask the student any question related to their research topic.

A comprehensive exam has two attractive attributes. First, its scope is closer to students’ area of research. Second, a large part of the comprehensive (the project proposal) can be done offline, without the pressure of being timed.

In defence of the prelim

The prelim is a stressful event. Not everyone is comfortable in a sit-down exam setting. A PhD career can end because someone slept poorly the night before the exam. I support any and all adjustments to the current prelim format to make the experience more accessible in this sense.

My main objection to replacing the prelim with something closer to the comprehensive exam is the functionalization of education. Removing the prelim eliminates the incentive to have a baseline knowledge across Physics. It encourages PhD students to have an even narrower set of skills, making the PhD program more focused around the resulting dissertation.

The comprehensive exam is inherently about making students’ experience more focused on their research area. This is appealing from the students’ point of view: why should they have to go out of their way to stay aware of classical mechanics, something which they might never use? The comprehensive exam (in the format that I have described above) streamlines the requirements for graduation.

The graduate student experience is about much more than the resulting dissertation. We want our students to be more than just experts in their narrow fields; we also want them to be ready to contribute to society beyond their immediate expertise. Does the prelim ensure that this is the case? Of course not. But removing the prelim sends the wrong message about what it means to graduate with a PhD.

On a personal note, the prelim made me review all of my undergraduate studies. I purchased the Feynman Lectures on Physics and read all three volumes. With a Master’s degree under my belt, I was able to appreciate what I had learned in a new light, even though I haven’t used most of it since then. While I cannot say that the exam was fun, the studying experience was definitely one of the highlights of my PhD.

Other institutions might call it the qualifying examination.↩︎
Again, this might have other names at other institutions.↩︎

Harnessing symmetry to find the center of a diffraction pattern

2021-01-23T00:00:00Z

Ultrafast electron diffraction involves the analysis of diffraction patterns. Here is an example diffraction pattern for a thin (<100nm) flake of graphite¹:

A diffraction pattern is effectively the intensity of the Fourier transform. Given that crystals like graphite are well-ordered, the diffraction peaks (i.e. Fourier components) are very large. You can see that the diffraction pattern is six-fold symmetric; that’s because the atoms in graphite arrange themselves in a honeycomb pattern, which is also six-fold symmetric. In these experiments, the fundamental Fourier component is so strong that we need to block it. That’s what that black beam-block is about.

There are crystals that are not as well-ordered as graphite. Think of a powder made of many small crystallites, each being about 50nm x 50nm x 50nm. Diffracting electrons through a sample like that results in a kind of average of all possible diffraction patterns. Here’s an example with polycrystalline Chromium:

Each ring in the above pattern corresponds to a Fourier component. Notice again how symmetric the pattern is; the material itself is symmetric enough that the fundamental Fourier component needs to be blocked.

For my work on iris-ued, a data analysis package for ultrafast electron scattering, I needed to find a reliable, automatic way to get the center of such diffraction patterns to get rid of the manual work required now. So let’s see how!

First try: center of mass

A first naive attempt might start with the center-of-mass, i.e. the average of pixel positions weighted by their intensity. Since intensity is symmetric about the center, the center-of-mass should coincide with the actual physical center of the image.

Good news, scipy’s ndimage module exports such a function: center_of_mass. Let’s try it:

Demonstration of using scipy.ndimage.center_of_mass to find the center of diffraction patterns. (Source code)

Not bad! Especially in the first image, really not a bad first try. But I’m looking for something ~~pixel-perfect~~ much closer. Intuitively, the beam-block in each image should mess with the calculation of the center of mass. Let’s define the following areas that we would like to ignore:

Masks are generally defined as boolean arrays with True (or 1) where pixels are valid, and False (or 0) where pixels are invalid. Therefore, we should ignore the weight of masked pixels. scipy.ndimage.center_of_mass does not support this feature; we need an extension of center_of_mass:

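import numpy as np

def center_of_mass_masked(im, mask):
    """Intensity-weighted average position of `im`, ignoring pixels where `mask` is False."""
    rr, cc = np.indices(im.shape)
    # Invalid pixels get a weight of 0
    weights = im * mask.astype(im.dtype)

    r = np.average(rr, weights=weights)
    c = np.average(cc, weights=weights)
    return r, c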

This is effectively an average of the row and column coordinates (rr and cc) weighted by the image intensity. The trick here is that mask.astype(im.dtype) is 0 where pixels are “invalid”; therefore they don’t count in the average! Let’s look at the result:

Demonstration of using center_of_mass_masked (see above) to find the center of diffraction patterns. (Source code)

I’m not sure if it’s looking better, honestly. But at least we have an approximate center! That’s a good starting point that feeds into the next step.
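To see the masking trick in isolation, here is a small synthetic check. Everything in it is made up for illustration: a Gaussian spot stands in for the diffraction pattern, and a bright square stands in for the area we want to ignore. center_of_mass_masked is repeated from above so that the snippet runs on its own.

```python
import numpy as np


def center_of_mass_masked(im, mask):
    # Same function as above, repeated so that this snippet is self-contained
    rr, cc = np.indices(im.shape)
    weights = im * mask.astype(im.dtype)
    return np.average(rr, weights=weights), np.average(cc, weights=weights)


# Synthetic image: a Gaussian spot centered at (row, col) = (20, 30)
rr, cc = np.indices((64, 64))
im = np.exp(-((rr - 20.0) ** 2 + (cc - 30.0) ** 2) / (2 * 4.0 ** 2))

# Spurious bright square that we would like to ignore
im[50:60, 50:60] = 5.0
mask = np.ones(im.shape, dtype=bool)  # True where pixels are valid
mask[50:60, 50:60] = False

r_all, c_all = center_of_mass_masked(im, np.ones_like(mask))  # nothing masked
r_masked, c_masked = center_of_mass_masked(im, mask)

print(f"unmasked: ({r_all:.1f}, {c_all:.1f})")
print(f"masked:   ({r_masked:.1f}, {c_masked:.1f})")
```

Without the mask, the bright square drags the estimate far away from the spot; with the mask, the estimate lands on the spot. Real beam-blocks are harder, because they hide part of the pattern rather than adding spurious intensity.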

Friedel pairs and radial inversion symmetry

In his thesis², which is now also a book, Nelson Liu describes how he does it:

A rough estimate of its position is obtained by calculating the ‘centre of intensity’ or intensity-weighted arithmetic mean of the position of > 100 random points uniformly distributed over the masked image; this is used to match diffraction spots into Friedel pairs amongst those found earlier. By averaging the midpoint of the lines connecting these pairs of points, a more accurate position of the centre is obtained.

Friedel pairs are peaks related by inversion through the center of the diffraction pattern. The existence of these pairs is guaranteed by crystal symmetry. For polycrystalline patterns, Friedel pairs are averaged into rings; rings are always inversion-symmetric about their centers. Here’s an example of two Friedel pairs:
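In numbers, the midpoint-averaging step that Liu describes could look like the sketch below. The peak positions are invented for illustration, and the pairing of peaks is assumed to be known already.

```python
import numpy as np

# Hypothetical peak positions as (row, col), made up for illustration.
# Row i of `peaks` and row i of `partners` form one Friedel pair.
peaks = np.array([[40.0, 90.0], [85.0, 100.0]])
partners = np.array([[88.0, 34.0], [44.0, 24.0]])

# Each pair is related by inversion through the center,
# so the midpoint of each pair is an estimate of the center
midpoints = (peaks + partners) / 2
center = midpoints.mean(axis=0)

print(center)
```

Averaging over more pairs reduces the effect of the uncertainty in each individual peak position.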

The algorithm by Liu was meant for single-crystal diffraction patterns with well-defined peaks, and not so much for rings. However, we can distill Liu’s idea into a new, more general approach. If the approximate center coincides with the actual center of the image, then the image should be invariant under radial-inversion with respect to the approximate center. Said another way: if the image $I$ is defined on polar coordinates $(r, \theta)$, then the center maximizes correlation between $I(r, \theta)$ and $I(-r, \theta)$. Thankfully, computing the masked correlation between images is something I’ve worked on before!

Let’s look at what radial inversion looks like. There are ways to do it with interpolation, e.g. scikit-image’s warp function. However, in my testing, this is incredibly slow compared to what I will show you. A faster approach is to consider that if the image was centered on the array, then radial inversion is really flipping the direction of the array axes; that is, if the image array I has size (128, 128), and the center is at (64, 64), the radial inverse of I is I[::-1, ::-1] (numpy) / flip(flip(I, 1), 2) (MATLAB) / I[end:-1:1,end:-1:1] (Julia). Another important note is that if the approximate center of the image is far from the center of the array, the overlap between the image and its radial inverse is limited. Consider this:

If we cropped out the bright areas around the frame, then the approximate center found would coincide with the center of the array; then, radial inversion is very fast.
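A quick way to convince yourself of the flipping trick is a synthetic ring. The sketch below assumes an odd-sized array, so that the array center falls exactly on a pixel; the ring helper and its parameters are arbitrary choices for illustration.

```python
import numpy as np


def ring(shape, center, radius=20.0, width=2.0):
    # Synthetic ring-shaped pattern (made up for illustration)
    rr, cc = np.indices(shape)
    dist = np.hypot(rr - center[0], cc - center[1])
    return np.exp(-((dist - radius) ** 2) / (2 * width ** 2))


shape = (101, 101)  # odd size: the array center is pixel (50, 50)
centered = ring(shape, (50, 50))
off_center = ring(shape, (53, 47))

# Radial inversion about the array center is just a flip of both axes
centered_is_invariant = np.allclose(centered, centered[::-1, ::-1])
off_center_is_invariant = np.allclose(off_center, off_center[::-1, ::-1])

print(centered_is_invariant, off_center_is_invariant)
```

Only the ring that sits on the array center survives the flip unchanged, which is exactly the invariant we want to exploit.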

Now, especially for the right column of images, it’s pretty clear that the approximate center wasn’t perfect. The correction to the approximate center can be calculated with the masked normalized cross-correlation³ ⁴:

The cross-correlation in the bottom right corner (zoomed by 2x) shows that the true center is the approximate center we found earlier, corrected by the small shift (white arrow)! For single-crystal diffraction patterns, the result is even more striking:
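The relationship between the shift and the center can be checked on a synthetic ring. This is only a sketch under simplifying assumptions: there is no beam-block, so instead of the masked normalized cross-correlation it uses a plain, unmasked circular cross-correlation computed with FFTs, and the approximate center is taken to be the array center. The name inversion_shift is hypothetical.

```python
import numpy as np


def inversion_shift(im):
    # Shift (rows, cols) between `im` and its radial inverse about the array
    # center, taken from the peak of a plain circular cross-correlation
    inverted = im[::-1, ::-1]
    xcorr = np.fft.ifft2(np.fft.fft2(im) * np.conj(np.fft.fft2(inverted))).real
    peak = np.unravel_index(np.argmax(xcorr), xcorr.shape)
    # Lags past half the array size wrap around to negative shifts
    return tuple(int(p) if p <= n // 2 else int(p) - n for p, n in zip(peak, im.shape))


# Synthetic ring with true center (67, 61), on an array whose center is (64, 64)
rr, cc = np.indices((129, 129))
true_center = (67, 61)
dist = np.hypot(rr - true_center[0], cc - true_center[1])
im = np.exp(-((dist - 30.0) ** 2) / (2 * 3.0 ** 2))

approx_center = (64, 64)
shift = inversion_shift(im)
# The image and its inverse are displaced by twice the centering error
refined = tuple(a + s / 2 for a, s in zip(approx_center, shift))

print(shift, refined)
```

Note the factor of two: inverting about a point that is off by some amount moves the pattern by twice that amount, so only half of the measured shift is applied as a correction.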

We can put the two steps together:

Bonus: low-quality diffraction

Here’s a fun consequence: the technique also works for diffraction patterns that are pretty crappy and very far off center, provided that the asymmetry in the background is taken care of:

Conclusion

In this post, we have determined a robust way to compute the center of a diffraction pattern by making use of a strong invariant: radial inversion symmetry. My favourite part: this method admits no free parameters!

If you want to make use of this, take a look at autocenter, a new function that has been added to scikit-ued.

L.P. René de Cotret et al, Time- and momentum-resolved phonon population dynamics with ultrafast electron diffuse scattering, Phys. Rev. B 100 (2019) DOI: 10.1103/PhysRevB.100.214115.↩︎
Liu, Lai Chung. Chemistry in Action: Making Molecular Movies with Ultrafast Electron Diffraction and Data Science. University of Toronto, 2019.↩︎
Dirk Padfield. Masked object registration in the Fourier domain. IEEE Transactions on Image Processing, 21(5):2706–2718, 2012. DOI: 10.1109/TIP.2011.2181402 ↩︎
Dirk Padfield. Masked FFT registration. Proc. Computer Vision and Pattern Recognition, pp. 2918–2925, 2010. DOI: 10.1109/CVPR.2010.5540032 ↩︎

$A$	$B$	$A ~ \& ~ B$
$C$	$C$	$C$
$C$	$W$	$C$
$C$	$H$	$C$
$W$	$C$	$C$
$W$	$W$	$W$
$W$	$H$	$W$
$H$	$C$	$C$
$H$	$W$	$W$
$H$	$H$	$H$

$A$	$B$	$A ~ \& ~ B$	$A ~ \| ~ B$
$C$	$C$	$C$	$C$
$C$	$W$	$C$	$W$
$C$	$H$	$C$	$H$
$W$	$C$	$C$	$W$
$W$	$W$	$W$	$W$
$W$	$H$	$W$	$H$
$H$	$C$	$C$	$H$
$H$	$W$	$W$	$H$
$H$	$H$	$H$	$H$
