Distilling DDSP: Exploring Real-Time Audio Generation on Embedded Systems
Frequency Modulation (FM) synthesis generates complex timbres by modulating the frequency of a carrier oscillator using another modulator oscillator.
The resulting signal $x[n]$ is expressed as:
$ x[n] = A_c \sin\left(2\pi f_c n T + I \sin\left(2\pi f_m n T\right)\right) $
where $n$ is the sample index and $T$ is the sampling period, $A_c$ and $f_c$ are the amplitude and frequency of the carrier, respectively, $f_m$ is the modulator frequency, and $I$ is the modulation index, which determines the spectral complexity. FM synthesis gained prominence in the 1980s with Yamaha’s DX7 synthesizer and became a standard for generating bright, dynamic timbres in electric piano, bass, brass, and bell sounds.
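As a minimal sketch of the FM equation, the following evaluates $x[n]$ sample by sample using only the Python standard library. The function name `fm_synth`, the 16 kHz sample rate, and the parameter values are illustrative assumptions, not taken from the paper.

```python
import math

def fm_synth(f_c, f_m, index, amp=1.0, sample_rate=16000, duration=0.01):
    """Return x[n] = amp * sin(2*pi*f_c*n*T + index * sin(2*pi*f_m*n*T))."""
    T = 1.0 / sample_rate  # sampling period (assumed sample rate)
    num_samples = int(round(duration * sample_rate))
    samples = []
    for n in range(num_samples):
        modulator = index * math.sin(2 * math.pi * f_m * n * T)
        samples.append(amp * math.sin(2 * math.pi * f_c * n * T + modulator))
    return samples

# With index = 0 the modulator vanishes and the output is a plain sine.
pure = fm_synth(f_c=440.0, f_m=440.0, index=0.0)
# A larger index spreads energy into more sidebands, giving a brighter tone.
bright = fm_synth(f_c=440.0, f_m=440.0, index=5.0)
```

Raising `index` while keeping `f_c` and `f_m` fixed changes only the spectral content, not the peak amplitude, which stays bounded by `amp`.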
The DDX7 architecture employs a temporal convolutional network (TCN) decoder conditioned on a sequence of pitch and loudness frames. The decoder drives the envelopes of a differentiable FM synthesizer that uses only a few oscillators, with a fixed FM configuration and fixed frequency ratios. This effectively maps the continuous controls of pitched musical instruments onto a well-known synthesis architecture.
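To make the idea of frame-rate envelopes driving a fixed-ratio FM synthesizer more concrete, here is a heavily simplified sketch with a single carrier and a single modulator. It is not the DDX7 implementation: the hand-written envelopes stand in for the TCN decoder output, and the function names, the frequency ratios, the hop size of 64 samples, and the 16 kHz sample rate are all assumptions chosen for illustration.

```python
import math

def upsample(frames, hop):
    """Linearly interpolate frame-rate values to sample rate (hop samples per frame)."""
    out = []
    for i in range(len(frames) - 1):
        a, b = frames[i], frames[i + 1]
        out.extend(a + (b - a) * k / hop for k in range(hop))
    return out

def fixed_ratio_fm(f0_frames, carrier_env, mod_env,
                   ratio_c=1.0, ratio_m=2.0, sample_rate=16000, hop=64):
    """Two-oscillator FM: oscillator frequencies are fixed multiples of the pitch f0."""
    f0 = upsample(f0_frames, hop)
    a_c = upsample(carrier_env, hop)   # carrier output level
    a_m = upsample(mod_env, hop)       # modulator level, acts as the modulation index
    phase = 0.0                        # accumulated phase of the fundamental
    audio = []
    for n in range(len(f0)):
        modulator = a_m[n] * math.sin(ratio_m * phase)
        audio.append(a_c[n] * math.sin(ratio_c * phase + modulator))
        phase += 2 * math.pi * f0[n] / sample_rate
    return audio

# Hypothetical control frames; in DDX7 the envelopes would come from the decoder.
f0_frames = [220.0] * 5
carrier_env = [0.0, 1.0, 0.8, 0.5, 0.0]
mod_env = [0.0, 3.0, 2.0, 1.0, 0.0]
audio = fixed_ratio_fm(f0_frames, carrier_env, mod_env)
```

Because the ratios are fixed, the only time-varying quantities are the pitch and the two envelopes, which is what keeps the control space small enough for a compact decoder to predict.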
Instrument | Reference | Anchor (LPC) |
---|---|---|
🪈 Flute | | |
🎺 Trumpet | | |
🎻 Violin | | |
🎹 Piano | | |

Instrument | Full | Reduced | Reduced+AD | Reduced+CD |
---|---|---|---|---|
🪈 Flute | | | | |
🎺 Trumpet | | | | |
🎻 Violin | | | | |
🎹 Piano | | | | |