## Getting the best of 32 bit

For general discussions related to SynthMaker.

Moderators: electrogear, exonerate

### Getting the best of 32 bit

Hello all!

I know there has been a lot of discussion about bit depth (mainly requests for 64 bit lately), but I've been wondering if there could be any possibilities to use the 32 bits more efficiently, since that's all we have so far. I'm really not an expert on this, but after reading some information about bit depths, I had some ideas.

First of all, I believe 32 bit sound consists of 24 bit mantissa and 8 bit exponent (roughly), and values above 1 are the exponent. Am I right? I understand exponent is rarely used for sound, but why not use it to get more bits?

Oversampling is possible in SM when the in-between samples are being calculated side by side at the same time. Could this be possible on the bit domain? Isn't this what double precision actually does or am I completely wrong? Of course the output will still be 32 bit on SM (at this time), but maybe we could get more double precision tools to work with, if we figure out how to do this?

Please let me know if there are already answers out there or if I'm otherwise just being an idiot
nmakinen
essemer

Posts: 17
Joined: Fri Oct 28, 2011 4:31 am

### Re: Getting the best of 32 bit

nmakinen wrote:First of all, I believe 32 bit sound consists of 24 bit mantissa and 8 bit exponent (roughly), and values above 1 are the exponent. Am I right? I understand exponent is rarely used for sound, but why not use it to get more bits?

To be precise, the mantissa only contains 23 stored bits (the leading 1 is implied) and there is a "sign" bit as well. The exponent represents the power of two by which the mantissa is multiplied; for normalized numbers its range is -126 to +127 (the extreme values are reserved for zeros, denormals, infinities and NaNs). You need to keep in mind that those negative exponents are very important in DSP. If we didn't have the negative exponents, we wouldn't be able to represent numbers with an absolute value less than one. So unfortunately, there's really no advantage in circumventing the IEEE standard. Having said that, with a little creativity there are many tricks available at the bit level. May I suggest you download the little toolkit that I posted. In it, there is an IEEE754 converter that might help you visualize some of the possibilities.
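To make that layout concrete, here's a small sketch (not part of the toolkit, just an illustration in Python) that pulls a 32-bit float apart into its sign, exponent and mantissa fields:

```python
import struct

def float_bits(x):
    # Reinterpret the 32-bit float pattern as an unsigned integer
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    sign = bits >> 31                        # 1 bit
    exponent = ((bits >> 23) & 0xFF) - 127   # 8 bits, bias of 127 removed
    mantissa = bits & 0x7FFFFF               # 23 stored bits, leading 1 implied
    return sign, exponent, mantissa

print(float_bits(1.0))    # (0, 0, 0): +1.0 x 2^0
print(float_bits(-0.5))   # (1, -1, 0): -1.0 x 2^-1
```

Feeding the same values into the IEEE754 converter should show the identical bit fields.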

-cyto

cyto
essemilian

Posts: 317
Joined: Sun Nov 28, 2010 4:36 am
Location: CIN | OH | USA

### Re: Getting the best of 32 bit

Thank you Cyto, the IEEE754 converter seems very handy.

So, I was completely wrong about the exponent stuff. But I've been thinking, could the calculations be more accurate if the signal level was multiplied for the time of processing and then divided back to normal level again after that? Does this possibly just introduce more rounding errors? I remember reading about something like this months ago but couldn't find any topics about it anymore.

By the way, theoretically, if an effect made in SM does all the processing / calculations in double precision, can you honestly say it has "64 bit internal processing"?
nmakinen
essemer

Posts: 17
Joined: Fri Oct 28, 2011 4:31 am

### Re: Getting the best of 32 bit

If you really want, you can combine 32 bit numbers together to get greater bit depths. But every operation will then take at least 2 extra calculations, so it should only be done in assembler for speed reasons.

Unfortunately it will significantly add to the brain pain!
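For anyone curious what "combining numbers" looks like, here's a sketch of the classic error-free "two-sum" building block used by extended-precision libraries (shown in Python for readability; in SM it would have to be assembler, as above). The sum comes back as a head value plus an error term, so nothing is thrown away:

```python
def two_sum(a, b):
    # Knuth's error-free transformation: s + e equals a + b exactly,
    # even when the rounded sum s has discarded low-order bits
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e

# The 1.0 vanishes entirely from the rounded sum, but survives in the error term
s, e = two_sum(1e16, 1.0)
print(s, e)   # 1e+16 1.0
```

Carrying that second term through every operation is exactly where the "2 extra calculations" (or more) per step come from.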
- the future sounds better.

Disco_Steve
essemilian

Posts: 396
Joined: Sun May 27, 2007 10:02 am
Location: UK

### Re: Getting the best of 32 bit

could the calculations be more accurate if the signal level was multiplied for the time of processing and then divided back to normal level again after that

It's called normalization. Usually signals are normalized by powers of two, so no additional error is introduced: the signal is processed at the maximum available dynamic range and then scaled back, discarding the least significant bits.
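A quick way to convince yourself that power-of-two scaling is lossless for floats (a Python sketch, forcing each result through 32-bit precision via struct):

```python
import struct

def f32(x):
    # Round a value to the nearest 32-bit float
    return struct.unpack('>f', struct.pack('>f', x))[0]

x = f32(0.3)
# Multiplying by a power of two only bumps the exponent,
# the mantissa is untouched, so it round-trips exactly
assert f32(f32(x * 8.0) / 8.0) == x
print("power-of-two scaling round-trips exactly")
```

A non-power-of-two factor (say 10) has to re-round the mantissa on the way up and again on the way down, which is why normalization sticks to powers of two.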

However, that was more of an issue with fixed-point / integer operation than with float. Floats were introduced precisely for increased dynamic range, which spans approx. 1.2E-38 to 3.4E+38, or roughly 1500 decibels. In audio you rarely need more than 60, and it will be compressed down to 20 later in the mix.

As a matter of fact, scaling can't help with floating-point numbers unless you are likely to reach the upper or lower limit. Only extending to double precision may have some effect. However, the main practical use will be feedback loops (delays, infinite impulse response filters), which can accumulate a large number of tiny fractions across a wide range.

Surprisingly, my latest test of a modern processor's performance shows that double works FASTER than float. Processors were optimized for double, and using floats can actually mean converting to and from double.

Warmonger
essemist

Posts: 172
Joined: Wed Jul 20, 2011 5:40 am
Location: Warsaw, Poland

### Re: Getting the best of 32 bit

nmakinen wrote:multiplied for the time of processing and then divided back to normal level again after that? Does this possibly just introduce more rounding errors?

Yup, you are exactly right.
In general, scaling a signal (as long as you don't go too mad!) just gives you a bigger or smaller exponent - but the mantissa will still have the same 24 bits of precision. So the "delta" (smallest possible change in value) will just scale right along with your number - a bit like zooming a bitmap picture too far: you don't see any extra detail, just bigger pixels.
But it does add an extra multiply and divide to your maths, so, as you say, another source of rounding errors has been introduced, making the output less precise than before.
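The "bigger pixels" effect can be measured directly - a little Python sketch (nothing SM-specific; the bit-twiddling just steps to the next representable float32):

```python
import struct

def f32_delta(x):
    # Distance from float32 value x to the next representable float32
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    next_up = struct.unpack('>f', struct.pack('>I', bits + 1))[0]
    return next_up - struct.unpack('>f', struct.pack('>f', x))[0]

print(f32_delta(1.0) == 2.0 ** -23)    # True: ~1.19e-7 at unity
print(f32_delta(256.0) == 2.0 ** -15)  # True: the delta scaled up with the exponent
```

Same 24 bits of precision either way - the step size simply follows the exponent.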

Having said that, you are not totally off the mark about increasing precision - there is maths software out there that can generate incredible precision by spreading values across more memory locations (e.g. the constantly rising world record for doing Pi to a zillion decimal places!).
In the same way, there are some DSP routines that use complex numbers (e.g. FFTs) - the CPU does not handle these natively, so software algorithms are used to add those functions. But even on modern processors these things are just too CPU intensive to use routinely for real-time DSP. For example, a normal 32-bit multiply takes only a single clock cycle of the CPU - but if you had to handle all the "carrying over" of bits, trapping of overflows, and the extra memory reads and writes etc. in software, the performance would come tumbling down.

nmakinen wrote:By the way, theoretically, if an effect made in SM does all the processing / calculations in double precision, can you honestly say it has "64 bit internal processing"?

He he, a bit like those cheap speakers that can handle "100W *" until you read the small print where it says "* for approximately 1 nanosecond before the voice coil melts."!
Feel free to use any schematics and algorithms I post on the forum in your own designs - a credit is appreciated (but not a requirement).
Don't stagnate, mutate to create. Without randomness and serendipity the earth would be just another barren rock.

trogluddite
smychopath

Posts: 3025
Joined: Mon Oct 20, 2008 3:52 pm
Location: Yorkshire, UK

### Re: Getting the best of 32 bit

Many thanks Disco_Steve, Warmonger, Trogluddite.

I'm testing Disco_Steve's dither / quantizer (it's been working great!), and the double precision version seems to quantize from 64 bits down to 24 bits or less (I believe the numbers are 32 bit floats converted to double precision, and at the end converted back to 32 bit, carrying only as much information as the quantizing bit depth allows).

If all the calculations are done in double precision and if this dither module is the last one before output (everything being double precision all the way until the end), shouldn't the outcome be the same as if the plugin was fully 64 bits, and had an identical dither / quantizer to 24 bits or less at the end? Or is this theory destroyed by the lack of 64 bit floats as inputs?

Don't you just love noobish questions?
nmakinen
essemer

Posts: 17
Joined: Fri Oct 28, 2011 4:31 am

### Re: Getting the best of 32 bit

hehe, thanks, that dither is far from perfect or optimally fast, and years old! but it's good to hear that someone is using it.

In SM you have to be really careful, but if you don't mind your output being limited to 24bit resolution then it should work.

I'm assuming you're making filters etc.? and want to get your THD down? Getting greater than 24 bit audio into SM in the first place is not exactly easy (although I haven't used the wave-in primitives in a while).

My recommendation would be to build a very basic test bench algo at single and double precision and see if you can tell the difference. If you can (or can measure it, which will be darn hard in SM, as everything is single precision!) then continue, else just leave it at 24 bit.

I'm working with the AD SHARC chips at the mo, and they have a 28 bit mantissa; if you work hard you don't need any more resolution (to my ears), but 24 can be a touch restricting sometimes, mainly for the filter rates, not for the actual sound!
- the future sounds better.

Disco_Steve
essemilian

Posts: 396
Joined: Sun May 27, 2007 10:02 am
Location: UK

### Re: Getting the best of 32 bit

Disco_Steve, I'm working on a loudness maximizer atm, and I think it's good to have a dither right before output. Is it ok if I use yours? You say it's far from perfect, but is it still decent / do what it's supposed to do? Would it be ok to just leave dithering to the users and not have a dither at all in a loudness maximizer? I'm pretty sure I can't create anything close to your dither anyway so...
nmakinen
essemer

Posts: 17
Joined: Fri Oct 28, 2011 4:31 am

### Re: Getting the best of 32 bit

yea feel free
- the future sounds better.

Disco_Steve
essemilian

Posts: 396
Joined: Sun May 27, 2007 10:02 am
Location: UK

### Re: Getting the best of 32 bit

Thank you!

Just a thought: could the denormal remover also be used to generate some kind of dither noise? I suppose it's not valid for dither?
nmakinen
essemer

Posts: 17
Joined: Fri Oct 28, 2011 4:31 am

### Re: Getting the best of 32 bit

You get quite a bit of freedom actually with dither noise.

Rectangular, triangular, Gaussian, shaped.

All work. I've used triangular in that version, which leaves a teeny weeny hint of distortion, but not worth worrying about.
What matters is that the noise is 'nothing to do with the signal', so if the noise generator isn't an excellent approximation to random then there will be the appearance of correlation and it won't work. Try it and see. It should sound better than without, but the distortions won't totally go. So then you have the toss-up: added noise, but you haven't removed all the distortions. Was it worth it?.....only you can decide.....

Dither isn't as simple as just adding noise, though. The amplitude of the noise has to be just right. Look at my osm to see how it works. (It's been a while!)
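For reference (this is just a generic textbook sketch in Python, not the module from the osm), triangular (TPDF) dither at +/-1 LSB peak is the sum of two independent uniform +/-0.5 LSB sources, added before the rounding step:

```python
import random

def quantize_tpdf(x, bits=16):
    # Quantize x (in the range -1..+1) to `bits` resolution with TPDF dither.
    # Summing two uniform +/-0.5 LSB sources gives the triangular distribution.
    q = 2.0 ** (bits - 1)
    noise = (random.random() - 0.5) + (random.random() - 0.5)
    return round(x * q + noise) / q
```

The amplitude really is the whole trick: less than +/-1 LSB of TPDF noise fails to fully decorrelate the quantization error, and more just wastes SNR.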
- the future sounds better.

Disco_Steve
essemilian

Posts: 396
Joined: Sun May 27, 2007 10:02 am
Location: UK

### Re: Getting the best of 32 bit

Thanks again for the info! This might be going a bit (or 32) off topic, so I'll quit bothering you with all these questions about dithering.

Correct me if I'm wrong, but I read that 32 bit float and 24 bit integer both have exactly the same SNR. First I was thinking the 32 bit float has to have 1 bit less of SNR because its mantissa is only 23 bits. Then I thought that the sign bit itself doubles the range, since it spreads the information across + and - (and doubling adds 1 bit, or 6.02dB, to the SNR). So 32 bit float after all has the same SNR as 24 bit int. Have I done my homework right? (Hopefully my terms are as understandable as they are probably incorrect.)
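A quick sanity check of my own arithmetic (plain Python, just the textbook rule of thumb for an ideal quantizer):

```python
def ideal_snr_db(bits):
    # Rule of thumb: 6.02 dB per bit, plus 1.76 dB for a full-scale sine
    return 6.02 * bits + 1.76

print(round(ideal_snr_db(24), 1))  # 146.2
print(round(ideal_snr_db(16), 1))  # 98.1
```

The extra 1.76 dB is the standard allowance for a full-scale sine relative to the quantization noise floor.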

But I also read that 32 bit float has a dynamic range identical to 256 bit integer! Doesn't this mean it has a lot more precision than 24 bit? Or is "precision" the wrong word here?
nmakinen
essemer

Posts: 17
Joined: Fri Oct 28, 2011 4:31 am

### Re: Getting the best of 32 bit

nmakinen wrote:Correct me if I'm wrong, but I read 32 bit float and 24 bit integer both have exactly the same SNR.

This is something which I have also seen quoted very often. Like many 'scientific facts', it is often used without any consideration of how the 'real world' differs from the 'idealised' theory.

It is approximately true if you have two signals that are fully modulating the numbers (i.e. -1 to +1 in SM streams).
But if a signal is scaled down, a float can have a reduced exponent and still retain 23 bits of precision in the mantissa - whereas an integer simply loses precision as fewer and fewer of the more significant bits are modulated.
Since SNR is based on ratios, this makes it clear that the SNR of a scaled down integer will increase (the +/-0.5LSB rounding errors are now larger relative to the desired signal). OTOH, the float SNR is barely affected by scaling, as the rounding errors scale along with the desired signal.
This is very important when it comes to summing many small signals (e.g. a mixing desk) - as the errors will accumulate at this stage, leaving the integer version with total 'error noise' that might occupy much more than just the LSB.
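The accumulation effect is easy to demonstrate - a Python sketch (not SM, just an illustration) that forces every partial sum through 32-bit precision; a fixed-point mixer at low signal levels would fare worse still:

```python
import struct

def f32(x):
    # Round a value to the nearest 32-bit float
    return struct.unpack('>f', struct.pack('>f', x))[0]

# Mix 10,000 tiny "channel" samples at float32 precision
small = f32(1e-4)
acc = 0.0
for _ in range(10_000):
    acc = f32(acc + small)      # every partial sum is rounded to 24 bits

exact = small * 10_000          # double-precision reference
# Accumulated rounding error (zero only if every partial sum were representable)
print(abs(acc - exact))
```

Even here the float error stays tiny relative to the total, because the rounding steps scale with the running sum, which is exactly the point about floats above.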

As scaling, splitting and summing signals is such a crucial part of DSP, this enhanced precision when using less than full-scale signals gives floats the advantage over integers.

Of course, it is way more complex than that in reality - but the general principle works well for any well designed float algorithm. Even with float signals, different algorithms can have a bias towards magnifying rounding errors, or cancelling them out. For example, a good algorithm will try to order the maths such that each operation takes two operands with closely matched exponents.

The moral of the tale is that SNR really only makes sense in the context of a complete system - once you start using DSP to actually do something with the data, the theoretical case of full-scale signals no longer applies. And we haven't even included the unavoidable noise from the analogue systems at the soundcard output, which is usually much greater than the theoretical +/- 0.5LSB quantising errors...

Matching the dynamic range of floats can certainly be done with integer (or fixed point) numbers, so long as enough bits are used - and yes, there would be greater precision; at least for signals that modulated more than just the lowest 23 bits. (Of course, double-precision floats would require a ridiculous number of bits for equivalent DR.)
But just adding more bits creates massive problems for hardware design, memory usage, and for shifting data around quickly - which is the very reason that floats took over from fixed-point maths pretty early on in PC development.
Feel free to use any schematics and algorithms I post on the forum in your own designs - a credit is appreciated (but not a requirement).
Don't stagnate, mutate to create. Without randomness and serendipity the earth would be just another barren rock.

trogluddite
smychopath

Posts: 3025
Joined: Mon Oct 20, 2008 3:52 pm
Location: Yorkshire, UK