[Ground-station] Ground-Station Project: Half-precision Floats

Phil Karn karn at ka9q.net
Mon Jul 30 10:39:24 PDT 2018


On 7/26/18 23:34, Brian Padalino wrote:
> Hi Phil,
> 
> I just joined the Ground-Station mailing list and I was going through
> the archives and I saw your query about half precision floats.  Sorry
> for mailing off list, but I am not really sure how to reply to the query
> on list.  Feel free to respond on-list and we can continue there if
> you'd like.  I should be subscribed now.

OK, I'm CC'ing this response to the list with your message included so
others can see it.

> 
> It looks like your main concern is network throughput for samples, and
> the next concern is fidelity of the signal, and trying to solve the
> scenario when external gain changes the signal level at the receiver and
> maintaining that continuity.

Right.

> 
> I haven't done any analysis on half-precision floats, or tried
> converting sampled signals through them for fidelity analysis, but
> working with FPGAs, I know that dealing with floating point numbers is
> a bit of a pain even in the latest and greatest devices which have hard
> floating-point units on them.

I did my first big DSP project circa 2001 (video codec for Qualcomm).
The Pentium 4 had just come out with its SSE2 instructions that included
good floating point support. Virtually all of my DSP work since then has
also been in floating point; it's well supported by general purpose
computing hardware (if not FPGAs) and I simply don't have to worry about
scaling. I've gotten used to that!

> 
> So what if we took the assumption that we always have a signal between
> -1 and 1 for both I and Q.  We could then define a compressed float as a
> structure consisting of a sign bit, 7 bits for the integer part of the
> negative dB of the signal, and 8 bits for the fractional part of the dB
> of the signal.  This is the same 16-bits as the half precision float and
> would be easier (I think) to implement in an FPGA on the SDR side, and
> the host side can still convert them over to single precision floats
> (probably using a LUT of some type to save on computation) without an issue.

Yes, I also use the -1.0 to +1.0 convention; it seems to be standard
when DSP is done with floating point. But the beauty is that you don't
have to worry about rescaling your signals at every intermediate point
along the way; you can do all your gain adjustments at the very end,
right before you send audio to a D/A and speaker. For digital modes, my
demodulators are either insensitive to level or provide their own AGCs.

Nothing says you *have* to process floating point in your FPGAs. They
probably get their data directly from A/Ds that already produce
integers. Once the FPGA has decimated the signal to a much lower sample
rate, it could then be converted to floating point and sent to general
purpose CPUs (x86-64, ARM) for further processing in that format.
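
As an illustration of that hand-off, here is a minimal sketch of the
integer-to-float step using the -1.0 to +1.0 convention. It assumes the
decimated samples arrive as 16-bit signed integers, which this thread
does not specify; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert n signed 16-bit samples to floats in [-1.0, +1.0).
 * Assumption: 16-bit input; other widths need a different scale. */
static void int16_to_float(const int16_t *in, float *out, size_t n)
{
    assert(in != NULL && out != NULL);
    const float scale = 1.0f / 32768.0f;   /* exact power of two */
    for (size_t i = 0; i < n; i++)
        out[i] = (float)in[i] * scale;     /* no rounding error for 16-bit input */
}
```

Because the scale factor is a power of two, the conversion is exact:
-32768 maps to -1.0 and +32767 to just under +1.0.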

> The other nice part about this in an SDR, and staying in the dB domain,
> would be that any types of gain changes would simply result in an extra
> addition or subtraction so scaling things becomes extremely easy when
> setting amplifier gains.

You're talking about a purely logarithmic signal, then. That has its own
problems: how would I do an FFT or an FIR filter, both of which require
addition?
Multiplication (even floating point) is now quite fast on modern CPUs,
and there are even fused multiply-adds on newer hardware. So I don't see
any problems working with conventional floating point formats that
consist of a (linear) mantissa plus an exponent that simply says where
to put the decimal (binary, actually) point.
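
To make the addition problem concrete, here is a minimal sketch of a
direct-form FIR filter on linear floats. Every output is a sum of
products, so samples stored as pure logarithms would have to be
converted back to linear form before each accumulate. The function name
and calling convention are hypothetical, not taken from any existing
code in this project.

```c
#include <assert.h>
#include <stddef.h>

/* Direct-form FIR: y[n] = sum over k of taps[k] * x[n + ntaps - 1 - k].
 * Produces outputs only where the filter fully overlaps the input,
 * i.e. nx - ntaps + 1 of them. Returns the number of outputs written. */
static size_t fir_filter(const float *taps, size_t ntaps,
                         const float *x, size_t nx, float *y)
{
    assert(taps != NULL && x != NULL && y != NULL && ntaps > 0);
    if (nx < ntaps)
        return 0;
    size_t nout = nx - ntaps + 1;
    for (size_t n = 0; n < nout; n++) {
        float acc = 0.0f;
        for (size_t k = 0; k < ntaps; k++)
            acc += taps[k] * x[n + ntaps - 1 - k];  /* multiply, then add */
        y[n] = acc;
    }
    return nout;
}
```

The inner loop is exactly the multiply-add pattern that compilers can
map onto fused multiply-add instructions where the hardware has them.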

I would only use half-precision floats as an interchange format for high
speed I/Q streams where network bandwidth is a factor. Also, these
streams will undergo further bandwidth reduction to that of the signal
being demodulated, so the required SNR of the representation is only
moderate -- somewhere between that of the A/D and the baseband signal
itself.

For example, when I receive narrowband signals with the HackRF I
typically sample at 12.288 MHz and decimate 64:1 in the same processor
that has the USB connection to the HackRF. This 192 kHz stream then goes
by Ethernet to another system that decimates a further 4:1 to 48 kHz as
it applies the matched filter and demodulates the signal. That's still
much wider than, say, SSB or even NBFM, but I stick with 48 kHz because
it's so common in digital audio. (If I want a lower data rate, I'll
compress with the Opus codec, which is built for a nominal 48 kHz sample
rate even when handling comm-quality voice.)
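
The rates above can be checked with a few lines of arithmetic. This is
a sketch only: the helper name is hypothetical, and it counts raw
payload bits for a complex (I/Q) stream, ignoring packet headers and
any other protocol overhead.

```c
#include <assert.h>
#include <stdint.h>

enum {
    ADC_RATE_HZ   = 12288000,           /* sample rate quoted for the HackRF */
    NET_RATE_HZ   = ADC_RATE_HZ / 64,   /* after the first 64:1 decimation   */
    AUDIO_RATE_HZ = NET_RATE_HZ / 4     /* after the further 4:1 decimation  */
};

static_assert(ADC_RATE_HZ % (64 * 4) == 0, "decimation ratios must divide evenly");

/* Payload bit rate of an I/Q stream: two components per complex sample. */
static uint64_t iq_bits_per_second(uint32_t sample_rate_hz,
                                   unsigned bits_per_component)
{
    return (uint64_t)sample_rate_hz * 2u * bits_per_component;
}
```

With these numbers, iq_bits_per_second(NET_RATE_HZ, 32) gives 12288000
bits/s for single-precision samples, and iq_bits_per_second(NET_RATE_HZ,
16) gives 6144000 bits/s for a 16-bit format, which is the factor of two
that a half-precision interchange format would save.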

> I put together a quick little C program to look at the feasibility of
> something like this.  It probably has bugs but it mostly seems to work. 
> Some initial observations:
> 
>   - Might only need 6 bits of integer part of the compressed float
>   - Close to 1, the granularity is pretty terrible, so maybe a
> Huffman-like encoded integer part could yield more bits for the
> fractional part for better fidelity

Huffman encoding seems too involved for what I want to do. I'd just like
a convenient interchange format that doesn't increase the overall data
rate too much, and which is easily converted to and from 32-bit single
precision floats for actual processing. (Note the half-precision float
support now being added to x86 processors consists only of instructions
to convert to and from 32-bit floats. It doesn't let you operate
directly on half-precision floats.)
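
For reference, here is a portable sketch of that conversion in plain C,
assuming the interchange format is IEEE 754 binary16 and that float is
a 32-bit IEEE 754 single. The function names are hypothetical. On x86
processors with the F16C extension, intrinsics such as _mm256_cvtps_ph
and _mm256_cvtph_ps do the same job in hardware.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Expand a binary16 bit pattern to a 32-bit float (always exact). */
static float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0x1Fu) {                       /* infinity or NaN */
        bits = sign | 0x7F800000u | (man << 13);
    } else if (exp != 0) {                    /* normal: rebias 15 -> 127 */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man == 0) {                    /* signed zero */
        bits = sign;
    } else {                                  /* subnormal: renormalize */
        uint32_t shift = 0;
        while ((man & 0x400u) == 0) { man <<= 1; shift++; }
        bits = sign | ((113u - shift) << 23) | ((man & 0x3FFu) << 13);
    }
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Round a 32-bit float to binary16, round-to-nearest-even. */
static uint16_t float_to_half(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    uint32_t sign = (bits >> 16) & 0x8000u;
    uint32_t mag  = bits & 0x7FFFFFFFu;

    if (mag >= 0x7F800000u)                   /* infinity or NaN */
        return (uint16_t)(sign | 0x7C00u | (mag > 0x7F800000u ? 0x200u : 0u));
    if (mag >= 0x477FF000u)                   /* >= 65520 rounds to infinity */
        return (uint16_t)(sign | 0x7C00u);
    if (mag < 0x33000001u)                    /* <= 2^-25 rounds to zero */
        return (uint16_t)sign;

    uint32_t exp   = mag >> 23;                        /* biased by 127 */
    uint32_t man   = (mag & 0x7FFFFFu) | 0x800000u;    /* implicit 1 restored */
    uint32_t shift = 13, base = 0;
    if (exp >= 113u)
        base = (exp - 113u) << 10;   /* implicit bit supplies the final +1 */
    else
        shift += 113u - exp;         /* result is a subnormal half */

    uint32_t h    = base + (man >> shift);
    uint32_t rem  = man & ((1u << shift) - 1u);
    uint32_t half = 1u << (shift - 1u);
    if (rem > half || (rem == half && (h & 1u)))
        h++;                         /* a carry may bump the exponent; that is correct */
    return (uint16_t)(sign | h);
}
```

For example, float_to_half(1.0f) yields 0x3C00 and float_to_half(0.1f)
yields 0x2E66. The format keeps 11 significant bits (10 stored plus the
implicit leading 1), and the largest finite value is 65504, so a stream
confined to -1.0 to +1.0 uses only part of the exponent range.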

>   - Some other compression might be able to take place and utilize
> 12-bit values, and achieve a bit less network overhead at the expense of
> dynamic range/fidelity
>   - The approach is similar to IEEE-754 using exponents and mantissas,
> but we don't need to express signals which go from 1e-14 to 1e15 - just
> ones that are 0 to 1, so we should be able to do better than half
> precision floats
> 
> Maybe this idea has already passed through your head, and maybe this
> yields worse performance than half-precision floats.  I'm not 100% sure
> myself, but it did seem to fit better in the fixed-point nature of the
> FPGA (log2 is easy, and shift+add for multiply by 3 is easy for a close
> approximation of dB).  Then again, a 64k entry LUT would be easy enough,
> too.
> 
> Looking forward to hearing your thoughts.
> 
> Brian

Thanks for your comments!

Phil



