Homebrew PCM Receiver: QPSK/RF Design

MarkF
Senior Heliman
Location: Palo Alto, CA

QPSK Architecture, Part 1.1?

G'Day!

[Warning! This post is way too long, fairly complex, and will probably be of little interest to most folks. I include it here only to help those few folks who might be interested in seeing some of the thought processes that you sometimes go through while creating a software-defined radio project. Note, too, that I haven't even tried these ideas out yet - you're seeing an iterative design process occur at the same time I'm learning; some ideas will work, and some won't! Please accept my apologies if this is just too far out!]

Well, I was hoping to write part II, but it turns out that we're not ready yet. In a nutshell, the problem is CPU performance. As time progresses, I'm discovering that this QPSK project is totally unlike the receiver I just completed at work. It's not worth explaining in detail, but believe it or not, we'll be using radically different algorithms for all eight of the processing steps listed previously! Of these, filtering is the most problematic. On the big QPSK receiver, filtering was performed in hardware, but if possible, I'd prefer to perform it in software here for cost effectiveness. While I was going to delay further discussion on this for a while, it's just too big a concern to ignore.

As hinted at previously, the front-end square root raised cosine filter is currently a "128-tap FIR filter", meaning that it takes 128 multiplications and 127 additions per sample. With 96,000 samples/second, that's 12,288,000 multiply and accumulates (MACs) per second. Unfortunately, ARM CPUs aren't very good at performing MACs - in this application, it'll take about 10 CPU clock cycles/MAC. That means that filtering alone would require a 123 MHz ARM CPU, but we're only using a 60 MHz CPU! Clearly, we need some drastic efficiency improvements somewhere.
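
To make the arithmetic above easy to re-run with other numbers, here's a small Python sketch. It assumes, as the post does, a flat 10 clock cycles per MAC on a 60 MHz CPU; the function name `fir_cpu_load` is just illustrative.

```python
def fir_cpu_load(taps, sample_rate_hz, clocks_per_mac, cpu_hz):
    """Return (MACs/second, CPU clock needed in Hz, fraction of the given CPU used)."""
    macs_per_second = taps * sample_rate_hz          # one MAC per tap, per sample
    required_hz = macs_per_second * clocks_per_mac   # CPU cycles needed every second
    return macs_per_second, required_hz, required_hz / cpu_hz


macs, needed_hz, load = fir_cpu_load(taps=128, sample_rate_hz=96_000,
                                     clocks_per_mac=10, cpu_hz=60_000_000)
print(f"{macs:,} MACs/s, {needed_hz / 1e6:.2f} MHz needed, {load:.0%} of the CPU")
# prints: 12,288,000 MACs/s, 122.88 MHz needed, 205% of the CPU
```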

One tempting way out would be to use a DSP chip. DSPs in the $8 price range can perform about 800M MACs/second (in particular, the Analog Devices Blackfin is a great candidate here). Moreover, with glueless interfaces to 96 KHz A/D and D/As, DMA, etc, we could just ignore the complexity of the problem and crunch right through it! This is worth keeping in mind, but before we change horses mid-race, let's first see if it's possible to figure out a way to work smarter, instead of harder.

As I mentioned before, the first attempt at solving this didn't work: changing the SRRC filter into polyphase form did reduce CPU requirements, but it didn't measure up from an RF performance perspective. There's another way we might approach this, and to understand it, we need to talk about "sample rate conversion", which is simply the process of changing from one sampling rate to another.

Start with the basics. For modulation or demodulation, we just need one sample/symbol, or 6,000 samples/second. For symbol timing recovery, and most of the other functions in the receiver, we'll need two samples/symbol, or 12,000 samples/second. At the digital/analog interface, though, we've got that pesky 96,000 samples/second. It'd be nice if we could figure out a way to do as little work as possible at the 96,000 samples/second rate, and do everything else except modulation and demodulation at 12,000 samples/second (mod and demod will always be done at 6,000 samples/second).

Enter "interpolation" and "decimation". Interpolation is the process of converting from a lower sample rate to a higher sample rate, and decimation is the process of moving from a higher sample rate to a lower sample rate. Now, we could just duplicate samples to upconvert, or drop samples to downconvert, but, quite counter-intuitively, that adds "spectral images" and "aliasing" components that muck up the output signal and deliver terrible RF performance. Instead, the right thing to do is to use a filtering operation in either direction; the appropriate filters are, unsurprisingly, called an "interpolation filter" and a "decimation filter". While there are dozens of different approaches to creating these filters, they are essentially all variations on a basic low-pass filter.
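
A quick way to see the aliasing problem is to drop samples from a tone that the receiver wants to get rid of. This Python sketch (the 10 KHz test tone is an arbitrary choice for illustration) samples a 10 KHz cosine at 96 KHz, keeps every 8th sample to get 12,000 samples/second, and compares the result with a genuine 2 KHz tone sampled at 12 KHz:

```python
import math

FS_IN = 96_000            # A/D rate
DECIM = 8                 # keep every 8th sample -> 12,000 samples/second
FS_OUT = FS_IN // DECIM


def tone(freq_hz, n, fs):
    """Sample n of a cosine at freq_hz, sampled at fs."""
    return math.cos(2 * math.pi * freq_hz * n / fs)


# A 10 kHz interferer sampled at 96 kHz, naively decimated by dropping 7 of every 8 samples.
naive = [tone(10_000, n, FS_IN) for n in range(0, DECIM * 64, DECIM)]
# A genuine 2 kHz tone sampled directly at 12 kHz.
alias = [tone(2_000, n, FS_OUT) for n in range(64)]

# The largest difference is only floating-point rounding noise:
# the 10 kHz tone has folded down to 12 - 10 = 2 kHz.
print(max(abs(a - b) for a, b in zip(naive, alias)))
```

After naive decimation, the 10 KHz tone is numerically identical to a 2 KHz tone sitting right in the band the receiver cares about, so no later filter can separate the two. That's why the low-pass filtering has to happen before the samples are dropped.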

The idea is this. Instead of trying to create a 96 KHz polyphase SRRC filter in one step, what about splitting it into two steps? In step one, we'll square-root raised cosine filter the transmit data at two samples/symbol, or 12,000 samples/second. Next, we could use an interpolation filter to convert those 12,000 samples/second to the output rate of 96,000 samples/second.

Wait, though, how could using two filters make life easier than using one filter? Well, if the 2 sample/symbol SRRC filter required, say, 32 MACs, and the interpolation filter required perhaps 10 MACs, then we'd need a total of just 32/(96/12) + 10 = 14 MACs/output sample component (for each of I & Q). Counting both I & Q, that's 28 MACs versus the 128 MACs/output sample required for the brute-force approach, or about a 4.5X efficiency improvement! Will it work? I don't know yet, but I'll let you know once I've tried it!
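
The estimate above is simple enough to capture in a couple of lines. In this Python sketch, `two_stage_macs` is a made-up name, and the 32-MAC and 10-MAC filter costs are the same guesses as in the text:

```python
def two_stage_macs(srrc_taps, srrc_rate, interp_macs, output_rate):
    """MACs per output sample (for one of I or Q) when the SRRC filter runs at
    srrc_rate and an interpolator costing interp_macs per output sample brings
    the signal up to output_rate."""
    outputs_per_srrc_sample = output_rate / srrc_rate   # outputs sharing one SRRC result
    return srrc_taps / outputs_per_srrc_sample + interp_macs


print(two_stage_macs(32, 12_000, 10, 96_000))  # 14.0, the 96 KHz approach
print(two_stage_macs(32, 12_000, 10, 48_000))  # 18.0, the 48 KHz variant
```

The SRRC cost gets divided down because one SRRC output feeds several interpolated outputs, while the interpolator's cost is paid on every output sample.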

Another interesting possible simplification comes to mind. I'd mentioned before that the easiest way to combine I & Q into a single carrier was at 4X the I.F. frequency (thanks to that 90 degree delay in Q), but maybe we can try another approach here, too. This one's a bit funkier, but it might be worth trying as well. Basic [Nyquist] theory says that to transmit a 24 KHz carrier in a bandwidth-limited channel, we only need 48K samples/second, not 96K samples/second. Since we'll already be using an interpolator to generate the output samples (assuming, that is, that the previous step works!), we could in fact add a 90 degree offset to Q almost "for free" by simply telling the interpolator to add an additional 1/2 sample to its delay target for the Q samples.

What's that again? Some kinds of interpolators work by synthesizing a new output sample value in-between two input samples; and the amount of time delay is expressed as a fraction between 0.0 (which would output the first sample), and 1.0 (which would output the second sample). With this kind of interpolator, we could just add 0.5 to the Q delay to easily add a 90 degree offset to Q!
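
The post doesn't say which interpolator it would use, so as an assumption this Python sketch uses a 4-point (cubic) Lagrange interpolator, one of the variable-delay types mentioned later in this thread. The fraction `mu` behaves exactly as described: 0.0 returns the first of the two middle samples and 1.0 returns the second. The names `lagrange4` and `delay_stream` are hypothetical.

```python
def lagrange4(x0, x1, x2, x3, mu):
    """4-point (cubic) Lagrange interpolation between x1 and x2.

    The inputs sit at times -1, 0, 1, 2 (in input samples); mu is the fractional
    delay: mu = 0.0 returns x1 and mu = 1.0 returns x2.
    """
    c0 = -mu * (mu - 1) * (mu - 2) / 6
    c1 = (mu + 1) * (mu - 1) * (mu - 2) / 2
    c2 = -(mu + 1) * mu * (mu - 2) / 2
    c3 = (mu + 1) * mu * (mu - 1) / 6
    return c0 * x0 + c1 * x1 + c2 * x2 + c3 * x3


def delay_stream(samples, mu):
    """Apply the same fractional delay mu (0 <= mu <= 1) along a whole sequence."""
    return [lagrange4(*samples[k:k + 4], mu) for k in range(len(samples) - 3)]


# Sanity check: a cubic interpolator reproduces t**3 exactly.
print(lagrange4(-1.0, 0.0, 1.0, 8.0, 0.5))   # 0.125, i.e. 0.5**3
```

For the Q-delay idea, the Q stream would simply be run through `delay_stream` with a larger `mu` than the I stream.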

[Almost. The 90 degree Q delay refers to a 90 degree delay at the 24 KHz carrier/I.F. frequency, not at the 2 sample/symbol input frequency. At 24 KHz, 90 degrees is 1/96,000 of a second, which is half of a 48 KHz output sample. So, assuming that we're interpolating from 2 samples/symbol (12 KHz) to 8 samples/symbol (48 KHz), each input sample spans 8/2 = 4 output samples, and we'd actually tell the interpolator to delay Q by an additional 0.5/(8/2) = 0.125, or 1/8th of an input sample.]

If this second approach does indeed work, then, assuming the same filter lengths as before, we'd need 32 / (48/12) + 10 MACs = 18 MACs/output sample, but that's at 48 KHz instead of 96 KHz, so we'd have the equivalent of just 9 MACs/96 KHz output sample component, for each of I & Q.

[Yet another minor glitch to consider. When we combine the I & Q waveforms onto the I.F. carrier, we do so by first multiplying I by cos(), and then Q by sin(), and then adding them together. At 4X the I.F., cos() is conveniently 1, 0, -1, 0; while sin() is 0, 1, 0, -1. So at 4X I.F., the multiplies disappear, and we just alternately output I & Q, negating I & Q every other sample pair. By moving to a 2X I.F., it'll become necessary to multiply the Q samples by the sin() of a 90/(96/48) = 45 degree offset first. Thus, the total complexity for this approach becomes the equivalent of 9.5 MACs/96 KHz output component, for each of I & Q.]

If I cross my fingers and all of this actually works, we'd have a reduction from 200% of the CPU for filtering, down to just 30% (48000 * (18+19) * 10 / 60000000) of the CPU! As I'm writing this, it sounds too good to be true, but the only way to know if either of these approaches will work is to go and code them up. Hopefully, I'll get the chance to check this out within a couple of days. Once I do, I'll pass on the results; good, bad, or ugly!

Cheers!
MarkF

CORRECTION: I'd initially neglected to include I & Q in the CPU overhead computation, and had guessed it would be 15% instead of 30%. This is now fixed above.
06-29-2004

MarkF
Senior Heliman
Location: Palo Alto, CA

Hi, Gang!

Fascinating! After some experimenting, I was able to come up with a pair of filters that seem to do the trick, at least for the first solution above. The square-root raised cosine filter uses 48 taps with an alpha of 0.35, and then I applied a Hanning window, and quantized the filter coefficients to 16 bit precision. After quantization, there are 40 non-zero filter coefficients left. For the interpolation filter, I'm still playing around with different filter types, but a 32-tap Parks-McClellan low-pass filter with a Hanning window (passband 3900 Hz, cutoff 6500 Hz) has 30 non-zero taps after quantization.

[Terminology break. "Windowing" a filter means multiplying the filter coefficients by a window function, and in this case, the Hanning window is a raised cosine: picture one full cycle of a cosine, flipped and shifted up so that it runs from 0 at the ends to 1 in the middle (low at the ends, high in the middle). By multiplying the filter coefficients by this pulse shape, the band of frequencies that the filter passes is widened slightly, but the frequencies that the filter rejects are rejected even more strongly/deeply. When circumstances permit their use, windows help shorter filters perform like longer ones. Quantization just means scaling the filter coefficients to fit the available number of bits and then rounding each one to the nearest integer; in the process, some coefficients become 0, letting you skip the multiply for that particular filter tap/coefficient.]
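
Here's a Python sketch of the three design steps just described: a 48-tap SRRC filter at 2 samples/symbol with alpha = 0.35, a Hann window, and quantization to 16-bit integers. It uses the textbook closed-form SRRC impulse response. Two assumptions to flag loudly: the textbook Hann window may differ slightly from the "Hanning" window in a tool like ScopeFIR (some versions are not zero at the ends), and scaling the largest tap to 32767 is my own choice. Because of that, the non-zero tap count this prints may not match the 40 quoted above. Python's `round` uses round-half-to-even.

```python
import math


def srrc_taps(num_taps, alpha, sps):
    """Square-root raised cosine impulse response, num_taps long, sps samples/symbol."""
    taps = []
    center = (num_taps - 1) / 2
    for n in range(num_taps):
        t = (n - center) / sps                      # time in symbol periods
        if abs(t) < 1e-12:                          # t = 0 limit
            h = 1 - alpha + 4 * alpha / math.pi
        elif abs(abs(t) - 1 / (4 * alpha)) < 1e-12:  # t = +/-1/(4*alpha) limit
            h = (alpha / math.sqrt(2)) * (
                (1 + 2 / math.pi) * math.sin(math.pi / (4 * alpha))
                + (1 - 2 / math.pi) * math.cos(math.pi / (4 * alpha)))
        else:
            h = (math.sin(math.pi * t * (1 - alpha))
                 + 4 * alpha * t * math.cos(math.pi * t * (1 + alpha))) / (
                    math.pi * t * (1 - (4 * alpha * t) ** 2))
        taps.append(h)
    return taps


def hann(num_taps):
    """Textbook Hann window: 0 at both ends, 1 in the middle."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1)) for n in range(num_taps)]


def quantize16(taps):
    """Scale so the largest tap becomes 32767, then round each tap to an integer."""
    scale = 32767 / max(abs(h) for h in taps)
    return [round(h * scale) for h in taps]


windowed = [h * w for h, w in zip(srrc_taps(48, 0.35, 2), hann(48))]
q15 = quantize16(windowed)
nonzero = sum(1 for c in q15 if c != 0)
print(nonzero, "non-zero taps out of", len(q15))
```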

What this means from a performance perspective is that the QPSK transmitter will need to execute (40 [MACs/SRRC filter] + 30 [MACs/Interpolation filter]) * 2 [I & Q] * 2 [samples/symbol] * 6000 [symbols/second] = 1,680,000 MACs/second for filtering. At about 10 clocks/MAC, that means that the front-end filtering operation will require 1680000 * 10 / 60000000, or about 28% of the CPU. The bottom line is that it worked!

As far as the second idea goes (delaying Q by 90 degrees in the interpolator), I haven't yet found a solution. I gave it a shot, but the types of interpolators that let you specify a precise variable delay (like Lagrange interpolators and "sinc" interpolators) don't make very good low-pass filters, and the output spectrum stinks. So, we'll have to keep this idea on the back burner for now, though I may resurrect it later.

Still, by taking advantage of the fact that we can perform different computations at different sample rates, thanks to interpolation, we've managed to reduce the compute requirement for the receiver's front-end to less than 1/7th of that required for the original "obvious" solution. That makes for a pretty good day!

Have Fun!
MarkF
06-30-2004

Jeff H
Key Veteran
Location: Cincinnati, OH

This is a little over my head.


06-30-2004

MarkF
Senior Heliman
Location: Palo Alto, CA

Hi, Jeff!

I honestly am sorry for all the techno-babble; as I'm learning myself, the process of creating software-defined radio requires getting into a pretty wide range of stuff. As I've been posting this thread, I'm simultaneously feeling that I'm not going into enough detail about the technology, and that I'm going into too much detail.

Since your question was apparently prompted by the last couple of posts about filters, I'll go into a bit more detail about what digital filters are for, and how they work.

So far, we've covered that we'll be sending 12,000 bits/second between the transmitter and the receiver. To do so, the interface between the digital world and the analog world will run at 96,000 (probably 16-bit) samples/second in both the transmitter and receiver. The Nyquist sampling theorem says that a signal sampled at 96 KHz can only represent frequencies from 0-48 KHz. In the receiver, this means that a conventional analog hardware filter must be used to eliminate all frequencies greater than 48 KHz before the signal is digitized. In fact, all A/D and D/A conversions require this sort of filtering.

Once a signal is in the digital domain, the same rules apply. In the case of the QPSK receiver, there will be frequencies between 0 and 48 KHz present, but if you look at the baseband QPSK modulation spectrum in the sixth post of the first page of this thread, you'll see that the only frequencies that are of interest are between 0 and about 4 KHz. What you don't see in that chart is what would happen in the real world - the entire spectrum would contain additional noise and interference - that's life with R.F. If the receiver tried to do its thing with this "raw" digitized signal, everything between 4 KHz and 48 KHz would interfere with the signal that it was trying to receive, and the result would be terrible range, if it worked at all.

To make things work, the receiver has to remove all the "irrelevant" frequencies between about 4 KHz and 48 KHz, and this is done with a digital filter. While there's a lot of scary terminology around digital filters, they're actually pretty simple. By far the most common type of digital filter is called a Finite Impulse Response filter, or FIR filter. FIR filters are nothing more than a sequence of numbers - the length of the sequence is called the number of "taps" in the filter, and each of the numbers is called a "filter coefficient" (so, a 128-tap FIR filter would have 128 filter coefficients). Let's say that we need a 128-tap FIR filter to perform a particular filtering job - in that case, the FIR filter would be nothing more than a sequence of 128 numbers. To apply the filter, these 128 coefficients would be multiplied by the last 128 input samples (one coefficient per input sample), and the results of these 128 multiplications are summed together. The sum of the coefficients times the input samples is the output of the filter. This process is then repeated every time a new input sample comes in from the A/D converter.
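
The paragraph above translates almost line for line into code. This Python sketch is illustrative only: the class name `FirFilter` is hypothetical, and the 4-tap averaging filter is a stand-in low-pass filter, not one of this project's filters. A real ARM implementation would use fixed-point integers and a circular buffer.

```python
from collections import deque


class FirFilter:
    """Direct-form FIR filter: each output is the sum of coefficient * recent input."""

    def __init__(self, coefficients):
        self.coefficients = list(coefficients)
        # Most recent input first; starts out filled with zeros.
        self.history = deque([0.0] * len(self.coefficients), maxlen=len(self.coefficients))

    def process(self, sample):
        """Accept one new input sample and return one output sample."""
        self.history.appendleft(sample)     # newest sample pushes the oldest one out
        return sum(c * x for c, x in zip(self.coefficients, self.history))


# A tiny 4-tap moving-average (low-pass) filter, just to show the mechanics.
fir = FirFilter([0.25, 0.25, 0.25, 0.25])
outputs = [fir.process(x) for x in [1.0, 1.0, 1.0, 1.0, -1.0, 1.0]]
print(outputs)   # [0.25, 0.5, 0.75, 1.0, 0.5, 0.5]
```

Note how the single -1.0 input is smoothed away: averaging over four samples is a crude low-pass filter, and better coefficients just do the same job more selectively.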

If the right filter coefficients were selected for the filter, the output of the filter will be the same as the input of the filter, except that all the frequencies that we aren't interested in will have been removed by the filter, leaving only the "interesting" signal information. In this case, if you were to plot the received signal, it'd only contain information between 0 and about 4 KHz, and the receiver would be ready to do its job.

There are two important things to consider, though. The first is picking the right filter coefficients, and for most digital communications, the "optimum" filter design is known as a square-root raised cosine (SRRC) filter. This is just another basic FIR filter, and it's only the particular filter coefficients that are selected that make it an SRRC filter. Selecting the right filter coefficients can be a bear to do by hand, but is quite simple using filter design programs like Iowegian's excellent ScopeFIR filter design program, which is what I use for filter designs.

The second problem is how much effort it takes the CPU to do all those multiplies. This QPSK receiver started with the filter I just described: a 128-tap SRRC-type FIR filter. However, performing those 128 multiplies and 127 additions 96,000 times/second would take twice as much CPU horsepower as the CPU chip has, so it was necessary to make things simpler.

To do that, the trick was to split up the filtering job into two separate tasks. While the earlier discussion about splitting this job into two concentrated on the transmitter, the same simplification effort also applies to the receiver, so let's recast that discussion in terms of the receiver.

[To be continued in the next post...]
06-30-2004

MarkF
Senior Heliman
Location: Palo Alto, CA

[...continued from prior post...]

The first thing that happens in the receiver is that the signal will be digitized at 96 KHz. With the initial design, the receiver would then apply a 128-tap SRRC filter to each input sample. Just like the transmitter, though, that would more than overwhelm the CPU. To reduce the amount of work that'll be required to filter the signal, the task is once again split into two phases. In the receiver, though, the order of these two operations is reversed, as compared to the transmitter. Instead of executing the SRRC filter, and then a sample rate conversion, it'll instead perform a sample rate conversion, and then execute the SRRC filter.

To perform sample rate conversion in the receiver, the incoming signal needs to be converted from 96,000 samples/second down to 12,000 samples/second. Unfortunately, the same frequency problems mentioned earlier apply. The input signal contains frequency information between 0 and 48 KHz, yet a 12,000 sample/second signal can only properly hold information between 0 and 6 KHz, so the sample rate conversion will need some type of filtering, too.

In the transmitter, where we were increasing the sample rate, an interpolation filter was used to filter the signal (an interpolation filter is just another type of FIR filter; in fact, it's a pretty normal low-pass filter). In the receiver, the analogous filter that's used for decreasing the sample rate is called a decimation filter (yet another basic FIR low-pass filter). If you think about it, the characteristics of the decimation filter are clear: it'll need to remove all information above 6 KHz, so that the 12,000 output samples/second are clean. That's exactly the kind of filter we use, and its name, a Parks-McClellan low-pass filter, just refers to one way of coming up with a set of low-pass filter coefficients.

What we do is to apply the decimation filter to the 96,000 samples/second input signal, and what comes out is 12,000 samples/second, with the information between 6 KHz and 48 KHz removed from it. [This is done via an efficient polyphase filtering operation that we'll discuss in a bit. For the moment, just think of this as a simple 32-tap filtering operation].
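
The key saving in the decimation step is that outputs which would be thrown away never need to be computed. This Python sketch (the name `decimate` and the plain loop are illustrative, not the project's code) computes only every 8th filter output; a true polyphase decimator reorganizes the same arithmetic into sub-filters, but the MAC count per output is the same.

```python
def decimate(samples, coefficients, factor):
    """Low-pass filter `samples` with an FIR filter and keep every factor-th output.

    Only the outputs that survive are computed: output k is the filter output at
    input index k * factor, i.e. sum(coefficients[j] * samples[k*factor - j]),
    with samples before index 0 treated as zero.
    """
    out = []
    for n in range(0, len(samples), factor):
        acc = 0.0
        for j, c in enumerate(coefficients):
            if n - j >= 0:
                acc += c * samples[n - j]
        out.append(acc)
    return out
```

With a 32-tap filter and `factor=8`, each 12,000 samples/second output costs 32 MACs, and the seven outputs in between are never computed.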

After the signal has been converted to 12,000 samples/second, we then apply a 48-tap SRRC filter to the result. This filter applies the optimum square-root raised cosine shape to the signal, and also removes the information from about 4 KHz to 6 KHz. This 48-tap SRRC filter is far more efficient than the original 128-tap filter, since there are only 48 multiplies and 47 additions per sample, instead of the 128+127 required before. In addition, it only needs to be run 12,000 times 2 (for I & Q) = 24,000 times/second, rather than 96,000 times/second. In other words, the CPU has a lot less work to do.

I mentioned that the sample rate conversion filters were executed in polyphase form. Without going into a lot of unnecessary detail, the essential idea of a polyphase filter is to split up a filter into segments and to compute each segment separately, with each segment yielding its own output. While polyphase filters can't be used everywhere (as I found, they don't work well for SRRC filters), they are particularly useful during sample rate conversion.

To give a quick example of how a polyphase filter works, the 32-tap interpolation filter that was used in the transmitter was actually converting between 12,000 samples/second and 48,000 samples/second [bear with me, we'll explain why it wasn't 96,000 samples/second below]. As a result, there are four output samples for every input sample. A plain interpolator would insert three zero-valued samples after each input sample and then run the full 32-tap filter for every output, but the multiplies by those zeros contribute nothing. With the polyphase approach, the 32 filter coefficients are instead split into four separate eight-tap filters (taps 0, 4, 8, ...; taps 1, 5, 9, ...; and so on), and each of these mini-filters is run over the same recent input samples to produce one of the four outputs. In place of getting just one output per 32 multiplies, the polyphase approach yields four outputs for the same 32 multiplies - it's magic! For a very easy-to-read explanation of exactly how to implement a polyphase filter, check out DSP Guru's Interpolation FAQ.
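
Here's a Python sketch of that 4X polyphase interpolator. The function name is hypothetical and the code favors clarity over speed; what it demonstrates is that splitting 32 taps into four 8-tap sub-filters gives exactly the same output as zero-stuffing followed by the full 32-tap filter, while skipping all the multiplies by zero. (A real interpolator usually also scales by the interpolation factor to keep the signal level; that's left out here.)

```python
def polyphase_interpolate(samples, coefficients, factor):
    """Interpolate by `factor` using a polyphase split of an FIR low-pass filter.

    Same result as inserting factor-1 zeros after each input sample and running
    the full filter, but the zeros are never multiplied: sub-filter p uses taps
    coefficients[p], coefficients[p + factor], coefficients[p + 2*factor], ...
    """
    phases = [coefficients[p::factor] for p in range(factor)]   # 32 taps -> 4 x 8
    out = []
    for n in range(len(samples)):
        for sub in phases:                  # one output per phase for every input sample
            acc = 0.0
            for k, c in enumerate(sub):
                if n - k >= 0:
                    acc += c * samples[n - k]
            out.append(acc)
    return out
```

For each input sample the four sub-filters cost 4 x 8 = 32 multiplies in total and produce four outputs, which is exactly the saving described above.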

Now, why was the filter changing between 12,000 and 48,000 samples/second, instead of 12,000 and 96,000 samples/second? The reason has to do with how the process of converting between a baseband QPSK signal and a carrier-based QPSK signal works - in particular, when it's being done at 4X the Intermediate Frequency. At 96 KHz, the code to do this conversion is really simple: it sequentially outputs an I sample, then a Q sample, then -1 times the next I sample, then -1 times the next Q sample, and loops. The receiver does the same thing: it digitizes at 96 KHz, but the first sample it sees is declared to be the first I sample, the second input is the first Q sample, the third is the second I sample times -1, and the fourth is the second Q sample times -1. So, the signal filters actually only need to run at 48 KHz, despite the fact that we're running at a 96 KHz sample rate. I also hope that you'll realize that the analog filters used for A/D and D/A filtering actually need to filter out frequencies greater than 24 KHz, rather than the 48 KHz I mentioned earlier (I hope you'll forgive me for that one, but it was just a simplification until we could get around to explaining the carrier process).
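
The sign pattern above is easy to express in code. This Python sketch (function names are hypothetical) builds the 96 KHz transmit stream from 48 KHz I and Q samples, and splits a received stream back into I and Q. It is equivalent to computing I*cos + Q*sin with cos = 1, 0, -1, 0 and sin = 0, 1, 0, -1, which is why no multiplies are needed.

```python
def upconvert_fs4(i_samples, q_samples):
    """Transmit side at 4X the I.F.: output I, Q, -I, -Q, I, Q, ...

    Each 48 kHz I/Q pair becomes two consecutive 96 kHz output samples,
    and every other pair is negated.
    """
    out = []
    for k, (i, q) in enumerate(zip(i_samples, q_samples)):
        sign = 1 if k % 2 == 0 else -1      # negate every other I/Q pair
        out.extend([sign * i, sign * q])
    return out


def downconvert_fs4(samples):
    """Receive side: undo the sign pattern and split the stream back into I and Q."""
    i_out, q_out = [], []
    for k in range(0, len(samples) - 1, 2):
        sign = 1 if (k // 2) % 2 == 0 else -1
        i_out.append(sign * samples[k])
        q_out.append(sign * samples[k + 1])
    return i_out, q_out


tx = upconvert_fs4([1, 2, 3], [4, 5, 6])
print(tx)                    # [1, 4, -2, -5, 3, 6]
print(downconvert_fs4(tx))   # ([1, 2, 3], [4, 5, 6])
```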

The final two terms I think I'd used are windowing and quantization. To be brief, Hanning windowing is just the process of multiplying a set of filter coefficients by a cosine pulse shape, once, during the filter design process. This changes the way the filter works, by modifying its frequency response in a way that can help a shorter filter provide better rejection characteristics (again, windowing isn't always useful, depending on the frequency response needed for an application).

As far as quantization goes, that's just fitting the set of filter coefficients into a specific number of bits - 16 bits, here. When this happens, some of the filter coefficients that were close to 0 actually become 0. When a filter has some coefficients equal to 0, you can write the code to skip both the multiply and the add for those taps, since they would contribute nothing to the output. That's why the 48-tap SRRC filter was described as having 40 non-zero taps, and the 32-tap interpolation filter was described as having 30 non-zero taps.
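
One simple way to skip the zero taps is to build a list of only the non-zero (tap index, coefficient) pairs once, at design time. In this Python sketch the function names and the 8-tap coefficient set are made up for illustration. On a fixed-point CPU the sum would typically also be shifted right to undo the 16-bit coefficient scaling; the sketch leaves that out.

```python
def make_sparse(coefficients):
    """Keep only (tap index, coefficient) pairs that survived quantization as non-zero."""
    return [(j, c) for j, c in enumerate(coefficients) if c != 0]


def sparse_fir_output(sparse_taps, history):
    """One FIR output. history[0] is the newest input sample and history[j] is
    j samples old; zero taps were dropped up front, so they cost no MACs here."""
    return sum(c * history[j] for j, c in sparse_taps)


quantized = [0, 3, 0, 0, -7, 12, 0, 5]     # made-up 16-bit coefficients
taps = make_sparse(quantized)              # 4 MACs per output instead of 8
history = [100, -20, 35, 7, 1, -2, 50, 9]
print(len(taps), sparse_fir_output(taps, history))   # prints: 4 -46
```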

Ultimately, was this all worth it? With the original 48,000 * 2 * 128 multiply/accumulates needed per second, we needed to perform 12,288,000 MACs/second. With the new approach, we actually perform (30+40) * 2 * 12,000 = 1,680,000 MACs/second, less than 1/7th the effort. All in all, a very significant improvement!

I hope this helps explain in a bit more detail - if not, please let me know what areas you'd like to see clarified!

Cheers!
MarkF
06-30-2004

hughesdt
Heliman
Location: Broomfield, CO

Great Stuff

I haven’t thought about this stuff since college. Very interesting topic; I hope that you keep it going to a conclusion.


Great job with all the descriptions and reasoning!
Dan
06-30-2004

MarkF
Senior Heliman
Location: Palo Alto, CA

Hi, Dan!

Thank you very much - I truly appreciate your feedback! I know this thread is pretty far afield of the usual topics on RunRyder, but I also know another thing. When I first started to try to understand software radio so that I could begin this project, I looked everywhere trying to figure out how all the pieces fit together. Try as I might, I couldn't find anything that tried to explain the big picture. Sure, I found thousands of papers that talked about one element or another, but few that tried to explain to me how to use them together.

This thread is my attempt to do just that, for other folks who might be curious about the same thing. To be clear, this is an extremely broad and deep topic, and I can't begin to explain it all - if for no other reason than that I certainly don't know or understand it all! Instead, my goal is just to introduce enough background information so that when folks who want to learn more decide to go a-Googling, they'll hopefully have the context to do so more efficiently than I was able to. Amazingly enough, at this moment in time, if you type "QPSK receiver" into Google, this thread already pops up as the 13th hit out of 37,100, and that makes me truly happy. If we're really fortunate, perhaps it'll even bring a few more folks to RunRyder!

Thanks again, Dan! I know that folks are reading this thread, but it means a lot to know that someone's actually finding it worthwhile!

Best Wishes!
MarkF
07-01-2004

MarkF
Senior Heliman
Location: Palo Alto, CA

Howdy Folks!

While reviewing the QPSK Architecture, Part I post, I realized that I'd mentioned the QPSK "constellation diagram", but hadn't shown what it looks like here (it was shown in the original Homebrew Receiver thread, but that's been quite a while).

Perhaps the best way to understand what QPSK really "looks like" is to plot a QPSK waveform with the I signal on the horizontal axis, and the Q signal on the vertical axis (this is called "X-Y" mode on oscilloscopes). With the same audio QPSK transmitter that we've been discussing all along, here's what its (baseband) constellation diagram looks like:

[Image: baseband QPSK constellation diagram, with I on the horizontal axis and Q on the vertical axis]

As you'll notice, there are four points where the signal "bunches up" and the different signal paths merge: those four points are the four possible symbol values that can be sent in QPSK. All the lines and arcs that connect the four points show what happens to the signal at times while it's transitioning from one symbol value to another. So, the "symbol timing recovery" process we mentioned earlier is all about trying to find the point in time where the signal is instantaneously sitting at one of those four constellation points, instead of flying around towards the next one.

If you think of the constellation diagram as a "compass" (so-called polar notation), then the signal phases of these four points are located at 45 degrees, 135 degrees, 225 degrees, and 315 degrees. At baseband, these signals are always nicely aligned just like you see here. However, if you modulate the signal onto a carrier, the inevitable frequency difference between the transmitter and receiver causes an interesting problem. Instead of the constellation diagram shown above, from the receiver's point of view, the constellation will be rotating around the origin. At any specific instant, the four points will still be spaced 90 degrees apart, but the whole set could be rotated by any angle. For example, at some point, the receiver will see symbols 0, 1, 2, and 3 at phase offsets 267, 357, 87, and 177!

This introduces two more challenges. The first is that the receiver will need to rotate the constellation diagram (i.e. rotate the incoming signals around the origin), so that the symbols are centered in each quadrant of the circle (i.e. to 45, 135, 225 and 315 degrees). This is done to make it simpler to decode a signal in the presence of noise. Noise introduced during transmission will move individual symbols away from those perfect four phase offsets, distributing them randomly. To demodulate a signal, the receiver has to figure out where the nearest "ideal" phase point is. Theoretically, this is equivalent to performing a "Euclidean" distance search between an incoming data point and all four ideal constellation points, then selecting the shortest distance. You can do this if you'd like to, but there's a much simpler way to accomplish the same thing. If the constellation is rotated as above, with the 90 degree phase points offset by 45 degrees, then the nearest ideal phase point is found just by taking the signs of the incoming I & Q sample values!
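
Here's a Python sketch of derotation followed by the two equivalent decision methods. It assumes the receiver already has an estimate of the phase error (in the real receiver a carrier-tracking loop would supply it; that part isn't shown), and the function names are hypothetical. Symbols 0 to 3 are placed at 45, 135, 225 and 315 degrees as in the text.

```python
import cmath
import math

# Ideal (derotated) constellation points at 45, 135, 225 and 315 degrees.
IDEAL = [cmath.exp(1j * math.radians(a)) for a in (45, 135, 225, 315)]


def derotate(sample, phase_error_rad):
    """Rotate a received I + jQ sample back by an estimated phase error."""
    return sample * cmath.exp(-1j * phase_error_rad)


def decide_by_distance(sample):
    """Nearest ideal point, found by brute-force Euclidean distance search."""
    return min(range(4), key=lambda k: abs(sample - IDEAL[k]))


def decide_by_signs(sample):
    """The same decision, using only the signs of I (real) and Q (imaginary)."""
    if sample.real >= 0:
        return 0 if sample.imag >= 0 else 3
    return 1 if sample.imag >= 0 else 2


# Symbol 0 (45 degrees) seen through a 222-degree rotation lands at 267 degrees.
received = IDEAL[0] * cmath.exp(1j * math.radians(222))
print(decide_by_signs(received))                               # 2, the wrong symbol
print(decide_by_signs(derotate(received, math.radians(222))))  # 0, correct after derotation
```

Because the four ideal points sit in the middle of the four quadrants, the nearest point is always the one in the same quadrant as the sample, which is why two sign tests can replace four distance calculations.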

The second major challenge introduced by carrier frequency error is that we don't know where symbol 0 is - at any given point in time, symbol 0 could be in any of the four signal quadrants, as the receiver sees it.

The easiest way to solve this problem is to use "differential encoding" of the transmitted signal, so that instead of translating a quadrant number into received data, we translate the difference between quadrant numbers into the received data. In fact, the most common differential coding is as follows:

TX data value: Differential phase
0: no change
1: +90 degrees
2: -90 degrees
3: +180 degrees

As long as the transmitter and the receiver agree on the same set of rules, then the receiver can recover the input data stream regardless of where "quadrant 0" has been rotated to.
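
Here's a Python sketch of that rule set. Quadrants are counted in 90-degree steps, so -90 degrees is the same as +3 steps; the table and function names are hypothetical. The receiver uses its first received quadrant as the reference, so whatever rotation the constellation has cancels out (at the cost of that first symbol).

```python
# Phase change, in 90-degree steps, for each 2-bit data value:
# 0 -> no change, 1 -> +90, 2 -> -90 (= +270), 3 -> +180.
PHASE_STEP = {0: 0, 1: 1, 2: 3, 3: 2}
DATA_FOR_STEP = {step: data for data, step in PHASE_STEP.items()}


def diff_encode(data, start_quadrant=0):
    """Turn data values 0-3 into transmitted quadrants 0-3."""
    quadrants, q = [], start_quadrant
    for d in data:
        q = (q + PHASE_STEP[d]) % 4
        quadrants.append(q)
    return quadrants


def diff_decode(quadrants, reference):
    """Recover data from the change between consecutive received quadrants."""
    data, prev = [], reference
    for q in quadrants:
        data.append(DATA_FOR_STEP[(q - prev) % 4])
        prev = q
    return data


data = [1, 3, 0, 2]
sent = diff_encode(data)                        # [1, 3, 3, 2]
rotated = [(q + 2) % 4 for q in sent]           # receiver sees everything 180 degrees off
print(diff_decode(rotated[1:], reference=rotated[0]))   # [3, 0, 2], i.e. data[1:]
```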

Now, it's important to clarify one key point. Differential encoding is not the same as differential detection. If you think about it, you'll realize that if the signal is differentially encoded, it isn't actually necessary to rotate the incoming samples at all. Instead, the receiver can just measure the phase of each incoming signal, compare it to the phase of the prior symbol, and voila, it can figure out what data was sent. This does work, and in fact this simple approach is quite commonly used. Unfortunately, you pay an R.F. penalty for this simplification. As mentioned previously, differential detection loses about 2.5 dB of R.F. performance.
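
For comparison, here's a Python sketch of the simpler differential detection approach. Multiplying each sample by the complex conjugate of the previous one subtracts their phases, so a fixed constellation rotation drops out without any derotation. The function name is hypothetical, and the sketch assumes the rotation is constant over the samples shown; it returns phase steps in 90-degree units, which the same step-to-data table as the encoder would then turn into data values.

```python
import cmath
import math


def differential_detect(samples):
    """Phase step, in 90-degree units (0-3), between consecutive I + jQ samples.

    cur * conj(prev) has phase equal to phase(cur) - phase(prev), so a fixed
    rotation of the whole constellation cancels out."""
    steps = []
    for prev, cur in zip(samples, samples[1:]):
        angle = math.degrees(cmath.phase(cur * prev.conjugate()))
        steps.append(round(angle / 90) % 4)
    return steps


# Quadrants 0, 1, 3, 3, 2 (at 45 + 90*q degrees), all rotated by an unknown 222 degrees.
rotation = cmath.exp(1j * math.radians(222))
rx = [cmath.exp(1j * math.radians(45 + 90 * q)) * rotation for q in (0, 1, 3, 3, 2)]
print(differential_detect(rx))   # [1, 2, 0, 3]
```

The convenience comes from using the previous noisy sample as the phase reference instead of a clean, tracked one; that is the trade-off behind the R.F. penalty mentioned above.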

While 2.5 dB may not sound like much, it can mean more than a 20X increase in errors at low signal/noise ratios! Consequently, this receiver will take advantage of differential encoding to keep track of where "quadrant 0" is, but it will go through the more complex process of symbol derotation before making demodulation decisions. This so-called "coherent" demodulation will buy back most of that 2.5 dB of R.F. performance. This will give us better signal range, and better control during those wild aerobatic routines!

Cheers!
MarkF
07-01-2004

MarkF
Senior Heliman
Location: Palo Alto, CA

Hello!

I thought I'd point out one interesting bit of trivia for folks who might be curious about it. It is possible to predetermine exactly how fast the constellation will be rotating, or at least what the upper bound of the rotation rate is.

Our R/C radios require a frequency accuracy of 20 parts per million (PPM). At 75 MHz, that means that the transmitter could be off by as much as +/-1,500 Hz, and the receiver could be off by as much as +/-1,500 Hz. If both radios are off by the maximum amount in different directions, then the worst-case could be a 3,000 Hz error between the transmitter and receiver. That 3,000 Hz error means that the constellation would actually be rotating 3,000 times per second.

At a symbol rate of 6,000 symbols/second, the worst case could actually yield a 180 degree constellation rotation for each received symbol! As you might expect, this could create a serious challenge trying to figure out what information was actually being sent. Ummm... let's see, was that 180 degree phase difference caused by sending a '3', or was it caused by sending a '0' while the constellation rotated by 180 degrees? This reinforces the need for this QPSK receiver to track the transmitter's carrier frequency, and to derotate the symbols before making demodulation decisions.
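
The worst-case numbers above follow from a few lines of arithmetic. In this Python sketch, `worst_case_rotation` is an illustrative name; it assumes, as the post does, that the transmitter and receiver are each allowed to be off by the full tolerance in opposite directions.

```python
def worst_case_rotation(carrier_hz, ppm, symbol_rate):
    """Worst-case per-radio error (Hz), TX/RX difference (Hz), and the
    constellation rotation it causes per symbol (degrees)."""
    per_radio_hz = carrier_hz * ppm / 1e6
    offset_hz = 2 * per_radio_hz                    # TX high and RX low, or vice versa
    degrees_per_symbol = 360 * offset_hz / symbol_rate
    return per_radio_hz, offset_hz, degrees_per_symbol


print(worst_case_rotation(75e6, 20, 6000))   # (1500.0, 3000.0, 180.0)
```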

There's another very serious consequence of this much frequency error. If you think back to the spectrum analyzer displays, it's pretty clear that a 3,000 Hz error could cause us to completely miss nearly 30% of the signal bandwidth in the receiver - so much so that the receiver might not even work! There are three ways we can solve this problem.

The first method is obvious, but it's rather difficult and expensive to implement: reduce the frequency error! Unfortunately, getting down to frequency errors in the 5 PPM range requires expensive temperature-controlled oscillators that suck lots of power - like 1 watt for each oscillator alone; these buggers have little ovens in them! In fact, each oscillator would need as much power as the transmitter's output power amplifier, so this just isn't a viable solution.

The second possible method is to just widen the receiver's input filters so that it will accept a +/- 3,000 Hz frequency error. This method is the most commonly used, but it has some very nasty side-effects. Wider filters let in more noise and interference, reducing the receiver's range, and increasing the likelihood that interference will trash the real signal.

The third method is somewhat of a hybrid. If we use a wider filter in the receiver's front-end, and then apply a digital phase-locked loop and symbol derotator to remove the frequency error, we could then apply the narrow SRRC filter to the frequency-corrected signal. While this is more work than most radios go to, it is a nearly perfect fit to the efficient two-stage filtering approach that was discussed previously. I'm very intrigued by the further gain in R.F. performance this might make possible, and will definitely attempt to implement this in the Omega 16 receiver - hopefully, we'll be able to afford the compute power!

Have Fun!
MarkF
07-01-2004

MarkF
Senior Heliman
Location: Palo Alto, CA

Howdy!

As I've done a bit more digging, the "third approach" mentioned above isn't actually necessary - at least not in the form I described it. I'd been concerned that the narrow SRRC filter wouldn't pass enough of the signal to be able to frequency-correct it such that the entire signal would pass the SRRC filter. However, I've learned that the "classical" approach to QPSK receivers simply places the phase-locked loop after the SRRC filter, so it just isn't necessary to take advantage of the wider bandwidth that's available between this receiver's sample rate converter and SRRC filter. As a result, it's looking more and more like we're evolving towards a fairly "traditional" RF processing chain, though this one will be executed primarily in software. Not "traditional" as in conventional R/C gear (which offers rather poor R.F. performance), but "traditional" as compared to contemporary data transmission systems.

My next step will be to create a block diagram that'll provide a visual overview of the complete transmission system, including signal processing chains for both the transmitter and the receiver. As soon as that's done, I'll post it here.

Cheers!
MarkF
07-02-2004

MarkF
Senior Heliman
Location: Palo Alto, CA

Happy Independence Day!

Well, folks, every time I think I'm ready to finalize the signal processing chain, I run into another "learning opportunity"! As I've mentioned, one of the goals of this project has been to move as much of the functionality as possible into software. In that regard, I'd hoped to avoid the need for hardware frequency tracking by using the software carrier phase tracking loop discussed above. Unfortunately, a software loop won't be enough.

The reason this is a problem is the magnitude of the error. With a combined TX/RX carrier frequency error of up to 50% of the symbol rate, the software loop would have to be able to track errors of up to +/- 50%. This is not good for two reasons. First, tracking loops like phase-locked loops (PLLs) just don't work with such a huge tracking range: they achieve stability by tracking errors that are at most ~10%, and more often, less than a few percent of the center frequency.

The second problem is that even if a software PLL could track 50% errors, it wouldn't necessarily lock on to the actual frequency error; it could just as easily lock on to the wrong signal phase. Since the S/W tracking loop operates at baseband, it has to operate without the benefit of seeing the actual carrier frequency. Because baseband QPSK has four possible symbol phases, the loop could think it was properly tracking at any of those four phases. Here, that means that the tracking loop couldn't tell the difference between being on-frequency, off by +/- 1.5 KHz, or off by +/- 3 KHz: at 6,000 symbols/second, a 1.5 KHz error rotates the constellation by exactly 90 degrees per symbol, which lands every symbol on another valid QPSK phase and so looks just like no error at all.
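The four-way ambiguity can be demonstrated numerically. This Python sketch is an illustration only, not the Omega 16 loop: it assumes a noise-free signal and one common QPSK mapping (symbols at 45, 135, 225 and 315 degrees), and it uses a fourth-power detector, a standard way of stripping QPSK modulation, as a stand-in for whatever phase detector the real loop ends up using. The names are hypothetical. Offsets of 0, 1,500 and 3,000 Hz all produce the same (zero) phase step, while an offset that isn't a multiple of 1,500 Hz, such as 500 Hz, shows up clearly.

```python
import cmath
import math

SYMBOL_RATE = 6000.0  # symbols/second, as in the posts above

# QPSK symbols at 45, 135, 225 and 315 degrees (one common mapping; assumed here)
QPSK = [cmath.exp(1j * (math.pi / 4 + k * math.pi / 2)) for k in range(4)]

def detector_phase_steps(offset_hz, symbol_indices):
    """Phase change (degrees, wrapped to +/-180) seen by a fourth-power
    detector between successive received symbols, with no noise."""
    steps = []
    prev = None
    for n, k in enumerate(symbol_indices):
        # The carrier offset appears as a steadily growing phase rotation
        rx = QPSK[k] * cmath.exp(2j * math.pi * offset_hz * n / SYMBOL_RATE)
        z = rx ** 4  # every QPSK symbol raised to the 4th power lands on the same point
        if prev is not None:
            steps.append(math.degrees(cmath.phase(z / prev)))
        prev = z
    return steps

data = [0, 3, 1, 2, 2, 0, 3, 1]
for offset in (0.0, 500.0, 1500.0, 3000.0):
    worst = max(abs(s) for s in detector_phase_steps(offset, data))
    print(f"{offset:6.0f} Hz offset -> largest step {worst:6.1f} degrees")
```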

The only solution is to add hardware frequency tracking that works at a frequency high enough that the worst-case error is a small percentage of the frequency being tracked. One very good candidate would be to perform this at the receiver's first Intermediate Frequency, which will probably be at 10.7 MHz. Though this section isn't designed yet, it will probably contain a 10 KHz wide crystal filter that's centered on 10.7 MHz. By adding a hardware carrier tracking loop at 10.7 MHz, the maximum error it would have to track would be (20ppm + 20ppm) * 75 MHz / 10.7 MHz, or about 0.028%, a much more reasonable error to track. This has the further benefit of ensuring that the crystal filter itself doesn't throw away part of the valid signal when the TX and RX aren't aligned.
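Putting the same 3,000 Hz worst case in percentage terms shows why moving the tracking up to the IF helps so much. This is a minimal Python sketch; the helper name is hypothetical, and the 10.7 MHz IF is the tentative value mentioned above.

```python
def relative_error_pct(offset_hz, reference_hz):
    """Frequency error expressed as a percentage of the frequency being tracked."""
    return 100.0 * offset_hz / reference_hz

OFFSET_HZ = (20e-6 + 20e-6) * 75e6  # worst-case combined TX + RX error, ~3,000 Hz

baseband = relative_error_pct(OFFSET_HZ, 6_000.0)  # vs. the 6,000 symbol/s rate
first_if = relative_error_pct(OFFSET_HZ, 10.7e6)   # vs. a 10.7 MHz first IF
print(f"baseband loop: {baseband:.1f}%   10.7 MHz AFC: {first_if:.3f}%")
```

The absolute error is identical in both cases; only the frequency it is measured against changes, which is what moves it from "impossible for a PLL" to "easy".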

There's another interesting possibility here that could be exploited. If a single "master oscillator" is used to drive the digital logic and to derive the 10.7 MHz H/W tracking loop's "reference clock", then it might even be possible to eliminate the need for any further carrier frequency tracking. If the I.F. clocks (at 10.7 MHz and 455 KHz) are both digitally divided from the master oscillator, then all that would be left would be a constant phase error that would be easy to correct for. If, however, the I.F. references are created by (yet another) PLL from the master oscillator, then a S/W carrier tracking loop will still be required, since clock PLLs introduce a fair amount of periodic low-frequency error.

Ultimately, we'll no doubt keep the S/W carrier tracking loop, since it's relatively inexpensive to compute, and will handle any residual phase or frequency errors that are left. However, it definitely will be necessary to incorporate a traditional hardware carrier tracking loop - most commonly referred to as an AFC (for Automatic Frequency Control).

One of these days, we may actually get there... Meanwhile, it's time for barbeque and fireworks! Have a Happy Fourth of July!

Cheers!
MarkF
07-04-2004

MarkF
Senior Heliman
Location: Palo Alto, CA

Hi, Gang!

At long last, I've finally finished creating Version 0.1 of the Omega 16 QPSK TX & RX S/W Architecture diagram [the link has since been updated to V0.3 for reference; check below for later versions]. One thing I've definitely learned is that it's a darned good thing that I don't have to make a living as an illustrator, because I'd starve to death! I'm never gonna admit how long it took me to create this bugger.

Still, here's the first full pictorial view of how all this software processing will fit together, or at least my first view of how it might fit together!

Do note that this is a simplification of the real beasties, but it's a good overview of what I'm hoping to eventually achieve from an R.F. perspective. This doesn't yet cover any of the hardware design elements, which will come (probably quite a bit) later on. Neither does this rehash the various command processing tasks that the receiver performs: this chart focuses solely on QPSK and RF software processing issues.

I hope that this is useful for someone, like maybe you? If you do have any questions, comments, or feedback, I'll look forward to hearing them!

Have Fun!
MarkF

P.S. I've updated to V0.2, which fixes a couple of minor readability issues, and adds some more details in the symbol timing recovery section. FYI!
07-13-2004

MarkF
Senior Heliman
Location: Palo Alto, CA

Hi!

It wasn't until I posted it that I realized that the block diagram had been simplified a bit too much. As a result, the link now points to V0.3, which adds the Automatic Gain Control (AGC), and also references the sample index advance/retard mechanism that's needed during symbol timing recovery.

About the only thing that I can think of that'll intentionally be left out of the block diagram (for space purposes) is the low-battery failsafe mechanism, which hopefully is pretty self-explanatory, anyway.

I realize that the block diagram could use pages and pages of explanation, but I would like to start coding one of these days. As a result, I'll hold off on a general writeup until later on. Nonetheless, I'll be happy to answer any specific questions you may have - just let me know!

Cheers!
MarkF
07-13-2004

MarkF
Senior Heliman
Location: Palo Alto, CA

Let's Improve Efficiency!

Hi, Gang!

Well, it's a good thing that I didn't just start coding, after all! After looking at the block diagram as a whole, I've realized that there are some important efficiency improvements that can be made at the architecture level.

First, after thinking over the poor results with the polyphase Square-Root Raised Cosine filter on the transmitter, I realized that the situation on the receiver is not the same as that in the TX (contrary to my earlier claim that the RX needed the exact same filtering operation as the TX). What makes the difference are the signal magnitudes.

In the transmitter, we're having to reduce out-of-channel interference by more than 60 dB. Since decibels are in a logarithmic scale, that means that we're trying to reduce the power of the "unnecessary" information transmitted by a factor of (10^(60/10)) = 1,000,000! While the FCC may need this sort of reduction to keep the airwaves "clean", the receiver only needs to filter out enough of the irrelevant information to prevent accumulated noise+interference from altering a '0' or a '1' decision. Consequently, a much simpler filter should suffice. This is terrific, because it reopens the possibility of using a polyphase filter for the receiver, which means that more CPU time will be freed up. Note that we'll still use the two-part filtering approach, with sample rate conversion followed by the SRRC filter. It's just that the receiver's SRRC filter will be even further simplified! With the reintroduction of the polyphase SRRC filter, other areas can be simplified, too.

This next simplification may be a little hard to follow, but I'll try to walk through it a step at a time - it'll help if you have a hardcopy of the RF Software Architecture in your hands as you read this. To begin with, in this application, the order of the "Carrier Derotation", "AGC+Indexing", and "Interpolator" blocks is relatively unimportant: changing the sequence of these three blocks shouldn't have a significant effect on R.F. performance (what is important is that the input to the two tracking loops follows all three of these blocks). Therefore, let's start by changing the order to become "Interpolator", then "Carrier Derotation", then "AGC + Indexing".

While you'll have to imagine this for now, what the new block diagram will show is a polyphase SRRC filter followed immediately by an Interpolator, and that's interesting. Why? I'd mentioned earlier that polyphase filters are divided up into 'N' mini-filters, each of which has its own output. What I didn't mention is that each of these outputs is the equivalent of running the full (i.e. non-polyphase) filter at a specific phase offset from the start of the existing samples. Now, what's interesting is that an interpolator's entire purpose is to synthesize a new sample value at a specific phase offset from the start of the existing samples. In other words, the polyphase filter can do what the interpolator is doing, without doing any additional work! It will be necessary to ensure that the polyphase filter has enough "mini-filter" segments that the step in delay between two outputs is small enough, but that's simply a matter of storing more filter coefficients (the delay between two sequential segments of the polyphase filter is just the sample time divided by the number of polyphase filter segments).

To make that a bit more concrete, let's say that it was necessary to interpolate to 1/64th of a sample time (that is, 1/128th of a symbol time) for sufficiently accurate data recovery. Further, let's guess that ten filter taps will be sufficient in each polyphase filter segment for good performance (this'll have to be verified later on, but it's a reasonable starting point). To design this filter, we design a standard SRRC filter that's 64 * 10 = 640 taps long. These 640 filter coefficients are then split into 64 groups of ten coefficients (segment p takes coefficients p, p+64, p+128, and so on), as per the directions on the Interpolator FAQ website I linked to earlier. With this change, each time we receive a new sample value, the output of the "Symbol Timing Recovery" block that used to connect to the Interpolator is instead used simply to select which of the polyphase filter segments is executed! Voila - no need for an interpolator. Cool!

This is another significant efficiency improvement. In addition to the fact that the polyphase filter approach will reduce the number of multiply-accumulates (MACs) in the SRRC filter by perhaps a factor of four (from roughly 40 to 10 MACs each for I & Q), this also eliminates the Interpolator block completely, which will save another 8-12 MACs/sample (4-6 MACs each for I & Q). Great stuff!
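To make the polyphase idea concrete, here's a minimal Python sketch of the 640-tap, 64-segment example above. The roll-off factor (0.35) is an assumption, since the posts don't give one; the prototype is sampled at 128 points per symbol (64 segments times 2 samples/symbol), and all names are hypothetical. The last part checks the key claim: running one 10-tap segment gives the same value as zero-stuffing the input by 64, running the full 640-tap filter, and picking out that phase.

```python
import math

def srrc_taps(num_taps, sps, beta):
    """Square-root raised cosine impulse response (symbol period = 1, beta > 0),
    sampled at `sps` points per symbol and centred in the tap array."""
    center = (num_taps - 1) / 2
    taps = []
    for i in range(num_taps):
        t = (i - center) / sps  # time in symbol periods
        if abs(t) < 1e-12:
            h = 1.0 - beta + 4.0 * beta / math.pi
        elif abs(abs(t) - 1.0 / (4.0 * beta)) < 1e-12:
            a = math.pi / (4.0 * beta)
            h = (beta / math.sqrt(2.0)) * ((1 + 2 / math.pi) * math.sin(a)
                                           + (1 - 2 / math.pi) * math.cos(a))
        else:
            num = (math.sin(math.pi * t * (1 - beta))
                   + 4.0 * beta * t * math.cos(math.pi * t * (1 + beta)))
            h = num / (math.pi * t * (1 - (4.0 * beta * t) ** 2))
        taps.append(h)
    return taps

def polyphase_split(proto, phases):
    """Segment p holds every `phases`-th coefficient starting at index p."""
    return [proto[p::phases] for p in range(phases)]

def segment_output(segment, history):
    """One filter output from one segment; history[k] is the input k samples ago."""
    return sum(c * x for c, x in zip(segment, history))

def full_filter_phase_output(proto, phases, x, n, p):
    """Reference: zero-stuff x by `phases`, run the whole prototype filter,
    and return output sample n*phases + p."""
    m = n * phases + p
    total = 0.0
    for j, c in enumerate(proto):
        idx = m - j
        if idx >= 0 and idx % phases == 0 and idx // phases < len(x):
            total += c * x[idx // phases]
    return total

PHASES, TAPS_PER_SEGMENT = 64, 10
proto = srrc_taps(PHASES * TAPS_PER_SEGMENT, sps=2 * PHASES, beta=0.35)
segments = polyphase_split(proto, PHASES)

x = [math.sin(0.3 * i) for i in range(20)]  # arbitrary test input
n, p = 15, 17                               # newest sample index, chosen phase
history = [x[n - k] for k in range(TAPS_PER_SEGMENT)]
print(segment_output(segments[p], history),
      full_filter_phase_output(proto, PHASES, x, n, p))
```

In the receiver, `p` would come from the Symbol Timing Recovery block each time a new sample arrives. Each output then costs only ten multiply-accumulates per rail (I or Q); the extra segments cost coefficient storage, not CPU time.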

Now, assuming that this isn't just a case of temporary wishful/delusional thinking, I'll update the block diagram in a day or so and will upload the new version once it's ready. This time, I'll keep the original link intact, so that you can compare the two approaches more easily.

Despite my anxiousness to start coding, the old adage proves true: to write better code, spend more time thinking first, and less time coding later on!

Have Fun!
MarkF
07-15-2004

MarkF
Senior Heliman
Location: Palo Alto, CA

Howdy!

As promised, I've uploaded Version 0.4 of the RF Software Architecture. In addition to the changes noted above, I reorganized the drawing a bit to help make things a little clearer (like eliminating the redundant sign blocks, and folding the gain setting into the Symbol Timing Recovery PLL itself).

As always, questions, comments or suggestions are welcomed!

Cheers!
MarkF
07-15-2004

w.pasman
Elite Veteran
Location: Netherlands

Hi Mark

Did you document the frame layout somewhere? Just curious what you created there.

Do you plan to use the standard trainer output to hook up the transmitter? Maybe you wrote about it somewhere, but forgive me for not reading all your posts.
07-15-2004

MarkF
Senior Heliman
Location: Palo Alto, CA

Hi, W.Pasman!

The current frame format is best documented in the source code that I've posted in the "Omega 16 Source Code" thread, but here's the current state. Note that I'm considering a protocol change that will support a variable number of channels, so this is just the original frame format.

Since both the QPSK and FSK systems use differential coding, a full frame consists of two subframes. Each subframe is 94 bits long, with a very tiny 4-bit preamble, followed by nine 10-bit data words. For QPSK, the bitrate is 12,000 bits/second. All transmissions are least-significant bit first.
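These numbers are easy to sanity-check. A short Python sketch (the constant names are hypothetical, not taken from the posted source) computes the subframe and frame lengths and confirms that 12,000 bits/second at two bits per QPSK symbol is the 6,000 symbols/second used earlier in the thread.

```python
PREAMBLE_BITS = 4
WORDS_PER_SUBFRAME = 9
BITS_PER_WORD = 10
QPSK_BIT_RATE = 12_000      # bits/second
BITS_PER_QPSK_SYMBOL = 2

subframe_bits = PREAMBLE_BITS + WORDS_PER_SUBFRAME * BITS_PER_WORD
frame_bits = 2 * subframe_bits                 # even subframe + odd subframe
subframe_ms = 1000.0 * subframe_bits / QPSK_BIT_RATE
frame_ms = 1000.0 * frame_bits / QPSK_BIT_RATE
symbol_rate = QPSK_BIT_RATE // BITS_PER_QPSK_SYMBOL

print(f"{subframe_bits} bits/subframe = {subframe_ms:.2f} ms; "
      f"full frame = {frame_ms:.2f} ms; {symbol_rate} symbols/s")
```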

For even frames, the preamble data pattern is 1010b, and for odd frames, the preamble is 0101b; the preamble inverts every subframe. The following are the current formats for the data words:
  1. Command word: Bit-mapped, with bit 0 indicating whether this is an Even (0) or Odd (1) subframe. Bit 1 differentiates between Normal (0) and Failsafe-Command (1) frames. Bits 2-9 contain the settings for the eight switch channels - Min (0) or Max (1). The switch channels are numbered from 8-15 out of the total 16 channels that are supported.
  2. Absolute PCM code: Contains 10-bit PCM value for channel 0 during even frames, and for channel 4 during odd frames.
  3. Absolute PCM code: Contains 10-bit PCM value for channel 1 during even frames, and for channel 5 during odd frames.
  4. Absolute PCM code: Contains 10-bit PCM value for channel 2 during even frames, and for channel 6 during odd frames.
  5. Absolute PCM code: Contains 10-bit PCM value for channel 3 during even frames, and for channel 7 during odd frames.
  6. Differential PCM code: Contains two packed five-bit differentially coded values for channels 4-5 during even frames, and for channels 0-1 during odd frames.
  7. Differential PCM code: Contains two packed five-bit differentially coded values for channels 6-7 during even frames, and for channels 2-3 during odd frames.
  8. L.S. CRC: Least-significant 10 bits of a 20-bit Cyclic Redundancy Code that protects against data errors.
  9. M.S. CRC: The final word in each subframe contains the most-significant 10 bits of the 20-bit CRC.
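Here's a minimal Python sketch of packing the command word and splitting the CRC as listed above, plus LSB-first serialization of a 10-bit word. It is not the Omega 16 source: the function names are hypothetical, it assumes switch channel 8 maps to bit 2 (and channel 15 to bit 9), and it takes the 20-bit CRC value as an input because the posts don't give the CRC polynomial.

```python
WORD_MASK = 0x3FF  # 10-bit data words

def command_word(odd, failsafe, switches):
    """Word 1: bit 0 = even(0)/odd(1), bit 1 = normal(0)/failsafe(1),
    bits 2-9 = switch channels 8-15 (0 = Min, 1 = Max)."""
    if len(switches) != 8:
        raise ValueError("expected 8 switch channels")
    word = int(bool(odd)) | (int(bool(failsafe)) << 1)
    for i, is_max in enumerate(switches):
        if is_max:
            word |= 1 << (2 + i)  # assumed mapping: channel 8 + i -> bit 2 + i
    return word

def split_crc20(crc):
    """Words 8 and 9: least- and most-significant 10 bits of a 20-bit CRC."""
    return crc & WORD_MASK, (crc >> 10) & WORD_MASK

def word_bits_lsb_first(word, width=10):
    """Serialize one word least-significant bit first, as transmitted."""
    return [(word >> i) & 1 for i in range(width)]

cmd = command_word(odd=True, failsafe=False, switches=[1, 0, 0, 0, 0, 0, 0, 1])
print(bin(cmd), word_bits_lsb_first(cmd))
```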

Note the (not coincidental) fact that the LSB of the first data word (the control word) is the even/odd frame flag. As a result, the receiver actually scans for a data pattern of 01010b during even frames, and for 10101b during odd frames. A fairly sophisticated delay-locked loop is used to rapidly lock on to the actual preamble, despite the extremely short preamble pattern.
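The delay-locked loop itself isn't described here, so as a much simpler stand-in, this brute-force Python sketch slides a five-bit window over received bits and reports where each sync pattern appears. It assumes the patterns are written in transmission order; the names are hypothetical. It also hints at why such a short pattern needs a smarter detector: random data matches any five-bit pattern at roughly one position in 32.

```python
EVEN_SYNC = [0, 1, 0, 1, 0]  # preamble + even/odd flag, even subframes (assumed bit order)
ODD_SYNC = [1, 0, 1, 0, 1]   # odd subframes: the bitwise inverse

def find_sync(bits, pattern, max_errors=0):
    """Return every index where `pattern` starts in `bits`,
    tolerating up to `max_errors` mismatched bits."""
    n = len(pattern)
    hits = []
    for i in range(len(bits) - n + 1):
        errors = sum(b != q for b, q in zip(bits[i:i + n], pattern))
        if errors <= max_errors:
            hits.append(i)
    return hits

rx = [1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1]
print(find_sync(rx, EVEN_SYNC), find_sync(rx, ODD_SYNC))
```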

The first two bits of the command field are sent "in the clear", but the remaining 88 bits of the data field are randomized by XORing them with a random sequence generated by a Linear Feedback Shift-Register (LFSR). The preamble itself is not randomized (or else we wouldn't find it!), and the LFSR resets itself after each pair of subframes is sent.
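The randomizing step can be sketched as below. The posts don't say which LFSR polynomial Omega 16 uses, so this Python sketch assumes a hypothetical 9-stage register with feedback from stages 5 and 9 (the common PN9 generator); the seed and function names are hypothetical too. Because XOR is its own inverse, the receiver can descramble by running the same generator from the same reset state, which is presumably why the LFSR resets at a known point after each pair of subframes.

```python
def lfsr_bits(count, seed=0x1FF):
    """Pseudo-random bits from a 9-stage Fibonacci LFSR with feedback from
    stages 5 and 9 (hypothetical choice). The seed must be non-zero."""
    state = seed & 0x1FF
    out = []
    for _ in range(count):
        fb = ((state >> 8) ^ (state >> 4)) & 1  # stage 9 XOR stage 5
        out.append(fb)
        state = ((state << 1) | fb) & 0x1FF     # shift the new bit in
    return out

def scramble(bits, seed=0x1FF):
    """XOR data bits with the LFSR sequence; applying it twice restores the data."""
    return [b ^ k for b, k in zip(bits, lfsr_bits(len(bits), seed))]

payload = [1, 1, 1, 1, 0, 0, 0, 0] * 11  # 88 bits, deliberately repetitive
sent = scramble(payload)                 # generator restarts from the seed
assert scramble(sent) == payload         # receiver recovers the data
print("".join(map(str, sent[:16])))
```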

That's it! There's a more complex transmission format that's used during the Failsafe-Command frames that sets, on a servo-by-servo basis, whether to Set/Hold state, the desired set point, and the servo speed on that output. The receiver code now simultaneously supports mixed 60 Hz and 120 Hz framerate servo outputs, so that it will work with most older analog servos, while still supporting current digital servos.

Your question on the trainer output is a good one, but in this case, it won't work. The speeds that we're running at will require new transmitter hardware (actually, in addition to speed, this requires a 16-bit DAC, new R.F. filters, and a Class A/linear RF amplifier in the transmitter). As a result, it'll be necessary to create both new transmit and receive R.F. sections. I'm currently planning on modifying an Airtronics RD8000 to accommodate this functionality. I'll probably begin on 53 MHz, since I already have an Advanced class ham license, but I've also contacted the FCC about obtaining a "Part 5" experimental license that'll let me work legally in the 72 MHz band.

If your question is just about the basic receiver functionality, that's tested by connecting the output of a PC soundcard to the receiver's input - I have a test version of the transmitter code that runs on a PC which sends the FSK/QPSK output signal through the soundcard. For FSK, all that will be needed to receive the signal is to add a simple data slicer on the Nohau ARM prototype board. For QPSK, I'll need a new platform that includes a DMA interface to a 16-bit, 96 KHz sample-rate A/D converter. While I'd initially hoped to be able to use simple PC codecs back when the transmitter was producing baseband, most of these parts contain on-chip 20 KHz low-pass filters that'll wipe out the 24 KHz Intermediate Frequency carrier that's now produced by the transmit code. Consequently, we'll use more traditional instrumentation-type A/D and D/A converters.

Does that help?

Have Fun!
MarkF
07-15-2004

w.pasman
Elite Veteran
Location: Netherlands

Hi Mark

Yes, it's clear. In short, you have a similar frame to the one Futaba uses, but you use only one CRC instead of four and leave out the 6-to-10 coding. That cuts the frame in half.
Then you double the data rate.
Total gain around four times.

Nice!

Yes, indeed the trainer port is useless at the current rate. But have you tried overclocking? Maybe you can double the frequency and get the data at the rate you need? I think making a radio computer program would be a major effort comparable to your current efforts to build a radio system!
07-16-2004

morbid
Heliman
Location: Sweden

Great work. If I only understood half of it =)

Oh well, I have to work on my RF theory =) haven't used it since high school =)


//morbid
07-16-2004

MarkF
Senior Heliman
Location: Palo Alto, CA

Hi, Folks!

w.pasman: Thank you very much! Unfortunately, overclocking won't work here, for quite a few reasons. First, to minimize the cost of the external hardware that will be required in the transmitter, the output frequency of the software portion of the transmitter is a modulated 24 KHz carrier. While I could try to inject the I & Q baseband signals, there would have to be two separate data inputs (and there's only one!). Trying to inject the modulated carrier is just as hard, in the sense that it's about six times higher in frequency than the (presumably digital) trainer input. It will need a D/A converter, as well. The next limitation is that the crystal frequencies in an existing radio will be wrong, since we're starting with a modulated carrier. Beyond that, the amplifier in today's gear is a non-linear amplifier that will trash a QPSK signal's bandwidth. Finally, the actual hardware filter components will also have to be changed to match these actual frequencies.

So, while I realize that I signed up for a lot more work, I'll be creating both a transmitter and a receiver. Now, I haven't yet decided whether I'll write a full transmitter stack or not. The first step was to write enough transmitter code to develop a functional link that sends random data which matches the timing and framing format described above, and that's now up and running on the PC. Believe it or not, this is all the transmit code that's needed to get the receiver up and running, all the way through to certification if I wanted to!

Obviously, I'll hold off on any certification attempts until after the equipment's been very thoroughly field-tested. At that point, it'll be necessary to write my own TX code and/or to partner with someone like Angelos. I'm not going to worry about that too much now, but it sure would be a shame to deliver all these low latencies, only to have the transmitter throw away >20 ms performing simple mixing that can be accomplished in less than 2 ms. That's the biggest argument for creating my own TX code: my real goal is to deliver the fastest end-to-end link around, and I doubt that any existing commercial TX code is fast enough to make sense with this receiver (until this receiver, there just hasn't been a need to go particularly fast). We'll see what happens. For now, though, I've got enough TX code done to develop the rest of the receiver.

Morbid: Don't feel alone! While I did have a background in RF thanks to being a ham radio operator, before I started this project I didn't know squat about software-defined radio, and it's only the process of slogging through it that's helping me to learn (thank God for Google - without it, I'd be nowhere). I will keep trying to cover enough material so that folks who want to learn more will hopefully have enough context to help them out. However you put it, though, this is a big hill to climb! Sorry about that.

Thank you very much to both of you! No bull, I really appreciate it when you take the time to drop by - with questions or comments, or just to say "Hi"! Good luck to you both!

Best Wishes!
MarkF
07-16-2004

Copyright © 2000 - 2009 runryder.com