Wednesday, March 23, 2016

Parsing Firmware

<< Prev: #include Woes       Next: The Other Shoe Drops >>

After doing a little research and proving that I could load a driver, I figured it was time for some real work. But first, a little background on firmware.

About the Firmware

The firmware for the Intel WiFi cards is distributed as a single file. However, there are at least two major styles of firmware, and each one is broken up into many sections. From a hardware perspective, there are the older models that use DVM firmware and the newer models that use MVM firmware. From the firmware file perspective, there's the v1/v2 format and the TLV format. I'm not actually sure whether the DVM/MVM axis is independent of the v1/v2 vs. TLV axis.

Since I'm only targeting fairly current hardware and current firmware, I only need to work with MVM hardware and TLV-style firmware. Unfortunately, from all appearances, TLV is the more complex of the two. The large firmware file consists of a header followed by many individual records, each of which has a type and length value and then a variable amount of payload (the size of which is in the length value). In order to be able to load the firmware onto the hardware, you need to parse this file out into all its separate components.
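
To make the record layout concrete, here is a minimal sketch of walking such a file in C. It is not the driver's parsing function. It assumes each record starts with a 4-byte little-endian type and a 4-byte little-endian length, and that payloads are padded to a multiple of 4 bytes; the names read_le32, tlv_visitor, tlv_walk, and sum_payload_len are made up for illustration, and the file header is assumed to have been skipped already.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value without assuming host byte order or alignment. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical callback type: invoked once per record. */
typedef void (*tlv_visitor)(uint32_t type, const uint8_t *payload,
                            uint32_t len, void *ctx);

/*
 * Walk the records in [data, data + size). Returns the number of records
 * visited, or -1 if a record claims more payload than remains or if stray
 * bytes are left over at the end.
 * Assumption: 4-byte type, 4-byte length, payload padded to 4 bytes.
 */
static int tlv_walk(const uint8_t *data, size_t size, tlv_visitor visit, void *ctx)
{
    int count = 0;

    while (size >= 8) {
        uint32_t type = read_le32(data);
        uint32_t len = read_le32(data + 4);
        size_t padded;

        data += 8;
        size -= 8;
        if (len > size)
            return -1;              /* truncated record */

        visit(type, data, len, ctx);
        count++;

        padded = ((size_t)len + 3) & ~(size_t)3;
        if (padded > size)
            padded = size;          /* sketch choice: tolerate a missing final pad */
        data += padded;
        size -= padded;
    }
    return size == 0 ? count : -1;
}

/* Example visitor: adds up the payload bytes seen; ctx points to a size_t. */
static void sum_payload_len(uint32_t type, const uint8_t *payload,
                            uint32_t len, void *ctx)
{
    (void)type;
    (void)payload;
    *(size_t *)ctx += len;
}
```

A real visitor would switch on the type and either stash a value or copy the payload, which is roughly the shape of the big switch statement described next.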

There's a big C function (dominated by an enormous switch statement in a while loop) that handles this. It ends up stashing some values in configuration objects, and copying other sections into freshly-allocated memory that it will hang on to for subsequent restarts or whatever. Then it lets the original file data be freed.

Firmware Parsing Code

So my next step would be calling that function to parse the TLV firmware. This would involve roping a lot of code from the Linux driver into my project. In that code, I found Linux-compiler syntax for byte-alignment in structs, which my OS X compiler was not happy about. It seemed necessary though, because many of the firmware-related structs were packed to map directly to specific bytes in the file, and extra padding would spoil that mapping. This effort would test whether I could convert all that successfully, plus hand off from my kext-resource-loading code to the Linux firmware-parsing code.

As I went about pulling in the functions and constants and structs that the firmware parsing used, the project suddenly bloated. I tried to be strategic about what code and files I was bringing into my project, but this referenced that and suddenly I had 10+ files. It wasn't obvious to me why some things were laid out the way they were in the original source, so in some cases I rearranged a bit. I didn't want to try real hard to keep things just as they were in the original driver, because the end goal was to port a lot of it anyway.

For instance, the C code had pretty opaque function tables (structs full of function pointers). In trying to follow the code, I'd land at ops->start(...) but there wasn't any definition of a function "start" anywhere, it was just an entry in this "ops" struct. Then I had to figure out where that was assigned in order to know what the actual function was called in its definition and where that was so I could follow the code. I guess all that makes sense in C, but it seems like an obvious candidate to make a C++ class out of. The C++ code in the other ported drivers I looked at was definitely easier to follow than the original C code.
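
For anyone who hasn't run into the pattern, here is a minimal sketch of a function table in C. Every name in it (device, device_ops, mvm_start, mvm_stop, mvm_ops, device_bring_up) is hypothetical and chosen only to illustrate the idea; none of it is copied from the driver.

```c
struct device;                      /* hypothetical device type */

/* A table of function pointers: the C stand-in for a virtual method table. */
struct device_ops {
    int  (*start)(struct device *dev);
    void (*stop)(struct device *dev);
};

struct device {
    const char *name;
    const struct device_ops *ops;   /* which implementation this device uses */
    int running;
};

/* The real functions have their own names; "start" exists only as a field. */
static int mvm_start(struct device *dev)
{
    dev->running = 1;
    return 0;
}

static void mvm_stop(struct device *dev)
{
    dev->running = 0;
}

/* This assignment is the thing you have to hunt for when reading the code. */
static const struct device_ops mvm_ops = {
    .start = mvm_start,
    .stop  = mvm_stop,
};

/* Generic code only ever sees the table, never the concrete function names. */
static int device_bring_up(struct device *dev)
{
    return dev->ops->start(dev);
}
```

Searching for a definition of "start" finds nothing because the only definition is mvm_start; the link between the two lives in the initializer of mvm_ops. A C++ class with virtual methods expresses the same dispatch, but the compiler builds the table for you.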

Bottom line, I copied and rearranged, and soon I had a big pile of code that seemed to be a complete set, but still didn't compile.

Struct Alignment

For whatever reason, it seems that certain hardware is much more efficient at reading and writing values that start on a 32-bit or 64-bit boundary (or other alignments, for less common cases). For instance, on a 64-bit system like OS X, pointers need to be aligned on 64-bit boundaries.

Thus, look at a struct like this on 64-bit OS X:
struct foo {
    UInt32 a;
    some *b;
    SInt16 c;
    some *d;
};
Adding up the field sizes gives 4+8+2+8=22 bytes. But the actual struct takes 4+4(padding)+8+2+6(padding)+8 = 32 bytes. There's padding introduced after "a" to allow "b" to be on a 64-bit boundary, and likewise padding after "c" to allow "d" to be on a 64-bit boundary.
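
The arithmetic above can be checked by the compiler itself. This is a minimal sketch, assuming a C11 compiler and a 64-bit target with 8-byte pointers; uint32_t, int16_t, and void * stand in for UInt32, SInt16, and the placeholder "some" type.

```c
#include <stddef.h>
#include <stdint.h>

struct foo {
    uint32_t a;   /* stands in for UInt32 */
    void    *b;   /* stands in for "some *" */
    int16_t  c;   /* stands in for SInt16 */
    void    *d;
};

/* These checks only apply where pointers are 8 bytes with 8-byte alignment. */
#if UINTPTR_MAX == 0xffffffffffffffffu
_Static_assert(offsetof(struct foo, a) == 0,  "a sits at the start");
_Static_assert(offsetof(struct foo, b) == 8,  "4 bytes of padding after a");
_Static_assert(offsetof(struct foo, c) == 16, "c follows b directly");
_Static_assert(offsetof(struct foo, d) == 24, "6 bytes of padding after c");
_Static_assert(sizeof(struct foo) == 32, "22 bytes of fields plus 10 of padding");
#endif
```

If any of the offsets were different, the build would fail with the corresponding message instead of silently producing a struct with another layout.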

The 64-bit developer docs have an easy suggestion to fix this: just reorder the fields in the struct. If they went b,d,a,c there wouldn't need to be any padding between the fields (only 2 bytes at the end, to round the struct size up from 22 to 24).

Well, here's the problem. If a struct with assorted field sizes like that is used to map a section of a firmware file, I can't just reorder the fields without making them point to the wrong bytes in the firmware file. And if the firmware file has no padding, the struct can't have any either, or again, the fields will point to the wrong bytes in the file.

The Linux driver code makes it work with the __packed keyword like this:
struct iwl_fw_dbg_reg_op {
 u8 op;
 u8 reserved[3];
 __le32 addr;
 __le32 val;
} __packed;

Whereas OS X seems to prefer it with a compiler annotation (pragma) like this:
#pragma pack(1)
struct iwl_fw_dbg_reg_op {
 u8 op;
 u8 reserved[3];
 __le32 addr;
 __le32 val;
};// __packed;
#pragma options align=reset

What made me nervous was the more complicated ones, like the ieee80211.h Linux header file. That one has structs aligned(2), packed structs, and (if you look for struct ieee80211_mgmt) a struct aligned(2) containing packed sub-structs, unions of packed sub-structs, and even a sub-union with packed sub-structs.

I took a swing at converting all that to the corresponding pragma syntax, but I can't say I had any real confidence it would work. You know, what if changing the pragma and back mid-struct didn't work?
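
One way to take some of the guesswork out of that question is to have the compiler verify sizes and offsets. The following is a minimal sketch, assuming a C11 compiler that supports the push/pop form of the pack pragma (GCC and Clang both do). The struct names are hypothetical, and the push/pop spelling is just the sketch's choice, not what the converted headers used.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record, not from the driver: 1 + 4 + 2 = 7 bytes in the file. */
#pragma pack(push, 1)
struct sample_record {
    uint8_t  kind;
    uint32_t addr;
    uint16_t flags;
};
#pragma pack(pop)

/* Same fields, declared after the pop, so natural alignment applies again. */
struct sample_record_unpacked {
    uint8_t  kind;
    uint32_t addr;
    uint16_t flags;
};

/* If the pragma did not take effect, these fail at compile time. */
_Static_assert(sizeof(struct sample_record) == 7, "packed: no padding");
_Static_assert(offsetof(struct sample_record, addr) == 1, "addr right after kind");
_Static_assert(offsetof(struct sample_record, flags) == 5, "flags right after addr");

/* If the pop did not restore the default, this one fails instead. */
_Static_assert(sizeof(struct sample_record_unpacked) == 12, "padding is back");
```

The same kind of assertion can be written against nested structs and unions, one offsetof per field that has to line up with the file.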

Got a better idea? It turns out my entire approach here was flawed. But naturally, I did not discover that until later. We'll come back to struct alignment in a future post.

Memory Allocation

The next problem was that the firmware-parsing logic had at least a little memory allocation going on, and naturally the syntax for that differs as well.

On Linux, it went like this:
pieces = kzalloc(sizeof(*pieces), GFP_KERNEL);
...
kmemdup(pieces->dbg_dest_tlv,
 sizeof(*pieces->dbg_dest_tlv) +
 sizeof(pieces->dbg_dest_tlv->reg_ops[0]) *
 drv->fw.dbg_dest_reg_num, GFP_KERNEL);
...
kfree(pieces);

Well, OS X doesn't have kzalloc, kmemdup, or kfree.

I thought at the time that kzalloc was the zone allocator. In OS X, many of the higher-level memory allocation functions pull memory out of "zones" of various sizes (32 bytes, 64 bytes, etc.) and always give you an allocation of that size. Therefore, if you ask for 33 bytes, you actually get an allocation of 64 bytes from the next-larger zone.

Well, it turns out I was wrong (more about that, and the corresponding crash, later). But for now, I replaced kzalloc with IOMalloc (which I gather also uses a zone allocator under the covers), and kfree with IOFree. The problem there was that IOFree needs to be passed the original allocation size, so I had to add some fields and logic to track the allocation sizes so that I could use them when the time came around to free. I'm not sure I got that right, so there could be a leak there, but at this point I was mainly aiming for "working" over "working perfectly".
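
An alternative to tracking sizes in extra fields is to store the size in a small header in front of each allocation. The sketch below is not what this version of the driver does. It uses malloc and free as stand-ins so that it runs in user space; in a kext, backend_alloc and backend_free would call IOMalloc(size) and IOFree(ptr, size). The names port_kzalloc, port_kfree, backend_alloc, backend_free, and alloc_header are all hypothetical. It also clears the block, because kzalloc returns zeroed memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* User-space stand-ins for IOMalloc(size) and IOFree(ptr, size). */
static void *backend_alloc(size_t size) { return malloc(size); }
static void backend_free(void *ptr, size_t size) { (void)size; free(ptr); }

/* Header placed in front of every block; the union keeps the payload aligned. */
union alloc_header {
    size_t total;           /* full size that was passed to backend_alloc */
    max_align_t align;
};

/* Hypothetical kzalloc replacement: zeroed memory, size remembered. */
static void *port_kzalloc(size_t size)
{
    union alloc_header *hdr;

    if (size > SIZE_MAX - sizeof(*hdr))
        return NULL;        /* size + header would overflow */
    hdr = backend_alloc(sizeof(*hdr) + size);
    if (!hdr)
        return NULL;
    memset(hdr, 0, sizeof(*hdr) + size);
    hdr->total = sizeof(*hdr) + size;
    return hdr + 1;         /* caller sees only the bytes after the header */
}

/* Hypothetical kfree replacement: recovers the size from the header. */
static void port_kfree(void *ptr)
{
    union alloc_header *hdr;

    if (!ptr)
        return;             /* kfree(NULL) is a no-op, so keep that behavior */
    hdr = (union alloc_header *)ptr - 1;
    backend_free(hdr, hdr->total);
}
```

The cost is one header per allocation, and any pointer handed to port_kfree must have come from port_kzalloc, but call sites can stay as close to the Linux originals as possible.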

kmemdup was trickier. I found this definition of kmemdup:
void *kmemdup(const void *src, size_t len, gfp_t gfp)
{
    void *p;

    p = kmalloc_track_caller(len, gfp);
    if (p)
        memcpy(p, src, len);
    return p;
}
It looks like it allocates memory and then, if successful, copies the contents of the thing passed to it into that new memory and returns it. Actually, I took the easiest route and just copied the whole function into my project, except I again used IOMalloc instead of the oddball kmalloc_track_caller.

Other Odds and Ends

I did put the firmware parsing logic into a new C++ class. It ended up with a bunch of static utility functions down at the bottom. I could have pulled those into the class and thereby eliminated at least one of the arguments from each one... but I wasn't yet sure whether I wanted to do that. I sort of had it in my mind that I might move them again.

Then I had to comment out all the Linux library imports from the headers I did bring in, and add the linux-porting.h header file that I brought in from another driver port. That handled some things like common constants, macros, and typedefs that people had run into before.

Finally, I commented out a few fields here and there that used data types I wasn't ready to deal with yet. Overall it was a bit of a cleanup operation, but I was trying to avoid any major decisions about how to re-implement things. When in doubt, comment out.

Results

Finally, everything compiled again. I gave it a spin on my test machine, and...

Much to my surprise, the firmware-parsing logic all seemed to pretty much just work! I got a couple of debug lines I took to be at least warnings (such as "GSCAN is supported but capabilities TLV is unavailable"), but a closer look at the firmware files in a hex editor showed that they have (and don't have) exactly the chunks the parsing code said. Huh.

(Side note: what does it mean when you find success surprising?)

Kext Resource Caching

So then I figured I better try all five firmware files downloadable for the MVM firmware models. I didn't want to go claiming everything was working fine and have it turn out that only one model worked. So I put all the files into the kext Resources and hardcoded the IDs into the driver rather than detecting the real hardware.

Naturally, the first additional model I tried failed. With a little debugging, I found that for some reason my file-loading code refused to load any but the original firmware file.

Eventually I supposed there must be some kind of caching going on so it "remembered" the version of my kext with only the one firmware resource. I rebooted the machine, and surprise, surprise, it all worked. I've since found a passing reference to caching kext resources, but not a full explanation.

From "man kextutil":
-c, -no-caches
  Ignore any repository cache files and scan all kext bundles to
  gather information.  If this option is not given, kextutil
  attempts to use cache files and (when running as root) to create
  them if they are out of date or don't exist.

Sample Code and Output

This version of the code is available here (and a build here). However, it may or may not work for you. The kzalloc thing means that sometimes the counters come out totally bogus, and it seems like a crash follows quickly thereafter.

