Tuesday, March 29, 2016

Calling C++ from C

<< Prev: Structs and Pointers       Next: Workqueues >>

It turns out I made a little mistake there at the end of the Wait Queues post.

It's not actually a total disaster if C code (ported from Linux) needs to call C++ code (in the OS X driver).

This one I stumbled across completely by accident, by reading someone else's code that does something unrelated.

The magic is in OSMetaClassBase::OSMemberFunctionCast. This IOKit facility converts a pointer to a C++ member function (bound to the specific instance it operates on) into a plain C function pointer.

So it goes like this:
  1. The C code declares a function pointer type as a typedef (specifying the number and types of arguments, etc.)
  2. The C code supplies a variable or function argument of that type
  3. The C++ code populates the variable (or passes the argument) with a pointer it creates by calling OSMemberFunctionCast -- normally, I gather, on one of its own member functions
  4. The C code can call that function like normal

OK, I'm handwaving there a little. I don't yet know how you call a function if you have a function pointer -- but that seems like a much easier problem to solve.

I expect that I'll be using this in anger shortly, so there should be a working example coming soon.

<< Prev: Structs and Pointers       Next: Workqueues >>

Monday, March 28, 2016

Structs and Pointers

<< Prev: Wait Queues       Next: Calling C++ From C >>

Next I went to actually try to implement the macro to replace a wait_queue_head_t with an IOLock.

That seemed pretty straightforward:
#define wait_queue_head_t  IOLock

Only, it didn't work:

Error:(352, 20) field has incomplete type 'IOLock' (aka '_IOLock')

That didn't make any sense. IOLock was defined right there in IOLocks.h in the OS X SDK:

typedef struct _IOLock IOLock;

OK, so I guess the question is what is struct _IOLock? I couldn't find it.

Well, maybe I should give a little more context:

#ifdef IOLOCKS_INLINE
typedef lck_mtx_t IOLock;
#else
typedef struct _IOLock IOLock;
#endif /* IOLOCKS_INLINE */

So maybe instead the IOLOCKS_INLINE wasn't set right?

I banged my head against the wall for quite a while, looking for that blasted _IOLock, before I went back and looked at where I had used an IOLock previously:

IOLock *firmwareLoadLock;

There was one key difference there -- that little star.

So I tried:
#define wait_queue_head_t  IOLock *

And what do you know? Problem solved.

So my conclusion here is that struct _IOLock isn't defined in the IOKit header files, the "client API". But you can create a pointer to one, so long as you treat it as opaque -- so long as you never touch its fields or otherwise interact with it directly. That's OK here, because all I do is pass it to other API calls (IOLockLock and friends), and their implementation functions (which presumably do have access to the _IOLock definition) can do the dirty work.

Still, it caused me quite a bit of confusion that you could create a pointer to a thing while not being able to create the thing itself -- and that an API client should just expect that.

Now I know.

<< Prev: Wait Queues       Next: Calling C++ From C >>

Sunday, March 27, 2016

Wait Queues

<< Prev: 67 Steps       Next: Structs and Pointers >>

Well, no sense in waiting. Now that I had a list of the startup procedure, I might as well get cracking.

I had covered the first two entries in the startup list already -- registering to be notified when matching hardware was found or inserted. (That part I hadn't tried to port from the Linux code; I just replaced it with the OS X equivalent.)

So the next step was iwl_trans_pcie_alloc, which had the basic PCIe setup code.

Right at the top, there were three things I needed to deal with: spin locks, a mutex, and a wait queue. Well, OS X has spin locks and mutexes covered pretty easily. I was going to have to take a peek at the syntax and see whether I could use a macro substitution or whether I needed a code replacement, but that should be easy enough either way.

Better, I thought, to start with the harder part, the wait queue. I wasn't that familiar with wait queues, but I found a nice introduction here.

Still, it wasn't exactly obvious to me what this was used for in the PCIe setup. The wait queue I was looking at was used to wait on a firmware write to complete. In comparison, the firmware read code used a completion variable, and other code waited on the completion variable to be set. So what's the difference between a wait queue and a completion variable? (Research ensues...) Ahh. There is no difference. A completion variable is a wrapper around a wait queue. (See the definition of struct completion in here.)

OK, so the bottom line is, one or more threads can end up waiting on some condition. When the condition arrives, whatever code processes that can wake up one or all of the waiting threads.

Now, what would I use on OS X to achieve this?

IOKit Options

It seems like the IOKit lock a.k.a. IOLock has that capability: IOLockSleep or IOLockSleepDeadline, and then IOLockWakeup. Functionally, it mostly seemed like it would work, with one key difference:

Linux Wait Queues
For a Linux wait queue, you can specify a boolean condition, and the thread will keep sleeping until that condition is true. Because the implementation relies on macros, it checks the condition with the code you provide every time the thread is awakened.

Code such as this:
wait_event(some_wait_queue, foo == bar);

Gets transformed by a macro that I'm seriously abbreviating here:
#define ___wait_event(wq, condition, ...)        \
({                                               \
        for (;;) {                               \
                prepare_to_wait_event(wq, ...);  \
                if (condition)                   \
                        break;                   \
                schedule();                      \
        }                                        \
        finish_wait(wq, ...);                    \
})

Into code that I gather looks something like this:
({                                               
        for (;;) {                               
                prepare_to_wait_event(some_wait_queue, ...);
                if (foo == bar)                   
                        break;                   
                schedule();                      
        }                                        
        finish_wait(some_wait_queue, ...);                    
})

In that transformed code, the prepare_to_wait_event call puts the thread into a suspended state (e.g. TASK_UNINTERRUPTIBLE), and the finish_wait call sets it back to the usual TASK_RUNNING state. In between, every time the thread wakes up, if the condition isn't yet true, it goes right back to sleep -- thanks to the schedule() call, which lets other threads run since this one is no longer in a runnable state.

IOLock Waiting
The IOLock calls, by contrast, take an event object instead of a condition to test, and only wake the thread if the event object passed to the wake call is the same as the one passed to the sleep call:
IOLockLock(some_lock);
IOLockSleep(some_lock, foo, THREAD_UNINT);
...
IOLockUnlock(some_lock);

// Then in some other thread:
IOLockWakeup(some_lock, foo, true);

This is a lot more straightforward, on the surface. But some of the differences are:
  • The thread that might sleep must acquire the lock first
  • Instead of a separate wait queue for every situation, you could use one lock with multiple event types
  • There's no obvious support for a boolean wait condition

Now in practice, the third issue may or may not matter. In the code I was looking at, the wait condition was a simple boolean variable, and the only time wake_up was called was immediately after that boolean was set to true. In other words, I could skip the condition part altogether in that case.

But I saw four wait queues in the project, so if I went this way I either had to convince myself that was true of all four (and pay attention to any future updates), or go ahead and put some kind of macro around the sleep call to check the condition like in Linux. Well, I'd probably need a macro to acquire the lock anyway, so I guess I could just roll it all in there.
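If I do go the macro route, I imagine something like this (an uncompiled sketch against the IOLock API -- the wrapper name is made up; what makes the loop safe is that IOLockSleep releases the lock while sleeping and reacquires it before returning):

```c
/* Uncompiled sketch: sleep until a condition holds, Linux-style */
#define IOLOCK_SLEEP_UNTIL(lock, event, condition)   \
    do {                                             \
        IOLockLock(lock);                            \
        while (!(condition))                         \
            IOLockSleep(lock, event, THREAD_UNINT);  \
        IOLockUnlock(lock);                          \
    } while (0)

/* The waking side would set the condition, then:
 *     IOLockWakeup(lock, event, false);   // false = wake all waiters
 */
```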

But since this is all part of a lock rather than a native wait queue, and would need some massaging in any case, it wasn't obvious that it would be the best fit.

Mach Options

Since IOLock wasn't just a drop-in replacement, I figured I'd better look further. A little poking around revealed a Mach wait_queue_t and a Mach mutex type lck_mtx_t.

I actually found these by inspecting the definitions of the IOLock functions above. I saw, for instance, IOLockGetMachLock, which returns a lck_mtx_t *. Then I followed IOLockWakeup to thread_wakeup_prim to e.g. wait_queue_wakeup_all, which used wait_queue_t, and so on.

So then I thought maybe I shouldn't be using the higher-level IOLock wrapper. I could just use a lck_mtx_t directly, and call lck_mtx_sleep and... Well, then, I'd have to call thread_wakeup_prim or something, which sort of spoiled it, because it wasn't very symmetrical and seemed like I would be mixing a mutex API and a thread scheduling API.

So maybe instead of mutexes, I should be just looking at the Mach wait queues, which maybe corresponded more directly to the Linux wait queues. But the Mach wait queues seemed really low-level. There were obvious wakeup methods, but the way to make the current thread wait (sleep) on a wait queue was not obvious. There were both 32-bit and 64-bit versions. I could probably have worked something out, but it was starting to look like IOLock was the convenient abstraction over all this stuff that I wanted.

However, I wasn't ready to commit just yet:

BSD Options


Having found both an IOKit option and some Mach options, I naturally assumed there must be a BSD option. (Why have one API when you can have three for thrice the price?)

Ah, yes. Google led me to a FreeBSD man page, which gave me some search terms for the XNU source, and I ended up with things like sleep and wakeup. Those inverted the parameters a bit in that you provided a "channel" first, which was just a unique address, and a wakeup on a specified channel would wake the threads that went to sleep on that channel.

But there was also a priority, which I didn't really want to deal with. I could set it to 0 ("don't change") except that I wasn't really sure whether or when I should want the thread priority to change. And for waiting on a timeout, you had to specify the timeout as a struct. And there was a char * message parameter, which the XNU source was silent on, but the BSD man page said should be no longer than 6 characters.

All of a sudden there were a lot of decisions to make, for an API that had initially looked simpler. And for all I know there are other BSD options -- this was a pretty cursory search.

All in all, it seemed like IOLock might be a pretty convenient abstraction.

Final Decision


No surprise here -- I decided that I'd try IOLock first. I still haven't decided whether to handle the condition argument, though I will probably take a stab at it.

But this whole thing made me a little grumpy. Why again should there be three or four different APIs for this? (And each a little different.) And why shouldn't there be a page somewhere saying "Here are all your options, and here's when you should use each one."

Mainly, it's not so much this one case, as I fear I'll have to go through this whole process again for every Linux function I need to translate.

And I feel like I dodged a bullet. All the IOLock functions are defined in IOLocks.cpp. Fortunately, the entire thing is wrapped in an extern "C" block, so they work just fine when called from C. But what if the next one is really a C++ class instead? I gather I could just change file extensions and compile all the driver code as C++ in order to call other C++ code, but there's no way I'm about to do that. At a minimum, it seems like it would change many of the struct sizes, which would make a huge mess. So for the moment, C++ is to be avoided, and I only narrowly managed.

I guess we'll see what the future holds.

<< Prev: 67 Steps       Next: Structs and Pointers >>

Friday, March 25, 2016

67 Steps

<< Prev: Compiler Dysfunction       Next: Wait Queues >>

So there may have been some bumps in the road, but I felt like on the whole, I'd taken two steps forward. Obviously, then, I was due for a step back. (Have no fear, you won't have to wait long for this one...)

Anyway, I figured, what I'd really like to do next is turn on the hardware. You know, do whatever needs to be done to light it up, even if it doesn't connect to anything. That would be a solid step.

I had the firmware file, identified, loaded, and parsed. But I wasn't exactly clear on how to send that to the hardware. I knew there had to be something in there to set up a block of DMA memory, and to set up an interrupt handler, and things like that. But at best I knew of a few little pieces, without much of a big picture. I wasn't even really sure where it all started in the Linux driver. I mean, I had seen PCI device registration code, module startup code, and so on, but what really kicked everything off?

So, it seemed like the thing to do was find that start point, and trace the code from there until it was "up." I began poking around, and here's what I found:

The iwl_drv_init module initializer registers various PCI device definitions. Then if there's any matching hardware, the kernel calls the registered PCI probe method iwl_pci_probe in a pretty similar manner to OS X, and that's what kicks the whole thing off.

That part was easy enough.

Now iwl_pci_probe calls iwl_trans_pcie_alloc, which if I'm going to be generous, has on the order of 20 things it does. Some of them are the things I expected: Sets up PCIe bus mastering and DMA, registers an interrupt handler, etc. That was going to be challenging enough to convert. Still, even with the extra stuff, I could be up to this challenge. So far so good.

Then iwl_pci_probe calls iwl_drv_start, which mainly kicks off the firmware loading and parsing. That I had mostly already converted. Yay!

But from there, it all sort of exploded.

Toward the end of the firmware loading, it calls the start function provided for the "op mode" (the firmware style, which for now would always be the MVM firmware for me). And that's where it gets ugly.

At that point, the process bounces back and forth between PCIe code, MVM firmware code, and generic driver code. Mostly there are layers upon layers, where the firmware code calls some generic code that resolves to PCIe code, which calls some generic code that resolves to MVM firmware code, and so on. I tried to count only calls that were non-trivial: not just performing some simple logic or calling just one other function or reducing to memory reads or writes. I also didn't count a call to any given function past the first time. I still got to nearly 50 major functions.

Worse were some of the convolutions. Interrupts were going up and down and up and down, there were command packets sent, calls to set various configuration bits spread out all over the place...

I get that there was an effort to make some of this stuff generic and reusable. It's not hardcoded to a specific firmware type, and it's not hardcoded to the PCIe bus (though, why not, since there's not another bus used?). Probably this would look sorta sensible if (when?) I convert it to C++ and you get a generic base class with a couple specific subclasses.

But to try to understand the startup process, it's a huge mess. What are the odds it's really all correct? And if anything goes wrong, how are you supposed to figure out where?

I don't know, I mean, obviously it works when you run this thing in Linux. But the complexity just rubs me the wrong way. Startup is the kind of thing I'd strive to make really simple, not really, er, "normalized."

In practical terms, it's going to be really hard to implement because it touches so many parts of so many files at one point or another. And yes, somewhere in there it sends the firmware chunks I parsed out to the hardware -- but don't ask me to point to where in the list that happens. I just recall that it's in there somewhere.

So that's the step back. I'm not even talking about connecting to a wireless network yet, I'm just trying to turn the thing on. And all this progress I made on firmware loading, that's less than one step out of fifty. Yippee.

Reference


Here's my list of steps in the startup process. You can say a little prayer for me after you read it. (Even if it's only to the Flying Spaghetti Monster -- I'll take his help too.)

<< Prev: Compiler Dysfunction       Next: Wait Queues >>

Compiler Dysfunction

<< Prev: Linux Upgrade       Next: 67 Steps >>

Pop Quiz: what's the purpose of a compiler?

Three things come to mind:
  • Make it so you don't have to write in assembly
  • Ensure that you use the programming language correctly (e.g. no misspelled keywords or basic syntax errors)
  • Validate the API calls you make (e.g. not calling API functions that don't exist)

I mean, surely there are more, but that's my starter list.

Except the C compiler disagrees with me 33% of the time.

Which is to say, it apparently doesn't give a damn whether the API calls you make exist or not.

Uh, run that one by me again? So I can type code like this, and it totally compiles?

thisFunctionDoesNotExist(17);

Why yes, that's totally OK with this compiler.

Well that's easy -- the compiler is fired. Not doing its job. I'll go find another one.

Except I don't get to do that, because you kind of have to use the Xcode compiler to build kexts.

This one is a complete mystery to me. You have to declare a struct before you use it, so why don't you have to declare a function before you call it? There exists such a thing as header files, for the express purpose of declaring things before you use them, right? We have this lovely tool called #include just to inform the compiler which header files we would like to reference, right?

OK, yes, I can compile the code and THEN run kextlibs and it can tell me if I'm calling functions that don't exist. But this is not what happens when I push the "build" button and it says "no errors".

Well, in fairness, there is a warning:
Warning:(736, 5) implicit declaration of function
                 'thisFunctionDoesNotExist' is invalid in C99

But there are (in this build) 175 warnings. Last time there were 473. (I don't know how it comes up with these numbers and why they're always different, but as my gripes with the compiler go, that's a pretty minor one.) The point is, there's no way I'm going to notice that one of the 175 or 473 or whatever warnings actually matters.

I'd written another paragraph or two of complaints, but I'll spare you the extra vitriol. There's good news! It turns out that there is a way to make this one warning into an error:
  1. In Xcode, select the Project and go to Build Settings and click All
  2. Look toward the bottom of the giant list under Apple LLVM 7.1 - Warnings - All Languages
  3. Set Treat Missing Function Prototypes as Errors to Yes
Whew!
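(For anyone building outside Xcode: I believe the same effect is available as a plain compiler flag in clang and gcc, shown here with a hypothetical file name.)

```shell
# Promote the implicit-declaration warning to a hard error:
cc -Werror=implicit-function-declaration -c mydriver.c
```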

<< Prev: Linux Upgrade       Next: 67 Steps >>

Linux Upgrade

<< Prev: kzalloc Crash       Next: Compiler Dysfunction >>

About this time, I noticed that Linux had been merrily marching on while I worked. I had started with the driver from Linux 4.4.1, and now they were up to Linux 4.4.3. Not a huge difference, but it seemed like a good time to check my theory that if I kept close to the original source, I could update easily. After all, if that didn't work, this whole approach was flawed.

The basic procedure turned out to be easy enough:

  1. I copied the updated source into the project tree
    • Now the "current state" represented the Linux code
    • A diff against source control might represent the inverse of a change I made -- if I reverted it, I'd be back to my original change
    • Or, a diff against source control might represent a change they made that wasn't in my project yet
  2. I removed the dvm/ directory for the firmware I'm not working with yet
  3. I moved iwl-7000.c and iwl-8000.c to the corresponding .h versions (I had changed them to headers, since in my version there is no actual code, just a bunch of declarations)
  4. I copied declarations from iwl-drv.c to iwl-drv.h (the iwl_drv struct and etc.)
  5. I copied declarations from pcie/drv.c code into device-list.h (the device list array)
  6. I had to go through file by file, examine the diffs, and revert the changes that needed it from their version back to mine
    • One common one was #include revisions
    • There were also a variety of minor syntax changes (MALLOC and etc.)
    • Plus any meatier changes, like all the debugging I added to the firmware parsing process

That last part was the tedious part. In the grand scheme of things I've made very few changes, and it was already a lot to review. Using a decent IDE made it all straightforward, but the time-taken-to-benefit ratio was still not in my favor.

Especially when I got through, only to discover that there were no changes except whitespace. Oops. That would have been good to know ahead of time.

Guess what? It turns out that's easy to check ahead of time. Here are the diffs from 4.4.1 to 4.4.3 (none). Whereas here are the diffs from 4.4.3 to today's current 4.4.6 (minor changes).

You know what would be really clever?

Instead of dumping the whole updated source tree and needing to review every single change... I could just apply the individual patches called out by the diffs above. Then instead of needing to review a thousand unchanged bits, I'd only need to massage the actual changed parts.
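In git terms, I imagine that looks something like this (hypothetical paths, and assuming a linux.git checkout sitting next to the project):

```shell
# Extract only the iwlwifi changes between the two releases...
git -C linux diff v4.4.3 v4.4.6 -- drivers/net/wireless/iwlwifi/ > iwlwifi.patch
# ...then apply just those changes to a tree checked out at the older version
git -C linux apply ../iwlwifi.patch
```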

Someday, I promise, I'll come up with the better approach on the first try. Well, perhaps 'hope' would be a better word. :)

<< Prev: kzalloc Crash       Next: Compiler Dysfunction >>

kzalloc Crash

<< Prev: Thin Air       Next: Linux Upgrade >>


So there I am, merrily testing incremental changes to the driver, when BOOM! The test system restarted.

OK, so it must have been something I just did.

Except, all I just did was rearrange some code.  I still had a bunch of C functions and structs in my C++ class, and I was in the process of eliminating them in favor of the default versions included in the driver code.  I could see compile errors, but how did this cause a runtime error?

Well, one thing I had done was strip out a lot of the debugging messages I had added to the firmware parsing logic. I didn't really think that had caused the failure, but it meant I didn't know how far the process got before it died. So I added all those debug messages back in, along with a little delay at each step so I could read the messages before the restart.

Built: fine. Sent to test machine: fine. Loaded kext: fine. Log messages started pouring out. Everything looked fine. Then it finished, all good.

It worked. And that was the worst, because how could adding log messages or delays fix the problem? If it was one of those completely random errors, I was going to be devastated, because I'm not good enough to track that kind of thing down yet. I couldn't even begin to guess what I might have done to introduce a timing-sensitive or threading-sensitive kind of error.

I unloaded the kext, made some trivial change to add still more debugging, and tried again.

This time it failed, with loads of errors in the firmware parsing. It was fortunate that I was viewing the log remotely, so when the machine died, I still had the text in my terminal:

...
AppleIntelWiFiMVM       0xbcx 19 IWL_UCODE_TLV_SEC_RT
AppleIntelWiFiMVM iwl_store_ucode_sec BAD COUNTER -1912832000
...
AppleIntelWiFiMVM    0xa7428x 20 IWL_UCODE_TLV_SEC_INIT
AppleIntelWiFiMVM iwl_store_ucode_sec BAD COUNTER -1810778608
...
AppleIntelWiFiMVM   0x19343cx 21 IWL_UCODE_TLV_SEC_WOWLAN
AppleIntelWiFiMVM iwl_store_ucode_sec BAD COUNTER 1449124394
...
AppleIntelWiFiMVM Allocate ucode buffers...
AppleIntelWiFiMVM  ERR failed to allocate pci memory
AppleIntelWiFiMVM Firmware parse failed :(

So I poked around a little, starting with these counters with completely bizarre values. Indeed, they were supposed to just increment by one each time a relevant block was found in the firmware. How do you count up one at a time through a file of meaningful size and end up with -1912832000? Except another one was 1449124394. What the heck?

Oh. Oh yeah. How about, where is that counter initialized?

Nowhere, that's where.

So in Java, there are two possibilities for an int: either it's automatically initialized to zero (such as an instance variable), or you're forced to initialize it yourself before use (such as a local variable). Either way, the compiler has your back.

In C, I guess, not so much.

As best I can figure, the code was assuming the counter was initialized to zero. But if that wasn't done automatically... Then it must have been assuming the memory it received was all zeroes, such that whatever memory the counter int came from started out being zero.

I backed up farther, until I found where I had replaced kzalloc with IOMalloc. IOMalloc doesn't seem to guarantee that the memory will be zeroed. It turns out that kzalloc does. The "z" in kzalloc is not the zone allocator, but the fact that the memory will be zeroed.

Was this plausible? Well, when I booted and loaded the kext, it received clean memory and worked fine. When I then unloaded it and loaded it again, I already know that the memory I get is coming from a zone allocator. So let's imagine it allocates some page of memory, issues chunks from that page to my kext, and then I give them back on unload. Then I load again, asking for chunks of the same size again. There's the page still there, and now with several chunks marked available. My kext gets some of the same chunks back again, or maybe some that some other program used, but either way no longer clean -- they have whatever state they had when freed before.

So on the second load, those counters are initialized to garbage contents, and my kext blows up. Pretty much seems like what happened, so yeah, it's plausible.

Then I had to poke around for a memory allocation routine that does zero the memory on OS X. It doesn't help that there are about seventeen options (including at least kmem_alloc and friends, kalloc and friends, zalloc, OSMalloc and friends, MALLOC, MALLOC_ZONE, IOMalloc and friends, and, in C++ code, new). I finally settled on using the MALLOC macro:
MALLOC(pieces, struct iwl_firmware_pieces *,
               sizeof (*pieces), M_TEMP, M_ZERO);

The M_TEMP is because the call requires a "type" and that's a sane default. The M_ZERO calls for zeroed memory. As a bonus, I don't need to pass the allocation size to the FREE macro, so I could whack the extra code I added just to track allocation sizes.

I tried with my new and improved memory allocation, and I could load and unload several times with no problem. It's hard to prove that was the problem and solution, but I haven't seen the same bizarre counter errors since then.

The only issue is that MALLOC requires a parameter of the pointer type, which kzalloc doesn't. So I can't just make the substitution with another macro, I have to put in an actual code change every time. It's possible I should consider using a different allocation and free call combined with memset or something to make this a macro fix instead of a code fix -- but that's a subject for another time.

Oh yeah, and, who thought it was a good idea for an operating system to have this many ways to allocate memory? Sheesh!

<< Prev: Thin Air       Next: Linux Upgrade >>

Wednesday, March 23, 2016

Thin Air

<< Prev: The Linker       Next: kzalloc Crash >>

It was clear I wasn't going to make any progress without finding these missing functions. Eventually, since they were declared in iwl-debug.h, I got around to looking in iwl-debug.c. And this is what I found:

#define __iwl_fn(fn)                                       \
void __iwl_ ##fn(struct device *dev, const char *fmt, ...) \
{                                                          \
    struct va_format vaf = {                               \
        .fmt = fmt,                                        \
    };                                                     \
    va_list args;                                          \
                                                           \
    va_start(args, fmt);                                   \
    vaf.va = &args;                                        \
    dev_ ##fn(dev, "%pV", &vaf);                           \
    trace_iwlwifi_ ##fn(&vaf);                             \
    va_end(args);                                          \
}

__iwl_fn(info)
IWL_EXPORT_SYMBOL(__iwl_info);

This, my friends, is the bytecode manipulation I'm familiar with from Java, only performed in C.  :)   

OK, also performed at compile time.

Nevertheless, it is a macro that generates a function out of thin air.

The idea is, you call the macro like in the second-to-last line above with a parameter info, and it generates the function __iwl_info. Then the last line makes sure the function -- which is never actually plainly stated in the source code -- is callable. Repeat those last two lines a couple times to declare a couple similar functions (__iwl_warn, __iwl_crit).

OK, so two questions:
  1. Does it work?
  2. Why? Just... Why?

And the answers:
  1. No.
  2. So you can make a few nearly-identical functions without wasting keystrokes...?

It doesn't work because the functions that do the meaty work (e.g. dev_info and trace_iwlwifi_info) don't exist. The first is a Linux API call and the second I haven't ported yet. Also, the va_format struct is (I believe) Linux-specific. So basically the whole thing doesn't work.

Really, I just want to make an IOLog call with a prefix on the message like INFO, DEBUG, WARN, etc.

Why did someone write it this way to begin with? (This one is multiple-choice.)
  • A. It seemed really clever
  • B. It saves space in the executable
  • C. It saved lines of code in the source
  • D. It made it hard for others to follow, until they really think about it, at which point they might appreciate option A.

The only way I can see arguing in favor of this approach is if the answer was option B. But since the macro is processed at compile time and thus actually generates separate functions every time it's called, I think the answer is actually A-C-D. In other words, everything else.

Alternatives, then.

I could fix up all the generic code in the macro, but it would involve defining additional functions like emit_log_dev and emit_log_info that had the specific message handling logic I wanted, so the code in the macro could construct an appropriate call. And I already don't like the macro approach. Why go through this if I have to write the specific log-level handler functions anyway?

So if I wanted to save lines of source code, I could create one master function with a switch on a parameter to control which dev_* (and later trace_iwlwifi_*) functions are called, and then define the three tiny functions that call the master function with appropriate arguments. But even when I fudged the message prefixes to be the same length, half the logic is in the handling of the printf-style arguments, and that would be a mess to try to pick out and pass and use in a master function (I suppose necessitating a struct like va_format). I don't think there would be any savings.

So I'm going to go with the option of just writing three functions. You know, not being overly clever, just writing code that does what I mean for it to do in the functions that are supposed to be called. It's still complicated by the fact that these functions take a variable number of arguments to support printf-style messages, but none of them are that long.

And I get this:
void __iwl_info(struct device *dev, const char *fmt, ...) {
    char buffer[200] = "AppleIntelWiFiMVM INFO ";
    char *remainder = &buffer[23];
    va_list args;
    va_start(args, fmt);
    vsnprintf(remainder, 177, fmt, args);
    va_end(args);
    IOLog("%s", buffer); // don't pass user text as the format string
}
void __iwl_crit(struct device *dev, const char *fmt, ...) {
    char buffer[200] = "AppleIntelWiFiMVM CRIT ";
    char *remainder = &buffer[23];
    va_list args;
    va_start(args, fmt);
    vsnprintf(remainder, 177, fmt, args);
    va_end(args);
    IOLog("%s", buffer);
}
void __iwl_warn(struct device *dev, const char *fmt, ...) {
    char buffer[200] = "AppleIntelWiFiMVM WARN ";
    char *remainder = &buffer[23];
    va_list args;
    va_start(args, fmt);
    vsnprintf(remainder, 177, fmt, args);
    va_end(args);
    IOLog("%s", buffer);
}
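Since the only kernel-specific part is the IOLog call at the end, the same pattern is easy to try in user space. Here's a sketch with a caller-supplied buffer standing in for the IOLog call (the helper and its names are mine, not the driver's); it also computes the prefix length rather than hard-coding the magic 23/177 pair:

```c
#include <stdarg.h>
#include <stdio.h>

/* Build the prefixed message into a caller-supplied buffer.  The kext
 * version would end with IOLog("%s", buffer) instead of returning.
 * (This helper and its names are mine, not the driver's.) */
void format_log(char *out, size_t outlen, const char *level,
                const char *fmt, ...) {
    va_list args;
    /* Write the prefix, and compute its length instead of hard-coding 23 */
    int n = snprintf(out, outlen, "AppleIntelWiFiMVM %s ", level);
    if (n < 0 || (size_t)n >= outlen)
        return;                     /* prefix alone filled the buffer */
    va_start(args, fmt);
    vsnprintf(out + n, outlen - (size_t)n, fmt, args);
    va_end(args);
}
```

With a helper like this, each of the three (or five) level-specific functions shrinks to a couple of lines.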

Oh yeah, here's the punch line: The __iwl_dbg and __iwl_err functions take different parameters and have a couple extra lines of logic — so the macro didn't work for them anyway. That means of five short and nearly identical functions, there were two written out, and three hidden away in a weird macro.

Too clever for me.

<< Prev: The Linker       Next: kzalloc Crash >>

The Linker

<< Prev: The Working Subset       Next: Thin Air >>

Umm... No.

That is, yes, in the state I reached as of the last post, the code built. So I copied the kext to my test machine, held my breath, and loaded it. I was probably 70% expecting the machine to reboot, and happily it didn't do that. But the kext didn't load either:

kxld[AppleIntelWiFiMVM]: The following symbols are unresolved for this kext:
kxld[AppleIntelWiFiMVM]:     __Z10__iwl_infoP6devicePKcz
kxld[AppleIntelWiFiMVM]:     __Z9__iwl_errP6devicebbPKcz
Can't load kext AppleIntelWiFiMVM - link failed.
Failed to load executable for kext AppleIntelWiFiMVM.

That was a new one. Not a permissions error, not an ownership error, but a linker error. I first assumed I had left out an entry from the required frameworks in Info.plist, so I ran kextlibs again. But this time it had something new for me:

For x86_64:
    2 symbols not found in any library kext.

So then, not just a missing library. A little consultation of the man page suggested I add the -undef-symbols flag, and then I got more detail:

For x86_64:
    2 symbols not found in any library kext:
 __Z10__iwl_infoP6devicePKcz
 __Z9__iwl_errP6devicebbPKcz

Same as I got when I tried to load the kext. So I could have caught this before I even tried. Now, the question was, what did that mean?

I Googled for linker errors and what the symbol names meant. It looked like this was complaining about the functions __iwl_info and __iwl_err, which were declared (along with some friends) in iwl-debug.h:

void __iwl_err(struct device *dev, bool rfkill_prefix,
                bool only_trace, const char *fmt, ...) __printf(4, 5);
void __iwl_info(struct device *dev, const char *fmt, ...) __printf(2, 3);

Maybe the arguments were somehow wrong? I grepped the code for an implementation and didn't get any hits, but with "iwl" in the name it seemed unlikely they were supposed to come from an OS framework.

But on that last Google search, I did get a hit on "name mangling" for the C++ linker. It described how, because C++ functions can be overloaded, every function gets a unique mangled name, with those odd additional characters encoding the function parameters and so on. And it also said 'this is why you should add extern "C"'. Of course, I had removed 'extern "C"' because I had no idea what it was, and why do stuff that made no sense? Well, here's why, I guess.

I put 'extern "C"' back in, around the header files I was importing from the Linux driver source:
extern "C" {
#include "iwl-config.h"
#include "iwl-drv.h"
}

Then I built (no errors), and tried kextlib again:

For x86_64:
    2 symbols not found in any library kext:
 ___iwl_err
 ___iwl_info

Well, now I just felt like a script kiddie. I got an error, Googled it, and tried whatever it said without stopping to try to understand it. Surprise, surprise, it didn't work. It just changed the error, in a way that should have been obvious.

Instead of the C++ mangled name, I got the plain-C name in the error (with an extra underscore, but whatever). Because functions can't be overloaded in C, all the extra stuff to distinguish between overloaded functions isn't necessary, and the name itself is enough. So that's what I got.

Of course, since I was compiling all the Linux driver code together, either the function was there or it wasn't. If it was there, I wouldn't get errors either way. If it wasn't there, changing the naming strategy wouldn't make it suddenly appear. And I had already grepped the code and determined that it wasn't there.

I don't know what I was thinking. I needed to find the missing code.

<< Prev: The Working Subset       Next: Thin Air >>

The Working Subset

<< Prev: Plugging Gaps       Next: The Linker >>

So with enough #defines in place, I started to feel like I ought to be able to make some progress on getting the code into a compile-able state. The big problem remaining was the huge number of Linux API calls. They were in all sorts of areas: PCI devices, memory allocation, DMA mapping, synchronization, queueing, network packets, and so on. I started keeping a list of the specific missing API calls I saw as I looked through the code, and it quickly grew to 60+ items. And that's just what I came across, without making an effort to look at ALL the code.

On the one hand, it felt a little like I could attack these one group at a time. I had already spent some time on PCI device registration and recognition in OS X, for instance, so maybe I could work on migrating or removing the PCI code from the Linux driver. And by removing, I mean defining it out, like this:
#define DISABLED_CODE 0
...
#if DISABLED_CODE
#include <linux/pci.h>
#include <linux/pci-aspm.h>
#endif

That DISABLED_CODE was defined in the basic porting header file, so I could use it anywhere. The advantage over commenting out or just plain removing the code in question would come when I tried to update to a newer driver release. It should be more obvious what was mine (as opposed to comments, which may have been in the original), and it would be a smaller and more manageable change to the source code than removing entire blocks. That's the thinking, anyway.

So I embarked on a process of:
  1. Run a build in the IDE
  2. Find the lowest-hanging fruit among the errors
  3. Fix them
  4. Repeat
If they weren't Linux macros that I hadn't migrated yet, they were often Linux header files that I'd exclude like the above. Fix and repeat. It seemed to go on and on. Eventually I started getting some more meaningful issues, but if I ignored them I could keep going.

At some point, I noticed how I had been doing this for a pretty long time, and I still kept coming across different files. So then, I really took another look at the source tree. Even removing the older DVM firmware directory, I had about 100 files. Plus 10 more header files I had pulled in from Linux in their entirety (mainly wireless-related stuff), and one more grab bag file of individual constants or structs that I pulled in where it didn't make sense to take the whole file.

I tried estimating how long it would take me to not just cover the low-hanging fruit, but actually get all that code to compile. Fix all the easy errors. Understand what all those API calls were actually doing. Find the equivalent calls in OS X. Make any changes needed when there wasn't a one-to-one correspondence. Probably over a hundred APIs in a hundred files, before I could even BUILD. Ridiculous.

So then I thought, maybe I can just flat-out comment out or define away every single statement that doesn't compile, build the rest, and then go back and fix things to actually work. Except that was going to rely on me actually fixing everything. If I just commented or defined it out, I'd have to be diligent about putting TODO messages in or something. And if I overlooked going back to fix just one, it could all be for naught, leading to outright failures, or perhaps worse, subtle bugs. It was really easier to use the compile failures as the indicator of what still needed attention.

So then, I thought, what if I could exclude the files I haven't worked on yet from the build, and get some small core of this thing to build, and then grow it out from there? I could add back one file at a time as I needed the functionality and had the time to fully migrate it.

Turns out, this is very doable. In AppCode, you can remove source files from the build in bulk (while leaving them in the project) by going into the project settings (right-click the top-level project in the project view), and then selecting the "target" on the bottom left. That gave a list of all the included .cpp and .c files, and it was easy to select all the C files and hit the big minus button. Bam! Problem solved!

Well, not quite, but close. First, I needed to figure out how to add them back in as needed. That turned out to be right-click on the .c file in the project view, select Manage Targets, and check the target again. With that established, I left them all out of the build.

Then I needed to figure out what my small core of buildable code was going to be.

I started with my .cpp files and the headers just for them. I had gotten rid of my old header file with the PCI device listings, in favor of one that was much closer to the original source. So I brought that header in, which in turn brought in a variety of others. I still had some cleanup to do in that chain of headers, but much less.

And then, voila! It built! All I had to do was remove all the actual C code from the build, and I was back to where I was before... Which from a certain point of view is not much progress. Except now I had a project full of Linux sources, some of which I was actually using, and I had managed to remove code from my original effort, and I would be able to build out from there. So it felt like a solid step.

Once again, I had surprised myself with working code.

OK, let's not get too far ahead of ourselves. With building code. But surely if it built, then it would run... (cue drumroll)

Sample Code

The code and project as of this point is on GitHub here. All the code is still present, but if you open the project in Xcode or AppCode, you'll find all the C files excluded from the build -- so most of them probably still won't compile.

<< Prev: Plugging Gaps       Next: The Linker >>

Plugging Gaps

<< Prev: Compiler Directives       Next: The Working Subset >>


So the previous set of compiler directives singlehandedly fixed a wide-ranging set of errors when I tried to build the Linux code. But there were plenty more in there -- here are some others I found interesting to track down.

The porting code I had brought over included a couple already, such as likely and unlikely, which are hints to the compiler about the path it would do better to optimize for:
#define likely(x)       __builtin_expect(!!(x), 1)
#define unlikely(x)     __builtin_expect(!!(x), 0)

There were a couple more obvious ones, such as __init and __exit, used to designate the startup and shutdown functions for a Linux module. Since this code is all going to be one "module" (i.e., a kext) on OS X, and a kext comes with its own API for startup and shutdown, I just disabled those:
#define __init
#define __exit

Another simple copy from the Linux source was BUILD_BUG_ON, which aims to cause a compiler error:
#define BUILD_BUG_ON(condition) \
       ((void)sizeof(char[1 - 2*!!(condition ? 1 : 0)]))
While a little obscure, this relies on the fact that the calculated value will either be compiled out of existence or cause an error:
(void)sizeof(char[0])   // Compiled away
(void)sizeof(char[-1])  // Compiler error
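A typical use is pinning down the size of a layout-sensitive struct, so the build fails if anyone changes it. A minimal sketch (the struct here is made up for illustration):

```c
/* Copied definition: char[-1] is a compile error when the condition is
 * true; char[1] compiles away when it's false. */
#define BUILD_BUG_ON(condition) \
        ((void)sizeof(char[1 - 2*!!(condition ? 1 : 0)]))

/* A made-up wire-format struct, 4 bytes by construction. */
struct wire_header {
    unsigned char  type;
    unsigned char  flags;
    unsigned short length;
};

/* Typical use: refuse to compile if a layout-sensitive struct changes
 * size out from under the code that depends on it. */
void check_layout(void) {
    BUILD_BUG_ON(sizeof(struct wire_header) != 4);
}
```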

The trickier ones were more embedded into the way things work in Linux, such as debugging. WARN_ON prints a message when the provided expression evaluates to true, and returns the value for use in an if statement. So the usage looks like this:
if(WARN_ON(foo == 3)) {
    return -EINVAL;
}
Well, WARN_ON causes an Oops, similar to a panic on Linux except it doesn't stop the machine dead. OS X doesn't have an apparent analog. Further, the Linux implementation quickly descends into large functions in printk.c, which also has no OS X equivalent. Instead, I went with the workaround of defining a global function to save the warning to the log. And because it's not expected to happen very often, I went with a fairly straightforward one that saves the message to the log with IOLog. The only tricky bit is that it uses a printf-style string, so I had to handle variable arguments:
static void porting_print_warning(char *fmt, ...) {
    char buffer[200] = "AppleIntelWiFiMVM FATAL ";
    char *remainder = &buffer[24];
    va_list args;
    va_start(args, fmt);
    vsnprintf(remainder, 176, fmt, args);
    va_end(args);
    IOLog("%s", buffer);
}

That made the WARN_ON implementation an easy copy from the Linux source, just replacing the function to call to log the actual warning:
#define WARN_ON(condition) ({                                 \
    int __ret_warn_on = !!(condition);                        \
    if (unlikely(__ret_warn_on))                              \
        porting_print_warning("at %s:%d/%s()!\n",             \
                              __FILE__, __LINE__, __func__);  \
    unlikely(__ret_warn_on);                                  \
})
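Since this is just printf-style logging plus a GNU statement expression, the whole thing can be exercised in user space. Here's a sketch with fprintf standing in for porting_print_warning (validate is a made-up caller showing the usage pattern):

```c
#include <stdio.h>

#define unlikely(x) __builtin_expect(!!(x), 0)

/* Same shape as the kext macro, with fprintf standing in for
 * porting_print_warning/IOLog so it runs in user space. */
#define WARN_ON(condition) ({                                 \
    int __ret_warn_on = !!(condition);                        \
    if (unlikely(__ret_warn_on))                              \
        fprintf(stderr, "WARN at %s:%d/%s()!\n",              \
                __FILE__, __LINE__, __func__);                \
    unlikely(__ret_warn_on);                                  \
})

/* A made-up caller: the macro both logs the warning and yields the
 * condition's truth value for the enclosing if. */
int validate(int foo) {
    if (WARN_ON(foo == 3))
        return -22;    /* -EINVAL */
    return 0;
}
```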

The similar BUG and BUG_ON are a little trickier, because they're supposed to print a message and then panic the kernel. Well, the panic part is OK, but I think the message part is the problem -- I believe IOLog saves to an in-memory buffer, and something else flushes the buffer to the system log file. And when the line after IOLog is panic(), I expect the buffer never gets flushed to the log file. I've read that OS X saves the panic data to nvram in order to display it on the next boot, exactly because nothing can be reliably saved to disk in the event of a panic. So here's my implementation, which probably doesn't really work as desired (a note for future investigation):
#define BUG() do {                             \
    IOLog("BUG: failure at %s:%d/%s()!\n",     \
                __FILE__, __LINE__, __func__); \
    panic("BUG!");                             \
} while (0)

#define BUG_ON(condition) do {      \
    if (unlikely(condition)) BUG(); \
} while (0)

Now, you may be curious about the specific syntax used in the last couple examples (I know I was):
({...; foo;})

do {...} while (0)

The first, it seems, is the way you write some code that does something and then returns foo, from a macro that will be substituted into the condition of an if statement (shown here after substitution):
if(({...; foo;})) {...}
But also works (albeit suboptimally) as the result of the if statement instead of the condition:
if(foo) ({...; foo;});

The other is how you write a macro that returns nothing and may be used as the body of an if statement with or without curly braces:
if(...)
    do {...} while (0);
or
if(...) {
    do {...} while (0);
}
Without the do/while, you might end up with something like this after substitution:
if(...)
    IOLog("BUG: failure at %s:%d/%s()!\n", ...);
panic("BUG!");

Ah, code that sometimes logs but always panics. That would deserve its own Oops.

Compiler Directives

<< Prev: Xcode vs. AppCode       Next: Plugging Gaps >>

So there I was, faced with some inexplicable snippet of Linux code. I googled it. I landed at http://lxr.free-electrons.com/, which has got to be one of the single most useful resources for porting Linux code, hands down.

Whatever I was looking at was a compiler directive. Probably it was something like __maybe_unused. I followed the link on that page to compiler-gcc.h. And what do you know? Right above __maybe_unused is __packed. Remember __packed? I remember. It took me at least an hour to eliminate every occurrence of __packed.

But wait. That was way back when I thought __packed was complete magic. Now it turns out it's just a macro for an annotation that the compiler picks up.

But that's for GCC, and OS X has switched away from GCC. I'm pretty sure that's true -- I mean, I had to use lldb instead of gdb, and the project screen has a bunch of settings for Apple LLVM 7.0. But, I thought I heard that gcc and some other compiler forked and merged... maybe that was LLVM? (No.) Maybe they're pretty close after all? (Yes. Despite being different projects, GCC compatibility appears to be a goal of clang, the C/C++ compiler "front end" to LLVM.)

Bottom line, could I just snarf the definition of __packed instead of rewriting all that code? Yes I could. Yes I should, since that would make it a lot easier to update to the latest driver code later.

And while __packed (and __aligned) were the most obvious, there were a lot of others:
#define __packed             __attribute__((packed))
#define __aligned(x)         __attribute__((aligned(x)))
#define __printf(a, b)       __attribute__((format(printf, a, b)))
#define __attribute_const__  __attribute__((__const__))
#define __maybe_unused       __attribute__((unused))
#define __bitwise__          __attribute__((bitwise))
#define __must_check         __attribute__((warn_unused_result))

Most of those I had previously #defined to nothing, so it was nice that they would work more as intended, adding extra compile-time checks to the code.

But there were a few that didn't work, too:
#define __force           __attribute__((force))
#define __acquires(x)     __attribute__((context(x,0,1)))
#define __releases(x)     __attribute__((context(x,1,0)))
#define __acquire(x)      __context__(x,1)
#define __release(x)      __context__(x,-1)

Actually, __acquires and __releases didn't cause any problems, but __acquire and __release caused a complaint about __context__, and I didn't see any further setup for that. I gather __context__(a,b) adds the second argument to a compiler variable named by the first argument (passed into the macro), while context(a,b,c) says that the compiler variable named by the first argument (passed into the macro) must have value "b" at the start of the function and value "c" when the function returns. But either those definitions are hiding somewhere else, or they're one of the areas where GCC has a feature that LLVM hasn't picked up.

Likewise it didn't seem to like the attribute force, but that just seems to suppress warnings for an odd cast, so it's not so critical for now. I'm way far away from attempting to get all the warnings out of the code!

Anyway, this let me blow another hour or more removing all the #pragma pack statements. In return, though, it gave me a lot more confidence that the structs would come out as intended, and it made it immensely easier to update the code to the driver included in a newer Linux release, when I later got around to that.

<< Prev: Xcode vs. AppCode       Next: Plugging Gaps >>

Xcode vs. AppCode

<< Prev: The Other Shoe Drops       Next: Compiler Directives >>


Xcode: just plain awful, or actually unusable?

All right, they moved my cheese again. Maybe I'm just not used to it. I'm definitely not used to it.

Some of the things I don't like:
  • Right-clicking to navigate from a name (struct, function, variable, etc.) to where it's defined (using the mouse as opposed to a keystroke)
  • An editor area with only one file at a time (as opposed to tabs), forcing me to navigate a large project with the whole project view instead of quick-switching between a subset of files.
  • The alternative of opening a jillion separate source windows, again with no convenient way to switch to just the one I want
  • The project settings, which is sort of inexplicable and has a monstrous number of options with no description or help
  • The lack of an obvious place to map all project files to source/resources/syntax highlight but don't include in output, etc.
  • The inexplicable use of includes and frameworks. You can #include seemingly anything you like. You can also add frameworks to the project, though it's not clear what that does. It doesn't seem to alter which includes work, and it doesn't seem to put the frameworks into the list of required frameworks in the Info.plist
  • The way the plist editor defaults to an unusable tree view (where among other problems, it takes between one and three clicks on a control to activate the control), and the plain text view is hard to find
  • Lack of an easy way to rename files
  • Odd Git integration, where files you add and edit are sometimes added and sometimes left untracked
  • Strange separation of source code directories and project "groups"
  • Completely inconsistent build results, where a project with a trillion errors can emit just one output error, and if you fix that you get a different one, and then if you fix that you get 60 errors and a message that you've reached the error limit, but then if you fix one of those you get 35 errors and a message that you've still reached the error limit, and so on.

I could probably go on. Every time I use it, it seems like something comes up. And I spend my time hunting around or Googling or endlessly clicking and scrolling instead of doing something useful. I don't mean to just bash the product, but I don't feel productive with it.

On a side note, I once didn't feel productive with IntelliJ either. But when you start IntelliJ, it offers an endless list of tips that ease you over the learning curve. Keyboard shortcuts. Functions otherwise hidden in menus you might not think to look at. Features that make you more productive. Sometimes I just dismiss that dialog without looking, but at least a couple times I've sat there and clicked through it for a while, and I've been rewarded. Where's the "100 ways to be more productive in Xcode" in Xcode?

So then I thought to look over at the JetBrains Web site. I know they've branched out into IDEs for other languages besides Java. Could there be one for C and C++?

There could! Actually, there could be two, which was a bit of a problem, because which do you pick? Choosing the more OS X-oriented one, I went with AppCode. When I fired it up and went to create a new project, it offered an option to build an IOKit kext, just the same as Xcode. That made me think I probably picked the right one.

Well, that might be true, except I quickly found there are two near-crippling bugs in AppCode 3.3 and 3.4 EAP (as of this writing).
  1. If you #include IOKit features such as IOKit/IOLib.h or IOKit/pci/IOPCIDevice.h, AppCode doesn't recognize them. They're shown in red, and code highlighting also shows anything that came from them in red. This has a huge trickle-down effect, as basic data types such as u32 aren't found, and a lot of errors crop up like "incompatible assignment UInt32 to int". It means in a typical source file, there are hundreds of lines highlighted as errors, and the right gutter in the editor window is nearly solid red. Among other things, this makes the scroll knob nearly invisible, depending on the editor color scheme.
  2. When I do a build, I often get on the order of 2 errors and 500 warnings. Unfortunately, the build messages window shows the chain of includes for every file that has a warning or error, often several times. The bottom line is, the build output window has (in the test build I just did) 1665 lines of output, including 2 errors. It shows perhaps 15-20 lines at a time. It normally does not jump right to the error. Using the "don't show warnings" control reduces that to only 908 lines. Somewhere in those 908 lines, of which we'll be generous and say you can see 20, is your error. It might take longer to scroll around and FIND the error than to actually fix it!

The first problem I assume they'll just fix someday.

The second, I'm not so sure. In this one area, it made me really appreciate Xcode, which after a build lists every file with a warning or error exactly once, with a red symbol next to the ones with errors. It's completely obvious where the errors are.

So I'm left with a tough choice.

But at the end of the day, I find that while both IDEs bother me, AppCode bothers me less. Maybe I should go to the trouble of identifying the context-menu option in Xcode to navigate to the definition of a name, and assign a keyboard shortcut to it. Then if I could get Find in Xcode to navigate results as easily as Incremental Search in AppCode, I might stick with Xcode until they fix the syntax highlighting problem in AppCode.

Or, I suppose, I could try to use emacs for more than just text editing. I hear some people use it for code, too.   :)

The Other Shoe Drops...

<< Prev: Parsing Firmware       Next: Xcode vs. AppCode >>

The bad news is, I did more reading. I looked at more code, took more notes, read more background material. I'm becoming increasingly convinced my approach isn't going to work. If I try to rewrite all this code into C++, I'll never finish, and there probably won't even be decent progress (in terms of actually connecting to wireless networks) along the way. There's just too much -- driver code, PCIe interface code, MVM firmware interface code, 802.11 packet-handling code, and etc. I might be able to set up and initialize a card, but I can’t see that I'd actually be able to operate it.

Worse, all the simplifications I thought I'd make don't seem that simple. Rather than just dropping a couple files or functions because, for instance, no Mac has a hardware switch to disable wireless radios, the code for "RFKill" seems to be in little bits all over the place. Taking it out might be harder than leaving it in!

So I think I've been going about this backward. I think what I should be doing is attempting a minimal-changes type port, where I use the existing code to whatever extent is possible. Yes, I'll have to change some things I've already done, OK perhaps a lot of things I've already done, but maybe I'll find some shortcuts. Macros to work around missing Linux APIs or whatever.

In any case, I think that's maybe my only hope of achieving functionality in the not-amazingly-long term. Then once I get it all working, maybe that’s the time to think about migrating more code into C++ and making the organization more pleasing to me.

There will still be plenty of challenges, like the part where it expects the Linux wireless API to do something or other (such as give it connection settings for a network).

So really, the next step is just to sanity-check that option. Is it even reasonable to think I can fill all the gaps between Linux and OS X without a substantial rewrite? If I pull in ALL the iwlwifi driver code, and all the Linux headers they rely on for various structs and constants, can I get back to the working firmware loading-and-parsing code with the new strategy?

And if so, does it seem like a better or worse approach?

<< Prev: Parsing Firmware       Next: Xcode vs. AppCode >>

Parsing Firmware

<< Prev: #include Woes       Next: The Other Shoe Drops >>

After doing a little research and proving that I could load a driver, I figured it was time for some real work. But first, a little background on firmware.

About the Firmware

The firmware for the Intel WiFi cards is distributed as a single file. However, there are at least two major styles of firmware, and each one is broken up into many sections. From a hardware perspective, there's the older models that use DVM firmware vs. the newer models that use MVM firmware. From the firmware file perspective, there's the v1/v2 firmware vs. the TLV firmware. I'm not actually sure whether the DVM/MVM axis is separate from the v1v2/TLV axis.

Since I'm only targeting fairly current hardware and current firmware, I only need to work with MVM hardware and TLV-style firmware. Unfortunately, from all appearances, TLV is the more complex of the two. The large firmware file consists of a header followed by many individual records, each of which has a type and length value and then a variable amount of payload (the size of which is in the length value). In order to be able to load the firmware onto the hardware, you need to parse this file out into all its separate components.

There's a big C function (dominated by an enormous switch statement in a while loop) that handles this. It ends up stashing some values in configuration objects, and copying other sections into freshly-allocated memory that it will hang on to for subsequent restarts or whatever. Then it lets the original file data be freed.
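The shape of that loop can be sketched like this. To be clear, the record layout and names below are my simplified stand-ins, not the real iwl-drv.c definitions -- the real file also starts with a header and pads each record to a 4-byte boundary, both omitted here:

```c
#include <stdint.h>
#include <string.h>

/* Illustrative record header: a type and a payload length, followed by
 * that many payload bytes. */
struct tlv {
    uint32_t type;
    uint32_t length;
    /* payload follows */
};

/* Walk the buffer record by record, dispatching on type.  Returns the
 * number of records seen, or -1 if a record runs past the end of the
 * data (a truncated or corrupt file). */
int parse_tlv(const uint8_t *data, size_t len,
              void (*handle)(uint32_t type, const uint8_t *payload,
                             uint32_t plen)) {
    int count = 0;
    while (len >= sizeof(struct tlv)) {
        struct tlv hdr;
        memcpy(&hdr, data, sizeof(hdr));   /* avoid unaligned reads */
        data += sizeof(hdr);
        len -= sizeof(hdr);
        if (hdr.length > len)
            return -1;                     /* record runs past the file */
        switch (hdr.type) {
        /* the real function has dozens of cases, stashing values in
         * config structs or copying sections to allocated memory */
        default:
            if (handle)
                handle(hdr.type, data, hdr.length);
            break;
        }
        data += hdr.length;
        len -= hdr.length;
        count++;
    }
    return count;
}
```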

Firmware Parsing Code

So my next step would be calling that function to parse the TLV firmware. This would involve roping a lot of code from the Linux driver into my project. In that code, I found Linux-compiler syntax for byte-alignment in structs, which my OS X compiler was not happy about. It seemed necessary, though, because many of the firmware-related structs were packed to map directly to specific bytes in the file, and extra padding would have spoiled that mapping. This effort would test whether I could convert all that successfully, plus hand off from my kext-resource-loading code to the Linux firmware-parsing code.

As I went about pulling in the functions and constants and structs that the firmware parsing used, the project suddenly bloated. I tried to be strategic about what code and files I was bringing into my project, but this referenced that and suddenly I had 10+ files. It wasn't obvious to me why some things were laid out the way they were in the original source, so in some cases I rearranged a bit. I didn't want to try real hard to keep things just as they were in the original driver, because the end goal was to port a lot of it anyway.

For instance, the C code had pretty opaque function tables (structs full of function pointers). In trying to follow the code, I'd land at ops->start(...) but there wasn't any definition of a function "start" anywhere, it was just an entry in this "ops" struct. Then I had to figure out where that was assigned in order to know what the actual function was called in its definition and where that was so I could follow the code. I guess all that makes sense in C, but it seems like an obvious candidate to make a C++ class out of. The C++ code in the other ported drivers I looked at was definitely easier to follow than the original C code.

Bottom line, I copied and rearranged, and soon I had a big pile of code that seemed to be a complete set, but still didn't compile.

Struct Alignment

For whatever reason, it seems that certain hardware is much more efficient at reading and writing values that start on a 32-bit or 64-bit boundary (or other alignments, for less common cases). For instance, on a 64-bit system like OS X, pointers need to be aligned on 64-bit boundaries.

Thus, look at a struct like this on 64-bit OS X:
struct foo {
    UInt32 a;
    some *b;
    SInt16 c;
    some *d;
}
Adding up the field sizes gives 4+8+2+8=22 bytes. But the actual struct takes 4+4(padding)+8+2+6(padding)+8 = 32 bytes. There's padding introduced after "a" to allow "b" to be on a 64-bit boundary, and likewise padding after "c" to allow "d" to be on a 64-bit boundary.

The 64-bit developer docs have an easy suggestion to fix this: just reorder the fields in the struct. If they went b,d,a,c there wouldn't need to be any padding between the fields (just two bytes at the end to round the struct size up to a multiple of its alignment).

Well, here's the problem. If a struct with assorted field sizes like that was used to map to a section of a firmware file, I couldn't just reorder the struct without making the fields point to the wrong bytes in the firmware file. And if there isn't padding in the firmware file, there needs to also be no padding in the struct, or again, the fields will point to the wrong bytes in the file.

The Linux driver code makes it work with the __packed keyword like this:
struct iwl_fw_dbg_reg_op {
 u8 op;
 u8 reserved[3];
 __le32 addr;
 __le32 val;
} __packed;

Whereas OS X seems to prefer it with a compiler annotation (pragma) like this:
#pragma pack(1)
struct iwl_fw_dbg_reg_op {
    u8 op;
    u8 reserved[3];
    __le32 addr;
    __le32 val;
}; // __packed;
#pragma options align=reset

What made me nervous was the more complicated ones, like the ieee80211.h Linux header file. That one has structs aligned(2), packed structs, and (if you look for struct ieee80211_mgmt) a struct aligned(2) containing packed sub-structs, unions of packed sub-structs, and even a sub-union with packed sub-structs.

I took a swing at converting all that to the corresponding pragma syntax, but I can't say I had any real confidence it would work. What if switching the pragma settings back and forth in the middle of an outer struct definition didn't actually work?

Got a better idea? It turns out my entire approach here was flawed. But naturally, I did not discover that until later. We'll come back to struct alignment in a future post.

Memory Allocation

The next problem was that the firmware-parsing logic had at least a little memory allocation going on, and naturally the syntax for that differs as well.

On Linux, it went like this:
pieces = kzalloc(sizeof(*pieces), GFP_KERNEL);
...
kmemdup(pieces->dbg_dest_tlv,
        sizeof(*pieces->dbg_dest_tlv) +
        sizeof(pieces->dbg_dest_tlv->reg_ops[0]) *
        drv->fw.dbg_dest_reg_num, GFP_KERNEL);
...
kfree(pieces);

Well, OS X doesn't have kzalloc, kmemdup, or kfree.

I thought at the time that kzalloc was the zone allocator. In OS X, many of the higher-level memory allocation functions pull memory out of "zones" of various sizes (32 bytes, 64 bytes, etc.) and always give you an allocation of the zone's full size. Therefore, if you ask for 33 bytes, you actually get an allocation of 64 bytes from the next-larger zone.

Well, it turns out I was wrong (more about that, and the corresponding crash, later). But for now, I replaced kzalloc with IOMalloc (which I gather also uses a zone allocator under the covers), and kfree with IOFree. The problem there was that IOFree needs to be passed the original allocation size, so I had to add some fields and logic to track the allocation sizes so that I could use them when the time came around to free. I'm not sure I got that right, so there could be a leak there, but at this point I was mainly aiming for "working" over "working perfectly".

kmemdup was trickier. I found this definition of kmemdup:
void *kmemdup(const void *src, size_t len, gfp_t gfp)
{
    void *p;

    p = kmalloc_track_caller(len, gfp);
    if (p)
        memcpy(p, src, len);
    return p;
}
It looks like it allocates memory and then, if successful, copies the contents of the thing passed to it into that new memory and returns it. Actually, I took the easiest route and just copied the whole function into my project, except I again used IOMalloc instead of the oddball kmalloc_track_caller.

Other Odds and Ends

I did put the firmware parsing logic into a new C++ class. It ended up with a bunch of static utility functions down at the bottom. I could have pulled those into the class and thereby eliminated at least one of the arguments from each one... but I wasn't yet sure whether I wanted to do that. I sort of had it in my mind that I might move them again.

Then I had to comment out all the Linux library imports from the headers I did bring in, and add the linux-porting.h header file that I brought in from another driver port. That handled some things like common constants, macros, and typedefs that people had run into before.

Finally, I commented out a few fields here and there that used data types I wasn't ready to deal with yet. Overall it was a bit of a cleanup operation, but I was trying to avoid any major decisions about how to re-implement things. When in doubt, comment out.

Results

Finally, everything compiled again. I gave it a spin on my test machine, and...

Much to my surprise, the firmware-parsing logic all seemed to pretty much just work! I got a couple debug lines I took to be at least warnings (such as "GSCAN is supported but capabilities TLV is unavailable"), but after a more detailed inspection of the firmware files in a hex editor, they seem to have (and not have) exactly the chunks the parsing code reported. Huh.

(Side note: what does it mean when you find success surprising?)

Kext Resource Caching

So then I figured I better try all five firmware files downloadable for the MVM firmware models. I didn't want to go claiming everything was working fine and have it turn out that only one model worked. So I put all the files into the kext Resources and hardcoded the IDs into the driver rather than detecting the real hardware.

Naturally, the first additional model I tried failed. With a little debugging, I found that for some reason my file-loading code refused to load any but the original firmware file.

Eventually I supposed there must be some kind of caching going on so it "remembered" the version of my kext with only the one firmware resource. I rebooted the machine, and surprise, surprise, it all worked. I've since found a passing reference to caching kext resources, but not a full explanation.

From "man kextutil":
-c, -no-caches
  Ignore any repository cache files and scan all kext bundles to
  gather information.  If this option is not given, kextutil
  attempts to use cache files and (when running as root) to create
  them if they are out of date or don't exist.

Sample Code and Output

This version of the code is available here (and a build here). However, it may or may not work for you. The kzalloc thing means that sometimes the counters come out totally bogus, and it seems like a crash follows quickly thereafter.

<< Prev: #include Woes       Next: The Other Shoe Drops >>

Tuesday, March 22, 2016

#include Woes

<< Prev: Loading Firmware       Next: Parsing Firmware >>

Is Apple somehow unaware that in order to use their lovely APIs, you have to #include them first? It makes the IDE (not to mention the compiler) a little cranky when you omit that part.

Yet, have a look at this documentation for OSKextRequestResource again. It says it's documenting OSKextLib.h, so you might try something like this:

#include <OSKextLib.h>

Nope. That header is itself in something, or under something. In or under what? No indication. Maybe Apple has some other documentation on how to use that? Perhaps under Creating a Device Driver with Xcode?

Well, that gives a code sample that actually includes a #include (yay!), but it's not the one we need here.

This kind of problem came up a number of times. I needed to use an OSData object to store the results ultimately generated by the OSKextRequestResource call. Let's look at the OSData Class Reference. That page doesn't even name the header file it's in.

There's also a detailed article on kernel data structures including OSData, with code samples, big and small. Neither the big ones nor the small ones include the #includes. As far as I can tell the article never mentions what to #include either. Maybe if you always write tidy, one-class drivers that do virtually nothing, it just works based on what a "heavyweight" header such as #include <IOKit/IOLib.h> ropes in, but that's not that helpful in general.

So, it comes down to the source code I guess. Googling OSKextRequestResource turns up this source code. From the URL, you might decide to try

#include <libkern/OSKextLib.h>
or
#include <libkern/libkern/OSKextLib.h>

That first one turns out to work.

There’s also the good-old-grep method:

cd /Applications/Xcode.app/Contents/Developer/
cd Platforms/MacOSX.platform/Developer/SDKs/
cd MacOSX10.11.sdk/System/Library/Frameworks/
grep -r "OSKextRequestResource" *

The output is a bit of a mess, but you'll find Kernel.framework/Headers/libkern/OSKextLib.h in there, suggesting the libkern/OSKextLib.h used above.

For what it's worth, OSData seems to be defined in libkern/c++/OSData.h (try searching for "class OSData" if you don’t want to be overwhelmed by occurrences of just "OSData"), though again it tends to get roped in by the IOLib imports such as IOService.h. I have to think there's some middlin' header that includes just the data structures like OSData and OSDictionary and stuff, but I haven't come across it yet.

Why is this so difficult?

It makes me think that one of the huge advantages of Java is the documentation, which in turn is built on the package system. If you look at HashMap, for instance, right above the big words Class HashMap are the little words java.util, and immediately you know precisely the import you need to write to actually use this class.

Even supposing C/C++ isn't amenable to that level of automated quality documentation generation (though why should it not be?), Apple certainly had the opportunity in the articles like this one to make the needed #include clear.

Oh, well. Grep on.

<< Prev: Loading Firmware       Next: Parsing Firmware >>

Loading Firmware

<< Prev: Basic Proof of Concept       Next: #include Woes >>

Loading Firmware

After recognizing the hardware, the next thing I wanted to do was identify and load a firmware file that would work for the hardware in question. The code with the device IDs that I originally found (in here) contained a configuration struct with most of the firmware file name -- everything except the version number -- and the range of acceptable version numbers (see, for instance, all the config objects in the bottom half of this file).

It seems like the Linux API lets you request a firmware file and get a callback with either the file data or an error (request_firmware_nowait). So the iwlwifi driver just attempts to load all possible firmware files from highest version to lowest version until it finds a match. To resolve that kind of request, Linux talks to a user-space process that looks for the files in question in a directory such as /lib/firmware, and the user is expected to put the firmware files there.

I didn't see any similar infrastructure in OS X -- something to sit on top of data or configuration files in a directory and serve them up to kernel extensions upon request. Long-term, this might be a challenge. It would be obnoxious to build a user-space daemon to sit on a directory like /lib/firmware, plus the machinery for the kernel to communicate with it, plus some way to ensure it loaded before any driver that needed to request firmware from it. But other drivers (even existing Bluetooth drivers) have the same problem, and without some common solution like that, every driver would have to implement its own, which seems unfortunate.

Still, there's a short-term solution. Kexts can include arbitrary files in their Resources directory, and I found that the OSKextRequestResource call loads file data out of Resources. So I could load the firmware file from there. I tried bundling a firmware file into my kext, tagging it as a build resource, and loading it with a hardcoded file name, and that all worked! (See my original loadFirmwareSync method, and the callback just above it.)

Actual firmware naming logic

The next step was pulling in the logic that actually held a specific firmware file name for each piece of hardware. It turned out there were five possible firmware files. I briefly considered just building my own mapping of device IDs to file names, but all the information was already in the Linux driver code... I just had to find a way to take advantage of it.

The problems were thus:
  • The code was split across three files. One defined all the 7000-series device configuration possibilities. One defined the 8000-series device configuration possibilities. And the third listed every possible device by ID, along with which configuration it mapped to. Often many specific devices mapped to a particular configuration, and more than one configuration mapped to a given firmware file.
  • All three files were .c files instead of .h files, meaning it wasn't a completely straightforward #include to pull that logic in
  • The third file had a bunch of additional logic. It was all related to how the driver registered to be loaded when those particular devices were present. The more I looked the more it seemed like the same kind of plumbing I had built into my Kext -- the Linux equivalent of the IOKit registration, startup, and shutdown processes. I didn't need that code.
So I took the easy route of converting the first two files to headers (iwl-7000.h and iwl-8000.h), and pulling the array of devices out of the third into a new header file (device-list.h). You may notice a few changes. In the first two, I just commented out the calls that link modules to firmware files, since that's all Linux-specific plumbing. If you compare the third to the device list in the original, you'll see that I simplified the struct that holds the device information. I didn't need the full Linux pci_device_id struct, just the device ID, subdevice ID, and configuration struct.

Then, I duplicated the logic of the Linux driver that checked available versions of the firmware file for each device. To do that, I sort of adopted some of the structs used by the Linux code, such as iwl_drv, iwl_cfg, and iwl_trans. These I pulled out of wherever they were defined into my own code. So I ended up with a revised loadFirmwareSync method and a few header files that I either migrated directly (like iwl-config.h) or created to hold a grab bag of #defines and structs from files I wasn't ready to convert.

The Joys of Kext Development

It was about this time I found out what I was messing with. I had allocated some object or other that I freed when I finished using it. Then (whoops) I freed it again in the overall Free function of my driver.

So I tested my new hardware-matching, firmware-loading kext: "sudo kextload AppleIntelWiFiMVM.kext". All good. Yay!

Then I cleaned it up so I'd be able to try a new version later. "sudo kextunload AppleIntelWiFiMVM.kext".

Then my screen turned off.

Then I heard the little Apple chime.

Then my laptop informed me that it had restarted because of a problem.

So much for the "good old days" of Segmentation Fault; Core Dump. Instead I had an inscrutable panic report, with some elaborate complaint about 0xDEADBEEF. I guess this is what passes for humor among kernel developers? Though, I guess I should thank them for bothering to make my pointer point to something obvious when I freed it the first time, so they could tell when I freed it a second time.

Anyway, it appears they weren't kidding around when they said in case of error, the kernel would shut itself down rather than risk filesystem corruption and associated problems. Still, I had like thirty windows open!

Well, I've learned my lesson and now test my builds on another machine. On the upside, I can test on a machine with the actual Intel wireless hardware, and I was able to turn SIP back on for my laptop. But now it's a two-machine operation. I guess that day was coming anyway, since there's only so far to go without using the real hardware.

<< Prev: Basic Proof of Concept       Next: #include Woes >>