Tuesday, March 29, 2016

Calling C++ from C

<< Prev: Structs and Pointers       Next: Workqueues >>

It turns out I made a little mistake there at the end of the Wait Queues post.

It's not actually a total disaster if C code (ported from Linux) needs to call C++ code (in the OS X driver).

This one I stumbled across completely by accident, by reading someone else's code that does something unrelated.

The magic is in OSMetaClassBase::OSMemberFunctionCast. This feature of a C++ class converts a pointer to one of its member functions (including a reference to the specific instance it operates on) to a C function pointer.

So it goes like this:
  1. The C code declares a function pointer type as a typedef (specifying the number of type of arguments, etc.)
  2. The C code supplies a variable or function argument of that type
  3. The C++ code populates the variable or passes an argument to the function that it creates by calling OSMemberFunctionCast (I guess normally on one of its own member functions)
  4. The C code can call that function like normal

OK, I'm handwaving there a little. I don't yet know how you call a function if you have a function pointer — but that seems like a much easier problem to solve.

I expect that I'll be using this in anger shortly, so there should be a working example coming soon.

<< Prev: Structs and Pointers       Next: Workqueues >>

Monday, March 28, 2016

Structs and Pointers

<< Prev: Wait Queues       Next: Calling C++ From C >>

Next I went to actually try to implement the macro to replace a wait_queue_head_t with an IOLock.

That seemed pretty straightforward:
#define wait_queue_head_t  IOLock

Only, it didn't work:

Error:(352, 20) field has incomplete type 'IOLock' (aka '_IOLock')

That didn't make any sense. IOLock was defined right there in IOLocks.h in the OS X SDK:

typedef struct _IOLock IOLock;

OK, so I guess the question is what is struct _IOLock? I couldn't find it.

Well, maybe I should give a little more context:

#ifdef IOLOCKS_INLINE
typedef lck_mtx_t IOLock;
#else
typedef struct _IOLock IOLock;
#endif /* IOLOCKS_INLINE */

So maybe instead the IOLOCKS_INLINE wasn't set right?

I banged my head against the wall for quite a while, looking for that blasted _IOLock, before I went back and looked at where I had used an IOLock previously:

IOLock *firmwareLoadLock;

There was one key difference there -- that little star.

So I tried:
#define wait_queue_head_t  IOLock *

And what do you know? Problem solved.

So my conclusion here is that struct _IOLock isn't defined in the IOKit header files, the "client API". But you can create a pointer to one, so long as you treat it as opaque. So long as you don't try to go ahead and use any specific fields or otherwise interact with it. That's OK here, because all I do is pass it to other API calls (IOLockLock and etc.), and their implementation functions (which presumably have access to the _IOLock definition) can do the dirty work.

Still, it caused me quite a bit of confusion that you could create a pointer to a thing while not being able to create the thing itself -- and that an API client should just expect that.

Now I know.

<< Prev: Wait Queues       Next: Calling C++ From C >>

Sunday, March 27, 2016

Wait Queues

<< Prev: 67 Steps       Next: Structs and Pointers >>

Well, no sense in waiting. Now that I had a list of the startup procedure, I might as well get cracking.

I had covered the first two entries in the startup list already -- registering to be notified when matching hardware was found or inserted. (That part I hadn't tried to port from the Linux code; I just replaced it with the OS X equivalent.)

So the next step was iwl_trans_pcie_alloc, which had the basic PCIe setup code.

Right at the top, there were three things I needed to deal with: spin locks, a mutex, and a wait queue. Well, OS X has spin locks and mutexes covered pretty easily. I was going to have to take a peek at the syntax and see whether I could use a macro substitution or whether I needed a code replacement, but that should be easy enough either way.

Better, I thought, to start with the harder part, the wait queue. I wasn't that familiar with wait queues, but I found a nice introduction here.

Still, it wasn't exactly obvious to me what this was used for in the PCIe setup. The wait queue I was looking at was used to wait on a firmware write to complete. In comparison, the firmware read code used a completion variable, and other code waited on the completion variable to be set. So what's the difference between a wait queue and a completion variable? (Research ensues...) Ahh. There is no difference. A completion variable is a wrapper around a wait queue. (See the definition of struct completion in here.)

OK, so the bottom line is, one or more threads can end up waiting on some condition. When the condition arrives, whatever code processes that can wake up one or all of the waiting threads.

Now, what would I use on OS X to achieve this?

IOKit Options

It seems like the IOKit lock a.k.a. IOLock has that capability: IOLockSleep or IOLockSleepDeadline, and then IOLockWakeup. Functionally, it mostly seemed like it would work, with one key difference:

Linux Wait Queues
For a Linux wait queue, you can specify a boolean condition, and the thread will keep sleeping until that condition is true. Because the implementation relies on macros, it checks the condition with the code you provide every time the thread is awakened.

Code such as this:
wait_event(some_wait_queue, foo == bar);

Gets transformed by a macro that I'm seriously abbreviating here:
#define ___wait_event(wq, condition, ...)        \
({                                               \
        for (;;) {                               \
                prepare_to_wait_event(wq, ...);  \
                if (condition)                   \
                        break;                   \
                schedule();                      \
        }                                        \
        finish_wait(wq, ...);                    \
})

Into code that I gather looks something like this:
({                                               
        for (;;) {                               
                prepare_to_wait_event(some_wait_queue, ...);
                if (foo == bar)                   
                        break;                   
                schedule();                      
        }                                        
        finish_wait(some_wait_queue, ...);                    
})

In that transformed code, the prepare_to_wait_event call puts the thread into a suspended state (e.g. TASK_UNINTERRUPTIBLE), the finish_wait call sets it back to the usual TASK_RUNNING state, and in between every time the thread wakes up, if the condition isn't yet true, it goes right back to sleep thanks to the schedule() call that will let other threads run since this one is no longer in a runnable state.

IOLock Waiting
The IOLock calls, by contrast, take an event object instead of a condition to test, and only wake the thread if the event object passed to the wake call is the same as the one passed to the sleep call:
IOLockLock(some_lock);
IOLockSleep(some_lock, foo, THREAD_UNINT);
...
IOLockUnlock(some_lock);

// Then in some other thread:
IOLockWakeup(some_lock, foo, true);

This is a lot more straightforward, on the surface. But some of the differences are:
  • The thread that might sleep must acquire the lock first
  • Instead of a separate wait queue for every situation, you could use one lock with multiple event types
  • There's no obvious support for a boolean wait condition

Now in practice, the third issue may or may not matter. In the code I was looking at, the wait condition was a simple boolean variable, and the only time wake_up was called was immediately after that boolean was set to true. In other words, I could skip the condition part altogether in that case. But I saw four wait queues in the project, so if I went this way I either had to convince myself that was true of all four (and pay attention to any future updates), or go ahead and put some kind of macro around the sleep call to check the condition like in Linux. Well, I'd probably need a macro to acquire the lock anyway, so I guess I could just roll it all in there.

But since this is all part of a lock rather than a native wait queue, and would need some massaging in any case, it wasn't obvious that it would be the best fit.

Mach Options

Since IOLock wasn't just a drop-in replacement, I figured I'd better look further. A little poking around revealed a Mach wait_queue_t and a Mach mutex type lck_mtx_t.

I actually found these by inspecting the definitions of the IOLock functions above. I saw, for instance, IOLockGetMachLock which returns a lck_mtx_t. Then I followed IOLockWakeup to thread_wakeup_prim to e.g. wait_queue_wakeup_all which used wait_queue_t and so on.

So then I thought maybe I shouldn't be using the higher-level IOLock wrapper. I could just use a lck_mtx_t directly, and call lck_mtx_sleep and... Well, then, I'd have to call thread_wakeup_prim or something, which sort of spoiled it, because it wasn't very symmetrical and seemed like I would be mixing a mutex API and a thread scheduling API.

So maybe instead of mutexes, I should be just looking at the Mach wait queues, which maybe corresponded more directly to the Linux wait queues. But the Mach wait queues seemed really low-level. There were obvious wakeup methods, but the way to make the current thread wait (sleep) on a wait queue was not obvious. There were both 32-bit and 64-bit versions. I could probably have worked something out, but it was starting to look like IOLock was the convenient abstraction over all this stuff that I wanted.

However, I wasn't ready to commit just yet:

BSD Options


Having found both an IOKit option and some Mach options, I naturally assumed there must be a BSD option. (Why have one API when you can have three for thrice the price?)

Ah, yes. Google led me to a FreeBSD man page, which gave me some search terms for the XNU source, and I ended up with things like sleep and wakeup. Those inverted the parameters a bit in that you provided a "channel" first, which was just a unique address, and a wakeup on a specified channel would wake the threads that went to sleep on that channel.

But there was also a priority, which I didn't really want to deal with. I could set it to 0 ("don't change") except that I wasn't really sure whether or when I should want the thread priority to change. And for waiting on a timeout, you had to specify the timeout as a struct. And there was a char * message parameter, which the XNU source was silent on, but the BSD man page said should be no longer than 6 characters.

All of a sudden there were a lot of decisions to make, for an API that had initially looked simpler. And for all I know there are other BSD options -- this was a pretty cursory search.

All in all, it seemed like IOLock might be a pretty convenient abstraction.

Final Decision


No surprise here -- I decided that I'd try IOLock first. I still haven't decided whether to handle the condition argument, though I will probably take a stab at it.

But this whole thing made me a little grumpy. Why again should there be three or four different APIs for this? (And each a little different.) And why shouldn't there be a page somewhere saying "Here are all your options, and here's when you should use each one."

Mainly, it's not so much this one case, as I fear I'll have to go through this whole process again for every Linux function I need to translate.

And I feel like I dodged a bullet. All the IOLock functions are defined in IOLocks.cpp. Fortunately, the entire thing is wrapped with an extern "C" block, so they work just fine to call from a C program. But what if the next one is really a C++ class instead? I gather I could just change file extensions and compile all the driver code as C++ in order to call other C++ code, but there's no way I'm about to do that. At a minimum I it seems like it would change many of the struct sizes which would make a huge mess. So for the moment, C++ is to be avoided, and I only narrowly managed.

I guess we'll see what the future holds.

<< Prev: 67 Steps       Next: Structs and Pointers >>

Friday, March 25, 2016

67 Steps

<< Prev: Compiler Dysfunction       Next: Wait Queues >>

So there may have been some bumps in the road, but I felt like on the whole, I'd taken two steps forward. Obviously, then, I was due for a step back. (Have no fear, you won't have to wait long for this one...)

Anyway, I figured, what I'd really like to do next is turn on the hardware. You, know, do whatever needs to be done to light it up, even if it doesn't connect to anything. That would be a solid step.

I had the firmware file, identified, loaded, and parsed. But I wasn't exactly clear on how to send that to the hardware. I knew there had to be something in there to set up a block of DMA memory, and to set up an interrupt handler, and things like that. But at best I knew of a few little pieces, without much of a big picture. I wasn't even really sure where it all started in the Linux driver. I mean, I had seen PCI device registration code, module startup code, and so on, but what really kicked everything off?

So, it seemed like the thing to do was find that start point, and trace the code from there until it was "up." I began poking around, and here's what I found:

The iwl_drv_init module initializer registers various PCI device definitions. Then if there's any matching hardware, the kernel calls the registered PCI probe method iwl_pci_probe in a pretty similar manner to OS X, and that's what kicks the whole thing off.

That part was easy enough.

Now iwl_pci_probe calls iwl_trans_pcie_alloc, which if I'm going to be generous, has on the order of 20 things it does. Some of them are the things I expected: Sets up PCIe bus mastering and DMA, registers an interrupt handler, etc. That was going to be challenging enough to convert. Still, even with the extra stuff, I could be up to this challenge. So far so good.

Then iwl_pci_probe calls iwl_drv_start, which mainly kicks off the firmware loading and parsing. That I had mostly already converted. Yay!

But from there, it all sort of exploded.

Toward the end of the firmware loading, it calls the start function provided for the "op mode" (the firmware style, which for now would always be the MVM firmware for me). And that's where it gets ugly.

At that point, the process bounces back and forth between PCIe code, MVM firmware code, and generic driver code. Mostly there are layers upon layers, where the firmware code calls some generic code that resolves to PCIe code, which calls some generic code that resolves to MVM firmware code, and so on. I tried to count only calls that were non-trivial: not just performing some simple logic or calling just one other function or reducing to memory reads or writes. I also didn't count a call to any given function past the first time. I still got to nearly 50 major functions.

Worse were some of the convolutions. Interrupts were going up and down and up and down, there were command packets sent, calls to set various configuration bits spread out all over the place...

I get that there was an effort to make some of this stuff generic and reusable. It's not hardcoded to a specific firmware type, and it's not hardcoded to the PCIe bus (though, why not, since there's not another bus used?). Probably this would look sorta sensible if (when?) I convert it to C++ and you get a generic base class with a couple specific subclasses.

But to try to understand the startup process, it's a huge mess. What are the odds it's really all correct? And if anything goes wrong, how are you supposed to figure out where?

I don't know, I mean, obviously it works when you run this thing in Linux. But the complexity just rubs me the wrong way. Startup is the kind of thing I'd strive to make really simple, not really, er, "normalized."

In practical terms, it's going to be really hard to implement because it touches so many parts of so many files at one point or another. And yes, somewhere it in there it sends the firmware chunks I parsed out to the hardware -- but don't ask me to point to where in the list that happens. I just recall that it's in there somewhere.

So that's the step back. I'm not even talking about connecting to a wireless network yet, I'm just trying to turn the thing on. And all this progress I made on firmware loading, that's less than one step out of fifty. Yippee.

Reference


Here's my list of steps in the startup process. You can say a little prayer for me after you read it. (Even if it's only to the Flying Spaghetti Monster -- I'll take his help too.)

<< Prev: Compiler Dysfunction       Next: Wait Queues >>

Compiler Dysfunction

<< Prev: Linux Upgrade       Next: 67 Steps >>

Pop Quiz: what's the purpose of a compiler?

Three things come to mind:
  • Make it so you don't have to write in assembly
  • Ensure that you use the programming language correctly (e.g. no misspelled keywords or basic syntax errors)
  • Validate the API calls you make (e.g. not calling API functions that don't exist)

I mean, surely there are more, but that's my starter list.

Except the C compiler disagrees with me 33% of the time.

Which is to say, it apparently doesn't give a damn whether the API calls you make exist or not.

Uh, run that one by me again? So I can type code like this, and it totally compiles?

thisFunctionDoesNotExist(17);

Why yes, that's totally OK with this compiler.

Well that's easy -- the compiler is fired. Not doing it's job. I'll go find another one.

Except I don't get to do that, because you kind of have to use the Xcode compiler to build kexts.

This one is a complete mystery to me. You have to declare a struct before you use it, so why don't you have to declare a function before you call it? There exists such a thing called header files, for the express purpose of declaring things before you use them, right? We have this lovely tool called #include just to inform the compiler which header files we would like to reference, right?

OK, yes, I can compile the code and THEN run kextlibs and it can tell me if I'm calling functions that don't exist. But this is not what happens when I push the "build" button and it says "no errors".

Well, in fairness, there is a warning:
Warning:(736, 5) implicit declaration of function
                 'thisFunctionDoesNotExist' is invalid in C99

But there are (in this build) 175 warnings. Last time there were 473. (I don't know how it comes up with these numbers and why they're always different, but as my gripes with the compiler go, that's a pretty minor one.) The point is, there's no way I'm going to notice that one of the 175 or 473 or whatever warnings actually matters.

I'd written another paragraph or two of complaints, but I'll spare you the extra vitriol. There's good news! It turns out that there is a way to make this one warning into an error:
  1. In Xcode, select the Project and go to Build Settings and click All
  2. Look toward the bottom of the giant list under Apple LLVM 7.1 - Warnings - All Languages
  3. Set Treat Missing Function Prototypes as Errors to Yes
Whew!

<< Prev: Linux Upgrade       Next: 67 Steps >>

Linux Upgrade

<< Prev: kzalloc Crash       Next: Compiler Dysfunction >>

About this time, I noticed that Linux had been merrily marching on while I worked. I had started with the driver from Linux 4.4.1, and now they were up to Linux 4.4.3. Not a huge difference, but it seemed like a good time to check my theory that if I kept close to the original source, I could update easily. After all, if that didn't work, this whole approach was flawed.

The basic procedure turned out to be easy enough:

  1. I copied the updated source into the project tree
    • Now the "current state" represented the Linux code
    • A diff against source control might represent the inverse of a change I made -- if I reverted it, I'd be back to my original change
    • Or, a diff against source control might represent a change they made that wasn't in my project yet
  2. I removed the dvm/ directory for the firmware I'm not working with yet
  3. I moved iwl-7000.c and iwl-8000.c to the corresponding .h versions (I had changed them to headers since in my version there is no actual code just a bunch of declarations)
  4. I copied declarations from iwl-drv.c to iwl-drv.h (the iwl_drv struct and etc.)
  5. I copied declarations from pcie/drv.c code into device-list.h (the device list array)
  6. I had to go through file by file and examine the diffs and revert the changes that need it from theirs to mine
    • One common one was #include revisions
    • There were also a variety of minor syntax changes (MALLOC and etc.)
    • Plus any meatier changes, like all the debugging I added to the firmware parsing process

That last part was the tedious part. In the grand scheme of things I've made very few changes, and it was already a lot to review. Using a decent IDE made it all straightforward, but the time-taken-to-benefit ratio was still not in my favor.

Especially when I got through, only to discover that there were no changes except whitespace. Oops. That would have been good to know ahead of time.

Guess what? It turns out that's easy to check ahead of time. Here are the diffs from 4.4.1 to 4.4.3 (none). Whereas here are the diffs from 4.4.3 to today's current 4.4.6 (minor changes).

You know what would be really clever?

Instead of dumping the whole updated source tree and needing to review every single change... I could just apply the individual patches called out by the diffs above. Then instead of needing to review a thousand unchanged bits, I'd only need to massage the actual changed parts.

Someday, I promise, I'll come up with the better approach on the first try. Well, perhaps 'hope' would be a better word. :)

<< Prev: kzalloc Crash       Next: Compiler Dysfunction >>

kzalloc Crash

<< Prev: Thin Air       Next: Linux Upgrade >>


So there I am, merrily testing incremental changes to the driver, when BOOM! The test system restarted.

OK, so it must have been something I just did.

Except, all I just did was rearrange some code.  I still had a bunch of C functions and structs in my C++ class, and I was in the process of eliminating them in favor of the default versions included in the driver code.  I could see compile errors, but how did this cause a runtime error?

Well, one thing I had done was strip out a lot of the debugging messages I had added to the firmware parsing logic. I didn't really think that had caused the failure, but it meant I didn't know how far the process got before it died. So I added all those debug messages back in, along with a little delay at each step so I could read the messages before the restart.

Built: fine. Sent to test machine: fine. Loaded kext: fine. Log messages started pouring out. Everything looked fine. Then it finished, all good.

It worked. And that was the worst, because how could adding log messages or delays fix the problem? If it was one of those completely random errors, I was going to be devastated, because I'm not good enough to track that kind of thing down yet. I couldn't even begin to guess what I might have done to introduce a timing-sensitive or threading-sensitive kind of error.

I unloaded the kext, made some trivial change to add still more debugging, and tried again.

This time it failed, with loads of errors in the firmware parsing. It was fortunate that I was viewing the log remotely, so when the machine died, I still had the text in my terminal:

...
AppleIntelWiFiMVM       0xbcx 19 IWL_UCODE_TLV_SEC_RT
AppleIntelWiFiMVM iwl_store_ucode_sec BAD COUNTER -1912832000
...
AppleIntelWiFiMVM    0xa7428x 20 IWL_UCODE_TLV_SEC_INIT
AppleIntelWiFiMVM iwl_store_ucode_sec BAD COUNTER -1810778608
...
AppleIntelWiFiMVM   0x19343cx 21 IWL_UCODE_TLV_SEC_WOWLAN
AppleIntelWiFiMVM iwl_store_ucode_sec BAD COUNTER 1449124394
...
AppleIntelWiFiMVM Allocate ucode buffers...
AppleIntelWiFiMVM  ERR failed to allocate pci memory
AppleIntelWiFiMVM Firmware parse failed :(

So I poked around a little, starting with these counters with completely bizarre values. Indeed, they were supposed to just increment by one each time a relevant block was found in the firmware. How do you add one by one with a meaningful size file and end up with -1912832000? Except another one was 1449124394. What the heck?

Oh. Oh yeah. How about, where is that counter initialized?

Nowhere, that's where.

So in Java, there are two possibilities for an int. Number one it's a variable automatically initialized to zero (such as an instance variable), or number two you're forced to manually initialize it (such as a method variable). Either way, the compiler has your back.

In C, I guess, not so much.

As best I can figure, the code was assuming the counter was initialized to zero. But if that wasn't done automatically... Then it must have been assuming the memory it received was all zeroes, such that whatever memory the counter int came from started out being zero.

I backed up farther, until I found where I had replaced kzalloc with IOMalloc. IOMalloc doesn't seem to guarantee that the memory will be zeroed. It turns out that kzalloc does. The "z" in kzalloc is not the zone allocator, but the fact that the memory will be zeroed.

Was this plausible? Well, when I booted and loaded the kext, it received clean memory and worked fine. When I then unloaded it and loaded it again, I already know that the memory I get is coming from a zone allocator. So let's imagine it allocates some page of memory, issues chunks from that page to my kext, and then I give them back on unload. Then I load again, asking for chunks of the same size again. There's the page still there, and now with several chunks marked available. My kext gets some of the same chunks back again, or maybe some that some other program used, but either way no longer clean -- they have whatever state they had when freed before. So on the second load, those counters are initialized to garbage contents, and my kext blows up. Pretty much seems like what happened, so yeah, it's plausible.

Then I had to poke around for a memory allocation routine that does zero the memory on OS X. It doesn't help that there are about seventeen options (including at least, kmem_alloc and friends, kalloc and friends, zalloc, OSMalloc and friends, MALLOC, MALLOC_ZONE, IOMalloc and friends, and in C++ code, new). I finally settled on using the MALLOC macro:
MALLOC(pieces, struct iwl_firmware_pieces *,
               sizeof (*pieces), M_TEMP, M_ZERO);

The M_TEMP is because the call requires a "type" and that's a sane default. The M_ZERO calls for zeroed memory. As a bonus, I don't need to pass the allocation size to the FREE macro, so I could whack the extra code I added just to track allocation sizes.

I tried with my new and improved memory allocation, and I could load and unload several times with no problem. It's hard to prove that was the problem and solution, but I haven't seen the same bizarre counter errors since then.

The only issue is that MALLOC requires a parameter of the pointer type, which kzalloc doesn't. So I can't just make the substitution with another macro, I have to put in an actual code change every time. It's possible I should consider using a different allocation and free call combined with memset or something to make this a macro fix instead of a code fix -- but that's a subject for another time.

Oh yeah, and, who thought it was a good idea for an operating system to have this many ways to allocate memory? Sheesh!

<< Prev: Thin Air       Next: Linux Upgrade >>