racerrehabman

Just some techie stuff

Mavericks/Haswell Kernel Patch for Early Reboot

One of the big changes for mach_kernel in OS X Mavericks is the relocation of CPU power management into the kernel (for Haswell only). What used to happen in AppleIntelCPUPowerManagement.kext now happens in the kernel. Because this kernel code writes to MSRs that may be locked, it is a problem on computers where the BIOS cannot be patched, or where the owner simply does not want to patch it. But Apple did not include the code for XCPM (XNU CPU Power Management) in the sources provided for the 10.9 kernel, so if you build mach_kernel from sources, the XCPM code, and with it the code that writes to locked MSRs, is not included. I first wrote about this when I discovered it while trying to get Mavericks running on a Haswell-based HP laptop. See here: http://www.insanelymac.com/forum/topic/293503-haswell-early-reboot-mavericks-locked-msrs-and-hp-envy-15-j063cl-i7-4700mq/

As mentioned in that post, my reason for building the kernel myself was to explore the possibility of removing the code that manipulates these locked MSRs. Of course, I couldn’t find the code because Apple didn’t provide it. To my surprise, the kernel built without this code still worked, and I was able to cobble together a system that provided power management by using a patched AppleIntelCPUPowerManagement.kext from 10.8.5 together with this mach_kernel rebuilt from source. That success put the idea of finding a patch for the kernel Apple ships on hold, but my real goal was still to eventually come up with one.

My prior attempts to do this had not worked. They patched the kernel so that only writes to MSR 0xE2 were filtered out. Then Pike blogged about the possibility of eliminating all writes to MSRs (https://pikeralpha.wordpress.com/2013/11/23/experimental-bin-patch-for-maverick/). He was targeting the wrong section of code (_wrmsr_carefully is not called, at least not by mach_kernel itself), but I misread his post and thought he was targeting the correct section (see below), so I tried the approach of filtering out all MSR writes there. This worked (in fact I commented on his blog with a perl patch that does exactly that), so I wondered whether MSRs other than 0xE2 were also locked and being written to. This turned out not to be the case. My original patch was simply not done correctly. The breakthrough was realizing that I had been targeting the right section of code all along, and that my patch must be in error or incomplete.

This is the code in mach_kernel that poses the problem (otool -tV mach_kernel):

ffffff80002f9ba0 pushq %rbp
ffffff80002f9ba1 movq %rsp, %rbp
ffffff80002f9ba4 movl %edx, %r8d
ffffff80002f9ba7 testl %esi, %esi
ffffff80002f9ba9 je 0xffffff80002f9c17
ffffff80002f9bab addq $0x28, %rdi
ffffff80002f9baf nop
ffffff80002f9bb0 movl _xcpm_cpu_model(%rip), %eax
ffffff80002f9bb6 testl 0xffffffffffffffdc(%rdi), %eax
ffffff80002f9bb9 je 0xffffff80002f9c0f
ffffff80002f9bbb movl 0xffffffffffffffd8(%rdi), %ecx
ffffff80002f9bbe testl %r8d, %r8d
ffffff80002f9bc1 je 0xffffff80002f9bcb
ffffff80002f9bc3 cmpl %r8d, %ecx
ffffff80002f9bc6 movl %r8d, %ecx
ffffff80002f9bc9 jne 0xffffff80002f9c0f
ffffff80002f9bcb rdmsr
ffffff80002f9bcd movl %eax, %eax
ffffff80002f9bcf shlq $0x20, %rdx
ffffff80002f9bd3 orq %rax, %rdx
ffffff80002f9bd6 movq %rdx, 0xfffffffffffffff8(%rdi)
ffffff80002f9bda movq 0xffffffffffffffe8(%rdi), %rax
ffffff80002f9bde testq %rax, %rax
ffffff80002f9be1 je 0xffffff80002f9be9
ffffff80002f9be3 notq %rax
ffffff80002f9be6 andq %rax, %rdx
ffffff80002f9be9 orq 0xfffffffffffffff0(%rdi), %rdx
ffffff80002f9bed movq %rdx, %r9
ffffff80002f9bf0 shrq $0x20, %r9
ffffff80002f9bf4 movl %edx, %eax
ffffff80002f9bf6 movl 0xffffffffffffffd8(%rdi), %ecx
ffffff80002f9bf9 movq %r9, %rdx
ffffff80002f9bfc wrmsr
ffffff80002f9bfe movl 0xffffffffffffffd8(%rdi), %ecx
ffffff80002f9c01 rdmsr
ffffff80002f9c03 movl %eax, %eax
ffffff80002f9c05 shlq $0x20, %rdx
ffffff80002f9c09 orq %rax, %rdx
ffffff80002f9c0c movq %rdx, (%rdi)
ffffff80002f9c0f addq $0x30, %rdi
ffffff80002f9c13 decl %esi
ffffff80002f9c15 jne 0xffffff80002f9bb0
ffffff80002f9c17 popq %rbp
ffffff80002f9c18 ret
ffffff80002f9c19 nop
ffffff80002f9c1a nop
ffffff80002f9c1b nop
ffffff80002f9c1c nop
ffffff80002f9c1d nop
ffffff80002f9c1e nop
ffffff80002f9c1f nop

At 2f9bfc, you can see the wrmsr. It has the potential to write to any register, because this function walks through a table of entries, each 48 bytes in size, where each entry contains the register to write to.
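The table walk can be sketched in Python to make the layout concrete. This is only a sketch: the field names are illustrative labels inferred from the offsets the disassembly uses (the function never touches the 8 bytes at offset 8), and the sample bytes are the first entry of the table dump shown further down.

```python
import struct

ENTRY_SIZE = 0x30  # 48 bytes per entry, matching "addq $0x30, %rdi"

# Illustrative names (not Apple's); offsets inferred from the disassembly.
FIELDS = ("msr", "model_mask", "unused", "clear_mask", "set_mask",
          "saved", "readback")


def parse_entries(table: bytes):
    """Split a raw table into dicts, one per 48-byte entry."""
    entries = []
    for off in range(0, len(table), ENTRY_SIZE):
        values = struct.unpack_from("<IIQQQQQ", table, off)
        entries.append(dict(zip(FIELDS, values)))
    return entries


def value_to_write(old: int, entry: dict) -> int:
    """Value handed to wrmsr: clear the masked bits, then OR in the set bits."""
    new = (old & ~entry["clear_mask"]) | entry["set_mask"]
    return new & 0xFFFFFFFFFFFFFFFF


# First entry of the dump (file offsets 062bc80..062bcaf).
first = bytes.fromhex(
    "e2000000" "02000000" "0000000000000000"
    "0004000000000000" "0700001e00000000"
    "0000000000000000" "0000000000000000"
)
entry = parse_entries(first)[0]
print(hex(entry["msr"]), hex(entry["clear_mask"]), hex(entry["set_mask"]))
```

Read this way, the first entry asks for MSR 0xE2 to have the bits in 0x400 cleared and the bits in 0x1e000007 set.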

Note that there are seven nop codes following the function (padding), which gives us room to grow the function by 7 bytes and add code that filters out the writes to MSR 0xE2. My first attempt was to simply try a compare/conditional jump:

                 cmp $0xE2, %ecx
                 je skipwrmsr
ffffff80002f9bfc wrmsr
skipwrmsr:
ffffff80002f9bfe movl 0xffffffffffffffd8(%rdi), %ecx
ffffff80002f9c01 rdmsr
ffffff80002f9c03 movl %eax, %eax

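The problem is fitting the two new instructions into only 7 bytes. I could not find a way: cmp with a byte immediate sign-extends the immediate, so 0xE2 would be compared as 0xFFFFFFE2, and the form with a full 32-bit immediate is too long once the conditional jump is added.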
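The byte budget can be checked with a few lines of Python. The encodings below are the standard x86 forms for cmp against %ecx (opcode 83 with an 8-bit immediate, opcode 81 with a 32-bit immediate) and are written out here as an illustration, not taken from the kernel.

```python
def sign_extend_imm8(imm8: int) -> int:
    """Widen an 8-bit immediate to 32 bits, as the short cmp form does."""
    return (imm8 - 0x100 if imm8 & 0x80 else imm8) & 0xFFFFFFFF


CMP_IMM8 = bytes.fromhex("83f9e2")         # cmpl with 8-bit immediate, 3 bytes
CMP_IMM32 = bytes.fromhex("81f9e2000000")  # cmpl $0xe2, %ecx, 6 bytes
JE_REL8 = bytes.fromhex("7402")            # je over the 2-byte wrmsr

SPARE = 7  # nop padding available after the function

print(hex(sign_extend_imm8(0xE2)))        # what the short form really compares
print(len(CMP_IMM8) + len(JE_REL8))       # fits, but tests the wrong value
print(len(CMP_IMM32) + len(JE_REL8))      # correct value, but exceeds SPARE
```

The short form fits in the padding but compares against the wrong value, and the long form is one byte over.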

Instead, I decided to simply check for zero in the table and then modify the table to zero out the entries with 0xE2:

                 testl %ecx, %ecx     ;85 c9
                 je skipwrmsr         ;74 07
ffffff80002f9bfc wrmsr
ffffff80002f9bfe movl 0xffffffffffffffd8(%rdi), %ecx
ffffff80002f9c01 rdmsr
skipwrmsr:
ffffff80002f9c03 movl %eax, %eax

Because we are inserting 4 bytes of extra code to implement this, a few other conditional jumps must be modified. It is useful to add some labels to the original code and add the opcodes as comments before attempting to build a perl script for the patch:

ffffff80002f9ba0 pushq %rbp                           ;55
ffffff80002f9ba1 movq %rsp, %rbp                      ;48 89 e5
ffffff80002f9ba4 movl %edx, %r8d                      ;41 89 d0
ffffff80002f9ba7 testl %esi, %esi                     ;85 f6
ffffff80002f9ba9 je 0xffffff80002f9c17 ;return1       ;74 6c
ffffff80002f9bab addq $0x28, %rdi                     ;48 83 c7 28
ffffff80002f9baf nop                                  ;90
loop1:
ffffff80002f9bb0 movl _xcpm_cpu_model(%rip), %eax     ;8b 05 5e 30 5e 00
ffffff80002f9bb6 testl 0xffffffffffffffdc(%rdi), %eax ;85 47 dc
ffffff80002f9bb9 je 0xffffff80002f9c0f ;continue1     ;74 54
ffffff80002f9bbb movl 0xffffffffffffffd8(%rdi), %ecx  ;8b 4f d8
ffffff80002f9bbe testl %r8d, %r8d                     ;45 85 c0
ffffff80002f9bc1 je 0xffffff80002f9bcb ;skip1         ;74 08
ffffff80002f9bc3 cmpl %r8d, %ecx                      ;44 39 c1
ffffff80002f9bc6 movl %r8d, %ecx                      ;44 89 c1
ffffff80002f9bc9 jne 0xffffff80002f9c0f ;continue1    ;75 44
skip1:
ffffff80002f9bcb rdmsr                                ;0f 32
ffffff80002f9bcd movl %eax, %eax                      ;89 c0
ffffff80002f9bcf shlq $0x20, %rdx                     ;48 c1 e2 20
ffffff80002f9bd3 orq %rax, %rdx                       ;48 09 c2
ffffff80002f9bd6 movq %rdx, 0xfffffffffffffff8(%rdi)  ;48 89 57 f8
ffffff80002f9bda movq 0xffffffffffffffe8(%rdi), %rax  ;48 8b 47 e8
ffffff80002f9bde testq %rax, %rax                     ;48 85 c0
ffffff80002f9be1 je 0xffffff80002f9be9 ;skip2         ;74 06
ffffff80002f9be3 notq %rax                            ;48 f7 d0
ffffff80002f9be6 andq %rax, %rdx                      ;48 21 c2
skip2:
ffffff80002f9be9 orq 0xfffffffffffffff0(%rdi), %rdx   ;48 0b 57 f0
ffffff80002f9bed movq %rdx, %r9                       ;49 89 d1
ffffff80002f9bf0 shrq $0x20, %r9                      ;49 c1 e9 20
ffffff80002f9bf4 movl %edx, %eax                      ;89 d0
ffffff80002f9bf6 movl 0xffffffffffffffd8(%rdi), %ecx  ;8b 4f d8
ffffff80002f9bf9 movq %r9, %rdx                       ;4c 89 ca
additional_code:
                 test %ecx,%ecx                       ;85 c9
                 je skip_rdmsr                        ;74 07
ffffff80002f9bfc wrmsr                                ;0f 30
ffffff80002f9bfe movl 0xffffffffffffffd8(%rdi), %ecx  ;8b 4f d8
ffffff80002f9c01 rdmsr                                ;0f 32
skip_rdmsr:
ffffff80002f9c03 movl %eax, %eax                      ;89 c0
ffffff80002f9c05 shlq $0x20, %rdx                     ;48 c1 e2 20
ffffff80002f9c09 orq %rax, %rdx                       ;48 09 c2
ffffff80002f9c0c movq %rdx, (%rdi)                    ;48 89 17
continue1:
ffffff80002f9c0f addq $0x30, %rdi                     ;48 83 c7 30
ffffff80002f9c13 decl %esi                            ;ff ce
ffffff80002f9c15 jne 0xffffff80002f9bb0 ;loop1        ;75 99
return1:
ffffff80002f9c17 popq %rbp                            ;5d
ffffff80002f9c18 ret                                  ;c3
ffffff80002f9c19 nop                                  ;90
ffffff80002f9c1a nop                                  ;90
ffffff80002f9c1b nop                                  ;90
ffffff80002f9c1c nop                                  ;90 ;; remove these 4 nops
ffffff80002f9c1d nop                                  ;90
ffffff80002f9c1e nop                                  ;90
ffffff80002f9c1f nop                                  ;90

As you can see, because the jump targets have moved, the conditional jumps at 2f9ba9, 2f9bb9, 2f9bc9, and 2f9c15 must be adjusted to accommodate the extra four bytes of code. My first attempt (prior to building the kernel from sources) only adjusted the last one (oops).
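The new displacement bytes can be recomputed mechanically. The following is a small Python sketch (the helper names are made up for illustration) that shifts every address at or after the insertion point by four bytes and re-derives the rel8 byte of each affected short jump.

```python
INSERT_AT = 0x2F9BFC  # address of the wrmsr, where the new code goes
GROW = 4              # testl %ecx,%ecx (2 bytes) + je (2 bytes)


def rel8(jump_addr: int, target: int) -> int:
    """Displacement byte of a 2-byte short jump located at jump_addr."""
    disp = target - (jump_addr + 2)  # relative to the next instruction
    if not -128 <= disp <= 127:
        raise ValueError("target out of range for a short jump")
    return disp & 0xFF


def shifted(addr: int) -> int:
    """Where an original address ends up after the insert."""
    return addr + GROW if addr >= INSERT_AT else addr


# (jump address, original target) of the four jumps that cross the insert
jumps = [
    (0x2F9BA9, 0x2F9C17),  # je  return1
    (0x2F9BB9, 0x2F9C0F),  # je  continue1
    (0x2F9BC9, 0x2F9C0F),  # jne continue1
    (0x2F9C15, 0x2F9BB0),  # jne loop1 (the jump itself moves)
]

results = []
for addr, target in jumps:
    old = rel8(addr, target)
    new = rel8(shifted(addr), shifted(target))
    results.append((old, new))
    print(f"{addr:x}: {old:02x} -> {new:02x}")
```

The two short jumps to skip1 and skip2 lie entirely before the insert, so their displacements stay as they are.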

The final perl patch is as follows:

perl -pi -e 's|\x74\x6c(\x48\x83\xc7\x28\x90\x8b\x05\x5e\x30\x5e\x00\x85\x47\xdc)\x74\x54(\x8b\x4f\xd8\x45\x85\xc0\x74\x08\x44\x39\xc1\x44\x89\xc1)\x75\x44(\x0f\x32\x89\xc0\x48\xc1\xe2\x20\x48\x09\xc2\x48\x89\x57\xf8\x48\x8b\x47\xe8\x48\x85\xc0\x74\x06\x48\xf7\xd0\x48\x21\xc2\x48\x0b\x57\xf0\x49\x89\xd1\x49\xc1\xe9\x20\x89\xd0\x8b\x4f\xd8\x4c\x89\xca)(\x0f\x30\x8b\x4f\xd8\x0f\x32\x89\xc0\x48\xc1\xe2\x20\x48\x09\xc2\x48\x89\x17\x48\x83\xc7\x30\xff\xce)\x75\x99(\x5d\xc3\x90{3})\x90{4}|\x74\x70${1}\x74\x58${2}\x75\x48${3}\x85\xc9\x74\x07${4}\x75\x95${5}|g' mach_kernel
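As a cross-check on the substitution above, the same edit can be rebuilt by offset in Python. This is a sketch, not a replacement for the perl patch: the function bytes are typed in from the annotated listing, the offsets are relative to the start of the function at 2f9ba0, and `apply_patch` is a name invented for this example.

```python
ORIGINAL = bytes.fromhex(
    "55 4889e5 4189d0 85f6 746c 4883c728 90 "
    "8b055e305e00 8547dc 7454 8b4fd8 4585c0 7408 "
    "4439c1 4489c1 7544 0f32 89c0 48c1e220 4809c2 "
    "488957f8 488b47e8 4885c0 7406 48f7d0 4821c2 "
    "480b57f0 4989d1 49c1e920 89d0 8b4fd8 4c89ca "
    "0f30 8b4fd8 0f32 89c0 48c1e220 4809c2 488917 "
    "4883c730 ffce 7599 5d c3 90909090909090"
)

WRMSR_OFF = 0x5C                   # 0x2f9bfc - 0x2f9ba0
GUARD = bytes.fromhex("85c97407")  # testl %ecx,%ecx ; je +7


def apply_patch(func: bytes) -> bytes:
    """Insert the guard before wrmsr and absorb it into the nop padding."""
    if func[WRMSR_OFF:WRMSR_OFF + 2] != b"\x0f\x30":
        raise ValueError("wrmsr not at the expected offset")
    if func[-len(GUARD):] != b"\x90" * len(GUARD):
        raise ValueError("not enough nop padding to absorb the insert")
    out = bytearray(func[:WRMSR_OFF] + GUARD + func[WRMSR_OFF:-len(GUARD)])
    for disp_off in (0x0A, 0x1A, 0x2A):  # forward jumps across the insert
        out[disp_off] += len(GUARD)
    out[0x7A] -= len(GUARD)  # backward "jne loop1", itself moved by 4 bytes
    return bytes(out)


patched = apply_patch(ORIGINAL)
print(len(ORIGINAL), len(patched))
print(patched[WRMSR_OFF:WRMSR_OFF + 6].hex())
```

Keeping the function the same length matters because nothing after it in the kernel image may move.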

With that patch in place, the resulting patched function looks like:

ffffff80002f9ba0 pushq %rbp
ffffff80002f9ba1 movq %rsp, %rbp
ffffff80002f9ba4 movl %edx, %r8d
ffffff80002f9ba7 testl %esi, %esi
ffffff80002f9ba9 je 0xffffff80002f9c1b
ffffff80002f9bab addq $0x28, %rdi
ffffff80002f9baf nop
ffffff80002f9bb0 movl _xcpm_cpu_model(%rip), %eax
ffffff80002f9bb6 testl 0xffffffffffffffdc(%rdi), %eax
ffffff80002f9bb9 je 0xffffff80002f9c13
ffffff80002f9bbb movl 0xffffffffffffffd8(%rdi), %ecx
ffffff80002f9bbe testl %r8d, %r8d
ffffff80002f9bc1 je 0xffffff80002f9bcb
ffffff80002f9bc3 cmpl %r8d, %ecx
ffffff80002f9bc6 movl %r8d, %ecx
ffffff80002f9bc9 jne 0xffffff80002f9c13
ffffff80002f9bcb rdmsr
ffffff80002f9bcd movl %eax, %eax
ffffff80002f9bcf shlq $0x20, %rdx
ffffff80002f9bd3 orq %rax, %rdx
ffffff80002f9bd6 movq %rdx, 0xfffffffffffffff8(%rdi)
ffffff80002f9bda movq 0xffffffffffffffe8(%rdi), %rax
ffffff80002f9bde testq %rax, %rax
ffffff80002f9be1 je 0xffffff80002f9be9
ffffff80002f9be3 notq %rax
ffffff80002f9be6 andq %rax, %rdx
ffffff80002f9be9 orq 0xfffffffffffffff0(%rdi), %rdx
ffffff80002f9bed movq %rdx, %r9
ffffff80002f9bf0 shrq $0x20, %r9
ffffff80002f9bf4 movl %edx, %eax
ffffff80002f9bf6 movl 0xffffffffffffffd8(%rdi), %ecx
ffffff80002f9bf9 movq %r9, %rdx
ffffff80002f9bfc testl %ecx, %ecx
ffffff80002f9bfe je 0xffffff80002f9c07
ffffff80002f9c00 wrmsr
ffffff80002f9c02 movl 0xffffffffffffffd8(%rdi), %ecx
ffffff80002f9c05 rdmsr
ffffff80002f9c07 movl %eax, %eax
ffffff80002f9c09 shlq $0x20, %rdx
ffffff80002f9c0d orq %rax, %rdx
ffffff80002f9c10 movq %rdx, (%rdi)
ffffff80002f9c13 addq $0x30, %rdi
ffffff80002f9c17 decl %esi
ffffff80002f9c19 jne 0xffffff80002f9bb0
ffffff80002f9c1b popq %rbp
ffffff80002f9c1c ret
ffffff80002f9c1d nop
ffffff80002f9c1e nop
ffffff80002f9c1f nop

That patch only makes it possible to replace 0xE2 (or any MSR number) in the tables with zero so that it is never written to. We still need a second patch to change the data tables themselves.

One such table is here (xxd < mach_kernel):

062bc80: e200 0000 0200 0000 0000 0000 0000 0000 ................ ; _xcpm_core_scope_msrs (begin) esi=3
062bc90: 0004 0000 0000 0000 0700 001e 0000 0000 ................
062bca0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
062bcb0: e200 0000 0c00 0000 0000 0000 0000 0000 ................
062bcc0: 0004 0000 0000 0000 0500 001e 0000 0000 ................
062bcd0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
062bce0: e200 0000 1000 0000 0000 0000 0000 0000 ................
062bcf0: 0004 0000 0000 0000 0800 007e 0000 0000 ...........~....
062bd00: 0000 0000 0000 0000 0000 0000 0000 0000 ................ ; _xcpm_core_scope_msrs (end)

The patch for this table (three entries) is:

perl -pi -e 's|\xe2(\x00{3}\x02\x00{12}\x04\x00{6}\x07\x00{2}\x1e\x00{20})\xe2(\x00{3}\x0c\x00{12}\x04\x00{6}\x05\x00{2}\x1e\x00{20})\xe2(\x00{3}\x10\x00{12}\x04\x00{6}\x08\x00{2}\x7e\x00{20})|\x00${1}\x00${2}\x00${3}|g' mach_kernel

With these two patches in place, you can boot OS X with the (patched) mach_kernel even when register 0xE2 is locked.
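What the table patch does can be sketched in Python as zeroing the 32-bit MSR number at the start of each matching 48-byte entry. The `entry` helper and the assumed layout (MSR number first, little-endian) are illustrative; the three entries are rebuilt from the hex dump above.

```python
import struct

ENTRY_SIZE = 0x30  # 48 bytes per entry


def zero_msr(table: bytes, msr: int = 0xE2) -> bytes:
    """Return a copy of the table with matching MSR numbers replaced by 0."""
    out = bytearray(table)
    for off in range(0, len(out), ENTRY_SIZE):
        (number,) = struct.unpack_from("<I", out, off)
        if number == msr:
            struct.pack_into("<I", out, off, 0)
    return bytes(out)


def entry(model_mask: int, set_mask: int) -> bytes:
    """Rebuild one dump entry: MSR 0xE2, clear mask 0x400, empty result slots."""
    return struct.pack("<IIQQQQQ", 0xE2, model_mask, 0, 0x400, set_mask, 0, 0)


table = (entry(0x02, 0x1E000007)
         + entry(0x0C, 0x1E000005)
         + entry(0x10, 0x7E000008))
patched = zero_msr(table)

for off in range(0, len(patched), ENTRY_SIZE):
    print(patched[off:off + 8].hex())
```

Only the first byte of each entry changes, which is exactly what the perl substitution does with its three `\xe2` to `\x00` replacements.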

Update #1

I managed to find a 5-byte opcode for a (non-sign-extending) cmpw $0xe2,%cx:

            cmpw    $0x00e2,%cx             ;66 81 f9 e2 00
            je  skip3                   ;74 02

Note that we are comparing cx here, not ecx, so only the low-order 16 bits of ecx are compared instead of the full 32 bits. But in all cases the high-order 16 bits of ecx are zero (refer to the tables passed to this function in rdi, with the entry count in esi). Not surprising, since MSR register numbers are not that large.

The side benefit of this is no patching is necessary to the data tables that drive this function, and the rdmsr is allowed to execute.

As such, the following single perl patch can be used instead:

perl -pi -e 's|\x74\x6c(\x48\x83\xc7\x28\x90\x8b\x05\x5e\x30\x5e\x00\x85\x47\xdc)\x74\x54(\x8b\x4f\xd8\x45\x85\xc0\x74\x08\x44\x39\xc1\x44\x89\xc1)\x75\x44(\x0f\x32\x89\xc0\x48\xc1\xe2\x20\x48\x09\xc2\x48\x89\x57\xf8\x48\x8b\x47\xe8\x48\x85\xc0\x74\x06\x48\xf7\xd0\x48\x21\xc2\x48\x0b\x57\xf0\x49\x89\xd1\x49\xc1\xe9\x20\x89\xd0\x8b\x4f\xd8\x4c\x89\xca)(\x0f\x30\x8b\x4f\xd8\x0f\x32\x89\xc0\x48\xc1\xe2\x20\x48\x09\xc2\x48\x89\x17\x48\x83\xc7\x30\xff\xce)\x75\x99(\x5d\xc3)\x90{7}|\x74\x73${1}\x74\x5b${2}\x75\x4b${3}\x66\x81\xf9\xe2\x00\x74\x02${4}\x75\x92${5}|g' mach_kernel
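The behavior of the loop with the cmpw guard in place can be modeled in Python. This is a toy model under stated assumptions: `FakeCpu` is invented for the example, a raised exception stands in for whatever fault the locked write causes on real hardware, the CPU-model and r8d checks are left out, and the second table entry and the starting MSR value are made up.

```python
class FakeCpu:
    """Toy MSR file; writing a locked register raises instead of faulting."""

    def __init__(self, msrs, locked):
        self.msrs = dict(msrs)
        self.locked = set(locked)
        self.writes = []

    def rdmsr(self, number):
        return self.msrs.get(number, 0)

    def wrmsr(self, number, value):
        if number in self.locked:
            raise RuntimeError(f"write to locked MSR {number:#x}")
        self.msrs[number] = value
        self.writes.append(number)


def run_table(cpu, entries, filter_e2):
    """Walk the entries the way the loop does, optionally with the guard."""
    for e in entries:
        ecx = e["msr"]
        old = cpu.rdmsr(ecx)
        new = (old & ~e["clear_mask"]) | e["set_mask"]
        # cmpw $0x00e2,%cx ; je skip3 -> only the low 16 bits are compared
        if not (filter_e2 and (ecx & 0xFFFF) == 0x00E2):
            cpu.wrmsr(ecx, new)
        e["readback"] = cpu.rdmsr(ecx)  # this rdmsr still runs when skipping


entries = [
    {"msr": 0xE2, "clear_mask": 0x400, "set_mask": 0x1E000007},
    {"msr": 0x123, "clear_mask": 0x0, "set_mask": 0x1},  # hypothetical entry
]
cpu = FakeCpu({0xE2: 0x8403}, locked={0xE2})
run_table(cpu, entries, filter_e2=True)
print([hex(n) for n in cpu.writes])
```

With the guard the locked register is read but never written, while every other entry is processed as before.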

Note that this patch uses all seven “spare” nop codes at the end of the function. But a closer look at this code reveals a poorly optimized function anyway. For example, there are two instances of movl %eax,%eax:

ffffff80002f9bcd    movl    %eax, %eax              ;89 c0 (like a nop)
...
ffffff80002f9c03    movl    %eax, %eax              ;89 c0 (like a nop)

Note that even the simplest post-codegen optimizer should be able to remove such code.

And the reload of ecx after wrmsr is not necessary:

ffffff80002f9bfe    movl    0xffffffffffffffd8(%rdi), %ecx      ;8b 4f d8 (not necessary!)

So there are actually seven extra bytes, for a total of 14, that could be used for extra code here.

Update #2

This patch is also applicable to the 10.8.5 kernel. I haven’t tested it, but the code is exactly the same (and at the same location) in the 10.8.5 kernel.

Note also that the Clover team has added this patch as an automatic on-the-fly patch. Just set “KernelPm” to true in your config.plist (same section as “KernelLapic” would go).

Update #3

Scratch that on the 10.8.5 kernel. It is slightly different (I got my USB sticks confused!). I will provide an updated patch in a jiffy…

Update #4

Here is the 10.8.5 patch (not tested, but code review looks ok). I will test this later when I have a chance to install 10.8.5 on my laptop.

perl -pi -e 's|\x74\x69(\x48\x83\xc7\x28\x90\x8b\x05\xfe\xce\x5f\x00\x85\x47\xdc)\x74\x51(\x8b\x4f\xd8\x45\x85\xc0\x74\x05\x44\x39\xc1)\x75\x44(\x0f\x32\x89\xc0\x48\xc1\xe2\x20\x48\x09\xc2\x48\x89\x57\xf8\x48\x8b\x47\xe8\x48\x85\xc0\x74\x06\x48\xf7\xd0\x48\x21\xc2\x48\x0b\x57\xf0\x49\x89\xd1\x49\xc1\xe9\x20\x89\xd0\x8b\x4f\xd8\x4c\x89\xca)(\x0f\x30\x8b\x4f\xd8\x0f\x32\x89\xc0\x48\xc1\xe2\x20\x48\x09\xc2\x48\x89\x17\x48\x83\xc7\x30\xff\xce)\x75\x9c(\x5d\xc3)\x90{7}(\x90{3})|\x74\x70${1}\x74\x58${2}\x75\x4b${3}\x66\x81\xf9\xe2\x00\x74\x02${4}\x75\x95${5}${6}|g' mach_kernel

Update #5

You may have noticed that Apple has recently released 10.9.1. There are two versions of the 10.9.1 update, one for the new Haswell MacBookPro11,x (retina) laptops and another for the rest. The one for Haswell retina machines has a new mach_kernel, version 13.0.2. The code has changed only slightly but still requires a new patch.

The changed patch is as follows:

perl -pi -e 's|\x74\x6c(\x48\x83\xc7\x28\x90\x8b\x05\x46\x37\x5e\x00\x85\x47\xdc)\x74\x54(\x8b\x4f\xd8\x45\x85\xc0\x74\x08\x44\x39\xc1\x44\x89\xc1)\x75\x44(\x0f\x32\x89\xc0\x48\xc1\xe2\x20\x48\x09\xc2\x48\x89\x57\xf8\x48\x8b\x47\xe8\x48\x85\xc0\x74\x06\x48\xf7\xd0\x48\x21\xc2\x48\x0b\x57\xf0\x49\x89\xd1\x49\xc1\xe9\x20\x89\xd0\x8b\x4f\xd8\x4c\x89\xca)(\x0f\x30\x8b\x4f\xd8\x0f\x32\x89\xc0\x48\xc1\xe2\x20\x48\x09\xc2\x48\x89\x17\x48\x83\xc7\x30\xff\xce)\x75\x99(\x5d\xc3)\x90{7}|\x74\x73${1}\x74\x5b${2}\x75\x4b${3}\x66\x81\xf9\xe2\x00\x74\x02${4}\x75\x92${5}|g' mach_kernel

Only two bytes differ in the search pattern, but since these two bytes are likely to change in each release of the kernel, it probably makes sense to use a wildcard for them. Here is a modified patch that works on both current 10.9 kernels (13.0.0, 13.0.2… what happened to 13.0.1?):

perl -pi -e 's|\x74\x6c(\x48\x83\xc7\x28\x90\x8b\x05..\x5e\x00\x85\x47\xdc)\x74\x54(\x8b\x4f\xd8\x45\x85\xc0\x74\x08\x44\x39\xc1\x44\x89\xc1)\x75\x44(\x0f\x32\x89\xc0\x48\xc1\xe2\x20\x48\x09\xc2\x48\x89\x57\xf8\x48\x8b\x47\xe8\x48\x85\xc0\x74\x06\x48\xf7\xd0\x48\x21\xc2\x48\x0b\x57\xf0\x49\x89\xd1\x49\xc1\xe9\x20\x89\xd0\x8b\x4f\xd8\x4c\x89\xca)(\x0f\x30\x8b\x4f\xd8\x0f\x32\x89\xc0\x48\xc1\xe2\x20\x48\x09\xc2\x48\x89\x17\x48\x83\xc7\x30\xff\xce)\x75\x99(\x5d\xc3)\x90{7}|\x74\x73${1}\x74\x5b${2}\x75\x4b${3}\x66\x81\xf9\xe2\x00\x74\x02${4}\x75\x92${5}|g' mach_kernel
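The wildcard idea can be tried out in Python on just the instruction that changed. The two kernel samples below are cut from the search patterns above; the third one is made up to show a caveat: in Python, as in Perl, `.` does not match a newline byte (0x0a) by default, so a displacement that happened to contain 0x0a would need DOTALL here (the /s modifier in Perl).

```python
import re

# movl _xcpm_cpu_model(%rip),%eax ; testl -0x24(%rdi),%eax
# with the two low displacement bytes left open
LOAD_MODEL = re.compile(rb"\x8b\x05..\x5e\x00\x85\x47\xdc", re.DOTALL)
NO_DOTALL = re.compile(rb"\x8b\x05..\x5e\x00\x85\x47\xdc")

kernel_13_0_0 = bytes.fromhex("8b055e305e008547dc")
kernel_13_0_2 = bytes.fromhex("8b0546375e008547dc")
hypothetical = bytes.fromhex("8b050a375e008547dc")  # made-up displacement

for name, blob in [("13.0.0", kernel_13_0_0),
                   ("13.0.2", kernel_13_0_2),
                   ("hypothetical", hypothetical)]:
    print(name,
          bool(LOAD_MODEL.search(blob)),
          bool(NO_DOTALL.search(blob)))
```

Both real kernels match either way; only the made-up sample depends on how `.` treats 0x0a.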

Update #6

The same patch provided above works for the 10.9.2 mach_kernel (13.1.0).

Written by racerrehabman

2013/11/25 at 05:28

Posted in Computers