1. Objectives

It’s 2023, and MD5 is about the most basic signature algorithm there is. If you still just run plain MD5 over the input, your peers will laugh at you. Adding a salt is a basic improvement, but with the job market this competitive, a salt alone is not enough.

Today we look at ways to modify MD5 so that the algorithm looks more advanced.

  1. Vegetable rolls

The simplest way to modify it is to change the initial parameters of MD5.

     context->state[0] = 0x67452301;
     context->state[1] = 0xEFCDAB89;
     context->state[2] = 0x98BADCFE;
     context->state[3] = 0x10325476;

Changing these four constants is enough to change the MD5 output. But this modification is too simple to impress anyone.
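To see the effect of the initial state, here is a minimal pure-Python sketch of MD5 with the four state words exposed as a parameter. The name md5_hex and the replacement values mentioned in the note below are made up for illustration; they are not the values used by the sample.

```python
import hashlib
import math
import struct

# Per-step rotation amounts and the sine-derived constant table (RFC 1321).
S = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
K = [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]
STANDARD_IV = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)


def rotl(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def md5_hex(data, iv=STANDARD_IV):
    """MD5 with a configurable initial state; returns uppercase hex."""
    # Padding: 0x80, zeros up to 56 mod 64, then the bit length as 64-bit LE.
    msg = data + b"\x80" + b"\x00" * ((55 - len(data)) % 64)
    msg += struct.pack("<Q", (len(data) * 8) & 0xFFFFFFFFFFFFFFFF)
    a0, b0, c0, d0 = iv
    for off in range(0, len(msg), 64):
        m = struct.unpack("<16I", msg[off:off + 64])
        a, b, c, d = a0, b0, c0, d0
        for i in range(64):
            if i < 16:
                f, g = (b & c) | (~b & d), i
            elif i < 32:
                f, g = (b & d) | (c & ~d), (5 * i + 1) % 16
            elif i < 48:
                f, g = b ^ c ^ d, (3 * i + 5) % 16
            else:
                f, g = c ^ (b | ~d), (7 * i) % 16
            f = (f + a + K[i] + m[g]) & 0xFFFFFFFF
            a, d, c, b = d, c, b, (b + rotl(f, S[i])) & 0xFFFFFFFF
        # Add this block's result into the running state.
        a0, b0, c0, d0 = [(x + y) & 0xFFFFFFFF
                          for x, y in zip((a0, b0, c0, d0), (a, b, c, d))]
    # The digest is the four state words serialized in little-endian order.
    return struct.pack("<4I", a0, b0, c0, d0).hex().upper()


# Sanity check: with the standard state the sketch matches hashlib.
assert md5_hex(b"abc") == hashlib.md5(b"abc").hexdigest().upper()
```

With the default iv the function agrees with hashlib.md5. Passing any other four words, for example the hypothetical (0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210), gives a different digest for the same input.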

Next, we look at more advanced techniques.

  2. Meat roll

MD5 performs 64 steps (four rounds of 16), and each step uses one constant from a 64-entry constant table K.

The standard value of K[i] is the integer part of 2^32 * |sin(i)|, with i running from 1 to 64 in radians.

Ambitious readers can then change the K values, for example by replacing sin with cos or tan, to take the modification up a level.
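The constant table can be generated rather than hard-coded. The sketch below builds K from a pluggable function; make_k is a hypothetical helper, and masking to 32 bits is an assumption about how a variant would handle tan, whose absolute value can exceed 1.

```python
import math


def make_k(fn=math.sin):
    """Build a 64-entry table from the integer part of 2^32 * |fn(i)|, i = 1..64."""
    # The mask keeps every entry within 32 bits even when |fn(i)| > 1.
    return [int(abs(fn(i)) * 2**32) & 0xFFFFFFFF for i in range(1, 65)]


standard = make_k()              # the table standard MD5 uses
cos_variant = make_k(math.cos)   # modified tables, as the text suggests
tan_variant = make_k(math.tan)

print(hex(standard[0]))  # 0xd76aa478, the first constant listed in RFC 1321
```

When recovering an unknown sample, comparing the first few constants found in the binary against standard[0], standard[1], and so on is a quick way to tell whether the table was changed.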

  3. Roll within a roll

     #define F(x,y,z) (((x) & (y)) | (~(x) & (z)))
     #define G(x,y,z) (((x) & (z)) | ((y) & ~(z)))
     #define H(x,y,z) ((x) ^ (y) ^ (z))
     #define I(x,y,z) ((y) ^ ((x) | ~(z)))

     #define ROTATE_LEFT(x,n) (((x) << (n)) | ((x) >> (32-(n))))

To take it all the way, we change the four nonlinear functions F, G, H, and I in MD5, for example by adding an XOR or removing an AND. This advanced method can fool the boss and make the algorithm look sophisticated.
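When recovering a modified MD5, a quick way to tell whether one of these functions was altered is to compare truth tables: each function works bit by bit, so its behavior on the eight combinations of single-bit inputs determines it completely. The sketch below does that in Python; truth_table, identify, and F_tweaked are hypothetical names, and the tweak is an invented example, not the one in the sample.

```python
MASK = 0xFFFFFFFF


def F(x, y, z):
    return ((x & y) | (~x & z)) & MASK


def G(x, y, z):
    return ((x & z) | (y & ~z)) & MASK


def H(x, y, z):
    return (x ^ y ^ z) & MASK


def I(x, y, z):
    return (y ^ (x | ~z)) & MASK


def truth_table(fn):
    """Evaluate fn on all eight single-bit inputs and keep bit 0 of each result."""
    return tuple(fn(x, y, z) & 1 for x in (0, 1) for y in (0, 1) for z in (0, 1))


STANDARD = {truth_table(fn): name for name, fn in (("F", F), ("G", G), ("H", H), ("I", I))}


def identify(fn):
    """Return the name of the standard function fn matches, or "modified"."""
    return STANDARD.get(truth_table(fn), "modified")


def F_tweaked(x, y, z):
    # Invented modification for illustration only.
    return ((x & y) ^ z) & MASK


print(identify(F_tweaked))                                   # modified
print(identify(lambda x, y, z: ((x & y) ^ (~x & z)) & MASK))  # F
```

The second call shows that replacing the | in F with ^ changes nothing, because x & y and ~x & z can never both be 1. A tweak has to alter the truth table to alter the hash.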

Our goal today is to recover a modified MD5 algorithm and, through this exercise, learn the basic methods of algorithm recovery.

In this sample, our input parameter is a string: “1677038066553”

The return value is 32 characters: “DD89CA684D91818B970710F75A75743D”

2. Steps

Step 1

We need to use Unidbg to run the algorithm. Compared with the old days, when our predecessors debugged with IDA, Unidbg reduces the difficulty of recovering an algorithm by an order of magnitude.

Step 2

We need to work backward step by step from the result Z to the original input A. Reasoning from effect back to cause is a basic routine of reverse analysis.
Assuming that this sample is MD5 or a modified MD5, we can use the following methods to recover the algorithm:
1. Debug breakpoints
2. Conditional breakpoints
3. Data printing
4. Trace memory reading and writing
5. Trace code

1. Debug breakpoints

Reverse analysis is an empirical science. Although there are some basic routines, it still comes down to trial and error. First, open libnative-lib.so in IDA and find the exported function Java_com_littleq_cryptography_md5_MainActivity_sign in the Exports table.

The start address of this function is 0x1234, and the end address is 0x12B4, but the main code logic is in the function sub_A3C. Let’s try setting a breakpoint at the end of the sub_A3C function.

.text:00000000000011D4 E0 07 40 F9                 LDR             X0, [SP,#0x110+var_108]
.text:00000000000011D8 03 00 00 90+                ADRL            X3, aSSSS ; "%s%s%s%s"
.text:00000000000011D8 63 EC 0A 91
.text:00000000000011E0 E4 83 01 91                 ADD             X4, SP, #0x110+var_B0
.text:00000000000011E4 E5 43 01 91                 ADD             X5, SP, #0x110+var_C0
.text:00000000000011E8 E6 03 01 91                 ADD             X6, SP, #0x110+var_D0
.text:00000000000011EC E7 C3 00 91                 ADD             X7, SP, #0x110+var_E0
.text:00000000000011F0 01 00 80 92                 MOV             X1, #0xFFFFFFFFFFFFFFFF
.text:00000000000011F4 02 08 80 52                 MOV             W2, #0x40 ; '@'

The instruction at 0x11D8 loads the address of what looks like a format string, "%s%s%s%s", into X3.

We set a breakpoint at 0x11D8 in Unidbg:

    Debugger debugger = emulator.attach();
    debugger.addBreakPoint(module.base + 0x11D8);

Run it, and execution stops at the breakpoint as expected.

debugger break at: 0x400011d8 @ Function64 address=0x40001234, arguments=[unidbg@0xfffe1640[libandroid.so]0x640, 1853170425, 2008362258]
>>> x0=0xbffff690(-1073744240) x1=0x0 x2=0x4 x3=0xbfffed20 x4=0x40230200 x5=0x402302c0 x6=0x1 x7=0xbffff708 x8=0x0 x9=0x0 x10=0x1 x11=0x0 x12=0x8 x13=0x8 x14=0x8
>>> x15=0x8 x16=0x40228d70 x17=0x40177ddc x18=0x8 x19=0x4cf3a208 x20=0x400012b8 x21=0x0 x22=0x68ca89dd x23=0x3d74755a x24=0x72e737bb x25=0xddf5ac1 x26=0xd0d5adc6 x27=0x8b81914d x28=0xf7100797 fp=0xbffff680
LR=RX@0x400011d4[libnative-lib.so]0x11d4
SP=0xbffff570
PC=RX@0x400011d8[libnative-lib.so]0x11d8
nzcv: N=0, Z=1, C=1, V=0, EL0, use SP_EL0
start + 0xae8
=> *[libnative-lib.so*0x011d8]*[03000090]*0x400011d8:*"adrp x3, #0x40001000"
    [libnative-lib.so 0x011dc] [63ec0a91] 0x400011dc: "add x3, x3, #0x2bb"
    [libnative-lib.so 0x011e0] [e4830191] 0x400011e0: "add x4, sp, #0x60"
    [libnative-lib.so 0x011e4] [e5430191] 0x400011e4: "add x5, sp, #0x50"
    [libnative-lib.so 0x011e8] [e6030191] 0x400011e8: "add x6, sp, #0x40"
    [libnative-lib.so 0x011ec] [e7c30091] 0x400011ec: "add x7, sp, #0x30"
    [libnative-lib.so 0x011f0] [01008092] 0x400011f0: "mov x1, #-1"
    [libnative-lib.so 0x011f4] [02088052] 0x400011f4: "mov w2, #0x40"
    [libnative-lib.so 0x011f8] [5bfdff97] 0x400011f8: "bl #0x40000764"

On AArch64, the first eight integer or pointer arguments of a function call are passed in registers x0 through x7.

From this code, we can see that the instruction at 0x400011f8 calls the function at 0x40000764, and that all eight argument registers, x0 through x7, are set up beforehand.
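The call target can be checked by hand from the instruction bytes in the log. A minimal sketch, assuming the standard A64 encoding of BL (opcode in the top six bits, a signed 26-bit word offset in the rest); decode_bl is a made-up name.

```python
import struct


def decode_bl(code_hex, pc):
    """Return the branch target of an A64 BL instruction located at address pc."""
    # Unidbg prints instruction bytes in memory order, which is little-endian.
    (insn,) = struct.unpack("<I", bytes.fromhex(code_hex))
    if insn >> 26 != 0b100101:
        raise ValueError("not a BL instruction")
    imm26 = insn & 0x03FFFFFF
    if imm26 & (1 << 25):       # sign-extend the 26-bit offset
        imm26 -= 1 << 26
    return pc + imm26 * 4       # the offset is counted in 4-byte instructions


print(hex(decode_bl("5bfdff97", 0x400011F8)))  # 0x40000764
```

The result agrees with the "bl #0x40000764" that Unidbg disassembles at 0x400011f8.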

Although Unidbg’s debugging is a bit crude, it is sufficient. With such a magical tool in hand, what else do you want?

Start by learning these debugger commands:

s: step into; enters a function when it encounters a call.

n: step over; does not enter a function when it encounters a call.

c: continue execution.

b: set a breakpoint.

r: remove the current breakpoint.

m: view memory.

Let’s enter s a few times to single-step to 0x400011f8.

debugger break at: 0x400011f8 @ Function64 address=0x40001234, arguments=[unidbg@0xfffe1640[libandroid.so]0x640, 1853170425, 2008362258]
>>> x0=0xbffff690(-1073744240) x1=0xffffffffffffffff x2=0x40 x3=0x400012bb x4=0xbffff5d0 x5=0xbffff5c0 x6=0xbffff5b0 x7=0xbffff5a0 x8=0x0 x9=0x0 x10=0x1 x11=0x0 x12=0x8 x13=0x8 x14=0x8
LR=RX@0x400011d4[libnative-lib.so]0x11d4
SP=0xbffff570
PC=RX@0x400011f8[libnative-lib.so]0x11f8
nzcv: N=0, Z=1, C=1, V=0, EL0, use SP_EL0
start + 0xb08
=> *[libnative-lib.so*0x011f8]*[5bfdff97]*0x400011f8:*"bl #0x40000764"

At this point, the input parameters are ready. Let’s look at them one by one.

mx7

>-----------------------------------------------------------------------------<
[10:40:26 646]x7=unidbg@0xbffff5a0, md5=d6c164ca9ef531557fc14e1bf7173663,
size: 112
0000: 35 41 37 35 37 34 33 44 00 B3 22 40 00 00 00 00    5A75743D.."@....
0010: 39 37 30 37 31 30 46 37 00 8D 09 40 00 00 00 00    970710F7...@....
0020: 34 44 39 31 38 31 38 42 00 77 12 40 02 00 00 00    4D91818B.w.@....
0030: 44 44 38 39 43 41 36 38 00 1B 17 40 02 00 00 00    DD89CA68...@....
0040: 31 36 37 37 30 33 38 30 36 36 35 35 33 80 00 00    1677038066553...
0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
^-----------------------------------------------------------------------------^

The dump shows that this call to 0x40000764 is essentially assembling the final result: the four 8-character strings at x4, x5, x6, and x7, concatenated in that order, give the return value.

What we need to do is to find the locations where these results are generated to analyze how the final result is calculated, that is, the process of Y → Z.
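That the call assembles the final result can be checked mechanically. The sketch below parses the first four rows of the mx7 dump, reads the NUL-terminated string at each of the pointers in x4 to x7, and joins them in argument order; parse_dump and c_string are hypothetical helper names, and the dump text and register values are copied from the output above.

```python
DUMP = """\
0000: 35 41 37 35 37 34 33 44 00 B3 22 40 00 00 00 00    5A75743D.."@....
0010: 39 37 30 37 31 30 46 37 00 8D 09 40 00 00 00 00    970710F7...@....
0020: 34 44 39 31 38 31 38 42 00 77 12 40 02 00 00 00    4D91818B.w.@....
0030: 44 44 38 39 43 41 36 38 00 1B 17 40 02 00 00 00    DD89CA68...@....
"""

BASE = 0xBFFFF5A0  # x7, the address the dump starts at
ARGS = {"x4": 0xBFFFF5D0, "x5": 0xBFFFF5C0, "x6": 0xBFFFF5B0, "x7": 0xBFFFF5A0}


def parse_dump(text):
    """Turn hexdump rows into raw bytes, ignoring the offset and ASCII columns."""
    data = bytearray()
    for line in text.splitlines():
        _, _, rest = line.partition(": ")
        data += bytes.fromhex("".join(rest.split()[:16]))
    return bytes(data)


def c_string(buf, offset):
    """Read a NUL-terminated ASCII string starting at offset."""
    end = buf.index(b"\x00", offset)
    return buf[offset:end].decode("ascii")


memory = parse_dump(DUMP)
# "%s%s%s%s" consumes its string arguments in the order x4, x5, x6, x7.
result = "".join(c_string(memory, ARGS[r] - BASE) for r in ("x4", "x5", "x6", "x7"))
print(result)  # DD89CA684D91818B970710F75A75743D
```

The joined string equals the 32-character return value of the sample.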

4. Trace memory reading and writing

Now that we know the location of the result Z, the next step is to know who calculated Z.

This calls for a powerful Unidbg feature: memory read and write monitoring.

This time we set the debug breakpoint earlier, at the beginning of the sub_A3C function.

debugger break at: 0x40000a3c @ Function64 address=0x40001234, arguments=[unidbg@0xfffe1640[libandroid.so]0x640, 1853170425, 2008362258]
>>> x0=0x40004000 x1=0xbffff690 x2=0x0 x3=0x1 x4=0x0 x5=0x1 x6=0x0 x7=0x0 x8=0xfffe0a70 x9=0x3002 x10=0x0 x11=0x1 x12=0x3 x13=0x40003018 x14=0x40003028
>>> x15=0x1 x16=0x40228910 x17=0x0 x18=0x17 x19=0xfffe1640 x20=0xbffff708 x21=0x0 x22=0x0 x23=0x0 x24=0x0 x25=0x0 x26=0x0 x27=0x0 x28=0x0 fp=0xbffff6f0
LR=RX@0x40001280[libnative-lib.so]0x1280
SP=0xbffff690
PC=RX@0x40000a3c[libnative-lib.so]0xa3c
nzcv: N=0, Z=0, C=1, V=0, EL0, use SP_EL0
start + 0x34c
=> *[libnative-lib.so*0x00a3c]*[ff8304d1]*0x40000a3c:*"sub sp, sp, #0x120"

traceWrite 0xbffff5d0 0xbffff5d8
Set trace 0xbffff5d0->0xbffff5d8 memory write success.
c
[11:41:41 656] Memory WRITE at 0xbffff5d8, data size = 1, data value = 0x0, PC=RX@0x40001168[libnative-lib.so]0x1168, LR=null
[11:41:41 657] Memory WRITE at 0xbffff5d0, data size = 8, data value = 0x0, PC=RX@0x4000116c[libnative-lib.so]0x116c, LR=null
[11:41:41 661] Memory WRITE at 0xbffff5d8, data size = 1, data value = 0x0, PC=RX@0x401b48cc[libc.so]0x648cc, LR=RX@0x401b48c8[libc.so]0x648c8

traceWrite is the command that monitors memory writes.
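When a trace produces many lines, it helps to filter them by module. A small sketch that parses the log lines shown above with a regular expression and keeps only the writes issued from libnative-lib.so; parse_writes is a hypothetical helper, and the pattern is fitted to these three lines, so it may need adjusting for other Unidbg output.

```python
import re

LOG = """\
[11:41:41 656] Memory WRITE at 0xbffff5d8, data size = 1, data value = 0x0, PC=RX@0x40001168[libnative-lib.so]0x1168, LR=null
[11:41:41 657] Memory WRITE at 0xbffff5d0, data size = 8, data value = 0x0, PC=RX@0x4000116c[libnative-lib.so]0x116c, LR=null
[11:41:41 661] Memory WRITE at 0xbffff5d8, data size = 1, data value = 0x0, PC=RX@0x401b48cc[libc.so]0x648cc, LR=RX@0x401b48c8[libc.so]0x648c8
"""

PATTERN = re.compile(
    r"Memory WRITE at (0x[0-9a-f]+), data size = (\d+), data value = (0x[0-9a-f]+), "
    r"PC=RX@0x[0-9a-f]+\[([^\]]+)\](0x[0-9a-f]+)"
)


def parse_writes(log):
    """Yield one dict per logged write, with numeric fields converted to int."""
    for match in PATTERN.finditer(log):
        addr, size, value, module, offset = match.groups()
        yield {
            "addr": int(addr, 16),
            "size": int(size),
            "value": int(value, 16),
            "module": module,
            "offset": int(offset, 16),  # module-relative, usable directly in IDA
        }


native = [w for w in parse_writes(LOG) if w["module"] == "libnative-lib.so"]
for w in native:
    print(hex(w["offset"]), "writes", w["size"], "byte(s) to", hex(w["addr"]))
```

The offsets that remain, 0x1168 and 0x116c, are the ones worth looking up in IDA.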

It looks like the instruction that writes DD89CA68 into the memory at 0xbffff5d0 is at 0x116c:

.text:000000000000114C 14 00 00 90+                ADRL            X20, unk_12B8
.text:000000000000114C 94 E2 0A 91
.text:0000000000001154 C4 0A C0 5A                 REV             W4, W22
.text:0000000000001158 E0 83 01 91                 ADD             X0, SP, #0x110+var_B0
.text:000000000000115C 21 01 80 52                 MOV             W1, #9
.text:0000000000001160 22 01 80 52                 MOV             W2, #9
.text:0000000000001164 E3 03 14 AA                 MOV             X3, X20
.text:0000000000001168 FF A3 01 39                 STRB            WZR, [SP,#0x110+var_A8]
.text:000000000000116C FF 33 00 F9                 STR             XZR, [SP,#0x110+var_B0]
.text:0000000000001170 7D FD FF 97                 BL              sub_764

The STR XZR instruction at 0x116c is indeed a write, but it does not write our data: it stores the zero register, so it only clears the data at [SP,#0x110+var_B0].
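IDA writes stack operands as #0x110+var_B0, where a name like var_B0 stands for the offset -0xB0. A minimal sketch that turns such an operand into an absolute address, using the SP value 0xbffff570 from the register dumps; resolve_slot is a hypothetical helper.

```python
def resolve_slot(sp, bias, var_name):
    """Resolve an IDA operand of the form [SP,#bias+var_XX] to an absolute address."""
    # IDA names a local after its negative offset: var_B0 means -0xB0.
    var_offset = -int(var_name.split("_", 1)[1], 16)
    return sp + bias + var_offset


SP = 0xBFFFF570  # SP inside sub_A3C, as printed by Unidbg

for name in ("var_A8", "var_B0", "var_C0", "var_D0", "var_E0"):
    print(name, hex(resolve_slot(SP, 0x110, name)))
```

var_B0 resolves to 0xbffff5d0 and var_A8 to 0xbffff5d8, the two addresses that appear in the traceWrite log.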

Then let’s start over again. (The advantage of Unidbg is that it can be replayed infinitely, which is many times more convenient than debugging the App on a real machine.)

This time go a little further and set a breakpoint at 0x114C.

After the breakpoint hits, single-step with s and check m0xbffff5d0 after each step.

Eventually we find that after the instruction at 0x1170 runs, the memory at 0xbffff5d0 holds DD89CA68. This means that 0xbffff5d0 is written by the sub_764 function.

debugger break at: 0x40001170 @ Function64 address=0x40001234, arguments=[unidbg@0xfffe1640[libandroid.so]0x640, 1853170425, 2008362258]
>>> x0=0xbffff5d0(-1073744432) x1=0x9 x2=0x9 x3=0x400012b8 x4=0xdd89ca68 x5=0xe6cd8e62 x6=0x24523012 x7=0x29b9c389 x8=0x40 x9=0x40318041 x10=0xbffff5e0 x11=0x40 x12=0x3d5ebb2b x13=0x6450c165 x14=0xfc63b7e7
>>> x15=0x49ac16b x16=0xac6af723 x17=0xf3d1564b x18=0x18 x19=0x4cf3a208 x20=0x400012b8 x21=0x0 x22=0x68ca89dd x23=0x3d74755a x24=0x72e737bb x25=0xddf5ac1 x26=0xd0d5adc6 x27=0x8b81914d x28=0xf7100797 fp=0xbffff680
LR=null
SP=0xbffff570
PC=RX@0x40001170[libnative-lib.so]0x1170
nzcv: N=0, Z=1, C=1, V=0, EL0, use SP_EL0
start + 0xa80
=> *[libnative-lib.so*0x01170]*[7dfdff97]*0x40001170:*"bl #0x40000764"

But back at 0x1170, we spot a familiar number: x4=0xdd89ca68. So our question becomes: how is the value of x4 calculated?
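A first hint comes from the REV W4, W22 instruction at 0x1154: x22 holds 0x68ca89dd, and reversing its bytes gives 0xdd89ca68. The sketch below applies the same reversal to x22, x27, x28, and x23 from the register dump. That these four registers hold the four result words, and that each is printed as eight uppercase hex digits, is an assumption based on the values matching the output, not something confirmed from the code.

```python
def rev32(value):
    """Reverse the byte order of a 32-bit word, like A64 REV on a W register."""
    return int.from_bytes(value.to_bytes(4, "little"), "big")


# Register values copied from the dump at 0x1170.
regs = {"x22": 0x68CA89DD, "x27": 0x8B81914D, "x28": 0xF7100797, "x23": 0x3D74755A}

# Assumed order of the four words in the final string.
pieces = ["%08X" % rev32(regs[name]) for name in ("x22", "x27", "x28", "x23")]
print("".join(pieces))  # DD89CA684D91818B970710F75A75743D
```

If the assumption holds, the byte reversal is just the little-endian serialization of the four state words that standard MD5 also performs, and the next thing to trace is how x22 itself is computed.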