[0CTF 2020 Quals] - One Line JS
Introduction
I played 0CTF as a member of Shellphish last weekend. The CTF was pretty awesome and gave me a lot of excitement. I learned about v8 hacking by solving the Chromium RCE
challenge and about PHP internals by wasting my time on Baby Bypass
. Overall, I’m pretty satisfied with the experience in the CTF.
The only imperfection is that I didn’t manage to pull off One Line JS
in time before the game ended. In fact, I solved it in 3 hours after the game ended.One Line JS
is a very interesting challenge about pwning a small real-world javascript engine. Although it is “small”, it still consists of more than 10 thousand lines of code, which is daunting for a 48h CTF. And I guess that’s the reason it only got 1 solve during the game.
Here, I would like to share how I understood the challenge piece by piece and eventually pwned it in a 24-hour course.
Everything can be found here
Recon
The target JS engine(MuJS
) is actually part of Ghostscript
according to their website. The organizers introduce a backdoor to the project and wante us to pwn it.
MuJS
is super tiny and it only implements part of javascript features. By playing with the interpreter mujs
for a bit, I quickly found out that ArrayBuffer
and TypedArray
are not implemented. Thankfully, Array
is still there.
With my previous experience with JS engines, I know that objects
, values
, properties
and elements
are the most important concepts in exploiting a JS engine(see [1]). To investigate, I downloaded its source code from here and compiled a debug build to understand how these concepts are implemented in the project.
Indeed, my instinct didn’t fail me: there is something fishy here.
The same to other JS engines, objects
in mujs
are abstracted by a struct js_Object
:
1 | struct js_Object |
values
in mujs
are abstracted by a struct js_Value
:
1 | struct js_Value |
But instead of having a Shape
or Map
object to describe how to access an object’s properties or elements, it has a js_Property
object:
1 | struct js_Property |
Basically, a property is a glue connecting js_Object
and js_Value
. And mujs
keeps track of all properties of an object by maintaining a tree. For example, when I try to access obj.aaa
, the engine will traverse through the property tree and look for a js_Property
with the name
aaa
. If it finds such a js_Property
, it returns the js_Value
embeded in the js_Property
.
By some further investigation, I found out that all the leaves of the property tree are guarded by a statically defined special js_Property
called sentinel
.
I got all the results above by examining memory with debug symbols and reading header files. Natually, I tried to do the same for elements
(elmements in an array). But I found nothing, I was very confused. During the initial exploration, I was sure that Array
was implemented. So, I took some time to read the C code. it turns out that elements
are implemented as as properties
. lol.
1 | void js_getindex(js_State *J, int idx, int i) |
It took me much time to get here, but I got a lot of interesting findings at this point:
ArrayBuffer
andTypedArray
are not implemented- properties are maintained in a tree and
js_Property
-s are scattered in heap - there is a pointer in the tree that may help me leak code base(
sentinel
) - elements are implemented as properties
Analysis
Now let’s see what changes the organizers did to the project. The patch is actually long. I only show the most important changes here:
1 | +static void jsB_backdoor(js_State *J) |
1 | + js_newcfunction(J, jsB_backdoor, "backdoor", 2); |
Basically, they removed a lot of “insecure” functions to make our life harder and introduced a backdoor which is a limited OOB-write.
Without having information leak, especially with a super dynamic heap(it is deterministic, but the heap layout changes as I change my code), I had to use it smartly.
OOB-write to type-confusion
A limited OOB-write is far from enough to get a shell. So, how should I use the primitive to gain a better primitive?
My first idea was to overwrite the size
variable of an unsortedbin
(a typical heap geek’s move) to achieve chunk-overlapping. After this was implemented, I noticed that it was way too unstable. One reason is that the heap layout keeps changing as my code changes. The other reason, also the more important one is that I don’t have full control over the heap, any unexpected allocation in the overlapped region may result in a crash.
Once I realized that, I quickly abandoned that idea.
To find what to overwrite, I set breakpoint at the jsB_backdoor
function and examined the adjacent memory. I noticed that a js_Value
looks like this in memory:
1 | 0x5555557c3230: 0x00005555557cf4b4 0x0500000000000000 |
The first 8 bytes is the data, can be a double
, can be a pointer
and even a char
array. It is followed by 7 bytes of padding and a 1 byte type
variable. This type
variable decides how the engine interprets the js_Value
.
According to the definition:
1 | enum js_Type { |
If I overwrite the type
variable from 5(JS_TLITSTR
) to 4(JS_TNUMBER
), a string pointer will be interpreted as a double
number. By using this, I was able to leak a heap address.
Naturally, if I overwrite type
from 4 to 5, a double
will be interpreted as a pointer and whatever it points to can be returned back to me.
And most importantly, this plan satisfies the constraints of the OOB-write!!!
By summarizing what I had, I came up with a new plan:
- declare and allocate a string as a reference(in the backdoor function,
obj.toString()
is called, if the type ofobj
isJS_TLITSTR
, it will directly return its content to me without any transformation) - declare an empty object
obj
- assign a lot of different string properties to
obj
. This serves as both a spray ofjs_Property
and a spray ofjs_Value
because eachjs_Property
embeds ajs_Value
. - use the backdoor to write 0x04xx so the
type
of thejs_Value
will be overwritten by 4. - leak heap pointer by traversing the properties of
obj
It is implemented like this:
1 | str2='BBBBBBBBBBBBBBBB'; |
The heap layout looks like this:
1 | str2(the object fed to the backdoor function) |
The plan worked as expected smoothly. At the moment, I was able to get heap address leaked as a double value.
Similarly, after leaking a heap address, I could leak content from any address by reusing the same obj
object.
- encode the address in double value
- assign all its properties to the value
- use the backdoor to write 0x05xx
- now the double value is interpreted as a string, so we can leak whatever in that address
In this way, I was supposed to be able to have arbitrary read. Right?
WTF? Encodings? -> stable arbitrary read
Well, not really.
Now I was able to leak address in two forms:
- in the string-to-number type confusion, leak addresses in double values
- in the number-to-string type confusion, leak addresses by dereferencing a controlled pointer
Sounds like a plan.
But, how to transform a double value to an integer so arithmetic operations can be performed on the pointer? Easy, by using ArrayBuffer
and TypedArray
… Oh wait, they are not implemented. WTF?
This got me stuck for a while until I found a super old snippet that supports this by pure javascript calculations. It is not perfect, but it works most of the time.
OK, problem 1 solved. What’s wrong with directly printing strings?
The problem is: mujs
stores strings in utf-8 encoding. When it encounters an invalid utf-8 character in a string, it gets confused and returns a bad character to us. For example, s = "\xd0"
will store \xc3\x90
in memory. When we do s[0]
, it returns \xd0
correctly. However, in our exploitation, if what we want to leak has \xd0
as the first byte, s[0]
returns a bad character because \xd0
is not a valid utf-8 encoding.
I thought this was inevitable, there was no way to retore the address perfectly for some time. But at one point, I realized it may be possible to transform the bad characters into some other forms first and then transform it back to integers. Indeed, it worked. And the magic function is encodeURI
.
By using the arbitrary read, it’s easy to leak the code base by traversing a property tree to find the sentinel
leaf. Then libc can be leaked easily.
addrof and … wait a second
In modern JS engine exploitations, there are two very important primitives: addrof
and fakeobj
.addrof
: give it an object, the primitive is able to tell the address of the object.fakeobj
: prepare payload in memory and give it the payload address, the primitive is able to make the JS engine think the payload is a valid object.
By chaining these two primitives, an attacker can easily gain arbitrary read/write primitive.
In fact, I have already implemented the addrof
primitive by the string-to-number type confusion. Instead of giving it a string, giving it any object and then overwrite its type to 4 can leak the address of the object.addrof
is done. Good.
I encountered issues when trying to implement fakeobj
. To implement fakeobj
, there are two steps:
- prepare payload(a fake object) in memory
- type confusion from number to object
Step2 is easy in our case. What about step1?
Step1 is also easy in most JS engines because of the existence of ArrayBuffer
and the fact that properties are usually stored linearly. But in mujs
, ArrayBuffer
is not implemented. And properties are scattered. How to prepare a fake object in mujs
then?
In fact, I didn’t manage to find a method to make attacker controlled data stored linearly in memory after arbitray read. Instead, I tried to reuse what I had in hand.
fakestr and arbitrary write
Recall what I have achieved: string-to-number becomes addrof
primitive, what about number-to-string?
It is still fakeobj
in a sense, because it gives me a fake string. And it is possible to access it in the JS binding.
Now, what if I use the fake string as the argument to the backdoor function? Since the internal pointer of the string is provided by me, I could write to anywhere I want 2 bytes at a time. That’s a limited arbitrary write primitive!
Arbitrary read/write and RCE
At this moment, I got arbitray read/write. Although I could only write 2 bytes at a time, it should be enough to get shell in many ways.
My approach was:
- prepare a
js_Property
around__free_hook
by copy & pasting a real property. where__free_hook
corresponds to the data of thejs_Value
- hijack a property pointer to our fake property
- overwrite
__free_hook
by setting the property withsystem
(encoded in double) - trigger
free(cmd)
(When I’m writing this, I just realize hijacking an empty tcache free list is a better way… One thing worth noticing, we can’t overwrite tcachebin in the metadata because of the nature of the backdoor: the reference object will points to an unmaped region and crashes.)
Finish
The CTF ended right after I got system
execution. It was not too close though. It still took me 10-20 minutes to trigger my command stably.
Most importantly… this challenge requires the attack script in one line and the max length is 4096. But when it worked for the first time, it had more than 10000 characters… It took me almost 2 hours to shrink it to around 3500(I made a mistake so I thought the limit was 3200….).
Final exploit is here
A working and also relatively readable exploit is here
Yet another relatively readable exploit is here
Conclusion
This challenge is all about adapting previous knowledge into a new target. We need to understand the difference between the new target and what we’ve encountered before first. And then we should try to apply what we knew if there are anything similar.
Most importantly, we need to try to get out of the comfort zone and take new challenges. This is the best why to learn and gain better understanding of what we already know.