Open-source source code walkthrough: objgraph, drawing Python object reference graphs

1. objgraph

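objgraph is a fun Python project. It draws the reference relationships between Python objects, which gives you a more intuitive picture of how Python manages objects. Its GitHub repository is https://github.com/mgedmin/objgraph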

Install it with pip:

pip install objgraph

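You also need to install Graphviz, available at http://www.graphviz.org/download/ . Pick the package for your platform; I downloaded the graphviz-3.0.0 (64-bit) EXE installer. After installing, add its bin directory to the Path environment variable. My install directory is D:\Program Files\Graphviz, so I added D:\Program Files\Graphviz\bin to Path. On Windows, the new Path is only seen by programs started afterwards, so restart your terminal (or the machine) before continuing.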
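After setting Path, you can confirm from Python that the `dot` executable is visible. This is a minimal sketch using only the standard library (`shutil.which` and `subprocess.run`); the helper name `find_dot` is hypothetical and not part of objgraph.

```python
import shutil
import subprocess


def find_dot():
    """Return the path of the Graphviz `dot` executable, or None if it is not on PATH."""
    return shutil.which("dot")


dot_path = find_dot()
if dot_path is None:
    print("dot was not found on PATH; check the Graphviz bin directory")
else:
    # `dot -V` prints the Graphviz version and exits (usually on stderr)
    result = subprocess.run([dot_path, "-V"], capture_output=True, text=True)
    print(result.stderr.strip() or result.stdout.strip())
```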

Sample code:

x = [1, 3, 4]

y = [x, [x], dict(x=x), {'name': 'python', 'score': 3}]

import objgraph

objgraph.show_refs([y], filename='./sample-graph.png')

The generated image:

[Figure: Python object reference graph of y]

Which techniques does this project rely on? Let's take a look.

2. graphviz

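Graphviz is graph-drawing software. Its dot tool generates GIF, PNG, PDF and other files from a file written in the dot language. Here is a small example; create a file named flow.dot: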

digraph flow{
a;
b;
c;
d;
a->b;
b->d;
c->d;
}

Run the command:

dot -Tpng flow.dot -o flow.png

The generated image:

[Figure: flow.png, the rendered graph of flow.dot]

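It is quite convenient: for diagrams without demanding layout requirements, drawing them this way is much faster. The dot syntax is also simple. a, b, c and d are nodes, and a directed edge between two nodes is written with ->, which is enough to build a simple graph.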
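Because dot files are plain text, they are easy to generate from a program, which is exactly what objgraph does. A minimal sketch that produces the same flow.dot from a node list and an edge list (the function `to_dot` is a hypothetical helper, not a Graphviz or objgraph API):

```python
def to_dot(name, nodes, edges):
    """Build the text of a directed graph in the dot language.

    nodes: iterable of node names; edges: iterable of (tail, head) pairs.
    """
    lines = [f"digraph {name} {{"]
    for node in nodes:
        lines.append(f"  {node};")            # declare each node
    for tail, head in edges:
        lines.append(f"  {tail} -> {head};")  # '->' is a directed edge
    lines.append("}")
    return "\n".join(lines) + "\n"


text = to_dot("flow", ["a", "b", "c", "d"], [("a", "b"), ("b", "d"), ("c", "d")])
with open("flow.dot", "w", encoding="utf-8") as f:
    f.write(text)
print(text)
```

The resulting file can then be rendered with the same `dot -Tpng flow.dot -o flow.png` command.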

3. Getting the references of a Python object

To find out which other objects a Python object references, use gc.get_referents:

import gc

dic = {
    'language': 'python',
    'score': 100
}
print(gc.get_referents(dic))        # ['python', 100]

lst = ['python', 100]
print(gc.get_referents(lst))        # 'python' and 100 (CPython may list them in reverse order)

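Internally, both the dict and lst reference the string 'python' and the integer 100. Note that for this dict the string keys 'language' and 'score' do not appear in the result; CPython reports only the values here.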
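Two properties of gc.get_referents explain why drawing a graph needs a loop rather than a single call: it returns only direct references, and objects such as ints and strs (which the garbage collector does not track) report no referents at all. A small sketch with hypothetical variable names:

```python
import gc

inner = ['python', 100]
outer = [inner, {'score': 100}]

# Only direct references are reported: the two items of `outer`,
# not 'python' or 100, which sit one level further down
refs = gc.get_referents(outer)
print(len(refs))                       # 2
print(any(r is inner for r in refs))   # True

# ints and strs are not tracked by the garbage collector, so CPython
# reports no referents for them; this is where a traversal bottoms out
print(gc.get_referents(100))           # []
print(gc.get_referents('python'))      # []
```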

4. Drawing the Python object reference graph

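To draw the reference relationships between Python objects, we have to collect references layer by layer. Together, these references form a graph, and traversing a graph means using either depth-first or breadth-first traversal. objgraph uses breadth-first traversal, and to keep the graph from growing too deep it also limits the traversal depth.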
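Before looking at real objects, here is the traversal pattern on its own: a depth-limited breadth-first search over a plain adjacency dict. This is a sketch for illustration; the name `bfs_depths` and the sample graph are hypothetical, and it uses `collections.deque` as the queue where the trimmed objgraph code below uses a list.

```python
from collections import deque


def bfs_depths(graph, roots, max_depth):
    """Breadth-first traversal of `graph` (a dict: node -> list of neighbours).

    Returns {node: depth} for every node reached; nodes at max_depth are
    recorded but their neighbours are not expanded.
    """
    depth = {root: 0 for root in roots}
    queue = deque(roots)                 # FIFO: nodes are expanded level by level
    while queue:
        node = queue.popleft()
        if depth[node] >= max_depth:
            continue                     # depth limit reached: do not expand
        for nxt in graph.get(node, []):
            if nxt not in depth:         # first visit gives the shortest depth
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return depth


graph = {'y': ['x', 'z'], 'x': ['a'], 'z': ['a', 'b'], 'a': ['c']}
print(bfs_depths(graph, ['y'], max_depth=2))
# {'y': 0, 'x': 1, 'z': 1, 'a': 2, 'b': 2}
```

Node 'c' is never reached because 'a' sits at the depth limit and is not expanded.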

The original source is fairly hard to read, so I trimmed it down to make it easier to follow:

import sys
import os
import gc
import subprocess


lst = ['python', 100]
dic = {
    'language': 'python',
    'score': 100
}
y = [lst, dic]



def _obj_node_id(obj):
    return ('o%d' % id(obj)).replace('-', '_')

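def _obj_label(obj):
    # First line: the type name; second line: the size (containers) or the value
    labels = [type(obj).__name__]
    if isinstance(obj, (list, tuple, dict, set)):
        labels.append(f"{len(obj)} items")
    else:
        labels.append(str(obj))
    # Escape backslashes and double quotes so the text is valid inside label="..."
    labels = [s.replace('\\', '\\\\').replace('"', '\\"') for s in labels]
    # '\\n' writes a literal \n, which dot renders as a line break
    return '\\n'.join(labels)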


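def _edge_label(source, target):
    # For a dict, label the edge with the key under which `target` is stored
    if isinstance(source, dict):
        for key, value in source.items():
            if value is target:
                return ' [label="%s",weight=2]' % str(key)
    # Always return a string: returning None would write "None" into the dot file
    return ''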

def show_graph(objs, max_depth=3, filename='./sample-graph.png'):
    dot_filename = "./graph.dot"
    f = open(dot_filename, 'w', encoding="utf-8")
    queue = []
    depth = {}
    ignore = set()
    ignore.add(id(objs))
    ignore.add(id(queue))
    ignore.add(id(depth))
    ignore.add(id(ignore))
    ignore.add(id(sys._getframe()))  # this function
    ignore.add(id(sys._getframe().f_locals))
    ignore.add(id(sys._getframe(1)))  # the caller's frame (module level in this example)
    ignore.add(id(sys._getframe(1).f_locals))
    f.write('digraph ObjectGraph {\n'
            '  node[shape=box, style=filled, fillcolor=white];\n')

    for obj in objs:
        f.write('  %s[fontcolor=red];\n' % (_obj_node_id(obj)))
        depth[id(obj)] = 0
        queue.append(obj)
        del obj
    gc.collect()
    nodes = 0
    while queue:
        nodes += 1
        target = queue.pop(0)
        tdepth = depth[id(target)]
        f.write('  %s[label="%s"];\n' % (_obj_node_id(target),
                                           _obj_label(target)))

        if tdepth >= max_depth:
            continue
        neighbours = gc.get_referents(target)
        ignore.add(id(neighbours))
        n = 0
        for source in neighbours:
            if id(source) in ignore:
                continue

            elabel = _edge_label(target, source)
            f.write('  %s -> %s%s;\n' % (_obj_node_id(target),
                                         _obj_node_id(source), elabel))
            if id(source) not in depth:
                depth[id(source)] = tdepth + 1
                queue.append(source)
            n += 1
            del source

        del neighbours
    f.write("}\n")
    f.close()

    stem, ext = os.path.splitext(filename)
    cmd = ['dot', '-T' + ext[1:], '-o' + filename, dot_filename]
    dot = subprocess.Popen(cmd, close_fds=False)
    dot.wait()
    if dot.returncode != 0:
        # XXX: shouldn't this go to stderr or a log?
        print('dot failed (exit code %d) while executing "%s"'
              % (dot.returncode, ' '.join(cmd)))
    else:
        print("Image generated as %s" % filename)

show_graph(y)

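While searching for references, some objects have to be skipped. The code that computes the reference graph creates new references of its own (queue, depth, the list returned by gc.get_referents, the current frame, and so on), and those must not appear in the graph. The ignore set holds their ids. The del statements in a few places serve the same purpose.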
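The problem is easiest to see with gc.get_referrers, the reverse-direction counterpart of gc.get_referents: a helper container created by the traversal code immediately shows up as a referrer. This sketch (variable names hypothetical) shows the effect and the id-based filtering that the ignore set performs:

```python
import gc

target = ['python', 100]
holder = [target]          # bookkeeping container, like the traversal's queue
ignore = {id(holder)}      # ids of our own helper objects

# gc.get_referrers scans GC-tracked objects for anything referring to `target`;
# the helper list is found because it holds a reference
referrers = gc.get_referrers(target)
print(any(r is holder for r in referrers))   # True

# Filtering by id drops the references created by the traversal code itself
filtered = [r for r in referrers if id(r) not in ignore]
print(any(r is holder for r in filtered))    # False
```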

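First, the root objects are put into queue. It is used as a first-in-first-out structure (items are appended at the end and taken from the front with pop(0)), which is what breadth-first traversal requires.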
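A list used with pop(0) gives the right FIFO order but shifts every remaining element on each pop. A sketch comparing it with `collections.deque.popleft()`, which gives the same order in O(1) per pop (swapping in deque is a suggestion, not what the trimmed code does):

```python
from collections import deque

items = ['root', 'child-1', 'child-2']

# What the trimmed code does: append at the end, pop(0) from the front
as_list = list(items)
from_list = [as_list.pop(0) for _ in range(len(as_list))]

# Same FIFO order with deque.popleft()
as_deque = deque(items)
from_deque = [as_deque.popleft() for _ in range(len(as_deque))]

print(from_list == from_deque == items)  # True
```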

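The pop method takes a node off the queue, and gc.get_referents returns the objects it references. The loop iterates over these objects and writes an edge from the current node to each of them. Referenced objects seen for the first time are also appended to queue, waiting to be popped so that their own references can be computed.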

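The traversal proceeds layer by layer. depth records the depth at which each object was first reached; once an object's depth reaches max_depth, its references are not expanded, so traversal along that path stops.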
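The depth bookkeeping can be isolated from the dot output. This sketch walks real objects with gc.get_referents and reports each object's type and depth; the name `referent_depths` is hypothetical. Unlike the trimmed code, it keeps every reached object in a list so no id() can be reused while the walk is running.

```python
import gc
from collections import deque


def referent_depths(roots, max_depth):
    """Breadth-first walk over gc.get_referents starting from `roots`.

    Returns a list of (type name, depth) pairs, one per object reached.
    Objects whose depth has reached max_depth are listed but not expanded.
    """
    depth = {id(obj): 0 for obj in roots}
    reached = list(roots)             # keeps reached objects alive, so ids stay unique
    queue = deque(roots)
    while queue:
        obj = queue.popleft()
        d = depth[id(obj)]
        if d >= max_depth:
            continue                  # depth limit: stop along this path
        for ref in gc.get_referents(obj):
            if id(ref) not in depth:  # first time seen: record its depth
                depth[id(ref)] = d + 1
                reached.append(ref)
                queue.append(ref)
    return [(type(o).__name__, depth[id(o)]) for o in reached]


y = [[1, 2], {'k': [3]}]
print(sorted(referent_depths([y], max_depth=1)))
# [('dict', 1), ('list', 0), ('list', 1)]
print(sorted(referent_depths([y], max_depth=2)))
```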

Finally, the dot tool turns the file into a PNG image; the command is run with subprocess.Popen.
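The Popen plus wait() pair can also be written with subprocess.run, which waits for the process and captures its output in one call. This sketch also sends failures to stderr, as the XXX comment in the code suggests. The helper names `build_dot_command` and `render` are hypothetical; the command line matches the one built in show_graph.

```python
import os
import shutil
import subprocess
import sys


def build_dot_command(dot_filename, image_filename):
    """Build the dot command line; the output format comes from the extension."""
    fmt = os.path.splitext(image_filename)[1][1:]   # './sample-graph.png' -> 'png'
    return ['dot', '-T' + fmt, '-o' + image_filename, dot_filename]


def render(dot_filename, image_filename):
    """Run dot, report failures on stderr, and return its exit code."""
    if shutil.which('dot') is None:
        raise FileNotFoundError('Graphviz dot is not on PATH')
    cmd = build_dot_command(dot_filename, image_filename)
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        print('dot failed (exit code %d): %s' % (result.returncode, result.stderr),
              file=sys.stderr)
    return result.returncode
```

For example, render('./graph.dot', './sample-graph.png') returns 0 when the image was written.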

The generated reference graph:

[Figure: reference graph of lst and dic]

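The int 100 and the string 'python' each exist only once in memory here: the 100 in the list and the 100 in the dict are the same object, and the same holds for the string 'python'. In CPython this comes from object reuse: small integers (-5 to 256) are cached, and identical string constants in the same code are shared. This is an implementation detail of CPython rather than a guarantee of the language.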
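The sharing can be checked directly with the `is` operator, which compares object identity. A short sketch, valid for CPython; the runtime-created ints at the end show that equal values are not always the same object:

```python
lst = ['python', 100]
dic = {'language': 'python', 'score': 100}

# Same objects: 100 is in CPython's small-int cache (-5..256) and the
# identical string constant is shared within one compiled block
print(lst[1] is dic['score'])      # True
print(lst[0] is dic['language'])   # True

# Ints outside the cache that are created at runtime are separate objects
a = int('1000')
b = int('1000')
print(a == b, a is b)              # True False
```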