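A Problem Encountered When Serializing with Python's json Module

Problem

When Python's json module serializes data that contains non-ASCII characters, such as Chinese text, the resulting string is not human-readable.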

import json

data = { '键': '值' }
json.dumps(data)

Output:

'{"\\u952e": "\\u503c"}'

Solution

Set the ensure_ascii parameter of dumps to False to get readable string output.

json.dumps(data, ensure_ascii=False)

Output:

'{"键": "值"}'
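
The same parameter applies when writing JSON to a file with json.dump. The following is a minimal sketch, not part of the original post: the temporary path is a hypothetical location chosen only for this example, and the file is opened with an explicit UTF-8 encoding on the assumption that the platform default may differ.

```python
import json
import os
import tempfile

data = {'键': '值'}

# Hypothetical output location, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'data.json')

# State the encoding explicitly so the result does not depend on the platform default.
with open(path, 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False)

# Read the file back to inspect what was written.
with open(path, encoding='utf-8') as f:
    text = f.read()

print(text)  # {"键": "值"}
```

If the file were opened with an encoding that cannot represent the characters, the write would likely fail with a UnicodeEncodeError, which is one reason to keep the default escaping when the target encoding is unknown.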

Cause

The Python 3 documentation explains:

If ensure_ascii is true (the default), the output is guaranteed to have all incoming non-ASCII characters escaped. If ensure_ascii is false, these characters will be output as-is.

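In other words, when Python serializes JSON, its output by default contains only ASCII characters. Non-ASCII characters such as Chinese characters are escaped as \uXXXX sequences.

This default has a practical motivation: some environments still have incomplete Unicode support, or support only ASCII, so emitting ASCII-only data keeps the data consistent and avoids encoding problems.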
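
The escaping changes only how the text is represented, not the data it carries. The short check below is a sketch to illustrate this; the variable names escaped and readable are introduced here for the example, and str.isascii assumes Python 3.7 or later.

```python
import json

data = {'键': '值'}

escaped = json.dumps(data)                       # default: ensure_ascii=True
readable = json.dumps(data, ensure_ascii=False)  # non-ASCII characters kept as-is

# The escaped form contains only ASCII characters; the readable form does not.
assert escaped.isascii()
assert not readable.isascii()

# Both strings are valid JSON and decode back to the same dictionary.
assert json.loads(escaped) == json.loads(readable) == data
```

Because both forms decode to the same data, choosing between them is a matter of readability versus compatibility with ASCII-only consumers.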

References

Python 3 documentation: json, JSON encoder and decoder
Stack Overflow: Saving UTF-8 texts with json.dumps as UTF-8, not as a \u escape sequence
