Why Python Pickle is Insecure
Python pickle is a powerful serialization module. It is the most common method to serialize and deserialize Python object structures. The pickle module has an optimized cousin called cPickle that is written in C. In this post I'm going to refer to both modules by the name pickle unless I mention otherwise. The security issues I'm going to discuss apply to both of them.
What This is All About
Pickle was never claimed to be secure. In the pickle documentation there is a warning in red that says:
Warning The pickle module is not intended to be secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.
This clearly states that pickle is insecure. Many think this is because it can load classes other than the ones you expect and may trick you into running their functions. But the actual security risk is far more dangerous. Unpickling can be exploited to execute arbitrary commands on your machine!
Take this little example:
import pickle
pickle.loads("cos\nsystem\n(S'ls ~'\ntR.")  # This will run: ls ~
Or, if you are running Windows, try this instead:
import pickle
pickle.loads("cos\nsystem\n(S'dir'\ntR.")  # This will run: dir
You can replace ls and dir with any other command.
I will use pickletools.dis to disassemble the pickle and show you how this works:
import pickletools
# dis() prints the disassembly itself and returns None, so no print is needed
pickletools.dis("cos\nsystem\n(S'ls ~'\ntR.")
Output:
    0: c    GLOBAL     'os system'
   11: (    MARK
   12: S        STRING     'ls ~'
   20: t        TUPLE      (MARK at 11)
   21: R    REDUCE
   22: .    STOP
Pickle uses a simple stack-based virtual machine that records the instructions used to reconstruct the object. In other words the pickled instructions in our example are:
- Push self.find_class(module_name, class_name), i.e. push os.system
- Push the string 'ls ~'
- Build tuple from topmost stack items
- Apply callable to argtuple, both on stack, i.e. os.system(*('ls ~',))
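The steps above can be sketched as a toy stack machine. This is a minimal Python 3 sketch, not pickle's real implementation: it handles only the six opcodes that appear in the disassembly, and the names toy_unpickle, fake_find_class and recorder are hypothetical, introduced only for illustration. It records the call instead of running os.system, so it is safe to execute.

```python
import pickletools


def toy_unpickle(data, find_class):
    """Interpret only the opcodes used by the example pickle."""
    stack = []
    mark = object()  # sentinel showing where the argument tuple starts
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # genops reports the argument as "module name"
            module, name = arg.split(" ")
            stack.append(find_class(module, name))
        elif opcode.name == "MARK":
            stack.append(mark)
        elif opcode.name == "STRING":
            stack.append(arg)
        elif opcode.name == "TUPLE":
            # Pop everything above the most recent mark into a tuple
            items = []
            while stack[-1] is not mark:
                items.append(stack.pop())
            stack.pop()  # discard the mark itself
            stack.append(tuple(reversed(items)))
        elif opcode.name == "REDUCE":
            args = stack.pop()
            func = stack.pop()
            stack.append(func(*args))  # the dangerous step
        elif opcode.name == "STOP":
            return stack.pop()
        else:
            raise ValueError("unsupported opcode: %s" % opcode.name)


calls = []


def fake_find_class(module, name):
    # Hypothetical stand-in: record the call instead of returning os.system
    def recorder(*args):
        calls.append((module, name, args))
        return 0
    return recorder


result = toy_unpickle(b"cos\nsystem\n(S'ls ~'\ntR.", fake_find_class)
```

After running it, calls holds the single recorded call to os.system with the argument 'ls ~'. The real unpickler would have resolved the actual function and executed the command at the REDUCE step.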
The example is not exploiting a bug in pickle. REDUCE is a vital step in instantiating objects from their classes. Take this example, where I pickle an instance of the built-in object class and disassemble the result:
import pickletools
import pickle
# dis() prints the disassembly itself and returns None, so no print is needed
pickletools.dis(pickle.dumps(object()))
Output:
    0: c    GLOBAL     'copy_reg _reconstructor'
   25: p    PUT        0
   28: (    MARK
   29: c        GLOBAL     '__builtin__ object'
   49: p        PUT        1
   52: g        GET        1
   55: N        NONE
   56: t        TUPLE      (MARK at 28)
   57: p    PUT        2
   60: R    REDUCE
   61: p    PUT        3
   64: .    STOP
Note the REDUCE step. To create an instance of the class object, pickle has to get the __builtin__.object class and then apply it to the given arguments.
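To see that REDUCE simply applies whatever callable the pickle names, here is a small Python 3 sketch. It assumes the standard __reduce__ hook, which lets a class state which callable and arguments should rebuild it. Payload is a hypothetical class, and len is a harmless stand-in for os.system.

```python
import pickle
import pickletools


class Payload(object):
    """Hypothetical class whose __reduce__ names the callable REDUCE applies."""

    def __reduce__(self):
        # Harmless stand-in: an attacker would return (os.system, ("ls ~",))
        return (len, ("ls ~",))


# Protocol 0 is the text protocol used by the examples in this post
data = pickle.dumps(Payload(), protocol=0)

# Collect the opcode names, the same information pickletools.dis prints
opcode_names = [opcode.name for opcode, arg, pos in pickletools.genops(data)]

# Loading calls len("ls ~"); no Payload instance is ever created
result = pickle.loads(data)
```

The loaded value is the return value of the callable, not a Payload object, and the opcode list contains the same GLOBAL and REDUCE pair as the hand-written pickle.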
As of version 2.3, Python abandoned any pretense that it might be safe to load pickles received from untrusted parties, because no sufficient security analysis has been done to guarantee this and there isn't a use case that warrants the expense of such an analysis. As a result, all tests for __safe_for_unpickling__ and copy_reg.safe_constructors were removed from the unpickling code.
How to Make Unpickling Safer
Unpickling can be made safer by restricting which classes and functions the unpickler is allowed to load. In pickle this can be done by overriding the find_class method. For example:
import sys
import pickle
import StringIO

class SafeUnpickler(pickle.Unpickler):
    # Whitelist: module name -> set of names that may be loaded from it
    PICKLE_SAFE = {
        'copy_reg': set(['_reconstructor']),
        '__builtin__': set(['object'])
    }

    def find_class(self, module, name):
        if module not in self.PICKLE_SAFE:
            raise pickle.UnpicklingError(
                'Attempting to unpickle unsafe module %s' % module
            )
        __import__(module)
        mod = sys.modules[module]
        if name not in self.PICKLE_SAFE[module]:
            raise pickle.UnpicklingError(
                'Attempting to unpickle unsafe class %s' % name
            )
        klass = getattr(mod, name)
        return klass

    @classmethod
    def loads(cls, pickle_string):
        return cls(StringIO.StringIO(pickle_string)).load()

SafeUnpickler.loads("cos\nsystem\n(S'ls ~'\ntR.")
# UnpicklingError: Attempting to unpickle unsafe module os
To extend the PICKLE_SAFE dictionary with your pickle safe classes and modules:
SafeUnpickler.PICKLE_SAFE.update({'__main__': set(['MyClass1', 'MyClass2']), 'MyModule': set(['MyClass3'])})
You need to be really careful with what you include in the PICKLE_SAFE dictionary. The __builtin__ module contains the eval function, which can be as dangerous as os.system.
In cPickle this has to be implemented a bit differently. There is a special attribute called find_global that needs to be set to a function that accepts a module name and a class name, and returns the corresponding class object. cPickle.Unpickler can't be subclassed directly, so instead we are going to wrap it in another class:
import sys
import cPickle
import StringIO

class SafeUnpickler(object):
    # Whitelist: module name -> set of names that may be loaded from it
    PICKLE_SAFE = {
        'copy_reg': set(['_reconstructor']),
        '__builtin__': set(['object'])
    }

    @classmethod
    def find_class(cls, module, name):
        if module not in cls.PICKLE_SAFE:
            raise cPickle.UnpicklingError(
                'Attempting to unpickle unsafe module %s' % module
            )
        __import__(module)
        mod = sys.modules[module]
        if name not in cls.PICKLE_SAFE[module]:
            raise cPickle.UnpicklingError(
                'Attempting to unpickle unsafe class %s' % name
            )
        klass = getattr(mod, name)
        return klass

    @classmethod
    def loads(cls, pickle_string):
        pickle_obj = cPickle.Unpickler(StringIO.StringIO(pickle_string))
        # cPickle consults find_global instead of a find_class method
        pickle_obj.find_global = cls.find_class
        return pickle_obj.load()

SafeUnpickler.loads("cos\nsystem\n(S'ls ~'\ntR.")
# UnpicklingError: Attempting to unpickle unsafe module os
As you can see, this solution works, but it is hardly practical in many cases: you need to tell pickle in advance exactly which classes you want to allow. The moral of the story, according to the pickle documentation:
You should be really careful about the source of the strings your application unpickles.
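For readers on Python 3, the same whitelist idea can be sketched as follows. This is an illustration under assumptions, not the code from this post. It uses the Python 3 module names builtins and copyreg instead of __builtin__ and copy_reg, and io.BytesIO instead of StringIO. safe_loads is a hypothetical helper name. The whitelist contains builtins.object but not eval, which is why the second payload below is rejected.

```python
import io
import pickle


class SafeUnpickler(pickle.Unpickler):
    # Whitelist: module name -> set of names that may be loaded from it
    PICKLE_SAFE = {
        "copyreg": {"_reconstructor"},
        "builtins": {"object"},
    }

    def find_class(self, module, name):
        # Reject anything that is not explicitly listed
        if name not in self.PICKLE_SAFE.get(module, ()):
            raise pickle.UnpicklingError(
                "Attempting to unpickle unsafe global %s.%s" % (module, name)
            )
        return super().find_class(module, name)


def safe_loads(data):
    """Hypothetical helper: unpickle bytes through the whitelist."""
    return SafeUnpickler(io.BytesIO(data)).load()


def is_rejected(data):
    """Return True if the whitelist refuses to load the pickle."""
    try:
        safe_loads(data)
    except pickle.UnpicklingError:
        return True
    return False


plain = safe_loads(pickle.dumps([1, {"a": 2}]))   # no globals needed
obj = safe_loads(pickle.dumps(object()))          # builtins.object is listed
system_blocked = is_rejected(b"cos\nsystem\n(S'ls ~'\ntR.")
eval_blocked = is_rejected(b"c__builtin__\neval\n(S'1+1'\ntR.")
```

Basic containers load without consulting find_class at all, so the whitelist only needs to cover the classes your application actually pickles.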
Safer Alternatives
Fortunately, there are alternatives to pickle. They may not be as powerful when it comes to serializing Python objects and classes, but in most cases all we need to serialize is basic types and simple data structures.
JSON
JSON is a lightweight data interchange format. Its human-readable format gives it an advantage over pickle. The json.org website provides a comprehensive listing of existing JSON bindings, including Python. The json module has been part of the Python standard library since 2.6.
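As a rough illustration, here is a Python 3 sketch of a round trip through the standard json module. The record dictionary is made-up sample data. The last part feeds the pickle payload from earlier in this post to json.loads to show that it is rejected as invalid input rather than executed.

```python
import json

# Made-up sample data containing only basic types
record = {"name": "pickle", "safe": False, "versions": [2.3, 2.6], "owner": None}

text = json.dumps(record, sort_keys=True)
restored = json.loads(text)

# Tuples are not a JSON type, so they come back as lists
pair = json.loads(json.dumps((1, 2)))

# The pickle payload is not valid JSON: the parser raises instead of running it
try:
    json.loads("cos\nsystem\n(S'ls ~'\ntR.")
    rejected = False
except ValueError:
    rejected = True
```

Decoding can only produce dictionaries, lists, strings, numbers, booleans and None, so there is no step comparable to REDUCE that calls an arbitrary function.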
YAML
YAML is a human-readable data serialization format. YAML has additional features lacking in JSON such as extensible data types, relational anchors, strings without quotation marks, and mapping types preserving key order. PyYAML is a Python binding for YAML. PyYAML allows sophisticated object instantiation, which opens the potential for an injection attack. According to the PyYAML documentation, you need to use the yaml.safe_load function to load data from untrusted sources.
Others
Depending on your application there are many other alternatives like: XML, Protocol Buffers, Thrift...
Comments
A drop-in (more) secure pickle replacement
Here's a fast and more secure alternative to Pickle with the same API (in pure python):
http://home.gna.org/oomadness/en/cerealizer/index.html
It's worked well in my projects.
json
Good suggestions. I have been fighting the JSON and YAML battle for them against other binary serialization formats, XML and old-school INI files. I hope it wins. JSON, like Python, is a thing of computer science beauty: simple, but it makes a new platform to use for syndicating data and content. Python and JSON have basic types in common, and they serialize and deserialize to one another very easily with simplejson. Supporting basic types, arrays and objects is all you need...
JSON and other specification limitations, e.g., Unicode
Other systems for serializing and deserializing objects will run afoul of conflicting standards. This causes objects to change during the process. For example, one would expect that "var == json.loads(json.dumps(var))" but this is not the case for strings. Strings, by the json specification, are all converted to Unicode. It is not possible to serialize and read back any structure that has byte strings in Python 2.6 using the JSON module.
Nice info
I wasn't aware of these possible "exploits". Thanks for sharing!
Comparison of data serialization formats
Glad you found the Wikipedia article useful! The syntax table is now much more complete than it was before.
No teme
Hello from Russia!
Can I quote a post in your blog with the link to you?
Sure
Of course you can.
Should be in the docs
First of all, this is a really nice article, thanks for that :)
It's a shame that this security risk is not explained in detail in the Python documentation. You should propose that this be added! The docs only advise subclassing Unpickler, but not why.