You could use the pickle
module in the standard library.
Here's an elementary application of it to your example:
import pickle
class Company(object):
def __init__(self, name, value):
self.name = name
self.value = value
with open('company_data.pkl', 'wb') as output:
company1 = Company('banana', 40)
pickle.dump(company1, output, pickle.HIGHEST_PROTOCOL)
company2 = Company('spam', 42)
pickle.dump(company2, output, pickle.HIGHEST_PROTOCOL)
del company1
del company2
with open('company_data.pkl', 'rb') as input:
company1 = pickle.load(input)
print(company1.name) # -> banana
print(company1.value) # -> 40
company2 = pickle.load(input)
print(company2.name) # -> spam
print(company2.value) # -> 42
You could also write a simple utility like the following which opens a file and writes a single object to it:
def save_object(obj, filename):
with open(filename, 'wb') as output: # Overwrites any existing file.
pickle.dump(obj, output, pickle.HIGHEST_PROTOCOL)
# sample usage
save_object(company1, 'company1.pkl')
Since this is such a popular answer, I'd like touch on a few slightly advanced usage topics.
cPickle
(or _pickle
) vs pickle
It's almost always preferable to actually use the cPickle
module rather than pickle
because the former is written in C and is much faster. There are some subtle differences between them, but in most situations they're equivalent and the C version will provide greatly superior performance. Switching to it couldn't be easier, just change the import
statement to this:
import cPickle as pickle
In Python 3, cPickle
was renamed _pickle
, but doing this is no longer necessary since the pickle
module now does it automatically—see What difference between pickle and _pickle in python 3?.
The rundown is you could use something like the following to ensure that your code will always use the C version when it's available in both Python 2 and 3:
try:
import cPickle as pickle
except ModuleNotFoundError:
import pickle
pickle
can read and write files in several different, Python-specific, formats, called protocols. "Protocol version 0" is ASCII and therefore "human-readable". Versions > 1 are binary and the highest one available depends on what version of Python is being used. The default also depends on Python version. In Python 2 the default was Protocol version 0
, but in Python 3.6, it's Protocol version 3
. In Python 3.x the module had a pickle.DEFAULT_PROTOCOL
added to it, but that doesn't exist in Python 2.
Fortunately there's shorthand for writing pickle.HIGHEST_PROTOCOL
in every call (assuming that's what you want, and you usually do)—just use the literal number -1
.
So, instead of writing:
pickle.dump(obj, output, pickle.HIGHEST_PROTOCOL)
You can just write:
pickle.dump(obj, output, -1)
Either way, you'd only have specify the protocol once if you created a Pickler
object for use in multiple pickle operations:
pickler = pickle.Pickler(output, -1)
pickler.dump(obj1)
pickler.dump(obj2)
etc...
While a pickle file can contain any number of pickled objects, as shown in the above samples, when there's an unknown number of them, it's often easier to store them all in some sort of variably-sized container, like a list
, tuple
, or dict
and write them all to the file in a single call:
tech_companies = [
Company('Apple', 114.18), Company('Google', 908.60), Company('Microsoft', 69.18)
]
save_object(tech_companies, 'tech_companies.pkl')
and restore the list and everything in it later with:
with open('tech_companies.pkl', 'rb') as input:
tech_companies = pickle.load(input)
The major advantage is you don't need to know how many object instances are saved in order to load them back later (although doing so without that information is possible, it requires some slightly specialized code). See the answers to the related question Saving and loading multiple objects in pickle file? for details on different ways to do this. Personally I like @Lutz Prechelt's answer the best. Here's it adapted to the examples here:
class Company:
def __init__(self, name, value):
self.name = name
self.value = value
def pickled_items(filename):
""" Unpickle a file of pickled data. """
with open(filename, "rb") as f:
while True:
try:
yield pickle.load(f)
except EOFError:
break
print('Companies in pickle file:')
for company in pickled_items('company_data.pkl'):
print(' name: {}, value: {}'.format(company.name, company.value))
2018年2月4日 ... num = pickle.DEFAULT_PROTOCOL # 默认的酸洗协议(3) {3:明确支持bytes对象; 4:更多种类的对象和数据格式优化的支持}. # --- 序列化---.
The pickle and marshal modules can turn many Python data types into a stream of bytes and then recreate the objects from the bytes. The various DBM-related ...
目的: shelve 模块实现了对任意可被序列化的Python 对象进行持久性存储,也提供类字典API 给我们使用。 当使用关系数据库是一种浪费的时候,shelve 模块可以为Python ...
2017年5月2日 ... Python3 Python对象持久化(pickle / shelve) ... #coding=utf-8 # pickledemo.py Pickle # 用于对Python对象进行序列化和反序列化的二进制协议 import ...
The shelve module in Python's standard library provides simple yet effective object persistence mechanism. The shelf object defined in this module is dictionary ...
2017年3月17日 ... 默认情况下Python 2.x中pickled后的数据是字符串形式,需要将它转换为字节对象才能被Python 3.x中的pickle.loads()反序列化;Python 3.x中pickling所使用 ...
shelve — Python object persistence¶ ... A “shelf” is a persistent, dictionary-like object. The difference with “dbm” databases is that the values (not the keys!)
In Python 3.x the module had a pickle.DEFAULT_PROTOCOL added to it, but that doesn't exist in Python 2. Fortunately there's shorthand for writing ...
Serialization is a more primitive notion than persistence; although pickle reads and writes file objects, it does not handle the issue of naming persistent ...
The Python pickle Module: How to Persist Objects in Python ... <module 'dill' from '/usr/local/lib/python3.7/site-packages/dill/__init__.py'>), ('square', ...
The Persistent Storage module minimizes database activity by caching retrieved objects and by saving objects only after their attributes change. To relieve code ...
Translucent persistent objects. ... It was broken on Python 3 prior to 3.7.4. ... Fix the hashcode of C TimeStamp objects on 64-bit Python 3 on Windows.
This package contains a generic persistence implementation for Python. It forms the core protocol for making objects interact “transparently” with a ...
dump and load functions also accept file-like object instead of filenames. More information on data persistence with Joblib is available here.