# Import mangling in `torch.package`

## Mangling rules
These are the core invariants; if you are changing mangling code, please preserve them.

1. For every module imported by `PackageImporter`, two attributes are mangled:
    - `__module__`
    - `__file__`
2. Any `__module__` or `__file__` attribute accessed inside
   `Package{Ex|Im}porter` should be demangled immediately.
3. No mangled names should be serialized by `PackageExporter`.

## Why do we mangle imported names?
To avoid accidental name collisions with modules in `sys.modules`. Consider the following, as it would behave without mangling:

    import torch.package
    from torchvision.models import resnet18
    local_resnet18 = resnet18()

    # a loaded resnet18, potentially with a different implementation than the local one!
    i = torch.package.PackageImporter('my_resnet_18.pt')
    loaded_resnet18 = i.load_pickle('model', 'model.pkl')

    print(type(local_resnet18).__module__)   # 'torchvision.models.resnet18'
    print(type(loaded_resnet18).__module__)  # ALSO 'torchvision.models.resnet18'

These two model types have the same `__module__` name.
While this isn't facially incorrect, there are a number of places in
`cpython` and elsewhere that assume you can take any module name, look it
up in `sys.modules`, and get the right module back, including:
- [`import_from`](https://github.com/python/cpython/blob/5977a7989d49c3e095c7659a58267d87a17b12b1/Python/ceval.c#L5500)
- `inspect`: used in TorchScript to retrieve source code to compile
- …probably more that we don't know about.

In these cases, we may silently pick up the wrong module for `loaded_resnet18`
and, for example, TorchScript the wrong source code for our model.

## How names are mangled
On import, all modules produced by a given `PackageImporter` are given a
new top-level module as their parent. This is called the `mangle parent`.
For example:

    torchvision.models.resnet18

becomes

    <torch_package_0>.torchvision.models.resnet18

The mangle parent is made unique to a given `PackageImporter` instance by
bumping a process-global `mangle_index`, i.e. `<torch_package_{mangle_index}>`.

The mangle parent intentionally uses angle brackets (`<` and `>`) to make it
very unlikely that mangled names will collide with any "real" user module.

An imported module's `__file__` attribute is mangled in the same way, so:

    torchvision/models/resnet18.py

becomes

    <torch_package_0>.torchvision/models/resnet18.py

Similarly, the use of angle brackets makes it very unlikely that such a name
will exist in the user's file system.

## Don't serialize mangled names
Mangling happens `on import`, and the results are never saved into a package.
Assigning mangle parents on import means that we can enforce that mangle
parents are unique within the environment doing the importing.

It also allows us to avoid serializing (and maintaining backward
compatibility for) this detail.
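
To make the rules above concrete, here is a minimal sketch of the naming scheme in plain Python. It is an illustration under stated assumptions, not the code that `torch.package` actually ships: `SketchMangler`, `_mangle_counter`, and `is_mangled` are hypothetical names, and the only details taken from this document are the `<torch_package_N>` parent format and the process-global index. It shows that two importers get distinct mangle parents, and that demangling recovers the original name, which is the form that would be written out on export.

```python
import itertools
import re

# Hypothetical process-global counter standing in for `mangle_index`.
_mangle_counter = itertools.count()


class SketchMangler:
    """Illustrative only; not the real torch.package implementation."""

    def __init__(self) -> None:
        # Each instance claims the next index, so its mangle parent is
        # unique within this process.
        self.parent = f"<torch_package_{next(_mangle_counter)}>"

    def mangle(self, name: str) -> str:
        # Works for both module names and file paths: the parent is
        # simply prepended with a dot.
        return f"{self.parent}.{name}"

    def demangle(self, name: str) -> str:
        prefix = self.parent + "."
        # Names this instance did not mangle pass through unchanged.
        return name[len(prefix):] if name.startswith(prefix) else name


def is_mangled(name: str) -> bool:
    # True if the name starts with any mangle parent.
    return re.match(r"<torch_package_\d+>", name) is not None


importer_a, importer_b = SketchMangler(), SketchMangler()
mangled = importer_a.mangle("torchvision.models.resnet18")

# Two importers never produce the same mangled name for one module.
assert mangled != importer_b.mangle("torchvision.models.resnet18")

# Demangle before serializing, so no mangled name reaches the package.
to_serialize = importer_a.demangle(mangled)
assert not is_mangled(to_serialize)
```

A guard like `assert not is_mangled(name)` at the point of serialization is one way an exporter could enforce rule 3; whether the real exporter does this is not stated here.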