TorchVision has a new backward-compatible API for building models with multi-weight support. The new API allows loading different pre-trained weights for the same model variant, keeps track of vital meta-data such as the classification labels, and includes the preprocessing transforms necessary for using the models.
PyTorch is an open-source machine learning framework that accelerates the path from research prototyping to production deployment. It is based on the Torch library and is used for applications such as computer vision and natural language processing. It was primarily developed by Facebook's AI Research lab (now Meta) and is free and open-source software released under the Modified BSD license. Machine learning specialists tend to prefer Meta's PyTorch to Google's TensorFlow; for example, Tesla's Autopilot and Uber's Pyro were developed using PyTorch.
TorchVision currently provides pre-trained models that can serve as a starting point for transfer learning or be used as-is in computer vision applications. However, it has limitations such as the inability to support multiple pre-trained weights per model, missing inference/preprocessing transforms, and a lack of meta-data. The new API addresses these limitations and reduces the amount of boilerplate code needed for standard tasks.
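As a rough illustration of the difference, here is a minimal sketch comparing the legacy boolean flag with the new weights parameter (resnet50 is just an example model variant):

```python
from torchvision.models import resnet50, ResNet50_Weights

# Legacy API: a boolean flag, with only one set of weights per model variant
legacy_model = resnet50(pretrained=True)

# New API: an explicit weights enum that also carries meta-data and transforms
new_model = resnet50(weights=ResNet50_Weights.DEFAULT)
```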
The main features of the new API are shown below:
Multi-weight support
At the heart of the new API, we have the ability to define multiple different weights for the same model variant. Each model-building method (e.g. resnet50) has an associated Enum class (e.g. ResNet50_Weights) with as many entries as there are pre-trained weights available. Additionally, each Enum class has a DEFAULT alias which points to the best available weights for the specific model. This allows users who always want the best available weights to do so without modifying their code.
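For example, a sketch of what this looks like for resnet50 (the specific enum entries shown here, IMAGENET1K_V1 and IMAGENET1K_V2, are the ones documented for recent torchvision releases):

```python
from torchvision.models import resnet50, ResNet50_Weights

# Original ImageNet weights
model_v1 = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)

# Improved weights from an updated training recipe
model_v2 = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)

# Best available weights for this model variant
model_best = resnet50(weights=ResNet50_Weights.DEFAULT)

# Random initialization, no pre-trained weights
model_random = resnet50(weights=None)
```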
Associated meta-data & preprocessing transforms
The weights of each model are associated with meta-data. The type of information we store depends on the task of the model (Classification, Detection, Segmentation etc). Typical information includes a link to the training recipe, the interpolation mode, information such as the categories and validation metrics. Additionally, each weights entry is associated with the necessary preprocessing transforms. All current preprocessing transforms are JIT-scriptable and can be accessed via the transforms attribute. Prior using them with the data, the transforms need to be initialized/constructed. This lazy initialization scheme is done to ensure the solution is memory efficient. The input of the transforms can be either a PIL.Image or a Tensor read using torchvision.io.
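A minimal end-to-end sketch that uses the weights' transforms and meta-data for inference (resnet50 is again just an example, and the image path is a placeholder):

```python
from PIL import Image
from torchvision.models import resnet50, ResNet50_Weights

weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights)
model.eval()

# Initialize/construct the preprocessing transforms associated with these weights
preprocess = weights.transforms()

img = Image.open("dog.jpg")           # placeholder path; a Tensor read via torchvision.io also works
batch = preprocess(img).unsqueeze(0)  # apply the transforms and add a batch dimension

prediction = model(batch).squeeze(0).softmax(0)
class_id = int(prediction.argmax())
print(weights.meta["categories"][class_id], float(prediction[class_id]))
```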
Get weights by name
The ability to directly link the weights with their properties (meta-data, preprocessing callables etc.) is the reason why our implementation uses Enums instead of Strings. Nevertheless, for cases where only the name of the weights is available, the API offers a method capable of linking weight names to their Enums.
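A sketch of that lookup, assuming the get_weight helper available in recent torchvision releases:

```python
from torchvision.models import get_weight, resnet50

# Map a weight name given as a string back to its enum entry
weights = get_weight("ResNet50_Weights.IMAGENET1K_V2")
model = resnet50(weights=weights)
```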
PyTorch is an open-source machine learning library used for building and training neural networks. It was developed primarily by Facebook's artificial intelligence research group, and is one of the most widely used machine learning frameworks in the world.
PyTorch allows developers to build and train neural networks using dynamic computation graphs. This means that the structure of the network can change at runtime, making it easier to build more complex models.
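A toy sketch of what "dynamic" means in practice: ordinary Python control flow inside forward, so the graph recorded for backpropagation depends on the input (the model and loop bound are arbitrary):

```python
import torch
import torch.nn as nn

class DynamicNet(nn.Module):
    """A toy model whose forward pass changes depending on the input."""
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

    def forward(self, x):
        # The number of layer applications is decided at runtime;
        # autograd records whichever operations actually ran.
        for _ in range(int(x.abs().sum().item()) % 3 + 1):
            x = torch.relu(self.linear(x))
        return x

model = DynamicNet()
out = model(torch.randn(2, 4))
out.sum().backward()  # gradients flow through the graph that was built on the fly
```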
PyTorch also provides a wide range of pre-built neural network layers and functions, as well as tools for data loading and transformation, optimization, and visualization. It further includes utilities for distributed training, allowing large neural networks to be trained across multiple devices.
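A compact sketch, using synthetic data, of how those pieces typically fit together in a training step:

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

# Toy data pipeline: random features and binary labels
dataset = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
loader = DataLoader(dataset, batch_size=16, shuffle=True)

# Pre-built layers, loss and optimizer
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for inputs, targets in loader:
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
```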
PyTorch's primary interface is Python, but it also provides APIs for other languages such as C++ and Java. Additionally, it has a large and active community of developers, making it easy to find tutorials, sample code, and other resources to help you get started with PyTorch.
PyTorch also provides automatic differentiation (autograd), which records the operations performed on tensors and makes it easy to compute the gradients needed to optimize model parameters.
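For example, a minimal autograd sketch computing a derivative:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 2 + 3 * x   # y = x^2 + 3x
y.backward()         # populates x.grad with dy/dx
print(x.grad)        # tensor(7.) since dy/dx = 2x + 3 = 7 at x = 2
```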
PyTorch's large and active community of users has also contributed many third-party libraries and extensions. These include PyTorch Lightning, which provides a high-level interface for building complex machine learning models, and PyTorch Geometric, which provides tools for working with graphs and other structured data.
Overall, PyTorch is a powerful and flexible machine learning framework that can be used for a wide range of tasks, including computer vision, natural language processing, and reinforcement learning.