Neural Chip Design [2/4: Golden Model]
This article is part of a series [overview] outlining the 15-step workflow I developed over the past few years while building my own DNN accelerator: Kraken [arXiv paper].
Golden Models are essential to hardware (FPGA/ASIC) development. They model the expected behavior of a chip using a high-level language, such that they can be built relatively fast, with almost zero chance of error. The input and expected output test vectors for every RTL module are generated using them, and the simulation output from the testbench is compared against their 'gold standard.'
I first obtain pretrained DNNs from the PyTorch / TensorFlow model zoos, analyze them, then load them into the custom DNN inference framework I have built on the NumPy stack, to ensure I fully understand each operation. I then generate test vectors from those golden models.
Steps:
- PyTorch/TensorFlow: Explore DNN models, quantize & extract weights
- Golden Model in Python (NumPy stack): Custom OOP framework, process the weights, convert to custom datatypes
1. Tensorflow / PyTorch
TensorFlow (Google) and PyTorch (Facebook) are the two competing open-source libraries used to build, train, quantize and deploy modern deep neural networks.
Both frameworks provide high-level, user-friendly classes and functions, such as Conv2D and model.fit(), to build and train networks. Each such high-level API is implemented on top of their own low-level tensor operations (matmul, einsum), which users can also call directly. Those operations are in turn implemented in the C++ backends, accelerated by high-performance libraries such as Eigen and CUDA. We define the models in Python, and the C++ code underneath does the heavy lifting, making the frameworks both fast and user-friendly.
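As a small illustration (not specific to any model in this series), the same kind of computation can be expressed at either level:

import tensorflow as tf

# High-level API: a ready-made layer with trainable weights
conv = tf.keras.layers.Conv2D(filters=8, kernel_size=3, padding="same")
y = conv(tf.random.normal((1, 32, 32, 3)))      # NHWC input

# Low-level tensor ops: e.g. a dense layer written as a plain matmul
x = tf.random.normal((1, 16))
w = tf.random.normal((16, 4))
b = tf.zeros((4,))
z = tf.linalg.matmul(x, w) + b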
1.1. Download & Explore Pretrained DNN Models
As the first step, I obtain the pretrained models from either Keras.Applications or the PyTorch Model Zoo.
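For example, VGG16 with ImageNet weights can be pulled from Keras.Applications and saved as a SavedModel so the quantization step in 1.4 can reload it from disk (a minimal sketch; the local path is just a placeholder):

import tensorflow as tf

# Download VGG16 pretrained on ImageNet and save it in SavedModel format
model = tf.keras.applications.VGG16(weights="imagenet")
model.save("saved_model/vgg16")
model.summary()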
1.2. Build Models & Retrain if needed (PyTorch)
PyTorch is more intuitive, more pythonic, and a pleasure to work with. I use it to build new models and to train them when needed.
1.3. Convert Torch Models to Tensorflow
However, int8 quantization support in PyTorch is still experimental. Therefore, for most of my work, I use pretrained models from TensorFlow, whose quantization library (TFLite) is far more mature.
Some models, like AlexNet, are not available in Keras.Applications. I therefore load them from the PyTorch Model Zoo, convert them to ONNX (the common open-source interchange format), and then load them into TensorFlow.
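A minimal sketch of that conversion for AlexNet; the file names are placeholders, and the ONNX-to-TensorFlow step assumes a converter such as onnx-tf, whose exact API may differ between versions:

import torch
import torchvision

# Export AlexNet from the PyTorch model zoo to ONNX by tracing a dummy input
model = torchvision.models.alexnet(pretrained=True).eval()
dummy = torch.randn(1, 3, 224, 224)                     # NCHW dummy input
torch.onnx.export(model, dummy, "alexnet.onnx",
                  input_names=["input"], output_names=["output"])

# The ONNX file can then be converted to a TensorFlow SavedModel with a tool
# such as onnx-tf (illustrative; check the converter's documentation):
# import onnx
# from onnx_tf.backend import prepare
# prepare(onnx.load("alexnet.onnx")).export_graph("saved_model/alexnet")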
1.4. Quantize Models with TensorFlowLite
The following is an example of loading a float32 model (VGG16) from TensorFlow's SavedModel format (see 1.1), testing it, quantizing it to int8, and then testing and saving the quantized network.
import pathlib
from glob import glob

import cv2
import numpy as np
import tensorflow as tf

filenames = glob("dataset/*.jpg")

'''
LOAD AND TEST FLOAT32 MODEL
'''
prep_fn = tf.keras.applications.vgg16.preprocess_input
model = tf.keras.models.load_model('saved_model/vgg16')
h = model.input_shape[1]

def representative_data_gen():
    for im_path in filenames:
        im = cv2.imread(im_path)
        im = cv2.resize(im, (h, h))
        im = im[None, :, :, ::-1]          # add batch dim, BGR (OpenCV) -> RGB
        im = prep_fn(im)
        im = tf.convert_to_tensor(im)
        yield [im]

images = list(representative_data_gen())
predictions = np.zeros((len(images),), dtype=int)
for i, image in enumerate(images):
    output = model(image[0])[0]
    predictions[i] = output.numpy().argmax()
print(predictions)

'''
CONVERT AND SAVE INT8 MODEL (STATIC QUANTIZATION)
'''
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model_quant = converter.convert()

tflite_model_quant_file = pathlib.Path("tflite/vgg16.tflite")
tflite_model_quant_file.write_bytes(tflite_model_quant)

'''
LOAD AND TEST QUANTIZED MODEL
'''
interpreter = tf.lite.Interpreter(model_path=str(tflite_model_quant_file))
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

images = list(representative_data_gen())
predictions = np.zeros((len(images),), dtype=int)
for i, image in enumerate(images):
    image = image[0]
    # quantize the float32 input using the input tensor's scale & zero point
    input_scale, input_zero_point = input_details["quantization"]
    image = image / input_scale + input_zero_point
    test_image = image.numpy().astype(input_details["dtype"])
    interpreter.set_tensor(input_details["index"], test_image)
    interpreter.invoke()
    output = interpreter.get_tensor(output_details["index"])[0]
    predictions[i] = output.argmax()
print(predictions)
1.5. Explore Model Architecture
Netron is a great tool for opening TensorFlow's float32 models (SavedModel), TFLite's int8 models (.tflite), PyTorch models (.pt), ONNX models and more, to inspect the architecture and tensor names.
2. Golden Model
Python (NumPy stack)
After obtaining the pretrained model, I need to fully understand which operations are involved and how they are applied as data flows through the network. The best way to do this is to re-implement everything from scratch and obtain exactly the same results.
2.1. Custom Quantization Scheme
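As a reference for the basic idea, a generic affine (scale and zero-point) int8 quantizer, the same kind of mapping TFLite applies to its tensors, can be sketched as follows; this is an illustrative sketch, not the exact Kraken scheme:

import numpy as np

def quantize(x, scale, zero_point):
    '''Map float32 values to int8 using an affine (scale, zero-point) mapping.'''
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize(q, scale, zero_point):
    '''Recover approximate float32 values from int8.'''
    return (q.astype(np.float32) - zero_point) * scale

x = np.random.randn(4).astype(np.float32)
scale = (x.max() - x.min()) / 255.0                 # spread the observed range over 256 levels
zero_point = int(np.round(-128 - x.min() / scale))  # map x.min() to -128
print(dequantize(quantize(x, scale, zero_point), scale, zero_point))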
2.2. Custom Inference Framework (OOP, Python)
For this, I built a custom framework in Python. It is structured like Keras, with the following classes and inheritance hierarchy:
- MyModel
- MyLayer
  - MyConv
  - MyLeakyReLU
  - MyMaxpool
  - MyConcat
  - MySpaceToDepth
  - MyFlatten
A MyModel object holds a list of objects of MyLayer's child classes. Its constructor extracts the weights from the tflite file and assigns them to the layers. A set of images can flow through the layers via a recursive call to the last layer. The following is a stripped-down sketch of the MyConv implementation.
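A minimal sketch of what such a layer can look like; the MyLayer base-class plumbing, the naive 'same'-padded convolution loop and the float32 arithmetic below are illustrative stand-ins rather than the exact Kraken code:

import numpy as np

class MyLayer:
    '''Base class: each layer keeps a reference to the previous layer, and
    its output is computed lazily through a recursive call chain.'''
    def __init__(self, prev_layer=None):
        self.prev_layer = prev_layer
        self.out_data = None

    def get_output(self):
        if self.out_data is None:
            in_data = self.prev_layer.get_output() if self.prev_layer else None
            self.out_data = self.forward(in_data)
        return self.out_data

class MyConv(MyLayer):
    '''Convolution layer; weights (kh, kw, cin, cout) and biases (cout,)
    would be set from the tflite file by the MyModel constructor.'''
    def __init__(self, weights, biases, strides=(1, 1), prev_layer=None):
        super().__init__(prev_layer)
        self.weights = weights
        self.biases = biases
        self.strides = strides

    def forward(self, in_data):
        n, h, w, cin = in_data.shape                    # NHWC input
        kh, kw, _, cout = self.weights.shape
        sh, sw = self.strides
        ph, pw = kh // 2, kw // 2                       # 'same' padding for odd kernels
        x = np.pad(in_data, ((0, 0), (ph, ph), (pw, pw), (0, 0)))
        out = np.zeros((n, h // sh, w // sw, cout), dtype=np.float32)
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                patch = x[:, i*sh:i*sh+kh, j*sw:j*sw+kw, :]   # (n, kh, kw, cin)
                out[:, i, j, :] = np.einsum('nhwc,hwco->no', patch, self.weights)
        return out + self.biases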
2.3. Rebuilding the model & Debugging
I then rebuild the model using the above framework, pass data through it, and tweak things until I get exactly the same output. That tells me I have understood all the operations going on inside the model.
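The check itself can be as simple as asserting element-wise equality against the TFLite reference output; the constructor arguments and helper names below are hypothetical placeholders, not the framework's real API:

import numpy as np

# Hypothetical names: MyModel built from the tflite file, and a helper that
# runs the tflite interpreter on one image and returns its output tensor.
my_model = MyModel("tflite/vgg16.tflite")
for image in images:                                   # images from the earlier snippet
    reference = run_tflite(interpreter, image[0])      # hypothetical helper
    output = my_model.forward(image[0].numpy())        # hypothetical forward()
    assert np.array_equal(output, reference), "outputs diverge; keep tweaking"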
Once I've understood the model inside-out, I start designing the hardware on the whiteboard.