I hope my post finds the community members safe and healthy. I also wish them and their loved ones a happy, safe, and healthy New Year.
This is my first project in the field of AI (or deep learning), and I apologise if the question isn't structured correctly. I am happy to take feedback and explore additional topics.
The following are my goals:
Train a custom model that detects and reads the license plates of vehicles in India.
I found previous attempts to infer number plates of Indian vehicles, but the photo quality in those attempts is excellent and doesn't represent dashcam footage. Given that, is my attempt feasible?
What is the optimal framework? I see that Hailo supports TensorFlow, TensorFlow Lite, Keras, PyTorch, and ONNX.
I am using OpenCV to extract the frames. Does this affect which models I can use for training?
How do I train a model on the Raspberry Pi 5 (8 GB) together with the Hailo-8 accelerator (26 TOPS)?
I transferred videos from the dashcam to the Raspberry Pi and used OpenCV to extract frames (a minimal sketch of my extraction follows after these questions). For a single minute of video, around 200 MB in size, the extracted frames come to ~2 GB.
How do I build the data pipeline?
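For context, this is roughly the extraction step I mentioned above (a minimal sketch; the paths and sampling interval are placeholders for my setup). Keeping only every Nth frame should tame the ~2 GB-per-minute blow-up:

```python
# Minimal sketch of my OpenCV frame extraction. VIDEO_PATH, OUT_DIR and
# EVERY_N are placeholders; keeping one frame in N reduces the dataset size.
import os
import cv2

VIDEO_PATH = "dashcam_clip.mp4"  # placeholder path
OUT_DIR = "frames"               # placeholder output directory
EVERY_N = 10                     # keep one frame out of every ten

os.makedirs(OUT_DIR, exist_ok=True)
cap = cv2.VideoCapture(VIDEO_PATH)
idx = saved = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if idx % EVERY_N == 0:
        # JPEG keeps files much smaller than lossless formats such as PNG
        cv2.imwrite(os.path.join(OUT_DIR, f"frame_{idx:06d}.jpg"), frame)
        saved += 1
    idx += 1
cap.release()
print(f"kept {saved} of {idx} frames")
```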
Thank you very much, and I apologise if my questions seem basic (or like asking to be spoon-fed). I have tried to review the research, but the plethora of approaches is why I am requesting the community's assistance to optimise my efforts.
Inferring number plates is like measuring any other data: the results and accuracy you get will depend on the quality of the input data, the pre-processing, the CNN architecture, and the amount of training.
The input formats for trained networks to be converted into the Hailo Executable Format (HEF) are ONNX and TFLite. You can use the framework of your choice for the training.
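For example, if you train in PyTorch, the export step can be a short sketch like the following (the model, input shape, and file names are placeholders, not a recommendation for this task):

```python
# Sketch: exporting a trained PyTorch model to ONNX so the Hailo toolchain
# can consume it. The model and input shape here are placeholders.
import torch
import torchvision

model = torchvision.models.resnet18(weights=None)  # stand-in for your network
model.eval()

dummy_input = torch.randn(1, 3, 224, 224)  # batch, channels, height, width
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=13,
)
```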
The model does not care where the data comes from. If the images for training and inference come from the same source, the accuracy will be higher.
Training is not part of our development flow. We do provide retraining Dockers for some models, but generally we start from a trained model. The training and conversion are done on a development machine, not on the Raspberry Pi.
There are many examples available; please have a look at our repositories.
Just to elaborate a bit on klausk's comments. I will try to keep the language simple.
If you insist on training your own model on the Raspberry Pi, it is possible. You may use, for example, PyTorch, other frameworks, or even your own C/C++ code to do this. But in order to convert the trained model (think of it as a kind of graph carrying floating-point numbers: weights, biases, etc.) to the HEF file that runs on your Hailo chip, you need a (dataflow) compiler, which is a key element/tool in the development flow. So far, this compiler is built for AMD64, yet the Raspberry Pi is ARM64-based.
The solution is actually rather simple. You very likely own a laptop, and that laptop is probably AMD64-based, i.e. it comes with an Intel or AMD CPU. The compiler also requires a reasonable amount of RAM. Check the Hailo Dataflow Compiler User Manual to see whether your CPU and RAM are powerful enough. The requirements are actually not demanding, and the CPUs in most modern gaming laptops are sufficient. (In case you are curious, the dataflow compiler needs certain instruction sets to be available; a quick check follows below.) This laptop is now your “development machine”, i.e. dev machine.
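If you want a quick sanity check on Linux, something like this sketch will tell you whether the CPU advertises AVX (I believe that is the relevant one, but please confirm the exact requirements in the manual):

```python
# Sketch: check /proc/cpuinfo (Linux) for the AVX flag. Confirm the exact
# instruction-set requirements in the Hailo Dataflow Compiler User Manual.
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("flags"):
            print("AVX supported:", "avx" in line.split())
            break
```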
On this dev machine, you develop your model, train it, perfect it, and convert it into a file that can run on the Hailo chip. Then you move this model to the Hailo-boosted Raspberry Pi and run the inferencing pipeline.
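To make that flow concrete, here is a rough sketch of the conversion step using the Dataflow Compiler's Python API. Treat it as an outline only: names and arguments may differ between versions (check the user manual), and the calibration data here is a random placeholder, whereas real calibration should use your actual dashcam frames:

```python
# Sketch of the ONNX -> HEF conversion on an AMD64 dev machine using the
# Hailo Dataflow Compiler Python API. Verify all names against the user
# manual for your DFC version; calib_data below is placeholder data only.
import numpy as np
from hailo_sdk_client import ClientRunner

runner = ClientRunner(hw_arch="hailo8")
runner.translate_onnx_model("model.onnx", "my_model")

# Real calibration should use images from your actual dashcam footage.
calib_data = np.random.rand(64, 224, 224, 3).astype(np.float32)
runner.optimize(calib_data)

hef = runner.compile()
with open("my_model.hef", "wb") as f:
    f.write(hef)
```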
A very happy New Year to the community members and their loved ones.
I am sincerely thankful for the detailed guidance both of you have provided, especially with what may seem like spoon-feeding.
The first step would be to get a real-world dataset, by which I mean the type of data the model will be required to infer on, rather than ideal data. Is this assertion correct?
Hence, instead of collecting clear, high-quality number plates, I want to extract frames from my dashcam footage and tag the number plates as input data. I grasp the downsides: the model may be specific to my dashcam, and it may underperform if anything about the dashcam changes (firmware, settings, or hardware).
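For the tagging itself, my understanding is that a common convention is one label file per frame with normalized box coordinates (YOLO style); a sketch of what I would generate (class 0 = number plate; all numbers are made-up placeholders):

```python
# Sketch: writing one YOLO-style label file per frame. Class 0 = plate;
# box coordinates are normalized to the image width/height. All values
# here are placeholders from a hypothetical annotation.
img_w, img_h = 1920, 1080             # frame size (placeholder)
x1, y1, x2, y2 = 880, 600, 1040, 660  # plate box in pixels (placeholder)

xc = (x1 + x2) / 2 / img_w
yc = (y1 + y2) / 2 / img_h
w = (x2 - x1) / img_w
h = (y2 - y1) / img_h

with open("frame_000120.txt", "w") as f:
    f.write(f"0 {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}\n")
```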
Is this an ideal approach, or is my knowledge lacking here?
Based on the community’s guidance, I will write the next steps.
I reckon this thread could be a starting point for RPi enthusiasts who, as Jeff Geerling (thank you for his efforts) might say, "get on the AI hype train".
There are three parts, and part three uses Hailo's latest releases to date (January 2025). I'm very grateful for them. He also has some other really useful material on his channel. Good luck!
Thank you very much. I looked at the video, and it helps with the technical aspects. However, I feel I am missing the architectural knowledge.
A way I can articulate this is:
Knowing the downsides I described above (the model may be specific to my dashcam and may underperform if the dashcam's firmware, settings, or hardware change), should I still focus on collecting good-quality images, or on the real-world images I expect the model to infer?
If the answer to the first question is to continue with real-world images, I reckon I would have to extract images from the video files on my dashcam. What is the recommended framework for doing so?