Theoretically, it looks simple: define a goal, collect appropriate data, train the model with this data, validate it, and then monitor and measure the model’s quality, making adjustments if necessary. In real life, however, every step brings major problems of its own. Shall we begin? :)
-
Is your defined goal clear enough? What will the model be used for? Is there a previously built model for this? Is this a research and development job, or is it solving a previously unsolved problem? Or is the main goal just keeping up with the AI trend? If it’s just for the trend, definitely don’t develop a model, because that’s not a goal. There’s nothing in it that makes the project worth doing.
-
I’m assuming your goal is quite reasonable. Now you’re going to collect data. But how much? Is the amount of data you can check manually enough? Probably not: that amount would be too small for a real AI model to work, and if you collected enough, you’d be checking data for years :) So what do we do then? To minimize the noise (i.e., unwanted elements) in the data you collect, you may need to pass it through a rule set, and even keep this rule set a bit too strict. You can use rule-based systems for this. I don’t recommend a when-condition based Lisp language or derivative for this. You already have a lot to learn, so don’t let your learning curve climb toward infinity for no reason :)
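To make the "rule set" idea concrete, here is a minimal sketch in plain Python. The rules themselves (a minimum word count, a minimum share of letters, no URLs) and all the names and thresholds are my assumptions for illustration only; your own rules depend entirely on your data and your goal.

```python
import re

# Hypothetical rules: each takes one text record and returns True if it passes.
def long_enough(text, min_words=5):
    return len(text.split()) >= min_words

def mostly_letters(text, min_ratio=0.8):
    # Share of characters that are letters or whitespace.
    if not text:
        return False
    ok = sum(ch.isalpha() or ch.isspace() for ch in text)
    return ok / len(text) >= min_ratio

URL_PATTERN = re.compile(r"https?://\S+")

def has_no_url(text):
    return URL_PATTERN.search(text) is None

RULES = [long_enough, mostly_letters, has_no_url]

def filter_records(records, rules=RULES):
    """Split records into (kept, rejected). A record must pass every rule."""
    kept, rejected = [], []
    for text in records:
        failed = [rule.__name__ for rule in rules if not rule(text)]
        if failed:
            # Keep the names of the failed rules so rejections can be audited.
            rejected.append((text, failed))
        else:
            kept.append(text)
    return kept, rejected

if __name__ == "__main__":
    sample = [
        "The model was trained on clean sentences only.",
        "buy now http://spam.example",
        "too short",
    ]
    kept, rejected = filter_records(sample)
    print(len(kept), "kept,", len(rejected), "rejected")
```

Being "a bit too strict" here means you will throw away some good records too. Recording which rule rejected each record lets you look through the rejected pile later and loosen a rule if it turns out to be too aggressive.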
-
You’re waiting with your goal and clean data. Now your model architecture needs to be very good. Actually, I need to write a separate article on model architecture because it’s so important… In fact, whether your model works at all, and whether it can be fine-tuned later, is decided here. Still, let’s say you have a good architecture. Or you’re training a model on a proven, ready-made architecture. Now we’ve come to what nobody talks about. HARDWARE. Now you’ll say, “My computer is good.” Your computer was not designed to train an AI system. If you attempt such a thing, at best you’ll encounter a frozen computer for 1-2 minutes. When you look into it, people will quietly whisper “GPU” to you. Yes, a GPU, and a CUDA-enabled one at that. Meaning specialized hardware that performs millions of calculations in milliseconds when you run it. And patience. Even if you have very good hardware, training on data of considerable size will take quite a while. Sometimes 2-3 days, sometimes more. (It varies depending on the size of your data and your hardware.)
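Before you commit to a multi-day run, a back-of-the-envelope estimate helps. The sketch below is only arithmetic: total samples processed divided by throughput. The function names and every number in the example are hypothetical, not measurements from real hardware. Measure the throughput on your own machine with a few training steps before trusting any estimate.

```python
import time

def measure_throughput(train_step, batch_size, num_steps=20):
    """Time a few calls of train_step (your own function) and return samples/second."""
    start = time.perf_counter()
    for _ in range(num_steps):
        train_step()
    elapsed = time.perf_counter() - start
    # Guard against a zero reading if train_step is trivially fast.
    return batch_size * num_steps / max(elapsed, 1e-9)

def estimate_training_hours(num_samples, epochs, samples_per_second):
    """Rough wall-clock estimate: total samples processed / throughput."""
    if samples_per_second <= 0:
        raise ValueError("samples_per_second must be positive")
    total_seconds = num_samples * epochs / samples_per_second
    return total_seconds / 3600

if __name__ == "__main__":
    # Hypothetical numbers: 10 million samples, 10 epochs.
    for label, throughput in [("slow device", 20), ("fast device", 500)]:
        hours = estimate_training_hours(10_000_000, 10, throughput)
        print(f"{label}: {hours:.1f} hours ({hours / 24:.1f} days)")
```

With these made-up figures, the faster device finishes in a little over two days while the slower one needs almost two months. The exact numbers mean nothing; the point is that throughput decides whether the project is feasible at all.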
-
Okay, you’ve been patient and the hardware is good too. The training process is finished. Let’s see if it works as we wanted. Are the test results too bad? Then you made a mistake somewhere. To debug this error, you need to sit down and look at the entire process as a whole. Did you find the error? Can you fix it with a small touch? The answer is NO. AI pipeline errors are solved by restarting this tedious, patience-testing process. Only performance improvements or smaller issues can be handled with minor adjustments.
Anyone can tell you how beautiful and promising AI systems are. But you can’t realize how much cost, patience, and meticulous work they require without getting into it yourself.
Final word… If this rose smells good despite all its thorns, smell it… Otherwise, people will get tired of listening to your complaints for a lifetime :)