Heard a developer at a coffee shop in Austin say they trained a model on just 100 images

I was grabbing coffee yesterday and overheard two developers talking at the next table. One said they got a decent image recognition model working with a training set of only 100 pictures (specifically for identifying manufacturing defects). That's way smaller than I thought was possible even a year ago. It makes me think these tools are getting efficient enough for tiny companies to use now, not just the big tech firms. But it also means a small, biased dataset could cause real problems faster if people aren't careful. Has anyone else run into these new 'small data' training methods and seen how they perform?

4 comments

hannah_fisher58 2mo ago

That mop bucket story from @sandra_lane6 is a perfect example. It makes you wonder what else these quick models get wrong when the data is too narrow. Like, if you train it on 100 pictures of defects under bright white light, what happens on the factory floor with yellow lights or shadows? Does it start calling normal parts broken, or miss real problems? The speed is cool but the blind spots seem huge.

robert_ross95 1mo ago

But isn't the whole point to start somewhere and get better? If we waited for perfect data that covered every single light and shadow, we'd never use any new tool. You find the blind spots, like the mop bucket or the yellow light, and you fix them. That's just how this stuff works, right? It gets smarter by being wrong sometimes.

hannah_fisher58 2mo ago

Wow, that's wild but also kinda scary. I saw a video about these new tricks where they use a big pre-trained model and then just fine-tune it with a tiny set of pictures. It works way better than starting from zero. But you're dead on about the bias problem. If your 100 pictures are all from one factory or one type of light, the model will be totally useless anywhere else. People will trust it because it's "AI" and not check the work.

sandra_lane6 2mo ago

Totally get what you mean about people trusting it too much. I tried one of those photo sorting apps for my business and it kept calling my mop bucket a plant vase. The training pictures must have all been in fancy homes with weird decor. You have to double check everything it does, which sort of defeats the point of saving time, right?