# The problem we were trying to solve
The goal of this project was not to train a deep neural network from scratch. It was more practical than that.
We wanted a web application where users could look at handwritten digit images and submit the digit they think each image shows. The system would then aggregate those responses and show statistics. In other words, we built a lightweight human-labeling workflow around MNIST-like images.
That sounds simple, but it touches a lot of real software concerns:
- Frontend usability
- Data collection quality
- Backend storage design
- Aggregation and reporting
If people cannot use your UI comfortably, your data quality drops. If your backend is messy, your statistics become unreliable. This project made that very clear.
# What I built
The repository includes planning documents (Functional specification, Requirements specification, System plan) plus implementation files and database assets. The implementation stack was:
- HTML and CSS for structure and layout
- Bootstrap for quicker UI components
- JavaScript for interactions
- PHP for backend handling
- MySQL for storing user submissions and reporting stats
The app concept had three main areas:
- A home/input area where users classify digits.
- A statistics page that shows aggregated results.
- Supporting project pages (team/about and documentation-style content).
# Why this project mattered for me
At that stage, I already had basic coding experience, but this project forced me to work across the full request lifecycle.
A user clicks a number. That input gets validated. The backend receives it, stores it, and updates the aggregate counts. Then another page renders meaningful percentages from the accumulated data.
Going through this loop taught me a core lesson: features are easy to describe, but reliable data flows are harder to implement.
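Here is a minimal sketch of what that loop looks like from the client side. The endpoint name (`submit.php`) and the payload shape are assumptions for illustration, not the project's actual API:

```javascript
// Validate a user's guess before it ever reaches the backend:
// only the integer digits 0-9 are acceptable labels.
function isValidLabel(label) {
  return Number.isInteger(label) && label >= 0 && label <= 9;
}

// Send a validated guess for a given image to the backend.
// "submit.php" is a hypothetical endpoint name for this sketch.
async function submitGuess(imageId, label) {
  if (!isValidLabel(label)) {
    throw new Error(`Invalid label: ${label}`);
  }
  const response = await fetch("submit.php", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ imageId, label }),
  });
  return response.ok;
}
```

Validating on the client keeps the flow fast, but the backend still has to re-check the same constraint before writing anything to the database.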
# How the data model thinking changed my approach
The project required tracking repeated inputs and extracting patterns. We had to think in terms of:
- Which image was shown
- Which label the user selected
- How often each label was chosen
- What percentage each label represents for a given image
Even without advanced machine learning, this is already a useful analytics pipeline. It creates feedback signals that can later guide model tuning, dataset cleanup, or UI adjustments.
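As an illustration, the per-image counting and percentage step can be sketched in plain JavaScript (field names like `imageId` are hypothetical; the real project did this work in PHP and MySQL):

```javascript
// Given raw submissions of { imageId, label }, compute per-image
// label counts and the percentage each label represents.
function aggregate(submissions) {
  // imageId -> (label -> count)
  const byImage = new Map();
  for (const { imageId, label } of submissions) {
    if (!byImage.has(imageId)) byImage.set(imageId, new Map());
    const counts = byImage.get(imageId);
    counts.set(label, (counts.get(label) || 0) + 1);
  }

  // Turn counts into { count, percent } per label for reporting.
  const stats = {};
  for (const [imageId, counts] of byImage) {
    const total = [...counts.values()].reduce((a, b) => a + b, 0);
    stats[imageId] = {};
    for (const [label, count] of counts) {
      stats[imageId][label] = { count, percent: (100 * count) / total };
    }
  }
  return stats;
}
```

In the actual stack, the counting half of this belongs in a SQL aggregate query rather than application code, so the database does the heavy lifting.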
Working on this gave me a stronger instinct for database-first design. Before writing too much UI code, I learned to ask:
- What exactly are we storing?
- How will we query it later?
- What report do we need to generate from it?
# Challenges that were real (not tutorial-level)
## 1. Keeping the input flow simple
The audience was mixed. Some users only needed to submit quick guesses. Others wanted to inspect stats. The interface had to support both without confusion.
## 2. Balancing speed vs. clarity
If you ask users to click through many digit images, small friction adds up fast. Button placement, visual hierarchy, and feedback messages all mattered more than expected.
## 3. Making stats understandable
Raw counts are not enough. Percentages and clear labels were necessary so users could actually interpret outcomes.
## 4. Team communication through docs
This repository has several planning and specification files. Writing those helped us align scope and avoid random feature drift.
# What I learned technically
This project improved my skills in:
- Designing a PHP + MySQL backend for structured data capture
- Building query-driven statistics pages
- Connecting frontend interaction with backend persistence cleanly
- Documenting requirements before implementation
- Thinking about data quality as a product concern, not only an ML concern
I also learned that "accuracy" discussions need context. If humans provide noisy labels, the right question is not "Is this perfect?" but "How can we make the collection process more reliable over time?"
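One concrete way to put a number on that reliability question, sketched here as an idea rather than something the original project shipped, is a per-image agreement rate: the share of votes that match the most common label. Images with low agreement are candidates for UI fixes or removal.

```javascript
// Agreement rate for one image: fraction of votes that match the
// most frequently chosen label. 1.0 means unanimous; values near
// 1 / numberOfDistinctLabels suggest the image is confusing users.
function agreementRate(labels) {
  const counts = new Map();
  for (const label of labels) {
    counts.set(label, (counts.get(label) || 0) + 1);
  }
  const top = Math.max(...counts.values());
  return top / labels.length;
}
```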
# What I would improve now
If I rebuilt this project today, I would add:
- User/session tracking with clearer consent and privacy notes
- Better validation and anti-spam controls
- More detailed analytics views (per-image confusion distribution)
- Exportable datasets for external model training workflows
- Automated tests for backend statistics calculations
I would also include a small dashboard with trend lines over time, not just static percentages.
# SEO value and relevance
For anyone researching a PHP MNIST project or a digit classification web app with MySQL, this case study is useful because it focuses on the product side of data collection.
Most MNIST examples online jump directly to model training notebooks. This project covers an earlier but important stage: building the system that captures and organizes human-labeled inputs in a usable way.
That stage is often ignored, but in real projects it is where many quality issues begin.
# Repository
If you want to review the full project artifacts and source files: