# The problem we were trying to solve
The goal of this project was not to train a deep neural network from scratch. It was more practical than that.
We wanted a web application where users could look at handwritten digit images and submit the digit they think each image shows. The system would then aggregate those responses and show statistics. In other words, we built a lightweight human-labeling workflow around MNIST-like images.
That sounds simple, but it touches a lot of real software concerns:
- Frontend usability
- Data collection quality
- Backend storage design
- Aggregation and reporting
If people cannot use your UI comfortably, your data quality drops. If your backend is messy, your statistics become unreliable. This project made that very clear.
# What I built
The repository includes planning documents (Functional specification, Requirements specification, System plan) plus implementation files and database assets. The implementation stack was:
- HTML and CSS for structure and layout
- Bootstrap for quicker UI components
- JavaScript for interactions
- PHP for backend handling
- MySQL for storing user submissions and reporting stats
The app concept had three main areas:
- A home/input area where users classify digits.
- A statistics page that shows aggregated results.
- Supporting project pages (team/about and documentation-style content).
# Why this project mattered for me
At that stage, I already had basic coding experience, but this project forced me to work across the full request lifecycle.
A user clicks a number. That input gets validated. The backend receives it, stores it, and updates the aggregate counts. Then another page renders meaningful percentages from the accumulated data.
Going through this loop taught me a core lesson: features are easy to describe, but reliable data flows are harder to implement.
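Here is a minimal sketch of what that loop looks like from the client side. The endpoint name (`submit.php`) and the payload shape are assumptions for illustration, not the project's actual API:

```javascript
// Validate a user's guess before it ever reaches the backend:
// only the integer digits 0-9 are acceptable labels.
function isValidLabel(label) {
  return Number.isInteger(label) && label >= 0 && label <= 9;
}

// Send a validated guess for a given image to the backend.
// "submit.php" is a hypothetical endpoint name for this sketch.
async function submitGuess(imageId, label) {
  if (!isValidLabel(label)) {
    throw new Error(`Invalid label: ${label}`);
  }
  const response = await fetch("submit.php", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ imageId, label }),
  });
  return response.ok;
}
```

Validating on the client keeps the flow fast, but the backend still has to re-check the same constraint before writing anything to the database.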
# How the data model thinking changed my approach
The project required tracking repeated inputs and extracting patterns. We had to think in terms of:
- Which image was shown
- Which label the user selected
- How often each label was chosen
- What percentage each label represents for a given image
Even without advanced machine learning, this is already a useful analytics pipeline. It creates feedback signals that can later guide model tuning, dataset cleanup, or UI adjustments.
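As an illustration, the per-image counting and percentage step can be sketched in plain JavaScript (field names like `imageId` are hypothetical; the real project did this work in PHP and MySQL):

```javascript
// Given raw submissions of { imageId, label }, compute per-image
// label counts and the percentage each label represents.
function aggregate(submissions) {
  // imageId -> (label -> count)
  const byImage = new Map();
  for (const { imageId, label } of submissions) {
    if (!byImage.has(imageId)) byImage.set(imageId, new Map());
    const counts = byImage.get(imageId);
    counts.set(label, (counts.get(label) || 0) + 1);
  }

  // Turn counts into { count, percent } per label for reporting.
  const stats = {};
  for (const [imageId, counts] of byImage) {
    const total = [...counts.values()].reduce((a, b) => a + b, 0);
    stats[imageId] = {};
    for (const [label, count] of counts) {
      stats[imageId][label] = { count, percent: (100 * count) / total };
    }
  }
  return stats;
}
```

In the actual stack, the counting half of this belongs in a SQL aggregate query rather than application code, so the database does the heavy lifting.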
Working on this gave me a stronger instinct for database-first design. Before writing too much UI code, I learned to ask:
- What exactly are we storing?
- How will we query it later?
- What report do we need to generate from it?
# Challenges that were real (not tutorial-level)
## 1. Keeping the input flow simple
The audience was mixed. Some users only needed to submit quick guesses. Others wanted to inspect stats. The interface had to support both without confusion.
## 2. Balancing speed vs. clarity
If you ask users to click through many digit images, small friction adds up fast. Button placement, visual hierarchy, and feedback messages all mattered more than expected.
## 3. Making stats understandable
Raw counts are not enough. Percentages and clear labels were necessary so users could actually interpret outcomes.
## 4. Team communication through docs
This repository has several planning and specification files. Writing those helped us align scope and avoid random feature drift.
# What I learned technically
This project improved my skills in:
- Designing a PHP + MySQL backend for structured data capture
- Building query-driven statistics pages
- Connecting frontend interaction with backend persistence cleanly
- Documenting requirements before implementation
- Thinking about data quality as a product concern, not only an ML concern
I also learned that "accuracy" discussions need context. If humans provide noisy labels, the right question is not "Is this perfect?" but "How can we make the collection process more reliable over time?"
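One concrete way to put a number on that reliability question, sketched here as an idea rather than something the original project shipped, is a per-image agreement rate: the share of votes that match the most common label. Images with low agreement are candidates for UI fixes or removal.

```javascript
// Agreement rate for one image: fraction of votes that match the
// most frequently chosen label. 1.0 means unanimous; values near
// 1 / numberOfDistinctLabels suggest the image is confusing users.
function agreementRate(labels) {
  const counts = new Map();
  for (const label of labels) {
    counts.set(label, (counts.get(label) || 0) + 1);
  }
  const top = Math.max(...counts.values());
  return top / labels.length;
}
```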
# What I would improve now
If I rebuilt this project today, I would add:
- User/session tracking with clearer consent and privacy notes
- Better validation and anti-spam controls
- More detailed analytics views (per-image confusion distribution)
- Exportable datasets for external model training workflows
- Automated tests for backend statistics calculations
I would also include a small dashboard with trend lines over time, not just static percentages.
# SEO value and relevance
For anyone researching a PHP MNIST project or a digit classification web app with MySQL, this case study is useful because it focuses on the product side of data collection.
Most MNIST examples online jump directly to model training notebooks. This project covers an earlier but important stage: building the system that captures and organizes human-labeled inputs in a usable way.
That stage is often ignored, but in real projects it is where many quality issues begin.
# Repository
If you want to review the full project artifacts and source files: