Timetable Image Recognition and Prompt Improvement

kakao-tech-campus-3rd-step3/Team12_BE: UniSchedule backend repository

In UniSchedule, one goal was to reduce the burden of entering university timetables manually. In a schedule management service, if the initial data entry is too tedious, users may leave before they ever try the main features.

So I considered a feature where the user uploads a timetable image, and the service extracts the course name, day, and time, then converts them into schedule data. The goal was not just to describe the image. The goal was to convert the image into a form that the service could store.

This post is a record of how I used GPT Vision and Structured Output while building the timetable image recognition feature, and how I improved the prompt.

Problem

Timetable images came in many different layouts. Each school's UI was different, screenshot sizes varied, and in some images the text was small or the table borders were unclear.

Even when the model could read some course information from the image, the result was not good enough to use directly in the service:

Course names could be cut off or merged.
Days and times could be matched incorrectly.
Natural language descriptions were readable, but hard to store in the database.
There was no rule yet for handling missing values.

In other words, the problem was not only reading the image. The result also had to be converted into schedule data.

Approach

I used GPT Vision for image recognition. But if I only asked the model to "read this timetable," the response often included natural language explanations.

The service did not need an explanation. It needed schedule objects. So I specified the output format explicitly and had the model return structured results with fixed fields such as course name, day, start time, and end time.

The points I checked were:

Is each course separated as an independent item?
Are the day and time matched correctly?
Does the model avoid filling missing values on its own?
Is the result in a form that the service can validate and store?
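
As a rough illustration of these checks, here is a minimal Python sketch of an output contract and a parser for it. It is not the project's actual code: the field names (`course_name`, `day`, `start_time`, `end_time`), the day codes, and the `courses` wrapper key are assumptions made for this example, and how the schema is handed to the model depends on the client library in use.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical day codes and field names, chosen only for this example.
DAYS = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]
FIELDS = ("course_name", "day", "start_time", "end_time")

# JSON Schema describing the expected response shape.
# Every field is nullable so the model can report "not visible" instead of guessing.
TIMETABLE_SCHEMA = {
    "type": "object",
    "properties": {
        "courses": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "course_name": {"type": ["string", "null"]},
                    "day": {"type": ["string", "null"], "enum": DAYS + [None]},
                    "start_time": {"type": ["string", "null"]},
                    "end_time": {"type": ["string", "null"]},
                },
                "required": list(FIELDS),
                "additionalProperties": False,
            },
        }
    },
    "required": ["courses"],
    "additionalProperties": False,
}


@dataclass(frozen=True)
class CourseEntry:
    course_name: Optional[str]
    day: Optional[str]
    start_time: Optional[str]
    end_time: Optional[str]


def parse_courses(response_text: str) -> List[CourseEntry]:
    """Turn the model's JSON text into one CourseEntry per course item."""
    payload = json.loads(response_text)
    entries = []
    for item in payload["courses"]:
        # Reject renamed, missing, or extra fields instead of silently adapting.
        if set(item) != set(FIELDS):
            raise ValueError(f"unexpected fields: {sorted(item)}")
        entries.append(CourseEntry(**item))
    return entries
```

Each course becomes an independent object, and a missing value stays `None` rather than being filled in. Whether the values themselves make sense is a separate question for backend validation.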

Applying an AI feature was not just about showing the model response as-is. From the backend point of view, the response still had to become input data that the service could handle.

Implementation

When writing the prompt, I focused on preventing the model from guessing values that were not visible.

If some information is not clearly visible in the timetable image and the model fills in the blank on its own, the result looks valid but is wrong. So I instructed the model to leave uncertain values empty or mark them as low confidence.

I also adjusted the output format repeatedly. At first, natural language explanations were mixed into the response. Then the JSON structure was roughly correct, but field names sometimes changed. So I fixed the field names and value formats, and included examples to limit the response shape.

The prompt points I checked were:

Make the model return data rather than explanations.
Prevent it from guessing unknown values.
Match day and time values to the service format.
Limit the result to a form that the backend can validate.
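
To make these points concrete, the sketch below assembles a prompt that fixes the field names, restricts the value formats, tells the model to use null for anything it cannot see, and ends with an example response. The wording of the rules, the field names, and the day codes are all assumptions for illustration, not the prompt actually used in the project.

```python
import json

# Hypothetical field names and day codes, chosen only for this example.
FIELDS = ["course_name", "day", "start_time", "end_time"]
DAYS = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]

# A small example response; None is serialized as JSON null.
EXAMPLE_OUTPUT = {
    "courses": [
        {"course_name": "Linear Algebra", "day": "TUE",
         "start_time": "13:00", "end_time": "14:15"},
        {"course_name": "Physics Lab", "day": "FRI",
         "start_time": None, "end_time": None},
    ]
}


def build_prompt() -> str:
    """Build the instruction text sent along with the timetable image."""
    rules = [
        "Return only a JSON object. Do not add explanations.",
        "Use exactly these field names for each course: " + ", ".join(FIELDS) + ".",
        "day must be one of: " + ", ".join(DAYS) + ".",
        "start_time and end_time must use 24-hour HH:MM format.",
        "If a value is not clearly visible in the image, use null. Do not guess.",
        "Create one item per course block in the timetable.",
    ]
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    return (
        "You extract courses from a university timetable image.\n"
        + numbered
        + "\nExample output:\n"
        + json.dumps(EXAMPLE_OUTPUT, indent=2)
    )
```

Keeping the rules in a list makes it easier to adjust one rule at a time and compare the results, which fits the repeated adjustments described above.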

Backend Validation

Even if the AI returned structured results, I did not think the service should store them directly. The backend still had to validate the result once more.

For example, if the start time is later than the end time, the day value is outside the allowed range, or the course name is empty, the service should not store the result as-is. AI output is closer to assisted input. Before it becomes service data, it should go through the same validation process as normal user input.

At this point, I felt that AI features are not very different from normal API input handling. Whether the value comes from a user or from a model, once it is stored in the service, it has to pass the domain rules.
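
The rules mentioned above can be sketched as a plain validation function that does not care whether the values came from a user or from a model. This is a minimal example under assumptions: the day codes, the HH:MM time format, and the function name `validate_course` are hypothetical, and it also rejects a start time equal to the end time, which goes slightly beyond the rule stated above.

```python
import re
from typing import List, Optional

# Hypothetical allowed values and time format for this example.
ALLOWED_DAYS = {"MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"}
TIME_PATTERN = re.compile(r"([01]\d|2[0-3]):[0-5]\d")


def validate_course(
    course_name: Optional[str],
    day: Optional[str],
    start_time: Optional[str],
    end_time: Optional[str],
) -> List[str]:
    """Return a list of rule violations; an empty list means the entry can be stored."""
    errors = []
    if not course_name or not course_name.strip():
        errors.append("course_name is empty")
    if day not in ALLOWED_DAYS:
        errors.append("day is outside the allowed range")

    times_ok = True
    for label, value in (("start_time", start_time), ("end_time", end_time)):
        if value is None or not TIME_PATTERN.fullmatch(value):
            errors.append(f"{label} is missing or not HH:MM")
            times_ok = False

    # Zero-padded HH:MM strings sort in chronological order, so plain
    # string comparison is enough once both formats are confirmed.
    if times_ok and start_time >= end_time:
        errors.append("start_time must be earlier than end_time")
    return errors
```

For example, `validate_course("OS", "MON", "11:00", "10:15")` returns `["start_time must be earlier than end_time"]`, so the entry would be held back instead of stored.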

Takeaway

The takeaway was simple. AI features can reduce the user's input burden, but the model should not carry the final responsibility for validating service data.

After this work, when building similar features, I try to check these points first.

Is the model output in a data structure that the service can accept?
Did I prevent the model from guessing values it cannot know?
Can the backend validate the result again?
Is there a flow where the user can manually fix the result when recognition fails?
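
The last point, a manual correction flow, could look roughly like the sketch below: entries that pass validation are accepted, and the rest are returned to the user together with the reasons, rather than being dropped or stored silently. The function names and the dictionary shape are assumptions for illustration only, and `missing_fields` is a deliberately simple stand-in for the real domain rules.

```python
from typing import Callable, Dict, List, Optional, Tuple

Course = Dict[str, Optional[str]]

# Hypothetical field names, chosen only for this example.
FIELD_NAMES = ("course_name", "day", "start_time", "end_time")


def missing_fields(course: Course) -> List[str]:
    """Simple stand-in validator: report every field that is empty or absent."""
    return [f"{name} is missing" for name in FIELD_NAMES if not course.get(name)]


def split_for_review(
    courses: List[Course],
    validate: Callable[[Course], List[str]],
) -> Tuple[List[Course], List[dict]]:
    """Separate entries that can be stored from entries the user should fix."""
    accepted: List[Course] = []
    needs_review: List[dict] = []
    for course in courses:
        errors = validate(course)
        if errors:
            # Keep the original values so the edit form can be pre-filled.
            needs_review.append({"course": course, "errors": errors})
        else:
            accepted.append(course)
    return accepted, needs_review
```

With this shape, a partially recognized timetable still saves the user some typing: only the flagged entries need manual attention.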

Timetable image recognition looked like a flashy AI feature at first, but the actual problem was closer to converting unstable image input into schedule data. Once I saw it that way, I had to design not only the prompt, but also the output format, validation, and correction flow together.