Analysing posts from student portfolios using ChatGPT

Getting a glimpse of what’s going on in hundreds of student portfolios is impossible by hand. Over the past few months, I’ve occasionally worked on the Analytics section of KISK’s Scrapbook, which should give teachers and students a better sense of what KISK’s students post on their portfolios. I decided to use the OpenAI API for the analysis, and I would like to share some findings about the process.

Scrapbook is a diary of what students learn and do each day, inspired by Hack Club’s own version. Unlike its ancestor, our Scrapbook works as an aggregator, so students can run portfolios on their own web pages. This gives students more freedom, but unfortunately it also increases the overall complexity of the portfolio ecosystem at our department.

The first analysis by Illyria Brejcha, based on existing metadata, gave us a solid understanding of where Scrapbook stands as a platform and informed our next steps, which are mainly about lowering the barrier for teachers to work with student portfolios. Scrapbook Analytics, as I built it, provides observability over student portfolios from a single place, saving teachers the time they would otherwise spend navigating between different portfolios. Students can use the application as well, to find the next course they want to enrol in, lurk on their friends' posts, or seek inspiration for their next work.

Scrapbook Analytics

To build a search feature for students and analytical utilities for educators, we needed to gather more extensive data. In this experiment, I collected more data about the posts, most importantly their full text, which I automatically fed into ChatGPT or, more precisely, the OpenAI API. Before the experiment, I had planned to use open machine-learning models. But as we already use the OpenAI API in some parts of Scrapbook, I naturally tried sending a few prompts for the analysis of posts as well. OpenAI models are generally expensive for tasks on huge datasets, but we had collected just a few thousand posts on the platform, and a simple calculation based on the average post size and the number of posts indicated that the cost would be small.

Getting structured deterministic responses

Knowing that ChatGPT and GitHub Copilot are quite good at programming tasks, I tried to force the model to respond in a structured manner that I could later parse and feed back to the application. The simplest way to describe the desired response was with TypeScript, which I already knew ChatGPT handles well. For those who don’t know TypeScript, imagine explicitly telling the model the exact possible content types or post categories of the student text, so that it has a restricted set of options to choose from. For example, postType: ("essay" | "review")[]; says the post type can be essay, review, both, or neither. In the actual prompt, we can even save some money on prompt length by removing some of the special characters, and the model will still understand what we want from it.

Running a simple prompt with the desired response type and the URL of the post worked better than I expected with GPT-4 in the paid pro version. Later on, I stopped paying for the pro version of the product and wanted to use a cheaper model instead. I chose the GPT-3.5-Turbo model with a 16k context window (students’ posts can be very long), which is priced at $0.003 per 1K input tokens and $0.004 per 1K output tokens.
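The kind of back-of-the-envelope cost calculation mentioned earlier can be sketched roughly as follows. The two prices come from the paragraph above; the function name, the post count and the average token counts are made-up placeholders for illustration, not measurements from Scrapbook.

```typescript
// Prices quoted above for GPT-3.5-Turbo 16k, in USD per 1K tokens.
const INPUT_PRICE_PER_1K = 0.003;
const OUTPUT_PRICE_PER_1K = 0.004;

interface CostEstimate {
  perPost: number; // estimated USD per analysed post
  total: number;   // estimated USD for the whole dataset
}

// Hypothetical helper: average token counts are inputs you have to measure or guess.
function estimateCost(
  postCount: number,
  avgInputTokens: number,
  avgOutputTokens: number,
): CostEstimate {
  const perPost =
    (avgInputTokens / 1000) * INPUT_PRICE_PER_1K +
    (avgOutputTokens / 1000) * OUTPUT_PRICE_PER_1K;
  return { perPost, total: perPost * postCount };
}

// Illustrative numbers only: 1,000 posts, 1,500 input and 100 output tokens each.
const estimate = estimateCost(1000, 1500, 100);
console.log(estimate.perPost.toFixed(4), estimate.total.toFixed(2));
```

Note that the input token count has to include the system prompt as well as the post text, since both are sent with every request.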

Using the simple TypeScript-based prompt on more examples showed that this model does not generalize well. I started adding comments above each type to help the model decide which key to use in the classification. I then generalized the idea of using TypeScript typings for the response into generator code: I feed it a desired schema defined by the application, and it converts the schema into prompt text. A commented typing may look like this:

// select one or more types that match the content of the post
postType: ("essay" | "review")[];
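A generator of this kind could look roughly like the sketch below. It is not the actual Scrapbook code: the FieldSpec shape and the function names are assumptions made for illustration. It emits the options without quotes, which is one way of dropping special characters to shorten the prompt.

```typescript
// Hypothetical schema description; not the real Scrapbook data structure.
type FieldSpec =
  | { name: string; comment?: string; kind: "enumArray"; options: string[] }
  | { name: string; comment?: string; kind: "string" | "stringArray" };

// Render the TypeScript-like type of a single field.
function fieldToType(field: FieldSpec): string {
  if (field.kind === "enumArray") {
    // Parentheses make the array apply to the whole union, not just the last option.
    return `(${field.options.join(" | ")})[]`;
  }
  return field.kind === "stringArray" ? "string[]" : "string";
}

// Convert the schema into the typing block that is pasted into the prompt.
function schemaToPrompt(fields: FieldSpec[]): string {
  const lines: string[] = ["{"];
  for (const field of fields) {
    if (field.comment) lines.push(`// ${field.comment}`);
    lines.push(`${field.name}: ${fieldToType(field)};`);
  }
  lines.push("}");
  return lines.join("\n");
}

const promptType = schemaToPrompt([
  {
    name: "postType",
    comment: "select one or more types that match the content of the post",
    kind: "enumArray",
    options: ["essay", "review"],
  },
  { name: "description", kind: "string" },
]);
console.log(promptType);
```

Keeping the schema as data means the same definition can drive both the prompt text and the later validation of the model's answer.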

Wanting the model to respond in a structured manner meant wanting less creativity from it, since we need to be sure the result is valid with respect to the desired schema. Overall, we want the model to be as deterministic as possible, which means tuning the temperature parameter in the API request; it controls how random, or basically how creative, the output is. Higher temperatures are more suitable for writing tasks where we want the model to respond with diverse texts and ideas, which is not our case. However, we don't want the responses to be completely conservative either, so I chose a temperature of 0.2 as a reasonable value for our case.
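In an OpenAI Chat Completions request, this setting is the temperature field of the request body. The sketch below shows one way such a request could be assembled. The model identifier "gpt-3.5-turbo-16k", the split into a system and a user message, and the helper names are my assumptions for illustration, not necessarily what Scrapbook does.

```typescript
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface ChatRequestBody {
  model: string;
  temperature: number;
  messages: ChatMessage[];
}

// Build the JSON body; a low temperature keeps the output close to deterministic.
function buildRequestBody(systemPrompt: string, postText: string): ChatRequestBody {
  return {
    model: "gpt-3.5-turbo-16k", // assumed model identifier
    temperature: 0.2,
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: postText },
    ],
  };
}

// Hypothetical wrapper that sends the request and returns the raw model answer.
async function classifyPost(
  apiKey: string,
  systemPrompt: string,
  postText: string,
): Promise<string> {
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(buildRequestBody(systemPrompt, postText)),
  });
  if (!response.ok) {
    throw new Error(`OpenAI API request failed with status ${response.status}`);
  }
  const data = (await response.json()) as {
    choices: { message: { content: string } }[];
  };
  return data.choices[0].message.content;
}
```

The returned string is still just text; it has to be parsed and checked before the application can rely on it.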

The prompt, many iterations later

Working without example posts whose classification I knew in advance proved unproductive, so I then refined the schema and prompt together with Illyria, whose posts were present in the dataset. Looking back, I think it took about 70 small iterations to get to the ‘final’ prompt that is used in the app.

Running the prompt over 3200 posts cost $22.11, which is about $0.007 per post. Keep in mind that we sent the full text of each post to the model. I have not yet developed any automated evaluation of the analysis, so we rely on our subjective review of a few example posts. There are many ways to make the analysis cheaper, e.g. sending smartly truncated text to the model, removing unnecessary characters, or using the conversational API to save system prompt characters. I will post more updates on this later.

Below is the last version of the system prompt, after which we send the text of the student’s post.

You are an assistant that classifies university student blog posts based on its content. You can choose multiple classes, if suitable. If you dont have enough information to provide any classification, return empty object. Provide your answer in a json object with the following typescript type:
{
contentTypes: (thesis | end-of-study-reflection | content-review | research-result | opinionated-essay | course-reflection | tutorial | interview | creative | infographics | podcast | about-me | study-abroad | internship)[];
// select one or more categories that best match the topic of the text
categories: (design | analytics | edtech | librarianship)[];
// write a one sentence descriptive summary of the text in the Czech language
description: string;
// a short i18n code representing the primary language of the document
dominantLanguage: string;
// select one or more tones from the following array that match the post author’s tone in the text.
tones: (formal | informal)[];
// one or more tags to describe the content of the text, the tags must be in czech except for names and words that are generally used without translation eg. transition design
tags: string[];
}

To decide categories, use the following rules:
design: posts about user needs, service design, design methods, UX, UI;
analytics: posts about data analytics, information management, databases, long term preservation, computer science, visualisations;
edtech: posts about technology in education, AI, online courses, life long learning, andragogics, pedagogy;
librarianship: posts about library management, literature, reading, books;

To decide contentTypes use the following rules:
about-me: the main topic of this post are the author's long term goals, interests, work experience, motivation for choosing the study programme and future plans;
thesis: posts about a student's bachelor's or master's thesis and progress on it;
end-of-study-reflection: wraping up the whole study before the state exam (státnice/státní závěrečná zkouška/obhajoba/SZZ);
research-result: a research paper or results of applied research;
opinionated-essay: an essay or a humanities academic text stating an opinion on a topic while citing sources;
content-review: a review of a book, movie, podcast, event, academic article or other content;
course-reflection: a post reflecting one or more courses, what the student learned, what they liked, what they didn't like, what they would change;
tutorial: a post containing instructions or a guide, in steps;
interview: a post containing an interview with someone;
infographics: a short text refering to an image with data visualisation;
creative: a fictional text with artistic intent;
podcast: a short text linking to a podcast episode or sound recording;
study-abroad: a post about a study abroad experience (erasmus, freemover, etc.);
internship: a post about an internship experience (working in a company, library, design firm, agency, or NGOs);
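
Since the model's answer is only text, it is worth checking it against the schema before storing it. The following is a minimal sketch of such a check, assuming the answer arrives as a raw JSON string; the function names are hypothetical, and the allowed values are copied from the prompt above. Unknown classes are silently dropped here, which is one possible policy among several.

```typescript
const CONTENT_TYPES = [
  "thesis", "end-of-study-reflection", "content-review", "research-result",
  "opinionated-essay", "course-reflection", "tutorial", "interview", "creative",
  "infographics", "podcast", "about-me", "study-abroad", "internship",
] as const;
const CATEGORIES = ["design", "analytics", "edtech", "librarianship"] as const;
const TONES = ["formal", "informal"] as const;

interface PostAnalysis {
  contentTypes: string[];
  categories: string[];
  description: string;
  dominantLanguage: string;
  tones: string[];
  tags: string[];
}

// Keep only string items; when an allowed list is given, drop anything outside it.
function pickStrings(value: unknown, allowed?: readonly string[]): string[] {
  if (!Array.isArray(value)) return [];
  return value.filter(
    (item): item is string =>
      typeof item === "string" && (allowed === undefined || allowed.includes(item)),
  );
}

// Returns null when the answer is not a JSON object at all.
function parseAnalysis(raw: string): PostAnalysis | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // the model did not return valid JSON
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) return null;
  const obj = data as Record<string, unknown>;
  return {
    contentTypes: pickStrings(obj.contentTypes, CONTENT_TYPES),
    categories: pickStrings(obj.categories, CATEGORIES),
    description: typeof obj.description === "string" ? obj.description : "",
    dominantLanguage: typeof obj.dominantLanguage === "string" ? obj.dominantLanguage : "",
    tones: pickStrings(obj.tones, TONES),
    tags: pickStrings(obj.tags),
  };
}
```

An empty object, which the prompt allows when there is not enough information, comes back as an analysis with empty arrays and empty strings.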