Here we are again, another update on the progress with FalaGym.
One meta-reflection: it has been really good to have this project as an outlet where I can code in a different context than my day job. It has also proven to be an excellent test-bed for trying out new patterns and ideas.
Since the last update I have made a couple of changes to the app. Namely:
From the start I only built the web version of the design, meaning it had no optimization at all for mobile users. It surprised no one that my first 2 users mostly use it on the phone, so it became important to make that experience better.
The changes here were mainly CSS related, but most importantly I have shifted the paradigm so that the mobile design is now my starting point. This poses a few interesting, but mostly positive, constraints.
The work on the design also surfaced some new challenges; namely, how to display the definitions of all the words while avoiding a cluttered UI.
For that I decided to take my time and have some fun coding an accordion component, with some nice transitions. It turned out quite satisfying, actually.
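For illustration, here is a minimal sketch of the kind of component I mean, assuming a React + TypeScript stack (the names and the max-height trick are just one way to do the transition, not the actual code):

```tsx
import { useState } from "react";

type Item = { title: string; content: string };

// Minimal accordion: one item open at a time,
// with the open/close animated via a CSS max-height transition.
export function Accordion({ items }: { items: Item[] }) {
  const [openIndex, setOpenIndex] = useState<number | null>(null);

  return (
    <div>
      {items.map((item, i) => (
        <div key={i}>
          <button onClick={() => setOpenIndex(openIndex === i ? null : i)}>
            {item.title}
          </button>
          <div
            style={{
              overflow: "hidden",
              maxHeight: openIndex === i ? "200px" : "0",
              transition: "max-height 0.3s ease",
            }}
          >
            <p>{item.content}</p>
          </div>
        </div>
      ))}
    </div>
  );
}
```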
I also took the time to refactor a few utilities I had created for parsing dates, formatting things, etc., so that I can re-use them everywhere. #chore.
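Nothing fancy; as a rough idea of what lives there now (the names are invented for the example):

```ts
// Hypothetical shared utilities, centralised so every page uses the same formats.
export function formatDate(date: Date, locale = "en-GB"): string {
  return new Intl.DateTimeFormat(locale, { dateStyle: "medium" }).format(date);
}

export function parseISODate(value: string): Date | null {
  const parsed = new Date(value);
  return Number.isNaN(parsed.getTime()) ? null : parsed;
}
```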
Now to the meat of the updates.
After a day or two of use my mom wrote to me that she'd really like to listen to the sentences. She is quite new on her learning journey, and listening to the pronunciation would be very useful for her.
I thought that could be fun, and a good opportunity for me to dive a little into text to speech, which I hadn't tried up to now.
A little context: for all the interactions with AI in the app, I am using the AI SDK by Vercel.
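For anyone unfamiliar with it, the basic usage looks like this (the model and prompt here are purely illustrative, not what FalaGym actually sends):

```ts
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

// Standard AI SDK usage: pick a model, send a prompt, get text back.
const { text } = await generateText({
  model: openai("gpt-4o-mini"),
  prompt: "Write a short sentence in Portuguese for a beginner.",
});

console.log(text);
```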
However, it did get a little trickier for this feature, given that the text to speech support was not stable yet. I wanted to use the gpt-4o-mini-tts model without having to add the OpenAI library to my stack, so I had to do some more research.
It turned out that support for it is in beta on the newer versions, which led me to update the package along with all the others. Additionally, one has to import it under an experimental alias, as shown below.
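A minimal call then looks like this (I'm assuming the @ai-sdk/openai provider; the sentence is a placeholder):

```ts
import { experimental_generateSpeech as generateSpeech } from "ai";
import { openai } from "@ai-sdk/openai";

// Generate audio for a single sentence with gpt-4o-mini-tts.
const { audio } = await generateSpeech({
  model: openai.speech("gpt-4o-mini-tts"),
  text: "Obrigado, até amanhã!", // placeholder sentence
});

// audio.uint8Array holds the raw bytes, ready to store or stream.
```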
The good news is that it has worked very well so far. The audio quality has been excellent, as has the pronunciation itself in the different languages I currently support.
The basic architecture and design is simple: the client requests the audio for a sentence, the server generates it with the model above, and the result is played back in the app.
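Sketched out, and with every name below being my own placeholder rather than the app's actual code, the server side could be a route handler along these lines:

```ts
// Hypothetical Next.js route handler, e.g. app/api/speech/route.ts
import { experimental_generateSpeech as generateSpeech } from "ai";
import { openai } from "@ai-sdk/openai";

export async function POST(req: Request) {
  const { sentence } = await req.json();

  const { audio } = await generateSpeech({
    model: openai.speech("gpt-4o-mini-tts"),
    text: sentence,
  });

  // Return the raw bytes so the client can hand them to an <audio> element.
  return new Response(audio.uint8Array, {
    headers: { "Content-Type": "audio/mpeg" }, // assuming mp3 output
  });
}
```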
Of course this led to updates to the queries, adding a media player component, and so on, but that was rather trivial.
For the next sessions I intend to implement ways to keep track of each user's progress and to use that progress to determine the complexity of the generated sessions. The idea is to progressively increase the difficulty such that it matches the user's ability.
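I haven't designed this yet, so purely as a thought experiment (every name below is hypothetical), it could start as simple as mapping a recent success rate to a difficulty level:

```ts
// Hypothetical: map a user's recent success rate to a difficulty level
// that is then used when prompting the model for new sessions.
type Difficulty = "beginner" | "intermediate" | "advanced";

export function difficultyFor(correct: number, attempted: number): Difficulty {
  if (attempted === 0) return "beginner";
  const rate = correct / attempted;
  if (rate < 0.6) return "beginner";
  if (rate < 0.85) return "intermediate";
  return "advanced";
}
```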
I have some ideas on how to bring this to bear, and I'm excited to see if it will work out.