How I Created Voicera (A Voice-as-a-Service App)

I’m an avid reader. From books to blogs and articles, I like to read. A lot. I also like podcasts, especially on the commute or while doing something that doesn’t require my intellectual prowess. So, when I can’t read, I listen, and vice versa. I believe it’s a good way to spend time while learning new things from intellectuals around the world, on different topics. However, I’m sure we have all come across a great article that we didn’t have time to read. So, we bookmark it for later. Before we know it, the bookmarks pile up like a stack of papers in an attorney’s office. It’s frustrating, isn’t it? What if you could listen to the article right on the website where you found it? How great would that be?

Being a curious learner, I thought I should apply my full-stack skills to this problem. That’s why I created Voicera, a voice-as-a-service platform that allows bloggers and media houses to add voice narration to their articles, alongside the written content, with a simple click. Now you can reach your busy readers via interactive blogs that can speak, and increase your retention. It’s extremely simple to use, with just one click.

This post is about the journey, primarily the technical journey, I took to build this platform from scratch: the decisions I made, the problems I faced, and so on.

The Voicera platform consists of 5 modules, or should I say “microservices” (if I could sound like an expert). They have all been built in isolation from each other and yet are tightly connected through APIs. One thing that is common among all these modules is that they are built with TypeScript. Why, you ask? I think it’s a no-brainer to use TS for any kind of project, especially not-so-small ones. Static typing, coupled with tight VS Code integration for type suggestions and compile-time type checking, makes it less error-prone than plain old JS. Not to mention, I love writing my code in TS.

1. Backend

Voicera’s backend is built on NodeJS. The APIs are built on GraphQL, using Apollo Server to communicate with other services. I chose GraphQL over regular RESTful APIs because I think it’s a cleaner approach to building APIs. A GraphQL API is typically served from a single endpoint, so an ever-changing application doesn’t keep accumulating endpoints the way a REST API does. It’s more flexible and easier to change than REST. Also, it adds its own type system on top of TS, which makes it harder to write buggy code. Moreover, I love working with GraphQL. Sure, sometimes it can be tricky to figure some niche stuff out, but I think it’s worth it to learn and work with GraphQL in the long run. But, hey, that’s my choice. You’re free to disagree with this.
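To give a feel for how the GraphQL type system and TypeScript sit side by side, here is a minimal sketch. It is not Voicera’s actual schema: the `Article` type, its fields, and the in-memory `articles` array are all made up for illustration.

```typescript
// Hypothetical schema, for illustration only; Voicera's real schema is not shown in this post.
const typeDefs = `
  type Article {
    id: ID!
    title: String!
    audioUrl: String
  }

  type Query {
    article(id: ID!): Article
  }
`;

// TypeScript mirror of the GraphQL Article type above.
interface Article {
  id: string;
  title: string;
  audioUrl: string | null;
}

// In-memory stand-in for the database (assumption: real data would come from Postgres).
const articles: Article[] = [
  { id: "1", title: "Hello, Voicera", audioUrl: null },
];

// Resolver map: one function per field of the Query type.
const resolvers = {
  Query: {
    // Returns the matching article, or null when the id is unknown.
    article: (_parent: unknown, args: { id: string }): Article | null =>
      articles.find((a) => a.id === args.id) ?? null,
  },
};
```

With Apollo Server, a pair like this would be passed to the server as `new ApolloServer({ typeDefs, resolvers })`. The schema says what clients may ask for, and the TypeScript types keep the resolver honest about what it returns.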

Voicera uses a customized AI voice solution built on Google’s and Amazon’s TTS engines to convert text to speech, and AWS S3 to store the converted media files, which are served through AWS CloudFront for higher availability. The backend is deployed on AWS Lambda, connected to API Gateway to expose the GraphQL APIs. I chose serverless because I don’t want to manage my own servers. DevOps is a tiring job to do on top of building a SaaS app from scratch. I just didn’t want to do all that. So I used the Serverless framework’s templating tool, which converts serverless.yml into an AWS CloudFormation template and creates your Lambda functions and APIs with a single click/command. Kind of like Terraform, but for serverless applications only. I love one-click DevOps. I mean, who doesn’t? There’s only one problem: the development feedback cycle on serverless functions is very slow, so there is a lot of trial and error involved in seeing what works. Finding and fixing bugs is a bit complicated in a serverless environment because you can’t completely emulate the AWS cloud environment locally. However, one thing I suggest for a better bug-fixing experience is to combine console.log statements with AWS CloudWatch (the AWS service for viewing your functions’ logs and events), which will help you narrow down the problems in your code. Overall, working in serverless is tricky but, once you get the hang of it, it’s a breeze.
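The console.log plus CloudWatch advice is easier to follow when every log line carries the same fields. Below is a minimal sketch of that idea, not the code Voicera runs: the `handler`, its `event.text` input, and the trimmed-down `LambdaContext` interface are hypothetical. The only thing borrowed from Lambda itself is the `awsRequestId` field on the context object.

```typescript
// Trimmed-down stand-in for the Lambda context; only the field used here.
interface LambdaContext {
  awsRequestId: string;
}

type LogLevel = "info" | "error";

// Emit one JSON object per line so log entries can be searched by field in CloudWatch.
function log(
  level: LogLevel,
  requestId: string,
  message: string,
  data: Record<string, unknown> = {}
): void {
  console.log(JSON.stringify({ level, requestId, message, ...data }));
}

// Hypothetical handler: logs the start, the success and the failure with the same request ID.
async function handler(
  event: { text?: string },
  context: LambdaContext
): Promise<{ statusCode: number }> {
  log("info", context.awsRequestId, "conversion requested", {
    length: event.text?.length ?? 0,
  });
  try {
    if (!event.text) {
      throw new Error("text is required");
    }
    // ... the text-to-speech call would go here ...
    log("info", context.awsRequestId, "conversion finished");
    return { statusCode: 200 };
  } catch (err) {
    log("error", context.awsRequestId, "conversion failed", {
      reason: err instanceof Error ? err.message : String(err),
    });
    return { statusCode: 400 };
  }
}
```

Because each line is a single JSON object tagged with the request ID, you can pull up everything that happened during one failing invocation instead of scrolling through interleaved output.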

Voicera uses PostgreSQL on AWS RDS for database operations. I chose RDS because, again, I don’t want more DevOps overhead than necessary. AWS manages all the platform updates and backups so that I can focus more on building great things. The DB is connected through the Prisma ORM, which handles the CRUD operations. I chose Prisma because of its tight integration with Postgres and GraphQL and its strong type safety (you can use Prisma-generated types in your TypeScript code). However, deploying Prisma on a Lambda function is tricky, as Lambda deployment packages are limited to 50 MB compressed and 250 MB uncompressed. Prisma uses multiple engines to manage database operations during development, which together measured more than 250 MB on their own. But, before deploying, you can remove most of the engines and keep only the ones necessary for Lambda operation.
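The pruning rule itself is simple enough to sketch. What follows is an illustration under assumptions, not my actual packaging setup: the file names, the naming patterns, the `rhel-openssl-1.0.x` target string, and every size in `exampleFiles` are placeholders, and the real names depend on your Prisma version and Lambda runtime. In practice a rule like this would live in your packaging configuration rather than in application code.

```typescript
interface PackagedFile {
  path: string;
  sizeMb: number;
}

// Lambda's limit for the uncompressed deployment package, as mentioned above.
const LAMBDA_UNZIPPED_LIMIT_MB = 250;

// Assumed naming patterns for Prisma engine files; check your own node_modules.
const ANY_ENGINE = /query[-_]engine|migration-engine|introspection-engine|prisma-fmt/;
const QUERY_ENGINE = /query[-_]engine/;

function totalMb(files: PackagedFile[]): number {
  return files.reduce((sum, file) => sum + file.sizeMb, 0);
}

// Keep every non-engine file, plus the query engine built for the given target.
function pruneForLambda(files: PackagedFile[], target: string) {
  const kept = files.filter((file) => {
    if (!ANY_ENGINE.test(file.path)) return true;
    return QUERY_ENGINE.test(file.path) && file.path.indexOf(target) !== -1;
  });
  const size = totalMb(kept);
  return { kept, totalMb: size, fitsLambda: size <= LAMBDA_UNZIPPED_LIMIT_MB };
}

// Placeholder file list; the sizes are invented round numbers, not measurements.
const exampleFiles: PackagedFile[] = [
  { path: "node_modules/.prisma/client/index.js", sizeMb: 1 },
  { path: "node_modules/.prisma/client/libquery_engine-rhel-openssl-1.0.x.so.node", sizeMb: 40 },
  { path: "node_modules/.prisma/client/libquery_engine-darwin.dylib.node", sizeMb: 40 },
  { path: "node_modules/@prisma/engines/migration-engine-darwin", sizeMb: 60 },
  { path: "node_modules/@prisma/engines/introspection-engine-darwin", sizeMb: 60 },
  { path: "node_modules/@prisma/engines/prisma-fmt-darwin", sizeMb: 60 },
];

const pruned = pruneForLambda(exampleFiles, "rhel-openssl-1.0.x");
```

The point of the sketch is the shape of the decision: only the query engine is needed at runtime, and only the build that matches the Lambda environment, so everything used purely during development can be left out of the package.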

Serverless monitoring is done on Dashbird.

2. Dashboard

Voicera’s client dashboard is built on NextJS, with Apollo Client to call the backend’s GraphQL APIs. NextJS was an obvious choice for me because I have worked with NextJS more than I have with Create React App (CRA). The NextJS ecosystem has many tools that simplify frontend development. One of them is NextAuth.js, which lets you add multiple authentication providers to your app with minimal boilerplate. NextJS also optimizes static files for production by default (when you use its core components). The developer experience while building a React app is also very good. I get much faster feedback on code changes with Next than with CRA. Overall, it’s a highly recommended tool to have in your arsenal if you’re building a rather complex application.

Frontend styling is handled by TailwindCSS because of the control it gives you over the styling of the components. You can write all of your styling without leaving your JSX. Now, that’s what you call an amazing developer experience.

One thing I can say for sure: there’s a mountain of difference between building a project for yourself and building one for the world. You have to constantly think about UX and customer experience. How do you make it better? How do you optimize every little thing, including performance and design? I strongly recommend that every new developer build something for the world to use at least once. Building production-grade apps is very rewarding in terms of the skills and experience you gain in the long run.

The client dashboard is deployed on AWS Amplify, a fully managed serverless service that deploys your frontend apps directly from your GitHub repository. It has built-in CI/CD, which lets you deploy production changes without running commands on a server. Additionally, it provides something called “Performance Mode”, which optimizes for faster hosting performance by keeping content cached at the content delivery network (CDN) edge for a longer interval.

3. Embed

This is what it looks like. You can’t restyle the default HTML audio player to fit your needs, so initially I tried to use pre-built audio libraries from the internet (because as developers, we know you don’t build something that’s already been built. Exceptions exist, of course, like right now). All of them had missing features, bad design, or were quite heavy. Since this thing is going to be on a blogger’s website, it’s imperative to make it lightweight so that it doesn’t slow down their website.

I built the current iteration from scratch in React, on top of HTML’s Audio constructor. Building from scratch helped me learn how to build media players in React, and the entire embed page size went from 100 KB+ with existing libraries to just 2.25 KB now. Styled using Tailwind.
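The core of a player like this is small. Here is a minimal, framework-free sketch of the idea, not the actual embed code: `AudioLike`, `PlayerController`, and `formatTime` are names I made up for this example. In a browser you would pass `new Audio(url)` as the audio object, and a React component would call these methods from its button handlers.

```typescript
// The subset of the browser's audio element this sketch relies on.
// In a browser, `new Audio(url)` satisfies this interface.
interface AudioLike {
  currentTime: number;
  readonly duration: number; // NaN until the media metadata has loaded
  readonly paused: boolean;
  play(): Promise<void>;
  pause(): void;
}

class PlayerController {
  private readonly audio: AudioLike;

  constructor(audio: AudioLike) {
    this.audio = audio;
  }

  // Start playback when paused, pause when playing.
  async toggle(): Promise<void> {
    if (this.audio.paused) {
      await this.audio.play();
    } else {
      this.audio.pause();
    }
  }

  // Jump forward or backward, clamped to [0, duration].
  // While the duration is still unknown, the position is reset to 0.
  seekBy(deltaSeconds: number): void {
    const upper = isFinite(this.audio.duration) ? this.audio.duration : 0;
    const next = this.audio.currentTime + deltaSeconds;
    this.audio.currentTime = Math.min(Math.max(next, 0), upper);
  }

  // Fraction played, from 0 to 1, for drawing a progress bar.
  progress(): number {
    const d = this.audio.duration;
    return isFinite(d) && d > 0 ? this.audio.currentTime / d : 0;
  }
}

// Format a position in seconds as m:ss for the time label.
function formatTime(totalSeconds: number): string {
  const whole = Math.max(0, Math.floor(totalSeconds));
  const minutes = Math.floor(whole / 60);
  const seconds = whole % 60;
  return `${minutes}:${seconds < 10 ? "0" : ""}${seconds}`;
}
```

Keeping the controller separate from the UI is a design choice for the sketch: the logic can be exercised with a fake audio object, and nothing in it pulls in a dependency that would add to the embed size.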

4. Home Page

Now to the less complex stuff. The home page is built on React as well and styled using Tailwind. Nothing exciting to tell here. Good luck running it on a decade-old version of your browser.

5. Blog

Voicera’s blog is built on NextJS in SSG (static site generation) mode. I used Vercel’s blog template to build it. Styled using Tailwind. Deployed on Amplify.

Others: DevOps

AWS has definitely created the most powerful cloud platform in the world. But with great power comes great complexity. To do a simple thing like, for example, giving your VPC (the private cloud inside AWS where your server sits) internet access so that your server can fetch data from the internet, you have to go through 10+ steps across different services. I mean, just look at this diagram.

Their services sure are complicated, but all hail their customer service. They’re the lifesavers of AWS’ ecosystem. They have saved me multiple hours of internet research on their different services and how to connect them for interoperability.

Final notes

Overall, it took me 12 days to build the platform. It was a great learning experience for me. I was using a few of the tools and services for the first time. But, being the curious learner that I am, I loved the experience of building and learning new things.

My advice for developers just starting on the development path is to **learn by doing**. If you can’t figure out what to build, read this article I wrote on the exact same topic. If you find something exciting enough, execute it. Learn things the hard way. I believe it will be worth it in the long run. And, just a suggestion: try not to learn by building clones. There are already way too many out there. Build something original. Something you can be proud of in the future. Not to mention, since you are spending the time anyway, you might as well build something new rather than yet another clone.