Day 15: Building an MVP — Backend Code

On Day 12, I worked with CofounderGPT to create a simple frontend structure and plan the backend architecture. This process worked well, so we planned to develop the backend in the next few days and then develop the frontend application and connect it. Today, we’ll implement the critical flow in our backend: uploading a CSV file and creating embeddings.
Preparing the Backend Application
I already have an idea of what we need to create and how, and which tools we want to use, but let’s see if CofounderGPT has a better idea.
Slobodan: In the context of the blog post above, what should I use to create that infrastructure on AWS using Node.js, TypeScript, AWS Lambda, API Gateway, S3, and DynamoDB?
The key idea is to create an MVP as fast as possible but to keep the code testable and evolve the architecture over time.
CofounderGPT: To set up your infrastructure on AWS using Node.js, TypeScript, AWS Lambda, API Gateway, S3, and DynamoDB, you should use the AWS Serverless Application Model (AWS SAM), a framework for building serverless applications.
AWS SAM is a model to define serverless applications and is an extension of AWS CloudFormation, which is a service that helps you model and set up your Amazon Web Services resources so you can spend less time managing those resources and more time focusing on your applications that run in AWS.
Using AWS SAM, you define your serverless application’s resources using YAML or JSON, and then you can use AWS SAM CLI to create, test, package, and deploy your application. SAM CLI is a command-line interface that accelerates the development of serverless applications. It provides you with a local development environment that simulates the AWS runtime environment.
Here’s a simple example of how your AWS SAM template might look:
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: An AWS Serverless Application.
Resources:
  MyLambdaFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: src/handlers/myFunction.handler
      Runtime: nodejs14.x
      CodeUri: .
      Events:
        MyApi:
          Type: Api
          Properties:
            Path: /path
            Method: post
  MyDynamoDBTable:
    Type: AWS::Serverless::SimpleTable
    Properties:
      PrimaryKey:
        Name: id
        Type: String
  MyBucket:
    Type: AWS::S3::Bucket
This template creates a Lambda function, an API Gateway for the function, a DynamoDB table, and an S3 bucket.
For writing the code for your Lambda functions, you will use Node.js and TypeScript. To use TypeScript with AWS Lambda, you’ll need to transpile your TypeScript code to JavaScript as AWS Lambda natively supports JavaScript.
As for authentication with Amazon Cognito, AWS SDK provides a way to interact with the service in Node.js. You can use the Amazon Cognito Identity SDK for JavaScript on the client-side to authenticate users and on the server-side (AWS Lambda) to validate the tokens.
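For example, a minimal sketch of the server-side validation inside a Lambda function might look like this (this sketch assumes the aws-jwt-verify library rather than the Identity SDK, and uses placeholder user pool and app client IDs):

import { CognitoJwtVerifier } from 'aws-jwt-verify'

// Placeholder values; replace them with the real Cognito user pool and app client IDs
const verifier = CognitoJwtVerifier.create({
  userPoolId: 'eu-west-1_EXAMPLE',
  tokenUse: 'access',
  clientId: 'example-app-client-id',
})

export async function validateToken(token: string) {
  // Throws if the token is expired, malformed, or not issued by this user pool
  return verifier.verify(token)
}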
To ensure your code is testable, you should write unit tests for your business logic. AWS SAM CLI provides a local Lambda runtime and supports running your Lambda functions locally, which is perfect for testing. You can use testing frameworks like Jest or Mocha for writing tests in Node.js.
Finally, to evolve the architecture over time, follow best practices for serverless applications, like having single-purpose, stateless functions, using managed services wherever possible, and decoupling your components. This way, you can scale or modify each component independently as your application grows.
Remember to keep your functions small and focused, and make sure each one does one thing well. This approach will help you maintain your application over time and allow you to evolve your architecture as your needs change.
Everything CofounderGPT suggested makes sense. We are using AWS SAM for our other project, Vacation Tracker, and it works great. However, writing YAML is not fun, and we want to make working on the Knowlo app fun while still keeping it serious. There’s another tool we could use where we actually write TypeScript instead of YAML: AWS Cloud Development Kit (AWS CDK). Let’s see what CofounderGPT thinks about it!
Slobodan: What about AWS CDK? It seems it’s a very popular tool recently. Do you know anything about it, and do you think we can use it instead of AWS SAM?
CofounderGPT: Yes, the AWS Cloud Development Kit (AWS CDK) is another excellent tool for defining cloud infrastructure in code and could be a great alternative to AWS SAM. The major difference is that AWS CDK allows you to define your cloud resources using familiar programming languages like TypeScript, JavaScript, Python, C# or Java, whereas AWS SAM uses YAML or JSON files.
AWS CDK allows you to leverage the power of modern programming languages to define your infrastructure, meaning you can use loops, conditionals, functions, and classes. This can lead to more concise, readable, and maintainable infrastructure code.
Just like AWS SAM, AWS CDK also has support for high-level abstractions called “constructs” that encapsulate AWS best practices. These constructs can represent a single resource, like an S3 bucket, or a higher-level component consisting of multiple AWS resources.
Here’s an example of how you might define a Lambda function and an API Gateway in AWS CDK using TypeScript:
import * as cdk from '@aws-cdk/core';
import * as lambda from '@aws-cdk/aws-lambda';
import * as apigw from '@aws-cdk/aws-apigateway';

export class MyStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Define the Lambda function
    const myFunction = new lambda.Function(this, 'MyFunction', {
      runtime: lambda.Runtime.NODEJS_14_X,
      handler: 'myFunction.handler',
      code: lambda.Code.fromAsset('lambda'),
    });

    // Define the API Gateway
    const api = new apigw.RestApi(this, 'MyApi');
    const getResource = api.root.addResource('myresource');
    getResource.addMethod('GET', new apigw.LambdaIntegration(myFunction));
  }
}
You would also need to install the necessary CDK packages for this code to work, such as @aws-cdk/aws-lambda and @aws-cdk/aws-apigateway.
AWS CDK also includes a CLI tool for deploying your applications, and it integrates with AWS CloudFormation behind the scenes to manage deployments.
As for testing, AWS CDK includes a testing library that allows you to write unit tests for your infrastructure code.
To summarize, if your team is more comfortable with TypeScript (or another supported language) and prefers to define infrastructure in a general-purpose programming language, AWS CDK could be an excellent choice. It allows for greater flexibility and more complex infrastructure setups compared to AWS SAM, and you can still use all the services you mentioned: Lambda, API Gateway, S3, DynamoDB, and Cognito.
Slobodan: Are there any disadvantages of using AWS CDK over AWS SAM or similar tools?
CofounderGPT: While the AWS CDK provides many advantages, it also comes with certain disadvantages, especially when compared to AWS SAM or other Infrastructure as Code (IaC) tools. Here are some potential downsides:
- Learning Curve: The AWS CDK has a steeper learning curve compared to AWS SAM or AWS CloudFormation. Developers need to understand how to work with AWS CDK constructs and the CDK CLI. They also need to understand the underlying resources that the CDK is creating, which requires a good understanding of AWS services.
- Maturity and Support: As of my knowledge cutoff in September 2021, AWS SAM and CloudFormation have been around longer than AWS CDK and therefore have more established communities, more examples and tutorials, and more mature tooling around them. However, AWS CDK is gaining traction and community support is growing.
- Potential for Complexity: Because AWS CDK allows you to use general-purpose programming languages, there’s a risk of creating overly complex infrastructure code. With AWS SAM or CloudFormation, the declarative nature of YAML or JSON can sometimes lead to simpler and more straightforward infrastructure code.
- Fewer Languages Supported: While AWS CDK does support multiple popular programming languages, AWS SAM and CloudFormation’s use of JSON or YAML means developers can use any language that can output JSON or YAML. Therefore, SAM and CloudFormation can be more language-agnostic.
- Integration with CI/CD: AWS SAM has a tighter integration with AWS CodePipeline and AWS CodeBuild, which are AWS’s continuous integration and continuous delivery (CI/CD) services. Although AWS CDK can certainly be used with these services, setup might require more configuration.
- Portability: As with CloudFormation, AWS CDK code is specific to AWS and doesn’t translate to other cloud providers. If you think you might want to move to or support another cloud provider in the future, a more platform-agnostic tool like Terraform might be a better choice.
Remember, the right tool will depend on your specific use case, team skills, and project requirements. Both AWS CDK and AWS SAM are powerful tools for building serverless applications in AWS.
Again, everything CofounderGPT said makes sense. But while AWS CDK does have some disadvantages, I can't entirely agree with all of the points listed above.
For example, I don't think AWS CDK has a steeper learning curve. You'll eventually need to learn AWS CloudFormation (AWS' infrastructure-as-code service) anyway, because both AWS SAM and AWS CDK use CloudFormation under the hood. However, for people unfamiliar with AWS CloudFormation, CDK is easier to start and work with. Also, I know many large and serious projects using AWS CDK, so it's safe to say it is mature enough and well supported.
Some of these disadvantages probably come from the CofounderGPT training data set. It's evident that CofounderGPT doesn't know about many recent events. For example, it uses version 1 of the CDK, while the new major version (version 2) has been the default for more than six months and has some breaking changes. Also, CofounderGPT uses AWS Lambda with Node.js version 14, while the most recent supported version is now 18.
I think the main disadvantage of CDK is also its main advantage: it's an abstraction over AWS CloudFormation. The abstraction works nicely as long as the CloudFormation it generates deploys without issues, but it can be painful if you hit an edge-case bug in CloudFormation. Another slightly annoying thing is the process of updating the CDK version every week.
But these disadvantages are not significant compared to CDK benefits, so let’s build Knowlo using AWS CDK!
Creating a CDK Stack
Before we begin, let’s prepare the monorepo to look similar to the following image.

I’ll create a new folder for the backend application and update the package.json file in the root to have the following content:
{
  "name": "knowlo",
  "version": "1.0.0",
  "description": "Monorepo for Knowlo app",
  "private": true,
  "workspaces": [
    "frontend",
    "backend"
  ]
}
There are tools that can make a monorepo more powerful, but NPM’s monorepo functionality will be enough for Knowlo, at least for this stage of our project.
The next step is to initialize and bootstrap the CDK project. I already prepared the AWS account for Knowlo and set up credentials. After we finish the MVP version, we'll set up AWS SSO, or AWS IAM Identity Center (successor to AWS Single Sign-On), as they call the SSO service now, for Knowlo because it simplifies the flow and makes our app more secure.
In the “backend” folder, I'll run the following command to initialize the new CDK project: npx cdk init sample-app --language typescript. This command downloads the latest version of AWS CDK from NPM into a temporary folder and runs the init command with it. The init command gives us a sample CDK app with TypeScript support built in. The command will output a result similar to the following code snippet:
Applying project template app for typescript
Initializing a new git repository...
Executing npm install...
npm notice created a lockfile as package-lock.json. You should commit this file.
npm WARN tst@0.1.0 No repository field.
npm WARN tst@0.1.0 No license field.
# Welcome to your CDK TypeScript project!
You should explore the contents of this project. It demonstrates a CDK app with an instance of a stack (`CdkWorkshopStack`)
which contains an Amazon SQS queue that is subscribed to an Amazon SNS topic.
The `cdk.json` file tells the CDK Toolkit how to execute your app.
## Useful commands
* `npm run build` compile typescript to js
* `npm run watch` watch for changes and compile
* `npm run test` perform the jest unit tests
* `cdk deploy` deploy this stack to your default AWS account/region
* `cdk diff` compare deployed stack with current state
* `cdk synth` emits the synthesized CloudFormation template
It’ll also create a few files and folders for us. The most important one is the lib/backend-stack.ts, which represents the starting point for our CDK stack. I renamed all generated “backend” resources to “knowlo-backend.” The new name is still not perfect, but it’s slightly more precise.
The core of our CDK stack file (“backend/lib/knowlo-backend-stack.ts”) looks like this:
import { Construct } from 'constructs'
import { CfnOutput, CfnParameter, Duration, RemovalPolicy, Stack, StackProps } from 'aws-cdk-lib'
import * as s3 from 'aws-cdk-lib/aws-s3'
import * as apiGateway from 'aws-cdk-lib/aws-apigateway'
import * as lambda from 'aws-cdk-lib/aws-lambda-nodejs'
import { Runtime } from 'aws-cdk-lib/aws-lambda'
import * as ssm from 'aws-cdk-lib/aws-ssm'
import { RetentionDays } from 'aws-cdk-lib/aws-logs'
import { S3EventSource } from 'aws-cdk-lib/aws-lambda-event-sources'

export class KnowloBackendStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props)

    // We'll add AWS resources here!
  }
}
I already imported a few things we'll need at the top of the file because, at the moment of writing this article, CofounderGPT and I had already written this stack. Let's go over the AWS resources we added, one by one.
First, we want to be able to pass some parameters to our infrastructure. For example, we want to tell AWS CDK if our app is in production or development mode, and what we want to see in the logs. These parameters are not secrets, so we'll use CloudFormation Parameters to pass them. The following code snippet defines these two CloudFormation parameters:
const environmentParameter = new CfnParameter(this, 'Environment', {
  default: 'develop',
  allowedValues: ['develop', 'production'],
  description: 'The application environment'
})

const logLevelParameter = new CfnParameter(this, 'LogLevel', {
  default: 'INFO',
  allowedValues: ['DEBUG', 'INFO', 'WARN', 'ERROR', 'SILENT'],
  description: 'A log level for Lambda functions'
})
Besides non-secret parameters, we have some secret parameters, such as the OpenAI API key. While there are many ways to handle secrets in AWS, we'll use the simplest (but not the most secure) one for our MVP: AWS SSM Parameter Store:
const openAiApiKey = new ssm.StringParameter(this, 'OpenAiApiKey', {
  description: 'The OpenAI API Key',
  stringValue: 'change-me'
})
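The code above only creates the parameter with a placeholder value, and later in the stack we pass openAiApiKey.stringValue to the Lambda function as an environment variable. If we later decide to read the key at runtime instead, a minimal sketch using @aws-sdk/client-ssm could look like this (the helper name is illustrative and not part of today's code):

import { SSMClient, GetParameterCommand } from '@aws-sdk/client-ssm'

// Illustrative helper: read the OpenAI API key from SSM Parameter Store at runtime
export async function getOpenAiApiKey(parameterName: string): Promise<string> {
  const ssmClient = new SSMClient({})
  const result = await ssmClient.send(new GetParameterCommand({
    Name: parameterName,
    WithDecryption: true, // only matters if we switch to a SecureString parameter
  }))
  return result.Parameter?.Value || ''
}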
Then, we’ll create an S3 bucket for CSV files and embeddings:
const csvBucket = new s3.Bucket(this, 'CsvBucket', {
  encryption: s3.BucketEncryption.KMS_MANAGED,
  removalPolicy: environmentParameter.valueAsString === 'production' ? RemovalPolicy.RETAIN : RemovalPolicy.DESTROY,
  autoDeleteObjects: environmentParameter.valueAsString === 'develop' ? true : false,
  accessControl: s3.BucketAccessControl.PRIVATE,
})
We’ll keep it simple at the moment. The key parts here are that our bucket is private (not publicly accessible) and that we’ll remove it in non-production environments if the CDK stack is deleted but keep it in production.
After the S3 bucket, we’ll create an API. We’ll keep it as simple as possible and update it later if needed.
const api = new apiGateway.RestApi(this, 'AppApi', {})

api.root.addCorsPreflight({
  allowOrigins: ['*'],
  allowMethods: ['OPTIONS', 'GET', 'POST'],
  allowHeaders: ['x-apigateway-header', 'Authorization', 'Content-Type', 'x-amz-date', 'x-amz-security-token'],
  maxAge: Duration.seconds(600),
})
Besides the API, we’ll need two Lambda functions: one to create a signed S3 URL that will be used to upload the CSV file and one to process the CSV file and create embeddings.
Here’s the first one; we’ll call it “GetUploadUrlFunction.”
const getUploadUrlFunction = new lambda.NodejsFunction(this, 'GetUploadUrlFunction', {
  entry: './lib/functions/get-upload-url/lambda.ts',
  handler: 'handler',
  runtime: Runtime.NODEJS_18_X,
  timeout: Duration.seconds(30),
  environment: {
    LOG_LEVEL: logLevelParameter.valueAsString,
    CSV_BUCKET: csvBucket.bucketName,
    NODE_OPTIONS: '--enable-source-maps',
  },
  logRetention: environmentParameter.valueAsString === 'production' ? RetentionDays.INFINITE : RetentionDays.ONE_WEEK,
  bundling: {
    sourceMap: true,
  }
})

csvBucket.grantPut(getUploadUrlFunction, 'uploads/*')
csvBucket.grantPut(getUploadUrlFunction, 'status/*')

const getUploadUrlLambdaIntegration = new apiGateway.LambdaIntegration(getUploadUrlFunction, {
  proxy: true,
})

const getUploadUrlApiResource = api.root.addResource('upload')
getUploadUrlApiResource.addMethod('POST', getUploadUrlLambdaIntegration, {})
And here’s the second one; let’s call it “ProcessCsvFileFunction.”
const processCsvFileFunction = new lambda.NodejsFunction(this, 'ProcessCsvFileFunction', {
  entry: './lib/functions/process-csv-file/lambda.ts',
  handler: 'handler',
  runtime: Runtime.NODEJS_18_X,
  timeout: Duration.minutes(15),
  memorySize: 1024,
  environment: {
    LOG_LEVEL: logLevelParameter.valueAsString,
    CSV_BUCKET: csvBucket.bucketName,
    OPEN_AI_API_KEY: openAiApiKey.stringValue,
    NODE_OPTIONS: '--enable-source-maps',
  },
  logRetention: environmentParameter.valueAsString === 'production' ? RetentionDays.INFINITE : RetentionDays.ONE_WEEK,
  bundling: {
    sourceMap: true,
  }
})

csvBucket.grantRead(processCsvFileFunction, 'uploads/*')
csvBucket.grantWrite(processCsvFileFunction, 'embeddings/*')
csvBucket.grantWrite(processCsvFileFunction, 'status/*')

processCsvFileFunction.addEventSource(new S3EventSource(csvBucket, {
  events: [ s3.EventType.OBJECT_CREATED ],
  filters: [{
    prefix: 'uploads/',
    suffix: '/file.csv',
  }]
}))
Finally, we’ll output the API URL using the CloudFormation Output:
new CfnOutput(this, 'ApiEndpoint', {
  value: api.url,
})
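Since CDK stacks are just TypeScript, we can also unit test the infrastructure itself. The generated test/backend.test.ts file is a good place for that; a minimal sketch using the aws-cdk-lib/assertions module might look like this (the assertions themselves are illustrative):

import { App } from 'aws-cdk-lib'
import { Template } from 'aws-cdk-lib/assertions'
import { KnowloBackendStack } from '../lib/knowlo-backend-stack'

test('stack contains the CSV bucket and Node.js 18 Lambda functions', () => {
  // Synthesize the stack into a CloudFormation template
  const app = new App()
  const stack = new KnowloBackendStack(app, 'TestStack')
  const template = Template.fromStack(stack)

  // One S3 bucket for CSV files and embeddings
  template.resourceCountIs('AWS::S3::Bucket', 1)

  // Our Lambda functions run on Node.js 18
  template.hasResourceProperties('AWS::Lambda::Function', {
    Runtime: 'nodejs18.x',
  })
})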
As I already mentioned, this code is not perfect, but it’s good enough for our MVP. Some parts of the code will be changed later this week because we need to add more API routes, authentication, database tables, and many other things. We’ll do that later, but before that, let’s write code for our Lambda functions.
Writing Lambda Functions
There's no single “right” way to write a Lambda function. If you ask 100 people to show you their functions, you'll most likely see 100 at least slightly different approaches. That's one of the things I like the most about serverless – do whatever works for you. If many other people do something similar, there's a big chance your approach will become a best practice, even if it was considered the opposite of a best practice at some point in the past.
I like using Hexagonal Architecture for my Lambda functions. Hexagonal Architecture, also known as the Ports and Adapters pattern, is a design principle that advocates for the separation of an application’s core logic from the services it uses, thereby making it independent of external factors and facilitating interchangeability and isolation for testing. It emphasizes clear boundaries and the ability for the application to be driven by either user interfaces, programmatic interfaces, or test automation.
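In code, a “port” is just a small interface that the business logic depends on, and an “adapter” is a concrete implementation of that interface. Here's a minimal sketch of the idea (the names are illustrative; our actual code uses the FileRepository class shown later):

// A "port": the only thing the business logic knows about file storage
interface IFileStore {
  createFile(fileName: string, fileContent: string): Promise<void>
}

// Business logic depends on the port, not on Amazon S3
async function markInProgress(store: IFileStore, requestId: string): Promise<void> {
  await store.createFile(`status/${requestId}.json`, JSON.stringify({ requestId, status: 'IN_PROGRESS' }))
}

// An "adapter" for tests: an in-memory implementation of the same port
class InMemoryFileStore implements IFileStore {
  files = new Map<string, string>()
  async createFile(fileName: string, fileContent: string): Promise<void> {
    this.files.set(fileName, fileContent)
  }
}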
My Lambda functions with Hexagonal Architecture look similar to the following image:

This architecture pattern is powerful, because it allows you to test your business logic in isolation of the AWS services it uses, similar to the following image.

Let’s see this in practice. As mentioned, we’ll create two Lambda functions. But before we start, here’s the backend project folder structure that we’ll end up with at the end of Day 15:
.
├── README.md
├── bin
│   └── knowlo-backend.ts
├── cdk.json
├── jest.config.js
├── lib
│   ├── common
│   │   ├── api-gateway-response-generator.ts
│   │   ├── errors
│   │   │   ├── file-upload.ts
│   │   │   ├── generate-error.ts
│   │   │   └── generic.ts
│   │   └── types.ts
│   ├── functions
│   │   ├── get-upload-url
│   │   │   ├── lambda.ts
│   │   │   ├── src
│   │   │   │   ├── main.ts
│   │   │   │   └── parser.ts
│   │   │   └── types.ts
│   │   └── process-csv-file
│   │       ├── lambda.ts
│   │       ├── src
│   │       │   ├── main.ts
│   │       │   └── parser.ts
│   │       └── types.ts
│   ├── knowlo-backend-stack.ts
│   └── repositories
│       ├── file-repository.ts
│       └── open-ai-repository.ts
├── package.json
├── test
│   └── backend.test.ts
└── tsconfig.json
Let’s first see the GetUploadUrlFunction Lambda function (“backend/lib/functions/get-upload-url”). It contains the following files:
- The “lambda.ts” file simply wires the dependencies
- The “src/main.ts” file holds the business logic
- The “src/parser.ts” file parses the API gateway request
- The “types.ts” file defines TypeScript types specific to this Lambda function
Going into all the details would take too much time at the moment. Instead, I'll paste all the important files below with some basic comments. You can see more details about this architecture here.
The “lambda.ts” file:
import { APIGatewayProxyEvent, APIGatewayProxyResult } from 'aws-lambda'
import { Logger } from '@aws-lambda-powertools/logger';
import { httpResponse } from '../../common/api-gateway-response-generator'
import { parseApiEvent } from './src/parser'
import { getUploadUrl } from './src/main'
import { FileRepository } from '../../repositories/file-repository';
import { LogLevel } from '@aws-lambda-powertools/logger/lib/types';
import { generateError } from '../../common/errors/generate-error';
import { ConfigurationError } from '../../common/errors/generic';

export async function handler(event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> {
  // Create an instance of the Lambda Power Tools Logger
  const logger = new Logger({
    serviceName: 'GetUploadUrlFunction',
    logLevel: process.env.LOG_LEVEL as LogLevel || 'WARN',
  })

  try {
    // Validation
    const bucket = process.env.CSV_BUCKET
    if (!bucket) {
      throw generateError(ConfigurationError)
    }

    // Initialize repositories
    const fileRepository = new FileRepository(bucket)

    // Run the business logic
    const { statusUrl, requestId, uploadUrl } = await getUploadUrl<APIGatewayProxyEvent>({
      event,
      parser: parseApiEvent,
      repositories: {
        fileRepository,
      },
      bucket: bucket,
      region: process.env.AWS_REGION || '',
      logger,
    })

    // Return the response
    return httpResponse({
      body: {
        statusUrl,
        requestId,
        uploadUrl
      },
    })
  } catch (err) {
    // Log an error
    logger.error('Handler error', err as Error)

    // Return an error
    return httpResponse({
      statusCode: 400,
      body: err as Error,
    })
  }
}
The “src/main.ts” file:
import { randomUUID } from 'crypto'
import { PresignedUrlFailed, UnsupportedMimeTypeException } from '../../../common/errors/file-upload'
import { generateError } from '../../../common/errors/generate-error'
import { IGetUploadUrlParams, IGetUploadUrlresponse } from '../types'

// Helper function that should probably go to a separate file
function getFileExtension(mimeType: string): string {
  switch (mimeType) {
    case 'text/csv':
      return 'csv'
    default:
      throw generateError(UnsupportedMimeTypeException)
  }
}

// The main business logic
// We use TypeScript generics because we want to be able to test it
// in isolation and without mocking the AWS event
export async function getUploadUrl<T>({
  event, parser, repositories, logger
}: IGetUploadUrlParams<T>): Promise<IGetUploadUrlresponse> {
  try {
    // Get the file extension
    const { mimeType } = parser(event)
    const extension = getFileExtension(mimeType)

    // Generate the UUID
    const requestId = randomUUID()

    // Prepare file paths (they live in S3, but our business logic doesn't care about the actual service)
    const uploadFileName = `uploads/${requestId}/file.${extension}`
    const statusFileName = `status/${requestId}.json`

    // Create the status file using fileRepository
    await repositories.fileRepository.createFile(statusFileName, JSON.stringify({
      requestId,
      status: 'IN_PROGRESS',
    }))

    // Generate presigned URLs for uploading the file and checking the status
    const uploadUrl = await repositories.fileRepository.getPresignedUploadUrl(uploadFileName)
    const statusUrl = await repositories.fileRepository.getPresignedReadUrl(statusFileName)

    // Return the result
    return {
      uploadUrl,
      statusUrl,
      requestId,
    }
  } catch (error) {
    logger.error('getUploadUrl error', error as Error)
    throw generateError(PresignedUrlFailed)
  }
}
The “parser.ts” file:
import { APIGatewayProxyEvent } from 'aws-lambda'
import { IParsedEvent } from '../types'

export function parseApiEvent(event: APIGatewayProxyEvent): IParsedEvent {
  // Parse the API Gateway proxy event
  const mimeType = event.queryStringParameters?.type || 'text/csv'

  // Return the value
  return {
    mimeType,
  }
}
The ProcessCsvFileFunction Lambda function (“backend/lib/functions/process-csv-file”) is similar, but I’ll show the most important details below.
The “lambda.ts” file:
import { S3Event } from 'aws-lambda'
import { Logger } from '@aws-lambda-powertools/logger'
import { LogLevel } from '@aws-lambda-powertools/logger/lib/types/Log'
import { FileRepository } from '../../repositories/file-repository';
import { generateError } from '../../common/errors/generate-error';
import { ConfigurationError } from '../../common/errors/generic';
import { processCsvFile } from './src/main'
import { parseS3Event } from './src/parser';
import { OpenAiRepository } from '../../repositories/open-ai-repository';

export async function handler(event: S3Event): Promise<void> {
  const logger = new Logger({
    serviceName: 'ProcessCsvFileFunction',
    logLevel: process.env.LOG_LEVEL as LogLevel || 'WARN',
  })

  const bucket = process.env.CSV_BUCKET
  if (!bucket) {
    throw generateError(ConfigurationError)
  }

  const openAiApiKey = process.env.OPEN_AI_API_KEY
  if (!openAiApiKey) {
    throw generateError(ConfigurationError)
  }

  // We have 2 repositories now
  const fileRepository = new FileRepository(bucket, logger)
  const aiRepository = new OpenAiRepository(openAiApiKey, logger)

  await processCsvFile<S3Event>({
    event,
    parser: parseS3Event,
    logger,
    repositories: {
      fileRepository,
      aiRepository,
    },
    bucket,
    region: process.env.AWS_REGION || '',
  })

  logger.info(JSON.stringify(event, null, 2))
}
The “main.ts” file:
import { PresignedUrlFailed } from '../../../common/errors/file-upload'
import { generateError } from '../../../common/errors/generate-error'
import { ICsvRecords, IGetUploadUrlParams } from '../types'
import { parse } from 'csv-parse/sync'

// Helper function
function getUUID(input: string): string | null {
  const uuidRegex = /uploads\/([a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12})\/file\.csv/;
  const match = input.match(uuidRegex);
  return match ? match[1] : null;
}

// Business logic
export async function processCsvFile<T>({
  event, parser, repositories, logger
}: IGetUploadUrlParams<T>): Promise<void> {
  try {
    const files = parser(event)

    // There can be multiple files; we'll do the same for each of them
    await Promise.all(files.map(async file => {
      // Get a file
      repositories.fileRepository.setBucket(file.bucket)
      const csvFile = await repositories.fileRepository.getFile(file.filePath)
      logger.debug('CSV File', csvFile)

      // Parse CSV and filter out unpublished Helpdesk articles;
      // this part will change as it's too specific for Crisp chat at the moment
      const records: ICsvRecords[] = parse(csvFile, {
        columns: true,
        skip_empty_lines: true,
      })
      const publishedRecords = records.filter(rec => rec['published_at'] !== '')

      // Create embeddings for each Helpdesk article
      const output = await Promise.all(
        publishedRecords.map(async rec => {
          const embedding = await repositories.aiRepository.createEmbeddings(JSON.stringify(rec))
          return {
            ...rec,
            embedding,
          }
        })
      )
      logger.debug('Embeddings output', JSON.stringify(output, null, 2))

      const requestId = getUUID(file.filePath)
      logger.info(`Request ID: ${requestId}`)

      if (requestId) {
        // Save embeddings to the S3 bucket
        const embeddingsFilePath = `embeddings/${requestId}/embeddings.json`
        const statusFilePath = `status/${requestId}.json`
        logger.info(`Embeddings URL: ${embeddingsFilePath}`)
        await repositories.fileRepository.createFile(embeddingsFilePath, JSON.stringify(output, null, 2))

        // And update the request status
        await repositories.fileRepository.createFile(statusFilePath, JSON.stringify({
          requestId,
          status: 'DONE',
        }))
      }
    }))
  } catch (error) {
    logger.error('processCsvFile error', error as Error)
    throw generateError(PresignedUrlFailed)
  }
}
The last important pieces of the puzzle are the repositories. Below is the code for both of them.
The S3 File Repository (“backend/lib/repositories/file-repository.ts”):
import { Logger } from '@aws-lambda-powertools/logger'
import { S3Client, PutObjectCommand, PutObjectCommandOutput, GetObjectCommand, GetObjectCommandOutput } from '@aws-sdk/client-s3'
import { getSignedUrl } from '@aws-sdk/s3-request-presigner'
import { generateError } from '../common/errors/generate-error'
import { FileNotFound } from '../common/errors/file-upload'

export class FileRepository {
  bucket: string
  region: string
  s3Client: S3Client
  logger: Logger

  constructor(bucket: string, logger?: Logger) {
    this.bucket = bucket
    this.region = process.env.AWS_REGION || 'eu-central-1'
    this.s3Client = new S3Client({})
    this.logger = logger || new Logger()
  }

  setBucket(bucketName: string): void {
    this.bucket = bucketName
  }

  async getPresignedUploadUrl(fileName: string, expiresIn: number = 900): Promise<string> {
    const command = new PutObjectCommand({
      Bucket: this.bucket,
      Key: fileName,
      ACL: 'private',
    })
    return getSignedUrl(this.s3Client, command, { expiresIn })
  }

  async getPresignedReadUrl(fileName: string, expiresIn: number = 900): Promise<string> {
    const command = new GetObjectCommand({
      Bucket: this.bucket,
      Key: fileName,
    })
    return getSignedUrl(this.s3Client, command, { expiresIn })
  }

  async createFile(fileName: string, fileContent: string | Buffer): Promise<void> {
    const command = new PutObjectCommand({
      Bucket: this.bucket,
      Key: fileName,
      ACL: 'private',
      Body: fileContent,
    })
    const response = await this.s3Client.send(command)
    this.logger.debug('FileRepository.createFile > S3 PutObjectCommand Output', JSON.stringify(response))
  }

  async getFile(fileName: string): Promise<string> {
    const command = new GetObjectCommand({
      Bucket: this.bucket,
      Key: fileName,
    })
    const response = await this.s3Client.send(command)
    if (!response.Body) {
      throw generateError(FileNotFound)
    }
    return await response.Body.transformToString()
  }
}
The OpenAI Repository (“backend/lib/repositories/open-ai-repository.ts”):
import { Logger } from '@aws-lambda-powertools/logger'
export class OpenAiRepository {
apiKey: string
logger: Logger
constructor(apiKey: string, logger?: Logger) {
this.apiKey = apiKey
this.logger = logger || new Logger()
}
async createEmbeddings(input: string, model = 'text-embedding-ada-002'): Promise<any> {
try {
const rawResponse = await fetch('https://api.openai.com/v1/embeddings', {
method: 'POST',
headers: {
Authorization: `Bearer ${this.apiKey}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
input,
model
})
})
const response = await rawResponse.json()
this.logger.debug('OpenAI usage', response.usage)
return response
} catch (err) {
this.logger.error('OpenAiRepository.createEmbeddings > Error', err as Error)
throw err
}
}
}
These repositories talk directly to Amazon S3 and the OpenAI API, so they are the only parts of our code that know about those external services; the business logic doesn't.
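One small note on the OpenAI repository: the /v1/embeddings endpoint returns the vector nested inside a data array, and createEmbeddings currently returns the whole response. If we later want just the vector, a small helper like the following sketch (not part of today's code) would do:

import { OpenAiRepository } from './open-ai-repository'

// Illustrative helper: pull just the embedding vector out of the full API response.
// The /v1/embeddings endpoint returns { data: [{ embedding: number[], ... }], usage: { ... } }.
export async function createEmbeddingVector(repository: OpenAiRepository, input: string): Promise<number[]> {
  const response = await repository.createEmbeddings(input)
  return response.data[0].embedding
}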
After creating an MVP version, we’ll go back and add some proper tests for the functions and repositories. Here’s what we want to test in this setup (the image below uses a generic repository to explain the point):

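To give a flavor of those tests, here's a minimal Jest sketch for the getUploadUrl business logic that swaps the real FileRepository for an in-memory fake (both the fake and the assertions are illustrative, not part of today's code):

import { Logger } from '@aws-lambda-powertools/logger'
import { getUploadUrl } from '../lib/functions/get-upload-url/src/main'

// An in-memory fake that implements only the methods the business logic needs
const fakeFileRepository = {
  files: new Map<string, string>(),
  async createFile(fileName: string, fileContent: string) {
    this.files.set(fileName, fileContent)
  },
  async getPresignedUploadUrl(fileName: string) {
    return `https://example.com/upload/${fileName}`
  },
  async getPresignedReadUrl(fileName: string) {
    return `https://example.com/read/${fileName}`
  },
}

test('getUploadUrl stores an IN_PROGRESS status file and returns presigned URLs', async () => {
  // The `as any` keeps the sketch independent of the exact IGetUploadUrlParams type
  const result = await getUploadUrl({
    event: { queryStringParameters: { type: 'text/csv' } },
    parser: () => ({ mimeType: 'text/csv' }),
    repositories: { fileRepository: fakeFileRepository },
    bucket: 'test-bucket',
    region: 'eu-west-1',
    logger: new Logger({ serviceName: 'test' }),
  } as any)

  expect(result.uploadUrl).toContain(`uploads/${result.requestId}/file.csv`)
  expect(fakeFileRepository.files.has(`status/${result.requestId}.json`)).toBe(true)
})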
Deploying the Application
Now that the application is ready, we can deploy it. But before doing that, we want to install esbuild and the AWS CDK as dev dependencies. After installing these dependencies, our backend package.json file looks like this:
{
  "name": "backend",
  "version": "0.1.0",
  "bin": {
    "knowlo-backend": "bin/knowlo-backend.js"
  },
  "scripts": {
    "build": "tsc",
    "watch": "tsc -w",
    "test": "jest",
    "cdk": "cdk"
  },
  "devDependencies": {
    "@types/jest": "^29.4.0",
    "@types/node": "18.14.6",
    "aws-cdk": "2.78.0",
    "esbuild": "^0.17.18",
    "jest": "^29.5.0",
    "ts-jest": "^29.0.5",
    "ts-node": "^10.9.1",
    "typescript": "~4.9.5"
  },
  "dependencies": {
    "@aws-lambda-powertools/logger": "^1.8.0",
    "@aws-sdk/client-s3": "^3.331.0",
    "@aws-sdk/s3-request-presigner": "^3.331.0",
    "@middy/core": "^4.4.1",
    "@types/aws-lambda": "^8.10.115",
    "@types/csv-parse": "^1.2.2",
    "aws-cdk-lib": "2.78.0",
    "aws-lambda": "^1.0.7",
    "constructs": "^10.0.0",
    "csv-parse": "^5.3.10"
  }
}
Now we can simply run the following command to deploy the application:
npm run cdk deploy -- --parameters LogLevel=DEBUG
If everything goes well, the output will look similar to the following:
 ✅  KnowloBackendStack (no changes)

✨  Deployment time: 2.89s

Outputs:
KnowloBackendStack.ApiEndpoint = https://srxxxxxxr0.execute-api.eu-west-1.amazonaws.com/prod/
KnowloBackendStack.AppApiEndpoint979256A8 = https://srxxxxxxr0.execute-api.eu-west-1.amazonaws.com/prod/
Stack ARN:
arn:aws:cloudformation:eu-west-1:xxxxxxxxxxxx:stack/KnowloBackendStack/cbxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx8b

✨  Total time: 6.58s
We can use the API Endpoint from the output to test the API. CofounderGPT and I already did it, and it works!
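If you want to try it yourself, here's a rough sketch of that test in TypeScript (the endpoint is a placeholder, the CSV columns are illustrative, and it assumes the handler returns the { uploadUrl, statusUrl, requestId } object as the JSON body):

const apiEndpoint = 'https://your-api-id.execute-api.eu-west-1.amazonaws.com/prod' // placeholder

async function uploadCsv(csvContent: string): Promise<void> {
  // 1. Ask the API for a presigned upload URL
  const response = await fetch(`${apiEndpoint}/upload?type=text/csv`, { method: 'POST' })
  const { uploadUrl, statusUrl, requestId } = await response.json()
  console.log('Request ID:', requestId)

  // 2. Upload the CSV file directly to S3 using the presigned URL
  await fetch(uploadUrl, { method: 'PUT', body: csvContent })

  // 3. The ProcessCsvFileFunction picks the file up from S3; check statusUrl until the status is DONE
  console.log('Status file:', statusUrl)
}

// Illustrative CSV content; the real export comes from Crisp and includes a published_at column
uploadCsv('title,body,published_at\nHello,World,2023-05-01').catch(console.error)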
Scoreboard
Time spent today: 8h
Total time spent: 99h
Investment today: $0
Total investment: $913.54 USD
Paying customers: 0
Revenue: $0
What’s Next
CofounderGPT and I will continue writing the backend application. Our next goal is to add Amazon Cognito for authentication and Amazon DynamoDB as a database.